A Scoring Map Algorithm for Automatically Detecting Structural Similarity of DOM Elements
- Authors
- Grigera, Julián; Gardey, Juan Cruz; Garrido, Alejandra; Rossi, Gustavo Héctor
- Publication year
- 2021
- Language
- English
- Resource type
- conference paper
- Status
- published version
- Description
- Most documents in the WWW are generated from templates that represent user interface (UI) elements, and later filled with contents. In the field of information extraction, many approaches have emerged to analyze the documents' structure, obtain similar features amongst them, and generate wrappers that are used to extract the raw contents from such documents. Therefore, most techniques documented in the literature are optimized to compare full documents, but there are other fields of applicability that require analyzing structural similarity on smaller UI components, like web augmentation or transcoding. In this paper we present two flexible algorithms to measure similarity between DOM Elements by using a mixed approach that considers both elements' location and inner structure. The proposed algorithms were used in the context of two projects: an approach for automatic usability refactoring, and a web accessibility helper. We also present a wrapper induction technique based on such algorithms. Additionally, we present a precision and recall evaluation of our algorithms as compared with other known approaches, applied to DOM elements of different sizes, but smaller than full-scale documents. The proposed algorithms run in linear time, so they are faster than most approaches that analyze structural similarity.
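As a rough illustration of the mixed approach named in the abstract (element location plus inner structure), the following Python sketch scores two elements. It is not the authors' Scoring Map algorithm: the `Node` class, the prefix-based location score, the path-multiset structure score, and the `location_weight` parameter are all assumptions made for this example.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import List, Sequence


@dataclass
class Node:
    """Hypothetical stand-in for a DOM element: a tag name and its children."""
    tag: str
    children: List["Node"] = field(default_factory=list)


def inner_paths(node: Node, prefix: str = "") -> Counter:
    """Count the tag paths found inside `node`, visiting each descendant once."""
    path = f"{prefix}/{node.tag}"
    counts = Counter([path])
    for child in node.children:
        counts.update(inner_paths(child, path))
    return counts


def structure_score(a: Node, b: Node) -> float:
    """Multiset overlap of inner tag paths, in [0, 1]."""
    paths_a, paths_b = inner_paths(a), inner_paths(b)
    union = sum((paths_a | paths_b).values())
    return sum((paths_a & paths_b).values()) / union if union else 1.0


def location_score(ancestors_a: Sequence[str], ancestors_b: Sequence[str]) -> float:
    """Length of the shared ancestor prefix over the longer ancestor chain."""
    shared = 0
    for tag_a, tag_b in zip(ancestors_a, ancestors_b):
        if tag_a != tag_b:
            break
        shared += 1
    longest = max(len(ancestors_a), len(ancestors_b))
    return shared / longest if longest else 1.0


def similarity(ancestors_a, a, ancestors_b, b, location_weight=0.5):
    """Weighted mix of location and structure scores (the weight is an assumption)."""
    return (location_weight * location_score(ancestors_a, ancestors_b)
            + (1 - location_weight) * structure_score(a, b))


# Two list items built from the same template versus an unrelated footer block.
item_a = Node("li", [Node("a", [Node("img")]), Node("span")])
item_b = Node("li", [Node("a", [Node("img")]), Node("span")])
footer_box = Node("div", [Node("p")])

same = similarity(["html", "body", "ul"], item_a, ["html", "body", "ul"], item_b)
different = similarity(["html", "body", "ul"], item_a, ["html", "body", "footer"], footer_box)
print(same, different)
```

Under these assumptions, two list items with identical inner structure and identical ancestors receive the maximum score, while an element with a different structure in another container scores clearly lower.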
Affiliation: Grigera, Julián. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - La Plata; Argentina. Universidad Nacional de La Plata. Facultad de Informática. Laboratorio de Investigación y Formación en Informática Avanzada; Argentina
Affiliation: Gardey, Juan Cruz. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - La Plata; Argentina. Universidad Nacional de La Plata. Facultad de Informática. Laboratorio de Investigación y Formación en Informática Avanzada; Argentina
Affiliation: Garrido, Alejandra. Universidad Nacional de La Plata. Facultad de Informática. Laboratorio de Investigación y Formación en Informática Avanzada; Argentina. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - La Plata; Argentina
Affiliation: Rossi, Gustavo Héctor. Universidad Nacional de La Plata. Facultad de Informática. Laboratorio de Investigación y Formación en Informática Avanzada; Argentina. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - La Plata; Argentina
17th International Conference on Web Information Systems and Technologies
Setúbal
Portugal
Polytechnic Institute of Setubal
- Subject
- INFORMATION EXTRACTION; WEB ADAPTATION; REFACTORING FOR USABILITY
- Access level
- open access
- Terms of use
- https://creativecommons.org/licenses/by-nc-sa/2.5/ar/
- Repository
- Institution
- Consejo Nacional de Investigaciones Científicas y Técnicas
- OAI identifier
- oai:ri.conicet.gov.ar:11336/255175
Full record metadata
id | CONICETDig_7fb9d1ea7befe2f6b2a56a42f6a29ee6 |
---|---|
oai_identifier_str | oai:ri.conicet.gov.ar:11336/255175 |
network_acronym_str | CONICETDig |
repository_id_str | 3498 |
network_name_str | CONICET Digital (CONICET) |
dc.title.none.fl_str_mv | A Scoring Map Algorithm for Automatically Detecting Structural Similarity of DOM Elements |
title | A Scoring Map Algorithm for Automatically Detecting Structural Similarity of DOM Elements |
dc.creator.none.fl_str_mv | Grigera, Julián; Gardey, Juan Cruz; Garrido, Alejandra; Rossi, Gustavo Héctor |
author | Grigera, Julián |
author_facet | Grigera, Julián; Gardey, Juan Cruz; Garrido, Alejandra; Rossi, Gustavo Héctor |
author_role | author |
author2 | Gardey, Juan Cruz; Garrido, Alejandra; Rossi, Gustavo Héctor |
author2_role | author; author; author |
dc.contributor.none.fl_str_mv | Domínguez Mayo, Francisco José; Marchiori, Massimo; Filipe, Joaquim |
dc.subject.none.fl_str_mv | INFORMATION EXTRACTION; WEB ADAPTATION; REFACTORING FOR USABILITY |
topic | INFORMATION EXTRACTION; WEB ADAPTATION; REFACTORING FOR USABILITY |
purl_subject.fl_str_mv | https://purl.org/becyt/ford/1.2; https://purl.org/becyt/ford/1 |
description | Most documents in the WWW are generated from templates that represent user interface (UI) elements, and later filled with contents. In the field of information extraction, many approaches have emerged to analyze the documents' structure, obtain similar features amongst them, and generate wrappers that are used to extract the raw contents from such documents. Therefore, most techniques documented in the literature are optimized to compare full documents, but there are other fields of applicability that require analyzing structural similarity on smaller UI components, like web augmentation or transcoding. In this paper we present two flexible algorithms to measure similarity between DOM Elements by using a mixed approach that considers both elements' location and inner structure. The proposed algorithms were used in the context of two projects: an approach for automatic usability refactoring, and a web accessibility helper. We also present a wrapper induction technique based on such algorithms. Additionally, we present a precision and recall evaluation of our algorithms as compared with other known approaches, applied to DOM elements of different sizes, but smaller than full-scale documents. The proposed algorithms run in linear time, so they are faster than most approaches that analyze structural similarity. |
publishDate | 2021 |
dc.date.none.fl_str_mv | 2021 |
dc.type.none.fl_str_mv | info:eu-repo/semantics/publishedVersion; info:eu-repo/semantics/conferenceObject; Conferencia; Book; http://purl.org/coar/resource_type/c_5794; info:ar-repo/semantics/documentoDeConferencia |
status_str | publishedVersion |
format | conferenceObject |
dc.identifier.none.fl_str_mv | http://hdl.handle.net/11336/255175 A Scoring Map Algorithm for Automatically Detecting Structural Similarity of DOM Elements; 17th International Conference on Web Information Systems and Technologies; Setúbal; Portugal; 2021; 174-185 978-989-758-536-4 CONICET Digital CONICET |
url | http://hdl.handle.net/11336/255175 |
identifier_str_mv | A Scoring Map Algorithm for Automatically Detecting Structural Similarity of DOM Elements; 17th International Conference on Web Information Systems and Technologies; Setúbal; Portugal; 2021; 174-185 978-989-758-536-4 CONICET Digital CONICET |
dc.language.none.fl_str_mv | eng |
language | eng |
dc.relation.none.fl_str_mv | info:eu-repo/semantics/altIdentifier/url/https://www.scitepress.org/Papers/2021/107163/107163.pdf; info:eu-repo/semantics/altIdentifier/url/https://dblp.org/rec/conf/webist/2021.html; info:eu-repo/semantics/altIdentifier/url/https://webist.scitevents.org/?y=2021 |
dc.rights.none.fl_str_mv | info:eu-repo/semantics/openAccess; https://creativecommons.org/licenses/by-nc-sa/2.5/ar/ |
eu_rights_str_mv | openAccess |
rights_invalid_str_mv | https://creativecommons.org/licenses/by-nc-sa/2.5/ar/ |
dc.format.none.fl_str_mv | application/pdf; application/pdf |
dc.coverage.none.fl_str_mv | International |
dc.publisher.none.fl_str_mv | ScitePress |
publisher.none.fl_str_mv | ScitePress |
dc.source.none.fl_str_mv | reponame:CONICET Digital (CONICET); instname:Consejo Nacional de Investigaciones Científicas y Técnicas |
reponame_str | CONICET Digital (CONICET) |
collection | CONICET Digital (CONICET) |
instname_str | Consejo Nacional de Investigaciones Científicas y Técnicas |
repository.name.fl_str_mv | CONICET Digital (CONICET) - Consejo Nacional de Investigaciones Científicas y Técnicas |
repository.mail.fl_str_mv | dasensio@conicet.gov.ar; lcarlino@conicet.gov.ar |