A Scoring Map Algorithm for Automatically Detecting Structural Similarity of DOM Elements

Authors
Grigera, Julián; Gardey, Juan Cruz; Garrido, Alejandra; Rossi, Gustavo Héctor
Year of publication
2021
Language
English
Resource type
conference paper
Status
published version
Description
Most documents on the WWW are generated from templates that represent user interface (UI) elements and are later filled with content. In the field of information extraction, many approaches have emerged to analyze the documents' structure, obtain similar features among them, and generate wrappers that extract the raw content from such documents. As a result, most techniques documented in the literature are optimized to compare full documents, but other fields, such as web augmentation and transcoding, require analyzing structural similarity on smaller UI components. In this paper we present two flexible algorithms that measure the similarity between DOM elements using a mixed approach that considers both the elements' location and their inner structure. The proposed algorithms were used in two projects: an approach for automatic usability refactoring and a web accessibility helper. We also present a wrapper induction technique based on these algorithms. Additionally, we present a precision and recall evaluation comparing our algorithms with other known approaches, applied to DOM elements of various sizes, all smaller than full-scale documents. The proposed algorithms run in linear time, so they are faster than most approaches that analyze structural similarity.
Affiliation: Grigera, Julián. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - La Plata; Argentina. Universidad Nacional de La Plata. Facultad de Informática. Laboratorio de Investigación y Formación en Informática Avanzada; Argentina
Affiliation: Gardey, Juan Cruz. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - La Plata; Argentina. Universidad Nacional de La Plata. Facultad de Informática. Laboratorio de Investigación y Formación en Informática Avanzada; Argentina
Affiliation: Garrido, Alejandra. Universidad Nacional de La Plata. Facultad de Informática. Laboratorio de Investigación y Formación en Informática Avanzada; Argentina. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - La Plata; Argentina
Affiliation: Rossi, Gustavo Héctor. Universidad Nacional de La Plata. Facultad de Informática. Laboratorio de Investigación y Formación en Informática Avanzada; Argentina. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - La Plata; Argentina
17th International Conference on Web Information Systems and Technologies
Setúbal
Portugal
Polytechnic Institute of Setubal
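The abstract does not spell out the scoring map algorithm itself, so what follows is only a minimal Python sketch of the general idea it names: scoring two DOM elements by mixing their location (the chain of ancestor tags) with their inner structure (the tags found inside each element). The `Node` class, the (relative depth, tag) features, the shared-prefix location score, the Jaccard-style structure score, and the 0.5 weight are all hypothetical choices for illustration, not the authors' method. Each helper touches every element of the subtree, or of the ancestor chain, once, so the cost stays linear in their size.

```python
from __future__ import annotations

from collections import Counter
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Node:
    """Hypothetical stand-in for a DOM element: a tag name plus child elements."""
    tag: str
    children: list[Node] = field(default_factory=list)
    parent: Optional[Node] = field(default=None, repr=False, compare=False)

    def add(self, child: Node) -> Node:
        child.parent = self
        self.children.append(child)
        return child


def location_path(node: Optional[Node]) -> list[str]:
    """Tags from the document root down to `node` (the element's location)."""
    path = []
    while node is not None:
        path.append(node.tag)
        node = node.parent
    return path[::-1]


def structure_features(node: Node) -> Counter:
    """Multiset of (relative depth, tag) pairs for every element in the subtree.
    Each element contributes exactly one feature, so this is linear in subtree size."""
    features: Counter = Counter()
    stack = [(node, 0)]
    while stack:
        current, depth = stack.pop()
        features[(depth, current.tag)] += 1
        stack.extend((child, depth + 1) for child in current.children)
    return features


def location_similarity(a: Node, b: Node) -> float:
    """Length of the shared root-to-element tag prefix, relative to the longer path."""
    pa, pb = location_path(a), location_path(b)
    common = 0
    for x, y in zip(pa, pb):
        if x != y:
            break
        common += 1
    return common / max(len(pa), len(pb))


def structure_similarity(a: Node, b: Node) -> float:
    """Multiset Jaccard index over the structural features of both subtrees."""
    fa, fb = structure_features(a), structure_features(b)
    union = sum((fa | fb).values())
    return sum((fa & fb).values()) / union if union else 1.0


def similarity(a: Node, b: Node, weight: float = 0.5) -> float:
    """Hypothetical mix in [0, 1]: weight * location + (1 - weight) * structure."""
    return weight * location_similarity(a, b) + (1 - weight) * structure_similarity(a, b)


if __name__ == "__main__":
    root = Node("html")
    body = root.add(Node("body"))
    card1 = body.add(Node("div"))
    card1.add(Node("h2"))
    card1.add(Node("p"))
    card2 = body.add(Node("div"))
    card2.add(Node("h2"))
    card2.add(Node("p"))
    footer = body.add(Node("footer"))
    footer.add(Node("p"))
    print(f"{similarity(card1, card2):.3f}")   # prints 1.000 (same location, same structure)
    print(f"{similarity(card1, footer):.3f}")  # prints 0.458
```

Combining two independent scores with a single weight keeps the sketch easy to tune: a weight near 1 favors elements that sit in the same place in the page, while a weight near 0 favors elements with the same internal layout regardless of where they appear.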
Subject
INFORMATION EXTRACTION
WEB ADAPTATION
REFACTORING FOR USABILITY
Access level
open access
Terms of use
https://creativecommons.org/licenses/by-nc-sa/2.5/ar/
Repository
CONICET Digital (CONICET)
Institution
Consejo Nacional de Investigaciones Científicas y Técnicas
OAI identifier
oai:ri.conicet.gov.ar:11336/255175

Contributors
Domínguez Mayo, Francisco José
Marchiori, Massimo
Filipe, Joaquim
Subject classification
https://purl.org/becyt/ford/1.2
https://purl.org/becyt/ford/1
Handle
http://hdl.handle.net/11336/255175
Citation
A Scoring Map Algorithm for Automatically Detecting Structural Similarity of DOM Elements; 17th International Conference on Web Information Systems and Technologies; Setúbal, Portugal; 2021; pp. 174-185
ISBN
978-989-758-536-4
Alternative URLs
https://www.scitepress.org/Papers/2021/107163/107163.pdf
https://dblp.org/rec/conf/webist/2021.html
https://webist.scitevents.org/?y=2021
Format
application/pdf
Coverage
International
Publisher
ScitePress