Rule-Based Matching for Real Estate Features Detection

Authors
Ibáñez Gutkin, Mateo Agustín; Pagano, Alvaro A.; Bazzana Tanevitch, Luciana; Torres, Diego F.
Year of publication
2025
Language
English
Resource type
conference paper
Status
published version
Description
Most of the information about real estate for sale in Buenos Aires province, Argentina, is unstructured: it does not follow a consistent format, which makes extraction challenging. Variability in wording, human errors, noise, and incomplete data further complicate the task. Given the large volume of available information, automated techniques are required to transform this unstructured text into structured data. This article presents an approach to extract attribute-value pairs from property listings for the province of Buenos Aires, in order to incorporate this data into a knowledge graph. The approach applies pattern-based information extraction to 17 features and is evaluated exhaustively on two datasets: a ground truth labeled by experts and a dataset drawn from a real-world use case. The results show that the extracted values are accurate.
Instituto de Investigación en Informática
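The record does not describe the rule syntax the authors use. As a rough illustration of what pattern-based extraction of attribute-value pairs from a listing can look like, here is a minimal Python sketch that uses only the standard `re` and `unicodedata` modules. The feature names (`bedrooms`, `bathrooms`, `covered_area_m2`, `garage`), the Spanish patterns, and the helper functions are all hypothetical. They are not the paper's 17 features or its implementation.

```python
import re
import unicodedata

# Hypothetical word-to-number map for counts written out in Spanish listings.
NUMBER_WORDS = {"un": 1, "una": 1, "uno": 1, "dos": 2, "tres": 3, "cuatro": 4, "cinco": 5}
_NUM = r"\b(\d+|una|uno|un|dos|tres|cuatro|cinco)"

# Hypothetical numeric rules: group 1 of each regex captures the value.
NUMERIC_RULES = {
    "bedrooms": re.compile(_NUM + r"\s*(?:dormitorios?|habitaciones?|dorm)\b"),
    "bathrooms": re.compile(_NUM + r"\s*banos?\b"),
    # Decimal commas ("45,5") are handled; thousands separators ("1.200") are not.
    "covered_area_m2": re.compile(r"\b(\d+(?:,\d+)?)\s*(?:m2|mts2|metros cuadrados)\s*cubiertos?"),
}

# Hypothetical boolean rule: keyword presence, negated by a preceding "sin" ("without").
BOOLEAN_RULES = {
    "garage": re.compile(r"\b(sin\s+)?(?:cochera|garage)\b"),
}


def normalize(text: str) -> str:
    """Lowercase and strip accents so 'Baños' matches the rule for 'banos'.

    NFKD compatibility decomposition also folds 'm²' into 'm2'.
    """
    decomposed = unicodedata.normalize("NFKD", text.lower())
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def to_number(token: str):
    """Convert a captured token ('3', 'dos', '45,5') into an int or a float."""
    if token in NUMBER_WORDS:
        return NUMBER_WORDS[token]
    token = token.replace(",", ".")
    return float(token) if "." in token else int(token)


def extract_features(listing: str) -> dict:
    """Apply every rule to one listing and return the attribute-value pairs found."""
    text = normalize(listing)
    features = {}
    for name, pattern in NUMERIC_RULES.items():
        match = pattern.search(text)
        if match:
            features[name] = to_number(match.group(1))
    for name, pattern in BOOLEAN_RULES.items():
        match = pattern.search(text)
        if match:
            # True unless the keyword was preceded by "sin".
            features[name] = match.group(1) is None
    return features


if __name__ == "__main__":
    print(extract_features("Casa de 3 dormitorios, dos baños y 120 m2 cubiertos. Sin cochera."))
    # {'bedrooms': 3, 'bathrooms': 2, 'covered_area_m2': 120, 'garage': False}
```

Features that no rule matches are simply absent from the result. This is one way to represent the incomplete data the description mentions. Real listings would need many more rules, for example to handle abbreviations, ranges, and other phrasings of negation.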
Subjects
Computer Science
Information Extraction
Rule-based matching
Natural Language Processing
Knowledge Graph Completion
Access level
open access
Terms of use
http://creativecommons.org/licenses/by-nc-nd/4.0/
Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International (CC BY-NC-ND 4.0)
Repository
SEDICI (UNLP)
Institution
Universidad Nacional de La Plata
OAI identifier
oai:sedici.unlp.edu.ar:10915/182588

Date
2025-06
Pages
49-62
ISBN
978-950-34-2583-1
Format
application/pdf
URL
http://sedici.unlp.edu.ar/handle/10915/182588