Rule-Based Matching for Real Estate Features Detection

Authors
Ibañez Gutkin, Mateo Agustín; Pagano, Álvaro A.; Bazzana Tanevitch, Luciana; Torres, Diego
Publication year
2025
Language
English
Resource type
conference paper
Status
published version
Description
Most of the information about real estate for sale in Buenos Aires province, Argentina, is unstructured: it does not always follow the same format, which makes extraction challenging. Variability in wording, human errors, noise, and incomplete data further complicate the task. Given the large volume of information available, automated techniques are required to transform unstructured text into structured data. This article presents an approach to extract attribute-value pairs from property listings for the province of Buenos Aires, in order to incorporate this data into a knowledge graph. The approach uses pattern-based information extraction for 17 features, with an exhaustive evaluation over two datasets: a ground truth labeled by experts and a dataset containing a real-world use case. The results show that the extracted values are accurate.
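As a rough illustration of what pattern-based extraction of attribute-value pairs can look like, the sketch below applies regular expressions to a listing description. It is not the authors' implementation: the feature names, the Spanish keywords, and the sample listing are all assumptions made for illustration, since this record does not list the paper's 17 features or its actual rules.

```python
import re

# Hypothetical patterns: the record does not enumerate the paper's features or rules.
PATTERNS = {
    "bedrooms": re.compile(r"(\d+)\s*(?:dormitorios?|habitaciones|habitación)", re.IGNORECASE),
    "bathrooms": re.compile(r"(\d+)\s*baños?", re.IGNORECASE),
    "area_m2": re.compile(r"(\d+(?:[.,]\d+)?)\s*(?:m2|m²|mts2|metros cuadrados)", re.IGNORECASE),
}


def extract_features(text: str) -> dict[str, str]:
    """Return attribute-value pairs, using the first match of each pattern."""
    pairs = {}
    for name, pattern in PATTERNS.items():
        match = pattern.search(text)
        if match:
            # Normalize a decimal comma to a decimal point.
            pairs[name] = match.group(1).replace(",", ".")
    return pairs


# Made-up listing text, used only to exercise the patterns.
listing = "Casa en La Plata, 3 dormitorios, 2 baños, 120 m2 cubiertos."
features = extract_features(listing)
print(features)
```

Features whose pattern does not match are simply absent from the result, which is one way to represent the incomplete data mentioned in the abstract.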
Subject
Computer and Information Sciences
Information Extraction
Rule-based matching
Natural Language Processing
Knowledge Graph Completion
Access level
open access
Terms of use
http://creativecommons.org/licenses/by-nc-nd/4.0/
Repository
CIC Digital (CICBA)
Institution
Comisión de Investigaciones Científicas de la Provincia de Buenos Aires
OAI identifier
oai:digital.cic.gba.gob.ar:11746/12661

id CICBA_f2f04ca58f9d862f26e36ef9b7cd5965
oai_identifier_str oai:digital.cic.gba.gob.ar:11746/12661
network_acronym_str CICBA
repository_id_str 9441
network_name_str CIC Digital (CICBA)
publishDate 2025
dc.date.none.fl_str_mv 2025-06
dc.type.none.fl_str_mv info:eu-repo/semantics/conferenceObject
info:eu-repo/semantics/publishedVersion
http://purl.org/coar/resource_type/c_5794
info:ar-repo/semantics/documentoDeConferencia
dc.identifier.none.fl_str_mv https://digital.cic.gba.gob.ar/handle/11746/12661
url https://digital.cic.gba.gob.ar/handle/11746/12661
dc.language.none.fl_str_mv eng
language eng
dc.relation.none.fl_str_mv info:eu-repo/semantics/altIdentifier/isbn/978-950-34-2583-1
dc.rights.none.fl_str_mv info:eu-repo/semantics/openAccess
http://creativecommons.org/licenses/by-nc-nd/4.0/
dc.format.none.fl_str_mv application/pdf
dc.source.none.fl_str_mv reponame:CIC Digital (CICBA)
instname:Comisión de Investigaciones Científicas de la Provincia de Buenos Aires
instacron:CICBA
repository.name.fl_str_mv CIC Digital (CICBA) - Comisión de Investigaciones Científicas de la Provincia de Buenos Aires
repository.mail.fl_str_mv marisa.degiusti@sedici.unlp.edu.ar