Rule-Based Matching for Real Estate Features Detection
- Authors
- Ibáñez Gutkin, Mateo Agustín; Pagano, Alvaro A.; Bazzana Tanevitch, Luciana; Torres, Diego F.
- Publication year
- 2025
- Language
- English
- Resource type
- conference paper
- Status
- published version
- Description
- Most of the information about real estate for sale in Buenos Aires province, Argentina, is unstructured: it does not always follow the same format, which makes extraction challenging. Variability in wording, human errors, noise, and incomplete data further complicate the task. Given the large volume of information available, automated techniques are required to transform unstructured text into structured data. This article presents an approach for extracting attribute-value pairs from property listings for the province of Buenos Aires, in order to incorporate this data into a knowledge graph. The approach uses pattern-based information extraction for 17 features and is evaluated exhaustively on two datasets: a ground truth labeled by experts and a dataset containing a real-world use case. The results show that the extracted values are accurate.
Instituto de Investigación en Informática
- Subject
- Computer Science; Information Extraction; Rule-based matching; Natural Language Processing; Knowledge Graph Completion
- Accessibility level
- open access
- Terms of use
- http://creativecommons.org/licenses/by-nc-nd/4.0/
- Repository
- Institution
- Universidad Nacional de La Plata
- OAI identifier
- oai:sedici.unlp.edu.ar:10915/182588
id | SEDICI_c14be410f1dc70bdf130f292f65b500b |
---|---|
oai_identifier_str | oai:sedici.unlp.edu.ar:10915/182588 |
network_acronym_str | SEDICI |
repository_id_str | 1329 |
network_name_str | SEDICI (UNLP) |
dc.title.none.fl_str_mv | Rule-Based Matching for Real Estate Features Detection |
dc.creator.none.fl_str_mv | Ibáñez Gutkin, Mateo Agustín; Pagano, Alvaro A.; Bazzana Tanevitch, Luciana; Torres, Diego F. |
dc.subject.none.fl_str_mv | Ciencias Informáticas; Information Extraction; Rule-based matching; Natural Language Processing; Knowledge Graph Completion |
dc.date.none.fl_str_mv | 2025-06 |
dc.type.none.fl_str_mv | info:eu-repo/semantics/conferenceObject; info:eu-repo/semantics/publishedVersion; Objeto de conferencia; http://purl.org/coar/resource_type/c_5794; info:ar-repo/semantics/documentoDeConferencia |
dc.identifier.none.fl_str_mv | http://sedici.unlp.edu.ar/handle/10915/182588 |
dc.language.none.fl_str_mv | eng |
dc.relation.none.fl_str_mv | info:eu-repo/semantics/altIdentifier/isbn/978-950-34-2583-1 |
dc.rights.none.fl_str_mv | info:eu-repo/semantics/openAccess; http://creativecommons.org/licenses/by-nc-nd/4.0/; Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International (CC BY-NC-ND 4.0) |
dc.format.none.fl_str_mv | application/pdf; 49-62 |
dc.source.none.fl_str_mv | reponame:SEDICI (UNLP); instname:Universidad Nacional de La Plata; instacron:UNLP |
repository.name.fl_str_mv | SEDICI (UNLP) - Universidad Nacional de La Plata |
repository.mail.fl_str_mv | alira@sedici.unlp.edu.ar |