Rule-Based Matching for Real Estate Features Detection
- Authors
- Ibañez Gutkin, Mateo Agustín; Pagano, Álvaro A.; Bazzana Tanevitch, Luciana; Torres, Diego
- Year of publication
- 2025
- Language
- English
- Resource type
- conference paper
- Status
- published version
- Description
- Most of the information about real estate for sale in the province of Buenos Aires, Argentina, is unstructured: it does not follow a consistent format, which makes extraction challenging. Variability in wording, human errors, noise, and incomplete data further complicate the task. Given the large volume of information available, automated techniques are required to transform unstructured text into structured data. This article presents an approach to extract attribute-value pairs from property listings in the province of Buenos Aires in order to incorporate this data into a knowledge graph. The approach applies pattern-based information extraction to 17 features and is evaluated exhaustively on two datasets: a ground truth labeled by experts and a dataset drawn from a real-world use case. The results show that the approach extracts accurate values.
- Subjects
- Computer and Information Sciences; Information Extraction; Rule-based matching; Natural Language Processing; Knowledge Graph Completion
- Access level
- open access
- Terms of use
- http://creativecommons.org/licenses/by-nc-nd/4.0/
- Repository
- CIC Digital (CICBA)
- Institution
- Comisión de Investigaciones Científicas de la Provincia de Buenos Aires
- OAI identifier
- oai:digital.cic.gba.gob.ar:11746/12661
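The description above refers to pattern-based extraction of attribute-value pairs from listing text. As a rough illustration of that idea, and not the authors' implementation (their 17 features and rules are not given in this record), the following minimal Python sketch uses only the standard `re` module. The feature names (`bedrooms`, `bathrooms`, `covered_area_m2`, `has_garage`), the Spanish surface forms it looks for, and the patterns themselves are hypothetical assumptions chosen for the example.

```python
import re
from typing import Callable

# A rule pairs a compiled pattern with a function that turns the match
# into a typed value. All rules below are hypothetical examples.
Rule = tuple[re.Pattern[str], Callable[[re.Match[str]], object]]

RULES: dict[str, Rule] = {
    # "3 dormitorios", "2 dorm.", "4 habitaciones" -> integer count
    "bedrooms": (
        re.compile(r"(\d+)\s*(?:dormitorios?|dorm\.?|habitaciones?)", re.IGNORECASE),
        lambda m: int(m.group(1)),
    ),
    # "2 baños", "1 baño" -> integer count
    "bathrooms": (
        re.compile(r"(\d+)\s*baños?", re.IGNORECASE),
        lambda m: int(m.group(1)),
    ),
    # "120 m2 cubiertos", "1.200 m² cubiertos", "85,5 m2 cubiertos" -> float;
    # assumes "." as thousands separator and "," as decimal separator
    "covered_area_m2": (
        re.compile(
            r"(\d{1,3}(?:\.\d{3})+(?:,\d+)?|\d+(?:,\d+)?)\s*m(?:2|²)\s*cubiertos",
            re.IGNORECASE,
        ),
        lambda m: float(m.group(1).replace(".", "").replace(",", ".")),
    ),
    # "con cochera" -> True, "sin cochera" -> False (simple negation rule)
    "has_garage": (
        re.compile(r"\b(sin\s+)?(?:cochera|garage)\b", re.IGNORECASE),
        lambda m: m.group(1) is None,
    ),
}


def extract_features(text: str) -> dict[str, object]:
    """Return attribute-value pairs for every rule that matches the text.

    Only the first match of each rule is used; features with no match are
    left out rather than guessed, since listings are often incomplete.
    """
    features: dict[str, object] = {}
    for name, (pattern, convert) in RULES.items():
        match = pattern.search(text)
        if match is not None:
            features[name] = convert(match)
    return features


if __name__ == "__main__":
    listing = "Casa en venta: 3 dormitorios, 2 baños, 120 m2 cubiertos, con cochera."
    print(extract_features(listing))
```

For example, `extract_features("Casa en venta: 3 dormitorios, 2 baños, 120 m2 cubiertos, con cochera.")` returns `{'bedrooms': 3, 'bathrooms': 2, 'covered_area_m2': 120.0, 'has_garage': True}`. Keeping each rule as a pattern plus a converter lets rules be added or audited one feature at a time; the optional `sin` group in the garage rule shows how a rule can encode negation, one of the wording variations that plain keyword matching would get wrong.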
Full metadata record
| id | CICBA_f2f04ca58f9d862f26e36ef9b7cd5965 |
|---|---|
| oai_identifier_str | oai:digital.cic.gba.gob.ar:11746/12661 |
| network_acronym_str | CICBA |
| repository_id_str | 9441 |
| network_name_str | CIC Digital (CICBA) |
| dc.date.none.fl_str_mv | 2025-06 |
| dc.type.none.fl_str_mv | info:eu-repo/semantics/conferenceObject info:eu-repo/semantics/publishedVersion http://purl.org/coar/resource_type/c_5794 info:ar-repo/semantics/documentoDeConferencia |
| dc.identifier.none.fl_str_mv | https://digital.cic.gba.gob.ar/handle/11746/12661 |
| dc.relation.none.fl_str_mv | info:eu-repo/semantics/altIdentifier/isbn/978-950-34-2583-1 |
| dc.format.none.fl_str_mv | application/pdf |
| dc.source.none.fl_str_mv | reponame:CIC Digital (CICBA) instname:Comisión de Investigaciones Científicas de la Provincia de Buenos Aires instacron:CICBA |
| repository.mail.fl_str_mv | marisa.degiusti@sedici.unlp.edu.ar |