EpaDB: A database for development of pronunciation assessment systems

Authors
Vidal Dominguez, Jazmin; Ferrer, Luciana; Brambilla, Leonardo Miguel
Publication year
2021
Language
English
Resource type
conference paper
Status
published version
Description
In this paper, we describe the methodology for collecting and annotating a new database designed for conducting research and development on pronunciation assessment. While a significant amount of research has been done in the area of pronunciation assessment, to our knowledge, no database is available for public use for research in the field. Considering this need, we created EpaDB (English Pronunciation by Argentinians Database), which is composed of English phrases read by native Spanish speakers with different levels of English proficiency. The recordings are annotated with ratings of pronunciation quality at phrase-level and detailed phonetic alignments and transcriptions indicating which phones were actually pronounced by the speakers. We present inter-rater agreement, the effect of each phone on overall perceived non-nativeness, and the frequency of specific pronunciation errors.
Affiliation: Vidal Dominguez, Jazmin. Consejo Nacional de Investigaciones Científicas y Técnicas. Oficina de Coordinación Administrativa Ciudad Universitaria. Instituto de Investigación en Ciencias de la Computación. Universidad de Buenos Aires. Facultad de Ciencias Exactas y Naturales. Instituto de Investigación en Ciencias de la Computación; Argentina. Universidad de Buenos Aires. Facultad de Ciencias Exactas y Naturales. Departamento de Computación; Argentina
Affiliation: Ferrer, Luciana. Consejo Nacional de Investigaciones Científicas y Técnicas. Oficina de Coordinación Administrativa Ciudad Universitaria. Instituto de Investigación en Ciencias de la Computación. Universidad de Buenos Aires. Facultad de Ciencias Exactas y Naturales. Instituto de Investigación en Ciencias de la Computación; Argentina
Affiliation: Brambilla, Leonardo Miguel. Universidad de Buenos Aires. Facultad de Ciencias Exactas y Naturales. Departamento de Computación; Argentina
20th Annual Conference of the International Speech Communication Association: Crossroads of Speech and Language
Graz
Austria
International Speech Communication Association
Subject
Computer assisted language learning
Phone-level pronunciation assessment
Resources
Access level
open access
Terms of use
https://creativecommons.org/licenses/by-nc-sa/2.5/ar/
Repository
CONICET Digital (CONICET)
Institution
Consejo Nacional de Investigaciones Científicas y Técnicas
OAI identifier
oai:ri.conicet.gov.ar:11336/161618

purl_subject.fl_str_mv https://purl.org/becyt/ford/1.2
https://purl.org/becyt/ford/1
dc.identifier.none.fl_str_mv http://hdl.handle.net/11336/161618
EpaDB: A database for development of pronunciation assessment systems; 20th Annual Conference of the International Speech Communication Association: Crossroads of Speech and Language; Graz; Austria; 2019; 589-593
dc.relation.none.fl_str_mv info:eu-repo/semantics/altIdentifier/url/https://www.isca-speech.org/archive/interspeech_2019/vidal19_interspeech.html
info:eu-repo/semantics/altIdentifier/doi/10.21437/Interspeech.2019-1839
dc.format.none.fl_str_mv application/pdf
dc.coverage.none.fl_str_mv International