EpaDB: A database for development of pronunciation assessment systems
- Authors
- Vidal Dominguez, Jazmin; Ferrer, Luciana; Brambilla, Leonardo Miguel
- Year of publication
- 2021
- Language
- English
- Resource type
- conference paper
- Status
- published version
- Description
- In this paper, we describe the methodology for collecting and annotating a new database designed for conducting research and development on pronunciation assessment. While a significant amount of research has been done in the area of pronunciation assessment, to our knowledge, no database is available for public use for research in the field. Considering this need, we created EpaDB (English Pronunciation by Argentinians Database), which is composed of English phrases read by native Spanish speakers with different levels of English proficiency. The recordings are annotated with ratings of pronunciation quality at phrase-level and detailed phonetic alignments and transcriptions indicating which phones were actually pronounced by the speakers. We present inter-rater agreement, the effect of each phone on overall perceived non-nativeness, and the frequency of specific pronunciation errors.
Affiliation: Vidal Dominguez, Jazmin. Consejo Nacional de Investigaciones Científicas y Técnicas. Oficina de Coordinación Administrativa Ciudad Universitaria. Instituto de Investigación en Ciencias de la Computación. Universidad de Buenos Aires. Facultad de Ciencias Exactas y Naturales. Instituto de Investigación en Ciencias de la Computación; Argentina. Universidad de Buenos Aires. Facultad de Ciencias Exactas y Naturales. Departamento de Computación; Argentina
Affiliation: Ferrer, Luciana. Consejo Nacional de Investigaciones Científicas y Técnicas. Oficina de Coordinación Administrativa Ciudad Universitaria. Instituto de Investigación en Ciencias de la Computación. Universidad de Buenos Aires. Facultad de Ciencias Exactas y Naturales. Instituto de Investigación en Ciencias de la Computación; Argentina
Affiliation: Brambilla, Leonardo Miguel. Universidad de Buenos Aires. Facultad de Ciencias Exactas y Naturales. Departamento de Computación; Argentina
20th Annual Conference of the International Speech Communication Association: Crossroads of Speech and Language
Graz
Austria
International Speech Communication Association
- Subject
- Computer assisted language learning
Phone-level pronunciation assessment
Resources
- Access level
- open access
- Terms of use
- https://creativecommons.org/licenses/by-nc-sa/2.5/ar/
- Repository
- Institution
- Consejo Nacional de Investigaciones Científicas y Técnicas
- OAI identifier
- oai:ri.conicet.gov.ar:11336/161618
id | CONICETDig_718d80e6dcb6d5fdc6108004be3d7adf |
---|---|
oai_identifier_str | oai:ri.conicet.gov.ar:11336/161618 |
network_acronym_str | CONICETDig |
repository_id_str | 3498 |
network_name_str | CONICET Digital (CONICET) |
dc.title.none.fl_str_mv | EpaDB: A database for development of pronunciation assessment systems |
dc.creator.none.fl_str_mv | Vidal Dominguez, Jazmin; Ferrer, Luciana; Brambilla, Leonardo Miguel |
dc.subject.none.fl_str_mv | Computer assisted language learning; Phone-level pronunciation assessment; Resources |
purl_subject.fl_str_mv | https://purl.org/becyt/ford/1.2; https://purl.org/becyt/ford/1 |
dc.date.none.fl_str_mv | 2021 |
dc.type.none.fl_str_mv | info:eu-repo/semantics/publishedVersion; info:eu-repo/semantics/conferenceObject; Conferencia; Journal; http://purl.org/coar/resource_type/c_5794; info:ar-repo/semantics/documentoDeConferencia |
dc.identifier.none.fl_str_mv | http://hdl.handle.net/11336/161618; EpaDB: A database for development of pronunciation assessment systems; 20th Annual Conference of the International Speech Communication Association: Crossroads of Speech and Language; Graz; Austria; 2019; 589-593; CONICET Digital; CONICET |
dc.language.none.fl_str_mv | eng |
dc.relation.none.fl_str_mv | info:eu-repo/semantics/altIdentifier/url/https://www.isca-speech.org/archive/interspeech_2019/vidal19_interspeech.html; info:eu-repo/semantics/altIdentifier/doi/10.21437/Interspeech.2019-1839 |
dc.rights.none.fl_str_mv | info:eu-repo/semantics/openAccess; https://creativecommons.org/licenses/by-nc-sa/2.5/ar/ |
dc.format.none.fl_str_mv | application/pdf |
dc.coverage.none.fl_str_mv | International |
dc.publisher.none.fl_str_mv | International Speech Communication Association |
dc.source.none.fl_str_mv | reponame:CONICET Digital (CONICET); instname:Consejo Nacional de Investigaciones Científicas y Técnicas |
repository.mail.fl_str_mv | dasensio@conicet.gov.ar; lcarlino@conicet.gov.ar |