EpaDB: A database for development of pronunciation assessment systems

Authors: Vidal Dominguez, Jazmin; Ferrer, Luciana; Brambilla, Leonardo Miguel
Publication year: 2021
Language: English
Resource type: conference paper
Status: published version
Description: In this paper, we describe the methodology for collecting and annotating a new database designed for conducting research and development on pronunciation assessment. While a significant amount of research has been done in the area of pronunciation assessment, to our knowledge, no database is available for public use for research in the field. Considering this need, we created EpaDB (English Pronunciation by Argentinians Database), which is composed of English phrases read by native Spanish speakers with different levels of English proficiency. The recordings are annotated with ratings of pronunciation quality at phrase-level and detailed phonetic alignments and transcriptions indicating which phones were actually pronounced by the speakers. We present inter-rater agreement, the effect of each phone on overall perceived non-nativeness, and the frequency of specific pronunciation errors.
Affiliation: Vidal Dominguez, Jazmin. Consejo Nacional de Investigaciones Científicas y Técnicas. Oficina de Coordinación Administrativa Ciudad Universitaria. Instituto de Investigación en Ciencias de la Computación. Universidad de Buenos Aires. Facultad de Ciencias Exactas y Naturales. Instituto de Investigación en Ciencias de la Computación; Argentina. Universidad de Buenos Aires. Facultad de Ciencias Exactas y Naturales. Departamento de Computación; Argentina
Affiliation: Ferrer, Luciana. Consejo Nacional de Investigaciones Científicas y Técnicas. Oficina de Coordinación Administrativa Ciudad Universitaria. Instituto de Investigación en Ciencias de la Computación. Universidad de Buenos Aires. Facultad de Ciencias Exactas y Naturales. Instituto de Investigación en Ciencias de la Computación; Argentina
Affiliation: Brambilla, Leonardo Miguel. Universidad de Buenos Aires. Facultad de Ciencias Exactas y Naturales. Departamento de Computación; Argentina
20th Annual Conference of the International Speech Communication Association: Crossroads of Speech and Language
Graz
Austria
International Speech Communication Association
Subject: Computer assisted language learning
Phone-level pronunciation assessment
Resources
Access level: open access
Terms of use: https://creativecommons.org/licenses/by-nc-sa/2.5/ar/
Repository
Institution: Consejo Nacional de Investigaciones Científicas y Técnicas
OAI identifier: oai:ri.conicet.gov.ar:11336/161618

id	CONICETDig_718d80e6dcb6d5fdc6108004be3d7adf
oai_identifier_str	oai:ri.conicet.gov.ar:11336/161618
network_acronym_str	CONICETDig
repository_id_str	3498
network_name_str	CONICET Digital (CONICET)
dc.title.none.fl_str_mv	EpaDB: A database for development of pronunciation assessment systems
dc.creator.none.fl_str_mv	Vidal Dominguez, Jazmin; Ferrer, Luciana; Brambilla, Leonardo Miguel
dc.subject.none.fl_str_mv	Computer assisted language learning; Phone-level pronunciation assessment; Resources
purl_subject.fl_str_mv	https://purl.org/becyt/ford/1.2 https://purl.org/becyt/ford/1
dc.date.none.fl_str_mv	2021
dc.type.none.fl_str_mv	info:eu-repo/semantics/publishedVersion info:eu-repo/semantics/conferenceObject Conferencia Journal http://purl.org/coar/resource_type/c_5794 info:ar-repo/semantics/documentoDeConferencia
dc.identifier.none.fl_str_mv	http://hdl.handle.net/11336/161618 EpaDB: A database for development of pronunciation assessment systems; 20th Annual Conference of the International Speech Communication Association: Crossroads of Speech and Language; Graz; Austria; 2019; 589-593 CONICET Digital CONICET
dc.language.none.fl_str_mv	eng
dc.relation.none.fl_str_mv	info:eu-repo/semantics/altIdentifier/url/https://www.isca-speech.org/archive/interspeech_2019/vidal19_interspeech.html info:eu-repo/semantics/altIdentifier/doi/10.21437/Interspeech.2019-1839
dc.rights.none.fl_str_mv	info:eu-repo/semantics/openAccess https://creativecommons.org/licenses/by-nc-sa/2.5/ar/
dc.format.none.fl_str_mv	application/pdf application/pdf
dc.coverage.none.fl_str_mv	Internacional
dc.publisher.none.fl_str_mv	International Speech Communication Association
dc.source.none.fl_str_mv	reponame:CONICET Digital (CONICET) instname:Consejo Nacional de Investigaciones Científicas y Técnicas
repository.name.fl_str_mv	CONICET Digital (CONICET) - Consejo Nacional de Investigaciones Científicas y Técnicas
repository.mail.fl_str_mv	dasensio@conicet.gov.ar; lcarlino@conicet.gov.ar