Classification of lexical stress using spectral and prosodic features for computer-assisted language learning systems
- Authors
- Ferrer, Luciana; Bratt, Harry; Richey, Colleen; Franco, Horacio; Abrash, Victor; Precoda, Kristin
- Publication year
- 2015
- Language
- English
- Resource type
- article
- Status
- published version
- Description
- We present a system for detection of lexical stress in English words spoken by English learners. This system was designed to be part of the EduSpeak® computer-assisted language learning (CALL) software. The system uses both prosodic and spectral features to detect the level of stress (unstressed, primary or secondary) for each syllable in a word. Features are computed on the vowels and include normalized energy, pitch, spectral tilt, and duration measurements, as well as log-posterior probabilities obtained from the frame-level mel-frequency cepstral coefficients (MFCCs). Gaussian mixture models (GMMs) are used to represent the distribution of these features for each stress class. The system is trained on utterances by L1-English children and tested on English speech from L1-English children and L1-Japanese children with variable levels of English proficiency. Since it is trained on data from L1-English speakers, the system can be used on English utterances spoken by speakers of any L1 without retraining. Furthermore, automatically determined stress patterns are used as the intended target; therefore, hand-labeling of training data is not required. This allows us to use a large amount of data for training the system. Our algorithm results in an error rate of approximately 11% on English utterances from L1-English speakers and 20% on English utterances from L1-Japanese speakers. We show that all features, both spectral and prosodic, are necessary for achievement of optimal performance on the data from L1-English speakers; MFCC log-posterior probability features are the single best set of features, followed by duration, energy, pitch and finally, spectral tilt features. For English utterances from L1-Japanese speakers, energy, MFCC log-posterior probabilities and duration are the most important features.
Affiliation: Ferrer, Luciana. Consejo Nacional de Investigaciones Científicas y Técnicas; Argentina. SRI International; United States
Affiliation: Bratt, Harry. SRI International; United States
Affiliation: Richey, Colleen. SRI International; United States
Affiliation: Franco, Horacio. SRI International; United States
Affiliation: Abrash, Victor. SRI International; United States
Affiliation: Precoda, Kristin. SRI International; United States
- Subject
- Computer-Assisted Language Learning; Gaussian Mixture Models; Lexical Stress Detection; Mel Frequency Cepstral Coefficients; Prosodic Features
- Access level
- open access
- Terms of use
- https://creativecommons.org/licenses/by-nc-nd/2.5/ar/
- Repository
- Institution
- Consejo Nacional de Investigaciones Científicas y Técnicas
- OAI identifier
- oai:ri.conicet.gov.ar:11336/38100
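The classification scheme outlined in the description above (one Gaussian mixture model per stress class, with each vowel assigned to the class whose model scores it highest) can be illustrated with a minimal sketch. This is not the authors' implementation: the diagonal-covariance EM routine, the equal class priors, the two-dimensional feature vectors, and every name in the code (`DiagGMM`, `make_toy_data`, `train_stress_models`, `classify`) are assumptions made for illustration, and the data are synthetic stand-ins for the normalized energy, pitch, spectral tilt, duration, and MFCC log-posterior features the record mentions.

```python
import math
import random


def log_gauss_diag(x, mean, var):
    """Log-density of a diagonal-covariance Gaussian at vector x."""
    return -0.5 * sum(
        math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v
        for xi, m, v in zip(x, mean, var)
    )


def logsumexp(values):
    """Numerically stable log(sum(exp(v))) over a list of floats."""
    top = max(values)
    return top + math.log(sum(math.exp(v - top) for v in values))


class DiagGMM:
    """Small diagonal-covariance GMM trained with EM (illustrative only)."""

    def __init__(self, n_components=2, n_iter=25, var_floor=1e-3, seed=0):
        self.k = n_components
        self.n_iter = n_iter
        self.var_floor = var_floor
        self.seed = seed

    def _component_logs(self, x):
        # log(weight) + log N(x | mean, var) for every mixture component
        return [
            math.log(w) + log_gauss_diag(x, m, v)
            for w, m, v in zip(self.weights, self.means, self.vars)
        ]

    def fit(self, data):
        n, dim = len(data), len(data[0])
        rng = random.Random(self.seed)
        # Initialise means from random samples and variances from the global spread.
        self.means = [list(x) for x in rng.sample(data, self.k)]
        centre = [sum(x[d] for x in data) / n for d in range(dim)]
        spread = [
            max(sum((x[d] - centre[d]) ** 2 for x in data) / n, self.var_floor)
            for d in range(dim)
        ]
        self.vars = [list(spread) for _ in range(self.k)]
        self.weights = [1.0 / self.k] * self.k
        for _ in range(self.n_iter):
            # E-step: responsibility of each component for each sample.
            resp = []
            for x in data:
                logs = self._component_logs(x)
                norm = logsumexp(logs)
                resp.append([math.exp(v - norm) for v in logs])
            # M-step: re-estimate weights, means, and floored variances.
            for j in range(self.k):
                nj = sum(r[j] for r in resp) + 1e-10
                self.weights[j] = nj / n
                self.means[j] = [
                    sum(r[j] * x[d] for r, x in zip(resp, data)) / nj
                    for d in range(dim)
                ]
                self.vars[j] = [
                    max(
                        sum(r[j] * (x[d] - self.means[j][d]) ** 2
                            for r, x in zip(resp, data)) / nj,
                        self.var_floor,
                    )
                    for d in range(dim)
                ]
        return self

    def log_likelihood(self, x):
        return logsumexp(self._component_logs(x))


def make_toy_data(seed=0, per_class=40):
    """Synthetic 2-D vectors per stress class (hypothetical values, not real speech)."""
    rng = random.Random(seed)
    centres = {
        "unstressed": (-2.0, -2.0),
        "secondary": (0.0, 0.0),
        "primary": (2.0, 2.0),
    }
    return {
        label: [[rng.gauss(c, 0.3) for c in centre] for _ in range(per_class)]
        for label, centre in centres.items()
    }


def train_stress_models(samples):
    """Fit one GMM per stress class; samples maps label -> list of vectors."""
    return {label: DiagGMM().fit(vectors) for label, vectors in samples.items()}


def classify(models, x):
    """Pick the class whose GMM gives x the highest log-likelihood (equal priors assumed)."""
    return max(models, key=lambda label: models[label].log_likelihood(x))


models = train_stress_models(make_toy_data())
print(classify(models, [1.9, 2.1]))
```

With equal priors, choosing the highest class-conditional likelihood is equivalent to choosing the highest class posterior; a deployed system would more likely estimate priors from data and use far richer feature vectors.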
id |
CONICETDig_c589597e81166969669cdc55daf026ce |
---|---|
oai_identifier_str |
oai:ri.conicet.gov.ar:11336/38100 |
network_acronym_str |
CONICETDig |
repository_id_str |
3498 |
network_name_str |
CONICET Digital (CONICET) |
dc.title.none.fl_str_mv |
Classification of lexical stress using spectral and prosodic features for computer-assisted language learning systems |
dc.creator.none.fl_str_mv |
Ferrer, Luciana; Bratt, Harry; Richey, Colleen; Franco, Horacio; Abrash, Victor; Precoda, Kristin |
author |
Ferrer, Luciana |
author_facet |
Ferrer, Luciana; Bratt, Harry; Richey, Colleen; Franco, Horacio; Abrash, Victor; Precoda, Kristin |
author_role |
author |
author2 |
Bratt, Harry; Richey, Colleen; Franco, Horacio; Abrash, Victor; Precoda, Kristin |
author2_role |
author; author; author; author; author |
dc.subject.none.fl_str_mv |
Computer-Assisted Language Learning; Gaussian Mixture Models; Lexical Stress Detection; Mel Frequency Cepstral Coefficients; Prosodic Features |
topic |
Computer-Assisted Language Learning; Gaussian Mixture Models; Lexical Stress Detection; Mel Frequency Cepstral Coefficients; Prosodic Features |
purl_subject.fl_str_mv |
https://purl.org/becyt/ford/1.2 https://purl.org/becyt/ford/1 |
publishDate |
2015 |
dc.date.none.fl_str_mv |
2015-02 |
dc.type.none.fl_str_mv |
info:eu-repo/semantics/article info:eu-repo/semantics/publishedVersion http://purl.org/coar/resource_type/c_6501 info:ar-repo/semantics/articulo |
format |
article |
status_str |
publishedVersion |
dc.identifier.none.fl_str_mv |
http://hdl.handle.net/11336/38100 Ferrer, Luciana; Bratt, Harry; Richey, Colleen; Franco, Horacio; Abrash, Victor; et al.; Classification of lexical stress using spectral and prosodic features for computer-assisted language learning systems; Elsevier Science; Speech Communication; 69; 2-2015; 31-45 0167-6393 CONICET Digital CONICET |
url |
http://hdl.handle.net/11336/38100 |
identifier_str_mv |
Ferrer, Luciana; Bratt, Harry; Richey, Colleen; Franco, Horacio; Abrash, Victor; et al.; Classification of lexical stress using spectral and prosodic features for computer-assisted language learning systems; Elsevier Science; Speech Communication; 69; 2-2015; 31-45 0167-6393 CONICET Digital CONICET |
dc.language.none.fl_str_mv |
eng |
language |
eng |
dc.relation.none.fl_str_mv |
info:eu-repo/semantics/altIdentifier/url/http://www.sciencedirect.com/science/article/pii/S0167639315000151 info:eu-repo/semantics/altIdentifier/doi/10.1016/j.specom.2015.02.002 |
dc.rights.none.fl_str_mv |
info:eu-repo/semantics/openAccess https://creativecommons.org/licenses/by-nc-nd/2.5/ar/ |
eu_rights_str_mv |
openAccess |
rights_invalid_str_mv |
https://creativecommons.org/licenses/by-nc-nd/2.5/ar/ |
dc.format.none.fl_str_mv |
application/pdf application/pdf |
dc.publisher.none.fl_str_mv |
Elsevier Science |
publisher.none.fl_str_mv |
Elsevier Science |
dc.source.none.fl_str_mv |
reponame:CONICET Digital (CONICET) instname:Consejo Nacional de Investigaciones Científicas y Técnicas |
reponame_str |
CONICET Digital (CONICET) |
collection |
CONICET Digital (CONICET) |
instname_str |
Consejo Nacional de Investigaciones Científicas y Técnicas |
repository.name.fl_str_mv |
CONICET Digital (CONICET) - Consejo Nacional de Investigaciones Científicas y Técnicas |
repository.mail.fl_str_mv |
dasensio@conicet.gov.ar; lcarlino@conicet.gov.ar |