Improving audio of emergency calls in Spanish performed to the ECU 911 through filters for ASR technology

Authors
Orellana, Marcos; Jiménez Sarango, Ángel Alberto; Zambrano Martínez, Jorge Luis
Year of publication
2022
Language
English
Resource type
conference paper
Status
published version
Description
In recent years, Automatic Speech Recognition (ASR) services have made notable progress through the research efforts of large companies such as Google and Amazon. However, ASR services remain sensitive to audio quality in other languages. To address this issue, we propose using several of the most prominent speech enhancement algorithms for improving speech intelligibility, such as Singular Value Decomposition (SVD), log Minimum Mean Square Error (log-MMSE), and Wiener filtering. By preprocessing the audio files with these algorithms, we seek to reduce the Word Error Rate (WER), which compares the transcription produced by the ASR against a manual transcription and thus gives the percentage of error incurred by the ASR service. Results showed that Google is more efficient than its Amazon and Vosk counterparts. We also found that applying a low-pass filter combined with the log-MMSE algorithm to the audio files can substantially reduce the WER of the transcription, depending on the noise characteristics of the audio.
Instituto de Investigación en Informática
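The description above defines WER as a comparison between the ASR transcription and a manual transcription. As a minimal sketch of how such a figure is commonly computed (this is the standard word-level edit-distance definition, not necessarily the exact procedure used by the authors), the function below counts the substitutions, deletions, and insertions needed to turn the manual reference into the ASR hypothesis and divides by the number of reference words. The function name `word_error_rate`, the lowercase-and-split normalization, and the sample sentences are assumptions made for illustration.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return WER = (substitutions + deletions + insertions) / reference words.

    Assumption: words are compared after lowercasing and whitespace splitting;
    a real evaluation would likely also normalize punctuation and numerals.
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcription must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # (up to the previous row) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion: reference word missing from hypothesis
                curr[j - 1] + 1,     # insertion: extra word in hypothesis
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # Hypothetical example sentences, not taken from the ECU 911 data set.
    manual = "hay un incendio en la casa"
    asr_output = "hay incendio en casa"
    print(f"WER: {word_error_rate(manual, asr_output):.2%}")
```

Under this definition the WER can exceed 100% when the hypothesis contains many inserted words, which is worth keeping in mind when comparing services on very noisy recordings.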
Subject
Computer Science
Automatic Speech Recognition
Word Error Rate
Speech enhancement algorithms
Audio quality improvement
Access level
open access
Terms of use
http://creativecommons.org/licenses/by-nc-sa/4.0/
Repository
SEDICI (UNLP)
Institution
Universidad Nacional de La Plata
OAI identifier
oai:sedici.unlp.edu.ar:10915/140657

id SEDICI_73a177b24e44714464126a013ebd523b
oai_identifier_str oai:sedici.unlp.edu.ar:10915/140657
network_acronym_str SEDICI
repository_id_str 1329
network_name_str SEDICI (UNLP)
dc.title.none.fl_str_mv Improving audio of emergency calls in Spanish performed to the ECU 911 through filters for ASR technology
dc.creator.none.fl_str_mv Orellana, Marcos
Jiménez Sarango, Ángel Alberto
Zambrano Martínez, Jorge Luis
dc.subject.none.fl_str_mv Ciencias Informáticas
Automatic Speech Recognition
Word Error Rate
Speech enhancement algorithms
Audio quality improvement
dc.description.none.fl_txt_mv In recent years, Automatic Speech Recognition (ASR) services have made notable progress through the research efforts of large companies such as Google and Amazon. However, ASR services remain sensitive to audio quality in other languages. To address this issue, we propose using several of the most prominent speech enhancement algorithms for improving speech intelligibility, such as Singular Value Decomposition (SVD), log Minimum Mean Square Error (log-MMSE), and Wiener filtering. By preprocessing the audio files with these algorithms, we seek to reduce the Word Error Rate (WER), which compares the transcription produced by the ASR against a manual transcription and thus gives the percentage of error incurred by the ASR service. Results showed that Google is more efficient than its Amazon and Vosk counterparts. We also found that applying a low-pass filter combined with the log-MMSE algorithm to the audio files can substantially reduce the WER of the transcription, depending on the noise characteristics of the audio.
Instituto de Investigación en Informática
publishDate 2022
dc.date.none.fl_str_mv 2022-07
dc.type.none.fl_str_mv info:eu-repo/semantics/conferenceObject
info:eu-repo/semantics/publishedVersion
Objeto de conferencia
http://purl.org/coar/resource_type/c_5794
info:ar-repo/semantics/documentoDeConferencia
format conferenceObject
status_str publishedVersion
dc.identifier.none.fl_str_mv http://sedici.unlp.edu.ar/handle/10915/140657
url http://sedici.unlp.edu.ar/handle/10915/140657
dc.language.none.fl_str_mv eng
language eng
dc.relation.none.fl_str_mv info:eu-repo/semantics/altIdentifier/isbn/978-950-34-2126-0
info:eu-repo/semantics/reference/hdl/10915/139373
dc.rights.none.fl_str_mv info:eu-repo/semantics/openAccess
http://creativecommons.org/licenses/by-nc-sa/4.0/
Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0)
dc.format.none.fl_str_mv application/pdf
64-69
dc.source.none.fl_str_mv reponame:SEDICI (UNLP)
instname:Universidad Nacional de La Plata
instacron:UNLP
repository.name.fl_str_mv SEDICI (UNLP) - Universidad Nacional de La Plata
repository.mail.fl_str_mv alira@sedici.unlp.edu.ar