Improving audio of emergency calls in Spanish performed to the ECU 911 through filters for ASR technology
- Authors
- Orellana, Marcos; Jiménez Sarango, Ángel Alberto; Zambrano Martínez, Jorge Luis
- Year of publication
- 2022
- Language
- English
- Resource type
- conference paper
- Status
- published version
- Description
- In recent years, Automatic Speech Recognition (ASR) services have made notable progress through the research efforts of large companies such as Google and Amazon. However, ASR services remain sensitive to audio quality in other languages. To address this issue, several speech enhancement algorithms that are among the most prominent for improving speech intelligibility have been proposed, such as Singular Value Decomposition (SVD), log Minimum Mean Square Error (log-MMSE), and the Wiener filter. By preprocessing the audio files with these algorithms, we seek to reduce the Word Error Rate (WER), which compares the transcription produced by the ASR against a manual transcription and thus gives the percentage of error incurred by the ASR service. Results showed that Google is more efficient than its Amazon and Vosk counterparts. We also found that applying a low-pass filter combined with the log-MMSE algorithm to the audio files can substantially reduce the WER of the transcription, depending on the characteristics of the noise in the audio.
Instituto de Investigación en Informática
- Subject
- Ciencias Informáticas (Computer Science)
Automatic Speech Recognition
Word Error Rate
Speech enhancement algorithms
Audio quality improvement
- Accessibility level
- open access
- Terms of use
- http://creativecommons.org/licenses/by-nc-sa/4.0/
- Repository
- Institution
- Universidad Nacional de La Plata
- OAI identifier
- oai:sedici.unlp.edu.ar:10915/140657
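The description above evaluates each ASR service by its Word Error Rate against a manual transcription. The record does not say how the authors tokenized or normalized the transcripts, so the following is only a minimal sketch of the standard definition, WER = (substitutions + deletions + insertions) / number of reference words, computed as a word-level edit distance. The function name `word_error_rate`, the lowercase-and-split normalization, and the sample sentences are illustrative assumptions, not taken from the paper.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length.

    Assumption: transcripts are compared after lowercasing and splitting
    on whitespace; the paper's actual normalization is not stated here.
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


# Made-up example: one deleted word ("un") and one substitution ("casa" -> "cosa").
manual = "hay un incendio en la casa"
asr_output = "hay incendio en la cosa"
print(f"WER: {word_error_rate(manual, asr_output):.1%}")  # WER: 33.3%
```

Under this definition, a preprocessing step such as the low-pass plus log-MMSE combination helps whenever the WER of the transcript obtained from the filtered audio is lower than the WER obtained from the raw audio, both measured against the same manual transcription.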
id |
SEDICI_73a177b24e44714464126a013ebd523b |
---|---|
oai_identifier_str |
oai:sedici.unlp.edu.ar:10915/140657 |
network_acronym_str |
SEDICI |
repository_id_str |
1329 |
network_name_str |
SEDICI (UNLP) |
dc.title.none.fl_str_mv |
Improving audio of emergency calls in Spanish performed to the ECU 911 through filters for ASR technology |
dc.creator.none.fl_str_mv |
Orellana, Marcos; Jiménez Sarango, Ángel Alberto; Zambrano Martínez, Jorge Luis |
dc.subject.none.fl_str_mv |
Ciencias Informáticas; Automatic Speech Recognition; Word Error Rate; Speech enhancement algorithms; Audio quality improvement |
publishDate |
2022 |
dc.date.none.fl_str_mv |
2022-07 |
dc.type.none.fl_str_mv |
info:eu-repo/semantics/conferenceObject; info:eu-repo/semantics/publishedVersion; Objeto de conferencia; http://purl.org/coar/resource_type/c_5794; info:ar-repo/semantics/documentoDeConferencia |
format |
conferenceObject |
status_str |
publishedVersion |
dc.identifier.none.fl_str_mv |
http://sedici.unlp.edu.ar/handle/10915/140657 |
dc.language.none.fl_str_mv |
eng |
dc.relation.none.fl_str_mv |
info:eu-repo/semantics/altIdentifier/isbn/978-950-34-2126-0; info:eu-repo/semantics/reference/hdl/10915/139373 |
dc.rights.none.fl_str_mv |
info:eu-repo/semantics/openAccess; http://creativecommons.org/licenses/by-nc-sa/4.0/; Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0) |
dc.format.none.fl_str_mv |
application/pdf; 64-69 |
dc.source.none.fl_str_mv |
reponame:SEDICI (UNLP); instname:Universidad Nacional de La Plata; instacron:UNLP |
repository.name.fl_str_mv |
SEDICI (UNLP) - Universidad Nacional de La Plata |
repository.mail.fl_str_mv |
alira@sedici.unlp.edu.ar |