Improving audio of emergency calls in Spanish performed to the ECU 911 through filters for ASR technology

Authors: Orellana, Marcos; Jiménez Sarango, Ángel Alberto; Zambrano Martínez, Jorge Luis
Publication year: 2022
Language: English
Resource type: conference paper
Status: published version
Description: In recent years, Automatic Speech Recognition (ASR) services have made notable progress through the research efforts of large companies such as Google and Amazon. However, ASR services are still sensitive to audio quality in other languages. To address this issue, several of the most prominent speech enhancement algorithms for improving speech intelligibility have been proposed, such as Singular Value Decomposition (SVD), log Minimum Mean Square Error (log-MMSE), and the Wiener filter. By preprocessing the audio files with these algorithms, we seek to reduce the Word Error Rate (WER), which compares the transcription produced by the ASR service against a manual transcription and thus gives the percentage of errors made by the ASR service. The results show that Google is more efficient than its Amazon and Vosk counterparts. We also found that applying a low-pass filter combined with the log-MMSE algorithm to the audio files can substantially reduce the WER of the transcription, depending on the characteristics of the noise in the audio.
Instituto de Investigación en Informática
Subject: Ciencias Informáticas (Computer Science)
Automatic Speech Recognition
Word Error Rate
Speech enhancement algorithms
Audio quality improvement
Access level: open access
Terms of use: http://creativecommons.org/licenses/by-nc-sa/4.0/
Repository
Institution: Universidad Nacional de La Plata
OAI identifier: oai:sedici.unlp.edu.ar:10915/140657
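
The Word Error Rate mentioned in the description can be illustrated with a minimal sketch. It assumes the common definition of WER as the word-level edit distance (substitutions, deletions, and insertions) divided by the number of words in the manual reference transcription. The function name `word_error_rate`, the lowercasing and whitespace tokenization, and the Spanish sample sentences are assumptions made for illustration; the record does not state how the authors normalized or scored their transcriptions.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the reference length.

    Assumption: text is lowercased and split on whitespace; the paper's
    actual normalization is not described in this record.
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcription must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so
    # far (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion: reference word missing from hypothesis
                curr[j - 1] + 1,     # insertion: extra word in hypothesis
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical manual transcription and ASR output (not taken from the paper).
manual = "hay un incendio en la casa"
asr_output = "hay incendio en la cosa"
print(f"WER: {word_error_rate(manual, asr_output):.2%}")
```

In this made-up example the ASR output drops one word and substitutes another, so two errors are counted against a six-word reference. Under this definition, a speech enhancement step helps if the WER computed on the filtered audio is lower than the WER computed on the original audio for the same reference.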


id	SEDICI_73a177b24e44714464126a013ebd523b
oai_identifier_str	oai:sedici.unlp.edu.ar:10915/140657
network_acronym_str	SEDICI
repository_id_str	1329
network_name_str	SEDICI (UNLP)
dc.title.none.fl_str_mv	Improving audio of emergency calls in Spanish performed to the ECU 911 through filters for ASR technology
dc.creator.none.fl_str_mv	Orellana, Marcos; Jiménez Sarango, Ángel Alberto; Zambrano Martínez, Jorge Luis
author	Orellana, Marcos
author_facet	Orellana, Marcos; Jiménez Sarango, Ángel Alberto; Zambrano Martínez, Jorge Luis
author_role	author
author2	Jiménez Sarango, Ángel Alberto; Zambrano Martínez, Jorge Luis
author2_role	author; author
dc.subject.none.fl_str_mv	Ciencias Informáticas; Automatic Speech Recognition; Word Error Rate; Speech enhancement algorithms; Audio quality improvement
topic	Ciencias Informáticas; Automatic Speech Recognition; Word Error Rate; Speech enhancement algorithms; Audio quality improvement
publishDate	2022
dc.date.none.fl_str_mv	2022-07
dc.type.none.fl_str_mv	info:eu-repo/semantics/conferenceObject info:eu-repo/semantics/publishedVersion Objeto de conferencia http://purl.org/coar/resource_type/c_5794 info:ar-repo/semantics/documentoDeConferencia
format	conferenceObject
status_str	publishedVersion
dc.identifier.none.fl_str_mv	http://sedici.unlp.edu.ar/handle/10915/140657
url	http://sedici.unlp.edu.ar/handle/10915/140657
dc.language.none.fl_str_mv	eng
language	eng
dc.relation.none.fl_str_mv	info:eu-repo/semantics/altIdentifier/isbn/978-950-34-2126-0 info:eu-repo/semantics/reference/hdl/10915/139373
dc.rights.none.fl_str_mv	info:eu-repo/semantics/openAccess http://creativecommons.org/licenses/by-nc-sa/4.0/ Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0)
eu_rights_str_mv	openAccess
rights_invalid_str_mv	http://creativecommons.org/licenses/by-nc-sa/4.0/ Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0)
dc.format.none.fl_str_mv	application/pdf 64-69
dc.source.none.fl_str_mv	reponame:SEDICI (UNLP) instname:Universidad Nacional de La Plata instacron:UNLP
reponame_str	SEDICI (UNLP)
collection	SEDICI (UNLP)
instname_str	Universidad Nacional de La Plata
instacron_str	UNLP
institution	UNLP
repository.name.fl_str_mv	SEDICI (UNLP) - Universidad Nacional de La Plata
repository.mail.fl_str_mv	alira@sedici.unlp.edu.ar
