Evaluación del reconocimiento de voz entre los servicios de Google y Amazon aplicado al Sistema Integrado de Seguridad ECU 911

Juan José Peralta Vásconez; Carlos Andrés Narváez Ortiz; Marcos Patricio Orellana Cordero; Paúl Andrés Patiño León; Priscila Cedillo Orellana

doi:10.37815/rte.v33n2.840

Evaluation of voice recognition between Google and Amazon services applied to the ECU 911 Integrated Security System

Published: 2021-11-26

Keywords: Amazon Transcribe, ASR, Google Speech to Text, WER

Abstract

Automatic Speech Recognition (ASR) is a branch of artificial intelligence that enables spoken communication between humans and machines, bringing it closer to human-to-human interaction. In recent years, ASR systems have improved to the point of producing near-perfect transcriptions, and many companies, including Google, Amazon, IBM, and Microsoft, now develop them. This study evaluates the speech recognition systems Google Speech to Text and Amazon Transcribe to determine which one converts audio into text more accurately. Transcript accuracy was measured with the Word Error Rate (WER), which counts the words deleted, substituted, and inserted relative to a human-transcribed reference text. After both systems were tested under different noise conditions, Amazon Transcribe produced the more accurate transcripts; the study therefore concludes that Amazon's service outperformed Google's on audio with both higher and lower levels of background noise.
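The WER mentioned in the abstract is conventionally computed as (S + D + I) / N, where S, D, and I are the numbers of substituted, deleted, and inserted words and N is the number of words in the reference. The following is a minimal sketch of that calculation using a word-level edit distance. It is not the authors' evaluation code: the function name `word_error_rate`, the sample sentences, and the normalization (lowercasing and splitting on whitespace) are all assumptions made for illustration.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return (S + D + I) / N using a word-level Levenshtein distance.

    Assumed normalization: lowercase and split on whitespace only.
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion: reference word missing
                curr[j - 1] + 1,     # insertion: extra hypothesis word
                prev[j - 1] + cost,  # substitution, or match if cost == 0
            )
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical example: one deleted word ("a") and one substituted word
# ("street" -> "streets") against an 8-word reference.
reference = "there is a fire on the main street"
hypothesis = "there is fire on the main streets"
print(word_error_rate(reference, hypothesis))  # 0.25
```

A lower WER means a more accurate transcript. Because insertions are counted, the value can exceed 1.0 when the hypothesis is much longer than the reference.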

How to Cite

Peralta Vasconez, J. J., Narvaez Ortiz, C. A., Orellana Cordero, M. P., Patino Leon, P. A., & Cedillo Orellana, P. (2021). Evaluation of voice recognition between Google and Amazon services applied to the ECU 911 Integrated Security System. Revista Tecnológica - ESPOL, 33(2), 147-158. https://doi.org/10.37815/rte.v33n2.840

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.
