Automatic Speech Recognition (ASR) is a branch of artificial intelligence that enables spoken communication between humans and machines, offering an interaction that closely resembles conversation between people. In recent years, ASR systems have improved to the point of achieving near-perfect transcriptions, and many companies, including Google, Amazon, IBM, and Microsoft, now develop them. This study evaluates the speech recognition systems Google Speech to Text and Amazon Transcribe to determine which offers greater accuracy when converting audio into text. Transcript accuracy was measured with the Word Error Rate (WER), which counts the words deleted, substituted, and inserted relative to a human-produced reference transcription. After both systems were tested under different noise conditions, Amazon Transcribe produced the more accurate transcripts. It was therefore concluded that the Amazon service outperformed the Google service on recordings with both higher and lower levels of background noise.
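
As an illustration of the metric, WER is conventionally defined as (S + D + I) / N, where S, D, and I are the numbers of substituted, deleted, and inserted words and N is the number of words in the reference. The sketch below is not the authors' evaluation code; the function name `word_error_rate`, the lowercasing, and the whitespace tokenization are assumptions made for this example.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return WER = (substitutions + deletions + insertions) / reference length.

    Assumption: words are compared case-insensitively after splitting on
    whitespace; a real evaluation would also normalize punctuation and numbers.
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the word-level edit distance between the reference
    # prefix processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            substitution_cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,                      # deletion of a reference word
                curr[j - 1] + 1,                  # insertion of an extra word
                prev[j - 1] + substitution_cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    human = "the cat sat on the mat"
    machine = "the cat sat on mat"
    # One reference word is missing from the machine transcript,
    # so the score is one deletion over six reference words.
    print(word_error_rate(human, machine))
```

Because insertions are counted against the reference length, WER can exceed 1.0 when a transcript contains many spurious words, so a lower value always indicates a more accurate system.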

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.