Extracción de Palabras Clave de Ciberacoso de Textos Breves: un Enfoque de Aprendizaje Automático

William Hermel Astudillo Quituisaca; Priscila Cedillo; Marcos Orellana

doi:10.37815/rte.v36nE1.1207

Extracting Cyberbullying Keywords from Short Texts: A Machine Learning Approach


Published: 2024-10-15


Keywords:

Artificial intelligence, Bullying, Cyberbullying, Latent Dirichlet Allocation, Language models

William Hermel Astudillo Quituisaca

https://orcid.org/0009-0009-6189-7492

Priscila Cedillo

https://orcid.org/0000-0002-6787-0655

Marcos Orellana

https://orcid.org/0000-0002-3671-9362

Abstract

Cyberbullying harms society through the consequences suffered by victims, bullies, and bystanders. Widespread access to the internet and social networks, especially among young people who lack the tools to handle these situations, makes social education necessary to mitigate its effects. This study contributes to that education by creating scripts for educational capsules. To this end, a model was developed that automates the search for and extraction of data from the social network X using Python and Selenium WebDriver. After the text was preprocessed with Natural Language Processing techniques, the Latent Dirichlet Allocation (LDA) model was applied to identify keywords. Finally, the pre-trained model "text-davinci-003" was used through the OpenAI API to generate the content of the educational capsules from an assigned context and the identified keywords. The result is a script covering topics in education on, and prevention of, bullying and cyberbullying. To assess the reliability of the generated text, an expert in the field evaluated it using the Goal-Question-Metric (GQM) approach; the evaluation supports the model's potential for generating educational content in the fight against cyberbullying.
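The middle of the pipeline described in the abstract (preprocessing, LDA keyword extraction, prompt assembly) can be illustrated with a minimal sketch. It is not the authors' implementation. It uses only the Python standard library and fits LDA by collapsed Gibbs sampling, one common inference method; the abstract does not say which inference method or library the study used. The stop-word list, sample posts, hyperparameters, and the `preprocess`, `lda_keywords`, and `build_prompt` helpers are all hypothetical, and both the Selenium scraping step and the OpenAI API call are omitted.

```python
import random
import re
from collections import Counter

# Hypothetical, tiny stop-word list; a real pipeline would use a fuller one.
STOPWORDS = {"a", "an", "and", "are", "at", "is", "me", "my", "of", "the", "they", "to", "you"}


def preprocess(text):
    """Lowercase a post, drop URLs and @mentions, and return content tokens."""
    text = re.sub(r"https?://\S+|@\w+", " ", text.lower())
    tokens = re.findall(r"[a-z]+", text)
    return [t for t in tokens if t not in STOPWORDS and len(t) > 2]


def lda_keywords(docs, num_topics=2, top_n=3, iters=200, alpha=0.1, beta=0.01, seed=0):
    """Fit LDA by collapsed Gibbs sampling and return up to top_n words per topic."""
    rng = random.Random(seed)
    vocab_size = len({w for doc in docs for w in doc})
    doc_topic = [[0] * num_topics for _ in docs]         # n(d, k)
    topic_word = [Counter() for _ in range(num_topics)]  # n(k, w)
    topic_total = [0] * num_topics                       # n(k)
    assignments = []
    # Give every token a random initial topic.
    for d, doc in enumerate(docs):
        zs = []
        for w in doc:
            k = rng.randrange(num_topics)
            zs.append(k)
            doc_topic[d][k] += 1
            topic_word[k][w] += 1
            topic_total[k] += 1
        assignments.append(zs)
    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k = assignments[d][i]
                # Remove the token's current assignment from the counts.
                doc_topic[d][k] -= 1
                topic_word[k][w] -= 1
                topic_total[k] -= 1
                # p(z = j | rest) is proportional to
                # (n(d,j) + alpha) * (n(j,w) + beta) / (n(j) + V * beta).
                weights = [
                    (doc_topic[d][j] + alpha)
                    * (topic_word[j][w] + beta)
                    / (topic_total[j] + vocab_size * beta)
                    for j in range(num_topics)
                ]
                k = rng.choices(range(num_topics), weights=weights)[0]
                assignments[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][w] += 1
                topic_total[k] += 1
    # Keep only words that are still assigned to the topic at least once.
    return [
        [w for w, c in topic_word[k].most_common(top_n) if c > 0]
        for k in range(num_topics)
    ]


def build_prompt(keywords, context="bullying and cyberbullying prevention"):
    """Assemble a hypothetical prompt for a text-generation model."""
    return (
        f"Write a short script for an educational capsule on {context}. "
        f"Use these keywords: {', '.join(keywords)}."
    )


# Invented sample posts, for illustration only.
sample_posts = [
    "You are such a loser, nobody likes you @victim",
    "Stop the bullying, report abuse and support the victim https://example.org",
    "Nobody likes a loser like you",
    "Schools should teach students to report bullying and support victims",
]
docs = [preprocess(p) for p in sample_posts]
topics = lda_keywords(docs)
prompt = build_prompt([w for topic in topics for w in topic])
```

With so few short posts the topics are unstable and depend on the seed; a real corpus would need many more documents, and a maintained LDA implementation would likely be preferable to this hand-rolled sampler.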


How to Cite

Astudillo Quituisaca, W. H., Cedillo, P., & Orellana, M. (2024). Extracting Cyberbullying Keywords from Short Texts: A Machine Learning Approach. Revista Tecnológica - ESPOL, 36(E1), 25-38. https://doi.org/10.37815/rte.v36nE1.1207

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.

