Using Shallow and Deep Learning to Automatically Detect Hate Motivated by Gender and Sexual Orientation on Twitter in Spanish

Arcila Calderón, Carlos; Jiménez Amores, Francisco Javier; Sánchez Holgado, Patricia; Blanco Herrero, David

doi:10.3390/mti5100063

Title

Using Shallow and Deep Learning to Automatically Detect Hate Motivated by Gender and Sexual Orientation on Twitter in Spanish

Author(s)

Arcila Calderón, Carlos

Jiménez Amores, Francisco Javier

Sánchez Holgado, Patricia

Blanco Herrero, David

Keywords

Supervised classification

Deep learning

Machine learning

Misogyny

Feminism

Sexual orientation

Gender identity

Gender discrimination

Hate speech

Twitter

UNESCO classification

63 Sociology

6308 Social Communications

Publication date

2021-10-13

Publisher

MDPI

Citation

Arcila-Calderón, C., Amores, J. J., Sánchez-Holgado, P., & Blanco-Herrero, D. (2021). Using Shallow and Deep Learning to Automatically Detect Hate Motivated by Gender and Sexual Orientation on Twitter in Spanish. Multimodal Technologies and Interaction, 5(10), 63-76. https://doi.org/10.3390/mti5100063

Abstract

[EN] The increasing phenomenon of “cyberhate” is concerning because of the potential social implications of this form of verbal violence, which is aimed at already-stigmatized social groups. According to information collected by the Ministry of the Interior of Spain, the category of sexual orientation and gender identity is subject to the third-highest number of registered hate crimes, ranking behind racism/xenophobia and ideology. However, most of the existing computational approaches to online hate detection simultaneously attempt to address all types of discrimination, leading to weaker prototype performances. These approaches focus on other reasons for hate—primarily racism and xenophobia—and usually focus on English messages. Furthermore, few detection models have used manually generated databases as a training corpus. Using supervised machine learning techniques, the present research sought to overcome these limitations by developing and evaluating an automatic detector of hate speech motivated by gender and sexual orientation. The focus was Spanish-language posts on Twitter. For this purpose, eight predictive models were developed from an ad hoc generated training corpus, using shallow modeling and deep learning. The evaluation metrics showed that the deep learning algorithm performed significantly better than the shallow modeling algorithms, and logistic regression yielded the best performance of the shallow algorithms.

URI

https://hdl.handle.net/10366/160896

DOI

10.3390/mti5100063

Publisher's version

https://www.mdpi.com/2414-4088/5/10/63

Appears in collections