Peer-Reviewed Journal Details
Mandatory Fields
Kumar, Ritesh; Lahiri, Bornini; and Ojha, Atul Kr.
SN Computer Science
Aggressive and Offensive Language Identification in Hindi, Bangla, and English: A Comparative Study
Optional Fields
Aggression; Offensive language; Hindi; Bangla; English; Comparison; TRAC; HASOC; BERT
In this paper, we carry out a comparative study of offensive and aggressive language and attempt to understand their inter-relationship. To do so, we develop classifiers for offensive and aggressive language identification in Hindi, Bangla, and English using the datasets released for these languages as part of two shared tasks: Hate Speech and Offensive Content Identification in Indo-European Languages (HASOC) and the aggression and misogyny identification task at TRAC-2. The HASOC dataset is annotated for offensive language, and the TRAC-2 dataset is annotated for aggressive language. We experiment with SVM as well as BERT and its derivatives, such as ALBERT and DistilBERT, to develop the classifiers. The best classifiers achieve F-scores between 0.70 and 0.80 across the different tasks. We use these classifiers to cross-annotate the two datasets and examine the co-occurrence of different sub-categories of aggression and offense. The study shows that although aggression and offense overlap significantly, one does not entail the other.
Grant Details
Publication Themes
Informatics, Physical and Computational Sciences