BanglaSportsEmotion: A Sports-Focused Bangla Sentiment Corpus with Transformer and Classical Machine Learning Benchmarks

DOI: 10.1109/ICCIT68739.2025.11491620

Conference: 2025 28th International Conference on Computer and Information Technology (ICCIT)

Abstract: Sentiment analysis in Bangla, particularly in specialized domains such as sports, remains challenging due to the scarcity of annotated datasets and domain-specific resources. To address this gap, we introduce the BanglaSportsEmotion dataset, a manually annotated, sports-related corpus of 8,582 Bangla comments, fairly balanced across four sentiment classes: Joy, Anger, Support, and Toxic. We collected the comments from various online sources, covering a wide range of emotions expressed in sports discussions, and annotated them following clear guidelines to ensure consistency and reproducibility. To benchmark the dataset, we evaluated transformer-based models (BanglaBERT, IndicBERTv2, and XLM-RoBERTa) as well as traditional machine learning baselines (Support Vector Classifier and Random Forest with TF–IDF and FastText features), using consistent training strategies and optimized hyperparameters. BanglaBERT performed best, achieving an accuracy of 80% and a macro-F1 score of 0.81. Error analysis revealed a semantic overlap between the Toxic and Anger classes, highlighting the need for further refinement in distinguishing these sentiments. BanglaSportsEmotion serves as a new benchmark for sports-related emotion analysis in Bangla, and our results show the consistent superiority of language-specific transformer-based pretraining over traditional approaches in low-resource NLP settings.
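The abstract reports both accuracy and macro-F1 over the four classes. As a minimal sketch, the snippet below shows how the two metrics are computed from gold and predicted labels. It is not the authors' evaluation code. The function names and the eight toy labels are hypothetical, chosen only to mimic the Anger/Toxic confusion described above, and they are not data from the corpus.

```python
from collections import Counter

# The four sentiment classes named in the abstract.
LABELS = ("Joy", "Anger", "Support", "Toxic")


def accuracy(y_true, y_pred):
    """Fraction of predictions that exactly match the gold label."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def macro_f1(y_true, y_pred, labels=LABELS):
    """Unweighted mean of per-class F1, where F1 = 2TP / (2TP + FP + FN).

    Every class counts equally, regardless of how many examples it has.
    A class with no true positives, false positives, or false negatives
    gets F1 = 0.0 here (an assumed convention).
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted class p was wrong
            fn[t] += 1  # gold class t was missed
    scores = []
    for c in labels:
        denom = 2 * tp[c] + fp[c] + fn[c]
        scores.append(2 * tp[c] / denom if denom else 0.0)
    return sum(scores) / len(scores)


if __name__ == "__main__":
    # Hypothetical toy labels: one Anger comment predicted as Toxic and
    # one Toxic comment predicted as Anger; everything else is correct.
    gold = ["Joy", "Joy", "Anger", "Anger", "Support", "Support", "Toxic", "Toxic"]
    pred = ["Joy", "Joy", "Anger", "Toxic", "Support", "Support", "Toxic", "Anger"]
    print(accuracy(gold, pred), macro_f1(gold, pred))
```

In this toy case, Joy and Support each score F1 = 1.0 and Anger and Toxic each score 0.5. Because macro-F1 averages the per-class scores without weighting, the Anger/Toxic confusion pulls the overall score down no matter how large those classes are.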