ADVANCING OPINION MINING FOR LOW-RESOURCE LANGUAGES: A CASE STUDY OF UZBEK ABSA DATASET
Keywords:
Natural Language Processing, Uzbek Language, ABSA Dataset, Sentiment Analysis, Low-Resource Languages, UzABSA, Annotation Techniques, KNN Classification, Statistical Evaluation
Abstract
Making natural language processing technologies available for low-resource languages is important for giving speakers of those languages access to language technology. This study addresses the gap in linguistic resources for Uzbek by introducing UzABSA, the first high-quality annotated aspect-based sentiment analysis (ABSA) dataset for the language. The dataset comprises 3,500 document-level reviews and over 6,100 sentence-level instances drawn from Uzbek restaurant reviews. Inter-annotator agreement, measured with Cohen's kappa and Krippendorff's α, is robust. A K-Nearest Neighbour (KNN) classifier was used to validate the dataset and achieved accuracies between 72% and 88%. This work provides a foundational resource for advancing sentiment analysis in Uzbek.
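For readers unfamiliar with the agreement statistics named in the abstract, the following is a minimal sketch of Cohen's kappa for two annotators, computed as (p_o - p_e) / (1 - p_e), where p_o is the observed agreement and p_e is the agreement expected by chance. The function name `cohens_kappa` and the example labels are illustrative assumptions; they are not the study's annotation data or its evaluation code.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labelling the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two equal-length, non-empty label lists")
    n = len(labels_a)
    # Observed agreement: fraction of items given identical labels.
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement from each annotator's marginal label frequencies.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    p_e = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if p_e == 1.0:
        return 1.0  # both annotators used one identical label throughout
    return (p_o - p_e) / (1 - p_e)


# Hypothetical sentiment labels for six review sentences (not from UzABSA).
annotator_1 = ["pos", "pos", "neg", "neu", "neg", "pos"]
annotator_2 = ["pos", "neg", "neg", "neu", "neg", "pos"]
kappa = cohens_kappa(annotator_1, annotator_2)
print(round(kappa, 3))
```

A value of 1 indicates perfect agreement and a value of 0 indicates agreement no better than chance. Krippendorff's α follows the same idea but generalises to more than two annotators and to missing labels.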
License

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.