ADVANCING OPINION MINING FOR LOW-RESOURCE LANGUAGES: A CASE STUDY OF UZBEK ABSA DATASET

Rajabov Jaloliddin Shamsuddin o‘g‘li

Authors

Rajabov Jaloliddin Shamsuddin o‘g‘li, PhD Student, National University of Uzbekistan named after Mirzo Ulugbek

Keywords:

Natural Language Processing, Uzbek Language, ABSA Dataset, Sentiment Analysis, Low-Resource Languages, UzABSA, Annotation Techniques, KNN Classification, Statistical Evaluation

Abstract

Making natural language processing (NLP) technologies available for low-resource languages is important because it gives the speakers of these languages access to modern technology. This study addresses the shortage of linguistic resources for Uzbek by introducing UzABSA, the first high-quality annotated aspect-based sentiment analysis (ABSA) dataset for the language. The dataset comprises 3,500 document-level reviews and more than 6,100 sentence-level instances, all collected from Uzbek restaurant reviews. Inter-annotator agreement, measured with Cohen's kappa and Krippendorff's α, shows robust agreement. To validate the dataset, a K-Nearest Neighbour (KNN) classifier was applied to it, achieving accuracies between 72% and 88%. This work provides a foundational resource for advancing sentiment analysis in Uzbek.
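The abstract reports agreement with Cohen's kappa but does not show the computation. As a minimal sketch, assuming two annotators who each assign one sentiment polarity label per item, kappa compares observed agreement p_o with the agreement p_e expected by chance from each annotator's label distribution: kappa = (p_o - p_e) / (1 - p_e). The function name `cohens_kappa`, the label set (positive, negative, neutral) and the example annotations are hypothetical illustrations, not taken from UzABSA. Krippendorff's α, which also handles more than two annotators and missing labels, is not shown here.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labelling the same items (illustrative sketch)."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(labels_a)
    # Observed agreement: fraction of items where both annotators chose the same label.
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: sum over labels of the product of each annotator's label frequency.
    freq_a = Counter(labels_a)
    freq_b = Counter(labels_b)
    p_e = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if p_e == 1:
        # Both annotators used one identical label for every item; treat as perfect agreement.
        return 1.0
    return (p_o - p_e) / (1 - p_e)


# Hypothetical polarity annotations for six sentences (not real UzABSA data).
annotator_1 = ["positive", "positive", "negative", "neutral", "positive", "negative"]
annotator_2 = ["positive", "negative", "negative", "neutral", "positive", "negative"]

print(round(cohens_kappa(annotator_1, annotator_2), 3))  # prints 0.739
```

Here the annotators agree on 5 of 6 items (p_o = 30/36), chance agreement is p_e = 13/36, so kappa = 17/23. Kappa is lower than raw agreement because it discounts matches that the two label distributions would produce by chance.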