| Publication name | Machine Learning-Based Approach for Arabic Dialect Identification |
|---|---|
| Authors | Mahmoud S. Ali; Ahmed H. Ali; Ahmed A. El-Sawy; Hamada A. Nayel |
| Year | 2021 |
| Keywords | Arabic Dialect Identification; Arabic NLP |
| Journal | Proceedings of the Sixth Arabic Natural Language Processing Workshop |
| Volume | Not Available |
| Issue | Not Available |
| Pages | Not Available |
| Publisher | Association for Computational Linguistics |
| Local/International | International |
| Paper Link | https://aclanthology.org/2021.wanlp-1.34.pdf |
| Supplementary materials | Not Available |
Abstract
This paper describes our systems submitted to the Second Nuanced Arabic Dialect Identification Shared Task (NADI 2021). Dialect identification is the task of automatically detecting the source variety of a given text or speech segment. The shared task has four subtasks: two for country-level identification and two for province-level identification. The data come from the Twitter domain and cover a total of 100 provinces from all 21 Arab countries. The proposed systems rely on five machine-learning classifiers: Complement Naïve Bayes, Support Vector Machine, Decision Tree, Logistic Regression, and Random Forest. The Naïve Bayes classifier achieved the highest macro-averaged F1 score of all the classifiers on both the development and test data.
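
The abstract ranks the classifiers by macro-averaged F1, which gives every dialect label equal weight regardless of how many tweets carry it. The following is a minimal sketch of that metric in plain Python, not the authors' evaluation code. The function name `macro_f1` and the country labels in the usage note are hypothetical and chosen only for illustration, and the sketch assumes the average is taken over every label that appears in either the gold or the predicted list.

```python
from collections import defaultdict


def macro_f1(gold, predicted):
    """Return the unweighted mean of per-class F1 scores.

    Assumption: the average runs over every label seen in either list.
    """
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")

    tp = defaultdict(int)  # true positives per label
    fp = defaultdict(int)  # false positives per label
    fn = defaultdict(int)  # false negatives per label

    for g, p in zip(gold, predicted):
        if g == p:
            tp[g] += 1
        else:
            fp[p] += 1  # p was predicted but is wrong
            fn[g] += 1  # g was the right answer but was missed

    labels = set(gold) | set(predicted)
    if not labels:
        return 0.0

    scores = []
    for label in labels:
        # F1 = 2*TP / (2*TP + FP + FN); defined as 0 when the denominator is 0.
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores.append(2 * tp[label] / denom if denom else 0.0)

    return sum(scores) / len(scores)
```

As a usage example, take five tweets with gold labels `["Egypt", "Egypt", "Egypt", "Iraq", "Morocco"]` and predictions `["Egypt", "Egypt", "Iraq", "Iraq", "Egypt"]`. Accuracy is 3/5 = 0.6, but the per-class F1 scores are 2/3 (Egypt), 2/3 (Iraq), and 0 (Morocco), so `macro_f1` returns 4/9, roughly 0.444. A label that is never predicted correctly pulls the macro average down sharply, which matters when many classes such as countries or provinces have few examples.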