| publication name | Modeling of Arabic Language for Authorship Identification |
|---|---|
| Authors | Heba M. Khalil, Ahmed Taha, Tarek El-Shistawy |
| year | 2021 |
| keywords | |
| journal | International Journal of Scientific & Technology Research |
| volume | 10 |
| issue | 5 |
| pages | 157–162 |
| publisher | Not Available |
| Local/International | International |
| Paper Link | http://www.ijstr.org/final-print/may2021/Modeling-Of-Arabic-Language-For-Authorship-Identification.pdf |
| Supplementary materials | Not Available |

Abstract
With the vast volume of data now processed in digital form, both the need for and the capability of analysing this data for forensic authorship authentication have increased. Research so far has concentrated on English, Spanish, and German. The Arabic language has received less attention from the academic community because of the difficulty and length of Arabic sentences. This article provides a set of stylometric features derived from studying the parts of speech of many articles, including adjective ratio, sentence size, conjunctions, and others. These features are classified into two categories: statistical features and linguistic features. The AdaBoost and Bagging ensemble approaches are proposed in this research to maximise predictive efficiency on Arabic articles by combining multiple learners. The results indicate that the Bagging model achieves an average accuracy of 91.5%, while the AdaBoost model achieves the highest accuracy of 93.6%.
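
As a rough illustration of what statistical stylometric features of this kind might look like in practice, the Python sketch below computes an average sentence length and a conjunction ratio for a piece of Arabic text. It is not the authors' implementation: the function name `statistical_features`, the tiny `CONJUNCTIONS` list, and the punctuation handling are all assumptions made for this example, and linguistic features such as the adjective ratio are left out because they would need a part-of-speech tagger.

```python
import re

# Hypothetical, deliberately tiny list of Arabic conjunctions written as
# standalone words; the lexicon used in the paper is not given in the abstract.
# Conjunctions attached to the next word as a prefix are not counted here.
CONJUNCTIONS = {"و", "أو", "ثم", "لكن", "بل", "أم"}

# Sentence boundaries: period, exclamation mark, Latin and Arabic question marks.
SENTENCE_END = re.compile(r"[.!?؟]+")

# Punctuation stripped from the edges of each whitespace-separated token.
PUNCTUATION = "،؛:,;\"'()"


def tokenize(sentence):
    """Split a sentence on whitespace and drop surrounding punctuation."""
    stripped = (word.strip(PUNCTUATION) for word in sentence.split())
    return [token for token in stripped if token]


def statistical_features(text):
    """Return simple statistical style features for one article."""
    sentences = [tokenize(s) for s in SENTENCE_END.split(text)]
    sentences = [s for s in sentences if s]  # discard empty fragments
    words = [word for sentence in sentences for word in sentence]
    if not words:
        return {"avg_sentence_length": 0.0, "conjunction_ratio": 0.0}
    conjunction_count = sum(1 for word in words if word in CONJUNCTIONS)
    return {
        "avg_sentence_length": len(words) / len(sentences),
        "conjunction_ratio": conjunction_count / len(words),
    }
```

Feature dictionaries like this, one per article, could then be turned into numeric vectors and passed to an ensemble classifier such as AdaBoost or Bagging. The abstract does not specify the base learners or settings used.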