Publication Information

Authors E Amer, HM Khalil, T El-shistawy

Keywords Natural Language Processing; Entity; N-gram; Arabic Wikipedia; Information Extraction

Publication Type International

Abstract

Entity extraction has become very important for developing many Natural Language Processing (NLP) applications. In this paper, we present a new algorithm for extracting entities from Arabic text. The approach uses a semi-structured knowledge source, Arabic Wikipedia, to predict the words that constitute an Arabic entity. Our method is generic and can be applied directly to other languages to extract entities. The proposed method is designed to analyze Arabic text hierarchically with variable-length N-grams. The experimental results show that the proposed system is very efficient at detecting entities in a large set of Arabic news.
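The abstract does not spell out the procedure, so the following is only a minimal sketch of one plausible reading of variable-length N-gram matching: at each position, try the longest word sequence first and fall back to shorter ones, accepting a sequence when it appears in a set of known titles. The function name `extract_entities`, the parameter `max_n`, and the small `titles` set (a stand-in for Arabic Wikipedia article titles) are assumptions made for illustration and are not taken from the paper.

```python
def extract_entities(tokens, known_titles, max_n=4):
    """Greedy longest-first n-gram lookup (illustrative sketch only)."""
    entities = []
    i = 0
    while i < len(tokens):
        matched_len = 0
        # Try the longest n-gram starting at position i, then shorter ones.
        for n in range(min(max_n, len(tokens) - i), 0, -1):
            candidate = " ".join(tokens[i:i + n])
            if candidate in known_titles:
                entities.append(candidate)
                matched_len = n
                break
        # Skip past a matched entity, otherwise advance by one token.
        i += matched_len if matched_len else 1
    return entities


# Hypothetical stand-in for a set of Arabic Wikipedia article titles.
titles = {"جامعة القاهرة", "القاهرة", "مصر"}
tokens = "زار الوفد جامعة القاهرة في مصر".split()
print(extract_entities(tokens, titles))
```

Under these assumptions, the two-word title "جامعة القاهرة" is preferred over the shorter "القاهرة" because longer candidates are tested first. Whether the paper resolves overlapping candidates the same way is not stated in the abstract.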

Hierarchical N-gram Algorithm for extracting Arabic Entities