Advances and Applications in Statistics
Volume 8, Issue 2, Pages 177 - 192
(April 2008)
MODELING ARABIC LANGUAGE DIACRITIZED NAMES MARKOVIAN CHAINS
Fawaz S. Al-Anzi (Kuwait)
Abstract: Adequate mathematical modeling and efficient algorithm design for a natural language require solid basic building blocks of knowledge about that language. Unfortunately, Arabic has not received the same attention from researchers as Latin-based natural languages in the development of tools, mathematical models and databases on which more advanced and complex models and software solutions can be built. In this work, we propose to develop one of the essential building blocks that will help researchers develop more advanced and complex models and software solutions for the Arabic language, including applications in translation and transliteration. The problem we address is the development of high-accuracy Markov chains for diacritized Arabic names. Results are produced for the corpus database of Arabic names as well as for the envelope database, which is a modification of the corpus database. So far, most of the work in this area has concentrated on non-diacritized alphabets. The diacritization of Arabic names is important to advanced computer systems that use name repositories such as telephone directories and passport name directories. These directories can be used to model and produce more accurate data mining, translation and transliteration for Arabic names.
Keywords and phrases: Arabic language processing, mathematical modeling, Markov chains, translation.
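
As an illustration of the kind of model the abstract refers to, the following is a minimal sketch of a first-order Markov chain estimated from a list of diacritized names. It is not the paper's method: the abstract does not state the order of the chain or how states are defined, so the sketch assumes that each Unicode code point (base letter or diacritic) is one state. The function name `fit_markov_chain`, the boundary markers `START` and `END`, and the two-name toy corpus are all hypothetical.

```python
from collections import Counter, defaultdict

# Hypothetical boundary markers for the beginning and end of a name.
START, END = "<s>", "</s>"


def fit_markov_chain(names):
    """Estimate first-order transition probabilities from symbol sequences.

    Assumption: every code point in a name (letter or diacritic) is a state.
    """
    counts = defaultdict(Counter)
    for name in names:
        symbols = [START, *name, END]
        # Count each adjacent pair (previous symbol, next symbol).
        for prev, nxt in zip(symbols, symbols[1:]):
            counts[prev][nxt] += 1
    # Normalize each row of counts so that it sums to 1.
    return {
        prev: {nxt: c / sum(following.values()) for nxt, c in following.items()}
        for prev, following in counts.items()
    }


# Toy corpus (illustrative only), written as code points to keep the
# letter/diacritic order explicit:
#   ain + fatha + lam + kasra + ya
#   ain + damma + meem + fatha + ra
toy_names = [
    "\u0639\u064E\u0644\u0650\u064A",
    "\u0639\u064F\u0645\u064E\u0631",
]
chain = fit_markov_chain(toy_names)
```

In this toy corpus both names begin with the same letter, followed by a different diacritic in each, so the estimated chain splits the probability mass after that letter evenly between the two diacritics. A real model would be estimated from a full name corpus and might use a higher order or group each letter with its diacritic into a single state.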