A Framework for Name Matching in Arabic Language

Authors     :    Salah AL-Hagree - Maher Al-Sanabani
Year of publication     :    2016
Abstract     :

Extensive research has been devoted to finding an effective algorithm for name matching, which plays a vital role in many applications. Many algorithms have therefore been developed to measure string similarity, but most of them were designed mainly to handle Latin-based languages. Name matching algorithms for Arabic are rare, because dealing with Arabic text is challenging due to the unique features of the Arabic language. Consequently, this paper proposes a framework for Arabic name matching. The proposed framework takes into account the unique features of the Arabic language and the different levels of similarity between Arabic letters, such as phonetic similarity, similarity of letter form, and keyboard similarity. Furthermore, the framework considers the transposition operation and enhanced cases of the insertion and deletion operations. The experiments carried out in this paper show that the proposed framework gives more accurate results than the algorithms it was compared with.

Keywords: Arabic Name Matching, Bigram, Matching Framework, Levenshtein Distance.

Introduction

Name matching has been an active topic since the early days of computer science, and devising more efficient algorithms remains a challenge for the research community. Many name matching algorithms have therefore been developed to address this important topic. They fall into two categories: exact matching algorithms [1],[2] and approximate (inexact) string matching algorithms [3],[4],[5],[6].

The discovery and matching of names (personal names, place names, company names, or scientific names) is used in an increasing number of applications and forms a central part of many of them, such as Customer Relation Management (CRM), Customer Data Integration (CDI), Anti-Money Laundering (AML), Criminal Investigation (CI), HealthCare (HC), and Genealogy Services (GS). If only exact matching were available in these applications, it would not be possible to deal with the name variations that unavoidably occur in real-world data sets. Exact matching techniques are therefore unsuitable for large and complex information systems, because they cannot retrieve names that have more than one acceptable spelling. To obtain more accurate results, approximate name matching should be applied instead of exact matching. The motivation of this paper is therefore to provide a matching algorithm for Arabic names that builds on approximate string matching algorithms, which can tolerate the errors commonly encountered in computer science applications. Matching algorithms of this type have been implemented in many applications, such as computational biology (DNA) [7], spelling correction [8],[9], text retrieval [10],[11],[12], handwriting recognition, database linkage [13],[14], and name recognition [15].
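The difference between exact and approximate matching can be illustrated with a minimal Python sketch. It is not the framework proposed in this paper: it is the standard optimal string alignment variant of the Levenshtein distance, with unit costs for insertion, deletion, substitution, and adjacent transposition. The function names `osa_distance` and `similarity`, the normalization by the longer name, and the sample names are assumptions made only for illustration; the letter-level phonetic, form, and keyboard similarities described in the abstract are not modelled here.

```python
def osa_distance(a: str, b: str) -> int:
    """Edit distance with unit-cost insertion, deletion, substitution,
    and transposition of two adjacent characters (optimal string alignment)."""
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i  # delete all i characters of a
    for j in range(cols):
        d[0][j] = j  # insert all j characters of b
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # match or substitution
            )
            # Adjacent transposition, e.g. "ab" <-> "ba"
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[rows - 1][cols - 1]


def similarity(a: str, b: str) -> float:
    """Illustrative normalization (an assumption): 1.0 means identical names."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - osa_distance(a, b) / longest


# Hypothetical sample names: exact comparison rejects the variant,
# while the approximate measure reports a small distance.
print("محمد" == "محمود")              # exact matching fails
print(osa_distance("محمد", "محمود"))  # one inserted letter
print(osa_distance("احمد", "امحد"))   # two adjacent letters swapped
```

Under this uniform-cost scheme every edit is penalized equally, so two letters that sound or look alike are treated the same as two unrelated letters.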

This paper is organized as follows. Section 2 illustrates some challenges of the Arabic language. Section 3 describes related work in the field. Section 4 presents the proposed framework for Arabic name matching. Section 5 presents the experiments and discusses the results. Finally, conclusions and future work are presented in Section 6.
