Version 4 (modified by casties, 9 years ago)

Normalizing Arabic transliterations

Algorithm for normalizing the existing transliterated Arabic text (the _translit fields) in the database.

The current implementation is in source:OpenMind/src/main/java/org/mpi/openmind/repository/utils/NormalizerUtils.java

1. Replace letter combinations

Replace each of the following letter combinations with the single letter shown next to it.

th → t
kh → h
dh → d
sh → s
gh → g
"aẗ ", "at ", "ah " → "a "
a
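A minimal Java sketch of step 1, assuming the rules are applied in the order listed and as plain, case-sensitive substring replacements. The class and method names are hypothetical; this is not the code in NormalizerUtils.java. Because every ending rule includes a trailing space, a word at the very end of the string (e.g. "qurrah" with nothing after it) is left unchanged by this sketch.

```java
// Hypothetical helper, not the code in NormalizerUtils.java.
public class ReplaceCombinations {

    // {pattern, replacement} pairs, applied in the order listed on the wiki page (assumed).
    // U+1E97 is LATIN SMALL LETTER T WITH DIAERESIS (t with two dots above).
    private static final String[][] RULES = {
        {"th", "t"},
        {"kh", "h"},
        {"dh", "d"},
        {"sh", "s"},
        {"gh", "g"},
        {"a\u1E97 ", "a "},
        {"at ", "a "},
        {"ah ", "a "},
    };

    /** Applies each rule in turn as a plain, case-sensitive substring replacement. */
    public static String replaceCombinations(String s) {
        String result = s;
        for (String[] rule : RULES) {
            result = result.replace(rule[0], rule[1]);
        }
        return result;
    }

    public static void main(String[] args) {
        // prints "tabit ibn qurra wa-madrasa bagdad"
        System.out.println(replaceCombinations("thabit ibn qurrah wa-madrasat baghdad"));
    }
}
```

The order matters only where rules overlap; listing the digraphs first means an ending such as "ath " is first reduced to "at " and then to "a ".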

2. Replace letters with diacritics

Replace every letter that carries a diacritic (e.g. ā, ṣ, ḥ) with the same letter without the diacritic.
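One way to do this in Java, using the java.text.Normalizer class listed in the references: decompose the text to NFD so each diacritic becomes a separate combining mark, then delete all combining marks. This is a minimal sketch with hypothetical class and method names, not the code in NormalizerUtils.java. Modifier letters such as ʾ and ʿ are not combining marks, so this sketch does not remove them.

```java
import java.text.Normalizer;

// Hypothetical helper, not the code in NormalizerUtils.java.
public class StripDiacritics {

    /** Decomposes to NFD and deletes every combining mark (Unicode category M). */
    public static String stripDiacritics(String s) {
        // NFD splits e.g. a-macron (U+0101) into "a" followed by U+0304 COMBINING MACRON
        String decomposed = Normalizer.normalize(s, Normalizer.Form.NFD);
        // The regex class p{M} matches all combining marks, so only base letters remain
        return decomposed.replaceAll("\\p{M}", "");
    }

    public static void main(String[] args) {
        // "Kitab al-Manazir" with a-macron and z-dot-below, written with Unicode escapes
        String title = "Kit\u0101b al-Man\u0101\u1E93ir";
        System.out.println(stripDiacritics(title)); // prints "Kitab al-Manazir"
    }
}
```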

Remove all apostrophes.
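A minimal sketch of the apostrophe removal. Which characters count as apostrophes is an assumption: besides the ASCII apostrophe, this version also removes the typographic ’ (U+2019) and the modifier letters ʾ (U+02BE) and ʿ (U+02BF) commonly used in transliteration for hamza and ʿayn. The class and method names are hypothetical; adjust the character set to match the data.

```java
// Hypothetical helper, not the code in NormalizerUtils.java.
public class RemoveApostrophes {

    // Assumption: the page does not list which characters count as apostrophes.
    // This class covers the ASCII apostrophe, U+2019 RIGHT SINGLE QUOTATION MARK,
    // and U+02BE / U+02BF, the modifier letters often used for hamza and ayn.
    private static final String APOSTROPHE_CLASS = "['\u2019\u02BE\u02BF]";

    /** Deletes every character matched by APOSTROPHE_CLASS. */
    public static String removeApostrophes(String s) {
        return s.replaceAll(APOSTROPHE_CLASS, "");
    }

    public static void main(String[] args) {
        System.out.println(removeApostrophes("al-Ma'mun"));        // prints "al-Mamun"
        System.out.println(removeApostrophes("\u02BFAbd Allah"));  // prints "Abd Allah"
    }
}
```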


For reference:

http://docs.oracle.com/javase/7/docs/api/java/text/Normalizer.html

http://junidecode.sourceforge.net/

http://userguide.icu-project.org/transforms/general