== Normalizing Arabic transliterations ==

Algorithm for normalizing the existing transliterated Arabic (_translit fields) in the database.

=== New ===

==== 1. replace letter combinations ====

Replace the following letter combinations with a single letter.

|| dj, ch || j ||
|| th || t ||
|| kh || h ||
|| dh || d ||
|| sh || s ||
|| gh || g ||

Replace at the end of a word:

|| aẗ\b, at\b, ah\b || a ||

Replace letters:

|| ỳ || a ||

==== 2. replace letters with diacritics ====

Replace all(?) letters with diacritics with the letter without diacritics.

Remove all apostrophes.

=== Currently ===

source:OpenMind/src/main/java/org/mpi/openmind/repository/utils/NormalizerUtils.java

==== 1. replace letter combinations ====

Replace the following letter combinations with a single letter.

|| th || t ||
|| kh || h ||
|| dh || d ||
|| sh || s ||
|| gh || g ||
|| "aẗ ", "at ", "ah " || "a " ||
|| ỳ || a ||

==== 2. replace letters with diacritics ====

Replace all(?) letters with diacritics with the letter without diacritics.

Remove all apostrophes.

----

For reference:

 * http://docs.oracle.com/javase/7/docs/api/java/text/Normalizer.html
 * http://junidecode.sourceforge.net/
 * http://userguide.icu-project.org/transforms/general
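
The rules listed under "New" could be sketched in Java roughly as follows. This is only an illustration, not the code in NormalizerUtils.java. The class name `TranslitNormalizerSketch` and the method `normalize` are made up for this example. Several details are assumptions that the rules above do not settle: matching is case-sensitive (the tables only list lowercase forms), the input is brought into NFC first so that ẗ and ỳ are single characters, the letter combinations are replaced in a single left-to-right pass, the word end is detected with a lookahead for "no letter, mark or digit follows" instead of `\b`, "letters with diacritics" is read as "anything that decomposes into base letter plus combining marks", and the set of characters treated as apostrophes is a guess.

```java
import java.text.Normalizer;
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical sketch of the "New" normalization rules; not the project's implementation. */
final class TranslitNormalizerSketch {

    // Step 1: letter combinations and their single-letter replacements.
    private static final Map<String, String> DIGRAPHS = new HashMap<String, String>();
    static {
        DIGRAPHS.put("dj", "j");
        DIGRAPHS.put("ch", "j");
        DIGRAPHS.put("th", "t");
        DIGRAPHS.put("kh", "h");
        DIGRAPHS.put("dh", "d");
        DIGRAPHS.put("sh", "s");
        DIGRAPHS.put("gh", "g");
    }
    private static final Pattern DIGRAPH = Pattern.compile("dj|ch|th|kh|dh|sh|gh");

    // Word-final "a" + t-with-diaeresis (U+1E97), "at", "ah".
    // Assumption: "end of word" means no letter, combining mark or digit follows.
    private static final Pattern WORD_FINAL =
            Pattern.compile("(?:a\u1E97|at|ah)(?![\\p{L}\\p{M}\\p{N}])");

    // Step 2: combining marks left over after canonical decomposition.
    private static final Pattern MARKS = Pattern.compile("\\p{M}+");

    // Assumption: these characters count as "apostrophes"
    // (ASCII ', U+2018, U+2019, U+02BB, U+02BC, U+02BE, U+02BF).
    private static final Pattern APOSTROPHES =
            Pattern.compile("['\u2018\u2019\u02BB\u02BC\u02BE\u02BF]");

    private TranslitNormalizerSketch() { }

    static String normalize(String input) {
        // Compose first so precomposed letters can be matched as single characters.
        String s = Normalizer.normalize(input, Normalizer.Form.NFC);

        // 1a. Replace letter combinations in one left-to-right pass.
        Matcher m = DIGRAPH.matcher(s);
        StringBuffer sb = new StringBuffer();
        while (m.find()) {
            m.appendReplacement(sb, DIGRAPHS.get(m.group()));
        }
        m.appendTail(sb);
        s = sb.toString();

        // 1b. Replace the word-final endings with "a".
        s = WORD_FINAL.matcher(s).replaceAll("a");

        // 1c. Replace y-with-grave (U+1EF3) with "a".
        s = s.replace('\u1EF3', 'a');

        // 2. Decompose, drop the combining marks, then drop apostrophes.
        s = Normalizer.normalize(s, Normalizer.Form.NFD);
        s = MARKS.matcher(s).replaceAll("");
        return APOSTROPHES.matcher(s).replaceAll("");
    }
}
```

With these assumptions, `TranslitNormalizerSketch.normalize("madrasah al-ḥadīth")` gives `madrasa al-hadit`. Letters that have no canonical decomposition (for example ł or ø) pass through unchanged, which is one reason the "all(?)" above remains an open question; a transliterator such as the ones linked under "For reference" would be needed to cover those.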