Changes between Version 7 and Version 8 of normalize_arabic_translit
Timestamp: May 8, 2015, 3:58:51 PM (9 years ago)
normalize_arabic_translit
--- normalize_arabic_translit (v7)
+++ normalize_arabic_translit (v8)
@@ -1,11 +1,9 @@
-
-
-== Normalizing arabic transliterations ==
+= Normalizing arabic transliterations =
 
 Algorithm for normalizing the existing transliterated arabic (_translit fields) in the database.
 
-=== New ===
+== New ==
 
-==== 1. replace letter combinations ====
+=== 1. replace letter combinations ===
 
 Replace the following letter combinations with a single letter:
@@ -27,9 +25,9 @@
 (replace all y and move to 3.?)
 
-==== 2. remove diacritics ====
+=== 2. remove diacritics ===
 
 Replace all letters with diacritics with the letter without diacritics.
 
-==== 3. replace letters ====
+=== 3. replace letters ===
 
 Replace the following letters to unify the searches:
@@ -37,5 +35,5 @@
 || g, j || j ||
 
-==== Questions ====
+=== Questions ===
 
 What about apostrophes/accents? Normalize to single-quote (U+2019) or apostrophe (U+0027)?
@@ -47,9 +45,9 @@
 
 
-=== Currently ===
+== Currently ==
 
 source:OpenMind/src/main/java/org/mpi/openmind/repository/utils/NormalizerUtils.java
 
-==== 1. replace letter combinations ====
+=== 1. replace letter combinations ===
 
 Replace the following letter combinations with a single letter.
@@ -63,5 +61,5 @@
 || ỳ || a ||
 
-==== 2. replace letters with diacritics ====
+=== 2. replace letters with diacritics ===
 
 Replace all(?) letters with diacritics with the letter without diacritics.
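
The three steps named on the page (replace letter combinations, remove diacritics, replace letters) could be sketched roughly as follows. This is an illustration in Python, not the code in NormalizerUtils.java. The function names and the two mapping tables are hypothetical. Only the rows visible in this diff ("ỳ" to "a" from the "Currently" table, "g, j" to "j" from step 3) are filled in, and the elided rows are deliberately left out. Removing diacritics via Unicode decomposition is an assumption about how step 2 might be done, and the open question about apostrophes is not addressed.

```python
import unicodedata

# Hypothetical tables. Only rows visible in the diff are included;
# the elided rows of the wiki tables are NOT guessed here.
COMBINATIONS = {"\u1ef3": "a"}  # "ỳ" -> "a" (row from the "Currently" table)
LETTERS = {"g": "j"}            # "g, j" -> "j" (row from step 3)


def replace_combinations(text, table=COMBINATIONS):
    """Step 1: replace each listed combination with a single letter."""
    # Longest keys first, so a longer combination wins over a shorter one.
    for combo in sorted(table, key=len, reverse=True):
        text = text.replace(combo, table[combo])
    return text


def remove_diacritics(text):
    """Step 2: drop combining marks (assumed approach: NFD, then filter)."""
    decomposed = unicodedata.normalize("NFD", text)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return unicodedata.normalize("NFC", stripped)


def replace_letters(text, table=LETTERS):
    """Step 3: unify single letters so searches match either spelling."""
    return "".join(table.get(ch, ch) for ch in text)


def normalize_translit(text):
    """Apply steps 1 to 3 in the order given on the page."""
    # Compose first so precomposed table keys such as "ỳ" match
    # input that arrives as base letter + combining mark.
    text = unicodedata.normalize("NFC", text)
    text = replace_combinations(text)
    text = remove_diacritics(text)
    return replace_letters(text)
```

With these tables, `normalize_translit("ǧabal")` gives `"jabal"`: the caron is removed in step 2 and "g" becomes "j" in step 3. Running step 1 before step 2 matters for rows like "ỳ", which would otherwise be reduced to a plain "y" before the table could match.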