# HG changeset patch
# User casties
# Date 1431419543 0
# Node ID 728549225b020bfc2fad6ca3c6723e793e8ec02b
# Parent  034df8d5c923d28f0191c84d33ede57e44f6007e
first version of new normalizer.

diff -r 034df8d5c923 -r 728549225b02 src/main/java/org/mpi/openmind/repository/utils/ArabicTranslitNormalizer.java
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/src/main/java/org/mpi/openmind/repository/utils/ArabicTranslitNormalizer.java	Tue May 12 08:32:23 2015 +0000
@@ -0,0 +1,22 @@
+package org.mpi.openmind.repository.utils;
+
+import java.util.HashMap;
+import java.util.Map;
+import java.util.regex.Pattern;
+
+public class ArabicTranslitNormalizer {
+
+    protected static Map multiRepPat = new HashMap();
+    {
+        multiRepPat.put("j", Pattern.compile("ch"));
+        multiRepPat.put("j", Pattern.compile("dj"));
+        multiRepPat.put("t", Pattern.compile("th"));
+        multiRepPat.put("h", Pattern.compile("kh"));
+        multiRepPat.put("d", Pattern.compile("dh"));
+        multiRepPat.put("s", Pattern.compile("sh"));
+        multiRepPat.put("g", Pattern.compile("gh"));
+        multiRepPat.put("j", Pattern.compile("ch"));
+        // aẗ\b, at\b, ah\b -> a
+        multiRepPat.put("a", Pattern.compile("a\u1E97\\b|at\\b|ah\\b"));
+    }
+}
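
Review note on the patch above: the map is keyed by the replacement string, so the three `put("j", ...)` calls overwrite one another and only the last pattern ("ch") survives for "j". The initializer block is also an instance initializer (no `static`), so it runs only when the class is instantiated. The following is a minimal sketch of one possible alternative, not the committed code: the class name `TranslitNormalizerSketch` and the `normalize` method are hypothetical, and applying the rules sequentially in insertion order is an assumption about the intended behaviour.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Pattern;

// Hypothetical sketch; names and rule order are assumptions, not part of the patch.
public class TranslitNormalizerSketch {

    // Keyed by pattern rather than by replacement, so several patterns
    // can map to the same replacement. LinkedHashMap keeps insertion order.
    private static final Map<Pattern, String> RULES = new LinkedHashMap<>();

    // Static initializer: runs once when the class is loaded.
    static {
        RULES.put(Pattern.compile("ch"), "j");
        RULES.put(Pattern.compile("dj"), "j");
        RULES.put(Pattern.compile("th"), "t");
        RULES.put(Pattern.compile("kh"), "h");
        RULES.put(Pattern.compile("dh"), "d");
        RULES.put(Pattern.compile("sh"), "s");
        RULES.put(Pattern.compile("gh"), "g");
        // aẗ\b, at\b, ah\b -> a
        RULES.put(Pattern.compile("a\u1E97\\b|at\\b|ah\\b"), "a");
    }

    // Applies every rule in turn to the result of the previous one.
    public static String normalize(String input) {
        String result = input;
        for (Map.Entry<Pattern, String> rule : RULES.entrySet()) {
            result = rule.getKey().matcher(result).replaceAll(rule.getValue());
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(normalize("djabal"));  // prints "jabal"
        System.out.println(normalize("madinah")); // prints "madina"
    }
}
```

Because the rules are applied one after another, their order can affect the result; a list of pattern/replacement pairs would work equally well if a map is not needed elsewhere.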