Transliteration and Normalization

There are three different tasks:

  • the normalization of existing Latin transliterations of Arabic (in the _translit fields in the database) for searching
  • the normalization of Arabic script for searching
  • the romanization of Arabic, starting from Arabic script, into one or more Latin transliteration forms

Normalization of Latin transliterations

Searching the existing _translit fields in the database should ignore whether a character is written with or without diacritics. This requires normalizing all variants of a character to its base form.

The normalized form of all transliterated text is saved in the database. When searching, the search input is normalized in the same way and compared to the normalized text in the database.
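The idea can be illustrated with a minimal Python sketch. It is not the implementation used for the database: the function name normalize_translit and the EXTRA_MAP table are hypothetical, and the sketch assumes that diacritics can be removed by Unicode decomposition (NFD) and that the half-ring signs for ʿayn and hamza are simply dropped. The characters that actually occur in the _translit fields may need additional mappings.

```python
import unicodedata

# Hypothetical mapping for characters that have no Unicode decomposition.
# Assumption: the signs for ayn and hamza are dropped for searching.
EXTRA_MAP = {
    "\u02BF": "",  # modifier letter left half ring (ayn)
    "\u02BE": "",  # modifier letter right half ring (hamza)
    "'": "",       # ASCII apostrophe used as a substitute
    "`": "",       # ASCII grave accent used as a substitute
}


def normalize_translit(text: str) -> str:
    """Reduce a Latin transliteration to a diacritic-free search form."""
    # NFD splits a precomposed character into base letter + combining marks.
    decomposed = unicodedata.normalize("NFD", text)
    out = []
    for ch in decomposed:
        # Skip combining marks (macron, dot below, caron, ...).
        if unicodedata.combining(ch):
            continue
        out.append(EXTRA_MAP.get(ch, ch))
    # casefold() makes the comparison case-insensitive.
    return "".join(out).casefold()


# The stored value and the search input go through the same function.
stored = normalize_translit("\u1E24ad\u012B\u1E6F")  # "Ḥadīṯ" -> "hadit"
query = normalize_translit("hadit")
print(stored == query)  # True
```

Because both sides pass through the same function, a query typed without diacritics matches text stored with them, and vice versa.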

Description of the algorithm


Old: The underlying transliteration scheme for Arabic should be DIN 31635. Normalization, i.e. reduction to ASCII, should be done by normalizing the DIN 31635 transliteration.

Last modified on May 8, 2015, 9:14:00 AM