Utility functions used to prepare Arabic text for searching and indexing.
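
| Individual Functions | |
|---|---|
| unicode | strip_tashkeel: Strip vowel marks from a text and return the resulting text. |
| unicode | strip_tatweel: Strip tatweel from a text and return the resulting text. |
| unicode | normalize_hamza: Normalize Hamza forms into one form and return the resulting text. |
| unicode | normalize_lamalef: Normalize Lam Alef ligatures into two letters (LAM and ALEF) and return the resulting text. |
| unicode | normalize_spellerrors: Normalize some spelling errors and return the resulting text. |

| Normalize One Function | |
|---|---|
| unicode | normalize_searchtext: Normalize input text and return the resulting text. |
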
HARAKAT_pat = re.compile(r'
HAMZAT_pat = re.compile(r'
ALEFAT_pat = re.compile(r'
LAMALEFAT_pat = re.compile(r'
AIN =
ALEF =
ALEF_HAMZA_ABOVE =
ALEF_HAMZA_BELOW =
ALEF_MADDA =
ALEF_MAKSURA =
ALEF_WASLA =
BEH =
BYTE_ORDER_MARK =
COMMA =
DAD =
DAL =
DAMMA =
DAMMATAN =
DECIMAL =
EIGHT =
FATHA =
FATHATAN =
FEH =
FIVE =
FOUR =
FULL_STOP =
GHAIN =
HAH =
HAMZA =
HAMZA_ABOVE =
HAMZA_BELOW =
HEH =
JEEM =
KAF =
KASRA =
KASRATAN =
KHAH =
LAM =
LAM_ALEF =
LAM_ALEF_HAMZA_ABOVE =
LAM_ALEF_HAMZA_BELOW =
LAM_ALEF_MADDA_ABOVE =
MADDA_ABOVE =
MEEM =
MINI_ALEF =
NINE =
NOON =
ONE =
PERCENT =
QAF =
QUESTION =
REH =
SAD =
SEEN =
SEMICOLON =
SEVEN =
SHADDA =
SHEEN =
SIX =
STAR =
SUKUN =
TAH =
TATWEEL =
TEH =
TEH_MARBUTA =
THAL =
THEH =
THOUSANDS =
THREE =
TWO =
WAW =
WAW_HAMZA =
YEH =
YEH_HAMZA =
ZAH =
ZAIN =
ZERO =
__package__ =
simple_LAM_ALEF =
simple_LAM_ALEF_HAMZA_ABOVE =
simple_LAM_ALEF_HAMZA_BELOW =
simple_LAM_ALEF_MADDA_ABOVE =

strip_tashkeel: Strip vowel marks (tashkeel) from a text and return the resulting text. The stripped marks are:

Example:

    >>> text=u"الْعَرَبِيّةُ"
    >>> strip_tashkeel(text)
    العربية

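
The stripping step can be illustrated with a short Python 3 sketch. It is not the module's code: the pattern behind `HARAKAT_pat` is not shown on this page, so the sketch assumes the stripped marks are the eight harakat in the range U+064B to U+0652, and the names `HARAKAT_SKETCH` and `strip_tashkeel_sketch` are hypothetical.

```python
import re

# Assumption: the marks to strip are the eight harakat U+064B..U+0652
# (FATHATAN, DAMMATAN, KASRATAN, FATHA, DAMMA, KASRA, SHADDA, SUKUN).
# This stands in for the module's HARAKAT_pat, whose pattern is not shown.
HARAKAT_SKETCH = re.compile("[\u064B-\u0652]")


def strip_tashkeel_sketch(text):
    """Return text with every character matched by HARAKAT_SKETCH removed."""
    return HARAKAT_SKETCH.sub("", text)


if __name__ == "__main__":
    # Same input as the documented example.
    print(strip_tashkeel_sketch("الْعَرَبِيّةُ"))
```

A single character class keeps the operation to one pass over the text; any mark outside the assumed range is left untouched.
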
strip_tatweel: Strip tatweel from a text and return the resulting text.

Example:

    >>> text=u"العـــــربية"
    >>> strip_tatweel(text)
    العربية

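
A minimal Python 3 sketch of the same idea follows. It assumes that the module's `TATWEEL` constant is the Unicode character U+0640 (ARABIC TATWEEL); the constant's value is not shown on this page, and `strip_tatweel_sketch` is a hypothetical name.

```python
# Assumption: TATWEEL is U+0640 (ARABIC TATWEEL), the elongation character.
TATWEEL_SKETCH = "\u0640"


def strip_tatweel_sketch(text):
    """Return text with every tatweel character removed."""
    return text.replace(TATWEEL_SKETCH, "")


if __name__ == "__main__":
    # Same input as the documented example.
    print(strip_tatweel_sketch("العـــــربية"))
```
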
normalize_hamza: Normalize Hamza forms into one form and return the resulting text. The converted letters are:

Example:

    >>> text=u"أهؤلاء من أولئكُ"
    >>> normalize_hamza(text)
    اهءلاء من اولءكُ

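
The list of converted letters is missing from this page, so the following Python 3 sketch infers a mapping from the example output only: alef forms carrying a hamza or madda become a bare ALEF, and WAW_HAMZA and YEH_HAMZA become a standalone HAMZA. The mapping, the table name `HAMZA_TABLE_SKETCH`, and the function name `normalize_hamza_sketch` are assumptions, not the module's definitions.

```python
ALEF = "\u0627"
HAMZA = "\u0621"

# Assumed mapping, inferred from the documented example output.
HAMZA_TABLE_SKETCH = str.maketrans({
    "\u0622": ALEF,   # ALEF_MADDA
    "\u0623": ALEF,   # ALEF_HAMZA_ABOVE
    "\u0625": ALEF,   # ALEF_HAMZA_BELOW
    "\u0624": HAMZA,  # WAW_HAMZA
    "\u0626": HAMZA,  # YEH_HAMZA
})


def normalize_hamza_sketch(text):
    """Replace each mapped hamza form with its normalized letter."""
    return text.translate(HAMZA_TABLE_SKETCH)


if __name__ == "__main__":
    # Same input as the documented example.
    print(normalize_hamza_sketch("أهؤلاء من أولئكُ"))
```

Vowel marks are left in place here, which is consistent with the trailing DAMMA that survives in the documented output.
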
normalize_lamalef: Normalize Lam Alef ligatures into two letters (LAM and ALEF) and return the resulting text. Some systems represent the Lam Alef ligature as a single character; this function converts it into two letters. The ligatures converted into LAM and ALEF are:

Example:

    >>> text=u"لانها لالء الاسلام"
    >>> normalize_lamalef(text)
    لانها لالئ الاسلام

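
As a hedged illustration, the Python 3 sketch below expands the Lam Alef presentation forms of Unicode (U+FEF5 to U+FEFC, isolated and final variants) into plain LAM followed by ALEF. Which code points the module's `LAM_ALEF*` constants cover, and whether the hamza and madda variants keep their marks, is not shown on this page; the sketch assumes all of them collapse to LAM + ALEF, as the description suggests. The names are hypothetical.

```python
LAM = "\u0644"
ALEF = "\u0627"

# Lam Alef presentation forms: madda above (FEF5/FEF6), hamza above
# (FEF7/FEF8), hamza below (FEF9/FEFA), plain (FEFB/FEFC).
# Assumption: every form is expanded to plain LAM + ALEF.
LAMALEF_LIGATURES = "\uFEF5\uFEF6\uFEF7\uFEF8\uFEF9\uFEFA\uFEFB\uFEFC"
LAMALEF_TABLE_SKETCH = str.maketrans(
    {ligature: LAM + ALEF for ligature in LAMALEF_LIGATURES}
)


def normalize_lamalef_sketch(text):
    """Expand each Lam Alef ligature into the two letters LAM and ALEF."""
    return text.translate(LAMALEF_TABLE_SKETCH)


if __name__ == "__main__":
    # U+FEFB followed by NOON, HEH, ALEF.
    print(normalize_lamalef_sketch("\uFEFB\u0646\u0647\u0627"))
```
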
normalize_spellerrors: Normalize some spelling errors, such as TEH_MARBUTA into HEH and ALEF_MAKSURA into YEH, and return the resulting text. In some contexts users ignore the difference between TEH_MARBUTA and HEH, and between ALEF_MAKSURA and YEH. The conversions are:

Example:

    >>> text=u"اشترت سلمى دمية وحلوى"
    >>> normalize_spellerrors(text)
    اشترت سلمي دميه وحلوي

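
The two conversions named above can be sketched in Python 3 as a translation table. The code points are the standard Unicode ones for these letters; the names `SPELL_TABLE_SKETCH` and `normalize_spellerrors_sketch` are hypothetical, and the module may handle further cases that this page does not list.

```python
# The two conversions named in the description.
SPELL_TABLE_SKETCH = str.maketrans({
    "\u0629": "\u0647",  # TEH_MARBUTA  -> HEH
    "\u0649": "\u064A",  # ALEF_MAKSURA -> YEH
})


def normalize_spellerrors_sketch(text):
    """Fold TEH_MARBUTA into HEH and ALEF_MAKSURA into YEH."""
    return text.translate(SPELL_TABLE_SKETCH)


if __name__ == "__main__":
    # Same input as the documented example.
    print(normalize_spellerrors_sketch("اشترت سلمى دمية وحلوى"))
```
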
normalize_searchtext: Normalize input text and return the resulting text. Normalize a text by:

Example:

    >>> text=u'أستشتري دمـــى آلية لأبنائك قبل الإغلاق'
    >>> normalize_searchtext(text)
    استشتري دمي اليه لابناءك قبل الاغلاق

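
The list of steps is missing from this page. Judging only from the example output, the function appears to combine the individual normalizations: stripping vowel marks and tatweel, expanding Lam Alef ligatures, unifying hamza forms, and folding TEH_MARBUTA and ALEF_MAKSURA. The Python 3 sketch below applies all of these in one pass through a single translation table. The choice of steps, the code point ranges, and the names `SEARCH_TABLE_SKETCH` and `normalize_searchtext_sketch` are assumptions.

```python
LAM, ALEF, HAMZA = "\u0644", "\u0627", "\u0621"
HEH, YEH = "\u0647", "\u064A"

SEARCH_TABLE_SKETCH = {}

# 1. Drop harakat (U+064B..U+0652) and tatweel (U+0640).
for code in list(range(0x064B, 0x0653)) + [0x0640]:
    SEARCH_TABLE_SKETCH[code] = None

# 2. Expand Lam Alef presentation forms (U+FEF5..U+FEFC) to LAM + ALEF.
for code in range(0xFEF5, 0xFEFD):
    SEARCH_TABLE_SKETCH[code] = LAM + ALEF

# 3. Unify alef forms with madda or hamza, and the hamza carriers.
for ch in "\u0622\u0623\u0625":
    SEARCH_TABLE_SKETCH[ord(ch)] = ALEF
for ch in "\u0624\u0626":
    SEARCH_TABLE_SKETCH[ord(ch)] = HAMZA

# 4. Fold TEH_MARBUTA into HEH and ALEF_MAKSURA into YEH.
SEARCH_TABLE_SKETCH[0x0629] = HEH
SEARCH_TABLE_SKETCH[0x0649] = YEH


def normalize_searchtext_sketch(text):
    """Apply every assumed normalization step in a single pass."""
    return text.translate(SEARCH_TABLE_SKETCH)


if __name__ == "__main__":
    # Same input as the documented example.
    print(normalize_searchtext_sketch("أستشتري دمـــى آلية لأبنائك قبل الإغلاق"))
```

Because none of the replacement letters is itself a key in the table, a single `translate` pass gives the same result as applying the steps one after another.
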
Generated by Epydoc 3.0.1 on Mon Mar 01 01:33:45 2010 (http://epydoc.sourceforge.net)