tesseract 3.04.01
#include <word_unigrams.h>
Public Member Functions
WordUnigrams ()
~WordUnigrams ()
int Cost (const char_32 *str32, LangModel *lang_mod, CharSet *char_set) const
Static Public Member Functions
static WordUnigrams * Create (const string &data_file_path, const string &lang)
Protected Member Functions
int CostInternal (const char *str) const
Definition at line 34 of file word_unigrams.h.
tesseract::WordUnigrams::WordUnigrams ()
Definition at line 32 of file word_unigrams.cpp.
tesseract::WordUnigrams::~WordUnigrams ()
Definition at line 38 of file word_unigrams.cpp.
int tesseract::WordUnigrams::Cost (const char_32 *key_str32, LangModel *lang_mod, CharSet *char_set) const
Split the input into space-separated tokens, strip trailing punctuation from each, determine case properties, call the UTF-8 flavor of the cost function on each word, and aggregate the results into a single mean word cost.
Definition at line 154 of file word_unigrams.cpp.
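The tokenize, strip, and average steps described above can be illustrated with a minimal sketch. It is not the code in word_unigrams.cpp: it works on plain ASCII std::string instead of char_32, splits on any whitespace, skips the case-property step, and takes the per-word cost as a callback. The names StripTrailingPunct and MeanWordCost are hypothetical.

```cpp
#include <cassert>
#include <cctype>
#include <functional>
#include <sstream>
#include <string>

// Hypothetical helper: remove trailing ASCII punctuation from one token.
std::string StripTrailingPunct(std::string word) {
  while (!word.empty() &&
         std::ispunct(static_cast<unsigned char>(word.back()))) {
    word.pop_back();
  }
  return word;
}

// Hypothetical sketch of the aggregation: returns the integer mean of the
// per-word costs, or 0 when no non-empty word remains (an assumed choice).
int MeanWordCost(const std::string& text,
                 const std::function<int(const std::string&)>& word_cost) {
  assert(static_cast<bool>(word_cost));  // a callable must be supplied
  std::istringstream stream(text);
  std::string token;
  long total = 0;
  int count = 0;
  while (stream >> token) {  // whitespace-separated tokens
    const std::string word = StripTrailingPunct(token);
    if (word.empty()) continue;  // token was punctuation only
    total += word_cost(word);
    ++count;
  }
  return count == 0 ? 0 : static_cast<int>(total / count);
}
```

A caller would pass a lookup such as a dictionary cost function for word_cost; the integer division here truncates, which may differ from how the real implementation rounds.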
int tesseract::WordUnigrams::CostInternal (const char *str) const [protected]
Search for UTF-8 string using binary search of sorted words_ array.
Definition at line 249 of file word_unigrams.cpp.
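As an illustration of a binary search over a sorted word array, here is a minimal sketch. The UnigramTable struct, the LookupCost function, and the not_found_cost fallback are assumptions made for the example; they do not reflect the actual members of WordUnigrams beyond the idea of a sorted words_ array with associated costs.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical stand-in for the sorted word table; not Tesseract's layout.
struct UnigramTable {
  std::vector<const char*> words;  // sorted in strcmp (byte-wise) order
  std::vector<int> costs;          // costs[i] belongs to words[i]
  int not_found_cost;              // assumed fallback for unknown words
};

// Binary search for key; returns its cost or the fallback if absent.
int LookupCost(const UnigramTable& table, const char* key) {
  assert(table.words.size() == table.costs.size());
  std::size_t lo = 0;
  std::size_t hi = table.words.size();  // search the half-open range [lo, hi)
  while (lo < hi) {
    const std::size_t mid = lo + (hi - lo) / 2;
    const int cmp = std::strcmp(key, table.words[mid]);
    if (cmp == 0) return table.costs[mid];
    if (cmp < 0) {
      hi = mid;      // key sorts before words[mid]
    } else {
      lo = mid + 1;  // key sorts after words[mid]
    }
  }
  return table.not_found_cost;
}
```

The search is only correct if the array is sorted with the same comparison used for lookup, which is why the word list is required to be in lexicographic order.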
WordUnigrams * tesseract::WordUnigrams::Create (const string &data_file_path, const string &lang) [static]
Load the word list and unigrams from file and create an object. The word list is assumed to be sorted in lexicographic order.
Definition at line 57 of file word_unigrams.cpp.