Package org.languagetool.tokenizers.eo
Class EsperantoWordTokenizer
java.lang.Object
org.languagetool.tokenizers.WordTokenizer
org.languagetool.tokenizers.eo.EsperantoWordTokenizer
All Implemented Interfaces: Tokenizer
Field Summary
Fields inherited from class org.languagetool.tokenizers.WordTokenizer
REMOVED_EMOJI
Constructor Summary
Constructors
EsperantoWordTokenizer()
Method Summary
Methods inherited from class org.languagetool.tokenizers.WordTokenizer
getProtocols, getTokenizingCharacters, isCurrencyExpression, isEMail, isUrl, joinEMails, joinEMailsAndUrls, joinUrls, replaceEmojis, restoreEmojis, splitCurrencyExpression
Field Details
PATTERN_1
PATTERN_2
Constructor Details
EsperantoWordTokenizer
public EsperantoWordTokenizer()
Method Details
tokenize
Tokenizes just like WordTokenizer, except that words such as "dank'" keep their apostrophe as part of the token.
Specified by: tokenize in interface Tokenizer
Overrides: tokenize in class WordTokenizer
Parameters: text - Text to tokenize
Returns: List of tokens. Note: a special string EO@APOS is used to replace the apostrophe during tokenizing.
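The placeholder idea in the note above can be illustrated with a minimal, self-contained sketch. It does not reproduce the class's actual implementation: the class name ApostropheSketch, the delimiter set, and the regular expression are assumptions made for illustration, and only the EO@APOS marker name comes from this page. The sketch hides a word-final apostrophe behind the placeholder, splits the text on delimiters, and then restores the apostrophe in each token.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

// Hypothetical illustration class, not part of LanguageTool.
final class ApostropheSketch {
    // Assumption: placeholder modeled on the EO@APOS marker named in the note above.
    private static final String PLACEHOLDER = "EO@APOS";
    // Assumption: a small delimiter set chosen for this sketch only.
    private static final String DELIMITERS = " \t\n'.,;:!?";

    static List<String> tokenize(String text) {
        // Hide an apostrophe that directly follows a letter and is not followed by one.
        String hidden = text.replaceAll("(?<=\\p{L})'(?!\\p{L})", PLACEHOLDER);
        List<String> tokens = new ArrayList<>();
        // returnDelims = true keeps whitespace and punctuation as separate tokens.
        StringTokenizer st = new StringTokenizer(hidden, DELIMITERS, true);
        while (st.hasMoreTokens()) {
            // Put the real apostrophe back into each finished token.
            tokens.add(st.nextToken().replace(PLACEHOLDER, "'"));
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("dank' al vi"));
    }
}
```

In this sketch an apostrophe followed by a letter, as in "l'amiko", is still split off as its own token, and input that already contains the placeholder text would be altered, so a real tokenizer would need a marker that cannot occur in normal text.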