Package org.languagetool.tokenizers.es
Class SpanishWordTokenizer
java.lang.Object
org.languagetool.tokenizers.WordTokenizer
org.languagetool.tokenizers.es.SpanishWordTokenizer
- All Implemented Interfaces:
Tokenizer
Tokenizes a sentence into words. Punctuation and whitespace get their own
tokens.
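The behavior described above can be illustrated with a minimal, self-contained sketch. It does not call the LanguageTool class. `SimpleTokenizerSketch` and its `TOKEN` pattern are hypothetical names introduced here, and the pattern is a simplifying assumption (letters and digits only), not the character set this class actually uses. It only shows the general idea: a run of word characters forms one token, and every other character, including each space, becomes its own token.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SimpleTokenizerSketch {

    // Assumed pattern for illustration: a run of letters/digits,
    // or any single character that is not a letter or digit.
    private static final Pattern TOKEN =
            Pattern.compile("[\\p{L}\\d]+|[^\\p{L}\\d]");

    // Splits the text into tokens; punctuation and whitespace
    // characters are returned as separate one-character tokens.
    public static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(text);
        while (m.find()) {
            tokens.add(m.group());
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Tokens: "Hola", ",", " ", "mundo", "."
        System.out.println(tokenize("Hola, mundo."));
    }
}
```

Concatenating the returned tokens reproduces the input text, because no character is dropped. The real class presumably handles additional cases (the field names below point to decimal separators, ordinals, and soft hyphens), which this sketch leaves out.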
-
Field Summary
Fields
Modifier and Type, Field
private static final Pattern DECIMAL_COMMA
private static final Pattern DECIMAL_POINT
private static final Pattern ORDINAL_POINT
private static final Pattern PATTERN_1
private static final Pattern PATTERN_2
private static final Pattern PATTERN_3
private static final Pattern SOFT_HYPHEN
private static final Pattern tokenizerPattern
private static final String wordCharacters
Fields inherited from class org.languagetool.tokenizers.WordTokenizer
REMOVED_EMOJI
-
Constructor Summary
Constructors
SpanishWordTokenizer()
-
Method Summary
Methods inherited from class org.languagetool.tokenizers.WordTokenizer
getProtocols, getTokenizingCharacters, isCurrencyExpression, isEMail, isUrl, joinEMails, joinEMailsAndUrls, joinUrls, replaceEmojis, restoreEmojis, splitCurrencyExpression
-
Field Details
-
wordCharacters
- See Also:
-
tokenizerPattern
-
DECIMAL_POINT
-
DECIMAL_COMMA
-
ORDINAL_POINT
-
PATTERN_1
-
PATTERN_2
-
PATTERN_3
-
SOFT_HYPHEN
-
-
Constructor Details
-
SpanishWordTokenizer
public SpanishWordTokenizer()
-
-
Method Details
-
tokenize
- Specified by:
tokenize in interface Tokenizer
- Overrides:
tokenize in class WordTokenizer
-
wordsToAdd
-