Package org.languagetool.tokenizers.nl
Class DutchWordTokenizer
java.lang.Object
org.languagetool.tokenizers.WordTokenizer
org.languagetool.tokenizers.nl.DutchWordTokenizer
- All Implemented Interfaces:
Tokenizer
Field Summary
Fields
QUOTES, nlTokenizingChars
Fields inherited from class org.languagetool.tokenizers.WordTokenizer
REMOVED_EMOJI
Constructor Summary
Constructors
DutchWordTokenizer()
Method Summary
Modifier and Type    Method    Description
private boolean    endsWithQuote(String token)
private boolean    startsWithQuote(String token)
List<String>    tokenize(String text)    Tokenizes just like WordTokenizer, except for words such as "oma's" that contain an apostrophe in the middle.
Methods inherited from class org.languagetool.tokenizers.WordTokenizer
getProtocols, isCurrencyExpression, isEMail, isUrl, joinEMails, joinEMailsAndUrls, joinUrls, replaceEmojis, restoreEmojis, splitCurrencyExpression
Field Details
QUOTES
nlTokenizingChars
Constructor Details
DutchWordTokenizer
public DutchWordTokenizer()
Method Details
tokenize
Tokenizes just like WordTokenizer, except for words such as "oma's" that contain an apostrophe in the middle.
- Specified by:
tokenize in interface Tokenizer
- Overrides:
tokenize in class WordTokenizer
- Parameters:
text - Text to tokenize
- Returns:
List of tokens
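The behaviour described above can be illustrated with a small, self-contained sketch. It is not the DutchWordTokenizer source: the class name, the delimiter string, and the set of quote characters are all assumptions made for the example. The idea shown is one plausible approach, namely leaving the apostrophe out of the delimiter set so that "oma's" survives as one token, then peeling quote characters off the edges of each token with helpers named after the documented startsWithQuote and endsWithQuote.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

// Illustrative sketch only; NOT the actual DutchWordTokenizer implementation.
final class ApostropheTokenizerSketch {

  // Assumed quote characters: ASCII apostrophe plus typographic single quotes.
  private static final String QUOTE_CHARS = "'\u2018\u2019";

  // Assumed delimiter set; the apostrophe is deliberately absent.
  private static final String DELIMITERS = " \t\n.,;:!?()\"";

  public static List<String> tokenize(String text) {
    List<String> tokens = new ArrayList<>();
    // returnDelims = true keeps whitespace and punctuation as tokens of their own.
    StringTokenizer st = new StringTokenizer(text, DELIMITERS, true);
    while (st.hasMoreTokens()) {
      String token = st.nextToken();
      List<String> trailing = new ArrayList<>();
      // Split quote characters off the start of the token.
      while (token.length() > 1 && startsWithQuote(token)) {
        tokens.add(token.substring(0, 1));
        token = token.substring(1);
      }
      // Split quote characters off the end, preserving their order.
      while (token.length() > 1 && endsWithQuote(token)) {
        trailing.add(0, token.substring(token.length() - 1));
        token = token.substring(0, token.length() - 1);
      }
      tokens.add(token);
      tokens.addAll(trailing);
    }
    return tokens;
  }

  private static boolean startsWithQuote(String token) {
    return !token.isEmpty() && QUOTE_CHARS.indexOf(token.charAt(0)) >= 0;
  }

  private static boolean endsWithQuote(String token) {
    return !token.isEmpty()
        && QUOTE_CHARS.indexOf(token.charAt(token.length() - 1)) >= 0;
  }

  public static void main(String[] args) {
    System.out.println(tokenize("oma's huis"));  // [oma's,  , huis]
    System.out.println(tokenize("'oma's'"));     // [', oma's, ']
  }
}
```

In this sketch an apostrophe inside a word is never a split point, while quotes wrapped around a word become separate tokens. Dutch forms that legitimately begin with an apostrophe (such as "'s") would be split by this simplified version.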
startsWithQuote
private boolean startsWithQuote(String token)
endsWithQuote
private boolean endsWithQuote(String token)
getTokenizingCharacters
- Overrides:
getTokenizingCharacters in class WordTokenizer
- Returns:
The string containing the characters used by the tokenizer to tokenize words.
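To show why overriding this method changes tokenization, here is a hedged sketch of the pattern. Both class names and the delimiter strings are hypothetical; the only assumption carried over from the page is that a tokenizer splits on the characters this method returns. A subclass that drops the apostrophe from the inherited string stops splitting words at that character.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

// Hypothetical base class; stands in for a generic word tokenizer.
class BaseTokenizerSketch {

  // Assumed delimiter characters, chosen only for this example.
  public String getTokenizingCharacters() {
    return " .,;:!?'\"";
  }

  public List<String> tokenize(String text) {
    List<String> tokens = new ArrayList<>();
    // Split on whatever the (possibly overridden) method returns,
    // keeping the delimiters themselves as tokens.
    StringTokenizer st = new StringTokenizer(text, getTokenizingCharacters(), true);
    while (st.hasMoreTokens()) {
      tokens.add(st.nextToken());
    }
    return tokens;
  }
}

// Hypothetical subclass that no longer treats the apostrophe as a delimiter.
class ApostropheFriendlyTokenizerSketch extends BaseTokenizerSketch {

  @Override
  public String getTokenizingCharacters() {
    return super.getTokenizingCharacters().replace("'", "");
  }
}
```

With these definitions, `new BaseTokenizerSketch().tokenize("oma's")` yields `[oma, ', s]`, while `new ApostropheFriendlyTokenizerSketch().tokenize("oma's")` yields `[oma's]`.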