Class DutchWordTokenizer

java.lang.Object
org.languagetool.tokenizers.WordTokenizer
org.languagetool.tokenizers.nl.DutchWordTokenizer
All Implemented Interfaces:
Tokenizer

public class DutchWordTokenizer extends WordTokenizer
  • Field Details

    • QUOTES

      private static final List<String> QUOTES
    • nlTokenizingChars

      private final String nlTokenizingChars
  • Constructor Details

    • DutchWordTokenizer

      public DutchWordTokenizer()
  • Method Details

    • tokenize

      public List<String> tokenize(String text)
      Tokenizes like WordTokenizer, except that words such as "oma's", which contain an apostrophe in the middle, are kept as a single token.
      Specified by:
      tokenize in interface Tokenizer
      Overrides:
      tokenize in class WordTokenizer
      Parameters:
      text - Text to tokenize
      Returns:
      List of tokens
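
      The behavior described above can be illustrated with a minimal, self-contained sketch. It is not the LanguageTool implementation: the class name ApostropheTokenizerSketch, the delimiter set DELIMS, and the quote-peeling loop are all assumptions made for illustration. The idea is to leave the apostrophe out of the delimiter set so that "oma's" survives as one token, then split off apostrophes that sit at the start or end of a token, since those are more likely quotation marks.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

public class ApostropheTokenizerSketch {

    // Hypothetical delimiter set. The apostrophe is deliberately absent,
    // so a word-internal apostrophe does not split the word.
    private static final String DELIMS = " \t\n.,;:!?()\"";
    private static final String QUOTE = "'";

    public static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        // returnDelims = true: delimiters are returned as tokens too.
        StringTokenizer st = new StringTokenizer(text, DELIMS, true);
        while (st.hasMoreTokens()) {
            String token = st.nextToken();

            // Emit leading apostrophes as separate tokens.
            int start = 0;
            while (token.length() - start > 1 && token.startsWith(QUOTE, start)) {
                tokens.add(QUOTE);
                start++;
            }

            // Count trailing apostrophes; they are emitted after the word.
            int end = token.length();
            int trailing = 0;
            while (end - start > 1 && token.charAt(end - 1) == '\'') {
                end--;
                trailing++;
            }

            tokens.add(token.substring(start, end));
            for (int i = 0; i < trailing; i++) {
                tokens.add(QUOTE);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("Dit is oma's 'fiets'."));
    }
}
```

      In this sketch, "oma's" stays intact, while the quoted word 'fiets' is split into an opening apostrophe, the word, and a closing apostrophe. The real class may treat cases such as a leading "'s" differently.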
    • startsWithQuote

      private boolean startsWithQuote(String token)
    • endsWithQuote

      private boolean endsWithQuote(String token)
    • getTokenizingCharacters

      public String getTokenizingCharacters()
      Overrides:
      getTokenizingCharacters in class WordTokenizer
      Returns:
      A string containing the characters at which the tokenizer splits text into words.