Class PortugueseWordTokenizer

java.lang.Object
org.languagetool.tokenizers.WordTokenizer
org.languagetool.tokenizers.pt.PortugueseWordTokenizer
All Implemented Interfaces:
Tokenizer

public class PortugueseWordTokenizer extends WordTokenizer
Tokenizes a sentence into words. Punctuation and whitespace each get their own token.
Since:
3.6
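
The behaviour described above, where punctuation and whitespace become separate tokens, can be illustrated with a minimal, self-contained sketch. It does not use this class or any LanguageTool code: the class name `TokenSketch` and the delimiter set `DELIMS` are assumptions chosen for illustration, and the real tokenizer's delimiter set is not shown on this page.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

// Illustrative sketch only; not the LanguageTool implementation.
final class TokenSketch {
    // Hypothetical delimiter set, assumed for this example.
    private static final String DELIMS = " \t\n.,;:!?()\"";

    static List<String> tokenize(String sentence) {
        List<String> tokens = new ArrayList<>();
        // returnDelims = true: every delimiter character is returned as its own token.
        StringTokenizer st = new StringTokenizer(sentence, DELIMS, true);
        while (st.hasMoreTokens()) {
            tokens.add(st.nextToken());
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Prints one token per line, wrapped in brackets so whitespace tokens are visible.
        for (String token : tokenize("Bom dia, mundo!")) {
            System.out.println("[" + token + "]");
        }
    }
}
```

With this sketch, the comma, the spaces, and the exclamation mark each come back as single-character tokens between the words.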
  • Field Details

    • tagger

      private final PortugueseTagger tagger
    • DECIMAL_COMMA_SUBST

      private static final char DECIMAL_COMMA_SUBST
    • NON_BREAKING_SPACE_SUBST

      private static final char NON_BREAKING_SPACE_SUBST
    • NON_BREAKING_DOT_SUBST

      private static final char NON_BREAKING_DOT_SUBST
    • NON_BREAKING_COLON_SUBST

      private static final char NON_BREAKING_COLON_SUBST
    • HYPHEN_SUBST_TEXT

      private static final String HYPHEN_SUBST_TEXT
    • HYPHEN_SUBST

      private static final Pattern HYPHEN_SUBST
    • DECIMAL_COMMA_PATTERN

      private static final Pattern DECIMAL_COMMA_PATTERN
    • DECIMAL_COMMA_REPL

      private static final String DECIMAL_COMMA_REPL
    • DECIMAL_SPACE_PATTERN

      private static final Pattern DECIMAL_SPACE_PATTERN
    • DOTTED_NUMBERS_PATTERN

      private static final Pattern DOTTED_NUMBERS_PATTERN
    • DOTTED_NUMBERS_REPL

      private static final String DOTTED_NUMBERS_REPL
    • COLON_NUMBERS_PATTERN

      private static final Pattern COLON_NUMBERS_PATTERN
    • COLON_NUMBERS_REPL

      private static final String COLON_NUMBERS_REPL
    • DATE_PATTERN

      private static final Pattern DATE_PATTERN
    • DATE_PATTERN_REPL

      private static final String DATE_PATTERN_REPL
    • DOTTED_ORDINALS_PATTERN

      private static final Pattern DOTTED_ORDINALS_PATTERN
    • DOTTED_ORDINALS_REPL

      private static final String DOTTED_ORDINALS_REPL
    • HYPHEN_PATTERN

      private static final Pattern HYPHEN_PATTERN
    • HYPHEN_REPL

      private static final String HYPHEN_REPL
    • NEARBY_HYPHENS_PATTERN

      private static final Pattern NEARBY_HYPHENS_PATTERN
    • NEARBY_HYPHENS_REPL

      private static final String NEARBY_HYPHENS_REPL
    • wordChars

      private final String wordChars
    • wordCharsLeftEdge

      private final String wordCharsLeftEdge
    • wordCharsRightEdge

      private final String wordCharsRightEdge
    • wordPattern

      private final Pattern wordPattern
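
The `*_SUBST`, `*_PATTERN`, and `*_REPL` field names above suggest a protect-and-restore technique: characters that should not split a token (for example the comma in a decimal number) are temporarily swapped for a placeholder, the text is split, and the placeholder is swapped back. The field values are not shown on this page, so the sketch below is an assumption about the general approach, not the actual implementation. The class name `DecimalCommaSketch`, the placeholder character, the regular expression, and the delimiter set are all hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;
import java.util.regex.Pattern;

// Illustrative sketch only; all constants here are assumed, not taken from LanguageTool.
final class DecimalCommaSketch {
    // Hypothetical placeholder: a private-use code point unlikely to occur in real text.
    private static final char COMMA_SUBST = '\uE001';
    // Hypothetical pattern: a comma with a digit directly on both sides.
    private static final Pattern DECIMAL_COMMA = Pattern.compile("(?<=\\d),(?=\\d)");
    // Hypothetical delimiter set.
    private static final String DELIMS = " \t\n.,;:!?";

    static List<String> tokenize(String text) {
        // Step 1: protect decimal commas so the splitter does not see them.
        String shielded = DECIMAL_COMMA.matcher(text).replaceAll(String.valueOf(COMMA_SUBST));
        List<String> tokens = new ArrayList<>();
        // Step 2: split, returning each delimiter as its own token.
        StringTokenizer st = new StringTokenizer(shielded, DELIMS, true);
        while (st.hasMoreTokens()) {
            // Step 3: restore the original comma inside each token.
            tokens.add(st.nextToken().replace(COMMA_SUBST, ','));
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Prints one token per line, wrapped in brackets so whitespace tokens are visible.
        for (String token : tokenize("Custa 3,50 euros, certo?")) {
            System.out.println("[" + token + "]");
        }
    }
}
```

In this sketch the comma in `3,50` stays inside the number token, while the comma after `euros` is still emitted as a separate token.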
  • Constructor Details

    • PortugueseWordTokenizer

      public PortugueseWordTokenizer()
  • Method Details