Interface Tokenizer

All Known Subinterfaces:
CompoundWordTokenizer, SentenceTokenizer
All Known Implementing Classes:
ArabicWordTokenizer, BelarusianWordTokenizer, BretonWordTokenizer, CatalanWordTokenizer, ChineseSentenceTokenizer, ChineseWordTokenizer, CrimeanTatarWordTokenizer, DutchWordTokenizer, EnglishWordTokenizer, EsperantoWordTokenizer, FrenchWordTokenizer, GalicianWordTokenizer, GermanCompoundTokenizer, GermanWordTokenizer, GoogleStyleWordTokenizer, GreekWordTokenizer, JapaneseWordTokenizer, KhmerWordTokenizer, PersianWordTokenizer, PolishWordTokenizer, PortugueseWordTokenizer, RomanianWordTokenizer, RussianWordTokenizer, SimpleSentenceTokenizer, SpanishWordTokenizer, SRXSentenceTokenizer, TagalogWordTokenizer, UkrainianWordTokenizer, WordTokenizer

public interface Tokenizer
Interface for classes that tokenize text into smaller units, such as sentences or words.
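
As a rough illustration of what implementing this interface might look like, the sketch below declares a local stand-in interface and a simple whitespace-based implementation. The method name `tokenize`, its `List<String> tokenize(String)` signature, and the classes `TokenizerSketch` and `WhitespaceTokenizer` are assumptions made for this example; they are not taken from this page, and the real implementing classes listed above may behave differently (for instance, by keeping whitespace or punctuation as separate tokens).

```java
import java.util.ArrayList;
import java.util.List;

public class TokenizerSketch {

    // Hypothetical stand-in for the documented interface.
    // The method name and signature are assumptions for illustration.
    interface Tokenizer {
        List<String> tokenize(String text);
    }

    // Illustrative implementation: splits on runs of whitespace and
    // drops the whitespace itself. Not one of the library's classes.
    static class WhitespaceTokenizer implements Tokenizer {
        @Override
        public List<String> tokenize(String text) {
            List<String> tokens = new ArrayList<>();
            for (String part : text.trim().split("\\s+")) {
                if (!part.isEmpty()) {
                    tokens.add(part);
                }
            }
            return tokens;
        }
    }

    public static void main(String[] args) {
        Tokenizer tokenizer = new WhitespaceTokenizer();
        // prints [Hello, tokenizer, world]
        System.out.println(tokenizer.tokenize("Hello  tokenizer world"));
    }
}
```

Coding against the interface type rather than a concrete class keeps calling code independent of the language-specific tokenizer that is plugged in.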
  • Method Summary

    Modifier and Type
    Method
    Description