Package org.languagetool.tokenizers
Interface Tokenizer
- All Known Subinterfaces:
CompoundWordTokenizer, SentenceTokenizer
- All Known Implementing Classes:
ArabicWordTokenizer, BelarusianWordTokenizer, BretonWordTokenizer, CatalanWordTokenizer, ChineseSentenceTokenizer, ChineseWordTokenizer, CrimeanTatarWordTokenizer, DutchWordTokenizer, EnglishWordTokenizer, EsperantoWordTokenizer, FrenchWordTokenizer, GalicianWordTokenizer, GermanCompoundTokenizer, GermanWordTokenizer, GoogleStyleWordTokenizer, GreekWordTokenizer, JapaneseWordTokenizer, KhmerWordTokenizer, PersianWordTokenizer, PolishWordTokenizer, PortugueseWordTokenizer, RomanianWordTokenizer, RussianWordTokenizer, SimpleSentenceTokenizer, SpanishWordTokenizer, SRXSentenceTokenizer, TagalogWordTokenizer, UkrainianWordTokenizer, WordTokenizer
public interface Tokenizer
Interface for classes that tokenize text into smaller units.
Method Summary
Method Details
tokenize
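As an illustration of what an implementing class might look like, here is a minimal sketch. This page does not show the signature of tokenize, so the sketch assumes it takes a String and returns a List<String>. The local Tokenizer interface below is a stand-in declared only so the example compiles without the LanguageTool dependency, and WhitespaceSplitTokenizer is a hypothetical class, not one of the implementations listed above.

```java
import java.util.ArrayList;
import java.util.List;

// Stand-in for org.languagetool.tokenizers.Tokenizer.
// Assumption: the real method has this shape; the signature is not shown on this page.
interface Tokenizer {
  List<String> tokenize(String text);
}

// Hypothetical implementation: splits on whitespace and keeps each whitespace
// character as its own token, so joining the tokens reproduces the input.
class WhitespaceSplitTokenizer implements Tokenizer {
  @Override
  public List<String> tokenize(String text) {
    List<String> tokens = new ArrayList<>();
    StringBuilder current = new StringBuilder();
    for (int i = 0; i < text.length(); i++) {
      char c = text.charAt(i);
      if (Character.isWhitespace(c)) {
        // Flush the word collected so far, then emit the whitespace itself.
        if (current.length() > 0) {
          tokens.add(current.toString());
          current.setLength(0);
        }
        tokens.add(String.valueOf(c));
      } else {
        current.append(c);
      }
    }
    // Flush a trailing word, if any.
    if (current.length() > 0) {
      tokens.add(current.toString());
    }
    return tokens;
  }
}
```

Keeping whitespace as tokens is a design choice of this sketch: it lets a caller map each token back to its position in the original text by summing token lengths.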