Class EnglishChunker

java.lang.Object
org.languagetool.chunking.EnglishChunker
All Implemented Interfaces:
Chunker

public class EnglishChunker extends Object implements Chunker
OpenNLP-based chunker. It also runs the OpenNLP tokenizer and POS tagger, then maps their results onto LanguageTool's own tokens (LanguageTool has its own tokenizer) wherever a trivial mapping is possible.
Since:
2.3
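The mapping between two tokenizations mentioned in the description can be illustrated with a minimal sketch. It assumes that "trivially possible" means a foreign token and an own token cover exactly the same character span; that reading is an assumption, not something this page confirms. The `TokenMappingSketch` class, the `Span` type, and `mapTags` are hypothetical names for illustration and are not part of LanguageTool or OpenNLP.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class TokenMappingSketch {

  /** Hypothetical token: character span [start, end) plus an optional chunk tag. */
  public static final class Span {
    public final int start;
    public final int end;
    public final String tag; // null when the token carries no chunk tag

    public Span(int start, int end, String tag) {
      this.start = start;
      this.end = end;
      this.tag = tag;
    }
  }

  /**
   * For each own token, copies the tag of the foreign token that covers
   * exactly the same span. Own tokens without an exact match get null
   * (assumption: only exact matches count as "trivial").
   */
  public static List<String> mapTags(List<Span> ownTokens, List<Span> foreignTokens) {
    List<String> result = new ArrayList<>();
    for (Span own : ownTokens) {
      String tag = null;
      for (Span foreign : foreignTokens) {
        if (foreign.start == own.start && foreign.end == own.end) {
          tag = foreign.tag;
          break;
        }
      }
      result.add(tag);
    }
    return result;
  }

  /** Illustrative tokenizations of "I can't go"; the splits and tags are made up. */
  public static List<String> example() {
    // Own tokenizer (hypothetical): "I", "can't", "go"
    List<Span> own = Arrays.asList(
        new Span(0, 1, null), new Span(2, 7, null), new Span(8, 10, null));
    // Foreign tokenizer (hypothetical): "I", "ca", "n't", "go"
    List<Span> foreign = Arrays.asList(
        new Span(0, 1, "B-NP"), new Span(2, 4, "B-VP"),
        new Span(4, 7, "I-VP"), new Span(8, 10, "I-VP"));
    return mapTags(own, foreign);
  }
}
```

In this sketch, "can't" receives no tag because the other tokenizer split it into two pieces, which is the kind of case a trivial mapping would skip.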
  • Field Details

    • TOKENIZER_MODEL

      private static final String TOKENIZER_MODEL
    • POS_TAGGER_MODEL

      private static final String POS_TAGGER_MODEL
    • CHUNKER_MODEL

      private static final String CHUNKER_MODEL
    • tokenModel

      private static volatile opennlp.tools.tokenize.TokenizerModel tokenModel
      This field is static to save memory. Because Language.LANGUAGES is static, any language created there is never released. English has several variants, so non-static fields would hold one copy of each model (posModel etc.) per variant, which would be a huge waste of memory.
    • posModel

      private static volatile opennlp.tools.postag.POSModel posModel
    • chunkerModel

      private static volatile opennlp.tools.chunker.ChunkerModel chunkerModel
    • chunkFilter

      private final EnglishChunkFilter chunkFilter
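The static volatile model fields above suggest a load-once, share-everywhere pattern. The following is a minimal sketch of one common way to do that, using double-checked locking; it is not taken from the EnglishChunker source. `SharedModelSketch`, `HeavyModel`, and `LOAD_COUNT` are hypothetical stand-ins, and no real OpenNLP model is loaded.

```java
import java.util.concurrent.atomic.AtomicInteger;

public class SharedModelSketch {

  /** Hypothetical stand-in for a large, immutable model object. */
  public static final class HeavyModel {
    public final String name;

    public HeavyModel(String name) {
      this.name = name;
    }
  }

  // Counts how often the (pretend) expensive load actually runs.
  public static final AtomicInteger LOAD_COUNT = new AtomicInteger();

  // Static: one model for all instances. Volatile: safely published across threads.
  private static volatile HeavyModel model;

  public SharedModelSketch() {
    // Double-checked locking: only the first constructor call pays for loading.
    if (model == null) {
      synchronized (SharedModelSketch.class) {
        if (model == null) {
          model = loadModel();
        }
      }
    }
  }

  // Assumption: a real loader would read the model from a resource stream.
  private static HeavyModel loadModel() {
    LOAD_COUNT.incrementAndGet();
    return new HeavyModel("en-chunker");
  }

  public HeavyModel getModel() {
    return model;
  }

  public static void main(String[] args) {
    SharedModelSketch first = new SharedModelSketch();
    SharedModelSketch second = new SharedModelSketch();
    System.out.println(first.getModel() == second.getModel()); // prints "true"
    System.out.println(LOAD_COUNT.get());                      // prints "1"
  }
}
```

With this layout, creating one instance per language variant still keeps a single model in memory, which matches the motivation given for the static fields.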
  • Constructor Details

    • EnglishChunker

      public EnglishChunker()
  • Method Details