Class MultiWordChunker2

java.lang.Object
org.languagetool.tagging.disambiguation.AbstractDisambiguator
org.languagetool.tagging.disambiguation.MultiWordChunker2
All Implemented Interfaces:
Disambiguator

public class MultiWordChunker2 extends AbstractDisambiguator
Multiword tagger-chunker. Note: currently does not support:
  • overlapping tagging (first matching multiword entry wins)
Author:
Andriy Rysin
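The "first matching multiword entry wins" note above can be illustrated with a minimal, self-contained sketch. It does not use LanguageTool: the class FirstMatchSketch, its methods, and the choice to skip past a matched span are assumptions made for illustration, not the actual implementation.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in, not part of LanguageTool.
class FirstMatchSketch {

  /** Returns one tag per token; tokens outside any multiword get null. */
  static List<String> tag(List<String> tokens, List<String[]> entries, List<String> tags) {
    List<String> result = new ArrayList<>();
    for (int n = 0; n < tokens.size(); n++) {
      result.add(null);
    }
    int i = 0;
    while (i < tokens.size()) {
      int matchedLength = 0;
      for (int e = 0; e < entries.size(); e++) {
        String[] words = entries.get(e);
        if (startsAt(tokens, i, words)) {
          for (int k = 0; k < words.length; k++) {
            result.set(i + k, tags.get(e));
          }
          matchedLength = words.length;
          break; // first matching entry wins, later entries are not tried
        }
      }
      // Assumption: continue after the matched span, so overlaps are never tagged.
      i += Math.max(1, matchedLength);
    }
    return result;
  }

  /** True if the entry's words occur in tokens starting at index start. */
  static boolean startsAt(List<String> tokens, int start, String[] words) {
    if (start + words.length > tokens.size()) {
      return false;
    }
    for (int k = 0; k < words.length; k++) {
      if (!tokens.get(start + k).equals(words[k])) {
        return false;
      }
    }
    return true;
  }
}
```

In this sketch, the tokens "New", "York", "City" with entries "New York" (tag A) and "York City" (tag B) yield A, A, null: the overlapping second entry is never applied.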
  • Constructor Details

    • MultiWordChunker2

      public MultiWordChunker2(String filename)
      Parameters:
      filename - text file with multiwords and tags
    • MultiWordChunker2

      public MultiWordChunker2(String filename, boolean allowFirstCapitalized)
      Parameters:
      filename - text file with multiwords and tags
      allowFirstCapitalized - if set to true, the first word of the multiword can be capitalized
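One way to picture allowFirstCapitalized is the small sketch below. It is independent of LanguageTool; the class FirstCapitalizedSketch and the method matchesEntry are hypothetical names, and the exact comparison the real class performs is an assumption.

```java
// Hypothetical stand-in, not part of LanguageTool.
class FirstCapitalizedSketch {

  /**
   * Returns true if text equals the multiword entry, or, when
   * allowFirstCapitalized is set, equals the entry with only its
   * first character upper-cased.
   */
  static boolean matchesEntry(String entry, String text, boolean allowFirstCapitalized) {
    if (entry.equals(text)) {
      return true;
    }
    if (!allowFirstCapitalized || entry.isEmpty()) {
      return false;
    }
    // Only the first character of the first word is changed.
    String capitalized = Character.toUpperCase(entry.charAt(0)) + entry.substring(1);
    return capitalized.equals(text);
  }
}
```

With the real class, the flag is the second constructor argument, as in new MultiWordChunker2(filename, true).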
  • Method Details

    • setRemoveOtherReadings

      public void setRemoveOtherReadings(boolean removeOtherReadings)
      Parameters:
      removeOtherReadings - If true and a multiword matches, other readings will be removed
    • setWrapTag

      public void setWrapTag(boolean wrapTag)
      Parameters:
      wrapTag - If true, the tag will be wrapped with < and >
    • formatPosTag

      protected String formatPosTag(String posTag, int position, int multiwordLength)
      Override this method if you want to format the POS tag differently.
      Parameters:
      posTag - POS tag for the multiword
      position - Position of the token in the multiword
      Returns:
      Formatted POS tag for the multiword
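A possible override might look like the sketch below. To keep it compilable without LanguageTool, it extends a hypothetical stand-in class (ChunkerStandIn) rather than MultiWordChunker2 itself. The default wrapping behaviour, the zero-based position, and the BIO-style prefixes are all assumptions chosen for illustration.

```java
// Hypothetical stand-in for the real class, so the sketch compiles on its own.
class ChunkerStandIn {
  private boolean wrapTag = true;

  void setWrapTag(boolean wrapTag) {
    this.wrapTag = wrapTag;
  }

  // Assumed default: wrap the tag with < and > when wrapTag is set.
  protected String formatPosTag(String posTag, int position, int multiwordLength) {
    return wrapTag ? "<" + posTag + ">" : posTag;
  }
}

// Example override: prefix the tag depending on the token's position
// in the multiword (position assumed to be zero-based).
class BioChunkerStandIn extends ChunkerStandIn {
  @Override
  protected String formatPosTag(String posTag, int position, int multiwordLength) {
    return (position == 0 ? "B-" : "I-") + posTag;
  }
}
```

With the real class, the same pattern would be a subclass of MultiWordChunker2 that overrides formatPosTag with the signature shown above.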
    • disambiguate

      public AnalyzedSentence disambiguate(AnalyzedSentence input)
      Implements multiword POS tags, e.g., <ELLIPSIS> for the start of an ellipsis (...) and </ELLIPSIS> for its end.
      Parameters:
      input - The tokens to be chunked.
      Returns:
      AnalyzedSentence with additional markers.
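The start and end markers described above can be sketched without LanguageTool as follows. The class MarkerSketch, the method mark, and the token/tag string format are hypothetical and only illustrate the idea of marking the first and last token of a multiword; they do not reproduce AnalyzedSentence.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in, not part of LanguageTool.
class MarkerSketch {

  /**
   * Returns a copy of tokens in which the first token of the span
   * [start, start + length - 1] carries an opening marker and the
   * last token carries a closing marker.
   */
  static List<String> mark(List<String> tokens, int start, int length, String tag) {
    List<String> out = new ArrayList<>(tokens);
    int end = start + length - 1;
    if (length < 1 || start < 0 || end >= tokens.size()) {
      return out; // span does not fit, leave tokens unchanged
    }
    out.set(start, out.get(start) + "/<" + tag + ">");
    out.set(end, out.get(end) + "/</" + tag + ">");
    return out;
  }
}
```

For three "." tokens marked as ELLIPSIS from index 0, the first token becomes "./<ELLIPSIS>", the middle one is unchanged, and the last becomes "./</ELLIPSIS>".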
    • matches

      protected boolean matches(String matchText, AnalyzedTokenReadings inputTokens)
    • prepareNewReading

      protected AnalyzedTokenReadings prepareNewReading(String tokens, String tok, AnalyzedTokenReadings token, String tag)