Class BaseTagger

java.lang.Object
org.languagetool.tagging.BaseTagger
All Implemented Interfaces:
Tagger
Direct Known Subclasses:
ArabicTagger, AsturianTagger, BretonTagger, CatalanTagger, CrimeanTatarTagger, DanishTagger, DutchTagger, EnglishTagger, FrenchTagger, GalicianTagger, GermanTagger, GreekTagger, IrishTagger, ItalianTagger, KhmerTagger, PolishTagger, PortugueseTagger, RomanianTagger, RussianTagger, SlovakTagger, SpanishTagger, SwedishTagger, TagalogTagger, TamilTagger, UkrainianTagger

public abstract class BaseTagger extends Object implements Tagger
Base tagger using Morfologik binary dictionaries.
  • Field Details

    • MANUAL_ADDITIONS_FILE

      private static final String MANUAL_ADDITIONS_FILE
    • CUSTOM_MANUAL_ADDITIONS_FILE

      private static final String CUSTOM_MANUAL_ADDITIONS_FILE
    • MANUAL_REMOVALS_FILE

      private static final String MANUAL_REMOVALS_FILE
    • CUSTOM_MANUAL_REMOVALS_FILE

      private static final String CUSTOM_MANUAL_REMOVALS_FILE
    • wordTagger

      protected final WordTagger wordTagger
    • locale

      protected final Locale locale
    • tagLowercaseWithUppercase

      private final boolean tagLowercaseWithUppercase
    • dictionaryPath

      private final String dictionaryPath
    • dictionary

      private final morfologik.stemming.Dictionary dictionary
  • Constructor Details

    • BaseTagger

      public BaseTagger(String filename, Locale locale)
      Since:
      2.9
    • BaseTagger

      public BaseTagger(String filename, Locale locale, boolean tagLowercaseWithUppercase)
      Since:
      2.9
    • BaseTagger

      public BaseTagger(String filename, Locale locale, boolean tagLowercaseWithUppercase, boolean internTags)
      Parameters:
      internTags - true if string tags should be interned
      Since:
      4.9
  • Method Details

    • getManualAdditionsFileNames

      @NotNull public List<String> getManualAdditionsFileNames()
      Get the filenames for manual additions, e.g., /en/added.txt.
      Since:
      5.0
    • getManualRemovalsFileNames

      @NotNull public List<String> getManualRemovalsFileNames()
      Get the filenames for manual removals, e.g., /en/removed.txt.
      Since:
      5.0
    • getDictionaryPath

      public String getDictionaryPath()
      Since:
      2.9
    • overwriteWithManualTagger

      public boolean overwriteWithManualTagger()
      If true, tags from the binary dictionary (*.dict) will be overwritten by manual tags from the plain text dictionary.
      Since:
      2.9
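
      The effect of this flag can be illustrated with a minimal, self-contained sketch. It is not the BaseTagger implementation: the class TagMergeSketch, the method mergeTags, and the use of plain strings as tags are hypothetical, and the fallback to dictionary tags when no manual tags exist is an assumption.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in, not part of LanguageTool.
final class TagMergeSketch {

  /**
   * Merges tags from a binary dictionary with tags from a manual plain text file.
   * If overwrite is true and manual tags exist, only the manual tags are kept.
   * Otherwise both sources are combined (assumed behavior), manual tags first.
   */
  static List<String> mergeTags(List<String> dictTags, List<String> manualTags, boolean overwrite) {
    List<String> result = new ArrayList<>(manualTags);
    if (overwrite && !result.isEmpty()) {
      return result;
    }
    for (String tag : dictTags) {
      if (!result.contains(tag)) {  // avoid duplicate tags
        result.add(tag);
      }
    }
    return result;
  }
}
```

      With dictionary tags [NN] and manual tags [VB], mergeTags returns [VB] when overwrite is true and [VB, NN] when it is false.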
    • getWordTagger

      protected WordTagger getWordTagger()
    • initWordTagger

      private WordTagger initWordTagger(boolean internTags)
    • getDictionary

      protected morfologik.stemming.Dictionary getDictionary()
    • tag

      public List<AnalyzedTokenReadings> tag(List<String> sentenceTokens) throws IOException
      Description copied from interface: Tagger
      Returns a list of AnalyzedTokenReadings that assigns part-of-speech information to each term in the sentence (possibly more than one tag per term).

      Note that this method takes exactly one sentence. Its implementation may implement special cases for the first word of a sentence, which is usually written with an uppercase letter.

      Specified by:
      tag in interface Tagger
      Parameters:
      sentenceTokens - the text as returned by a WordTokenizer
      Throws:
      IOException
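
      One way a tagger might handle the uppercase first word of a sentence is to look up the lowercase form as well. The sketch below shows that idea under stated assumptions: CaseFallbackSketch, lookup, and the map-based dictionary are hypothetical stand-ins, and BaseTagger's actual lookup logic may differ.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical stand-in, not part of LanguageTool.
final class CaseFallbackSketch {

  /**
   * Looks up the word as written, then also its lowercase form if that differs,
   * so that a sentence-initial "Walks" can receive the tags of "walks".
   */
  static List<String> lookup(Map<String, List<String>> dict, String word, Locale locale) {
    List<String> tags = new ArrayList<>(dict.getOrDefault(word, List.of()));
    String lower = word.toLowerCase(locale);
    if (!lower.equals(word)) {
      for (String tag : dict.getOrDefault(lower, List.of())) {
        if (!tags.contains(tag)) {  // avoid duplicate tags
          tags.add(tag);
        }
      }
    }
    return tags;
  }
}
```

      The locale is passed explicitly because lowercasing is language-dependent (for example, the Turkish dotless i), which is presumably why the class keeps a Locale field.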
    • getAnalyzedTokens

      protected List<AnalyzedToken> getAnalyzedTokens(String word)
    • asAnalyzedTokenList

      protected List<AnalyzedToken> asAnalyzedTokenList(String word, List<morfologik.stemming.WordData> wdList)
    • asAnalyzedTokenListForTaggedWords

      protected List<AnalyzedToken> asAnalyzedTokenListForTaggedWords(String word, List<TaggedWord> taggedWords)
    • asAnalyzedToken

      protected AnalyzedToken asAnalyzedToken(String word, morfologik.stemming.WordData wd)
    • asAnalyzedToken

      private AnalyzedToken asAnalyzedToken(String word, TaggedWord taggedWord)
    • addTokens

      private void addTokens(List<AnalyzedToken> taggedTokens, List<AnalyzedToken> l)
    • createNullToken

      public final AnalyzedTokenReadings createNullToken(String token, int startPos)
      Description copied from interface: Tagger
      Creates the AnalyzedTokenReadings used for whitespace and other non-words. The POS tag of this token is null.
      Specified by:
      createNullToken in interface Tagger
    • createToken

      public AnalyzedToken createToken(String token, String posTag)
      Description copied from interface: Tagger
      Create a token specific to the language of the implementing class.
      Specified by:
      createToken in interface Tagger
    • additionalTags

      @Nullable protected List<AnalyzedToken> additionalTags(String word, WordTagger wordTagger)
      Allows additional tagging in some language-dependent circumstances.
      Parameters:
      word - The word to tag
      Returns:
      a list of analyzed tokens with additional tags, or null
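
      The nullable return value suggests a hook that subclasses override and that the caller must null-check. The sketch below shows that pattern in isolation. HookSketch, SuffixHookSketch, and tagsFor are hypothetical, tags are plain strings, and calling the hook only when the dictionary lookup finds nothing is an assumption rather than documented behavior.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a base tagger with an overridable hook.
class HookSketch {
  private final Map<String, List<String>> dict;

  HookSketch(Map<String, List<String>> dict) {
    this.dict = dict;
  }

  /** Hook for language-specific tags; null means there is nothing to add. */
  protected List<String> additionalTags(String word) {
    return null;
  }

  final List<String> tagsFor(String word) {
    List<String> result = new ArrayList<>(dict.getOrDefault(word, List.of()));
    if (result.isEmpty()) {  // assumed: the hook is a fallback only
      List<String> extra = additionalTags(word);
      if (extra != null) {
        result.addAll(extra);
      }
    }
    return result;
  }
}

// Hypothetical subclass: guesses an adverb tag from the "-ly" suffix.
class SuffixHookSketch extends HookSketch {
  SuffixHookSketch(Map<String, List<String>> dict) {
    super(dict);
  }

  @Override
  protected List<String> additionalTags(String word) {
    return word.endsWith("ly") ? List.of("RB") : null;
  }
}
```

      Returning null rather than an empty list follows the documented contract of additionalTags, so the caller has to guard against it.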