Class BaseSynthesizer

java.lang.Object
org.languagetool.synthesis.BaseSynthesizer
All Implemented Interfaces:
Synthesizer
Direct Known Subclasses:
ArabicSynthesizer, CatalanSynthesizer, CrimeanTatarSynthesizer, DutchSynthesizer, EnglishSynthesizer, FrenchSynthesizer, GalicianSynthesizer, GermanSynthesizer, GreekSynthesizer, IrishSynthesizer, ItalianSynthesizer, PolishSynthesizer, PortugueseSynthesizer, RomanianSynthesizer, RussianSynthesizer, SlovakSynthesizer, SpanishSynthesizer, SwedishSynthesizer, UkrainianSynthesizer

public class BaseSynthesizer extends Object implements Synthesizer
  • Field Details

    • SPELLNUMBER_TAG

      public final String SPELLNUMBER_TAG
    • SPELLNUMBER_FEMININE_TAG

      public final String SPELLNUMBER_FEMININE_TAG
    • SPELLNUMBER_ROMAN_TAG

      public final String SPELLNUMBER_ROMAN_TAG
    • possibleTags

      protected volatile List<String> possibleTags
    • tagFileName

      private final String tagFileName
    • resourceFileName

      private final String resourceFileName
    • stemmer

      private final morfologik.stemming.IStemmer stemmer
    • manualSynthesizer

      private final ManualSynthesizer manualSynthesizer
    • removalSynthesizer

      private final ManualSynthesizer removalSynthesizer
    • removalSynthesizer2

      private final ManualSynthesizer removalSynthesizer2
    • sorosFileName

      private final String sorosFileName
    • numberSpeller

      private final Soros numberSpeller
    • romanNumberer

      private final Soros romanNumberer
    • dictionary

      private volatile morfologik.stemming.Dictionary dictionary
    • language

      protected Language language
  • Constructor Details

    • BaseSynthesizer

      public BaseSynthesizer(String sorosFileName, String resourceFileName, String tagFileName, Language lang)
      Parameters:
      resourceFileName - The dictionary file name.
      tagFileName - The name of a file containing all possible tags.
    • BaseSynthesizer

      public BaseSynthesizer(String sorosFileName, String resourceFileName, String tagFileName, String langShortCode)
      Parameters:
      resourceFileName - The dictionary file name.
      tagFileName - The name of a file containing all possible tags.
      langShortCode - The language short code used to find the data files.
    • BaseSynthesizer

      public BaseSynthesizer(String resourceFileName, String tagFileName, Language lang)
    • BaseSynthesizer

      public BaseSynthesizer(String resourceFileName, String tagFileName, String langShortCode)
  • Method Details

    • getDictionary

      protected morfologik.stemming.Dictionary getDictionary() throws IOException
      Returns the Dictionary used by this synthesizer. The dictionary file is specified by the resourceFileName constructor parameter.
      Throws:
      IOException - if the dictionary cannot be loaded.
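      The dictionary field is declared volatile, which is consistent with loading the dictionary lazily on first use. The sketch below shows the double-checked locking idiom that such a field typically supports. It is an assumption about the pattern, not LanguageTool's code: LazyResource and IOLoader are hypothetical stand-ins for the morfologik Dictionary and its loading logic.

```java
import java.io.IOException;

// Hypothetical loader that, like dictionary loading, may fail with an IOException.
@FunctionalInterface
interface IOLoader<T> {
  T load() throws IOException;
}

// Sketch of lazy, thread-safe initialization of a volatile field (double-checked locking).
// LazyResource is hypothetical; it is not part of LanguageTool.
class LazyResource<T> {
  private final IOLoader<T> loader;
  private volatile T value;   // volatile publishes the fully constructed object to all threads
  int loadCount = 0;          // counts successful loads, for demonstration only

  LazyResource(IOLoader<T> loader) {
    this.loader = loader;
  }

  T get() throws IOException {
    T result = value;               // one volatile read on the fast path
    if (result == null) {
      synchronized (this) {
        result = value;             // re-check while holding the lock
        if (result == null) {
          result = loader.load();   // may throw; value stays null, so a later call retries
          loadCount++;
          value = result;
        }
      }
    }
    return result;
  }
}
```

      Copying the volatile field into a local variable keeps the common path to a single volatile read; the synchronized block is entered only until the first successful load.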
    • createStemmer

      protected morfologik.stemming.IStemmer createStemmer()
      Creates a new IStemmer based on the configured dictionary. The result must not be shared among threads.
      Since:
      2.3
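      Because a stemmer returned by createStemmer() must not be shared among threads, one option is to give each thread its own instance, for example with ThreadLocal.withInitial. This is a sketch only: ToyStemmer is a hypothetical non-thread-safe class standing in for morfologik's IStemmer, and whether LanguageTool itself uses a ThreadLocal is not stated in this documentation.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for a stemmer with mutable internal state (not thread-safe).
// It only lowercases the word; it does not actually stem.
class ToyStemmer {
  private final List<String> scratch = new ArrayList<>(); // reused between calls

  List<String> lookup(String word) {
    scratch.clear();
    scratch.add(word.toLowerCase(Locale.ROOT));
    return new ArrayList<>(scratch); // copy, so callers never hold the scratch buffer
  }
}

class PerThreadStemmers {
  static final AtomicInteger created = new AtomicInteger();

  // Each thread lazily receives its own instance on first access.
  private static final ThreadLocal<ToyStemmer> STEMMER =
      ThreadLocal.withInitial(PerThreadStemmers::createStemmer);

  // Plays the role of createStemmer(): returns a fresh, unshared instance.
  static ToyStemmer createStemmer() {
    created.incrementAndGet();
    return new ToyStemmer();
  }

  static ToyStemmer current() {
    return STEMMER.get();
  }
}
```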
    • createNumberSpeller

      private Soros createNumberSpeller(String langcode)
    • createRomanNumberer

      private Soros createRomanNumberer()
    • lookup

      protected List<String> lookup(String lemma, String posTag)
      Looks up the inflected forms of a lemma for the given part-of-speech tag.
      Parameters:
      lemma - the lemma to be inflected.
      posTag - the desired part-of-speech tag.
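      A toy model of lookup: forms are stored under a key combining lemma and tag, and the method returns every form stored under that key, or an empty list if there are none. The "lemma|tag" key layout and the in-memory map are assumptions for illustration; the real class presumably consults its morfologik-based stemmer and dictionary instead.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy in-memory synthesis dictionary (hypothetical; not the morfologik format).
class ToySynthesisDictionary {
  private final Map<String, List<String>> forms = new HashMap<>();

  // Assumed key layout: "lemma|tag".
  private static String key(String lemma, String posTag) {
    return lemma + "|" + posTag;
  }

  void add(String lemma, String posTag, String form) {
    forms.computeIfAbsent(key(lemma, posTag), k -> new ArrayList<>()).add(form);
  }

  // Returns all inflected forms of the lemma for exactly this tag, or an empty list.
  List<String> lookup(String lemma, String posTag) {
    List<String> result = forms.get(key(lemma, posTag));
    return result == null ? Collections.emptyList() : Collections.unmodifiableList(result);
  }
}
```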
    • synthesize

      public String[] synthesize(AnalyzedToken token, String posTag) throws IOException
      Gets the inflected forms of the given AnalyzedToken for the given part-of-speech tag.
      Specified by:
      synthesize in interface Synthesizer
      Parameters:
      token - AnalyzedToken to be inflected.
      posTag - The desired part-of-speech tag.
      Returns:
      inflected words, or an empty array if no forms were found
      Throws:
      IOException
    • synthesize

      public String[] synthesize(AnalyzedToken token, String posTag, boolean posTagRegExp) throws IOException
      Description copied from interface: Synthesizer
      Generates the forms of the given token's lemma that have the given POS tag. The POS tag can be specified as a regular expression.
      Specified by:
      synthesize in interface Synthesizer
      Parameters:
      token - the token to be used for synthesis
      posTag - POS tag of the form to be generated
      posTagRegExp - Specifies whether the posTag string is a regular expression.
      Throws:
      IOException
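      When posTagRegExp is true, posTag is a pattern rather than a literal tag. One plausible approach, assumed here rather than taken from the source, is to test the pattern against every tag in the list of possible tags (compare the possibleTags field and initPossibleTags) and collect the forms for each matching tag. RegexSynthesisSketch and its map-based data are hypothetical.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.regex.Pattern;

// Hypothetical sketch of regex-based synthesis over a fixed tag inventory.
class RegexSynthesisSketch {
  private final List<String> possibleTags;                 // every tag the dictionary knows
  private final Map<String, List<String>> formsByLemmaTag; // key: "lemma|tag" (assumed layout)

  RegexSynthesisSketch(List<String> possibleTags, Map<String, List<String>> formsByLemmaTag) {
    this.possibleTags = possibleTags;
    this.formsByLemmaTag = formsByLemmaTag;
  }

  String[] synthesize(String lemma, String posTag, boolean posTagRegExp) {
    Set<String> results = new LinkedHashSet<>(); // keeps first-seen order, drops duplicates
    if (posTagRegExp) {
      Pattern p = Pattern.compile(posTag);
      for (String tag : possibleTags) {
        if (p.matcher(tag).matches()) {          // whole-tag match, not a substring search
          results.addAll(formsByLemmaTag.getOrDefault(lemma + "|" + tag, List.of()));
        }
      }
    } else {
      results.addAll(formsByLemmaTag.getOrDefault(lemma + "|" + posTag, List.of()));
    }
    return results.toArray(new String[0]);       // empty array when nothing matched
  }
}
```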
    • synthesizeForPosTags

      public String[] synthesizeForPosTags(String lemma, Predicate<String> acceptTag) throws IOException
      Synthesizes the forms of the given lemma for all POS tags that satisfy the given predicate.
      Throws:
      IOException
      Since:
      5.3
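      The Predicate<String> parameter lets callers select tags with arbitrary logic instead of a single pattern; a regular expression can still be used by converting it with Pattern.asMatchPredicate() (Java 11 and later). The class below is a self-contained toy: its name, tag inventory, and data layout are hypothetical, not LanguageTool's.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.Predicate;

// Toy model (hypothetical): lemma -> tag -> forms, with tags visited in inventory order.
class PredicateSynthesisSketch {
  private final List<String> possibleTags;
  private final Map<String, Map<String, List<String>>> formsByLemma;

  PredicateSynthesisSketch(List<String> possibleTags, Map<String, Map<String, List<String>>> formsByLemma) {
    this.possibleTags = possibleTags;
    this.formsByLemma = formsByLemma;
  }

  // Collects the forms of `lemma` for every known tag accepted by the predicate.
  String[] synthesizeForPosTags(String lemma, Predicate<String> acceptTag) {
    Map<String, List<String>> byTag = formsByLemma.getOrDefault(lemma, Map.of());
    Set<String> results = new LinkedHashSet<>(); // drops forms shared by several tags
    for (String tag : possibleTags) {
      if (acceptTag.test(tag)) {
        results.addAll(byTag.getOrDefault(tag, List.of()));
      }
    }
    return results.toArray(new String[0]);
  }
}
```

      For example, `t -> t.startsWith("NN")` selects all noun tags in this toy tag inventory, while `Pattern.compile("VB.").asMatchPredicate()` reuses a regular expression.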
    • getPosTagCorrection

      public String getPosTagCorrection(String posTag)
      Description copied from interface: Synthesizer
      Gets a corrected version of the POS tag used for synthesis. This is useful when the tagset defines special disjunctions that need to be converted into regular-expression disjunctions.
      Specified by:
      getPosTagCorrection in interface Synthesizer
      Parameters:
      posTag - original POS tag to correct
      Returns:
      converted POS tag
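      As an illustration only, suppose a tagset wrote alternatives inside a tag with a "+" separator, as in the hypothetical tag "N:sg+pl". A correction step could rewrite each such group into a regex disjunction, "N:(sg|pl)", which the regular-expression variant of synthesize could then match. The "+" and ":" notation is invented for this sketch; real tagsets, and the corrections each language's synthesizer applies, differ.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical tag correction: "+"-separated alternatives become regex disjunctions.
class TagCorrectionSketch {
  // A run of alternatives such as "sg+pl" or "m+f+n" (letters/digits joined by '+').
  private static final Pattern ALTERNATIVES = Pattern.compile("[A-Za-z0-9]+(?:\\+[A-Za-z0-9]+)+");

  static String getPosTagCorrection(String posTag) {
    Matcher m = ALTERNATIVES.matcher(posTag);
    StringBuilder sb = new StringBuilder();
    while (m.find()) {
      String group = "(" + m.group().replace('+', '|') + ")";
      m.appendReplacement(sb, Matcher.quoteReplacement(group)); // copy text before the match, then the group
    }
    m.appendTail(sb); // copy the rest of the tag unchanged
    return sb.toString();
  }
}
```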
    • getStemmer

      public morfologik.stemming.IStemmer getStemmer()
      Returns:
      the stemmer interface to be used.
      Since:
      2.5
    • initPossibleTags

      protected void initPossibleTags() throws IOException
      Throws:
      IOException
    • loadTags

      private List<String> loadTags() throws IOException
      Throws:
      IOException
    • getSpelledNumber

      public String getSpelledNumber(String arabicNumeral)
      Description copied from interface: Synthesizer
      Spells out a number in words.
      Specified by:
      getSpelledNumber in interface Synthesizer
      Parameters:
      arabicNumeral - the number, written in Arabic numerals
      Returns:
      the spelled-out number
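      The numberSpeller field is a Soros instance, so number spelling appears to be delegated to a Soros program (Soros is a small language for number-to-text conversion). Reproducing Soros is out of scope here; the sketch below swaps in a plain English speller for 0 to 999 only to illustrate the contract of Arabic digits in, words out. It is not the Soros-based implementation, and returning null for unsupported input is a choice of this sketch.

```java
// Swapped-in technique: a plain English number speller for 0..999 (not Soros).
class EnglishNumberSpeller {
  private static final String[] ONES = {
    "zero", "one", "two", "three", "four", "five", "six", "seven", "eight", "nine",
    "ten", "eleven", "twelve", "thirteen", "fourteen", "fifteen", "sixteen",
    "seventeen", "eighteen", "nineteen"
  };
  private static final String[] TENS = {
    "", "", "twenty", "thirty", "forty", "fifty", "sixty", "seventy", "eighty", "ninety"
  };

  // Takes a numeral such as "42"; returns null if it is not an integer in 0..999.
  static String spell(String arabicNumeral) {
    int n;
    try {
      n = Integer.parseInt(arabicNumeral.trim());
    } catch (NumberFormatException e) {
      return null;
    }
    if (n < 0 || n > 999) {
      return null;
    }
    return spellBelowThousand(n);
  }

  private static String spellBelowThousand(int n) {
    if (n < 20) {
      return ONES[n];
    }
    if (n < 100) {
      return TENS[n / 10] + (n % 10 == 0 ? "" : "-" + ONES[n % 10]);
    }
    String rest = n % 100 == 0 ? "" : " " + spellBelowThousand(n % 100);
    return ONES[n / 100] + " hundred" + rest;
  }
}
```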
    • getRomanNumber

      public String getRomanNumber(String arabicNumeral)
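      getRomanNumber has no description here; judging by its name, its parameter, and the romanNumberer field (also a Soros instance), it converts an Arabic numeral to a Roman numeral. As a sketch of that conversion only, the standard greedy algorithm is shown in plain Java instead of Soros: repeatedly take the largest value that still fits, with subtractive pairs such as IV and XC listed as their own values. The 1 to 3999 range and the null return are choices of this sketch.

```java
// Swapped-in technique: greedy Arabic-to-Roman conversion (not the Soros-based romanNumberer).
class RomanNumerals {
  private static final int[] VALUES = {1000, 900, 500, 400, 100, 90, 50, 40, 10, 9, 5, 4, 1};
  private static final String[] SYMBOLS = {"M", "CM", "D", "CD", "C", "XC", "L", "XL", "X", "IX", "V", "IV", "I"};

  // Returns the Roman numeral for "1".."3999", or null for anything else.
  static String toRoman(String arabicNumeral) {
    int n;
    try {
      n = Integer.parseInt(arabicNumeral.trim());
    } catch (NumberFormatException e) {
      return null;
    }
    if (n < 1 || n > 3999) {
      return null;
    }
    StringBuilder sb = new StringBuilder();
    for (int i = 0; i < VALUES.length; i++) {
      while (n >= VALUES[i]) {      // take the largest value that still fits
        n -= VALUES[i];
        sb.append(SYMBOLS[i]);
      }
    }
    return sb.toString();
  }
}
```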
    • isException

      protected boolean isException(String w)
    • removeExceptions

      protected String[] removeExceptions(String[] words)
    • getTargetPosTag

      public String getTargetPosTag(List<String> posTags, String targetPosTag)
      Description copied from interface: Synthesizer
      Selects the desired POS tag to synthesize.
      Specified by:
      getTargetPosTag in interface Synthesizer