Class LuceneSingleIndexLanguageModel

java.lang.Object
org.languagetool.languagemodel.BaseLanguageModel
org.languagetool.languagemodel.LuceneSingleIndexLanguageModel
All Implemented Interfaces:
AutoCloseable, LanguageModel

public class LuceneSingleIndexLanguageModel extends BaseLanguageModel
Information about ngram occurrences, taken from Lucene indexes (one index per ngram level). This is not a real language model: it only returns occurrence counts and performs no probability calculation, in particular not for ngrams with 0 occurrences.
Since:
3.2
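
Because the class reports raw counts only, any probability has to be derived by the caller. The following is a minimal sketch of one possible approach, add-one (Laplace) smoothing, which keeps the probability of an unseen token above zero. The CountSource interface, the MapCountSource class, and the vocabulary size are hypothetical stand-ins for illustration only; they are not part of LanguageTool and merely mirror the getCount(String) and getTotalTokenCount() signatures of this class.

```java
import java.util.HashMap;
import java.util.Map;

public class SmoothedUnigramSketch {

  /** Hypothetical stand-in mirroring two method signatures of the documented class. */
  interface CountSource {
    long getCount(String token1);
    long getTotalTokenCount();
  }

  /** In-memory counts, used only for this illustration. */
  static class MapCountSource implements CountSource {
    private final Map<String, Long> counts = new HashMap<>();
    private long total = 0;

    void add(String token, long count) {
      counts.merge(token, count, Long::sum);
      total += count;
    }

    @Override
    public long getCount(String token1) {
      // Unknown tokens have a count of 0
      return counts.getOrDefault(token1, 0L);
    }

    @Override
    public long getTotalTokenCount() {
      return total;
    }
  }

  /** Add-one smoothed unigram probability: (count + 1) / (total + vocabularySize). */
  static double smoothedProbability(CountSource source, String token, long vocabularySize) {
    return (source.getCount(token) + 1.0) / (source.getTotalTokenCount() + vocabularySize);
  }

  public static void main(String[] args) {
    MapCountSource source = new MapCountSource();
    source.add("the", 5);
    source.add("cat", 2);
    long vocabularySize = 3; // assumed: "the", "cat", and one unseen token type
    System.out.println(smoothedProbability(source, "the", vocabularySize)); // (5 + 1) / (7 + 3)
    System.out.println(smoothedProbability(source, "dog", vocabularySize)); // (0 + 1) / (7 + 3)
  }
}
```

With a real index, the same formula could presumably be applied to the values returned by getCount and getTotalTokenCount, but the choice of smoothing method and vocabulary size is left to the caller.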
  • Constructor Details

    • LuceneSingleIndexLanguageModel

      public LuceneSingleIndexLanguageModel(File topIndexDir)
      Parameters:
      topIndexDir - a directory that contains at least a sub directory called 3grams, which is a Lucene index of ngram occurrences as created by org.languagetool.dev.FrequencyIndexCreator.
    • LuceneSingleIndexLanguageModel

      @Experimental public LuceneSingleIndexLanguageModel(int maxNgram)
  • Method Details

    • validateDirectory

      public static void validateDirectory(File topIndexDir)
      Throws a RuntimeException if the given directory does not seem to be a valid ngram top directory with sub directories 1grams etc.
      Since:
      3.0
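
      The kind of layout check described above can be sketched as follows. This is not LanguageTool's implementation: requireSubDirs is a hypothetical helper, and the sub directory names passed to it are assumptions chosen to match the 1grams and 3grams names mentioned on this page.

```java
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;

public class NgramDirSketch {

  /**
   * Hypothetical check, not LanguageTool's implementation: throws a
   * RuntimeException unless every named sub directory exists under topIndexDir.
   */
  static void requireSubDirs(File topIndexDir, String... names) {
    if (!topIndexDir.isDirectory()) {
      throw new RuntimeException("Not a directory: " + topIndexDir);
    }
    for (String name : names) {
      if (!new File(topIndexDir, name).isDirectory()) {
        throw new RuntimeException("Missing sub directory '" + name + "' in " + topIndexDir);
      }
    }
  }

  public static void main(String[] args) throws IOException {
    // Build a throwaway directory that initially lacks the 3grams sub directory
    File top = Files.createTempDirectory("ngram-top").toFile();
    new File(top, "1grams").mkdir();
    try {
      requireSubDirs(top, "1grams", "3grams");
    } catch (RuntimeException e) {
      System.out.println("Rejected: " + e.getMessage());
    }
    // After adding the missing sub directory, the check passes silently
    new File(top, "3grams").mkdir();
    requireSubDirs(top, "1grams", "3grams");
    System.out.println("Accepted");
  }
}
```

      Failing fast with an unchecked exception, as validateDirectory is documented to do, reports a misconfigured index path before any lookup is attempted.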
    • clearCaches

      @Experimental public static void clearCaches()
      Only used internally.
      Since:
      3.2
    • doValidateDirectory

      protected void doValidateDirectory(File topIndexDir)
    • getCount

      public long getCount(List<String> tokens)
      Description copied from class: BaseLanguageModel
      Get the occurrence count for the given token sequence.
      Specified by:
      getCount in class BaseLanguageModel
    • getCount

      public long getCount(String token1)
      Description copied from class: BaseLanguageModel
      Get the occurrence count for the given token.
      Specified by:
      getCount in class BaseLanguageModel
    • getTotalTokenCount

      public long getTotalTokenCount()
      Specified by:
      getTotalTokenCount in class BaseLanguageModel
    • getLuceneSearcher

      protected LuceneSingleIndexLanguageModel.LuceneSearcher getLuceneSearcher(int ngramSize)
    • close

      public void close()
    • toString

      public String toString()
      Overrides:
      toString in class Object