Class LanguageIdentifier

java.lang.Object
org.languagetool.language.LanguageIdentifier

public class LanguageIdentifier extends Object
Identifies the language of a text. Note that some languages might never be detected because they are too close to another language. Language variants such as en-US or en-GB are not distinguished; the result for those is en. By default, only the first 1000 characters of a text are considered. Email signatures that use \n-- \n as a delimiter are ignored.
Since:
2.9
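
The input handling described above (signature removal and the 1000-character limit) can be pictured with a minimal sketch. This is an illustration, not LanguageTool's actual implementation: the class name DetectionInputSketch and the method prepare are hypothetical, and both the order of the two steps and the choice of cutting at the first delimiter are assumptions.

```java
// Illustrative sketch only; NOT the library's code. All names here are hypothetical.
final class DetectionInputSketch {

  // Default limit stated in the class description.
  static final int DEFAULT_MAX_LENGTH = 1000;

  // Signature delimiter stated in the class description.
  static final String SIGNATURE_DELIMITER = "\n-- \n";

  /**
   * Drops everything from the first signature delimiter onward,
   * then keeps at most maxLength characters.
   * Assumption: the signature is removed before truncation.
   */
  static String prepare(String text, int maxLength) {
    int signatureStart = text.indexOf(SIGNATURE_DELIMITER);
    String body = signatureStart >= 0 ? text.substring(0, signatureStart) : text;
    return body.length() > maxLength ? body.substring(0, maxLength) : body;
  }

  public static void main(String[] args) {
    String mail = "Hello, how are you?\n-- \nJohn Doe\nExample Corp";
    // Only the message body reaches the (hypothetical) detection step.
    System.out.println(prepare(mail, DEFAULT_MAX_LENGTH));
  }
}
```

Under these assumptions, the text after the delimiter never influences the detected language, which is the point of ignoring signatures.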
  • Constructor Details

    • LanguageIdentifier

      public LanguageIdentifier()
    • LanguageIdentifier

      public LanguageIdentifier(int maxLength)
      Parameters:
      maxLength - the maximum number of characters that will be considered; lower values can help with performance. Avoid values below 100, as this would decrease accuracy.
      Throws:
      IllegalArgumentException - if maxLength is less than 10
      Since:
      4.2
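
The maxLength contract has two thresholds: values below 10 are rejected with an exception, while values below 100 are accepted but discouraged. A minimal sketch of that contract follows. The class MaxLengthSketch and its methods are hypothetical and only mirror the documented behavior; they are not the library's code.

```java
// Illustrative sketch only; names are hypothetical, thresholds are taken from the documentation.
final class MaxLengthSketch {

  // Below this value the documented constructor throws IllegalArgumentException.
  static final int HARD_MINIMUM = 10;

  // Below this value the documentation warns about reduced accuracy.
  static final int RECOMMENDED_MINIMUM = 100;

  /** Returns maxLength unchanged if it is acceptable, otherwise throws. */
  static int validate(int maxLength) {
    if (maxLength < HARD_MINIMUM) {
      throw new IllegalArgumentException(
          "maxLength must be >= " + HARD_MINIMUM + ": " + maxLength);
    }
    return maxLength;
  }

  /** True if the value is accepted but lower than the documentation recommends. */
  static boolean isBelowRecommended(int maxLength) {
    return maxLength < RECOMMENDED_MINIMUM;
  }

  public static void main(String[] args) {
    int maxLength = validate(50);
    if (isBelowRecommended(maxLength)) {
      System.out.println("accepted, but accuracy may suffer: " + maxLength);
    }
  }
}
```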
  • Method Details

    • enableFasttext

      public void enableFasttext(File fasttextBinary, File fasttextModel)
    • detectLanguage

      @Nullable public Language detectLanguage(String text)
      Returns:
      language or null if language could not be identified
    • detectLanguage

      @Nullable public DetectedLanguage detectLanguage(String text, List<String> noopLangsTmp, List<String> preferredLangsTmp)
      Parameters:
      noopLangsTmp - list of language codes that are detected but mapped to the NoopLanguage, which has no rules
      Returns:
      language or null if language could not be identified
      Since:
      4.4 (new parameter noopLangs, changed return type to DetectedLanguage)
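
Both detectLanguage overloads return null when no language could be identified, so callers need to handle that case explicitly. The sketch below shows one possible pattern using Optional. To stay self-contained it does not call LanguageTool: the detector parameter is a hypothetical stand-in for a call such as detectLanguage(text), and it returns a plain language code string rather than a Language or DetectedLanguage object.

```java
import java.util.Optional;
import java.util.function.Function;

// Illustrative sketch only; the detector below is a toy stand-in, not the library.
final class NullableResultSketch {

  /**
   * Turns a nullable detection result into a message.
   * detector: hypothetical stand-in that returns a language code, or null if undetected.
   */
  static String describe(Function<String, String> detector, String text) {
    return Optional.ofNullable(detector.apply(text))
        .map(code -> "detected: " + code)
        .orElse("language could not be identified");
  }

  public static void main(String[] args) {
    // Toy detector for demonstration: recognizes one English word, otherwise gives up.
    Function<String, String> toyDetector = t -> t.contains("the") ? "en" : null;
    System.out.println(describe(toyDetector, "the quick brown fox"));
    System.out.println(describe(toyDetector, "12345"));
  }
}
```

Wrapping the result in Optional keeps the null check in one place; a plain null comparison works equally well.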