Class DefaultLanguageIdentifier

java.lang.Object
org.languagetool.language.identifier.LanguageIdentifier
org.languagetool.language.identifier.DefaultLanguageIdentifier

public class DefaultLanguageIdentifier extends LanguageIdentifier
Identifies the language of a text. Some languages may never be detected because they are too close to another language. Language variants such as en-US or en-GB are not distinguished; the result for those is en. By default, only the first 1000 characters of a text are considered. Email signatures that start with the delimiter \n-- \n are ignored.
Since:
2.9
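The preprocessing described above (ignore a trailing email signature, consider only the first maxLength characters, report only the base language for variants) can be illustrated with a small sketch. This is not LanguageTool's implementation: the class TextPreparationSketch and its methods prepare and baseLanguage are hypothetical, and the order (strip the signature first, then truncate) is an assumption.

```java
import java.util.Locale;

/**
 * Hypothetical sketch of the preprocessing described above; not LanguageTool's code.
 * Class and method names are illustrative assumptions.
 */
public class TextPreparationSketch {

  // Conventional email signature delimiter: newline, "-- ", newline.
  private static final String SIGNATURE_DELIMITER = "\n-- \n";

  /** Drops everything from the signature delimiter on, then keeps at most maxLength chars. */
  static String prepare(String text, int maxLength) {
    int sigStart = text.indexOf(SIGNATURE_DELIMITER);
    String body = sigStart >= 0 ? text.substring(0, sigStart) : text;
    return body.length() > maxLength ? body.substring(0, maxLength) : body;
  }

  /** Collapses a variant code such as "en-US" to its base language "en". */
  static String baseLanguage(String code) {
    int dash = code.indexOf('-');
    String base = dash >= 0 ? code.substring(0, dash) : code;
    return base.toLowerCase(Locale.ROOT);
  }

  public static void main(String[] args) {
    String mail = "Hello, this is the message body.\n-- \nJane Doe, ACME Corp.";
    System.out.println(prepare(mail, 1000));   // prints "Hello, this is the message body."
    System.out.println(prepare(mail, 5));      // prints "Hello"
    System.out.println(baseLanguage("en-GB")); // prints "en"
  }
}
```

Stripping the signature before truncating means a long signature never uses up part of the character budget that the detector sees.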
  • Field Details

    • logger

      private static final org.slf4j.Logger logger
    • MINIMAL_CONFIDENCE

      private static final double MINIMAL_CONFIDENCE
    • SHORT_ALGO_THRESHOLD

      private static final int SHORT_ALGO_THRESHOLD
    • CONSIDER_ONLY_PREFERRED_THRESHOLD

      private static final int CONSIDER_ONLY_PREFERRED_THRESHOLD
    • ignoreLangCodes

      private static final List<String> ignoreLangCodes
    • externalLangCodes

      private static final List<String> externalLangCodes
    • FASTTEXT_CONFIDENCE_THRESHOLD

      private static final float FASTTEXT_CONFIDENCE_THRESHOLD
    • languageDetector

      private final com.optimaize.langdetect.LanguageDetector languageDetector
    • textObjectFactory

      private final com.optimaize.langdetect.text.TextObjectFactory textObjectFactory
    • fasttextInitCounter

      private final AtomicInteger fasttextInitCounter
    • fastTextDetector

      private FastTextDetector fastTextDetector
    • ngram

      private NGramDetector ngram
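The field names suggest, but this page does not state, that short texts (below SHORT_ALGO_THRESHOLD) are handled by the n-gram detector, longer texts by the fastText detector, and results below a minimal confidence are discarded. The following sketch shows that kind of threshold-based dispatch under those assumptions only. The Detector interface, the class DetectorDispatchSketch, and both threshold values are hypothetical placeholders, not LanguageTool's API or constants.

```java
import java.util.Map;
import java.util.Optional;

/**
 * Hypothetical sketch of threshold-based dispatch between two detectors.
 * The Detector interface and the threshold values are illustrative assumptions,
 * not LanguageTool's actual API or constants.
 */
public class DetectorDispatchSketch {

  /** Illustrative stand-in for a detector returning language code -> confidence. */
  interface Detector {
    Map<String, Double> scores(String text);
  }

  // Placeholder values; the real constants are not given on this page.
  static final int SHORT_TEXT_THRESHOLD = 50;
  static final double MIN_CONFIDENCE = 0.9;

  private final Detector shortTextDetector; // e.g. an n-gram model
  private final Detector longTextDetector;  // e.g. a fastText model

  DetectorDispatchSketch(Detector shortTextDetector, Detector longTextDetector) {
    this.shortTextDetector = shortTextDetector;
    this.longTextDetector = longTextDetector;
  }

  /** Returns the best-scoring language code, or empty if no score reaches MIN_CONFIDENCE. */
  Optional<String> detect(String text) {
    Detector detector = text.length() < SHORT_TEXT_THRESHOLD ? shortTextDetector : longTextDetector;
    return detector.scores(text).entrySet().stream()
        .filter(e -> e.getValue() >= MIN_CONFIDENCE)
        .max(Map.Entry.comparingByValue())
        .map(Map.Entry::getKey);
  }

  public static void main(String[] args) {
    Detector shortDet = text -> Map.of("de", 0.95, "nl", 0.40);
    Detector longDet = text -> Map.of("en", 0.60, "fr", 0.30);
    DetectorDispatchSketch sketch = new DetectorDispatchSketch(shortDet, longDet);
    System.out.println(sketch.detect("Guten Tag"));     // prints "Optional[de]"
    System.out.println(sketch.detect("x".repeat(200))); // prints "Optional.empty"
  }
}
```

Returning an empty result instead of a low-confidence guess lets the caller fall back to a default or preferred language.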
  • Constructor Details

    • DefaultLanguageIdentifier

      DefaultLanguageIdentifier()
    • DefaultLanguageIdentifier

      DefaultLanguageIdentifier(int maxLength)
      Parameters:
      maxLength - the maximum number of characters that will be considered; a lower value can improve performance. Values below 100 are not recommended because they decrease accuracy.
      Throws:
      IllegalArgumentException - if maxLength is less than 10
      Since:
      4.2
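The contract of the two constructors is that maxLength defaults to 1000, values below 10 are rejected, and values from 10 to 99 are accepted but reduce accuracy. The sketch below shows that contract on a hypothetical class, MaxLengthSketch. It is not the real constructor, and the exception message is an assumption.

```java
/**
 * Hypothetical sketch of the maxLength contract documented above;
 * the class name and exception message are illustrative, not LanguageTool's code.
 */
public class MaxLengthSketch {

  static final int DEFAULT_MAX_LENGTH = 1000; // documented default
  static final int MIN_ALLOWED = 10;          // below this the constructor throws
  static final int MIN_RECOMMENDED = 100;     // below this accuracy suffers

  private final int maxLength;

  MaxLengthSketch() {
    this(DEFAULT_MAX_LENGTH);
  }

  MaxLengthSketch(int maxLength) {
    if (maxLength < MIN_ALLOWED) {
      throw new IllegalArgumentException("maxLength must be >= " + MIN_ALLOWED
          + " (values >= " + MIN_RECOMMENDED + " recommended): " + maxLength);
    }
    this.maxLength = maxLength;
  }

  int maxLength() {
    return maxLength;
  }

  public static void main(String[] args) {
    System.out.println(new MaxLengthSketch().maxLength());   // prints "1000"
    System.out.println(new MaxLengthSketch(50).maxLength()); // prints "50": accepted, not recommended
    try {
      new MaxLengthSketch(5);
    } catch (IllegalArgumentException e) {
      System.out.println("rejected: " + e.getMessage());
    }
  }
}
```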
  • Method Details