Package org.languagetool.language
Class LanguageIdentifier
java.lang.Object
org.languagetool.language.LanguageIdentifier
Identify the language of a text. Note that some languages might never be
detected because they are too close to another language. Language variants such as
en-US or en-GB are not detected; for those, the result will be
en.
By default, only the first 1000 characters of a text are considered.
Email signatures that use \n-- \n as a delimiter are ignored.
Since:
2.9
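The two preprocessing rules above (ignore an email signature, consider only the first 1000 characters) can be illustrated in plain Java. This is a minimal sketch under stated assumptions, not LanguageTool's implementation: the class `SignatureTruncationSketch` and its `prepare` method are hypothetical, and the order of the two steps (strip the signature first, then truncate) is an assumption.

```java
public class SignatureTruncationSketch {

    // Delimiter named in the class documentation: newline, two dashes, a space, newline.
    private static final String SIGNATURE_DELIMITER = "\n-- \n";

    /**
     * Hypothetical helper: drops everything from the first signature delimiter on,
     * then keeps at most maxLength characters of what remains.
     * Note: a plain substring may split a surrogate pair at the cut point.
     */
    static String prepare(String text, int maxLength) {
        int sigStart = text.indexOf(SIGNATURE_DELIMITER);
        String body = sigStart >= 0 ? text.substring(0, sigStart) : text;
        return body.length() > maxLength ? body.substring(0, maxLength) : body;
    }

    public static void main(String[] args) {
        String mail = "Hello, how are you?\n-- \nJane Doe, Example Corp.";
        System.out.println(prepare(mail, 1000));                       // Hello, how are you?
        System.out.println(prepare("x".repeat(1500), 1000).length());  // 1000
    }
}
```

With the default limit, only the text before the signature, and at most 1000 characters of it, would reach the detector in this sketch.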
Constructor Summary
Constructors
LanguageIdentifier()
LanguageIdentifier(int maxLength)
Method Summary
Modifier and Type | Method
@Nullable Language | detectLanguage(String text)
@Nullable DetectedLanguage | detectLanguage(String text, List<String> noopLangsTmp, List<String> preferredLangsTmp)
void | enableFasttext(File fasttextBinary, File fasttextModel)
Constructor Details
LanguageIdentifier
public LanguageIdentifier()
LanguageIdentifier
public LanguageIdentifier(int maxLength)
Parameters:
maxLength - the maximum number of characters that will be considered; a lower value can improve performance. Don't use values below 100, as this would decrease accuracy.
Throws:
IllegalArgumentException - if maxLength is less than 10
Since:
4.2
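The `maxLength` contract above has two thresholds: values below 10 are rejected with an `IllegalArgumentException`, while values from 10 to 99 are accepted but discouraged because they reduce accuracy. A minimal sketch of that documented contract follows; `MaxLengthCheckSketch`, `validate`, and `isRecommended` are hypothetical names and are not part of LanguageTool.

```java
public class MaxLengthCheckSketch {

    // Hard lower bound from the documentation: anything below this throws.
    static final int MIN_ALLOWED = 10;
    // Soft lower bound from the documentation: smaller values decrease accuracy.
    static final int MIN_RECOMMENDED = 100;

    /** Hypothetical mirror of the documented constructor check. */
    static int validate(int maxLength) {
        if (maxLength < MIN_ALLOWED) {
            throw new IllegalArgumentException(
                "maxLength must be at least " + MIN_ALLOWED + ": " + maxLength);
        }
        return maxLength;
    }

    /** Hypothetical helper: true if the value follows the "don't go below 100" advice. */
    static boolean isRecommended(int maxLength) {
        return maxLength >= MIN_RECOMMENDED;
    }

    public static void main(String[] args) {
        System.out.println(validate(500) + " recommended=" + isRecommended(500)); // 500 recommended=true
        System.out.println(validate(50) + " recommended=" + isRecommended(50));   // 50 recommended=false
        try {
            validate(5);
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage()); // rejected: maxLength must be at least 10: 5
        }
    }
}
```

Keeping the hard check separate from the recommendation reflects the documentation: only values below 10 are errors; values below 100 are merely a poor trade-off.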
Method Details
enableFasttext
public void enableFasttext(File fasttextBinary, File fasttextModel)
detectLanguage
public @Nullable Language detectLanguage(String text)
Returns:
the detected language, or null if the language could not be identified
detectLanguage
public @Nullable DetectedLanguage detectLanguage(String text, List<String> noopLangsTmp, List<String> preferredLangsTmp)
Parameters:
noopLangsTmp - list of language codes that are detected but will lead to the NoopLanguage, which has no rules
Returns:
the detected language, or null if the language could not be identified
Since:
4.4 (new parameter noopLangs, changed return type to DetectedLanguage)
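The `noopLangsTmp` parameter means that some languages are still recognized, but the result is routed to the rule-less NoopLanguage. The sketch below illustrates that mapping on plain language codes, assuming the detector first produces a code (or null). `NoopMappingSketch`, `resolve`, and the `"noop"` marker are hypothetical stand-ins, not LanguageTool's API, and the sketch does not model `preferredLangsTmp`, whose behavior is not documented here.

```java
import java.util.List;

public class NoopMappingSketch {

    // Hypothetical marker standing in for LanguageTool's NoopLanguage.
    static final String NOOP = "noop";

    /**
     * Hypothetical post-processing step: a detected code that appears in
     * noopLangs is reported as the no-op language instead of itself.
     * A null detection (language could not be identified) passes through unchanged.
     */
    static String resolve(String detectedCode, List<String> noopLangs) {
        if (detectedCode == null) {
            return null;
        }
        return noopLangs.contains(detectedCode) ? NOOP : detectedCode;
    }

    public static void main(String[] args) {
        List<String> noopLangs = List.of("ja", "zh");
        System.out.println(resolve("ja", noopLangs)); // noop
        System.out.println(resolve("de", noopLangs)); // de
        System.out.println(resolve(null, noopLangs)); // null
    }
}
```

Callers of either `detectLanguage` overload still need to handle a null result, since both are annotated `@Nullable`.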