Package org.languagetool
Class Language
java.lang.Object
org.languagetool.Language
- Direct Known Subclasses:
Asturian,Breton,Catalan,CrimeanTatar,Danish,DynamicLanguage,Esperanto,Galician,Greek,Japanese,Khmer,LanguageBuilder.ExtendedLanguage,LanguageWithModel,NoopLanguage,Persian,Polish,Romanian,SimpleSentenceTokenizer.AnyLanguage,Slovak,Slovenian,Swedish,Tagalog,Tamil,Ukrainian
Base class for any supported language (English, German, etc.). Language classes
are detected at runtime by searching the classpath for files named
META-INF/org/languagetool/language-module.properties. Each such file
needs to contain a key languageClasses which specifies the fully qualified
class name(s), e.g. org.languagetool.language.English. Use commas to specify
more than one class.
Subclasses should typically use lazy initialization for anything that's costly to set up. This improves startup time for the LanguageTool stand-alone version.
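As a rough illustration of the languageClasses key described above, the following sketch reads a comma-separated list of class names from the text of a properties file. The class LanguageModuleSketch and its method are hypothetical names introduced for this example; this is not LanguageTool's own loader, and the German class name in main is only a sample value.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

// Hypothetical helper, not part of LanguageTool: extracts the class names
// listed under the languageClasses key of a language-module.properties file.
class LanguageModuleSketch {

  static List<String> parseLanguageClasses(String propertiesText) {
    Properties props = new Properties();
    try {
      props.load(new StringReader(propertiesText));
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    String value = props.getProperty("languageClasses", "");
    List<String> classNames = new ArrayList<>();
    // Commas separate multiple fully qualified class names.
    for (String part : value.split(",")) {
      String name = part.trim();
      if (!name.isEmpty()) {
        classNames.add(name);
      }
    }
    return classNames;
  }

  public static void main(String[] args) {
    String text = "languageClasses=org.languagetool.language.English,"
        + "org.languagetool.language.German\n";
    System.out.println(parseLanguageClasses(text));
  }
}
```

A loader built on this idea would then instantiate each listed class, ideally deferring any costly setup until it is first needed.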
-
Field Summary
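Fields
Modifier and Type, Field: Description
private static final Pattern APOSTROPHE
private Chunker chunker
private static final Disambiguator DEMO_DISAMBIGUATOR
private static final Tagger DEMO_TAGGER
private final UnifierConfiguration disambiguationUnifierConfig
private Disambiguator disambiguator
private final Pattern ignoredCharactersRegex
private static final Pattern INSIDE_SUGGESTION
private static final Map<Class<Language>, JLanguageTool> languagetoolInstances
private static final org.slf4j.Logger logger
private static final Pattern NBSPACE1
private static final Pattern NBSPACE2
private List<AbstractPatternRule> patternRules
private Chunker postDisambiguationChunker
private static final Pattern QUOTED_CHAR_PATTERN
private static final SentenceTokenizer SENTENCE_TOKENIZER
private SentenceTokenizer sentenceTokenizer
private String shortCodeWithCountryAndVariant
private static final Map<Class<? extends Language>, SpellingCheckRule> spellingRules: Create an instance of the default spelling rule of this language. Accessed (with caching) via getDefaultSpellingRule.
private static final String SUGGESTION_CLOSE_TAG
private static final String SUGGESTION_OPEN_TAG
private Synthesizer synthesizer
private Tagger tagger
private static final Pattern TYPOGRAPHY_PATTERN_1
private static final Pattern TYPOGRAPHY_PATTERN_2
private static final Pattern TYPOGRAPHY_PATTERN_3
private static final Pattern TYPOGRAPHY_PATTERN_4
private static final Pattern TYPOGRAPHY_PATTERN_5
private final UnifierConfiguration unifierConfig
private static final WordTokenizer WORD_TOKENIZER
private Tokenizer wordTokenizer
-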
Constructor Summary
Constructors
protected Language()
-
Method Summary
Modifier and Type, Method: Description
adaptSuggestion(String s, String originalErrorStr)
adjustMatch(RuleMatch rm, List<String> features)
private String buildShortCodeWithCountryAndVariant
createDefaultChunker: Creates language specific chunker.
createDefaultDisambiguator: Creates language specific disambiguator.
createDefaultJLanguageTool: Create a shared instance of JLanguageTool to use in rules for further processing. Instances are shared by Language. As this is a shared instance, do not modify (add or remove) any rules or filters.
createDefaultPostDisambiguationChunker: Creates language specific post disambiguation chunker.
createDefaultSentenceTokenizer: Creates language specific sentence tokenizer.
protected SpellingCheckRule createDefaultSpellingRule(ResourceBundle messages)
createDefaultSynthesizer: Creates language specific part-of-speech synthesizer.
createDefaultTagger: Creates language specific part-of-speech tagger.
createDefaultWordTokenizer: Creates language specific word tokenizer.
boolean equals: Considers languages as equal if their language code, including the country and variant codes, are equal.
boolean equalsConsiderVariantsIfSpecified(Language otherLanguage): Return true if this is the same language as the given one, considering country variants only if set for both languages.
filterRuleMatches(List<RuleMatch> ruleMatches, AnnotatedText text, Set<String> enabledRules): This function is called by JLanguageTool before CleanOverlappingFilter removes overlapping ruleMatches.
getChunker(): Get this language's chunker implementation or null.
getCommonWordsPath: A file with common words, either in the classpath or as a filename in the file system.
abstract String[] getCountries: Get this language's country options, e.g. US (as in en-US) or PL (as in pl-PL).
getDefaultDisabledRulesForVariant: Get disabled rules different from the default ones for this language variant.
getDefaultEnabledRulesForVariant: Get enabled rules different from the default ones for this language variant.
getDefaultLanguageVariant: Languages that have country variants need to overwrite this to select their most common variant.
protected int getDefaultRulePriorityForStyle()
getDefaultSpellingRule(): Retrieve default spelling rule for this language. Useful for rules to implement suppression of misspelled suggestions.
getDefaultSpellingRule(ResourceBundle messages): Deprecated.
getDisambiguationUnifier: Get this language's feature unifier used for disambiguation.
getDisambiguator(): Get this language's part-of-speech disambiguator implementation.
getLanguageModel(File indexDir)
getLocale: Get this language's Java locale, not considering the country code.
getLocaleWithCountryAndVariant: Get this language's Java locale, considering language code and country code (if any).
getMaintainedState: Information about whether the support for this language in LanguageTool is actively maintained.
abstract Contributor[] getMaintainers: Get the name(s) of the maintainer(s) for this language or null.
abstract String getName(): Get this language's name in English, e.g. English or German (Germany).
protected List<AbstractPatternRule> getPatternRules: Get the pattern rules as defined in the files returned by getRuleFileNames().
getPostDisambiguationChunker(): Get this language's post disambiguation chunker implementation or null.
protected int getPriorityForId: Returns a priority for Rule or Category Id (default: 0).
getRelevantLanguageModelCapableRules(ResourceBundle messages, LanguageModel languageModel, GlobalConfig globalConfig, UserConfig userConfig, Language motherTongue, List<Language> altLanguages): Get a list of rules that can optionally use a LanguageModel.
getRelevantLanguageModelRules(ResourceBundle messages, LanguageModel languageModel, UserConfig userConfig): Get a list of rules that require a LanguageModel.
getRelevantRemoteRules(ResourceBundle messageBundle, List<RemoteRuleConfig> configs, GlobalConfig globalConfig, UserConfig userConfig, Language motherTongue, List<Language> altLanguages, boolean inputLogging): For rules that depend on a remote server; based on RemoteRule will be executed asynchronously, with timeout, retries, etc.
getRelevantRules(ResourceBundle messages, UserConfig userConfig, Language motherTongue, List<Language> altLanguages): Get the rules classes that should run for texts in this language.
getRelevantRulesGlobalConfig(ResourceBundle messages, GlobalConfig globalConfig, UserConfig userConfig, Language motherTongue, List<Language> altLanguages): Get the rules classes that should run for texts in this language.
getRemoteEnhancedRules(ResourceBundle messageBundle, List<RemoteRuleConfig> configs, UserConfig userConfig, Language motherTongue, List<Language> altLanguages, boolean inputLogging): For rules whose results are extended using some remote service, e.g. BERTSuggestionRanking.
getRuleFileNames(): Get the location of the rule file(s) in a form like /org/languagetool/rules/de/grammar.xml, i.e. a path in the classpath.
int getRulePriority(Rule rule): Returns a priority for Rule (default: 0).
getSentenceTokenizer(): Get this language's sentence tokenizer implementation.
abstract String getShortCode: Get this language's character code, e.g. en for English.
final String getShortCodeWithCountryAndVariant: Get the short name of the language with country and variant (if any), if it is a single-country language.
getSynthesizer(): Get this language's part-of-speech synthesizer implementation or null.
getTagger(): Get this language's part-of-speech tagger implementation.
final String getTranslatedName(ResourceBundle messages): Get the name of the language translated to the current locale, if available.
getUnifier: Get this language's feature unifier.
getVariant: Get this language's variant, e.g. valencia (as in ca-ES-valencia) or null.
getWordTokenizer(): Get this language's word tokenizer implementation.
private boolean hasCountry()
int hashCode()
boolean hasMinMatchesRules()
boolean hasNGramFalseFriendRule(Language motherTongue): Return true if language has ngram-based false friend rule returned by getRelevantLanguageModelCapableRules(java.util.ResourceBundle, org.languagetool.languagemodel.LanguageModel, org.languagetool.GlobalConfig, org.languagetool.UserConfig, org.languagetool.Language, java.util.List<org.languagetool.Language>).
final boolean hasVariant(): Whether this class has at least one subclass that implements variants of this language.
boolean isAdvancedTypographyEnabled()
boolean isExternal(): For internal use only.
boolean isHiddenFromGui()
boolean isSpellcheckOnlyLanguage(): Whether this language supports spell checking only and no advanced grammar and style checking.
private boolean isTheDefaultVariant()
boolean isVariant(): Whether this is a country variant of another language, i.e. whether it doesn't directly extend Language, but a subclass of Language.
void setChunker(Chunker chunker): Set this language's chunker implementation or null.
void setDisambiguator(Disambiguator disambiguator): Set this language's part-of-speech disambiguator implementation.
void setPostDisambiguationChunker(Chunker chunker): Set this language's post disambiguation chunker implementation or null.
void setSentenceTokenizer(SentenceTokenizer tokenizer): Set this language's sentence tokenizer implementation.
void setSynthesizer(Synthesizer synthesizer): Set this language's part-of-speech synthesizer implementation or null.
void setTagger: Set this language's part-of-speech tagger implementation.
void setWordTokenizer(Tokenizer tokenizer): Set this language's word tokenizer implementation.
toAdvancedTypography(String input)
final String toString()
-
Field Details
-
logger
private static final org.slf4j.Logger logger
-
DEMO_DISAMBIGUATOR
-
DEMO_TAGGER
-
SENTENCE_TOKENIZER
-
WORD_TOKENIZER
-
INSIDE_SUGGESTION
-
APOSTROPHE
-
SUGGESTION_OPEN_TAG
-
SUGGESTION_CLOSE_TAG
-
NBSPACE1
-
NBSPACE2
-
languagetoolInstances
-
QUOTED_CHAR_PATTERN
-
TYPOGRAPHY_PATTERN_1
-
TYPOGRAPHY_PATTERN_2
-
TYPOGRAPHY_PATTERN_3
-
TYPOGRAPHY_PATTERN_4
-
TYPOGRAPHY_PATTERN_5
-
unifierConfig
-
disambiguationUnifierConfig
-
ignoredCharactersRegex
-
patternRules
-
disambiguator
-
tagger
-
sentenceTokenizer
-
wordTokenizer
-
chunker
-
postDisambiguationChunker
-
synthesizer
-
shortCodeWithCountryAndVariant
-
spellingRules
Create an instance of the default spelling rule of this language. Accessed (with caching) via getDefaultSpellingRule.
- Since:
- 5.5
-
-
Constructor Details
-
Language
protected Language()
-
-
Method Details
-
getShortCode
Get this language's character code, e.g. en for English. For most languages this is a two-letter code according to ISO 639-1, but for those languages that don't have a two-letter code, a three-letter code according to ISO 639-2 is returned. The country parameter (e.g. "US"), if any, is not returned.
- Since:
- 3.6
-
getName
Get this language's name in English, e.g. English or German (Germany).
- Returns:
- language name
-
getCountries
Get this language's country options, e.g. US (as in en-US) or PL (as in pl-PL).
- Returns:
- String[] - array of country options for the language.
-
getMaintainers
Get the name(s) of the maintainer(s) for this language or null.
-
getRelevantRules
public abstract List<Rule> getRelevantRules(ResourceBundle messages, UserConfig userConfig, Language motherTongue, List<Language> altLanguages) throws IOException
Get the rules classes that should run for texts in this language.
- Throws:
IOException
- Since:
- 4.3
-
getCommonWordsPath
A file with common words, either in the classpath or as a filename in the file system.
- Since:
- 4.5
-
getVariant
Get this language's variant, e.g. valencia (as in ca-ES-valencia) or null. Attention: not to be confused with the "country" option.
- Returns:
- variant for the language or null
- Since:
- 2.3
-
getDefaultEnabledRulesForVariant
Get enabled rules different from the default ones for this language variant.
- Returns:
- enabled rules for the language variant.
- Since:
- 2.4
-
getDefaultDisabledRulesForVariant
Get disabled rules different from the default ones for this language variant.
- Returns:
- disabled rules for the language variant.
- Since:
- 2.4
-
getLanguageModel
- Parameters:
indexDir - directory with a '3grams' sub directory which contains a Lucene index with 3gram occurrence counts
- Returns:
- a LanguageModel or null if this language doesn't support one
- Throws:
IOException
- Since:
- 2.7
-
getRelevantLanguageModelRules
public List<Rule> getRelevantLanguageModelRules(ResourceBundle messages, LanguageModel languageModel, UserConfig userConfig) throws IOException
Get a list of rules that require a LanguageModel. Returns an empty list for languages that don't have such rules.
- Throws:
IOException
- Since:
- 2.7
-
getRelevantLanguageModelCapableRules
public List<Rule> getRelevantLanguageModelCapableRules(ResourceBundle messages, @Nullable LanguageModel languageModel, GlobalConfig globalConfig, UserConfig userConfig, Language motherTongue, List<Language> altLanguages) throws IOException
Get a list of rules that can optionally use a LanguageModel. Returns an empty list for languages that don't have such rules.
- Parameters:
languageModel - null if no language model is available
- Throws:
IOException
- Since:
- 4.5
-
getRelevantRemoteRules
public List<Rule> getRelevantRemoteRules(ResourceBundle messageBundle, List<RemoteRuleConfig> configs, GlobalConfig globalConfig, UserConfig userConfig, Language motherTongue, List<Language> altLanguages, boolean inputLogging) throws IOException
For rules that depend on a remote server. Rules based on RemoteRule will be executed asynchronously, with timeout, retries, etc. as configured. May also return non-remote rules (e.g. if the configuration is missing, or for A/B tests), which will be executed normally.
- Throws:
IOException
-
getRemoteEnhancedRules
@Experimental
public Function<Rule,Rule> getRemoteEnhancedRules(ResourceBundle messageBundle, List<RemoteRuleConfig> configs, UserConfig userConfig, Language motherTongue, List<Language> altLanguages, boolean inputLogging) throws IOException
For rules whose results are extended using some remote service, e.g. BERTSuggestionRanking
- Returns:
- function that transforms old rule into remote-enhanced rule
- Throws:
IOException
- Since:
- 4.8
-
getRelevantRulesGlobalConfig
public List<Rule> getRelevantRulesGlobalConfig(ResourceBundle messages, GlobalConfig globalConfig, UserConfig userConfig, Language motherTongue, List<Language> altLanguages) throws IOException
Get the rules classes that should run for texts in this language.
- Throws:
IOException
- Since:
- 4.6
-
createDefaultSpellingRule
@Nullable
protected SpellingCheckRule createDefaultSpellingRule(ResourceBundle messages) throws IOException
- Throws:
IOException
-
getDefaultSpellingRule
Retrieve default spelling rule for this language. Useful for rules to implement suppression of misspelled suggestions.
- Since:
- 5.5
-
getDefaultSpellingRule
Deprecated.
Retrieve default spelling rule for this language. Useful for rules to implement suppression of misspelled suggestions.
- Parameters:
messages - unused
- Since:
- 5.5
-
getLocale
Get this language's Java locale, not considering the country code.
-
getLocaleWithCountryAndVariant
Get this language's Java locale, considering language code and country code (if any).
- Since:
- 2.1
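To make the difference between the two locale getters concrete, here is a minimal sketch using java.util.Locale.Builder. LocaleSketch and its two methods are hypothetical stand-ins that follow the descriptions above (language only versus language plus country, if any). How LanguageTool itself builds these locales is not shown here.

```java
import java.util.Locale;

// Hypothetical illustration, not LanguageTool code.
class LocaleSketch {

  // Locale from the language code only, ignoring any country.
  static Locale localeWithoutCountry(String shortCode) {
    return new Locale.Builder().setLanguage(shortCode).build();
  }

  // Locale from the language code plus the country code, if one is given.
  static Locale localeWithCountry(String shortCode, String country) {
    Locale.Builder builder = new Locale.Builder().setLanguage(shortCode);
    if (country != null) {
      builder.setRegion(country);
    }
    return builder.build();
  }

  public static void main(String[] args) {
    System.out.println(localeWithoutCountry("en").toLanguageTag());
    System.out.println(localeWithCountry("en", "US").toLanguageTag());
  }
}
```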
-
getRuleFileNames
Get the location of the rule file(s) in a form like /org/languagetool/rules/de/grammar.xml, i.e. a path in the classpath. The files must exist or an exception will be thrown, unless the filename contains the string -test-.
-
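The existence rule for rule files can be pictured with the sketch below. RuleFileCheckSketch is a hypothetical class, and the use of IllegalStateException is an assumption, since the text above does not say which exception type is thrown.

```java
import java.util.List;

// Hypothetical checker, not LanguageTool code: verifies that rule files
// given as classpath paths exist.
class RuleFileCheckSketch {

  // Throws for a missing file unless its name contains "-test-".
  // The exception type is an assumption made for this sketch.
  static void checkRuleFiles(List<String> classpathPaths) {
    for (String path : classpathPaths) {
      boolean exists = RuleFileCheckSketch.class.getResource(path) != null;
      if (!exists && !path.contains("-test-")) {
        throw new IllegalStateException("Rule file not found in classpath: " + path);
      }
    }
  }

  public static void main(String[] args) {
    // A missing file with "-test-" in its name is tolerated.
    checkRuleFiles(List.of("/no/such/dir/grammar-test-1.xml"));
    System.out.println("optional test file skipped");
  }
}
```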
getDefaultLanguageVariant
Languages that have country variants need to override this to select their most common variant.
- Returns:
- default country variant
- Since:
- 1.8
-
createDefaultDisambiguator
Creates language specific disambiguator. This function will be called each time in getDisambiguator() if disambiguator is not set.
-
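The createDefault/get/set trio used for the disambiguator (and, in the same way, for the tagger, tokenizers, chunkers and synthesizer) follows a lazy-initialization pattern. The sketch below uses hypothetical stand-in types. Whether the real getter stores the created default, as this sketch does, is an assumption.

```java
// Hypothetical stand-ins; the real Language and Disambiguator types
// live in org.languagetool and are not used here.
class LazyComponentSketch {

  interface Disambiguator {}

  private Disambiguator disambiguator;  // null until set or first requested
  int createCalls = 0;                  // counts factory calls, for demonstration only

  // Factory for the default component; subclasses would override this.
  Disambiguator createDefaultDisambiguator() {
    createCalls++;
    return new Disambiguator() {};
  }

  // Assumption: the default is created on first access and then kept.
  Disambiguator getDisambiguator() {
    if (disambiguator == null) {
      disambiguator = createDefaultDisambiguator();
    }
    return disambiguator;
  }

  // An explicitly set component takes precedence over the default.
  void setDisambiguator(Disambiguator disambiguator) {
    this.disambiguator = disambiguator;
  }

  public static void main(String[] args) {
    LazyComponentSketch language = new LazyComponentSketch();
    language.getDisambiguator();
    language.getDisambiguator();
    System.out.println("factory calls: " + language.createCalls);
  }
}
```

Nothing costly happens in the constructor, which is the point of the advice on lazy initialization in the class description.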
getDisambiguator
Get this language's part-of-speech disambiguator implementation.
-
setDisambiguator
Set this language's part-of-speech disambiguator implementation.
-
createDefaultTagger
Creates language specific part-of-speech tagger. The tagger must not be null, but it can be a trivial pseudo-tagger that only assigns null tags. This function will be called each time in getTagger() if tagger is not set.
-
getTagger
Get this language's part-of-speech tagger implementation.
-
setTagger
Set this language's part-of-speech tagger implementation.
-
createDefaultSentenceTokenizer
Creates language specific sentence tokenizer. This function will be called each time in getSentenceTokenizer() if sentence tokenizer is not set.
-
getSentenceTokenizer
Get this language's sentence tokenizer implementation.
-
setSentenceTokenizer
Set this language's sentence tokenizer implementation.
-
createDefaultWordTokenizer
Creates language specific word tokenizer. This function will be called each time in getWordTokenizer() if word tokenizer is not set.
-
getWordTokenizer
Get this language's word tokenizer implementation.
-
setWordTokenizer
Set this language's word tokenizer implementation.
-
createDefaultChunker
Creates language specific chunker. This function will be called each time in getChunker() if chunker is not set.
-
getChunker
Get this language's chunker implementation or null.
- Since:
- 2.3
-
setChunker
Set this language's chunker implementation or null.
-
createDefaultPostDisambiguationChunker
Creates language specific post disambiguation chunker. This function will be called each time in getPostDisambiguationChunker() if chunker is not set.
-
getPostDisambiguationChunker
Get this language's post disambiguation chunker implementation or null.
- Since:
- 2.9
-
setPostDisambiguationChunker
Set this language's post disambiguation chunker implementation or null.
-
createDefaultJLanguageTool
Create a shared instance of JLanguageTool to use in rules for further processing. Instances are shared by Language. As this is a shared instance, do not modify (add or remove) any rules or filters. The alternative to disabling/enabling rules is to select the desired rules from getAllActiveRules(), and run them separately with rule.match(analyzedSentence). Do not call this in a static block or to initialize a static JLanguageTool field in rules or filters classes, as this could lead to a deadlock during initialization.
- Returns:
- a shared JLanguageTool instance for this language
- Since:
- 6.1
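One way to picture "instances are shared by Language" is a cache keyed by the language class, as sketched below. Tool is a hypothetical stand-in for JLanguageTool and sharedToolFor is an invented name. The use of ConcurrentHashMap.computeIfAbsent is an assumption of this sketch, not a statement about LanguageTool's implementation.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical illustration of a per-language shared instance.
class SharedInstanceSketch {

  // Stand-in for JLanguageTool; holds only the class it was created for.
  static final class Tool {
    final Class<?> languageClass;

    Tool(Class<?> languageClass) {
      this.languageClass = languageClass;
    }
  }

  // One shared instance per language class.
  private static final Map<Class<?>, Tool> INSTANCES = new ConcurrentHashMap<>();

  // Returns the shared instance, creating it on first request.
  // Callers must treat the result as read-only because others share it.
  static Tool sharedToolFor(Class<?> languageClass) {
    return INSTANCES.computeIfAbsent(languageClass, Tool::new);
  }

  public static void main(String[] args) {
    Tool a = sharedToolFor(String.class);
    Tool b = sharedToolFor(String.class);
    System.out.println(a == b);
  }
}
```

In line with the warning above, a rule would request the shared instance inside a method when it is needed, rather than from a static initializer.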
-
createDefaultSynthesizer
Creates language specific part-of-speech synthesizer. This function will be called each time in getSynthesizer() if synthesizer is not set.
-
getSynthesizer
Get this language's part-of-speech synthesizer implementation or null.
-
setSynthesizer
Set this language's part-of-speech synthesizer implementation or null.
-
getUnifier
Get this language's feature unifier.
- Returns:
- Feature unifier for analyzed tokens.
-
getDisambiguationUnifier
Get this language's feature unifier used for disambiguation. Note: it might be different from the normal rule unifier.
- Returns:
- Feature unifier for analyzed tokens.
-
getUnifierConfiguration
- Since:
- 2.3
-
getDisambiguationUnifierConfiguration
- Since:
- 2.3
-
getTranslatedName
Get the name of the language translated to the current locale, if available. Otherwise, get the untranslated name.
-
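The fallback behaviour of getTranslatedName can be sketched with a plain ResourceBundle lookup. TranslatedNameSketch, the lookup key (here the language code) and the demo bundle are all assumptions made for illustration. The key scheme LanguageTool actually uses is not stated above.

```java
import java.util.ListResourceBundle;
import java.util.MissingResourceException;
import java.util.ResourceBundle;

// Hypothetical illustration, not LanguageTool code.
class TranslatedNameSketch {

  // Returns the translated name if the bundle has one, else the untranslated name.
  static String translatedName(ResourceBundle messages, String key, String untranslatedName) {
    try {
      return messages.getString(key);
    } catch (MissingResourceException e) {
      return untranslatedName;
    }
  }

  // Tiny in-memory bundle standing in for a real message bundle.
  static ResourceBundle demoBundle() {
    return new ListResourceBundle() {
      @Override
      protected Object[][] getContents() {
        return new Object[][] { { "de", "Deutsch" } };
      }
    };
  }

  public static void main(String[] args) {
    System.out.println(translatedName(demoBundle(), "de", "German"));
    System.out.println(translatedName(demoBundle(), "pl", "Polish"));
  }
}
```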
getShortCodeWithCountryAndVariant
Get the short name of the language with country and variant (if any), if it is a single-country language. For generic language classes, get only a two- or three-character code.
- Since:
- 3.6
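The rule described above (append country and variant only for a single-country language) might look roughly like the sketch below. ShortCodeSketch and build are hypothetical names, and treating "exactly one country" as the test for a single-country language is an assumption.

```java
// Hypothetical illustration, not LanguageTool code.
class ShortCodeSketch {

  // Builds codes such as "en", "en-US" or "ca-ES-valencia".
  static String build(String shortCode, String[] countries, String variant) {
    StringBuilder code = new StringBuilder(shortCode);
    // Assumption: a single-country language has exactly one country option.
    if (countries.length == 1) {
      code.append('-').append(countries[0]);
      if (variant != null) {
        code.append('-').append(variant);
      }
    }
    return code.toString();
  }

  public static void main(String[] args) {
    System.out.println(build("en", new String[0], null));
    System.out.println(build("en", new String[] { "US" }, null));
    System.out.println(build("ca", new String[] { "ES" }, "valencia"));
  }
}
```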
-
buildShortCodeWithCountryAndVariant
-
getPatternRules
Get the pattern rules as defined in the files returned by getRuleFileNames().
- Throws:
IOException
- Since:
- 2.7
-
toString
-
isVariant
public boolean isVariant()
Whether this is a country variant of another language, i.e. whether it doesn't directly extend Language, but a subclass of Language.
- Since:
- 1.8
-
hasVariant
public final boolean hasVariant()
Whether this class has at least one subclass that implements variants of this language.
- Since:
- 1.8
-
isExternal
public boolean isExternal()
For internal use only. Overridden to return true for languages that have been loaded from an external file after start up.
-
equalsConsiderVariantsIfSpecified
Return true if this is the same language as the given one, considering country variants only if set for both languages. For example: en = en, en = en-GB, en-GB = en-GB, but en-US != en-GB
- Since:
- 1.8
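The comparison rule of equalsConsiderVariantsIfSpecified, with its four examples, can be reproduced with a small stand-in type. LangCodeSketch is hypothetical and models only a language code and an optional country. Anything else the real method may compare is left out.

```java
import java.util.Objects;

// Hypothetical stand-in for a language with a code and an optional country.
class LangCodeSketch {

  final String shortCode;  // e.g. "en"
  final String country;    // e.g. "GB", or null for the generic language

  LangCodeSketch(String shortCode, String country) {
    this.shortCode = Objects.requireNonNull(shortCode);
    this.country = country;
  }

  // The language code must match; countries are compared only if both are set.
  boolean equalsConsiderVariantsIfSpecified(LangCodeSketch other) {
    if (!shortCode.equals(other.shortCode)) {
      return false;
    }
    if (country == null || other.country == null) {
      return true;
    }
    return country.equals(other.country);
  }

  public static void main(String[] args) {
    LangCodeSketch en = new LangCodeSketch("en", null);
    LangCodeSketch enGB = new LangCodeSketch("en", "GB");
    LangCodeSketch enUS = new LangCodeSketch("en", "US");
    System.out.println(en.equalsConsiderVariantsIfSpecified(enGB));
    System.out.println(enUS.equalsConsiderVariantsIfSpecified(enGB));
  }
}
```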
-
hasCountry
private boolean hasCountry()
-
getIgnoredCharactersRegex
- Returns:
- Return compiled regular expression to ignore inside tokens
- Since:
- 2.9
-
getMaintainedState
Information about whether the support for this language in LanguageTool is actively maintained. If not, the user interface might show a warning.
- Since:
- 3.3
-
isHiddenFromGui
public boolean isHiddenFromGui()
-
isTheDefaultVariant
private boolean isTheDefaultVariant()
-
getPriorityForId
Returns a priority for Rule or Category Id (default: 0). Positive integers have higher priority. Negative integers have lower priority.
- Since:
- 3.6
-
getRulePriority
Returns a priority for Rule (default: 0). Positive integers have higher priority. Negative integers have lower priority.
- Since:
- 5.0
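The priority convention (default 0, positive is higher, negative is lower) can be illustrated with a lookup that falls back to 0 and an ordering from highest to lowest priority. PrioritySketch, the rule ids and the sorting step are hypothetical. How LanguageTool uses the priorities internally is not described above.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;

// Hypothetical illustration, not LanguageTool code.
class PrioritySketch {

  // Ids without an explicit entry get the default priority 0.
  static int priorityFor(Map<String, Integer> priorities, String id) {
    return priorities.getOrDefault(id, 0);
  }

  // Orders ids from highest to lowest priority; ties keep their input order.
  static List<String> sortByPriority(Map<String, Integer> priorities, List<String> ids) {
    List<String> sorted = new ArrayList<>(ids);
    sorted.sort(Comparator.comparingInt((String id) -> priorityFor(priorities, id)).reversed());
    return sorted;
  }

  public static void main(String[] args) {
    // Made-up rule ids used only for this demonstration.
    Map<String, Integer> priorities = Map.of("RULE_A", 10, "RULE_B", -5);
    System.out.println(sortByPriority(priorities, List.of("RULE_B", "RULE_C", "RULE_A")));
  }
}
```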
-
getDefaultRulePriorityForStyle
protected int getDefaultRulePriorityForStyle()
-
isSpellcheckOnlyLanguage
public boolean isSpellcheckOnlyLanguage()
Whether this language supports spell checking only and no advanced grammar and style checking.
- Since:
- 4.5
-
hasNGramFalseFriendRule
Return true if language has ngram-based false friend rule returned by getRelevantLanguageModelCapableRules(java.util.ResourceBundle, org.languagetool.languagemodel.LanguageModel, org.languagetool.GlobalConfig, org.languagetool.UserConfig, org.languagetool.Language, java.util.List<org.languagetool.Language>).
- Since:
- 4.6
-
getOpeningDoubleQuote
- Since:
- 5.1
-
getClosingDoubleQuote
- Since:
- 5.1
-
getOpeningSingleQuote
- Since:
- 5.1
-
getClosingSingleQuote
- Since:
- 5.1
-
isAdvancedTypographyEnabled
public boolean isAdvancedTypographyEnabled()
- Since:
- 5.1
-
toAdvancedTypography
- Since:
- 5.1
-
equals
Considers languages as equal if their language code, including the country and variant codes, are equal.
-
hashCode
public int hashCode()
-
hasMinMatchesRules
public boolean hasMinMatchesRules()
Some rules contain the field min_matches to check repeated patterns.
- Since:
- 5.1
-
adaptSuggestion
Adjust suggestion
- Since:
- 6.0
-
getConsistencyRulePrefix
-
adjustMatch
-
prepareLineForSpeller
-
filterRuleMatches
public List<RuleMatch> filterRuleMatches(List<RuleMatch> ruleMatches, AnnotatedText text, Set<String> enabledRules)
This function is called by JLanguageTool before CleanOverlappingFilter removes overlapping ruleMatches
- Returns:
- filtered ruleMatches
-
getMultitokenSpeller
-
getPriorityMap
- Since:
- 6.4
-
getDefaultSpellingRule()