Class Language

java.lang.Object
org.languagetool.Language
Direct Known Subclasses:
Asturian, Breton, Catalan, CrimeanTatar, Danish, DynamicLanguage, Esperanto, Galician, Greek, Japanese, Khmer, LanguageBuilder.ExtendedLanguage, LanguageWithModel, NoopLanguage, Persian, Polish, Romanian, SimpleSentenceTokenizer.AnyLanguage, Slovak, Slovenian, Swedish, Tagalog, Tamil, Ukrainian

public abstract class Language extends Object
Base class for any supported language (English, German, etc.). Language classes are detected at runtime by searching the classpath for files named META-INF/org/languagetool/language-module.properties. These files need to contain a key languageClasses that specifies the fully qualified class name(s), e.g. org.languagetool.language.English. Use commas to specify more than one class.
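
A minimal sketch of reading that key, assuming only the file layout described above. The class LanguageModuleSketch and its method are hypothetical and use java.util.Properties; LanguageTool's own loader may handle whitespace, errors, and class loading differently.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

// Hypothetical helper (not part of LanguageTool): reads the comma-separated
// "languageClasses" value from the content of a language-module.properties file.
class LanguageModuleSketch {

  static List<String> readLanguageClasses(String propertiesContent) {
    Properties props = new Properties();
    try {
      props.load(new StringReader(propertiesContent));
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    String value = props.getProperty("languageClasses", "");
    List<String> classNames = new ArrayList<>();
    // Commas separate multiple fully qualified class names; blank entries are skipped.
    for (String name : value.split(",")) {
      String trimmed = name.trim();
      if (!trimmed.isEmpty()) {
        classNames.add(trimmed);
      }
    }
    return classNames;
  }

  public static void main(String[] args) {
    String content = "languageClasses=org.languagetool.language.English,org.languagetool.language.German\n";
    // prints [org.languagetool.language.English, org.languagetool.language.German]
    System.out.println(readLanguageClasses(content));
  }
}
```

Each returned name would then still have to be loaded (e.g. via Class.forName) and instantiated; that step is left out here.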

Subclasses should typically use lazy initialization for anything that is costly to set up. This improves startup time for the stand-alone version of LanguageTool.

  • Field Details

    • logger

      private static final org.slf4j.Logger logger
    • DEMO_DISAMBIGUATOR

      private static final Disambiguator DEMO_DISAMBIGUATOR
    • DEMO_TAGGER

      private static final Tagger DEMO_TAGGER
    • SENTENCE_TOKENIZER

      private static final SentenceTokenizer SENTENCE_TOKENIZER
    • WORD_TOKENIZER

      private static final WordTokenizer WORD_TOKENIZER
    • INSIDE_SUGGESTION

      private static final Pattern INSIDE_SUGGESTION
    • APOSTROPHE

      private static final Pattern APOSTROPHE
    • SUGGESTION_OPEN_TAG

      private static final String SUGGESTION_OPEN_TAG
    • SUGGESTION_CLOSE_TAG

      private static final String SUGGESTION_CLOSE_TAG
    • NBSPACE1

      private static final Pattern NBSPACE1
    • NBSPACE2

      private static final Pattern NBSPACE2
    • languagetoolInstances

      private static final Map<Class<Language>,JLanguageTool> languagetoolInstances
    • QUOTED_CHAR_PATTERN

      private static final Pattern QUOTED_CHAR_PATTERN
    • TYPOGRAPHY_PATTERN_1

      private static final Pattern TYPOGRAPHY_PATTERN_1
    • TYPOGRAPHY_PATTERN_2

      private static final Pattern TYPOGRAPHY_PATTERN_2
    • TYPOGRAPHY_PATTERN_3

      private static final Pattern TYPOGRAPHY_PATTERN_3
    • TYPOGRAPHY_PATTERN_4

      private static final Pattern TYPOGRAPHY_PATTERN_4
    • TYPOGRAPHY_PATTERN_5

      private static final Pattern TYPOGRAPHY_PATTERN_5
    • unifierConfig

      private final UnifierConfiguration unifierConfig
    • disambiguationUnifierConfig

      private final UnifierConfiguration disambiguationUnifierConfig
    • ignoredCharactersRegex

      private final Pattern ignoredCharactersRegex
    • patternRules

      private List<AbstractPatternRule> patternRules
    • disambiguator

      private Disambiguator disambiguator
    • tagger

      private Tagger tagger
    • sentenceTokenizer

      private SentenceTokenizer sentenceTokenizer
    • wordTokenizer

      private Tokenizer wordTokenizer
    • chunker

      private Chunker chunker
    • postDisambiguationChunker

      private Chunker postDisambiguationChunker
    • synthesizer

      private Synthesizer synthesizer
    • shortCodeWithCountryAndVariant

      private String shortCodeWithCountryAndVariant
    • spellingRules

      private static final Map<Class<? extends Language>,SpellingCheckRule> spellingRules
      Default spelling rule instances, cached per language class. Accessed (with caching) via getDefaultSpellingRule().
      Since:
      5.5
  • Constructor Details

    • Language

      protected Language()
  • Method Details

    • getShortCode

      public abstract String getShortCode()
      Get this language's short code, e.g. en for English. For most languages this is a two-letter code according to ISO 639-1, but for languages that don't have a two-letter code, a three-letter code according to ISO 639-2 is returned. The country code (e.g. "US"), if any, is not included.
      Since:
      3.6
    • getName

      public abstract String getName()
      Get this language's name in English, e.g. English or German (Germany).
      Returns:
      language name
    • getCountries

      public abstract String[] getCountries()
      Get this language's country options, e.g. US (as in en-US) or PL (as in pl-PL).
      Returns:
      String[] - array of country options for the language.
    • getMaintainers

      @Nullable public abstract Contributor[] getMaintainers()
      Get the name(s) of the maintainer(s) for this language or null.
    • getRelevantRules

      public abstract List<Rule> getRelevantRules(ResourceBundle messages, UserConfig userConfig, Language motherTongue, List<Language> altLanguages) throws IOException
      Get the rules that should run for texts in this language.
      Throws:
      IOException
      Since:
      4.3
    • getCommonWordsPath

      public String getCommonWordsPath()
      Path to a file with common words, either a resource in the classpath or a filename in the file system.
      Since:
      4.5
    • getVariant

      @Nullable public String getVariant()
      Get this language's variant, e.g. valencia (as in ca-ES-valencia), or null. Note: not to be confused with the country option.
      Returns:
      variant for the language or null
      Since:
      2.3
    • getDefaultEnabledRulesForVariant

      public List<String> getDefaultEnabledRulesForVariant()
      Get enabled rules different from the default ones for this language variant.
      Returns:
      enabled rules for the language variant.
      Since:
      2.4
    • getDefaultDisabledRulesForVariant

      public List<String> getDefaultDisabledRulesForVariant()
      Get disabled rules different from the default ones for this language variant.
      Returns:
      disabled rules for the language variant.
      Since:
      2.4
    • getLanguageModel

      @Nullable public LanguageModel getLanguageModel(File indexDir) throws IOException
      Parameters:
      indexDir - directory with a '3grams' subdirectory that contains a Lucene index with 3-gram occurrence counts
      Returns:
      a LanguageModel or null if this language doesn't support one
      Throws:
      IOException
      Since:
      2.7
    • getRelevantLanguageModelRules

      public List<Rule> getRelevantLanguageModelRules(ResourceBundle messages, LanguageModel languageModel, UserConfig userConfig) throws IOException
      Get a list of rules that require a LanguageModel. Returns an empty list for languages that don't have such rules.
      Throws:
      IOException
      Since:
      2.7
    • getRelevantLanguageModelCapableRules

      public List<Rule> getRelevantLanguageModelCapableRules(ResourceBundle messages, @Nullable LanguageModel languageModel, GlobalConfig globalConfig, UserConfig userConfig, Language motherTongue, List<Language> altLanguages) throws IOException
      Get a list of rules that can optionally use a LanguageModel. Returns an empty list for languages that don't have such rules.
      Parameters:
      languageModel - null if no language model is available
      Throws:
      IOException
      Since:
      4.5
    • getRelevantRemoteRules

      public List<Rule> getRelevantRemoteRules(ResourceBundle messageBundle, List<RemoteRuleConfig> configs, GlobalConfig globalConfig, UserConfig userConfig, Language motherTongue, List<Language> altLanguages, boolean inputLogging) throws IOException
      For rules that depend on a remote server. Rules based on RemoteRule are executed asynchronously, with timeout, retries, etc. as configured. This method can also return non-remote rules (e.g. if the configuration is missing, or for A/B tests); those are executed normally.
      Throws:
      IOException
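
      Asynchronous execution with a timeout and retries can be illustrated with java.util.concurrent alone. This is a simplified, hypothetical sketch: runWithTimeout is not part of LanguageTool, and RemoteRule's actual configuration options are not modeled. Each attempt waits a bounded time, and a caller-supplied fallback is returned when every attempt fails or times out.

```java
import java.util.concurrent.*;

// Hypothetical sketch, not LanguageTool's RemoteRule implementation.
class RemoteCallSketch {

  private static final ExecutorService EXECUTOR = Executors.newCachedThreadPool(r -> {
    Thread t = new Thread(r);
    t.setDaemon(true); // abandoned calls must not keep the JVM alive
    return t;
  });

  // Runs the call asynchronously, waiting at most timeoutMillis per attempt.
  // Returns the fallback once all attempts have failed or timed out.
  static <T> T runWithTimeout(Callable<T> call, long timeoutMillis, int attempts, T fallback) {
    for (int i = 0; i < attempts; i++) {
      Future<T> future = EXECUTOR.submit(call);
      try {
        return future.get(timeoutMillis, TimeUnit.MILLISECONDS);
      } catch (TimeoutException e) {
        future.cancel(true); // interrupt the slow call before the next attempt
      } catch (ExecutionException e) {
        // the call itself threw an exception; try again
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
        return fallback;
      }
    }
    return fallback;
  }

  public static void main(String[] args) {
    String fast = runWithTimeout(() -> "remote result", 500, 2, "fallback");
    String slow = runWithTimeout(() -> { Thread.sleep(2000); return "too late"; }, 100, 2, "fallback");
    System.out.println(fast + " / " + slow); // remote result / fallback
  }
}
```

      Returning a fallback instead of throwing mirrors the idea that a slow remote service should degrade results rather than block the whole check.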
    • getRemoteEnhancedRules

      @Experimental public Function<Rule,Rule> getRemoteEnhancedRules(ResourceBundle messageBundle, List<RemoteRuleConfig> configs, UserConfig userConfig, Language motherTongue, List<Language> altLanguages, boolean inputLogging) throws IOException
      For rules whose results are extended using a remote service, e.g. BERTSuggestionRanking.
      Returns:
      function that transforms old rule into remote-enhanced rule
      Throws:
      IOException
      Since:
      4.8
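
      Because the return value is a Function<Rule,Rule>, the caller receives a transformation to apply to each existing rule, in the style of a decorator. The sketch below shows only that shape with plain java.util.function types; SimpleRule and the reordering step are hypothetical stand-ins, not LanguageTool's Rule API or BERTSuggestionRanking.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.function.Function;

class RemoteEnhancementSketch {

  // Returns a function that wraps any rule so that its suggestions are post-processed.
  // Reversing the list is a local placeholder for a call to a remote ranking service.
  static Function<SimpleRule, SimpleRule> enhancer() {
    return original -> word -> {
      List<String> suggestions = new ArrayList<>(original.suggest(word));
      Collections.reverse(suggestions);
      return suggestions;
    };
  }

  public static void main(String[] args) {
    SimpleRule base = word -> Arrays.asList(word + "s", word + "es");
    SimpleRule enhanced = enhancer().apply(base);
    System.out.println(enhanced.suggest("box")); // [boxes, boxs]
  }
}

// Hypothetical stand-in for a rule: maps an input word to a list of suggestions.
interface SimpleRule {
  List<String> suggest(String word);
}
```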
    • getRelevantRulesGlobalConfig

      public List<Rule> getRelevantRulesGlobalConfig(ResourceBundle messages, GlobalConfig globalConfig, UserConfig userConfig, Language motherTongue, List<Language> altLanguages) throws IOException
      Get the rules that should run for texts in this language.
      Throws:
      IOException
      Since:
      4.6
    • createDefaultSpellingRule

      @Nullable protected SpellingCheckRule createDefaultSpellingRule(ResourceBundle messages) throws IOException
      Throws:
      IOException
    • getDefaultSpellingRule

      @Nullable public SpellingCheckRule getDefaultSpellingRule()
      Retrieve the default spelling rule for this language. Useful for rules that implement suppression of misspelled suggestions.
      Since:
      5.5
    • getDefaultSpellingRule

      @Deprecated public SpellingCheckRule getDefaultSpellingRule(ResourceBundle messages)
      Deprecated.
      Retrieve the default spelling rule for this language. Useful for rules that implement suppression of misspelled suggestions.
      Parameters:
      messages - unused
      Since:
      5.5
    • getLocale

      public Locale getLocale()
      Get this language's Java locale, not considering the country code.
    • getLocaleWithCountryAndVariant

      public Locale getLocaleWithCountryAndVariant()
      Get this language's Java locale, considering language code and country code (if any).
      Since:
      2.1
    • getRuleFileNames

      public List<String> getRuleFileNames()
      Get the location of the rule file(s) in a form like /org/languagetool/rules/de/grammar.xml, i.e. a path in the classpath. The files must exist or an exception will be thrown, unless the filename contains the string -test-.
    • getDefaultLanguageVariant

      @NotNull public Language getDefaultLanguageVariant()
      Languages that have country variants need to override this method to select their most common variant.
      Returns:
      default country variant
      Since:
      1.8
    • createDefaultDisambiguator

      public Disambiguator createDefaultDisambiguator()
      Creates the language-specific disambiguator. getDisambiguator() calls this method each time it is invoked while no disambiguator has been set.
    • getDisambiguator

      public Disambiguator getDisambiguator()
      Get this language's part-of-speech disambiguator implementation.
    • setDisambiguator

      public void setDisambiguator(Disambiguator disambiguator)
      Set this language's part-of-speech disambiguator implementation.
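
      The createDefaultDisambiguator()/getDisambiguator()/setDisambiguator() triple, like the matching tagger, tokenizer, chunker, and synthesizer methods, follows a lazy-initialization pattern. A minimal sketch of that pattern, with a hypothetical LazyComponent type in place of Disambiguator; it does not claim to match LanguageTool's exact code (for instance its thread-safety handling).

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch of the create/get/set pattern.
class LazyLanguageSketch {
  final AtomicInteger creations = new AtomicInteger(); // counts factory calls
  private LazyComponent component;

  // Factory method; a language subclass would override this to build its own component.
  protected LazyComponent createDefaultComponent() {
    creations.incrementAndGet();
    return new LazyComponent("default");
  }

  // Calls the factory only while no component has been set or created yet.
  public synchronized LazyComponent getComponent() {
    if (component == null) {
      component = createDefaultComponent();
    }
    return component;
  }

  // An explicitly set component takes the place of the lazily created default.
  public synchronized void setComponent(LazyComponent component) {
    this.component = component;
  }

  public static void main(String[] args) {
    LazyLanguageSketch lang = new LazyLanguageSketch();
    lang.getComponent();
    lang.getComponent();
    System.out.println(lang.creations.get()); // 1
    lang.setComponent(new LazyComponent("custom"));
    System.out.println(lang.getComponent().name); // custom
  }
}

// Hypothetical stand-in for an expensive component such as a disambiguator or tagger.
class LazyComponent {
  final String name;
  LazyComponent(String name) { this.name = name; }
}
```

      Deferring the factory call until the first get is what keeps startup cheap when a language is loaded but never used.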
    • createDefaultTagger

      @NotNull public Tagger createDefaultTagger()
      Creates the language-specific part-of-speech tagger. The tagger must not be null, but it can be a trivial pseudo-tagger that only assigns null tags. getTagger() calls this method each time it is invoked while no tagger has been set.
    • getTagger

      @NotNull public Tagger getTagger()
      Get this language's part-of-speech tagger implementation.
    • setTagger

      public void setTagger(Tagger tagger)
      Set this language's part-of-speech tagger implementation.
    • createDefaultSentenceTokenizer

      public SentenceTokenizer createDefaultSentenceTokenizer()
      Creates the language-specific sentence tokenizer. getSentenceTokenizer() calls this method each time it is invoked while no sentence tokenizer has been set.
    • getSentenceTokenizer

      public SentenceTokenizer getSentenceTokenizer()
      Get this language's sentence tokenizer implementation.
    • setSentenceTokenizer

      public void setSentenceTokenizer(SentenceTokenizer tokenizer)
      Set this language's sentence tokenizer implementation.
    • createDefaultWordTokenizer

      public Tokenizer createDefaultWordTokenizer()
      Creates the language-specific word tokenizer. getWordTokenizer() calls this method each time it is invoked while no word tokenizer has been set.
    • getWordTokenizer

      public Tokenizer getWordTokenizer()
      Get this language's word tokenizer implementation.
    • setWordTokenizer

      public void setWordTokenizer(Tokenizer tokenizer)
      Set this language's word tokenizer implementation.
    • createDefaultChunker

      @Nullable public Chunker createDefaultChunker()
      Creates the language-specific chunker. getChunker() calls this method each time it is invoked while no chunker has been set.
    • getChunker

      @Nullable public Chunker getChunker()
      Get this language's chunker implementation or null.
      Since:
      2.3
    • setChunker

      public void setChunker(Chunker chunker)
      Set this language's chunker implementation or null.
    • createDefaultPostDisambiguationChunker

      @Nullable public Chunker createDefaultPostDisambiguationChunker()
      Creates the language-specific post-disambiguation chunker. getPostDisambiguationChunker() calls this method each time it is invoked while no post-disambiguation chunker has been set.
    • getPostDisambiguationChunker

      @Nullable public Chunker getPostDisambiguationChunker()
      Get this language's post disambiguation chunker implementation or null.
      Since:
      2.9
    • setPostDisambiguationChunker

      public void setPostDisambiguationChunker(Chunker chunker)
      Set this language's post disambiguation chunker implementation or null.
    • createDefaultJLanguageTool

      public JLanguageTool createDefaultJLanguageTool()
      Create a shared instance of JLanguageTool that rules can use for further processing. Instances are shared per Language. Because the instance is shared, do not modify it (add or remove rules or filters). Instead of enabling or disabling rules, select the desired rules from getAllActiveRules() and run them separately with rule.match(analyzedSentence). Do not call this method in a static block or to initialize a static JLanguageTool field in rule or filter classes; this could lead to a deadlock during initialization.
      Returns:
      a shared JLanguageTool instance for this language
      Since:
      6.1
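
      Sharing one instance per Language class can be sketched with ConcurrentHashMap.computeIfAbsent, keyed by class in the same spirit as the private languagetoolInstances field. SharedCheckerSketch, its nested Checker type, and the language classes are hypothetical stand-ins for LanguageTool types; the point is that every caller receives the same object, which is why it must be treated as read-only.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch of a per-class shared instance cache.
class SharedCheckerSketch {

  // Hypothetical stand-ins for two language classes.
  static final class LanguageA {}
  static final class LanguageB {}

  // Hypothetical stand-in for a JLanguageTool instance.
  static final class Checker {
    final Class<?> languageClass;
    Checker(Class<?> languageClass) { this.languageClass = languageClass; }
  }

  private static final Map<Class<?>, Checker> INSTANCES = new ConcurrentHashMap<>();

  // Creates the checker on the first request for a class and returns that same object afterwards.
  static Checker sharedChecker(Class<?> languageClass) {
    return INSTANCES.computeIfAbsent(languageClass, Checker::new);
  }

  public static void main(String[] args) {
    Checker a = sharedChecker(LanguageA.class);
    Checker b = sharedChecker(LanguageA.class);
    Checker c = sharedChecker(LanguageB.class);
    System.out.println(a == b); // true: same shared instance
    System.out.println(a == c); // false: one instance per class
  }
}
```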
    • createDefaultSynthesizer

      @Nullable public Synthesizer createDefaultSynthesizer()
      Creates the language-specific part-of-speech synthesizer. getSynthesizer() calls this method each time it is invoked while no synthesizer has been set.
    • getSynthesizer

      @Nullable public Synthesizer getSynthesizer()
      Get this language's part-of-speech synthesizer implementation or null.
    • setSynthesizer

      public void setSynthesizer(Synthesizer synthesizer)
      Set this language's part-of-speech synthesizer implementation or null.
    • getUnifier

      public Unifier getUnifier()
      Get this language's feature unifier.
      Returns:
      Feature unifier for analyzed tokens.
    • getDisambiguationUnifier

      public Unifier getDisambiguationUnifier()
      Get this language's feature unifier used for disambiguation. Note: it might be different from the normal rule unifier.
      Returns:
      Feature unifier for analyzed tokens.
    • getUnifierConfiguration

      public UnifierConfiguration getUnifierConfiguration()
      Since:
      2.3
    • getDisambiguationUnifierConfiguration

      public UnifierConfiguration getDisambiguationUnifierConfiguration()
      Since:
      2.3
    • getTranslatedName

      public final String getTranslatedName(ResourceBundle messages)
      Get the name of the language translated to the current locale, if available. Otherwise, get the untranslated name.
    • getShortCodeWithCountryAndVariant

      public final String getShortCodeWithCountryAndVariant()
      Get the short name of the language with country and variant (if any), if it is a single-country language. For generic language classes, get only a two- or three-character code.
      Since:
      3.6
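
      The rule described here can be sketched as a small function. This is a hypothetical reimplementation from the description above, not the private buildShortCodeWithCountryAndVariant() method: it appends the country only when exactly one country is listed and, in that case, also appends the variant if one is set.

```java
// Hypothetical sketch of the documented short-code rule.
class ShortCodeSketch {

  static String shortCodeWithCountryAndVariant(String shortCode, String[] countries, String variant) {
    if (countries.length != 1) {
      return shortCode; // generic language class: plain two- or three-letter code
    }
    StringBuilder code = new StringBuilder(shortCode).append('-').append(countries[0]);
    if (variant != null) {
      code.append('-').append(variant);
    }
    return code.toString();
  }

  public static void main(String[] args) {
    System.out.println(shortCodeWithCountryAndVariant("ca", new String[] {"ES"}, "valencia")); // ca-ES-valencia
    System.out.println(shortCodeWithCountryAndVariant("en", new String[] {"US", "GB"}, null)); // en
    System.out.println(shortCodeWithCountryAndVariant("de", new String[] {"DE"}, null));       // de-DE
  }
}
```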
    • buildShortCodeWithCountryAndVariant

      private String buildShortCodeWithCountryAndVariant()
    • getPatternRules

      protected List<AbstractPatternRule> getPatternRules() throws IOException
      Get the pattern rules as defined in the files returned by getRuleFileNames().
      Throws:
      IOException
      Since:
      2.7
    • toString

      public final String toString()
      Overrides:
      toString in class Object
    • isVariant

      public boolean isVariant()
      Whether this is a country variant of another language, i.e. whether it extends a subclass of Language rather than Language directly.
      Since:
      1.8
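
      The definition above can be expressed with reflection on the superclass. The class names below are hypothetical stand-ins, and this illustrates the stated rule rather than reproducing LanguageTool's code.

```java
class VariantDemo {
  public static void main(String[] args) {
    System.out.println(new EnglishSketch().isVariant());         // false
    System.out.println(new AmericanEnglishSketch().isVariant()); // true
  }
}

// Hypothetical base class standing in for Language.
abstract class BaseLanguage {
  // A variant extends a subclass of BaseLanguage rather than BaseLanguage itself.
  boolean isVariant() {
    return getClass().getSuperclass() != BaseLanguage.class;
  }
}

class EnglishSketch extends BaseLanguage {}          // generic language
class AmericanEnglishSketch extends EnglishSketch {} // country variant
```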
    • hasVariant

      public final boolean hasVariant()
      Whether this class has at least one subclass that implements variants of this language.
      Since:
      1.8
    • isExternal

      public boolean isExternal()
      For internal use only. Overridden to return true for languages that have been loaded from an external file after startup.
    • equalsConsiderVariantsIfSpecified

      public boolean equalsConsiderVariantsIfSpecified(Language otherLanguage)
      Return true if this is the same language as the given one, considering country variants only if set for both languages. For example: en = en, en = en-GB, en-GB = en-GB, but en-US != en-GB
      Since:
      1.8
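
      The comparison rule in these examples can be sketched over language tags of the form language[-COUNTRY]. The method below is hypothetical and works on plain strings; for brevity it ignores anything after the country (such as a variant like valencia).

```java
// Hypothetical sketch of the documented comparison rule on plain tags.
class LanguageMatchSketch {

  // The base language must match; countries are compared only if both tags have one.
  static boolean equalsConsiderCountryIfSpecified(String tagA, String tagB) {
    String[] a = tagA.split("-");
    String[] b = tagB.split("-");
    if (!a[0].equals(b[0])) {
      return false;
    }
    if (a.length > 1 && b.length > 1) {
      return a[1].equals(b[1]);
    }
    return true;
  }

  public static void main(String[] args) {
    System.out.println(equalsConsiderCountryIfSpecified("en", "en"));       // true
    System.out.println(equalsConsiderCountryIfSpecified("en", "en-GB"));    // true
    System.out.println(equalsConsiderCountryIfSpecified("en-GB", "en-GB")); // true
    System.out.println(equalsConsiderCountryIfSpecified("en-US", "en-GB")); // false
  }
}
```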
    • hasCountry

      private boolean hasCountry()
    • getIgnoredCharactersRegex

      public Pattern getIgnoredCharactersRegex()
      Returns:
      a compiled regular expression matching the characters to be ignored inside tokens
      Since:
      2.9
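
      A typical use for such a pattern is to strip invisible characters from a token before looking it up, e.g. in a spelling dictionary. The sketch assumes, purely for illustration, a pattern matching the soft hyphen (U+00AD); check the value actually returned for a given language rather than relying on this one. IgnoredCharsSketch is hypothetical.

```java
import java.util.regex.Pattern;

// Hypothetical sketch; the pattern is an assumed example, not taken from LanguageTool.
class IgnoredCharsSketch {

  // Assumed example pattern: the soft hyphen U+00AD.
  private static final Pattern IGNORED = Pattern.compile("[\u00AD]");

  // Removes every ignored character from the token.
  static String stripIgnored(String token) {
    return IGNORED.matcher(token).replaceAll("");
  }

  public static void main(String[] args) {
    System.out.println(stripIgnored("Ent\u00ADwick\u00ADlung")); // Entwicklung
  }
}
```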
    • getMaintainedState

      public LanguageMaintainedState getMaintainedState()
      Information about whether the support for this language in LanguageTool is actively maintained. If not, the user interface might show a warning.
      Since:
      3.3
    • isHiddenFromGui

      public boolean isHiddenFromGui()
    • isTheDefaultVariant

      private boolean isTheDefaultVariant()
    • getPriorityForId

      protected int getPriorityForId(String id)
      Returns a priority for Rule or Category Id (default: 0). Positive integers have higher priority. Negative integers have lower priority.
      Since:
      3.6
    • getRulePriority

      public int getRulePriority(Rule rule)
      Returns a priority for Rule (default: 0). Positive integers have higher priority. Negative integers have lower priority.
      Since:
      5.0
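
      Because the default is 0 and larger numbers mean higher priority, a caller could order rule IDs as sketched below. The ID-to-priority map, the rule IDs, and RulePrioritySketch are hypothetical; how LanguageTool itself consumes these priorities is not shown here.

```java
import java.util.*;

// Hypothetical sketch of ordering rule IDs by priority.
class RulePrioritySketch {

  // Unknown IDs fall back to the default priority 0; higher values sort first.
  static List<String> sortByPriority(List<String> ruleIds, Map<String, Integer> priorities) {
    List<String> sorted = new ArrayList<>(ruleIds);
    sorted.sort(Comparator.comparingInt((String id) -> priorities.getOrDefault(id, 0)).reversed());
    return sorted;
  }

  public static void main(String[] args) {
    Map<String, Integer> priorities = new HashMap<>();
    priorities.put("TYPO_RULE", 10);
    priorities.put("STYLE_RULE", -5);
    List<String> ids = Arrays.asList("STYLE_RULE", "GENERIC_RULE", "TYPO_RULE");
    System.out.println(sortByPriority(ids, priorities)); // [TYPO_RULE, GENERIC_RULE, STYLE_RULE]
  }
}
```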
    • getDefaultRulePriorityForStyle

      protected int getDefaultRulePriorityForStyle()
    • isSpellcheckOnlyLanguage

      public boolean isSpellcheckOnlyLanguage()
      Whether this language supports spell checking only and no advanced grammar and style checking.
      Since:
      4.5
    • hasNGramFalseFriendRule

      public boolean hasNGramFalseFriendRule(Language motherTongue)
      Since:
      4.6
    • getOpeningDoubleQuote

      public String getOpeningDoubleQuote()
      Since:
      5.1
    • getClosingDoubleQuote

      public String getClosingDoubleQuote()
      Since:
      5.1
    • getOpeningSingleQuote

      public String getOpeningSingleQuote()
      Since:
      5.1
    • getClosingSingleQuote

      public String getClosingSingleQuote()
      Since:
      5.1
    • isAdvancedTypographyEnabled

      public boolean isAdvancedTypographyEnabled()
      Since:
      5.1
    • toAdvancedTypography

      public String toAdvancedTypography(String input)
      Since:
      5.1
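
      Judging only by their names, the quote getters and toAdvancedTypography() relate to converting plain ASCII quotes into a language's typographic quotes. The following hypothetical sketch does this with two regular expressions, taking the opening and closing double quotes as parameters; it is not the TYPOGRAPHY_PATTERN_* logic used by the class.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch of straight-to-typographic quote conversion.
class TypographySketch {

  private static final Pattern DOUBLE_QUOTED = Pattern.compile("\"([^\"]*)\"");
  private static final Pattern WORD_APOSTROPHE = Pattern.compile("(\\w)'(\\w)");

  static String toTypographic(String input, String openDouble, String closeDouble) {
    // Replace each pair of straight double quotes with the given opening/closing quotes.
    String result = DOUBLE_QUOTED.matcher(input).replaceAll(
        Matcher.quoteReplacement(openDouble) + "$1" + Matcher.quoteReplacement(closeDouble));
    // Replace an apostrophe between two word characters with U+2019.
    return WORD_APOSTROPHE.matcher(result).replaceAll("$1\u2019$2");
  }

  public static void main(String[] args) {
    // prints the sentence with curly double quotes and a curly apostrophe
    System.out.println(toTypographic("He said \"don't\".", "\u201C", "\u201D"));
  }
}
```

      Passing the quote characters in as parameters reflects why the class exposes per-language getters: the same conversion needs different target quotes in, say, English and German.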
    • equals

      public boolean equals(Object o)
      Considers languages as equal if their language code, including the country and variant codes are equal.
      Overrides:
      equals in class Object
    • hashCode

      public int hashCode()
      Overrides:
      hashCode in class Object
    • hasMinMatchesRules

      public boolean hasMinMatchesRules()
      Some rules contain the field min_matches to check repeated patterns.
      Since:
      5.1
    • adaptSuggestion

      public String adaptSuggestion(String s, String originalErrorStr)
      Adjust a suggestion.
      Since:
      6.0
    • getConsistencyRulePrefix

      public String getConsistencyRulePrefix()
    • adjustMatch

      public RuleMatch adjustMatch(RuleMatch rm, List<String> features)
    • prepareLineForSpeller

      public List<String> prepareLineForSpeller(String s)
    • filterRuleMatches

      public List<RuleMatch> filterRuleMatches(List<RuleMatch> ruleMatches, AnnotatedText text, Set<String> enabledRules)
      This method is called by JLanguageTool before CleanOverlappingFilter removes overlapping rule matches.
      Returns:
      filtered ruleMatches
    • getMultitokenSpeller

      public MultitokenSpeller getMultitokenSpeller()
    • getPriorityMap

      public Map<String,Integer> getPriorityMap()
      Since:
      6.4