Class CompoundAwareHunspellRule
java.lang.Object
org.languagetool.rules.Rule
org.languagetool.rules.spelling.SpellingCheckRule
org.languagetool.rules.spelling.hunspell.HunspellRule
org.languagetool.rules.spelling.hunspell.CompoundAwareHunspellRule
- Direct Known Subclasses:
GermanSpellerRule
A spell checker that combines Hunspell and Morfologik spell checking
to support compound words and offer fast suggestions for some misspelled
compound words.
-
Field Summary
Fields
private final CompoundWordTokenizer compoundSplitter
private static final int MAX_SUGGESTIONS
private final Supplier<MorfologikMultiSpeller> morfoSpeller
Fields inherited from class org.languagetool.rules.spelling.hunspell.HunspellRule
FILE_EXTENSION, hunspell, nonWordPattern, RULE_ID
Fields inherited from class org.languagetool.rules.spelling.SpellingCheckRule
CUSTOM_SPELLING_FILE, GLOBAL_SPELLING_FILE, HIGH_CONFIDENCE, ignoreWordsWithLength, language, languageModel, LANGUAGETOOL, LANGUAGETOOLER, MAX_TOKEN_LENGTH, wordListLoader, wordsToBeIgnored
-
Constructor Summary
Constructors
CompoundAwareHunspellRule(ResourceBundle messages, Language language, CompoundWordTokenizer compoundSplitter, Supplier<MorfologikMultiSpeller> morfoSpeller, UserConfig userConfig, List<Language> altLanguages, LanguageModel languageModel)
CompoundAwareHunspellRule(ResourceBundle messages, Language language, CompoundWordTokenizer compoundSplitter, MorfologikMultiSpeller morfoSpeller, UserConfig userConfig)
CompoundAwareHunspellRule(ResourceBundle messages, Language language, CompoundWordTokenizer compoundSplitter, MorfologikMultiSpeller morfoSpeller, UserConfig userConfig, List<Language> altLanguages)
CompoundAwareHunspellRule(ResourceBundle messages, Language language, CompoundWordTokenizer compoundSplitter, MorfologikMultiSpeller morfoSpeller, UserConfig userConfig, List<Language> altLanguages, LanguageModel languageModel)
-
Method Summary
protected abstract void filterForLanguage(List<String> suggestions)
getCandidates(String word)
    Finds potential corrections. It is okay if some of these are not valid words, because this list is filtered against the spell checker before being returned to the user.
getCandidates(List<String> parts)
getCorrectWords(List<String> wordsOrPhrases)
getFilteredSuggestions(List<String> wordsOrPhrases)
getSpellingFilePaths(String langCode)
protected static List<InputStream> getStreams(List<String> paths)
getSuggestions(String word)
    As a Hunspell-based approach is too slow, we use Morfologik to create suggestions.
private static void handleWordEndPunctuation(String punct, String word, List<String> noSplitSuggestions, MorfologikMultiSpeller morfoSpeller)
sortSuggestionByQuality(String misspelling, List<String> suggestions)
Methods inherited from class org.languagetool.rules.spelling.hunspell.HunspellRule
acceptSuggestion, ensureInitialized, getActiveChecks, getDescription, getDictFilenameInResources, getId, getMessage, getSentenceTextWithoutUrlsAndImmunizedTokens, getShortMessage, init, isFirstItemHighConfidenceSuggestion, isInIgnoredSet, isMisspelled, isQuotedCompound, match, tokenizeText
Methods inherited from class org.languagetool.rules.spelling.SpellingCheckRule
acceptPhrases, addIgnoreTokens, addIgnoreWords, addProhibitedWords, addSuggestionsToRuleMatch, createWrongSplitMatch, expandLine, filterDupes, filterNoSuggestWords, filterSuggestions, getAdditionalProhibitFileNames, getAdditionalSpellingFileNames, getAdditionalSuggestions, getAdditionalTopSuggestions, getAntiPatterns, getIgnoreFileName, getLanguageVariantSpellingFileName, getOnlySuggestions, getProhibitFileName, getSpellingFileName, ignorePotentiallyMisspelledWord, ignoreToken, ignoreWord, ignoreWord, isDictionaryBasedSpellingRule, isEMail, isIgnoredNoCase, isLatinScript, isProhibited, isUrl, setConsiderIgnoreWords, setConvertsCase, startsWithIgnoredWord, tokenizeNewWords
Methods inherited from class org.languagetool.rules.Rule
addExamplePair, addTags, addToneTags, cacheAntiPatterns, estimateContextForSureMatch, getCategory, getCorrectExamples, getDistanceTokens, getErrorTriggeringExamples, getFullId, getIncorrectExamples, getLocQualityIssueType, getMinPrevMatches, getPriority, getRuleOptions, getSentenceWithImmunization, getSourceFile, getSubId, getTags, getToneTags, getUrl, hasTag, hasToneTag, isDefaultOff, isDefaultTempOff, isGoalSpecific, isIncludedInHiddenMatches, isOfficeDefaultOff, isOfficeDefaultOn, isPremium, makeAntiPatterns, setCategory, setCorrectExamples, setDefaultOff, setDefaultOn, setDefaultTempOff, setDistanceTokens, setErrorTriggeringExamples, setExamplePair, setGoalSpecific, setIncludedInHiddenMatches, setIncorrectExamples, setLocQualityIssueType, setMinPrevMatches, setOfficeDefaultOff, setOfficeDefaultOn, setPremium, setPriority, setTags, setToneTags, setUrl, supportsLanguage, toRuleMatchArray, useInOffice
-
Field Details
-
MAX_SUGGESTIONS
private static final int MAX_SUGGESTIONS
-
compoundSplitter
-
morfoSpeller
-
-
Constructor Details
-
CompoundAwareHunspellRule
public CompoundAwareHunspellRule(ResourceBundle messages, Language language, CompoundWordTokenizer compoundSplitter, MorfologikMultiSpeller morfoSpeller, UserConfig userConfig)
-
CompoundAwareHunspellRule
public CompoundAwareHunspellRule(ResourceBundle messages, Language language, CompoundWordTokenizer compoundSplitter, MorfologikMultiSpeller morfoSpeller, UserConfig userConfig, List<Language> altLanguages)
- Since:
- 4.3
-
CompoundAwareHunspellRule
public CompoundAwareHunspellRule(ResourceBundle messages, Language language, CompoundWordTokenizer compoundSplitter, MorfologikMultiSpeller morfoSpeller, UserConfig userConfig, List<Language> altLanguages, LanguageModel languageModel)
-
CompoundAwareHunspellRule
public CompoundAwareHunspellRule(ResourceBundle messages, Language language, CompoundWordTokenizer compoundSplitter, Supplier<MorfologikMultiSpeller> morfoSpeller, UserConfig userConfig, List<Language> altLanguages, LanguageModel languageModel)
- Since:
- 6.4
-
-
Method Details
-
filterForLanguage
-
getSpellingFilePaths
-
getStreams
-
getSuggestions
As a Hunspell-based approach is too slow, we use Morfologik to create suggestions. As this won't work for compounds not in the dictionary, we split the word and also get suggestions on the compound parts. In the end, all candidates are filtered against Hunspell again (which supports compounds).
- Overrides:
getSuggestions in class HunspellRule
- Throws:
IOException
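The three steps described for getSuggestions (fast suggestions for the whole word, suggestions for each compound part, final filtering against a compound-aware checker) can be illustrated with a minimal sketch. This is not the LanguageTool implementation: the class CompoundSuggestionSketch, its methods suggest and demo, and the functional parameters fastSpeller, splitter and acceptedByHunspell are all hypothetical stand-ins for MorfologikMultiSpeller, CompoundWordTokenizer and the Hunspell dictionary, and the toy word lists are invented for the example.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.Function;
import java.util.function.Predicate;

// Illustrative sketch only: none of these names are part of LanguageTool.
final class CompoundSuggestionSketch {

  /**
   * @param fastSpeller        hypothetical stand-in for the Morfologik speller
   * @param splitter           hypothetical stand-in for the compound tokenizer
   * @param acceptedByHunspell hypothetical stand-in for the final Hunspell check
   */
  static List<String> suggest(String word,
                              Function<String, List<String>> fastSpeller,
                              Function<String, List<String>> splitter,
                              Predicate<String> acceptedByHunspell) {
    // Step 1: fast suggestions for the unsplit word.
    // LinkedHashSet keeps insertion order and drops duplicates.
    Set<String> candidates = new LinkedHashSet<>(fastSpeller.apply(word));

    // Step 2: split the word and correct one compound part at a time.
    List<String> parts = splitter.apply(word);
    for (int i = 0; i < parts.size(); i++) {
      for (String fix : fastSpeller.apply(parts.get(i))) {
        List<String> fixedParts = new ArrayList<>(parts);
        fixedParts.set(i, fix);
        candidates.add(String.join("", fixedParts));
      }
    }

    // Step 3: keep only candidates that the compound-aware checker accepts.
    List<String> result = new ArrayList<>();
    for (String candidate : candidates) {
      if (acceptedByHunspell.test(candidate)) {
        result.add(candidate);
      }
    }
    return result;
  }

  /** Toy data: "Hausbot" is split into "Haus" + "bot", and only "Hausboot" is accepted. */
  static List<String> demo() {
    Map<String, List<String>> fixes = Map.of("bot", List.of("boot", "brot"));
    Set<String> accepted = Set.of("Hausboot");
    return suggest("Hausbot",
        w -> fixes.getOrDefault(w, List.of()),
        w -> List.of("Haus", "bot"),
        accepted::contains);
  }
}
```

With the toy data in demo(), the unsplit word yields no fast suggestions, the part "bot" yields two candidates, and the final filter keeps only the one the stand-in checker accepts. Candidates that are not valid words are harmless at the intermediate stage because of that last filtering step.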
-
handleWordEndPunctuation
private static void handleWordEndPunctuation(String punct, String word, List<String> noSplitSuggestions, MorfologikMultiSpeller morfoSpeller)
-
getCandidates
Finds potential corrections. It is okay if some of these are not valid words, because this list is filtered against the spell checker before being returned to the user.
-
getCandidates
-
sortSuggestionByQuality
- Overrides:
sortSuggestionByQuality in class HunspellRule
-
getCorrectWords
-
getFilteredSuggestions
- Since:
- 4.7
-