Package org.languagetool.rules.ngrams
Class ConfusionProbabilityRule
java.lang.Object
org.languagetool.rules.Rule
org.languagetool.rules.ngrams.ConfusionProbabilityRule
- Direct Known Subclasses:
ArabicConfusionProbabilityRule, ChineseConfusionProbabilityRule, ConfusionProbabilityRule.SpecificIdRule, DutchConfusionProbabilityRule, EnglishConfusionProbabilityRule, EnglishForL2SpeakersFalseFriendRule, FrenchConfusionProbabilityRule, GermanConfusionProbabilityRule, ItalianConfusionProbabilityRule, PortugueseConfusionProbabilityRule, RussianConfusionProbabilityRule, SpanishConfusionProbabilityRule
LanguageTool's homophone confusion check. It uses ngram lookups
to decide which word in a confusion set (from
confusion_sets.txt) fits best.
Also see https://dev.languagetool.org/finding-errors-using-n-gram-data.
- Since:
- 2.7
-
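The idea of choosing between confusable words by ngram lookups can be illustrated with a minimal, self-contained sketch. It does not use the LanguageTool API: the class `NgramChoiceSketch`, its toy counts, and the way `factor` is applied are all assumptions made for illustration, not the rule's actual implementation.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch, not LanguageTool code.
final class NgramChoiceSketch {

  // Toy 3-gram counts standing in for a real language model (made-up numbers).
  private static final Map<String, Long> COUNTS = new HashMap<>();
  static {
    COUNTS.put("over there is", 900L);
    COUNTS.put("over their is", 3L);
  }

  // Count of the 3-gram "left word right", or 0 if it was never seen.
  static long count(String left, String word, String right) {
    return COUNTS.getOrDefault(left + " " + word + " " + right, 0L);
  }

  // Returns the alternative if it is at least `factor` times more frequent
  // than the word actually used in this context, otherwise null.
  // Assumption: the real rule may combine probabilities differently.
  static String betterAlternativeOrNull(String left, String used, String right,
                                        String alternative, long factor) {
    long usedCount = Math.max(count(left, used, right), 1L); // avoid multiplying by zero
    long altCount = count(left, alternative, right);
    return altCount >= factor * usedCount ? alternative : null;
  }

  public static void main(String[] args) {
    System.out.println(betterAlternativeOrNull("over", "their", "is", "there", 100));
    System.out.println(betterAlternativeOrNull("over", "there", "is", "their", 100));
  }
}
```

A high factor makes the sketch conservative: the alternative is only suggested when the context favors it by a wide margin.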
Nested Class Summary
Nested Classes
private static class ConfusionProbabilityRule.PathAndLanguage
(package private) static class ConfusionProbabilityRule.SpecificIdRule
-
Field Summary
Fields
private final List<DisambiguationPatternRule> antiPatterns
private static final com.google.common.cache.LoadingCache<ConfusionProbabilityRule.PathAndLanguage, Map<String, List<ConfusionPair>>> confSetCache
private static final boolean DEBUG
private final int grams
private final Language language
private final LanguageModel lm
static final float MIN_COVERAGE
private static final double MIN_PROB
private static final Pattern REAL_WORD
static final String RULE_ID: Deprecated.
private final Map<String, List<ConfusionPair>> wordToPairs
-
Constructor Summary
Constructors
ConfusionProbabilityRule(ResourceBundle messages, LanguageModel languageModel, Language language)
ConfusionProbabilityRule(ResourceBundle messages, LanguageModel languageModel, Language language, int grams)
ConfusionProbabilityRule(ResourceBundle messages, LanguageModel languageModel, Language language, int grams, List<String> exceptions)
ConfusionProbabilityRule(ResourceBundle messages, LanguageModel languageModel, Language language, int grams, List<String> exceptions, List<List<PatternToken>> antiPatterns)
-
Method Summary
private String cleanId
private boolean covers(int exceptionStartPos, int exceptionEndPos, int startPos, int endPos)
private void debug
int estimateContextForSureMatch(): A number that estimates how many words there must be after a match before we can be (relatively) sure the match is valid.
private ConfusionString getAlternativeTerm(List<ConfusionString> confusionSet, GoogleToken token)
getAntiPatterns(): Override this to avoid false alarms by ignoring these patterns. Note that your Rule.match(AnalyzedSentence) method needs to call Rule.getSentenceWithImmunization(org.languagetool.AnalyzedSentence) for this to be used, and you need to check AnalyzedTokenReadings.isImmunized().
private ConfusionString getBetterAlternativeOrNull(GoogleToken token, List<GoogleToken> tokens, List<ConfusionString> confusionSet, long factor)
private ConfusionString getBetterAlternativeOrNull(GoogleToken token, List<GoogleToken> tokens, ConfusionString otherWord, long factor)
private ConfusionString getConfusionString(List<ConfusionString> confusionSet, GoogleToken token)
getDescription(): A short description of the error this rule can detect, usually in the language of the text that is checked.
private String getDescription(String word1, String word2)
getId(): A string used to identify the rule in e.g. configuration files.
protected String getMessage(ConfusionString textString, ConfusionString suggestion)
int getNGrams(): Returns the ngram level used, typically 3.
getSuggestions(String message)
protected boolean isCommonWord(String token)
private boolean isCoveredByAntiPattern(AnalyzedSentence sentence, GoogleToken googleToken)
protected boolean isException(String sentenceText, int startPos, int endPos): Return true to prevent a match.
private boolean isLocalException(AnalyzedSentence sentence, GoogleToken googleToken)
private boolean isRealWord(String token)
match(AnalyzedSentence sentence): Check whether the given sentence matches this error rule, i.e. whether it contains the error detected by this rule.
void setConfusionPair: Deprecated. Used only for tests.
Methods inherited from class org.languagetool.rules.Rule
addExamplePair, addTags, addToneTags, cacheAntiPatterns, getCategory, getCorrectExamples, getDistanceTokens, getErrorTriggeringExamples, getFullId, getIncorrectExamples, getLocQualityIssueType, getMinPrevMatches, getPriority, getRuleOptions, getSentenceWithImmunization, getSourceFile, getSubId, getTags, getToneTags, getUrl, hasTag, hasToneTag, isDefaultOff, isDefaultTempOff, isDictionaryBasedSpellingRule, isGoalSpecific, isIncludedInHiddenMatches, isOfficeDefaultOff, isOfficeDefaultOn, isPremium, makeAntiPatterns, setCategory, setCorrectExamples, setDefaultOff, setDefaultOn, setDefaultTempOff, setDistanceTokens, setErrorTriggeringExamples, setExamplePair, setGoalSpecific, setIncludedInHiddenMatches, setIncorrectExamples, setLocQualityIssueType, setMinPrevMatches, setOfficeDefaultOff, setOfficeDefaultOn, setPremium, setPriority, setTags, setToneTags, setUrl, supportsLanguage, toRuleMatchArray, useInOffice
-
Field Details
-
RULE_ID
Deprecated. Not used anymore; the id is now more specific (like CONFUSION_RULE_TERM1_TERM2).
- Since:
- 3.1
-
MIN_COVERAGE
public static final float MIN_COVERAGE
-
MIN_PROB
private static final double MIN_PROB
-
DEBUG
private static final boolean DEBUG
-
REAL_WORD
-
confSetCache
private static final com.google.common.cache.LoadingCache<ConfusionProbabilityRule.PathAndLanguage, Map<String, List<ConfusionPair>>> confSetCache -
wordToPairs
-
lm
-
grams
private final int grams -
language
-
exceptions
-
antiPatterns
-
-
Constructor Details
-
ConfusionProbabilityRule
public ConfusionProbabilityRule(ResourceBundle messages, LanguageModel languageModel, Language language) -
ConfusionProbabilityRule
public ConfusionProbabilityRule(ResourceBundle messages, LanguageModel languageModel, Language language, int grams) -
ConfusionProbabilityRule
public ConfusionProbabilityRule(ResourceBundle messages, LanguageModel languageModel, Language language, int grams, List<String> exceptions) - Since:
- 4.7
-
ConfusionProbabilityRule
public ConfusionProbabilityRule(ResourceBundle messages, LanguageModel languageModel, Language language, int grams, List<String> exceptions, List<List<PatternToken>> antiPatterns)
-
-
Method Details
-
getFilenames
-
getId
Description copied from class: Rule
A string used to identify the rule in e.g. configuration files. This string is supposed to be unique and to stay the same in all upcoming versions of LanguageTool. It's supposed to contain only the characters A-Z and the underscore. -
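As an illustration of the ID constraint (only A-Z and the underscore) and of pair-specific IDs in the style of CONFUSION_RULE_TERM1_TERM2, here is a small sketch. The class `RuleIdSketch` and its helpers are hypothetical; how LanguageTool itself normalizes terms (for example in the private `cleanId` method) is not documented on this page, so the replacement strategy below is an assumption.

```java
import java.util.Locale;

// Hypothetical sketch, not LanguageTool code.
final class RuleIdSketch {

  // Upper-cases a term and replaces every character outside A-Z with '_'.
  // Assumption: the real normalization may differ.
  static String clean(String term) {
    return term.toUpperCase(Locale.ROOT).replaceAll("[^A-Z]", "_");
  }

  // Builds an ID specific to one confusion pair.
  static String specificId(String term1, String term2) {
    return "CONFUSION_RULE_" + clean(term1) + "_" + clean(term2);
  }

  // True if the ID consists only of the characters A-Z and the underscore.
  static boolean isValidId(String id) {
    return id.matches("[A-Z_]+");
  }

  public static void main(String[] args) {
    String id = specificId("their", "there");
    System.out.println(id + " valid=" + isValidId(id));
  }
}
```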
estimateContextForSureMatch
public int estimateContextForSureMatch()
Description copied from class: Rule
A number that estimates how many words there must be after a match before we can be (relatively) sure the match is valid. This is useful for check-as-you-type, where a match might occur and the word that gets typed next makes the match disappear (something one would obviously like to avoid). Note: this may over-estimate the real context size. Returns -1 when the sentence needs to end to be sure there's a match.
- Overrides:
estimateContextForSureMatch in class Rule
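A check-as-you-type client could use this estimate roughly as sketched below. The class `SureMatchSketch` and the method `canShowMatch` are hypothetical and only model the contract described above (a word count, or -1 meaning the sentence must end); they are not part of LanguageTool.

```java
// Hypothetical sketch, not LanguageTool code.
final class SureMatchSketch {

  // estimate:        value returned by estimateContextForSureMatch()
  // wordsAfterMatch: number of words already typed after the matched word
  // sentenceEnded:   true once the sentence is complete
  static boolean canShowMatch(int estimate, int wordsAfterMatch, boolean sentenceEnded) {
    if (sentenceEnded) {
      return true;  // no further context can arrive for this sentence
    }
    if (estimate == -1) {
      return false; // only the end of the sentence makes the match sure
    }
    return wordsAfterMatch >= estimate;
  }

  public static void main(String[] args) {
    System.out.println(canShowMatch(2, 1, false));
    System.out.println(canShowMatch(2, 2, false));
    System.out.println(canShowMatch(-1, 5, false));
  }
}
```

Because the estimate may be larger than the context actually needed, a client following this sketch would sometimes show a match later than necessary, but would not show one that the next typed word removes.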
-
match
Description copied from class: Rule
Check whether the given sentence matches this error rule, i.e. whether it contains the error detected by this rule. Note that the order in which this method is called is not always guaranteed, i.e. the sentence order in the text may be different from the order in which you get the sentences (this may be the case when LanguageTool is used as a LibreOffice/OpenOffice add-on, for example). In other words, implementations must be stateless, so that a previous call to this method has no influence on later calls. -
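The statelessness requirement can be pictured with a toy check whose result depends only on the sentence passed in. `StatelessMatchSketch` is a hypothetical stand-in that works on plain strings rather than `AnalyzedSentence`, and it returns character offsets instead of rule matches.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch, not LanguageTool code.
final class StatelessMatchSketch {

  // No mutable fields: nothing is remembered between calls, so sentences
  // can be checked in any order with identical results.
  static List<Integer> match(String sentence) {
    List<Integer> positions = new ArrayList<>();
    String needle = "their is"; // toy error pattern
    int from = sentence.indexOf(needle);
    while (from >= 0) {
      positions.add(from);
      from = sentence.indexOf(needle, from + 1);
    }
    return positions;
  }

  public static void main(String[] args) {
    System.out.println(match("I think their is a problem."));
    System.out.println(match("There is no problem."));
  }
}
```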
isCommonWord
-
isCoveredByAntiPattern
-
cleanId
-
isRealWord
-
isLocalException
-
covers
private boolean covers(int exceptionStartPos, int exceptionEndPos, int startPos, int endPos) -
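This page gives only the signature of `covers`. One plausible reading, stated here as an assumption and not as the actual implementation, is an interval containment test: an exception spanning `[exceptionStartPos, exceptionEndPos)` covers a candidate match at `[startPos, endPos)` if it fully contains it. The helper `isCoveredByPhrase` is hypothetical.

```java
// Hypothetical sketch, not LanguageTool code.
final class ExceptionCoverSketch {

  // Assumed semantics: the exception span fully contains the match span.
  static boolean covers(int exceptionStartPos, int exceptionEndPos, int startPos, int endPos) {
    return exceptionStartPos <= startPos && exceptionEndPos >= endPos;
  }

  // True if any occurrence of `phrase` in the sentence covers the span [startPos, endPos).
  static boolean isCoveredByPhrase(String sentenceText, String phrase, int startPos, int endPos) {
    int from = sentenceText.indexOf(phrase);
    while (from >= 0) {
      if (covers(from, from + phrase.length(), startPos, endPos)) {
        return true;
      }
      from = sentenceText.indexOf(phrase, from + 1);
    }
    return false;
  }

  public static void main(String[] args) {
    // "their" occupies offsets 6 (inclusive) to 11 (exclusive) in this sentence.
    System.out.println(isCoveredByPhrase("I saw their car", "their car", 6, 11));
  }
}
```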
getSuggestions
-
isException
Return true to prevent a match. -
getDescription
Description copied from class: Rule
A short description of the error this rule can detect, usually in the language of the text that is checked.
- Specified by:
getDescription in class Rule
-
getDescription
-
getMessage
-
setConfusionPair
Deprecated. Used only for tests. -
getNGrams
public int getNGrams()
Returns the ngram level used, typically 3.
- Since:
- 3.1
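To make the ngram level concrete: with a level of 3, every window of three consecutive tokens that includes the word under test is a candidate lookup. The sketch below enumerates those windows. `NgramWindowSketch` is a hypothetical helper working on plain strings; which of these windows the rule actually queries is not stated on this page.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch, not LanguageTool code.
final class NgramWindowSketch {

  // Returns all n-grams of size n that contain the token at `index`
  // and fit entirely inside the token list.
  static List<String> ngramsContaining(List<String> tokens, int index, int n) {
    List<String> result = new ArrayList<>();
    for (int start = index - n + 1; start <= index; start++) {
      int end = start + n; // exclusive
      if (start >= 0 && end <= tokens.size()) {
        result.add(String.join(" ", tokens.subList(start, end)));
      }
    }
    return result;
  }

  public static void main(String[] args) {
    List<String> tokens = Arrays.asList("I", "think", "their", "is", "a");
    for (String ngram : ngramsContaining(tokens, 2, 3)) {
      System.out.println(ngram);
    }
  }
}
```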
-
getBetterAlternativeOrNull
@Nullable private ConfusionString getBetterAlternativeOrNull(GoogleToken token, List<GoogleToken> tokens, List<ConfusionString> confusionSet, long factor) -
getAlternativeTerm
-
getConfusionString
-
getBetterAlternativeOrNull
private ConfusionString getBetterAlternativeOrNull(GoogleToken token, List<GoogleToken> tokens, ConfusionString otherWord, long factor) -
debug
-
getAntiPatterns
Description copied from class: Rule
Override this to avoid false alarms by ignoring these patterns. Note that your Rule.match(AnalyzedSentence) method needs to call Rule.getSentenceWithImmunization(org.languagetool.AnalyzedSentence) for this to be used, and you need to check AnalyzedTokenReadings.isImmunized().
- Overrides:
getAntiPatterns in class Rule
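The interplay of anti-patterns and immunization can be modeled in miniature: tokens matched by an anti-pattern are flagged, and the check then skips flagged tokens. Everything in this sketch (`ImmunizationSketch`, its `Token` class, and matching anti-patterns as plain word sequences) is a simplified assumption; the real mechanism works on `AnalyzedSentence` and pattern tokens.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch, not LanguageTool code.
final class ImmunizationSketch {

  static final class Token {
    final String text;
    boolean immunized;
    Token(String text) { this.text = text; }
  }

  static List<Token> tokens(String... words) {
    List<Token> result = new ArrayList<>();
    for (String word : words) {
      result.add(new Token(word));
    }
    return result;
  }

  // Flags every token that is part of an occurrence of the anti-pattern.
  static void immunize(List<Token> tokens, List<String> antiPattern) {
    for (int i = 0; i + antiPattern.size() <= tokens.size(); i++) {
      boolean allMatch = true;
      for (int j = 0; j < antiPattern.size(); j++) {
        if (!tokens.get(i + j).text.equalsIgnoreCase(antiPattern.get(j))) {
          allMatch = false;
          break;
        }
      }
      if (allMatch) {
        for (int j = 0; j < antiPattern.size(); j++) {
          tokens.get(i + j).immunized = true;
        }
      }
    }
  }

  // Indexes of tokens equal to `word` that are not immunized and may still be checked.
  static List<Integer> candidates(List<Token> tokens, String word) {
    List<Integer> result = new ArrayList<>();
    for (int i = 0; i < tokens.size(); i++) {
      Token token = tokens.get(i);
      if (!token.immunized && token.text.equalsIgnoreCase(word)) {
        result.add(i);
      }
    }
    return result;
  }

  public static void main(String[] args) {
    List<Token> sentence = tokens("their", "majesty", "said", "their");
    List<String> antiPattern = new ArrayList<>();
    antiPattern.add("their");
    antiPattern.add("majesty");
    immunize(sentence, antiPattern);
    System.out.println(candidates(sentence, "their"));
  }
}
```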
-