Package org.languagetool.rules.patterns
Class PatternToken
java.lang.Object
org.languagetool.rules.patterns.PatternToken
All Implemented Interfaces:
Cloneable
A part of a pattern; represents the 'token' element of grammar.xml.
Nested Class Summary
Nested Classes
Modifier and Type, Class, Description
static class PatternToken.PosToken
private static class PatternToken.RareFields: Fields that are null in most instances of PatternToken
Field Summary
Fields
Modifier and Type, Field, Description
private static final PatternToken[] EMPTY_ARRAY
private static final int EXCEPTION_VALID_NEXT_MASK: True if scope=="next".
private short flags
private static final int INFLECTED_MASK
private static final int INSIDE_MARKER_MASK
private static final int LAST_UNIFIED_MASK: Set to true on tokens that close the unification block.
private byte maxOccurrence
private static final int MAY_BE_OMITTED_MASK
private static final int NEGATION_MASK
private PatternToken.PosToken posToken
private PatternToken.RareFields rareFields
private byte skip
private static final int TEST_STRING_MASK: This var is used to determine if calling setStringElement(java.lang.String) makes sense.
private static final int TEST_WHITESPACE_MASK
private StringMatcher textMatcher
private static final int UNI_NEGATION_MASK
private static final int UNIFICATION_NEUTRAL_MASK: Determines whether the element should be ignored when doing unification
static final String UNKNOWN_TAG: Matches only tokens without any POS tag.
private static final int WHITESPACE_BEFORE_MASK
Constructor Summary
Constructors
Constructor, Description
PatternToken(boolean inflected, StringMatcher textMatcher)
PatternToken(String token, boolean caseSensitive, boolean regExp, boolean inflected): Creates an element that is used to match tokens in the text.
Method Summary
Modifier and Type, Method, Description
(package private) void addException(boolean scopeNext, boolean scopePrevious, PatternToken exception)
calcFormHints
calcLemmaHints
calcOwnPossibleStringValues
calcStringHints(boolean inflected)
clone()
compile(AnalyzedTokenReadings token, Synthesizer synth): Prepare PatternToken for matching by formatting its string token and POS (if the Element is supposed to refer to some other token).
private void doCompile(AnalyzedTokenReadings token, Synthesizer synth)
getAndGroup: Returns the group of elements linked with AND operator.
getChunkTag
getExceptionList
getMatch()
int getMaxOccurrence(): The maximum number of times the element may occur.
int getMinOccurrence(): The minimum number of times the element needs to occur.
boolean getNegation()
getOrGroup: Returns the group of elements linked with OR operator.
getPhraseName: Gets the phrase the element is in.
boolean getPOSNegation()
getPOStag
int getSkipNext(): Gets the exception scope length.
getString
private String getTestToken(AnalyzedToken token)
getUniFeatures: Get unification features and types.
boolean hasAndGroup(): Checks if this element has an AND group associated with it.
boolean hasCurrentOrNextExceptions()
boolean hasExceptionList()
(package private) boolean hasFlag(int mask)
boolean hasNextException(): Checks if the element has an exception for a next scope.
boolean hasOrGroup(): Checks if this element has an OR group associated with it.
boolean hasPreviousException(): Checks if the element has an exception for a previous token.
(package private) boolean hasStringThatMustMatch()
private PatternToken.RareFields initRareFields
boolean isAndExceptionGroupMatched: Enables testing multiple conditions specified by multiple element exceptions.
boolean isCaseSensitive(): Whether the element matches case sensitively.
boolean isExceptionMatched(AnalyzedToken token): Checks whether an exception matches.
boolean isExceptionMatchedCompletely: This method checks exceptions both in AND-group and the token.
boolean isInflected()
boolean isInsideMarker()
boolean isLastInUnification()
boolean isMatched(AnalyzedToken token): Checks whether the rule element matches the token given as a parameter.
boolean isMatchedByPreviousException(AnalyzedToken token): Checks whether an exception for a previous token matches (in case the exception had scope == "previous").
boolean isMatchedByPreviousException(AnalyzedTokenReadings prevToken): Checks whether an exception for a previous token matches all readings of a given token (in case the exception had scope == "previous").
boolean isMatchedByScopeNextException(AnalyzedToken token): Checks whether a previously set exception matches (in case the exception had scope == "next").
boolean isPartOfPhrase(): Checks if the Element is in any phrase.
boolean isPOStagRegularExpression(): Tests whether the POS matches a regular expression.
private boolean isPosTokenMatched(AnalyzedToken token): Tests if part of speech matches a given string.
boolean isReferenceElement()
boolean isRegularExpression(): Tests whether the element matches a regular expression.
boolean isSentenceStart(): Checks if the token is a sentence start.
boolean isUnificationNeutral(): Determines whether the element should be silently ignored during unification, and simply added.
boolean isUnified()
boolean isUniNegated()
boolean isWhitespaceBefore(AnalyzedToken token)
(package private) static String normalizeTextPattern(String token)
void setAndGroupElement(PatternToken andToken)
void setChunkTag(ChunkTag chunkTag)
void setExceptionSpaceBefore(boolean isWhite): Sets the attribute on the exception that determines matching of patterns that depends on whether there was a space before the token matching the exception or not.
private void setFlag(int mask, boolean value)
void setInsideMarker(boolean isInsideMarker)
void setLastInUnification()
void setMatch: Sets the reference to another token.
void setMaxOccurrence(int i): The maximum number of times this element may occur.
void setMinOccurrence(int i): The minimum number of times this element may occur.
void setNegation(boolean negation): Negates the matching so that non-matching elements match and vice-versa.
void setOrGroupElement(PatternToken orToken)
void setPhraseName(String id): Sets the phrase the element is in.
void setPosToken(PatternToken.PosToken posToken)
void setSkipNext(int i)
void setStringElement(String token)
void setStringPosException(String token, boolean regExp, boolean inflected, boolean negation, boolean scopeNext, boolean scopePrevious, String posToken, boolean posRegExp, boolean posNegation, Boolean caseSensitivity): Sets a string and/or pos exception for matching tokens.
(package private) void setTextMatcher(StringMatcher matcher)
void setUnification(Map<String, List<String>> uniFeatures)
void setUnificationNeutral(): Sets the element as ignored during unification.
void setUniNegation()
void setWhitespaceBefore(boolean isWhite)
toString()
-
Field Details
-
UNKNOWN_TAG
public static final String UNKNOWN_TAG
Matches only tokens without any POS tag.
-
EMPTY_ARRAY
private static final PatternToken[] EMPTY_ARRAY
-
INFLECTED_MASK
private static final int INFLECTED_MASK
-
NEGATION_MASK
private static final int NEGATION_MASK
-
TEST_WHITESPACE_MASK
private static final int TEST_WHITESPACE_MASK
-
WHITESPACE_BEFORE_MASK
private static final int WHITESPACE_BEFORE_MASK
-
INSIDE_MARKER_MASK
private static final int INSIDE_MARKER_MASK
-
EXCEPTION_VALID_NEXT_MASK
private static final int EXCEPTION_VALID_NEXT_MASK
True if scope=="next".
-
MAY_BE_OMITTED_MASK
private static final int MAY_BE_OMITTED_MASK
-
TEST_STRING_MASK
private static final int TEST_STRING_MASK
This var is used to determine if calling setStringElement(java.lang.String) makes sense. This method takes most time so it's best to reduce the number of its calls.
-
UNIFICATION_NEUTRAL_MASK
private static final int UNIFICATION_NEUTRAL_MASK
Determines whether the element should be ignored when doing unification.
-
UNI_NEGATION_MASK
private static final int UNI_NEGATION_MASK
-
LAST_UNIFIED_MASK
private static final int LAST_UNIFIED_MASK
Set to true on tokens that close the unification block.
-
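flags
private short flags
-
textMatcher
private StringMatcher textMatcher
-
posToken
private PatternToken.PosToken posToken
-
rareFields
private PatternToken.RareFields rareFields
-
skip
private byte skip
-
maxOccurrence
private byte maxOccurrence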
-
-
Constructor Details
-
PatternToken
public PatternToken(String token, boolean caseSensitive, boolean regExp, boolean inflected)
Creates an element that is used to match tokens in the text.
Parameters:
token - String to be matched
caseSensitive - true if the check is case-sensitive
regExp - true if the check uses regular expressions
inflected - true if the check refers to base forms (lemmas); note that token must be a base form for this to work
-
PatternToken
PatternToken(boolean inflected, @NotNull StringMatcher textMatcher)
-
-
Method Details
-
clone
Overrides:
clone in class Object
Throws:
CloneNotSupportedException
-
isMatched
public boolean isMatched(AnalyzedToken token)
Checks whether the rule element matches the token given as a parameter.
Parameters:
token - AnalyzedToken to check matching against
Returns:
True if token matches, false otherwise.
-
hasFlag
boolean hasFlag(int mask)
-
setFlag
private void setFlag(int mask, boolean value)
-
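The `private short flags` field together with the `*_MASK` constants suggests that the boolean properties of a token are packed into a single bit field read by `hasFlag` and written by `setFlag`. The following is a minimal stand-alone sketch of that technique, not the library's code: the class name `FlagSketch` and the concrete mask values are assumptions made for illustration.

```java
// Hypothetical model of a bit-field flag store; mask values are assumed, not taken from PatternToken.
final class FlagSketch {
    static final int NEGATION_MASK = 1 << 0;      // assumed value
    static final int INFLECTED_MASK = 1 << 1;     // assumed value
    static final int INSIDE_MARKER_MASK = 1 << 2; // assumed value

    // Up to 16 boolean properties fit into one short.
    private short flags;

    boolean hasFlag(int mask) {
        return (flags & mask) != 0;
    }

    void setFlag(int mask, boolean value) {
        if (value) {
            flags |= mask;  // compound assignment narrows the int result back to short
        } else {
            flags &= ~mask; // clear only the bits named by the mask
        }
    }

    public static void main(String[] args) {
        FlagSketch token = new FlagSketch();
        token.setFlag(NEGATION_MASK, true);
        token.setFlag(INSIDE_MARKER_MASK, true);
        token.setFlag(NEGATION_MASK, false);
        System.out.println(token.hasFlag(NEGATION_MASK));      // false
        System.out.println(token.hasFlag(INSIDE_MARKER_MASK)); // true
    }
}
```

Packing flags this way keeps each instance small, which is a plausible motivation when many pattern tokens are held in memory at once.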
isExceptionMatched
public boolean isExceptionMatched(AnalyzedToken token)
Checks whether an exception matches.
Parameters:
token - AnalyzedToken to check matching against
Returns:
True if any of the exceptions matches (logical disjunction).
-
isAndExceptionGroupMatched
Enables testing multiple conditions specified by multiple element exceptions. Works as a logical AND operator.
Parameters:
token - the token checked for exceptions.
Returns:
true if all conditions are met, false otherwise.
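The difference between the two checks above is disjunction versus conjunction. A small sketch of that distinction, using plain `Predicate<String>` objects as hypothetical stand-ins for exceptions (the class and method names below are illustrative and do not exist in LanguageTool):

```java
import java.util.List;
import java.util.function.Predicate;

// Hypothetical model: exceptions are represented as predicates over the token text.
final class ExceptionLogicSketch {
    // Logical OR: one matching exception is enough.
    static boolean anyExceptionMatches(List<Predicate<String>> exceptions, String token) {
        return exceptions.stream().anyMatch(e -> e.test(token));
    }

    // Logical AND: every condition has to accept the token.
    static boolean allConditionsMatch(List<Predicate<String>> conditions, String token) {
        return conditions.stream().allMatch(c -> c.test(token));
    }

    public static void main(String[] args) {
        Predicate<String> startsWithUn = s -> s.startsWith("un");
        Predicate<String> endsWithLy = s -> s.endsWith("ly");
        List<Predicate<String>> conditions = List.of(startsWithUn, endsWithLy);

        System.out.println(anyExceptionMatches(conditions, "undo"));   // true
        System.out.println(allConditionsMatch(conditions, "undo"));    // false
        System.out.println(allConditionsMatch(conditions, "unlikely")); // true
    }
}
```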
-
isExceptionMatchedCompletely
Checks exceptions both in the AND group and in the token itself. Introduced for clarity.
Parameters:
token - Token to match
Returns:
True if matched.
-
setAndGroupElement
public void setAndGroupElement(PatternToken andToken)
-
hasAndGroup
public boolean hasAndGroup()
Checks if this element has an AND group associated with it.
Returns:
true if the element has a group of elements that all should match.
-
getAndGroup
Returns the group of elements linked with AND operator.
-
setOrGroupElement
public void setOrGroupElement(PatternToken orToken)
Since:
2.3
-
hasOrGroup
public boolean hasOrGroup()
Checks if this element has an OR group associated with it.
Returns:
true if the element has a group of elements of which at least one should match.
Since:
2.3
-
getOrGroup
Returns the group of elements linked with OR operator.
Since:
2.3
-
isMatchedByScopeNextException
public boolean isMatchedByScopeNextException(AnalyzedToken token)
Checks whether a previously set exception matches (in case the exception had scope == "next").
Parameters:
token - AnalyzedToken to check matching against.
Returns:
True if any of the exceptions matches.
-
isMatchedByPreviousException
public boolean isMatchedByPreviousException(AnalyzedToken token)
Checks whether an exception for a previous token matches (in case the exception had scope == "previous").
Parameters:
token - AnalyzedToken to check matching against.
Returns:
True if any of the exceptions matches.
-
isMatchedByPreviousException
public boolean isMatchedByPreviousException(AnalyzedTokenReadings prevToken)
Checks whether an exception for a previous token matches all readings of a given token (in case the exception had scope == "previous").
Parameters:
prevToken - AnalyzedTokenReadings to check matching against.
Returns:
true if any of the exceptions matches.
-
isSentenceStart
public boolean isSentenceStart()
Checks if the token is a sentence start.
Returns:
True if the element starts the sentence and the element hasn't been set to have negated POS token.
-
setPosToken
public void setPosToken(PatternToken.PosToken posToken)
Since:
2.9
-
setChunkTag
public void setChunkTag(ChunkTag chunkTag)
Since:
2.9
-
getString
-
setStringElement
public void setStringElement(String token)
-
setTextMatcher
void setTextMatcher(StringMatcher matcher)
-
normalizeTextPattern
static String normalizeTextPattern(String token)
-
setStringPosException
public void setStringPosException(String token, boolean regExp, boolean inflected, boolean negation, boolean scopeNext, boolean scopePrevious, String posToken, boolean posRegExp, boolean posNegation, Boolean caseSensitivity)
Sets a string and/or POS exception for matching tokens.
Parameters:
token - The string in the exception.
regExp - True if the string is specified as a regular expression.
inflected - True if the string is a base form (lemma).
negation - True if the exception is negated.
scopeNext - True if the exception scope is next tokens.
scopePrevious - True if the exception should match only a single previous token.
posToken - The part-of-speech tag in the exception.
posRegExp - True if the POS is specified as a regular expression.
posNegation - True if the POS exception is negated.
caseSensitivity - if null, use this element's setting for case sensitivity, otherwise the specified value
Since:
2.9
-
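addException
void addException(boolean scopeNext, boolean scopePrevious, PatternToken exception)
-
initRareFields
-
isPosTokenMatched
private boolean isPosTokenMatched(AnalyzedToken token)
Tests if part of speech matches a given string. Special value UNKNOWN_TAG matches null POS tags.
Parameters:
token - Token to test.
Returns:
true if matches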
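The special case for UNKNOWN_TAG can be illustrated with a heavily simplified sketch. It compares plain strings only and ignores regular expressions and POS negation, which the real method may also take into account. The class name `PosTagSketch`, the helper `posMatches`, and the literal value given to `UNKNOWN_TAG` are assumptions; the source names the constant but does not state its value.

```java
// Hypothetical, simplified model of POS matching with a sentinel for untagged tokens.
final class PosTagSketch {
    static final String UNKNOWN_TAG = "UNKNOWN"; // assumed literal, not confirmed by the source

    static boolean posMatches(String patternPos, String tokenPos) {
        if (tokenPos == null) {
            // A token without any POS tag is matched only by the sentinel value.
            return UNKNOWN_TAG.equals(patternPos);
        }
        return tokenPos.equals(patternPos);
    }

    public static void main(String[] args) {
        System.out.println(posMatches(UNKNOWN_TAG, null)); // true
        System.out.println(posMatches("NN", null));        // false
        System.out.println(posMatches("NN", "NN"));        // true
    }
}
```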
-
getTestToken
private String getTestToken(AnalyzedToken token)
-
getSkipNext
public int getSkipNext()
Gets the exception scope length.
Returns:
scope length in tokens
-
getMinOccurrence
public int getMinOccurrence()
The minimum number of times the element needs to occur.
-
getMaxOccurrence
public int getMaxOccurrence()
The maximum number of times the element may occur.
-
setSkipNext
public void setSkipNext(int i)
Parameters:
i - exception scope length.
-
setMinOccurrence
public void setMinOccurrence(int i)
The minimum number of times this element must occur.
Parameters:
i - currently only 0 and 1 are supported
-
setMaxOccurrence
public void setMaxOccurrence(int i)
The maximum number of times this element may occur.
Parameters:
i - a number >= 1 or -1 for unlimited occurrences
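The documented value ranges (minimum 0 or 1; maximum at least 1, or -1 for unlimited) can be modelled as follows. This is a sketch under assumptions: the class `OccurrenceSketch` and its `accepts` helper are hypothetical, and rejecting out-of-range values with an exception is a choice made here, since the source does not say how the library reacts to them.

```java
// Hypothetical model of the min/max occurrence contract described above.
final class OccurrenceSketch {
    private int minOccurrence = 1;
    private int maxOccurrence = 1;

    void setMinOccurrence(int i) {
        if (i != 0 && i != 1) {
            throw new IllegalArgumentException("only 0 and 1 are supported: " + i);
        }
        minOccurrence = i;
    }

    void setMaxOccurrence(int i) {
        if (i != -1 && i < 1) {
            throw new IllegalArgumentException("expected a number >= 1 or -1: " + i);
        }
        maxOccurrence = i;
    }

    // True if a run of `count` consecutive matches satisfies both bounds.
    boolean accepts(int count) {
        boolean enough = count >= minOccurrence;
        boolean notTooMany = maxOccurrence == -1 || count <= maxOccurrence;
        return enough && notTooMany;
    }

    public static void main(String[] args) {
        OccurrenceSketch optionalRepeated = new OccurrenceSketch();
        optionalRepeated.setMinOccurrence(0);
        optionalRepeated.setMaxOccurrence(-1);
        System.out.println(optionalRepeated.accepts(0)); // true
        System.out.println(optionalRepeated.accepts(7)); // true
    }
}
```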
-
hasPreviousException
public boolean hasPreviousException()
Checks if the element has an exception for a previous token.
Returns:
True if the element has a previous token matching exception.
-
hasNextException
public boolean hasNextException()
Checks if the element has an exception for a next scope. (only used for testing)
Returns:
True if the element has exception for the next scope.
-
setNegation
public void setNegation(boolean negation)
Negates the matching so that non-matching elements match and vice-versa.
-
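Negation as described here amounts to flipping the raw match result. A minimal sketch, with the hypothetical helper `applyNegation` standing in for whatever the class does internally:

```java
// Hypothetical illustration of negated matching.
final class NegationSketch {
    // XOR inverts the raw result exactly when negation is switched on.
    static boolean applyNegation(boolean rawMatch, boolean negation) {
        return rawMatch ^ negation;
    }

    public static void main(String[] args) {
        System.out.println(applyNegation(true, false));  // true
        System.out.println(applyNegation(true, true));   // false
        System.out.println(applyNegation(false, true));  // true
    }
}
```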
getNegation
public boolean getNegation()
Since:
0.9.3
-
isReferenceElement
public boolean isReferenceElement()
Returns:
true when this element refers to another token.
-
setMatch
Sets the reference to another token.
Parameters:
match - Formatting object for the token reference.
-
getMatch
-
compile
Prepares the PatternToken for matching by formatting its string token and POS (if the Element is supposed to refer to some other token).
Parameters:
token - the token specified as AnalyzedTokenReadings
synth - the language synthesizer (Synthesizer)
Throws:
IOException
-
doCompile
private void doCompile(AnalyzedTokenReadings token, Synthesizer synth)
Throws:
IOException
-
setPhraseName
public void setPhraseName(String id)
Sets the phrase the element is in.
Parameters:
id - ID of the phrase.
-
isPartOfPhrase
public boolean isPartOfPhrase()
Checks if the Element is in any phrase.
Returns:
True if the Element is contained in the phrase.
-
isCaseSensitive
public boolean isCaseSensitive()
Whether the element matches case sensitively.
Since:
2.3
-
isRegularExpression
public boolean isRegularExpression()
Tests whether the element matches a regular expression.
Since:
0.9.6
-
isPOStagRegularExpression
public boolean isPOStagRegularExpression()
Tests whether the POS matches a regular expression.
Since:
1.3.0
-
getPOStag
Returns:
the POS of the Element or null
Since:
0.9.6
-
getChunkTag
Returns:
the chunk tag of the Element or null
Since:
2.3
-
getPOSNegation
public boolean getPOSNegation()
Returns:
true if the POS is negated.
-
isInflected
public boolean isInflected()
Returns:
true if the token matches all inflected forms
-
getPhraseName
Gets the phrase the element is in.
Returns:
The name of the phrase.
-
isUnified
public boolean isUnified()
-
setUnification
public void setUnification(Map<String, List<String>> uniFeatures)
-
getUniFeatures
Get unification features and types.
Returns:
A map from features to a list of types or null
Since:
1.0.1
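To make the shape of the unification map concrete, here is a small sketch that builds a value of type `Map<String, List<String>>`, the parameter type of `setUnification`. The feature names ("number", "gender") and type names are placeholders chosen for illustration; real names depend on the unification definitions of a given language and are not given in the source.

```java
import java.util.List;
import java.util.Map;

// Hypothetical example data: each feature maps to the list of types it may take.
final class UniFeaturesSketch {
    static Map<String, List<String>> exampleFeatures() {
        return Map.of(
            "number", List.of("singular"),
            "gender", List.of("masculine", "feminine"));
    }

    public static void main(String[] args) {
        // Iteration order of Map.of is unspecified, so the line order may vary.
        exampleFeatures().forEach((feature, types) ->
            System.out.println(feature + " -> " + types));
    }
}
```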
-
setUniNegation
public void setUniNegation()
-
isUniNegated
public boolean isUniNegated()
-
isLastInUnification
public boolean isLastInUnification()
-
setLastInUnification
public void setLastInUnification()
-
isUnificationNeutral
public boolean isUnificationNeutral()
Determines whether the element should be silently ignored during unification, and simply added.
Returns:
True when the element is not included in unifying.
Since:
2.5
-
setUnificationNeutral
public void setUnificationNeutral()
Sets the element as ignored during unification.
Since:
2.5
-
setWhitespaceBefore
public void setWhitespaceBefore(boolean isWhite)
-
isInsideMarker
public boolean isInsideMarker()
-
setInsideMarker
public void setInsideMarker(boolean isInsideMarker)
-
setExceptionSpaceBefore
public void setExceptionSpaceBefore(boolean isWhite)
Sets the attribute on the exception that determines matching of patterns that depends on whether there was a space before the token matching the exception or not. The same procedure is used for tokens that are valid for previous or current tokens.
Parameters:
isWhite - If true, the space before exception is required.
-
isWhitespaceBefore
public boolean isWhitespaceBefore(AnalyzedToken token)
-
getExceptionList
Returns:
A List of Exceptions. Used for testing.
Since:
1.0.0
-
hasCurrentOrNextExceptions
@Internal public boolean hasCurrentOrNextExceptions()
-
hasExceptionList
public boolean hasExceptionList()
-
calcFormHints
Returns:
all possible forms that this token pattern can accept, or null if such set is unknown/unbounded. This is used internally for performance optimizations.
-
calcLemmaHints
Returns:
all possible forms that this token pattern can accept, or null if such set is unknown/unbounded. This is used internally for performance optimizations.
-
calcStringHints
-
calcOwnPossibleStringValues
-
hasStringThatMustMatch
boolean hasStringThatMustMatch()
-
toString
-