Class MultiWordChunker2
java.lang.Object
org.languagetool.tagging.disambiguation.AbstractDisambiguator
org.languagetool.tagging.disambiguation.MultiWordChunker2
- All Implemented Interfaces:
Disambiguator
Multiword tagger-chunker.
Note: currently does not support:
- overlapping tagging (first matching multiword entry wins)
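The note on overlapping tagging can be illustrated with a simplified, self-contained sketch. It does not use the LanguageTool API. The FirstMatchChunker class and its Entry record are hypothetical, and tokens are plain strings. The scan runs left to right, so a match that starts earlier consumes its tokens and any entry overlapping it is skipped. Among entries that start at the same token, the earlier one in the list wins. This is one plausible reading of "first matching multiword entry wins", not the class's actual code.

```java
import java.util.Arrays;
import java.util.List;

public class FirstMatchChunker {

    // Hypothetical multiword entry: the tokens it spans and the tag it assigns.
    record Entry(List<String> tokens, String tag) {}

    // Returns one tag per input token (null where no multiword matched).
    // Entries are tried in list order at each position; the first one that
    // matches wins and its tokens are skipped, so matches never overlap.
    static String[] chunk(List<String> input, List<Entry> entries) {
        String[] tags = new String[input.size()];
        int i = 0;
        while (i < input.size()) {
            Entry hit = null;
            for (Entry e : entries) {
                int n = e.tokens().size();
                if (n > 0 && i + n <= input.size()
                        && input.subList(i, i + n).equals(e.tokens())) {
                    hit = e;
                    break;
                }
            }
            if (hit == null) {
                i++;
                continue;
            }
            for (int k = 0; k < hit.tokens().size(); k++) {
                tags[i + k] = hit.tag();
            }
            i += hit.tokens().size();
        }
        return tags;
    }

    public static void main(String[] args) {
        List<String> input = List.of("New", "York", "City", "Hall");
        List<Entry> entries = List.of(
                new Entry(List.of("New", "York"), "CITY"),
                new Entry(List.of("York", "City", "Hall"), "BUILDING"));
        // "York" is already consumed by the first match, so "York City Hall" never matches.
        System.out.println(Arrays.toString(chunk(input, entries)));
        // prints [CITY, CITY, null, null]
    }
}
```

The shadowed entry is silently ignored rather than reported, which matches the documented limitation that overlapping tagging is not supported.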
- Author:
- Andriy Rysin
Constructor Summary
Constructors:
- MultiWordChunker2(String filename)
- MultiWordChunker2(String filename, boolean allowFirstCapitalized)
Method Summary
Methods (modifier and type, method, description):
- AnalyzedSentence disambiguate(AnalyzedSentence input): Implements multiword POS tags, e.g., <ELLIPSIS> for ellipsis (...)
- protected String formatPosTag(String posTag, int position, int multiwordLength): Override this method to format the POS tag differently
- protected boolean matches(String matchText, AnalyzedTokenReadings inputTokens)
- protected AnalyzedTokenReadings prepareNewReading(String tokens, String tok, AnalyzedTokenReadings token, String tag)
- void setRemoveOtherReadings(boolean removeOtherReadings)
- void setWrapTag(boolean wrapTag)
Methods inherited from class org.languagetool.tagging.disambiguation.AbstractDisambiguator:
- preDisambiguate
Constructor Details
MultiWordChunker2
public MultiWordChunker2(String filename)
- Parameters:
filename - name of the file containing the multiwords and their tags
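MultiWordChunker2
public MultiWordChunker2(String filename, boolean allowFirstCapitalized)
- Parameters:
filename - name of the file containing the multiwords and their tags
allowFirstCapitalized - if set to true, the first word of the multiword may be capitalized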
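One way to picture the two constructor parameters is the loader sketched below. The file format is an assumption: each line is taken to hold a space-separated multiword, a tab, and a tag, with blank lines and lines starting with "#" skipped. allowFirstCapitalized is modeled by also registering a variant whose first letter is upper-cased. MultiwordFileSketch and load are hypothetical names, not part of MultiWordChunker2.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class MultiwordFileSketch {

    // Parses lines of the ASSUMED form "word1 word2<TAB>TAG" into a map from
    // multiword text to tag. Blank lines and '#' comment lines (also assumed) are skipped.
    static Map<String, String> load(List<String> lines, boolean allowFirstCapitalized) {
        Map<String, String> tags = new LinkedHashMap<>();
        for (String line : lines) {
            if (line.isBlank() || line.startsWith("#")) {
                continue;
            }
            String[] parts = line.split("\t");
            if (parts.length != 2) {
                throw new IllegalArgumentException("Expected <multiword>TAB<tag>: " + line);
            }
            String multiword = parts[0].trim();
            String tag = parts[1].trim();
            tags.putIfAbsent(multiword, tag); // first entry for a multiword wins
            if (allowFirstCapitalized && !multiword.isEmpty()) {
                // Also accept the variant whose first word starts with an upper-case letter.
                String capitalized = Character.toUpperCase(multiword.charAt(0)) + multiword.substring(1);
                tags.putIfAbsent(capitalized, tag);
            }
        }
        return tags;
    }

    public static void main(String[] args) {
        List<String> lines = List.of("# comment", "a priori\tADV", "", "in vitro\tADV");
        System.out.println(load(lines, false)); // {a priori=ADV, in vitro=ADV}
        System.out.println(load(lines, true));  // {a priori=ADV, A priori=ADV, in vitro=ADV, In vitro=ADV}
    }
}
```

Precomputing the capitalized variant keeps lookup a plain map access; checking case at match time would be an equally valid design.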
Method Details
setRemoveOtherReadings
public void setRemoveOtherReadings(boolean removeOtherReadings)
- Parameters:
removeOtherReadings - if true, the other readings of the matched tokens are removed when a multiword matches
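To make the effect of removeOtherReadings concrete, here is a hypothetical sketch in which a token's readings are plain POS-tag strings. With the flag set, a matched token keeps only the multiword reading. Otherwise the multiword reading is added alongside the existing ones. This models the documented behavior only; ReadingsSketch and applyMultiword are illustrative names, not the AnalyzedTokenReadings API.

```java
import java.util.ArrayList;
import java.util.List;

public class ReadingsSketch {

    // Returns the readings a matched token ends up with.
    static List<String> applyMultiword(List<String> readings, String multiwordTag,
                                       boolean removeOtherReadings) {
        List<String> result = new ArrayList<>();
        if (!removeOtherReadings) {
            result.addAll(readings); // keep the token's original readings
        }
        result.add(multiwordTag);
        return result;
    }

    public static void main(String[] args) {
        List<String> readings = List.of("NN", "VB");
        System.out.println(applyMultiword(readings, "ADV", false)); // [NN, VB, ADV]
        System.out.println(applyMultiword(readings, "ADV", true));  // [ADV]
    }
}
```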
setWrapTag
public void setWrapTag(boolean wrapTag)
- Parameters:
wrapTag - if true, the tag is wrapped in < and >
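formatPosTag
protected String formatPosTag(String posTag, int position, int multiwordLength)
Override this method to format the POS tag differently.
- Parameters:
posTag - POS tag for the multiword
position - position of the token in the multiword
multiwordLength - length of the multiword
- Returns:
the formatted POS tag for the multiword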
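Because formatPosTag is protected and meant to be overridden, a subclass can change how the per-token tags look. The sketch below uses a hypothetical stand-in base class rather than MultiWordChunker2 itself, since the real default formatting is not described on this page. The base version simply honors a wrapTag flag, and the BIO-style override (B- on the first token, I- on the rest) is only an illustration.

```java
public class FormatPosTagSketch {

    // Hypothetical stand-in for the chunker; only the formatting hook is modeled.
    static class BaseChunker {
        protected boolean wrapTag = true; // mirrors setWrapTag(boolean)

        protected String formatPosTag(String posTag, int position, int multiwordLength) {
            return wrapTag ? "<" + posTag + ">" : posTag;
        }

        // Produces the formatted tag for every token of a multiword.
        String[] tagAll(String posTag, int multiwordLength) {
            String[] out = new String[multiwordLength];
            for (int i = 0; i < multiwordLength; i++) {
                out[i] = formatPosTag(posTag, i, multiwordLength);
            }
            return out;
        }
    }

    // Example override: BIO-style tags, B- on the first token and I- on the rest.
    static class BioChunker extends BaseChunker {
        @Override
        protected String formatPosTag(String posTag, int position, int multiwordLength) {
            return (position == 0 ? "B-" : "I-") + posTag;
        }
    }

    public static void main(String[] args) {
        System.out.println(String.join(" ", new BaseChunker().tagAll("ADV", 3))); // <ADV> <ADV> <ADV>
        System.out.println(String.join(" ", new BioChunker().tagAll("ADV", 3)));  // B-ADV I-ADV I-ADV
    }
}
```

Passing both position and multiwordLength lets an override distinguish first, inner, and last tokens without extra state.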
disambiguate
Implements multiword POS tags, e.g., <ELLIPSIS> for the start of an ellipsis (...) and </ELLIPSIS> for its end.
- Parameters:
input - the tokens to be chunked
- Returns:
the AnalyzedSentence with the additional markers
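The start/end marker scheme can be sketched on the ellipsis example, assuming "..." arrives as three separate "." tokens. In this simplified model only the first and last tokens of a match receive a marker; whether MultiWordChunker2 also tags the inner tokens is not stated on this page. EllipsisMarkerSketch and mark are hypothetical names.

```java
import java.util.Arrays;
import java.util.List;

public class EllipsisMarkerSketch {

    // Adds <TAG> to the first and </TAG> to the last token of every
    // non-overlapping occurrence of the multiword; other positions stay null.
    static String[] mark(List<String> tokens, List<String> multiword, String tag) {
        int n = multiword.size();
        if (n < 2) {
            // With one token the start and end markers would collide on the same slot.
            throw new IllegalArgumentException("multiword must have at least two tokens");
        }
        String[] markers = new String[tokens.size()];
        int i = 0;
        while (i + n <= tokens.size()) {
            if (tokens.subList(i, i + n).equals(multiword)) {
                markers[i] = "<" + tag + ">";
                markers[i + n - 1] = "</" + tag + ">";
                i += n; // skip the matched tokens, so matches never overlap
            } else {
                i++;
            }
        }
        return markers;
    }

    public static void main(String[] args) {
        List<String> tokens = List.of("Wait", ".", ".", ".", "what");
        System.out.println(Arrays.toString(mark(tokens, List.of(".", ".", "."), "ELLIPSIS")));
        // prints [null, <ELLIPSIS>, null, </ELLIPSIS>, null]
    }
}
```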
matches
protected boolean matches(String matchText, AnalyzedTokenReadings inputTokens)
prepareNewReading
protected AnalyzedTokenReadings prepareNewReading(String tokens, String tok, AnalyzedTokenReadings token, String tag)