Class MultiWordChunker
java.lang.Object
org.languagetool.tagging.disambiguation.AbstractDisambiguator
org.languagetool.tagging.disambiguation.MultiWordChunker
- All Implemented Interfaces:
Disambiguator
Multiword tagger-chunker.
-
Nested Class Summary
Nested Classes -
Field Summary
Fields
Modifier and Type    Field    Description
private boolean
private static final Map<MultiWordChunker.Settings, MultiWordChunker>
private static final String
private static final Pattern
private boolean
private boolean
private static final int
private Map<String, AnalyzedToken>
private Map<String, AnalyzedToken>
private String
private final MultiWordChunker.Settings
static String
-
Constructor Summary
Constructors -
Method Summary
Modifier and Type    Method    Description
disambiguate(AnalyzedSentence input)
If possible, filters out the wrong POS tags.
final AnalyzedSentence disambiguate(AnalyzedSentence input, JLanguageTool.CheckCancelledCallback checkCanceled)
Implements multiword POS tags, e.g., <ELLIPSIS> for ellipsis (...) start, and </ELLIPSIS> for ellipsis end.
private void fillMaps(Map<String, Integer> mStartSpace, Map<String, Integer> mStartNoSpace, Map<String, AnalyzedToken> mFullSpace, Map<String, AnalyzedToken> mFullNoSpace)
static MultiWordChunker getInstance(String filename)
static MultiWordChunker getInstance(String filename, boolean allowFirstCapitalized, boolean allowAllUppercase, boolean allowTitlecase)
static MultiWordChunker getInstance(String filename, boolean allowFirstCapitalized, boolean allowAllUppercase, boolean allowTitlecase, String defaultTag)
private AnalyzedToken getMultiWordAnalyzedToken(AnalyzedTokenReadings[] aTokens, Integer i)
private String getNextPosTag(String postag)
getTokenLettercaseVariants(String originalToken, Map<String, AnalyzedToken> tokenMap)
private boolean isLowPriorityTag(String tag)
private void lazyInit()
loadWords(InputStream stream)
private AnalyzedTokenReadings prepareNewReading(AnalyzedToken at, String token, AnalyzedTokenReadings atrs, boolean isLast)
private AnalyzedTokenReadings[] removePreviousTags(AnalyzedTokenReadings[] aTokens)
private AnalyzedTokenReadings setAndAnnotate(AnalyzedTokenReadings oldReading, AnalyzedToken newReading)
void setIgnoreSpelling(boolean ignoreSpelling)
void setRemovePreviousTags(boolean removePreviousTags)
Methods inherited from class org.languagetool.tagging.disambiguation.AbstractDisambiguator
preDisambiguate
-
Field Details
-
chunkerCache
-
settings
-
initialized
private volatile boolean initialized -
mStartSpace
-
mStartNoSpace
-
mFullSpace
-
mFullNoSpace
-
MAX_TOKENS_IN_MULTIWORD
private static final int MAX_TOKENS_IN_MULTIWORD
- See Also:
-
DEFAULT_SEPARATOR
- See Also:
-
separator
-
addIgnoreSpelling
private boolean addIgnoreSpelling -
isRemovePreviousTags
private boolean isRemovePreviousTags -
tagForNotAddingTags
-
GermanLineExpander
-
-
Constructor Details
-
MultiWordChunker
-
-
Method Details
-
lazyInit
private void lazyInit() -
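The combination of a `private volatile boolean initialized` field and a private `lazyInit()` method suggests that the word lists are loaded on first use rather than in the constructor. The actual body of `lazyInit()` is not shown on this page, so the following is only a minimal sketch of one common way to implement such a guard (double-checked locking). The class name `LazyChunkerSketch`, the `phrases` map, and the `loadCount` counter are hypothetical and exist only for illustration.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in, not the real MultiWordChunker.
class LazyChunkerSketch {

  // Volatile so that a thread seeing `true` also sees the fully built map.
  private volatile boolean initialized;

  // Hypothetical stand-in for the chunker's lookup maps.
  private Map<String, String> phrases;

  // Counts how often the data was loaded (demonstration only).
  private int loadCount;

  private void lazyInit() {
    if (initialized) {
      return;                              // fast path, no locking
    }
    synchronized (this) {
      if (initialized) {
        return;                            // another thread finished first
      }
      Map<String, String> loaded = new HashMap<>();
      loaded.put(". . .", "ELLIPSIS");     // stands in for reading the word file
      phrases = loaded;
      loadCount++;
      initialized = true;                  // volatile write publishes `phrases`
    }
  }

  String lookup(String phrase) {
    lazyInit();
    return phrases.get(phrase);
  }

  int loadCount() {
    return loadCount;
  }
}
```

Calling `lookup` repeatedly loads the data only once; every later call takes the lock-free fast path.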
fillMaps
-
getTokenLettercaseVariants
-
disambiguate
Description copied from interface: Disambiguator
If possible, filters out the wrong POS tags.
- Parameters:
input - The sentence with already tagged words. The words are expected to have multiple tags.
- Returns:
Analyzed sentence, where each word has only one (possibly the most correct) tag.
- Throws:
IOException
-
disambiguate
public final AnalyzedSentence disambiguate(AnalyzedSentence input, @Nullable JLanguageTool.CheckCancelledCallback checkCanceled) throws IOException
Implements multiword POS tags, e.g., <ELLIPSIS> for ellipsis (...) start, and </ELLIPSIS> for ellipsis end.
- Parameters:
input - The tokens to be chunked.
- Returns:
AnalyzedSentence with additional markers.
- Throws:
IOException
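To make the start and end markers concrete, here is a minimal, self-contained sketch of the idea described above: the first token of a known multiword receives an opening tag and the last token a closing tag. It does not use the LanguageTool classes; tokens are plain strings, whitespace tokens are left out, and the class `MultiWordTagSketch`, its `tag` method, and the `MAX_TOKENS` value are assumptions made for illustration, not the library's implementation.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical illustration, not the real MultiWordChunker.
class MultiWordTagSketch {

  // Assumed upper bound on phrase length. The real class has a
  // MAX_TOKENS_IN_MULTIWORD constant whose value is not shown on this page.
  static final int MAX_TOKENS = 20;

  /**
   * Returns one entry per token: "<TAG>" on the first token of a matched
   * multiword, "</TAG>" on its last token, and null everywhere else.
   * Phrases are keyed by their tokens joined with single spaces.
   */
  static List<String> tag(List<String> tokens, Map<String, String> phrases) {
    List<String> tags = new ArrayList<>();
    for (int k = 0; k < tokens.size(); k++) {
      tags.add(null);
    }
    int i = 0;
    while (i < tokens.size()) {
      int matchedEnd = -1;
      String matchedTag = null;
      StringBuilder candidate = new StringBuilder();
      for (int j = i; j < tokens.size() && j - i < MAX_TOKENS; j++) {
        if (j > i) {
          candidate.append(' ');
        }
        candidate.append(tokens.get(j));
        String found = phrases.get(candidate.toString());
        if (found != null && j > i) {      // only spans of two or more tokens
          matchedEnd = j;                  // keep going: the longest match wins
          matchedTag = found;
        }
      }
      if (matchedEnd >= 0) {
        tags.set(i, "<" + matchedTag + ">");
        tags.set(matchedEnd, "</" + matchedTag + ">");
        i = matchedEnd + 1;                // continue after the multiword
      } else {
        i++;
      }
    }
    return tags;
  }
}
```

For the tokens `Wait`, `.`, `.`, `.`, `ok` and a phrase table mapping `. . .` to `ELLIPSIS`, the second token is marked `<ELLIPSIS>` and the fourth `</ELLIPSIS>`; the other tokens stay unmarked.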
-
prepareNewReading
private AnalyzedTokenReadings prepareNewReading(AnalyzedToken at, String token, AnalyzedTokenReadings atrs, boolean isLast) -
setAndAnnotate
private AnalyzedTokenReadings setAndAnnotate(AnalyzedTokenReadings oldReading, AnalyzedToken newReading) -
loadWords
-
setIgnoreSpelling
public void setIgnoreSpelling(boolean ignoreSpelling) -
setRemovePreviousTags
public void setRemovePreviousTags(boolean removePreviousTags) -
removePreviousTags
-
getMultiWordAnalyzedToken
-
getNextPosTag
-
isLowPriorityTag
-
getInstance
- Parameters:
filename - text file with multiwords and tags
-
getInstance
@NotNull public static MultiWordChunker getInstance(@NotNull String filename, boolean allowFirstCapitalized, boolean allowAllUppercase, boolean allowTitlecase) -
getInstance
@NotNull public static MultiWordChunker getInstance(@NotNull String filename, boolean allowFirstCapitalized, boolean allowAllUppercase, boolean allowTitlecase, @Nullable String defaultTag)
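The three boolean parameters are not described on this page. Judging only by their names, they appear to control which letter-case variants of a listed multiword are also accepted. The sketch below illustrates that reading with plain strings; it is an assumption about the intent, not the behavior of `getTokenLettercaseVariants`, and the class `LettercaseVariantsSketch` and its methods are hypothetical.

```java
import java.util.LinkedHashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical illustration of the three flags, not the real implementation.
class LettercaseVariantsSketch {

  /** Returns the phrase as listed plus every variant enabled by the flags. */
  static Set<String> variants(String phrase,
                              boolean allowFirstCapitalized,
                              boolean allowAllUppercase,
                              boolean allowTitlecase) {
    Set<String> result = new LinkedHashSet<>();
    result.add(phrase);                               // the form from the file
    if (allowFirstCapitalized) {
      result.add(capitalize(phrase));                 // "Vice versa"
    }
    if (allowAllUppercase) {
      result.add(phrase.toUpperCase(Locale.ROOT));    // "VICE VERSA"
    }
    if (allowTitlecase) {
      String[] words = phrase.split(" ", -1);
      for (int k = 0; k < words.length; k++) {
        words[k] = capitalize(words[k]);
      }
      result.add(String.join(" ", words));            // "Vice Versa"
    }
    return result;
  }

  /** Upper-cases the first character and leaves the rest unchanged. */
  static String capitalize(String s) {
    if (s.isEmpty()) {
      return s;
    }
    return s.substring(0, 1).toUpperCase(Locale.ROOT) + s.substring(1);
  }
}
```

With all three flags set, `variants("vice versa", true, true, true)` yields `vice versa`, `Vice versa`, `VICE VERSA`, and `Vice Versa`, in that order; with all flags cleared only the listed form is returned.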
-