Package org.languagetool.tools
Class StringTools
java.lang.Object
org.languagetool.tools.StringTools
Tools for working with strings.
-
Nested Class Summary
Nested Classes
Modifier and Type / Class / Description
static enum: Constants for printing XML rule matches.
-
Field Summary
Fields
Modifier and Type / Field / Description
static final Pattern
private static final Pattern
private static final Pattern
private static final Pattern
private static final Pattern
private static final Pattern
private static final Pattern
private static final Pattern
private static final String[]
private static final Pattern
private static final Pattern
private static final Pattern
-
Constructor Summary
Constructors -
Method Summary
Modifier and Type / Method / Description
static String: Adds spaces before words that are not punctuation.
static boolean
static String
static void: Throw exception if the given string is null or empty or only whitespace.
private static String changeFirstCharCase(String str, boolean toUpperCase): Return str modified so that its first character is now a lowercase or uppercase character, depending on toUpperCase.
static String
static String
static String
static String escapeHTML(String s): Escapes these characters: less than, greater than, quote, ampersand.
static String: Calls escapeHTML(String).
static String: Simple XML filtering for XML tags.
getDifference(String s1, String s2): Difference between two strings (only one difference)
static boolean hasDiacritics(String str)
static boolean isAllUppercase(String str): Returns true if the given string is made up of all-uppercase characters (ignoring characters for which no upper-/lowercase distinction exists).
static boolean isAllUppercase(List<String> strList): Returns true if the given list of strings is made up of all-uppercase words.
static boolean
static boolean isCamelCase(String token): Whether the string is camelCase.
static boolean isCapitalizedWord(String str)
static boolean: Checks whether a given String is an Emoji with a string length greater than 1.
static boolean: Helper method to replace calls to "".equals().
static boolean isMixedCase(String str): Returns true if the given string is mixed case, like MixedCase or mixedCase (but not Mixedcase).
static boolean: Checks if a string is the non-breaking whitespace ().
static boolean isNotAllLowercase(String str): Returns true if str is not made up of all-lowercase characters (ignoring characters for which no upper-/lowercase distinction exists).
static boolean isNotWordCharacter(String input): Whether the string is a punctuation mark
static boolean isNotWordString(String input)
static boolean
static boolean isParagraphEnd(String sentence, boolean singleLineBreaksMarksPara)
static boolean isPositiveNumber(char ch)
static boolean isPunctuationMark(String input): Whether the string is a punctuation mark
static boolean isWhitespace(String str): Checks if a string contains a whitespace, including: all Unicode whitespace; the non-breaking space (U+00A0); the narrow non-breaking space (U+202F); the zero width space (U+200B), used in Khmer
Deprecated. Use DataBroker#getFromResourceDirAsLines(java.lang.String) instead (NOTE: it won't handle comments)
static String lowercaseFirstChar(String str): Return str modified so that its first character is now a lowercase character.
static String: Return str if str is capitalized (isCapitalizedWord(String)), otherwise return modified str so that its first character is now a lowercase character.
static String
static String normalizeNFC(String str)
static String normalizeNFKC(String str)
static int
static String preserveCase(String inputString, String modelString): Apply to inputString the casing of modelString
static String readerToString(Reader reader)
static String readStream(InputStream stream, String encoding): Read the text stream using the given encoding.
static String removeDiacritics(String str)
static String removeTashkeel(String str): Return str without tashkeel characters
static String[] splitCamelCase(String input)
static String[] splitDigitsAtEnd(String input)
static boolean: Whether the first character of str is an uppercase character.
static boolean: Whether the first character of str is an uppercase character.
static String streamToString(InputStream is, String charsetName)
static String
static String titlecaseGlobal(String str): Title case a string ignoring a list of words.
static String: Will turn a string into a typical rule ID, i.e.
static String: eliminate special (unicode) characters, e.g.
static String: Filters any whitespace characters.
static String uppercaseFirstChar(String str): Return str modified so that its first character is now an uppercase character.
static String uppercaseFirstChar(String str, Language language): Like uppercaseFirstChar(String), but handles a special case for Dutch (IJ in e.g.
-
Field Details
-
NONCHAR
-
WORD_FOR_SPELLER
-
IS_NUMERIC
-
UPPERCASE_GREEK_LETTERS
-
LOWERCASE_GREEK_LETTERS
-
WHITESPACE_ARRAY
-
CHARS_NOT_FOR_SPELLING
-
XML_COMMENT_PATTERN
-
XML_PATTERN
-
PUNCTUATION_PATTERN
-
NOT_WORD_CHARACTER
-
NOT_WORD_STR
-
PATTERN
-
DIACRIT_MARKS
-
ENGLISH_TITLECASE_EXCEPTIONS
-
PORTUGUESE_TITLECASE_EXCEPTIONS
-
FRENCH_TITLECASE_EXCEPTIONS
-
SPANISH_TITLECASE_EXCEPTIONS
-
GERMAN_TITLECASE_EXCEPTIONS
-
DUTCH_TITLECASE_EXCEPTIONS
-
ALL_TITLECASE_EXCEPTIONS
-
-
Constructor Details
-
StringTools
private StringTools()
-
-
Method Details
-
assureSet
Throw exception if the given string is null or empty or only whitespace. -
readStream
Read the text stream using the given encoding.
- Parameters:
stream - InputStream the stream to be read
encoding - the stream's character encoding, e.g. utf-8, or null to use the system encoding
- Returns:
- a string with the stream's content, lines separated by \n (note that \n will be added to the last line even if it is not in the stream)
- Throws:
IOException
- Since:
- 2.3
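The line handling described above can be illustrated with a minimal sketch. It uses only the JDK and is not LanguageTool's implementation; the class name `ReadStreamSketch` and the method name `readAll` are hypothetical.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical stand-in for the documented behaviour, not the library's code.
final class ReadStreamSketch {

  // Reads the whole stream; every line, including the last, ends with '\n'.
  static String readAll(InputStream stream, String encoding) throws IOException {
    // Assumption: a null encoding falls back to the platform default charset.
    Charset charset = encoding == null ? Charset.defaultCharset() : Charset.forName(encoding);
    StringBuilder sb = new StringBuilder();
    try (BufferedReader reader = new BufferedReader(new InputStreamReader(stream, charset))) {
      String line;
      while ((line = reader.readLine()) != null) {
        sb.append(line).append('\n');
      }
    }
    return sb.toString();
  }

  public static void main(String[] args) throws IOException {
    InputStream in = new ByteArrayInputStream("a\r\nb".getBytes(StandardCharsets.UTF_8));
    // The input has no trailing line break, yet the result ends with '\n'.
    System.out.println(readAll(in, "utf-8").equals("a\nb\n"));
  }
}
```

Because `BufferedReader.readLine()` strips the original line terminators, this sketch also normalizes `\r\n` to `\n`.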
-
isAllUppercase
Returns true if the given string is made up of all-uppercase characters (ignoring characters for which no upper-/lowercase distinction exists). -
isAllUppercase
Returns true if the given list of strings is made up of all-uppercase words. If the list contains only numbers or punctuation marks, it is not considered all-uppercase. -
isMixedCase
Returns true if the given string is mixed case, like MixedCase or mixedCase (but not Mixedcase).
- Parameters:
str - input str
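One way to read the case predicates documented on this page is sketched below. The composition of `isMixedCase` from the other three checks is an assumption made for illustration, and the class name `CaseSketch` is hypothetical; the library's own logic may differ.

```java
// Hypothetical sketch of the documented case predicates, JDK only.
final class CaseSketch {

  // True if no letter is lowercase; characters without case are ignored.
  static boolean isAllUppercase(String s) {
    for (int i = 0; i < s.length(); i++) {
      char c = s.charAt(i);
      if (Character.isLetter(c) && Character.isLowerCase(c)) {
        return false;
      }
    }
    return true;
  }

  // True if at least one letter is not lowercase.
  static boolean isNotAllLowercase(String s) {
    for (int i = 0; i < s.length(); i++) {
      char c = s.charAt(i);
      if (Character.isLetter(c) && !Character.isLowerCase(c)) {
        return true;
      }
    }
    return false;
  }

  // True if the first character is uppercase and all other letters are lowercase.
  static boolean isCapitalizedWord(String s) {
    if (s.isEmpty() || !Character.isUpperCase(s.charAt(0))) {
      return false;
    }
    for (int i = 1; i < s.length(); i++) {
      char c = s.charAt(i);
      if (Character.isLetter(c) && !Character.isLowerCase(c)) {
        return false;
      }
    }
    return true;
  }

  // Assumed composition: neither all-uppercase, nor capitalized, nor all-lowercase.
  static boolean isMixedCase(String s) {
    return !isAllUppercase(s) && !isCapitalizedWord(s) && isNotAllLowercase(s);
  }

  public static void main(String[] args) {
    System.out.println(isMixedCase("MixedCase")); // true
    System.out.println(isMixedCase("mixedCase")); // true
    System.out.println(isMixedCase("Mixedcase")); // false
  }
}
```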
-
isNotAllLowercase
Returns true if str is not made up of all-lowercase characters (ignoring characters for which no upper-/lowercase distinction exists).
- Since:
- 2.5
-
isCapitalizedWord
- Parameters:
str - input string
- Returns:
- true if word starts with an uppercase letter and all other letters are lowercase
-
startsWithUppercase
Whether the first character of str is an uppercase character. -
startsWithLowercase
Whether the first character of str is a lowercase character.
- Since:
- 4.9
-
allStartWithLowercase
-
uppercaseFirstChar
Return str modified so that its first character is now an uppercase character. If str starts with non-alphabetic characters, such as quotes or parentheses, the first character is determined as the first alphabetic character. -
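The rule about leading non-alphabetic characters can be sketched as follows. This is a JDK-only illustration, not the library's code; the names `FirstCharSketch` and `uppercaseFirstAlphabetic` are hypothetical, and treating digits as skippable characters is an assumption.

```java
// Hypothetical sketch: uppercase the first alphabetic character of a string.
final class FirstCharSketch {

  static String uppercaseFirstAlphabetic(String str) {
    if (str == null) {
      return null;
    }
    for (int i = 0; i < str.length(); i++) {
      char c = str.charAt(i);
      if (Character.isLetter(c)) {
        // Skip leading quotes, parentheses, etc. and change only this letter.
        return str.substring(0, i) + Character.toUpperCase(c) + str.substring(i + 1);
      }
    }
    return str; // no alphabetic character found, nothing to change
  }

  public static void main(String[] args) {
    System.out.println(uppercaseFirstAlphabetic("\"hello\"")); // "Hello"
    System.out.println(uppercaseFirstAlphabetic("(word)"));    // (Word)
  }
}
```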
uppercaseFirstChar
@Contract("!null, _ -> !null") @Nullable public static String uppercaseFirstChar(@Nullable String str, Language language)
Like uppercaseFirstChar(String), but handles a special case for Dutch (IJ in e.g. "ijsselmeer" -> "IJsselmeer").
- Parameters:
language - the language, will be ignored if it's null
- Since:
- 2.7
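The Dutch special case can be sketched like this. The real method takes a `Language` object; the sketch substitutes a plain language code string, and both the class name `DutchIjSketch` and the code `"nl"` are assumptions made for illustration.

```java
// Hypothetical sketch of the Dutch "ij" digraph rule, JDK only.
final class DutchIjSketch {

  // langCode stands in for the Language parameter; null means "no language".
  static String uppercaseFirstChar(String str, String langCode) {
    if (str == null || str.isEmpty()) {
      return str;
    }
    if ("nl".equals(langCode) && str.startsWith("ij")) {
      // In Dutch, both letters of the digraph are capitalized together.
      return "IJ" + str.substring(2);
    }
    return Character.toUpperCase(str.charAt(0)) + str.substring(1);
  }

  public static void main(String[] args) {
    System.out.println(uppercaseFirstChar("ijsselmeer", "nl")); // IJsselmeer
    System.out.println(uppercaseFirstChar("ijsselmeer", null)); // Ijsselmeer
  }
}
```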
-
collectAllTitleCaseExceptions
-
titlecaseGlobal
Title case a string ignoring a list of words. These words are ignored due to titlecasing conventions in the most frequent languages. Differs from convertToTitleCaseIteratingChars(String) in that it is less aggressive, i.e., we do not force titlecase in all caps words (e.g. IDEA does not become Idea). This method behaves the same regardless of the language, and is rather aggressive in its ignoring of words. We can, possibly, in the future, have language-specific titlecasing conventions. -
lowercaseFirstChar
Return str modified so that its first character is now a lowercase character. If str starts with non-alphabetic characters, such as quotes or parentheses, the first character is determined as the first alphabetic character. -
lowercaseFirstCharIfCapitalized
@Contract("!null, -> !null") @Nullable public static String lowercaseFirstCharIfCapitalized(@Nullable String str)
Return str if str is capitalized (isCapitalizedWord(String)), otherwise return modified str so that its first character is now a lowercase character. -
changeFirstCharCase
@Contract("!null, _ -> !null") @Nullable private static String changeFirstCharCase(@Nullable String str, boolean toUpperCase)
Return str modified so that its first character is now a lowercase or uppercase character, depending on toUpperCase. If str starts with non-alphabetic characters, such as quotes or parentheses, the first character is determined as the first alphabetic character. -
readerToString
- Throws:
IOException
-
streamToString
- Throws:
IOException
-
escapeXML
Calls escapeHTML(String). -
escapeForXmlAttribute
- Since:
- 2.9
-
escapeForXmlContent
- Since:
- 2.9
-
escapeHTML
Escapes these characters: less than, greater than, quote, ampersand. -
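A minimal sketch of escaping the four characters named above. It uses only the JDK; the class name `EscapeSketch` and the method name `escape` are hypothetical, and the choice of `&quot;` for the double quote is an assumption.

```java
// Hypothetical sketch: escape <, >, " and & for HTML/XML output.
final class EscapeSketch {

  static String escape(String s) {
    StringBuilder sb = new StringBuilder(s.length());
    for (int i = 0; i < s.length(); i++) {
      char c = s.charAt(i);
      switch (c) {
        case '<': sb.append("&lt;"); break;
        case '>': sb.append("&gt;"); break;
        case '"': sb.append("&quot;"); break;
        case '&': sb.append("&amp;"); break;
        default: sb.append(c);
      }
    }
    return sb.toString();
  }

  public static void main(String[] args) {
    // Prints: &lt;a href=&quot;x&quot;&gt;R&amp;D&lt;/a&gt;
    System.out.println(escape("<a href=\"x\">R&D</a>"));
  }
}
```

Escaping in a single pass avoids double-escaping the ampersands that the other replacements introduce.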
trimWhitespace
Filters any whitespace characters. Useful for trimming the contents of token elements that cannot possibly contain any spaces, with the exception of a single space in a word (for example, if the language supports numbers formatted with spaces as single tokens, as Catalan in LanguageTool).
- Parameters:
s - String to be filtered.
- Returns:
- Filtered s.
-
trimSpecialCharacters
Eliminates special (Unicode) characters, e.g. soft hyphens.
- Parameters:
s - String to filter
- Returns:
- s, with non-(alphanumeric, punctuation, space) characters deleted
- Since:
- 4.3
-
addSpace
Adds spaces before words that are not punctuation.
- Parameters:
word - Word to add the preceding space.
language - Language of the word (to check typography conventions). Currently French convention of not adding spaces only before '.' and ',' is implemented; other languages assume that before ,.;:!? no spaces should be added.
- Returns:
- String containing a space or an empty string.
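The typography rule in the parameter description can be sketched as below. The real method takes a `Language` object; this sketch substitutes a language code string, and the names `SpaceSketch`, `spaceBefore` and the code `"fr"` are hypothetical. It covers only the punctuation marks listed above.

```java
// Hypothetical sketch: decide whether a space precedes a token.
final class SpaceSketch {

  // Returns " " for ordinary words and "" for punctuation that takes no space.
  static String spaceBefore(String word, String langCode) {
    // French: only '.' and ',' take no space; others: ,.;:!? take no space.
    String noSpaceBefore = "fr".equals(langCode) ? ".," : ",.;:!?";
    if (word.length() == 1 && noSpaceBefore.indexOf(word.charAt(0)) >= 0) {
      return "";
    }
    return " ";
  }

  public static void main(String[] args) {
    System.out.println("[" + spaceBefore("!", "fr") + "]");    // [ ]
    System.out.println("[" + spaceBefore("!", "en") + "]");    // []
    System.out.println("[" + spaceBefore("word", "en") + "]"); // [ ]
  }
}
```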
-
isWhitespace
Checks if a string contains a whitespace, including:
- all Unicode whitespace
- the non-breaking space (U+00A0)
- the narrow non-breaking space (U+202F)
- the zero width space (U+200B), used in Khmer
- Parameters:
str - String to check
- Returns:
- true if the string is a whitespace character
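The character list above can be sketched as follows. This is not the library's implementation: the class name `WhitespaceSketch` is hypothetical, `Character.isWhitespace` is used as an approximation of "all Unicode whitespace", and requiring every character of the string to be whitespace is an assumption.

```java
// Hypothetical sketch of the documented whitespace check, JDK only.
final class WhitespaceSketch {

  static boolean isWhitespaceChar(char c) {
    // Character.isWhitespace excludes the non-breaking spaces and the zero
    // width space, so U+00A0, U+202F and U+200B are listed explicitly.
    return Character.isWhitespace(c) || c == '\u00A0' || c == '\u202F' || c == '\u200B';
  }

  // Assumption: true only if the string is non-empty and all characters qualify.
  static boolean isWhitespace(String str) {
    if (str == null || str.isEmpty()) {
      return false;
    }
    for (int i = 0; i < str.length(); i++) {
      if (!isWhitespaceChar(str.charAt(i))) {
        return false;
      }
    }
    return true;
  }

  public static void main(String[] args) {
    System.out.println(isWhitespace("\u00A0")); // true
    System.out.println(isWhitespace("a"));      // false
  }
}
```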
-
isNonBreakingWhitespace
Checks if a string is the non-breaking whitespace (U+00A0).
- Since:
- 2.1
-
isPositiveNumber
public static boolean isPositiveNumber(char ch)
- Parameters:
ch - Character to check
- Returns:
- True if the character is a positive number (decimal digit from 1 to 9).
-
isEmpty
Helper method to replace calls to "".equals().
- Parameters:
str - String to check
- Returns:
- true if string is empty or null
-
filterXML
Simple XML filtering for XML tags.
- Parameters:
str - XML string to be filtered.
- Returns:
- Filtered string without XML tags.
-
hasDiacritics
-
removeDiacritics
-
normalizeNFKC
-
normalizeNFC
-
preserveCase
Apply to inputString the casing of modelString.
- Parameters:
inputString, modelString
- Returns:
- string
-
asString
-
isParagraphEnd
- Since:
- 4.3
-
loadLines
Deprecated. Use DataBroker#getFromResourceDirAsLines(java.lang.String) instead (NOTE: it won't handle comments).
Loads file, ignoring comments (lines starting with #).
- Parameters:
path - path in resource dir
- Since:
- 4.6
-
toId
Will turn a string into a typical rule ID, i.e. uppercase and "_" instead of spaces. All non-ASCII characters are replaced with "_", EXCEPT for Latin-1 ranges U+00C0-U+00D6 and U+00D8-U+00DE. "de" locales have a special implementation (ä => ae, etc.).
- Parameters:
language - LT language object, used to apply language-specific normalisation rules.
- Since:
- 5.1
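The general rule stated above (without the special handling for "de" locales) can be sketched as follows. The sketch drops the `Language` parameter entirely, the class name `RuleIdSketch` is hypothetical, and leaving ASCII punctuation untouched is an assumption read from the wording, not confirmed library behaviour.

```java
import java.util.Locale;

// Hypothetical sketch of the documented rule-ID normalisation, JDK only.
final class RuleIdSketch {

  static String toId(String input) {
    // Uppercase first, then turn spaces into underscores.
    String upper = input.toUpperCase(Locale.ROOT).replace(' ', '_');
    StringBuilder sb = new StringBuilder();
    upper.codePoints().forEach(cp -> {
      boolean ascii = cp < 0x80;
      // Latin-1 ranges kept by the description: U+00C0-U+00D6 and U+00D8-U+00DE.
      boolean keptLatin1 = (cp >= 0xC0 && cp <= 0xD6) || (cp >= 0xD8 && cp <= 0xDE);
      if (ascii || keptLatin1) {
        sb.appendCodePoint(cp);
      } else {
        sb.append('_');
      }
    });
    return sb.toString();
  }

  public static void main(String[] args) {
    System.out.println(toId("my rule"));   // MY_RULE
    System.out.println(toId("a\u2713b"));  // A_B
  }
}
```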
-
isCamelCase
Whether the string is camelCase. Works only with ASCII input and with single words.
- Since:
- 5.3
-
isPunctuationMark
Whether the string is a punctuation mark.
- Since:
- 5.5
-
isNotWordCharacter
Whether the string is a punctuation mark.
- Since:
- 6.1
-
getDifference
Difference between two strings (only one difference).
- Returns:
- List of strings: 0: common string at the start; 1: diff in string1; 2: diff in string2; 3: common string at the end
- Since:
- 6.2
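The four-part result described above can be sketched with a common-prefix and common-suffix scan. This is a JDK-only illustration under the assumption that the two strings differ in a single contiguous region; the class name `DifferenceSketch` and the method name `difference` are hypothetical.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch: split two strings into prefix, two diffs and suffix.
final class DifferenceSketch {

  // Returns [common start, diff in s1, diff in s2, common end].
  static List<String> difference(String s1, String s2) {
    int max = Math.min(s1.length(), s2.length());
    int start = 0;
    while (start < max && s1.charAt(start) == s2.charAt(start)) {
      start++;
    }
    int end = 0;
    // The suffix must not overlap the prefix already matched.
    while (end < max - start
        && s1.charAt(s1.length() - 1 - end) == s2.charAt(s2.length() - 1 - end)) {
      end++;
    }
    return Arrays.asList(
        s1.substring(0, start),
        s1.substring(start, s1.length() - end),
        s2.substring(start, s2.length() - end),
        s1.substring(s1.length() - end));
  }

  public static void main(String[] args) {
    System.out.println(difference("colour", "color")); // [colo, u, , r]
  }
}
```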
-
makeWrong
-
removeTashkeel
Return str without tashkeel characters.
- Parameters:
str - input str
-
isNotWordString
-
numberOf
-
convertToTitleCaseIteratingChars
-
isEmoji
Checks whether a given String is an Emoji with a string length greater than 1.
- Parameters:
word - to be checked
- Since:
- 6.4
-
stringForSpeller
-
splitCamelCase
-
splitDigitsAtEnd
-
isAnagram
-
isNumeric
-