Class NgramExtractor

java.lang.Object
com.optimaize.langdetect.ngram.NgramExtractor

public class NgramExtractor extends Object
Class for extracting n-grams out of a text.
Author:
Fabian Kessler
  • Method Details

    • gramLength

      public static NgramExtractor gramLength(int gramLength)
    • gramLengths

      public static NgramExtractor gramLengths(Integer... gramLength)
    • filter

      public NgramExtractor filter(NgramFilter filter)
    • textPadding

      public NgramExtractor textPadding(char textPadding)
      To ensure that border grams are produced, this character is added to the left and right of the text.

      Example: when textPadding is a space ' ', the input "foo" becomes " foo ", so that n-grams such as " f" are created.

      If the text already has this character at one end (for example, it already starts with it), it is not added again at that end.

      Parameters:
      textPadding - for example a space ' '.
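
The padding rule described above can be sketched in plain Java. This is an illustration only, not the library's implementation: the class TextPaddingSketch and the helper pad are hypothetical names, and leaving an empty text unchanged is an assumption that the description above does not confirm.

```java
class TextPaddingSketch {

    // Hypothetical helper (not part of NgramExtractor): adds the padding
    // character at each end of the text unless it is already present there.
    static String pad(CharSequence text, char padding) {
        // Assumption: an empty text is returned unchanged.
        if (text.length() == 0) {
            return "";
        }
        StringBuilder padded = new StringBuilder(text.length() + 2);
        if (text.charAt(0) != padding) {
            padded.append(padding);
        }
        padded.append(text);
        if (text.charAt(text.length() - 1) != padding) {
            padded.append(padding);
        }
        return padded.toString();
    }

    public static void main(String[] args) {
        // Brackets make the leading and trailing spaces visible.
        System.out.println("[" + pad("foo", ' ') + "]");
        System.out.println("[" + pad(" foo", ' ') + "]");
    }
}
```

With a space as padding, both "foo" and " foo" come out as " foo ", so the border gram " f" exists in either case.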
    • getGramLengths

      public List<Integer> getGramLengths()
    • extractGrams

      @NotNull public List<String> extractGrams(@NotNull CharSequence text)
      Creates the n-grams for a given text in the order they occur.

      Example: with a gram length of 2, the text "Foo bar" yields [Fo,oo,o , b,ba,ar]

      Parameters:
      text - the text to extract n-grams from.
      Returns:
      The grams; empty if the input was empty or if no n-gram of the configured gram length(s) fits into the text.
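
The extraction order can be illustrated with a sliding window over the text. The sketch below is a minimal stand-alone version for a single gram length, without padding or filtering; ExtractGramsSketch and its static extractGrams method are hypothetical names chosen for illustration and do not reproduce the library's code.

```java
import java.util.ArrayList;
import java.util.List;

class ExtractGramsSketch {

    // Hypothetical helper: collects every substring of the given length,
    // moving one character at a time from left to right.
    static List<String> extractGrams(CharSequence text, int gramLength) {
        List<String> grams = new ArrayList<>();
        // The loop body never runs when the text is shorter than gramLength,
        // so the result is an empty list in that case.
        for (int start = 0; start + gramLength <= text.length(); start++) {
            grams.add(text.subSequence(start, start + gramLength).toString());
        }
        return grams;
    }

    public static void main(String[] args) {
        // Print one gram per line, in brackets, so that spaces stay visible.
        for (String gram : extractGrams("Foo bar", 2)) {
            System.out.println("[" + gram + "]");
        }
    }
}
```

For "Foo bar" and a gram length of 2 this produces the six grams from the example above, including the two grams that contain the space.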
    • extractCountedGrams

      @NotNull public Map<String,Integer> extractCountedGrams(@NotNull CharSequence text)
      Returns:
      Key = n-gram, value = count. The entries are ordered by the first appearance of each n-gram in the text.
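
A map that counts n-grams while keeping first-appearance order can be sketched with a LinkedHashMap. Whether the library uses this exact structure is an assumption; CountedGramsSketch and countGrams are hypothetical names, and the sketch handles a single gram length without padding or filtering.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class CountedGramsSketch {

    // Hypothetical helper: counts each n-gram of the given length.
    // LinkedHashMap iterates in insertion order, and re-inserting an
    // existing key does not move it, so keys stay in first-appearance order.
    static Map<String, Integer> countGrams(CharSequence text, int gramLength) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (int start = 0; start + gramLength <= text.length(); start++) {
            String gram = text.subSequence(start, start + gramLength).toString();
            counts.merge(gram, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        System.out.println(countGrams("abab", 2)); // prints {ab=2, ba=1}
    }
}
```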