Class AnalyzedSentence

java.lang.Object
org.languagetool.AnalyzedSentence

public final class AnalyzedSentence extends Object
A sentence that has been tokenized and analyzed.
  • Method Details

    • getNonBlankReadings

      @NotNull private List<AnalyzedTokenReadings> getNonBlankReadings(AnalyzedTokenReadings[] tokens, int whCounter, int nonWhCounter, int[] mapping)
    • indexTokens

      private static Map<String,List<Integer>> indexTokens(AnalyzedTokenReadings[] tokens)
    • indexLemmas

      private static Map<String,List<Integer>> indexLemmas(AnalyzedTokenReadings[] tokens)
    • makeUnmodifiable

      private static Map<String,List<Integer>> makeUnmodifiable(Map<String,List<Integer>> result)
    • copy

      public AnalyzedSentence copy(AnalyzedSentence sentence)
      Copies the given AnalyzedSentence and returns the copy. Useful, for example, for performing local immunization.
      Parameters:
      sentence - AnalyzedSentence to be copied
      Returns:
      a new object which is a copy
      Since:
      2.5
    • getTokens

      public AnalyzedTokenReadings[] getTokens()
      Returns the AnalyzedTokenReadings of the analyzed text. Whitespace is also a token.
    • getPreDisambigTokens

      public AnalyzedTokenReadings[] getPreDisambigTokens()
      Since:
      4.5
    • getTokensWithoutWhitespace

      public AnalyzedTokenReadings[] getTokensWithoutWhitespace()
      Returns the AnalyzedTokenReadings of the analyzed text, with whitespace tokens removed but with the artificial SENT_START token included.
    • getNonWhitespaceTokenCount

      @Internal public int getNonWhitespaceTokenCount()
      Get the length of the array returned by getTokensWithoutWhitespace() without additional allocations.
    • getPreDisambigTokensWithoutWhitespace

      public AnalyzedTokenReadings[] getPreDisambigTokensWithoutWhitespace()
      Since:
      4.5
    • getOriginalPosition

      public int getOriginalPosition(int nonWhPosition)
      Get the position of a non-whitespace token in the original sentence, which also contains whitespace tokens.
      Parameters:
      nonWhPosition - position of a non-whitespace token
      Returns:
      position in the original sentence.
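The relationship between the two position spaces can be illustrated with a minimal, self-contained sketch. It does not use the LanguageTool classes: plain strings stand in for AnalyzedTokenReadings, "<SENT_START>" is a placeholder for the artificial sentence-start token, and the class and method names (OriginalPositionSketch, buildMapping) are hypothetical. The actual implementation may build its mapping differently.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; not the LanguageTool implementation.
final class OriginalPositionSketch {

  /** Returns mapping[i] = index in allTokens of the i-th non-whitespace token. */
  static int[] buildMapping(String[] allTokens) {
    List<Integer> positions = new ArrayList<>();
    for (int i = 0; i < allTokens.length; i++) {
      // Keep every token that is not made up of whitespace only.
      if (!allTokens[i].trim().isEmpty()) {
        positions.add(i);
      }
    }
    int[] mapping = new int[positions.size()];
    for (int i = 0; i < mapping.length; i++) {
      mapping[i] = positions.get(i);
    }
    return mapping;
  }

  public static void main(String[] args) {
    // Placeholder start token, then words, whitespace and punctuation.
    String[] tokens = {"<SENT_START>", "This", " ", "is", " ", "fine", "."};
    int[] mapping = buildMapping(tokens);
    // Non-whitespace position 2 ("is") sits at original position 3.
    System.out.println(mapping[2]); // prints 3
  }
}
```

In this sketch, a lookup such as mapping[nonWhPosition] plays the role that getOriginalPosition(nonWhPosition) plays for a real sentence.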
    • toString

      public String toString()
      Overrides:
      toString in class Object
    • toShortString

      public String toShortString(String readingDelimiter)
      Return string representation without chunk information.
      Since:
      2.3
    • getText

      public String getText()
      Return the original text.
      Since:
      2.7
    • calcText

      private String calcText()
    • getCorrectedTextLength

      public int getCorrectedTextLength()
      Text length taking position fixes (for removed soft hyphens etc.) into account, so this is not always equal to the length of getText().
      Since:
      5.1
    • toTextString

      String toTextString()
      Return string representation without any analysis information, just the original text.
      Since:
      2.6
    • toString

      public String toString(String readingDelimiter)
      Return string representation with chunk information.
    • toString

      private String toString(String readingDelimiter, boolean includeChunks)
    • getAnnotations

      public String getAnnotations()
      Get the log of disambiguator actions.
    • getTokenSet

      public Set<String> getTokenSet()
      Get the lowercase tokens of this sentence in a set. Used internally for performance optimization.
      Since:
      2.4
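One way such a lowercase token set could serve as a performance optimization is as a cheap pre-check before more expensive matching. The sketch below is an assumption about that kind of use, not a description of LanguageTool internals: plain strings stand in for tokens, and TokenSetSketch, tokenSet and mayMatch are hypothetical names.

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical illustration only; not the LanguageTool implementation.
final class TokenSetSketch {

  /** Collects the lowercase form of every token into an unmodifiable set. */
  static Set<String> tokenSet(String[] tokens) {
    Set<String> result = new HashSet<>();
    for (String token : tokens) {
      // Locale.ROOT is an assumption made for reproducible lowercasing.
      result.add(token.toLowerCase(Locale.ROOT));
    }
    return Collections.unmodifiableSet(result);
  }

  /** Cheap pre-check: a sentence can only match if it contains all required tokens. */
  static boolean mayMatch(Set<String> sentenceTokens, Set<String> requiredTokens) {
    return sentenceTokens.containsAll(requiredTokens);
  }

  public static void main(String[] args) {
    Set<String> sentence = tokenSet(new String[] {"The", "cat", "sleeps"});
    System.out.println(mayMatch(sentence, Collections.singleton("cat"))); // prints true
    System.out.println(mayMatch(sentence, Collections.singleton("dog"))); // prints false
  }
}
```

A set lookup is constant time on average, so a caller can skip a sentence entirely when a token it requires is absent.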
    • getLemmaSet

      public Set<String> getLemmaSet()
      Get the lowercase lemmas of this sentence in a set. Used internally for performance optimization.
      Since:
      2.5
    • getTokenOffsets

      @Nullable @Internal public List<Integer> getTokenOffsets(String token)
      Returns:
      all offsets in getTokensWithoutWhitespace() where tokens with the given text occur (case-insensitive), or null if there are no such occurrences
      Since:
      5.3
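The lookup contract described above (case-insensitive match, list of offsets, null when nothing is found) can be sketched with a small inverted index. This is a simplified stand-in, loosely modeled on the private indexTokens and makeUnmodifiable signatures listed earlier; it works on plain strings rather than AnalyzedTokenReadings, and the class name TokenOffsetIndexSketch and the use of Locale.ROOT are assumptions.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical illustration only; not the LanguageTool implementation.
final class TokenOffsetIndexSketch {

  /** Maps each lowercase token to the offsets at which it occurs. */
  static Map<String, List<Integer>> indexTokens(String[] tokens) {
    Map<String, List<Integer>> index = new HashMap<>();
    for (int i = 0; i < tokens.length; i++) {
      String key = tokens[i].toLowerCase(Locale.ROOT);
      index.computeIfAbsent(key, k -> new ArrayList<>()).add(i);
    }
    // Freeze the lists and the map so callers cannot modify the index.
    index.replaceAll((k, v) -> Collections.unmodifiableList(v));
    return Collections.unmodifiableMap(index);
  }

  /** Returns the offsets of the token (case-insensitive), or null if it does not occur. */
  static List<Integer> getTokenOffsets(Map<String, List<Integer>> index, String token) {
    return index.get(token.toLowerCase(Locale.ROOT));
  }

  public static void main(String[] args) {
    // "<SENT_START>" is a placeholder for the artificial sentence-start token.
    String[] tokens = {"<SENT_START>", "The", "cat", "saw", "the", "dog"};
    Map<String, List<Integer>> index = indexTokens(tokens);
    System.out.println(getTokenOffsets(index, "THE"));  // prints [1, 4]
    System.out.println(getTokenOffsets(index, "bird")); // prints null
  }
}
```

Because a missing token yields null rather than an empty list, callers need a null check before iterating over the result.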
    • getLemmaOffsets

      @Nullable @Internal public List<Integer> getLemmaOffsets(String token)
      Returns:
      all offsets in getTokensWithoutWhitespace() where tokens with the given lemma occur (case-insensitive), or null if there are no such occurrences
      Since:
      5.3
    • equals

      public boolean equals(Object o)
      Overrides:
      equals in class Object
    • hashCode

      public int hashCode()
      Overrides:
      hashCode in class Object