
A

acceptClausesWithoutDelimiter - Variable in class com.kohlschutter.boilerpipe.filters.simple.MinClauseWordsFilter
 
action - Variable in class com.kohlschutter.boilerpipe.sax.CommonTagActions.BlockTagLabelAction
 
action - Variable in class com.kohlschutter.boilerpipe.sax.CommonTagActions.InlineTagLabelAction
 
addLabel(String) - Method in class com.kohlschutter.boilerpipe.document.TextBlock
Adds an arbitrary String label to this TextBlock.
addLabelAction(LabelAction) - Method in class com.kohlschutter.boilerpipe.sax.BoilerpipeHTMLContentHandler
 
addLabels(String...) - Method in class com.kohlschutter.boilerpipe.document.TextBlock
Adds a set of labels to this TextBlock.
addLabels(Set<String>) - Method in class com.kohlschutter.boilerpipe.document.TextBlock
Adds a set of labels to this TextBlock.
addLabelsTo(TextBlock) - Method in class com.kohlschutter.boilerpipe.labels.LabelAction
 
addPotentialTitles(Set<String>, String, String, int) - Method in class com.kohlschutter.boilerpipe.filters.heuristics.DocumentTitleMatchClassifier
 
AddPrecedingLabelsFilter - Class in com.kohlschutter.boilerpipe.filters.heuristics
Adds the labels of the preceding block to the current block, optionally adding a prefix.
AddPrecedingLabelsFilter(String) - Constructor for class com.kohlschutter.boilerpipe.filters.heuristics.AddPrecedingLabelsFilter
Creates a new AddPrecedingLabelsFilter instance.
addTagAction(String, TagAction) - Method in class com.kohlschutter.boilerpipe.sax.TagActionMap
Adds a particular TagAction for a given tag.
addTextBlock(TextBlock) - Method in class com.kohlschutter.boilerpipe.sax.BoilerpipeHTMLContentHandler
 
addTo(TextBlock) - Method in class com.kohlschutter.boilerpipe.labels.ConditionalLabelAction
 
addTo(TextBlock) - Method in class com.kohlschutter.boilerpipe.labels.LabelAction
 
addWhitespaceIfNecessary() - Method in class com.kohlschutter.boilerpipe.sax.BoilerpipeHTMLContentHandler
 
afterEnd(HTMLHighlighter.Implementation, String) - Method in class com.kohlschutter.boilerpipe.sax.HTMLHighlighter.TagAction
 
afterEnd(ImageExtractor.Implementation, String) - Method in class com.kohlschutter.boilerpipe.sax.ImageExtractor.TagAction
 
afterStart(HTMLHighlighter.Implementation, String) - Method in class com.kohlschutter.boilerpipe.sax.HTMLHighlighter.TagAction
 
afterStart(ImageExtractor.Implementation, String) - Method in class com.kohlschutter.boilerpipe.sax.ImageExtractor.TagAction
 
alt - Variable in class com.kohlschutter.boilerpipe.document.Image
 
ANCHOR_TEXT_END - Static variable in class com.kohlschutter.boilerpipe.sax.BoilerpipeHTMLContentHandler
 
ANCHOR_TEXT_START - Static variable in class com.kohlschutter.boilerpipe.sax.BoilerpipeHTMLContentHandler
 
area - Variable in class com.kohlschutter.boilerpipe.document.Image
 
ARTICLE_EXTRACTOR - Static variable in class com.kohlschutter.boilerpipe.extractors.CommonExtractors
Works very well for most types of Article-like HTML.
ARTICLE_METADATA - Static variable in class com.kohlschutter.boilerpipe.labels.DefaultLabels
 
ArticleExtractor - Class in com.kohlschutter.boilerpipe.extractors
A full-text extractor which is tuned towards news articles.
ArticleExtractor() - Constructor for class com.kohlschutter.boilerpipe.extractors.ArticleExtractor
 
ArticleMetadataFilter - Class in com.kohlschutter.boilerpipe.filters.heuristics
Tries to find TextBlocks that consist of "article metadata".
ArticleMetadataFilter() - Constructor for class com.kohlschutter.boilerpipe.filters.heuristics.ArticleMetadataFilter
 
ArticleSentencesExtractor - Class in com.kohlschutter.boilerpipe.extractors
A full-text extractor which is tuned towards extracting sentences from news articles.
ArticleSentencesExtractor() - Constructor for class com.kohlschutter.boilerpipe.extractors.ArticleSentencesExtractor
 
avgNumWords() - Method in class com.kohlschutter.boilerpipe.document.TextDocumentStatistics
Returns the average number of words at block-level (= overall number of words divided by the number of blocks).
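The avgNumWords() entry above defines the average as the overall number of words divided by the number of blocks. A minimal sketch of that arithmetic follows. It does not use the Boilerpipe classes: the class name AvgNumWordsSketch, the list-of-counts input, and the choice to return 0 for an empty document are all assumptions made for illustration, not the library's implementation.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-alone sketch; not part of Boilerpipe.
final class AvgNumWordsSketch {

    /** Average words per block: total word count divided by the number of blocks. */
    public static float avgNumWords(List<Integer> wordsPerBlock) {
        if (wordsPerBlock.isEmpty()) {
            return 0f; // assumption: avoid dividing by zero for an empty document
        }
        int total = 0;
        for (int n : wordsPerBlock) {
            total += n;
        }
        return total / (float) wordsPerBlock.size();
    }

    public static void main(String[] args) {
        // Three blocks with 12, 3 and 45 words: 60 words over 3 blocks.
        System.out.println(avgNumWords(Arrays.asList(12, 3, 45))); // prints 20.0
    }
}
```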

B

beforeEnd(HTMLHighlighter.Implementation, String) - Method in class com.kohlschutter.boilerpipe.sax.HTMLHighlighter.TagAction
 
beforeEnd(ImageExtractor.Implementation, String) - Method in class com.kohlschutter.boilerpipe.sax.ImageExtractor.TagAction
 
beforeStart(HTMLHighlighter.Implementation, String) - Method in class com.kohlschutter.boilerpipe.sax.HTMLHighlighter.TagAction
 
beforeStart(ImageExtractor.Implementation, String) - Method in class com.kohlschutter.boilerpipe.sax.ImageExtractor.TagAction
 
BlockProximityFusion - Class in com.kohlschutter.boilerpipe.filters.heuristics
Fuses adjacent blocks if their distance (in blocks) does not exceed a certain limit.
BlockProximityFusion(int, boolean, boolean) - Constructor for class com.kohlschutter.boilerpipe.filters.heuristics.BlockProximityFusion
Creates a new BlockProximityFusion instance.
BlockTagLabelAction(LabelAction) - Constructor for class com.kohlschutter.boilerpipe.sax.CommonTagActions.BlockTagLabelAction
 
blockTagLevel - Variable in class com.kohlschutter.boilerpipe.sax.BoilerpipeHTMLContentHandler
 
BoilerpipeDocumentSource - Interface in com.kohlschutter.boilerpipe
Something that can be represented as a TextDocument.
BoilerpipeExtractor - Interface in com.kohlschutter.boilerpipe
Describes a complete filter pipeline.
BoilerpipeFilter - Interface in com.kohlschutter.boilerpipe
A generic BoilerpipeFilter.
BoilerpipeHTMLContentHandler - Class in com.kohlschutter.boilerpipe.sax
A simple SAX ContentHandler, used by BoilerpipeSAXInput.
BoilerpipeHTMLContentHandler() - Constructor for class com.kohlschutter.boilerpipe.sax.BoilerpipeHTMLContentHandler
BoilerpipeHTMLContentHandler(TagActionMap) - Constructor for class com.kohlschutter.boilerpipe.sax.BoilerpipeHTMLContentHandler
Constructs a BoilerpipeHTMLContentHandler using the given TagActionMap.
BoilerpipeHTMLContentHandler.Event - Enum in com.kohlschutter.boilerpipe.sax
 
BoilerpipeHTMLParser - Class in com.kohlschutter.boilerpipe.sax
A simple SAX Parser, used by BoilerpipeSAXInput.
BoilerpipeHTMLParser() - Constructor for class com.kohlschutter.boilerpipe.sax.BoilerpipeHTMLParser
Constructs a BoilerpipeHTMLParser using a default HTML content handler.
BoilerpipeHTMLParser(boolean) - Constructor for class com.kohlschutter.boilerpipe.sax.BoilerpipeHTMLParser
 
BoilerpipeHTMLParser(BoilerpipeHTMLContentHandler) - Constructor for class com.kohlschutter.boilerpipe.sax.BoilerpipeHTMLParser
BoilerpipeInput - Interface in com.kohlschutter.boilerpipe
A source that returns TextDocuments.
BoilerpipeProcessingException - Exception in com.kohlschutter.boilerpipe
Exception for signaling failure in the processing pipeline.
BoilerpipeProcessingException() - Constructor for exception com.kohlschutter.boilerpipe.BoilerpipeProcessingException
 
BoilerpipeProcessingException(String) - Constructor for exception com.kohlschutter.boilerpipe.BoilerpipeProcessingException
 
BoilerpipeProcessingException(String, Throwable) - Constructor for exception com.kohlschutter.boilerpipe.BoilerpipeProcessingException
 
BoilerpipeProcessingException(Throwable) - Constructor for exception com.kohlschutter.boilerpipe.BoilerpipeProcessingException
 
BoilerpipeSAXInput - Class in com.kohlschutter.boilerpipe.sax
Parses an InputSource using SAX and returns a TextDocument.
BoilerpipeSAXInput(InputSource) - Constructor for class com.kohlschutter.boilerpipe.sax.BoilerpipeSAXInput
Creates a new instance of BoilerpipeSAXInput for the given InputSource.
BoilerplateBlockFilter - Class in com.kohlschutter.boilerpipe.filters.simple
Removes TextBlocks which have explicitly been marked as "not content".
BoilerplateBlockFilter(String) - Constructor for class com.kohlschutter.boilerpipe.filters.simple.BoilerplateBlockFilter
 

C

CANOLA_EXTRACTOR - Static variable in class com.kohlschutter.boilerpipe.extractors.CommonExtractors
Trained on krdwrd Canola (different definition of "boilerplate").
CanolaExtractor - Class in com.kohlschutter.boilerpipe.extractors
A full-text extractor trained on krdwrd Canola.
CanolaExtractor() - Constructor for class com.kohlschutter.boilerpipe.extractors.CanolaExtractor
 
Chained(TagAction, TagAction) - Constructor for class com.kohlschutter.boilerpipe.sax.CommonTagActions.Chained
 
changesTagLevel() - Method in class com.kohlschutter.boilerpipe.sax.CommonTagActions.BlockTagLabelAction
 
changesTagLevel() - Method in class com.kohlschutter.boilerpipe.sax.CommonTagActions.Chained
 
changesTagLevel() - Method in class com.kohlschutter.boilerpipe.sax.CommonTagActions.InlineTagLabelAction
 
changesTagLevel() - Method in class com.kohlschutter.boilerpipe.sax.MarkupTagAction
 
changesTagLevel() - Method in interface com.kohlschutter.boilerpipe.sax.TagAction
 
characterElementIdx - Variable in class com.kohlschutter.boilerpipe.sax.HTMLHighlighter.Implementation
 
characterElementIdx - Variable in class com.kohlschutter.boilerpipe.sax.ImageExtractor.Implementation
 
characters(char[], int, int) - Method in class com.kohlschutter.boilerpipe.sax.BoilerpipeHTMLContentHandler
 
characters(char[], int, int) - Method in class com.kohlschutter.boilerpipe.sax.HTMLHighlighter.Implementation
 
characters(char[], int, int) - Method in class com.kohlschutter.boilerpipe.sax.ImageExtractor.Implementation
 
CHARACTERS - com.kohlschutter.boilerpipe.sax.BoilerpipeHTMLContentHandler.Event
 
charset - Variable in class com.kohlschutter.boilerpipe.sax.HTMLDocument
 
CLASSIFIER - Static variable in class com.kohlschutter.boilerpipe.extractors.CanolaExtractor
The actual classifier, exposed.
classify(TextBlock, TextBlock, TextBlock) - Method in class com.kohlschutter.boilerpipe.filters.english.DensityRulesClassifier
 
classify(TextBlock, TextBlock, TextBlock) - Method in class com.kohlschutter.boilerpipe.filters.english.NumWordsRulesClassifier
 
clone() - Method in class com.kohlschutter.boilerpipe.document.TextBlock
 
clone() - Method in class com.kohlschutter.boilerpipe.document.TextDocument
 
com.kohlschutter.boilerpipe - package com.kohlschutter.boilerpipe
The Boilerpipe top-level package.
com.kohlschutter.boilerpipe.conditions - package com.kohlschutter.boilerpipe.conditions
 
com.kohlschutter.boilerpipe.demo - package com.kohlschutter.boilerpipe.demo
Just some simple demo code.
com.kohlschutter.boilerpipe.document - package com.kohlschutter.boilerpipe.document
The Boilerpipe document model.
com.kohlschutter.boilerpipe.estimators - package com.kohlschutter.boilerpipe.estimators
 
com.kohlschutter.boilerpipe.extractors - package com.kohlschutter.boilerpipe.extractors
Some standard extractors (i.e., completely piped BoilerpipeFilters).
com.kohlschutter.boilerpipe.filters.debug - package com.kohlschutter.boilerpipe.filters.debug
 
com.kohlschutter.boilerpipe.filters.english - package com.kohlschutter.boilerpipe.filters.english
These BoilerpipeFilters have only been tested on English text.
com.kohlschutter.boilerpipe.filters.heuristics - package com.kohlschutter.boilerpipe.filters.heuristics
These BoilerpipeFilters are pure heuristics.
com.kohlschutter.boilerpipe.filters.simple - package com.kohlschutter.boilerpipe.filters.simple
These BoilerpipeFilters are straightforward and probably not really specific to English.
com.kohlschutter.boilerpipe.labels - package com.kohlschutter.boilerpipe.labels
 
com.kohlschutter.boilerpipe.sax - package com.kohlschutter.boilerpipe.sax
Classes related to parsing and producing HTML from/to Boilerpipe TextDocuments.
com.kohlschutter.boilerpipe.util - package com.kohlschutter.boilerpipe.util
Some helper classes.
CommonExtractors - Class in com.kohlschutter.boilerpipe.extractors
Provides quick access to common BoilerpipeExtractors.
CommonExtractors() - Constructor for class com.kohlschutter.boilerpipe.extractors.CommonExtractors
 
CommonTagActions - Class in com.kohlschutter.boilerpipe.sax
Defines an action that is to be performed whenever a particular tag occurs during HTML parsing.
CommonTagActions() - Constructor for class com.kohlschutter.boilerpipe.sax.CommonTagActions
 
CommonTagActions.BlockTagLabelAction - Class in com.kohlschutter.boilerpipe.sax
CommonTagActions for block-level elements, which triggers some LabelAction on the generated TextBlock.
CommonTagActions.Chained - Class in com.kohlschutter.boilerpipe.sax
 
CommonTagActions.InlineTagLabelAction - Class in com.kohlschutter.boilerpipe.sax
CommonTagActions for inline elements, which triggers some LabelAction on the generated TextBlock.
compareTo(Image) - Method in class com.kohlschutter.boilerpipe.document.Image
 
cond - Variable in class com.kohlschutter.boilerpipe.filters.simple.SurroundingToContentFilter
 
condition - Variable in class com.kohlschutter.boilerpipe.labels.ConditionalLabelAction
 
ConditionalLabelAction - Class in com.kohlschutter.boilerpipe.labels
Adds labels to a TextBlock if the given criteria are met.
ConditionalLabelAction(TextBlockCondition, String...) - Constructor for class com.kohlschutter.boilerpipe.labels.ConditionalLabelAction
 
containedTextElements - Variable in class com.kohlschutter.boilerpipe.document.TextBlock
 
contentBitSet - Variable in class com.kohlschutter.boilerpipe.sax.HTMLHighlighter.Implementation
 
contentBitSet - Variable in class com.kohlschutter.boilerpipe.sax.ImageExtractor.Implementation
 
ContentFusion - Class in com.kohlschutter.boilerpipe.filters.heuristics
Merges two blocks using some heuristics.
ContentFusion() - Constructor for class com.kohlschutter.boilerpipe.filters.heuristics.ContentFusion
Creates a new ContentFusion instance.
contentHandler - Variable in class com.kohlschutter.boilerpipe.sax.BoilerpipeHTMLParser
 
contentOnly - Variable in class com.kohlschutter.boilerpipe.filters.heuristics.BlockProximityFusion
 
currentContainedTextElements - Variable in class com.kohlschutter.boilerpipe.sax.BoilerpipeHTMLContentHandler
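The ConditionalLabelAction entry in this section is described as adding labels to a TextBlock if the given criteria are met. The sketch below illustrates that idea only. It does not use the Boilerpipe types: the Block class, the addIf helper, the 10-word threshold, and the label string are hypothetical stand-ins, with java.util.function.Predicate playing the role of the condition.

```java
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.function.Predicate;

// Hypothetical stand-alone sketch; not part of Boilerpipe.
final class ConditionalLabelSketch {

    /** Minimal stand-in for a text block: a word count plus a set of labels. */
    public static final class Block {
        public final int numWords;
        public final Set<String> labels = new LinkedHashSet<>();

        public Block(int numWords) {
            this.numWords = numWords;
        }
    }

    /** Adds all labels to the block only if the condition holds; reports whether it did. */
    public static boolean addIf(Predicate<Block> condition, Block block, String... labels) {
        if (!condition.test(block)) {
            return false;
        }
        for (String label : labels) {
            block.labels.add(label);
        }
        return true;
    }

    public static void main(String[] args) {
        Predicate<Block> isLong = b -> b.numWords >= 10; // assumed example criterion
        Block shortBlock = new Block(3);
        Block longBlock = new Block(40);
        addIf(isLong, shortBlock, "example/might-be-content");
        addIf(isLong, longBlock, "example/might-be-content");
        // prints: [] [example/might-be-content]
        System.out.println(shortBlock.labels + " " + longBlock.labels);
    }
}
```

Keeping the condition separate from the labels means the same criterion can be reused with different label sets.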
 

D

data - Variable in class com.kohlschutter.boilerpipe.sax.HTMLDocument
 
debugString() - Method in class com.kohlschutter.boilerpipe.document.TextDocument
Returns detailed debugging information about the contained TextBlocks.
DEFAULT_EXTRACTOR - Static variable in class com.kohlschutter.boilerpipe.extractors.CommonExtractors
Usually worse than ArticleExtractor, but simpler/no heuristics.
DEFAULT_INSTANCE - Static variable in class com.kohlschutter.boilerpipe.filters.english.IgnoreBlocksAfterContentFilter
 
DEFAULT_INSTANCE - Static variable in class com.kohlschutter.boilerpipe.filters.english.MinFulltextWordsFilter
 
DefaultExtractor - Class in com.kohlschutter.boilerpipe.extractors
A quite generic full-text extractor.
DefaultExtractor() - Constructor for class com.kohlschutter.boilerpipe.extractors.DefaultExtractor
 
DefaultLabels - Class in com.kohlschutter.boilerpipe.labels
Some pre-defined labels which can be used in conjunction with TextBlock.addLabel(String) and TextBlock.hasLabel(String).
DefaultLabels() - Constructor for class com.kohlschutter.boilerpipe.labels.DefaultLabels
 
DefaultTagActionMap - Class in com.kohlschutter.boilerpipe.sax
Default TagActions.
DefaultTagActionMap() - Constructor for class com.kohlschutter.boilerpipe.sax.DefaultTagActionMap
 
DensityRulesClassifier - Class in com.kohlschutter.boilerpipe.filters.english
Classifies TextBlocks as content/not-content through rules that have been determined using the C4.8 machine learning algorithm, as described in the paper "Boilerplate Detection using Shallow Text Features", particularly using text densities and link densities.
DensityRulesClassifier() - Constructor for class com.kohlschutter.boilerpipe.filters.english.DensityRulesClassifier
 
DocumentTitleMatchClassifier - Class in com.kohlschutter.boilerpipe.filters.heuristics
Marks TextBlocks which contain parts of the HTML <TITLE> tag, using some heuristics which are quite specific to the news domain.
DocumentTitleMatchClassifier(String) - Constructor for class com.kohlschutter.boilerpipe.filters.heuristics.DocumentTitleMatchClassifier
 

E

EMPTY_BITSET - Static variable in class com.kohlschutter.boilerpipe.document.TextBlock
 
EMPTY_END - Static variable in class com.kohlschutter.boilerpipe.document.TextBlock
 
EMPTY_START - Static variable in class com.kohlschutter.boilerpipe.document.TextBlock
 
end(BoilerpipeHTMLContentHandler, String, String) - Method in class com.kohlschutter.boilerpipe.sax.CommonTagActions.BlockTagLabelAction
 
end(BoilerpipeHTMLContentHandler, String, String) - Method in class com.kohlschutter.boilerpipe.sax.CommonTagActions.Chained
 
end(BoilerpipeHTMLContentHandler, String, String) - Method in class com.kohlschutter.boilerpipe.sax.CommonTagActions.InlineTagLabelAction
 
end(BoilerpipeHTMLContentHandler, String, String) - Method in class com.kohlschutter.boilerpipe.sax.MarkupTagAction
 
end(BoilerpipeHTMLContentHandler, String, String) - Method in interface com.kohlschutter.boilerpipe.sax.TagAction
 
END_TAG - com.kohlschutter.boilerpipe.sax.BoilerpipeHTMLContentHandler.Event
 
endDocument() - Method in class com.kohlschutter.boilerpipe.sax.BoilerpipeHTMLContentHandler
 
endDocument() - Method in class com.kohlschutter.boilerpipe.sax.HTMLHighlighter.Implementation
 
endDocument() - Method in class com.kohlschutter.boilerpipe.sax.ImageExtractor.Implementation
 
endElement(String, String, String) - Method in class com.kohlschutter.boilerpipe.sax.BoilerpipeHTMLContentHandler
 
endElement(String, String, String) - Method in class com.kohlschutter.boilerpipe.sax.HTMLHighlighter.Implementation
 
endElement(String, String, String) - Method in class com.kohlschutter.boilerpipe.sax.ImageExtractor.Implementation
 
endPrefixMapping(String) - Method in class com.kohlschutter.boilerpipe.sax.BoilerpipeHTMLContentHandler
 
endPrefixMapping(String) - Method in class com.kohlschutter.boilerpipe.sax.HTMLHighlighter.Implementation
 
endPrefixMapping(String) - Method in class com.kohlschutter.boilerpipe.sax.ImageExtractor.Implementation
 
equalLabels(Set<String>, Set<String>) - Method in class com.kohlschutter.boilerpipe.filters.heuristics.LabelFusion
 
Event() - Constructor for enum com.kohlschutter.boilerpipe.sax.BoilerpipeHTMLContentHandler.Event
 
ExpandTitleToContentFilter - Class in com.kohlschutter.boilerpipe.filters.heuristics
Marks all TextBlocks "content" which are between the headline and the part that has already been marked content, if they are marked DefaultLabels.MIGHT_BE_CONTENT.
ExpandTitleToContentFilter() - Constructor for class com.kohlschutter.boilerpipe.filters.heuristics.ExpandTitleToContentFilter
 
expandToSameLevelText - Variable in class com.kohlschutter.boilerpipe.filters.heuristics.KeepLargestBlockFilter
 
ExtractorBase - Class in com.kohlschutter.boilerpipe.extractors
The base class of Extractors.
ExtractorBase() - Constructor for class com.kohlschutter.boilerpipe.extractors.ExtractorBase
 
extraStyleSheet - Variable in class com.kohlschutter.boilerpipe.sax.HTMLHighlighter
 

F

fetch(URL) - Static method in class com.kohlschutter.boilerpipe.sax.HTMLFetcher
Fetches the document at the given URL, using URLConnection.
filter - Variable in class com.kohlschutter.boilerpipe.extractors.KeepEverythingWithMinKWordsExtractor
 
flush - Variable in class com.kohlschutter.boilerpipe.sax.BoilerpipeHTMLContentHandler
 
flushBlock() - Method in class com.kohlschutter.boilerpipe.sax.BoilerpipeHTMLContentHandler
 
fontSizeStack - Variable in class com.kohlschutter.boilerpipe.sax.BoilerpipeHTMLContentHandler
 

G

getAlt() - Method in class com.kohlschutter.boilerpipe.document.Image
 
getAncestorLabels() - Method in class com.kohlschutter.boilerpipe.sax.MarkupTagAction
 
getArea() - Method in class com.kohlschutter.boilerpipe.document.Image
Returns the image's area (specified by width * height), or -1 if width/height weren't both specified or could not be parsed.
getCharset() - Method in class com.kohlschutter.boilerpipe.sax.HTMLDocument
 
getContainedTextElements() - Method in class com.kohlschutter.boilerpipe.document.TextBlock
Returns the containedTextElements BitSet, or null.
getContent() - Method in class com.kohlschutter.boilerpipe.document.TextDocument
Returns the TextDocument's content.
getData() - Method in class com.kohlschutter.boilerpipe.sax.HTMLDocument
 
getDefaultInstance() - Static method in class com.kohlschutter.boilerpipe.filters.english.IgnoreBlocksAfterContentFilter
Returns the default instance of IgnoreBlocksAfterContentFilter.
getDefaultInstance() - Static method in class com.kohlschutter.boilerpipe.filters.english.MinFulltextWordsFilter
 
getExtraStyleSheet() - Method in class com.kohlschutter.boilerpipe.sax.HTMLHighlighter
Returns the extra stylesheet definition that will be inserted in the HEAD element.
getHeight() - Method in class com.kohlschutter.boilerpipe.document.Image
 
getInstance() - Static method in class com.kohlschutter.boilerpipe.extractors.ArticleExtractor
Returns the singleton instance for ArticleExtractor.
getInstance() - Static method in class com.kohlschutter.boilerpipe.extractors.ArticleSentencesExtractor
Returns the singleton instance for ArticleSentencesExtractor.
getInstance() - Static method in class com.kohlschutter.boilerpipe.extractors.CanolaExtractor
Returns the singleton instance for CanolaExtractor.
getInstance() - Static method in class com.kohlschutter.boilerpipe.extractors.DefaultExtractor
Returns the singleton instance for DefaultExtractor.
getInstance() - Static method in class com.kohlschutter.boilerpipe.extractors.LargestContentExtractor
Returns the singleton instance for LargestContentExtractor.
getInstance() - Static method in class com.kohlschutter.boilerpipe.extractors.NumWordsRulesExtractor
Returns the singleton instance for NumWordsRulesExtractor.
getInstance() - Static method in class com.kohlschutter.boilerpipe.filters.debug.PrintDebugFilter
Returns the default instance for PrintDebugFilter, which dumps debug information to System.out.
getInstance() - Static method in class com.kohlschutter.boilerpipe.filters.english.DensityRulesClassifier
Returns the singleton instance for DensityRulesClassifier.
getInstance() - Static method in class com.kohlschutter.boilerpipe.filters.english.NumWordsRulesClassifier
Returns the singleton instance for NumWordsRulesClassifier.
getInstance() - Static method in class com.kohlschutter.boilerpipe.filters.english.TerminatingBlocksFinder
Returns the singleton instance for TerminatingBlocksFinder.
getInstance() - Static method in class com.kohlschutter.boilerpipe.filters.heuristics.ExpandTitleToContentFilter
Returns the singleton instance for ExpandTitleToContentFilter.
getInstance() - Static method in class com.kohlschutter.boilerpipe.filters.heuristics.SimpleBlockFusionProcessor
Returns the singleton instance for SimpleBlockFusionProcessor.
getInstance() - Static method in class com.kohlschutter.boilerpipe.filters.heuristics.TrailingHeadlineToBoilerplateFilter
Returns the singleton instance for TrailingHeadlineToBoilerplateFilter.
getInstance() - Static method in class com.kohlschutter.boilerpipe.filters.simple.BoilerplateBlockFilter
Returns the singleton instance for BoilerplateBlockFilter.
getInstance() - Static method in class com.kohlschutter.boilerpipe.filters.simple.SplitParagraphBlocksFilter
Returns the singleton instance for SplitParagraphBlocksFilter.
getInstance() - Static method in class com.kohlschutter.boilerpipe.sax.ImageExtractor
Returns the singleton instance of ImageExtractor.
getLabels() - Method in class com.kohlschutter.boilerpipe.document.TextBlock
Returns the labels associated with this TextBlock, or null if no such labels exist.
getLinkDensity() - Method in class com.kohlschutter.boilerpipe.document.TextBlock
 
getLongestPart(String, String) - Method in class com.kohlschutter.boilerpipe.filters.heuristics.DocumentTitleMatchClassifier
 
getNumFullTextWords(TextBlock) - Static method in class com.kohlschutter.boilerpipe.filters.english.HeuristicFilterBase
 
getNumFullTextWords(TextBlock, float) - Static method in class com.kohlschutter.boilerpipe.filters.english.HeuristicFilterBase
 
getNumWords() - Method in class com.kohlschutter.boilerpipe.document.TextBlock
 
getNumWords() - Method in class com.kohlschutter.boilerpipe.document.TextDocumentStatistics
Returns the overall number of words in all blocks.
getNumWordsInAnchorText() - Method in class com.kohlschutter.boilerpipe.document.TextBlock
 
getOffsetBlocksEnd() - Method in class com.kohlschutter.boilerpipe.document.TextBlock
 
getOffsetBlocksStart() - Method in class com.kohlschutter.boilerpipe.document.TextBlock
 
getPostHighlight() - Method in class com.kohlschutter.boilerpipe.sax.HTMLHighlighter
Returns the string that will be inserted after any highlighted HTML block.
getPotentialTitles() - Method in class com.kohlschutter.boilerpipe.filters.heuristics.DocumentTitleMatchClassifier
 
getPreHighlight() - Method in class com.kohlschutter.boilerpipe.sax.HTMLHighlighter
Returns the string that will be inserted before any highlighted HTML block.
getSrc() - Method in class com.kohlschutter.boilerpipe.document.Image
 
getTagLevel() - Method in class com.kohlschutter.boilerpipe.document.TextBlock
 
getTagWhitelist() - Method in class com.kohlschutter.boilerpipe.sax.HTMLHighlighter
 
getText() - Method in class com.kohlschutter.boilerpipe.document.TextBlock
 
getText(boolean, boolean) - Method in class com.kohlschutter.boilerpipe.document.TextDocument
Returns the TextDocument's content, non-content, or both.
getText(TextDocument) - Method in interface com.kohlschutter.boilerpipe.BoilerpipeExtractor
Extracts text from the given TextDocument object.
getText(TextDocument) - Method in class com.kohlschutter.boilerpipe.extractors.ExtractorBase
Extracts text from the given TextDocument object.
getText(Reader) - Method in interface com.kohlschutter.boilerpipe.BoilerpipeExtractor
Extracts text from the HTML code available from the given Reader.
getText(Reader) - Method in class com.kohlschutter.boilerpipe.extractors.ExtractorBase
Extracts text from the HTML code available from the given Reader.
getText(String) - Method in interface com.kohlschutter.boilerpipe.BoilerpipeExtractor
Extracts text from the HTML code given as a String.
getText(String) - Method in class com.kohlschutter.boilerpipe.extractors.ExtractorBase
Extracts text from the HTML code given as a String.
getText(URL) - Method in class com.kohlschutter.boilerpipe.extractors.ExtractorBase
Extracts text from the HTML code available from the given URL.
getText(InputSource) - Method in interface com.kohlschutter.boilerpipe.BoilerpipeExtractor
Extracts text from the HTML code available from the given InputSource.
getText(InputSource) - Method in class com.kohlschutter.boilerpipe.extractors.ExtractorBase
Extracts text from the HTML code available from the given InputSource.
getTextBlocks() - Method in class com.kohlschutter.boilerpipe.document.TextDocument
Returns the TextBlocks of this document.
getTextBlocks() - Method in class com.kohlschutter.boilerpipe.sax.BoilerpipeHTMLContentHandler
 
getTextDensity() - Method in class com.kohlschutter.boilerpipe.document.TextBlock
 
getTextDocument() - Method in interface com.kohlschutter.boilerpipe.BoilerpipeInput
Returns (somehow) a TextDocument.
getTextDocument() - Method in class com.kohlschutter.boilerpipe.sax.BoilerpipeSAXInput
Retrieves the TextDocument using a default HTML parser.
getTextDocument(BoilerpipeHTMLParser) - Method in class com.kohlschutter.boilerpipe.sax.BoilerpipeSAXInput
Retrieves the TextDocument using the given HTML parser.
getTitle() - Method in class com.kohlschutter.boilerpipe.document.TextDocument
Returns the "main" title for this document, or null if no such title has been set.
getTitle() - Method in class com.kohlschutter.boilerpipe.sax.BoilerpipeHTMLContentHandler
 
getWidth() - Method in class com.kohlschutter.boilerpipe.document.Image
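The getArea() entry in this section states that the area is width * height, or -1 if width and height were not both specified or could not be parsed. The sketch below shows one way that rule could be written. It is not the library's code: the class ImageAreaSketch, the helper parseDimension, and the decision to treat negative values as unparseable are assumptions for illustration.

```java
// Hypothetical stand-alone sketch; not part of Boilerpipe.
final class ImageAreaSketch {

    /** Parses a dimension attribute; returns -1 if it is missing or not a valid number. */
    public static int parseDimension(String value) {
        if (value == null) {
            return -1;
        }
        try {
            int n = Integer.parseInt(value.trim());
            return n >= 0 ? n : -1; // assumption: negative sizes count as invalid
        } catch (NumberFormatException e) {
            return -1;
        }
    }

    /** Returns width * height, or -1 unless both dimensions were given and parseable. */
    public static int area(String width, String height) {
        int w = parseDimension(width);
        int h = parseDimension(height);
        return (w < 0 || h < 0) ? -1 : w * h;
    }

    public static void main(String[] args) {
        System.out.println(area("640", "480"));  // prints 307200
        System.out.println(area("640", null));   // prints -1 (height missing)
        System.out.println(area("64px", "48"));  // prints -1 (width not parseable)
    }
}
```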
 

H

H1 - Static variable in class com.kohlschutter.boilerpipe.labels.DefaultLabels
 
H2 - Static variable in class com.kohlschutter.boilerpipe.labels.DefaultLabels
 
H3 - Static variable in class com.kohlschutter.boilerpipe.labels.DefaultLabels
 
hasLabel(String) - Method in class com.kohlschutter.boilerpipe.document.TextBlock
Checks whether this TextBlock has the given label.
HEADING - Static variable in class com.kohlschutter.boilerpipe.labels.DefaultLabels
 
height - Variable in class com.kohlschutter.boilerpipe.document.Image
 
HeuristicFilterBase - Class in com.kohlschutter.boilerpipe.filters.english
Base class for some heuristics that are used by boilerpipe filters.
HeuristicFilterBase() - Constructor for class com.kohlschutter.boilerpipe.filters.english.HeuristicFilterBase
 
hl - Variable in class com.kohlschutter.boilerpipe.sax.HTMLHighlighter.Implementation
 
HR - Static variable in class com.kohlschutter.boilerpipe.labels.DefaultLabels
 
html - Variable in class com.kohlschutter.boilerpipe.sax.HTMLHighlighter.Implementation
 
HTMLDocument - Class in com.kohlschutter.boilerpipe.sax
HTMLDocument(byte[], Charset) - Constructor for class com.kohlschutter.boilerpipe.sax.HTMLDocument
 
HTMLDocument(String) - Constructor for class com.kohlschutter.boilerpipe.sax.HTMLDocument
 
HTMLFetcher - Class in com.kohlschutter.boilerpipe.sax
A very simple HTTP/HTML fetcher, really just for demo purposes.
HTMLFetcher() - Constructor for class com.kohlschutter.boilerpipe.sax.HTMLFetcher
 
HTMLHighlightDemo - Class in com.kohlschutter.boilerpipe.demo
Demonstrates how to use Boilerpipe to get the main content, highlighted as HTML.
HTMLHighlightDemo() - Constructor for class com.kohlschutter.boilerpipe.demo.HTMLHighlightDemo
 
HTMLHighlighter - Class in com.kohlschutter.boilerpipe.sax
Highlights text blocks in an HTML document that have been marked as "content" in the corresponding TextDocument.
HTMLHighlighter(boolean) - Constructor for class com.kohlschutter.boilerpipe.sax.HTMLHighlighter
 
HTMLHighlighter.Implementation - Class in com.kohlschutter.boilerpipe.sax
 
HTMLHighlighter.TagAction - Class in com.kohlschutter.boilerpipe.sax
 

I

ignorableWhitespace(char[], int, int) - Method in class com.kohlschutter.boilerpipe.sax.BoilerpipeHTMLContentHandler
 
ignorableWhitespace(char[], int, int) - Method in class com.kohlschutter.boilerpipe.sax.HTMLHighlighter.Implementation
 
ignorableWhitespace(char[], int, int) - Method in class com.kohlschutter.boilerpipe.sax.ImageExtractor.Implementation
 
IgnoreBlocksAfterContentFilter - Class in com.kohlschutter.boilerpipe.filters.english
Marks all blocks as "non-content" that occur after blocks that have been marked DefaultLabels.INDICATES_END_OF_TEXT.
IgnoreBlocksAfterContentFilter(int) - Constructor for class com.kohlschutter.boilerpipe.filters.english.IgnoreBlocksAfterContentFilter
 
IgnoreBlocksAfterContentFromEndFilter - Class in com.kohlschutter.boilerpipe.filters.english
Marks all blocks as "non-content" that occur after blocks that have been marked DefaultLabels.INDICATES_END_OF_TEXT, and after any content block.
IgnoreBlocksAfterContentFromEndFilter() - Constructor for class com.kohlschutter.boilerpipe.filters.english.IgnoreBlocksAfterContentFromEndFilter
 
Image - Class in com.kohlschutter.boilerpipe.document
Represents an Image resource that is contained in the document.
Image(String, String, String, String) - Constructor for class com.kohlschutter.boilerpipe.document.Image
 
ImageExtractor - Class in com.kohlschutter.boilerpipe.sax
Extracts the images that are enclosed by extracted content.
ImageExtractor() - Constructor for class com.kohlschutter.boilerpipe.sax.ImageExtractor
 
ImageExtractor.Implementation - Class in com.kohlschutter.boilerpipe.sax
 
ImageExtractor.TagAction - Class in com.kohlschutter.boilerpipe.sax
 
ImageExtractorDemo - Class in com.kohlschutter.boilerpipe.demo
Demonstrates how to use Boilerpipe to get the images within the main content.
ImageExtractorDemo() - Constructor for class com.kohlschutter.boilerpipe.demo.ImageExtractorDemo
 
Implementation() - Constructor for class com.kohlschutter.boilerpipe.sax.HTMLHighlighter.Implementation
 
Implementation() - Constructor for class com.kohlschutter.boilerpipe.sax.ImageExtractor.Implementation
 
inAnchor - Variable in class com.kohlschutter.boilerpipe.sax.BoilerpipeHTMLContentHandler
 
inAnchorText - Variable in class com.kohlschutter.boilerpipe.sax.BoilerpipeHTMLContentHandler
 
inBody - Variable in class com.kohlschutter.boilerpipe.sax.BoilerpipeHTMLContentHandler
 
INDICATES_END_OF_TEXT - Static variable in class com.kohlschutter.boilerpipe.labels.DefaultLabels
 
inHighlight - Variable in class com.kohlschutter.boilerpipe.sax.ImageExtractor.Implementation
 
inIgnorableElement - Variable in class com.kohlschutter.boilerpipe.sax.BoilerpipeHTMLContentHandler
 
inIgnorableElement - Variable in class com.kohlschutter.boilerpipe.sax.HTMLHighlighter.Implementation
 
inIgnorableElement - Variable in class com.kohlschutter.boilerpipe.sax.ImageExtractor.Implementation
 
initDensities() - Method in class com.kohlschutter.boilerpipe.document.TextBlock
 
InlineTagLabelAction(LabelAction) - Constructor for class com.kohlschutter.boilerpipe.sax.CommonTagActions.InlineTagLabelAction
 
InputSourceable - Interface in com.kohlschutter.boilerpipe.sax
An InputSourceable can return an arbitrary number of new InputSources for a given document.
INSTANCE - Static variable in class com.kohlschutter.boilerpipe.estimators.SimpleEstimator
The singleton instance of SimpleEstimator.
INSTANCE - Static variable in class com.kohlschutter.boilerpipe.extractors.ArticleExtractor
 
INSTANCE - Static variable in class com.kohlschutter.boilerpipe.extractors.ArticleSentencesExtractor
 
INSTANCE - Static variable in class com.kohlschutter.boilerpipe.extractors.CanolaExtractor
 
INSTANCE - Static variable in class com.kohlschutter.boilerpipe.extractors.DefaultExtractor
 
INSTANCE - Static variable in class com.kohlschutter.boilerpipe.extractors.KeepEverythingExtractor
 
INSTANCE - Static variable in class com.kohlschutter.boilerpipe.extractors.LargestContentExtractor
 
INSTANCE - Static variable in class com.kohlschutter.boilerpipe.extractors.NumWordsRulesExtractor
 
INSTANCE - Static variable in class com.kohlschutter.boilerpipe.filters.debug.PrintDebugFilter
The default instance of PrintDebugFilter, which dumps debug information to System.out.
INSTANCE - Static variable in class com.kohlschutter.boilerpipe.filters.english.DensityRulesClassifier
 
INSTANCE - Static variable in class com.kohlschutter.boilerpipe.filters.english.IgnoreBlocksAfterContentFromEndFilter
 
INSTANCE - Static variable in class com.kohlschutter.boilerpipe.filters.english.KeepLargestFulltextBlockFilter
 
INSTANCE - Static variable in class com.kohlschutter.boilerpipe.filters.english.NumWordsRulesClassifier
 
INSTANCE - Static variable in class com.kohlschutter.boilerpipe.filters.english.TerminatingBlocksFinder
 
INSTANCE - Static variable in class com.kohlschutter.boilerpipe.filters.heuristics.AddPrecedingLabelsFilter
 
INSTANCE - Static variable in class com.kohlschutter.boilerpipe.filters.heuristics.ArticleMetadataFilter
 
INSTANCE - Static variable in class com.kohlschutter.boilerpipe.filters.heuristics.ContentFusion
 
INSTANCE - Static variable in class com.kohlschutter.boilerpipe.filters.heuristics.ExpandTitleToContentFilter
 
INSTANCE - Static variable in class com.kohlschutter.boilerpipe.filters.heuristics.KeepLargestBlockFilter
 
INSTANCE - Static variable in class com.kohlschutter.boilerpipe.filters.heuristics.LabelFusion
 
INSTANCE - Static variable in class com.kohlschutter.boilerpipe.filters.heuristics.LargeBlockSameTagLevelToContentFilter
 
INSTANCE - Static variable in class com.kohlschutter.boilerpipe.filters.heuristics.ListAtEndFilter
 
INSTANCE - Static variable in class com.kohlschutter.boilerpipe.filters.heuristics.SimpleBlockFusionProcessor
 
INSTANCE - Static variable in class com.kohlschutter.boilerpipe.filters.heuristics.TrailingHeadlineToBoilerplateFilter
 
INSTANCE - Static variable in class com.kohlschutter.boilerpipe.filters.simple.BoilerplateBlockFilter
 
INSTANCE - Static variable in class com.kohlschutter.boilerpipe.filters.simple.InvertedFilter
 
INSTANCE - Static variable in class com.kohlschutter.boilerpipe.filters.simple.MarkEverythingBoilerplateFilter
 
INSTANCE - Static variable in class com.kohlschutter.boilerpipe.filters.simple.MarkEverythingContentFilter
 
INSTANCE - Static variable in class com.kohlschutter.boilerpipe.filters.simple.MinClauseWordsFilter
 
INSTANCE - Static variable in class com.kohlschutter.boilerpipe.filters.simple.SplitParagraphBlocksFilter
 
INSTANCE - Static variable in class com.kohlschutter.boilerpipe.sax.DefaultTagActionMap
 
INSTANCE - Static variable in class com.kohlschutter.boilerpipe.sax.ImageExtractor
 
INSTANCE_200 - Static variable in class com.kohlschutter.boilerpipe.filters.english.IgnoreBlocksAfterContentFilter
 
INSTANCE_EXPAND_TO_SAME_TAGLEVEL - Static variable in class com.kohlschutter.boilerpipe.filters.heuristics.KeepLargestBlockFilter
 
INSTANCE_EXPAND_TO_SAME_TAGLEVEL_MIN_WORDS - Static variable in class com.kohlschutter.boilerpipe.filters.heuristics.KeepLargestBlockFilter
 
INSTANCE_KEEP_TITLE - Static variable in class com.kohlschutter.boilerpipe.filters.simple.BoilerplateBlockFilter
 
INSTANCE_PRE - Static variable in class com.kohlschutter.boilerpipe.filters.heuristics.AddPrecedingLabelsFilter
 
INSTANCE_STRICTLY_NOT_CONTENT - Static variable in class com.kohlschutter.boilerpipe.filters.simple.LabelToBoilerplateFilter
 
INSTANCE_TEXT - Static variable in class com.kohlschutter.boilerpipe.filters.simple.SurroundingToContentFilter
 
InvertedFilter - Class in com.kohlschutter.boilerpipe.filters.simple
Inverts the "isContent" flag for all TextBlocks.
InvertedFilter() - Constructor for class com.kohlschutter.boilerpipe.filters.simple.InvertedFilter
 
is - Variable in class com.kohlschutter.boilerpipe.sax.BoilerpipeSAXInput
 
isBlockLevel - Variable in class com.kohlschutter.boilerpipe.sax.MarkupTagAction
 
isClause(CharSequence) - Method in class com.kohlschutter.boilerpipe.filters.simple.MinClauseWordsFilter
 
isContent - Variable in class com.kohlschutter.boilerpipe.document.TextBlock
 
isContent() - Method in class com.kohlschutter.boilerpipe.document.TextBlock
 
isDigit(char) - Static method in class com.kohlschutter.boilerpipe.filters.english.TerminatingBlocksFinder
 
isLowQuality(TextDocumentStatistics, TextDocumentStatistics) - Method in class com.kohlschutter.boilerpipe.estimators.SimpleEstimator
Determines, from the statistics of the document before and after applying the BoilerpipeExtractor, whether the extraction quality should be regarded as too low. Works well with DefaultExtractor, ArticleExtractor and others.
isOutputHighlightOnly() - Method in class com.kohlschutter.boilerpipe.sax.HTMLHighlighter
If true, only HTML enclosed within highlighted content will be returned.
isWord(String) - Static method in class com.kohlschutter.boilerpipe.sax.BoilerpipeHTMLContentHandler
 

K

KEEP_EVERYTHING_EXTRACTOR - Static variable in class com.kohlschutter.boilerpipe.extractors.CommonExtractors
Dummy Extractor; should return the input text.
KeepEverythingExtractor - Class in com.kohlschutter.boilerpipe.extractors
Marks everything as content.
KeepEverythingExtractor() - Constructor for class com.kohlschutter.boilerpipe.extractors.KeepEverythingExtractor
 
KeepEverythingWithMinKWordsExtractor - Class in com.kohlschutter.boilerpipe.extractors
Marks every block that contains at least k words as content.
KeepEverythingWithMinKWordsExtractor(int) - Constructor for class com.kohlschutter.boilerpipe.extractors.KeepEverythingWithMinKWordsExtractor
 
KeepLargestBlockFilter - Class in com.kohlschutter.boilerpipe.filters.heuristics
Keeps the largest TextBlock only (by the number of words).
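A minimal sketch of the "keep only the largest block" idea, assuming a hypothetical `Block` type with a word count and a content flag. It is not the library's implementation, and it does not model the constructor's boolean and int parameters. Breaking ties in favour of the first block is also an assumption.

```java
import java.util.List;

final class KeepLargestSketch {
    // Hypothetical stand-in for a text block (not the library's TextBlock).
    static final class Block {
        final int numWords;
        boolean isContent;

        Block(int numWords, boolean isContent) {
            this.numWords = numWords;
            this.isContent = isContent;
        }
    }

    // Leaves only the content block with the most words flagged as content.
    // Blocks that were not content before are never promoted.
    // Returns true if any flag changed.
    static boolean keepLargest(List<Block> blocks) {
        Block largest = null;
        for (Block b : blocks) {
            if (b.isContent && (largest == null || b.numWords > largest.numWords)) {
                largest = b;
            }
        }
        boolean changed = false;
        for (Block b : blocks) {
            if (b != largest && b.isContent) {
                b.isContent = false;
                changed = true;
            }
        }
        return changed;
    }
}
```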
KeepLargestBlockFilter(boolean, int) - Constructor for class com.kohlschutter.boilerpipe.filters.heuristics.KeepLargestBlockFilter
 
KeepLargestFulltextBlockFilter - Class in com.kohlschutter.boilerpipe.filters.english
Keeps the largest TextBlock only (by the number of words).
KeepLargestFulltextBlockFilter() - Constructor for class com.kohlschutter.boilerpipe.filters.english.KeepLargestFulltextBlockFilter
 

L

LabelAction - Class in com.kohlschutter.boilerpipe.labels
Helps add labels to TextBlocks.
LabelAction(String...) - Constructor for class com.kohlschutter.boilerpipe.labels.LabelAction
 
LabelFusion - Class in com.kohlschutter.boilerpipe.filters.heuristics
Fuses adjacent blocks if their labels are equal.
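The fusion rule can be illustrated with a short sketch. `Block` and `fuse` are hypothetical names introduced here, and joining the fused texts with a newline is an assumption rather than documented library behaviour.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

final class LabelFusionSketch {
    // Hypothetical stand-in for a text block (not the library's TextBlock).
    static final class Block {
        String text;
        final Set<String> labels;

        Block(String text, Set<String> labels) {
            this.text = text;
            this.labels = labels;
        }
    }

    // Returns a new list in which runs of adjacent blocks with equal
    // label sets are merged into one block. The input list is not modified.
    static List<Block> fuse(List<Block> blocks) {
        List<Block> out = new ArrayList<>();
        for (Block b : blocks) {
            Block prev = out.isEmpty() ? null : out.get(out.size() - 1);
            if (prev != null && prev.labels.equals(b.labels)) {
                // Assumption: fused texts are joined by a newline.
                prev.text = prev.text + "\n" + b.text;
            } else {
                out.add(new Block(b.text, b.labels));
            }
        }
        return out;
    }
}
```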
LabelFusion() - Constructor for class com.kohlschutter.boilerpipe.filters.heuristics.LabelFusion
Creates a new LabelFusion instance.
labelPrefix - Variable in class com.kohlschutter.boilerpipe.filters.heuristics.AddPrecedingLabelsFilter
 
labels - Variable in class com.kohlschutter.boilerpipe.document.TextBlock
 
labels - Variable in class com.kohlschutter.boilerpipe.filters.simple.LabelToBoilerplateFilter
 
labels - Variable in class com.kohlschutter.boilerpipe.filters.simple.LabelToContentFilter
 
labels - Variable in class com.kohlschutter.boilerpipe.labels.LabelAction
 
labelStack - Variable in class com.kohlschutter.boilerpipe.sax.MarkupTagAction
 
labelStacks - Variable in class com.kohlschutter.boilerpipe.sax.BoilerpipeHTMLContentHandler
 
LabelToBoilerplateFilter - Class in com.kohlschutter.boilerpipe.filters.simple
Marks all blocks that contain a given label as "boilerplate".
LabelToBoilerplateFilter(String...) - Constructor for class com.kohlschutter.boilerpipe.filters.simple.LabelToBoilerplateFilter
 
LabelToContentFilter - Class in com.kohlschutter.boilerpipe.filters.simple
Marks all blocks that contain a given label as "content".
LabelToContentFilter(String...) - Constructor for class com.kohlschutter.boilerpipe.filters.simple.LabelToContentFilter
 
labelToKeep - Variable in class com.kohlschutter.boilerpipe.filters.simple.BoilerplateBlockFilter
 
LargeBlockSameTagLevelToContentFilter - Class in com.kohlschutter.boilerpipe.filters.heuristics
Marks all blocks as content that are on the same tag level as very likely main content (usually the level of the largest block) and have a significant number of words (currently at least 100).
LargeBlockSameTagLevelToContentFilter() - Constructor for class com.kohlschutter.boilerpipe.filters.heuristics.LargeBlockSameTagLevelToContentFilter
 
LARGEST_CONTENT_EXTRACTOR - Static variable in class com.kohlschutter.boilerpipe.extractors.CommonExtractors
Like DefaultExtractor, but keeps the largest text block only.
LargestContentExtractor - Class in com.kohlschutter.boilerpipe.extractors
A full-text extractor which extracts the largest text component of a page.
LargestContentExtractor() - Constructor for class com.kohlschutter.boilerpipe.extractors.LargestContentExtractor
 
lastEndTag - Variable in class com.kohlschutter.boilerpipe.sax.BoilerpipeHTMLContentHandler
 
lastEvent - Variable in class com.kohlschutter.boilerpipe.sax.BoilerpipeHTMLContentHandler
 
lastStartTag - Variable in class com.kohlschutter.boilerpipe.sax.BoilerpipeHTMLContentHandler
 
LI - Static variable in class com.kohlschutter.boilerpipe.labels.DefaultLabels
 
linkDensity - Variable in class com.kohlschutter.boilerpipe.document.TextBlock
 
linksBuffer - Variable in class com.kohlschutter.boilerpipe.sax.ImageExtractor.Implementation
 
linksHighlight - Variable in class com.kohlschutter.boilerpipe.sax.ImageExtractor.Implementation
 
ListAtEndFilter - Class in com.kohlschutter.boilerpipe.filters.heuristics
Marks nested list-item blocks after the end of the main content.
ListAtEndFilter() - Constructor for class com.kohlschutter.boilerpipe.filters.heuristics.ListAtEndFilter
 

M

main(String[]) - Static method in class com.kohlschutter.boilerpipe.demo.HTMLHighlightDemo
 
main(String[]) - Static method in class com.kohlschutter.boilerpipe.demo.ImageExtractorDemo
 
main(String[]) - Static method in class com.kohlschutter.boilerpipe.demo.Oneliner
 
main(String[]) - Static method in class com.kohlschutter.boilerpipe.demo.UsingSAX
 
MarkEverythingBoilerplateFilter - Class in com.kohlschutter.boilerpipe.filters.simple
Marks all blocks as boilerplate.
MarkEverythingBoilerplateFilter() - Constructor for class com.kohlschutter.boilerpipe.filters.simple.MarkEverythingBoilerplateFilter
 
MarkEverythingContentFilter - Class in com.kohlschutter.boilerpipe.filters.simple
Marks all blocks as content.
MarkEverythingContentFilter() - Constructor for class com.kohlschutter.boilerpipe.filters.simple.MarkEverythingContentFilter
 
MARKUP_PREFIX - Static variable in class com.kohlschutter.boilerpipe.labels.DefaultLabels
 
markupLabelsOnly(Set<String>) - Method in class com.kohlschutter.boilerpipe.filters.heuristics.LabelFusion
 
MarkupTagAction - Class in com.kohlschutter.boilerpipe.sax
Assigns labels for element CSS classes and ids to the corresponding TextBlock.
MarkupTagAction(boolean) - Constructor for class com.kohlschutter.boilerpipe.sax.MarkupTagAction
 
MAX_DISTANCE_1 - Static variable in class com.kohlschutter.boilerpipe.filters.heuristics.BlockProximityFusion
 
MAX_DISTANCE_1_CONTENT_ONLY - Static variable in class com.kohlschutter.boilerpipe.filters.heuristics.BlockProximityFusion
 
MAX_DISTANCE_1_CONTENT_ONLY_SAME_TAGLEVEL - Static variable in class com.kohlschutter.boilerpipe.filters.heuristics.BlockProximityFusion
 
MAX_DISTANCE_1_SAME_TAGLEVEL - Static variable in class com.kohlschutter.boilerpipe.filters.heuristics.BlockProximityFusion
 
maxBlocksDistance - Variable in class com.kohlschutter.boilerpipe.filters.heuristics.BlockProximityFusion
 
meetsCondition(TextBlock) - Method in interface com.kohlschutter.boilerpipe.conditions.TextBlockCondition
Returns true iff the given TextBlock tb meets the defined condition.
mergeNext(TextBlock) - Method in class com.kohlschutter.boilerpipe.document.TextBlock
 
MIGHT_BE_CONTENT - Static variable in class com.kohlschutter.boilerpipe.labels.DefaultLabels
 
MinClauseWordsFilter - Class in com.kohlschutter.boilerpipe.filters.simple
Keeps only blocks that have at least one segment fragment ("clause") with at least k words (default: 5).
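As a rough illustration of the clause test, the sketch below splits text at punctuation and checks whether any fragment reaches the word threshold. The delimiter pattern and the method name `hasClauseWithMinWords` are simplifying assumptions; the library's own patterns (see PAT_CLAUSE_DELIMITER and PAT_WHITESPACE) may differ.

```java
import java.util.regex.Pattern;

final class MinClauseWordsSketch {
    // Assumed, simplified clause delimiter: a run of punctuation marks.
    private static final Pattern CLAUSE_DELIMITER = Pattern.compile("[,.:;!?]+");
    private static final Pattern WHITESPACE = Pattern.compile("\\s+");

    // Returns true if at least one clause of the text has minWords or more
    // whitespace-separated words.
    static boolean hasClauseWithMinWords(String text, int minWords) {
        for (String clause : CLAUSE_DELIMITER.split(text)) {
            String trimmed = clause.trim();
            if (trimmed.isEmpty()) {
                continue; // skip empty fragments between delimiters
            }
            if (WHITESPACE.split(trimmed).length >= minWords) {
                return true;
            }
        }
        return false;
    }
}
```

With a threshold of 5, a navigation-like string such as "Home, About, Contact us" fails the check, while an ordinary sentence of five or more words passes.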
MinClauseWordsFilter(int) - Constructor for class com.kohlschutter.boilerpipe.filters.simple.MinClauseWordsFilter
 
MinClauseWordsFilter(int, boolean) - Constructor for class com.kohlschutter.boilerpipe.filters.simple.MinClauseWordsFilter
 
MinFulltextWordsFilter - Class in com.kohlschutter.boilerpipe.filters.english
Keeps only those content blocks which contain at least k full-text words (measured by HeuristicFilterBase.getNumFullTextWords(TextBlock)).
MinFulltextWordsFilter(int) - Constructor for class com.kohlschutter.boilerpipe.filters.english.MinFulltextWordsFilter
 
minNumWords - Variable in class com.kohlschutter.boilerpipe.filters.english.IgnoreBlocksAfterContentFilter
 
minWords - Variable in class com.kohlschutter.boilerpipe.filters.english.MinFulltextWordsFilter
 
minWords - Variable in class com.kohlschutter.boilerpipe.filters.heuristics.KeepLargestBlockFilter
 
minWords - Variable in class com.kohlschutter.boilerpipe.filters.simple.MinClauseWordsFilter
 
minWords - Variable in class com.kohlschutter.boilerpipe.filters.simple.MinWordsFilter
 
MinWordsFilter - Class in com.kohlschutter.boilerpipe.filters.simple
Keeps only those content blocks which contain at least k words.
MinWordsFilter(int) - Constructor for class com.kohlschutter.boilerpipe.filters.simple.MinWordsFilter
 

N

newExtractingInstance() - Static method in class com.kohlschutter.boilerpipe.sax.HTMLHighlighter
Creates a new HTMLHighlighter, which is set up to return only the extracted HTML text, including enclosed markup.
newHighlightingInstance() - Static method in class com.kohlschutter.boilerpipe.sax.HTMLHighlighter
Creates a new HTMLHighlighter, which is set up to return the full HTML text, with the extracted text portion highlighted.
nullTrim(String) - Static method in class com.kohlschutter.boilerpipe.document.Image
 
numBlocks - Variable in class com.kohlschutter.boilerpipe.document.TextDocumentStatistics
 
numFullTextWords - Variable in class com.kohlschutter.boilerpipe.document.TextBlock
 
numWords - Variable in class com.kohlschutter.boilerpipe.document.TextBlock
 
numWords - Variable in class com.kohlschutter.boilerpipe.document.TextDocumentStatistics
 
numWordsInAnchorText - Variable in class com.kohlschutter.boilerpipe.document.TextBlock
 
numWordsInWrappedLines - Variable in class com.kohlschutter.boilerpipe.document.TextBlock
 
NumWordsRulesClassifier - Class in com.kohlschutter.boilerpipe.filters.english
Classifies TextBlocks as content/not-content through rules that have been determined using the C4.8 machine learning algorithm, as described in the paper "Boilerplate Detection using Shallow Text Features" (WSDM 2010), particularly using number of words per block and link density per block.
NumWordsRulesClassifier() - Constructor for class com.kohlschutter.boilerpipe.filters.english.NumWordsRulesClassifier
 
NumWordsRulesExtractor - Class in com.kohlschutter.boilerpipe.extractors
A quite generic full-text extractor solely based upon the number of words per block (the current, the previous and the next block).
NumWordsRulesExtractor() - Constructor for class com.kohlschutter.boilerpipe.extractors.NumWordsRulesExtractor
 
numWrappedLines - Variable in class com.kohlschutter.boilerpipe.document.TextBlock
 

O

offsetBlocks - Variable in class com.kohlschutter.boilerpipe.sax.BoilerpipeHTMLContentHandler
 
offsetBlocksEnd - Variable in class com.kohlschutter.boilerpipe.document.TextBlock
 
offsetBlocksStart - Variable in class com.kohlschutter.boilerpipe.document.TextBlock
 
Oneliner - Class in com.kohlschutter.boilerpipe.demo
Demonstrates how to use Boilerpipe to get the main content as plain text.
Oneliner() - Constructor for class com.kohlschutter.boilerpipe.demo.Oneliner
 
out - Variable in class com.kohlschutter.boilerpipe.filters.debug.PrintDebugFilter
 
outputHighlightOnly - Variable in class com.kohlschutter.boilerpipe.sax.HTMLHighlighter
 

P

PAT_CHARSET - Static variable in class com.kohlschutter.boilerpipe.sax.HTMLFetcher
 
PAT_CLAUSE_DELIMITER - Variable in class com.kohlschutter.boilerpipe.filters.simple.MinClauseWordsFilter
 
PAT_FONT_SIZE - Static variable in class com.kohlschutter.boilerpipe.sax.CommonTagActions
 
PAT_NOT_WORD_BOUNDARY - Static variable in class com.kohlschutter.boilerpipe.util.UnicodeTokenizer
 
PAT_NUM - Static variable in class com.kohlschutter.boilerpipe.sax.MarkupTagAction
 
PAT_REMOVE_CHARACTERS - Static variable in class com.kohlschutter.boilerpipe.filters.heuristics.DocumentTitleMatchClassifier
 
PAT_SUPER_TAG - Static variable in class com.kohlschutter.boilerpipe.sax.HTMLHighlighter
 
PAT_TAG_NO_TEXT - Static variable in class com.kohlschutter.boilerpipe.sax.HTMLHighlighter
 
PAT_VALID_WORD_CHARACTER - Static variable in class com.kohlschutter.boilerpipe.sax.BoilerpipeHTMLContentHandler
 
PAT_WHITESPACE - Variable in class com.kohlschutter.boilerpipe.filters.simple.MinClauseWordsFilter
 
PAT_WORD_BOUNDARY - Static variable in class com.kohlschutter.boilerpipe.util.UnicodeTokenizer
 
PATTERNS_SHORT - Static variable in class com.kohlschutter.boilerpipe.filters.heuristics.ArticleMetadataFilter
 
postHighlight - Variable in class com.kohlschutter.boilerpipe.sax.HTMLHighlighter
 
potentialTitles - Variable in class com.kohlschutter.boilerpipe.filters.heuristics.DocumentTitleMatchClassifier
 
preHighlight - Variable in class com.kohlschutter.boilerpipe.sax.HTMLHighlighter
 
PrintDebugFilter - Class in com.kohlschutter.boilerpipe.filters.debug
Prints debug information about the current state of the TextDocument.
PrintDebugFilter(PrintWriter) - Constructor for class com.kohlschutter.boilerpipe.filters.debug.PrintDebugFilter
Creates a new instance of PrintDebugFilter.
process(TextDocument) - Method in interface com.kohlschutter.boilerpipe.BoilerpipeFilter
Processes the given document doc.
process(TextDocument) - Method in class com.kohlschutter.boilerpipe.extractors.ArticleExtractor
 
process(TextDocument) - Method in class com.kohlschutter.boilerpipe.extractors.ArticleSentencesExtractor
 
process(TextDocument) - Method in class com.kohlschutter.boilerpipe.extractors.CanolaExtractor
 
process(TextDocument) - Method in class com.kohlschutter.boilerpipe.extractors.DefaultExtractor
 
process(TextDocument) - Method in class com.kohlschutter.boilerpipe.extractors.KeepEverythingExtractor
 
process(TextDocument) - Method in class com.kohlschutter.boilerpipe.extractors.KeepEverythingWithMinKWordsExtractor
 
process(TextDocument) - Method in class com.kohlschutter.boilerpipe.extractors.LargestContentExtractor
 
process(TextDocument) - Method in class com.kohlschutter.boilerpipe.extractors.NumWordsRulesExtractor
 
process(TextDocument) - Method in class com.kohlschutter.boilerpipe.filters.debug.PrintDebugFilter
 
process(TextDocument) - Method in class com.kohlschutter.boilerpipe.filters.english.DensityRulesClassifier
 
process(TextDocument) - Method in class com.kohlschutter.boilerpipe.filters.english.IgnoreBlocksAfterContentFilter
 
process(TextDocument) - Method in class com.kohlschutter.boilerpipe.filters.english.IgnoreBlocksAfterContentFromEndFilter
 
process(TextDocument) - Method in class com.kohlschutter.boilerpipe.filters.english.KeepLargestFulltextBlockFilter
 
process(TextDocument) - Method in class com.kohlschutter.boilerpipe.filters.english.MinFulltextWordsFilter
 
process(TextDocument) - Method in class com.kohlschutter.boilerpipe.filters.english.NumWordsRulesClassifier
 
process(TextDocument) - Method in class com.kohlschutter.boilerpipe.filters.english.TerminatingBlocksFinder
 
process(TextDocument) - Method in class com.kohlschutter.boilerpipe.filters.heuristics.AddPrecedingLabelsFilter
 
process(TextDocument) - Method in class com.kohlschutter.boilerpipe.filters.heuristics.ArticleMetadataFilter
 
process(TextDocument) - Method in class com.kohlschutter.boilerpipe.filters.heuristics.BlockProximityFusion
 
process(TextDocument) - Method in class com.kohlschutter.boilerpipe.filters.heuristics.ContentFusion
 
process(TextDocument) - Method in class com.kohlschutter.boilerpipe.filters.heuristics.DocumentTitleMatchClassifier
 
process(TextDocument) - Method in class com.kohlschutter.boilerpipe.filters.heuristics.ExpandTitleToContentFilter
 
process(TextDocument) - Method in class com.kohlschutter.boilerpipe.filters.heuristics.KeepLargestBlockFilter
 
process(TextDocument) - Method in class com.kohlschutter.boilerpipe.filters.heuristics.LabelFusion
 
process(TextDocument) - Method in class com.kohlschutter.boilerpipe.filters.heuristics.LargeBlockSameTagLevelToContentFilter
 
process(TextDocument) - Method in class com.kohlschutter.boilerpipe.filters.heuristics.ListAtEndFilter
 
process(TextDocument) - Method in class com.kohlschutter.boilerpipe.filters.heuristics.SimpleBlockFusionProcessor
 
process(TextDocument) - Method in class com.kohlschutter.boilerpipe.filters.heuristics.TrailingHeadlineToBoilerplateFilter
 
process(TextDocument) - Method in class com.kohlschutter.boilerpipe.filters.simple.BoilerplateBlockFilter
 
process(TextDocument) - Method in class com.kohlschutter.boilerpipe.filters.simple.InvertedFilter
 
process(TextDocument) - Method in class com.kohlschutter.boilerpipe.filters.simple.LabelToBoilerplateFilter
 
process(TextDocument) - Method in class com.kohlschutter.boilerpipe.filters.simple.LabelToContentFilter
 
process(TextDocument) - Method in class com.kohlschutter.boilerpipe.filters.simple.MarkEverythingBoilerplateFilter
 
process(TextDocument) - Method in class com.kohlschutter.boilerpipe.filters.simple.MarkEverythingContentFilter
 
process(TextDocument) - Method in class com.kohlschutter.boilerpipe.filters.simple.MinClauseWordsFilter
 
process(TextDocument) - Method in class com.kohlschutter.boilerpipe.filters.simple.MinWordsFilter
 
process(TextDocument) - Method in class com.kohlschutter.boilerpipe.filters.simple.SplitParagraphBlocksFilter
 
process(TextDocument) - Method in class com.kohlschutter.boilerpipe.filters.simple.SurroundingToContentFilter
 
process(TextDocument, String) - Method in class com.kohlschutter.boilerpipe.sax.HTMLHighlighter
Processes the given TextDocument and the original HTML text (as a String).
process(TextDocument, String) - Method in class com.kohlschutter.boilerpipe.sax.ImageExtractor
Processes the given TextDocument and the original HTML text (as a String).
process(TextDocument, InputSource) - Method in class com.kohlschutter.boilerpipe.sax.HTMLHighlighter.Implementation
 
process(TextDocument, InputSource) - Method in class com.kohlschutter.boilerpipe.sax.HTMLHighlighter
Processes the given TextDocument and the original HTML text (as an InputSource).
process(TextDocument, InputSource) - Method in class com.kohlschutter.boilerpipe.sax.ImageExtractor.Implementation
 
process(TextDocument, InputSource) - Method in class com.kohlschutter.boilerpipe.sax.ImageExtractor
Processes the given TextDocument and the original HTML text (as an InputSource).
process(URL, BoilerpipeExtractor) - Method in class com.kohlschutter.boilerpipe.sax.HTMLHighlighter
Fetches the given URL using HTMLFetcher and processes the retrieved HTML using the specified BoilerpipeExtractor.
process(URL, BoilerpipeExtractor) - Method in class com.kohlschutter.boilerpipe.sax.ImageExtractor
Fetches the given URL using HTMLFetcher and processes the retrieved HTML using the specified BoilerpipeExtractor.
processingInstruction(String, String) - Method in class com.kohlschutter.boilerpipe.sax.BoilerpipeHTMLContentHandler
 
processingInstruction(String, String) - Method in class com.kohlschutter.boilerpipe.sax.HTMLHighlighter.Implementation
 
processingInstruction(String, String) - Method in class com.kohlschutter.boilerpipe.sax.ImageExtractor.Implementation
 

R

recycle() - Method in class com.kohlschutter.boilerpipe.sax.BoilerpipeHTMLContentHandler
Recycles this instance.
removeLabel(String) - Method in class com.kohlschutter.boilerpipe.document.TextBlock
 

S

sameTagLevelOnly - Variable in class com.kohlschutter.boilerpipe.filters.heuristics.BlockProximityFusion
 
sbLastWasWhitespace - Variable in class com.kohlschutter.boilerpipe.sax.BoilerpipeHTMLContentHandler
 
serialVersionUID - Static variable in exception com.kohlschutter.boilerpipe.BoilerpipeProcessingException
 
serialVersionUID - Static variable in class com.kohlschutter.boilerpipe.sax.DefaultTagActionMap
 
serialVersionUID - Static variable in class com.kohlschutter.boilerpipe.sax.TagActionMap
 
setContentHandler(BoilerpipeHTMLContentHandler) - Method in class com.kohlschutter.boilerpipe.sax.BoilerpipeHTMLParser
 
setContentHandler(ContentHandler) - Method in class com.kohlschutter.boilerpipe.sax.BoilerpipeHTMLParser
 
setDocumentLocator(Locator) - Method in class com.kohlschutter.boilerpipe.sax.BoilerpipeHTMLContentHandler
 
setDocumentLocator(Locator) - Method in class com.kohlschutter.boilerpipe.sax.HTMLHighlighter.Implementation
 
setDocumentLocator(Locator) - Method in class com.kohlschutter.boilerpipe.sax.ImageExtractor.Implementation
 
setExtraStyleSheet(String) - Method in class com.kohlschutter.boilerpipe.sax.HTMLHighlighter
Sets the extra stylesheet definition that will be inserted in the HEAD element.
setIsContent(boolean) - Method in class com.kohlschutter.boilerpipe.document.TextBlock
 
setOutputHighlightOnly(boolean) - Method in class com.kohlschutter.boilerpipe.sax.HTMLHighlighter
Sets whether only HTML enclosed within highlighted content will be returned, or the whole HTML document.
setPostHighlight(String) - Method in class com.kohlschutter.boilerpipe.sax.HTMLHighlighter
Sets the string that will be inserted after any highlighted HTML block.
setPreHighlight(String) - Method in class com.kohlschutter.boilerpipe.sax.HTMLHighlighter
Sets the string that will be inserted prior to any highlighted HTML block.
setTagAction(String, TagAction) - Method in class com.kohlschutter.boilerpipe.sax.TagActionMap
Sets a particular TagAction for a given tag.
setTagLevel(int) - Method in class com.kohlschutter.boilerpipe.document.TextBlock
 
setTagWhitelist(Map<String, Set<String>>) - Method in class com.kohlschutter.boilerpipe.sax.HTMLHighlighter
 
setTitle(String) - Method in class com.kohlschutter.boilerpipe.document.TextDocument
Updates the "main" title for this document.
setTitle(String) - Method in class com.kohlschutter.boilerpipe.sax.BoilerpipeHTMLContentHandler
 
SimpleBlockFusionProcessor - Class in com.kohlschutter.boilerpipe.filters.heuristics
Merges two subsequent blocks if their text densities are equal.
SimpleBlockFusionProcessor() - Constructor for class com.kohlschutter.boilerpipe.filters.heuristics.SimpleBlockFusionProcessor
 
SimpleEstimator - Class in com.kohlschutter.boilerpipe.estimators
Estimates the "goodness" of a BoilerpipeExtractor on a given document.
SimpleEstimator() - Constructor for class com.kohlschutter.boilerpipe.estimators.SimpleEstimator
 
skippedEntity(String) - Method in class com.kohlschutter.boilerpipe.sax.BoilerpipeHTMLContentHandler
 
skippedEntity(String) - Method in class com.kohlschutter.boilerpipe.sax.HTMLHighlighter.Implementation
 
skippedEntity(String) - Method in class com.kohlschutter.boilerpipe.sax.ImageExtractor.Implementation
 
SplitParagraphBlocksFilter - Class in com.kohlschutter.boilerpipe.filters.simple
Splits TextBlocks at paragraph boundaries.
SplitParagraphBlocksFilter() - Constructor for class com.kohlschutter.boilerpipe.filters.simple.SplitParagraphBlocksFilter
 
src - Variable in class com.kohlschutter.boilerpipe.document.Image
 
start(BoilerpipeHTMLContentHandler, String, String, Attributes) - Method in class com.kohlschutter.boilerpipe.sax.CommonTagActions.BlockTagLabelAction
 
start(BoilerpipeHTMLContentHandler, String, String, Attributes) - Method in class com.kohlschutter.boilerpipe.sax.CommonTagActions.Chained
 
start(BoilerpipeHTMLContentHandler, String, String, Attributes) - Method in class com.kohlschutter.boilerpipe.sax.CommonTagActions.InlineTagLabelAction
 
start(BoilerpipeHTMLContentHandler, String, String, Attributes) - Method in class com.kohlschutter.boilerpipe.sax.MarkupTagAction
 
start(BoilerpipeHTMLContentHandler, String, String, Attributes) - Method in interface com.kohlschutter.boilerpipe.sax.TagAction
 
START_TAG - Enum constant in enum com.kohlschutter.boilerpipe.sax.BoilerpipeHTMLContentHandler.Event
 
startDocument() - Method in class com.kohlschutter.boilerpipe.sax.BoilerpipeHTMLContentHandler
 
startDocument() - Method in class com.kohlschutter.boilerpipe.sax.HTMLHighlighter.Implementation
 
startDocument() - Method in class com.kohlschutter.boilerpipe.sax.ImageExtractor.Implementation
 
startElement(String, String, String, Attributes) - Method in class com.kohlschutter.boilerpipe.sax.BoilerpipeHTMLContentHandler
 
startElement(String, String, String, Attributes) - Method in class com.kohlschutter.boilerpipe.sax.HTMLHighlighter.Implementation
 
startElement(String, String, String, Attributes) - Method in class com.kohlschutter.boilerpipe.sax.ImageExtractor.Implementation
 
startPrefixMapping(String, String) - Method in class com.kohlschutter.boilerpipe.sax.BoilerpipeHTMLContentHandler
 
startPrefixMapping(String, String) - Method in class com.kohlschutter.boilerpipe.sax.HTMLHighlighter.Implementation
 
startPrefixMapping(String, String) - Method in class com.kohlschutter.boilerpipe.sax.ImageExtractor.Implementation
 
startsWithNumber(String, int, String...) - Static method in class com.kohlschutter.boilerpipe.filters.english.TerminatingBlocksFinder
Checks whether the given text t starts with a sequence of digits, followed by one of the given strings.
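The check described above can be sketched in a few lines. This is a simplified, hypothetical variant: it omits the int parameter of the indexed signature and treats only ASCII digits as digits.

```java
final class StartsWithNumberSketch {
    // Returns true if t begins with one or more ASCII digits that are
    // immediately followed by one of the given strings.
    static boolean startsWithNumber(String t, String... followers) {
        int pos = 0;
        while (pos < t.length() && t.charAt(pos) >= '0' && t.charAt(pos) <= '9') {
            pos++;
        }
        if (pos == 0) {
            return false; // no leading digits at all
        }
        for (String follower : followers) {
            if (t.startsWith(follower, pos)) {
                return true;
            }
        }
        return false;
    }
}
```

A call such as `startsWithNumber("25 comments", " comments")` succeeds, whereas text without leading digits, or with a different continuation, does not match.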
STRICTLY_NOT_CONTENT - Static variable in class com.kohlschutter.boilerpipe.labels.DefaultLabels
 
SurroundingToContentFilter - Class in com.kohlschutter.boilerpipe.filters.simple
Marks blocks as "content" if their preceding and following blocks are both already marked "content", and the given TextBlockCondition is met.
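A minimal sketch of this rule, assuming a hypothetical `Block` type and using `java.util.function.Predicate` in place of the library's TextBlockCondition interface. It is an illustration, not the library's code.

```java
import java.util.List;
import java.util.function.Predicate;

final class SurroundingToContentSketch {
    // Hypothetical stand-in for a text block (not the library's TextBlock).
    static final class Block {
        final int numWords;
        boolean isContent;

        Block(int numWords, boolean isContent) {
            this.numWords = numWords;
            this.isContent = isContent;
        }
    }

    // Promotes a non-content block to content when both neighbours are
    // content and the condition holds. The first and last blocks have only
    // one neighbour and are never promoted. Returns true if anything changed.
    static boolean process(List<Block> blocks, Predicate<Block> condition) {
        boolean changed = false;
        for (int i = 1; i + 1 < blocks.size(); i++) {
            Block prev = blocks.get(i - 1);
            Block cur = blocks.get(i);
            Block next = blocks.get(i + 1);
            if (!cur.isContent && prev.isContent && next.isContent && condition.test(cur)) {
                cur.isContent = true;
                changed = true;
            }
        }
        return changed;
    }
}
```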
SurroundingToContentFilter(TextBlockCondition) - Constructor for class com.kohlschutter.boilerpipe.filters.simple.SurroundingToContentFilter
 
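The startsWithNumber entry above describes a prefix check: a run of digits followed by one of several candidate strings. A minimal, self-contained sketch of such a check follows. It is not the library's code: the class name StartsWithNumberSketch is hypothetical, and reading the int parameter as the number of characters of t to scan is an assumption.

```java
public class StartsWithNumberSketch {

    // Assumption: `len` bounds how many characters of `t` are scanned for digits.
    static boolean startsWithNumber(String t, int len, String... candidates) {
        int j = 0;
        int limit = Math.min(len, t.length());
        // Consume the leading run of digits.
        while (j < limit && Character.isDigit(t.charAt(j))) {
            j++;
        }
        if (j == 0) {
            return false; // no leading digits at all
        }
        // Accept if any candidate string begins right after the digits.
        for (String s : candidates) {
            if (t.startsWith(s, j)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        String a = "3 comments on this story";
        String b = "comments (3)";
        System.out.println(startsWithNumber(a, a.length(), " comments", " users"));
        System.out.println(startsWithNumber(b, b.length(), " comments", " users"));
    }
}
```

The first call matches because the digit "3" is followed directly by " comments"; the second does not, because the text has no leading digits.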

T

t1 - Variable in class com.kohlschutter.boilerpipe.sax.CommonTagActions.Chained
 
t2 - Variable in class com.kohlschutter.boilerpipe.sax.CommonTagActions.Chained
 
TA_ANCHOR_TEXT - Static variable in class com.kohlschutter.boilerpipe.sax.CommonTagActions
Marks this tag as "anchor" (this should usually only be set for the <A> tag).
TA_BLOCK_LEVEL - Static variable in class com.kohlschutter.boilerpipe.sax.CommonTagActions
Explicitly marks this tag as a simple "block-level" element, which always generates whitespace.
TA_BODY - Static variable in class com.kohlschutter.boilerpipe.sax.CommonTagActions
Marks this tag as the body element (this should usually only be set for the <BODY> tag).
TA_FONT - Static variable in class com.kohlschutter.boilerpipe.sax.CommonTagActions
Special TagAction for the <FONT> tag, which keeps track of the absolute and relative font size.
TA_HEAD - Static variable in class com.kohlschutter.boilerpipe.sax.HTMLHighlighter
 
TA_IGNORABLE_ELEMENT - Static variable in class com.kohlschutter.boilerpipe.sax.CommonTagActions
Marks this tag as "ignorable", i.e.
TA_IGNORABLE_ELEMENT - Static variable in class com.kohlschutter.boilerpipe.sax.HTMLHighlighter
 
TA_IGNORABLE_ELEMENT - Static variable in class com.kohlschutter.boilerpipe.sax.ImageExtractor
 
TA_INLINE - Static variable in class com.kohlschutter.boilerpipe.sax.CommonTagActions
Deprecated.
TA_INLINE_NO_WHITESPACE - Static variable in class com.kohlschutter.boilerpipe.sax.CommonTagActions
Marks this tag as a simple "inline" element, which generates neither whitespace nor a new block.
TA_INLINE_WHITESPACE - Static variable in class com.kohlschutter.boilerpipe.sax.CommonTagActions
Marks this tag as a simple "inline" element, which generates whitespace but no new block.
TAG_ACTIONS - Static variable in class com.kohlschutter.boilerpipe.sax.HTMLHighlighter
 
TAG_ACTIONS - Static variable in class com.kohlschutter.boilerpipe.sax.ImageExtractor
 
TagAction - Interface in com.kohlschutter.boilerpipe.sax
Defines an action that is to be performed whenever a particular tag occurs during HTML parsing.
TagAction() - Constructor for class com.kohlschutter.boilerpipe.sax.HTMLHighlighter.TagAction
 
TagAction() - Constructor for class com.kohlschutter.boilerpipe.sax.ImageExtractor.TagAction
 
TagActionMap - Class in com.kohlschutter.boilerpipe.sax
Base class for defining a set of TagActions that are to be used for the HTML parsing process.
TagActionMap() - Constructor for class com.kohlschutter.boilerpipe.sax.TagActionMap
 
tagActions - Variable in class com.kohlschutter.boilerpipe.sax.BoilerpipeHTMLContentHandler
 
tagLevel - Variable in class com.kohlschutter.boilerpipe.document.TextBlock
 
tagLevel - Variable in class com.kohlschutter.boilerpipe.sax.BoilerpipeHTMLContentHandler
 
tagWhitelist - Variable in class com.kohlschutter.boilerpipe.sax.HTMLHighlighter
 
TerminatingBlocksFinder - Class in com.kohlschutter.boilerpipe.filters.english
Finds blocks that potentially indicate the end of an article's text and marks them with DefaultLabels.INDICATES_END_OF_TEXT.
TerminatingBlocksFinder() - Constructor for class com.kohlschutter.boilerpipe.filters.english.TerminatingBlocksFinder
 
text - Variable in class com.kohlschutter.boilerpipe.document.TextBlock
 
TextBlock - Class in com.kohlschutter.boilerpipe.document
Describes a block of text.
TextBlock(String) - Constructor for class com.kohlschutter.boilerpipe.document.TextBlock
 
TextBlock(String, BitSet, int, int, int, int, int) - Constructor for class com.kohlschutter.boilerpipe.document.TextBlock
 
TextBlockCondition - Interface in com.kohlschutter.boilerpipe.conditions
Evaluates whether a given TextBlock meets a certain condition.
textBlocks - Variable in class com.kohlschutter.boilerpipe.document.TextDocument
 
textBlocks - Variable in class com.kohlschutter.boilerpipe.sax.BoilerpipeHTMLContentHandler
 
textBuffer - Variable in class com.kohlschutter.boilerpipe.sax.BoilerpipeHTMLContentHandler
 
textDensity - Variable in class com.kohlschutter.boilerpipe.document.TextBlock
 
TextDocument - Class in com.kohlschutter.boilerpipe.document
A text document, consisting of one or more TextBlocks.
TextDocument(String, List<TextBlock>) - Constructor for class com.kohlschutter.boilerpipe.document.TextDocument
Creates a new TextDocument with given TextBlocks and given title.
TextDocument(List<TextBlock>) - Constructor for class com.kohlschutter.boilerpipe.document.TextDocument
Creates a new TextDocument with given TextBlocks, and no title.
TextDocumentStatistics - Class in com.kohlschutter.boilerpipe.document
Provides shallow statistics on a given TextDocument.
TextDocumentStatistics(TextDocument, boolean) - Constructor for class com.kohlschutter.boilerpipe.document.TextDocumentStatistics
Computes statistics on a given TextDocument.
textElementIdx - Variable in class com.kohlschutter.boilerpipe.sax.BoilerpipeHTMLContentHandler
 
title - Variable in class com.kohlschutter.boilerpipe.document.TextDocument
 
title - Variable in class com.kohlschutter.boilerpipe.sax.BoilerpipeHTMLContentHandler
 
TITLE - Static variable in class com.kohlschutter.boilerpipe.labels.DefaultLabels
 
toInputSource() - Method in class com.kohlschutter.boilerpipe.sax.HTMLDocument
 
toInputSource() - Method in interface com.kohlschutter.boilerpipe.sax.InputSourceable
 
tokenBuffer - Variable in class com.kohlschutter.boilerpipe.sax.BoilerpipeHTMLContentHandler
 
tokenize(CharSequence) - Static method in class com.kohlschutter.boilerpipe.util.UnicodeTokenizer
Tokenizes the text and returns an array of tokens.
toString() - Method in class com.kohlschutter.boilerpipe.document.Image
 
toString() - Method in class com.kohlschutter.boilerpipe.document.TextBlock
 
toString() - Method in class com.kohlschutter.boilerpipe.labels.LabelAction
 
toTextDocument() - Method in interface com.kohlschutter.boilerpipe.BoilerpipeDocumentSource
 
toTextDocument() - Method in class com.kohlschutter.boilerpipe.sax.BoilerpipeHTMLContentHandler
Returns a TextDocument containing the extracted TextBlocks.
toTextDocument() - Method in class com.kohlschutter.boilerpipe.sax.BoilerpipeHTMLParser
Returns a TextDocument containing the extracted TextBlocks.
TrailingHeadlineToBoilerplateFilter - Class in com.kohlschutter.boilerpipe.filters.heuristics
Marks trailing headlines (TextBlocks that have the label DefaultLabels.HEADING) as boilerplate.
TrailingHeadlineToBoilerplateFilter() - Constructor for class com.kohlschutter.boilerpipe.filters.heuristics.TrailingHeadlineToBoilerplateFilter
 

U

UnicodeTokenizer - Class in com.kohlschutter.boilerpipe.util
Tokenizes text according to Unicode word boundaries and strips off non-word characters.
UnicodeTokenizer() - Constructor for class com.kohlschutter.boilerpipe.util.UnicodeTokenizer
 
UsingSAX - Class in com.kohlschutter.boilerpipe.demo
Demonstrates how to use Boilerpipe when working with InputSources.
UsingSAX() - Constructor for class com.kohlschutter.boilerpipe.demo.UsingSAX
 
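The UnicodeTokenizer entry above describes splitting text at Unicode word boundaries and discarding non-word characters. The sketch below illustrates that idea using the JDK's java.text.BreakIterator, which is a swapped-in technique and not necessarily how the library does it. The class name WordBoundarySketch is hypothetical, and treating a segment as a word only if it contains a letter or digit is an assumption.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public class WordBoundarySketch {

    // Splits text at word boundaries and keeps only segments that
    // contain at least one letter or digit (assumed definition of "word").
    static String[] tokenize(CharSequence text) {
        String s = text.toString();
        BreakIterator it = BreakIterator.getWordInstance(Locale.ROOT);
        it.setText(s);
        List<String> tokens = new ArrayList<>();
        int start = it.first();
        for (int end = it.next(); end != BreakIterator.DONE; start = end, end = it.next()) {
            String segment = s.substring(start, end);
            // Drop whitespace and punctuation segments.
            if (segment.codePoints().anyMatch(Character::isLetterOrDigit)) {
                tokens.add(segment);
            }
        }
        return tokens.toArray(new String[0]);
    }

    public static void main(String[] args) {
        for (String token : tokenize("Hello, world!")) {
            System.out.println(token);
        }
    }
}
```

The method signature mirrors the tokenize(CharSequence) entry in this index, but the behavior on edge cases (apostrophes, hyphens, mixed scripts) may differ from the library's.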

V

valueOf(String) - Static method in enum com.kohlschutter.boilerpipe.sax.BoilerpipeHTMLContentHandler.Event
Returns the enum constant of this type with the specified name.
values() - Static method in enum com.kohlschutter.boilerpipe.sax.BoilerpipeHTMLContentHandler.Event
Returns an array containing the constants of this enum type, in the order they are declared.
VERY_LIKELY_CONTENT - Static variable in class com.kohlschutter.boilerpipe.labels.DefaultLabels
 

W

WHITESPACE - com.kohlschutter.boilerpipe.sax.BoilerpipeHTMLContentHandler.Event
 
width - Variable in class com.kohlschutter.boilerpipe.document.Image
 

X

xmlEncode(String) - Static method in class com.kohlschutter.boilerpipe.sax.HTMLHighlighter
 