A B C D E F G H I K L M N O P R S T U V W X
All Classes All Packages
A
- acceptClausesWithoutDelimiter - Variable in class com.kohlschutter.boilerpipe.filters.simple.MinClauseWordsFilter
- action - Variable in class com.kohlschutter.boilerpipe.sax.CommonTagActions.BlockTagLabelAction
- action - Variable in class com.kohlschutter.boilerpipe.sax.CommonTagActions.InlineTagLabelAction
- addLabel(String) - Method in class com.kohlschutter.boilerpipe.document.TextBlock
-
Adds an arbitrary String label to this TextBlock.
- addLabelAction(LabelAction) - Method in class com.kohlschutter.boilerpipe.sax.BoilerpipeHTMLContentHandler
- addLabels(String...) - Method in class com.kohlschutter.boilerpipe.document.TextBlock
-
Adds a set of labels to this TextBlock.
- addLabels(Set<String>) - Method in class com.kohlschutter.boilerpipe.document.TextBlock
-
Adds a set of labels to this TextBlock.
- addLabelsTo(TextBlock) - Method in class com.kohlschutter.boilerpipe.labels.LabelAction
- addPotentialTitles(Set<String>, String, String, int) - Method in class com.kohlschutter.boilerpipe.filters.heuristics.DocumentTitleMatchClassifier
- AddPrecedingLabelsFilter - Class in com.kohlschutter.boilerpipe.filters.heuristics
-
Adds the labels of the preceding block to the current block, optionally adding a prefix.
- AddPrecedingLabelsFilter(String) - Constructor for class com.kohlschutter.boilerpipe.filters.heuristics.AddPrecedingLabelsFilter
-
Creates a new AddPrecedingLabelsFilter instance.
- addTagAction(String, TagAction) - Method in class com.kohlschutter.boilerpipe.sax.TagActionMap
-
Adds a particular TagAction for a given tag.
- addTextBlock(TextBlock) - Method in class com.kohlschutter.boilerpipe.sax.BoilerpipeHTMLContentHandler
- addTo(TextBlock) - Method in class com.kohlschutter.boilerpipe.labels.ConditionalLabelAction
- addTo(TextBlock) - Method in class com.kohlschutter.boilerpipe.labels.LabelAction
- addWhitespaceIfNecessary() - Method in class com.kohlschutter.boilerpipe.sax.BoilerpipeHTMLContentHandler
- afterEnd(HTMLHighlighter.Implementation, String) - Method in class com.kohlschutter.boilerpipe.sax.HTMLHighlighter.TagAction
- afterEnd(ImageExtractor.Implementation, String) - Method in class com.kohlschutter.boilerpipe.sax.ImageExtractor.TagAction
- afterStart(HTMLHighlighter.Implementation, String) - Method in class com.kohlschutter.boilerpipe.sax.HTMLHighlighter.TagAction
- afterStart(ImageExtractor.Implementation, String) - Method in class com.kohlschutter.boilerpipe.sax.ImageExtractor.TagAction
- alt - Variable in class com.kohlschutter.boilerpipe.document.Image
- ANCHOR_TEXT_END - Static variable in class com.kohlschutter.boilerpipe.sax.BoilerpipeHTMLContentHandler
- ANCHOR_TEXT_START - Static variable in class com.kohlschutter.boilerpipe.sax.BoilerpipeHTMLContentHandler
- area - Variable in class com.kohlschutter.boilerpipe.document.Image
- ARTICLE_EXTRACTOR - Static variable in class com.kohlschutter.boilerpipe.extractors.CommonExtractors
-
Works very well for most types of Article-like HTML.
- ARTICLE_METADATA - Static variable in class com.kohlschutter.boilerpipe.labels.DefaultLabels
- ArticleExtractor - Class in com.kohlschutter.boilerpipe.extractors
-
A full-text extractor which is tuned towards news articles.
- ArticleExtractor() - Constructor for class com.kohlschutter.boilerpipe.extractors.ArticleExtractor
- ArticleMetadataFilter - Class in com.kohlschutter.boilerpipe.filters.heuristics
-
Tries to find TextBlocks that consist of "article metadata".
- ArticleMetadataFilter() - Constructor for class com.kohlschutter.boilerpipe.filters.heuristics.ArticleMetadataFilter
- ArticleSentencesExtractor - Class in com.kohlschutter.boilerpipe.extractors
-
A full-text extractor which is tuned towards extracting sentences from news articles.
- ArticleSentencesExtractor() - Constructor for class com.kohlschutter.boilerpipe.extractors.ArticleSentencesExtractor
- avgNumWords() - Method in class com.kohlschutter.boilerpipe.document.TextDocumentStatistics
-
Returns the average number of words at block-level (= overall number of words divided by the number of blocks).
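The avgNumWords() description above amounts to a single division. A minimal sketch of that arithmetic follows; the class name BlockStatsSketch, the int[] input, and the choice to return 0 for an empty document are illustrative assumptions, not the library's TextDocumentStatistics implementation.

```java
/** Hypothetical stand-in used only to illustrate the formula; not part of Boilerpipe. */
final class BlockStatsSketch {

    /**
     * Average words per block = overall number of words / number of blocks.
     * Returning 0 when there are no blocks is an assumption of this sketch.
     */
    static float avgNumWords(int[] wordsPerBlock) {
        if (wordsPerBlock.length == 0) {
            return 0f;
        }
        int total = 0;
        for (int n : wordsPerBlock) {
            total += n; // overall number of words in all blocks
        }
        return total / (float) wordsPerBlock.length;
    }

    public static void main(String[] args) {
        // Three blocks with 10, 20 and 30 words: 60 words / 3 blocks.
        System.out.println(avgNumWords(new int[] {10, 20, 30})); // prints 20.0
    }
}
```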
B
- beforeEnd(HTMLHighlighter.Implementation, String) - Method in class com.kohlschutter.boilerpipe.sax.HTMLHighlighter.TagAction
- beforeEnd(ImageExtractor.Implementation, String) - Method in class com.kohlschutter.boilerpipe.sax.ImageExtractor.TagAction
- beforeStart(HTMLHighlighter.Implementation, String) - Method in class com.kohlschutter.boilerpipe.sax.HTMLHighlighter.TagAction
- beforeStart(ImageExtractor.Implementation, String) - Method in class com.kohlschutter.boilerpipe.sax.ImageExtractor.TagAction
- BlockProximityFusion - Class in com.kohlschutter.boilerpipe.filters.heuristics
-
Fuses adjacent blocks if their distance (in blocks) does not exceed a certain limit.
- BlockProximityFusion(int, boolean, boolean) - Constructor for class com.kohlschutter.boilerpipe.filters.heuristics.BlockProximityFusion
-
Creates a new BlockProximityFusion instance.
- BlockTagLabelAction(LabelAction) - Constructor for class com.kohlschutter.boilerpipe.sax.CommonTagActions.BlockTagLabelAction
- blockTagLevel - Variable in class com.kohlschutter.boilerpipe.sax.BoilerpipeHTMLContentHandler
- BoilerpipeDocumentSource - Interface in com.kohlschutter.boilerpipe
-
Something that can be represented as a TextDocument.
- BoilerpipeExtractor - Interface in com.kohlschutter.boilerpipe
-
Describes a complete filter pipeline.
- BoilerpipeFilter - Interface in com.kohlschutter.boilerpipe
-
A generic BoilerpipeFilter.
- BoilerpipeHTMLContentHandler - Class in com.kohlschutter.boilerpipe.sax
-
A simple SAX ContentHandler, used by BoilerpipeSAXInput.
- BoilerpipeHTMLContentHandler() - Constructor for class com.kohlschutter.boilerpipe.sax.BoilerpipeHTMLContentHandler
-
Constructs a BoilerpipeHTMLContentHandler using the DefaultTagActionMap.
- BoilerpipeHTMLContentHandler(TagActionMap) - Constructor for class com.kohlschutter.boilerpipe.sax.BoilerpipeHTMLContentHandler
-
Constructs a BoilerpipeHTMLContentHandler using the given TagActionMap.
- BoilerpipeHTMLContentHandler.Event - Enum in com.kohlschutter.boilerpipe.sax
- BoilerpipeHTMLParser - Class in com.kohlschutter.boilerpipe.sax
-
A simple SAX Parser, used by BoilerpipeSAXInput.
- BoilerpipeHTMLParser() - Constructor for class com.kohlschutter.boilerpipe.sax.BoilerpipeHTMLParser
-
Constructs a BoilerpipeHTMLParser using a default HTML content handler.
- BoilerpipeHTMLParser(boolean) - Constructor for class com.kohlschutter.boilerpipe.sax.BoilerpipeHTMLParser
- BoilerpipeHTMLParser(BoilerpipeHTMLContentHandler) - Constructor for class com.kohlschutter.boilerpipe.sax.BoilerpipeHTMLParser
-
Constructs a BoilerpipeHTMLParser using the given BoilerpipeHTMLContentHandler.
- BoilerpipeInput - Interface in com.kohlschutter.boilerpipe
-
A source that returns TextDocuments.
- BoilerpipeProcessingException - Exception in com.kohlschutter.boilerpipe
-
Exception for signaling failure in the processing pipeline.
- BoilerpipeProcessingException() - Constructor for exception com.kohlschutter.boilerpipe.BoilerpipeProcessingException
- BoilerpipeProcessingException(String) - Constructor for exception com.kohlschutter.boilerpipe.BoilerpipeProcessingException
- BoilerpipeProcessingException(String, Throwable) - Constructor for exception com.kohlschutter.boilerpipe.BoilerpipeProcessingException
- BoilerpipeProcessingException(Throwable) - Constructor for exception com.kohlschutter.boilerpipe.BoilerpipeProcessingException
- BoilerpipeSAXInput - Class in com.kohlschutter.boilerpipe.sax
-
Parses an InputSource using SAX and returns a TextDocument.
- BoilerpipeSAXInput(InputSource) - Constructor for class com.kohlschutter.boilerpipe.sax.BoilerpipeSAXInput
-
Creates a new instance of BoilerpipeSAXInput for the given InputSource.
- BoilerplateBlockFilter - Class in com.kohlschutter.boilerpipe.filters.simple
-
Removes TextBlocks which have explicitly been marked as "not content".
- BoilerplateBlockFilter(String) - Constructor for class com.kohlschutter.boilerpipe.filters.simple.BoilerplateBlockFilter
C
- CANOLA_EXTRACTOR - Static variable in class com.kohlschutter.boilerpipe.extractors.CommonExtractors
-
Trained on krdwrd Canola (different definition of "boilerplate").
- CanolaExtractor - Class in com.kohlschutter.boilerpipe.extractors
- CanolaExtractor() - Constructor for class com.kohlschutter.boilerpipe.extractors.CanolaExtractor
- Chained(TagAction, TagAction) - Constructor for class com.kohlschutter.boilerpipe.sax.CommonTagActions.Chained
- changesTagLevel() - Method in class com.kohlschutter.boilerpipe.sax.CommonTagActions.BlockTagLabelAction
- changesTagLevel() - Method in class com.kohlschutter.boilerpipe.sax.CommonTagActions.Chained
- changesTagLevel() - Method in class com.kohlschutter.boilerpipe.sax.CommonTagActions.InlineTagLabelAction
- changesTagLevel() - Method in class com.kohlschutter.boilerpipe.sax.MarkupTagAction
- changesTagLevel() - Method in interface com.kohlschutter.boilerpipe.sax.TagAction
- characterElementIdx - Variable in class com.kohlschutter.boilerpipe.sax.HTMLHighlighter.Implementation
- characterElementIdx - Variable in class com.kohlschutter.boilerpipe.sax.ImageExtractor.Implementation
- characters(char[], int, int) - Method in class com.kohlschutter.boilerpipe.sax.BoilerpipeHTMLContentHandler
- characters(char[], int, int) - Method in class com.kohlschutter.boilerpipe.sax.HTMLHighlighter.Implementation
- characters(char[], int, int) - Method in class com.kohlschutter.boilerpipe.sax.ImageExtractor.Implementation
- CHARACTERS - com.kohlschutter.boilerpipe.sax.BoilerpipeHTMLContentHandler.Event
- charset - Variable in class com.kohlschutter.boilerpipe.sax.HTMLDocument
- CLASSIFIER - Static variable in class com.kohlschutter.boilerpipe.extractors.CanolaExtractor
-
The actual classifier, exposed.
- classify(TextBlock, TextBlock, TextBlock) - Method in class com.kohlschutter.boilerpipe.filters.english.DensityRulesClassifier
- classify(TextBlock, TextBlock, TextBlock) - Method in class com.kohlschutter.boilerpipe.filters.english.NumWordsRulesClassifier
- clone() - Method in class com.kohlschutter.boilerpipe.document.TextBlock
- clone() - Method in class com.kohlschutter.boilerpipe.document.TextDocument
- com.kohlschutter.boilerpipe - package com.kohlschutter.boilerpipe
-
The Boilerpipe top-level package.
- com.kohlschutter.boilerpipe.conditions - package com.kohlschutter.boilerpipe.conditions
- com.kohlschutter.boilerpipe.demo - package com.kohlschutter.boilerpipe.demo
-
Just some simple demo code.
- com.kohlschutter.boilerpipe.document - package com.kohlschutter.boilerpipe.document
-
The Boilerpipe document model.
- com.kohlschutter.boilerpipe.estimators - package com.kohlschutter.boilerpipe.estimators
- com.kohlschutter.boilerpipe.extractors - package com.kohlschutter.boilerpipe.extractors
-
Some standard extractors (i.e., completely piped BoilerpipeFilters)
- com.kohlschutter.boilerpipe.filters.debug - package com.kohlschutter.boilerpipe.filters.debug
- com.kohlschutter.boilerpipe.filters.english - package com.kohlschutter.boilerpipe.filters.english
-
These BoilerpipeFilters have only been tested on English text.
- com.kohlschutter.boilerpipe.filters.heuristics - package com.kohlschutter.boilerpipe.filters.heuristics
-
These BoilerpipeFilters are pure heuristics.
- com.kohlschutter.boilerpipe.filters.simple - package com.kohlschutter.boilerpipe.filters.simple
-
These BoilerpipeFilters are straight-forward and probably not really specific to English.
- com.kohlschutter.boilerpipe.labels - package com.kohlschutter.boilerpipe.labels
- com.kohlschutter.boilerpipe.sax - package com.kohlschutter.boilerpipe.sax
-
Classes related to parsing and producing HTML from/to Boilerpipe TextDocuments.
- com.kohlschutter.boilerpipe.util - package com.kohlschutter.boilerpipe.util
-
Some helper classes.
- CommonExtractors - Class in com.kohlschutter.boilerpipe.extractors
-
Provides quick access to common BoilerpipeExtractors.
- CommonExtractors() - Constructor for class com.kohlschutter.boilerpipe.extractors.CommonExtractors
- CommonTagActions - Class in com.kohlschutter.boilerpipe.sax
-
Defines an action that is to be performed whenever a particular tag occurs during HTML parsing.
- CommonTagActions() - Constructor for class com.kohlschutter.boilerpipe.sax.CommonTagActions
- CommonTagActions.BlockTagLabelAction - Class in com.kohlschutter.boilerpipe.sax
-
CommonTagActions for block-level elements, which triggers some LabelAction on the generated TextBlock.
- CommonTagActions.Chained - Class in com.kohlschutter.boilerpipe.sax
- CommonTagActions.InlineTagLabelAction - Class in com.kohlschutter.boilerpipe.sax
- compareTo(Image) - Method in class com.kohlschutter.boilerpipe.document.Image
- cond - Variable in class com.kohlschutter.boilerpipe.filters.simple.SurroundingToContentFilter
- condition - Variable in class com.kohlschutter.boilerpipe.labels.ConditionalLabelAction
- ConditionalLabelAction - Class in com.kohlschutter.boilerpipe.labels
-
Adds labels to a TextBlock if the given criteria are met.
- ConditionalLabelAction(TextBlockCondition, String...) - Constructor for class com.kohlschutter.boilerpipe.labels.ConditionalLabelAction
- containedTextElements - Variable in class com.kohlschutter.boilerpipe.document.TextBlock
- contentBitSet - Variable in class com.kohlschutter.boilerpipe.sax.HTMLHighlighter.Implementation
- contentBitSet - Variable in class com.kohlschutter.boilerpipe.sax.ImageExtractor.Implementation
- ContentFusion - Class in com.kohlschutter.boilerpipe.filters.heuristics
-
Merges two blocks using some heuristics.
- ContentFusion() - Constructor for class com.kohlschutter.boilerpipe.filters.heuristics.ContentFusion
-
Creates a new ContentFusion instance.
- contentHandler - Variable in class com.kohlschutter.boilerpipe.sax.BoilerpipeHTMLParser
- contentOnly - Variable in class com.kohlschutter.boilerpipe.filters.heuristics.BlockProximityFusion
- currentContainedTextElements - Variable in class com.kohlschutter.boilerpipe.sax.BoilerpipeHTMLContentHandler
D
- data - Variable in class com.kohlschutter.boilerpipe.sax.HTMLDocument
- debugString() - Method in class com.kohlschutter.boilerpipe.document.TextDocument
-
Returns detailed debugging information about the contained TextBlocks.
- DEFAULT_EXTRACTOR - Static variable in class com.kohlschutter.boilerpipe.extractors.CommonExtractors
-
Usually worse than ArticleExtractor, but simpler/no heuristics.
- DEFAULT_INSTANCE - Static variable in class com.kohlschutter.boilerpipe.filters.english.IgnoreBlocksAfterContentFilter
- DEFAULT_INSTANCE - Static variable in class com.kohlschutter.boilerpipe.filters.english.MinFulltextWordsFilter
- DefaultExtractor - Class in com.kohlschutter.boilerpipe.extractors
-
A quite generic full-text extractor.
- DefaultExtractor() - Constructor for class com.kohlschutter.boilerpipe.extractors.DefaultExtractor
- DefaultLabels - Class in com.kohlschutter.boilerpipe.labels
-
Some pre-defined labels which can be used in conjunction with TextBlock.addLabel(String) and TextBlock.hasLabel(String).
- DefaultLabels() - Constructor for class com.kohlschutter.boilerpipe.labels.DefaultLabels
- DefaultTagActionMap - Class in com.kohlschutter.boilerpipe.sax
-
Default TagActions.
- DefaultTagActionMap() - Constructor for class com.kohlschutter.boilerpipe.sax.DefaultTagActionMap
- DensityRulesClassifier - Class in com.kohlschutter.boilerpipe.filters.english
-
Classifies TextBlocks as content/not-content through rules that have been determined using the C4.8 machine learning algorithm, as described in the paper "Boilerplate Detection using Shallow Text Features", particularly using text densities and link densities.
- DensityRulesClassifier() - Constructor for class com.kohlschutter.boilerpipe.filters.english.DensityRulesClassifier
- DocumentTitleMatchClassifier - Class in com.kohlschutter.boilerpipe.filters.heuristics
-
Marks TextBlocks which contain parts of the HTML <TITLE> tag, using some heuristics which are quite specific to the news domain.
- DocumentTitleMatchClassifier(String) - Constructor for class com.kohlschutter.boilerpipe.filters.heuristics.DocumentTitleMatchClassifier
E
- EMPTY_BITSET - Static variable in class com.kohlschutter.boilerpipe.document.TextBlock
- EMPTY_END - Static variable in class com.kohlschutter.boilerpipe.document.TextBlock
- EMPTY_START - Static variable in class com.kohlschutter.boilerpipe.document.TextBlock
- end(BoilerpipeHTMLContentHandler, String, String) - Method in class com.kohlschutter.boilerpipe.sax.CommonTagActions.BlockTagLabelAction
- end(BoilerpipeHTMLContentHandler, String, String) - Method in class com.kohlschutter.boilerpipe.sax.CommonTagActions.Chained
- end(BoilerpipeHTMLContentHandler, String, String) - Method in class com.kohlschutter.boilerpipe.sax.CommonTagActions.InlineTagLabelAction
- end(BoilerpipeHTMLContentHandler, String, String) - Method in class com.kohlschutter.boilerpipe.sax.MarkupTagAction
- end(BoilerpipeHTMLContentHandler, String, String) - Method in interface com.kohlschutter.boilerpipe.sax.TagAction
- END_TAG - com.kohlschutter.boilerpipe.sax.BoilerpipeHTMLContentHandler.Event
- endDocument() - Method in class com.kohlschutter.boilerpipe.sax.BoilerpipeHTMLContentHandler
- endDocument() - Method in class com.kohlschutter.boilerpipe.sax.HTMLHighlighter.Implementation
- endDocument() - Method in class com.kohlschutter.boilerpipe.sax.ImageExtractor.Implementation
- endElement(String, String, String) - Method in class com.kohlschutter.boilerpipe.sax.BoilerpipeHTMLContentHandler
- endElement(String, String, String) - Method in class com.kohlschutter.boilerpipe.sax.HTMLHighlighter.Implementation
- endElement(String, String, String) - Method in class com.kohlschutter.boilerpipe.sax.ImageExtractor.Implementation
- endPrefixMapping(String) - Method in class com.kohlschutter.boilerpipe.sax.BoilerpipeHTMLContentHandler
- endPrefixMapping(String) - Method in class com.kohlschutter.boilerpipe.sax.HTMLHighlighter.Implementation
- endPrefixMapping(String) - Method in class com.kohlschutter.boilerpipe.sax.ImageExtractor.Implementation
- equalLabels(Set<String>, Set<String>) - Method in class com.kohlschutter.boilerpipe.filters.heuristics.LabelFusion
- Event() - Constructor for enum com.kohlschutter.boilerpipe.sax.BoilerpipeHTMLContentHandler.Event
- ExpandTitleToContentFilter - Class in com.kohlschutter.boilerpipe.filters.heuristics
-
Marks all TextBlocks "content" which are between the headline and the part that has already been marked content, if they are marked DefaultLabels.MIGHT_BE_CONTENT.
- ExpandTitleToContentFilter() - Constructor for class com.kohlschutter.boilerpipe.filters.heuristics.ExpandTitleToContentFilter
- expandToSameLevelText - Variable in class com.kohlschutter.boilerpipe.filters.heuristics.KeepLargestBlockFilter
- ExtractorBase - Class in com.kohlschutter.boilerpipe.extractors
-
The base class of Extractors.
- ExtractorBase() - Constructor for class com.kohlschutter.boilerpipe.extractors.ExtractorBase
- extraStyleSheet - Variable in class com.kohlschutter.boilerpipe.sax.HTMLHighlighter
F
- fetch(URL) - Static method in class com.kohlschutter.boilerpipe.sax.HTMLFetcher
-
Fetches the document at the given URL, using URLConnection.
- filter - Variable in class com.kohlschutter.boilerpipe.extractors.KeepEverythingWithMinKWordsExtractor
- flush - Variable in class com.kohlschutter.boilerpipe.sax.BoilerpipeHTMLContentHandler
- flushBlock() - Method in class com.kohlschutter.boilerpipe.sax.BoilerpipeHTMLContentHandler
- fontSizeStack - Variable in class com.kohlschutter.boilerpipe.sax.BoilerpipeHTMLContentHandler
G
- getAlt() - Method in class com.kohlschutter.boilerpipe.document.Image
- getAncestorLabels() - Method in class com.kohlschutter.boilerpipe.sax.MarkupTagAction
- getArea() - Method in class com.kohlschutter.boilerpipe.document.Image
-
Returns the image's area (specified by width * height), or -1 if width/height weren't both specified or could not be parsed.
- getCharset() - Method in class com.kohlschutter.boilerpipe.sax.HTMLDocument
- getContainedTextElements() - Method in class com.kohlschutter.boilerpipe.document.TextBlock
-
Returns the containedTextElements BitSet, or null.
- getContent() - Method in class com.kohlschutter.boilerpipe.document.TextDocument
-
Returns the TextDocument's content.
- getData() - Method in class com.kohlschutter.boilerpipe.sax.HTMLDocument
- getDefaultInstance() - Static method in class com.kohlschutter.boilerpipe.filters.english.IgnoreBlocksAfterContentFilter
-
Returns the singleton instance for IgnoreBlocksAfterContentFilter.
- getDefaultInstance() - Static method in class com.kohlschutter.boilerpipe.filters.english.MinFulltextWordsFilter
- getExtraStyleSheet() - Method in class com.kohlschutter.boilerpipe.sax.HTMLHighlighter
-
Returns the extra stylesheet definition that will be inserted in the HEAD element.
- getHeight() - Method in class com.kohlschutter.boilerpipe.document.Image
- getInstance() - Static method in class com.kohlschutter.boilerpipe.extractors.ArticleExtractor
-
Returns the singleton instance for ArticleExtractor.
- getInstance() - Static method in class com.kohlschutter.boilerpipe.extractors.ArticleSentencesExtractor
-
Returns the singleton instance for ArticleSentencesExtractor.
- getInstance() - Static method in class com.kohlschutter.boilerpipe.extractors.CanolaExtractor
-
Returns the singleton instance for CanolaExtractor.
- getInstance() - Static method in class com.kohlschutter.boilerpipe.extractors.DefaultExtractor
-
Returns the singleton instance for DefaultExtractor.
- getInstance() - Static method in class com.kohlschutter.boilerpipe.extractors.LargestContentExtractor
-
Returns the singleton instance for LargestContentExtractor.
- getInstance() - Static method in class com.kohlschutter.boilerpipe.extractors.NumWordsRulesExtractor
-
Returns the singleton instance for NumWordsRulesExtractor.
- getInstance() - Static method in class com.kohlschutter.boilerpipe.filters.debug.PrintDebugFilter
-
Returns the default instance for PrintDebugFilter, which dumps debug information to System.out
- getInstance() - Static method in class com.kohlschutter.boilerpipe.filters.english.DensityRulesClassifier
-
Returns the singleton instance for DensityRulesClassifier.
- getInstance() - Static method in class com.kohlschutter.boilerpipe.filters.english.NumWordsRulesClassifier
-
Returns the singleton instance for NumWordsRulesClassifier.
- getInstance() - Static method in class com.kohlschutter.boilerpipe.filters.english.TerminatingBlocksFinder
-
Returns the singleton instance for TerminatingBlocksFinder.
- getInstance() - Static method in class com.kohlschutter.boilerpipe.filters.heuristics.ExpandTitleToContentFilter
-
Returns the singleton instance for ExpandTitleToContentFilter.
- getInstance() - Static method in class com.kohlschutter.boilerpipe.filters.heuristics.SimpleBlockFusionProcessor
-
Returns the singleton instance for SimpleBlockFusionProcessor.
- getInstance() - Static method in class com.kohlschutter.boilerpipe.filters.heuristics.TrailingHeadlineToBoilerplateFilter
-
Returns the singleton instance for TrailingHeadlineToBoilerplateFilter.
- getInstance() - Static method in class com.kohlschutter.boilerpipe.filters.simple.BoilerplateBlockFilter
-
Returns the singleton instance for BoilerplateBlockFilter.
- getInstance() - Static method in class com.kohlschutter.boilerpipe.filters.simple.SplitParagraphBlocksFilter
-
Returns the singleton instance for SplitParagraphBlocksFilter.
- getInstance() - Static method in class com.kohlschutter.boilerpipe.sax.ImageExtractor
-
Returns the singleton instance of ImageExtractor.
- getLabels() - Method in class com.kohlschutter.boilerpipe.document.TextBlock
-
Returns the labels associated to this TextBlock, or null if no such labels exist.
- getLinkDensity() - Method in class com.kohlschutter.boilerpipe.document.TextBlock
- getLongestPart(String, String) - Method in class com.kohlschutter.boilerpipe.filters.heuristics.DocumentTitleMatchClassifier
- getNumFullTextWords(TextBlock) - Static method in class com.kohlschutter.boilerpipe.filters.english.HeuristicFilterBase
- getNumFullTextWords(TextBlock, float) - Static method in class com.kohlschutter.boilerpipe.filters.english.HeuristicFilterBase
- getNumWords() - Method in class com.kohlschutter.boilerpipe.document.TextBlock
- getNumWords() - Method in class com.kohlschutter.boilerpipe.document.TextDocumentStatistics
-
Returns the overall number of words in all blocks.
- getNumWordsInAnchorText() - Method in class com.kohlschutter.boilerpipe.document.TextBlock
- getOffsetBlocksEnd() - Method in class com.kohlschutter.boilerpipe.document.TextBlock
- getOffsetBlocksStart() - Method in class com.kohlschutter.boilerpipe.document.TextBlock
- getPostHighlight() - Method in class com.kohlschutter.boilerpipe.sax.HTMLHighlighter
-
Returns the string that will be inserted after any highlighted HTML block.
- getPotentialTitles() - Method in class com.kohlschutter.boilerpipe.filters.heuristics.DocumentTitleMatchClassifier
- getPreHighlight() - Method in class com.kohlschutter.boilerpipe.sax.HTMLHighlighter
-
Returns the string that will be inserted before any highlighted HTML block.
- getSrc() - Method in class com.kohlschutter.boilerpipe.document.Image
- getTagLevel() - Method in class com.kohlschutter.boilerpipe.document.TextBlock
- getTagWhitelist() - Method in class com.kohlschutter.boilerpipe.sax.HTMLHighlighter
- getText() - Method in class com.kohlschutter.boilerpipe.document.TextBlock
- getText(boolean, boolean) - Method in class com.kohlschutter.boilerpipe.document.TextDocument
-
Returns the TextDocument's content, non-content or both
- getText(TextDocument) - Method in interface com.kohlschutter.boilerpipe.BoilerpipeExtractor
-
Extracts text from the given TextDocument object.
- getText(TextDocument) - Method in class com.kohlschutter.boilerpipe.extractors.ExtractorBase
-
Extracts text from the given TextDocument object.
- getText(Reader) - Method in interface com.kohlschutter.boilerpipe.BoilerpipeExtractor
-
Extracts text from the HTML code available from the given Reader.
- getText(Reader) - Method in class com.kohlschutter.boilerpipe.extractors.ExtractorBase
-
Extracts text from the HTML code available from the given Reader.
- getText(String) - Method in interface com.kohlschutter.boilerpipe.BoilerpipeExtractor
-
Extracts text from the HTML code given as a String.
- getText(String) - Method in class com.kohlschutter.boilerpipe.extractors.ExtractorBase
-
Extracts text from the HTML code given as a String.
- getText(URL) - Method in class com.kohlschutter.boilerpipe.extractors.ExtractorBase
-
Extracts text from the HTML code available from the given URL.
- getText(InputSource) - Method in interface com.kohlschutter.boilerpipe.BoilerpipeExtractor
-
Extracts text from the HTML code available from the given InputSource.
- getText(InputSource) - Method in class com.kohlschutter.boilerpipe.extractors.ExtractorBase
-
Extracts text from the HTML code available from the given InputSource.
- getTextBlocks() - Method in class com.kohlschutter.boilerpipe.document.TextDocument
-
Returns the TextBlocks of this document.
- getTextBlocks() - Method in class com.kohlschutter.boilerpipe.sax.BoilerpipeHTMLContentHandler
- getTextDensity() - Method in class com.kohlschutter.boilerpipe.document.TextBlock
- getTextDocument() - Method in interface com.kohlschutter.boilerpipe.BoilerpipeInput
-
Returns (somehow) a TextDocument.
- getTextDocument() - Method in class com.kohlschutter.boilerpipe.sax.BoilerpipeSAXInput
-
Retrieves the TextDocument using a default HTML parser.
- getTextDocument(BoilerpipeHTMLParser) - Method in class com.kohlschutter.boilerpipe.sax.BoilerpipeSAXInput
-
Retrieves the TextDocument using the given HTML parser.
- getTitle() - Method in class com.kohlschutter.boilerpipe.document.TextDocument
-
Returns the "main" title for this document, or null if no such title has been set.
- getTitle() - Method in class com.kohlschutter.boilerpipe.sax.BoilerpipeHTMLContentHandler
- getWidth() - Method in class com.kohlschutter.boilerpipe.document.Image
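The getArea() entry in this section gives a rule: width * height, or -1 when either dimension is missing or unparseable. A minimal sketch of that rule follows; the class ImageAreaSketch, its helper parseOrNull, and the use of String inputs (suggested by the four-String Image constructor, whose parameter order is not shown in this index) are assumptions for illustration, not the library's code.

```java
/** Hypothetical illustration of the documented area rule; not part of Boilerpipe. */
final class ImageAreaSketch {

    /** Returns width * height, or -1 if either value is absent or not an integer. */
    static int area(String width, String height) {
        Integer w = parseOrNull(width);
        Integer h = parseOrNull(height);
        if (w == null || h == null) {
            return -1; // not both specified, or could not be parsed
        }
        return w * h;
    }

    /** Parses a decimal integer, returning null instead of throwing on bad input. */
    private static Integer parseOrNull(String s) {
        if (s == null) {
            return null;
        }
        try {
            return Integer.parseInt(s.trim());
        } catch (NumberFormatException e) {
            return null;
        }
    }

    public static void main(String[] args) {
        System.out.println(area("640", "480")); // prints 307200
        System.out.println(area("640", null));  // prints -1
    }
}
```

A sentinel such as -1 keeps the value an int and lets images without known dimensions sort together, which may be why compareTo(Image) can rely on it; that connection is a guess, not something the index states.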
H
- H1 - Static variable in class com.kohlschutter.boilerpipe.labels.DefaultLabels
- H2 - Static variable in class com.kohlschutter.boilerpipe.labels.DefaultLabels
- H3 - Static variable in class com.kohlschutter.boilerpipe.labels.DefaultLabels
- hasLabel(String) - Method in class com.kohlschutter.boilerpipe.document.TextBlock
-
Checks whether this TextBlock has the given label.
- HEADING - Static variable in class com.kohlschutter.boilerpipe.labels.DefaultLabels
- height - Variable in class com.kohlschutter.boilerpipe.document.Image
- HeuristicFilterBase - Class in com.kohlschutter.boilerpipe.filters.english
-
Base class for some heuristics that are used by boilerpipe filters.
- HeuristicFilterBase() - Constructor for class com.kohlschutter.boilerpipe.filters.english.HeuristicFilterBase
- hl - Variable in class com.kohlschutter.boilerpipe.sax.HTMLHighlighter.Implementation
- HR - Static variable in class com.kohlschutter.boilerpipe.labels.DefaultLabels
- html - Variable in class com.kohlschutter.boilerpipe.sax.HTMLHighlighter.Implementation
- HTMLDocument - Class in com.kohlschutter.boilerpipe.sax
-
An InputSourceable for HTMLFetcher.
- HTMLDocument(byte[], Charset) - Constructor for class com.kohlschutter.boilerpipe.sax.HTMLDocument
- HTMLDocument(String) - Constructor for class com.kohlschutter.boilerpipe.sax.HTMLDocument
- HTMLFetcher - Class in com.kohlschutter.boilerpipe.sax
-
A very simple HTTP/HTML fetcher, really just for demo purposes.
- HTMLFetcher() - Constructor for class com.kohlschutter.boilerpipe.sax.HTMLFetcher
- HTMLHighlightDemo - Class in com.kohlschutter.boilerpipe.demo
-
Demonstrates how to use Boilerpipe to get the main content, highlighted as HTML.
- HTMLHighlightDemo() - Constructor for class com.kohlschutter.boilerpipe.demo.HTMLHighlightDemo
- HTMLHighlighter - Class in com.kohlschutter.boilerpipe.sax
-
Highlights text blocks in an HTML document that have been marked as "content" in the corresponding TextDocument.
- HTMLHighlighter(boolean) - Constructor for class com.kohlschutter.boilerpipe.sax.HTMLHighlighter
- HTMLHighlighter.Implementation - Class in com.kohlschutter.boilerpipe.sax
- HTMLHighlighter.TagAction - Class in com.kohlschutter.boilerpipe.sax
I
- ignorableWhitespace(char[], int, int) - Method in class com.kohlschutter.boilerpipe.sax.BoilerpipeHTMLContentHandler
- ignorableWhitespace(char[], int, int) - Method in class com.kohlschutter.boilerpipe.sax.HTMLHighlighter.Implementation
- ignorableWhitespace(char[], int, int) - Method in class com.kohlschutter.boilerpipe.sax.ImageExtractor.Implementation
- IgnoreBlocksAfterContentFilter - Class in com.kohlschutter.boilerpipe.filters.english
-
Marks all blocks as "non-content" that occur after blocks that have been marked DefaultLabels.INDICATES_END_OF_TEXT.
- IgnoreBlocksAfterContentFilter(int) - Constructor for class com.kohlschutter.boilerpipe.filters.english.IgnoreBlocksAfterContentFilter
- IgnoreBlocksAfterContentFromEndFilter - Class in com.kohlschutter.boilerpipe.filters.english
-
Marks all blocks as "non-content" that occur after blocks that have been marked DefaultLabels.INDICATES_END_OF_TEXT, and after any content block.
- IgnoreBlocksAfterContentFromEndFilter() - Constructor for class com.kohlschutter.boilerpipe.filters.english.IgnoreBlocksAfterContentFromEndFilter
- Image - Class in com.kohlschutter.boilerpipe.document
-
Represents an Image resource that is contained in the document.
- Image(String, String, String, String) - Constructor for class com.kohlschutter.boilerpipe.document.Image
- ImageExtractor - Class in com.kohlschutter.boilerpipe.sax
-
Extracts the images that are enclosed by extracted content.
- ImageExtractor() - Constructor for class com.kohlschutter.boilerpipe.sax.ImageExtractor
- ImageExtractor.Implementation - Class in com.kohlschutter.boilerpipe.sax
- ImageExtractor.TagAction - Class in com.kohlschutter.boilerpipe.sax
- ImageExtractorDemo - Class in com.kohlschutter.boilerpipe.demo
-
Demonstrates how to use Boilerpipe to get the images within the main content.
- ImageExtractorDemo() - Constructor for class com.kohlschutter.boilerpipe.demo.ImageExtractorDemo
- Implementation() - Constructor for class com.kohlschutter.boilerpipe.sax.HTMLHighlighter.Implementation
- Implementation() - Constructor for class com.kohlschutter.boilerpipe.sax.ImageExtractor.Implementation
- inAnchor - Variable in class com.kohlschutter.boilerpipe.sax.BoilerpipeHTMLContentHandler
- inAnchorText - Variable in class com.kohlschutter.boilerpipe.sax.BoilerpipeHTMLContentHandler
- inBody - Variable in class com.kohlschutter.boilerpipe.sax.BoilerpipeHTMLContentHandler
- INDICATES_END_OF_TEXT - Static variable in class com.kohlschutter.boilerpipe.labels.DefaultLabels
- inHighlight - Variable in class com.kohlschutter.boilerpipe.sax.ImageExtractor.Implementation
- inIgnorableElement - Variable in class com.kohlschutter.boilerpipe.sax.BoilerpipeHTMLContentHandler
- inIgnorableElement - Variable in class com.kohlschutter.boilerpipe.sax.HTMLHighlighter.Implementation
- inIgnorableElement - Variable in class com.kohlschutter.boilerpipe.sax.ImageExtractor.Implementation
- initDensities() - Method in class com.kohlschutter.boilerpipe.document.TextBlock
- InlineTagLabelAction(LabelAction) - Constructor for class com.kohlschutter.boilerpipe.sax.CommonTagActions.InlineTagLabelAction
- InputSourceable - Interface in com.kohlschutter.boilerpipe.sax
-
An InputSourceable can return an arbitrary number of new InputSources for a given document.
- INSTANCE - Static variable in class com.kohlschutter.boilerpipe.estimators.SimpleEstimator
-
Returns the singleton instance of SimpleEstimator
- INSTANCE - Static variable in class com.kohlschutter.boilerpipe.extractors.ArticleExtractor
- INSTANCE - Static variable in class com.kohlschutter.boilerpipe.extractors.ArticleSentencesExtractor
- INSTANCE - Static variable in class com.kohlschutter.boilerpipe.extractors.CanolaExtractor
- INSTANCE - Static variable in class com.kohlschutter.boilerpipe.extractors.DefaultExtractor
- INSTANCE - Static variable in class com.kohlschutter.boilerpipe.extractors.KeepEverythingExtractor
- INSTANCE - Static variable in class com.kohlschutter.boilerpipe.extractors.LargestContentExtractor
- INSTANCE - Static variable in class com.kohlschutter.boilerpipe.extractors.NumWordsRulesExtractor
- INSTANCE - Static variable in class com.kohlschutter.boilerpipe.filters.debug.PrintDebugFilter
-
Returns the default instance for PrintDebugFilter, which dumps debug information to System.out
- INSTANCE - Static variable in class com.kohlschutter.boilerpipe.filters.english.DensityRulesClassifier
- INSTANCE - Static variable in class com.kohlschutter.boilerpipe.filters.english.IgnoreBlocksAfterContentFromEndFilter
- INSTANCE - Static variable in class com.kohlschutter.boilerpipe.filters.english.KeepLargestFulltextBlockFilter
- INSTANCE - Static variable in class com.kohlschutter.boilerpipe.filters.english.NumWordsRulesClassifier
- INSTANCE - Static variable in class com.kohlschutter.boilerpipe.filters.english.TerminatingBlocksFinder
- INSTANCE - Static variable in class com.kohlschutter.boilerpipe.filters.heuristics.AddPrecedingLabelsFilter
- INSTANCE - Static variable in class com.kohlschutter.boilerpipe.filters.heuristics.ArticleMetadataFilter
- INSTANCE - Static variable in class com.kohlschutter.boilerpipe.filters.heuristics.ContentFusion
- INSTANCE - Static variable in class com.kohlschutter.boilerpipe.filters.heuristics.ExpandTitleToContentFilter
- INSTANCE - Static variable in class com.kohlschutter.boilerpipe.filters.heuristics.KeepLargestBlockFilter
- INSTANCE - Static variable in class com.kohlschutter.boilerpipe.filters.heuristics.LabelFusion
- INSTANCE - Static variable in class com.kohlschutter.boilerpipe.filters.heuristics.LargeBlockSameTagLevelToContentFilter
- INSTANCE - Static variable in class com.kohlschutter.boilerpipe.filters.heuristics.ListAtEndFilter
- INSTANCE - Static variable in class com.kohlschutter.boilerpipe.filters.heuristics.SimpleBlockFusionProcessor
- INSTANCE - Static variable in class com.kohlschutter.boilerpipe.filters.heuristics.TrailingHeadlineToBoilerplateFilter
- INSTANCE - Static variable in class com.kohlschutter.boilerpipe.filters.simple.BoilerplateBlockFilter
- INSTANCE - Static variable in class com.kohlschutter.boilerpipe.filters.simple.InvertedFilter
- INSTANCE - Static variable in class com.kohlschutter.boilerpipe.filters.simple.MarkEverythingBoilerplateFilter
- INSTANCE - Static variable in class com.kohlschutter.boilerpipe.filters.simple.MarkEverythingContentFilter
- INSTANCE - Static variable in class com.kohlschutter.boilerpipe.filters.simple.MinClauseWordsFilter
- INSTANCE - Static variable in class com.kohlschutter.boilerpipe.filters.simple.SplitParagraphBlocksFilter
- INSTANCE - Static variable in class com.kohlschutter.boilerpipe.sax.DefaultTagActionMap
- INSTANCE - Static variable in class com.kohlschutter.boilerpipe.sax.ImageExtractor
- INSTANCE_200 - Static variable in class com.kohlschutter.boilerpipe.filters.english.IgnoreBlocksAfterContentFilter
- INSTANCE_EXPAND_TO_SAME_TAGLEVEL - Static variable in class com.kohlschutter.boilerpipe.filters.heuristics.KeepLargestBlockFilter
- INSTANCE_EXPAND_TO_SAME_TAGLEVEL_MIN_WORDS - Static variable in class com.kohlschutter.boilerpipe.filters.heuristics.KeepLargestBlockFilter
- INSTANCE_KEEP_TITLE - Static variable in class com.kohlschutter.boilerpipe.filters.simple.BoilerplateBlockFilter
- INSTANCE_PRE - Static variable in class com.kohlschutter.boilerpipe.filters.heuristics.AddPrecedingLabelsFilter
- INSTANCE_STRICTLY_NOT_CONTENT - Static variable in class com.kohlschutter.boilerpipe.filters.simple.LabelToBoilerplateFilter
- INSTANCE_TEXT - Static variable in class com.kohlschutter.boilerpipe.filters.simple.SurroundingToContentFilter
- InvertedFilter - Class in com.kohlschutter.boilerpipe.filters.simple
-
Inverts the "isContent" flag for all TextBlocks
- InvertedFilter() - Constructor for class com.kohlschutter.boilerpipe.filters.simple.InvertedFilter
- is - Variable in class com.kohlschutter.boilerpipe.sax.BoilerpipeSAXInput
- isBlockLevel - Variable in class com.kohlschutter.boilerpipe.sax.MarkupTagAction
- isClause(CharSequence) - Method in class com.kohlschutter.boilerpipe.filters.simple.MinClauseWordsFilter
- isContent - Variable in class com.kohlschutter.boilerpipe.document.TextBlock
- isContent() - Method in class com.kohlschutter.boilerpipe.document.TextBlock
- isDigit(char) - Static method in class com.kohlschutter.boilerpipe.filters.english.TerminatingBlocksFinder
- isLowQuality(TextDocumentStatistics, TextDocumentStatistics) - Method in class com.kohlschutter.boilerpipe.estimators.SimpleEstimator
-
Given the statistics of the document before and after applying the BoilerpipeExtractor, can we regard the extraction quality (too) low? Works well with DefaultExtractor, ArticleExtractor and others.
- isOutputHighlightOnly() - Method in class com.kohlschutter.boilerpipe.sax.HTMLHighlighter
-
If true, only HTML enclosed within highlighted content will be returned
- isWord(String) - Static method in class com.kohlschutter.boilerpipe.sax.BoilerpipeHTMLContentHandler
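The IgnoreBlocksAfterContentFilter entry above describes cutting off everything after an end-of-text marker. The following is a minimal, self-contained sketch of that idea and does not use the library. The `Block` class, the `ignoreAfterEnd` method, and the reading of the `int` constructor argument as a minimum number of content words seen before a marker counts are all assumptions made for illustration. The sketch also treats the marked block itself as the start of the ignored region.

```java
import java.util.Arrays;
import java.util.List;

final class IgnoreAfterEndSketch {
    /** Hypothetical stand-in for a text block; not the library's TextBlock. */
    static final class Block {
        final int numWords;
        final boolean endOfText; // stands in for the INDICATES_END_OF_TEXT label
        boolean isContent;

        Block(int numWords, boolean isContent, boolean endOfText) {
            this.numWords = numWords;
            this.isContent = isContent;
            this.endOfText = endOfText;
        }
    }

    /**
     * Marks blocks as non-content from the first end-of-text marker onward.
     * A marker only counts once at least minNumWords content words have been
     * seen (assumed meaning of the threshold). Returns true if anything changed.
     */
    static boolean ignoreAfterEnd(List<Block> blocks, int minNumWords) {
        int contentWords = 0;
        boolean foundEnd = false;
        boolean changed = false;
        for (Block b : blocks) {
            if (b.isContent) {
                contentWords += b.numWords;
            }
            if (b.endOfText && contentWords >= minNumWords) {
                foundEnd = true;
            }
            if (foundEnd && b.isContent) {
                b.isContent = false;
                changed = true;
            }
        }
        return changed;
    }

    public static void main(String[] args) {
        List<Block> blocks = Arrays.asList(
                new Block(300, true, false),
                new Block(10, true, true),
                new Block(50, true, false));
        ignoreAfterEnd(blocks, 200);
        for (Block b : blocks) {
            System.out.println(b.numWords + " words, content=" + b.isContent);
        }
    }
}
```

The threshold guards against a marker that appears very early in the page, before any substantial content has been collected.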
K
- KEEP_EVERYTHING_EXTRACTOR - Static variable in class com.kohlschutter.boilerpipe.extractors.CommonExtractors
-
Dummy Extractor; should return the input text.
- KeepEverythingExtractor - Class in com.kohlschutter.boilerpipe.extractors
-
Marks everything as content.
- KeepEverythingExtractor() - Constructor for class com.kohlschutter.boilerpipe.extractors.KeepEverythingExtractor
- KeepEverythingWithMinKWordsExtractor - Class in com.kohlschutter.boilerpipe.extractors
-
A full-text extractor which extracts the largest text component of a page.
- KeepEverythingWithMinKWordsExtractor(int) - Constructor for class com.kohlschutter.boilerpipe.extractors.KeepEverythingWithMinKWordsExtractor
- KeepLargestBlockFilter - Class in com.kohlschutter.boilerpipe.filters.heuristics
-
Keeps the largest TextBlock only (by the number of words).
- KeepLargestBlockFilter(boolean, int) - Constructor for class com.kohlschutter.boilerpipe.filters.heuristics.KeepLargestBlockFilter
- KeepLargestFulltextBlockFilter - Class in com.kohlschutter.boilerpipe.filters.english
-
Keeps the largest TextBlock only (by the number of words).
- KeepLargestFulltextBlockFilter() - Constructor for class com.kohlschutter.boilerpipe.filters.english.KeepLargestFulltextBlockFilter
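The "keep the largest block" behavior described for KeepLargestBlockFilter can be sketched without the library as follows. The parallel arrays and the `keepLargest` method are hypothetical. The sketch assumes that only blocks already marked as content compete and that the first block wins a tie. It does not model the `boolean` and `int` constructor options listed above.

```java
final class KeepLargestSketch {
    /**
     * Leaves only the content block with the most words marked as content.
     * Returns the index of that block, or -1 if no block was content.
     */
    static int keepLargest(int[] numWords, boolean[] isContent) {
        int largest = -1;
        for (int i = 0; i < numWords.length; i++) {
            if (isContent[i] && (largest < 0 || numWords[i] > numWords[largest])) {
                largest = i;
            }
        }
        for (int i = 0; i < isContent.length; i++) {
            isContent[i] = (i == largest);
        }
        return largest;
    }

    public static void main(String[] args) {
        int[] numWords = {12, 340, 25, 500};
        boolean[] isContent = {true, true, true, false};
        int kept = keepLargest(numWords, isContent);
        System.out.println("kept block index: " + kept);
    }
}
```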
L
- LabelAction - Class in com.kohlschutter.boilerpipe.labels
-
Helps add labels to TextBlocks.
- LabelAction(String...) - Constructor for class com.kohlschutter.boilerpipe.labels.LabelAction
- LabelFusion - Class in com.kohlschutter.boilerpipe.filters.heuristics
-
Fuses adjacent blocks if their labels are equal.
- LabelFusion() - Constructor for class com.kohlschutter.boilerpipe.filters.heuristics.LabelFusion
-
Creates a new LabelFusion instance.
- labelPrefix - Variable in class com.kohlschutter.boilerpipe.filters.heuristics.AddPrecedingLabelsFilter
- labels - Variable in class com.kohlschutter.boilerpipe.document.TextBlock
- labels - Variable in class com.kohlschutter.boilerpipe.filters.simple.LabelToBoilerplateFilter
- labels - Variable in class com.kohlschutter.boilerpipe.filters.simple.LabelToContentFilter
- labels - Variable in class com.kohlschutter.boilerpipe.labels.LabelAction
- labelStack - Variable in class com.kohlschutter.boilerpipe.sax.MarkupTagAction
- labelStacks - Variable in class com.kohlschutter.boilerpipe.sax.BoilerpipeHTMLContentHandler
- LabelToBoilerplateFilter - Class in com.kohlschutter.boilerpipe.filters.simple
-
Marks all blocks that contain a given label as "boilerplate".
- LabelToBoilerplateFilter(String...) - Constructor for class com.kohlschutter.boilerpipe.filters.simple.LabelToBoilerplateFilter
- LabelToContentFilter - Class in com.kohlschutter.boilerpipe.filters.simple
-
Marks all blocks that contain a given label as "content".
- LabelToContentFilter(String...) - Constructor for class com.kohlschutter.boilerpipe.filters.simple.LabelToContentFilter
- labelToKeep - Variable in class com.kohlschutter.boilerpipe.filters.simple.BoilerplateBlockFilter
- LargeBlockSameTagLevelToContentFilter - Class in com.kohlschutter.boilerpipe.filters.heuristics
-
Marks all blocks as content that (1) are on the same tag-level as very likely main content (usually the level of the largest block) and (2) have a significant number of words (currently: at least 100).
- LargeBlockSameTagLevelToContentFilter() - Constructor for class com.kohlschutter.boilerpipe.filters.heuristics.LargeBlockSameTagLevelToContentFilter
- LARGEST_CONTENT_EXTRACTOR - Static variable in class com.kohlschutter.boilerpipe.extractors.CommonExtractors
-
Like DefaultExtractor, but keeps the largest text block only.
- LargestContentExtractor - Class in com.kohlschutter.boilerpipe.extractors
-
A full-text extractor which extracts the largest text component of a page.
- LargestContentExtractor() - Constructor for class com.kohlschutter.boilerpipe.extractors.LargestContentExtractor
- lastEndTag - Variable in class com.kohlschutter.boilerpipe.sax.BoilerpipeHTMLContentHandler
- lastEvent - Variable in class com.kohlschutter.boilerpipe.sax.BoilerpipeHTMLContentHandler
- lastStartTag - Variable in class com.kohlschutter.boilerpipe.sax.BoilerpipeHTMLContentHandler
- LI - Static variable in class com.kohlschutter.boilerpipe.labels.DefaultLabels
- linkDensity - Variable in class com.kohlschutter.boilerpipe.document.TextBlock
- linksBuffer - Variable in class com.kohlschutter.boilerpipe.sax.ImageExtractor.Implementation
- linksHighlight - Variable in class com.kohlschutter.boilerpipe.sax.ImageExtractor.Implementation
- ListAtEndFilter - Class in com.kohlschutter.boilerpipe.filters.heuristics
-
Marks nested list-item blocks after the end of the main content.
- ListAtEndFilter() - Constructor for class com.kohlschutter.boilerpipe.filters.heuristics.ListAtEndFilter
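The LabelFusion entry above says that adjacent blocks are fused when their labels are equal. A minimal sketch of that rule, independent of the library, might look like the following. The `fuse` method, the parallel lists, and the newline used to join texts are illustrative assumptions. The library's markupLabelsOnly(Set<String>) method suggests that it compares only a subset of labels, which this sketch does not model.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

final class LabelFusionSketch {
    /**
     * Joins the texts of adjacent blocks whose label sets are equal.
     * texts.get(i) and labels.get(i) describe block i.
     */
    static List<String> fuse(List<String> texts, List<Set<String>> labels) {
        List<String> fused = new ArrayList<>();
        Set<String> prevLabels = null;
        for (int i = 0; i < texts.size(); i++) {
            Set<String> current = labels.get(i);
            if (!fused.isEmpty() && current.equals(prevLabels)) {
                // Same labels as the running block: append to it.
                int last = fused.size() - 1;
                fused.set(last, fused.get(last) + "\n" + texts.get(i));
            } else {
                // Labels differ: start a new block.
                fused.add(texts.get(i));
                prevLabels = current;
            }
        }
        return fused;
    }

    public static void main(String[] args) {
        List<String> texts = Arrays.asList("First paragraph", "Second paragraph", "Footer");
        List<Set<String>> labels = Arrays.asList(
                new HashSet<>(Arrays.asList("article")),
                new HashSet<>(Arrays.asList("article")),
                new HashSet<>(Arrays.asList("footer")));
        System.out.println(fuse(texts, labels).size() + " blocks after fusion");
    }
}
```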
M
- main(String[]) - Static method in class com.kohlschutter.boilerpipe.demo.HTMLHighlightDemo
- main(String[]) - Static method in class com.kohlschutter.boilerpipe.demo.ImageExtractorDemo
- main(String[]) - Static method in class com.kohlschutter.boilerpipe.demo.Oneliner
- main(String[]) - Static method in class com.kohlschutter.boilerpipe.demo.UsingSAX
- MarkEverythingBoilerplateFilter - Class in com.kohlschutter.boilerpipe.filters.simple
-
Marks all blocks as boilerplate.
- MarkEverythingBoilerplateFilter() - Constructor for class com.kohlschutter.boilerpipe.filters.simple.MarkEverythingBoilerplateFilter
- MarkEverythingContentFilter - Class in com.kohlschutter.boilerpipe.filters.simple
-
Marks all blocks as content.
- MarkEverythingContentFilter() - Constructor for class com.kohlschutter.boilerpipe.filters.simple.MarkEverythingContentFilter
- MARKUP_PREFIX - Static variable in class com.kohlschutter.boilerpipe.labels.DefaultLabels
- markupLabelsOnly(Set<String>) - Method in class com.kohlschutter.boilerpipe.filters.heuristics.LabelFusion
- MarkupTagAction - Class in com.kohlschutter.boilerpipe.sax
-
Assigns labels for element CSS classes and ids to the corresponding TextBlock.
- MarkupTagAction(boolean) - Constructor for class com.kohlschutter.boilerpipe.sax.MarkupTagAction
- MAX_DISTANCE_1 - Static variable in class com.kohlschutter.boilerpipe.filters.heuristics.BlockProximityFusion
- MAX_DISTANCE_1_CONTENT_ONLY - Static variable in class com.kohlschutter.boilerpipe.filters.heuristics.BlockProximityFusion
- MAX_DISTANCE_1_CONTENT_ONLY_SAME_TAGLEVEL - Static variable in class com.kohlschutter.boilerpipe.filters.heuristics.BlockProximityFusion
- MAX_DISTANCE_1_SAME_TAGLEVEL - Static variable in class com.kohlschutter.boilerpipe.filters.heuristics.BlockProximityFusion
- maxBlocksDistance - Variable in class com.kohlschutter.boilerpipe.filters.heuristics.BlockProximityFusion
- meetsCondition(TextBlock) - Method in interface com.kohlschutter.boilerpipe.conditions.TextBlockCondition
-
Returns true iff the given TextBlock tb meets the defined condition.
- mergeNext(TextBlock) - Method in class com.kohlschutter.boilerpipe.document.TextBlock
- MIGHT_BE_CONTENT - Static variable in class com.kohlschutter.boilerpipe.labels.DefaultLabels
- MinClauseWordsFilter - Class in com.kohlschutter.boilerpipe.filters.simple
-
Keeps only blocks that have at least one segment fragment ("clause") with at least k words (default: 5).
- MinClauseWordsFilter(int) - Constructor for class com.kohlschutter.boilerpipe.filters.simple.MinClauseWordsFilter
- MinClauseWordsFilter(int, boolean) - Constructor for class com.kohlschutter.boilerpipe.filters.simple.MinClauseWordsFilter
- MinFulltextWordsFilter - Class in com.kohlschutter.boilerpipe.filters.english
-
Keeps only those content blocks which contain at least k full-text words (measured by HeuristicFilterBase.getNumFullTextWords(TextBlock)).
- MinFulltextWordsFilter(int) - Constructor for class com.kohlschutter.boilerpipe.filters.english.MinFulltextWordsFilter
- minNumWords - Variable in class com.kohlschutter.boilerpipe.filters.english.IgnoreBlocksAfterContentFilter
- minWords - Variable in class com.kohlschutter.boilerpipe.filters.english.MinFulltextWordsFilter
- minWords - Variable in class com.kohlschutter.boilerpipe.filters.heuristics.KeepLargestBlockFilter
- minWords - Variable in class com.kohlschutter.boilerpipe.filters.simple.MinClauseWordsFilter
- minWords - Variable in class com.kohlschutter.boilerpipe.filters.simple.MinWordsFilter
- MinWordsFilter - Class in com.kohlschutter.boilerpipe.filters.simple
-
Keeps only those content blocks which contain at least k words.
- MinWordsFilter(int) - Constructor for class com.kohlschutter.boilerpipe.filters.simple.MinWordsFilter
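The MinClauseWordsFilter entry above keeps a block only if at least one clause in it has k or more words. The following sketch shows one way such a check could work. It is not the library's code: the delimiter pattern is an assumption (the actual PAT_CLAUSE_DELIMITER may differ), and trailing text without a closing delimiter is always accepted as a clause here, whereas the library appears to make that configurable through acceptClausesWithoutDelimiter.

```java
import java.util.regex.Pattern;

final class MinClauseSketch {
    // Assumed clause delimiters, chosen for illustration only.
    private static final Pattern CLAUSE_DELIMITER = Pattern.compile("[.,;:!?]+");
    private static final Pattern WHITESPACE = Pattern.compile("\\s+");

    /** Counts whitespace-separated words in one clause. */
    static int countWords(String clause) {
        String trimmed = clause.trim();
        return trimmed.isEmpty() ? 0 : WHITESPACE.split(trimmed).length;
    }

    /** Returns true if any clause of the text has at least minWords words. */
    static boolean hasClauseWithMinWords(String text, int minWords) {
        for (String clause : CLAUSE_DELIMITER.split(text)) {
            if (countWords(clause) >= minWords) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(hasClauseWithMinWords("Home, About, Contact us, Login", 5));
        System.out.println(hasClauseWithMinWords("The quick brown fox jumps over the lazy dog.", 5));
    }
}
```

Navigation bars and link lists tend to consist of very short fragments, which is why a per-clause word count can separate them from running prose.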
N
- newExtractingInstance() - Static method in class com.kohlschutter.boilerpipe.sax.HTMLHighlighter
-
Creates a new HTMLHighlighter, which is set-up to return only the extracted HTML text, including enclosed markup.
- newHighlightingInstance() - Static method in class com.kohlschutter.boilerpipe.sax.HTMLHighlighter
-
Creates a new HTMLHighlighter, which is set-up to return the full HTML text, with the extracted text portion highlighted.
- nullTrim(String) - Static method in class com.kohlschutter.boilerpipe.document.Image
- numBlocks - Variable in class com.kohlschutter.boilerpipe.document.TextDocumentStatistics
- numFullTextWords - Variable in class com.kohlschutter.boilerpipe.document.TextBlock
- numWords - Variable in class com.kohlschutter.boilerpipe.document.TextBlock
- numWords - Variable in class com.kohlschutter.boilerpipe.document.TextDocumentStatistics
- numWordsInAnchorText - Variable in class com.kohlschutter.boilerpipe.document.TextBlock
- numWordsInWrappedLines - Variable in class com.kohlschutter.boilerpipe.document.TextBlock
- NumWordsRulesClassifier - Class in com.kohlschutter.boilerpipe.filters.english
-
Classifies TextBlocks as content/not-content through rules that have been determined using the C4.8 machine learning algorithm, as described in the paper "Boilerplate Detection using Shallow Text Features" (WSDM 2010), particularly using number of words per block and link density per block.
- NumWordsRulesClassifier() - Constructor for class com.kohlschutter.boilerpipe.filters.english.NumWordsRulesClassifier
- NumWordsRulesExtractor - Class in com.kohlschutter.boilerpipe.extractors
-
A quite generic full-text extractor solely based upon the number of words per block (the current, the previous and the next block).
- NumWordsRulesExtractor() - Constructor for class com.kohlschutter.boilerpipe.extractors.NumWordsRulesExtractor
- numWrappedLines - Variable in class com.kohlschutter.boilerpipe.document.TextBlock
O
- offsetBlocks - Variable in class com.kohlschutter.boilerpipe.sax.BoilerpipeHTMLContentHandler
- offsetBlocksEnd - Variable in class com.kohlschutter.boilerpipe.document.TextBlock
- offsetBlocksStart - Variable in class com.kohlschutter.boilerpipe.document.TextBlock
- Oneliner - Class in com.kohlschutter.boilerpipe.demo
-
Demonstrates how to use Boilerpipe to get the main content as plain text.
- Oneliner() - Constructor for class com.kohlschutter.boilerpipe.demo.Oneliner
- out - Variable in class com.kohlschutter.boilerpipe.filters.debug.PrintDebugFilter
- outputHighlightOnly - Variable in class com.kohlschutter.boilerpipe.sax.HTMLHighlighter
P
- PAT_CHARSET - Static variable in class com.kohlschutter.boilerpipe.sax.HTMLFetcher
- PAT_CLAUSE_DELIMITER - Variable in class com.kohlschutter.boilerpipe.filters.simple.MinClauseWordsFilter
- PAT_FONT_SIZE - Static variable in class com.kohlschutter.boilerpipe.sax.CommonTagActions
- PAT_NOT_WORD_BOUNDARY - Static variable in class com.kohlschutter.boilerpipe.util.UnicodeTokenizer
- PAT_NUM - Static variable in class com.kohlschutter.boilerpipe.sax.MarkupTagAction
- PAT_REMOVE_CHARACTERS - Static variable in class com.kohlschutter.boilerpipe.filters.heuristics.DocumentTitleMatchClassifier
- PAT_SUPER_TAG - Static variable in class com.kohlschutter.boilerpipe.sax.HTMLHighlighter
- PAT_TAG_NO_TEXT - Static variable in class com.kohlschutter.boilerpipe.sax.HTMLHighlighter
- PAT_VALID_WORD_CHARACTER - Static variable in class com.kohlschutter.boilerpipe.sax.BoilerpipeHTMLContentHandler
- PAT_WHITESPACE - Variable in class com.kohlschutter.boilerpipe.filters.simple.MinClauseWordsFilter
- PAT_WORD_BOUNDARY - Static variable in class com.kohlschutter.boilerpipe.util.UnicodeTokenizer
- PATTERNS_SHORT - Static variable in class com.kohlschutter.boilerpipe.filters.heuristics.ArticleMetadataFilter
- postHighlight - Variable in class com.kohlschutter.boilerpipe.sax.HTMLHighlighter
- potentialTitles - Variable in class com.kohlschutter.boilerpipe.filters.heuristics.DocumentTitleMatchClassifier
- preHighlight - Variable in class com.kohlschutter.boilerpipe.sax.HTMLHighlighter
- PrintDebugFilter - Class in com.kohlschutter.boilerpipe.filters.debug
-
Prints debug information about the current state of the TextDocument.
- PrintDebugFilter(PrintWriter) - Constructor for class com.kohlschutter.boilerpipe.filters.debug.PrintDebugFilter
-
Creates a new instance of PrintDebugFilter.
- process(TextDocument) - Method in interface com.kohlschutter.boilerpipe.BoilerpipeFilter
-
Processes the given document doc.
- process(TextDocument) - Method in class com.kohlschutter.boilerpipe.extractors.ArticleExtractor
- process(TextDocument) - Method in class com.kohlschutter.boilerpipe.extractors.ArticleSentencesExtractor
- process(TextDocument) - Method in class com.kohlschutter.boilerpipe.extractors.CanolaExtractor
- process(TextDocument) - Method in class com.kohlschutter.boilerpipe.extractors.DefaultExtractor
- process(TextDocument) - Method in class com.kohlschutter.boilerpipe.extractors.KeepEverythingExtractor
- process(TextDocument) - Method in class com.kohlschutter.boilerpipe.extractors.KeepEverythingWithMinKWordsExtractor
- process(TextDocument) - Method in class com.kohlschutter.boilerpipe.extractors.LargestContentExtractor
- process(TextDocument) - Method in class com.kohlschutter.boilerpipe.extractors.NumWordsRulesExtractor
- process(TextDocument) - Method in class com.kohlschutter.boilerpipe.filters.debug.PrintDebugFilter
- process(TextDocument) - Method in class com.kohlschutter.boilerpipe.filters.english.DensityRulesClassifier
- process(TextDocument) - Method in class com.kohlschutter.boilerpipe.filters.english.IgnoreBlocksAfterContentFilter
- process(TextDocument) - Method in class com.kohlschutter.boilerpipe.filters.english.IgnoreBlocksAfterContentFromEndFilter
- process(TextDocument) - Method in class com.kohlschutter.boilerpipe.filters.english.KeepLargestFulltextBlockFilter
- process(TextDocument) - Method in class com.kohlschutter.boilerpipe.filters.english.MinFulltextWordsFilter
- process(TextDocument) - Method in class com.kohlschutter.boilerpipe.filters.english.NumWordsRulesClassifier
- process(TextDocument) - Method in class com.kohlschutter.boilerpipe.filters.english.TerminatingBlocksFinder
- process(TextDocument) - Method in class com.kohlschutter.boilerpipe.filters.heuristics.AddPrecedingLabelsFilter
- process(TextDocument) - Method in class com.kohlschutter.boilerpipe.filters.heuristics.ArticleMetadataFilter
- process(TextDocument) - Method in class com.kohlschutter.boilerpipe.filters.heuristics.BlockProximityFusion
- process(TextDocument) - Method in class com.kohlschutter.boilerpipe.filters.heuristics.ContentFusion
- process(TextDocument) - Method in class com.kohlschutter.boilerpipe.filters.heuristics.DocumentTitleMatchClassifier
- process(TextDocument) - Method in class com.kohlschutter.boilerpipe.filters.heuristics.ExpandTitleToContentFilter
- process(TextDocument) - Method in class com.kohlschutter.boilerpipe.filters.heuristics.KeepLargestBlockFilter
- process(TextDocument) - Method in class com.kohlschutter.boilerpipe.filters.heuristics.LabelFusion
- process(TextDocument) - Method in class com.kohlschutter.boilerpipe.filters.heuristics.LargeBlockSameTagLevelToContentFilter
- process(TextDocument) - Method in class com.kohlschutter.boilerpipe.filters.heuristics.ListAtEndFilter
- process(TextDocument) - Method in class com.kohlschutter.boilerpipe.filters.heuristics.SimpleBlockFusionProcessor
- process(TextDocument) - Method in class com.kohlschutter.boilerpipe.filters.heuristics.TrailingHeadlineToBoilerplateFilter
- process(TextDocument) - Method in class com.kohlschutter.boilerpipe.filters.simple.BoilerplateBlockFilter
- process(TextDocument) - Method in class com.kohlschutter.boilerpipe.filters.simple.InvertedFilter
- process(TextDocument) - Method in class com.kohlschutter.boilerpipe.filters.simple.LabelToBoilerplateFilter
- process(TextDocument) - Method in class com.kohlschutter.boilerpipe.filters.simple.LabelToContentFilter
- process(TextDocument) - Method in class com.kohlschutter.boilerpipe.filters.simple.MarkEverythingBoilerplateFilter
- process(TextDocument) - Method in class com.kohlschutter.boilerpipe.filters.simple.MarkEverythingContentFilter
- process(TextDocument) - Method in class com.kohlschutter.boilerpipe.filters.simple.MinClauseWordsFilter
- process(TextDocument) - Method in class com.kohlschutter.boilerpipe.filters.simple.MinWordsFilter
- process(TextDocument) - Method in class com.kohlschutter.boilerpipe.filters.simple.SplitParagraphBlocksFilter
- process(TextDocument) - Method in class com.kohlschutter.boilerpipe.filters.simple.SurroundingToContentFilter
- process(TextDocument, String) - Method in class com.kohlschutter.boilerpipe.sax.HTMLHighlighter
-
Processes the given TextDocument and the original HTML text (as a String).
- process(TextDocument, String) - Method in class com.kohlschutter.boilerpipe.sax.ImageExtractor
-
Processes the given TextDocument and the original HTML text (as a String).
- process(TextDocument, InputSource) - Method in class com.kohlschutter.boilerpipe.sax.HTMLHighlighter.Implementation
- process(TextDocument, InputSource) - Method in class com.kohlschutter.boilerpipe.sax.HTMLHighlighter
-
Processes the given TextDocument and the original HTML text (as an InputSource).
- process(TextDocument, InputSource) - Method in class com.kohlschutter.boilerpipe.sax.ImageExtractor.Implementation
- process(TextDocument, InputSource) - Method in class com.kohlschutter.boilerpipe.sax.ImageExtractor
-
Processes the given TextDocument and the original HTML text (as an InputSource).
- process(URL, BoilerpipeExtractor) - Method in class com.kohlschutter.boilerpipe.sax.HTMLHighlighter
-
Fetches the given URL using HTMLFetcher and processes the retrieved HTML using the specified BoilerpipeExtractor.
- process(URL, BoilerpipeExtractor) - Method in class com.kohlschutter.boilerpipe.sax.ImageExtractor
-
Fetches the given URL using HTMLFetcher and processes the retrieved HTML using the specified BoilerpipeExtractor.
- processingInstruction(String, String) - Method in class com.kohlschutter.boilerpipe.sax.BoilerpipeHTMLContentHandler
- processingInstruction(String, String) - Method in class com.kohlschutter.boilerpipe.sax.HTMLHighlighter.Implementation
- processingInstruction(String, String) - Method in class com.kohlschutter.boilerpipe.sax.ImageExtractor.Implementation
R
- recycle() - Method in class com.kohlschutter.boilerpipe.sax.BoilerpipeHTMLContentHandler
-
Recycles this instance.
- removeLabel(String) - Method in class com.kohlschutter.boilerpipe.document.TextBlock
S
- sameTagLevelOnly - Variable in class com.kohlschutter.boilerpipe.filters.heuristics.BlockProximityFusion
- sbLastWasWhitespace - Variable in class com.kohlschutter.boilerpipe.sax.BoilerpipeHTMLContentHandler
- serialVersionUID - Static variable in exception com.kohlschutter.boilerpipe.BoilerpipeProcessingException
- serialVersionUID - Static variable in class com.kohlschutter.boilerpipe.sax.DefaultTagActionMap
- serialVersionUID - Static variable in class com.kohlschutter.boilerpipe.sax.TagActionMap
- setContentHandler(BoilerpipeHTMLContentHandler) - Method in class com.kohlschutter.boilerpipe.sax.BoilerpipeHTMLParser
- setContentHandler(ContentHandler) - Method in class com.kohlschutter.boilerpipe.sax.BoilerpipeHTMLParser
- setDocumentLocator(Locator) - Method in class com.kohlschutter.boilerpipe.sax.BoilerpipeHTMLContentHandler
- setDocumentLocator(Locator) - Method in class com.kohlschutter.boilerpipe.sax.HTMLHighlighter.Implementation
- setDocumentLocator(Locator) - Method in class com.kohlschutter.boilerpipe.sax.ImageExtractor.Implementation
- setExtraStyleSheet(String) - Method in class com.kohlschutter.boilerpipe.sax.HTMLHighlighter
-
Sets the extra stylesheet definition that will be inserted in the HEAD element.
- setIsContent(boolean) - Method in class com.kohlschutter.boilerpipe.document.TextBlock
- setOutputHighlightOnly(boolean) - Method in class com.kohlschutter.boilerpipe.sax.HTMLHighlighter
-
Sets whether only HTML enclosed within highlighted content will be returned, or the whole HTML document.
- setPostHighlight(String) - Method in class com.kohlschutter.boilerpipe.sax.HTMLHighlighter
-
Sets the string that will be inserted after any highlighted HTML block.
- setPreHighlight(String) - Method in class com.kohlschutter.boilerpipe.sax.HTMLHighlighter
-
Sets the string that will be inserted prior to any highlighted HTML block.
- setTagAction(String, TagAction) - Method in class com.kohlschutter.boilerpipe.sax.TagActionMap
-
Sets a particular TagAction for a given tag.
- setTagLevel(int) - Method in class com.kohlschutter.boilerpipe.document.TextBlock
- setTagWhitelist(Map<String, Set<String>>) - Method in class com.kohlschutter.boilerpipe.sax.HTMLHighlighter
- setTitle(String) - Method in class com.kohlschutter.boilerpipe.document.TextDocument
-
Updates the "main" title for this document.
- setTitle(String) - Method in class com.kohlschutter.boilerpipe.sax.BoilerpipeHTMLContentHandler
- SimpleBlockFusionProcessor - Class in com.kohlschutter.boilerpipe.filters.heuristics
-
Merges two subsequent blocks if their text densities are equal.
- SimpleBlockFusionProcessor() - Constructor for class com.kohlschutter.boilerpipe.filters.heuristics.SimpleBlockFusionProcessor
- SimpleEstimator - Class in com.kohlschutter.boilerpipe.estimators
-
Estimates the "goodness" of a BoilerpipeExtractor on a given document.
- SimpleEstimator() - Constructor for class com.kohlschutter.boilerpipe.estimators.SimpleEstimator
- skippedEntity(String) - Method in class com.kohlschutter.boilerpipe.sax.BoilerpipeHTMLContentHandler
- skippedEntity(String) - Method in class com.kohlschutter.boilerpipe.sax.HTMLHighlighter.Implementation
- skippedEntity(String) - Method in class com.kohlschutter.boilerpipe.sax.ImageExtractor.Implementation
- SplitParagraphBlocksFilter - Class in com.kohlschutter.boilerpipe.filters.simple
-
Splits TextBlocks at paragraph boundaries.
- SplitParagraphBlocksFilter() - Constructor for class com.kohlschutter.boilerpipe.filters.simple.SplitParagraphBlocksFilter
- src - Variable in class com.kohlschutter.boilerpipe.document.Image
- start(BoilerpipeHTMLContentHandler, String, String, Attributes) - Method in class com.kohlschutter.boilerpipe.sax.CommonTagActions.BlockTagLabelAction
- start(BoilerpipeHTMLContentHandler, String, String, Attributes) - Method in class com.kohlschutter.boilerpipe.sax.CommonTagActions.Chained
- start(BoilerpipeHTMLContentHandler, String, String, Attributes) - Method in class com.kohlschutter.boilerpipe.sax.CommonTagActions.InlineTagLabelAction
- start(BoilerpipeHTMLContentHandler, String, String, Attributes) - Method in class com.kohlschutter.boilerpipe.sax.MarkupTagAction
- start(BoilerpipeHTMLContentHandler, String, String, Attributes) - Method in interface com.kohlschutter.boilerpipe.sax.TagAction
- START_TAG - com.kohlschutter.boilerpipe.sax.BoilerpipeHTMLContentHandler.Event
- startDocument() - Method in class com.kohlschutter.boilerpipe.sax.BoilerpipeHTMLContentHandler
- startDocument() - Method in class com.kohlschutter.boilerpipe.sax.HTMLHighlighter.Implementation
- startDocument() - Method in class com.kohlschutter.boilerpipe.sax.ImageExtractor.Implementation
- startElement(String, String, String, Attributes) - Method in class com.kohlschutter.boilerpipe.sax.BoilerpipeHTMLContentHandler
- startElement(String, String, String, Attributes) - Method in class com.kohlschutter.boilerpipe.sax.HTMLHighlighter.Implementation
- startElement(String, String, String, Attributes) - Method in class com.kohlschutter.boilerpipe.sax.ImageExtractor.Implementation
- startPrefixMapping(String, String) - Method in class com.kohlschutter.boilerpipe.sax.BoilerpipeHTMLContentHandler
- startPrefixMapping(String, String) - Method in class com.kohlschutter.boilerpipe.sax.HTMLHighlighter.Implementation
- startPrefixMapping(String, String) - Method in class com.kohlschutter.boilerpipe.sax.ImageExtractor.Implementation
- startsWithNumber(String, int, String...) - Static method in class com.kohlschutter.boilerpipe.filters.english.TerminatingBlocksFinder
-
Checks whether the given text t starts with a sequence of digits, followed by one of the given strings.
- STRICTLY_NOT_CONTENT - Static variable in class com.kohlschutter.boilerpipe.labels.DefaultLabels
- SurroundingToContentFilter - Class in com.kohlschutter.boilerpipe.filters.simple
-
Marks blocks as "content" if their preceding and following blocks are both already marked "content", and the given TextBlockCondition is met.
- SurroundingToContentFilter(TextBlockCondition) - Constructor for class com.kohlschutter.boilerpipe.filters.simple.SurroundingToContentFilter
T
- t1 - Variable in class com.kohlschutter.boilerpipe.sax.CommonTagActions.Chained
- t2 - Variable in class com.kohlschutter.boilerpipe.sax.CommonTagActions.Chained
- TA_ANCHOR_TEXT - Static variable in class com.kohlschutter.boilerpipe.sax.CommonTagActions
-
Marks this tag as "anchor" (this should usually only be set for the <A> tag).
- TA_BLOCK_LEVEL - Static variable in class com.kohlschutter.boilerpipe.sax.CommonTagActions
-
Explicitly marks this tag as a simple "block-level" element, which always generates whitespace.
- TA_BODY - Static variable in class com.kohlschutter.boilerpipe.sax.CommonTagActions
-
Marks this tag as the body element (this should usually only be set for the <BODY> tag).
- TA_FONT - Static variable in class com.kohlschutter.boilerpipe.sax.CommonTagActions
-
Special TagAction for the <FONT> tag, which keeps track of the absolute and relative font size.
- TA_HEAD - Static variable in class com.kohlschutter.boilerpipe.sax.HTMLHighlighter
- TA_IGNORABLE_ELEMENT - Static variable in class com.kohlschutter.boilerpipe.sax.CommonTagActions
-
Marks this tag as "ignorable", i.e.
- TA_IGNORABLE_ELEMENT - Static variable in class com.kohlschutter.boilerpipe.sax.HTMLHighlighter
- TA_IGNORABLE_ELEMENT - Static variable in class com.kohlschutter.boilerpipe.sax.ImageExtractor
- TA_INLINE - Static variable in class com.kohlschutter.boilerpipe.sax.CommonTagActions
-
Deprecated. Use CommonTagActions.TA_INLINE_WHITESPACE instead.
- TA_INLINE_NO_WHITESPACE - Static variable in class com.kohlschutter.boilerpipe.sax.CommonTagActions
-
Marks this tag as a simple "inline" element, which generates neither whitespace nor a new block.
- TA_INLINE_WHITESPACE - Static variable in class com.kohlschutter.boilerpipe.sax.CommonTagActions
-
Marks this tag as a simple "inline" element, which generates whitespace but no new block.
- TAG_ACTIONS - Static variable in class com.kohlschutter.boilerpipe.sax.HTMLHighlighter
- TAG_ACTIONS - Static variable in class com.kohlschutter.boilerpipe.sax.ImageExtractor
- TagAction - Interface in com.kohlschutter.boilerpipe.sax
-
Defines an action that is to be performed whenever a particular tag occurs during HTML parsing.
- TagAction() - Constructor for class com.kohlschutter.boilerpipe.sax.HTMLHighlighter.TagAction
- TagAction() - Constructor for class com.kohlschutter.boilerpipe.sax.ImageExtractor.TagAction
- TagActionMap - Class in com.kohlschutter.boilerpipe.sax
-
Base class for defining a set of TagActions that are to be used for the HTML parsing process.
- TagActionMap() - Constructor for class com.kohlschutter.boilerpipe.sax.TagActionMap
- tagActions - Variable in class com.kohlschutter.boilerpipe.sax.BoilerpipeHTMLContentHandler
- tagLevel - Variable in class com.kohlschutter.boilerpipe.document.TextBlock
- tagLevel - Variable in class com.kohlschutter.boilerpipe.sax.BoilerpipeHTMLContentHandler
- tagWhitelist - Variable in class com.kohlschutter.boilerpipe.sax.HTMLHighlighter
- TerminatingBlocksFinder - Class in com.kohlschutter.boilerpipe.filters.english
-
Finds blocks that potentially indicate the end of an article's text and marks them with DefaultLabels.INDICATES_END_OF_TEXT.
- TerminatingBlocksFinder() - Constructor for class com.kohlschutter.boilerpipe.filters.english.TerminatingBlocksFinder
- text - Variable in class com.kohlschutter.boilerpipe.document.TextBlock
- TextBlock - Class in com.kohlschutter.boilerpipe.document
-
Describes a block of text.
- TextBlock(String) - Constructor for class com.kohlschutter.boilerpipe.document.TextBlock
- TextBlock(String, BitSet, int, int, int, int, int) - Constructor for class com.kohlschutter.boilerpipe.document.TextBlock
- TextBlockCondition - Interface in com.kohlschutter.boilerpipe.conditions
-
Evaluates whether a given TextBlock meets a certain condition.
- textBlocks - Variable in class com.kohlschutter.boilerpipe.document.TextDocument
- textBlocks - Variable in class com.kohlschutter.boilerpipe.sax.BoilerpipeHTMLContentHandler
- textBuffer - Variable in class com.kohlschutter.boilerpipe.sax.BoilerpipeHTMLContentHandler
- textDensity - Variable in class com.kohlschutter.boilerpipe.document.TextBlock
- TextDocument - Class in com.kohlschutter.boilerpipe.document
-
A text document, consisting of one or more TextBlocks.
- TextDocument(String, List<TextBlock>) - Constructor for class com.kohlschutter.boilerpipe.document.TextDocument
-
Creates a new TextDocument with given TextBlocks and given title.
- TextDocument(List<TextBlock>) - Constructor for class com.kohlschutter.boilerpipe.document.TextDocument
-
Creates a new TextDocument with given TextBlocks, and no title.
- TextDocumentStatistics - Class in com.kohlschutter.boilerpipe.document
-
Provides shallow statistics on a given TextDocument.
- TextDocumentStatistics(TextDocument, boolean) - Constructor for class com.kohlschutter.boilerpipe.document.TextDocumentStatistics
-
Computes statistics on a given TextDocument.
- textElementIdx - Variable in class com.kohlschutter.boilerpipe.sax.BoilerpipeHTMLContentHandler
- title - Variable in class com.kohlschutter.boilerpipe.document.TextDocument
- title - Variable in class com.kohlschutter.boilerpipe.sax.BoilerpipeHTMLContentHandler
- TITLE - Static variable in class com.kohlschutter.boilerpipe.labels.DefaultLabels
- toInputSource() - Method in class com.kohlschutter.boilerpipe.sax.HTMLDocument
- toInputSource() - Method in interface com.kohlschutter.boilerpipe.sax.InputSourceable
- tokenBuffer - Variable in class com.kohlschutter.boilerpipe.sax.BoilerpipeHTMLContentHandler
- tokenize(CharSequence) - Static method in class com.kohlschutter.boilerpipe.util.UnicodeTokenizer
-
Tokenizes the text and returns an array of tokens.
- toString() - Method in class com.kohlschutter.boilerpipe.document.Image
- toString() - Method in class com.kohlschutter.boilerpipe.document.TextBlock
- toString() - Method in class com.kohlschutter.boilerpipe.labels.LabelAction
- toTextDocument() - Method in interface com.kohlschutter.boilerpipe.BoilerpipeDocumentSource
- toTextDocument() - Method in class com.kohlschutter.boilerpipe.sax.BoilerpipeHTMLContentHandler
-
Returns a TextDocument containing the extracted TextBlocks.
- toTextDocument() - Method in class com.kohlschutter.boilerpipe.sax.BoilerpipeHTMLParser
-
Returns a TextDocument containing the extracted TextBlocks.
- TrailingHeadlineToBoilerplateFilter - Class in com.kohlschutter.boilerpipe.filters.heuristics
-
Marks trailing headlines (TextBlocks that have the label DefaultLabels.HEADING) as boilerplate.
- TrailingHeadlineToBoilerplateFilter() - Constructor for class com.kohlschutter.boilerpipe.filters.heuristics.TrailingHeadlineToBoilerplateFilter
U
- UnicodeTokenizer - Class in com.kohlschutter.boilerpipe.util
-
Tokenizes text according to Unicode word boundaries and strips off non-word characters.
- UnicodeTokenizer() - Constructor for class com.kohlschutter.boilerpipe.util.UnicodeTokenizer
- UsingSAX - Class in com.kohlschutter.boilerpipe.demo
-
Demonstrates how to use Boilerpipe when working with InputSources.
- UsingSAX() - Constructor for class com.kohlschutter.boilerpipe.demo.UsingSAX
V
- valueOf(String) - Static method in enum com.kohlschutter.boilerpipe.sax.BoilerpipeHTMLContentHandler.Event
-
Returns the enum constant of this type with the specified name.
- values() - Static method in enum com.kohlschutter.boilerpipe.sax.BoilerpipeHTMLContentHandler.Event
-
Returns an array containing the constants of this enum type, in the order they are declared.
- VERY_LIKELY_CONTENT - Static variable in class com.kohlschutter.boilerpipe.labels.DefaultLabels
W
- WHITESPACE - com.kohlschutter.boilerpipe.sax.BoilerpipeHTMLContentHandler.Event
- width - Variable in class com.kohlschutter.boilerpipe.document.Image
X
- xmlEncode(String) - Static method in class com.kohlschutter.boilerpipe.sax.HTMLHighlighter