Package net.htmlparser.jericho
Jericho HTML Parser 3.3
A Java library that allows analysis and manipulation of parts of an HTML document, including server-side tags, while reproducing verbatim any unrecognised or invalid HTML. It also provides high-level HTML form manipulation functions.
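The following sketch illustrates verbatim reproduction. It assumes Jericho 3.x is on the classpath, and the class ReplaceTitle and its sample markup are hypothetical. It uses an OutputDocument to rewrite only the content of the title element. The unclosed p and b tags in the input are copied to the output unchanged.

```java
import net.htmlparser.jericho.CharacterReference;
import net.htmlparser.jericho.Element;
import net.htmlparser.jericho.HTMLElementName;
import net.htmlparser.jericho.OutputDocument;
import net.htmlparser.jericho.Source;

// Hypothetical example class; not part of the Jericho API.
final class ReplaceTitle {

    // Replaces the text inside the first <title> element and returns the whole
    // document; every other character of the input is reproduced unchanged.
    static String replaceTitle(String html, String newTitle) {
        Source source = new Source(html);
        OutputDocument output = new OutputDocument(source);
        Element title = source.getFirstElement(HTMLElementName.TITLE);
        if (title != null) {
            // Encode the new text so characters such as '<' or '&' stay literal.
            output.replace(title.getContent(), CharacterReference.encode(newTitle));
        }
        return output.toString();
    }

    public static void main(String[] args) {
        String html = "<html><title>Old</title><p>unclosed <b>bold</html>";
        System.out.println(replaceTitle(html, "New"));
    }
}
```

Because OutputDocument only rewrites the segments passed to replace, the rest of the document is reproduced character for character, whether or not it is valid HTML.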
For an introduction to the API, the documentation of the Source class is the best place to start.
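Here is a minimal sketch of a typical first use of Source. It assumes Jericho 3.x, and LinkLister and listLinks are hypothetical names chosen for the example. The code parses a string and collects the href attribute of every a start tag, skipping tags that have no href.

```java
import java.util.ArrayList;
import java.util.List;

import net.htmlparser.jericho.HTMLElementName;
import net.htmlparser.jericho.Source;
import net.htmlparser.jericho.StartTag;

// Hypothetical example class; not part of the Jericho API.
final class LinkLister {

    // Returns the href value of every <a> start tag that has one, in document order.
    static List<String> listLinks(String html) {
        Source source = new Source(html);
        // Parse all tags up front, which can speed up searches over the whole document.
        source.fullSequentialParse();
        List<String> hrefs = new ArrayList<>();
        for (StartTag tag : source.getAllStartTags(HTMLElementName.A)) {
            String href = tag.getAttributeValue("href"); // null if the attribute is absent
            if (href != null) {
                hrefs.add(href);
            }
        }
        return hrefs;
    }

    public static void main(String[] args) {
        String html = "<p><a href='a.html'>A</a> <a name='top'>anchor</a> <a href=\"b.html\">B</a>";
        System.out.println(listLinks(html)); // prints [a.html, b.html]
    }
}
```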
For a summary of features and sample applications, visit the homepage at http://jerichohtml.sourceforge.net
For downloads, support and updates visit the SourceForge.net project page at http://sourceforge.net/projects/jerichohtml/
The Jericho HTML Parser is an open source library released under both the Eclipse Public License (EPL) and GNU Lesser General Public License (LGPL). You are therefore free to use it in commercial applications subject to the terms detailed in either one of these licence documents.

Interface Summary
CharStreamSource: Represents a character stream source.
HTMLElementName: Contains static fields representing the names of all elements defined in the HTML 4.01 specification and the draft HTML 5 specification.
Logger: Defines the interface for handling log messages.
LoggerProvider
OutputSegment: Defines the interface for an output segment, which is used in an OutputDocument to replace segments of the source document with other text.
ParseText: Represents the text from the source document that is to be parsed.
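The CharStreamSource interface lets generated output be written directly to a Writer. The sketch below assumes Jericho 3.x and relies on TextExtractor implementing CharStreamSource. StreamText and its methods are hypothetical helpers, not library API.

```java
import java.io.IOException;
import java.io.StringWriter;
import java.io.Writer;

import net.htmlparser.jericho.CharStreamSource;
import net.htmlparser.jericho.Source;

// Hypothetical example class; not part of the Jericho API.
final class StreamText {

    // Writes any CharStreamSource to the given Writer without first
    // building an intermediate String.
    static void copy(CharStreamSource content, Writer writer) throws IOException {
        content.writeTo(writer);
        writer.flush();
    }

    // Extracts the text of an HTML fragment by streaming a TextExtractor into a StringWriter.
    static String extractText(String html) throws IOException {
        Source source = new Source(html);
        StringWriter writer = new StringWriter();
        copy(source.getTextExtractor(), writer); // TextExtractor implements CharStreamSource
        return writer.toString();
    }

    public static void main(String[] args) throws IOException {
        System.out.println(extractText("<p>Hello <b>world</b></p>"));
    }
}
```

In practice the Writer would more likely wrap a file or network stream, which is where avoiding the intermediate String pays off.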

Class Summary
Attribute
Attributes
BasicLogFormatter: Provides basic formatting for log messages.
CharacterEntityReference: Represents an HTML Character Entity Reference.
CharacterReference: Represents an HTML Character Reference, implemented by the subclasses CharacterEntityReference and NumericCharacterReference.
CharStreamSourceUtil: Contains static utility methods for manipulating the way data is retrieved from a CharStreamSource object.
Config: Encapsulates global configuration properties which determine the behaviour of various functions.
Config.CompatibilityMode: Represents a set of configuration parameters that relate to user agent compatibility issues.
Element
EndTag
EndTagType: Defines the syntax for an end tag type.
EndTagTypeGenericImplementation: Provides a generic implementation of the abstract EndTagType class based on the most common end tag behaviour.
FormControl: Represents an HTML form control.
FormControlOutputStyle.ConfigDisplayValue: Contains static properties that configure the FormControlOutputStyle.DISPLAY_VALUE form control output style.
FormField: Represents a field in an HTML form, a field being defined as the group of all form controls having the same name.
FormFields: Represents a collection of FormField objects.
HTMLElements: Contains static methods which group HTML element names by the characteristics of their associated elements.
MasonTagTypes
MicrosoftConditionalCommentTagTypes: Contains tag types representing Microsoft® conditional comments.
MicrosoftTagTypes: Deprecated. Use the tag types defined in MicrosoftConditionalCommentTagTypes instead.
NumericCharacterReference: Represents an HTML Numeric Character Reference.
OutputDocument
PHPTagTypes
Renderer: Performs a simple rendering of HTML markup into text.
RowColumnVector: Represents the row and column number of a character position in the source document.
Segment: Represents a segment of a Source document.
Source: Represents a source HTML document.
SourceCompactor: Compacts HTML source by removing all unnecessary white space.
SourceFormatter: Formats HTML source by laying out each non-inline-level element on a new line with an appropriate indent.
StartTag
StartTagType: Defines the syntax for a start tag type.
StartTagTypeGenericImplementation: Provides a generic implementation of the abstract StartTagType class based on the most common start tag behaviour.
StreamedSource: Represents a streamed source HTML document.
Tag
TagType: Defines the syntax for a tag type that can be recognised by the parser.
TextExtractor: Extracts the textual content from HTML markup.
Util: Contains miscellaneous utility methods not directly associated with the HTML Parser library.
WriterLogger: Provides an implementation of the Logger interface that sends output to the specified java.io.Writer.
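For documents too large to hold in memory, StreamedSource can be iterated segment by segment instead of building a full Source. The following sketch assumes Jericho 3.x and uses the hypothetical class TagCounter. It counts ordinary start tags by element name while skipping comments and document type declarations, which are also represented as StartTag objects with special start tag types.

```java
import java.io.IOException;
import java.util.LinkedHashMap;
import java.util.Map;

import net.htmlparser.jericho.Segment;
import net.htmlparser.jericho.StartTag;
import net.htmlparser.jericho.StartTagType;
import net.htmlparser.jericho.StreamedSource;

// Hypothetical example class; not part of the Jericho API.
final class TagCounter {

    // Counts normal start tags by (lower-case) element name, in order of first appearance.
    static Map<String, Integer> countStartTags(CharSequence html) throws IOException {
        Map<String, Integer> counts = new LinkedHashMap<>();
        try (StreamedSource streamedSource = new StreamedSource(html)) {
            for (Segment segment : streamedSource) {
                if (segment instanceof StartTag) {
                    StartTag tag = (StartTag) segment;
                    // Ignore comments, doctype declarations and other special tag types.
                    if (tag.getStartTagType() == StartTagType.NORMAL) {
                        counts.merge(tag.getName(), 1, Integer::sum);
                    }
                }
            }
        }
        return counts;
    }

    public static void main(String[] args) throws IOException {
        System.out.println(countStartTags("<!DOCTYPE html><p>a<!-- c --></p><p>b</p><br>"));
    }
}
```

The same loop works unchanged with the Reader or InputStream constructors of StreamedSource, which is where streaming actually saves memory.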

Enum Summary
FormControlOutputStyle: An enumerated type representing the three major output styles of a form control's output element.
FormControlType: Represents the control type of a FormControl.
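As a small illustration of FormControlType, the sketch below maps each form control in an HTML fragment to its control type. It assumes Jericho 3.x, and FormControlTypes is a hypothetical class.

```java
import java.util.LinkedHashMap;
import java.util.Map;

import net.htmlparser.jericho.FormControl;
import net.htmlparser.jericho.FormControlType;
import net.htmlparser.jericho.Source;

// Hypothetical example class; not part of the Jericho API.
final class FormControlTypes {

    // Maps each form control's name to its FormControlType, in document order.
    static Map<String, FormControlType> controlTypes(String html) {
        Source source = new Source(html);
        Map<String, FormControlType> types = new LinkedHashMap<>();
        for (FormControl control : source.getFormControls()) {
            types.put(control.getName(), control.getFormControlType());
        }
        return types;
    }

    public static void main(String[] args) {
        String html = "<form><input name='q'><input type='checkbox' name='c' value='1'>"
                + "<textarea name='t'></textarea><select name='s'><option>x</option></select></form>";
        System.out.println(controlTypes(html));
    }
}
```

With this sample form, an input element without a type attribute should be reported as TEXT. A select element without the multiple attribute should be reported as SELECT_SINGLE.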