Class Segment
java.lang.Object
  Segment
All Implemented Interfaces:
  java.lang.CharSequence, java.lang.Comparable<Segment>
Direct Known Subclasses:
  Attribute, CharacterReference, Element, FormControl, SequentialListSegment, Source, Tag
public class Segment extends java.lang.Object implements java.lang.Comparable<Segment>, java.lang.CharSequence
Represents a segment of a Source document.
Many of the tag search methods are defined in this class.
The span of a segment is defined by the combination of its begin and end character positions.
-
-
Method Summary
Each entry lists the modifier and type, then the method, with the description (where one is given) on the following line.

char charAt(int index)
    Returns the character at the specified index.
int compareTo(Segment segment)
    Compares this Segment object to another object.
boolean encloses(int pos)
    Indicates whether this segment encloses the specified character position in the source document.
boolean encloses(Segment segment)
    Indicates whether this Segment encloses the specified Segment.
boolean equals(java.lang.Object object)
    Compares the specified object with this Segment for equality.
java.util.List<CharacterReference> getAllCharacterReferences()
    Returns a list of all CharacterReference objects that are enclosed by this segment.
java.util.List<Element> getAllElements()
java.util.List<Element> getAllElements(java.lang.String name)
java.util.List<Element> getAllElements(java.lang.String attributeName, java.lang.String value, boolean valueCaseSensitive)
java.util.List<Element> getAllElements(java.lang.String attributeName, java.util.regex.Pattern valueRegexPattern)
java.util.List<Element> getAllElements(StartTagType startTagType)
java.util.List<Element> getAllElementsByClass(java.lang.String className)
java.util.List<StartTag> getAllStartTags()
java.util.List<StartTag> getAllStartTags(java.lang.String name)
java.util.List<StartTag> getAllStartTags(java.lang.String attributeName, java.lang.String value, boolean valueCaseSensitive)
java.util.List<StartTag> getAllStartTags(java.lang.String attributeName, java.util.regex.Pattern valueRegexPattern)
java.util.List<StartTag> getAllStartTags(StartTagType startTagType)
java.util.List<StartTag> getAllStartTagsByClass(java.lang.String className)
java.util.List<Tag> getAllTags()
java.util.List<Tag> getAllTags(TagType tagType)
int getBegin()
    Returns the character position in the Source document at which this segment begins, inclusive.
java.util.List<Element> getChildElements()
    Returns a list of the immediate children of this segment in the document element hierarchy.
java.lang.String getDebugInfo()
    Returns a string representation of this object useful for debugging purposes.
int getEnd()
    Returns the character position in the Source document immediately after the end of this segment.
Element getFirstElement()
Element getFirstElement(java.lang.String name)
Element getFirstElement(java.lang.String attributeName, java.lang.String value, boolean valueCaseSensitive)
Element getFirstElement(java.lang.String attributeName, java.util.regex.Pattern valueRegexPattern)
Element getFirstElementByClass(java.lang.String className)
StartTag getFirstStartTag()
StartTag getFirstStartTag(java.lang.String name)
StartTag getFirstStartTag(java.lang.String attributeName, java.lang.String value, boolean valueCaseSensitive)
StartTag getFirstStartTag(java.lang.String attributeName, java.util.regex.Pattern valueRegexPattern)
StartTag getFirstStartTag(StartTagType startTagType)
StartTag getFirstStartTagByClass(java.lang.String className)
java.util.List<FormControl> getFormControls()
    Returns a list of the FormControl objects that are enclosed by this segment.
FormFields getFormFields()
    Returns the FormFields object representing all form fields that are enclosed by this segment.
int getMaxDepthIndicator()
    Returns an indication of the maximum depth of nested elements within this segment.
java.util.Iterator<Segment> getNodeIterator()
    Returns an iterator over every tag, character reference and plain text segment contained within this segment.
Renderer getRenderer()
    Performs a simple rendering of the HTML markup in this segment into text.
RowColumnVector getRowColumnVector()
    Returns a RowColumnVector object representing the row and column number of the start of this segment in the source document.
Source getSource()
    Returns the Source document containing this segment.
java.util.List<Segment> getStyleURISegments()
TextExtractor getTextExtractor()
    Extracts the textual content from the HTML markup of this segment.
java.util.List<Attribute> getURIAttributes()
int hashCode()
    Returns a hash code value for the segment.
void ignoreWhenParsing()
    Causes this segment to be ignored when parsing.
boolean isWhiteSpace()
    Indicates whether this segment consists entirely of white space.
static boolean isWhiteSpace(char ch)
    Indicates whether the specified character is white space.
int length()
    Returns the length of the segment.
Attributes parseAttributes()
    Parses any Attributes within this segment.
java.lang.CharSequence subSequence(int beginIndex, int endIndex)
    Returns a new character sequence that is a subsequence of this sequence.
java.lang.String toString()
    Returns the source text of this segment as a String.
-
-
-
Method Detail
-
getSource
public final Source getSource()
Returns the Source document containing this segment.
If a StreamedSource is in use, this method throws an UnsupportedOperationException.
- Returns:
- the Source document containing this segment.
-
getBegin
public final int getBegin()
Returns the character position in the Source document at which this segment begins, inclusive.
Use the Source.getRowColumnVector(int pos) method to determine the row and column numbers corresponding to this character position.
- Returns:
- the character position in the Source document at which this segment begins, inclusive.
-
getEnd
public final int getEnd()
Returns the character position in the Source document immediately after the end of this segment.
The character at the position specified by this property is not included in the segment.
- Returns:
- the character position in the Source document immediately after the end of this segment.
- See Also:
getBegin()
-
equals
public final boolean equals(java.lang.Object object)
Compares the specified object with this Segment for equality.
Returns true if and only if the specified object is also a Segment, and both segments have the same Source, and the same begin and end positions.
- Overrides:
equals in class java.lang.Object
- Parameters:
object - the object to be compared for equality with this Segment.
- Returns:
true if the specified object is equal to this Segment, otherwise false.
-
hashCode
public int hashCode()
Returns a hash code value for the segment.
The current implementation returns the sum of the begin and end positions, although this is not guaranteed in future versions.
- Overrides:
hashCode in class java.lang.Object
- Returns:
- a hash code value for the segment.
-
length
public int length()
Returns the length of the segment. This is defined as the number of characters between the begin and end positions.
- Specified by:
length in interface java.lang.CharSequence
- Returns:
- the length of the segment.
-
encloses
public final boolean encloses(Segment segment)
Indicates whether this Segment encloses the specified Segment.
This is the case if getBegin() <= segment.getBegin() && getEnd() >= segment.getEnd().
Note that a segment encloses itself.
- Parameters:
segment - the segment to be tested for being enclosed by this segment.
- Returns:
true if this Segment encloses the specified Segment, otherwise false.
-
encloses
public final boolean encloses(int pos)
Indicates whether this segment encloses the specified character position in the source document.
This is the case if getBegin() <= pos < getEnd().
- Parameters:
pos - the position in the Source document.
- Returns:
true if this segment encloses the specified character position in the source document, otherwise false.
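The two encloses overloads and length() all follow from treating a segment as a half-open span: begin is inclusive and end is exclusive. The following is a minimal sketch of that arithmetic only. SpanSketch is a hypothetical stand-in written for illustration, not the library's Segment class.

```java
/** Hypothetical model of a segment's span; NOT the library's Segment class. */
final class SpanSketch {
    final int begin; // inclusive, playing the role of getBegin()
    final int end;   // exclusive, playing the role of getEnd()

    SpanSketch(int begin, int end) {
        if (begin > end) throw new IllegalArgumentException("begin must not exceed end");
        this.begin = begin;
        this.end = end;
    }

    /** Number of characters between the begin and end positions. */
    int length() { return end - begin; }

    /** True if begin <= pos < end. */
    boolean encloses(int pos) { return begin <= pos && pos < end; }

    /** True if the other span lies entirely within this one; a span encloses itself. */
    boolean encloses(SpanSketch other) { return begin <= other.begin && end >= other.end; }

    public static void main(String[] args) {
        SpanSketch outer = new SpanSketch(10, 20);
        System.out.println(outer.encloses(10));                     // true: begin is inclusive
        System.out.println(outer.encloses(20));                     // false: end is exclusive
        System.out.println(outer.encloses(outer));                  // true: a span encloses itself
        System.out.println(outer.encloses(new SpanSketch(15, 25))); // false: extends past the end
        System.out.println(outer.length());                         // 10
    }
}
```

Note that a position equal to the end of the span is not enclosed, while a span whose end equals the outer end is.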
-
toString
public java.lang.String toString()
Returns the source text of this segment as a String.
The returned String is newly created with every call to this method, unless this segment is itself an instance of Source.
- Specified by:
toString in interface java.lang.CharSequence
- Overrides:
toString in class java.lang.Object
- Returns:
- the source text of this segment as a String.
-
getRenderer
public Renderer getRenderer()
Performs a simple rendering of the HTML markup in this segment into text.
The output can be configured by setting any number of properties on the returned Renderer instance before obtaining its output.
- Returns:
- an instance of Renderer based on this segment.
- See Also:
getTextExtractor()
-
getTextExtractor
public TextExtractor getTextExtractor()
Extracts the textual content from the HTML markup of this segment.
The output can be configured by setting properties on the returned TextExtractor instance before obtaining its output.
- Returns:
- an instance of TextExtractor based on this segment.
- See Also:
getRenderer()
-
getNodeIterator
public java.util.Iterator<Segment> getNodeIterator()
Returns an iterator over every tag, character reference and plain text segment contained within this segment.
See the Source.iterator() method for a detailed description.
- Example:
The following code demonstrates the typical usage of this method to make an exact copy of this segment to writer (assuming no server tags are present):

    for (Iterator<Segment> nodeIterator=segment.getNodeIterator(); nodeIterator.hasNext();) {
      Segment nodeSegment=nodeIterator.next();
      if (nodeSegment instanceof Tag) {
        Tag tag=(Tag)nodeSegment;
        // HANDLE TAG
        // Uncomment the following line to ensure each tag is valid XML:
        // writer.write(tag.tidy()); continue;
      } else if (nodeSegment instanceof CharacterReference) {
        CharacterReference characterReference=(CharacterReference)nodeSegment;
        // HANDLE CHARACTER REFERENCE
        // Uncomment the following line to decode all character references instead of copying them verbatim:
        // characterReference.appendCharTo(writer); continue;
      } else {
        // HANDLE PLAIN TEXT
      }
      // unless specific handling has prevented getting to here, simply output the segment as is:
      writer.write(nodeSegment.toString());
    }
- Returns:
- an iterator over every tag, character reference and plain text segment contained within this segment.
-
getAllTags
public java.util.List<Tag> getAllTags()
Returns a list of all Tag objects that are enclosed by this segment.
The Source.fullSequentialParse() method should be called after construction of the Source object if this method is to be used on a large proportion of the source. It is called automatically if this method is called on the Source object itself.
See the Tag class documentation for more details about the behaviour of this method.
-
getAllTags
public java.util.List<Tag> getAllTags(TagType tagType)
Returns a list of all Tag objects of the specified type that are enclosed by this segment.
See the Tag class documentation for more details about the behaviour of this method.
Specifying a null argument to the tagType parameter is equivalent to getAllTags().
- Parameters:
tagType - the type of tags to get.
- Returns:
- a list of all Tag objects of the specified type that are enclosed by this segment.
- See Also:
getAllStartTags(StartTagType)
-
getAllStartTags
public java.util.List<StartTag> getAllStartTags()
Returns a list of all StartTag objects that are enclosed by this segment.
The Source.fullSequentialParse() method should be called after construction of the Source object if this method is to be used on a large proportion of the source. It is called automatically if this method is called on the Source object itself.
See the Tag class documentation for more details about the behaviour of this method.
-
getAllStartTags
public java.util.List<StartTag> getAllStartTags(StartTagType startTagType)
Returns a list of all StartTag objects of the specified type that are enclosed by this segment.
See the Tag class documentation for more details about the behaviour of this method.
Specifying a null argument to the startTagType parameter is equivalent to getAllStartTags().
-
getAllStartTags
public java.util.List<StartTag> getAllStartTags(java.lang.String name)
Returns a list of all normal StartTag objects with the specified name that are enclosed by this segment.
See the Tag class documentation for more details about the behaviour of this method.
Specifying a null argument to the name parameter is equivalent to getAllStartTags(), which may include non-normal start tags.
This method also returns unregistered tags if the specified name is not a valid XML tag name.
-
getAllStartTags
public java.util.List<StartTag> getAllStartTags(java.lang.String attributeName, java.lang.String value, boolean valueCaseSensitive)
Returns a list of all StartTag objects with the specified attribute name/value pair that are enclosed by this segment.
See the Tag class documentation for more details about the behaviour of this method.
- Parameters:
attributeName - the attribute name (case insensitive) to search for, must not be null.
value - the value of the specified attribute to search for, must not be null.
valueCaseSensitive - specifies whether the attribute value matching is case sensitive.
- Returns:
- a list of all StartTag objects with the specified attribute name/value pair that are enclosed by this segment.
- See Also:
getAllStartTags(String attributeName, Pattern valueRegexPattern)
-
getAllStartTags
public java.util.List<StartTag> getAllStartTags(java.lang.String attributeName, java.util.regex.Pattern valueRegexPattern)
Returns a list of all StartTag objects with the specified attribute name and value pattern that are enclosed by this segment.
Specifying a null argument to the valueRegexPattern parameter performs the search on the attribute name only, without regard to the attribute value. This will also match an attribute that has no value at all.
See the Tag class documentation for more details about the behaviour of this method.
- Parameters:
attributeName - the attribute name (case insensitive) to search for, must not be null.
valueRegexPattern - the regular expression pattern that must match the attribute value, may be null.
- Returns:
- a list of all StartTag objects with the specified attribute name and value pattern that are enclosed by this segment.
- See Also:
getAllStartTags(String attributeName, String value, boolean valueCaseSensitive)
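The matching rules described above (case-insensitive attribute name, a null pattern matching any attribute including one with no value) can be modelled with plain collections. The sketch below is a hypothetical illustration, not the library's implementation: AttributeMatchSketch and its map-based representation are invented for this example, and requiring the pattern to match the whole value via Matcher.matches() is an assumption, as is treating a valueless attribute as a non-match when a pattern is supplied.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.regex.Pattern;

/** Hypothetical model of attribute name/pattern matching; NOT the library's code. */
final class AttributeMatchSketch {
    /**
     * @param attributes lower-cased attribute name mapped to its value,
     *                   with a null value standing for an attribute that has no value
     */
    static boolean matches(Map<String, String> attributes, String attributeName, Pattern valueRegexPattern) {
        String key = attributeName.toLowerCase(Locale.ROOT); // name lookup is case insensitive
        if (!attributes.containsKey(key)) return false;
        if (valueRegexPattern == null) return true;           // name-only search, value ignored
        String value = attributes.get(key);
        // Assumption: the pattern must match the entire value, and a missing value never matches.
        return value != null && valueRegexPattern.matcher(value).matches();
    }

    public static void main(String[] args) {
        Map<String, String> attributes = new HashMap<>();
        attributes.put("href", "/docs/a.html");
        attributes.put("disabled", null); // attribute present without a value
        System.out.println(matches(attributes, "HREF", Pattern.compile("/docs/.*"))); // true
        System.out.println(matches(attributes, "disabled", null));                    // true
        System.out.println(matches(attributes, "title", null));                       // false
    }
}
```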
-
getAllStartTagsByClass
public java.util.List<StartTag> getAllStartTagsByClass(java.lang.String className)
Returns a list of all StartTag objects with the specified class that are enclosed by this segment.
This matches start tags with a class attribute that contains the specified class name, either as an exact match or where the specified class name is one of multiple class names separated by white space in the attribute value.
See the Tag class documentation for more details about the behaviour of this method.
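The class-matching rule above amounts to treating the class attribute value as a white-space-separated list of tokens and looking for one token equal to the requested name. A minimal sketch of that rule follows. ClassAttributeSketch and hasClass are hypothetical names, and the case-sensitive comparison is an assumption, since the text above does not state how case is handled.

```java
/** Hypothetical model of matching a class name against a class attribute value. */
final class ClassAttributeSketch {
    static boolean hasClass(String classAttributeValue, String className) {
        if (classAttributeValue == null) return false;
        // Split on any run of white space so "a  b" and "a\tb" both yield the tokens a and b.
        for (String token : classAttributeValue.trim().split("\\s+")) {
            if (token.equals(className)) return true; // assumption: case-sensitive comparison
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(hasClass("main", "main"));     // true: exact match
        System.out.println(hasClass("nav main", "main")); // true: one of several class names
        System.out.println(hasClass("navbar", "nav"));    // false: substring is not a token
    }
}
```

The third case is the one worth noting: a plain substring search would wrongly report navbar as having the class nav.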
-
getChildElements
public java.util.List<Element> getChildElements()
Returns a list of the immediate children of this segment in the document element hierarchy.
The returned list may include an element that extends beyond the end of this segment, as long as it begins within this segment.
An element found at the start of this segment is included in the list. Note however that if this segment is an Element, the overriding Element.getChildElements() method is called instead, which only returns the children of the element.
Calling getChildElements() on an Element is much more efficient than calling it on a Segment.
The objects in the list are all of type Element.
The Source.fullSequentialParse() method should be called after construction of the Source object if this method is to be used on a large proportion of the source. It is called automatically if this method is called on the Source object itself.
See the Source.getChildElements() method for more details.
- Returns:
- a list of the immediate children of this segment in the document element hierarchy, guaranteed not null.
- See Also:
Element.getParentElement()
-
getAllElements
public java.util.List<Element> getAllElements()
Returns a list of all Element objects that are enclosed by this segment.
The Source.fullSequentialParse() method should be called after construction of the Source object if this method is to be used on a large proportion of the source. It is called automatically if this method is called on the Source object itself.
The elements returned correspond exactly with the start tags returned in the getAllStartTags() method.
If this segment is itself an Element, the result includes this element in the list.
-
getAllElements
public java.util.List<Element> getAllElements(java.lang.String name)
Returns a list of all Element objects with the specified name that are enclosed by this segment.
The elements returned correspond with the start tags returned in the getAllStartTags(String name) method, except that elements which are not entirely enclosed by this segment are excluded.
Specifying a null argument to the name parameter is equivalent to getAllElements(), which may include elements of non-normal tags.
This method also returns elements consisting of unregistered tags if the specified name is not a valid XML tag name.
If this segment is itself an Element with the specified name, the result includes this element in the list.
-
getAllElements
public java.util.List<Element> getAllElements(StartTagType startTagType)
Returns a list of all Element objects with start tags of the specified type that are enclosed by this segment.
The elements returned correspond with the start tags returned in the getAllTags(TagType) method, except that elements which are not entirely enclosed by this segment are excluded.
If this segment is itself an Element with the specified type, the result includes this element in the list.
-
getAllElements
public java.util.List<Element> getAllElements(java.lang.String attributeName, java.lang.String value, boolean valueCaseSensitive)
Returns a list of all Element objects with the specified attribute name/value pair that are enclosed by this segment.
The elements returned correspond with the start tags returned in the getAllStartTags(String attributeName, String value, boolean valueCaseSensitive) method, except that elements which are not entirely enclosed by this segment are excluded.
If this segment is itself an Element with the specified name/value pair, the result includes this element in the list.
- Parameters:
attributeName - the attribute name (case insensitive) to search for, must not be null.
value - the value of the specified attribute to search for, must not be null.
valueCaseSensitive - specifies whether the attribute value matching is case sensitive.
- Returns:
- a list of all Element objects with the specified attribute name/value pair that are enclosed by this segment.
- See Also:
getAllElements(String attributeName, Pattern valueRegexPattern)
-
getAllElements
public java.util.List<Element> getAllElements(java.lang.String attributeName, java.util.regex.Pattern valueRegexPattern)
Returns a list of all Element objects with the specified attribute name and value pattern that are enclosed by this segment.
The elements returned correspond with the start tags returned in the getAllStartTags(String attributeName, Pattern valueRegexPattern) method, except that elements which are not entirely enclosed by this segment are excluded.
Specifying a null argument to the valueRegexPattern parameter performs the search on the attribute name only, without regard to the attribute value. This will also match an attribute that has no value at all.
If this segment is itself an Element with the specified attribute name and value pattern, the result includes this element in the list.
- Parameters:
attributeName - the attribute name (case insensitive) to search for, must not be null.
valueRegexPattern - the regular expression pattern that must match the attribute value, may be null.
- Returns:
- a list of all Element objects with the specified attribute name and value pattern that are enclosed by this segment.
- See Also:
getAllElements(String attributeName, String value, boolean valueCaseSensitive)
-
getAllElementsByClass
public java.util.List<Element> getAllElementsByClass(java.lang.String className)
Returns a list of all Element objects with the specified class that are enclosed by this segment.
This matches elements with a class attribute that contains the specified class name, either as an exact match or where the specified class name is one of multiple class names separated by white space in the attribute value.
The elements returned correspond with the start tags returned in the getAllStartTagsByClass(String className) method, except that elements which are not entirely enclosed by this segment are excluded.
If this segment is itself an Element with the specified class, the result includes this element in the list.
-
getAllCharacterReferences
public java.util.List<CharacterReference> getAllCharacterReferences()
Returns a list of all CharacterReference objects that are enclosed by this segment.
- Returns:
- a list of all CharacterReference objects that are enclosed by this segment.
-
getURIAttributes
public java.util.List<Attribute> getURIAttributes()
Returns a list of all attributes enclosed by this segment that have URI values.
According to the HTML 4.01 specification, the following attributes have URI values:

HTML element name   Attribute name
A                   href
APPLET              codebase
APPLET              archive
AREA                href
BASE                href
BLOCKQUOTE          cite
BODY                background
FORM                action
FRAME               longdesc
FRAME               src
DEL                 cite
HEAD                profile
IFRAME              longdesc
IFRAME              src
IMG                 longdesc
IMG                 src
IMG                 usemap
INPUT               src
INPUT               usemap
INS                 cite
LINK                href
OBJECT              archive
OBJECT              classid
OBJECT              codebase
OBJECT              data
OBJECT              usemap
Q                   cite
SCRIPT              src

Attributes from other elements may also be returned if the attribute name matches one of those in the list above.
This method is often used in conjunction with the getStyleURISegments() method in order to find all URIs in a document.
The attributes are returned in order of appearance.
- Returns:
- a list of all attributes enclosed by this segment that have URI values.
- See Also:
getStyleURISegments()
-
getStyleURISegments
public java.util.List<Segment> getStyleURISegments()
Returns a list of all URI segments inside the CSS of STYLE elements and style attribute values enclosed by this segment.
If this segment does not contain any tags, the entire segment is assumed to be CSS.
The URI segments are found by searching the CSS for the functional notation "url()" as described in section 4.3.4 of the CSS2 specification.
The segments are returned in order of appearance.
- Returns:
- a list of all URI segments inside STYLE elements and style attribute values enclosed by this segment.
- See Also:
getURIAttributes()
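To make the "url()" functional notation concrete, the sketch below pulls the URI out of each url(...) occurrence in a CSS string using a regular expression. It is a simplified illustration under stated assumptions, not the library's implementation: CssUrlSketch and findUris are hypothetical names, and the pattern ignores CSS comments, escape sequences and URIs that contain a closing parenthesis.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical, simplified search for url() notation in CSS text. */
final class CssUrlSketch {
    // url( optional-space optional-quote URI same-quote optional-space )
    private static final Pattern URL_FUNCTION =
        Pattern.compile("url\\(\\s*(['\"]?)(.*?)\\1\\s*\\)", Pattern.CASE_INSENSITIVE);

    /** Returns the URIs found in the CSS, in order of appearance. */
    static List<String> findUris(String css) {
        List<String> uris = new ArrayList<>();
        Matcher matcher = URL_FUNCTION.matcher(css);
        while (matcher.find()) {
            uris.add(matcher.group(2)); // group 2 is the URI without surrounding quotes
        }
        return uris;
    }

    public static void main(String[] args) {
        String css = "body { background: url('img/bg.png'); } a { cursor: url(cur.cur), auto }";
        System.out.println(findUris(css)); // [img/bg.png, cur.cur]
    }
}
```

Unlike this sketch, the method documented above returns segments, so each result carries its position in the source document and not just the URI text.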
-
getFirstStartTag
public final StartTag getFirstStartTag()
Returns the first StartTag enclosed by this segment.
This is functionally equivalent to getAllStartTags().iterator().next(), but does not search beyond the first start tag and returns null if no such start tag exists.
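The difference in empty-result behaviour is worth spelling out: calling iterator().next() on an empty list throws a NoSuchElementException, whereas the getFirst methods are documented to return null. The following sketch shows that contract with ordinary lists. FirstOrNullSketch is a hypothetical helper and does not reproduce the early-exit search that the real methods perform.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

/** Hypothetical helper contrasting iterator().next() with a first-or-null lookup. */
final class FirstOrNullSketch {
    static <T> T firstOrNull(List<T> list) {
        Iterator<T> iterator = list.iterator();
        return iterator.hasNext() ? iterator.next() : null;
    }

    public static void main(String[] args) {
        List<String> tags = Arrays.asList("<p>", "<a>");
        List<String> none = Collections.emptyList();

        System.out.println(firstOrNull(tags)); // <p>
        System.out.println(firstOrNull(none)); // null
        try {
            none.iterator().next();            // the literal "equivalent" expression fails here
        } catch (NoSuchElementException e) {
            System.out.println("iterator().next() threw on an empty list");
        }
    }
}
```

Callers of the getFirst methods therefore need a null check where callers of the list-based expression would need an emptiness check.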
-
getFirstStartTag
public final StartTag getFirstStartTag(StartTagType startTagType)
Returns the first StartTag of the specified type enclosed by this segment.
This is functionally equivalent to getAllStartTags(startTagType).iterator().next(), but does not search beyond the first start tag and returns null if no such start tag exists.
-
getFirstStartTag
public final StartTag getFirstStartTag(java.lang.String name)
Returns the first normal StartTag with the specified name enclosed by this segment.
This is functionally equivalent to getAllStartTags(name).iterator().next(), but does not search beyond the first start tag and returns null if no such start tag exists.
Specifying a null argument to the name parameter is equivalent to getFirstStartTag().
-
getFirstStartTag
public final StartTag getFirstStartTag(java.lang.String attributeName, java.lang.String value, boolean valueCaseSensitive)
Returns the first StartTag with the specified attribute name/value pair enclosed by this segment.
This is functionally equivalent to getAllStartTags(attributeName,value,valueCaseSensitive).iterator().next(), but does not search beyond the first start tag and returns null if no such start tag exists.
- Parameters:
attributeName - the attribute name (case insensitive) to search for, must not be null.
value - the value of the specified attribute to search for, must not be null.
valueCaseSensitive - specifies whether the attribute value matching is case sensitive.
- Returns:
- the first StartTag with the specified attribute name/value pair enclosed by this segment, or null if none exists.
- See Also:
getFirstStartTag(String attributeName, Pattern valueRegexPattern)
-
getFirstStartTag
public final StartTag getFirstStartTag(java.lang.String attributeName, java.util.regex.Pattern valueRegexPattern)
Returns the first StartTag with the specified attribute name and value pattern that is enclosed by this segment.
This is functionally equivalent to getAllStartTags(attributeName,valueRegexPattern).iterator().next(), but does not search beyond the first start tag and returns null if no such start tag exists.
- Parameters:
attributeName - the attribute name (case insensitive) to search for, must not be null.
valueRegexPattern - the regular expression pattern that must match the attribute value, may be null.
- Returns:
- the first StartTag with the specified attribute name and value pattern that is enclosed by this segment, or null if none exists.
- See Also:
getFirstStartTag(String attributeName, String value, boolean valueCaseSensitive)
-
getFirstStartTagByClass
public final StartTag getFirstStartTagByClass(java.lang.String className)
Returns the first StartTag with the specified class that is enclosed by this segment.
This is functionally equivalent to getAllStartTagsByClass(className).iterator().next(), but does not search beyond the first start tag and returns null if no such start tag exists.
-
getFirstElement
public final Element getFirstElement()
Returns the first Element enclosed by this segment.
This is functionally equivalent to getAllElements().iterator().next(), but does not search beyond the first enclosed element and returns null if no such element exists.
If this segment is itself an Element, this element is returned, not the first child element.
-
getFirstElement
public final Element getFirstElement(java.lang.String name)
Returns the first normal Element with the specified name enclosed by this segment.
This is functionally equivalent to getAllElements(name).iterator().next(), but does not search beyond the first enclosed element and returns null if no such element exists.
Specifying a null argument to the name parameter is equivalent to getFirstElement().
If this segment is itself an Element with the specified name, this element is returned.
-
getFirstElement
public final Element getFirstElement(java.lang.String attributeName, java.lang.String value, boolean valueCaseSensitive)
Returns the first Element with the specified attribute name/value pair enclosed by this segment.
This is functionally equivalent to getAllElements(attributeName,value,valueCaseSensitive).iterator().next(), but does not search beyond the first enclosed element and returns null if no such element exists.
If this segment is itself an Element with the specified attribute name/value pair, this element is returned.
- Parameters:
attributeName - the attribute name (case insensitive) to search for, must not be null.
value - the value of the specified attribute to search for, must not be null.
valueCaseSensitive - specifies whether the attribute value matching is case sensitive.
- Returns:
- the first Element with the specified attribute name/value pair enclosed by this segment, or null if none exists.
- See Also:
getFirstElement(String attributeName, Pattern valueRegexPattern)
-
getFirstElement
public final Element getFirstElement(java.lang.String attributeName, java.util.regex.Pattern valueRegexPattern)
Returns the first Element with the specified attribute name and value pattern that is enclosed by this segment.
This is functionally equivalent to getAllElements(attributeName,valueRegexPattern).iterator().next(), but does not search beyond the first enclosed element and returns null if no such element exists.
If this segment is itself an Element with the specified attribute name and value pattern, this element is returned.
- Parameters:
attributeName - the attribute name (case insensitive) to search for, must not be null.
valueRegexPattern - the regular expression pattern that must match the attribute value, may be null.
- Returns:
- the first Element with the specified attribute name and value pattern that is enclosed by this segment, or null if none exists.
- See Also:
getFirstElement(String attributeName, String value, boolean valueCaseSensitive)
-
getFirstElementByClass
public final Element getFirstElementByClass(java.lang.String className)
Returns the first Element with the specified class that is enclosed by this segment.
This is functionally equivalent to getAllElementsByClass(className).iterator().next(), but does not search beyond the first enclosed element and returns null if no such element exists.
If this segment is itself an Element with the specified class, this element is returned.
-
getFormControls
public java.util.List<FormControl> getFormControls()
Returns a list of the FormControl objects that are enclosed by this segment.
- Returns:
- a list of the FormControl objects that are enclosed by this segment.
-
getFormFields
public FormFields getFormFields()
Returns the FormFields object representing all form fields that are enclosed by this segment.
This is equivalent to new FormFields(getFormControls()).
- Returns:
- the FormFields object representing all form fields that are enclosed by this segment.
- See Also:
getFormControls()
-
parseAttributes
public Attributes parseAttributes()
Parses any Attributes within this segment. This method is only used in the unusual situation where attributes exist outside of a start tag. The StartTag.getAttributes() method should be used in normal situations.
This is equivalent to source.parseAttributes(getBegin(),getEnd()).
- Returns:
- the Attributes within this segment, or null if too many errors occur while parsing.
-
ignoreWhenParsing
public void ignoreWhenParsing()
Causes the this segment to be ignored when parsing.Ignored segments are treated as blank spaces by the parsing mechanism, but are included as normal text in all other functions.
This method was originally the only means of preventing server tags located inside normal tags from interfering with the parsing of the tags (such as where an attribute of a normal tag uses a server tag to dynamically set its value), as well as preventing non-server tags from being recognised inside server tags.
It is not necessary to use this method to ignore server tags located inside normal tags, as the attributes parser automatically ignores any server tags.
It is not necessary to use this method to ignore non-server tags inside server tags, or the contents of
SCRIPT
elements, as the parser does this automatically when performing a full sequential parse.This leaves only very few scenarios where calling this method still provides a significant benefit.
One such case is where XML-style server tags are used inside normal tags. Here is an example using an XML-style JSP tag:
<a href="<i18n:resource path="/Portal"/>?BACK=TRUE">back</a>
The first double-quote of "/Portal" will be interpreted as the end quote for the href attribute, as there is no way for the parser to recognise the i18n:resource element as a server tag. Such use of XML-style server tags inside normal tags is generally seen as bad practice, but it is nevertheless valid JSP. The only way to ensure that this library is able to parse the normal tag surrounding it is to find these server tags first and call the ignoreWhenParsing method to ignore them before parsing the rest of the document.
It is important to understand the difference between ignoring the segment when parsing and removing the segment completely. Any text inside a segment that is ignored when parsing is treated by most functions as content, and as such is included in the output of tools such as TextExtractor and Renderer.
To remove segments completely, create an OutputDocument and call its remove(Segment) or replaceWithSpaces(int begin, int end) method for each segment. Then create a new source document using new Source(outputDocument.toString()) and perform the desired operations on this new source object.
Calling this method after the Source.fullSequentialParse() method has been called is not permitted and throws an IllegalStateException.
Any tags appearing in this segment that are found before this method is called will remain in the tag cache, and so will continue to be found by the tag search methods. If this is undesirable, the Source.clearCache() method can be called to remove them from the cache. Calling the Source.fullSequentialParse() method after this method clears the cache automatically.
For best performance, this method should be called on all segments that need to be ignored without calling any of the tag search methods in between.
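The effect described for the XML-style JSP example can be illustrated without the library. The following is a minimal, self-contained sketch, not this library's implementation: the class name MaskServerTags, both regular expressions, and the naive href matcher are hypothetical and exist only for illustration. It shows how a quote-delimited attribute scan is cut short by the server tag, and how treating the server tag as blank spaces of the same length keeps every character offset intact while letting the scan reach the real end quote.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class MaskServerTags {
    // Hypothetical pattern for the XML-style server tag used in the example.
    private static final Pattern SERVER_TAG = Pattern.compile("<i18n:[^>]*/>");
    // Deliberately naive href matcher: the value ends at the first double quote.
    private static final Pattern HREF = Pattern.compile("href=\"([^\"]*)\"");

    /** Returns a copy of text in which every server tag is replaced by spaces of equal length. */
    static String maskServerTags(String text) {
        StringBuilder sb = new StringBuilder(text);
        Matcher m = SERVER_TAG.matcher(text);
        while (m.find()) {
            for (int i = m.start(); i < m.end(); i++) {
                sb.setCharAt(i, ' ');
            }
        }
        return sb.toString();
    }

    /** Returns the {begin, end} offsets of the href value seen by the naive matcher, or null. */
    static int[] hrefValueSpan(String text) {
        Matcher m = HREF.matcher(text);
        return m.find() ? new int[] {m.start(1), m.end(1)} : null;
    }

    public static void main(String[] args) {
        String html = "<a href=\"<i18n:resource path=\"/Portal\"/>?BACK=TRUE\">back</a>";

        // Scanning the raw text: the value stops at the quote before /Portal.
        int[] raw = hrefValueSpan(html);
        System.out.println(html.substring(raw[0], raw[1]));

        // Scanning the masked text: offsets still line up with the original,
        // so the full value can be read back from the unmodified source.
        int[] masked = hrefValueSpan(maskServerTags(html));
        System.out.println(html.substring(masked[0], masked[1]));
    }
}
```

Because the masked copy has the same length as the original, the positions found in it can be applied directly to the original text, which is why the ignored text still appears as normal content everywhere except in the scan itself.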
-
compareTo
public int compareTo(Segment segment)
Compares this Segment object to another object.
If the argument is not a Segment, a ClassCastException is thrown.
A segment is considered to be before another segment if its begin position is earlier, or in the case that both segments begin at the same position, its end position is earlier.
Segments that begin and end at the same position are considered equal for the purposes of this comparison, even if they relate to different source documents.
Note: this class has a natural ordering that is inconsistent with equals. This means that this method may return zero in some cases where calling the equals(Object) method with the same argument returns false.
- Specified by:
compareTo in interface java.lang.Comparable<Segment>
- Parameters:
segment - the segment to be compared
- Returns:
- a negative integer, zero, or a positive integer as this segment is before, equal to, or after the specified segment.
- Throws:
java.lang.ClassCastException - if the argument is not a Segment
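The ordering rule and its inconsistency with equals can be sketched with a small stand-in class. This is an illustration under stated assumptions, not the library's code: the Span class and its sourceId field are hypothetical, and the equals method shown here assumes that equality also takes the owning document into account.

```java
import java.util.Objects;
import java.util.TreeSet;

final class Span implements Comparable<Span> {
    final String sourceId; // hypothetical stand-in for the owning source document
    final int begin;
    final int end;

    Span(String sourceId, int begin, int end) {
        this.sourceId = sourceId;
        this.begin = begin;
        this.end = end;
    }

    // Earlier begin position sorts first; ties are broken by earlier end position.
    // The source document is deliberately not part of the comparison.
    @Override
    public int compareTo(Span other) {
        if (begin != other.begin) {
            return Integer.compare(begin, other.begin);
        }
        return Integer.compare(end, other.end);
    }

    // Assumed equality: same positions and same source document.
    @Override
    public boolean equals(Object o) {
        if (!(o instanceof Span)) {
            return false;
        }
        Span s = (Span) o;
        return begin == s.begin && end == s.end && sourceId.equals(s.sourceId);
    }

    @Override
    public int hashCode() {
        return Objects.hash(sourceId, begin, end);
    }

    public static void main(String[] args) {
        Span a = new Span("docA", 5, 10);
        Span b = new Span("docB", 5, 10);
        Span c = new Span("docA", 5, 12);

        System.out.println(a.compareTo(b));     // 0: same positions, different documents
        System.out.println(a.equals(b));        // false
        System.out.println(a.compareTo(c) < 0); // true: same begin, earlier end

        // Sorted collections rely on compareTo, so a and b collapse into one entry.
        TreeSet<Span> set = new TreeSet<>();
        set.add(a);
        set.add(b);
        System.out.println(set.size());         // 1
    }
}
```

One practical consequence of such an ordering is that sorted sets and maps keyed on these objects treat objects with identical positions as duplicates, regardless of which document they came from.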
-
isWhiteSpace
public final boolean isWhiteSpace()
Indicates whether this segment consists entirely of white space.
- Returns:
true if this segment consists entirely of white space, otherwise false.
-
getMaxDepthIndicator
public int getMaxDepthIndicator()
Returns an indication of the maximum depth of nested elements within this segment.
A high return value can indicate that the segment contains a large number of incorrectly nested tags that could result in a StackOverflowError if its content is parsed.
The usefulness of this method is debatable as a StackOverflowError is a recoverable error that can be easily caught. The use of this method to pre-detect and avoid a stack overflow may save some memory and processing resources in certain circumstances, but the cost of calling this method to check every segment or document will very often exceed any benefit.
It is up to the application developer to determine what return value constitutes an unreasonable level of nesting given the stack space allocated to the application and other factors.
Note that the return value is an approximation only and is usually greater than the actual maximum element depth that would be reported by calling the Element.getDepth() method on the most nested element.
- Returns:
- an indication of the maximum depth of nested elements within this segment.
-
isWhiteSpace
public static final boolean isWhiteSpace(char ch)
Indicates whether the specified character is white space.
The HTML 4.01 specification section 9.1 specifies the following white space characters:
- space (U+0020)
- tab (U+0009)
- form feed (U+000C)
- line feed (U+000A)
- carriage return (U+000D)
- zero-width space (U+200B)
Despite the explicit inclusion of the zero-width space in the HTML specification, Microsoft IE6 does not recognise it as white space and renders it as an unprintable character (empty square). Even zero-width spaces included using the numeric character reference &#x200B; are rendered this way.
- Parameters:
ch - the character to test.
- Returns:
true if the specified character is white space, otherwise false.
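The character list above translates directly into a membership test. The following is a minimal sketch of such a test, assuming only the six characters listed; the class name Html401WhiteSpace and the helper allWhiteSpace are hypothetical and are not part of this library.

```java
final class Html401WhiteSpace {
    /** Returns true only for the six white space characters listed above. */
    static boolean isWhiteSpace(char ch) {
        switch (ch) {
            case ' ':       // space, U+0020
            case '\t':      // tab, U+0009
            case '\f':      // form feed, U+000C
            case '\n':      // line feed, U+000A
            case '\r':      // carriage return, U+000D
            case '\u200B':  // zero-width space, U+200B
                return true;
            default:
                return false;
        }
    }

    /** Hypothetical helper: true if no character in text fails the test above. */
    static boolean allWhiteSpace(CharSequence text) {
        for (int i = 0; i < text.length(); i++) {
            if (!isWhiteSpace(text.charAt(i))) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(isWhiteSpace('\u200B'));   // true: zero-width space is in the list
        System.out.println(isWhiteSpace('\u00A0'));   // false: no-break space is not in the list
        System.out.println(allWhiteSpace(" \t\r\n")); // true
        System.out.println(allWhiteSpace(" x "));     // false
    }
}
```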
-
getRowColumnVector
public RowColumnVector getRowColumnVector()
Returns a RowColumnVector object representing the row and column number of the start of this segment in the source document.
- Returns:
- a RowColumnVector object representing the row and column number of the start of this segment in the source document.
- See Also:
Source.getRowColumnVector(int pos)
-
getDebugInfo
public java.lang.String getDebugInfo()
Returns a string representation of this object useful for debugging purposes.
- Returns:
- a string representation of this object useful for debugging purposes.
-
charAt
public char charAt(int index)
Returns the character at the specified index.
This is logically equivalent to toString().charAt(index) for valid argument values 0 <= index < length().
However, because this implementation works directly on the underlying document source string, it should not be assumed that an IndexOutOfBoundsException is thrown for an invalid argument value.
- Specified by:
charAt in interface java.lang.CharSequence
- Parameters:
index - the index of the character.
- Returns:
- the character at the specified index.
-
subSequence
public java.lang.CharSequence subSequence(int beginIndex, int endIndex)
Returns a new character sequence that is a subsequence of this sequence.
This is logically equivalent to toString().subSequence(beginIndex,endIndex) for valid values of beginIndex and endIndex.
However, because this implementation works directly on the underlying document source text, it should not be assumed that an IndexOutOfBoundsException is thrown for invalid argument values as described in the String.subSequence(int,int) method.
- Specified by:
subSequence in interface java.lang.CharSequence
- Parameters:
beginIndex - the begin index, inclusive.
endIndex - the end index, exclusive.
- Returns:
- a new character sequence that is a subsequence of this sequence.
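The caveat about invalid arguments in charAt and subSequence is easier to see with a toy view over a backing string. The sketch below is an assumption about how a view that delegates straight to the underlying text might behave, not a description of this library's implementation; the class SourceView is hypothetical. It shows an index that is invalid for the view yet still valid for the backing string, so no exception is raised.

```java
final class SourceView implements CharSequence {
    private final String source; // the whole underlying document text
    private final int begin;     // inclusive
    private final int end;       // exclusive

    SourceView(String source, int begin, int end) {
        this.source = source;
        this.begin = begin;
        this.end = end;
    }

    @Override
    public int length() {
        return end - begin;
    }

    // Delegates directly to the backing string: the index is not checked against length().
    @Override
    public char charAt(int index) {
        return source.charAt(begin + index);
    }

    // Same delegation: the indices are only checked against the backing string.
    @Override
    public CharSequence subSequence(int beginIndex, int endIndex) {
        return source.subSequence(begin + beginIndex, begin + endIndex);
    }

    @Override
    public String toString() {
        return source.substring(begin, end);
    }

    public static void main(String[] args) {
        SourceView view = new SourceView("<p>Hello</p>", 3, 8); // covers "Hello"

        System.out.println(view.charAt(0)); // H, same as toString().charAt(0)
        System.out.println(view.charAt(5)); // <, read from beyond the end of the view

        try {
            view.toString().charAt(5);      // the copied string does check its bounds
        } catch (IndexOutOfBoundsException e) {
            System.out.println("toString().charAt(5) throws");
        }
    }
}
```

Code that needs strict bounds checking can validate the index against length() itself before calling charAt or subSequence.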
-
-