Class StartTag
-
- All Implemented Interfaces:
java.lang.CharSequence, java.lang.Comparable<Segment>
public final class StartTag extends Tag
Represents the start tag of an element in a specific source document.
A start tag always has a type that is a subclass of StartTagType, meaning that any tag that does not start with the characters '</' is categorised as a start tag.
This includes many tags which stand alone, without a corresponding end tag, and would not intuitively be categorised as a "start tag". For example, an HTML comment is represented as a single start tag that spans the whole comment, and does not have an end tag at all.
See the static fields defined in the StartTagType class for a list of the standard start tag types.
StartTag instances are obtained using one of the following methods:
Element.getStartTag()
Tag.getNextTag()
Tag.getPreviousTag()
Source.getPreviousStartTag(int pos)
Source.getPreviousStartTag(int pos, String name)
Source.getPreviousTag(int pos)
Source.getPreviousTag(int pos, TagType)
Source.getNextStartTag(int pos)
Source.getNextStartTag(int pos, String name)
Source.getNextStartTag(int pos, String attributeName, String value, boolean valueCaseSensitive)
Source.getNextTag(int pos)
Source.getNextTag(int pos, TagType)
Source.getEnclosingTag(int pos)
Source.getEnclosingTag(int pos, TagType)
Source.getTagAt(int pos)
Segment.getAllStartTags()
Segment.getAllStartTags(String name)
Segment.getAllStartTags(String attributeName, String value, boolean valueCaseSensitive)
Segment.getAllTags()
Segment.getAllTags(TagType)
The methods above which accept a name parameter are categorised as named search methods.
In such methods dealing with start tags, specifying an argument to the name parameter that ends in a colon (:) searches for all start tags in the specified XML namespace.
The constants defined in the HTMLElementName interface can be used directly as arguments to these name parameters. For example, source.getAllStartTags(HTMLElementName.A) is equivalent to source.getAllStartTags("a"), and gets all hyperlink start tags.
The Tag superclass defines a method called getName() to get the name of this start tag.
See also the XML 1.0 specification for start tags.
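The name-matching rule for named searches can be modelled without the library. The following is a minimal sketch, assuming that names are compared in lower case and that a search name ending in a colon acts as a prefix match on the tag name. The class NamedSearchSketch and its methods are hypothetical names used only for illustration; they are not part of the parser's API and do not reproduce its implementation.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

final class NamedSearchSketch {
    // Assumed rule: a search name ending in ':' selects every tag name in that
    // namespace (prefix match); any other search name must match exactly.
    static boolean matches(String tagName, String searchName) {
        String tag = tagName.toLowerCase(Locale.ROOT);
        String search = searchName.toLowerCase(Locale.ROOT);
        if (search.endsWith(":")) {
            return tag.startsWith(search);
        }
        return tag.equals(search);
    }

    // Keeps the tag names selected by searchName, preserving their order.
    static List<String> filter(List<String> tagNames, String searchName) {
        List<String> result = new ArrayList<>();
        for (String tagName : tagNames) {
            if (matches(tagName, searchName)) {
                result.add(tagName);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> names = Arrays.asList("a", "o:p", "o:lock", "abbr", "A");
        System.out.println(filter(names, "a"));   // prints [a, A]
        System.out.println(filter(names, "o:"));  // prints [o:p, o:lock]
    }
}
```

Note that "abbr" is not selected by the search name "a": only a trailing colon turns the search into a prefix match in this model.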
-
-
Method Summary
Modifier and Type | Method | Description
static java.lang.String | generateHTML(java.lang.String tagName, java.util.Map<java.lang.String,java.lang.String> attributesMap, boolean emptyElementTag) | Generates the HTML text of a normal start tag with the specified tag name and attributes map.
Attributes | getAttributes() | Returns the attributes specified in this start tag.
java.lang.String | getAttributeValue(java.lang.String attributeName) | Returns the decoded value of the attribute with the specified name (case insensitive).
java.lang.String | getDebugInfo() | Returns a string representation of this object useful for debugging purposes.
Element | getElement() | Returns the element that is started by this start tag.
FormControl | getFormControl() | Returns the FormControl defined by this start tag.
StartTagType | getStartTagType() | Returns the type of this start tag.
Segment | getTagContent() | Returns the segment between the end of the tag's name and the start of its end delimiter.
TagType | getTagType() | Returns the type of this tag.
boolean | isEmptyElementTag() | Indicates whether this start tag is an empty-element tag.
boolean | isEndTagForbidden() | Indicates whether a matching end tag is forbidden.
boolean | isEndTagRequired() | Indicates whether a matching end tag is required.
boolean | isSyntacticalEmptyElementTag() | Indicates whether this start tag is syntactically an empty-element tag.
boolean | isUnregistered() | Indicates whether this tag has a syntax that does not match any of the registered tag types.
Attributes | parseAttributes() | Parses the attributes specified in this start tag, regardless of the type of start tag.
Attributes | parseAttributes(int maxErrorCount) | Parses the attributes specified in this start tag, regardless of the type of start tag.
java.lang.String | tidy() | Returns an XML representation of this start tag.
java.lang.String | tidy(boolean toXHTML) | Returns an XML or XHTML representation of this start tag.
-
Methods inherited from class net.htmlparser.jericho.Tag
getName, getNameSegment, getNextTag, getPreviousTag, getUserData, isXMLName, isXMLNameChar, isXMLNameStartChar, setUserData
-
Methods inherited from class net.htmlparser.jericho.Segment
charAt, compareTo, encloses, encloses, equals, getAllCharacterReferences, getAllElements, getAllElements, getAllElements, getAllElements, getAllElements, getAllElementsByClass, getAllStartTags, getAllStartTags, getAllStartTags, getAllStartTags, getAllStartTags, getAllStartTagsByClass, getAllTags, getAllTags, getBegin, getChildElements, getEnd, getFirstElement, getFirstElement, getFirstElement, getFirstElement, getFirstElementByClass, getFirstStartTag, getFirstStartTag, getFirstStartTag, getFirstStartTag, getFirstStartTag, getFirstStartTagByClass, getFormControls, getFormFields, getMaxDepthIndicator, getNodeIterator, getRenderer, getRowColumnVector, getSource, getStyleURISegments, getTextExtractor, getURIAttributes, hashCode, ignoreWhenParsing, isWhiteSpace, isWhiteSpace, length, subSequence, toString
-
-
-
-
Method Detail
-
getElement
public Element getElement()
Returns the element that is started by this start tag. Guaranteed not null.
- Example 1: Elements for which the end tag is required
-
1. <div>
2.   <div>
3.     <div>
4.       <div>This is line 4</div>
5.     </div>
6.     <div>This is line 6</div>
7.   </div>
- The start tag on line 1 returns an empty element spanning only the start tag. This is because the end tag of a <div> element is required, making the sample code invalid as all the end tags are matched with other start tags.
- The start tag on line 2 returns an element spanning to the end of line 7.
- The start tag on line 3 returns an element spanning to the end of line 5.
- The start tag on line 4 returns an element spanning to the end of line 4.
- The start tag on line 6 returns an element spanning to the end of line 6.
- Example 2: Elements for which the end tag is optional
-
1. <ul>
2.   <li>item 1
3.   <li>item 2
4.     <ul>
5.       <li>subitem 1</li>
6.       <li>subitem 2
7.     </ul>
8.   <li>item 3</li>
9. </ul>
- The start tag on line 1 returns an element spanning to the end of line 9.
- The start tag on line 2 returns an element spanning to the start of the <li> start tag on line 3.
- The start tag on line 3 returns an element spanning to the start of the <li> start tag on line 8.
- The start tag on line 4 returns an element spanning to the end of line 7.
- The start tag on line 5 returns an element spanning to the end of line 5.
- The start tag on line 6 returns an element spanning to the start of the </ul> end tag on line 7.
- The start tag on line 8 returns an element spanning to the end of line 8.
- Specified by:
getElement in class Tag
- Returns:
- the element that is started by this start tag.
-
isEmptyElementTag
public boolean isEmptyElementTag()
Indicates whether this start tag is an empty-element tag.
This property checks that the tag is syntactically an empty-element tag, but in addition checks that the name of the tag is not one that is defined in the HTML specification to have a required or an optional end tag, which the major browsers do not recognise as empty-element tags, even in an XHTML document.
This is equivalent to:
isSyntacticalEmptyElementTag() && !(HTMLElements.getEndTagOptionalElementNames().contains(getName()) || HTMLElements.getEndTagRequiredElementNames().contains(getName())).
You can set the static Config.IsHTMLEmptyElementTagRecognised property to true to force the parser to recognise all empty-element tags, making this method exactly equivalent to isSyntacticalEmptyElementTag().
- Returns:
true if this start tag is an empty-element tag, otherwise false.
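The boolean expression given for isEmptyElementTag() can be tried out in isolation. The following is a minimal sketch that models it with plain sets, assuming lower-case tag names. The class EmptyElementSketch, its method and the two name sets are hypothetical: the sets are small illustrative subsets, not the values returned by HTMLElements, and the recogniseAll flag merely stands in for the Config.IsHTMLEmptyElementTagRecognised property.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

final class EmptyElementSketch {
    // Illustrative subsets only (assumption); the library defines the full sets.
    static final Set<String> END_TAG_OPTIONAL =
            new HashSet<>(Arrays.asList("li", "p", "td", "tr"));
    static final Set<String> END_TAG_REQUIRED =
            new HashSet<>(Arrays.asList("a", "div", "span", "table"));

    // name: lower-case tag name; syntacticalEmpty: whether the tag ends in "/>";
    // recogniseAll: models forcing recognition of all empty-element tags.
    static boolean isEmptyElementTag(String name, boolean syntacticalEmpty, boolean recogniseAll) {
        if (recogniseAll) {
            return syntacticalEmpty;
        }
        return syntacticalEmpty
                && !(END_TAG_OPTIONAL.contains(name) || END_TAG_REQUIRED.contains(name));
    }

    public static void main(String[] args) {
        System.out.println(isEmptyElementTag("br", true, false));   // prints true
        System.out.println(isEmptyElementTag("div", true, false));  // prints false
        System.out.println(isEmptyElementTag("div", true, true));   // prints true
    }
}
```

In this model a tag written as <div/> is syntactically an empty-element tag but is not treated as one unless recognition of all empty-element tags is forced.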
-
isSyntacticalEmptyElementTag
public boolean isSyntacticalEmptyElementTag()
Indicates whether this start tag is syntactically an empty-element tag.
This is signified by the characters "/>" at the end of the start tag.
Only a normal start tag can be syntactically an empty-element tag.
This property simply reports whether the syntax of the start tag is consistent with that of an empty-element tag; it does not guarantee that this start tag's element is actually empty.
This possible discrepancy reflects the way major browsers interpret illegal empty-element tags used in HTML elements, and is explained further in the documentation of the isEmptyElementTag() property.
- Returns:
true if this start tag is syntactically an empty-element tag, otherwise false.
- See Also:
isEmptyElementTag()
-
getStartTagType
public StartTagType getStartTagType()
Returns the type of this start tag.
This is equivalent to (StartTagType)getTagType().
- Returns:
- the type of this start tag.
-
getTagType
public TagType getTagType()
Description copied from class: Tag
Returns the type of this tag.
- Specified by:
getTagType in class Tag
- Returns:
- the type of this tag.
-
getAttributes
public Attributes getAttributes()
Returns the attributes specified in this start tag.
Return value is not null if and only if getStartTagType().hasAttributes()==true.
To force the parsing of attributes in other start tag types, use the parseAttributes() method instead.
- Returns:
- the attributes specified in this start tag, or null if the type of this start tag does not have attributes.
- See Also:
parseAttributes(), Source.parseAttributes(int pos, int maxEnd)
-
getAttributeValue
public java.lang.String getAttributeValue(java.lang.String attributeName)
Returns the decoded value of the attribute with the specified name (case insensitive).
Returns null if this start tag does not have attributes, no attribute with the specified name exists or the attribute has no value.
This is equivalent to getAttributes().getValue(attributeName), except that it returns null if this start tag does not have attributes instead of throwing a NullPointerException.
- Parameters:
attributeName - the name of the attribute to get.
- Returns:
- the decoded value of the attribute with the specified name, or null if the attribute does not exist or has no value.
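The three cases in which getAttributeValue returns null can be illustrated with an ordinary map. The following is a minimal sketch under stated assumptions: a null map models a start tag whose type has no attributes, a key mapped to null models an attribute without a value, and keys are stored in lower case. The class AttributeValueSketch and its methods are hypothetical and do not use or reproduce the library's Attributes class.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

final class AttributeValueSketch {
    // Null-safe, case-insensitive lookup. A null map is tolerated rather than
    // dereferenced, which is the difference from calling the map directly.
    static String getAttributeValue(Map<String, String> attributes, String attributeName) {
        if (attributes == null) {
            return null; // the tag has no attributes at all
        }
        // Absent key and key mapped to null both yield null.
        return attributes.get(attributeName.toLowerCase(Locale.ROOT));
    }

    // Sample data modelling <input name="Company" disabled>.
    static Map<String, String> sample() {
        Map<String, String> attributes = new LinkedHashMap<>();
        attributes.put("name", "Company");
        attributes.put("disabled", null);
        return attributes;
    }

    public static void main(String[] args) {
        System.out.println(getAttributeValue(sample(), "NAME"));     // prints Company
        System.out.println(getAttributeValue(sample(), "disabled")); // prints null
        System.out.println(getAttributeValue(null, "name"));         // prints null
    }
}
```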
-
parseAttributes
public Attributes parseAttributes()
Parses the attributes specified in this start tag, regardless of the type of start tag. This method is only required in the unusual situation where attributes exist in a start tag whose type doesn't have attributes.
This method returns the cached attributes from the getAttributes() method if its value is not null, otherwise the source is physically parsed with each call to this method.
This is equivalent to parseAttributes(Attributes.getDefaultMaxErrorCount()).
- Overrides:
parseAttributes in class Segment
- Returns:
- the attributes specified in this start tag, or null if too many errors occur while parsing.
- See Also:
getAttributes(), Source.parseAttributes(int pos, int maxEnd)
-
parseAttributes
public Attributes parseAttributes(int maxErrorCount)
Parses the attributes specified in this start tag, regardless of the type of start tag. This method is only required in the unusual situation where attributes exist in a start tag whose type doesn't have attributes.
See the documentation of the parseAttributes() method for more information.
- Parameters:
maxErrorCount - the maximum number of minor errors allowed while parsing
- Returns:
- the attributes specified in this start tag, or null if too many errors occur while parsing.
- See Also:
getAttributes()
-
getTagContent
public Segment getTagContent()
Returns the segment between the end of the tag's name and the start of its end delimiter.
This method is normally only of use for start tags whose content is something other than attributes.
A new Segment object is created with each call to this method.
- Returns:
- the segment between the end of the tag's name and the start of the end delimiter.
-
getFormControl
public FormControl getFormControl()
Returns the FormControl defined by this start tag.
This is equivalent to getElement().getFormControl().
- Returns:
- the FormControl defined by this start tag, or null if it is not a control.
-
isEndTagForbidden
public boolean isEndTagForbidden()
Indicates whether a matching end tag is forbidden.
This property returns true if one of the following conditions is met:
- The type of this start tag does not specify a corresponding end tag type.
- The name of this start tag indicates it is the start of an HTML element whose end tag is forbidden.
- This start tag is syntactically an empty-element tag and its name indicates it is the start of a non-HTML element.
If this property returns true then this start tag's element will always be a single tag element.
- Returns:
true if a matching end tag is forbidden, otherwise false.
-
isEndTagRequired
public boolean isEndTagRequired()
Indicates whether a matching end tag is required.
This property returns true if one of the following conditions is met:
- The type of this start tag is NOT StartTagType.NORMAL, but specifies a corresponding end tag type.
- The name of this start tag indicates it is the start of an HTML element whose end tag is required.
- This start tag is NOT syntactically an empty-element tag and its name indicates it is the start of a non-HTML element.
- Returns:
true if a matching end tag is required, otherwise false.
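The condition lists for isEndTagForbidden() and isEndTagRequired() can be compared side by side in a small model. The following is a minimal sketch, assuming that the name-based conditions are only evaluated for tags of the normal type, and that for any other type the answer depends solely on whether the type specifies a corresponding end tag type. The class EndTagRuleSketch, its methods and the three name sets are hypothetical; the sets are illustrative subsets and not the values defined by the library.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

final class EndTagRuleSketch {
    // Illustrative subsets of HTML element names (assumption).
    static final Set<String> FORBIDDEN = new HashSet<>(Arrays.asList("br", "hr", "img", "input"));
    static final Set<String> REQUIRED = new HashSet<>(Arrays.asList("a", "div", "span", "table"));
    static final Set<String> OPTIONAL = new HashSet<>(Arrays.asList("li", "p", "td", "tr"));

    // In this model a name is an HTML element name if it is in any of the sets.
    static boolean isHtmlName(String name) {
        return FORBIDDEN.contains(name) || REQUIRED.contains(name) || OPTIONAL.contains(name);
    }

    static boolean isEndTagForbidden(boolean normalType, boolean hasCorrespondingEndTagType,
                                     String name, boolean syntacticalEmpty) {
        if (!normalType) {
            return !hasCorrespondingEndTagType; // condition 1
        }
        if (FORBIDDEN.contains(name)) {
            return true;                        // condition 2
        }
        return syntacticalEmpty && !isHtmlName(name); // condition 3
    }

    static boolean isEndTagRequired(boolean normalType, boolean hasCorrespondingEndTagType,
                                    String name, boolean syntacticalEmpty) {
        if (!normalType) {
            return hasCorrespondingEndTagType;  // condition 1
        }
        if (REQUIRED.contains(name)) {
            return true;                        // condition 2
        }
        return !syntacticalEmpty && !isHtmlName(name); // condition 3
    }

    public static void main(String[] args) {
        System.out.println(isEndTagForbidden(true, true, "br", false)); // prints true
        System.out.println(isEndTagRequired(true, true, "li", false));  // prints false
        System.out.println(isEndTagRequired(true, true, "foo", false)); // prints true
    }
}
```

In this model an element with an optional end tag, such as li, is neither forbidden nor required to have one, so both methods return false for it.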
-
isUnregistered
public boolean isUnregistered()
Description copied from class: Tag
Indicates whether this tag has a syntax that does not match any of the registered tag types.
The only requirement of an unregistered tag type is that it starts with '<' and there is a closing '>' character at some position after it in the source document.
The absence or presence of a '/' character after the initial '<' determines whether an unregistered tag is respectively a StartTag with a type of StartTagType.UNREGISTERED or an EndTag with a type of EndTagType.UNREGISTERED.
There are no restrictions on the characters that might appear between these delimiters, including other '<' characters. This may result in a '>' character that is identified as the closing delimiter of two separate tags, one an unregistered tag, and the other a tag of any type that begins in the middle of the unregistered tag. As explained below, unregistered tags are usually only found when specifically looking for them, so it is up to the user to detect and deal with any such nonsensical results.
Unregistered tags are only returned by the Source.getTagAt(int pos) method, named search methods, where the specified name matches the first characters inside the tag, and by tag type search methods, where the specified tagType is either StartTagType.UNREGISTERED or EndTagType.UNREGISTERED.
Open tag searches and other searches always ignore unregistered tags, although every discovery of an unregistered tag is logged by the parser.
The logic behind this design is that unregistered tag types are usually the result of a '<' character in the text that was mistakenly left unencoded, or a less-than operator inside a script, or some other occurrence which is of no interest to the user. By returning unregistered tags in named and tag type search methods, the library allows the user to specifically search for tags with a certain syntax that does not match any existing TagType. This expediency feature avoids the need for the user to create a custom tag type to define the syntax before searching for these tags. By not returning unregistered tags in the less specific search methods, it is providing only the information that most users are interested in.
- Specified by:
isUnregistered in class Tag
- Returns:
true if this tag has a syntax that does not match any of the registered tag types, otherwise false.
-
tidy
public java.lang.String tidy()
Returns an XML representation of this start tag.
This is equivalent to tidy(false), thereby keeping the name of the tag in its original case.
See the documentation of the tidy(boolean toXHTML) method for more details.
- Specified by:
tidy in class Tag
- Returns:
- an XML representation of this start tag, or the source text if it is of a type that does not have attributes.
-
tidy
public java.lang.String tidy(boolean toXHTML)
Returns an XML or XHTML representation of this start tag.
The tidying of the tag is carried out as follows:
- if this start tag is of a type that does not have attributes, then the original source text of the entire tag is returned.
- if this start tag contains any server tags outside of an attribute value, then the original source text of the entire tag is returned.
- name converted to lower case if the toXHTML argument is true and this is a normal start tag
- attributes separated by a single space
- attribute names in original case
- attribute values are enclosed in double quotes and re-encoded
- if this start tag forms an HTML element that has no end tag, a slash is inserted before the closing angle bracket, separated from the name or last attribute by a single space.
- if an attribute value contains a server tag it is inserted verbatim instead of being encoded.
The toXHTML parameter determines only whether the name is converted to lower case for normal tags. In all other respects the generated tag is already valid XHTML.
- Example:
-
The following source text:
<INPUT name=Company value='Günter O&#39;Reilly & Associés'>
produces the following regenerated HTML:
<input name="Company" value="Günter O'Reilly & Associés" />
- Parameters:
toXHTML - specifies whether the output is XHTML.
- Returns:
- an XML or XHTML representation of this start tag, or the source text if it is of a type that does not have attributes.
-
generateHTML
public static java.lang.String generateHTML(java.lang.String tagName, java.util.Map<java.lang.String,java.lang.String> attributesMap, boolean emptyElementTag)
Generates the HTML text of a normal start tag with the specified tag name and attributes map.
The output of the attributes is as described in the Attributes.generateHTML(Map attributesMap) method.
The emptyElementTag parameter specifies whether the start tag should be an empty-element tag, in which case a slash is inserted before the closing angle bracket, separated from the name or last attribute by a single space.
- Example:
-
The following code:
LinkedHashMap<String,String> attributesMap=new LinkedHashMap<String,String>();
attributesMap.put("name","Company");
attributesMap.put("value","G\u00fcnter O'Reilly & Associés");
System.out.println(StartTag.generateHTML("INPUT",attributesMap,true));
produces the following output:
<INPUT name="Company" value="Günter O'Reilly & Associés" />
- Parameters:
tagName - the name of the start tag.
attributesMap - a map containing attribute name/value pairs.
emptyElementTag - specifies whether the start tag should be an empty-element tag.
- Returns:
- the HTML text of a normal start tag with the specified tag name and attributes map.
- See Also:
EndTag.generateHTML(String tagName)
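The layout rules stated for generateHTML (tag name in the given case, attributes separated by a single space, values in double quotes, and " />" for an empty-element tag) can be approximated with the standard library alone. The following is a minimal sketch and not the library's implementation: the class StartTagTextSketch and its methods are hypothetical, attribute values are assumed to be non-null, and the escape method is a simplified stand-in that encodes only four characters, whereas the library's own encoding is the one described for Attributes.generateHTML.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class StartTagTextSketch {
    // Simplified encoding (assumption): only &, <, > and " are replaced.
    static String escape(String value) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < value.length(); i++) {
            char c = value.charAt(i);
            switch (c) {
                case '&': sb.append("&amp;"); break;
                case '<': sb.append("&lt;"); break;
                case '>': sb.append("&gt;"); break;
                case '"': sb.append("&quot;"); break;
                default: sb.append(c);
            }
        }
        return sb.toString();
    }

    // Builds the tag text; map iteration order decides the attribute order.
    static String generate(String tagName, Map<String, String> attributes, boolean emptyElementTag) {
        StringBuilder sb = new StringBuilder("<").append(tagName);
        for (Map.Entry<String, String> entry : attributes.entrySet()) {
            sb.append(' ').append(entry.getKey())
              .append("=\"").append(escape(entry.getValue())).append('"');
        }
        sb.append(emptyElementTag ? " />" : ">");
        return sb.toString();
    }

    // Sample attributes in insertion order.
    static Map<String, String> sample() {
        Map<String, String> attributes = new LinkedHashMap<>();
        attributes.put("name", "Company");
        attributes.put("value", "R&D");
        return attributes;
    }

    public static void main(String[] args) {
        // prints <INPUT name="Company" value="R&amp;D" />
        System.out.println(generate("INPUT", sample(), true));
    }
}
```

A LinkedHashMap is used for the sample so that the attributes appear in insertion order, matching the choice made in the example above.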
-
getDebugInfo
public java.lang.String getDebugInfo()
Description copied from class: Segment
Returns a string representation of this object useful for debugging purposes.
- Overrides:
getDebugInfo in class Segment
- Returns:
- a string representation of this object useful for debugging purposes.
-
-