Class SourceFormatter
- java.lang.Object
  - SourceFormatter
- All Implemented Interfaces: CharStreamSource
public final class SourceFormatter extends java.lang.Object implements CharStreamSource
Formats HTML source by laying out each non-inline-level element on a new line with an appropriate indent. Any indentation present in the original source text is removed.
Use one of the following methods to obtain the output: writeTo(Writer), appendTo(Appendable), or toString().
The output text is functionally equivalent to the original source and should be rendered identically unless specified below.
The following points describe the process in general terms. Any aspect of the algorithm not specifically mentioned here is subject to change without notice in future versions.
- Every element that is not an inline-level element appears on a new line with an indent corresponding to its depth in the document element hierarchy.
- The indent is formed by writing n repetitions of the string specified in the IndentString property, where n is the depth of the indentation.
- The content of an indented element starts on a new line and is indented at a depth one greater than that of the element, with the end tag appearing on a new line at the same depth as the start tag. If the content contains only text and inline-level elements, it may continue on the same line as the start tag. Additionally, if the output content contains no new lines, the end tag may also continue on the same line.
- The content of preformatted elements such as PRE and TEXTAREA is not indented, nor is the white space modified in any way.
- Only normal and document type declaration elements are indented. All others are treated as inline-level elements.
- White space and indentation inside HTML comments, CDATA sections, or any server tag are preserved, but with the indentation of new lines starting at a depth one greater than that of the surrounding text.
- White space and indentation inside SCRIPT elements are preserved, but with the indentation of new lines starting at a depth one greater than that of the SCRIPT element.
- If the TidyTags property is set to true, every tag in the document is replaced with the output from its Tag.tidy() method. If this property is set to false, the tag from the original text is used, including all white space, but with any new lines indented at a depth one greater than that of the element.
- If the CollapseWhiteSpace property is set to true, every string of one or more white space characters located outside of a tag is replaced with a single space in the output. White space located adjacent to a non-inline-level element tag (except server tags) may be removed.
- If the IndentAllElements property is set to true, every element appears indented on a new line, including inline-level elements. This generates output that is a good representation of the actual document element hierarchy, but is very likely to introduce white space that compromises the functional equivalency of the document.
- The NewLine property specifies the character sequence to use for each newline in the output document.
- If the source document contains server tags, the functional equivalency of the output document may be compromised.
Formatting an entire Source object performs a full sequential parse automatically.
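
The layout rules listed above (indent built from n repetitions of the indent string, content one level deeper, end tag back at the depth of the start tag, text-only content kept on one line) can be pictured with a small toy model. This is only a sketch under stated assumptions: the LayoutSketch and Block names are hypothetical, the model knows nothing about inline-level elements, attributes, or preformatted content, and it is not the library's implementation.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of the indentation rules; NOT the library's implementation. */
final class LayoutSketch {

    /** Hypothetical block-level element: a name plus either child blocks or plain text. */
    static final class Block {
        final String name;
        final String text; // text-only content, or null when children are used
        final List<Block> children = new ArrayList<>();

        Block(String name, String text) {
            this.name = name;
            this.text = text;
        }

        Block add(Block child) {
            children.add(child);
            return this;
        }
    }

    /** Builds the indent by writing `depth` repetitions of indentString. */
    static String indent(String indentString, int depth) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < depth; i++) {
            sb.append(indentString);
        }
        return sb.toString();
    }

    /** Appends the block to `out`, starting at the given depth. */
    static void render(Block b, int depth, String indentString, String newLine, StringBuilder out) {
        String pad = indent(indentString, depth);
        out.append(pad).append('<').append(b.name).append('>');
        if (b.children.isEmpty()) {
            // Text-only content: start tag, text, and end tag share one line.
            out.append(b.text == null ? "" : b.text);
        } else {
            // Nested blocks: content starts on a new line, one level deeper.
            out.append(newLine);
            for (Block child : b.children) {
                render(child, depth + 1, indentString, newLine, out);
            }
            // End tag on its own line at the same depth as the start tag.
            out.append(pad);
        }
        out.append("</").append(b.name).append('>').append(newLine);
    }

    public static void main(String[] args) {
        Block doc = new Block("div", null)
                .add(new Block("p", "Hello"))
                .add(new Block("ul", null).add(new Block("li", "One")));
        StringBuilder out = new StringBuilder();
        render(doc, 0, "  ", "\n", out);
        System.out.print(out);
    }
}
```

With a two-space indent string, the nested li element ends up four spaces in, because its depth is 2.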
-
Constructor Summary
Constructor: Description
- SourceFormatter(Segment segment): Constructs a new SourceFormatter based on the specified Segment.
-
Method Summary
Modifier and Type, Method: Description
- void appendTo(java.lang.Appendable appendable): Appends the output to the specified Appendable object.
- boolean getCollapseWhiteSpace(): Indicates whether white space in the text between the tags is to be collapsed.
- long getEstimatedMaximumOutputLength(): Returns the estimated maximum number of characters in the output, or -1 if no estimate is available.
- boolean getIndentAllElements(): Indicates whether all elements are to be indented, including inline-level elements and those with preformatted contents.
- java.lang.String getIndentString(): Returns the string to be used for indentation.
- java.lang.String getNewLine(): Returns the string to be used to represent a newline in the output.
- boolean getTidyTags(): Indicates whether the original text of each tag is to be replaced with the output from its Tag.tidy() method.
- SourceFormatter setCollapseWhiteSpace(boolean collapseWhiteSpace): Sets whether white space in the text between the tags is to be collapsed.
- SourceFormatter setIndentAllElements(boolean indentAllElements): Sets whether all elements are to be indented, including inline-level elements and those with preformatted contents.
- SourceFormatter setIndentString(java.lang.String indentString): Sets the string to be used for indentation.
- SourceFormatter setNewLine(java.lang.String newLine): Sets the string to be used to represent a newline in the output.
- SourceFormatter setTidyTags(boolean tidyTags): Sets whether the original text of each tag is to be replaced with the output from its Tag.tidy() method.
- java.lang.String toString(): Returns the output as a string.
- void writeTo(java.io.Writer writer): Writes the output to the specified Writer.
-
Constructor Detail
-
SourceFormatter
public SourceFormatter(Segment segment)
Constructs a new SourceFormatter based on the specified Segment.
- Parameters:
  segment - the segment containing the HTML to be formatted.
- See Also:
  Source.getSourceFormatter()
-
Method Detail
-
writeTo
public void writeTo(java.io.Writer writer) throws java.io.IOException
Description copied from interface: CharStreamSource
Writes the output to the specified Writer.
- Specified by:
  writeTo in interface CharStreamSource
- Parameters:
  writer - the destination java.io.Writer for the output.
- Throws:
  java.io.IOException - if an I/O exception occurs.
-
appendTo
public void appendTo(java.lang.Appendable appendable) throws java.io.IOException
Description copied from interface: CharStreamSource
Appends the output to the specified Appendable object.
- Specified by:
  appendTo in interface CharStreamSource
- Parameters:
  appendable - the destination java.lang.Appendable object for the output.
- Throws:
  java.io.IOException - if an I/O exception occurs.
-
getEstimatedMaximumOutputLength
public long getEstimatedMaximumOutputLength()
Description copied from interface: CharStreamSource
Returns the estimated maximum number of characters in the output, or -1 if no estimate is available.
The returned value should be used as a guide for efficiency purposes only, for example to set an initial StringBuilder capacity. There is no guarantee that the length of the output is indeed less than this value, as classes implementing this method often use assumptions based on typical usage to calculate the estimate.
Although implementations of this method should never return a value less than -1, users of this method must not assume that this will always be the case. Standard practice is to interpret any negative value as meaning that no estimate is available.
- Specified by:
  getEstimatedMaximumOutputLength in interface CharStreamSource
- Returns:
  the estimated maximum number of characters in the output, or -1 if no estimate is available.
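
The advice above (treat any negative value as "no estimate", and treat a non-negative value as a hint rather than a bound) could be applied roughly as follows when sizing a StringBuilder. The EstimateSketch class, its initialCapacity helper, and the cap of 2^20 characters are illustrative assumptions, not part of the library.

```java
/** Sketch: turning an output-length estimate into an initial StringBuilder capacity. */
final class EstimateSketch {
    /** Same as StringBuilder's documented default capacity. */
    static final int DEFAULT_CAPACITY = 16;

    /** Arbitrary upper bound chosen for this sketch so a huge estimate cannot over-allocate. */
    static final int MAX_HINT = 1 << 20;

    /** Maps an estimate (possibly negative or very large) to a safe initial capacity. */
    static int initialCapacity(long estimate) {
        if (estimate < 0) {
            // Any negative value, not just -1, means no estimate is available.
            return DEFAULT_CAPACITY;
        }
        return (int) Math.min(estimate, (long) MAX_HINT);
    }

    public static void main(String[] args) {
        // In real use the argument would come from getEstimatedMaximumOutputLength().
        StringBuilder sb = new StringBuilder(initialCapacity(-1L));
        // The builder still grows on demand, so an estimate that is too low is harmless.
        sb.append("the output may be longer than the estimate");
        System.out.println(sb.length());
    }
}
```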
-
toString
public java.lang.String toString()
Description copied from interface: CharStreamSource
Returns the output as a string.
- Specified by:
  toString in interface CharStreamSource
- Overrides:
  toString in class java.lang.Object
- Returns:
  the output as a string.
-
setIndentString
public SourceFormatter setIndentString(java.lang.String indentString)
Sets the string to be used for indentation.
The default value is a string containing a single tab character (U+0009).
The most commonly used indent strings are "\t" (single tab), " " (single space), "  " (2 spaces), and "    " (4 spaces).
- Parameters:
  indentString - the string to be used for indentation, must not be null.
- Returns:
  this SourceFormatter instance, allowing multiple property setting methods to be chained in a single statement.
- See Also:
  getIndentString()
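
Each property setter returns the instance it was called on, which is what makes chaining possible. The sketch below shows that pattern with a hypothetical stand-in class, FormatterOptions, that mirrors three of the setter signatures and the documented defaults; it is not the library class, and throwing IllegalArgumentException for a null indent string is an assumption of this sketch.

```java
/** Hypothetical stand-in for the fluent setter pattern; not the library's SourceFormatter. */
final class FormatterOptions {
    private String indentString = "\t";         // documented default: a single tab
    private boolean tidyTags = false;           // documented default
    private boolean collapseWhiteSpace = false; // documented default

    FormatterOptions setIndentString(String indentString) {
        if (indentString == null) {
            // The documentation only says the argument must not be null;
            // the exception type here is an assumption.
            throw new IllegalArgumentException("indentString must not be null");
        }
        this.indentString = indentString;
        return this; // returning this is what allows chaining
    }

    FormatterOptions setTidyTags(boolean tidyTags) {
        this.tidyTags = tidyTags;
        return this;
    }

    FormatterOptions setCollapseWhiteSpace(boolean collapseWhiteSpace) {
        this.collapseWhiteSpace = collapseWhiteSpace;
        return this;
    }

    String getIndentString() { return indentString; }
    boolean getTidyTags() { return tidyTags; }
    boolean getCollapseWhiteSpace() { return collapseWhiteSpace; }

    public static void main(String[] args) {
        // Several properties set in a single statement.
        FormatterOptions options = new FormatterOptions()
                .setIndentString("  ")
                .setTidyTags(true)
                .setCollapseWhiteSpace(true);
        System.out.println(options.getIndentString().length());
    }
}
```

With the real class, the same shape applies to a formatter constructed from a Segment: the setters documented on this page can follow the constructor call in one chained expression, ending with toString() or writeTo(Writer).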
-
getIndentString
public java.lang.String getIndentString()
Returns the string to be used for indentation.
See the setIndentString(String) method for a full description of this property.
- Returns:
  the string to be used for indentation.
-
setTidyTags
public SourceFormatter setTidyTags(boolean tidyTags)
Sets whether the original text of each tag is to be replaced with the output from its Tag.tidy() method.
The default value is false.
If this property is set to false, the tag from the original text is used, including all white space, but with any new lines indented at a depth one greater than that of the element.
- Parameters:
  tidyTags - specifies whether the original text of each tag is to be replaced with the output from its Tag.tidy() method.
- Returns:
  this SourceFormatter instance, allowing multiple property setting methods to be chained in a single statement.
- See Also:
  getTidyTags()
-
getTidyTags
public boolean getTidyTags()
Indicates whether the original text of each tag is to be replaced with the output from its Tag.tidy() method.
See the setTidyTags(boolean) method for a full description of this property.
- Returns:
  true if the original text of each tag is to be replaced with the output from its Tag.tidy() method, otherwise false.
-
setCollapseWhiteSpace
public SourceFormatter setCollapseWhiteSpace(boolean collapseWhiteSpace)
Sets whether white space in the text between the tags is to be collapsed.
The default value is false.
If this property is set to true, every string of one or more white space characters located outside of a tag is replaced with a single space in the output. White space located adjacent to a non-inline-level element tag (except server tags) may be removed.
- Parameters:
  collapseWhiteSpace - specifies whether white space in the text between the tags is to be collapsed.
- Returns:
  this SourceFormatter instance, allowing multiple property setting methods to be chained in a single statement.
- See Also:
  getCollapseWhiteSpace()
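
The core of the collapsing rule, replacing every run of one or more white space characters with a single space, can be sketched with a regular expression. This is an approximation under stated assumptions: the CollapseSketch class is hypothetical, the input is assumed to be text that already lies outside any tag, Java's \s character class may not match the library's own definition of white space, and the optional removal of white space next to non-inline-level tags is not modelled.

```java
import java.util.regex.Pattern;

/** Sketch of white space collapsing for text located outside of tags. */
final class CollapseSketch {
    /** One or more white space characters, as defined by Java's \s class. */
    private static final Pattern WHITE_SPACE = Pattern.compile("\\s+");

    /** Replaces each run of white space with a single space. */
    static String collapse(String text) {
        return WHITE_SPACE.matcher(text).replaceAll(" ");
    }

    public static void main(String[] args) {
        System.out.println(collapse("Hello,\n\t  world !")); // prints "Hello, world !"
    }
}
```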
-
getCollapseWhiteSpace
public boolean getCollapseWhiteSpace()
Indicates whether white space in the text between the tags is to be collapsed.
See the setCollapseWhiteSpace(boolean) method for a full description of this property.
- Returns:
  true if white space in the text between the tags is to be collapsed, otherwise false.
-
setIndentAllElements
public SourceFormatter setIndentAllElements(boolean indentAllElements)
Sets whether all elements are to be indented, including inline-level elements and those with preformatted contents.
The default value is false.
If this property is set to true, every element appears indented on a new line, including inline-level elements.
This generates output that is a good representation of the actual document element hierarchy, but is very likely to introduce white space that compromises the functional equivalency of the document.
- Parameters:
  indentAllElements - specifies whether all elements are to be indented.
- Returns:
  this SourceFormatter instance, allowing multiple property setting methods to be chained in a single statement.
- See Also:
  getIndentAllElements()
-
getIndentAllElements
public boolean getIndentAllElements()
Indicates whether all elements are to be indented, including inline-level elements and those with preformatted contents.
See the setIndentAllElements(boolean) method for a full description of this property.
- Returns:
  true if all elements are to be indented, otherwise false.
-
setNewLine
public SourceFormatter setNewLine(java.lang.String newLine)
Sets the string to be used to represent a newline in the output.
The default is to use the same new line string as is used in the source document, which is determined via the Source.getNewLine() method. If the source document does not contain any new lines, a "best guess" is made by either taking the new line string of a previously parsed document, or using the value from the static Config.NewLine property.
Specifying a null argument resets the property to its default value, which is to use the same new line string as is used in the source document.
- Parameters:
  newLine - the string to be used to represent a newline in the output, may be null.
- Returns:
  this SourceFormatter instance, allowing multiple property setting methods to be chained in a single statement.
- See Also:
  getNewLine()
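
The fallback order described above (an explicitly set value first, then the new line string found in the source, then a best guess) might look like the sketch below. The NewLineSketch class and both of its methods are hypothetical, the detection logic is an assumption rather than the behaviour of Source.getNewLine(), and the two "best guess" sources are folded into a single fallback parameter.

```java
/** Sketch of the new line fallback chain; not the library's implementation. */
final class NewLineSketch {

    /** Returns the first new line sequence in the text ("\r\n", "\n" or "\r"), or null if none. */
    static String detectNewLine(CharSequence source) {
        for (int i = 0; i < source.length(); i++) {
            char c = source.charAt(i);
            if (c == '\n') {
                return "\n";
            }
            if (c == '\r') {
                boolean followedByLf = i + 1 < source.length() && source.charAt(i + 1) == '\n';
                return followedByLf ? "\r\n" : "\r";
            }
        }
        return null;
    }

    /**
     * Resolves the new line string to use.
     * configured: the explicitly set value, or null for the default behaviour.
     * fallback:   stands in for the "best guess" when the source has no new lines.
     */
    static String effectiveNewLine(String configured, CharSequence source, String fallback) {
        if (configured != null) {
            return configured;
        }
        String detected = detectNewLine(source);
        return detected != null ? detected : fallback;
    }

    public static void main(String[] args) {
        String newLine = effectiveNewLine(null, "<p>a</p>\r\n<p>b</p>", "\n");
        System.out.println(newLine.length()); // prints 2, the source uses "\r\n"
    }
}
```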
-
getNewLine
public java.lang.String getNewLine()
Returns the string to be used to represent a newline in the output.
See the setNewLine(String) method for a full description of this property.
- Returns:
  the string to be used to represent a newline in the output.