Class Renderer
java.lang.Object
  Renderer
All Implemented Interfaces: CharStreamSource
public class Renderer extends java.lang.Object implements CharStreamSource
Performs a simple rendering of HTML markup into text.
This provides a human readable version of the segment content that is modelled on the way Mozilla Thunderbird and other email clients provide an automatic conversion of HTML content to text in their alternative MIME encoding of emails.
The output using default settings complies with the "text/plain; format=flowed" (DelSp=No) protocol described in RFC3676.
Many properties are available to customise the output, possibly the most significant of which is MaxLineLength. See the individual property descriptions for details.
Use one of the following methods to obtain the output: writeTo(Writer), appendTo(Appendable) or toString().
The rendering of some constructs, especially tables, is very rudimentary. No attempt is made to render nested tables properly, except to ensure that all of the text content is included in the output.
Rendering an entire Source object performs a full sequential parse automatically.
Any aspect of the algorithm not specifically mentioned here is subject to change without notice in future versions.
To extract pure text without any rendering of the markup, use the TextExtractor class instead.
-
-
Method Summary
Modifier and type, method: description
void appendTo(java.lang.Appendable appendable): Appends the output to the specified Appendable object.
int getBlockIndentSize(): Returns the size of the indent to be used for anything other than LI elements.
boolean getConvertNonBreakingSpaces(): Indicates whether non-breaking space (&nbsp;) character entity references are converted to spaces.
boolean getDecorateFontStyles(): Indicates whether decoration characters are to be included around the content of some font style elements and phrase elements.
static int getDefaultBottomMargin(java.lang.String htmlElementName): Returns the default bottom margin of an HTML block element with the specified name.
static int getDefaultTopMargin(java.lang.String htmlElementName): Returns the default top margin of an HTML block element with the specified name.
long getEstimatedMaximumOutputLength(): Returns the estimated maximum number of characters in the output, or -1 if no estimate is available.
int getHRLineLength(): Returns the length of a horizontal line.
boolean getIncludeAlternateText(): Indicates whether the alternate text of a tag that has an alt attribute is included in the output.
boolean getIncludeFirstElementTopMargin(): Indicates whether the top margin of the first element is rendered.
boolean getIncludeHyperlinkURLs(): Indicates whether hyperlink URLs are included in the output.
char[] getListBullets(): Returns the bullet characters to use for list items inside UL elements.
int getListIndentSize(): Returns the size of the indent to be used for LI elements.
int getMaxLineLength(): Returns the column at which lines are to be wrapped.
java.lang.String getNewLine(): Returns the string to be used to represent a newline in the output.
java.lang.String getTableCellSeparator(): Returns the string that is to separate table cells.
static boolean isDefaultIndent(java.lang.String htmlElementName): Returns the default value of whether an HTML block element of the specified name is indented.
java.lang.String renderAlternateText(StartTag startTag): Renders the alternate text of the specified start tag.
java.lang.String renderHyperlinkURL(StartTag startTag): Renders the hyperlink URL from the specified StartTag.
Renderer setBlockIndentSize(int blockIndentSize): Sets the size of the indent to be used for anything other than LI elements.
Renderer setConvertNonBreakingSpaces(boolean convertNonBreakingSpaces): Sets whether non-breaking space (&nbsp;) character entity references are converted to spaces.
Renderer setDecorateFontStyles(boolean decorateFontStyles): Sets whether decoration characters are to be included around the content of some font style elements and phrase elements.
static void setDefaultBottomMargin(java.lang.String htmlElementName, int bottomMargin): Sets the default bottom margin of an HTML block element with the specified name.
static void setDefaultIndent(java.lang.String htmlElementName, boolean indent): Sets the default value of whether an HTML block element of the specified name is indented.
static void setDefaultTopMargin(java.lang.String htmlElementName, int topMargin): Sets the default top margin of an HTML block element with the specified name.
Renderer setHRLineLength(int hrLineLength): Sets the length of a horizontal line.
Renderer setIncludeAlternateText(boolean includeAlternateText): Sets whether the alternate text of a tag that has an alt attribute is included in the output.
Renderer setIncludeFirstElementTopMargin(boolean includeFirstElementTopMargin): Sets whether the top margin of the first element is rendered.
Renderer setIncludeHyperlinkURLs(boolean includeHyperlinkURLs): Sets whether hyperlink URLs are included in the output.
Renderer setListBullets(char[] listBullets): Sets the bullet characters to use for list items inside UL elements.
Renderer setListIndentSize(int listIndentSize): Sets the size of the indent to be used for LI elements.
Renderer setMaxLineLength(int maxLineLength): Sets the column at which lines are to be wrapped.
Renderer setNewLine(java.lang.String newLine): Sets the string to be used to represent a newline in the output.
Renderer setTableCellSeparator(java.lang.String tableCellSeparator): Sets the string that is to separate table cells.
java.lang.String toString(): Returns the output as a string.
void writeTo(java.io.Writer writer): Writes the output to the specified Writer.
-
-
-
Constructor Detail
-
Renderer
public Renderer(Segment segment)
Constructs a new Renderer based on the specified Segment.
- Parameters:
segment - the segment containing the HTML to be rendered.
- See Also:
Segment.getRenderer()
-
-
Method Detail
-
writeTo
public void writeTo(java.io.Writer writer) throws java.io.IOException
Description copied from interface: CharStreamSource
Writes the output to the specified Writer.
- Specified by:
writeTo in interface CharStreamSource
- Parameters:
writer - the destination java.io.Writer for the output.
- Throws:
java.io.IOException - if an I/O exception occurs.
-
appendTo
public void appendTo(java.lang.Appendable appendable) throws java.io.IOException
Description copied from interface: CharStreamSource
Appends the output to the specified Appendable object.
- Specified by:
appendTo in interface CharStreamSource
- Parameters:
appendable - the destination java.lang.Appendable object for the output.
- Throws:
java.io.IOException - if an I/O exception occurs.
-
getEstimatedMaximumOutputLength
public long getEstimatedMaximumOutputLength()
Description copied from interface: CharStreamSource
Returns the estimated maximum number of characters in the output, or -1 if no estimate is available.
The returned value should be used as a guide for efficiency purposes only, for example to set an initial StringBuilder capacity. There is no guarantee that the length of the output is indeed less than this value, as classes implementing this method often use assumptions based on typical usage to calculate the estimate.
Although implementations of this method should never return a value less than -1, users of this method must not assume that this will always be the case. Standard practice is to interpret any negative value as meaning that no estimate is available.
- Specified by:
getEstimatedMaximumOutputLength in interface CharStreamSource
- Returns:
the estimated maximum number of characters in the output, or -1 if no estimate is available.
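The advice about treating any negative value as "no estimate" can be sketched as a small helper. This is a stand-alone illustration, not part of the library: the class name CapacityHint, the fallback argument and the MAX_INITIAL_CAPACITY cap are all assumptions made for the example.

```java
final class CapacityHint {
    // Hypothetical upper bound so that a very large estimate does not pre-allocate a huge buffer.
    static final int MAX_INITIAL_CAPACITY = 1 << 20;

    private CapacityHint() {}

    /**
     * Chooses an initial StringBuilder capacity from an estimate.
     * Any negative estimate is interpreted as "no estimate available".
     */
    static int initialCapacity(long estimate, int fallback) {
        if (estimate < 0) return fallback;
        return (int) Math.min(estimate, (long) MAX_INITIAL_CAPACITY);
    }
}
```

A caller might then write new StringBuilder(CapacityHint.initialCapacity(renderer.getEstimatedMaximumOutputLength(), 256)) before calling appendTo. The builder still grows as needed if the estimate turns out to be too low.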
-
toString
public java.lang.String toString()
Description copied from interface: CharStreamSource
Returns the output as a string.
- Specified by:
toString in interface CharStreamSource
- Overrides:
toString in class java.lang.Object
- Returns:
the output as a string.
-
setMaxLineLength
public Renderer setMaxLineLength(int maxLineLength)
Sets the column at which lines are to be wrapped.
Lines that would otherwise exceed this length are wrapped onto a new line at a word boundary.
Setting this property automatically sets the HRLineLength property to MaxLineLength - 4.
Setting this property to zero disables line wrapping completely, and leaves the value of HRLineLength unchanged.
A line may still exceed this length if it consists of a single word, where the length of the word plus the line indent exceeds the maximum length. In this case the line is wrapped immediately after the end of the word.
The default value is 76, which reflects the maximum line length for sending email data specified in RFC2049 section 3.5.
- Parameters:
maxLineLength - the column at which lines are to be wrapped.
- Returns:
this Renderer instance, allowing multiple property setting methods to be chained in a single statement.
- See Also:
getMaxLineLength()
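The wrapping rule described for this property can be illustrated with a minimal stand-alone sketch. It is not the library's algorithm: it ignores indentation and the format=flowed details, collapses runs of whitespace, and assumes that a line of exactly maxLineLength characters is allowed. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

final class WrapSketch {
    private WrapSketch() {}

    /** Wraps text at word boundaries; a maxLineLength of zero or less disables wrapping. */
    static List<String> wrap(String text, int maxLineLength) {
        List<String> lines = new ArrayList<String>();
        String trimmed = text.trim();
        if (trimmed.isEmpty()) return lines;
        StringBuilder line = new StringBuilder();
        for (String word : trimmed.split("\\s+")) {
            // Start a new line only if the current line already holds a word,
            // so a single over-long word is kept whole on its own line.
            boolean overflows = line.length() + 1 + word.length() > maxLineLength;
            if (maxLineLength > 0 && line.length() > 0 && overflows) {
                lines.add(line.toString());
                line.setLength(0);
            }
            if (line.length() > 0) line.append(' ');
            line.append(word);
        }
        lines.add(line.toString());
        return lines;
    }
}
```

In this sketch a word longer than the limit ends up alone on a line that exceeds the limit, which mirrors the single-word exception described above.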
-
getMaxLineLength
public int getMaxLineLength()
Returns the column at which lines are to be wrapped.
See the setMaxLineLength(int) method for a full description of this property.
- Returns:
the column at which lines are to be wrapped, or zero if line wrapping is disabled.
-
setHRLineLength
public Renderer setHRLineLength(int hrLineLength)
Sets the length of a horizontal line.
The length determines the number of hyphen characters used to render HR elements.
This property is set automatically to MaxLineLength - 4 when the MaxLineLength property is set. The default value is 72.
- Parameters:
hrLineLength - the length of a horizontal line.
- Returns:
this Renderer instance, allowing multiple property setting methods to be chained in a single statement.
- See Also:
getHRLineLength()
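The coupling between MaxLineLength and HRLineLength can be modelled with a small stand-alone class. This is only a sketch of the documented behaviour, not the library's code: LineLengthModel is a hypothetical name, and treating negative values like zero is an assumption.

```java
final class LineLengthModel {
    private int maxLineLength = 76; // documented default
    private int hrLineLength = 72;  // documented default

    /** Sets the wrap column; a positive value also resets the HR length to maxLineLength - 4. */
    LineLengthModel setMaxLineLength(int maxLineLength) {
        this.maxLineLength = maxLineLength;
        // Zero disables wrapping and leaves the HR length unchanged (negative values are
        // treated the same way here, which is an assumption of this sketch).
        if (maxLineLength > 0) this.hrLineLength = maxLineLength - 4;
        return this;
    }

    LineLengthModel setHRLineLength(int hrLineLength) {
        this.hrLineLength = hrLineLength;
        return this;
    }

    int getMaxLineLength() { return maxLineLength; }

    int getHRLineLength() { return hrLineLength; }

    /** Renders a horizontal line as a run of hyphen characters. */
    String renderHorizontalLine() {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < hrLineLength; i++) sb.append('-');
        return sb.toString();
    }
}
```

Because setting MaxLineLength overwrites HRLineLength, a custom HR length is best set after the wrap column, for example setMaxLineLength(40).setHRLineLength(20).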
-
getHRLineLength
public int getHRLineLength()
Returns the length of a horizontal line.
See the setHRLineLength(int) method for a full description of this property.
- Returns:
the length of a horizontal line.
-
setNewLine
public Renderer setNewLine(java.lang.String newLine)
Sets the string to be used to represent a newline in the output.
The default value is "\r\n" (CR+LF) regardless of the platform on which the library is running. This is so that the default configuration produces valid MIME text/plain output, which mandates the use of CR+LF for line breaks.
Specifying a null argument causes the output to use the same new line string as is used in the source document, which is determined via the Source.getNewLine() method. If the source document does not contain any new lines, a "best guess" is made by either taking the new line string of a previously parsed document, or using the value from the static Config.NewLine property.
- Parameters:
newLine - the string to be used to represent a newline in the output, may be null.
- Returns:
this Renderer instance, allowing multiple property setting methods to be chained in a single statement.
- See Also:
getNewLine()
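When rendered output is combined with text from other sources, it can help to normalise every line break to the same string. The helper below is a stand-alone sketch using only the JDK; NewLineSketch and withNewLine are hypothetical names and the code is not part of the library.

```java
import java.util.regex.Matcher;

final class NewLineSketch {
    private NewLineSketch() {}

    /** Replaces every CR+LF, lone CR or lone LF in the text with the given newline string. */
    static String withNewLine(String text, String newLine) {
        // "\r\n" is listed first so that a CR+LF pair is replaced as one unit.
        return text.replaceAll("\\r\\n|\\r|\\n", Matcher.quoteReplacement(newLine));
    }
}
```

For example, withNewLine(text, "\r\n") produces the CR+LF line breaks that the default configuration uses.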
-
getNewLine
public java.lang.String getNewLine()
Returns the string to be used to represent a newline in the output.
See the setNewLine(String) method for a full description of this property.
- Returns:
the string to be used to represent a newline in the output.
-
setIncludeHyperlinkURLs
public Renderer setIncludeHyperlinkURLs(boolean includeHyperlinkURLs)
Sets whether hyperlink URLs are included in the output.
The default value is true.
When this property is true, the URL of each hyperlink is included in the output as determined by the implementation of the renderHyperlinkURL(StartTag) method.
- Example:
Assuming the default implementation of renderHyperlinkURL(StartTag), when this property is true, the following HTML:
<a href="http://jericho.htmlparser.net/">Jericho HTML Parser</a>
is rendered as:
Jericho HTML Parser <http://jericho.htmlparser.net/>
- Parameters:
includeHyperlinkURLs - specifies whether hyperlink URLs are included in the output.
- Returns:
this Renderer instance, allowing multiple property setting methods to be chained in a single statement.
- See Also:
getIncludeHyperlinkURLs()
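The interaction between this property and the URL rendering rule can be sketched without the library. The code below works on plain strings rather than StartTag objects and only approximates the documented default behaviour; LinkSketch, defaultUrl and renderLink are hypothetical names.

```java
import java.net.URI;
import java.net.URISyntaxException;

final class LinkSketch {
    private LinkSketch() {}

    /** Mimics the documented default: null for missing, javascript:, relative or invalid hrefs. */
    static String defaultUrl(String href) {
        if (href == null || href.startsWith("javascript:")) return null;
        try {
            if (!new URI(href).isAbsolute()) return null;
        } catch (URISyntaxException ex) {
            return null;
        }
        return "<" + href + ">";
    }

    /** Appends the rendered URL to the link text when the flag is set and a URL is available. */
    static String renderLink(String text, String href, boolean includeHyperlinkURLs) {
        String url = includeHyperlinkURLs ? defaultUrl(href) : null;
        return url == null ? text : text + " " + url;
    }
}
```

With the flag set to false, or with an href that the rule rejects, only the link text remains in the output of this sketch.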
-
getIncludeHyperlinkURLs
public boolean getIncludeHyperlinkURLs()
Indicates whether hyperlink URLs are included in the output.
See the setIncludeHyperlinkURLs(boolean) method for a full description of this property.
- Returns:
true if hyperlink URLs are included in the output, otherwise false.
-
renderHyperlinkURL
public java.lang.String renderHyperlinkURL(StartTag startTag)
Renders the hyperlink URL from the specified StartTag.
A return value of null indicates that the hyperlink URL should not be rendered at all.
The default implementation of this method returns null if the href attribute of the specified start tag starts with "javascript:", is a relative or invalid URI, or is missing completely. In all other cases it returns the value of the href attribute enclosed in angle brackets.
See the documentation of the setIncludeHyperlinkURLs(boolean) method for an example of how a hyperlink is rendered by the default implementation.
This method can be overridden in a subclass to customise the rendering of hyperlink URLs.
Rendering of hyperlink URLs can be disabled completely without overriding this method by setting the IncludeHyperlinkURLs property to false.
- Example:
To render hyperlink URLs without the enclosing angle brackets:
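import java.net.URI;
import java.net.URISyntaxException;

Renderer renderer = new Renderer(segment) {
    @Override
    public String renderHyperlinkURL(StartTag startTag) {
        String href = startTag.getAttributeValue("href");
        // Skip missing and javascript: links, as the default implementation does.
        if (href == null || href.startsWith("javascript:")) return null;
        try {
            // Skip relative URIs.
            URI uri = new URI(href);
            if (!uri.isAbsolute()) return null;
        } catch (URISyntaxException ex) {
            // Skip invalid URIs.
            return null;
        }
        // Return the URL without the enclosing angle brackets.
        return href;
    }
};
String renderedSegment = renderer.toString();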
- Parameters:
startTag - the start tag of the hyperlink element, must not be null.
- Returns:
the rendered hyperlink URL from the specified StartTag, or null if the hyperlink URL should not be rendered.
-
setIncludeAlternateText
public Renderer setIncludeAlternateText(boolean includeAlternateText)
Sets whether the alternate text of a tag that has an alt attribute is included in the output.
The default value is true. Note that this is not consistent with common email clients such as Mozilla Thunderbird, which do not render alternate text at all, even when a tag specifies alternate text.
When this property is true, the alternate text is included in the output as determined by the implementation of the renderAlternateText(StartTag) method.
- Example:
Assuming the default implementation of renderAlternateText(StartTag), when this property is true, the following HTML:
<img src="smiley.png" alt="smiley face" />
is rendered as:
[smiley face]
- Parameters:
includeAlternateText - specifies whether the alternate text of a tag that has an alt attribute is included in the output.
- Returns:
this Renderer instance, allowing multiple property setting methods to be chained in a single statement.
- See Also:
getIncludeAlternateText()
-
getIncludeAlternateText
public boolean getIncludeAlternateText()
Indicates whether the alternate text of a tag that has an alt attribute is included in the output.
See the setIncludeAlternateText(boolean) method for a full description of this property.
- Returns:
true if the alternate text of a tag that has an alt attribute is included in the output, otherwise false.
-
renderAlternateText
public java.lang.String renderAlternateText(StartTag startTag)
Renders the alternate text of the specified start tag.
A return value of null indicates that the alternate text is not to be rendered at all.
The default implementation of this method returns null if the alt attribute of the specified start tag is missing or empty, or if the specified start tag is from an AREA element. In all other cases it returns the value of the alt attribute enclosed in square brackets [...].
See the documentation of the setIncludeAlternateText(boolean) method for an example of how alternate text is rendered by the default implementation.
This method can be overridden in a subclass to customise the rendering of alternate text.
Rendering of alternate text can be disabled completely without overriding this method by setting the IncludeAlternateText property to false.
- Example:
To render alternate text with double angle quotation marks instead of square brackets:
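Renderer renderer = new Renderer(segment) {
    @Override
    public String renderAlternateText(StartTag startTag) {
        // Alternate text of AREA elements is never rendered.
        if (HTMLElementName.AREA.equals(startTag.getName())) return null;
        String alt = startTag.getAttributeValue("alt");
        // Skip a missing or empty alt attribute.
        if (alt == null || alt.length() == 0) return null;
        // Enclose the text in double angle quotation marks.
        return '«' + alt + '»';
    }
};
String renderedSegment = renderer.toString();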
- Parameters:
startTag - the start tag containing an alt attribute, must not be null.
- Returns:
the rendered alternate text, or null if the alternate text should not be rendered.
-
setDecorateFontStyles
public Renderer setDecorateFontStyles(boolean decorateFontStyles)
Sets whether decoration characters are to be included around the content of some font style elements and phrase elements.
The default value is false.
Below is a table summarising the decorated elements.
Elements        Character  Example Output
B and STRONG    *          *bold text*
I and EM        /          /italic text/
U               _          _underlined text_
CODE            |          |code|
- Parameters:
decorateFontStyles - specifies whether decoration characters are to be included around the content of some font style elements.
- Returns:
this Renderer instance, allowing multiple property setting methods to be chained in a single statement.
- See Also:
getDecorateFontStyles()
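The decoration table can be expressed as a simple lookup. The following stand-alone sketch only restates the table in code and is not the library's implementation; the class and method names are hypothetical, and leaving other elements undecorated is an assumption.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

final class FontStyleDecorationSketch {
    private static final Map<String, Character> DECORATIONS = new HashMap<String, Character>();
    static {
        DECORATIONS.put("b", '*');
        DECORATIONS.put("strong", '*');
        DECORATIONS.put("i", '/');
        DECORATIONS.put("em", '/');
        DECORATIONS.put("u", '_');
        DECORATIONS.put("code", '|');
    }

    private FontStyleDecorationSketch() {}

    /** Surrounds the content with the decoration character of the element, if it has one. */
    static String decorate(String elementName, String content) {
        Character decoration = DECORATIONS.get(elementName.toLowerCase(Locale.ROOT));
        if (decoration == null) return content; // elements outside the table stay undecorated
        return decoration + content + decoration;
    }
}
```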
-
getDecorateFontStyles
public boolean getDecorateFontStyles()
Indicates whether decoration characters are to be included around the content of some font style elements and phrase elements.
See the setDecorateFontStyles(boolean) method for a full description of this property.
- Returns:
true if decoration characters are to be included around the content of some font style elements, otherwise false.
-
setConvertNonBreakingSpaces
public Renderer setConvertNonBreakingSpaces(boolean convertNonBreakingSpaces)
Sets whether non-breaking space (&nbsp;) character entity references are converted to spaces.
The default value is that of the static Config.ConvertNonBreakingSpaces property at the time the Renderer is instantiated.
- Parameters:
convertNonBreakingSpaces - specifies whether non-breaking space (&nbsp;) character entity references are converted to spaces.
- Returns:
this Renderer instance, allowing multiple property setting methods to be chained in a single statement.
- See Also:
getConvertNonBreakingSpaces()
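The effect of this setting can be pictured on already-decoded text, where a non-breaking space is the character U+00A0. The helper below is a stand-alone sketch with hypothetical names; the library applies the setting while decoding entity references, which this example does not attempt to reproduce.

```java
final class NbspSketch {
    private NbspSketch() {}

    /** Replaces U+00A0 with an ordinary space when conversion is enabled. */
    static String convert(String decodedText, boolean convertNonBreakingSpaces) {
        if (!convertNonBreakingSpaces) return decodedText;
        return decodedText.replace('\u00A0', ' ');
    }
}
```

Keeping the non-breaking spaces may matter when the plain-text output is later wrapped or split on ordinary spaces.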
-
getConvertNonBreakingSpaces
public boolean getConvertNonBreakingSpaces()
Indicates whether non-breaking space (&nbsp;) character entity references are converted to spaces.
See the setConvertNonBreakingSpaces(boolean) method for a full description of this property.
- Returns:
true if non-breaking space (&nbsp;) character entity references are converted to spaces, otherwise false.
-
setBlockIndentSize
public Renderer setBlockIndentSize(int blockIndentSize)
Sets the size of the indent to be used for anything other than LI elements.
At present this applies to BLOCKQUOTE and DD elements.
The default value is 4.
- Parameters:
blockIndentSize - the size of the indent.
- Returns:
this Renderer instance, allowing multiple property setting methods to be chained in a single statement.
- See Also:
getBlockIndentSize()
-
getBlockIndentSize
public int getBlockIndentSize()
Returns the size of the indent to be used for anything other than LI elements.
See the setBlockIndentSize(int) method for a full description of this property.
- Returns:
the size of the indent to be used for anything other than LI elements.
-
setListIndentSize
public Renderer setListIndentSize(int listIndentSize)
Sets the size of the indent to be used for LI elements.
The default value is 6.
This applies to LI elements inside both UL and OL elements.
The bullet or number of the list item is included as part of the indent.
- Parameters:
listIndentSize - the size of the indent.
- Returns:
this Renderer instance, allowing multiple property setting methods to be chained in a single statement.
- See Also:
getListIndentSize()
-
getListIndentSize
public int getListIndentSize()
Returns the size of the indent to be used for LI elements.
See the setListIndentSize(int) method for a full description of this property.
- Returns:
the size of the indent to be used for LI elements.
-
setListBullets
public Renderer setListBullets(char[] listBullets)
Sets the bullet characters to use for list items inside UL elements.
The values in the default array are *, o, + and #.
If the nesting of rendered lists goes deeper than the length of this array, the bullet characters start repeating from the first in the array.
WARNING: If any of the characters in the default array are modified, this will affect all other instances of this class using the default array.
- Parameters:
listBullets - an array of characters to be used as bullets, must have at least one entry.
- Returns:
this Renderer instance, allowing multiple property setting methods to be chained in a single statement.
- See Also:
getListBullets()
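The repeating rule for nested lists, and a way to respect the warning about the shared default array, can be sketched as follows. This is a stand-alone illustration with hypothetical names; the zero-based depth argument is an assumption of the sketch.

```java
final class BulletSketch {
    private BulletSketch() {}

    /** Picks the bullet for a zero-based nesting depth, repeating from the start of the array. */
    static char bulletForDepth(char[] bullets, int depth) {
        if (bullets == null || bullets.length == 0) {
            throw new IllegalArgumentException("at least one bullet character is required");
        }
        return bullets[depth % bullets.length];
    }

    /** Returns a modified copy, leaving the (possibly shared) source array untouched. */
    static char[] withFirstBullet(char[] bullets, char first) {
        char[] copy = bullets.clone();
        copy[0] = first;
        return copy;
    }
}
```

Copying the array returned by getListBullets() before changing it, and passing the copy to setListBullets(char[]), avoids altering the bullets of other instances that still use the default array.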
-
getListBullets
public char[] getListBullets()
Returns the bullet characters to use for list items inside UL elements.
See the setListBullets(char[]) method for a full description of this property.
- Returns:
the bullet characters to use for list items inside UL elements.
-
setIncludeFirstElementTopMargin
public Renderer setIncludeFirstElementTopMargin(boolean includeFirstElementTopMargin)
Sets whether the top margin of the first element is rendered.
The default value is false.
If this property is set to true, then the source "<h1>Heading</h1>" would be rendered as "\r\n\r\nHeading", assuming all other default settings. If this property is false, then the same source would be rendered as "Heading".
Note that the bottom margin of the last element is never rendered.
- Parameters:
includeFirstElementTopMargin - specifies whether the top margin of the first element is rendered.
- Returns:
this Renderer instance, allowing multiple property setting methods to be chained in a single statement.
- See Also:
getIncludeFirstElementTopMargin()
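The heading example can be reproduced with a short stand-alone sketch that treats a top margin as a number of newline strings placed before the block. The names are hypothetical, and the margin of two lines is taken from the "\r\n\r\nHeading" example rather than from the library's source.

```java
final class TopMarginSketch {
    private TopMarginSketch() {}

    /** Prefixes the first block with its top margin only when the flag is set. */
    static String renderFirstBlock(String text, int topMargin,
                                   boolean includeFirstElementTopMargin, String newLine) {
        StringBuilder sb = new StringBuilder();
        if (includeFirstElementTopMargin) {
            for (int i = 0; i < topMargin; i++) sb.append(newLine);
        }
        return sb.append(text).toString();
    }
}
```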
-
getIncludeFirstElementTopMargin
public boolean getIncludeFirstElementTopMargin()
Indicates whether the top margin of the first element is rendered.
See the setIncludeFirstElementTopMargin(boolean) method for a full description of this property.
- Returns:
true if the top margin of the first element is rendered, otherwise false.
-
setTableCellSeparator
public Renderer setTableCellSeparator(java.lang.String tableCellSeparator)
Sets the string that is to separate table cells.
The default value is " \t" (a space followed by a tab).
- Parameters:
tableCellSeparator - the string that is to separate table cells.
- Returns:
this Renderer instance, allowing multiple property setting methods to be chained in a single statement.
- See Also:
getTableCellSeparator()
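As a rough picture of what a cell separator does, the sketch below joins the text of one row's cells with the separator string. It is a stand-alone illustration with hypothetical names and says nothing about how the library lays out rows or nested tables.

```java
import java.util.List;

final class TableRowSketch {
    private TableRowSketch() {}

    /** Joins the cell texts of a single row using the given separator. */
    static String renderRow(List<String> cells, String tableCellSeparator) {
        return String.join(tableCellSeparator, cells);
    }
}
```

A separator such as " | " may be easier to read than the default in output that is viewed where tab stops are not aligned.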
-
getTableCellSeparator
public java.lang.String getTableCellSeparator()
Returns the string that is to separate table cells.
See the setTableCellSeparator(String) method for a full description of this property.
- Returns:
the string that is to separate table cells.
-
setDefaultTopMargin
public static void setDefaultTopMargin(java.lang.String htmlElementName, int topMargin)
Sets the default top margin of an HTML block element with the specified name.
The top margin is the number of blank lines that are to be inserted above the rendered block.
As this is a static method, the setting affects all instances of the Renderer class.
The htmlElementName argument must be one of the following:
ADDRESS, BLOCKQUOTE, CAPTION, CENTER, DD, DIR, DIV, DT, FIELDSET, FORM, H1, H2, H3, H4, H5, H6, HR, LEGEND, LI, MENU, OL, P, PRE, TR, UL
- Parameters:
htmlElementName - (required) the case insensitive name of a supported HTML block element.
topMargin - the new top margin of the specified element.
- Throws:
java.lang.UnsupportedOperationException - if an unsupported element name is specified.
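The contract of the static margin methods (class-wide effect, case insensitive names, UnsupportedOperationException for unsupported names) can be modelled with a small registry. This is a stand-alone sketch, not the library's code: the class name is hypothetical, only two element names are registered, and the starting values are illustrative rather than the library's defaults.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

final class MarginRegistrySketch {
    private static final Map<String, Integer> TOP_MARGINS = new HashMap<String, Integer>();
    static {
        // Illustrative starting values only; they are not claimed to be the library's defaults.
        TOP_MARGINS.put("h1", 2);
        TOP_MARGINS.put("p", 1);
    }

    private MarginRegistrySketch() {}

    /** Changes the shared default, so every user of this class sees the new value. */
    static synchronized void setDefaultTopMargin(String htmlElementName, int topMargin) {
        TOP_MARGINS.put(supportedKey(htmlElementName), topMargin);
    }

    static synchronized int getDefaultTopMargin(String htmlElementName) {
        return TOP_MARGINS.get(supportedKey(htmlElementName));
    }

    /** Normalises the name to lower case and rejects names that are not registered. */
    private static String supportedKey(String htmlElementName) {
        String key = htmlElementName.toLowerCase(Locale.ROOT);
        if (!TOP_MARGINS.containsKey(key)) {
            throw new UnsupportedOperationException("Unsupported element: " + htmlElementName);
        }
        return key;
    }
}
```

Because the state is static, a change made in one part of an application is visible everywhere, so such settings are usually best applied once at start-up.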
-
getDefaultTopMargin
public static int getDefaultTopMargin(java.lang.String htmlElementName)
Returns the default top margin of an HTML block element with the specified name.
See the setDefaultTopMargin(String htmlElementName, int topMargin) method for a full description of this property.
- Parameters:
htmlElementName - (required) the case insensitive name of a supported HTML block element.
- Returns:
the default top margin of an HTML block element with the specified name.
- Throws:
java.lang.UnsupportedOperationException - if an unsupported element name is specified.
-
setDefaultBottomMargin
public static void setDefaultBottomMargin(java.lang.String htmlElementName, int bottomMargin)
Sets the default bottom margin of an HTML block element with the specified name.
The bottom margin is the number of blank lines that are to be inserted below the rendered block.
As this is a static method, the setting affects all instances of the Renderer class.
The htmlElementName argument must be one of the following:
ADDRESS, BLOCKQUOTE, CAPTION, CENTER, DD, DIR, DIV, DT, FIELDSET, FORM, H1, H2, H3, H4, H5, H6, HR, LEGEND, LI, MENU, OL, P, PRE, TR, UL
- Parameters:
htmlElementName - (required) the case insensitive name of a supported HTML block element.
bottomMargin - the new bottom margin of the specified element.
- Throws:
java.lang.UnsupportedOperationException - if an unsupported element name is specified.
-
getDefaultBottomMargin
public static int getDefaultBottomMargin(java.lang.String htmlElementName)
Returns the default bottom margin of an HTML block element with the specified name.
See the setDefaultBottomMargin(String htmlElementName, int bottomMargin) method for a full description of this property.
- Parameters:
htmlElementName - (required) the case insensitive name of a supported HTML block element.
- Returns:
the default bottom margin of an HTML block element with the specified name.
- Throws:
java.lang.UnsupportedOperationException - if an unsupported element name is specified.
-
setDefaultIndent
public static void setDefaultIndent(java.lang.String htmlElementName, boolean indent)
Sets the default value of whether an HTML block element of the specified name is indented.
As this is a static method, the setting affects all instances of the Renderer class.
The htmlElementName argument must be one of the following:
ADDRESS, BLOCKQUOTE, CAPTION, CENTER, DD, DIR, DIV, DT, FIELDSET, FORM, H1, H2, H3, H4, H5, H6, HR, LEGEND, MENU, OL, P, PRE, TR, UL
- Parameters:
htmlElementName - (required) the case insensitive name of a supported HTML block element.
indent - whether the specified element is indented.
- Throws:
java.lang.UnsupportedOperationException - if an unsupported element name is specified.
-
isDefaultIndent
public static boolean isDefaultIndent(java.lang.String htmlElementName)
Returns the default value of whether an HTML block element of the specified name is indented.
See the setDefaultIndent(String htmlElementName, boolean indent) method for a full description of this property.
- Parameters:
htmlElementName - (required) the case insensitive name of a supported HTML block element.
- Returns:
the default value of whether an HTML block element of the specified name is indented.
- Throws:
java.lang.UnsupportedOperationException - if an unsupported element name is specified.
-
-