com.lowagie.text.xml.simpleparser

Class SimpleXMLParser

public final class SimpleXMLParser extends Object

A simple XML and HTML parser. Like the SAX parser, it is event-based, but it offers far less functionality.

The parser can:

Field Summary
static int ATTRIBUTE_EQUAL
static int ATTRIBUTE_KEY
static int ATTRIBUTE_VALUE
String attributekey
the attribute key.
HashMap attributes
current attributes
String attributevalue
the attribute value.
int character
The current character.
int columns
the column where the current character occurs
SimpleXMLDocHandlerComment comment
The handler to which we are going to forward comments.
static int CDATA
static int COMMENT
SimpleXMLDocHandler doc
The handler to which we are going to forward document content
StringBuffer entity
current entity (whatever is encountered between & and ;)
boolean eol
was the last character equivalent to a newline?
static int ENTITY
static int EXAMIN_TAG
boolean html
Are we parsing HTML?
static int IN_CLOSETAG
int lines
the line we are currently reading
int nested
Keeps track of the number of tags that are open.
boolean nowhite
A boolean indicating if the next character should be taken into account if it's a space character.
int previousCharacter
The previous character.
static int PI
int quoteCharacter
the quote character that was used to open the quote.
static int QUOTE
Stack stack
the state stack
int state
the current state
static int SINGLE_TAG
String tag
current tagname
StringBuffer text
current text (whatever is encountered between tags)
static int TAG_ENCOUNTERED
static int TAG_EXAMINED
static int TEXT
static int UNKNOWN
possible states
Constructor Summary
SimpleXMLParser(SimpleXMLDocHandler doc, SimpleXMLDocHandlerComment comment, boolean html)
Creates a Simple XML parser object.
Method Summary
void doTag()
Sets the name of the tag.
static String escapeXML(String s, boolean onlyASCII)
Escapes a string with the appropriate XML codes.
void flush()
Flushes the text that is currently in the buffer.
static String getDeclaredEncoding(String decl)
static String getEncodingName(byte[] b4)
Returns the IANA encoding name that is auto-detected from the bytes specified, with the endian-ness of that encoding where appropriate. (method found in org.apache.xerces.impl.XMLEntityManager, originally published by the Apache Software Foundation under the Apache Software License; now being used in iText under the MPL)
void go(Reader r)
Does the actual parsing.
void initTag()
Initializes the tag name and attributes.
static void parse(SimpleXMLDocHandler doc, SimpleXMLDocHandlerComment comment, Reader r, boolean html)
Parses the XML document firing the events to the handler.
static void parse(SimpleXMLDocHandler doc, InputStream in)
Parses the XML document firing the events to the handler.
static void parse(SimpleXMLDocHandler doc, Reader r)
void processTag(boolean start)
Processes the tag.
int restoreState()
Gets a state from the stack.
void saveState(int s)
Adds a state to the stack.
void throwException(String s)
Throws an exception.

Field Detail

ATTRIBUTE_EQUAL

private static final int ATTRIBUTE_EQUAL

ATTRIBUTE_KEY

private static final int ATTRIBUTE_KEY

ATTRIBUTE_VALUE

private static final int ATTRIBUTE_VALUE

attributekey

String attributekey
the attribute key.

attributes

HashMap attributes
current attributes

attributevalue

String attributevalue
the attribute value.

character

int character
The current character.

columns

int columns
the column where the current character occurs

comment

SimpleXMLDocHandlerComment comment
The handler to which we are going to forward comments.

CDATA

private static final int CDATA

COMMENT

private static final int COMMENT

doc

SimpleXMLDocHandler doc
The handler to which we are going to forward document content

entity

StringBuffer entity
current entity (whatever is encountered between & and ;)
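To illustrate what happens with the text collected between & and ;, here is a minimal sketch of entity decoding. It is not the parser's own code: the class EntityDecodeSketch and its method decode are hypothetical names, the sketch assumes only the five predefined XML entities plus decimal and hexadecimal numeric references, and it needs Java 9 or later for Map.of.

```java
import java.util.Map;

// Hypothetical helper, not part of SimpleXMLParser.
final class EntityDecodeSketch {
    // The five entities predefined by XML (assumed set for this sketch).
    private static final Map<String, Character> NAMED = Map.of(
            "amp", '&', "lt", '<', "gt", '>', "quot", '"', "apos", '\'');

    // 'entity' is the text between '&' and ';'.
    // Returns the decoded text, or null when the entity is not recognised.
    static String decode(String entity) {
        if (entity.startsWith("#x") || entity.startsWith("#X")) {
            return fromCodePoint(entity.substring(2), 16); // hexadecimal reference
        }
        if (entity.startsWith("#")) {
            return fromCodePoint(entity.substring(1), 10); // decimal reference
        }
        Character c = NAMED.get(entity);
        return c == null ? null : String.valueOf(c);
    }

    private static String fromCodePoint(String digits, int radix) {
        try {
            int cp = Integer.parseInt(digits, radix);
            return Character.isValidCodePoint(cp) ? new String(Character.toChars(cp)) : null;
        } catch (NumberFormatException e) {
            return null; // malformed number
        }
    }
}
```

Returning null for unknown names leaves the caller free to decide whether to pass the raw text through, which an HTML mode might prefer, or to report an error.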

eol

boolean eol
was the last character equivalent to a newline?

ENTITY

private static final int ENTITY

EXAMIN_TAG

private static final int EXAMIN_TAG

html

boolean html
Are we parsing HTML?

IN_CLOSETAG

private static final int IN_CLOSETAG

lines

int lines
the line we are currently reading
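The fields columns, eol and lines together describe position tracking with newline handling. The sketch below shows one way such bookkeeping can work, assuming that \r\n and a lone \r are both reported as \n and that eol is set only after a \r. The class PositionTrackerSketch and its method accept are hypothetical names, not the parser's actual code.

```java
// Hypothetical helper, not part of SimpleXMLParser.
final class PositionTrackerSketch {
    int lines = 1;       // line currently being read
    int columns = 0;     // column of the current character
    boolean eol = false; // true when the previous character was '\r'

    // Returns the character to process, or -1 when it must be skipped
    // (the '\n' of a "\r\n" pair, which was already counted at the '\r').
    int accept(int ch) {
        if (ch == '\n' && eol) {
            eol = false;
            return -1;
        }
        if (ch == '\r' || ch == '\n') {
            eol = (ch == '\r');
            lines++;
            columns = 0;
            return '\n'; // every line ending is normalized to '\n'
        }
        eol = false;
        columns++;
        return ch;
    }
}
```

Counting the line at the \r rather than at the following \n keeps the line number correct for all three line-ending styles without looking ahead in the stream.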

nested

int nested
Keeps track of the number of tags that are open.

nowhite

boolean nowhite
A boolean indicating if the next character should be taken into account if it's a space character. When nowhite is false, the previous character wasn't whitespace.

Since: 2.1.5

previousCharacter

int previousCharacter
The previous character.

PI

private static final int PI

quoteCharacter

int quoteCharacter
the quote character that was used to open the quote.

QUOTE

private static final int QUOTE

stack

Stack stack
the state stack

state

int state
the current state

SINGLE_TAG

private static final int SINGLE_TAG

tag

String tag
current tagname

text

StringBuffer text
current text (whatever is encountered between tags)

TAG_ENCOUNTERED

private static final int TAG_ENCOUNTERED

TAG_EXAMINED

private static final int TAG_EXAMINED

TEXT

private static final int TEXT

UNKNOWN

private static final int UNKNOWN
possible states

Constructor Detail

SimpleXMLParser

private SimpleXMLParser(SimpleXMLDocHandler doc, SimpleXMLDocHandlerComment comment, boolean html)
Creates a Simple XML parser object. Call go(Reader) immediately after creation.

Method Detail

doTag

private void doTag()
Sets the name of the tag.

escapeXML

public static String escapeXML(String s, boolean onlyASCII)
Escapes a string with the appropriate XML codes.

Parameters:
s - the string to be escaped
onlyASCII - if true, codes above 127 are always escaped as &#nn;

Returns: the escaped string
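The following is a minimal sketch of the kind of escaping described above, not the library's implementation. The class XmlEscapeSketch and its method escapeXml are hypothetical names. The sketch assumes that the five characters <, >, &, " and ' are replaced by their predefined entities and that, when onlyAscii is true, every code point above 127 becomes a decimal numeric reference.

```java
// Hypothetical stand-in for illustration; not SimpleXMLParser.escapeXML itself.
final class XmlEscapeSketch {
    static String escapeXml(String s, boolean onlyAscii) {
        StringBuilder sb = new StringBuilder(s.length() + 16);
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);       // read a full code point, not half a surrogate pair
            i += Character.charCount(cp);
            switch (cp) {
                case '<':  sb.append("&lt;");   break;
                case '>':  sb.append("&gt;");   break;
                case '&':  sb.append("&amp;");  break;
                case '"':  sb.append("&quot;"); break;
                case '\'': sb.append("&apos;"); break;
                default:
                    if (onlyAscii && cp > 127) {
                        sb.append("&#").append(cp).append(';'); // decimal numeric reference
                    } else {
                        sb.appendCodePoint(cp);
                    }
            }
        }
        return sb.toString();
    }
}
```

With onlyAscii set to true the result contains only ASCII characters, so it can be written safely whatever encoding the output uses.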

flush

private void flush()
Flushes the text that is currently in the buffer. Depending on the current state, the text is ignored, added to the document as content, or added as a comment.

getDeclaredEncoding

private static String getDeclaredEncoding(String decl)

getEncodingName

private static String getEncodingName(byte[] b4)
Returns the IANA encoding name that is auto-detected from the bytes specified, with the endian-ness of that encoding where appropriate. (method found in org.apache.xerces.impl.XMLEntityManager, originally published by the Apache Software Foundation under the Apache Software License; now being used in iText under the MPL)

Parameters:
b4 - the first four bytes of the input

Returns: an IANA-encoding string
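Auto-detection from the first four bytes can be sketched as follows. This is a simplified illustration, not the Xerces or iText code: the class EncodingSniffSketch and its method guessEncoding are hypothetical names, only byte order marks and the byte patterns of "<?" in UTF-16 are checked, UCS-4 and EBCDIC are left out, and UTF-8 is assumed as the fallback.

```java
// Hypothetical helper, not part of SimpleXMLParser.
final class EncodingSniffSketch {
    static String guessEncoding(byte[] b4) {
        if (b4 == null || b4.length < 2) return "UTF-8"; // assumed default
        int b0 = b4[0] & 0xFF, b1 = b4[1] & 0xFF;
        if (b0 == 0xFE && b1 == 0xFF) return "UTF-16BE"; // big-endian byte order mark
        if (b0 == 0xFF && b1 == 0xFE) return "UTF-16LE"; // little-endian byte order mark
        if (b4.length < 3) return "UTF-8";
        int b2 = b4[2] & 0xFF;
        if (b0 == 0xEF && b1 == 0xBB && b2 == 0xBF) return "UTF-8"; // UTF-8 byte order mark
        if (b4.length < 4) return "UTF-8";
        int b3 = b4[3] & 0xFF;
        // No byte order mark: look for "<?" encoded in UTF-16.
        if (b0 == 0x00 && b1 == 0x3C && b2 == 0x00 && b3 == 0x3F) return "UTF-16BE";
        if (b0 == 0x3C && b1 == 0x00 && b2 == 0x3F && b3 == 0x00) return "UTF-16LE";
        return "UTF-8";
    }
}
```

Masking each byte with 0xFF matters because Java bytes are signed; without it, comparisons against values such as 0xFE would never match.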

go

private void go(Reader r)
Does the actual parsing. Perform this immediately after creating the parser object.

initTag

private void initTag()
Initializes the tag name and attributes.

parse

public static void parse(SimpleXMLDocHandler doc, SimpleXMLDocHandlerComment comment, Reader r, boolean html)
Parses the XML document firing the events to the handler.

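Parameters:
doc - the document handler
comment - the handler to which comments are forwarded
r - the document. The encoding is already resolved. The reader is not closed
html - true if the input is to be parsed as HTML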

Throws: IOException on error

parse

public static void parse(SimpleXMLDocHandler doc, InputStream in)
Parses the XML document firing the events to the handler.

Parameters:
doc - the document handler
in - the document. The encoding is deduced from the stream. The stream is not closed

Throws: IOException on error
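Deducing the encoding from a stream usually involves reading the encoding named in the XML declaration, once enough bytes have been decoded to read it. The sketch below shows that step in isolation. It is an assumption about the general technique, not a description of what this class does internally; the class DeclaredEncodingSketch and its method declaredEncoding are hypothetical names.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of SimpleXMLParser.
final class DeclaredEncodingSketch {
    // encoding = "name" or encoding = 'name'; the closing quote must match the opening one.
    private static final Pattern ENCODING = Pattern.compile(
            "encoding\\s*=\\s*([\"'])([A-Za-z][A-Za-z0-9._-]*)\\1");

    // Returns the declared encoding name, or null when none is declared.
    static String declaredEncoding(String decl) {
        if (decl == null) return null;
        Matcher m = ENCODING.matcher(decl);
        return m.find() ? m.group(2) : null;
    }
}
```

A caller could then wrap the stream in an InputStreamReader using the returned name, falling back to a guess based on the first bytes when the result is null.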

parse

public static void parse(SimpleXMLDocHandler doc, Reader r)

processTag

private void processTag(boolean start)
Processes the tag.

Parameters:
start - if true, we are dealing with a tag that has just been opened; if false, we are closing a tag

restoreState

private int restoreState()
Gets a state from the stack.

Returns: the previous state

saveState

private void saveState(int s)
Adds a state to the stack.

Parameters:
s - a state to add to the stack
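The pairing of saveState and restoreState can be pictured as a plain stack of integer state codes. The sketch below is an illustration under assumptions, not the parser's code: the class StateStackSketch is a hypothetical name, the numeric values of the constants are invented, and falling back to UNKNOWN on an empty stack is an assumed behavior.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical helper, not part of SimpleXMLParser.
final class StateStackSketch {
    // Invented values; the real constants are private to the parser.
    static final int UNKNOWN = 0;
    static final int TEXT = 1;
    static final int ENTITY = 2;

    private final Deque<Integer> stack = new ArrayDeque<>();

    // Remembers a state so the parser can return to it later.
    void saveState(int s) {
        stack.push(s);
    }

    // Returns the most recently saved state; UNKNOWN if nothing was saved (assumption).
    int restoreState() {
        return stack.isEmpty() ? UNKNOWN : stack.pop();
    }
}
```

A typical use would be to save TEXT when an & is met, switch to ENTITY, and restore TEXT once the closing ; has been read.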

throwException

private void throwException(String s)
Throws an exception.