org.gjt.xpp.impl.tokenizer
public class Tokenizer extends Object
Field Summary | |
---|---|
static byte | ATTR_CHARACTERS |
static byte | ATTR_CONTENT |
static byte | ATTR_NAME |
char[] | buf |
static byte | CDSECT |
static byte | CHARACTERS |
static byte | CHAR_REF |
static byte | COMMENT |
static byte | CONTENT |
static byte | DOCTYPE |
static byte | EMPTY_ELEMENT |
static byte | END_DOCUMENT |
static byte | ENTITY_REF |
static byte | ETAG_NAME |
protected static boolean[] | lookupNameChar |
protected static boolean[] | lookupNameStartChar |
protected static int | LOOKUP_MAX |
protected static char | LOOKUP_MAX_CHAR |
int | nsColonCount |
boolean | paramNotifyAttValue |
boolean | paramNotifyCDSect |
boolean | paramNotifyCharacters |
boolean | paramNotifyCharRef |
boolean | paramNotifyComment |
boolean | paramNotifyDoctype |
boolean | paramNotifyEntityRef |
boolean | paramNotifyPI |
boolean | parsedContent
This flag decides which buffer will be used to retrieve
content for the current token. |
char[] | pc
This is the buffer for parsed content, such as the
actual value of an entity
('&lt;' in buf but in pc it is '<') |
int | pcEnd |
int | pcStart
Range [pcStart, pcEnd) defines the part of pc that is the content
of the current token iff parsedContent == true |
int | pos
Position of the next char that will be read from the buffer |
int | posEnd |
int | posNsColon |
int | posStart
Range [posStart, posEnd) defines the part of buf that is the content
of the current token iff parsedContent == false |
static byte | PI |
boolean | seenContent |
static byte | STAG_END |
static byte | STAG_NAME |
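The interaction between parsedContent and the two buffer ranges above can be illustrated with a small self-contained sketch. The helper below is hypothetical and not part of the Tokenizer API: when parsedContent is false the token text is the raw slice [posStart, posEnd) of buf, and when it is true the text is the parsed slice [pcStart, pcEnd) of pc, where entity references have been resolved (e.g. `&lt;` occupies four chars in buf but one char '<' in pc).

```java
// Hypothetical illustration of the double-buffer scheme described above;
// not part of org.gjt.xpp. buf holds raw input, pc holds parsed content.
public class BufferSelect {
    static String tokenText(boolean parsedContent,
                            char[] buf, int posStart, int posEnd,
                            char[] pc, int pcStart, int pcEnd) {
        // parsedContent == true  -> content came through pc (entities resolved)
        // parsedContent == false -> content is a raw slice of buf
        return parsedContent
                ? new String(pc, pcStart, pcEnd - pcStart)
                : new String(buf, posStart, posEnd - posStart);
    }

    public static void main(String[] args) {
        char[] buf = "a&lt;b".toCharArray(); // raw input as read
        char[] pc  = "a<b".toCharArray();    // same content, entity resolved
        // Raw slice of buf: the entity reference is still escaped.
        System.out.println(tokenText(false, buf, 0, buf.length, pc, 0, pc.length));
        // Parsed slice of pc: the entity has been replaced by '<'.
        System.out.println(tokenText(true, buf, 0, buf.length, pc, 0, pc.length));
    }
}
```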
Constructor Summary | |
---|---|
Tokenizer() |
Method Summary | |
---|---|
int | getBufferShrinkOffset() |
int | getColumnNumber() |
int | getHardLimit() |
int | getLineNumber() |
String | getPosDesc()
Return a string describing the current position of the parser as
text 'at line %d (row) and column %d (column) [seen %s...]'. |
int | getSoftLimit() |
boolean | isAllowedMixedContent() |
boolean | isBufferShrinkable() |
protected boolean | isNameChar(char ch) |
protected boolean | isNameStartChar(char ch) |
protected boolean | isS(char ch)
Determine if ch is whitespace (XML production [3] S) |
byte | next()
Return the next recognized token or END_DOCUMENT if no more input. |
void | reset() |
void | setAllowedMixedContent(boolean enable)
Set support for mixed content. |
void | setBufferShrinkable(boolean shrinkable) |
void | setHardLimit(int value)
Set the hard limit on the internal buffer size. |
void | setInput(Reader r)
Reset the tokenizer state and set a new input source. |
void | setInput(char[] data)
Reset the tokenizer state and set a new input source. |
void | setInput(char[] data, int off, int len) |
void | setNotifyAll(boolean enable)
Set notification of all XML content tokens:
Characters, Comment, CDSect, Doctype, PI, EntityRef, CharRef and
AttValue (tokens for STag, ETag and Attribute are always sent). |
void | setParseContent(boolean enable)
Allow reporting parsed content for element content
and attribute content (no need to deal with low-level
tokens such as in setNotifyAll). |
void | setSoftLimit(int value)
Set the soft limit on the internal buffer size. |
This is a simple automaton (in pseudo-code):

```java
byte next() {
    while (state != END_DOCUMENT) {
        ch = more();              // read character from input
        state = func(ch, state);  // do transition
        if (state is accepting)
            return state;         // return token to caller
    }
}
```

For speed (and simplicity?) it uses a few helper procedures such as readName() or isS().
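The driver loop above can be made concrete with a self-contained toy automaton. This is a hypothetical illustration of the pattern, not XPP's real state machine: it tokenizes space-separated words, performing a transition per character and returning to the caller as soon as an accepting state (here, CHARACTERS) is reached.

```java
// Toy version of the next() automaton sketched above; the states and
// transitions are invented for illustration, not XPP's actual ones.
public class ToyAutomaton {
    static final byte RUNNING = 0, CHARACTERS = 1, END_DOCUMENT = 2;

    private final char[] input;
    private int pos = 0;                 // next char to be read
    private int tokenStart, tokenEnd;    // [tokenStart, tokenEnd) of last token

    ToyAutomaton(String s) { input = s.toCharArray(); }

    // read the next character, or -1 on end of input (plays the role of more())
    private int more() { return pos < input.length ? input[pos++] : -1; }

    // text of the most recent CHARACTERS token
    String text() { return new String(input, tokenStart, tokenEnd - tokenStart); }

    // return the next recognized token or END_DOCUMENT if no more input
    byte next() {
        tokenStart = pos;
        byte state = RUNNING;
        while (state != END_DOCUMENT) {
            int ch = more();                             // read character
            if (ch == -1) {                              // do transition
                state = (pos > tokenStart) ? CHARACTERS : END_DOCUMENT;
            } else if (ch == ' ') {
                state = CHARACTERS;
            }
            if (state == CHARACTERS) {                   // accepting state
                tokenEnd = (ch == ' ') ? pos - 1 : pos;  // exclude the space
                return state;                            // return to caller
            }
        }
        return END_DOCUMENT;
    }

    public static void main(String[] args) {
        ToyAutomaton t = new ToyAutomaton("hello world");
        while (t.next() != END_DOCUMENT) {
            System.out.println(t.text());                // prints each word
        }
    }
}
```

As in the real tokenizer, each call to next() resumes reading where the previous call stopped, so the caller sees a stream of tokens terminated by END_DOCUMENT.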