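java.lang.Object
  com.google.gdata.util.parser.Parser<T>
Type Parameters:
T - the type of the user-defined data (udata) passed to parse and on to Callback.handle
public abstract class Parser<T>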
The com.google.gdata.util.parser package provides a framework for creating recursive descent parsers. A fairly straightforward transformation exists between an EBNF (extended Backus-Naur form) grammar and the code that uses this framework to construct a parser matching that grammar. The package is intended to remove the need to write small ad hoc parsers for tasks that don't feel large enough for a standard compiler-compiler (such as JavaCC) but still need more formalism than simple string tokenization.
The basic approach is to define several types of leaf parsers, each of which knows how to parse a particular type of object (a character set, a string literal, etc.), and then to combine them in interesting ways. For example, a parser for a comma-separated list of integers looks like:
Parser<Object> parser = Chset.DIGIT.plus().list(new Chset(','));
The EBNF this represents is:
sent: [0-9]+ ("," [0-9]+)*
The Parser.list() functionality is sometimes represented as the '%' operator in EBNF extensions. It performs the transformation:
a % b --> a (b a)*
The leaf parsers that are currently defined are:
Chset (character sets) and Strlit (string literals).
The operators that combine one or two parsers include Action, Alternative, Difference, Intersection, Repeat, Rule, and Sequence.
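In general, it isn't necessary to create an operator-type parser directly, because Parser usually has a member function that creates the appropriate one (for example, star() creates a Repeat parser and alternative() creates an Alternative parser). These member functions are purely convenience routines.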
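To make the EBNF-to-code mapping concrete, here is a small, self-contained Java sketch of the comma-separated integer example. It does not use the gdata classes: the interface P, the helper class Chsets, the NO_MATCH value of -1, and the convention that parse returns the number of characters matched are all assumptions made for illustration. list() is written directly from the a % b --> a (b a)* transformation.

```java
/** Hypothetical stand-in for Parser<T> (user data omitted); NOT the gdata API. */
interface P {
    int NO_MATCH = -1; // assumed sentinel value for "no match"

    /** Returns the number of chars matched from buf[start..end), or NO_MATCH. */
    int parse(char[] buf, int start, int end);

    /** Convenience: parse a whole string. */
    default int parse(String s) {
        char[] buf = s.toCharArray();
        return parse(buf, 0, buf.length);
    }

    /** this+ : match greedily one or more times. */
    default P plus() {
        P self = this;
        return (buf, start, end) -> {
            int pos = start, count = 0, n;
            while ((n = self.parse(buf, pos, end)) != NO_MATCH) {
                pos += n;
                count++;
                if (n == 0) break; // an empty match would repeat forever
            }
            return count > 0 ? pos - start : NO_MATCH;
        };
    }

    /** this % sep, i.e. this (sep this)* */
    default P list(P sep) {
        P self = this;
        return (buf, start, end) -> {
            int first = self.parse(buf, start, end);
            if (first == NO_MATCH) return NO_MATCH;
            int pos = start + first;
            while (true) {
                int s = sep.parse(buf, pos, end);
                if (s == NO_MATCH) break;
                int item = self.parse(buf, pos + s, end);
                // A separator that is not followed by an item is left unconsumed.
                if (item == NO_MATCH || s + item == 0) break;
                pos += s + item;
            }
            return pos - start;
        };
    }
}

/** Hypothetical single-character leaf parsers standing in for Chset. */
final class Chsets {
    static P range(char lo, char hi) {
        return (buf, start, end) ->
                start < end && buf[start] >= lo && buf[start] <= hi ? 1 : P.NO_MATCH;
    }

    static P ch(char c) {
        return range(c, c);
    }

    static final P DIGIT = range('0', '9');
}
```

Here Chsets.DIGIT.plus().list(Chsets.ch(',')) plays the role of Chset.DIGIT.plus().list(new Chset(',')). On "12," the sketch matches only "12", because the trailing separator is not followed by another item.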
In general, the parsers are greedy. For example, the Sequence parser matches as much as possible with the left sub-parser before trying the right sub-parser, and it does not give characters back to the left sub-parser if the right one then fails. This behavior can normally be avoided by using a recursive grammar. Consider the following grammar:
token: foo* bar
foo: [a-z]+
bar: foo [0-9]+
This grammar fails to parse the string "aa0" because the 'foo*' rule consumes all of the letters and leaves none for the 'bar' rule, which needs at least one letter before its digits. An alternate definition of 'token' avoids this behavior:
token: (foo token) | bar
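The difference between the two definitions of 'token' can be reproduced with a small greedy combinator sketch. Everything here (the interface G, the class Grammar, the ref() helper, and NO_MATCH = -1) is hypothetical; ref() is only assumed to play roughly the part a Rule plays in letting a rule refer to itself. The sketch mirrors the behavior described above: repeats and sequences never give characters back, while an alternative retries its right branch from the original start position.

```java
import java.util.function.Supplier;

/** Hypothetical minimal parser type; NOT the gdata API. */
interface G {
    int NO_MATCH = -1; // assumed sentinel value

    /** Returns the number of chars matched from buf[start..end), or NO_MATCH. */
    int parse(char[] buf, int start, int end);
}

final class Grammar {
    static G range(char lo, char hi) {
        return (buf, s, e) -> s < e && buf[s] >= lo && buf[s] <= hi ? 1 : G.NO_MATCH;
    }

    /** p* : greedy; never gives characters back. */
    static G star(G p) {
        return (buf, s, e) -> {
            int pos = s, n;
            while ((n = p.parse(buf, pos, e)) > 0) pos += n;
            return pos - s;
        };
    }

    /** a b : a's single result is kept; a is never retried with a shorter match. */
    static G seq(G a, G b) {
        return (buf, s, e) -> {
            int x = a.parse(buf, s, e);
            if (x == G.NO_MATCH) return G.NO_MATCH;
            int y = b.parse(buf, s + x, e);
            return y == G.NO_MATCH ? G.NO_MATCH : x + y;
        };
    }

    static G plus(G p) {
        return seq(p, star(p));
    }

    /** a | b : if a fails, b is tried from the same start position. */
    static G alt(G a, G b) {
        return (buf, s, e) -> {
            int x = a.parse(buf, s, e);
            return x != G.NO_MATCH ? x : b.parse(buf, s, e);
        };
    }

    /** Late-bound reference so a rule can mention itself. */
    static G ref(Supplier<G> target) {
        return (buf, s, e) -> target.get().parse(buf, s, e);
    }

    static final G FOO = plus(range('a', 'z'));             // foo: [a-z]+
    static final G BAR = seq(FOO, plus(range('0', '9')));   // bar: foo [0-9]+
    static final G GREEDY_TOKEN = seq(star(FOO), BAR);      // token: foo* bar
    static G recursiveToken;                                // token: (foo token) | bar

    static {
        recursiveToken = alt(seq(FOO, ref(() -> recursiveToken)), BAR);
    }

    static int run(G p, String text) {
        char[] buf = text.toCharArray();
        return p.parse(buf, 0, buf.length);
    }
}
```

On "aa0", star(FOO) swallows "aa" and BAR then fails at "0". The recursive form fails its (foo token) branch and then retries bar from position 0, which matches all three characters.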
The parsers created by this framework use unbounded (infinite) lookahead. In extreme cases, it is possible to construct a parser that scans over the parse buffer many times while trying to find a match. In practice, this rarely happens.
Field Summary | |
---|---|
static int | NO_MATCH |
Constructor Summary | |
---|---|
Parser() | |
Method Summary | | |
---|---|---|
<U extends T> Parser<U> | action(Callback<U> callback) | Creates an Action that will fire and call Callback.handle whenever this matches. |
static <T> Parser<T> | alternative(Parser<? super T> left, Parser<? super T> right) | Creates an Alternative parser from the left and right sub-parsers. |
static <T> Parser<T> | difference(Parser<? super T> left, Parser<? super T> right) | Creates a Difference parser from the left and right sub-parsers. |
static <T> Parser<T> | intersection(Parser<? super T> left, Parser<? super T> right) | Creates an Intersection parser from the left and right sub-parsers. |
Parser<T> | list(Parser<? super T> sep) | Creates a Parser that matches a sequence of matches of this, separated by matches of sep. |
Parser<T> | optional() | Creates a Repeat parser that matches this either 0 or 1 times. |
abstract int | parse(char[] buf, int start, int end, T udata) | The parse interface that subclasses must implement. |
int | parse(char[] buf, T udata) | Convenience routine to parse a character array. |
int | parse(java.io.Reader reader, T udata) | Convenience routine to parse a java.io.Reader. |
int | parse(java.lang.String str, T udata) | Convenience routine to parse a string. |
Parser<T> | plus() | Creates a Repeat parser that matches this 1 or more times. |
Parser<T> | repeat(int count) | Creates a Repeat parser that matches this exactly count times. |
Parser<T> | repeat(int min, int max) | Creates a Repeat parser that matches this at least min times and not more than max times. |
static <T> Parser<T> | sequence(Parser<? super T> left, Parser<? super T> right) | Creates a Sequence parser from the left and right sub-parsers. |
static <T> Parser<T> | sequence(Parser<? super T> one, Parser<? super T> two, Parser<? super T> three) | Creates a Sequence parser from the sub-parsers one, two, and three. |
static <T> Parser<T> | sequence(Parser<? super T> one, Parser<? super T> two, Parser<? super T> three, Parser<? super T> four) | Creates a sequence of four parsers. |
static <T> Parser<T> | sequence(Parser<? super T> one, Parser<? super T> two, Parser<? super T> three, Parser<? super T> four, Parser<? super T> five) | Creates a sequence of five parsers. |
Parser<T> | star() | Creates a Repeat parser that matches this 0 or more times. |
Methods inherited from class java.lang.Object |
---|
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
Field Detail |
---|
public static final int NO_MATCH
Constructor Detail |
---|
public Parser()
Method Detail |
---|
public abstract int parse(char[] buf, int start, int end, T udata)
The parse interface that subclasses must implement.
Parameters:
buf - The character array to match against.
start - The start offset of data within the character array to match against.
end - The end offset of data within the character array to match against.
udata - User-defined object that is passed to Callback.handle when an Action fires.
public final int parse(char[] buf, T udata)
Convenience routine to parse a character array.
public final int parse(java.lang.String str, T udata)
Convenience routine to parse a string.
public final int parse(java.io.Reader reader, T udata)
Convenience routine to parse a java.io.Reader.
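The three convenience overloads all funnel into the one abstract method that subclasses implement. Below is a hypothetical sketch of that layout. The names SketchParser and LiteralParser, NO_MATCH = -1, the exclusive end offset, the "characters matched" return value, and the choice to read the whole Reader into memory and rethrow I/O errors as UncheckedIOException are all assumptions, not details taken from the gdata source.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;

/** Hypothetical sketch of the overload layout; NOT the gdata source. */
abstract class SketchParser<T> {
    static final int NO_MATCH = -1; // assumed sentinel value

    /** The single method subclasses implement; end is treated as exclusive here. */
    abstract int parse(char[] buf, int start, int end, T udata);

    /** Parses the whole array. */
    final int parse(char[] buf, T udata) {
        return parse(buf, 0, buf.length, udata);
    }

    /** Parses the whole string via its character array. */
    final int parse(String str, T udata) {
        return parse(str.toCharArray(), udata);
    }

    /** Reads the entire Reader into memory first (an assumption of this sketch). */
    final int parse(Reader reader, T udata) {
        StringBuilder sb = new StringBuilder();
        char[] chunk = new char[4096];
        try {
            int n;
            while ((n = reader.read(chunk)) != -1) {
                sb.append(chunk, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return parse(sb.toString(), udata);
    }
}

/** Example subclass: matches a literal and counts successful matches in udata. */
final class LiteralParser extends SketchParser<int[]> {
    private final String lit;

    LiteralParser(String lit) {
        this.lit = lit;
    }

    @Override
    int parse(char[] buf, int start, int end, int[] udata) {
        if (end - start < lit.length()) return NO_MATCH;
        for (int i = 0; i < lit.length(); i++) {
            if (buf[start + i] != lit.charAt(i)) return NO_MATCH;
        }
        udata[0]++; // record the match in the caller-supplied user data
        return lit.length();
    }
}
```

The design keeps the convenience routines final, so a subclass only has to get the offset-based method right.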
public final Parser<T> repeat(int count)
Creates a Repeat parser that matches this exactly count times.
EBNF: this{count}
Parameters:
count - The number of times this must match in sequence.
public final Parser<T> repeat(int min, int max)
Creates a Repeat parser that matches this at least min times and not more than max times.
EBNF: this{min,max}
Parameters:
min - The minimum number of times this must match in sequence.
max - The maximum number of times this is allowed to match in sequence.
public final Parser<T> star()
Creates a Repeat parser that matches this 0 or more times.
EBNF: this*
public final Parser<T> plus()
Creates a Repeat parser that matches this 1 or more times.
EBNF: this+
public final Parser<T> optional()
Creates a Repeat parser that matches this either 0 or 1 times.
EBNF: this?
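All five repetition methods above build the same kind of Repeat parser with different bounds. The hypothetical sketch below (the interface R, UNBOUNDED, and NO_MATCH = -1 are invented for illustration) expresses repeat(count), star(), plus() and optional() in terms of a single greedy repeat(min, max), mirroring the EBNF forms this{count}, this*, this+ and this?.

```java
/** Hypothetical minimal parser type; NOT the gdata API. */
interface R {
    int NO_MATCH = -1;                 // assumed sentinel value
    int UNBOUNDED = Integer.MAX_VALUE; // stands in for "no upper limit"

    /** Returns the number of chars matched from buf[start..end), or NO_MATCH. */
    int parse(char[] buf, int start, int end);

    /** this{min,max}: greedily takes up to max matches, fails below min. */
    default R repeat(int min, int max) {
        R self = this;
        return (buf, start, end) -> {
            int pos = start;
            int count = 0;
            while (count < max) {
                int n = self.parse(buf, pos, end);
                if (n == NO_MATCH) break;
                pos += n;
                count++;
                if (n == 0) { // an empty match can repeat any number of times
                    count = max;
                    break;
                }
            }
            return count >= min ? pos - start : NO_MATCH;
        };
    }

    default R repeat(int count) { return repeat(count, count); } // this{count}
    default R star() { return repeat(0, UNBOUNDED); }            // this*
    default R plus() { return repeat(1, UNBOUNDED); }            // this+
    default R optional() { return repeat(0, 1); }                // this?

    /** Single-character leaf in [lo, hi], standing in for a Chset. */
    static R range(char lo, char hi) {
        return (buf, start, end) ->
                start < end && buf[start] >= lo && buf[start] <= hi ? 1 : NO_MATCH;
    }

    static int run(R p, String text) {
        char[] buf = text.toCharArray();
        return p.parse(buf, 0, buf.length);
    }
}
```

Because the loop is greedy, repeat(2, 4) over "123456" stops after four digits rather than failing.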
public final Parser<T> list(Parser<? super T> sep)
Creates a Parser that matches a sequence of matches of this, separated by matches of sep. Such sequences are common: space-separated words, comma-separated words, and so on.
EBNF: this % sep, which is equivalent to this (sep this)*
Parameters:
sep - The parser that separates instances of this.
public final <U extends T> Parser<U> action(Callback<U> callback)
Creates an Action that will fire and call Callback.handle whenever this matches.
Parameters:
callback - The Callback to call when this matches.
public static <T> Parser<T> alternative(Parser<? super T> left, Parser<? super T> right)
Creates an Alternative parser from the left and right sub-parsers.
EBNF: left | right
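action() and alternative() can be illustrated together. In the sketch, an Alternative tries left first and falls back to right at the same position, and an Action runs a callback when the parser it wraps succeeds. The callback shape (Handler.handle receiving the matched range plus the user data) is an assumption, since the signature of Callback.handle is not shown on this page; A, Handler, and ActionDemo are likewise hypothetical names.

```java
import java.util.List;

/** Hypothetical callback; the real Callback.handle signature may differ. */
interface Handler<U> {
    void handle(char[] buf, int start, int end, U udata);
}

/** Hypothetical minimal parser carrying user data of type T; NOT the gdata API. */
interface A<T> {
    int NO_MATCH = -1; // assumed sentinel value

    int parse(char[] buf, int start, int end, T udata);

    /** Fires handler on every successful match of this parser. */
    default <U extends T> A<U> action(Handler<U> handler) {
        A<T> self = this;
        return (buf, start, end, udata) -> {
            int n = self.parse(buf, start, end, udata);
            if (n != NO_MATCH) {
                handler.handle(buf, start, start + n, udata);
            }
            return n;
        };
    }

    /** left | right: right is tried from the same start only if left fails. */
    static <T> A<T> alternative(A<? super T> left, A<? super T> right) {
        return (buf, start, end, udata) -> {
            int n = left.parse(buf, start, end, udata);
            return n != NO_MATCH ? n : right.parse(buf, start, end, udata);
        };
    }

    /** Leaf parser for a fixed string, standing in for Strlit. */
    static <T> A<T> lit(String text) {
        return (buf, start, end, udata) -> {
            if (end - start < text.length()) return NO_MATCH;
            for (int i = 0; i < text.length(); i++) {
                if (buf[start + i] != text.charAt(i)) return NO_MATCH;
            }
            return text.length();
        };
    }
}

final class ActionDemo {
    /** Appends the matched text to the caller's list. */
    static final Handler<List<String>> RECORD =
            (buf, start, end, out) -> out.add(new String(buf, start, end - start));

    /** answer: "yes" | "no", recording whichever one matched. */
    static final A<List<String>> ANSWER = A.alternative(
            A.<List<String>>lit("yes").action(RECORD),
            A.<List<String>>lit("no").action(RECORD));

    static int parse(String text, List<String> seen) {
        char[] buf = text.toCharArray();
        return ANSWER.parse(buf, 0, buf.length, seen);
    }
}
```

The list passed in as user data is how the callback reports back to the caller, which is the role udata plays in the parse methods.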
public static <T> Parser<T> intersection(Parser<? super T> left, Parser<? super T> right)
Creates an Intersection parser from the left and right sub-parsers.
EBNF: left & right
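This page does not spell out exactly when an Intersection succeeds. The sketch below assumes the rule used by Boost Spirit Classic, whose vocabulary (chset, strlit, rule, the % list operator) is similar to this framework's: both sub-parsers must match at the same start position and consume the same number of characters. The interface I and the class HexWord are hypothetical.

```java
import java.util.function.IntPredicate;

/** Hypothetical minimal parser type; NOT the gdata API. */
interface I {
    int NO_MATCH = -1; // assumed sentinel value

    int parse(char[] buf, int start, int end);

    /** left & right: assumed to require both to match the same span. */
    static I intersection(I left, I right) {
        return (buf, start, end) -> {
            int l = left.parse(buf, start, end);
            if (l == NO_MATCH) return NO_MATCH;
            int r = right.parse(buf, start, end);
            return r == l ? l : NO_MATCH;
        };
    }

    /** One or more characters accepted by ok (like a Chset followed by plus()). */
    static I many(IntPredicate ok) {
        return (buf, start, end) -> {
            int pos = start;
            while (pos < end && ok.test(buf[pos])) pos++;
            return pos > start ? pos - start : NO_MATCH;
        };
    }

    static int run(I p, String text) {
        char[] buf = text.toCharArray();
        return p.parse(buf, 0, buf.length);
    }
}

final class HexWord {
    /** [a-z0-9]+ & [0-9a-f]+ : a lowercase word made only of hex digits. */
    static final I PARSER = I.intersection(
            I.many(c -> (c >= 'a' && c <= 'z') || (c >= '0' && c <= '9')),
            I.many(c -> (c >= '0' && c <= '9') || (c >= 'a' && c <= 'f')));
}
```

With "cafez9" the left side matches six characters but the right side only four, so under this assumed rule the intersection fails.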
public static <T> Parser<T> difference(Parser<? super T> left, Parser<? super T> right)
Creates a Difference parser from the left and right sub-parsers.
EBNF: left - right
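Likewise, the exact rule for Difference is not given here. This sketch assumes the Boost Spirit Classic behavior: left - right succeeds with left's match unless right matches at the same position with at least as many characters. The classic use is an identifier that must not be a keyword. D and Ident are hypothetical names.

```java
/** Hypothetical minimal parser type; NOT the gdata API. */
interface D {
    int NO_MATCH = -1; // assumed sentinel value

    int parse(char[] buf, int start, int end);

    /** left - right: keep left's match unless right matches at least as much. */
    static D difference(D left, D right) {
        return (buf, start, end) -> {
            int l = left.parse(buf, start, end);
            if (l == NO_MATCH) return NO_MATCH;
            int r = right.parse(buf, start, end);
            return (r == NO_MATCH || r < l) ? l : NO_MATCH;
        };
    }

    /** [a-z]+ */
    static D lowerWord() {
        return (buf, start, end) -> {
            int pos = start;
            while (pos < end && buf[pos] >= 'a' && buf[pos] <= 'z') pos++;
            return pos > start ? pos - start : NO_MATCH;
        };
    }

    /** Fixed string, standing in for Strlit. */
    static D lit(String text) {
        return (buf, start, end) -> {
            if (end - start < text.length()) return NO_MATCH;
            for (int i = 0; i < text.length(); i++) {
                if (buf[start + i] != text.charAt(i)) return NO_MATCH;
            }
            return text.length();
        };
    }

    static int run(D p, String text) {
        char[] buf = text.toCharArray();
        return p.parse(buf, 0, buf.length);
    }
}

final class Ident {
    /** identifier: [a-z]+ - "if" */
    static final D PARSER = D.difference(D.lowerWord(), D.lit("if"));
}
```

Under this rule "iffy" is still an identifier, because the keyword matches only a prefix of it.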
public static <T> Parser<T> sequence(Parser<? super T> left, Parser<? super T> right)
Creates a Sequence parser from the left and right sub-parsers.
EBNF: left right
public static <T> Parser<T> sequence(Parser<? super T> one, Parser<? super T> two, Parser<? super T> three)
Creates a Sequence parser from the sub-parsers one, two, and three. Equivalent to calling Parser.sequence(one, Parser.sequence(two, three)).
public static <T> Parser<T> sequence(Parser<? super T> one, Parser<? super T> two, Parser<? super T> three, Parser<? super T> four)
Creates a sequence of four parsers.
See Also:
sequence(Parser,Parser,Parser)
public static <T> Parser<T> sequence(Parser<? super T> one, Parser<? super T> two, Parser<? super T> three, Parser<? super T> four, Parser<? super T> five)
Creates a sequence of five parsers.
See Also:
sequence(Parser,Parser,Parser)
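The sequence overloads are documented in terms of nesting: the three-parser form equals sequence(one, sequence(two, three)). The hypothetical sketch below assumes the four- and five-parser forms nest the same way (this page only cross-references the three-parser form). S and its helpers are invented names, and NO_MATCH = -1 is assumed.

```java
/** Hypothetical minimal parser type; NOT the gdata API. */
interface S {
    int NO_MATCH = -1; // assumed sentinel value

    int parse(char[] buf, int start, int end);

    /** left right: right starts where left stopped; both must match. */
    static S sequence(S left, S right) {
        return (buf, start, end) -> {
            int l = left.parse(buf, start, end);
            if (l == NO_MATCH) return NO_MATCH;
            int r = right.parse(buf, start + l, end);
            return r == NO_MATCH ? NO_MATCH : l + r;
        };
    }

    static S sequence(S one, S two, S three) {
        return sequence(one, sequence(two, three));
    }

    static S sequence(S one, S two, S three, S four) {
        return sequence(one, sequence(two, three, four));
    }

    static S sequence(S one, S two, S three, S four, S five) {
        return sequence(one, sequence(two, three, four, five));
    }

    /** Matches exactly one given character. */
    static S ch(char c) {
        return (buf, start, end) -> start < end && buf[start] == c ? 1 : NO_MATCH;
    }

    static int run(S p, String text) {
        char[] buf = text.toCharArray();
        return p.parse(buf, 0, buf.length);
    }
}
```

Because a sequence never revisits an earlier sub-parser, grouping to the left or to the right yields the same match, so the nesting choice is a matter of convenience.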