Data.Attoparsec.Char8
Portability: unknown
Stability: experimental
Maintainer: bos@serpentine.com
Description
Simple, efficient, character-oriented combinator parsing for
ByteString strings, loosely based on the Parsec library.
Character encodings
This module is intended for parsing text that is
represented using an 8-bit character set, e.g. ASCII or
ISO-8859-15. It does not make any attempt to deal with character
encodings, multibyte characters, or wide characters. In
particular, all attempts to use characters above code point U+00FF
will give wrong answers.
Code points below U+0100 are simply translated to and from their
numeric values, so e.g. the code point U+00A4 becomes the byte
0xA4 (which is the Euro symbol in ISO-8859-15, but the generic
currency sign in ISO-8859-1). Haskell Char values above U+00FF
are truncated, so e.g. U+1D6B7 is truncated to the byte 0xB7.
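The byte mapping described above can be checked with a short sketch. It assumes a current attoparsec release, where this module's parsers are exported from Data.Attoparsec.ByteString.Char8 and parseOnly is available. The value names are illustrative only. The truncation comes from the Char-to-byte conversion that char8 and Data.ByteString.Char8.pack perform.

```haskell
-- Sketch assuming Data.Attoparsec.ByteString.Char8 from a current attoparsec.
import qualified Data.Attoparsec.ByteString.Char8 as A
import qualified Data.ByteString as B
import qualified Data.ByteString.Char8 as BC
import Data.Word (Word8)

-- U+00A4 is below U+0100, so it corresponds directly to the byte 0xA4.
currencySign :: Either String Char
currencySign = A.parseOnly (A.char '\x00A4') (B.pack [0xA4])

-- U+1D6B7 is above U+00FF. Converting it to a byte keeps only the low
-- 8 bits, so char8 ends up matching the byte 0xB7.
truncatedChar :: Either String Word8
truncatedChar = A.parseOnly (A.char8 '\x1D6B7') (B.pack [0xB7])

-- Data.ByteString.Char8.pack truncates in the same way.
truncatedPack :: Bool
truncatedPack = BC.pack "\x1D6B7" == B.pack [0xB7]

main :: IO ()
main = do
  print currencySign   -- Right '\164'
  print truncatedChar  -- Right 183
  print truncatedPack  -- True
```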
Parser types
Parser
The Parser type is a monad.
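Because Parser is a monad, parsers can be sequenced with do-notation. The following is a minimal sketch that assumes the newer Data.Attoparsec.ByteString.Char8 module and its parseOnly runner. The parser date is hypothetical and reads a YYYY-MM-DD string.

```haskell
{-# LANGUAGE OverloadedStrings #-}
-- Sketch assuming Data.Attoparsec.ByteString.Char8 from a current attoparsec.
import qualified Data.Attoparsec.ByteString.Char8 as A

-- Hypothetical parser for "YYYY-MM-DD", written in do-notation.
date :: A.Parser (Int, Int, Int)
date = do
  y <- A.decimal
  _ <- A.char '-'
  m <- A.decimal
  _ <- A.char '-'
  d <- A.decimal
  return (y, m, d)

main :: IO ()
main = print (A.parseOnly date "2010-03-14")  -- Right (2010,3,14)
```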
Result
The result of a parse.
Constructors:
Fail !ByteString [String] String
    The parse failed. The ByteString is the input that had not yet been
    consumed when the failure occurred. The [String] is a list of contexts
    in which the error occurred. The String is the message describing the
    error, if any.
Partial (ByteString -> Result r)
    Supply this continuation with more input so that the parser can
    resume. To indicate that no more input is available, use an empty
    string.
Done !ByteString r
    The parse succeeded. The ByteString is the input that had not yet
    been consumed (if any) when the parse succeeded.
Running parsers
parse
Run a parser and return its result.
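To show how the three Result constructors arise, here is a sketch that runs a number parser on three inputs. It assumes a current attoparsec release, where Data.Attoparsec.ByteString.Char8 exports the Result constructors. The names number and classify are hypothetical.

```haskell
{-# LANGUAGE OverloadedStrings #-}
-- Sketch assuming Data.Attoparsec.ByteString.Char8 from a current attoparsec.
import qualified Data.Attoparsec.ByteString.Char8 as A
import qualified Data.ByteString as B

number :: A.Parser Int
number = A.decimal

-- Name the Result constructor and keep any unconsumed input.
classify :: A.Result Int -> (String, Maybe B.ByteString)
classify (A.Done rest _)   = ("Done", Just rest)
classify (A.Fail rest _ _) = ("Fail", Just rest)
classify (A.Partial _)     = ("Partial", Nothing)

main :: IO ()
main = mapM_ (print . classify . A.parse number) ["123 abc", "abc", "123"]
```

On "123" the result is Partial, because the parser cannot yet tell whether more digits will follow.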
parseTest
Run a parser and print its result to standard output.
parseWith :: Monad m
          => m ByteString
          -> Parser a
          -> ByteString     -- Initial input for the parser.
          -> m (Result a)
Run a parser with an initial input string, and a monadic action
that can supply more input if needed.
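The following sketch uses a pure State monad to supply input chunks on demand. It assumes a current attoparsec release, where Data.Attoparsec.ByteString.Char8 exports parseWith and eitherResult, and it uses Control.Monad.Trans.State.Strict from the transformers package. The names nextChunk, sumOfTwo and runChunked are hypothetical. The supply action returns an empty string once its chunks run out, which signals the end of input.

```haskell
{-# LANGUAGE OverloadedStrings #-}
-- Sketch assuming Data.Attoparsec.ByteString.Char8 from a current attoparsec
-- and the transformers package for the State monad.
import Control.Monad.Trans.State.Strict (State, evalState, state)
import qualified Data.Attoparsec.ByteString.Char8 as A
import qualified Data.ByteString as B

-- Supply action: pop the next pending chunk, or return B.empty once
-- the list is exhausted, which tells the parser that input has ended.
nextChunk :: State [B.ByteString] B.ByteString
nextChunk = state pop
  where
    pop []       = (B.empty, [])
    pop (c : cs) = (c, cs)

-- Parses "<decimal>+<decimal>" and returns the sum.
sumOfTwo :: A.Parser Int
sumOfTwo = (+) <$> A.decimal <* A.char '+' <*> A.decimal

-- The initial input is "1"; the remaining chunks arrive as requested.
runChunked :: [B.ByteString] -> Either String Int
runChunked chunks =
  A.eitherResult (evalState (A.parseWith nextChunk sumOfTwo "1") chunks)

main :: IO ()
main = print (runChunked ["3+", "4"])  -- Right 17 (13 + 4)
```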
feed
If a parser has returned a Partial result, supply it with more
input.
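Here is a minimal sketch of resuming a Partial result, assuming a current attoparsec release in which Data.Attoparsec.ByteString.Char8 exports feed and eitherResult. The names firstChunk, finished and isPartial are hypothetical. Feeding the empty string at the end tells the parser that no more input is coming.

```haskell
{-# LANGUAGE OverloadedStrings #-}
-- Sketch assuming Data.Attoparsec.ByteString.Char8 from a current attoparsec.
import qualified Data.Attoparsec.ByteString.Char8 as A

-- "12" could be the start of a longer number, so parse returns Partial.
firstChunk :: A.Result Int
firstChunk = A.parse A.decimal "12"

-- Supply "34", then an empty string to mark the end of input.
finished :: A.Result Int
finished = (firstChunk `A.feed` "34") `A.feed` ""

isPartial :: A.Result r -> Bool
isPartial (A.Partial _) = True
isPartial _             = False

main :: IO ()
main = do
  print (isPartial firstChunk)        -- True
  print (A.eitherResult finished)     -- Right 1234
```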
Combinators
try
Attempt a parse, and if it fails, rewind the input so that no
input appears to have been consumed.
This combinator is useful in cases where a parser might consume
some input before failing, i.e. the parser needs arbitrary
lookahead. The downside to using this combinator is that it can
retain input for longer than is desirable.
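The following sketch shows the rewinding behaviour of try. It assumes a current attoparsec release using Data.Attoparsec.ByteString.Char8. The parser keyword is hypothetical. In recent attoparsec releases, parsers already backtrack on failure and try is kept for Parsec compatibility, so in those releases this example behaves the same without try.

```haskell
{-# LANGUAGE OverloadedStrings #-}
-- Sketch assuming Data.Attoparsec.ByteString.Char8 from a current attoparsec.
import Control.Applicative ((<|>))
import qualified Data.Attoparsec.ByteString.Char8 as A
import qualified Data.ByteString as B

-- On "foreach", the first branch matches "for" and then fails on 'e';
-- try rewinds so the second branch sees the whole input again.
keyword :: A.Parser B.ByteString
keyword = A.try (A.string "for" <* A.char ' ') <|> A.string "foreach"

main :: IO ()
main = mapM_ (print . A.parseOnly keyword) ["for x", "foreach"]
```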
module Data.Attoparsec.Combinator (re-exported in full)
Parsing individual characters
satisfy
The parser satisfy p succeeds for any byte for which the
predicate p returns True. Returns the byte that is actually
parsed, as a Char.
    digit = satisfy isDigit
      where isDigit c = c >= '0' && c <= '9'
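Here is a runnable version of the digit example above. It assumes a current attoparsec release, where the module is Data.Attoparsec.ByteString.Char8 and parseOnly is available. The local digit shadows nothing, because the library is imported qualified.

```haskell
{-# LANGUAGE OverloadedStrings #-}
-- Sketch assuming Data.Attoparsec.ByteString.Char8 from a current attoparsec.
import qualified Data.Attoparsec.ByteString.Char8 as A
import Data.Either (isLeft)

-- satisfy tests the next byte (viewed as a Char) and returns it on success.
digit :: A.Parser Char
digit = A.satisfy isDigit
  where isDigit c = c >= '0' && c <= '9'

main :: IO ()
main = do
  print (A.parseOnly digit "7a")           -- Right '7'
  print (isLeft (A.parseOnly digit "a7"))  -- True: 'a' is not a digit
```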
char
Match a specific character.
anyChar
Match any character.
char8
Match a specific character, but return its Word8 value.
notChar
Match any character except the given one.
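The four single-character parsers can be combined as in the following sketch. It assumes a current attoparsec release using Data.Attoparsec.ByteString.Char8. The names charLiteral, semicolon and notQuote are hypothetical.

```haskell
{-# LANGUAGE OverloadedStrings #-}
-- Sketch assuming Data.Attoparsec.ByteString.Char8 from a current attoparsec.
import qualified Data.Attoparsec.ByteString.Char8 as A
import Data.Either (isLeft)
import Data.Word (Word8)

-- Hypothetical: a character literal such as 'x', returning the inner character.
charLiteral :: A.Parser Char
charLiteral = A.char '\'' *> A.anyChar <* A.char '\''

-- char8 matches like char but returns the byte value (';' is 59).
semicolon :: A.Parser Word8
semicolon = A.char8 ';'

-- notChar accepts any single character except '"'.
notQuote :: A.Parser Char
notQuote = A.notChar '"'

main :: IO ()
main = do
  print (A.parseOnly charLiteral "'x'")          -- Right 'x'
  print (A.parseOnly semicolon ";")              -- Right 59
  print (A.parseOnly notQuote "a")               -- Right 'a'
  print (isLeft (A.parseOnly notQuote "\""))     -- True
```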
Special character parsers
digit
Parse a single digit.
letter_iso8859_15
Match a letter, in the ISO-8859-15 encoding.
letter_ascii
Match a letter, in the ASCII encoding.
space
Parse a space character.
Note: This parser only gives correct answers for the ASCII
encoding. For instance, it does not recognise U+00A0 (non-breaking
space) as a space character, even though it is a valid ISO-8859-15
byte.
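The following sketch shows the encoding-dependent behaviour of these parsers on two ISO-8859-15 bytes. It assumes a current attoparsec release using Data.Attoparsec.ByteString.Char8. In ISO-8859-15, byte 0xE9 is 'é' and byte 0xA0 is a non-breaking space.

```haskell
-- Sketch assuming Data.Attoparsec.ByteString.Char8 from a current attoparsec.
import qualified Data.Attoparsec.ByteString.Char8 as A
import qualified Data.ByteString as B
import Data.Either (isLeft)

eAcute, nbsp :: B.ByteString
eAcute = B.pack [0xE9]  -- 'é' in ISO-8859-15
nbsp   = B.pack [0xA0]  -- non-breaking space in ISO-8859-15

main :: IO ()
main = do
  print (A.parseOnly A.letter_iso8859_15 eAcute)     -- Right '\233'
  print (isLeft (A.parseOnly A.letter_ascii eAcute))  -- True: not an ASCII letter
  print (isLeft (A.parseOnly A.space nbsp))           -- True: 0xA0 is not a space here
  print (A.parseOnly A.digit (B.pack [0x37]))         -- Right '7'
```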
Fast predicates
isDigit
A fast digit predicate.
isDigit_w8
A fast digit predicate for Word8 values.
isAlpha_iso8859_15
A fast alphabetic predicate for the ISO-8859-15 encoding.
Note: For all character encodings other than ISO-8859-15, and
almost all Unicode code points above U+00A3, this predicate gives
wrong answers.
isAlpha_ascii
A fast alphabetic predicate for the ASCII encoding.
Note: For all character encodings other than ASCII, and
almost all Unicode code points above U+007F, this predicate gives
wrong answers.
isSpace
Fast predicate for matching ASCII space characters.
Note: This predicate only gives correct answers for the ASCII
encoding. For instance, it does not recognise U+00A0 (non-breaking
space) as a space character, even though it is a valid ISO-8859-15
byte. For a Unicode-aware and only slightly slower predicate,
use Data.Char.isSpace.
isSpace_w8
Fast Word8 predicate for matching ASCII space characters.
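The following sketch checks the fast predicates against the encoding caveats above. It assumes a current attoparsec release, where Data.Attoparsec.ByteString.Char8 exports these predicates. Each entry in the hypothetical checks list should be True.

```haskell
-- Sketch assuming Data.Attoparsec.ByteString.Char8 from a current attoparsec.
import qualified Data.Attoparsec.ByteString.Char8 as A
import qualified Data.Char as C

-- Each pair names a check and its result; every result should be True.
checks :: [(String, Bool)]
checks =
  [ ("isDigit '5'",                   A.isDigit '5')
  , ("isDigit_w8 53 (ASCII '5')",     A.isDigit_w8 53)
  , ("isAlpha_iso8859_15 U+00E9",     A.isAlpha_iso8859_15 '\xE9')
  , ("not (isAlpha_ascii U+00E9)",    not (A.isAlpha_ascii '\xE9'))
  , ("isSpace_w8 32 (ASCII space)",   A.isSpace_w8 32)
  , ("not (isSpace U+00A0)",          not (A.isSpace '\xA0'))
  , ("Data.Char.isSpace U+00A0",      C.isSpace '\xA0')
  ]

main :: IO ()
main = mapM_ print checks
```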
Character classes
inClass
Match any character in a set.
    vowel = inClass "aeiou"
Range notation is supported.
    halfAlphabet = inClass "a-nA-N"
To add a literal '-' to a set, place it at the beginning or end
of the string.
notInClass
Match any character not in a set.
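Here is a runnable sketch of the class predicates, including a literal '-' at the end of a set. It assumes a current attoparsec release using Data.Attoparsec.ByteString.Char8. The names identChar and identifier are hypothetical.

```haskell
{-# LANGUAGE OverloadedStrings #-}
-- Sketch assuming Data.Attoparsec.ByteString.Char8 from a current attoparsec.
import qualified Data.Attoparsec.ByteString.Char8 as A
import qualified Data.ByteString as B

vowel :: Char -> Bool
vowel = A.inClass "aeiou"

halfAlphabet :: Char -> Bool
halfAlphabet = A.inClass "a-nA-N"

-- Hypothetical identifier class; the trailing '-' is a literal hyphen.
identChar :: Char -> Bool
identChar = A.inClass "a-zA-Z0-9_-"

identifier :: A.Parser B.ByteString
identifier = A.takeWhile1 identChar

main :: IO ()
main = do
  print (map vowel "hello")                       -- [False,True,False,False,True]
  print (halfAlphabet 'N', halfAlphabet 'o')      -- (True,False)
  print (A.notInClass "aeiou" 'x')                -- True
  print (A.parseOnly identifier "my-var_1 rest")  -- Right "my-var_1"
```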
Efficient string handling
string
string s parses a sequence of bytes that identically match
s. Returns the parsed string (i.e. s). This parser consumes no
input if it fails (even on a partial match).
Note: The behaviour of this parser is different to that of the
similarly-named parser in Parsec, as this one is all-or-nothing.
To illustrate the difference, the following parser will fail under
Parsec given the input "for":
    string "foo" <|> string "for"
The reason for its failure is that the first branch is a
partial match, and will consume the letters 'f' and 'o'
before failing. In Attoparsec, the above parser will succeed on
that input, because the failed first branch will consume nothing.
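The Parsec counter-example above can be run directly against attoparsec. This sketch assumes a current attoparsec release using Data.Attoparsec.ByteString.Char8. The parser name fooOrFor is hypothetical.

```haskell
{-# LANGUAGE OverloadedStrings #-}
-- Sketch assuming Data.Attoparsec.ByteString.Char8 from a current attoparsec.
import Control.Applicative ((<|>))
import qualified Data.Attoparsec.ByteString.Char8 as A
import qualified Data.ByteString as B

-- The failed "foo" branch consumes nothing, so "for" still matches.
fooOrFor :: A.Parser B.ByteString
fooOrFor = A.string "foo" <|> A.string "for"

main :: IO ()
main = mapM_ (print . A.parseOnly fooOrFor) ["for", "foo", "fob"]
```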
stringCI
Satisfy a literal string, ignoring case.
skipSpace
Skip over white space.
skipWhile
Skip past input for as long as the predicate returns True.
take
Consume exactly n bytes of input.
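The following sketch chains stringCI, skipSpace, skipWhile and take. It assumes a current attoparsec release using Data.Attoparsec.ByteString.Char8. The input format and the parser name code are hypothetical: a case-insensitive "id" keyword, optional spaces, any number of '0' padding characters, and then a 3-byte code.

```haskell
{-# LANGUAGE OverloadedStrings #-}
-- Sketch assuming Data.Attoparsec.ByteString.Char8 from a current attoparsec.
import qualified Data.Attoparsec.ByteString.Char8 as A
import qualified Data.ByteString as B
import Data.Either (isLeft)

-- Hypothetical format: "id", spaces, '0' padding, then exactly 3 bytes.
code :: A.Parser B.ByteString
code = A.stringCI "id" *> A.skipSpace *> A.skipWhile (== '0') *> A.take 3

main :: IO ()
main = do
  print (A.parseOnly code "ID   00abc")     -- Right "abc"
  print (A.parseOnly code "id xyzw")        -- Right "xyz"
  print (isLeft (A.parseOnly code "id ab")) -- True: only 2 bytes remain
```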
takeTill
Consume input as long as the predicate returns False
(i.e. until it returns True), and return the consumed input.
This parser does not fail. It will return an empty string if the
predicate returns True on the first byte of input.
Note: Because this parser does not fail, do not use it with
combinators such as many, because such parsers loop until a
failure occurs. Careless use will thus result in an infinite loop.
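Here is a sketch of takeTill splitting a key/value pair. It assumes a current attoparsec release using Data.Attoparsec.ByteString.Char8. The parser keyValue is hypothetical. The second input shows the empty result when the predicate is True on the first byte.

```haskell
{-# LANGUAGE OverloadedStrings #-}
-- Sketch assuming Data.Attoparsec.ByteString.Char8 from a current attoparsec.
import qualified Data.Attoparsec.ByteString.Char8 as A
import qualified Data.ByteString as B

-- Hypothetical "key=value;" pair: the key runs up to '=', the value up to ';'.
keyValue :: A.Parser (B.ByteString, B.ByteString)
keyValue = (,) <$> A.takeTill (== '=') <* A.char '=' <*> A.takeTill (== ';')

main :: IO ()
main = do
  print (A.parseOnly keyValue "name=attoparsec;")  -- Right ("name","attoparsec")
  print (A.parseOnly keyValue "=x")                -- Right ("","x")
```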
takeWhile
Consume input as long as the predicate returns True, and return
the consumed input.
This parser does not fail. It will return an empty string if the
predicate returns False on the first byte of input.
Note: Because this parser does not fail, do not use it with
combinators such as many, because such parsers loop until a
failure occurs. Careless use will thus result in an infinite loop.
takeWhile1
Consume input as long as the predicate returns True, and return
the consumed input.
This parser requires the predicate to succeed on at least one byte
of input: it will fail if the predicate never returns True or if
there is no input left.
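The following sketch shows why takeWhile1, rather than takeWhile, belongs inside many. It assumes a current attoparsec release using Data.Attoparsec.ByteString.Char8, with many from Control.Applicative. The parser numbers is hypothetical.

```haskell
{-# LANGUAGE OverloadedStrings #-}
-- Sketch assuming Data.Attoparsec.ByteString.Char8 from a current attoparsec.
import Control.Applicative (many)
import qualified Data.Attoparsec.ByteString.Char8 as A
import qualified Data.ByteString as B
import Data.Either (isLeft)

-- takeWhile1 fails once no digit is left, so many stops cleanly.
-- With A.takeWhile here, each step could succeed without consuming
-- input, and many would never stop.
numbers :: A.Parser [B.ByteString]
numbers = many (A.takeWhile1 A.isDigit <* A.skipSpace)

main :: IO ()
main = do
  print (A.parseOnly numbers "1 22 333")                        -- Right ["1","22","333"]
  print (A.parseOnly (A.takeWhile A.isDigit) "abc")             -- Right ""
  print (isLeft (A.parseOnly (A.takeWhile1 A.isDigit) "abc"))   -- True
```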
Text parsing
endOfLine
Match either a single newline character '\n', or a carriage
return followed by a newline character "\r\n".
isEndOfLine
A predicate that matches either a carriage return '\r' or
newline '\n' character.
isHorizontalSpace
A predicate that matches either a space ' ' or horizontal tab
'\t' character.
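Here is a sketch of splitting text into lines that accepts both "\n" and "\r\n" terminators. It assumes a current attoparsec release using Data.Attoparsec.ByteString.Char8. The parser textLines is hypothetical, and it spells out the carriage-return/newline test as a Char lambda.

```haskell
{-# LANGUAGE OverloadedStrings #-}
-- Sketch assuming Data.Attoparsec.ByteString.Char8 from a current attoparsec.
import qualified Data.Attoparsec.ByteString.Char8 as A
import qualified Data.ByteString as B

-- A line is everything up to '\r' or '\n'; endOfLine then consumes
-- either "\n" or "\r\n" between lines.
textLines :: A.Parser [B.ByteString]
textLines = A.takeTill (\c -> c == '\r' || c == '\n') `A.sepBy` A.endOfLine

main :: IO ()
main = print (A.parseOnly textLines "a\r\nb\nc")  -- Right ["a","b","c"]
```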
Numeric parsers
decimal
Parse and decode an unsigned decimal number.
hexadecimal
Parse and decode an unsigned hexadecimal number. The hex digits
'a' through 'f' may be upper or lower case.
This parser does not accept a leading "0x" string.
signed
Parse a number with an optional leading '+' or '-' sign
character.
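The following sketch exercises the three numeric parsers at type Int. It assumes a current attoparsec release using Data.Attoparsec.ByteString.Char8. Because hexadecimal does not accept a "0x" prefix, on "0x1f" it reads only the leading "0".

```haskell
{-# LANGUAGE OverloadedStrings #-}
-- Sketch assuming Data.Attoparsec.ByteString.Char8 from a current attoparsec.
import qualified Data.Attoparsec.ByteString.Char8 as A

dec, hex, signedDec :: A.Parser Int
dec       = A.decimal
hex       = A.hexadecimal
signedDec = A.signed A.decimal

main :: IO ()
main = do
  print (A.parseOnly dec "1234")       -- Right 1234
  print (A.parseOnly hex "fF")         -- Right 255
  print (A.parseOnly hex "0x1f")       -- Right 0: stops before the 'x'
  print (A.parseOnly signedDec "-42")  -- Right (-42)
  print (A.parseOnly signedDec "+7")   -- Right 7
```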
State observation and manipulation functions
endOfInput
Match only if all input has been consumed.
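The following sketch uses endOfInput to reject trailing input that a parser would otherwise leave unconsumed. It assumes a current attoparsec release using Data.Attoparsec.ByteString.Char8. The parser whole is hypothetical.

```haskell
{-# LANGUAGE OverloadedStrings #-}
-- Sketch assuming Data.Attoparsec.ByteString.Char8 from a current attoparsec.
import qualified Data.Attoparsec.ByteString.Char8 as A
import Data.Either (isLeft)

-- A number that must make up the entire input.
whole :: A.Parser Int
whole = A.decimal <* A.endOfInput

main :: IO ()
main = do
  print (A.parseOnly whole "12")           -- Right 12
  print (isLeft (A.parseOnly whole "12x")) -- True: "x" is left over
```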
ensure
Succeed only if at least n bytes of input are available.
Produced by Haddock version 2.6.0