reflex::Pattern Class Reference

updated Thu Jan 26 2017
 

Pattern class holds a regex pattern and its compiled FSM opcode table or code for the reflex::Matcher engine.

#include <pattern.h>

Classes

struct  Error
 Throwable error structure reflex::Pattern::Error.

struct  Option
 Global modifier modes, syntax flags, and compiler options.

struct  Position
 Finite state machine construction position information.

struct  State
 Finite state machine.
 

Public Types

enum  Const { IMAX = 0xffff }
 Common constants.

typedef uint16_t Index
 Index into the opcode array Pattern::opc_, also used for subpattern indexing.

typedef uint32_t Opcode
 32-bit opcode word.

typedef void(* FSM) (class Matcher &)
 Function pointer to FSM code.
 

Public Member Functions

 Pattern (const char *regex, const char *options=NULL) throw (Error)
 Construct a pattern object given a regex string.

 Pattern (const char *regex, const std::string &options) throw (Error)
 Construct a pattern object given a regex string.

 Pattern (const std::string &regex, const char *options=NULL) throw (Error)
 Construct a pattern object given a regex string.

 Pattern (const std::string &regex, const std::string &options) throw (Error)
 Construct a pattern object given a regex string.

 Pattern (const Opcode *code) throw (Error)
 Construct a pattern object given an opcode table.

 Pattern (FSM fsm) throw (Error)
 Construct a pattern object given a function pointer to FSM code.

virtual ~Pattern (void)
 Destructor, deletes internal code array when owned and allocated.

Index size (void) const
 Number of subpatterns of this pattern object.

const std::string operator[] (Index choice) const
 Get subpattern of this pattern object.

bool reachable (Index choice) const
 Check if subpattern is reachable by a match.

size_t nodes (void) const
 Get the number of finite state machine nodes (vertices).

size_t edges (void) const
 Get the number of finite state machine edges (transitions on input characters).

size_t words (void) const
 Get the code size in number of words.
 

Protected Member Functions

virtual void error (enum Error::Code code, const char *message, size_t loc=0) const throw (Error)
 Throw an error.
 

Private Types

enum  Meta {
  META_MIN = 0x100, META_NWB = 0x101, META_NWE = 0x102, META_BWB = 0x103,
  META_EWB = 0x104, META_BWE = 0x105, META_EWE = 0x106, META_BOL = 0x107,
  META_EOL = 0x108, META_BOB = 0x109, META_EOB = 0x10a, META_IND = 0x10b,
  META_DED = 0x10c, META_MAX
}
 Meta characters.

typedef unsigned int Char

typedef ORanges<Char> Chars
 represent (wide) char set as a set of ranges

typedef size_t Location

typedef ORanges<Location> Ranges

typedef std::set<Location> Set

typedef std::map<int, Ranges> Map

typedef std::set<Position> Positions

typedef std::map<Position, Positions> Follow

typedef std::pair<Chars, Positions> Move

typedef std::list<Move> Moves
 

Private Member Functions

void init (const char *options) throw (Error)
 Initialize the pattern at construction.
 
void init_options (const char *options)
 
void parse (Positions &startpos, Follow &followpos, Map &modifiers, Map &lookahead) throw (Error)
 
void parse1 (bool begin, Location &loc, Positions &firstpos, Positions &lastpos, bool &nullable, Follow &followpos, Positions &lazypos, Map &modifiers, Ranges &lookahead, Index &iter) throw (Error)
 
void parse2 (bool begin, Location &loc, Positions &firstpos, Positions &lastpos, bool &nullable, Follow &followpos, Positions &lazypos, Map &modifiers, Ranges &lookahead, Index &iter) throw (Error)
 
void parse3 (bool begin, Location &loc, Positions &firstpos, Positions &lastpos, bool &nullable, Follow &followpos, Positions &lazypos, Map &modifiers, Ranges &lookahead, Index &iter) throw (Error)
 
void parse4 (bool begin, Location &loc, Positions &firstpos, Positions &lastpos, bool &nullable, Follow &followpos, Positions &lazypos, Map &modifiers, Ranges &lookahead, Index &iter) throw (Error)
 
void parse_esc (Location &loc) const throw (Error)
 
void compile (State &start, Follow &followpos, const Map &modifiers, const Map &lookahead) throw (Error)
 
void lazy (const Positions &lazypos, Positions &pos) const
 
void lazy (const Positions &lazypos, const Positions &pos, Positions &pos1) const
 
void greedy (Positions &pos) const
 
void trim_lazy (Positions &pos) const
 
void compile_transition (State *state, Follow &followpos, const Map &modifiers, const Map &lookahead, Moves &moves) const throw (Error)
 
void transition (Moves &moves, const Chars &chars, const Positions &follow) const
 
Char compile_esc (Location loc, Chars &chars) const throw (Error)
 
void compile_list (Location loc, Chars &chars, const Map &modifiers) const throw (Error)
 
void posix (size_t index, Chars &chars) const
 
void flip (Chars &chars) const
 
void assemble (State &start) throw (Error)
 
void compact_dfa (State &start)
 
void encode_dfa (State &start) throw (Error)
 
void gencode_dfa (const State &start) const
 
void gencode_dfa_closure (FILE *fd, const State *start, int nest) const
 
void delete_dfa (State &start)
 
void export_dfa (const State &start) const
 
void export_code (void) const
 
Location find_at (Location loc, char c) const
 
Char at (Location k) const
 
bool eq_at (Location loc, const char *s) const
 
Char escape_at (Location loc) const
 
Char escapes_at (Location loc, const char *escapes) const
 

Static Private Member Functions

static bool is_modified (int mode, const Map &modifiers, Location loc)
 
static bool is_meta (Char c)
 
static Opcode opcode_take (Index index)
 
static Opcode opcode_redo (void)
 
static Opcode opcode_tail (Index index)
 
static Opcode opcode_head (Index index)
 
static Opcode opcode_goto (Char lo, Char hi, Index index)
 
static Opcode opcode_halt (void)
 
static bool is_opcode_redo (Opcode opcode)
 
static bool is_opcode_take (Opcode opcode)
 
static bool is_opcode_tail (Opcode opcode)
 
static bool is_opcode_head (Opcode opcode)
 
static bool is_opcode_halt (Opcode opcode)
 
static bool is_opcode_meta (Opcode opcode)
 
static bool is_opcode_meta (Opcode opcode, Char a)
 
static bool is_opcode_match (Opcode opcode, unsigned char c)
 
static Char meta_of (Opcode opcode)
 
static Char lo_of (Opcode opcode)
 
static Char hi_of (Opcode opcode)
 
static Index index_of (Opcode opcode)
 

Private Attributes

Option opt_
 pattern compiler options

std::string rex_
 regular expression string

std::vector<Location> end_
 entries point to the subpattern's ending '|' or '\0'

std::vector<bool> acc_
 true if subpattern n is acceptable (state is reachable)

size_t vno_
 number of finite state machine vertices |V|

size_t eno_
 number of finite state machine edges |E|

const Opcode * opc_
 points to the opcode table

Index nop_
 number of opcodes generated

FSM fsm_
 function pointer to FSM code
 

Friends

class Matcher
 permit access by the reflex::Matcher engine
 

Detailed Description

Pattern class holds a regex pattern and its compiled FSM opcode table or code for the reflex::Matcher engine.

Member Typedef Documentation

typedef unsigned int reflex::Pattern::Char
private
typedef ORanges<Char> reflex::Pattern::Chars
private

represent (wide) char set as a set of ranges

typedef std::map<Position,Positions> reflex::Pattern::Follow
private
typedef void(* reflex::Pattern::FSM) (class Matcher &)

function pointer to FSM code

typedef uint16_t reflex::Pattern::Index

index into opcodes array Pattern::opc_ and subpattern indexing

typedef size_t reflex::Pattern::Location
private
typedef std::map<int,Ranges> reflex::Pattern::Map
private
typedef std::pair<Chars,Positions> reflex::Pattern::Move
private
typedef std::list<Move> reflex::Pattern::Moves
private
typedef uint32_t reflex::Pattern::Opcode

32 bit opcode word
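
To illustrate how a 32-bit opcode word can carry a character range together with a 16-bit Index, here is a minimal sketch. The bit layout and every `*_sketch` name are assumptions made for illustration; this page does not document the actual encoding used by opcode_goto, lo_of, hi_of, and index_of.

```cpp
#include <cstdint>

typedef uint16_t Index;     // as documented: index into the opcode table
typedef uint32_t Opcode;    // as documented: 32-bit opcode word
typedef unsigned int Char;  // as documented: character code

// Hypothetical layout (an assumption, not the library's documented encoding):
// bits 31..24 = lo, bits 23..16 = hi, bits 15..0 = target index.
constexpr Opcode goto_sketch(Char lo, Char hi, Index index) {
  return (static_cast<Opcode>(lo & 0xff) << 24) |
         (static_cast<Opcode>(hi & 0xff) << 16) |
         static_cast<Opcode>(index);
}

// Accessors that undo the packing above.
constexpr Char lo_of_sketch(Opcode opcode) { return opcode >> 24; }
constexpr Char hi_of_sketch(Opcode opcode) { return (opcode >> 16) & 0xff; }
constexpr Index index_of_sketch(Opcode opcode) {
  return static_cast<Index>(opcode & 0xffff);
}

// True when byte c falls inside the [lo, hi] range stored in the opcode.
constexpr bool match_sketch(Opcode opcode, unsigned char c) {
  return static_cast<Char>(c) >= lo_of_sketch(opcode) &&
         static_cast<Char>(c) <= hi_of_sketch(opcode);
}
```

Under this assumed layout, a 16-bit Index fills the low half of the word exactly, which is consistent with IMAX = 0xffff being the largest index.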

typedef std::set<Position> reflex::Pattern::Positions
private
typedef ORanges<Location> reflex::Pattern::Ranges
private
typedef std::set<Location> reflex::Pattern::Set
private

Member Enumeration Documentation

enum reflex::Pattern::Const

Common constants.

Enumerator
IMAX 

max index, also serves as a marker

enum reflex::Pattern::Meta
private

Meta characters.

Enumerator
META_MIN 
META_NWB 

non-word at begin \Bx

META_NWE 

non-word at end x\B

META_BWB 

begin of word at begin \<x

META_EWB 

end of word at begin \>x

META_BWE 

begin of word at end x\<

META_EWE 

end of word at end x\>

META_BOL 

begin of line ^

META_EOL 

end of line $

META_BOB 

begin of buffer \A

META_EOB 

end of buffer \Z

META_IND 

indent boundary \i

META_DED 

dedent boundary \j (must be the largest META code)

META_MAX 

max meta characters
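
The meta codes start at 0x100, just above the 8-bit character range 0x00 to 0xff, so a meta character can be told apart from an ordinary byte by a range test. The sketch below copies the enumerator values listed in the class summary; the function `is_meta_sketch` and its exact bounds are assumptions, not the documented body of is_meta.

```cpp
typedef unsigned int Char;  // as documented: character code

// Enumerator values copied from the Meta enumeration in the class summary.
enum Meta : unsigned int {
  META_MIN = 0x100,
  META_NWB = 0x101, META_NWE = 0x102, META_BWB = 0x103, META_EWB = 0x104,
  META_BWE = 0x105, META_EWE = 0x106, META_BOL = 0x107, META_EOL = 0x108,
  META_BOB = 0x109, META_EOB = 0x10a, META_IND = 0x10b, META_DED = 0x10c,
  META_MAX  // one past META_DED, the largest meta code
};

// Assumed test: a code is "meta" when it lies strictly between the
// META_MIN and META_MAX sentinels.
constexpr bool is_meta_sketch(Char c) {
  return c > META_MIN && c < META_MAX;
}
```

Because META_MAX is declared directly after META_DED, it takes the next value, which is why META_DED has to remain the largest meta code.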

Constructor & Destructor Documentation

reflex::Pattern::Pattern (const char *regex, const char *options = NULL) throw (Error)
inline explicit

Construct a pattern object given a regex string.

reflex::Pattern::Pattern (const char *regex, const std::string &options) throw (Error)
inline explicit

Construct a pattern object given a regex string.

reflex::Pattern::Pattern (const std::string &regex, const char *options = NULL) throw (Error)
inline explicit

Construct a pattern object given a regex string.

reflex::Pattern::Pattern (const std::string &regex, const std::string &options) throw (Error)
inline explicit

Construct a pattern object given a regex string.

reflex::Pattern::Pattern (const Opcode *code) throw (Error)
inline explicit

Construct a pattern object given an opcode table.

reflex::Pattern::Pattern (FSM fsm) throw (Error)
inline explicit

Construct a pattern object given a function pointer to FSM code.

virtual reflex::Pattern::~Pattern (void)
inline virtual

Destructor, deletes internal code array when owned and allocated.

Member Function Documentation

void reflex::Pattern::assemble (State &start) throw (Error)
private

Char reflex::Pattern::at (Location k) const
inline private

void reflex::Pattern::compact_dfa (State &start)
private

void reflex::Pattern::compile (State &start, Follow &followpos, const Map &modifiers, const Map &lookahead) throw (Error)
private

Char reflex::Pattern::compile_esc (Location loc, Chars &chars) const throw (Error)
private

void reflex::Pattern::compile_list (Location loc, Chars &chars, const Map &modifiers) const throw (Error)
private

void reflex::Pattern::compile_transition (State *state, Follow &followpos, const Map &modifiers, const Map &lookahead, Moves &moves) const throw (Error)
private

void reflex::Pattern::delete_dfa (State &start)
private

size_t reflex::Pattern::edges (void) const
inline

Get the number of finite state machine edges (transitions on input characters).

Returns
number of edges or 0 when no finite state machine was constructed by this pattern.

void reflex::Pattern::encode_dfa (State &start) throw (Error)
private

bool reflex::Pattern::eq_at (Location loc, const char *s) const
inline private

virtual void reflex::Pattern::error (enum Error::Code code, const char *message, size_t loc = 0) const throw (Error)
protected virtual

Throw an error.

Parameters
code     error code
message  a human-readable error message (not NULL)
loc      optional location of the error in regex string Pattern::rex_

Char reflex::Pattern::escape_at (Location loc) const
inline private

Char reflex::Pattern::escapes_at (Location loc, const char *escapes) const
inline private

void reflex::Pattern::export_code (void) const
private

void reflex::Pattern::export_dfa (const State &start) const
private

Location reflex::Pattern::find_at (Location loc, char c) const
inline private

void reflex::Pattern::flip (Chars &chars) const
private

void reflex::Pattern::gencode_dfa (const State &start) const
private

void reflex::Pattern::gencode_dfa_closure (FILE *fd, const State *start, int nest) const
private

void reflex::Pattern::greedy (Positions &pos) const
private

static Char reflex::Pattern::hi_of (Opcode opcode)
inline static private

static Index reflex::Pattern::index_of (Opcode opcode)
inline static private

void reflex::Pattern::init (const char *options) throw (Error)
private

Initialize the pattern at construction.

void reflex::Pattern::init_options (const char *options)
private

static bool reflex::Pattern::is_meta (Char c)
inline static private

static bool reflex::Pattern::is_modified (int mode, const Map &modifiers, Location loc)
inline static private

static bool reflex::Pattern::is_opcode_halt (Opcode opcode)
inline static private

static bool reflex::Pattern::is_opcode_head (Opcode opcode)
inline static private

static bool reflex::Pattern::is_opcode_match (Opcode opcode, unsigned char c)
inline static private

static bool reflex::Pattern::is_opcode_meta (Opcode opcode)
inline static private

static bool reflex::Pattern::is_opcode_meta (Opcode opcode, Char a)
inline static private

static bool reflex::Pattern::is_opcode_redo (Opcode opcode)
inline static private

static bool reflex::Pattern::is_opcode_tail (Opcode opcode)
inline static private

static bool reflex::Pattern::is_opcode_take (Opcode opcode)
inline static private

void reflex::Pattern::lazy (const Positions &lazypos, Positions &pos) const
private

void reflex::Pattern::lazy (const Positions &lazypos, const Positions &pos, Positions &pos1) const
private

static Char reflex::Pattern::lo_of (Opcode opcode)
inline static private

static Char reflex::Pattern::meta_of (Opcode opcode)
inline static private

size_t reflex::Pattern::nodes (void) const
inline

Get the number of finite state machine nodes (vertices).

Returns
number of nodes or 0 when no finite state machine was constructed by this pattern.

static Opcode reflex::Pattern::opcode_goto (Char lo, Char hi, Index index)
inline static private

static Opcode reflex::Pattern::opcode_halt (void)
inline static private

static Opcode reflex::Pattern::opcode_head (Index index)
inline static private

static Opcode reflex::Pattern::opcode_redo (void)
inline static private

static Opcode reflex::Pattern::opcode_tail (Index index)
inline static private

static Opcode reflex::Pattern::opcode_take (Index index)
inline static private

const std::string reflex::Pattern::operator[] (Index choice) const

Get subpattern of this pattern object.

Returns
subpattern string or "".

void reflex::Pattern::parse (Positions &startpos, Follow &followpos, Map &modifiers, Map &lookahead) throw (Error)
private

void reflex::Pattern::parse1 (bool begin, Location &loc, Positions &firstpos, Positions &lastpos, bool &nullable, Follow &followpos, Positions &lazypos, Map &modifiers, Ranges &lookahead, Index &iter) throw (Error)
private

void reflex::Pattern::parse2 (bool begin, Location &loc, Positions &firstpos, Positions &lastpos, bool &nullable, Follow &followpos, Positions &lazypos, Map &modifiers, Ranges &lookahead, Index &iter) throw (Error)
private

void reflex::Pattern::parse3 (bool begin, Location &loc, Positions &firstpos, Positions &lastpos, bool &nullable, Follow &followpos, Positions &lazypos, Map &modifiers, Ranges &lookahead, Index &iter) throw (Error)
private

void reflex::Pattern::parse4 (bool begin, Location &loc, Positions &firstpos, Positions &lastpos, bool &nullable, Follow &followpos, Positions &lazypos, Map &modifiers, Ranges &lookahead, Index &iter) throw (Error)
private

void reflex::Pattern::parse_esc (Location &loc) const throw (Error)
private

void reflex::Pattern::posix (size_t index, Chars &chars) const
private

bool reflex::Pattern::reachable (Index choice) const
inline

Check if subpattern is reachable by a match.

Returns
true if subpattern is reachable.

Index reflex::Pattern::size (void) const
inline

Number of subpatterns of this pattern object.

Returns
number of subpatterns.

void reflex::Pattern::transition (Moves &moves, const Chars &chars, const Positions &follow) const
private

void reflex::Pattern::trim_lazy (Positions &pos) const
private

size_t reflex::Pattern::words (void) const
inline

Get the code size in number of words.

Returns
number of words or 0 when no code was generated by this pattern.

Friends And Related Function Documentation

friend class Matcher
friend

permit access by the reflex::Matcher engine

Member Data Documentation

std::vector<bool> reflex::Pattern::acc_
private

true if subpattern n is acceptable (state is reachable)

std::vector<Location> reflex::Pattern::end_
private

entries point to the subpattern's ending '|' or '\0'
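
The idea of recording where each subpattern ends can be sketched as a scan for the next top-level '|' or the terminating '\0'. This is a simplified illustration, not the parser used by reflex::Pattern: the names `subpattern_end` and `count_subpatterns` are hypothetical, and the scan only accounts for backslash escapes, ( ) groups, and [ ] bracket lists (a bracket list that starts with ']' and any quoting constructs are not handled).

```cpp
#include <cstddef>

// Return the location of the first top-level '|' at or after loc, or the
// location of the terminating '\0' when there is none.
constexpr std::size_t subpattern_end(const char *regex, std::size_t loc) {
  int depth = 0;         // nesting depth of ( ... ) groups
  bool in_list = false;  // true while inside a [ ... ] bracket list
  for (; regex[loc] != '\0'; ++loc) {
    const char c = regex[loc];
    if (c == '\\' && regex[loc + 1] != '\0')
      ++loc;  // skip the escaped character
    else if (in_list)
      in_list = (c != ']');
    else if (c == '[')
      in_list = true;
    else if (c == '(')
      ++depth;
    else if (c == ')' && depth > 0)
      --depth;
    else if (c == '|' && depth == 0)
      return loc;
  }
  return loc;
}

// Count the alternatives separated by top-level '|' characters.
constexpr std::size_t count_subpatterns(const char *regex) {
  std::size_t n = 1;
  std::size_t loc = subpattern_end(regex, 0);
  while (regex[loc] == '|') {
    ++n;
    loc = subpattern_end(regex, loc + 1);
  }
  return n;
}
```

Collecting the successive return values of `subpattern_end` into a vector would give a table comparable in spirit to end_, from which a subpattern string can be cut out as the text between two consecutive end locations.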

size_t reflex::Pattern::eno_
private

number of finite state machine edges |E|

FSM reflex::Pattern::fsm_
private

function pointer to FSM code

Index reflex::Pattern::nop_
private

number of opcodes generated

const Opcode* reflex::Pattern::opc_
private

points to the opcode table

Option reflex::Pattern::opt_
private

pattern compiler options

std::string reflex::Pattern::rex_
private

regular expression string

size_t reflex::Pattern::vno_
private

number of finite state machine vertices |V|


The documentation for this class was generated from the following file: