class PERL5_REGULAR_EXPRESSION_BUILDER

All features

Direct parents

conformant parents

POSIX_REGULAR_EXPRESSION_BUILDER

Summary

creation features

exported features

options

scanning

parsing

parsing

parsing

parsing

parsing

sub parts of union

make

behaviours

internal behaviour

parsing

results

build

make

basic

error managment

scanning

assertions

character classes

character class naming

and/or basics

Details

has_extended_ligibility: BOOLEAN

Is the extended ligibility active?

set_extended_ligibility

Activate extended ligibility.

ensure

  • definition: has_extended_ligibility = True

set_no_extended_ligibility

Desactivate extended ligibility.

ensure

  • definition: has_extended_ligibility = False

set_default_options

Set the default options

ensure

  • not has_extended_ligibility
  • is_case_sensitive
  • not does_any_match_newline
  • does_match_text_boundary

has_unterminated_comment: BOOLEAN

was an unterminated comment sequence (?#... detected

skip_blanks_and_comments

Skips the blanks and comments when the extended legibility option is set.

require

  • has_no_error: not has_error

ensure

  • has_no_error: not has_error

read_character

Goto to the next character that is not a blank or a comment.

require

  • has_no_error: not has_error
  • not_at_end: not end_of_input

ensure

  • next_position: position > old position
  • has_no_error: not has_error

read_integer

Reads in 'last_integer' the current integer values and then goto to the next character that is not a blank or a comment.

require

  • has_no_error: not has_error
  • not_at_end: not end_of_input
  • begin_with_a_digit: last_character.is_decimal_digit

ensure

  • has_no_error: not has_error
  • digits_eaten: end_of_input or else not last_character.is_decimal_digit

emit_repeat (mini: INTEGER, maxi: INTEGER)

Takes the top of the stack and replace it with a construction that will evaluate the repeating of it from 'mini' to 'maxi' times. If current character is '?' it means that the repeat is not greedy.

require

  • enougth_data: stack.count > 0
  • mini_is_valid: mini >= 0 and then mini /= Repeat_infiny
  • maxi_is_valid: maxi = Repeat_infiny or else maxi >= mini
  • not_droping: mini = 0 implies maxi /= 0

ensure

  • constant_stack_count: stack.count = old stack.count
  • stack_not_empty: stack.count > 0

parse_alternative

Parses an alternative of sequences. alternative ::= sequence [ '|' sequence ]...

require

  • has_no_error: not has_error
  • not_at_end: not end_of_input

ensure

  • error_or_stack_incremented_by_one: has_error or else stack.count = old stack.count + 1
  • state_known: has_error or else end_of_input or else last_character = ')'

parse_group

Parses a group. A group is either a posix group or an extanded pattern group.

require

  • has_no_error: not has_error
  • not_at_end: not end_of_input
  • begin_with_open_parenthesis: last_character = '('

ensure

  • error_or_stack_incremented_by_one: has_error or else stack.count = old stack.count + 1

parse_escaped

Parses an escaped character. escaped ::= '\' CHARACTER

require

  • has_no_error: not has_error
  • not_at_end: not end_of_input
  • begin_with_escape: last_character = '\'

ensure

  • error_or_stack_incremented_by_one: has_error or else stack.count = old stack.count + 1

parse_union_factor

Parses a union factor. union_factor::= '[.' TEXT '.]' | '[:' CLASS ':]' | '[:<:]' | '[:>:]' | CHARACTER

require

  • has_no_error: not has_error
  • not_at_end: not end_of_input

internal_parse_escaped (in_union: BOOLEAN)

require

  • has_no_error: not has_error
  • not_at_end: not end_of_input
  • begin_with_escape: last_character = '\'

parse_posix_indication
parse_extended_pattern

require

  • has_no_error: not has_error
  • not_at_end: not end_of_input
  • begin_with_open_parenthesis: last_character = '('
  • followed_with_question_mark: valid_next_character and next_character = '?'

parse_looking (ahead: BOOLEAN)

require

  • has_no_error: not has_error
  • not_at_end: not end_of_input
  • begin_with: last_character = '=' or else last_character = '!'

read_modifiers (level: BOOLEAN)

require

  • has_no_error: not has_error

ensure

  • has_no_error: not has_error

internal_parse

Main parse of a POSIX regular expression.

require

  • at_first_position: position = expression.lower
  • stack_is_empty: stack.is_empty
  • no_groups: last_group_count = 0 and group_stack.is_empty

ensure

  • error_or_result: has_error or else stack.count = 1 and group_stack.is_empty

parse_sequence

Parses a sequence of terms. sequence ::= term [ term ]...

require

  • has_no_error: not has_error
  • not_at_end: not end_of_input
  • end_excluded: last_character /= '|' and then last_character /= ')'

ensure

  • error_or_stack_incremented_by_one: has_error or else stack.count = old stack.count + 1
  • state_known: has_error or else end_of_input or else last_character = '|' or else last_character = ')'

parse_term

Parses a term. term ::= factor [ repeat-spec ] repeat-spec ::= '?' | '*' | '+' | '{' integer [',' [integer]] '}'

require

  • has_no_error: not has_error
  • not_at_end: not end_of_input
  • end_excluded: last_character /= '|' and then last_character /= ')'

ensure

  • error_or_stack_incremented_by_one: has_error or else stack.count = old stack.count + 1

parse_factor

Parses a factor. factor ::= group | union | '.' | '^' | '$' | escaped | text

require

  • has_no_error: not has_error
  • not_at_end: not end_of_input
  • end_excluded: last_character /= '|' and then last_character /= ')'

ensure

  • error_or_stack_incremented_by_one: has_error or else stack.count = old stack.count + 1

parse_text

Parses a text. text ::= A SEQUENCE NOT FOLLOWED BY EITHER '*', '+', '?', '{' OF NOT ESCAPED CHARACTERS

require

  • has_no_error: not has_error
  • not_at_end: not end_of_input
  • current_character_is_valid: valid_last_character and then not (once "([.^$\*+?{").has(last_character)

ensure

  • error_or_stack_incremented_by_one: has_error or else stack.count = old stack.count + 1

parse_union

Parses a union. union ::= '[' ['^'] union_term... ']'

require

  • has_no_error: not has_error
  • not_at_end: not end_of_input
  • begin_with_open_bracket: last_character = '['

ensure

  • error_or_stack_incremented_by_one: has_error or else stack.count = old stack.count + 1

parse_union_term

Parses a union term. union_term ::= union_factor ['-' union_factor]

require

  • has_no_error: not has_error
  • not_at_end: not end_of_input

ensure

  • error_or_stack_incremented_by_one: has_error or else stack.count = old stack.count + 1

read_embedded

Parses the text embedded in one of '[.' TEXT '.]', '[:' TEXT ':]' or '[=' TEXT '=]'. The parsed text is put in feature 'last_string'.

require

  • has_no_error: not has_error
  • not_at_end: not end_of_input
  • previous_character_is_open_brace: valid_previous_character and then previous_character = '['
  • current_character_is_valid: valid_last_character and then (once ".:=").has(last_character)

recorded_character: CHARACTER

Last union_factor's character recorded.

recorded_item: BACKTRACKING_NODE

Last union_factor's item (complex expression) recorded.

set_recorded_character (value: CHARACTER)

Records the union_factor's character 'value'.

ensure

  • recorded_item = Void
  • recorded_character = value

set_recorded_item (value: BACKTRACKING_NODE)

Records the union_factor's item (complex expression) 'value'.

require

  • item_not_void: value /= Void

ensure

  • recorded_item /= Void
  • recorded_item = value

emit_recorded

Emits the last union_factor's recorded character or item, depending of its kind.

ensure

  • incremented_by_one: stack.count = old stack.count + 1

make

Initialise the attributes.

is_case_insensitive: BOOLEAN

Is the match case insensitive? Default is False

is_case_sensitive: BOOLEAN

Is the match case sensitive? Default is True

set_case_sensitive

Set the match as case sensitive.

ensure

  • definition: is_case_insensitive = False and is_case_sensitive = True

set_case_insensitive

Set the match as case insensitive.

ensure

  • definition: is_case_insensitive = True and is_case_sensitive = False

does_any_match_newline: BOOLEAN

Does the "any character" mark match a newline? Default is False

set_any_match_newline

The "any character" mark will match a newline.

ensure

  • definition: does_any_match_newline = True

set_any_dont_match_newline

The "any character" mark will not match a newline.

ensure

  • definition: does_any_match_newline = False

does_match_line_boundary: BOOLEAN

Does the begin/end marks match line boundary? Default is False

does_match_text_boundary: BOOLEAN

Does the begin/end marks match text boundary? Default is True

ensure

  • definition: Result = not does_match_line_boundary

set_match_line_boundary

The begin/end marks will match line boundary.

ensure

  • definition: does_match_line_boundary = True and does_match_text_boundary = False

set_match_text_boundary

The begin/end marks will match text boundary.

ensure

  • definition: does_match_line_boundary = False and does_match_text_boundary = True

is_greedy: BOOLEAN

Does match a maximal repeat? Default is False

set_greedy

Will match a maximal repeat.

ensure

  • definition: is_greedy = True

set_not_greedy

Will match a minimal repeat.

ensure

  • definition: is_greedy = False

is_looking_ahead: BOOLEAN

Is building a look-ahead term?

is_looking_behind: BOOLEAN

Is building a look-behind term?

is_looking_around: BOOLEAN

Is building look-ahead or look-behind?

is_looking_positive: BOOLEAN

Is the current look-around positive or negative?

parse_expression (expr: STRING)

Set the expression to parse and parse it. When no error the result if put in feature 'last_regular_expression'. If there is an error, a human readable explanation is retrievable by the feature 'last_error'.

require

  • expression_not_void: expr /= Void

ensure

  • error_or_result: has_error xor has_result

parse

Parse the current expression. The result if any is got through 'last_regular_expression'

require

  • expression_not_void: expression /= Void
  • internal_state_ok: stack.is_empty and group_stack.is_empty

ensure

  • internal_state_ok: stack.is_empty and group_stack.is_empty
  • error_or_result: has_error xor has_result

has_result: BOOLEAN

Did the last 'parse' or 'parse_expression' produced a result in 'last_regular_expression'?

ensure

  • definition: Result = last_pattern.is_valid

last_pattern: BACKTRACKING_REGULAR_EXPRESSION_PATTERN

The last regular expression pattern built by 'parse' or 'parse_expression'

stack: FAST_ARRAY [E_][BACKTRACKING_NODE]

The stack of items.

group_stack: FAST_ARRAY [E_][INTEGER]

The stack of groups

last_group_count: INTEGER

The count of groups currently found.

Repeat_infiny: INTEGER

Constant that means "infinite repitition".

emit (item: BACKTRACKING_NODE)

Pushs 'item' on the stack. [..] -> [.., item]

require

  • item_not_void: item /= Void

ensure

  • stack_count_increased_by_one: stack.count = old stack.count + 1
  • stack_not_empty: stack.count > 0

unemit: BACKTRACKING_NODE

Pops the Result the stack. [... Result] -> [...]

require

  • stack_not_empty: stack.count > 0

ensure

  • stack_count_decreased_by_one: stack.count = old stack.count - 1

emit_any_character

Push the match to any characters

emit_begin_of_line

Push the match to begin of a line

emit_end_of_line

Push the match to end of a line

prepare_group

Declares that a new group begins.

ensure

  • last_group_count_increased_by_one: last_group_count = old last_group_count + 1
  • group_greater_than_zero: last_group_count > 0
  • group_stack_count_increased_by_one: group_stack.count = old group_stack.count + 1
  • group_pushed: group_stack.last = last_group_count

emit_group

Push the "end of group" item and update the group indicators [.. X] -> [.., end_group(i)]

require

  • group_greater_than_zero: last_group_count > 0
  • group_stack_not_empty: not group_stack.is_empty
  • enougth_data: stack.count > 0

ensure

  • constant_stack_count: stack.count = old stack.count
  • stack_not_empty: stack.count > 0
  • last_group_count_unchanged: last_group_count = old last_group_count
  • group_stack_count_decreased_by_one: group_stack.count = old group_stack.count - 1

emit_begin_group
This feature is obsolete: use declare_group/emit_group
emit_end_group
This feature is obsolete: use declare_group/emit_group
emit_match_previous_group (group: INTEGER)

Push the item that matches the character 'char' [..] -> [.., previous_group(group)]

require

  • valid_group: 0 < group and group <= last_group_count
  • closed_group: not group_stack.has(group)

ensure

  • stack_count_increased_by_one: stack.count = old stack.count + 1
  • stack_not_empty: stack.count > 0

emit_match_single (char: CHARACTER)

Push the item that matches the character 'char' [..] -> [.., char]

ensure

  • stack_count_increased_by_one: stack.count = old stack.count + 1
  • stack_not_empty: stack.count > 0

emit_match_range (lower: CHARACTER, upper: CHARACTER)

Push the item that matches the character range 'lower'..'upper'. [..] -> [.., lower..upper]

require

  • valid_range: lower <= upper

ensure

  • stack_count_increased_by_one: stack.count = old stack.count + 1
  • stack_not_empty: stack.count > 0

emit_match_text (text: STRING)

Push the item that matches the 'text' [..] -> [.., text]

ensure

  • stack_count_increased_by_one: stack.count = old stack.count + 1
  • stack_not_empty: stack.count > 0

begin_collect

Begin to collect a collection of items by pushing Void on the stack. After calling 'begin_collect', one of the features 'end_collect_or' or 'end_collect_and' have to be called. That kind of group is intended to manage the collections of alternatives or sequences in an optimal way. [..] -> [.., Void]

ensure

  • has_collect: stack.fast_occurrences(Void) > 0
  • emit_group_empty: stack.last = Void
  • emit_group_count_incremented: stack.fast_occurrences(Void) = old stack.fast_occurrences(Void) + 1

is_collect_empty: BOOLEAN

True if currently begun collect is empty

require

  • is_collecting: stack.fast_occurrences(Void) > 0

ensure

  • definition: Result = (stack.last = Void)

end_collect_true

Replace an empty collection by TRUE [.., Void] -> [.., TRUE]

require

  • is_collecting: stack.fast_occurrences(Void) > 0
  • collect_empty: is_collect_empty

end_collect_or

Collects the item on the stack until the collect mark (a Void) and replace it by a single item that is a or of all of them. The collection must not be empty. The order of evaluation will remain. The binary or's tree is recurssive on right for efficiency. [.., Void, X] -> [.., X] [.., Void, Y, X] -> [.., Y or X] [.., Void, Z, Y, X] -> [.., Z or (Y or X)] ...

require

  • is_collecting: stack.fast_occurrences(Void) > 0
  • collect_not_empty: not is_collect_empty

ensure

  • stack_not_empty: stack.count > 0
  • emit_group_count_decremented: stack.fast_occurrences(Void) = old stack.fast_occurrences(Void) - 1

end_collect_and

Collects the item on the stack until the collect mark (a Void) and replace it by a single item that is a and of all of them. The collection must not be empty. The order of evaluation will remain. The binary and's tree is recursive on right for efficiency. [.., Void, X] -> [.., X] [.., Void, Y, X] -> [.., Y and X] [.., Void, Z, Y, X] -> [.., Z and (Y and X)] ...

require

  • is_collecting: stack.fast_occurrences(Void) > 0
  • collect_not_empty: not is_collect_empty

ensure

  • stack_not_empty: stack.count > 0
  • emit_group_count_decremented: stack.fast_occurrences(Void) = old stack.fast_occurrences(Void) - 1

emit_not

Replaces the top of the stack by its negation. [.., X] -> [.., not(X)] (where not(X) is like (X and (CUT and FALSE)) or TRUE)

require

  • enougth_data: stack.count > 0

ensure

  • constant_stack_count: stack.count = old stack.count
  • stack_not_empty: stack.count > 0

emit_not_then_any

Replaces the top of the stack by its negation followed by any. [.., X] -> [.., not(X)] (where not(X) is like (X and (CUT and FALSE)) or ANY)

require

  • enougth_data: stack.count > 0

ensure

  • constant_stack_count: stack.count = old stack.count
  • stack_not_empty: stack.count > 0

emit_true_or

Replaces the top of the stack by true or it [.., X] -> [.., true or X]

require

  • enougth_data: stack.count > 0

ensure

  • constant_stack_count: stack.count = old stack.count
  • stack_not_empty: stack.count > 0

emit_or_true

Replaces the top of the stack by it or true [.., X] -> [.., X or true]

require

  • enougth_data: stack.count > 0

ensure

  • constant_stack_count: stack.count = old stack.count
  • stack_not_empty: stack.count > 0

emit_controled_or_true

Replaces the top of the stack by if is_greedy then [.., X] -> [.., X or true]

             else [.., X] -> [.., true or X]

controled_or_true_item (x: BACKTRACKING_NODE): BACKTRACKING_NODE

Returns an item for " 'x' or true ". The returned item depend on the flag 'is_greedy'. if is_greedy then Result = (X or true)

             else Result = (true or X)

emit_looking

require

  • enougth_data: stack.count > 0
  • is_looking: is_looking_around

ensure

  • constant_stack_count: stack.count = old stack.count
  • stack_not_empty: stack.count > 0

make

Initialise the attributes.

scanned_string: STRING

The expression being currently build.

set_scanned_string (string: STRING)

Set the 'scanned_string' with 'string'.

ensure

  • has_no_error: not has_error
  • definition: scanned_string = string
  • at_the_begin: position = scanned_string.lower

has_error: BOOLEAN

True when an error was encountered

clear_error

Remove the error flag

ensure

  • has_no_error: not has_error

last_error: STRING

Returns a string recorded for the error.

require

  • has_error: has_error

ensure

  • not_void: Result /= Void

set_error (message: STRING)

Set has_error and last_error. The explaining error string 'last_error' is created as follow: "Error at position 'position': 'message'.".

require

  • message_not_void: message /= Void
  • has_no_error: not has_error

ensure

  • has_error: has_error

position: INTEGER

The scanned position. It is the position of 'last_character'.

last_character: CHARACTER

The scanned character. The last character readden from 'scanned_string'.

valid_last_character: BOOLEAN

True when 'last_character' is valid. Is like 'scanned_string.valid_index(position)'

valid_previous_character: BOOLEAN

True if the position-1 is a valid position.

require

  • scanned_string /= Void

ensure

  • definition: Result = scanned_string.valid_index(position - 1)

previous_character: CHARACTER

The character at position-1.

require

  • valid_previous_character

ensure

  • definition: Result = scanned_string.item(position - 1)

valid_next_character: BOOLEAN

True if the position+1 is a valid position.

require

  • scanned_string /= Void

ensure

  • definition: Result = scanned_string.valid_index(position + 1)

next_character: CHARACTER

The character at position+1.

require

  • valid_next_character

ensure

  • definition: Result = scanned_string.item(position + 1)

end_of_input: BOOLEAN

True when all the characters of 'scanned_string' are scanned.

ensure

  • implies_last_character_not_valid: Result implies not valid_last_character

goto_position (pos: INTEGER)

Change the currently scanned position to 'pos'. Updates 'last_character' and 'valid_last_character' to reflect the new position value.

require

  • has_no_error: not has_error
  • scanned_string /= Void

ensure

  • has_no_error: not has_error
  • position_set: position = pos
  • validity_updated: valid_last_character = scanned_string.valid_index(position)
  • character_updated: valid_last_character implies last_character = scanned_string.item(position)

read_character

Reads the next character.

require

  • has_no_error: not has_error
  • not_at_end: not end_of_input

ensure

  • next_position: position > old position
  • has_no_error: not has_error

read_integer

Reads an integer value beginning at the currently scanned position. The readen value is stored in 'last_integer'.

require

  • has_no_error: not has_error
  • not_at_end: not end_of_input
  • begin_with_a_digit: last_character.is_decimal_digit

ensure

  • has_no_error: not has_error
  • digits_eaten: end_of_input or else not last_character.is_decimal_digit

saved_position: INTEGER

The saved position (only one is currently enougth).

save_position

Saves the current scanning position.

require

  • not_at_end: not end_of_input

ensure

  • not_at_end: not end_of_input
  • position_kept: position = old position
  • saved_position_set: saved_position = position

restore_saved_position

Restore the scanning position to the last saved one.

ensure

  • position_restored: position = old saved_position
  • not_at_end: not end_of_input

last_string: STRING

A string buffer.

last_integer: INTEGER

An integer buffer.

the_any_character_item: REGULAR_EXPRESSION_ITEM_ANY
the_not_end_of_line_item: REGULAR_EXPRESSION_ITEM_NOT_END_OF_LINE
the_begin_of_line_item: REGULAR_EXPRESSION_ITEM_BEGIN_OF_LINE
the_end_of_line_item: REGULAR_EXPRESSION_ITEM_END_OF_LINE
the_begin_of_text_item: REGULAR_EXPRESSION_ITEM_BEGIN_OF_TEXT
the_real_end_of_text_item: REGULAR_EXPRESSION_ITEM_END_OF_TEXT
the_end_of_text_item: REGULAR_EXPRESSION_ITEM_END_OF_TEXT
the_begin_of_word_item: REGULAR_EXPRESSION_ITEM_BEGIN_OF_WORD
the_end_of_word_item: REGULAR_EXPRESSION_ITEM_END_OF_WORD
the_is_posix_alnum_item: REGULAR_EXPRESSION_ITEM_IS_POSIX_ALNUM
the_is_posix_alpha_item: REGULAR_EXPRESSION_ITEM_IS_POSIX_ALPHA
the_is_posix_ascii_item: REGULAR_EXPRESSION_ITEM_IS_POSIX_ASCII
the_is_posix_blank_item: REGULAR_EXPRESSION_ITEM_IS_POSIX_BLANK
the_is_posix_cntrl_item: REGULAR_EXPRESSION_ITEM_IS_POSIX_CNTRL
the_is_posix_digit_item: REGULAR_EXPRESSION_ITEM_IS_POSIX_DIGIT
the_is_posix_graph_item: REGULAR_EXPRESSION_ITEM_IS_POSIX_GRAPH
the_is_posix_lower_item: REGULAR_EXPRESSION_ITEM_IS_POSIX_LOWER
the_is_posix_print_item: REGULAR_EXPRESSION_ITEM_IS_POSIX_PRINT
the_is_posix_punct_item: REGULAR_EXPRESSION_ITEM_IS_POSIX_PUNCT
the_is_posix_space_item: REGULAR_EXPRESSION_ITEM_IS_POSIX_SPACE
the_is_posix_upper_item: REGULAR_EXPRESSION_ITEM_IS_POSIX_UPPER
the_is_posix_word_item: REGULAR_EXPRESSION_ITEM_IS_POSIX_WORD
the_is_posix_xdigit_item: REGULAR_EXPRESSION_ITEM_IS_POSIX_XDIGIT
has_named_posix_item (name: STRING): BOOLEAN

True if 'name' is for a valid posix character class

require

  • name_not_void: name /= Void

named_posix_item (name: STRING): REGULAR_EXPRESSION_ITEM

the item for the valid posix character class 'name'

require

  • name_not_void: name /= Void
  • good_name: has_named_posix_item(name)

ensure

  • good_result: Result /= Void

has_named_perl_item (name: STRING): BOOLEAN

True if 'name' is for a valid perl character class

require

  • name_not_void: name /= Void

named_perl_item (name: STRING): REGULAR_EXPRESSION_ITEM

the item for the valid perl character class 'name'

require

  • name_not_void: name /= Void
  • good_name: has_named_perl_item(name)

ensure

  • good_result: Result /= Void

internal_named_posix_item (name: STRING): REGULAR_EXPRESSION_ITEM

the item for a presumed posix character class 'name'

require

  • name_not_void: name /= Void

internal_named_perl_item (name: STRING): REGULAR_EXPRESSION_ITEM

the item for a presumed perl character class 'name'

require

  • name_not_void: name /= Void

the_cut_node: BACKTRACKING_NODE_CUT
the_true_node: BACKTRACKING_NODE_TRUE
the_false_node: BACKTRACKING_NODE_FALSE
the_cut_and_false_node: BACKTRACKING_NODE_CUT_AND_FALSE

Class invariant