Class TransliteratorParser

java.lang.Object
com.ibm.icu.text.TransliteratorParser

class TransliteratorParser extends Object
  • Field Details

    • dataVector

      public List<RuleBasedTransliterator.Data> dataVector
      PUBLIC data member. A list of RuleBasedTransliterator.Data objects, one for each discrete group of rules in the rule set.
    • idBlockVector

      public List<String> idBlockVector
      PUBLIC data member. A list of Strings containing all of the ID blocks in the rule set.
    • curData

      The current data object for which we are parsing rules
    • compoundFilter

      public UnicodeSet compoundFilter
      PUBLIC data member containing the parsed compound filter, if any.
    • direction

      private int direction
    • parseData

      private TransliteratorParser.ParseData parseData
      Temporary symbol table used during parsing.
    • variablesVector

      private List<Object> variablesVector
      Temporary list of set variables. When parsing is complete, this is copied into the array data.variables. As with data.variables, element 0 corresponds to the character data.variablesBase.
    • variableNames

      private Map<String,char[]> variableNames
      Temporary table of variable names. When parsing is complete, this is copied into data.variableNames.
    • segmentStandins

      private StringBuffer segmentStandins
      String of stand-ins for segments. Used during the parsing of a single rule. segmentStandins.charAt(0) is the stand-in for "$1" and corresponds to the StringMatcher object segmentObjects.get(0), and so on.
    • segmentObjects

      private List<StringMatcher> segmentObjects
      List of StringMatcher objects for segments. Used during the parsing of a single rule. segmentObjects.get(0) is the StringMatcher for "$1" and corresponds to the stand-in segmentStandins.charAt(0), and so on.
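      The pairing between segmentStandins and segmentObjects can be pictured with a minimal sketch. The class name SegmentTable, the starting stand-in U+E000, and the use of plain Strings in place of StringMatcher objects are assumptions made for illustration; this is not the parser's own code.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: two parallel, 0-indexed structures addressed by a 1-based segment number.
final class SegmentTable {
    private final StringBuilder standins = new StringBuilder(); // index i holds the stand-in for segment i+1
    private final List<String> objects = new ArrayList<>();     // index i holds the object for segment i+1
    private char next = 0xE000;                                  // assumed first free stand-in

    // Return the stand-in for segment seg (1-based), allocating one on first use.
    char getSegmentStandin(int seg) {
        if (standins.length() < seg) {
            standins.setLength(seg); // new slots are filled with NUL characters
        }
        char c = standins.charAt(seg - 1);
        if (c == 0) {
            c = next++;
            standins.setCharAt(seg - 1, c);
        }
        return c;
    }

    // Record the object for segment seg (1-based), growing the list as needed.
    void setSegmentObject(int seg, String matcher) {
        while (objects.size() < seg) {
            objects.add(null);
        }
        objects.set(seg - 1, matcher);
    }

    // Look up the object for segment seg (1-based).
    String getSegmentObject(int seg) {
        return objects.get(seg - 1);
    }
}
```

      In this sketch, stand-ins are handed out in the order segments are first requested, so segment numbers and stand-in values need not be in the same order.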
    • variableNext

      private char variableNext
      The next available stand-in for variables. This starts at some point in the private use area (discovered dynamically) and increments up toward variableLimit. At any point during parsing, available variables are variableNext..variableLimit-1.
    • variableLimit

      private char variableLimit
      The exclusive upper limit of the available stand-ins for variables. This is discovered dynamically. At any point during parsing, available variables are variableNext..variableLimit-1. During variable definition we use the special value variableLimit-1 as a placeholder.
    • undefinedVariableName

      private String undefinedVariableName
      When we encounter an undefined variable, we do not immediately signal an error, in case we are defining this variable, e.g., "$a = [a-z];". Instead, we save the name of the undefined variable, and substitute in the placeholder char variableLimit - 1, and decrement variableLimit.
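      The interplay of variableNext, variableLimit, and the placeholder can be sketched as a pool over a half-open range. The names StandInPool, allocate, takePlaceholder, and releasePlaceholder are hypothetical, and giving the placeholder back after the definition has been processed is an assumption rather than something stated above.

```java
// Hypothetical sketch of a stand-in pool over the half-open range [next, limit).
final class StandInPool {
    private char next;  // next free stand-in, grows upward
    private char limit; // one past the last free stand-in

    // start and end are inclusive, as in setVariableRange(start, end).
    StandInPool(int start, int end) {
        next = (char) start;
        limit = (char) (end + 1);
    }

    // Hand out the next free stand-in from the bottom of the range.
    char allocate() {
        if (next >= limit) {
            throw new IllegalStateException("variable range exhausted");
        }
        return next++;
    }

    // Reserve the top of the range (limit - 1) as a placeholder and shrink the range.
    char takePlaceholder() {
        if (next >= limit) {
            throw new IllegalStateException("variable range exhausted");
        }
        return --limit;
    }

    // Assumption: the placeholder is returned to the pool once it is no longer needed.
    void releasePlaceholder() {
        ++limit;
    }

    // Number of stand-ins still available.
    int available() {
        return limit - next;
    }
}
```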
    • dotStandIn

      private int dotStandIn
      The stand-in character for the 'dot' set, represented by '.' in patterns. This is allocated the first time it is needed, and reused thereafter.
    • ID_TOKEN

      private static final String ID_TOKEN
    • ID_TOKEN_LEN

      private static final int ID_TOKEN_LEN
    • VARIABLE_DEF_OP

      private static final char VARIABLE_DEF_OP
    • FORWARD_RULE_OP

      private static final char FORWARD_RULE_OP
    • REVERSE_RULE_OP

      private static final char REVERSE_RULE_OP
    • FWDREV_RULE_OP

      private static final char FWDREV_RULE_OP
    • OPERATORS

      private static final String OPERATORS
    • HALF_ENDERS

      private static final String HALF_ENDERS
    • QUOTE

      private static final char QUOTE
    • ESCAPE

      private static final char ESCAPE
    • END_OF_RULE

      private static final char END_OF_RULE
    • RULE_COMMENT_CHAR

      private static final char RULE_COMMENT_CHAR
    • CONTEXT_ANTE

      private static final char CONTEXT_ANTE
    • CONTEXT_POST

      private static final char CONTEXT_POST
    • CURSOR_POS

      private static final char CURSOR_POS
    • CURSOR_OFFSET

      private static final char CURSOR_OFFSET
    • ANCHOR_START

      private static final char ANCHOR_START
    • KLEENE_STAR

      private static final char KLEENE_STAR
    • ONE_OR_MORE

      private static final char ONE_OR_MORE
    • ZERO_OR_ONE

      private static final char ZERO_OR_ONE
    • DOT

      private static final char DOT
    • DOT_SET

      private static final String DOT_SET
    • SEGMENT_OPEN

      private static final char SEGMENT_OPEN
    • SEGMENT_CLOSE

      private static final char SEGMENT_CLOSE
    • FUNCTION

      private static final char FUNCTION
    • ALT_REVERSE_RULE_OP

      private static final char ALT_REVERSE_RULE_OP
    • ALT_FORWARD_RULE_OP

      private static final char ALT_FORWARD_RULE_OP
    • ALT_FWDREV_RULE_OP

      private static final char ALT_FWDREV_RULE_OP
    • ALT_FUNCTION

      private static final char ALT_FUNCTION
    • ILLEGAL_TOP

      private static UnicodeSet ILLEGAL_TOP
    • ILLEGAL_SEG

      private static UnicodeSet ILLEGAL_SEG
    • ILLEGAL_FUNC

      private static UnicodeSet ILLEGAL_FUNC
  • Constructor Details

    • TransliteratorParser

      public TransliteratorParser()
      Constructor.
  • Method Details

    • parse

      public void parse(String rules, int dir)
      Parse a set of rules. After the parse completes, examine the public data members (dataVector, idBlockVector, and compoundFilter) for results.
    • parseRules

      void parseRules(TransliteratorParser.RuleBody ruleArray, int dir)
      Parse an array of zero or more rules. The strings in the array are treated as if they were concatenated together, with rule terminators inserted between array elements if not present already. Any previous rules are discarded. Typically this method is called exactly once, during construction. The member this.data will be set to null if there are no rules.
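      The concatenation behavior described above can be sketched as follows. The class RuleJoiner is hypothetical, and the terminator is assumed to be ';'. The sketch works on a plain String array rather than a RuleBody and does not account for a final ';' that is quoted or escaped.

```java
// Hypothetical sketch: join rule strings, adding a terminator only where one is missing.
final class RuleJoiner {
    static final char END_OF_RULE = ';'; // assumed rule terminator

    static String join(String[] rules) {
        StringBuilder sb = new StringBuilder();
        for (String rule : rules) {
            String trimmed = rule.trim();
            if (trimmed.isEmpty()) {
                continue; // nothing to parse in this element
            }
            sb.append(trimmed);
            if (trimmed.charAt(trimmed.length() - 1) != END_OF_RULE) {
                sb.append(END_OF_RULE);
            }
        }
        return sb.toString();
    }
}
```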
      Throws:
      IllegalIcuArgumentException - if there is a syntax error in the rules
    • parseRule

      private int parseRule(String rule, int pos, int limit)
      MAIN PARSER. Parse the next rule in the given rule string, starting at pos. Return the index after the last character parsed. Do not parse characters at or after limit. Important: The character at pos must be a non-whitespace character that is not the comment character. This method handles quoting, escaping, and whitespace removal. It parses the end-of-rule character. It recognizes context and cursor indicators. Once it does a lexical breakdown of the rule at pos, it creates a rule object and adds it to our rule list. This method is tightly coupled to the inner class RuleHalf.
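      The lexical part of this description (whitespace removal, quoting, escaping, and consuming the end-of-rule character) can be illustrated with a much-reduced sketch. The class RuleScanner is hypothetical, and the values assumed here for QUOTE, ESCAPE, and END_OF_RULE are an apostrophe, a backslash, and a semicolon. The sketch copies an escaped character literally, treats an empty pair of quotes as a literal quote, and ignores context markers, cursors, segments, and operators entirely.

```java
// Hypothetical, reduced sketch of the lexical pass over one rule.
final class RuleScanner {
    static final char QUOTE = '\'';      // assumed value
    static final char ESCAPE = '\\';     // assumed value
    static final char END_OF_RULE = ';'; // assumed value

    // Text of the rule after whitespace removal, unquoting, and unescaping.
    final StringBuilder text = new StringBuilder();

    // Scan one rule starting at pos; return the index after the last character consumed.
    int scan(String rule, int pos, int limit) {
        while (pos < limit) {
            char c = rule.charAt(pos++);
            if (Character.isWhitespace(c)) {
                continue; // unquoted whitespace is dropped
            }
            if (c == END_OF_RULE) {
                return pos; // the terminator is consumed but not stored
            }
            if (c == ESCAPE) {
                if (pos == limit) {
                    throw new IllegalArgumentException("Trailing backslash");
                }
                text.append(rule.charAt(pos++)); // escaped character taken literally
                continue;
            }
            if (c == QUOTE) {
                int close = rule.indexOf(QUOTE, pos);
                if (close < 0 || close >= limit) {
                    throw new IllegalArgumentException("Unterminated quote");
                }
                if (close == pos) {
                    text.append(QUOTE); // an empty pair of quotes stands for one literal quote
                } else {
                    text.append(rule, pos, close); // quoted text is kept verbatim
                }
                pos = close + 1;
                continue;
            }
            text.append(c);
        }
        return pos; // no terminator found before limit
    }
}
```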
    • setVariableRange

      private void setVariableRange(int start, int end)
      Set the variable range to [start, end] (inclusive).
    • checkVariableRange

      private void checkVariableRange(int ch, String rule, int start)
      Assert that the given character is NOT within the variable range. If it is, signal an error. This is necessary to ensure that the variable range does not overlap characters used in a rule.
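      A minimal sketch of the inclusive range and the overlap check, assuming a hypothetical class VariableRangeCheck and using IllegalArgumentException in place of the parser's own error reporting:

```java
// Hypothetical sketch of an inclusive variable range and the overlap check.
final class VariableRangeCheck {
    private int start = -1; // inclusive lower bound; -1 means no range set
    private int end = -1;   // inclusive upper bound

    void setVariableRange(int start, int end) {
        if (start < 0 || start > end || end > 0xFFFF) {
            throw new IllegalArgumentException("Invalid variable range " + start + ", " + end);
        }
        this.start = start;
        this.end = end;
    }

    // Signal an error if ch, a character used literally in a rule, falls inside the range.
    void checkVariableRange(int ch, String rule, int ruleStart) {
        if (ch >= start && ch <= end) {
            throw new IllegalArgumentException(
                "Variable range character in rule starting at offset " + ruleStart + ": " + rule);
        }
    }
}
```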
    • pragmaMaximumBackup

      private void pragmaMaximumBackup(int backup)
      Set the maximum backup to 'backup', in response to a pragma statement.
    • pragmaNormalizeRules

      private void pragmaNormalizeRules(Normalizer.Mode mode)
      Begin normalizing all rules using the given mode, in response to a pragma statement.
    • resemblesPragma

      static boolean resemblesPragma(String rule, int pos, int limit)
      Return true if the given rule looks like a pragma.
      Parameters:
      pos - offset to the first non-whitespace character of the rule.
      limit - offset just past the last character of the rule.
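      A cheap pre-check of this kind might look like the sketch below. It assumes that a pragma begins with the keyword "use" followed by whitespace, as in "use maximum backup 16;". That form, the case-insensitive match, and the class name PragmaSniffer are assumptions of the sketch, not statements about the actual implementation.

```java
// Hypothetical sketch: a rule "looks like" a pragma if it starts with "use" plus whitespace.
final class PragmaSniffer {
    private static final String KEYWORD = "use"; // assumed pragma keyword

    static boolean resemblesPragma(String rule, int pos, int limit) {
        int afterKeyword = pos + KEYWORD.length();
        return afterKeyword < limit
            && rule.regionMatches(true, pos, KEYWORD, 0, KEYWORD.length())
            && Character.isWhitespace(rule.charAt(afterKeyword));
    }
}
```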
    • parsePragma

      private int parsePragma(String rule, int pos, int limit)
      Parse a pragma. This method assumes resemblesPragma() has already returned true.
      Parameters:
      pos - offset to the first non-whitespace character of the rule.
      limit - offset just past the last character of the rule.
      Returns:
      the position index after the final ';' of the pragma, or -1 on failure.
    • syntaxError

      static final void syntaxError(String msg, String rule, int start)
      Throw an exception indicating a syntax error. Search the rule string for the probable end of the rule. Of course, if the error is that the end of rule marker is missing, then the rule end will not be found. In any case the rule start will be correctly reported.
      Parameters:
      msg - error description
      rule - pattern string
      start - position of first character of current rule
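      The search for the probable end of the rule can be sketched as a scan for the first terminator that is neither quoted nor escaped, falling back to limit when none is found. The class RuleErrors, the assumed terminator ';', the message format, and the use of IllegalArgumentException are all illustrative choices.

```java
// Hypothetical sketch of locating the end of a rule and reporting a syntax error.
final class RuleErrors {
    // Return the index of the first unquoted, unescaped ';' in [start, limit), or limit if none.
    static int ruleEnd(String rule, int start, int limit) {
        for (int i = start; i < limit; i++) {
            char c = rule.charAt(i);
            if (c == '\\') {
                i++; // skip the escaped character
            } else if (c == '\'') {
                // skip to the closing quote, or to limit if the quote is unterminated
                do {
                    i++;
                } while (i < limit && rule.charAt(i) != '\'');
            } else if (c == ';') {
                return i;
            }
        }
        return limit;
    }

    // Always throws; the message quotes the rule text from its start to its probable end.
    static void syntaxError(String msg, String rule, int start) {
        int end = ruleEnd(rule, start, rule.length());
        throw new IllegalArgumentException(msg + " in \"" + rule.substring(start, end) + '"');
    }
}
```

      If the terminator is missing, the reported text simply runs to the end of the string, which matches the caveat above that only the rule start is guaranteed to be reported correctly.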
    • ruleEnd

      static final int ruleEnd(String rule, int start, int limit)
    • parseSet

      private final char parseSet(String rule, ParsePosition pos)
      Parse a UnicodeSet from the rule at the given position, store it, and return the stand-in character used to represent it.
    • generateStandInFor

      char generateStandInFor(Object obj)
      Generate and return a stand-in for a new UnicodeMatcher or UnicodeReplacer. Store the object.
    • getSegmentStandin

      public char getSegmentStandin(int seg)
      Return the standin for segment seg (1-based).
    • setSegmentObject

      public void setSegmentObject(int seg, StringMatcher obj)
      Set the object for segment seg (1-based).
    • getDotStandIn

      char getDotStandIn()
      Return the stand-in for the dot set. It is allocated the first time and reused thereafter.
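      A minimal sketch of storing objects behind stand-in characters, with the dot stand-in allocated lazily. The class StandInStore and its lookup method are hypothetical, the placeholder string stands in for the real dot set, and any reuse of an existing stand-in for an object that was already stored is left out.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: element i of the list is represented by the character base + i.
final class StandInStore {
    private final List<Object> variables = new ArrayList<>();
    private final char base;     // stand-in for element 0
    private final char limit;    // exclusive upper bound for stand-ins
    private int dotStandIn = -1; // -1 until the dot set is first needed

    StandInStore(char base, char limit) {
        this.base = base;
        this.limit = limit;
    }

    // Store obj and return the character that now represents it.
    char generateStandInFor(Object obj) {
        int next = base + variables.size();
        if (next >= limit) {
            throw new IllegalStateException("variable range exhausted");
        }
        variables.add(obj);
        return (char) next;
    }

    // Allocate the dot stand-in on first use and reuse it afterwards.
    char getDotStandIn() {
        if (dotStandIn == -1) {
            dotStandIn = generateStandInFor("placeholder for the dot set");
        }
        return (char) dotStandIn;
    }

    // Map a stand-in character back to the stored object.
    Object lookup(char standIn) {
        return variables.get(standIn - base);
    }
}
```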
    • appendVariableDef

      private void appendVariableDef(String name, StringBuffer buf)
      Append the value of the given variable name to the given StringBuffer.
      Throws:
      IllegalIcuArgumentException - if the name is unknown.
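      Combined with the deferred handling described for undefinedVariableName, the lookup might be sketched as follows. The class VariableDefs is hypothetical, the placeholder character is passed in rather than taken from variableLimit, StringBuilder replaces StringBuffer, and IllegalArgumentException replaces IllegalIcuArgumentException so that the sketch needs only the standard library.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of variable lookup with one deferred undefined name.
final class VariableDefs {
    private final Map<String, char[]> variableNames = new HashMap<>();
    private final char placeholder;       // assumed to be supplied by the caller
    private String undefinedVariableName; // at most one pending undefined name

    VariableDefs(char placeholder) {
        this.placeholder = placeholder;
    }

    void define(String name, char[] value) {
        variableNames.put(name, value);
    }

    // Append the value of name, or a placeholder if name may be in the middle of being defined.
    void appendVariableDef(String name, StringBuilder buf) {
        char[] value = variableNames.get(name);
        if (value != null) {
            buf.append(value);
            return;
        }
        if (undefinedVariableName == null) {
            undefinedVariableName = name; // remember it; the rule may turn out to define it
            buf.append(placeholder);
            return;
        }
        throw new IllegalArgumentException("Undefined variable $" + name);
    }

    String pendingUndefinedName() {
        return undefinedVariableName;
    }
}
```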