Class StringMatcher

java.lang.Object
com.ibm.icu.text.StringMatcher
All Implemented Interfaces:
UnicodeMatcher, UnicodeReplacer

class StringMatcher extends Object implements UnicodeMatcher, UnicodeReplacer
An object that matches a fixed input string, implementing the UnicodeMatcher API. This object also implements the UnicodeReplacer API, allowing it to emit the matched text as output. Since the match text may contain flexible match elements, such as UnicodeSets, the emitted text is not the match pattern, but instead a substring of the actual matched text. Following convention, the output text is the leftmost match seen up to this point.

A StringMatcher may represent a segment, in which case it has a positive segment number. This affects how the matcher converts itself to a pattern but does not otherwise affect its function.

A StringMatcher that is not a segment should not be used as a UnicodeReplacer.
  • Field Details

    • pattern

      private String pattern
      The text to be matched.
    • matchStart

      private int matchStart
      Start offset, in the match text, of the rightmost match.
    • matchLimit

      private int matchLimit
      Limit offset, in the match text, of the rightmost match.
    • segmentNumber

      private int segmentNumber
      The segment number, 1-based, or 0 if not a segment.
    • data

      private final RuleBasedTransliterator.Data data
      Context object that maps stand-ins to matcher and replacer objects.
  • Constructor Details

    • StringMatcher

      public StringMatcher(String theString, int segmentNum, RuleBasedTransliterator.Data theData)
      Construct a matcher that matches the given pattern string.
      Parameters:
      theString - the pattern to be matched, possibly containing stand-ins that represent nested UnicodeMatcher objects.
      segmentNum - the segment number from 1..n, or 0 if this is not a segment.
      theData - context object mapping stand-ins to UnicodeMatcher objects.
    • StringMatcher

      public StringMatcher(String theString, int start, int limit, int segmentNum, RuleBasedTransliterator.Data theData)
      Construct a matcher that matches a substring of the given pattern string.
      Parameters:
      theString - the pattern to be matched, possibly containing stand-ins that represent nested UnicodeMatcher objects.
      start - index of the first character of theString to be matched.
      limit - index after the last character of theString to be matched.
      segmentNum - the segment number from 1..n, or 0 if this is not a segment.
      theData - context object mapping stand-ins to UnicodeMatcher objects.
  • Method Details

    • matches

      public int matches(Replaceable text, int[] offset, int limit, boolean incremental)
      Implements the UnicodeMatcher API. Attempts to match this object's pattern against text, starting at offset[0].
      Specified by:
      matches in interface UnicodeMatcher
      Parameters:
      text - the text to be matched
      offset - on input, the index into text at which to begin matching. On output, the limit of the matched text. The number of matched characters is the output value of offset minus the input value. Offset should always point to the HIGH SURROGATE (leading code unit) of a pair of surrogates, both on entry and upon return.
      limit - the limit index of text to be matched. Greater than offset for a forward direction match, less than offset for a backward direction match. The last character to be considered for matching will be text.charAt(limit-1) in the forward direction or text.charAt(limit+1) in the backward direction.
      incremental - if true, then assume further characters may be inserted at limit and check for partial matching. Otherwise assume the text as given is complete.
      Returns:
      a match degree value indicating a full match, a partial match, or a mismatch. If incremental is false then U_PARTIAL_MATCH should never be returned.
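      The offset and incremental contract described above can be illustrated with a small standalone model. This is only a sketch under stated assumptions, not ICU's implementation: it matches a plain literal pattern against a CharSequence in the forward direction only, it ignores stand-ins, surrogate handling and backward matching, and the class name LiteralMatchSketch and the constant values are local to the example.

```java
/** Simplified stand-in for the matching contract; not ICU's implementation. */
final class LiteralMatchSketch {
    // Match-degree constants, named after those used by UnicodeMatcher.
    // The numeric values are local to this sketch.
    static final int U_MISMATCH = 0;
    static final int U_PARTIAL_MATCH = 1;
    static final int U_MATCH = 2;

    /**
     * Forward-only match of a literal pattern against text, starting at offset[0].
     * On a full match, offset[0] is advanced to the limit of the matched text.
     * On a partial match or mismatch, offset[0] is left unchanged.
     */
    static int matches(CharSequence text, int[] offset, int limit,
                       boolean incremental, String pattern) {
        int cursor = offset[0];
        for (int i = 0; i < pattern.length(); i++) {
            if (cursor >= limit) {
                // Ran out of text before the pattern was exhausted. More text
                // could still complete the match only in incremental mode.
                return incremental ? U_PARTIAL_MATCH : U_MISMATCH;
            }
            if (text.charAt(cursor) != pattern.charAt(i)) {
                return U_MISMATCH;
            }
            cursor++;
        }
        offset[0] = cursor; // report the limit of the matched text
        return U_MATCH;
    }
}
```

      In this model, matching "abc" against "xxab" from offset 2 with limit 4 yields a partial match when incremental is true and a mismatch when it is false, which mirrors the rule that U_PARTIAL_MATCH is never returned for non-incremental matching.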
    • toPattern

      public String toPattern(boolean escapeUnprintable)
      Implements the UnicodeMatcher API. Returns a pattern string representing this matcher.
      Specified by:
      toPattern in interface UnicodeMatcher
      Parameters:
      escapeUnprintable - if true, convert unprintable characters to their hex escape representations, \uxxxx or \Uxxxxxxxx. Unprintable characters are those other than U+000A, U+0020..U+007E.
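      The escaping rule for escapeUnprintable can be sketched as follows. This is a minimal standalone illustration rather than ICU's code: the class name EscapeSketch, the helper names, and the choice of uppercase hex digits are assumptions made for the example, and only the definition of "unprintable" is taken from the parameter description above.

```java
import java.util.Locale;

/** Illustrative escaping helper; not ICU's implementation. */
final class EscapeSketch {
    /** True for code points other than U+000A and U+0020..U+007E. */
    static boolean isUnprintable(int cp) {
        return !(cp == 0x0A || (cp >= 0x20 && cp <= 0x7E));
    }

    /**
     * Copies s, replacing each unprintable code point with a hex escape when
     * escapeUnprintable is true: backslash, lowercase 'u' and 4 hex digits for
     * BMP code points, or backslash, uppercase 'U' and 8 hex digits otherwise.
     */
    static String escape(String s, boolean escapeUnprintable) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);
            i += Character.charCount(cp); // step over a whole surrogate pair
            if (escapeUnprintable && isUnprintable(cp)) {
                if (cp <= 0xFFFF) {
                    out.append(String.format(Locale.ROOT, "\\u%04X", cp));
                } else {
                    out.append(String.format(Locale.ROOT, "\\U%08X", cp));
                }
            } else {
                out.appendCodePoint(cp);
            }
        }
        return out.toString();
    }
}
```

      With escapeUnprintable set to false the input is returned unchanged, so a pattern containing control or non-ASCII characters round-trips as raw text.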
    • matchesIndexValue

      public boolean matchesIndexValue(int v)
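      Implements the UnicodeMatcher API. Returns true if this matcher could match, in the forward direction, a character c for which (c & 0xFF) == v. RuleBasedTransliterator uses this for indexing.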
      Specified by:
      matchesIndexValue in interface UnicodeMatcher
    • addMatchSetTo

      public void addMatchSetTo(UnicodeSet toUnionTo)
      Implementation of UnicodeMatcher API. Union the set of all characters that may be matched by this object into the given set.
      Specified by:
      addMatchSetTo in interface UnicodeMatcher
      Parameters:
      toUnionTo - the set into which to union the source characters
    • replace

      public int replace(Replaceable text, int start, int limit, int[] cursor)
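      Implements the UnicodeReplacer API. Replaces the characters of text from start to limit with the output text of this object.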
      Specified by:
      replace in interface UnicodeReplacer
      Parameters:
      text - the text in which the replacement is made
      start - inclusive start index of text to be replaced
      limit - exclusive end index of text to be replaced; must be greater than or equal to start
      cursor - output parameter for the cursor position. Not all replacer objects will update this, but in a complete tree of replacer objects, representing the entire output side of a transliteration rule, at least one must update it.
      Returns:
      the number of 16-bit code units in the text replacing the characters at offsets start..(limit-1) in text
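      The interplay between a recorded match and replace can be modelled in a few lines. This is a hedged sketch, not the ICU implementation: it operates on a StringBuilder instead of a Replaceable, omits the cursor parameter, and uses hypothetical names (SegmentReplaceSketch, recordMatch) that do not exist in ICU. It assumes the segment remembers the offsets of its match and that replace emits that substring of the text.

```java
/** Toy model of a segment that re-emits the text it last matched. */
final class SegmentReplaceSketch {
    // Offsets of the recorded match in the text; -1 means "no match recorded".
    private int matchStart = -1;
    private int matchLimit = -1;

    /** Hypothetical hook: remember where the segment matched. */
    void recordMatch(int start, int limit) {
        matchStart = start;
        matchLimit = limit;
    }

    /** Remove any match data, as resetMatch() is described to do. */
    void resetMatch() {
        matchStart = -1;
        matchLimit = -1;
    }

    /**
     * Replace text[start, limit) with the recorded match text, or with the
     * empty string if no match is recorded.
     *
     * @return the number of 16-bit code units written in place of the range
     */
    int replace(StringBuilder text, int start, int limit) {
        String captured = (matchStart >= 0)
                ? text.substring(matchStart, matchLimit)
                : "";
        text.replace(start, limit, captured);
        return captured.length();
    }
}
```

      In this model, recording a match over "abc" in "abc|XY" and then replacing the range covering "XY" produces "abc|abc" and returns 3. After resetMatch(), the same call deletes the range and returns 0.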
    • toReplacerPattern

      public String toReplacerPattern(boolean escapeUnprintable)
      Implements the UnicodeReplacer API. Returns a string representation of this replacer.
      Specified by:
      toReplacerPattern in interface UnicodeReplacer
      Parameters:
      escapeUnprintable - if true, convert unprintable characters to their hex escape representations, \uxxxx or \Uxxxxxxxx. Unprintable characters are defined by Utility.isUnprintable().
    • resetMatch

      public void resetMatch()
      Remove any match data. This must be called before performing a set of matches with this segment.
    • addReplacementSetTo

      public void addReplacementSetTo(UnicodeSet toUnionTo)
      Union the set of all characters that may be output by this object into the given set.
      Specified by:
      addReplacementSetTo in interface UnicodeReplacer
      Parameters:
      toUnionTo - the set into which to union the output characters