Class TrieIterator
- All Implemented Interfaces:
RangeValueIterator
Class enabling iteration of the values in a Trie.
2015-sep-03 TODO: Only used in test code, move there.
Result of each iteration contains the interval of codepoints that have the same value type and the value type itself.
The comparison of each codepoint value is done via extract(), which the default implementation is to return the value as it is.
Method extract() can be overwritten to perform manipulations on codepoint values in order to perform specialized comparison.
TrieIterator is designed to be a generic iterator for the CharTrie and the IntTrie, hence to accommodate both types of data, the return result will be in terms of int (32 bit) values.
See com.ibm.icu.text.UCharacterTypeIterator for examples of use.
Notes for porting utrie_enum from icu4c to icu4j:
Internally, icu4c's utrie_enum performs all iterations in its body. In Java
sense, the caller will have to pass a object with a callback function
UTrieEnumRange(const void *context, UChar32 start, UChar32 limit,
uint32_t value) into utrie_enum. utrie_enum will then find ranges of
codepoints with the same value as determined by
UTrieEnumValue(const void *context, uint32_t value). for each range,
utrie_enum calls the callback function to perform a task. In this way,
icu4c performs the iteration within utrie_enum.
To follow the JDK model, icu4j is slightly different from icu4c.
Instead of requesting the caller to implement an object for a callback.
The caller will have to implement a subclass of TrieIterator, fleshing out
the method extract(int) (equivalent to UTrieEnumValue). Independent of icu4j,
the caller will have to code his own iteration and flesh out the task
(equivalent to UTrieEnumRange) to be performed in the iteration loop.
There are basically 3 usage scenarios for porting:
1) UTrieEnumValue is the only implemented callback then just implement a subclass of TrieIterator and override the extract(int) method. The extract(int) method is analogous to UTrieEnumValue callback.
2) UTrieEnumValue and UTrieEnumRange both are implemented then implement a subclass of TrieIterator, override the extract method and iterate, e.g
utrie_enum(invalid input: '&normTrie', _enumPropertyStartsValue, _enumPropertyStartsRange,
set);
In Java :
class TrieIteratorImpl extends TrieIterator{ public TrieIteratorImpl(Trie data){ super(data); } public int extract(int value){ // port the implementation of _enumPropertyStartsValue here } } .... TrieIterator fcdIter = new TrieIteratorImpl(fcdTrieImpl.fcdTrie); while(fcdIter.next(result)) { // port the implementation of _enumPropertyStartsRange }
3) UTrieEnumRange is the only implemented callback then just implement the while loop, when utrie_enum is called
// utrie_enum(invalid input: '&fcdTrie', NULL, _enumPropertyStartsRange, set);
TrieIterator fcdIter = new TrieIterator(fcdTrieImpl.fcdTrie);
while(fcdIter.next(result)){
set.add(result.start);
}
- Since:
- release 2.1, Jan 17 2002
- See Also:
-
Nested Class Summary
Nested classes/interfaces inherited from interface com.ibm.icu.util.RangeValueIterator
RangeValueIterator.Element
-
Field Summary
FieldsModifier and TypeFieldDescriptionprivate static final int
Size of the stage 1 BMP indexesprivate static final int
Number of data values in a stage 2 (data array) block.private static final int
Lead surrogate minimum valueprivate int
Next element results and data.private int
Initial value for trie valuesprivate int
private int
private int
private int
private int
private int
private Trie
Trie instanceprivate static final int
Number of trail surrogateprivate static final int
Number of stage 1 indexes for supplementary calculations that maps to each lead surrogate character.private static final int
Trail surrogate minimum value -
Constructor Summary
Constructors -
Method Summary
Modifier and TypeMethodDescriptionprivate final boolean
Finding the next element.private final void
Finds the next supplementary element.private final boolean
checkBlock
(int currentValue) Internal block value calculations Performs calculations on a data block to find codepoints in m_nextBlock_ that has the same value.private final boolean
checkBlockDetail
(int currentValue) Internal block value calculations Performs calculations on a data block to find codepoints in m_nextBlock_ after the index m_nextBlockIndex_ that has the same value.private final boolean
Checks if we are beginning at the start of a initial block.private final boolean
checkTrailBlock
(int currentValue) Internal block value calculations Performs calculations on multiple data blocks for a set of trail surrogates to find codepoints in m_nextBlock_ that has the same value.protected int
extract
(int value) Called by next() to extracts a 32 bit value from a trie value used for comparison.final boolean
next
(RangeValueIterator.Element element) Returns true if we are not at the end of the iteration, false otherwise.final void
reset()
Resets the iterator to the beginning of the iterationprivate final void
setResult
(RangeValueIterator.Element element, int start, int limit, int value) Set the result values
-
Field Details
-
BMP_INDEX_LENGTH_
private static final int BMP_INDEX_LENGTH_Size of the stage 1 BMP indexes- See Also:
-
LEAD_SURROGATE_MIN_VALUE_
private static final int LEAD_SURROGATE_MIN_VALUE_Lead surrogate minimum value- See Also:
-
TRAIL_SURROGATE_MIN_VALUE_
private static final int TRAIL_SURROGATE_MIN_VALUE_Trail surrogate minimum value- See Also:
-
TRAIL_SURROGATE_COUNT_
private static final int TRAIL_SURROGATE_COUNT_Number of trail surrogate- See Also:
-
TRAIL_SURROGATE_INDEX_BLOCK_LENGTH_
private static final int TRAIL_SURROGATE_INDEX_BLOCK_LENGTH_Number of stage 1 indexes for supplementary calculations that maps to each lead surrogate character. See second pass into getRawOffset for the trail surrogate character. 10 for significant number of bits for trail surrogates, 5 for what we discard during shifting.- See Also:
-
DATA_BLOCK_LENGTH_
private static final int DATA_BLOCK_LENGTH_Number of data values in a stage 2 (data array) block.- See Also:
-
m_trie_
Trie instance -
m_initialValue_
private int m_initialValue_Initial value for trie values -
m_currentCodepoint_
private int m_currentCodepoint_Next element results and data. -
m_nextCodepoint_
private int m_nextCodepoint_ -
m_nextValue_
private int m_nextValue_ -
m_nextIndex_
private int m_nextIndex_ -
m_nextBlock_
private int m_nextBlock_ -
m_nextBlockIndex_
private int m_nextBlockIndex_ -
m_nextTrailIndexOffset_
private int m_nextTrailIndexOffset_
-
-
Constructor Details
-
TrieIterator
TrieEnumeration constructor- Parameters:
trie
- to be used- Throws:
IllegalArgumentException
- throw when argument is null.
-
-
Method Details
-
next
Returns true if we are not at the end of the iteration, false otherwise.
The next set of codepoints with the same value type will be calculated during this call and returned in the argument element.
- Specified by:
next
in interfaceRangeValueIterator
- Parameters:
element
- return result- Returns:
- true if we are not at the end of the iteration, false otherwise.
- Throws:
NoSuchElementException
- - if no more elements exist.- See Also:
-
reset
public final void reset()Resets the iterator to the beginning of the iteration- Specified by:
reset
in interfaceRangeValueIterator
-
extract
protected int extract(int value) Called by next() to extracts a 32 bit value from a trie value used for comparison. This method is to be overwritten if special manipulation is to be done to retrieve a relevant comparison. The default function is to return the value as it is.- Parameters:
value
- a value from the trie- Returns:
- extracted value
-
setResult
Set the result values- Parameters:
element
- return result objectstart
- codepoint of rangelimit
- (end + 1) codepoint of rangevalue
- common value of range
-
calculateNextBMPElement
Finding the next element. This method is called just before returning the result of next(). We always store the next element before it is requested. In the case that we have to continue calculations into the supplementary planes, a false will be returned.- Parameters:
element
- return result object- Returns:
- true if the next range is found, false if we have to proceed to the supplementary range.
-
calculateNextSupplementaryElement
Finds the next supplementary element. For each entry in the trie, the value to be delivered is passed through extract(). We always store the next element before it is requested. Called after calculateNextBMP() completes its round of BMP characters. There is a slight difference in the usage of m_currentCodepoint_ here as compared to calculateNextBMP(). Though both represents the lower bound of the next element, in calculateNextBMP() it gets set at the start of any loop, where-else, in calculateNextSupplementary() since m_currentCodepoint_ already contains the lower bound of the next element (passed down from calculateNextBMP()), we keep it till the end before resetting it to the new value. Note, if there are no more iterations, it will never get to here. Blocked out by next().- Parameters:
element
- return result object
-
checkBlockDetail
private final boolean checkBlockDetail(int currentValue) Internal block value calculations Performs calculations on a data block to find codepoints in m_nextBlock_ after the index m_nextBlockIndex_ that has the same value. Note m_*_ variables at this point is the next codepoint whose value has not been calculated. But when returned with false, it will be the last codepoint whose value has been calculated.- Parameters:
currentValue
- the value which other codepoints are tested against- Returns:
- true if the whole block has the same value as currentValue or if the whole block has been calculated, false otherwise.
-
checkBlock
private final boolean checkBlock(int currentValue) Internal block value calculations Performs calculations on a data block to find codepoints in m_nextBlock_ that has the same value. Will call checkBlockDetail() if highlevel check fails. Note m_*_ variables at this point is the next codepoint whose value has not been calculated.- Parameters:
currentValue
- the value which other codepoints are tested againstcurrentBlock
- the initial block containing all currentValue- Returns:
- true if the whole block has the same value as currentValue or if the whole block has been calculated, false otherwise.
-
checkTrailBlock
private final boolean checkTrailBlock(int currentValue) Internal block value calculations Performs calculations on multiple data blocks for a set of trail surrogates to find codepoints in m_nextBlock_ that has the same value. Will call checkBlock() for internal block checks. Note m_*_ variables at this point is the next codepoint whose value has not been calculated.- Parameters:
currentValue
- the value which other codepoints are tested against- Returns:
- true if the whole block has the same value as currentValue or if the whole block has been calculated, false otherwise.
-
checkNullNextTrailIndex
private final boolean checkNullNextTrailIndex()Checks if we are beginning at the start of a initial block. If we are then the rest of the codepoints in this initial block has the same values. We increment m_nextCodepoint_ and relevant data members if so. This is used only in for the supplementary codepoints because the offset to the trail indexes could be 0.- Returns:
- true if we are at the start of a initial block.
-