Package com.ibm.icu.impl.breakiter
Class MlBreakEngine
java.lang.Object
com.ibm.icu.impl.breakiter.MlBreakEngine
-
Field Summary
Fields
Modifier and Type, Field
private UnicodeSet fClosePunctuationSet
private UnicodeSet fDigitOrOpenPunctuationOrAlphabetSet
private int fNegativeSum
private static final int MAX_FEATURE
-
Constructor Summary
Constructors
MlBreakEngine(UnicodeSet digitOrOpenPunctuationOrAlphabetSet, UnicodeSet closePunctuationSet)
Constructor for Chinese and Japanese phrase breaking.
-
Method Summary
Methods
int divideUpRange(CharacterIterator inText, int startPos, int endPos, CharacterIterator inString, int codePointLength, int[] charPositions, DictionaryBreakEngine.DequeI foundBreaks)
Divide up a range of characters handled by this break engine.
private void evaluateBreakpoint(String inputStr, int[] indexList, int startIdx, int numCodeUnits, ArrayList<Integer> boundary)
Evaluate whether the breakpointIdx is a potential breakpoint.
private int initIndexList(CharacterIterator inString, int[] indexList, int codePointLength)
Initialize the index list from the input string.
private void initKeyValue(UResourceBundle rb, String keyName, String valueName, HashMap<String, Integer> map)
In the machine learning model file, specify the names of the key and value to load the corresponding feature and its score.
private void loadMLModel()
Load the machine learning model file.
private String transform(CharacterIterator inString)
Transform a CharacterIterator into a String.
-
Field Details
-
MAX_FEATURE
private static final int MAX_FEATURE
-
fDigitOrOpenPunctuationOrAlphabetSet
-
fClosePunctuationSet
-
fModel
-
fNegativeSum
private int fNegativeSum
-
-
Constructor Details
-
MlBreakEngine
public MlBreakEngine(UnicodeSet digitOrOpenPunctuationOrAlphabetSet, UnicodeSet closePunctuationSet)
Constructor for Chinese and Japanese phrase breaking.
Parameters:
digitOrOpenPunctuationOrAlphabetSet - A Unicode set containing digits, open punctuation, and alphabetic characters.
closePunctuationSet - A Unicode set containing close punctuation.
-
-
Method Details
-
divideUpRange
public int divideUpRange(CharacterIterator inText, int startPos, int endPos, CharacterIterator inString, int codePointLength, int[] charPositions, DictionaryBreakEngine.DequeI foundBreaks)
Divide up a range of characters handled by this break engine.
Parameters:
inText - An input text.
startPos - The start index of the input text.
endPos - The end index of the input text.
inString - An input string normalized from inText, covering startPos to endPos.
codePointLength - The number of code points in inString.
charPositions - A map that transforms inString's code point index to code unit index.
foundBreaks - A list to store the breakpoints.
Returns:
The number of breakpoints.
-
transform
private String transform(CharacterIterator inString)
Transform a CharacterIterator into a String.
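As an illustration of what transform describes, the following is a minimal sketch of copying a CharacterIterator into a String using only java.text. The class name TransformSketch and the choice to restore the iterator position afterwards are assumptions made for this example; the page does not show how MlBreakEngine implements the method.

```java
import java.text.CharacterIterator;
import java.text.StringCharacterIterator;

// Hypothetical helper class, not part of ICU4J.
final class TransformSketch {

    // Copies every char between the iterator's begin and end index into a String.
    // The loop tests the index rather than CharacterIterator.DONE, because
    // U+FFFF (the value of DONE) can also occur as ordinary text.
    static String transform(CharacterIterator it) {
        int saved = it.getIndex();
        StringBuilder sb = new StringBuilder(it.getEndIndex() - it.getBeginIndex());
        for (char c = it.first(); it.getIndex() < it.getEndIndex(); c = it.next()) {
            sb.append(c);
        }
        it.setIndex(saved); // assumption: leave the caller's position untouched
        return sb.toString();
    }

    public static void main(String[] args) {
        CharacterIterator it = new StringCharacterIterator("abc");
        System.out.println(transform(it)); // prints abc
    }
}
```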
evaluateBreakpoint
private void evaluateBreakpoint(String inputStr, int[] indexList, int startIdx, int numCodeUnits, ArrayList<Integer> boundary)
Evaluate whether the breakpointIdx is a potential breakpoint.
Parameters:
inputStr - An input string to be segmented.
indexList - A code unit index list of the inputStr.
startIdx - The start index of the indexList.
numCodeUnits - The current code unit boundary of the indexList.
boundary - A list that collects the indices of the breakpoints found.
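The page does not say how a candidate position is scored. One plausible reading, given that the model maps features to integer scores, is that the scores of the features observed around a position are summed and a positive total marks a breakpoint. The sketch below shows only that idea; the class ScoreSketch, the feature strings, and the zero threshold are all hypothetical and are not taken from the ICU4J source.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration of feature-score voting, not ICU4J code.
final class ScoreSketch {

    // Sums the model score of every observed feature; features missing from
    // the model contribute 0. A total above zero is treated as a breakpoint
    // (the threshold is an assumption of this sketch).
    static boolean isBreakpoint(List<String> features, Map<String, Integer> model) {
        int score = 0;
        for (String feature : features) {
            score += model.getOrDefault(feature, 0);
        }
        return score > 0;
    }

    public static void main(String[] args) {
        Map<String, Integer> model = new HashMap<>();
        model.put("featureA", 5);   // made-up feature names and scores
        model.put("featureB", -8);
        System.out.println(isBreakpoint(List.of("featureA"), model));             // prints true
        System.out.println(isBreakpoint(List.of("featureA", "featureB"), model)); // prints false
    }
}
```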
-
initIndexList
private int initIndexList(CharacterIterator inString, int[] indexList, int codePointLength)
Initialize the index list from the input string.
Parameters:
inString - An input string to be segmented.
indexList - A code unit index list of the inString.
codePointLength - The number of code points in the input string.
Returns:
The number of code units in the first six characters of inString.
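The distinction between code point indices and code unit indices matters whenever the text contains supplementary characters, which occupy two UTF-16 code units each. The sketch below builds a code point to code unit offset table for a plain String. It is a simplified illustration: the class IndexListSketch and its method are hypothetical, and it does not reproduce the "first six characters" return value described above.

```java
import java.util.Arrays;

// Hypothetical helper class, not part of ICU4J.
final class IndexListSketch {

    // Returns an array where entry i is the UTF-16 offset at which code point i
    // starts; the final entry is the total length in code units.
    static int[] codeUnitOffsets(String s) {
        int codePointCount = s.codePointCount(0, s.length());
        int[] offsets = new int[codePointCount + 1];
        int unit = 0;
        for (int i = 0; i < codePointCount; i++) {
            offsets[i] = unit;
            unit += Character.charCount(s.codePointAt(unit)); // 1 or 2 code units
        }
        offsets[codePointCount] = unit;
        return offsets;
    }

    public static void main(String[] args) {
        // U+20BB7 is written as a surrogate pair, so it spans two code units.
        String text = "a\uD842\uDFB7b";
        System.out.println(Arrays.toString(codeUnitOffsets(text))); // prints [0, 1, 3, 4]
    }
}
```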
-
loadMLModel
private void loadMLModel()
Load the machine learning model file.
-
initKeyValue
private void initKeyValue(UResourceBundle rb, String keyName, String valueName, HashMap<String, Integer> map)
In the machine learning model file, specify the names of the key and value to load the corresponding feature and its score.
Parameters:
rb - A ResourceBundle corresponding to the model file.
keyName - The key name in the model file.
valueName - The value name in the model file.
map - A HashMap to store the pairs of the feature and its score.
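To make the key/value pairing concrete, here is a small sketch that fills a map from two parallel arrays, one holding feature names and one holding scores. The parallel-array layout is an assumption for this example, plain arrays stand in for the UResourceBundle lookups, and the class KeyValueSketch is hypothetical; the page does not describe the model file format.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration, not ICU4J code.
final class KeyValueSketch {

    // Pairs keys[i] with values[i] and stores each pair in the supplied map.
    static void loadPairs(String[] keys, int[] values, Map<String, Integer> map) {
        if (keys.length != values.length) {
            throw new IllegalArgumentException("keys and values must have the same length");
        }
        for (int i = 0; i < keys.length; i++) {
            map.put(keys[i], values[i]);
        }
    }

    public static void main(String[] args) {
        Map<String, Integer> model = new HashMap<>();
        // Made-up feature names and scores.
        loadPairs(new String[] {"featureA", "featureB"}, new int[] {5, -8}, model);
        System.out.println(model.get("featureA")); // prints 5
    }
}
```

Passing the destination map as a parameter, as the documented signature does, lets one routine populate several feature tables from differently named key/value entries.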
-