public final class HMMChineseTokenizerFactory extends TokenizerFactory
Factory for HMMChineseTokenizer.
Note: this class currently emits tokens for punctuation. To remove them, either add
a WordDelimiterFilter after the tokenizer (with concatenation turned off), or use the
SmartChinese stoplist with a StopFilterFactory via:
words="org/apache/lucene/analysis/cn/smart/stopwords.txt"
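As a rough illustration of the stoplist option, the following Solr schema fragment is a minimal sketch, assuming the SmartChinese analysis module is on the classpath. The field type name `text_smartcn` is hypothetical and chosen only for this example.

```xml
<!-- Sketch only: "text_smartcn" is an illustrative name, not defined by this class. -->
<fieldType name="text_smartcn" class="solr.TextField">
  <analyzer>
    <!-- Segments Chinese text; also emits punctuation tokens -->
    <tokenizer class="solr.HMMChineseTokenizerFactory"/>
    <!-- Drops the punctuation tokens using the SmartChinese stoplist -->
    <filter class="solr.StopFilterFactory"
            words="org/apache/lucene/analysis/cn/smart/stopwords.txt"/>
  </analyzer>
</fieldType>
```

The stop filter is placed directly after the tokenizer so that punctuation is removed before any later filters see the token stream.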
Fields inherited from class AbstractAnalysisFactory: LUCENE_MATCH_VERSION_PARAM, luceneMatchVersion
Constructor and Description |
---|
HMMChineseTokenizerFactory(java.util.Map<java.lang.String,java.lang.String> args): Creates a new HMMChineseTokenizerFactory. |
Modifier and Type | Method and Description |
---|---|
Tokenizer | create(AttributeFactory factory): Creates a TokenStream of the specified input using the given AttributeFactory. |
Methods inherited from class TokenizerFactory: availableTokenizers, create, forName, lookupClass, reloadTokenizers
Methods inherited from class AbstractAnalysisFactory: get, get, get, get, get, getBoolean, getChar, getClassArg, getFloat, getInt, getLines, getLuceneMatchVersion, getOriginalArgs, getPattern, getSet, getSnowballWordSet, getWordSet, isExplicitLuceneMatchVersion, require, require, require, requireBoolean, requireChar, requireFloat, requireInt, setExplicitLuceneMatchVersion, splitFileNames
public HMMChineseTokenizerFactory(java.util.Map<java.lang.String,java.lang.String> args)
public Tokenizer create(AttributeFactory factory)
Specified by: create in class TokenizerFactory
Copyright © 2000–2018 The Apache Software Foundation. All rights reserved.