table_tokenize command tokenizes text by the specified table's tokenizer.
table_tokenize command has required parameters and optional parameters. table and string are required. The others are optional:
table_tokenize table
string
[flags=NONE]
[mode=GET]
Here is a simple example.
Execution example:
register token_filters/stop_word
# [[0,0.0,0.0],true]
table_create Terms TABLE_PAT_KEY ShortText \
--default_tokenizer TokenBigram \
--normalizer NormalizerAuto \
--token_filters TokenFilterStopWord
# [[0,0.0,0.0],true]
column_create Terms is_stop_word COLUMN_SCALAR Bool
# [[0,0.0,0.0],true]
load --table Terms
[
{"_key": "and", "is_stop_word": true}
]
# [[0,0.0,0.0],1]
table_tokenize Terms "Hello and Good-bye" --mode GET
# [
# [
# 0,
# 0.0,
# 0.0
# ],
# [
# {
# "value": "hello",
# "position": 0
# },
# {
# "value": "good",
# "position": 2
# },
# {
# "value": "-",
# "position": 3
# },
# {
# "value": "bye",
# "position": 4
# }
# ]
# ]
The Terms table uses the TokenBigram tokenizer, the NormalizerAuto normalizer and the TokenFilterStopWord token filter. The command returns the tokens generated by tokenizing "Hello and Good-bye" with the TokenBigram tokenizer. The tokens are normalized by the NormalizerAuto normalizer, and the "and" token is removed by the TokenFilterStopWord token filter.
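The mode parameter controls whether table_tokenize only looks up tokens (GET) or also registers tokens that are not yet in the lexicon (ADD). The following is a minimal sketch, assuming ADD behaves here the same way it does for the tokenize command; the output is omitted because it depends on what is already stored in Terms:
table_tokenize Terms "Hello and Good-bye" --mode ADD
In GET mode the lexicon table is left unchanged, so GET is the safer choice when you only want to inspect how text would be tokenized.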
This section describes all parameters. Parameters are categorized.
There are two required parameters, table and string.
table specifies the lexicon table. table_tokenize command uses the tokenizer, the normalizer and the token filters that are set for the lexicon table.
string specifies the text to tokenize.
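Because the tokenizer, the normalizer and the token filters come from the lexicon table given as table, tokenizing the same string against a different lexicon can produce different tokens. The following is a minimal sketch, assuming a hypothetical Words lexicon that uses the TokenDelimit tokenizer (which splits text on whitespace):
table_create Words TABLE_PAT_KEY ShortText \
--default_tokenizer TokenDelimit \
--normalizer NormalizerAuto
table_tokenize Words "Hello and Good-bye" --mode GET
With this lexicon the command would report whitespace-separated tokens such as "hello", "and" and "good-bye" instead of the bigram-based tokens shown above, because Words uses a different tokenizer and has no TokenFilterStopWord token filter.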
table_tokenize command returns the tokenized tokens.
See the return value section of the tokenize command for details.