src/libsphinxbase/lm/ngram_model_set.c File Reference

Set of language models.

#include "ngram_model_set.h"
#include <err.h>
#include <ckd_alloc.h>
#include <strfuncs.h>
#include <filename.h>
#include <string.h>
#include <stdlib.h>


Functions

ngram_model_t * ngram_model_set_init (cmd_ln_t *config, ngram_model_t **models, char **names, const float32 *weights, int32 n_models)
 Create a set of language models sharing a common space of word IDs.
ngram_model_t * ngram_model_set_read (cmd_ln_t *config, const char *lmctlfile, logmath_t *lmath)
 Read a set of language models from a control file.
int32 ngram_model_set_count (ngram_model_t *base)
 Returns the number of language models in a set.
ngram_model_set_iter_t * ngram_model_set_iter (ngram_model_t *base)
 Begin iterating over language models in a set.
ngram_model_set_iter_t * ngram_model_set_iter_next (ngram_model_set_iter_t *itor)
 Move to the next language model in a set.
void ngram_model_set_iter_free (ngram_model_set_iter_t *itor)
 Finish iteration over a language model set.
ngram_model_t * ngram_model_set_iter_model (ngram_model_set_iter_t *itor, char const **lmname)
 Get language model and associated name from an iterator.
ngram_model_t * ngram_model_set_lookup (ngram_model_t *base, const char *name)
 Look up a language model by name from a set.
ngram_model_t * ngram_model_set_select (ngram_model_t *base, const char *name)
 Select a single language model from a set for scoring.
const char * ngram_model_set_current (ngram_model_t *base)
 Get the current language model name, if any.
int32 ngram_model_set_current_wid (ngram_model_t *base, int32 set_wid)
 Query the word-ID mapping for the current language model.
int32 ngram_model_set_known_wid (ngram_model_t *base, int32 set_wid)
 Test whether a word ID corresponds to a known word in the current state of the language model set.
ngram_model_t * ngram_model_set_interp (ngram_model_t *base, const char **names, const float32 *weights)
 Set interpolation weights for a set and enable interpolation.
ngram_model_t * ngram_model_set_add (ngram_model_t *base, ngram_model_t *model, const char *name, float32 weight, int reuse_widmap)
 Add a language model to a set.
ngram_model_t * ngram_model_set_remove (ngram_model_t *base, const char *name, int reuse_widmap)
 Remove a language model from a set.
void ngram_model_set_map_words (ngram_model_t *base, const char **words, int32 n_words)
 Set the word-to-ID mapping for this model set.

Detailed Description

Set of language models.

Author:
David Huggins-Daines <dhuggins@cs.cmu.edu>

Definition in file ngram_model_set.c.


Function Documentation

ngram_model_t* ngram_model_set_add ( ngram_model_t *  set,
ngram_model_t *  model,
const char *  name,
float32  weight,
int  reuse_widmap 
)

Add a language model to a set.

Parameters:
set The language model set to add to.
model The language model to add.
name The name to associate with this model.
weight Interpolation weight for this model, relative to the uniform distribution. 1.0 is a safe value.
reuse_widmap Reuse the existing word-ID mapping in set. Any new words present in model will not be added to the word-ID mapping in this case.

Definition at line 519 of file ngram_model_set.c.

References ckd_calloc_2d, ckd_free_2d(), ckd_realloc, ckd_salloc, ngram_model_s::lmath, logmath_log(), ngram_model_s::n, ngram_model_s::n_words, ngram_wid(), and ngram_model_s::word_str.
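The effect of reuse_widmap on the set's shared vocabulary can be pictured as follows: with it set, the existing word list is kept unchanged; otherwise every word of the new model that the set has not seen yet receives a fresh set-wide ID at the end. This is a minimal standalone sketch under assumed, hypothetical types (sketch_vocab_t and a fixed-size table). It does not model the per-model word-ID table, which the references to ckd_calloc_2d() and ckd_realloc suggest the real function also rebuilds.

```c
#include <string.h>

#define SKETCH_MAX_WORDS 64

/* Hypothetical shared vocabulary of a model set (strings are borrowed). */
typedef struct {
    int n_words;
    const char *words[SKETCH_MAX_WORDS];
} sketch_vocab_t;

/* Return the set-wide ID of word, or -1 if it is not in the vocabulary. */
static int sketch_vocab_find(const sketch_vocab_t *v, const char *word)
{
    for (int i = 0; i < v->n_words; ++i)
        if (strcmp(v->words[i], word) == 0)
            return i;
    return -1;
}

/* Merge the words of a newly added model into the shared vocabulary.
 * With reuse_widmap set, nothing is added (mirroring the documented
 * behaviour); otherwise unseen words get the next free set-wide IDs.
 * Returns the number of words added, or -1 if the table is full. */
int sketch_merge_vocab(sketch_vocab_t *v, const char **words, int n,
                       int reuse_widmap)
{
    int added = 0;
    if (reuse_widmap)
        return 0;
    for (int i = 0; i < n; ++i) {
        if (sketch_vocab_find(v, words[i]) != -1)
            continue;                   /* already has a set-wide ID */
        if (v->n_words == SKETCH_MAX_WORDS)
            return -1;
        v->words[v->n_words++] = words[i];
        ++added;
    }
    return added;
}
```

Appending keeps every previously issued set-wide ID stable, so word IDs already handed out to callers remain valid after the model is added.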

int32 ngram_model_set_current_wid ( ngram_model_t *  set,
int32  set_wid 
)

Query the word-ID mapping for the current language model.

Returns:
the local word ID in the current language model, or NGRAM_INVALID_WID if set_wid is invalid or interpolation is enabled.

Definition at line 455 of file ngram_model_set.c.

References ngram_model_s::n_words, and NGRAM_INVALID_WID.
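To illustrate the contract described above, here is a minimal standalone sketch of one way a set could store its word-ID map: a table with one row per set-wide word ID and one column per submodel, plus the index of the currently selected model, where -1 means interpolation is active. The struct, its field names, and the value -1 standing in for NGRAM_INVALID_WID are assumptions made for this sketch, not the actual SphinxBase definitions.

```c
/* Hypothetical stand-in for NGRAM_INVALID_WID (value assumed). */
#define SKETCH_INVALID_WID (-1)

/* Hypothetical, simplified view of a model set's word-ID map. */
typedef struct {
    int n_words;          /* size of the shared (set-wide) vocabulary */
    int n_models;         /* number of submodels */
    const int *widmap;    /* n_words x n_models, row-major local IDs */
    int cur;              /* selected submodel, or -1 when interpolating */
} sketch_set_t;

/* Map a set-wide word ID to the local ID in the current submodel. */
int sketch_current_wid(const sketch_set_t *set, int set_wid)
{
    if (set_wid < 0 || set_wid >= set->n_words)
        return SKETCH_INVALID_WID;   /* set_wid is out of range */
    if (set->cur == -1)
        return SKETCH_INVALID_WID;   /* interpolating: no single current model */
    return set->widmap[set_wid * set->n_models + set->cur];
}
```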

ngram_model_t* ngram_model_set_init ( cmd_ln_t *  config,
ngram_model_t **  models,
char **  names,
const float32 *  weights,
int32  n_models 
)

Create a set of language models sharing a common space of word IDs.

This function creates a meta-language model that groups together a set of language models and synchronizes word IDs between them. To use this language model, you can either select a single submodel to use exclusively, with ngram_model_set_select(), or interpolate between the scores of all models. To interpolate, either pass a non-NULL weights parameter here, or re-activate interpolation later by calling ngram_model_set_interp().

In order to make this efficient, there are some restrictions on the models that can be grouped together. The most important (and currently the only) one is that they must all share the same log-math parameters.
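The log-math restriction can be pictured as a compatibility check run before the models are grouped. The reference list for this function mentions logmath_get_base() and logmath_get_shift(), which suggests that the base and shift are what get compared; the struct below is a hypothetical stand-in for the parameters of a logmath_t, not its real layout.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the log-math parameters of one model. */
typedef struct {
    double base;   /* logarithm base */
    int shift;     /* right shift applied to log values */
} sketch_logmath_t;

/* Return true if every model uses the same base and shift as the first.
 * Exact comparison is intended: compatible models share identical values. */
bool sketch_lmath_compatible(const sketch_logmath_t *lm, int n_models)
{
    for (int i = 1; i < n_models; ++i) {
        if (lm[i].base != lm[0].base || lm[i].shift != lm[0].shift)
            return false;
    }
    return true;
}
```

Sharing these parameters means integer log scores from different submodels are on the same scale and can be combined without conversion.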

Parameters:
config Any configuration parameters to be shared between models.
models Array of pointers to previously created language models.
names Array of strings to use as unique identifiers for LMs.
weights Array of weights to use in interpolating LMs, or NULL for no interpolation.
n_models Number of elements in the arrays passed to this function.

Definition at line 121 of file ngram_model_set.c.

References ngram_model_set_s::base, ckd_calloc, ckd_salloc, ngram_model_set_s::cur, E_ERROR, ngram_model_s::lmath, ngram_model_set_s::lms, logmath_get_base(), logmath_get_shift(), logmath_log(), ngram_model_set_s::lweights, ngram_model_set_s::maphist, ngram_model_s::n, ngram_model_set_s::n_models, and ngram_model_set_s::names.

Referenced by ngram_model_set_read().

ngram_model_t* ngram_model_set_interp ( ngram_model_t *  set,
const char **  names,
const float32 *  weights 
)

Set interpolation weights for a set and enable interpolation.

If weights is NULL, any previously initialized set of weights will be used. If no weights were specified to ngram_model_set_init(), then a uniform distribution will be used.

Definition at line 488 of file ngram_model_set.c.

References E_ERROR, ngram_model_s::lmath, and logmath_log().
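Linear interpolation combines the submodels' probabilities as P(w|h) = sum over i of w_i * P_i(w|h), and when no weights are supplied each w_i is 1/n. The sketch below assumes the weights sum to one and works directly in probability space for clarity. The actual implementation works with log-domain scores through the logmath functions listed in the references, so treat this only as an illustration of the arithmetic; the function name is hypothetical.

```c
#include <stddef.h>

/* Interpolate n per-model probabilities. If weights is NULL, a uniform
 * distribution (1/n for each model) is used instead. */
double sketch_interp_prob(const double *probs, const double *weights, int n)
{
    double total = 0.0;
    for (int i = 0; i < n; ++i) {
        double w = (weights != NULL) ? weights[i] : 1.0 / n;
        total += w * probs[i];
    }
    return total;
}
```

For example, probabilities 0.2 and 0.4 with weights 0.25 and 0.75 interpolate to about 0.35, while uniform weights give about 0.3.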

ngram_model_set_iter_t* ngram_model_set_iter ( ngram_model_t *  set  ) 

Begin iterating over language models in a set.

Returns:
iterator pointing to the first language model, or NULL if no models remain.

Definition at line 367 of file ngram_model_set.c.

References ckd_calloc.

ngram_model_t* ngram_model_set_iter_model ( ngram_model_set_iter_t *  itor,
char const **  lmname 
)

Get language model and associated name from an iterator.

Parameters:
itor the iterator
lmname Output: string name associated with this language model.
Returns:
Language model pointed to by this iterator.

Definition at line 396 of file ngram_model_set.c.

References ngram_model_set_s::lms, and ngram_model_set_s::names.

ngram_model_set_iter_t* ngram_model_set_iter_next ( ngram_model_set_iter_t *  itor  ) 

Move to the next language model in a set.

Returns:
iterator pointing to the next language model, or NULL if no models remain.

Definition at line 380 of file ngram_model_set.c.

References ngram_model_set_s::n_models, and ngram_model_set_iter_free().
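The iterator functions are meant to be used together in a loop: ngram_model_set_iter() starts, ngram_model_set_iter_model() reads the current entry, and ngram_model_set_iter_next() advances. Because ngram_model_set_iter_next() references ngram_model_set_iter_free(), it appears to release the iterator itself when it runs off the end, so ngram_model_set_iter_free() would only be needed when leaving a loop early. The standalone sketch below mimics that assumed contract over a hypothetical array-backed set; every name in it is illustrative.

```c
#include <stdlib.h>

/* Hypothetical array-backed model set: only names are stored here. */
typedef struct {
    int n_models;
    const char **names;
} sketch_set_t;

typedef struct {
    const sketch_set_t *set;
    int cur;                /* index of the current model */
} sketch_iter_t;

/* Start iterating; returns NULL if the set is empty. */
sketch_iter_t *sketch_iter(const sketch_set_t *set)
{
    if (set->n_models == 0)
        return NULL;
    sketch_iter_t *itor = calloc(1, sizeof(*itor));   /* cur starts at 0 */
    if (itor == NULL)
        return NULL;
    itor->set = set;
    return itor;
}

/* Advance; frees the iterator and returns NULL when no models remain. */
sketch_iter_t *sketch_iter_next(sketch_iter_t *itor)
{
    if (++itor->cur == itor->set->n_models) {
        free(itor);
        return NULL;
    }
    return itor;
}

/* Name of the model the iterator currently points at. */
const char *sketch_iter_name(const sketch_iter_t *itor)
{
    return itor->set->names[itor->cur];
}

/* Example loop: count the models by walking the iterator to the end.
 * No explicit free is needed because sketch_iter_next() frees at the end. */
int sketch_count(const sketch_set_t *set)
{
    int n = 0;
    for (sketch_iter_t *it = sketch_iter(set); it != NULL; it = sketch_iter_next(it))
        ++n;
    return n;
}
```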

int32 ngram_model_set_known_wid ( ngram_model_t *  set,
int32  set_wid 
)

Test whether a word ID corresponds to a known word in the current state of the language model set.

Returns:
If there is a current language model, returns non-zero if set_wid corresponds to a known word in that language model. Otherwise, returns non-zero if set_wid corresponds to a known word in any language model.

Definition at line 467 of file ngram_model_set.c.

References ngram_model_s::n_words, and ngram_unknown_wid().
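The rule above (check only the current model when one is selected, otherwise accept the word if any model knows it) can be sketched as follows. The row-major word-ID table and the use of -1 to mark a word that a submodel does not know are assumptions for this illustration; the reference to ngram_unknown_wid() suggests that the real code instead compares against each submodel's own unknown-word ID.

```c
#include <stdbool.h>

/* Hypothetical marker for "this submodel does not know the word". */
#define SKETCH_UNKNOWN (-1)

typedef struct {
    int n_words;        /* size of the shared vocabulary */
    int n_models;       /* number of submodels */
    const int *widmap;  /* n_words x n_models, row-major local IDs */
    int cur;            /* selected submodel, or -1 when interpolating */
} sketch_set_t;

bool sketch_known_wid(const sketch_set_t *set, int set_wid)
{
    if (set_wid < 0 || set_wid >= set->n_words)
        return false;
    const int *row = set->widmap + set_wid * set->n_models;
    if (set->cur != -1)
        return row[set->cur] != SKETCH_UNKNOWN;   /* current model only */
    for (int i = 0; i < set->n_models; ++i)       /* any model will do */
        if (row[i] != SKETCH_UNKNOWN)
            return true;
    return false;
}
```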

ngram_model_t* ngram_model_set_lookup ( ngram_model_t *  set,
const char *  name 
)

Look up a language model by name from a set.

Returns:
language model corresponding to name, or NULL if no language model by that name exists.

Definition at line 404 of file ngram_model_set.c.
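A lookup by name amounts to a search over the set's list of names. The sketch below uses a linear scan with strcmp(), which is a reasonable choice for the small number of models a set usually holds; the array-backed layout and the function name are hypothetical, and the actual implementation may differ.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical array-backed set: parallel arrays of names and models. */
typedef struct {
    int n_models;
    const char **names;
    void **lms;          /* opaque model pointers */
} sketch_set_t;

/* Return the model registered under name, or NULL if there is none. */
void *sketch_lookup(const sketch_set_t *set, const char *name)
{
    for (int i = 0; i < set->n_models; ++i)
        if (strcmp(set->names[i], name) == 0)
            return set->lms[i];
    return NULL;
}
```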

ngram_model_t* ngram_model_set_read ( cmd_ln_t *  config,
const char *  lmctlfile,
logmath_t *  lmath 
)

Read a set of language models from a control file.

This function creates a language model set from a "control file" of the type used in Sphinx-II and Sphinx-III. File format (optional items are enclosed in []):

   [{ LMClassFileName LMClassFileName ... }]
   TrigramLMFileName LMName [{ LMClassName LMClassName ... }]
   TrigramLMFileName LMName [{ LMClassName LMClassName ... }]
   ...
 (There should be whitespace around the { and } delimiters.)
 

This is an extension of the older format, which had only TrigramLMFileName and LMName pairs. The new format allows a set of LMClass files to be read in and referred to by the trigram LMs.

Comments are not allowed in this file.
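Because the grammar above is tokenized on whitespace, the { and } delimiters must be separated from their neighbours by spaces. The sketch below parses an in-memory copy of such a file into a hypothetical fixed-size struct. It only records tokens (it opens no files and reads no models), it uses strtok(), which is not thread-safe, and all names and limits are illustrative rather than SphinxBase's own parser.

```c
#include <string.h>

#define SKETCH_MAX 16

/* Hypothetical result of parsing an LM control file held in memory. */
typedef struct {
    int n_classfiles;
    const char *classfiles[SKETCH_MAX];
    int n_lms;
    const char *lmfiles[SKETCH_MAX];
    const char *lmnames[SKETCH_MAX];
    int n_classes[SKETCH_MAX];   /* number of class names per LM */
} sketch_ctl_t;

/* Parse buf in place (strtok modifies it). Returns 0 on success, -1 on a
 * syntax error, an empty file, or an exceeded fixed-size limit. */
int sketch_parse_ctl(char *buf, sketch_ctl_t *out)
{
    const char *ws = " \t\r\n";
    memset(out, 0, sizeof(*out));
    char *tok = strtok(buf, ws);

    /* Optional leading block: { LMClassFileName ... } */
    if (tok != NULL && strcmp(tok, "{") == 0) {
        while ((tok = strtok(NULL, ws)) != NULL && strcmp(tok, "}") != 0) {
            if (out->n_classfiles == SKETCH_MAX)
                return -1;
            out->classfiles[out->n_classfiles++] = tok;
        }
        if (tok == NULL)
            return -1;              /* missing closing brace */
        tok = strtok(NULL, ws);
    }

    /* Entries: TrigramLMFileName LMName [{ LMClassName ... }] */
    while (tok != NULL) {
        if (out->n_lms == SKETCH_MAX)
            return -1;
        int i = out->n_lms++;
        out->lmfiles[i] = tok;
        if ((out->lmnames[i] = strtok(NULL, ws)) == NULL)
            return -1;              /* file name without an LM name */
        tok = strtok(NULL, ws);
        if (tok != NULL && strcmp(tok, "{") == 0) {
            while ((tok = strtok(NULL, ws)) != NULL && strcmp(tok, "}") != 0)
                out->n_classes[i]++;
            if (tok == NULL)
                return -1;          /* missing closing brace */
            tok = strtok(NULL, ws);
        }
    }
    return out->n_lms > 0 ? 0 : -1;
}
```

For instance, a buffer holding "{ a.cls b.cls }" followed by "news.lm news { CITY }" and "chat.lm chat" yields two class files and two LMs, the first of which refers to one class.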

Parameters:
config Configuration parameters.
lmctlfile Path to the language model control file.
lmath Log-math parameters to use for probability calculations. Ownership of this object is assumed by the newly created ngram_model_t, and you should not attempt to free it manually. If you wish to reuse it elsewhere, you must retain it with logmath_retain().
Returns:
newly created language model set.

Definition at line 182 of file ngram_model_set.c.

References ckd_calloc, ckd_free(), ckd_salloc, E_ERROR, E_ERROR_SYSTEM, E_INFO, glist_add_ptr(), glist_count(), glist_free(), glist_reverse(), gnode_ptr, hash_table_free(), hash_table_lookup(), hash_table_new(), hash_table_tolist(), NGRAM_AUTO, ngram_model_add_class(), ngram_model_free(), ngram_model_read(), ngram_model_set_init(), path_is_absolute(), string_join(), and hash_entry_s::val.

ngram_model_t* ngram_model_set_remove ( ngram_model_t *  set,
const char *  name,
int  reuse_widmap 
)

Remove a language model from a set.

Parameters:
set The language model set to remove from.
name The name associated with the model to remove.
reuse_widmap Reuse the existing word-ID mapping in set.

Definition at line 579 of file ngram_model_set.c.

References ckd_free(), ngram_model_s::lmath, ngram_model_s::log_zero, logmath_exp(), logmath_log(), and ngram_model_s::n_words.

ngram_model_t* ngram_model_set_select ( ngram_model_t *  set,
const char *  name 
)

Select a single language model from a set for scoring.

Returns:
the newly selected language model, or NULL if no language model by that name exists.

Definition at line 427 of file ngram_model_set.c.
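Selecting a model can be thought of as a name lookup that also records the matching index as the current model, so that scores come from that model alone rather than from an interpolation. The sketch below is a hypothetical illustration of that bookkeeping; in particular, leaving the previous selection unchanged when the name is not found is an assumption, not documented behaviour.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical array-backed set with a "current model" index. */
typedef struct {
    int n_models;
    const char **names;
    void **lms;          /* opaque model pointers */
    int cur;             /* selected submodel, or -1 when interpolating */
} sketch_set_t;

/* Make the named model current and return it, or return NULL if the name
 * is unknown (the previous selection is kept in that case, by assumption). */
void *sketch_select(sketch_set_t *set, const char *name)
{
    for (int i = 0; i < set->n_models; ++i) {
        if (strcmp(set->names[i], name) == 0) {
            set->cur = i;   /* scores now come from this model alone */
            return set->lms[i];
        }
    }
    return NULL;
}
```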


Generated on 20 Nov 2009 for SphinxBase by  doxygen 1.6.1