weka.classifiers.meta
Class EnsembleSelection

java.lang.Object
  extended by weka.classifiers.Classifier
      extended by weka.classifiers.RandomizableClassifier
          extended by weka.classifiers.meta.EnsembleSelection
All Implemented Interfaces:
java.io.Serializable, java.lang.Cloneable, CapabilitiesHandler, OptionHandler, Randomizable, RevisionHandler, TechnicalInformationHandler

public class EnsembleSelection
extends RandomizableClassifier
implements TechnicalInformationHandler

Combines several classifiers using the ensemble selection method. For more information, see: Caruana, Rich, Niculescu, Alex, Crew, Geoff, and Ksikes, Alex, Ensemble Selection from Libraries of Models, The International Conference on Machine Learning (ICML'04), 2004. Implemented in Weka by Bob Jung and David Michael.

BibTeX:

 @inproceedings{RichCaruana2004,
    author = {Rich Caruana and Alex Niculescu and Geoff Crew and Alex Ksikes},
    booktitle = {21st International Conference on Machine Learning},
    title = {Ensemble Selection from Libraries of Models},
    year = {2004}
 }
 

Our implementation of ensemble selection differs from the other classifiers because we assume that the list of models to be trained is too large to fit in memory, so the base classifiers need to be serialized to the file system (in the directory given by the "workingDirectory" option). Following the original paper, we use the term "model library" for this large set of classifiers.

If you plan to use this classifier, we highly recommend a quick look at our FAQ/tutorial on the WIKI, because a few things unique to this classifier could trip you up. Otherwise, this method is a good way to get strong classifier performance without much parameter tuning. Even in the worst case, you get a summary of how a large number of diverse models performed on your data set.

This class relies on the package weka.classifiers.meta.ensembleSelection.

When run from the Explorer or another GUI, the classifier depends on the package weka.gui.libraryEditor.

Valid options are:

 -L </path/to/modelLibrary>
  Specifies the Model Library File, containing the list of all models.
 -W </path/to/working/directory>
  Specifies the Working Directory, where all models will be stored.
 -B <numModelBags>
  Set the number of bags, i.e., number of iterations to run 
  the ensemble selection algorithm.
 -E <modelRatio>
  Set the ratio of library models that will be randomly chosen 
  to populate each bag of models.
 -V <validationRatio>
  Set the ratio of the training data set that will be reserved 
  for validation.
 -H <hillClimbIterations>
  Set the number of hillclimbing iterations to be performed 
  on each model bag.
 -I <sortInitialization>
  Set the ratio of the ensemble library that the sort 
  initialization algorithm will be able to choose from while 
  initializing the ensemble for each model bag
 -X <numFolds>
  Sets the number of cross-validation folds.
 -P <hillclimbMetric>
  Specify the metric that will be used for model selection 
  during the hillclimbing algorithm.
  Valid metrics are: 
   accuracy, rmse, roc, precision, recall, fscore, all
 -A <algorithm>
  Specifies the algorithm to be used for ensemble selection. 
  Valid algorithms are:
   "forward" (default) for forward selection.
   "backward" for backward elimination.
   "both" for both forward and backward elimination.
   "best" to simply print out top performer from the 
      ensemble library
   "library" to only train the models in the ensemble 
      library
 -R
  Flag whether or not models can be selected more than once 
  for an ensemble.
 -G
  Whether sort initialization greedily stops adding models 
  when performance degrades.
 -O
  Flag for verbose output. Prints out performance of all 
  selected models.
 -S <num>
  Random number seed.
  (default 1)
 -D
  If set, classifier is run in debug mode and
  may output additional info to the console
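
As an illustration, the options above might be assembled into a string array as follows. This is a minimal sketch: the class name EnsembleSelectionOptionsSketch, the helper buildOptions, and the values passed in main are hypothetical; only the flags themselves come from the list above.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Weka: assembles an option array
// from the flags documented above.
class EnsembleSelectionOptionsSketch {

    static String[] buildOptions(String libraryFile, String workingDir,
                                 int numModelBags, double validationRatio,
                                 String metric, String algorithm,
                                 boolean withReplacement) {
        List<String> opts = new ArrayList<>();
        opts.add("-L"); opts.add(libraryFile);                       // model library file
        opts.add("-W"); opts.add(workingDir);                        // working directory
        opts.add("-B"); opts.add(Integer.toString(numModelBags));    // number of model bags
        opts.add("-V"); opts.add(Double.toString(validationRatio));  // validation ratio
        opts.add("-P"); opts.add(metric);                            // hillclimb metric
        opts.add("-A"); opts.add(algorithm);                         // selection algorithm
        if (withReplacement) {
            opts.add("-R");                                          // flag option, no argument
        }
        return opts.toArray(new String[0]);
    }

    public static void main(String[] args) {
        // Placeholder paths and illustrative values only.
        String[] options = buildOptions("/path/to/modelLibrary",
                "/path/to/working/directory", 10, 0.25, "rmse", "forward", true);
        System.out.println(String.join(" ", options));
    }
}
```

An array of this shape is what setOptions(java.lang.String[]) accepts; the same flags can be given on the command line to main.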

Version:
$Revision: 5526 $
Author:
Robert Jung, David Michael
See Also:
Serialized Form

Field Summary
static int ALGORITHM_BACKWARD
           
static int ALGORITHM_BEST
           
static int ALGORITHM_BUILD_LIBRARY
           
static int ALGORITHM_FORWARD
          The "enumeration" of the algorithms we can use.
static int ALGORITHM_FORWARD_BACKWARD
           
static Tag[] TAGS_ALGORITHM
          defines algorithms that can be chosen for ensemble selection
static Tag[] TAGS_METRIC
          defines metrics that can be chosen for hillclimbing
 
Constructor Summary
EnsembleSelection()
           
 
Method Summary
 java.lang.String algorithmTipText()
          Returns the tip text for this property
 void buildClassifier(Instances trainData)
          buildClassifier selects a classifier from the set of classifiers by minimising error on the training data.
 double[] distributionForInstance(Instance instance)
          Calculates the class membership probabilities for the given test instance.
 SelectedTag getAlgorithm()
          Gets the algorithm
 Capabilities getCapabilities()
          We return true for basically everything except for Missing class values, because we can't really answer for all the models in our library.
static java.lang.String getDefaultWorkingDirectory()
          This method tries to find a reasonable path name for the ensemble working directory where models and files will be stored.
 boolean getGreedySortInitialization()
          Get the value of greedySortInitialization.
 int getHillclimbIterations()
          Gets the number of hillclimbIterations.
 SelectedTag getHillclimbMetric()
          Gets the hill climbing metric.
 EnsembleSelectionLibrary getLibrary()
          Gets the ensemble library.
 double getModelRatio()
          Get the value of modelRatio.
 int getNumFolds()
          Gets the number of folds for the cross-validation.
 int getNumModelBags()
          Gets numModelBags.
 java.lang.String[] getOptions()
          Gets the current settings of the Classifier.
 boolean getReplacement()
          Get the value of replacement.
 java.lang.String getRevision()
          Returns the revision string.
 double getSortInitializationRatio()
          Get the value of sortInitializationRatio.
 TechnicalInformation getTechnicalInformation()
          Return the technical information.
 double getValidationRatio()
          Get the value of validationRatio.
 boolean getVerboseOutput()
          Get the value of verboseOutput.
 java.io.File getWorkingDirectory()
          Get the value of working directory.
 java.lang.String globalInfo()
          Returns a string describing classifier
 java.lang.String greedySortInitializationTipText()
          Returns the tip text for this property
 java.lang.String hillclimbIterationsTipText()
          Returns the tip text for this property
 java.lang.String hillclimbMetricTipText()
          Returns the tip text for this property
 java.lang.String libraryTipText()
          Returns the tip text for this property
 java.util.Enumeration listOptions()
          Returns an enumeration describing the available options.
static void main(java.lang.String[] argv)
          Executes the classifier from commandline.
 java.lang.String modelRatioTipText()
          Returns the tip text for this property
 java.lang.String numFoldsTipText()
          Returns the tip text for this property
 java.lang.String numModelBagsTipText()
          Returns the tip text for this property
 java.lang.String replacementTipText()
          Returns the tip text for this property
 void setAlgorithm(SelectedTag newType)
          Sets the Algorithm to use
 void setGreedySortInitialization(boolean newGreedySortInitialization)
          Set the value of greedySortInitialization.
 void setHillclimbIterations(int n)
          Sets the number of hillclimbIterations.
 void setHillclimbMetric(SelectedTag newType)
          Sets the hill climbing metric.
 void setLibrary(EnsembleSelectionLibrary newLibrary)
          Sets the ensemble library.
 void setModelRatio(double v)
          Set the value of modelRatio.
 void setNumFolds(int numFolds)
          Sets the number of folds for the cross-validation.
 void setNumModelBags(int n)
          Sets numModelBags.
 void setOptions(java.lang.String[] options)
          Parses the given list of options.
 void setReplacement(boolean newReplacement)
          Set the value of replacement.
 void setSortInitializationRatio(double v)
          Set the value of sortInitializationRatio.
 void setValidationRatio(double v)
          Set the value of validationRatio.
 void setVerboseOutput(boolean newVerboseOutput)
          Set the value of verboseOutput.
 void setWorkingDirectory(java.io.File newWorkingDirectory)
          Set the value of working directory.
 java.lang.String sortInitializationRatioTipText()
          Returns the tip text for this property
 java.lang.String toString()
          Output a representation of this classifier
 java.lang.String validationRatioTipText()
          Returns the tip text for this property
 java.lang.String verboseOutputTipText()
          Returns the tip text for this property
 java.lang.String workingDirectoryTipText()
          Returns the tip text for this property
 
Methods inherited from class weka.classifiers.RandomizableClassifier
getSeed, seedTipText, setSeed
 
Methods inherited from class weka.classifiers.Classifier
classifyInstance, debugTipText, forName, getDebug, makeCopies, makeCopy, setDebug
 
Methods inherited from class java.lang.Object
equals, getClass, hashCode, notify, notifyAll, wait, wait, wait
 

Field Detail

TAGS_METRIC

public static final Tag[] TAGS_METRIC
defines metrics that can be chosen for hillclimbing


ALGORITHM_FORWARD

public static final int ALGORITHM_FORWARD
The "enumeration" of the algorithms we can use. Forward - forward selection. For hillclimb iterations,

See Also:
Constant Field Values
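
Forward selection, as described in the cited paper, can be sketched roughly as follows. This is an illustration under stated assumptions, not the Weka implementation: it handles a binary class only, uses RMSE as the hillclimb metric, always allows a model to be selected more than once, and expects at least one model. The class ForwardSelectionSketch and its methods are hypothetical names.

```java
// Illustrative sketch only; not the Weka implementation.
class ForwardSelectionSketch {

    // RMSE of the averaged member predictions against 0/1 labels.
    static double rmse(double[] sum, int members, int[] labels) {
        double squared = 0.0;
        for (int i = 0; i < labels.length; i++) {
            double diff = sum[i] / members - labels[i];
            squared += diff * diff;
        }
        return Math.sqrt(squared / labels.length);
    }

    // preds[m][i] is model m's predicted probability of the positive class
    // for validation instance i. Returns how often each model appears in the
    // best ensemble seen during the given number of hillclimb iterations.
    static int[] select(double[][] preds, int[] labels, int iterations) {
        int numModels = preds.length;
        double[] sum = new double[labels.length]; // running sum of member predictions
        int[] counts = new int[numModels];
        int[] bestCounts = new int[numModels];
        double bestScore = Double.POSITIVE_INFINITY;

        for (int it = 1; it <= iterations; it++) {
            // Try adding each model and keep the one with the lowest RMSE.
            int bestModel = -1;
            double bestStep = Double.POSITIVE_INFINITY;
            for (int m = 0; m < numModels; m++) {
                double[] trial = sum.clone();
                for (int i = 0; i < trial.length; i++) {
                    trial[i] += preds[m][i];
                }
                double score = rmse(trial, it, labels);
                if (score < bestStep) {
                    bestStep = score;
                    bestModel = m;
                }
            }
            for (int i = 0; i < sum.length; i++) {
                sum[i] += preds[bestModel][i];
            }
            counts[bestModel]++;
            // Remember the best ensemble along the hillclimbing path.
            if (bestStep < bestScore) {
                bestScore = bestStep;
                bestCounts = counts.clone();
            }
        }
        return bestCounts;
    }
}
```

Because a model may be added repeatedly, the returned counts act as weights when the members' predictions are averaged.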

ALGORITHM_BACKWARD

public static final int ALGORITHM_BACKWARD
See Also:
Constant Field Values

ALGORITHM_FORWARD_BACKWARD

public static final int ALGORITHM_FORWARD_BACKWARD
See Also:
Constant Field Values

ALGORITHM_BEST

public static final int ALGORITHM_BEST
See Also:
Constant Field Values

ALGORITHM_BUILD_LIBRARY

public static final int ALGORITHM_BUILD_LIBRARY
See Also:
Constant Field Values

TAGS_ALGORITHM

public static final Tag[] TAGS_ALGORITHM
defines algorithms that can be chosen for ensemble selection

Constructor Detail

EnsembleSelection

public EnsembleSelection()
Method Detail

globalInfo

public java.lang.String globalInfo()
Returns a string describing classifier

Returns:
a description suitable for displaying in the explorer/experimenter gui

listOptions

public java.util.Enumeration listOptions()
Returns an enumeration describing the available options.

Specified by:
listOptions in interface OptionHandler
Overrides:
listOptions in class RandomizableClassifier
Returns:
an enumeration of all the available options.

getCapabilities

public Capabilities getCapabilities()
We return true for basically everything except for Missing class values, because we can't really answer for all the models in our library. If any of them don't work with the supplied data then we just trap the exception.

Specified by:
getCapabilities in interface CapabilitiesHandler
Overrides:
getCapabilities in class Classifier
Returns:
the capabilities of this classifier
See Also:
Capabilities

setOptions

public void setOptions(java.lang.String[] options)
                throws java.lang.Exception
Valid options are:

 -L </path/to/modelLibrary>
  Specifies the Model Library File, containing the list of all models.
 -W </path/to/working/directory>
  Specifies the Working Directory, where all models will be stored.
 -B <numModelBags>
  Set the number of bags, i.e., number of iterations to run 
  the ensemble selection algorithm.
 -E <modelRatio>
  Set the ratio of library models that will be randomly chosen 
  to populate each bag of models.
 -V <validationRatio>
  Set the ratio of the training data set that will be reserved 
  for validation.
 -H <hillClimbIterations>
  Set the number of hillclimbing iterations to be performed 
  on each model bag.
 -I <sortInitialization>
  Set the ratio of the ensemble library that the sort 
  initialization algorithm will be able to choose from while 
  initializing the ensemble for each model bag
 -X <numFolds>
  Sets the number of cross-validation folds.
 -P <hillclimbMetric>
  Specify the metric that will be used for model selection 
  during the hillclimbing algorithm.
  Valid metrics are: 
   accuracy, rmse, roc, precision, recall, fscore, all
 -A <algorithm>
  Specifies the algorithm to be used for ensemble selection. 
  Valid algorithms are:
   "forward" (default) for forward selection.
   "backward" for backward elimination.
   "both" for both forward and backward elimination.
   "best" to simply print out top performer from the 
      ensemble library
   "library" to only train the models in the ensemble 
      library
 -R
  Flag whether or not models can be selected more than once 
  for an ensemble.
 -G
  Whether sort initialization greedily stops adding models 
  when performance degrades.
 -O
  Flag for verbose output. Prints out performance of all 
  selected models.
 -S <num>
  Random number seed.
  (default 1)
 -D
  If set, classifier is run in debug mode and
  may output additional info to the console

Specified by:
setOptions in interface OptionHandler
Overrides:
setOptions in class RandomizableClassifier
Parameters:
options - the list of options as an array of strings
Throws:
java.lang.Exception - if an option is not supported

getOptions

public java.lang.String[] getOptions()
Gets the current settings of the Classifier.

Specified by:
getOptions in interface OptionHandler
Overrides:
getOptions in class RandomizableClassifier
Returns:
an array of strings suitable for passing to setOptions

numFoldsTipText

public java.lang.String numFoldsTipText()
Returns the tip text for this property

Returns:
tip text for this property suitable for displaying in the explorer/experimenter gui

getNumFolds

public int getNumFolds()
Gets the number of folds for the cross-validation.

Returns:
the number of folds for the cross-validation

setNumFolds

public void setNumFolds(int numFolds)
                 throws java.lang.Exception
Sets the number of folds for the cross-validation.

Parameters:
numFolds - the number of folds for the cross-validation
Throws:
java.lang.Exception - if parameter illegal

libraryTipText

public java.lang.String libraryTipText()
Returns the tip text for this property

Returns:
tip text for this property suitable for displaying in the explorer/experimenter gui

getLibrary

public EnsembleSelectionLibrary getLibrary()
Gets the ensemble library.

Returns:
the ensemble library

setLibrary

public void setLibrary(EnsembleSelectionLibrary newLibrary)
Sets the ensemble library.

Parameters:
newLibrary - the ensemble library

modelRatioTipText

public java.lang.String modelRatioTipText()
Returns the tip text for this property

Returns:
tip text for this property suitable for displaying in the explorer/experimenter gui

getModelRatio

public double getModelRatio()
Get the value of modelRatio.

Returns:
Value of modelRatio.

setModelRatio

public void setModelRatio(double v)
Set the value of modelRatio.

Parameters:
v - Value to assign to modelRatio.
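
To illustrate what modelRatio controls, the following sketch draws one bag of model indices from a library. It is an assumption-laden illustration, not the Weka code: the class ModelBagSketch, the method drawBag, and the rounding of the bag size are all hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

// Illustrative sketch only; not the Weka implementation.
class ModelBagSketch {

    // Randomly chooses a fraction (modelRatio) of the library's model
    // indices, without repeats, to populate one bag.
    static List<Integer> drawBag(int librarySize, double modelRatio, Random random) {
        List<Integer> indices = new ArrayList<>();
        for (int i = 0; i < librarySize; i++) {
            indices.add(i);
        }
        Collections.shuffle(indices, random);
        // Assumption: the bag size is rounded to the nearest integer
        // and clamped to the library size.
        int bagSize = (int) Math.round(librarySize * modelRatio);
        bagSize = Math.max(0, Math.min(librarySize, bagSize));
        return new ArrayList<>(indices.subList(0, bagSize));
    }

    public static void main(String[] args) {
        // A fixed seed makes the bag reproducible.
        System.out.println(drawBag(10, 0.5, new Random(1)));
    }
}
```

Seeding the generator, as the -S option does for the classifier, makes the draw reproducible from run to run.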

validationRatioTipText

public java.lang.String validationRatioTipText()
Returns the tip text for this property

Returns:
tip text for this property suitable for displaying in the explorer/experimenter gui

getValidationRatio

public double getValidationRatio()
Get the value of validationRatio.

Returns:
Value of validationRatio.

setValidationRatio

public void setValidationRatio(double v)
Set the value of validationRatio.

Parameters:
v - Value to assign to validationRatio.

hillclimbMetricTipText

public java.lang.String hillclimbMetricTipText()
Returns the tip text for this property

Returns:
tip text for this property suitable for displaying in the explorer/experimenter gui

getHillclimbMetric

public SelectedTag getHillclimbMetric()
Gets the hill climbing metric. Will be one of METRIC_ACCURACY, METRIC_RMSE, METRIC_ROC, METRIC_PRECISION, METRIC_RECALL, METRIC_FSCORE, METRIC_ALL

Returns:
the hillclimbMetric

setHillclimbMetric

public void setHillclimbMetric(SelectedTag newType)
Sets the hill climbing metric. Will be one of METRIC_ACCURACY, METRIC_RMSE, METRIC_ROC, METRIC_PRECISION, METRIC_RECALL, METRIC_FSCORE, METRIC_ALL

Parameters:
newType - the new hillclimbMetric
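
For orientation, three of the metrics named above can be computed for a binary class as sketched below. These are textbook definitions with a fixed 0.5 decision threshold; the class HillclimbMetricSketch is hypothetical, and the definitions Weka actually applies (for example to multi-class data) may differ.

```java
// Illustrative sketch only: textbook binary-class definitions.
// probs[i] is the predicted probability of the positive class,
// labels[i] is the true class (0 or 1).
class HillclimbMetricSketch {

    // Fraction of instances classified correctly (higher is better).
    static double accuracy(double[] probs, int[] labels) {
        int correct = 0;
        for (int i = 0; i < labels.length; i++) {
            int predicted = probs[i] >= 0.5 ? 1 : 0;
            if (predicted == labels[i]) {
                correct++;
            }
        }
        return (double) correct / labels.length;
    }

    // Root mean squared error of the probabilities (lower is better).
    static double rmse(double[] probs, int[] labels) {
        double squared = 0.0;
        for (int i = 0; i < labels.length; i++) {
            double diff = probs[i] - labels[i];
            squared += diff * diff;
        }
        return Math.sqrt(squared / labels.length);
    }

    // Harmonic mean of precision and recall (higher is better).
    static double fscore(double[] probs, int[] labels) {
        int tp = 0, fp = 0, fn = 0;
        for (int i = 0; i < labels.length; i++) {
            boolean predictedPositive = probs[i] >= 0.5;
            if (predictedPositive && labels[i] == 1) tp++;
            else if (predictedPositive && labels[i] == 0) fp++;
            else if (!predictedPositive && labels[i] == 1) fn++;
        }
        double precision = (tp + fp == 0) ? 0.0 : (double) tp / (tp + fp);
        double recall = (tp + fn == 0) ? 0.0 : (double) tp / (tp + fn);
        return (precision + recall == 0.0)
                ? 0.0
                : 2 * precision * recall / (precision + recall);
    }
}
```

Note that the direction of improvement differs between metrics: hillclimbing would minimise rmse but maximise accuracy and fscore.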

algorithmTipText

public java.lang.String algorithmTipText()
Returns the tip text for this property

Returns:
tip text for this property suitable for displaying in the explorer/experimenter gui

getAlgorithm

public SelectedTag getAlgorithm()
Gets the algorithm

Returns:
the algorithm

setAlgorithm

public void setAlgorithm(SelectedTag newType)
Sets the Algorithm to use

Parameters:
newType - the new algorithm

hillclimbIterationsTipText

public java.lang.String hillclimbIterationsTipText()
Returns the tip text for this property

Returns:
tip text for this property suitable for displaying in the explorer/experimenter gui

getHillclimbIterations

public int getHillclimbIterations()
Gets the number of hillclimbIterations.

Returns:
the number of hillclimbIterations

setHillclimbIterations

public void setHillclimbIterations(int n)
                            throws java.lang.Exception
Sets the number of hillclimbIterations.

Parameters:
n - the number of hillclimbIterations
Throws:
java.lang.Exception - if parameter illegal

numModelBagsTipText

public java.lang.String numModelBagsTipText()
Returns the tip text for this property

Returns:
tip text for this property suitable for displaying in the explorer/experimenter gui

getNumModelBags

public int getNumModelBags()
Gets numModelBags.

Returns:
numModelBags

setNumModelBags

public void setNumModelBags(int n)
                     throws java.lang.Exception
Sets numModelBags.

Parameters:
n - the new value for numModelBags
Throws:
java.lang.Exception - if parameter illegal

sortInitializationRatioTipText

public java.lang.String sortInitializationRatioTipText()
Returns the tip text for this property

Returns:
tip text for this property suitable for displaying in the explorer/experimenter gui

getSortInitializationRatio

public double getSortInitializationRatio()
Get the value of sortInitializationRatio.

Returns:
Value of sortInitializationRatio.

setSortInitializationRatio

public void setSortInitializationRatio(double v)
Set the value of sortInitializationRatio.

Parameters:
v - Value to assign to sortInitializationRatio.

replacementTipText

public java.lang.String replacementTipText()
Returns the tip text for this property

Returns:
tip text for this property suitable for displaying in the explorer/experimenter gui

getReplacement

public boolean getReplacement()
Get the value of replacement.

Returns:
Value of replacement.

setReplacement

public void setReplacement(boolean newReplacement)
Set the value of replacement.

Parameters:
newReplacement - Value to assign to replacement.

greedySortInitializationTipText

public java.lang.String greedySortInitializationTipText()
Returns the tip text for this property

Returns:
tip text for this property suitable for displaying in the explorer/experimenter gui

getGreedySortInitialization

public boolean getGreedySortInitialization()
Get the value of greedySortInitialization.

Returns:
Value of greedySortInitialization.

setGreedySortInitialization

public void setGreedySortInitialization(boolean newGreedySortInitialization)
Set the value of greedySortInitialization.

Parameters:
newGreedySortInitialization - Value to assign to greedySortInitialization.
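
The interplay of sortInitializationRatio and greedySortInitialization might look like the sketch below. It is a guess at the general idea based on the option descriptions, not the Weka code: it assumes a binary class, RMSE as the metric, and rounding down of the candidate count; SortInitializationSketch and its methods are hypothetical.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Illustrative sketch only; not the Weka implementation.
class SortInitializationSketch {

    // RMSE of the averaged member predictions against 0/1 labels.
    static double rmse(double[] sum, int members, int[] labels) {
        double squared = 0.0;
        for (int i = 0; i < labels.length; i++) {
            double diff = sum[i] / members - labels[i];
            squared += diff * diff;
        }
        return Math.sqrt(squared / labels.length);
    }

    // preds[m][i] is model m's predicted probability of the positive class.
    // Returns the indices of the models placed in the initial ensemble.
    static List<Integer> initialize(double[][] preds, int[] labels,
                                    double sortInitializationRatio, boolean greedy) {
        int numModels = preds.length;
        // Rank the models by their individual RMSE, best first.
        List<Integer> ranked = new ArrayList<>();
        for (int m = 0; m < numModels; m++) {
            ranked.add(m);
        }
        ranked.sort(Comparator.comparingDouble(m -> rmse(preds[m], 1, labels)));

        // Assumption: only the top fraction of the ranking is eligible.
        int limit = (int) Math.floor(numModels * sortInitializationRatio);
        List<Integer> ensemble = new ArrayList<>();
        double[] sum = new double[labels.length];
        double current = Double.POSITIVE_INFINITY;
        for (int k = 0; k < limit; k++) {
            int m = ranked.get(k);
            double[] trial = sum.clone();
            for (int i = 0; i < trial.length; i++) {
                trial[i] += preds[m][i];
            }
            double score = rmse(trial, ensemble.size() + 1, labels);
            if (greedy && score > current) {
                break; // greedy mode: stop once performance degrades
            }
            sum = trial;
            current = score;
            ensemble.add(m);
        }
        return ensemble;
    }
}
```

With greedy set to false, every eligible model is added regardless of its effect on the ensemble's score.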

verboseOutputTipText

public java.lang.String verboseOutputTipText()
Returns the tip text for this property

Returns:
tip text for this property suitable for displaying in the explorer/experimenter gui

getVerboseOutput

public boolean getVerboseOutput()
Get the value of verboseOutput.

Returns:
Value of verboseOutput.

setVerboseOutput

public void setVerboseOutput(boolean newVerboseOutput)
Set the value of verboseOutput.

Parameters:
newVerboseOutput - Value to assign to verboseOutput.

workingDirectoryTipText

public java.lang.String workingDirectoryTipText()
Returns the tip text for this property

Returns:
tip text for this property suitable for displaying in the explorer/experimenter gui

getWorkingDirectory

public java.io.File getWorkingDirectory()
Get the value of working directory.

Returns:
Value of working directory.

setWorkingDirectory

public void setWorkingDirectory(java.io.File newWorkingDirectory)
Set the value of working directory.

Parameters:
newWorkingDirectory - Value to assign to working directory.

buildClassifier

public void buildClassifier(Instances trainData)
                     throws java.lang.Exception
buildClassifier selects a classifier from the set of classifiers by minimising error on the training data.

Specified by:
buildClassifier in class Classifier
Parameters:
trainData - the training data to be used for generating the ensemble.
Throws:
java.lang.Exception - if the classifier could not be built successfully

distributionForInstance

public double[] distributionForInstance(Instance instance)
                                 throws java.lang.Exception
Calculates the class membership probabilities for the given test instance.

Overrides:
distributionForInstance in class Classifier
Parameters:
instance - the instance to be classified
Returns:
predicted class probability distribution
Throws:
java.lang.Exception - if instance could not be classified successfully
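
One common way to turn an ensemble's members into a single class distribution is a weighted average, sketched below. This is an assumption about the general technique rather than a description of what this method does internally; EnsembleAverageSketch and average are hypothetical names, and the weights stand for how often each model was selected.

```java
// Illustrative sketch only; not the Weka implementation.
class EnsembleAverageSketch {

    // memberDistributions[m][c] is model m's probability for class c;
    // weights[m] is the number of times model m appears in the ensemble.
    static double[] average(double[][] memberDistributions, int[] weights) {
        int numClasses = memberDistributions[0].length;
        double[] result = new double[numClasses];
        for (int m = 0; m < memberDistributions.length; m++) {
            for (int c = 0; c < numClasses; c++) {
                result[c] += weights[m] * memberDistributions[m][c];
            }
        }
        // Normalise so that the class probabilities sum to one.
        double total = 0.0;
        for (double value : result) {
            total += value;
        }
        if (total > 0.0) {
            for (int c = 0; c < numClasses; c++) {
                result[c] /= total;
            }
        }
        return result;
    }

    public static void main(String[] args) {
        double[][] dists = { {0.8, 0.2}, {0.4, 0.6} };
        int[] weights = {3, 1};
        System.out.println(java.util.Arrays.toString(average(dists, weights)));
    }
}
```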

getDefaultWorkingDirectory

public static java.lang.String getDefaultWorkingDirectory()
This method tries to find a reasonable path name for the ensemble working directory where models and files will be stored.

Returns:
a path name for the default working directory

toString

public java.lang.String toString()
Output a representation of this classifier

Overrides:
toString in class java.lang.Object
Returns:
a string representation of the classifier

getTechnicalInformation

public TechnicalInformation getTechnicalInformation()
Return the technical information. There is actually another paper that describes our current method of CV for this classifier. TODO: Cite the technical report when published.

Specified by:
getTechnicalInformation in interface TechnicalInformationHandler
Returns:
the technical information about this class

getRevision

public java.lang.String getRevision()
Returns the revision string.

Specified by:
getRevision in interface RevisionHandler
Overrides:
getRevision in class Classifier
Returns:
the revision

main

public static void main(java.lang.String[] argv)
Executes the classifier from commandline.

Parameters:
argv - should contain the following arguments: -t training file [-T test file] [-c class index]