weka.classifiers.trees
Class BFTree

java.lang.Object
  extended by weka.classifiers.Classifier
      extended by weka.classifiers.RandomizableClassifier
          extended by weka.classifiers.trees.BFTree
All Implemented Interfaces:
java.io.Serializable, java.lang.Cloneable, AdditionalMeasureProducer, CapabilitiesHandler, OptionHandler, Randomizable, RevisionHandler, TechnicalInformationHandler

public class BFTree
extends RandomizableClassifier
implements AdditionalMeasureProducer, TechnicalInformationHandler

Class for building a best-first decision tree classifier. This class uses binary splits for both nominal and numeric attributes. For missing values, the method of 'fractional' instances is used.
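The 'fractional instances' treatment of missing values can be sketched in plain Java. This is an illustrative sketch only, not WEKA's internal code; the class and method names below are invented for the example. An instance whose split attribute is missing is sent down both branches with weights proportional to the training weight that went each way:

```java
// Illustrative sketch of 'fractional instances' for missing values
// (not WEKA's internal implementation).
public class FractionalSplit {
    // weightLeft/weightRight: total training weight observed going
    // left/right at this node during training.
    static double[] fractions(double weightLeft, double weightRight) {
        double total = weightLeft + weightRight;
        // The instance is split proportionally between the two branches.
        return new double[]{weightLeft / total, weightRight / total};
    }

    public static void main(String[] args) {
        // 6 training instances went left and 4 went right at this node,
        // so an instance with a missing value counts 0.6 left, 0.4 right.
        double[] f = fractions(6, 4);
        System.out.println(f[0] + " " + f[1]);
    }
}
```

The fractional weights are carried down the tree and the class distributions from both branches are combined in proportion to them.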

For more information, see:

Haijian Shi (2007). Best-first decision tree learning. Hamilton, NZ.

Jerome Friedman, Trevor Hastie, Robert Tibshirani (2000). Additive logistic regression: A statistical view of boosting. Annals of Statistics. 28(2):337-407.

BibTeX:

 @mastersthesis{Shi2007,
    address = {Hamilton, NZ},
    author = {Haijian Shi},
    note = {COMP594},
    school = {University of Waikato},
    title = {Best-first decision tree learning},
    year = {2007}
 }
 
 @article{Friedman2000,
    author = {Jerome Friedman and Trevor Hastie and Robert Tibshirani},
    journal = {Annals of statistics},
    number = {2},
    pages = {337-407},
    title = {Additive logistic regression : A statistical view of boosting},
    volume = {28},
    year = {2000},
    ISSN = {0090-5364}
 }
 

Valid options are:

 -S <num>
  Random number seed.
  (default 1)
 -D
  If set, classifier is run in debug mode and
  may output additional info to the console
 -P <UNPRUNED|POSTPRUNED|PREPRUNED>
  The pruning strategy.
  (default: POSTPRUNED)
 -M <min no>
  The minimal number of instances at the terminal nodes.
  (default 2)
 -N <num folds>
  The number of folds used in the pruning.
  (default 5)
 -H
  Don't use heuristic search for nominal attributes in multi-class
  problems (default: heuristic search is used).
 -G
  Don't use the Gini index for splitting (default: Gini index is used);
  if set, information gain is used instead.
 -R
  Don't use the error rate in internal cross-validation
  (default: error rate is used); if set, the root mean squared error
  is used instead.
 -A
  Use the 1 SE rule to make pruning decision.
  (default no).
 -C
  The percentage of the training set size to use, in the range (0, 1]
  (default 1).
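The options above can also be set programmatically via setOptions or the individual setters. A minimal sketch, assuming weka.jar is on the classpath; the ARFF file path is a hypothetical placeholder:

```java
import weka.classifiers.trees.BFTree;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

public class BFTreeDemo {
    public static void main(String[] args) throws Exception {
        // "weather.arff" is a placeholder path for this example.
        Instances data = DataSource.read("weather.arff");
        data.setClassIndex(data.numAttributes() - 1);

        BFTree tree = new BFTree();
        // Equivalent to the command-line options: -P POSTPRUNED -M 2 -N 5
        tree.setOptions(new String[]{"-P", "POSTPRUNED", "-M", "2", "-N", "5"});
        tree.buildClassifier(data);

        System.out.println(tree);             // textual description of the tree
        System.out.println(tree.numLeaves()); // number of leaf nodes
    }
}
```

Note that buildClassifier must be called before toString, numNodes, or numLeaves return meaningful results.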

Version:
$Revision: 5535 $
Author:
Haijian Shi (hs69@cs.waikato.ac.nz)
See Also:
Serialized Form

Field Summary
static int PRUNING_POSTPRUNING
          pruning strategy: post-pruning
static int PRUNING_PREPRUNING
          pruning strategy: pre-pruning
static int PRUNING_UNPRUNED
          pruning strategy: un-pruned
static Tag[] TAGS_PRUNING
          pruning strategy
 
Constructor Summary
BFTree()
           
 
Method Summary
 void buildClassifier(Instances data)
          Method for building a BestFirst decision tree classifier.
 double[] distributionForInstance(Instance instance)
          Computes class probabilities for instance using the decision tree.
 java.util.Enumeration enumerateMeasures()
          Return an enumeration of the measure names.
 Capabilities getCapabilities()
          Returns default capabilities of the classifier.
 boolean getHeuristic()
          Gets whether heuristic search is used for nominal attributes in multi-class problems.
 double getMeasure(java.lang.String additionalMeasureName)
          Returns the value of the named measure
 int getMinNumObj()
          Get minimal number of instances at the terminal nodes.
 int getNumFoldsPruning()
          Gets the number of folds used in internal cross-validation.
 java.lang.String[] getOptions()
          Gets the current settings of the Classifier.
 SelectedTag getPruningStrategy()
          Gets the pruning strategy.
 java.lang.String getRevision()
          Returns the revision string.
 double getSizePer()
          Get training set size.
 TechnicalInformation getTechnicalInformation()
          Returns an instance of a TechnicalInformation object, containing detailed information about the technical background of this class, e.g., paper reference or book this class is based on.
 boolean getUseErrorRate()
          Gets whether the error rate is used in internal cross-validation.
 boolean getUseGini()
          Gets whether the Gini index is used as the splitting criterion.
 boolean getUseOneSE()
          Gets whether the 1SE rule is used to choose the final model.
 java.lang.String globalInfo()
          Returns a string describing the classifier.
 java.lang.String heuristicTipText()
          Returns the tip text for this property
 java.util.Enumeration listOptions()
          Returns an enumeration describing the available options.
static void main(java.lang.String[] args)
          Main method.
 double measureTreeSize()
          Returns the size of the tree.
 java.lang.String minNumObjTipText()
          Returns the tip text for this property
 java.lang.String numFoldsPruningTipText()
          Returns the tip text for this property
 int numLeaves()
          Compute number of leaf nodes.
 int numNodes()
          Compute size of the tree.
 java.lang.String pruningStrategyTipText()
          Returns the tip text for this property
 void setHeuristic(boolean value)
          Sets whether heuristic search is used for nominal attributes in multi-class problems.
 void setMinNumObj(int value)
          Set minimal number of instances at the terminal nodes.
 void setNumFoldsPruning(int value)
          Set number of folds in internal cross-validation.
 void setOptions(java.lang.String[] options)
          Parses the options for this object.
 void setPruningStrategy(SelectedTag value)
          Sets the pruning strategy.
 void setSizePer(double value)
          Set training set size.
 void setUseErrorRate(boolean value)
          Sets whether the error rate is used in internal cross-validation.
 void setUseGini(boolean value)
          Sets whether the Gini index is used as the splitting criterion.
 void setUseOneSE(boolean value)
          Sets whether the 1SE rule is used to choose the final model.
 java.lang.String sizePerTipText()
          Returns the tip text for this property
 java.lang.String toString()
          Returns a textual description of the decision tree.
 java.lang.String useErrorRateTipText()
          Returns the tip text for this property
 java.lang.String useGiniTipText()
          Returns the tip text for this property
 java.lang.String useOneSETipText()
          Returns the tip text for this property
 
Methods inherited from class weka.classifiers.RandomizableClassifier
getSeed, seedTipText, setSeed
 
Methods inherited from class weka.classifiers.Classifier
classifyInstance, debugTipText, forName, getDebug, makeCopies, makeCopy, setDebug
 
Methods inherited from class java.lang.Object
equals, getClass, hashCode, notify, notifyAll, wait, wait, wait
 

Field Detail

PRUNING_UNPRUNED

public static final int PRUNING_UNPRUNED
pruning strategy: un-pruned

See Also:
Constant Field Values

PRUNING_POSTPRUNING

public static final int PRUNING_POSTPRUNING
pruning strategy: post-pruning

See Also:
Constant Field Values

PRUNING_PREPRUNING

public static final int PRUNING_PREPRUNING
pruning strategy: pre-pruning

See Also:
Constant Field Values

TAGS_PRUNING

public static final Tag[] TAGS_PRUNING
pruning strategy

Constructor Detail

BFTree

public BFTree()
Method Detail

globalInfo

public java.lang.String globalInfo()
Returns a string describing the classifier.

Returns:
a description suitable for displaying in the explorer/experimenter gui

getTechnicalInformation

public TechnicalInformation getTechnicalInformation()
Returns an instance of a TechnicalInformation object, containing detailed information about the technical background of this class, e.g., paper reference or book this class is based on.

Specified by:
getTechnicalInformation in interface TechnicalInformationHandler
Returns:
the technical information about this class

getCapabilities

public Capabilities getCapabilities()
Returns default capabilities of the classifier.

Specified by:
getCapabilities in interface CapabilitiesHandler
Overrides:
getCapabilities in class Classifier
Returns:
the capabilities of this classifier
See Also:
Capabilities

buildClassifier

public void buildClassifier(Instances data)
                     throws java.lang.Exception
Method for building a BestFirst decision tree classifier.

Specified by:
buildClassifier in class Classifier
Parameters:
data - set of instances serving as training data
Throws:
java.lang.Exception - if decision tree cannot be built successfully

distributionForInstance

public double[] distributionForInstance(Instance instance)
                                 throws java.lang.Exception
Computes class probabilities for instance using the decision tree.

Overrides:
distributionForInstance in class Classifier
Parameters:
instance - the instance for which class probabilities are to be computed
Returns:
the class probabilities for the given instance
Throws:
java.lang.Exception - if something goes wrong

toString

public java.lang.String toString()
Returns a textual description of the decision tree, generated via a protected toString helper method.

Overrides:
toString in class java.lang.Object
Returns:
a textual description of the classifier

numNodes

public int numNodes()
Compute size of the tree.

Returns:
size of the tree

numLeaves

public int numLeaves()
Compute number of leaf nodes.

Returns:
number of leaf nodes

listOptions

public java.util.Enumeration listOptions()
Returns an enumeration describing the available options.

Specified by:
listOptions in interface OptionHandler
Overrides:
listOptions in class RandomizableClassifier
Returns:
an enumeration describing the available options.

setOptions

public void setOptions(java.lang.String[] options)
                throws java.lang.Exception
Parses the options for this object.

Valid options are:

 -S <num>
  Random number seed.
  (default 1)
 -D
  If set, classifier is run in debug mode and
  may output additional info to the console
 -P <UNPRUNED|POSTPRUNED|PREPRUNED>
  The pruning strategy.
  (default: POSTPRUNED)
 -M <min no>
  The minimal number of instances at the terminal nodes.
  (default 2)
 -N <num folds>
  The number of folds used in the pruning.
  (default 5)
 -H
  Don't use heuristic search for nominal attributes in multi-class
  problems (default: heuristic search is used).
 -G
  Don't use the Gini index for splitting (default: Gini index is used);
  if set, information gain is used instead.
 -R
  Don't use the error rate in internal cross-validation
  (default: error rate is used); if set, the root mean squared error
  is used instead.
 -A
  Use the 1 SE rule to make pruning decision.
  (default no).
 -C
  The percentage of the training set size to use, in the range (0, 1]
  (default 1).

Specified by:
setOptions in interface OptionHandler
Overrides:
setOptions in class RandomizableClassifier
Parameters:
options - the options to use
Throws:
java.lang.Exception - if setting of options fails

getOptions

public java.lang.String[] getOptions()
Gets the current settings of the Classifier.

Specified by:
getOptions in interface OptionHandler
Overrides:
getOptions in class RandomizableClassifier
Returns:
the current settings of the Classifier

enumerateMeasures

public java.util.Enumeration enumerateMeasures()
Return an enumeration of the measure names.

Specified by:
enumerateMeasures in interface AdditionalMeasureProducer
Returns:
an enumeration of the measure names

measureTreeSize

public double measureTreeSize()
Returns the size of the tree.

Returns:
the size of the tree

getMeasure

public double getMeasure(java.lang.String additionalMeasureName)
Returns the value of the named measure

Specified by:
getMeasure in interface AdditionalMeasureProducer
Parameters:
additionalMeasureName - the name of the measure to query for its value
Returns:
the value of the named measure
Throws:
java.lang.IllegalArgumentException - if the named measure is not supported

pruningStrategyTipText

public java.lang.String pruningStrategyTipText()
Returns the tip text for this property

Returns:
tip text for this property suitable for displaying in the explorer/experimenter gui

setPruningStrategy

public void setPruningStrategy(SelectedTag value)
Sets the pruning strategy.

Parameters:
value - the strategy

getPruningStrategy

public SelectedTag getPruningStrategy()
Gets the pruning strategy.

Returns:
the current strategy.

minNumObjTipText

public java.lang.String minNumObjTipText()
Returns the tip text for this property

Returns:
tip text for this property suitable for displaying in the explorer/experimenter gui

setMinNumObj

public void setMinNumObj(int value)
Set minimal number of instances at the terminal nodes.

Parameters:
value - minimal number of instances at the terminal nodes

getMinNumObj

public int getMinNumObj()
Get minimal number of instances at the terminal nodes.

Returns:
minimal number of instances at the terminal nodes

numFoldsPruningTipText

public java.lang.String numFoldsPruningTipText()
Returns the tip text for this property

Returns:
tip text for this property suitable for displaying in the explorer/experimenter gui

setNumFoldsPruning

public void setNumFoldsPruning(int value)
Set number of folds in internal cross-validation.

Parameters:
value - the number of folds

getNumFoldsPruning

public int getNumFoldsPruning()
Gets the number of folds used in internal cross-validation.

Returns:
number of folds in internal cross-validation

heuristicTipText

public java.lang.String heuristicTipText()
Returns the tip text for this property

Returns:
tip text for this property suitable for displaying in the explorer/experimenter gui.

setHeuristic

public void setHeuristic(boolean value)
Sets whether heuristic search is used for nominal attributes in multi-class problems.

Parameters:
value - whether to use heuristic search for nominal attributes in multi-class problems

getHeuristic

public boolean getHeuristic()
Gets whether heuristic search is used for nominal attributes in multi-class problems.

Returns:
whether heuristic search is used for nominal attributes in multi-class problems

useGiniTipText

public java.lang.String useGiniTipText()
Returns the tip text for this property

Returns:
tip text for this property suitable for displaying in the explorer/experimenter gui.

setUseGini

public void setUseGini(boolean value)
Sets whether the Gini index is used as the splitting criterion.

Parameters:
value - whether to use the Gini index as the splitting criterion

getUseGini

public boolean getUseGini()
Gets whether the Gini index is used as the splitting criterion.

Returns:
whether the Gini index is used as the splitting criterion
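The two splitting criteria toggled by the -G option can be sketched in plain Java. This is an illustrative sketch only, not WEKA's internal implementation: it computes the Gini impurity and the entropy of a node's class counts, and the weighted impurity of a candidate binary split (lower is better for both criteria):

```java
// Illustrative sketch of the Gini vs. information (entropy) criteria
// (not WEKA's internal implementation).
public class SplitCriteria {
    // Gini impurity of a node: 1 - sum(p_i^2) over class proportions p_i.
    static double gini(int[] counts) {
        int total = 0;
        for (int c : counts) total += c;
        double g = 1.0;
        for (int c : counts) {
            double p = (double) c / total;
            g -= p * p;
        }
        return g;
    }

    // Entropy of a node: -sum(p_i * log2(p_i)).
    static double entropy(int[] counts) {
        int total = 0;
        for (int c : counts) total += c;
        double e = 0.0;
        for (int c : counts) {
            if (c == 0) continue;
            double p = (double) c / total;
            e -= p * (Math.log(p) / Math.log(2));
        }
        return e;
    }

    // Weighted impurity of a binary split; the best-first expansion
    // prefers the split that minimizes this quantity.
    static double splitImpurity(int[] left, int[] right, boolean useGini) {
        int nl = 0, nr = 0;
        for (int c : left) nl += c;
        for (int c : right) nr += c;
        double il = useGini ? gini(left) : entropy(left);
        double ir = useGini ? gini(right) : entropy(right);
        return (nl * il + nr * ir) / (nl + nr);
    }

    public static void main(String[] args) {
        int[] left = {8, 2};   // class counts reaching the left branch
        int[] right = {1, 9};  // class counts reaching the right branch
        System.out.println(splitImpurity(left, right, true));   // Gini
        System.out.println(splitImpurity(left, right, false));  // entropy
    }
}
```

Both criteria rank this split as much purer than the parent node; they usually agree on the best split, differing only in borderline cases.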

useErrorRateTipText

public java.lang.String useErrorRateTipText()
Returns the tip text for this property

Returns:
tip text for this property suitable for displaying in the explorer/experimenter gui.

setUseErrorRate

public void setUseErrorRate(boolean value)
Sets whether the error rate is used in internal cross-validation.

Parameters:
value - whether to use the error rate in internal cross-validation

getUseErrorRate

public boolean getUseErrorRate()
Gets whether the error rate is used in internal cross-validation.

Returns:
whether the error rate is used in internal cross-validation

useOneSETipText

public java.lang.String useOneSETipText()
Returns the tip text for this property

Returns:
tip text for this property suitable for displaying in the explorer/experimenter gui.

setUseOneSE

public void setUseOneSE(boolean value)
Sets whether the 1SE rule is used to choose the final model.

Parameters:
value - whether to use the 1SE rule to choose the final model

getUseOneSE

public boolean getUseOneSE()
Gets whether the 1SE rule is used to choose the final model.

Returns:
whether the 1SE rule is used to choose the final model
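The 1SE rule enabled by the -A option can be sketched in plain Java. This is an illustrative sketch only, not WEKA's internal implementation: instead of picking the tree size with the minimum cross-validated error, it picks the smallest tree whose error is within one standard error of that minimum, trading a little accuracy for a simpler model:

```java
// Illustrative sketch of the 1SE pruning rule
// (not WEKA's internal implementation).
public class OneSERule {
    // errors[i] = cross-validated error of the tree after i expansions;
    // se[i] = the standard error of that estimate.
    static int chooseSize(double[] errors, double[] se) {
        // Find the tree size with the minimum CV error.
        int best = 0;
        for (int i = 1; i < errors.length; i++)
            if (errors[i] < errors[best]) best = i;
        // Accept any tree within one standard error of that minimum.
        double threshold = errors[best] + se[best];
        // Return the smallest such tree (fewest expansions).
        for (int i = 0; i < errors.length; i++)
            if (errors[i] <= threshold) return i;
        return best;
    }

    public static void main(String[] args) {
        double[] errors = {0.30, 0.22, 0.18, 0.17, 0.19};
        double[] se     = {0.02, 0.02, 0.02, 0.02, 0.02};
        // Minimum error is at index 3, but index 2 is within one SE of it,
        // so the smaller tree at index 2 is chosen.
        System.out.println(chooseSize(errors, se));
    }
}
```

Without the 1SE rule, the sketch would simply return the index of the minimum error.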

sizePerTipText

public java.lang.String sizePerTipText()
Returns the tip text for this property

Returns:
tip text for this property suitable for displaying in the explorer/experimenter gui.

setSizePer

public void setSizePer(double value)
Set training set size.

Parameters:
value - training set size

getSizePer

public double getSizePer()
Get training set size.

Returns:
training set size

getRevision

public java.lang.String getRevision()
Returns the revision string.

Specified by:
getRevision in interface RevisionHandler
Overrides:
getRevision in class Classifier
Returns:
the revision

main

public static void main(java.lang.String[] args)
Main method.

Parameters:
args - the options for the classifier