public class LogisticBase extends Classifier implements WeightedInstancesHandler
-D If set, classifier is run in debug mode and may output additional info to the console
Modifier and Type | Field | Description |
---|---|---|
protected boolean | m_errorOnProbabilities | Use error on probabilities for stopping criterion of LogitBoost? |
protected int | m_fixedNumIterations | Use fixed number of iterations for LogitBoost? (if negative, cross-validate number of iterations) |
protected int | m_heuristicStop | Use heuristic to stop performing LogitBoost iterations earlier? If enabled, LogitBoost is stopped if the current (local) minimum of the error on a test set as a function of the number of iterations has not changed for m_heuristicStop iterations. |
protected int | m_maxIterations | The maximum number of LogitBoost iterations. |
protected int | m_numClasses | The number of different classes. |
protected Instances | m_numericData | Numeric version of the training data. |
protected Instances | m_numericDataHeader | Header-only version of the numeric version of the training data. |
protected static int | m_numFoldsBoosting | Number of folds for cross-validating number of LogitBoost iterations. |
protected double | m_numParameters | Effective number of parameters used for AIC / BIC automatic stopping. |
protected int | m_numRegressions | The number of LogitBoost iterations performed. |
protected SimpleLinearRegression[][] | m_regressions | Array holding the simple regression functions fit by LogitBoost. |
protected Instances | m_train | Training data. |
protected boolean | m_useCrossValidation | Use cross-validation to determine best number of LogitBoost iterations? |
protected double | m_weightTrimBeta | Threshold for trimming weights. |
protected static double | Z_MAX | Threshold on the Z-value for LogitBoost. |
Fields inherited from class Classifier: m_Debug
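The stopping rule described for m_heuristicStop, together with the minimum search that getBestIteration performs, can be illustrated with a small stand-alone sketch. It is not Weka's implementation: the class name `HeuristicStopSketch`, both method names, and the precomputed error curve are hypothetical, and the sketch assumes that "has not changed" means "no strictly smaller error was seen".

```java
/** Hypothetical stand-alone sketch of the heuristic stop; not Weka code. */
final class HeuristicStopSketch {

    /** Index of the smallest error in errors[0..maxIteration]; the first minimum wins on ties. */
    static int bestIteration(double[] errors, int maxIteration) {
        int best = 0;
        for (int i = 1; i <= maxIteration; i++) {
            if (errors[i] < errors[best]) {
                best = i;
            }
        }
        return best;
    }

    /**
     * Walks an error curve and returns the last iteration that would be run.
     * Boosting stops once the current minimum has not improved for
     * heuristicStop consecutive iterations (assumed reading of the rule).
     * A heuristicStop of 0 or less disables the early stop.
     */
    static int stoppingIteration(double[] errors, int heuristicStop) {
        int best = 0;             // iteration holding the current minimum
        int sinceImprovement = 0; // iterations run since that minimum was found
        for (int i = 1; i < errors.length; i++) {
            if (errors[i] < errors[best]) {
                best = i;
                sinceImprovement = 0;
            } else {
                sinceImprovement++;
            }
            if (heuristicStop > 0 && sinceImprovement >= heuristicStop) {
                return i;
            }
        }
        return errors.length - 1;
    }

    public static void main(String[] args) {
        // Made-up test-set errors, one per LogitBoost iteration.
        double[] errors = {0.40, 0.31, 0.27, 0.29, 0.28, 0.30, 0.26};
        int last = stoppingIteration(errors, 3);
        System.out.println("last iteration run: " + last);
        System.out.println("best iteration:     " + bestIteration(errors, last));
    }
}
```

In this made-up curve the early stop never sees the later, lower error at index 6, which is the trade-off such a heuristic accepts in exchange for fewer iterations.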
Constructor | Description |
---|---|
LogisticBase() | Constructor that creates LogisticBase object with standard options. |
LogisticBase(int numBoostingIterations, boolean useCrossValidation, boolean errorOnProbabilities) | Constructor to create LogisticBase object. |
Modifier and Type | Method | Description |
---|---|---|
void | buildClassifier(Instances data) | Builds the logistic regression model using LogitBoost. |
void | cleanup() | Cleanup in order to save memory. |
double[] | distributionForInstance(Instance instance) | Returns class probabilities for an instance. |
protected int | getBestIteration(double[] errors, int maxIteration) | Helper function to find the minimum in an array of error values. |
protected double[][] | getCoefficients() | Returns an array holding the coefficients of the logistic model. |
protected double | getErrorRate(Instances data) | Returns the misclassification error of the current model on a set of instances. |
protected double[] | getFs(Instance instance) | Computes the F-values for a single instance. |
protected double[][] | getFs(Instances data) | Computes the F-values for a set of instances. |
int | getMaxIterations() | Returns the maxIterations parameter. |
protected double | getMeanAbsoluteError(Instances data) | Returns the error of the probability estimates for the current model on a set of instances. |
protected Instances | getNumericData(Instances data) | Converts training data to numeric version. |
int | getNumRegressions() | The number of LogitBoost iterations performed (= the number of simple regression functions fit). |
protected double[][] | getProbs(double[][] dataFs) | Computes the p-values (probabilities for the different classes) from the F-values for a set of instances. |
String | getRevision() | Returns the revision string. |
boolean | getUseAIC() | Get the value of useAIC. |
int[][] | getUsedAttributes() | Returns an array of the indices of the attributes used in the logistic model. |
double | getWeightTrimBeta() | Get the value of weightTrimBeta. |
protected double[][] | getWs(double[][] probs, double[][] dataYs) | Computes the LogitBoost weights from an array of y/p values (actual/estimated class probabilities). |
protected double[][] | getYs(Instances data) | Computes the Y-values (actual class probabilities) for a set of instances. |
protected double | getZ(double actual, double p) | Computes the LogitBoost response variable from y/p values (actual/estimated class probabilities). |
protected double[][] | getZs(double[][] probs, double[][] dataYs) | Computes the LogitBoost response for an array of y/p values (actual/estimated class probabilities). |
protected SimpleLinearRegression[][] | initRegressions() | Helper function to initialize m_regressions. |
protected double | negativeLogLikelihood(double[][] dataYs, double[][] probs) | Returns the negative log-likelihood of the Y-values (actual class probabilities) given the p-values (current probability estimates). |
double | percentAttributesUsed() | Returns the fraction of all attributes in the data that are used in the logistic model (in percent). |
protected void | performBoosting() | Runs LogitBoost using the stopping criterion on the training set. |
protected int | performBoosting(Instances train, Instances test, double[] error, int maxIterations) | Runs LogitBoost on a training set and monitors the error on a test set. |
protected void | performBoosting(int numIterations) | Runs LogitBoost with a fixed number of iterations. |
protected void | performBoostingCV() | Runs LogitBoost, determining the best number of iterations by cross-validation. |
protected void | performBoostingInfCriterion() | Runs LogitBoost, determining the best number of iterations by an information criterion (currently AIC). |
protected boolean | performIteration(int iteration, double[][] trainYs, double[][] trainFs, double[][] probs, Instances trainNumeric) | Performs a single iteration of LogitBoost, and updates the model accordingly. |
protected double[] | probs(double[] Fs) | Computes the p-values (probabilities for the classes) from the F-values of the logistic model. |
protected SimpleLinearRegression[][] | selectRegressions(SimpleLinearRegression[][] classifiers) | Helper function for cutting back m_regressions to the set of classifiers (corresponding to the number of LogitBoost iterations) that gave the smallest error. |
void | setHeuristicStop(int heuristicStop) | Sets the option "heuristicStop". |
void | setMaxIterations(int maxIterations) | Sets the parameter "maxIterations". |
void | setUseAIC(boolean c) | Set the value of useAIC. |
void | setWeightTrimBeta(double w) | Sets the option "weightTrimBeta". |
String | toString() | Returns a description of the logistic model (i.e., attributes and coefficients). |
Methods inherited from class Classifier: classifyInstance, debugTipText, forName, getCapabilities, getDebug, getOptions, listOptions, makeCopies, makeCopy, runClassifier, setDebug, setOptions
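The F-values, p-values, responses (getZ), weights (getWs) and negativeLogLikelihood listed in the method summary correspond to standard LogitBoost quantities. The following is a minimal sketch under the textbook multiclass formulation, not Weka's source: the class `LogitBoostMathSketch` is hypothetical, the clamp value `3.0` for `Z_MAX` is an assumption (this page does not state the actual value), and the weight uses the textbook form p(1 - p), which the real getWs may compute differently.

```java
/** Hypothetical stand-alone sketch of the LogitBoost quantities; not Weka code. */
final class LogitBoostMathSketch {

    /** Assumed clamp on the response; this page does not state the real Z_MAX value. */
    static final double Z_MAX = 3.0;

    /** Class probabilities from F-values via a numerically stable softmax. */
    static double[] probs(double[] fs) {
        double maxF = Double.NEGATIVE_INFINITY;
        for (double f : fs) {
            maxF = Math.max(maxF, f);
        }
        double[] p = new double[fs.length];
        double sum = 0.0;
        for (int j = 0; j < fs.length; j++) {
            p[j] = Math.exp(fs[j] - maxF); // subtracting the max avoids overflow
            sum += p[j];
        }
        for (int j = 0; j < p.length; j++) {
            p[j] /= sum;
        }
        return p;
    }

    /** Response z = (y - p) / (p (1 - p)) for y in {0, 1}, clamped to [-Z_MAX, Z_MAX]. */
    static double getZ(double actual, double p) {
        if (actual == 1.0) {
            return Math.min(1.0 / p, Z_MAX);      // (1 - p) / (p (1 - p)) = 1 / p
        }
        return Math.max(-1.0 / (1.0 - p), -Z_MAX); // (0 - p) / (p (1 - p)) = -1 / (1 - p)
    }

    /** Textbook LogitBoost weight w = p (1 - p). */
    static double getW(double p) {
        return p * (1.0 - p);
    }

    /** Minus the sum of log p over each instance's actual class (ys is one-hot). */
    static double negativeLogLikelihood(double[][] ys, double[][] probs) {
        double nll = 0.0;
        for (int i = 0; i < ys.length; i++) {
            for (int j = 0; j < ys[i].length; j++) {
                if (ys[i][j] == 1.0) {
                    nll -= Math.log(probs[i][j]);
                }
            }
        }
        return nll;
    }

    public static void main(String[] args) {
        double[] p = probs(new double[] {1.5, -0.5, 0.0});
        for (int j = 0; j < p.length; j++) {
            System.out.println("class " + j + ": p=" + p[j]
                + " z(y=1)=" + getZ(1.0, p[j]) + " w=" + getW(p[j]));
        }
    }
}
```

Clamping the response keeps a single badly predicted instance (p close to 0 for its actual class) from dominating the regression fit in the next iteration.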
protected Instances m_numericDataHeader
protected Instances m_numericData
protected Instances m_train
protected boolean m_useCrossValidation
protected boolean m_errorOnProbabilities
protected int m_fixedNumIterations
protected int m_heuristicStop
protected int m_numRegressions
protected int m_maxIterations
protected int m_numClasses
protected SimpleLinearRegression[][] m_regressions
protected static int m_numFoldsBoosting
protected static final double Z_MAX
protected double m_numParameters
protected double m_weightTrimBeta
public LogisticBase()
public LogisticBase(int numBoostingIterations, boolean useCrossValidation, boolean errorOnProbabilities)
numBoostingIterations - fixed number of iterations for LogitBoost (if negative, use cross-validation or stopping criterion on the training data).
useCrossValidation - cross-validate number of LogitBoost iterations (if false, use stopping criterion on the training data).
errorOnProbabilities - if true, use error on probabilities instead of misclassification for stopping criterion of LogitBoost
public void buildClassifier(Instances data) throws Exception
buildClassifier in class Classifier
data - the training data
Exception - if something goes wrong
protected void performBoostingCV() throws Exception
Exception - if something goes wrong
protected void performBoostingInfCriterion() throws Exception
Exception
protected int performBoosting(Instances train, Instances test, double[] error, int maxIterations) throws Exception
train - the training set
test - the test set
error - array to hold the logged error values
maxIterations - the maximum number of LogitBoost iterations to run
Exception - if something goes wrong
protected void performBoosting(int numIterations) throws Exception
numIterations - the number of iterations to run
Exception - if something goes wrong
protected void performBoosting() throws Exception
Exception - if something goes wrong
protected double getErrorRate(Instances data) throws Exception
data - the set of instances
Exception - if something goes wrong
protected double getMeanAbsoluteError(Instances data) throws Exception
data - the set of instances
Exception - if something goes wrong
protected int getBestIteration(double[] errors, int maxIteration)
errors - an array containing errors
maxIteration - the maximum number of iterations
protected boolean performIteration(int iteration, double[][] trainYs, double[][] trainFs, double[][] probs, Instances trainNumeric) throws Exception
iteration - the current iteration
trainYs - the y-values (see description of LogitBoost) for the model trained so far
trainFs - the F-values (see description of LogitBoost) for the model trained so far
probs - the p-values (see description of LogitBoost) for the model trained so far
trainNumeric - numeric version of the training data
Exception - if something goes wrong
protected SimpleLinearRegression[][] initRegressions()
protected Instances getNumericData(Instances data) throws Exception
data - the data to convert
Exception - if something goes wrong
protected SimpleLinearRegression[][] selectRegressions(SimpleLinearRegression[][] classifiers)
classifiers - the original set of classifiers
protected double getZ(double actual, double p)
actual - the actual class probability
p - the estimated class probability
protected double[][] getZs(double[][] probs, double[][] dataYs)
dataYs - the actual class probabilities
probs - the estimated class probabilities
protected double[][] getWs(double[][] probs, double[][] dataYs)
dataYs - the actual class probabilities
probs - the estimated class probabilities
protected double[] probs(double[] Fs)
Fs - the F-values
protected double[][] getYs(Instances data)
data - the data to compute the Y-values from
protected double[] getFs(Instance instance) throws Exception
instance - the instance to compute the F-values for
Exception - if something goes wrong
protected double[][] getFs(Instances data) throws Exception
data - the data to work on
Exception - if something goes wrong
protected double[][] getProbs(double[][] dataFs)
dataFs - the F-values
protected double negativeLogLikelihood(double[][] dataYs, double[][] probs)
dataYs - the Y-values
probs - the p-values
public int[][] getUsedAttributes()
public int getNumRegressions()
public double getWeightTrimBeta()
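This page describes weightTrimBeta only as a "threshold for trimming weights". In the usual LogitBoost weight-trimming scheme, the lowest-weight instances that together carry at most a fraction beta of the total weight are left out of the next regression fit. The sketch below assumes that reading; the class `WeightTrimSketch` and its methods are hypothetical and do not reproduce Weka's code.

```java
/** Hypothetical stand-alone sketch of weight trimming; not Weka code. */
final class WeightTrimSketch {

    /**
     * Returns the smallest weight that is still kept. Instances are dropped,
     * lightest first, while their combined weight stays within beta * total.
     */
    static double trimThreshold(double[] weights, double beta) {
        double[] sorted = weights.clone();
        java.util.Arrays.sort(sorted); // ascending: lightest instances first
        double total = 0.0;
        for (double w : sorted) {
            total += w;
        }
        double budget = beta * total;  // weight mass that may be discarded
        double cumulative = 0.0;
        for (double w : sorted) {
            cumulative += w;
            if (cumulative > budget) {
                return w;              // first weight that no longer fits the budget
            }
        }
        return sorted.length == 0 ? 0.0 : sorted[sorted.length - 1];
    }

    /** Marks which instances survive trimming (true = keep). */
    static boolean[] keepMask(double[] weights, double beta) {
        double threshold = trimThreshold(weights, beta);
        boolean[] keep = new boolean[weights.length];
        for (int i = 0; i < weights.length; i++) {
            keep[i] = weights[i] >= threshold;
        }
        return keep;
    }

    public static void main(String[] args) {
        // Made-up weights that sum to 1.
        double[] weights = {0.5, 0.0625, 0.25, 0.0625, 0.125};
        boolean[] keep = keepMask(weights, 0.2);
        for (int i = 0; i < keep.length; i++) {
            System.out.println("instance " + i + " w=" + weights[i] + " keep=" + keep[i]);
        }
    }
}
```

With beta set to 0 the budget is empty and every instance is kept, so trimming is effectively disabled in this sketch.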
public boolean getUseAIC()
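According to the method summary, performBoostingInfCriterion picks the number of iterations by an information criterion (currently AIC), and m_numParameters holds the effective number of parameters used for it. A minimal sketch of that selection follows, using the standard definition AIC = 2k - 2 ln L. The class `AicSketch`, its methods, and the per-iteration arrays are hypothetical; how Weka counts effective parameters is not stated on this page.

```java
/** Hypothetical stand-alone sketch of AIC-based iteration selection; not Weka code. */
final class AicSketch {

    /** AIC = 2k - 2 ln L, written in terms of the negative log-likelihood. */
    static double aic(double negativeLogLikelihood, double numParameters) {
        return 2.0 * negativeLogLikelihood + 2.0 * numParameters;
    }

    /**
     * Returns the iteration with the lowest AIC. Entry i of each array holds
     * the value after i boosting iterations; the first minimum wins on ties.
     */
    static int bestIterationByAic(double[] negativeLogLikelihoods, double[] numParameters) {
        int best = 0;
        double bestAic = aic(negativeLogLikelihoods[0], numParameters[0]);
        for (int i = 1; i < negativeLogLikelihoods.length; i++) {
            double candidate = aic(negativeLogLikelihoods[i], numParameters[i]);
            if (candidate < bestAic) {
                bestAic = candidate;
                best = i;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // Made-up values: the fit keeps improving, but each iteration adds parameters.
        double[] nll = {50.0, 40.0, 35.0, 34.0, 33.9};
        double[] params = {0.0, 2.0, 4.0, 6.0, 8.0};
        System.out.println("best iteration by AIC: " + bestIterationByAic(nll, params));
    }
}
```

The penalty term is what makes the curve turn upward: once an extra iteration lowers the negative log-likelihood by less than it adds in parameters, AIC prefers the smaller model.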
public void setMaxIterations(int maxIterations)
maxIterations - the maximum iterations
public void setHeuristicStop(int heuristicStop)
heuristicStop - the heuristic stop to use
public void setWeightTrimBeta(double w)
public void setUseAIC(boolean c)
c - Value to assign to useAIC.
public int getMaxIterations()
protected double[][] getCoefficients()
public double percentAttributesUsed()
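The relationship between getUsedAttributes (attribute indices used by the model) and percentAttributesUsed can be sketched as below. This assumes the outer array has one entry per class, that an attribute counts as used if any class uses it, and that the caller supplies the number of candidate attributes; the class `AttributeUsageSketch` is hypothetical and Weka's own counting may differ.

```java
/** Hypothetical stand-alone sketch of attribute-usage counting; not Weka code. */
final class AttributeUsageSketch {

    /**
     * Percentage of attributes that appear in at least one class's model.
     *
     * @param usedAttributes per-class arrays of attribute indices (assumed layout)
     * @param numAttributes  number of candidate attributes, excluding the class
     */
    static double percentAttributesUsed(int[][] usedAttributes, int numAttributes) {
        boolean[] used = new boolean[numAttributes];
        for (int[] perClass : usedAttributes) {
            for (int index : perClass) {
                used[index] = true; // used by any class counts once
            }
        }
        int count = 0;
        for (boolean u : used) {
            if (u) {
                count++;
            }
        }
        return 100.0 * count / numAttributes;
    }

    public static void main(String[] args) {
        // Two classes; attribute 2 is shared, so three distinct attributes are used.
        int[][] usedAttributes = {{0, 2}, {2, 3}};
        System.out.println(percentAttributesUsed(usedAttributes, 8) + "% of attributes used");
    }
}
```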
public String toString()
public double[] distributionForInstance(Instance instance) throws Exception
distributionForInstance in class Classifier
instance - the instance to compute the distribution for
Exception - if distribution can't be computed successfully
public void cleanup()
public String getRevision()
getRevision in interface RevisionHandler
getRevision in class Classifier
Copyright © 2015 University of Waikato, Hamilton, NZ. All rights reserved.