weka.classifiers.bayes
Class BayesianLogisticRegression

java.lang.Object
  extended by weka.classifiers.Classifier
      extended by weka.classifiers.bayes.BayesianLogisticRegression
All Implemented Interfaces:
java.io.Serializable, java.lang.Cloneable, CapabilitiesHandler, OptionHandler, RevisionHandler, TechnicalInformationHandler

public class BayesianLogisticRegression
extends Classifier
implements OptionHandler, TechnicalInformationHandler

Implements Bayesian Logistic Regression for both Gaussian and Laplace Priors.

For more information, see

Alexander Genkin, David D. Lewis, David Madigan (2004). Large-scale bayesian logistic regression for text categorization. URL http://www.stat.rutgers.edu/~madigan/PAPERS/shortFat-v3a.pdf.

BibTeX:

 @techreport{Genkin2004,
    author = {Alexander Genkin and David D. Lewis and David Madigan},
    institution = {DIMACS},
    title = {Large-scale bayesian logistic regression for text categorization},
    year = {2004},
    URL = {http://www.stat.rutgers.edu/\~madigan/PAPERS/shortFat-v3a.pdf}
 }
 

Version:
$Revision: 5516 $
Author:
Navendu Garg (gargnav at iit dot edu)
See Also:
Serialized Form

Field Summary
 double[] BetaVector
          Array for storing coefficients of Bayesian regression model.
 double Change
          Tracks the change in the sum of the increments (deltas) of r(i).
 int ClassIndex
          The class index from the training data
static int CV_BASED
           
 double[] Delta
          Trust Region Radius
 double[] DeltaBeta
          Array to store Regression Coefficient updates.
 double[] DeltaR
          This vector is used to store the increments on the R(i).
 double[] DeltaUpdate
          Trust Region Radius Update
static int GAUSSIAN
          Distributions available
 java.lang.String HyperparameterRange
          CV Hyperparameter Range
 double[] Hyperparameters
          Array to store Hyperparameter values for each feature.
 int HyperparameterSelection
          Hyperparameter selection method
 double HyperparameterValue
          Best hyperparameter for test phase
static double[] InputHyperparameterValues
          Set of values to be used as hyperparameter values during Cross-Validation.
 int iterationCounter
          Iteration counter
static int LAPLACIAN
           
static double[] LogLikelihood
          Log-likelihood values to be used to choose the best hyperparameter.
 Filter m_Filter
          Filter interface used to point to weka.filters.unsupervised.attribute.Normalize object
 int maxIterations
          Maximum number of iterations
static int NORM_BASED
          Methods for selecting the hyperparameter value
 boolean NormalizeData
          Choose whether to normalize data or not
 int NumFolds
          Number of folds for CV-based hyperparameter selection
 int PriorClass
          Distribution Prior class
 double[] R
          R(i)= BetaVector X x(i) X y(i).
static int SPECIFIC_VALUE
           
static Tag[] TAGS_HYPER_METHOD
           
static Tag[] TAGS_PRIOR
           
 double Threshold
          Threshold for binary classification of the probabilistic estimate
 double Tolerance
          Tolerance for the stopping criterion.
 
Constructor Summary
BayesianLogisticRegression()
           
 
Method Summary
static double bigF(double r, double sigma)
          This is a convenience function that defines an upper bound (Delta>0) for values of r(i) reachable by updates in the trust region.
 void buildClassifier(Instances data)
           (1) Set the data to the class attribute m_Instances. (2) Call the method initialize() to initialize the values.
 double classifyInstance(Instance instance)
          Classifies the given instance using the Bayesian Logistic Regression function.
static double classSgn(double value)
          This method is used to mask the internal class labels.
 double CVBasedHyperparameter()
          Computes the best hyperparameter value by cross-validating on the training data and computing the likelihood.
 java.lang.String debugTipText()
          Returns the tip text for this property
 Capabilities getCapabilities()
          This method tests what kind of data this classifier can handle.
 java.lang.String getHyperparameterRange()
          Get the range of hyperparameter values to consider during CV-based selection.
 SelectedTag getHyperparameterSelection()
          Get the method used to select the hyperparameter
 double getHyperparameterValue()
          Get the hyperparameter value.
 double getLoglikeliHood(double[] betas, Instances instances)
           
 int getMaxIterations()
          Get the maximum number of iterations to perform
 int getNumFolds()
          Return the number of folds for CV-based hyperparameter selection
 java.lang.String[] getOptions()
          Gets the current settings of the Classifier.
 SelectedTag getPriorClass()
          Get the type of prior to use.
 java.lang.String getRevision()
          Returns the revision string.
 TechnicalInformation getTechnicalInformation()
          Returns an instance of a TechnicalInformation object, containing detailed information about the technical background of this class, e.g., paper reference or book this class is based on.
 double getThreshold()
          Return the threshold being used.
 double getTolerance()
          Get the tolerance value
 java.lang.String globalInfo()
           
 java.lang.String hyperparameterRangeTipText()
          Returns the tip text for this property
 java.lang.String hyperparameterSelectionTipText()
          Returns the tip text for this property
 java.lang.String hyperparameterValueTipText()
          Returns the tip text for this property
 void initialize()
           (1) Initialize m_Beta[j] to 0.
 boolean isDebug()
          Returns true if debug is turned on.
 boolean isNormalizeData()
          Returns true if the data is to be normalized first
 java.util.Enumeration listOptions()
          Returns an enumeration describing the available options.
static double logisticLinkFunction(double r)
          This method computes the values for the logistic link function.
static void main(java.lang.String[] argv)
          Main method for testing this class.
 java.lang.String maxIterationsTipText()
          Returns the tip text for this property
 java.lang.String normalizeDataTipText()
          Returns the tip text for this property
 double normBasedHyperParameter()
          This function computes the norm-based hyperparameters and stores them in m_Hyperparameters.
 java.lang.String numFoldsTipText()
          Returns the tip text for this property
 java.lang.String priorClassTipText()
          Returns the tip text for this property
 void setDebug(boolean debugMode)
          Set debugging mode.
 void setHyperparameterRange(java.lang.String hyperparameterRange)
          Set the range of hyperparameter values to consider during CV-based selection
 void setHyperparameterSelection(SelectedTag newMethod)
          Set the method used to select the hyperparameter
 void setHyperparameterValue(double hyperparameterValue)
          Set the hyperparameter value.
 void setMaxIterations(int maxIterations)
          Set the maximum number of iterations to perform
 void setNormalizeData(boolean normalizeData)
          Set whether to normalize the data or not
 void setNumFolds(int numFolds)
          Set the number of folds to use for CV-based hyperparameter selection
 void setOptions(java.lang.String[] options)
          Parses a given list of options.
 void setPriorClass(SelectedTag newMethod)
          Set the type of prior to use.
 void setThreshold(double threshold)
          Set the threshold to use.
 void setTolerance(double tolerance)
          Set the tolerance value
static double sgn(double r)
          Sign for a given value.
 boolean stoppingCriterion()
          This method implements the stopping criterion function.
 java.lang.String thresholdTipText()
          Returns the tip text for this property
 java.lang.String toleranceTipText()
          Returns the tip text for this property
 java.lang.String toString()
          Outputs the logistic regression model as a string.
 
Methods inherited from class weka.classifiers.Classifier
distributionForInstance, forName, getDebug, makeCopies, makeCopy
 
Methods inherited from class java.lang.Object
equals, getClass, hashCode, notify, notifyAll, wait, wait, wait
 

Field Detail

LogLikelihood

public static double[] LogLikelihood
Log-likelihood values to be used to choose the best hyperparameter.


InputHyperparameterValues

public static double[] InputHyperparameterValues
Set of values to be used as hyperparameter values during Cross-Validation.


NormalizeData

public boolean NormalizeData
Choose whether to normalize data or not


Tolerance

public double Tolerance
Tolerance for the stopping criterion.


Threshold

public double Threshold
Threshold for binary classification of the probabilistic estimate


GAUSSIAN

public static final int GAUSSIAN
Distributions available

See Also:
Constant Field Values

LAPLACIAN

public static final int LAPLACIAN
See Also:
Constant Field Values

TAGS_PRIOR

public static final Tag[] TAGS_PRIOR

PriorClass

public int PriorClass
Distribution Prior class


NumFolds

public int NumFolds
Number of folds for CV-based hyperparameter selection


NORM_BASED

public static final int NORM_BASED
Methods for selecting the hyperparameter value

See Also:
Constant Field Values

CV_BASED

public static final int CV_BASED
See Also:
Constant Field Values

SPECIFIC_VALUE

public static final int SPECIFIC_VALUE
See Also:
Constant Field Values

TAGS_HYPER_METHOD

public static final Tag[] TAGS_HYPER_METHOD

HyperparameterSelection

public int HyperparameterSelection
Hyperparameter selection method


ClassIndex

public int ClassIndex
The class index from the training data


HyperparameterValue

public double HyperparameterValue
Best hyperparameter for test phase


HyperparameterRange

public java.lang.String HyperparameterRange
CV Hyperparameter Range


maxIterations

public int maxIterations
Maximum number of iterations


iterationCounter

public int iterationCounter
Iteration counter


BetaVector

public double[] BetaVector
Array for storing coefficients of Bayesian regression model.


DeltaBeta

public double[] DeltaBeta
Array to store Regression Coefficient updates.


DeltaUpdate

public double[] DeltaUpdate
Trust Region Radius Update


Delta

public double[] Delta
Trust Region Radius


Hyperparameters

public double[] Hyperparameters
Array to store Hyperparameter values for each feature.


R

public double[] R
R(i)= BetaVector X x(i) X y(i). This is an intermediate value computed from the vector BETA, the input values, and the corresponding class labels.
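
The definition of R(i) can be read as the label-signed linear score of each training instance. The following is a minimal sketch of that reading, assuming dense feature arrays and class labels already mapped to -1/+1. The class `RVectorSketch` and its methods are hypothetical illustrations and are not part of Weka.

```java
// Hypothetical helper, not part of Weka: computes r(i) = y(i) * (beta . x(i)).
class RVectorSketch {

    /** Dot product of two equal-length vectors. */
    static double dot(double[] a, double[] b) {
        double sum = 0.0;
        for (int j = 0; j < a.length; j++) {
            sum += a[j] * b[j];
        }
        return sum;
    }

    /**
     * Returns r where r[i] = y[i] * dot(beta, x[i]).
     * Assumption: y[i] is -1 or +1 and every row of x has beta.length entries.
     */
    static double[] computeR(double[] beta, double[][] x, double[] y) {
        double[] r = new double[x.length];
        for (int i = 0; i < x.length; i++) {
            r[i] = y[i] * dot(beta, x[i]);
        }
        return r;
    }
}
```

Under this reading, a positive r(i) means the current coefficients place instance i on the side of its own label.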


DeltaR

public double[] DeltaR
This vector is used to store the increments on the R(i). It is also used to evaluate the stopping criterion.


Change

public double Change
Tracks the change in the sum of the increments (deltas) of r(i).


m_Filter

public Filter m_Filter
Filter interface used to point to weka.filters.unsupervised.attribute.Normalize object

Constructor Detail

BayesianLogisticRegression

public BayesianLogisticRegression()
Method Detail

globalInfo

public java.lang.String globalInfo()

initialize

public void initialize()
                throws java.lang.Exception
 (1) Initialize m_Beta[j] to 0.
 (2) Initialize m_DeltaUpdate[j].
 

Throws:
java.lang.Exception

getCapabilities

public Capabilities getCapabilities()
This method tests what kind of data this classifier can handle.

Specified by:
getCapabilities in interface CapabilitiesHandler
Overrides:
getCapabilities in class Classifier
Returns:
the capabilities of this object
See Also:
Capabilities

buildClassifier

public void buildClassifier(Instances data)
                     throws java.lang.Exception

Specified by:
buildClassifier in class Classifier
Parameters:
data - training data
Throws:
java.lang.Exception - if classifier can't be built successfully.

classSgn

public static double classSgn(double value)
This method is used to mask the internal class labels.

Parameters:
value - internal class label
Returns:
 
  • -1 for internal class label 0
  • +1 for internal class label 1
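
The mapping above can be sketched in a few lines. This is an illustration only: the class name `ClassSgnSketch` is hypothetical, and treating every non-zero input as label 1 is an assumption, since only the inputs 0 and 1 are documented.

```java
// Hypothetical stand-alone version of the documented label mapping.
class ClassSgnSketch {

    /**
     * Maps internal class label 0 to -1 and internal class label 1 to +1.
     * Assumption: any non-zero value is treated like label 1.
     */
    static double classSgn(double value) {
        return (value == 0.0) ? -1.0 : 1.0;
    }
}
```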

getTechnicalInformation

public TechnicalInformation getTechnicalInformation()
Returns an instance of a TechnicalInformation object, containing detailed information about the technical background of this class, e.g., paper reference or book this class is based on.

Specified by:
getTechnicalInformation in interface TechnicalInformationHandler
Returns:
the technical information about this class

bigF

public static double bigF(double r,
                          double sigma)
This is a convenience function that defines an upper bound (Delta>0) for values of r(i) reachable by updates in the trust region.

Parameters:
r - BetaVector X x(i)y(i)
sigma - a parameter where sigma > 0

Returns:
double function value
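
One plausible form of this bound is sketched below. It assumes the function matches the trust-region bound described in the cited Genkin, Lewis and Madigan report: the maximum of 1/(2 + exp(t) + exp(-t)) over all t within the radius of r. This has not been checked against the Weka source, and the class name `BigFSketch` and the parameter name `delta` are hypothetical.

```java
// Hypothetical sketch of a trust-region upper bound; not taken from the Weka source.
class BigFSketch {

    /**
     * Upper bound of 1 / (2 + exp(t) + exp(-t)) for t in [r - delta, r + delta].
     * The expression peaks at t = 0 with value 0.25, so the bound is 0.25
     * whenever the interval contains 0, and is otherwise attained at the
     * interval end closest to 0.
     */
    static double bigF(double r, double delta) {
        double absR = Math.abs(r);
        if (absR <= delta) {
            return 0.25;
        }
        return 1.0 / (2.0 + Math.exp(absR - delta) + Math.exp(delta - absR));
    }
}
```

The two branches agree at |r| = delta, so the sketched bound is continuous in r.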

stoppingCriterion

public boolean stoppingCriterion()
This method implements the stopping criterion function.

Returns:
boolean whether to stop or not.
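
A convergence test of this kind could look like the sketch below. It assumes the relative-change rule from the cited report (sum of |DeltaR(i)| divided by 1 plus the sum of |R(i)|, compared with the tolerance), combined with an iteration cap. Whether the Weka method checks the iteration cap here is an assumption, and the class `StoppingCriterionSketch` is hypothetical.

```java
// Hypothetical sketch of a relative-change stopping rule; not the Weka implementation.
class StoppingCriterionSketch {

    /**
     * Returns true when the relative change in r falls to the tolerance or below,
     * or when the iteration budget is used up.
     */
    static boolean shouldStop(double[] r, double[] deltaR, double tolerance,
                              int iteration, int maxIterations) {
        double sumDeltaR = 0.0;
        double sumR = 0.0;
        for (int i = 0; i < r.length; i++) {
            sumDeltaR += Math.abs(deltaR[i]);
            sumR += Math.abs(r[i]);
        }
        // The "+ 1" keeps the ratio finite when every r(i) is zero.
        double change = sumDeltaR / (1.0 + sumR);
        return change <= tolerance || iteration >= maxIterations;
    }
}
```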

logisticLinkFunction

public static double logisticLinkFunction(double r)
This method computes the values for the logistic link function.
f(r)=exp(r)/(1+exp(r))

Returns:
output value
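
The formula f(r)=exp(r)/(1+exp(r)) overflows when evaluated literally for large positive r, because exp(r) becomes infinite. The sketch below is a numerically stable evaluation of the same function. It is an illustration, not the Weka code, and the class name `LogisticLinkSketch` is hypothetical.

```java
// Hypothetical stand-alone logistic function, written to avoid overflow.
class LogisticLinkSketch {

    /** Computes exp(r) / (1 + exp(r)) without overflowing for large |r|. */
    static double logistic(double r) {
        if (r >= 0) {
            // Equivalent form 1 / (1 + exp(-r)); exp(-r) is at most 1 here.
            return 1.0 / (1.0 + Math.exp(-r));
        }
        // For negative r, exp(r) is at most 1, so the original form is safe.
        double e = Math.exp(r);
        return e / (1.0 + e);
    }
}
```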

sgn

public static double sgn(double r)
Sign for a given value.

Parameters:
r - the value whose sign is returned
Returns:
double +1 if r>0, -1 if r<0
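
A minimal sketch of such a sign function follows. The result for r = 0 is not documented above; this sketch returns 0 in that case, which is an assumption. The class name `SgnSketch` is hypothetical.

```java
// Hypothetical stand-alone sign function.
class SgnSketch {

    /** Returns +1 for r > 0, -1 for r < 0, and 0 for r == 0 (assumed). */
    static double sgn(double r) {
        return Math.signum(r);
    }
}
```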

normBasedHyperParameter

public double normBasedHyperParameter()
This function computes the norm-based hyperparameters and stores them in m_Hyperparameters.


classifyInstance

public double classifyInstance(Instance instance)
                        throws java.lang.Exception
Classifies the given instance using the Bayesian Logistic Regression function.

Overrides:
classifyInstance in class Classifier
Parameters:
instance - the test instance
Returns:
the classification
Throws:
java.lang.Exception - if classification can't be done successfully
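
How the Threshold field might enter a classification can be sketched as follows: compute the linear score, pass it through the logistic link, and compare the resulting probability with the threshold. This sketch assumes dense inputs, no separate intercept term, and a `>=` comparison, none of which is confirmed by this page. The class `ThresholdClassifySketch` is hypothetical.

```java
// Hypothetical sketch of threshold-based binary classification; not the Weka code.
class ThresholdClassifySketch {

    /** Numerically stable logistic function exp(r) / (1 + exp(r)). */
    static double logistic(double r) {
        if (r >= 0) {
            return 1.0 / (1.0 + Math.exp(-r));
        }
        double e = Math.exp(r);
        return e / (1.0 + e);
    }

    /**
     * Returns 1.0 if the estimated probability reaches the threshold, else 0.0.
     * Assumption: beta and x have the same length.
     */
    static double classify(double[] beta, double[] x, double threshold) {
        double score = 0.0;
        for (int j = 0; j < beta.length; j++) {
            score += beta[j] * x[j];
        }
        return logistic(score) >= threshold ? 1.0 : 0.0;
    }
}
```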

toString

public java.lang.String toString()
Outputs the logistic regression model as a string.

Overrides:
toString in class java.lang.Object
Returns:
the model as string

CVBasedHyperparameter

public double CVBasedHyperparameter()
                             throws java.lang.Exception
Computes the best hyperparameter value by cross-validating on the training data and computing the likelihood. The method can parse a range of values or a list of values.

Returns:
Best hyperparameter value with the max likelihood value on the training data.
Throws:
java.lang.Exception
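
The two input forms, written in the -R option format as `R:start-end,multiplier` or `L:val(1), val(2), ..., val(n)`, could be parsed along the lines of the sketch below. It assumes the range form denotes a geometric sequence (start, start*multiplier, ...) up to and including end, and that all values are positive. Both points are assumptions, and the class `HyperparameterRangeSketch` is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical parser for the documented range/list syntax; not the Weka code.
class HyperparameterRangeSketch {

    /** Parses "R:start-end,multiplier" or "L:v1, v2, ..., vn" into candidate values. */
    static List<Double> parse(String spec) {
        List<Double> values = new ArrayList<>();
        if (spec.startsWith("R:")) {
            String[] parts = spec.substring(2).split(",");
            String[] bounds = parts[0].split("-");
            double start = Double.parseDouble(bounds[0].trim());
            double end = Double.parseDouble(bounds[1].trim());
            double multiplier = Double.parseDouble(parts[1].trim());
            if (start <= 0 || multiplier <= 1) {
                throw new IllegalArgumentException("need start > 0 and multiplier > 1");
            }
            // Assumed semantics: geometric progression from start up to end.
            for (double v = start; v <= end; v *= multiplier) {
                values.add(v);
            }
        } else if (spec.startsWith("L:")) {
            for (String token : spec.substring(2).split(",")) {
                values.add(Double.parseDouble(token.trim()));
            }
        } else {
            throw new IllegalArgumentException("expected prefix R: or L:");
        }
        return values;
    }
}
```

Under these assumptions the documented default `R:0.01-316,3.16` expands to ten candidates, each roughly half an order of magnitude above the previous one.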

getLoglikeliHood

public double getLoglikeliHood(double[] betas,
                               Instances instances)
Returns:
likelihood for a given set of betas and instances
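
For orientation, the standard log-likelihood of a logistic model with -1/+1 labels is the sum over instances of log f(y(i) * beta . x(i)), where f is the logistic link. The sketch below computes that quantity for dense arrays. Whether the Weka method adds a prior term or uses a different sign convention is not stated on this page, so this is an assumption, and the class `LogLikelihoodSketch` is hypothetical.

```java
// Hypothetical sketch of a logistic log-likelihood; not the Weka implementation.
class LogLikelihoodSketch {

    /** Computes log(exp(r) / (1 + exp(r))) in a numerically stable way. */
    static double logSigmoid(double r) {
        return r >= 0
            ? -Math.log1p(Math.exp(-r))
            : r - Math.log1p(Math.exp(r));
    }

    /**
     * Sum over instances of logSigmoid(y[i] * dot(beta, x[i])).
     * Assumption: y[i] is -1 or +1 and rows of x match beta in length.
     */
    static double logLikelihood(double[] beta, double[][] x, double[] y) {
        double total = 0.0;
        for (int i = 0; i < x.length; i++) {
            double score = 0.0;
            for (int j = 0; j < beta.length; j++) {
                score += beta[j] * x[i][j];
            }
            total += logSigmoid(y[i] * score);
        }
        return total;
    }
}
```

With all coefficients at zero, every instance contributes log(0.5), which gives a convenient sanity check.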

listOptions

public java.util.Enumeration listOptions()
Returns an enumeration describing the available options.

Specified by:
listOptions in interface OptionHandler
Overrides:
listOptions in class Classifier
Returns:
an enumeration of all the available options.

setOptions

public void setOptions(java.lang.String[] options)
                throws java.lang.Exception
Parses a given list of options.

Valid options are:

 -D
  Show Debugging Output
 
 -P <integer>
  Distribution of the Prior (1=Gaussian, 2=Laplacian)
  (default: 1=Gaussian)
 -H <integer>
  Hyperparameter Selection Method (1=Norm-based, 2=CV-based, 3=specific value)
  (default: 1=Norm-based)
 -V <double>
  Specified Hyperparameter Value (use in conjunction with -H 3)
  (default: 0.27)
 -R <string>
  Hyperparameter Range (use in conjunction with -H 2)
  (format: R:start-end,multiplier OR L:val(1), val(2), ..., val(n))
  (default: R:0.01-316,3.16)
 -Tl <double>
  Tolerance Value
  (default: 0.0005)
 -S <double>
  Threshold Value
  (default: 0.5)
 -F <integer>
  Number Of Folds (use in conjunction with -H 2)
  (default: 2)
 -I <integer>
  Max Number of Iterations
  (default: 100)
 -N
  Normalize the data

Specified by:
setOptions in interface OptionHandler
Overrides:
setOptions in class Classifier
Parameters:
options - the list of options as an array of strings
Throws:
java.lang.Exception - if an option is not supported
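
As an illustration of the option syntax listed above, the sketch below assembles an options array for a Laplacian prior with CV-based hyperparameter selection. It only builds and prints the array, so it runs without Weka on the classpath. The class `OptionsSketch` and the particular values chosen (5 folds, 200 iterations) are hypothetical.

```java
// Hypothetical example: builds an options array in the documented syntax.
class OptionsSketch {

    /** Options for a Laplacian prior with CV-based hyperparameter selection. */
    static String[] cvLaplaceOptions() {
        return new String[] {
            "-P", "2",                 // prior: 2 = Laplacian
            "-H", "2",                 // hyperparameter selection: 2 = CV-based
            "-R", "R:0.01-316,3.16",   // candidate range (the documented default)
            "-F", "5",                 // number of CV folds (example value)
            "-I", "200",               // maximum iterations (example value)
            "-N"                       // normalize the data
        };
    }

    public static void main(String[] args) {
        // Prints the options as they would appear on a command line.
        System.out.println(String.join(" ", cvLaplaceOptions()));
    }
}
```

With Weka available, an array of this shape is what setOptions(java.lang.String[] options) expects.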

getOptions

public java.lang.String[] getOptions()
Description copied from class: Classifier
Gets the current settings of the Classifier.

Specified by:
getOptions in interface OptionHandler
Overrides:
getOptions in class Classifier
Returns:
an array of strings suitable for passing to setOptions

main

public static void main(java.lang.String[] argv)
Main method for testing this class.

Parameters:
argv - the options

debugTipText

public java.lang.String debugTipText()
Returns the tip text for this property

Overrides:
debugTipText in class Classifier
Returns:
tip text for this property suitable for displaying in the explorer/experimenter gui

setDebug

public void setDebug(boolean debugMode)
Description copied from class: Classifier
Set debugging mode.

Overrides:
setDebug in class Classifier
Parameters:
debugMode - true if debug output should be printed

hyperparameterSelectionTipText

public java.lang.String hyperparameterSelectionTipText()
Returns the tip text for this property

Returns:
tip text for this property suitable for displaying in the explorer/experimenter gui

getHyperparameterSelection

public SelectedTag getHyperparameterSelection()
Get the method used to select the hyperparameter

Returns:
the method used to select the hyperparameter

setHyperparameterSelection

public void setHyperparameterSelection(SelectedTag newMethod)
Set the method used to select the hyperparameter

Parameters:
newMethod - the method used to set the hyperparameter

priorClassTipText

public java.lang.String priorClassTipText()
Returns the tip text for this property

Returns:
tip text for this property suitable for displaying in the explorer/experimenter gui

setPriorClass

public void setPriorClass(SelectedTag newMethod)
Set the type of prior to use.

Parameters:
newMethod - the type of prior to use.

getPriorClass

public SelectedTag getPriorClass()
Get the type of prior to use.

Returns:
the type of prior to use

thresholdTipText

public java.lang.String thresholdTipText()
Returns the tip text for this property

Returns:
tip text for this property suitable for displaying in the explorer/experimenter gui

getThreshold

public double getThreshold()
Return the threshold being used.

Returns:
the threshold

setThreshold

public void setThreshold(double threshold)
Set the threshold to use.

Parameters:
threshold - the threshold to use

toleranceTipText

public java.lang.String toleranceTipText()
Returns the tip text for this property

Returns:
tip text for this property suitable for displaying in the explorer/experimenter gui

getTolerance

public double getTolerance()
Get the tolerance value

Returns:
the tolerance value

setTolerance

public void setTolerance(double tolerance)
Set the tolerance value

Parameters:
tolerance - the tolerance value to use

hyperparameterValueTipText

public java.lang.String hyperparameterValueTipText()
Returns the tip text for this property

Returns:
tip text for this property suitable for displaying in the explorer/experimenter gui

getHyperparameterValue

public double getHyperparameterValue()
Get the hyperparameter value. Used when the hyperparameter selection method is set to specific value

Returns:
the hyperparameter value

setHyperparameterValue

public void setHyperparameterValue(double hyperparameterValue)
Set the hyperparameter value. Used when the hyperparameter selection method is set to specific value

Parameters:
hyperparameterValue - the value of the hyperparameter

numFoldsTipText

public java.lang.String numFoldsTipText()
Returns the tip text for this property

Returns:
tip text for this property suitable for displaying in the explorer/experimenter gui

getNumFolds

public int getNumFolds()
Return the number of folds for CV-based hyperparameter selection

Returns:
the number of CV folds

setNumFolds

public void setNumFolds(int numFolds)
Set the number of folds to use for CV-based hyperparameter selection

Parameters:
numFolds - number of folds to select

maxIterationsTipText

public java.lang.String maxIterationsTipText()
Returns the tip text for this property

Returns:
tip text for this property suitable for displaying in the explorer/experimenter gui

getMaxIterations

public int getMaxIterations()
Get the maximum number of iterations to perform

Returns:
the maximum number of iterations

setMaxIterations

public void setMaxIterations(int maxIterations)
Set the maximum number of iterations to perform

Parameters:
maxIterations - maximum number of iterations

normalizeDataTipText

public java.lang.String normalizeDataTipText()
Returns the tip text for this property

Returns:
tip text for this property suitable for displaying in the explorer/experimenter gui

isNormalizeData

public boolean isNormalizeData()
Returns true if the data is to be normalized first

Returns:
true if the data is to be normalized

setNormalizeData

public void setNormalizeData(boolean normalizeData)
Set whether to normalize the data or not

Parameters:
normalizeData - true if data is to be normalized

hyperparameterRangeTipText

public java.lang.String hyperparameterRangeTipText()
Returns the tip text for this property

Returns:
tip text for this property suitable for displaying in the explorer/experimenter gui

getHyperparameterRange

public java.lang.String getHyperparameterRange()
Get the range of hyperparameter values to consider during CV-based selection.

Returns:
the range of hyperparameters as a String

setHyperparameterRange

public void setHyperparameterRange(java.lang.String hyperparameterRange)
Set the range of hyperparameter values to consider during CV-based selection

Parameters:
hyperparameterRange - the range of hyperparameter values

isDebug

public boolean isDebug()
Returns true if debug is turned on.

Returns:
true if debug is turned on

getRevision

public java.lang.String getRevision()
Returns the revision string.

Specified by:
getRevision in interface RevisionHandler
Overrides:
getRevision in class Classifier
Returns:
the revision