public class RandomProjection extends Filter implements UnsupervisedFilter, OptionHandler, TechnicalInformationHandler
@inproceedings{Fradkin2003,
   address = {New York, NY, USA},
   author = {Dmitriy Fradkin and David Madigan},
   booktitle = {KDD '03: Proceedings of the ninth ACM SIGKDD international conference on Knowledge discovery and data mining},
   pages = {517-522},
   publisher = {ACM Press},
   title = {Experiments with random projections for machine learning},
   year = {2003}
}
Valid options are:
-N <number> The number of dimensions (attributes) the data should be reduced to (default 10; exclusive of the class attribute, if it is set).
-D [SPARSE1|SPARSE2|GAUSSIAN] The distribution to use for calculating the random matrix. Sparse1 is: sqrt(3)*{-1 with prob(1/6), 0 with prob(2/3), +1 with prob(1/6)} Sparse2 is: {-1 with prob(1/2), +1 with prob(1/2)}
-P <percent> The percentage of dimensions (attributes) the data should be reduced to (exclusive of the class attribute, if it is set). The -N option is ignored if this option is present or is greater than zero.
-M Replace missing values using the ReplaceMissingValues filter
-R <num> The random seed for the random number generator used for calculating the random matrix (default 42).
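The distributions listed under -D can be illustrated with a small standalone sketch. It is not Weka's code: the class `RandomMatrixSketch`, its methods, and the `Dist` enum are hypothetical names, and the Gaussian case is assumed to mean a standard normal draw, which the option text does not state. Sparse1 is sampled by picking among {-1, 0, +1} with integer weights 1:4:1 (that is, probabilities 1/6, 2/3, 1/6) and scaling by sqrt(3).

```java
import java.util.Arrays;
import java.util.Random;

/** Hypothetical helper, not part of Weka: draws entries for a random projection matrix. */
final class RandomMatrixSketch {
    enum Dist { SPARSE1, SPARSE2, GAUSSIAN }

    /** Picks an index with probability proportional to its (positive) integer weight. */
    static int weightedIndex(int[] weights, Random rnd) {
        int total = 0;
        for (int w : weights) total += w;
        int ticket = rnd.nextInt(total);          // uniform in [0, total)
        for (int i = 0; i < weights.length; i++) {
            ticket -= weights[i];
            if (ticket < 0) return i;
        }
        throw new IllegalStateException("weights must be positive");
    }

    /** Draws one matrix entry from the chosen distribution. */
    static double sample(Dist dist, Random rnd) {
        switch (dist) {
            case SPARSE1: {
                // sqrt(3) * {-1 w.p. 1/6, 0 w.p. 2/3, +1 w.p. 1/6}
                double[] vals = {-1.0, 0.0, 1.0};
                return Math.sqrt(3) * vals[weightedIndex(new int[] {1, 4, 1}, rnd)];
            }
            case SPARSE2:
                // {-1 w.p. 1/2, +1 w.p. 1/2}
                return rnd.nextBoolean() ? 1.0 : -1.0;
            default:
                // Assumption: GAUSSIAN means a standard normal N(0, 1) draw.
                return rnd.nextGaussian();
        }
    }

    /** Builds a k x d matrix; the same seed always yields the same matrix. */
    static double[][] matrix(int k, int d, Dist dist, long seed) {
        Random rnd = new Random(seed);
        double[][] m = new double[k][d];
        for (double[] row : m) {
            for (int j = 0; j < d; j++) row[j] = sample(dist, rnd);
        }
        return m;
    }

    public static void main(String[] args) {
        // 42 mirrors the documented default of the -R option.
        System.out.println(Arrays.deepToString(matrix(2, 5, Dist.SPARSE1, 42L)));
    }
}
```

Seeding the generator once per matrix is what makes a fixed -R value reproducible in this sketch; whether Weka seeds in exactly this way is not stated on this page.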
Modifier and Type | Field and Description |
---|---|
static int | GAUSSIAN - Distribution type: Gaussian |
protected int | m_distribution - Stores the distribution to use for calculating the random matrix |
protected int | m_k - Stores the number of dimensions to reduce the data to |
protected Filter | m_ntob - The NominalToBinary filter applied to the data before this filter |
protected boolean | m_OutputFormatDefined - Keeps track of whether the output format has been defined |
protected double | m_percent - Stores the dimensionality the data should be reduced to, as a percentage of the original dimensionality |
protected Random | m_random - The random number generator used for generating the random matrix |
protected Filter | m_replaceMissing - The ReplaceMissingValues filter |
protected double[][] | m_rmatrix - The random matrix |
protected long | m_rndmSeed - Stores the random seed used to generate the random matrix |
protected boolean | m_useGaussian - Whether the random matrix will be computed using a Gaussian distribution |
protected boolean | m_useReplaceMissing - Whether missing values should be replaced using the unsupervised.ReplaceMissingValues filter |
static int | SPARSE1 - Distribution type: sparse 1 |
static int | SPARSE2 - Distribution type: sparse 2 |
static Tag[] | TAGS_DSTRS_TYPE - The types of distributions that can be used for calculating the random matrix |
Fields inherited from class Filter: m_FirstBatchDone, m_InputRelAtts, m_InputStringAtts, m_NewBatch, m_OutputRelAtts, m_OutputStringAtts
Constructor and Description |
---|
RandomProjection() |
Modifier and Type | Method and Description |
---|---|
boolean | batchFinished() - Signifies that this batch of input to the filter is finished. |
protected double | computeRandomProjection(int rpIndex, int classIndex, Instance instance) - Computes one random projection for a given instance (skipping missing values). |
protected Instance | convertInstance(Instance currentInstance) - Converts a single instance to the required format. |
String | distributionTipText() - Returns the tip text for this property. |
Capabilities | getCapabilities() - Returns the Capabilities of this filter. |
SelectedTag | getDistribution() - Returns the distribution currently used for calculating the random matrix. |
int | getNumberOfAttributes() - Gets the number of attributes (dimensionality) to which the data will be reduced. |
String[] | getOptions() - Gets the current settings of the filter. |
double | getPercent() - Gets the percentage of attributes (dimensions) the data will be reduced to. |
long | getRandomSeed() - Gets the random seed of the random number generator. |
boolean | getReplaceMissingValues() - Gets the current setting for using the ReplaceMissingValues filter. |
String | getRevision() - Returns the revision string. |
TechnicalInformation | getTechnicalInformation() - Returns an instance of a TechnicalInformation object, containing detailed information about the technical background of this class, e.g., paper reference or book this class is based on. |
String | globalInfo() - Returns a string describing this filter. |
boolean | input(Instance instance) - Inputs an instance for filtering. |
Enumeration | listOptions() - Returns an enumeration describing the available options. |
static void | main(String[] argv) - Main method for testing this class. |
String | numberOfAttributesTipText() - Returns the tip text for this property. |
String | percentTipText() - Returns the tip text for this property. |
String | randomSeedTipText() - Returns the tip text for this property. |
String | replaceMissingValuesTipText() - Returns the tip text for this property. |
protected double | rndmNum(boolean useDstrWithZero) - Returns a double x such that x = sqrt(3) * {-1 with prob. 1/6, 0 with prob. 2/3, +1 with prob. 1/6}. |
void | setDistribution(SelectedTag newDstr) - Sets the distribution to use for calculating the random matrix. |
boolean | setInputFormat(Instances instanceInfo) - Sets the format of the input instances. |
void | setNumberOfAttributes(int newAttNum) - Sets the number of attributes (dimensions) the data should be reduced to. |
void | setOptions(String[] options) - Parses a given list of options. |
protected void | setOutputFormat() - Sets the output format. |
void | setPercent(double newPercent) - Sets the percentage of attributes (dimensions) the data should be reduced to. |
void | setRandomSeed(long seed) - Sets the random seed of the random number generator. |
void | setReplaceMissingValues(boolean t) - Sets whether to use the ReplaceMissingValues filter. |
protected int | weightedDistribution(int[] weights) - Calculates a weighted distribution. |
Methods inherited from class Filter: batchFilterFile, bufferInput, copyValues, copyValues, filterFile, flushInput, getCapabilities, getInputFormat, getOutputFormat, initInputLocators, initOutputLocators, inputFormatPeek, isFirstBatchDone, isNewBatch, isOutputFormatDefined, makeCopies, makeCopy, numPendingOutput, output, outputFormatPeek, outputPeek, push, resetQueue, runFilter, setOutputFormat, testInputFormat, toString, useFilter, wekaStaticWrapper
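The summaries of computeRandomProjection and convertInstance describe projecting one instance while skipping missing values and leaving the class attribute out of the projection. A minimal sketch of that idea follows. It is not Weka's implementation: `ProjectionSketch` and its methods are hypothetical, instances are modelled as plain `double[]` arrays with `NaN` standing in for a missing value, each matrix row is assumed to have one column per input attribute (the class column is simply ignored), and the class value is assumed to be appended as the last output attribute.

```java
import java.util.Arrays;

/** Hypothetical sketch, not Weka's implementation. */
final class ProjectionSketch {

    /**
     * One projected value: the dot product of a matrix row with the instance,
     * skipping the class attribute and any missing (NaN) values.
     */
    static double projectOne(double[] row, double[] values, int classIndex) {
        double sum = 0.0;
        for (int i = 0; i < values.length; i++) {
            if (i == classIndex || Double.isNaN(values[i])) continue;
            sum += row[i] * values[i];
        }
        return sum;
    }

    /**
     * Projects an instance to rmatrix.length dimensions. If classIndex >= 0,
     * the class value is copied through unchanged as the last output value
     * (an assumption of this sketch).
     */
    static double[] project(double[][] rmatrix, double[] values, int classIndex) {
        int k = rmatrix.length;
        double[] out = new double[classIndex >= 0 ? k + 1 : k];
        for (int r = 0; r < k; r++) {
            out[r] = projectOne(rmatrix[r], values, classIndex);
        }
        if (classIndex >= 0) out[k] = values[classIndex];
        return out;
    }

    public static void main(String[] args) {
        double[][] rmatrix = {{1, -1, 0, 1}, {0, 1, 1, -1}};
        double[] instance = {2.0, Double.NaN, 3.0, 1.0};  // second value is missing
        System.out.println(Arrays.toString(project(rmatrix, instance, 3)));
    }
}
```

Skipping a missing value is equivalent to treating it as zero in the dot product; enabling the -M option instead replaces missing values before projection.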
protected int m_k
protected double m_percent
protected boolean m_useGaussian
public static final int SPARSE1
public static final int SPARSE2
public static final int GAUSSIAN
public static final Tag[] TAGS_DSTRS_TYPE
protected int m_distribution
protected boolean m_useReplaceMissing
protected boolean m_OutputFormatDefined
protected Filter m_ntob
protected Filter m_replaceMissing
protected long m_rndmSeed
protected double[][] m_rmatrix
protected Random m_random
public Enumeration listOptions()
Specified by:
listOptions in interface OptionHandler
public void setOptions(String[] options) throws Exception
-N <number> The number of dimensions (attributes) the data should be reduced to (default 10; exclusive of the class attribute, if it is set).
-D [SPARSE1|SPARSE2|GAUSSIAN] The distribution to use for calculating the random matrix. Sparse1 is: sqrt(3)*{-1 with prob(1/6), 0 with prob(2/3), +1 with prob(1/6)} Sparse2 is: {-1 with prob(1/2), +1 with prob(1/2)}
-P <percent> The percentage of dimensions (attributes) the data should be reduced to (exclusive of the class attribute, if it is set). The -N option is ignored if this option is present or is greater than zero.
-M Replace missing values using the ReplaceMissingValues filter
-R <num> The random seed for the random number generator used for calculating the random matrix (default 42).
Specified by:
setOptions in interface OptionHandler
Parameters:
options - the list of options as an array of strings
Throws:
Exception - if an option is not supported
public String[] getOptions()
Specified by:
getOptions in interface OptionHandler
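The interaction between -N and -P described under setOptions can be sketched as a small function. This is an illustration under stated assumptions, not the filter's actual logic: `TargetDimsSketch.targetDims` is a hypothetical helper, and truncating the fractional result toward zero is an assumption, since this page does not say how a percentage is rounded.

```java
/** Hypothetical helper, not part of Weka. */
final class TargetDimsSketch {

    /**
     * Resolves the output dimensionality, excluding the class attribute.
     *
     * @param numInputAttrs total number of input attributes, class included
     * @param hasClass      whether a class attribute is set
     * @param n             value of -N (absolute number of dimensions)
     * @param percent       value of -P; values <= 0 mean "not set"
     */
    static int targetDims(int numInputAttrs, boolean hasClass, int n, double percent) {
        int candidates = hasClass ? numInputAttrs - 1 : numInputAttrs;
        if (percent > 0) {
            // -P takes precedence over -N; truncation is an assumption here.
            return (int) (candidates * percent / 100.0);
        }
        return n;
    }

    public static void main(String[] args) {
        System.out.println(targetDims(101, true, 10, 25.0)); // -P wins over -N
        System.out.println(targetDims(101, true, 10, 0.0));  // falls back to -N
    }
}
```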
public String globalInfo()
public TechnicalInformation getTechnicalInformation()
Specified by:
getTechnicalInformation in interface TechnicalInformationHandler
public String numberOfAttributesTipText()
public void setNumberOfAttributes(int newAttNum)
Parameters:
newAttNum - the target number of dimensions
public int getNumberOfAttributes()
public String percentTipText()
public void setPercent(double newPercent)
Parameters:
newPercent - the percentage of attributes
public double getPercent()
public String randomSeedTipText()
public void setRandomSeed(long seed)
Parameters:
seed - the random seed value
public long getRandomSeed()
public String distributionTipText()
public void setDistribution(SelectedTag newDstr)
Parameters:
newDstr - the distribution to use
public SelectedTag getDistribution()
public String replaceMissingValuesTipText()
public void setReplaceMissingValues(boolean t)
Parameters:
t - if true, the ReplaceMissingValues filter is used
public boolean getReplaceMissingValues()
public Capabilities getCapabilities()
Specified by:
getCapabilities in interface CapabilitiesHandler
Overrides:
getCapabilities in class Filter
Capabilities
public boolean setInputFormat(Instances instanceInfo) throws Exception
Overrides:
setInputFormat in class Filter
Parameters:
instanceInfo - an Instances object containing the input instance structure (any instances contained in the object are ignored; only the structure is required)
Throws:
Exception - if the input format can't be set successfully
public boolean input(Instance instance) throws Exception
Overrides:
input in class Filter
Parameters:
instance - the input instance
Throws:
IllegalStateException - if no input format has been set
NullPointerException - if the input format has not been defined
Exception - if the input instance was not of the correct format or if there was a problem with the filtering
public boolean batchFinished() throws Exception
Overrides:
batchFinished in class Filter
Throws:
NullPointerException - if no input structure has been defined
Exception - if there was a problem finishing the batch
protected void setOutputFormat()
protected Instance convertInstance(Instance currentInstance)
Parameters:
currentInstance - the instance to convert
protected double computeRandomProjection(int rpIndex, int classIndex, Instance instance)
Parameters:
rpIndex - offset of the new random projection attribute
classIndex - class index of the input instance
instance - the instance to convert
protected double rndmNum(boolean useDstrWithZero)
Parameters:
useDstrWithZero -
protected int weightedDistribution(int[] weights)
Parameters:
weights - the weights to use
public String getRevision()
Specified by:
getRevision in interface RevisionHandler
Overrides:
getRevision in class Filter
public static void main(String[] argv)
Parameters:
argv - should contain arguments to the filter: use -h for help
Copyright © 2015 University of Waikato, Hamilton, NZ. All rights reserved.