public class SubspaceCluster extends ClusterGenerator
-h Prints this help.
-o <file> The name of the output file, otherwise the generated data is printed to stdout.
-r <name> The name of the relation.
-d Whether to print debug information.
-S The seed for the random function (default 1).
-a <num> The number of attributes (default 1).
-c Class flag; if set, the cluster is listed in an extra attribute.
-b <range> The indices for boolean attributes.
-m <range> The indices for nominal attributes.
-P <num> The noise rate in percent (default 0.0). Can be between 0% and 30%. (Remark: The original algorithm only allows noise up to 10%.)
-C <cluster-definition> A cluster definition of class 'SubspaceClusterDefinition' (definition needs to be quoted to be recognized as a single argument).
Options specific to weka.datagenerators.clusterers.SubspaceClusterDefinition:
-A <range> Generates randomly distributed instances in the cluster.
-U <range> Generates uniformly distributed instances in the cluster.
-G <range> Generates gaussian distributed instances in the cluster.
-D <num>,<num> The attribute min/max (-A and -U) or mean/stddev (-G) for the cluster.
-N <num>..<num> The range of number of instances per cluster (default 1..50).
-I Uses integer instead of continuous values (default continuous).
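The value formats of `-N` (`<num>..<num>`) and `-D` (`<num>,<num>`) listed above can be illustrated with a small parser. This is a minimal sketch using only the JDK; the class `ClusterOptionValues` and its methods are hypothetical helpers, not part of Weka, and the real option handling may accept more forms than shown here.

```java
import java.util.Arrays;

/** Hypothetical helper (not a Weka class): parses the value formats documented for -N and -D. */
final class ClusterOptionValues {

    /** Parses "<num>..<num>" (the -N format) into {min, max}. */
    static int[] parseInstanceRange(String value) {
        String[] parts = value.trim().split("\\.\\.");
        if (parts.length != 2) {
            throw new IllegalArgumentException("expected <num>..<num>, got: " + value);
        }
        int min = Integer.parseInt(parts[0].trim());
        int max = Integer.parseInt(parts[1].trim());
        if (min > max) {
            throw new IllegalArgumentException("min must not exceed max: " + value);
        }
        return new int[] {min, max};
    }

    /** Parses "<num>,<num>" (the -D format) into two doubles: min/max or mean/stddev. */
    static double[] parsePair(String value) {
        String[] parts = value.trim().split(",");
        if (parts.length != 2) {
            throw new IllegalArgumentException("expected <num>,<num>, got: " + value);
        }
        return new double[] {
            Double.parseDouble(parts[0].trim()),
            Double.parseDouble(parts[1].trim())
        };
    }

    public static void main(String[] args) {
        // "1..50" is the documented default for -N.
        System.out.println(Arrays.toString(parseInstanceRange("1..50")));
        // Illustrative -D value; read as min/max for -A and -U, mean/stddev for -G.
        System.out.println(Arrays.toString(parsePair("0.5,2.0")));
    }
}
```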
Modifier and Type | Field and Description |
---|---|
static int | CONTINUOUS - cluster subtype: continuous |
static int | GAUSSIAN - cluster type: gaussian |
static int | INTEGER - cluster subtype: integer |
protected ClusterDefinition[] | m_Clusters - cluster list |
protected double[] | m_globalMaxValue - stores the global max values |
protected double[] | m_globalMinValue - stores the global min values |
protected double | m_NoiseRate - noise rate in percent (option P, between 0 and 30) |
protected int[] | m_numValues - if nominal, stores the number of values |
static Tag[] | TAGS_CLUSTERSUBTYPE - the tags for the cluster subtypes |
static Tag[] | TAGS_CLUSTERTYPE - the tags for the cluster types |
static int | TOTAL_UNIFORM - cluster type: total uniform |
static int | UNIFORM_RANDOM - cluster type: uniform/random |
Fields inherited from class ClusterGenerator: m_booleanCols, m_ClassFlag, m_nominalCols, m_NumAttributes
Fields inherited from class DataGenerator: m_CreatingRelationName, m_DatasetFormat, m_Debug, m_DefaultOutput, m_NumExamplesAct, m_OptionBlacklist, m_Output, m_Random, m_RelationName, m_Seed
Constructor and Description |
---|
SubspaceCluster() - Initializes the generator and sets the number of clusters to 0, since the user has to specify them explicitly. |
Modifier and Type | Method and Description |
---|---|
protected boolean | checkCoverage() - Checks whether all attributes are covered by cluster definitions and returns true in that case. |
String | clusterDefinitionsTipText() - Returns the tip text for this property. |
protected double | defaultNoiseRate() - Returns the default noise rate. |
protected int | defaultNumAttributes() - Returns the default number of attributes. |
Instances | defineDataFormat() - Initializes the format for the dataset produced. |
Instance | generateExample() - Generates an example of the dataset. |
Instances | generateExamples() - Generates all examples of the dataset. |
String | generateFinished() - Compiles documentation about the data generation after the generation process. |
String | generateStart() - Compiles documentation about the data generation before the generation process. |
ClusterDefinition[] | getClusterDefinitions() - Returns the currently set clusters. |
protected ClusterDefinition[] | getClusters() - Returns the current cluster definitions, initializing them if necessary. |
double | getNoiseRate() - Gets the percentage of noise set. |
int[] | getNumValues() - Returns the array that stores the number of values for a nominal attribute. |
String[] | getOptions() - Gets the current settings of the data generator. |
String | getRevision() - Returns the revision string. |
boolean | getSingleModeFlag() - Gets the single mode flag. |
String | globalInfo() - Returns a string describing this data generator. |
boolean | isBoolean(int index) - Returns true if the attribute is boolean. |
boolean | isNominal(int index) - Returns true if the attribute is nominal. |
Enumeration | listOptions() - Returns an enumeration describing the available options. |
static void | main(String[] args) - Main method for testing this class. |
String | noiseRateTipText() - Returns the tip text for this property. |
String | numAttributesTipText() - Returns the tip text for this property. |
void | setClusterDefinitions(ClusterDefinition[] value) - Sets the clusters to use. |
void | setNoiseRate(double newNoiseRate) - Sets the percentage of noise. |
void | setNumAttributes(int numAttributes) - Sets the number of attributes the dataset should have. |
void | setOptions(String[] options) - Parses a list of options for this object. |
Methods inherited from class ClusterGenerator: booleanColsTipText, checkIndices, classFlagTipText, getBooleanCols, getClassFlag, getNominalCols, getNumAttributes, nominalColsTipText, setBooleanCols, setBooleanIndices, setClassFlag, setNominalCols, setNominalIndices
Methods inherited from class DataGenerator: addToBlacklist, clearBlacklist, debugTipText, defaultNumExamplesAct, defaultOutput, defaultRelationName, defaultSeed, enumToVector, formatTipText, getDatasetFormat, getDebug, getNumExamplesAct, getOutput, getRandom, getRelationName, getRelationNameToUse, getSeed, isOnBlacklist, makeData, makeOptionString, numExamplesActTipText, outputTipText, randomTipText, relationNameTipText, removeBlacklist, runDataGenerator, seedTipText, setDatasetFormat, setDebug, setNumExamplesAct, setOutput, setRandom, setRelationName, setSeed, toStringFormat
protected double m_NoiseRate
protected ClusterDefinition[] m_Clusters
protected int[] m_numValues
protected double[] m_globalMinValue
protected double[] m_globalMaxValue
public static final int UNIFORM_RANDOM
public static final int TOTAL_UNIFORM
public static final int GAUSSIAN
public static final Tag[] TAGS_CLUSTERTYPE
public static final int CONTINUOUS
public static final int INTEGER
public static final Tag[] TAGS_CLUSTERSUBTYPE
public SubspaceCluster()
public String globalInfo()
public Enumeration listOptions()
listOptions in interface OptionHandler
listOptions in class ClusterGenerator
public void setOptions(String[] options) throws Exception
setOptions in interface OptionHandler
setOptions in class ClusterGenerator
Parameters: options - the list of options as an array of strings
Throws: Exception - if an option is not supported
public String[] getOptions()
getOptions in interface OptionHandler
getOptions in class ClusterGenerator
See Also: DataGenerator.removeBlacklist(String[])
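Because options are exchanged as a string array, a cluster definition passed with `-C` has to remain a single array element, which is why it needs quoting on a command line. The following is a minimal sketch of that idea using only the JDK; `OptionStringSketch` is a hypothetical helper, not Weka's own option joining, and the cluster definition shown is an illustrative value only.

```java
/** Hypothetical helper (not a Weka class): renders an option array as one command-line string. */
final class OptionStringSketch {

    /** Joins options with spaces, double-quoting any element that is empty or contains whitespace. */
    static String join(String[] options) {
        StringBuilder sb = new StringBuilder();
        for (String opt : options) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            boolean needsQuotes = opt.isEmpty() || opt.chars().anyMatch(Character::isWhitespace);
            if (needsQuotes) {
                // Escape backslashes and embedded quotes before wrapping in quotes.
                String escaped = opt.replace("\\", "\\\\").replace("\"", "\\\"");
                sb.append('"').append(escaped).append('"');
            } else {
                sb.append(opt);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // The whole cluster definition is ONE array element (illustrative values).
        String[] options = {
            "-a", "2", "-c", "-P", "5.0",
            "-C", "-G 1 -D 0.0,1.0 -N 10..20"
        };
        System.out.println(join(options));
        // prints: -a 2 -c -P 5.0 -C "-G 1 -D 0.0,1.0 -N 10..20"
    }
}
```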
protected ClusterDefinition[] getClusters()
protected int defaultNumAttributes()
defaultNumAttributes in class ClusterGenerator
public void setNumAttributes(int numAttributes)
setNumAttributes in class ClusterGenerator
Parameters: numAttributes - the new number of attributes
public String numAttributesTipText()
numAttributesTipText in class ClusterGenerator
protected double defaultNoiseRate()
public double getNoiseRate()
public void setNoiseRate(double newNoiseRate)
Parameters: newNoiseRate - new percentage of noise
public String noiseRateTipText()
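The noise rate is documented as a percentage between 0 and 30. A caller may want to validate a value before handing it to `setNoiseRate`. The sketch below assumes an out-of-range value should be rejected with an exception; this page does not say whether the class itself throws, clamps, or ignores such values, and `NoiseRateSketch` is a hypothetical helper.

```java
/** Hypothetical helper (not a Weka class): validates a noise rate against the documented 0..30 range. */
final class NoiseRateSketch {

    static final double MIN_PERCENT = 0.0;
    static final double MAX_PERCENT = 30.0;

    /** Returns the value unchanged if it lies in [0, 30]; otherwise throws (an assumption of this sketch). */
    static double requireValidNoiseRate(double percent) {
        if (Double.isNaN(percent) || percent < MIN_PERCENT || percent > MAX_PERCENT) {
            throw new IllegalArgumentException(
                "noise rate must be between 0 and 30 percent, got: " + percent);
        }
        return percent;
    }

    public static void main(String[] args) {
        System.out.println(requireValidNoiseRate(10.0)); // within the documented range
    }
}
```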
public ClusterDefinition[] getClusterDefinitions()
public void setClusterDefinitions(ClusterDefinition[] value) throws Exception
Parameters: value - the clusters to use
Throws: Exception - if clusters are not the correct class
public String clusterDefinitionsTipText()
protected boolean checkCoverage()
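The coverage check is described as returning true when all attributes are covered by cluster definitions. A minimal sketch of such a check follows, assuming each cluster is represented simply as an array of 0-based attribute indices; that representation and the class `CoverageSketch` are hypothetical and do not reflect how `ClusterDefinition` stores its attributes.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper (not a Weka class): checks that every attribute belongs to some cluster. */
final class CoverageSketch {

    /**
     * Returns true if every index in [0, numAttributes) appears in at least one
     * cluster's attribute array. Out-of-range indices are ignored.
     */
    static boolean allAttributesCovered(int numAttributes, List<int[]> clusterAttributes) {
        boolean[] covered = new boolean[numAttributes];
        for (int[] attrs : clusterAttributes) {
            for (int a : attrs) {
                if (a >= 0 && a < numAttributes) {
                    covered[a] = true;
                }
            }
        }
        for (boolean c : covered) {
            if (!c) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        List<int[]> clusters = Arrays.asList(new int[] {0, 1}, new int[] {2});
        System.out.println(allAttributesCovered(3, clusters)); // attributes 0, 1, 2 are all covered
        System.out.println(allAttributesCovered(4, clusters)); // attribute 3 is not covered
    }
}
```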
public boolean getSingleModeFlag()
getSingleModeFlag in class DataGenerator
public Instances defineDataFormat() throws Exception
defineDataFormat in class DataGenerator
Throws: Exception - data format could not be defined
See Also: DataGenerator.defaultRelationName()
public boolean isBoolean(int index)
Parameters: index - of the attribute
public boolean isNominal(int index)
Parameters: index - of the attribute
public int[] getNumValues()
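The `-b` and `-m` options take index ranges, and `isBoolean` and `isNominal` answer membership questions about a single index. The sketch below shows one way such a lookup could work, assuming a simplified 1-based range syntax of comma-separated numbers and `a-b` spans, and assuming 0-based indices for the lookup. Both are assumptions of this sketch; Weka's own range handling may accept more forms, and `IndexRangeSketch` is a hypothetical helper.

```java
import java.util.BitSet;

/** Hypothetical helper (not a Weka class): membership test for a simplified index range. */
final class IndexRangeSketch {

    /** Parses a 1-based range such as "1,3-5" into a BitSet of 0-based indices. */
    static BitSet parse(String range, int numAttributes) {
        BitSet selected = new BitSet(numAttributes);
        for (String token : range.split(",")) {
            token = token.trim();
            if (token.isEmpty()) {
                continue;
            }
            int dash = token.indexOf('-');
            int first = Integer.parseInt(dash < 0 ? token : token.substring(0, dash).trim());
            int last = dash < 0 ? first : Integer.parseInt(token.substring(dash + 1).trim());
            if (first < 1 || last > numAttributes || first > last) {
                throw new IllegalArgumentException("invalid range element: " + token);
            }
            // BitSet.set(from, to) is inclusive of 'from' and exclusive of 'to'.
            selected.set(first - 1, last);
        }
        return selected;
    }

    public static void main(String[] args) {
        BitSet booleanCols = parse("1,3-5", 6);
        System.out.println(booleanCols.get(0)); // 1-based attribute 1 is selected
        System.out.println(booleanCols.get(1)); // 1-based attribute 2 is not selected
    }
}
```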
public Instance generateExample() throws Exception
generateExample in class DataGenerator
Throws: Exception - if format not defined or generating
public Instances generateExamples() throws Exception
generateExamples in class DataGenerator
Throws: Exception - if format not defined
public String generateFinished() throws Exception
generateFinished in class DataGenerator
Throws: Exception - no input structure has been defined
public String generateStart()
generateStart in class DataGenerator
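The options describe cluster values as drawn either from a min/max interval or from a gaussian with a mean and standard deviation, optionally as integers. A minimal sketch of drawing one attribute value under those descriptions follows. It is not the generator's actual implementation: `ClusterValueSketch`, its `Type` enum, and rounding for the integer subtype are assumptions, and the distinction between the `-A` and `-U` cluster types is not modelled.

```java
import java.util.Random;

/** Hypothetical helper (not a Weka class): draws a single attribute value for a cluster. */
final class ClusterValueSketch {

    enum Type { UNIFORM, GAUSSIAN }

    /**
     * Draws one value. For UNIFORM, a and b are min and max; for GAUSSIAN, a and b
     * are mean and standard deviation. If 'integer' is true the value is rounded
     * to the nearest whole number (an assumption of this sketch).
     */
    static double draw(Random rnd, Type type, double a, double b, boolean integer) {
        double value = (type == Type.GAUSSIAN)
            ? a + b * rnd.nextGaussian()
            : a + (b - a) * rnd.nextDouble();
        return integer ? Math.rint(value) : value;
    }

    public static void main(String[] args) {
        Random rnd = new Random(1); // 1 is the documented default seed
        System.out.println(draw(rnd, Type.UNIFORM, 2.0, 5.0, false));
        System.out.println(draw(rnd, Type.GAUSSIAN, 0.0, 1.0, false));
        System.out.println(draw(rnd, Type.UNIFORM, 2.0, 5.0, true));
    }
}
```

Reusing one seeded `Random` across all draws keeps the generated data reproducible for a given seed.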
public String getRevision()
public static void main(String[] args)
Parameters: args - should contain arguments for the data producer
Copyright © 2015 University of Waikato, Hamilton, NZ. All rights reserved.