public class InterquartileRange extends SimpleBatchFilter
-D Turns on output of debugging information.
-R <col1,col2-col4,...> Specifies the list of columns to base outlier/extreme value detection on. An instance is tagged as an outlier/extreme value if it qualifies as one in at least one of those attributes. 'first' and 'last' are valid indexes. (default: none)
-O <num> The factor for outlier detection. (default: 3)
-E <num> The factor for extreme values detection. (default: 2*Outlier Factor)
-E-as-O Tags extreme values also as outliers. (default: off)
-P Generates Outlier/ExtremeValue pair for each numeric attribute in the range, not just a single indicator pair for all the attributes. (default: off)
-M Generates an additional attribute 'Offset' per Outlier/ExtremeValue pair that contains the multiplier by which the value is off the median: value = median + 'multiplier' * IQR. Note: implicitly sets '-P'. (default: off)
Thanks to Dale for a few brainstorming sessions.
Modifier and Type | Field and Description |
---|---|
protected int[] |
m_AttributeIndices
the generated indices (only for performance reasons)
|
protected Range |
m_Attributes
the attribute range to work on
|
protected boolean |
m_DetectionPerAttribute
whether to generate Outlier/ExtremeValue attributes for each attribute
instead of a general one
|
protected boolean |
m_ExtremeValuesAsOutliers
whether extreme values are also tagged as outliers
|
protected double |
m_ExtremeValuesFactor
the factor for detecting extreme values, by default 2*m_OutlierFactor
|
protected double[] |
m_IQR
the interquartile range
|
protected double[] |
m_LowerExtremeValue
the lower extreme value threshold (= Q1 - EVF*IQR)
|
protected double[] |
m_LowerOutlier
the lower outlier threshold (= Q1 - OF*IQR)
|
protected double[] |
m_Median
the median
|
protected int[] |
m_OutlierAttributePosition
the position of the outlier attribute
|
protected double |
m_OutlierFactor
the factor for detecting outliers
|
protected boolean |
m_OutputOffsetMultiplier
whether to add another attribute called "Offset" that lists the
'multiplier' by which the outlier/extreme value is away from the median,
i.e., value = median + 'multiplier' * IQR.
Automatically enables m_DetectionPerAttribute!
|
protected double[] |
m_UpperExtremeValue
the upper extreme value threshold (= Q3 + EVF*IQR)
|
protected double[] |
m_UpperOutlier
the upper outlier threshold (= Q3 + OF*IQR)
|
static int |
NON_NUMERIC
indicator for non-numeric attributes
|
Fields inherited from class SimpleFilter
m_Debug
Fields inherited from class Filter
m_FirstBatchDone, m_InputRelAtts, m_InputStringAtts, m_NewBatch, m_OutputRelAtts, m_OutputStringAtts
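The threshold fields above follow from the two quartiles and the two factors. The following is a minimal sketch of that arithmetic, assuming Q1 and Q3 are already known (how the filter derives the quartiles is not described in this page). The class `IqrThresholdsSketch` and its members are hypothetical names used for illustration; they are not part of Weka.

```java
/** Hypothetical helper for illustration only; not part of the Weka API. */
final class IqrThresholdsSketch {
    final double iqr;
    final double lowerOutlier;
    final double upperOutlier;
    final double lowerExtremeValue;
    final double upperExtremeValue;

    IqrThresholdsSketch(double q1, double q3,
                        double outlierFactor, double extremeValuesFactor) {
        iqr = q3 - q1;
        // Outlier thresholds: Q1 - OF*IQR and Q3 + OF*IQR
        lowerOutlier = q1 - outlierFactor * iqr;
        upperOutlier = q3 + outlierFactor * iqr;
        // Extreme value thresholds: Q1 - EVF*IQR and Q3 + EVF*IQR
        lowerExtremeValue = q1 - extremeValuesFactor * iqr;
        upperExtremeValue = q3 + extremeValuesFactor * iqr;
    }

    /** Uses the documented defaults: OF = 3 and EVF = 2 * OF. */
    static IqrThresholdsSketch withDefaults(double q1, double q3) {
        double outlierFactor = 3.0;
        return new IqrThresholdsSketch(q1, q3, outlierFactor, 2 * outlierFactor);
    }

    public static void main(String[] args) {
        IqrThresholdsSketch t = withDefaults(10.0, 20.0);
        System.out.println("IQR                 = " + t.iqr);
        System.out.println("lower/upper outlier = " + t.lowerOutlier + " / " + t.upperOutlier);
        System.out.println("lower/upper extreme = " + t.lowerExtremeValue + " / " + t.upperExtremeValue);
    }
}
```

With Q1 = 10 and Q3 = 20 and the default factors, the IQR is 10, the outlier thresholds are -20 and 50, and the extreme value thresholds are -50 and 80.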
Constructor and Description |
---|
InterquartileRange() |
Modifier and Type | Method and Description |
---|---|
String |
attributeIndicesTipText()
Returns the tip text for this property
|
protected double |
calculateMultiplier(Instance inst,
int index)
returns the multiplier of the IQR by which the instance is off the median
for this particular attribute.
|
protected void |
computeThresholds(Instances instances)
computes the thresholds for outliers and extreme values
|
String |
detectionPerAttributeTipText()
Returns the tip text for this property
|
protected Instances |
determineOutputFormat(Instances inputFormat)
Determines the output format based on the input format and returns
this.
|
String |
extremeValuesAsOutliersTipText()
Returns the tip text for this property
|
String |
extremeValuesFactorTipText()
Returns the tip text for this property
|
String |
getAttributeIndices()
Gets the current range selection
|
Capabilities |
getCapabilities()
Returns the Capabilities of this filter.
|
boolean |
getDetectionPerAttribute()
Gets whether an Outlier/ExtremeValue attribute pair is generated for
each numeric attribute ("true") or just one pair for all numeric
attributes together ("false").
|
boolean |
getExtremeValuesAsOutliers()
Get whether extreme values are also tagged as outliers.
|
double |
getExtremeValuesFactor()
Gets the factor for determining the thresholds for extreme values.
|
String[] |
getOptions()
Gets the current settings of the filter.
|
double |
getOutlierFactor()
Gets the factor for determining the thresholds for outliers.
|
boolean |
getOutputOffsetMultiplier()
Gets whether an additional attribute "Offset" is generated per
Outlier/ExtremeValue attribute pair that lists the multiplier the value
is off the median: value = median + 'multiplier' * IQR.
|
String |
getRevision()
Returns the revision string.
|
String |
globalInfo()
Returns a string describing this filter
|
protected boolean |
isExtremeValue(Instance inst)
returns whether the instance is an extreme value or not
|
protected boolean |
isExtremeValue(Instance inst,
int index)
returns whether the instance has an extreme value in the specified
attribute or not
|
protected boolean |
isOutlier(Instance inst)
returns whether the instance is an outlier or not
|
protected boolean |
isOutlier(Instance inst,
int index)
returns whether the instance has an outlier in the specified attribute
or not
|
Enumeration |
listOptions()
Returns an enumeration describing the available options.
|
static void |
main(String[] args)
Main method for testing this class.
|
String |
outlierFactorTipText()
Returns the tip text for this property
|
String |
outputOffsetMultiplierTipText()
Returns the tip text for this property
|
protected Instances |
process(Instances instances)
Processes the given data (may change the provided dataset) and returns
the modified version.
|
void |
setAttributeIndices(String value)
Sets which attributes are to be used for interquartile calculations and
outlier/extreme value detection (only numeric attributes among the
selection will be used).
|
void |
setAttributeIndicesArray(int[] value)
Sets which attributes are to be used for interquartile calculations and
outlier/extreme value detection (only numeric attributes among the
selection will be used).
|
void |
setDetectionPerAttribute(boolean value)
Set whether an Outlier/ExtremeValue attribute pair is generated for
each numeric attribute ("true") or just one pair for all numeric
attributes together ("false").
|
void |
setExtremeValuesAsOutliers(boolean value)
Set whether extreme values are also tagged as outliers.
|
void |
setExtremeValuesFactor(double value)
Sets the factor for determining the thresholds for extreme values.
|
void |
setOptions(String[] options)
Parses a list of options for this object.
|
void |
setOutlierFactor(double value)
Sets the factor for determining the thresholds for outliers.
|
void |
setOutputOffsetMultiplier(boolean value)
Set whether an additional attribute "Offset" is generated per
Outlier/ExtremeValue attribute pair that lists the multiplier the value
is off the median: value = median + 'multiplier' * IQR.
|
Methods inherited from class SimpleBatchFilter
batchFinished, hasImmediateOutputFormat, input
Methods inherited from class SimpleFilter
debugTipText, getDebug, reset, setDebug, setInputFormat
Methods inherited from class Filter
batchFilterFile, bufferInput, copyValues, copyValues, filterFile, flushInput, getCapabilities, getInputFormat, getOutputFormat, initInputLocators, initOutputLocators, inputFormatPeek, isFirstBatchDone, isNewBatch, isOutputFormatDefined, makeCopies, makeCopy, numPendingOutput, output, outputFormatPeek, outputPeek, push, resetQueue, runFilter, setOutputFormat, testInputFormat, toString, useFilter, wekaStaticWrapper
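The `isOutlier` and `isExtremeValue` overloads in the summary above test either a single attribute or a whole instance, and the `-E-as-O` option decides whether extreme values are also tagged as outliers. The sketch below shows one way these checks could fit together. It rests on several assumptions that this page does not confirm: that an outlier lies between the outlier threshold and the extreme value threshold, that the boundaries are inclusive as coded, and that missing values need no handling. `IqrTaggingSketch` and its method signatures are hypothetical and work on plain `double` arrays rather than Weka `Instance` objects.

```java
/** Hypothetical helper for illustration only; not part of the Weka API. */
final class IqrTaggingSketch {
    // Per-attribute thresholds, in this order (a convention of this sketch):
    // {lowerExtremeValue, lowerOutlier, upperOutlier, upperExtremeValue}

    /** Assumed rule: extreme if beyond either extreme value threshold. */
    static boolean isExtremeValue(double value, double[] t) {
        return value < t[0] || value > t[3];
    }

    /** Assumed rule: outlier if beyond an outlier threshold but not extreme. */
    static boolean isOutlier(double value, double[] t) {
        return (t[0] <= value && value < t[1]) || (t[2] < value && value <= t[3]);
    }

    /** An instance is an extreme value if at least one attribute is. */
    static boolean isExtremeValue(double[] values, double[][] thresholds) {
        for (int i = 0; i < values.length; i++) {
            if (isExtremeValue(values[i], thresholds[i])) {
                return true;
            }
        }
        return false;
    }

    /** An instance is an outlier if at least one attribute is. */
    static boolean isOutlier(double[] values, double[][] thresholds) {
        for (int i = 0; i < values.length; i++) {
            if (isOutlier(values[i], thresholds[i])) {
                return true;
            }
        }
        return false;
    }

    /** Value of the Outlier indicator, honouring the -E-as-O setting. */
    static boolean outlierTag(double[] values, double[][] thresholds,
                              boolean extremeValuesAsOutliers) {
        return isOutlier(values, thresholds)
            || (extremeValuesAsOutliers && isExtremeValue(values, thresholds));
    }

    public static void main(String[] args) {
        double[] t = {-50.0, -20.0, 50.0, 80.0};
        double[][] thresholds = {t, t};
        double[] instance = {0.0, 100.0};
        System.out.println("extreme value: " + isExtremeValue(instance, thresholds));
        System.out.println("outlier (-E-as-O off): " + outlierTag(instance, thresholds, false));
        System.out.println("outlier (-E-as-O on):  " + outlierTag(instance, thresholds, true));
    }
}
```

Under these assumptions, an instance whose only unusual value lies beyond an extreme value threshold is tagged as an extreme value but not as an outlier unless `-E-as-O` is set.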
public static final int NON_NUMERIC
protected Range m_Attributes
protected int[] m_AttributeIndices
protected double m_OutlierFactor
protected double m_ExtremeValuesFactor
protected boolean m_ExtremeValuesAsOutliers
protected double[] m_UpperExtremeValue
protected double[] m_UpperOutlier
protected double[] m_LowerOutlier
protected double[] m_IQR
protected double[] m_Median
protected double[] m_LowerExtremeValue
protected boolean m_DetectionPerAttribute
protected int[] m_OutlierAttributePosition
protected boolean m_OutputOffsetMultiplier
public String globalInfo()
globalInfo
in class SimpleFilter
public Enumeration listOptions()
listOptions
in interface OptionHandler
listOptions
in class SimpleFilter
public void setOptions(String[] options) throws Exception
-D Turns on output of debugging information.
-R <col1,col2-col4,...> Specifies the list of columns to base outlier/extreme value detection on. An instance is tagged as an outlier/extreme value if it qualifies as one in at least one of those attributes. 'first' and 'last' are valid indexes. (default: none)
-O <num> The factor for outlier detection. (default: 3)
-E <num> The factor for extreme values detection. (default: 2*Outlier Factor)
-E-as-O Tags extreme values also as outliers. (default: off)
-P Generates Outlier/ExtremeValue pair for each numeric attribute in the range, not just a single indicator pair for all the attributes. (default: off)
-M Generates an additional attribute 'Offset' per Outlier/ExtremeValue pair that contains the multiplier by which the value is off the median: value = median + 'multiplier' * IQR. Note: implicitly sets '-P'. (default: off)
setOptions
in interface OptionHandler
setOptions
in class SimpleFilter
options
- the list of options as an array of strings
Exception
- if an option is not supported
SimpleFilter.reset()
public String[] getOptions()
getOptions
in interface OptionHandler
getOptions
in class SimpleFilter
public String attributeIndicesTipText()
public String getAttributeIndices()
public void setAttributeIndices(String value)
value
- a string representing the list of attributes. Since
the string will typically come from a user, attributes
are indexed from 1.
IllegalArgumentException
- if an invalid range list is supplied
public void setAttributeIndicesArray(int[] value)
value
- an array containing indexes of attributes to work on.
Since the array will typically come from a program,
attributes are indexed from 0.
IllegalArgumentException
- if an invalid set of ranges is supplied
public String outlierFactorTipText()
public void setOutlierFactor(double value)
value
- the factor.
public double getOutlierFactor()
public String extremeValuesFactorTipText()
public void setExtremeValuesFactor(double value)
value
- the factor.
public double getExtremeValuesFactor()
public String extremeValuesAsOutliersTipText()
public void setExtremeValuesAsOutliers(boolean value)
value
- whether or not to tag extreme values also as outliers.
public boolean getExtremeValuesAsOutliers()
public String detectionPerAttributeTipText()
public void setDetectionPerAttribute(boolean value)
value
- whether or not to generate indicator attribute pairs
for each numeric attribute.
public boolean getDetectionPerAttribute()
public String outputOffsetMultiplierTipText()
public void setOutputOffsetMultiplier(boolean value)
value
- whether or not to generate the additional attribute.
public boolean getOutputOffsetMultiplier()
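The "Offset" attribute controlled by the methods above is described by the relation value = median + 'multiplier' * IQR. Solving that relation for the multiplier gives the sketch below. `OffsetMultiplierSketch` is a hypothetical name, and the code only rearranges the documented formula; it is not taken from the filter's source.

```java
/** Hypothetical helper for illustration only; not part of the Weka API. */
final class OffsetMultiplierSketch {
    /** Solves value = median + multiplier * IQR for the multiplier. */
    static double multiplier(double value, double median, double iqr) {
        return (value - median) / iqr;
    }

    /** Reconstructs the value from a multiplier, as in the documented relation. */
    static double valueFor(double multiplier, double median, double iqr) {
        return median + multiplier * iqr;
    }

    public static void main(String[] args) {
        double median = 15.0;
        double iqr = 10.0;
        double m = multiplier(45.0, median, iqr);
        System.out.println("multiplier = " + m);
        System.out.println("value      = " + valueFor(m, median, iqr));
    }
}
```

For a median of 15 and an IQR of 10, the value 45 has a multiplier of 3. An IQR of 0 makes the division yield an infinite or NaN result in Java floating point; how the filter treats that case is not covered on this page.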
public Capabilities getCapabilities()
getCapabilities
in interface CapabilitiesHandler
getCapabilities
in class Filter
Capabilities
protected Instances determineOutputFormat(Instances inputFormat) throws Exception
determineOutputFormat
in class SimpleFilter
inputFormat
- the input format to base the output format on
Exception
- in case the determination goes wrong
SimpleBatchFilter.hasImmediateOutputFormat(),
SimpleBatchFilter.batchFinished()
protected void computeThresholds(Instances instances)
instances
- the data to work on
protected boolean isOutlier(Instance inst, int index)
inst
- the instance to test
index
- the attribute index
protected boolean isOutlier(Instance inst)
inst
- the instance to test
protected boolean isExtremeValue(Instance inst, int index)
inst
- the instance to test
index
- the attribute index
protected boolean isExtremeValue(Instance inst)
inst
- the instance to test
protected double calculateMultiplier(Instance inst, int index)
inst
- the instance to test
index
- the attribute index
protected Instances process(Instances instances) throws Exception
process
in class SimpleFilter
instances
- the data to process
Exception
- in case the processing goes wrong
SimpleBatchFilter.batchFinished()
public String getRevision()
getRevision
in interface RevisionHandler
getRevision
in class Filter
public static void main(String[] args)
args
- should contain arguments to the filter: use -h for help
Copyright © 2015 University of Waikato, Hamilton, NZ. All rights reserved.