public static class ArffLoader.ArffReader extends Object implements RevisionHandler
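Reads data from an ARFF file, either in batch or incremental mode.
Typical code for batch usage:
```java
BufferedReader reader = new BufferedReader(new FileReader("/some/where/file.arff"));
// Read the header and all instances in one go
ArffReader arff = new ArffReader(reader);
Instances data = arff.getData();
data.setClassIndex(data.numAttributes() - 1);
```
Typical code for incremental usage:
```java
BufferedReader reader = new BufferedReader(new FileReader("/some/where/file.arff"));
// Read only the header; 1000 is the capacity reserved for instances
ArffReader arff = new ArffReader(reader, 1000);
Instances data = arff.getStructure();
data.setClassIndex(data.numAttributes() - 1);
Instance inst;
// The loop ends when readInstance returns null
while ((inst = arff.readInstance(data)) != null) {
    data.add(inst);
}
```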
Modifier and Type | Field and Description |
---|---|
protected Instances | m_Data: The actual data. |
protected int[] | m_IndicesBuffer: Buffer of indices for sparse instances. |
protected int | m_Lines: The number of lines read so far. |
protected StreamTokenizer | m_Tokenizer: The tokenizer for reading the stream. |
protected double[] | m_ValueBuffer: Buffer of values for sparse instances. |
Constructor and Description |
---|
ArffReader(Reader reader): Reads the data completely from the reader. |
ArffReader(Reader reader, Instances template, int lines): Reads the data without header according to the specified template. |
ArffReader(Reader reader, Instances template, int lines, int capacity): Initializes the reader without reading the header according to the specified template. |
ArffReader(Reader reader, int capacity): Reads only the header and reserves the specified space for instances. |
Modifier and Type | Method and Description |
---|---|
protected void | compactify(): Compactifies the data. |
protected void | errorMessage(String msg): Throws error message with line number and last token read. |
Instances | getData(): Returns the data that was read. |
protected void | getFirstToken(): Gets next token, skipping empty lines. |
protected void | getIndex(): Gets index, checking for a premature end of line. |
protected Instance | getInstance(Instances structure, boolean flag): Reads a single instance using the tokenizer and returns it. |
protected Instance | getInstanceFull(boolean flag): Reads a single instance using the tokenizer and returns it. |
protected Instance | getInstanceSparse(boolean flag): Reads a single instance using the tokenizer and returns it. |
protected double | getInstanceWeight(): Gets the value of an instance's weight (if one exists). |
protected void | getLastToken(boolean endOfFileOk): Gets token and checks whether it is the end of a line. |
int | getLineNo(): Returns the current line number. |
protected void | getNextToken(): Gets next token, checking for a premature end of line. |
String | getRevision(): Returns the revision string. |
Instances | getStructure(): Returns the header format. |
protected void | initBuffers(): Initializes the buffers for sparse instances to be read. |
protected void | initTokenizer(): Initializes the StreamTokenizer used for reading the ARFF file. |
protected FastVector | parseAttribute(FastVector attributes): Parses the attribute declaration. |
protected void | readHeader(int capacity): Reads and stores header of an ARFF file. |
Instance | readInstance(Instances structure): Reads a single instance using the tokenizer and returns it. |
Instance | readInstance(Instances structure, boolean flag): Reads a single instance using the tokenizer and returns it. |
protected void | readTillEOL(): Reads and skips all tokens before next end of line token. |
protected StreamTokenizer m_Tokenizer
protected double[] m_ValueBuffer
protected int[] m_IndicesBuffer
protected Instances m_Data
protected int m_Lines
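public ArffReader(Reader reader) throws IOException
Reads the data completely from the reader. The data can be accessed via the getData() method.
Parameters:
reader - the reader to use
Throws:
IOException - if something goes wrong
See Also:
getData()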
public ArffReader(Reader reader, int capacity) throws IOException
Reads only the header and reserves the specified space for instances. Further instances can be read via readInstance().
Parameters:
reader - the reader to use
capacity - the capacity of the new dataset
Throws:
IOException - if something goes wrong
IllegalArgumentException - if capacity is negative
See Also:
getStructure(), readInstance(Instances)
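public ArffReader(Reader reader, Instances template, int lines) throws IOException
Reads the data without header according to the specified template. The data can be accessed via the getData() method.
Parameters:
reader - the reader to use
template - the template header
lines - the lines read so far
Throws:
IOException - if something goes wrong
See Also:
getData()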
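public ArffReader(Reader reader, Instances template, int lines, int capacity) throws IOException
Initializes the reader without reading the header according to the specified template. The data must be read via the readInstance() method.
Parameters:
reader - the reader to use
template - the template header
lines - the lines read so far
capacity - the capacity of the new dataset
Throws:
IOException - if something goes wrong
See Also:
getData()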
protected void initBuffers()
Initializes the buffers for sparse instances to be read.
See Also:
m_ValueBuffer, m_IndicesBuffer
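The two buffers can be pictured as parallel arrays holding the (index, value) pairs of one sparse data line, such as {1 2.5, 3 4}, before an instance is built from them. The following is a minimal sketch of that idea using only the JDK. The class SparseBufferSketch, its methods, the restriction to numeric values, and the rule that indices must be strictly increasing are assumptions made for illustration; this is not the ArffReader implementation.

```java
import java.util.Arrays;

final class SparseBufferSketch {
    // Reusable parallel buffers, sized to the number of attributes (hypothetical names)
    private final int[] indicesBuffer;
    private final double[] valueBuffer;
    private int used; // number of buffer slots filled by the last parse

    SparseBufferSketch(int numAttributes) {
        indicesBuffer = new int[numAttributes];
        valueBuffer = new double[numAttributes];
    }

    /** Parses a numeric sparse line such as "{1 2.5, 3 4}" and returns the number of pairs. */
    int parse(String line) {
        String body = line.trim();
        if (!body.startsWith("{") || !body.endsWith("}")) {
            throw new IllegalArgumentException("sparse instance must be enclosed in braces");
        }
        body = body.substring(1, body.length() - 1).trim();
        used = 0;
        if (body.isEmpty()) {
            return 0; // "{}" means every value is 0
        }
        int last = -1;
        for (String pair : body.split(",")) {
            String[] parts = pair.trim().split("\\s+");
            if (parts.length != 2) {
                throw new IllegalArgumentException("index/value pair expected: " + pair);
            }
            int index = Integer.parseInt(parts[0]);
            // Assumption of this sketch: indices are in range and strictly increasing
            if (index <= last || index >= indicesBuffer.length) {
                throw new IllegalArgumentException("bad index: " + index);
            }
            indicesBuffer[used] = index;
            valueBuffer[used] = Double.parseDouble(parts[1]);
            used++;
            last = index;
        }
        return used;
    }

    /** Expands the buffered pairs into a dense array; unlisted attributes are 0. */
    double[] toDense() {
        double[] dense = new double[indicesBuffer.length];
        for (int i = 0; i < used; i++) {
            dense[indicesBuffer[i]] = valueBuffer[i];
        }
        return dense;
    }

    public static void main(String[] args) {
        SparseBufferSketch sketch = new SparseBufferSketch(5);
        sketch.parse("{1 2.5, 3 4}");
        System.out.println(Arrays.toString(sketch.toDense()));
    }
}
```

Reusing the same two arrays for every line avoids allocating new buffers per instance, which is presumably why the class keeps them as fields.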
protected void compactify()
Compactifies the data.
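protected void errorMessage(String msg) throws IOException
Throws error message with line number and last token read.
Parameters:
msg - the error message to be thrown
Throws:
IOException - containing the error message
public int getLineNo()
Returns the current line number.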
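To illustrate what an error message "with line number and last token read" can look like, here is a small sketch built on java.io.StreamTokenizer, whose lineno() method and ttype, sval, and nval fields expose that information. The class ErrorMessageSketch, the helper names, and the message format are hypothetical; the actual wording produced by errorMessage is not documented on this page.

```java
import java.io.IOException;
import java.io.StreamTokenizer;
import java.io.StringReader;

final class ErrorMessageSketch {

    /** Hypothetical helper: combines msg with the last token and the tokenizer's line number. */
    static IOException error(StreamTokenizer st, String msg) {
        String token;
        switch (st.ttype) {
            case StreamTokenizer.TT_EOF:    token = "EOF"; break;
            case StreamTokenizer.TT_EOL:    token = "EOL"; break;
            case StreamTokenizer.TT_WORD:   token = st.sval; break;
            case StreamTokenizer.TT_NUMBER: token = String.valueOf(st.nval); break;
            default:                        token = String.valueOf((char) st.ttype);
        }
        return new IOException(msg + ", read '" + token + "' (line " + st.lineno() + ")");
    }

    /** Sums whitespace-separated numbers; anything else fails with a located message. */
    static double sum(String text) throws IOException {
        // Default StreamTokenizer syntax: numbers are parsed, line ends count as whitespace
        StreamTokenizer st = new StreamTokenizer(new StringReader(text));
        double total = 0;
        while (st.nextToken() != StreamTokenizer.TT_EOF) {
            if (st.ttype != StreamTokenizer.TT_NUMBER) {
                throw error(st, "number expected");
            }
            total += st.nval;
        }
        return total;
    }

    public static void main(String[] args) {
        try {
            sum("1 2\n3 oops\n");
        } catch (IOException e) {
            System.out.println(e.getMessage());
        }
    }
}
```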
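protected void getFirstToken() throws IOException
Gets next token, skipping empty lines.
Throws:
IOException - if reading the next token fails
protected void getIndex() throws IOException
Gets index, checking for a premature end of line.
Throws:
IOException - if it finds a premature end of line
protected void getLastToken(boolean endOfFileOk) throws IOException
Gets token and checks whether it is the end of a line.
Parameters:
endOfFileOk - whether EOF is OK
Throws:
IOException - if it doesn't find an end of line
protected double getInstanceWeight() throws IOException
Gets the value of an instance's weight (if one exists).
Throws:
IOException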
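In the ARFF format, an instance weight can be appended to a data line in curly braces, for example 0, X, 0, Y, "class A", {5}. The sketch below shows one way to pick such a trailing weight out of a line with a regular expression. The class InstanceWeightSketch, the method weightOf, and the fallback of 1.0 when no weight is present are assumptions of this example; ArffReader itself works on tokenizer output rather than on whole lines, and a quoted value that happens to end in a brace group would fool this simple pattern.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class InstanceWeightSketch {

    // A comma, then a brace group with a single value, at the very end of the line
    private static final Pattern TRAILING_WEIGHT =
            Pattern.compile(",\\s*\\{\\s*([^{}\\s]+)\\s*\\}\\s*$");

    /** Returns the trailing weight of a data line, or 1.0 (sketch default) if there is none. */
    static double weightOf(String dataLine) {
        Matcher m = TRAILING_WEIGHT.matcher(dataLine);
        return m.find() ? Double.parseDouble(m.group(1)) : 1.0;
    }

    public static void main(String[] args) {
        System.out.println(weightOf("0, X, 0, Y, \"class A\", {5}"));
        System.out.println(weightOf("{1 X, 3 Y}, {2.5}"));
        System.out.println(weightOf("{1 X, 3 Y}"));
    }
}
```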
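protected void getNextToken() throws IOException
Gets next token, checking for a premature end of line.
Throws:
IOException - if it finds a premature end of line
protected void initTokenizer()
Initializes the StreamTokenizer used for reading the ARFF file.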
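As a rough illustration of how a StreamTokenizer can be set up for ARFF-like text, the sketch below treats commas as separators, % as a comment character, single and double quotes as string delimiters, braces as standalone tokens, and line ends as significant. The class TokenizerSketch and these particular settings are assumptions for the example; this page does not list the settings that initTokenizer applies.

```java
import java.io.IOException;
import java.io.StreamTokenizer;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

final class TokenizerSketch {

    /** Creates a tokenizer configured for ARFF-like input (settings are assumptions). */
    static StreamTokenizer newTokenizer(String text) {
        StreamTokenizer st = new StreamTokenizer(new StringReader(text));
        st.resetSyntax();                 // start from a clean table: no number parsing
        st.whitespaceChars(0, ' ');       // control characters and space separate tokens
        st.wordChars(' ' + 1, '\u00FF');  // everything else starts out as a word character
        st.whitespaceChars(',', ',');     // commas separate values
        st.commentChar('%');              // % starts a comment that runs to end of line
        st.quoteChar('"');
        st.quoteChar('\'');
        st.ordinaryChar('{');             // braces are returned as single-character tokens
        st.ordinaryChar('}');
        st.eolIsSignificant(true);        // report line ends as TT_EOL
        return st;
    }

    /** Returns all tokens of the text; line ends are rendered as "<EOL>". */
    static List<String> tokens(String text) {
        StreamTokenizer st = newTokenizer(text);
        List<String> out = new ArrayList<>();
        try {
            int type;
            while ((type = st.nextToken()) != StreamTokenizer.TT_EOF) {
                if (type == StreamTokenizer.TT_EOL) {
                    out.add("<EOL>");
                } else if (type == StreamTokenizer.TT_WORD || type == '"' || type == '\'') {
                    out.add(st.sval);     // words and quoted strings carry their text in sval
                } else {
                    out.add(String.valueOf((char) type));
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(tokens("5.1, 'Iris setosa', {x} % note\n"));
    }
}
```

Because number parsing is switched off by resetSyntax(), a value such as 5.1 arrives as a word token and has to be converted by the caller, which leaves the decision about numeric versus nominal values to the code that knows the attribute type.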
public Instance readInstance(Instances structure) throws IOException
Reads a single instance using the tokenizer and returns it.
Parameters:
structure - the dataset header information, will get updated in case of string or relational attributes
Throws:
IOException - if the information is not read successfully
public Instance readInstance(Instances structure, boolean flag) throws IOException
Reads a single instance using the tokenizer and returns it.
Parameters:
structure - the dataset header information, will get updated in case of string or relational attributes
flag - if method should test for carriage return after each instance
Throws:
IOException - if the information is not read successfully
protected Instance getInstance(Instances structure, boolean flag) throws IOException
Reads a single instance using the tokenizer and returns it.
Parameters:
structure - the dataset header information, will get updated in case of string or relational attributes
flag - if method should test for carriage return after each instance
Throws:
IOException - if the information is not read successfully
protected Instance getInstanceSparse(boolean flag) throws IOException
Reads a single instance using the tokenizer and returns it.
Parameters:
flag - if method should test for carriage return after each instance
Throws:
IOException - if the information is not read successfully
protected Instance getInstanceFull(boolean flag) throws IOException
Reads a single instance using the tokenizer and returns it.
Parameters:
flag - if method should test for carriage return after each instance
Throws:
IOException - if the information is not read successfully
protected void readHeader(int capacity) throws IOException
Reads and stores header of an ARFF file.
Parameters:
capacity - the number of instances to reserve in the data structure
Throws:
IOException - if the information is not read successfully
protected FastVector parseAttribute(FastVector attributes) throws IOException
Parses the attribute declaration.
Parameters:
attributes - the current attributes vector
Throws:
IOException - if the information is not read successfully
protected void readTillEOL() throws IOException
Reads and skips all tokens before next end of line token.
Throws:
IOException - in case something goes wrong
public Instances getStructure()
Returns the header format.
public Instances getData()
Returns the data that was read.
public String getRevision()
Returns the revision string.
Specified by:
getRevision in interface RevisionHandler
Copyright © 2015 University of Waikato, Hamilton, NZ. All rights reserved.