public class ADChunkSampleStream extends Object implements ObjectStream<ChunkSample>
The heuristics used to extract chunks were based on the paper 'A Machine Learning
Approach to Portuguese Clause Identification' (Eraldo Fernandes, Cicero
Santos and Ruy Milidiú).
Data can be found on this web site:
http://www.linguateca.pt/floresta/corpus.html
Information about the format:
Susana Afonso. "Árvores deitadas: Descrição do formato e das opções de análise na Floresta Sintáctica". 12 February 2006.
http://www.linguateca.pt/documentos/Afonso2006ArvoresDeitadas.pdf
Detailed info about the NER tagset: http://beta.visl.sdu.dk/visl/pt/info/portsymbol.html#semtags_names
Note: Do not use this class, internal use only!
Modifier and Type | Field and Description |
---|---|
protected ObjectStream<ADSentenceStream.Sentence> | adSentenceStream |
static String | OTHER |
Constructor and Description |
---|
ADChunkSampleStream(InputStream in, String charsetName) Creates a new ChunkSample stream from an InputStream. |
ADChunkSampleStream(ObjectStream<String> lineStream) Creates a new ChunkSample stream from a line stream, i.e. an ObjectStream<String>. |
Modifier and Type | Method and Description |
---|---|
void | close() Closes the ObjectStream and releases all allocated resources. |
static String | convertFuncTag(String t, boolean useCGTags) |
protected String | getChunkTag(ADSentenceStream.SentenceParser.Leaf leaf) |
protected String | getChunkTag(ADSentenceStream.SentenceParser.Node node) |
protected String | getPhraseTagFromPosTag(String functionalTag) |
protected boolean | isIncludePunctuations() |
protected boolean | isIntermediate(List<String> tags, List<String> target, String phraseTag) |
protected void | processLeaf(ADSentenceStream.SentenceParser.Leaf leaf, boolean isIntermediate, String phraseTag, List<String> sentence, List<String> tags, List<String> target) |
protected void | processRoot(ADSentenceStream.SentenceParser.Node root, List<String> sentence, List<String> tags, List<String> target) |
ChunkSample | read() Returns the next object. |
void | reset() Repositions the stream at the beginning and the previously seen object sequence will be repeated exactly. |
void | setEnd(int aEnd) |
void | setStart(int aStart) |
protected final ObjectStream<ADSentenceStream.Sentence> adSentenceStream
public static final String OTHER
public ADChunkSampleStream(ObjectStream<String> lineStream)
Creates a new ChunkSample stream from a line stream, i.e. an ObjectStream<String>, which could be a PlainTextByLineStream object.
Parameters:
lineStream - a stream of lines as String
public ADChunkSampleStream(InputStream in, String charsetName)
Creates a new ChunkSample stream from an InputStream.
Parameters:
in - the corpus InputStream
charsetName - the charset of the Arvores Deitadas corpus
public ChunkSample read() throws IOException
Returns the next object.
Specified by: read in interface ObjectStream<ChunkSample>
Throws: IOException
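The read contract is typically consumed in a loop. The sketch below assumes the convention, described in OpenNLP's ObjectStream documentation but not stated on this page, that read() returns null once the stream is exhausted. To stay self-contained it uses a hypothetical stand-in interface, `SampleSource`, instead of the real ObjectStream, and plain strings instead of ChunkSample objects.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

public class ReadLoopSketch {

    /** Hypothetical stand-in for ObjectStream: read() yields null at end of stream (assumed). */
    public interface SampleSource<T> {
        T read() throws IOException;
    }

    /** List-backed source used only for illustration. */
    public static <T> SampleSource<T> fromList(List<T> items) {
        Iterator<T> it = items.iterator();
        return () -> it.hasNext() ? it.next() : null;
    }

    /** Reads until the source reports end of stream and collects every object. */
    public static <T> List<T> drain(SampleSource<T> source) throws IOException {
        List<T> out = new ArrayList<>();
        T sample;
        while ((sample = source.read()) != null) {
            out.add(sample);
        }
        return out;
    }

    public static void main(String[] args) throws IOException {
        SampleSource<String> source = fromList(Arrays.asList("sample-1", "sample-2"));
        System.out.println(drain(source)); // prints [sample-1, sample-2]
    }
}
```

With the real class, the same loop shape would apply to an ObjectStream<ChunkSample>, with the IOException declared by read() handled by the caller.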
protected void processRoot(ADSentenceStream.SentenceParser.Node root, List<String> sentence, List<String> tags, List<String> target)
protected void processLeaf(ADSentenceStream.SentenceParser.Leaf leaf, boolean isIntermediate, String phraseTag, List<String> sentence, List<String> tags, List<String> target)
protected String getChunkTag(ADSentenceStream.SentenceParser.Leaf leaf)
protected String getChunkTag(ADSentenceStream.SentenceParser.Node node)
public void setStart(int aStart)
public void setEnd(int aEnd)
public void reset() throws IOException, UnsupportedOperationException
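Repositions the stream at the beginning and the previously seen object sequence will be repeated exactly.
Specified by: reset in interface ObjectStream<ChunkSample>
Throws: IOException, UnsupportedOperationException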
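The reset contract can be illustrated with a small in-memory stream. This is a sketch built around a hypothetical `ReplayableSource` class, not the real ADChunkSampleStream; it only demonstrates that after reset() the same sequence is read again in the same order. The null end-of-stream marker is an assumption carried over from the ObjectStream convention.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class ResetSketch {

    /** Hypothetical in-memory stream with read() and reset(); null marks end of stream (assumed). */
    public static class ReplayableSource<T> {
        private final List<T> items;
        private int position = 0;

        public ReplayableSource(List<T> items) {
            this.items = new ArrayList<>(items);
        }

        public T read() {
            return position < items.size() ? items.get(position++) : null;
        }

        /** Moves back to the first object so the same sequence is produced again. */
        public void reset() {
            position = 0;
        }
    }

    /** Reads everything that is left in the source. */
    public static <T> List<T> readAll(ReplayableSource<T> source) {
        List<T> out = new ArrayList<>();
        for (T item = source.read(); item != null; item = source.read()) {
            out.add(item);
        }
        return out;
    }

    public static void main(String[] args) {
        ReplayableSource<String> source = new ReplayableSource<>(Arrays.asList("s1", "s2", "s3"));
        List<String> firstPass = readAll(source);
        source.reset();
        List<String> secondPass = readAll(source);
        System.out.println(firstPass.equals(secondPass)); // prints true
    }
}
```

A real stream may declare UnsupportedOperationException for reset, as the signature above does, so callers that need a second pass should be prepared for a source that cannot be rewound.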
public void close() throws IOException
Closes the ObjectStream and releases all allocated resources. After close has been called, it is not allowed to call read or reset.
Specified by: close in interface ObjectStream<ChunkSample>
Throws: IOException
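Because read and reset must not be called after close, one way to keep the lifecycle tidy is to scope the stream with try-with-resources so that close runs exactly when reading is finished. The sketch below uses a hypothetical `ClosableSource` that implements AutoCloseable; whether ObjectStream does so depends on the OpenNLP version. The IllegalStateException on use after close is this sketch's own policy, not documented behavior of the real class.

```java
public class CloseSketch {

    /** Hypothetical stream that rejects reads after close(); not the real ObjectStream. */
    public static class ClosableSource implements AutoCloseable {
        private final String[] items;
        private int position = 0;
        private boolean closed = false;

        public ClosableSource(String... items) {
            this.items = items.clone();
        }

        public String read() {
            if (closed) {
                // Sketch-only policy: fail loudly instead of leaving the behavior undefined.
                throw new IllegalStateException("read() called after close()");
            }
            return position < items.length ? items[position++] : null;
        }

        public boolean isClosed() {
            return closed;
        }

        @Override
        public void close() {
            closed = true;
        }
    }

    /** Counts the objects in a source; the source is closed when the try block exits. */
    public static int countAndClose(ClosableSource source) {
        int count = 0;
        try (ClosableSource s = source) {
            while (s.read() != null) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        ClosableSource source = new ClosableSource("s1", "s2");
        System.out.println(countAndClose(source)); // prints 2
        System.out.println(source.isClosed());     // prints true
    }
}
```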
protected boolean isIncludePunctuations()
Copyright © 2017 The Apache Software Foundation. All rights reserved.