XStringSet-io {Biostrings} | R Documentation |
Functions to read/write an XStringSet or XStringViews object from/to a file.
## Read FASTA (or FASTQ) files in an XStringSet object:
read.BStringSet(filepath, format="fasta")
read.DNAStringSet(filepath, format="fasta")
read.RNAStringSet(filepath, format="fasta")
read.AAStringSet(filepath, format="fasta")

## Extract basic information about FASTA (or FASTQ) files
## without loading them:
fasta.info(filepath, use.descs=TRUE)
fastq.geometry(filepath)

## Write an XStringSet object to a FASTA (or FASTQ) file:
write.XStringSet(x, file="", append=FALSE, format="fasta", width=80)

## Serialize an XStringSet object:
save.XStringSet(x, objname, dirpath=".", save.dups=FALSE, verbose=TRUE)

## Some legacy stuff:
read.XStringViews(filepath, format="fasta", subjectClass, collapse="")
write.XStringViews(x, file="", append=FALSE, format="fasta", width=80)
FASTArecordsToCharacter(FASTArecs, use.names=TRUE)
CharacterToFASTArecords(x)
FASTArecordsToXStringViews(FASTArecs, subjectClass, collapse="")
XStringSetToFASTArecords(x)
filepath |
A character vector containing the paths to the input files. |
format |
Either "fasta" (the default) or "fastq".
Note that write.XStringSet and write.XStringViews only
support "fasta" for now.
|
use.descs |
Should the returned vector be named with the description lines found in the FASTA records? |
x |
For write.XStringSet and write.XStringViews, the object to
write to file.
For CharacterToFASTArecords, the (possibly named) character
vector to be converted to a list of FASTA records like the one
returned by readFASTA.
For XStringSetToFASTArecords, the XStringSet object
to be converted to a list of FASTA records like the one returned
by readFASTA.
|
file |
A connection, or a character string naming the file to write
to. If "" (the default), print to the standard output
connection (generally the console) unless redirected by sink.
|
append |
TRUE or FALSE. If TRUE, output will be
appended to file; otherwise, it will overwrite the contents
of file. See ?cat for the details.
|
width |
Only relevant if format is "fasta".
The maximum number of letters per line of sequence.
|
objname |
The name of the serialized object. |
dirpath |
The path to the directory where to save the serialized object. |
save.dups |
TRUE or FALSE.
If TRUE, then the Dups object describing
how duplicated elements in x are related to each other is
saved too. For advanced users only.
|
verbose |
TRUE or FALSE.
|
subjectClass |
The class to be given to the subject of the XStringViews object
created and returned by the function.
Must be the name of one of the direct XString subclasses, i.e.
"BString", "DNAString", "RNAString" or "AAString".
|
collapse |
An optional character string to be inserted between the views of the XStringViews object created and returned by the function. |
FASTArecs |
A list of FASTA records like the one returned by readFASTA.
|
use.names |
Whether or not the description line preceding each FASTA record should be used to set the names of the returned object. |
Only FASTA and FASTQ files are supported for now. The identifiers and qualities stored in the FASTQ records are ignored (only the sequences are returned).
Reading functions read.BStringSet, read.DNAStringSet,
read.RNAStringSet, read.AAStringSet and read.XStringViews
load sequences from an input file (or set of input files) into an
XStringSet or XStringViews object. (Note that for now
read.XStringViews can only read one FASTA file at a time, but
this will be addressed ASAP.)
When multiple input files are specified, they are read in the corresponding
order and their data are stored in the returned object in that order.
Note that when multiple input FASTQ files are specified, they must all
have the same "width" (i.e. all their sequences must have the same length).
The fasta.info utility returns an integer vector with one element
per FASTA record in the input files. Each element is the length of the
sequence found in the corresponding record.
If use.descs is TRUE (the default), then the returned
vector is named with the description lines found in the FASTA records.
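The shape of the value returned by fasta.info can be illustrated with a
small base R sketch. The helper fasta_info_sketch and the toy records below
are hypothetical and not part of Biostrings; the sketch works on lines
already held in memory and only assumes that description lines start
with ">".

```r
## Hypothetical helper (not a Biostrings function): one sequence length
## per FASTA record, optionally named with the description lines.
fasta_info_sketch <- function(lines, use.descs = TRUE) {
  lines <- lines[nzchar(lines)]          # ignore empty lines
  is_desc <- startsWith(lines, ">")      # description lines start with ">"
  record_id <- cumsum(is_desc)           # record number of every line
  seq_lines <- !is_desc & record_id > 0  # sequence lines inside a record
  ## Add up the letters of all sequence lines belonging to each record.
  lens <- tapply(nchar(lines[seq_lines]),
                 factor(record_id[seq_lines],
                        levels = seq_len(sum(is_desc))),
                 sum)
  lens <- as.integer(lens)
  lens[is.na(lens)] <- 0L                # records without sequence lines
  if (use.descs)
    names(lens) <- sub("^>", "", lines[is_desc])
  lens
}

## Toy input: two records, the first one wrapped over two lines.
fasta_lines <- c(">seq1 first toy record", "ACGTAC", "GT",
                 ">seq2 second toy record", "TTGA")
info <- fasta_info_sketch(fasta_lines)
info
```

With use.descs=FALSE the same lengths are returned without names.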
The fastq.geometry utility returns an integer vector describing
the "geometry" of the FASTQ files, i.e. a vector of length 2 where the
first element is the total number of FASTQ records in the files and
the second element is the common "width" of these files (this width is
NA if the files contain no FASTQ records or records with
different "widths").
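The "geometry" described above can be sketched in base R as follows. The
helper fastq_geometry_sketch is hypothetical (not a Biostrings function),
and it assumes the simple FASTQ layout in which every record occupies
exactly four lines with the sequence on the second one.

```r
## Hypothetical helper (not a Biostrings function): number of FASTQ
## records and their common width, or NA when there is no common width.
fastq_geometry_sketch <- function(lines) {
  ## Assumption: exactly 4 lines per record (id, sequence, "+", qualities).
  stopifnot(length(lines) %% 4L == 0L)
  nrec <- length(lines) %/% 4L
  seqs <- lines[seq_along(lines) %% 4L == 2L]  # 2nd line of each record
  widths <- unique(nchar(seqs))
  width <- if (length(widths) == 1L) widths else NA_integer_
  c(nrec, width)
}

## Toy input: two records of width 5.
fastq_lines <- c("@read1", "ACGTA", "+", "IIIII",
                 "@read2", "TTGCA", "+", "IIIII")
geom <- fastq_geometry_sketch(fastq_lines)
geom
```

Replacing one of the toy sequences with a shorter one, or passing no lines
at all, makes the second element NA, matching the two cases listed above.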
Writing functions write.XStringSet and write.XStringViews
write an XStringSet or XStringViews object to a file or
connection. They only support the FASTA format for now.
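To illustrate the effect of the width argument on FASTA output, here is a
minimal base R sketch. The helpers wrap_sequence and write_fasta_sketch
are hypothetical; they operate on a named character vector rather than on
an XStringSet object and are not how write.XStringSet is implemented.

```r
## Hypothetical helper: split one sequence into chunks of at most
## 'width' letters.
wrap_sequence <- function(s, width = 80) {
  if (nchar(s) == 0L)
    return(character(0))
  starts <- seq(1, nchar(s), by = width)
  substring(s, starts, starts + width - 1)
}

## Hypothetical helper: build the lines of a FASTA file from a named
## character vector (one description line, then the wrapped sequence).
write_fasta_sketch <- function(x, width = 80) {
  recs <- lapply(seq_along(x), function(i)
    c(paste0(">", names(x)[i]), wrap_sequence(x[[i]], width)))
  unlist(recs, use.names = FALSE)
}

toy <- c(a = "ACGTACGTAC", b = "TT")
out <- write_fasta_sketch(toy, width = 4)
cat(out, sep = "\n")  # prints to the console, like file=""
```

Each sequence line holds at most width letters; only the last line of a
record may be shorter.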
Serializing an XStringSet object with save.XStringSet
is equivalent to using the standard save mechanism, except that it
first tries to reduce the size of x in memory before calling
save. Most of the time this leads to a much smaller size on disk.
FASTArecordsToCharacter, CharacterToFASTArecords,
FASTArecordsToXStringViews and XStringSetToFASTArecords
are helper functions used internally by write.XStringSet and
read.XStringViews for switching between different
representations of the same object.
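The kind of switch between representations that these helpers perform can
be sketched in base R. The sketch assumes that a FASTA record is a list
with a desc element and a seq element; this layout, like the helper names
records_to_character and character_to_records, is an assumption made for
illustration and should be checked against what readFASTA actually
returns.

```r
## Assumed layout of a list of FASTA records: one list(desc=, seq=)
## per record.
recs <- list(list(desc = "seq1", seq = "ACGT"),
             list(desc = "seq2", seq = "GGA"))

## Hypothetical analogue of FASTArecordsToCharacter().
records_to_character <- function(FASTArecs, use.names = TRUE) {
  ans <- vapply(FASTArecs, function(rec) rec$seq, character(1))
  if (use.names)
    names(ans) <- vapply(FASTArecs, function(rec) rec$desc, character(1))
  ans
}

## Hypothetical analogue of CharacterToFASTArecords().
character_to_records <- function(x) {
  descs <- if (is.null(names(x))) rep("", length(x)) else names(x)
  lapply(seq_along(x), function(i) list(desc = descs[i], seq = x[[i]]))
}

chr <- records_to_character(recs)  # named character vector
back <- character_to_records(chr)  # list of records again
```

Under these assumptions the two conversions are inverses of each other
whenever the character vector is named.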
readFASTA, writeFASTA, XStringSet-class, XStringViews-class,
BString-class, DNAString-class, RNAString-class, AAString-class
## ---------------------------------------------------------------------
## A. READ/WRITE FASTA FILES
## ---------------------------------------------------------------------
filepath <- system.file("extdata", "someORF.fa", package="Biostrings")
fasta.info(filepath)
x <- read.DNAStringSet(filepath)
x
write.XStringSet(x)  # writes to the console

## ---------------------------------------------------------------------
## B. READ FASTQ FILES
## ---------------------------------------------------------------------
filepath <- system.file("extdata", "s_1_sequence.txt", package="Biostrings")
fastq.geometry(filepath)
## Only the FASTQ sequences are returned (identifiers and qualities
## are dropped):
read.DNAStringSet(filepath, format="fastq")

## ---------------------------------------------------------------------
## C. SERIALIZATION
## ---------------------------------------------------------------------
library(BSgenome.Celegans.UCSC.ce2)
## Create a "sliding window" on chr I:
sw_start <- seq.int(1, length(Celegans$chrI)-50, by=50)
sw <- Views(Celegans$chrI, start=sw_start, width=10)
my_fake_shortreads <- as(sw, "XStringSet")
save.XStringSet(my_fake_shortreads, "my_fake_shortreads", dirpath=tempdir())

## ---------------------------------------------------------------------
## D. SOME RELATED HELPER FUNCTIONS
## ---------------------------------------------------------------------
## Converting 'x'...
## ... to a list of FASTA records (as one returned by the "readFASTA" function)
x1 <- XStringSetToFASTArecords(x)
## ... to a named character vector
x2 <- FASTArecordsToCharacter(x1)  # same as 'as.character(x)'