extractTranscripts {Biostrings}    R Documentation

Extract a set of transcripts

Description

extractTranscripts extracts a set of transcripts from a reference sequence, given the starts and ends of their exons and the strand each transcript comes from.

transcriptWidths returns only the lengths of the transcripts (called the "widths" in this context), computed from the starts and ends of their exons.

transcriptLocs2refLocs converts transcript-based locations into reference-based locations.

Usage

  extractTranscripts(x, exonStarts=list(), exonEnds=list(),
                     strand=character(0), reorder.exons.on.minus.strand=FALSE)

  transcriptWidths(exonStarts=list(), exonEnds=list())

  transcriptLocs2refLocs(tlocs, exonStarts=list(), exonEnds=list(),
                         strand=character(0),
                         reorder.exons.on.minus.strand=FALSE)

Arguments

x A DNAString or MaskedDNAString object.
exonStarts, exonEnds The starts and ends of the exons, respectively.

Each argument can be a list of integer vectors, an IntegerList object, or a character vector in which each element is a comma-separated list of integers. exonStarts and exonEnds must have the same shape: the same length, with corresponding elements of the same length. Their length is the number of transcripts, and the length of each element is the number of exons in that transcript.
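As an illustration of the accepted input forms, the following base R sketch builds the same exon starts as a list of integer vectors and as comma-separated strings, then checks the "same shape" requirement. The coordinates are made up, and parse_exon_field and same_shape are hypothetical helpers written for this illustration; they are not part of Biostrings. The IntegerList form is left out so that the sketch needs no packages.

```r
# Two made-up transcripts: the first has 3 exons, the second has 2.
starts_list <- list(c(11L, 51L, 91L), c(201L, 301L))
ends_list   <- list(c(20L, 60L, 99L), c(250L, 320L))

# The same starts written as one comma-separated string per transcript.
starts_chr <- c("11,51,91", "201,301")

# Hypothetical helper: split each string on commas and convert to integers.
parse_exon_field <- function(x) {
  lapply(strsplit(x, ",", fixed = TRUE), as.integer)
}

# Hypothetical helper: same number of transcripts, and the same number
# of exons for each corresponding transcript.
same_shape <- function(a, b) {
  length(a) == length(b) && all(lengths(a) == lengths(b))
}

parsed_starts <- parse_exon_field(starts_chr)
shape_ok <- same_shape(starts_list, ends_list)
```

Here parsed_starts is identical to starts_list, and shape_ok is TRUE because both lists describe 2 transcripts with 3 and 2 exons.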

strand A character vector of the same length as exonStarts and exonEnds giving the strand ("+" or "-") of each transcript.
reorder.exons.on.minus.strand TRUE or FALSE. Should the order of exons for transcripts coming from the minus strand be reversed?
tlocs A list of integer vectors of the same length as exonStarts and exonEnds. Each element in tlocs must contain transcript-based locations.

Details

All transcripts are extracted from the single reference sequence x, typically one chromosome. Each transcript is the concatenation of its exons, and a transcript on the "-" strand is returned as the reverse complement of the corresponding reference bases (see the sanity check in the Examples). To extract transcripts from a whole genome, see extractTranscriptsFromGenome in the GenomicFeatures package.
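The idea of building a transcript from its exons can be sketched in base R on a toy reference string. This is only an illustration, not the Biostrings implementation: rev_comp and toy_extract are hypothetical helpers, the reference and coordinates are made up, and the minus-strand exons are assumed to be listed in transcript order (highest reference coordinates first), as in the F37B1.1 example.

```r
# Toy reference sequence (16 bases), standing in for a chromosome.
ref <- "AAACCCGGGTTTACGA"

# Hypothetical helper: reverse complement of a plain character string.
rev_comp <- function(s) {
  comp <- chartr("ACGT", "TGCA", s)
  paste(rev(strsplit(comp, "")[[1]]), collapse = "")
}

# Hypothetical helper: cut out each exon, reverse-complement it when the
# transcript is on the minus strand, then concatenate in the given order.
toy_extract <- function(ref, starts, ends, strand) {
  exons <- substring(ref, starts, ends)
  if (strand == "-") {
    exons <- vapply(exons, rev_comp, character(1), USE.NAMES = FALSE)
  }
  paste(exons, collapse = "")
}

# Plus strand: exons at 1-3 and 7-9.
plus_tx <- toy_extract(ref, c(1L, 7L), c(3L, 9L), "+")     # "AAAGGG"

# Minus strand: exons at 13-16 and 4-6, listed in transcript order.
minus_tx <- toy_extract(ref, c(13L, 4L), c(16L, 6L), "-")  # "TCGTGGG"
```

The minus-strand result equals the reverse complement of the same exons concatenated in ascending reference order ("CCCACGA"), which is the relationship the sanity check in the Examples relies on.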

Value

A DNAStringSet object for extractTranscripts.

An integer vector for transcriptWidths.

A list of integer vectors of the same shape as tlocs for transcriptLocs2refLocs.
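The values returned by transcriptWidths and transcriptLocs2refLocs can be reasoned about with a short base R sketch that uses the exon coordinates from the Examples. toy_widths and toy_tloc2rloc are hypothetical helpers written for this illustration, not Biostrings functions, and the minus-strand exons are assumed to be listed in transcript order, as they are in the Examples.

```r
# Exon coordinates of ZC101.3 (+ strand) and F37B1.1 (- strand).
starts1 <- 14678410L + c(1L, 488L, 654L, 996L, 1365L, 1712L, 2163L, 2453L)
ends1   <- 14678410L + c(137L, 578L, 889L, 1277L, 1662L, 1870L, 2410L, 2561L)
starts2 <- 13611188L - c(139L, 815L)
ends2   <- 13611188L - c(1L, 325L)

# Hypothetical helper: a transcript's width is the sum of its exon widths
# (starts and ends are both inclusive).
toy_widths <- function(starts, ends) {
  mapply(function(s, e) sum(e - s + 1L), starts, ends)
}

# Hypothetical helper: list every reference position covered by the
# transcript, in transcript order, then pick the requested positions.
# On the minus strand each exon is walked from its end down to its start.
toy_tloc2rloc <- function(tlocs, starts, ends, strand) {
  per_exon <- if (strand == "+") {
    mapply(function(s, e) seq(s, e), starts, ends, SIMPLIFY = FALSE)
  } else {
    mapply(function(s, e) seq(e, s), starts, ends, SIMPLIFY = FALSE)
  }
  unlist(per_exon)[tlocs]
}

widths <- toy_widths(list(starts1, starts2), list(ends1, ends2))
# 1560 630

plus_locs <- toy_tloc2rloc(c(1L, 137L, 138L), starts1, ends1, "+")
# 14678411 14678547 14678898  (location 138 falls in the second exon)

minus_locs <- toy_tloc2rloc(c(1L, 139L, 140L), starts2, ends2, "-")
# 13611187 13611049 13610863  (reference positions decrease along the transcript)
```

The widths 1560 and 630 are the upper bounds used for the transcript-based locations in the sanity check of the Examples.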

See Also

extractTranscriptsFromGenome, reverseComplement, DNAString-class, DNAStringSet-class

Examples

  ## ---------------------------------------------------------------------
  ## A. EXTRACTING WORM TRANSCRIPTS ZC101.3 AND F37B1.1
  ## ---------------------------------------------------------------------

  ## Transcript ZC101.3 (is on + strand):
  ##   Exon starts/ends as offsets from the transcript's position on the
  ##   chromosome (introns included):
  rstarts1 <- c(1, 488, 654, 996, 1365, 1712, 2163, 2453)
  rends1 <- c(137, 578, 889, 1277, 1662, 1870, 2410, 2561)
  ##   Exon starts/ends relative to chromosome:
  starts1 <- 14678410 + rstarts1
  ends1 <- 14678410 + rends1

  ## Transcript F37B1.1 (is on - strand):
  ##   Exon starts/ends as offsets from the transcript's position on the
  ##   chromosome (introns included):
  rstarts2 <- c(1, 325)
  rends2 <- c(139, 815)
  ##   Exon starts/ends relative to chromosome:
  starts2 <- 13611188 - rends2
  ends2 <- 13611188 - rstarts2

  exon_starts <- list(as.integer(starts1), as.integer(starts2))
  exon_ends <- list(as.integer(ends1), as.integer(ends2))

  library(BSgenome.Celegans.UCSC.ce2)
  ## Both transcripts are on chrII:
  chrII <- Celegans$chrII
  transcripts <- extractTranscripts(chrII,
                   exonStarts=exon_starts,
                   exonEnds=exon_ends,
                   strand=c("+","-"))

  ## Same as 'width(transcripts)':
  transcriptWidths(exonStarts=exon_starts, exonEnds=exon_ends)

  transcriptLocs2refLocs(list(c(1:6, 135:140, 1555:1560), c(1:6, 137:142, 625:630)),
                   exonStarts=exon_starts,
                   exonEnds=exon_ends,
                   strand=c("+","-"))

  ## A sanity check:
  ref_locs <- transcriptLocs2refLocs(list(1:1560, 1:630),
                   exonStarts=exon_starts,
                   exonEnds=exon_ends,
                   strand=c("+","-"))
  stopifnot(chrII[ref_locs[[1]]] == transcripts[[1]])
  stopifnot(complement(chrII)[ref_locs[[2]]] == transcripts[[2]])

[Package Biostrings version 2.18.2 Index]