Grouping-class {IRanges}    R Documentation

Grouping objects

Description

In this man page, we call "grouping" the action of dividing a collection of NO objects into NG groups (some of which may be empty). The Grouping class and subclasses are containers for representing groupings.

The Grouping core API

Let's give a formal description of the Grouping core API:

Groups G_i are indexed from 1 to NG (1 <= i <= NG).

Objects O_j are indexed from 1 to NO (1 <= j <= NO).

Every object must belong to one group and only one.

Given that empty groups are allowed, NG can be greater than NO.

Grouping an empty collection of objects (NO = 0) is supported. In that case, all the groups are empty. NG can be zero (meaning there are no groups) only in that case.
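The constraints above can be illustrated in base R, without any IRanges class. This is only a sketch: the encoding (an integer vector tg giving the group index of each object, plus a group count NG) and the variable names are assumptions made for illustration, not the package's internal representation.

```r
## Hypothetical encoding (not the IRanges implementation): 'tg[j]' is the
## index of the group that object O_j belongs to.
NG <- 5L
tg <- c(2L, 2L, 5L)           # NO = 3 objects, so NG > NO here
NO <- length(tg)

## Every object belongs to exactly one group, and that group exists.
stopifnot(!anyNA(tg), all(tg >= 1L), all(tg <= NG))

## Groups that contain no object (here groups 1, 3 and 4).
empty_groups <- setdiff(seq_len(NG), tg)
empty_groups
```

Storing one group index per object makes the "one group and only one" rule hold by construction, while NG is free to exceed NO.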

If x is a Grouping object:

length(x): Returns the number of groups (NG).
names(x): Returns the names of the groups.
nobj(x): Returns the number of objects (NO). Equivalent to length(togroup(x)).

Going from groups to objects:

x[[i]]: Returns the indices of the objects (the j's) that belong to G_i. The j's are returned in ascending order. This provides the mapping from groups to objects (one-to-many mapping).
grouplength(x, i=NULL): Returns the number of objects in G_i. Works in a vectorized fashion (unlike x[[i]]). grouplength(x) is equivalent to grouplength(x, seq_len(length(x))). If i is not NULL, grouplength(x, i) is equivalent to sapply(i, function(ii) length(x[[ii]])).
members(x, i): Equivalent to x[[i]] if i is a single integer. Otherwise, if i is an integer vector of arbitrary length, it's equivalent to sort(unlist(sapply(i, function(ii) x[[ii]]))).
vmembers(x, L): A version of members that works in a vectorized fashion with respect to the L argument (L must be a list of integer vectors). Returns lapply(L, function(i) members(x, i)).
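The equivalences stated above can be checked on a plain list. The sketch below uses base R only; g2o and the *_sketch functions are hypothetical stand-ins for x[[i]], grouplength, members and vmembers, written directly from the definitions in this section.

```r
## Hypothetical base R stand-ins for the accessors described above.
tg <- c(2L, 1L, 2L, 4L, 1L, 2L)   # group index of each of the NO = 6 objects
NG <- 4L                          # group 3 is empty

## Group-to-object mapping: one ascending integer vector of j's per group.
g2o <- unname(split(seq_along(tg), factor(tg, levels = seq_len(NG))))

grouplength_sketch <- function(i = NULL) {
  if (is.null(i)) i <- seq_len(NG)
  vapply(i, function(ii) length(g2o[[ii]]), integer(1))
}
members_sketch <- function(i) sort(unlist(lapply(i, function(ii) g2o[[ii]])))
vmembers_sketch <- function(L) lapply(L, members_sketch)

g2o[[2]]                          # objects in G_2
grouplength_sketch()              # sizes of all the groups
members_sketch(c(4L, 1L))         # members of G_4 and G_1, pooled and sorted
vmembers_sketch(list(1L, 2:3))    # one members() result per list element
```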

Going from objects to groups:

togroup(x, j=NULL): Returns the index i of the group that O_j belongs to. This provides the mapping from objects to groups (many-to-one mapping). Works in a vectorized fashion. togroup(x) is equivalent to togroup(x, seq_len(nobj(x))): both return the entire mapping in an integer vector of length NO. If j is not NULL, togroup(x, j) is equivalent to y <- togroup(x); y[j].
togrouplength(x, j=NULL): Returns the number of objects that belong to the same group as O_j (including O_j itself). Equivalent to grouplength(x, togroup(x, j)).
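The object-to-group direction can be sketched the same way. Again this is base R only, and togroup_sketch and togrouplength_sketch are hypothetical names that follow the definitions above rather than the package code.

```r
tg <- c(2L, 1L, 2L, 4L, 1L, 2L)   # the entire object-to-group mapping
NG <- 4L

togroup_sketch <- function(j = NULL) if (is.null(j)) tg else tg[j]

## tabulate() counts the objects in each group, empty groups included.
sizes <- tabulate(tg, nbins = NG)
togrouplength_sketch <- function(j = NULL) sizes[togroup_sketch(j)]

togroup_sketch(c(4L, 1L))     # groups of O_4 and O_1
togrouplength_sketch()        # size of the group of each object
```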

Given that length, names and [[ are defined for Grouping objects, those objects can be considered Sequence objects. In particular, as.list works out-of-the-box on them.

One important property of any Grouping object x is that unlist(as.list(x)) is always a permutation of seq_len(nobj(x)). This is a direct consequence of the fact that every object in the grouping belongs to one group and only one.
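A quick way to see this property is to build the group-to-object list by hand and flatten it. The sketch below uses base R's split() on a made-up mapping; it does not construct a real Grouping object.

```r
tg <- c(2L, 1L, 2L, 4L, 1L, 2L)                     # made-up mapping, NO = 6
groups <- split(seq_along(tg), factor(tg, levels = 1:4))

flat <- unlist(groups, use.names = FALSE)
flat                                                # objects listed group by group

## Each index 1..NO appears exactly once, so 'flat' is a permutation.
stopifnot(identical(sort(flat), seq_along(tg)))
```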

The H2LGrouping and Dups subclasses

[DOCUMENT ME]

The Partitioning subclass

A Partitioning container represents a block-grouping, i.e. a grouping where each group contains objects that are neighbors in the original collection of objects. More formally, a grouping x is a block-grouping iff togroup(x) is sorted in non-decreasing order.

A block-grouping object can also be seen (and manipulated) as a Ranges object where all the ranges are adjacent starting at 1 (i.e. it covers the 1:NO interval with no overlap between the ranges).
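Both views of a block-grouping can be sketched in base R. The helper is_block_grouping is a hypothetical name, and the start/end/width vectors are computed by hand here instead of through the Ranges API.

```r
## A grouping is a block-grouping iff its object-to-group vector is sorted.
is_block_grouping <- function(tg) !is.unsorted(tg)

tg_block <- c(1L, 1L, 1L, 2L, 4L, 4L)   # 4 groups, group 3 is empty
tg_other <- c(2L, 1L, 2L, 4L, 1L, 2L)
is_block_grouping(tg_block)             # TRUE
is_block_grouping(tg_other)             # FALSE

## The same block-grouping seen as adjacent ranges covering 1:NO.
width <- tabulate(tg_block, nbins = 4L)
end   <- cumsum(width)
start <- end - width + 1L
cbind(start, end, width)
```

The empty group shows up as a zero-width range whose start is one more than its end.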

Note that a Partitioning object is both: a particular type of Grouping object and a particular type of Ranges object. Therefore all the methods that are defined for Grouping and Ranges objects can also be used on a Partitioning object. See ?Ranges for a description of the Ranges API.

The Partitioning class is virtual with 2 concrete subclasses: PartitioningByEnd (only stores the end of the groups, allowing fast mapping from groups to objects), and PartitioningByWidth (only stores the width of the groups).
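The two representations carry the same information, since ends and widths are related by cumsum and diff. The following base R sketch shows the conversion on plain integer vectors; it does not create Partitioning objects.

```r
end   <- c(4L, 7L, 7L, 8L, 15L)    # what a by-end representation would store
width <- diff(c(0L, end))          # what a by-width representation would store
width

## Going back from widths to ends.
stopifnot(identical(cumsum(width), end))

## Object-to-group mapping implied by the widths (the 3rd group is empty).
tg <- rep.int(seq_along(width), width)
tg
```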

The Binning subclass

A Binning container represents a grouping where each observation is assigned to a group or bin. It is similar in nature to taking the integer codes of a factor object and splitting them by its levels (i.e. myFactor <- factor(...); split(as.integer(myFactor), myFactor)).
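The analogy can be tried on a small factor in base R. The first split() call is the expression quoted above; the second one, which splits the positions of the observations instead of their codes, is an assumption about how the analogy carries over to the group-to-object mapping and says nothing about how Binning is implemented.

```r
myFactor <- factor(c("b", "a", "b", "c", "a"), levels = c("a", "b", "c", "d"))

## The expression quoted above: integer codes split by level.
split(as.integer(myFactor), myFactor)

## Positions of the observations falling in each bin (level "d" is empty).
bins <- split(seq_along(myFactor), myFactor)
bins
sapply(bins, length)
```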

Constructors

H2LGrouping(high2low=integer()): [DOCUMENT ME]
Dups(high2low=integer()): [DOCUMENT ME]
PartitioningByEnd(end=integer(), names=NULL): Returns the PartitioningByEnd object made of the partitions ending at the values specified by end. end must contain sorted non-negative integer values. If the names argument is non-NULL, it is used to name the partitions.
PartitioningByWidth(width=integer(), names=NULL): Returns the PartitioningByWidth object made of the partitions with the widths specified by width. width must contain non-negative integer values. If the names argument is non-NULL, it is used to name the partitions.
Binning(group=integer(), names=NULL): Returns the Binning object made from the group argument, which must be a factor or a vector of positive integers. If the names argument is non-NULL, it is used to name the bins. When group is a factor, the names are set to levels(group) unless specified otherwise.
Note that these constructors don't recycle their names argument (to remain consistent with what `names<-` does on standard vectors).
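For reference, this is what `names<-` does on a standard vector when given too few names: the names are padded with NA, not recycled. The sketch shows base R behavior only; how each constructor reacts to a names argument of the wrong length is not specified on this page.

```r
v <- 1:4
names(v) <- c("A", "B")   # fewer names than elements
names(v)                  # "A" "B" NA NA: padded, not recycled
```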

Author(s)

H. Pages and P. Aboyoun

See Also

Sequence-class, Ranges-class, IRanges-class, successiveIRanges, cumsum, diff

Examples

  showClass("Grouping")  # shows (some of) the known subclasses

  ## ---------------------------------------------------------------------
  ## A. H2LGrouping OBJECTS
  ## ---------------------------------------------------------------------
  high2low <- c(NA, NA, 2, 2, NA, NA, NA, 6, NA, 1, 2, NA, 6, NA, NA, 2)
  x <- H2LGrouping(high2low)
  x

  ## The Grouping core API:
  length(x)
  nobj(x)  # same as 'length(x)' for H2LGrouping objects
  x[[1]]
  x[[2]]
  x[[3]]
  x[[4]]
  x[[5]]
  grouplength(x)  # same as 'unname(sapply(x, length))'
  grouplength(x, 5:2)
  members(x, 5:2)  # all the members are put together and sorted
  togroup(x)
  togroup(x, 5:2)
  togrouplength(x)  # same as 'grouplength(x, togroup(x))'
  togrouplength(x, 5:2)

  ## The Sequence API:
  as.list(x)
  sapply(x, length)

  ## ---------------------------------------------------------------------
  ## B. Dups OBJECTS
  ## ---------------------------------------------------------------------
  x_dups <- as(x, "Dups")
  x_dups
  duplicated(x_dups)  # same as 'duplicated(togroup(x_dups))'

  ### The purpose of a Dups object is to describe the groups of duplicated
  ### elements in a vector-like object:
  x <- c(2, 77, 4, 4, 7, 2, 8, 8, 4, 99)
  x_high2low <- high2low(x)
  x_high2low  # same length as 'x'
  x_dups <- Dups(x_high2low)
  x_dups
  togroup(x_dups)
  duplicated(x_dups)
  togrouplength(x_dups)  # frequency for each element
  table(x)

  ## ---------------------------------------------------------------------
  ## C. Partitioning OBJECTS
  ## ---------------------------------------------------------------------
  x <- PartitioningByEnd(end=c(4, 7, 7, 8, 15), names=LETTERS[1:5])
  x  # the 3rd partition is empty

  ## The Grouping core API:
  length(x)
  nobj(x)
  x[[1]]
  x[[2]]
  x[[3]]
  grouplength(x)  # same as 'unname(sapply(x, length))' and 'width(x)'
  togroup(x)
  togrouplength(x)  # same as 'grouplength(x, togroup(x))'
  names(x)

  ## The Ranges core API:
  start(x)
  end(x)
  width(x)

  ## The Sequence API:
  as.list(x)
  sapply(x, length)

  ## Replacing the names:
  names(x)[3] <- "empty partition"
  x

  ## Coercion to an IRanges object:
  as(x, "IRanges")

  ## Other examples:
  PartitioningByEnd(end=c(0, 0, 19), names=LETTERS[1:3])
  PartitioningByEnd()  # no partition
  PartitioningByEnd(end=integer(9))  # all partitions are empty

  ## ---------------------------------------------------------------------
  ## D. RELATIONSHIP BETWEEN Partitioning OBJECTS AND successiveIRanges()
  ## ---------------------------------------------------------------------
  mywidths <- c(4, 3, 0, 1, 7)

  ## The 3 following calls produce the same ranges:
  x1 <- successiveIRanges(mywidths)  # IRanges instance.
  x2 <- PartitioningByEnd(end=cumsum(mywidths))  # PartitioningByEnd instance.
  x3 <- PartitioningByWidth(width=mywidths)  # PartitioningByWidth instance.
  stopifnot(identical(as(x1, "PartitioningByEnd"), x2))
  stopifnot(identical(as(x1, "PartitioningByWidth"), x3))

  ## ---------------------------------------------------------------------
  ## E. Binning OBJECTS
  ## ---------------------------------------------------------------------
  set.seed(0)
  x <- Binning(factor(sample(letters, 36, replace=TRUE), levels=letters))
  x

  grouplength(x)
  togroup(x)
  x[[2]]
  x[["u"]]

[Package IRanges version 1.6.16 Index]