Grouping-class {IRanges} | R Documentation |
In this man page, we call "grouping" the action of dividing a collection of NO objects into NG groups (some of which may be empty). The Grouping class and subclasses are containers for representing groupings.
Let's give a formal description of the Grouping core API:
Groups G_i are indexed from 1 to NG (1 <= i <= NG).
Objects O_j are indexed from 1 to NO (1 <= j <= NO).
Every object must belong to one group and only one.
Given that empty groups are allowed, NG can be greater than NO.
Grouping an empty collection of objects (NO = 0) is supported. In that case, all the groups are empty. And only in that case, NG can be zero too (meaning there are no groups).
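The rules above can be sketched in base R, without IRanges. This is only an illustration: it assumes a grouping is modelled as a plain integer vector `tg` (one group index per object) plus a group count `ng`, and `is_valid_grouping` is a hypothetical helper, not a function from the package.

```r
# Hypothetical helper (not part of IRanges): TRUE if 'tg' assigns each of the
# NO = length(tg) objects to exactly one of the 'ng' groups.
is_valid_grouping <- function(tg, ng) {
  is.integer(tg) && !anyNA(tg) && all(tg >= 1L) && all(tg <= ng)
}

tg <- c(2L, 2L, 5L)  # NO = 3 objects
ng <- 5L             # NG = 5 groups; groups 1, 3 and 4 are empty, so NG > NO

ok_sparse <- is_valid_grouping(tg, ng)
ok_empty  <- is_valid_grouping(integer(0), 0L)  # NO = 0 and NG = 0
bad       <- is_valid_grouping(c(1L, 6L), 5L)   # object 2 points at a missing group
```

Storing one group index per object makes the "one group and only one" rule hold by construction; only the range of the indices needs checking.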
If x is a Grouping object:
length(x): Returns the number of groups (NG).
names(x): Returns the names of the groups.
nobj(x): Returns the number of objects (NO). Equivalent to length(togroup(x)).
Going from groups to objects:
x[[i]]: Returns the indices of the objects (the j's) that belong to G_i. The j's are returned in ascending order. This provides the mapping from groups to objects (one-to-many mapping).
grouplength(x, i=NULL): Returns the number of objects in G_i. Works in a vectorized fashion (unlike x[[i]]). grouplength(x) is equivalent to grouplength(x, seq_len(length(x))). If i is not NULL, grouplength(x, i) is equivalent to sapply(i, function(ii) length(x[[ii]])).
members(x, i): Equivalent to x[[i]] if i is a single integer. Otherwise, if i is an integer vector of arbitrary length, it's equivalent to sort(unlist(sapply(i, function(ii) x[[ii]]))).
vmembers(x, L): A version of members that works in a vectorized fashion with respect to the L argument (L must be a list of integer vectors). Returns lapply(L, function(i) members(x, i)).
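To make the group-to-object accessors concrete, here is a minimal base R sketch that mimics them. It assumes the grouping is given as an integer vector `tg` (group index per object) and a group count `ng`; the names `groups`, `group_len`, `members_of` and `vmembers_of` are hypothetical stand-ins for x[[i]], grouplength, members and vmembers, not the package's implementation.

```r
tg <- c(3L, 1L, 3L, 1L, 1L)  # group index of each of the NO = 5 objects
ng <- 4L                     # NG = 4; groups 2 and 4 are empty

# Group -> objects: a list of NG integer vectors, each in ascending order.
# groups[[i]] plays the role of x[[i]].
groups <- unname(split(seq_along(tg), factor(tg, levels = seq_len(ng))))

group_len   <- lengths(groups)                       # like grouplength(x)
members_of  <- function(i) sort(unlist(groups[i]))   # like members(x, i)
vmembers_of <- function(L) lapply(L, members_of)     # like vmembers(x, L)

group_len                          # 3 0 2 0
members_of(c(3L, 1L))              # 1 2 3 4 5
vmembers_of(list(1L, c(3L, 1L)))   # list(c(2, 4, 5), 1:5)
```

Passing explicit factor levels to split() is what keeps the empty groups in the result.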
Going from objects to groups:
togroup(x, j=NULL): Returns the index i of the group that O_j belongs to. This provides the mapping from objects to groups (many-to-one mapping). Works in a vectorized fashion. togroup(x) is equivalent to togroup(x, seq_len(nobj(x))): both return the entire mapping in an integer vector of length NO. If j is not NULL, togroup(x, j) is equivalent to y <- togroup(x); y[j].
togrouplength(x, j=NULL): Returns the number of objects that belong to the same group as O_j (including O_j itself). Equivalent to grouplength(x, togroup(x, j)).
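The object-to-group direction can be illustrated the same way. The sketch below is base R only and assumes the mapping is stored as an integer vector `tg`; `to_group` and `to_group_length` are hypothetical helpers that imitate togroup and togrouplength.

```r
tg <- c(3L, 1L, 3L, 1L, 1L)  # group index of each of the NO = 5 objects
ng <- 4L                     # NG = 4; groups 2 and 4 are empty

group_len <- tabulate(tg, nbins = ng)  # objects per group: 3 0 2 0

# Like togroup(x, j): the whole mapping when j is NULL, else a subset of it.
to_group <- function(j = NULL) if (is.null(j)) tg else tg[j]

# Like togrouplength(x, j): size of the group each selected object sits in.
to_group_length <- function(j = NULL) group_len[to_group(j)]

to_group(c(5L, 1L))   # 1 3
to_group_length()     # 2 3 2 3 3
```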
Given that length, names and [[ are defined for Grouping objects, those objects can be considered Sequence objects. In particular, as.list works out-of-the-box on them.
One important property of any Grouping object x is that unlist(as.list(x)) is always a permutation of seq_len(nobj(x)). This is a direct consequence of the fact that every object in the grouping belongs to one group and only one.
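The permutation property can be checked on a toy grouping in base R. This sketch assumes the grouping is modelled as an integer vector `tg` and rebuilds the list of groups with split(); it does not use a real Grouping object.

```r
tg <- c(3L, 1L, 3L, 1L, 1L)  # group index of each of the NO = 5 objects

# List of groups, empty ones included (stand-in for as.list(x)).
groups <- split(seq_along(tg), factor(tg, levels = 1:4))

flat <- unlist(groups, use.names = FALSE)  # 2 4 5 1 3

# Every object index appears exactly once, so sorting recovers 1..NO.
is_permutation <- identical(sort(flat), seq_along(tg))
```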
[DOCUMENT ME]
A Partitioning container represents a block-grouping, i.e. a grouping where each group contains objects that are neighbors in the original collection of objects. More formally, a grouping x is a block-grouping iff togroup(x) is sorted in increasing order (not necessarily strictly increasing).
A block-grouping object can also be seen (and manipulated) as a Ranges object where all the ranges are adjacent starting at 1 (i.e. it covers the 1:NO interval with no overlap between the ranges).
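As a rough illustration of the block-grouping definition and its view as adjacent ranges, the base R sketch below tests whether a group-index vector is sorted and derives the start, end and width of each block. `is_block_grouping` is a hypothetical helper and the vectors are made-up data.

```r
# Hypothetical helper: a grouping is a block-grouping iff its group-index
# vector is sorted (ties allowed).
is_block_grouping <- function(tg) !is.unsorted(tg)

tg_block <- c(1L, 1L, 1L, 1L, 2L, 2L, 2L, 4L, 5L, 5L)  # NG = 5; group 3 is empty
tg_mixed <- c(1L, 2L, 1L)                              # group 1 is not contiguous

is_block_grouping(tg_block)  # TRUE
is_block_grouping(tg_mixed)  # FALSE

# The same block-grouping seen as adjacent ranges covering 1:NO.
widths <- tabulate(tg_block, nbins = 5L)  # 4 3 0 1 2
ends   <- cumsum(widths)                  # 4 7 7 8 10
starts <- ends - widths + 1L              # 1 5 8 8 9
```

The empty group shows up as a zero-width range whose start is one past its end.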
Note that a Partitioning object is both a particular type of Grouping object and a particular type of Ranges object. Therefore all the methods that are defined for Grouping and Ranges objects can also be used on a Partitioning object. See ?Ranges for a description of the Ranges API.
The Partitioning class is virtual with 2 concrete subclasses: PartitioningByEnd (only stores the end of the groups, allowing fast mapping from groups to objects), and PartitioningByWidth (only stores the width of the groups).
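The two storage schemes carry the same information, which a short base R sketch can show. It works on plain integer vectors rather than Partitioning objects, and the findInterval() lookup is just one possible way to map objects to groups from the ends, not necessarily what the package does.

```r
ends <- c(4L, 7L, 7L, 8L, 15L)  # what a by-end representation would store

# Ends -> widths and back again.
widths     <- diff(c(0L, ends))  # 4 3 0 1 7
ends_again <- cumsum(widths)     # 4 7 7 8 15

# Group i -> objects, from the end and width of that group.
members_of <- function(i) seq_len(widths[i]) + ends[i] - widths[i]
members_of(2L)  # 5 6 7
members_of(3L)  # integer(0): the 3rd group is empty

# Object j -> group: one plus the number of groups ending before j.
j   <- c(1L, 4L, 5L, 8L, 9L, 15L)
grp <- findInterval(j - 1L, ends) + 1L  # 1 1 2 4 5 5
```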
A Binning container represents a grouping where each observation is assigned to a group or bin. It is similar in nature to taking the integer codes of a factor object and splitting them up by the factor's levels (i.e. myFactor <- factor(...); split(as.integer(myFactor), myFactor)).
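The factor analogy can be tried out directly in base R. The first split() call is the expression quoted above; the second is an assumed index-based variant, on the reasoning that x[[i]] returns object indices for any Grouping. The factor `f` is made-up data.

```r
f <- factor(c("b", "a", "b", "d"), levels = c("a", "b", "c", "d"))

# The expression quoted above: integer codes split by level.
codes_by_level <- split(as.integer(f), f)  # a: 1, b: 2 2, c: (empty), d: 4

# Assumed index-based variant: which observations fall in each bin.
bins <- split(seq_along(f), f)             # a: 2, b: 1 3, c: (empty), d: 4

lengths(bins)  # a = 1, b = 2, c = 0, d = 1
```

Unused levels such as "c" yield empty bins, which matches the rule that empty groups are allowed.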
H2LGrouping(high2low=integer()): [DOCUMENT ME]
Dups(high2low=integer()): [DOCUMENT ME]
PartitioningByEnd(end=integer(), names=NULL): Return the PartitioningByEnd object made of the partitions ending at the values specified by end. end must contain sorted non-negative integer values. If the names argument is non-NULL, it is used to name the partitions.
PartitioningByWidth(width=integer(), names=NULL): Return the PartitioningByWidth object made of the partitions with the widths specified by width. width must contain non-negative integer values. If the names argument is non-NULL, it is used to name the partitions.
Binning(group=integer(), names=NULL): Return the Binning object made from the group argument, which takes a factor or positive-valued integer vector. If the names argument is non-NULL, it is used to name the bins. When group is a factor, the names are set to levels(group) unless specified otherwise.
names argument (to remain consistent with what `names<-` does on standard vectors).
H. Pages and P. Aboyoun
Sequence-class, Ranges-class, IRanges-class, successiveIRanges, cumsum, diff
showClass("Grouping")  # shows (some of) the known subclasses

## ---------------------------------------------------------------------
## A. H2LGrouping OBJECTS
## ---------------------------------------------------------------------
high2low <- c(NA, NA, 2, 2, NA, NA, NA, 6, NA, 1, 2, NA, 6, NA, NA, 2)
x <- H2LGrouping(high2low)
x

## The Grouping core API:
length(x)
nobj(x)  # same as 'length(x)' for H2LGrouping objects
x[[1]]
x[[2]]
x[[3]]
x[[4]]
x[[5]]
grouplength(x)  # same as 'unname(sapply(x, length))'
grouplength(x, 5:2)
members(x, 5:2)  # all the members are put together and sorted
togroup(x)
togroup(x, 5:2)
togrouplength(x)  # same as 'grouplength(x, togroup(x))'
togrouplength(x, 5:2)

## The Sequence API:
as.list(x)
sapply(x, length)

## ---------------------------------------------------------------------
## B. Dups OBJECTS
## ---------------------------------------------------------------------
x_dups <- as(x, "Dups")
x_dups
duplicated(x_dups)  # same as 'duplicated(togroup(x_dups))'

### The purpose of a Dups object is to describe the groups of duplicated
### elements in a vector-like object:
x <- c(2, 77, 4, 4, 7, 2, 8, 8, 4, 99)
x_high2low <- high2low(x)
x_high2low  # same length as 'x'
x_dups <- Dups(x_high2low)
x_dups
togroup(x_dups)
duplicated(x_dups)
togrouplength(x_dups)  # frequency for each element
table(x)

## ---------------------------------------------------------------------
## C. Partitioning OBJECTS
## ---------------------------------------------------------------------
x <- PartitioningByEnd(end=c(4, 7, 7, 8, 15), names=LETTERS[1:5])
x  # the 3rd partition is empty

## The Grouping core API:
length(x)
nobj(x)
x[[1]]
x[[2]]
x[[3]]
grouplength(x)  # same as 'unname(sapply(x, length))' and 'width(x)'
togroup(x)
togrouplength(x)  # same as 'grouplength(x, togroup(x))'
names(x)

## The Ranges core API:
start(x)
end(x)
width(x)

## The Sequence API:
as.list(x)
sapply(x, length)

## Replacing the names:
names(x)[3] <- "empty partition"
x

## Coercion to an IRanges object:
as(x, "IRanges")

## Other examples:
PartitioningByEnd(end=c(0, 0, 19), names=LETTERS[1:3])
PartitioningByEnd()  # no partition
PartitioningByEnd(end=integer(9))  # all partitions are empty

## ---------------------------------------------------------------------
## D. RELATIONSHIP BETWEEN Partitioning OBJECTS AND successiveIRanges()
## ---------------------------------------------------------------------
mywidths <- c(4, 3, 0, 1, 7)

## The 3 following calls produce the same ranges:
x1 <- successiveIRanges(mywidths)  # IRanges instance.
x2 <- PartitioningByEnd(end=cumsum(mywidths))  # PartitioningByEnd instance.
x3 <- PartitioningByWidth(width=mywidths)  # PartitioningByWidth instance.
stopifnot(identical(as(x1, "PartitioningByEnd"), x2))
stopifnot(identical(as(x1, "PartitioningByWidth"), x3))

## ---------------------------------------------------------------------
## E. Binning OBJECTS
## ---------------------------------------------------------------------
set.seed(0)
x <- Binning(factor(sample(letters, 36, replace=TRUE), levels=letters))
x
grouplength(x)
togroup(x)
x[[2]]
x[["u"]]