agrep {base}    R Documentation

Approximate String Matching (Fuzzy Matching)

Description

Searches for approximate matches to pattern (the first argument) within each element of the character vector x (the second argument) using the Levenshtein edit distance.

Usage

agrep(pattern, x, ignore.case = FALSE, value = FALSE,
      max.distance = 0.1, useBytes = FALSE)

Arguments

pattern a non-empty character string to be matched (not a regular expression!). Coerced by as.character to a string if possible.
x character vector where matches are sought. Coerced by as.character to a character vector if possible.
ignore.case if FALSE, the pattern matching is case sensitive; if TRUE, case is ignored during matching.
value if FALSE, a vector containing the (integer) indices of the matching elements is returned; if TRUE, a vector containing the matching elements themselves is returned.
max.distance maximum distance allowed for a match. Expressed either as an integer, or as a fraction of the pattern length (which is replaced by the smallest integer not less than the corresponding fraction of the pattern length), or as a list with possible components
all: maximal (overall) distance
insertions: maximum number/fraction of insertions
deletions: maximum number/fraction of deletions
substitutions: maximum number/fraction of substitutions
If all is missing, it is set to 10%; the other components default to all. The component names can be abbreviated.
useBytes logical. In a multibyte locale, should the comparison be character-by-character (the default) or byte-by-byte?
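
The three forms of max.distance described above can be compared side by side. The following is a small sketch, not an exhaustive specification; the variable names (by_count, by_fraction, by_list) are illustrative only, and the expected results assume the behaviour described on this page.

```r
# Integer form: at most 2 edits in total are allowed.
by_count <- agrep("laysy", c("1 lazy", "1", "1 LAZY"), max.distance = 2)
by_count      # 1

# Fraction form: 0.1 of the 4-character pattern is rounded up to 1 edit.
by_fraction <- agrep("lasy", "1 lazy 2", max.distance = 0.1)
by_fraction   # 1

# List form: 'all' is left at its 10% default (1 edit here), while
# substitutions are forbidden, so only the exact occurrence matches.
by_list <- agrep("lasy", c(" 1 lazy 2", "1 lasy 2"),
                 max.distance = list(substitutions = 0))
by_list       # 2
```

In the list form, a component that is not given takes its limit from all, so restricting one kind of edit does not loosen the others.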

Details

The Levenshtein edit distance is used as the measure of approximateness: it is the total number of insertions, deletions and substitutions required to transform one string into another.
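
To make the definition concrete, here is a minimal sketch of the classic dynamic-programming computation of the Levenshtein distance between two whole strings. The function levenshtein is a hypothetical helper written for illustration; it is not part of base R and is not how agrep is implemented. Note also that agrep looks for an approximate match within each element of x, whereas this helper compares complete strings.

```r
# Hypothetical helper for illustration only (not part of base R).
levenshtein <- function(a, b) {
  s <- strsplit(a, "")[[1]]
  t <- strsplit(b, "")[[1]]
  n <- length(s)
  m <- length(t)
  # d[i + 1, j + 1] is the distance between the first i characters of a
  # and the first j characters of b.
  d <- matrix(0L, nrow = n + 1, ncol = m + 1)
  d[, 1] <- 0:n   # delete all i characters
  d[1, ] <- 0:m   # insert all j characters
  for (i in seq_len(n)) {
    for (j in seq_len(m)) {
      cost <- if (s[i] == t[j]) 0L else 1L
      d[i + 1, j + 1] <- min(d[i, j + 1] + 1L,  # deletion
                             d[i + 1, j] + 1L,  # insertion
                             d[i, j] + cost)    # substitution (or match)
    }
  }
  d[n + 1, m + 1]
}

levenshtein("lasy", "lazy")       # 1: one substitution
levenshtein("laysy", "lazy")      # 2: one substitution and one deletion
levenshtein("kitten", "sitting")  # 3
```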

As of R 2.10.0 this uses tre by Ville Laurikari (http://laurikari.net/tre/), which supports multibyte character set (MBCS) matching much better than the previous version.

Value

Either a vector giving the indices of the elements that yielded a match, or, if value is TRUE, the matched elements (after coercion, preserving names but no other attributes).
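
The two return forms are related in the obvious way: for an unnamed x, subsetting x by the returned indices should give the same elements as value = TRUE. A brief sketch, with idx and hits as illustrative variable names:

```r
x <- c("1 lazy", "1", "1 LAZY")

# Default: integer indices of the matching elements.
idx <- agrep("laysy", x, max.distance = 2, ignore.case = TRUE)
idx      # 1 3

# value = TRUE: the matching elements themselves.
hits <- agrep("laysy", x, max.distance = 2, ignore.case = TRUE, value = TRUE)

# Indexing x by idx recovers the same elements.
all(x[idx] == hits)
```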

Author(s)

Original version by David Meyer. Current version by Brian Ripley.

See Also

grep

Examples

agrep("lasy", "1 lazy 2")
agrep("lasy", c(" 1 lazy 2", "1 lasy 2"), max = list(sub = 0))
agrep("laysy", c("1 lazy", "1", "1 LAZY"), max = 2)
agrep("laysy", c("1 lazy", "1", "1 LAZY"), max = 2, value = TRUE)
agrep("laysy", c("1 lazy", "1", "1 LAZY"), max = 2, ignore.case = TRUE)

[Package base version 2.11.0 Index]