
Does R ignore variable name extensions starting with a dot in a data frame?


I have a data frame where some variable names include a "." extension. R seems to ignore this extension and returns the variable even when I access it without the complete variable name. What causes this, and why does it happen? Below is a minimal example of my problem.

y <- rnorm(100)
x <- rlnorm(100)

data <- cbind.data.frame(y,x)

colnames(data) <- c("y.rnorm","x.rlnorm")

# these both return the same thing
data$y
data$y.rnorm

Solution

  • R partially matches names by design. The "." is not treated as an extension: data$y works because "y" is a prefix of exactly one column name, "y.rnorm".

    See sections 3.4 and 4.3 of the R Language Definition.

    3.4.1 Character. The strings in i are matched against the names attribute of x and the resulting integers are used. For [[ and $ partial matching is used if exact matching fails, so x$aa will match x$aabb if x does not contain a component named "aa" and "aabb" is the only name which has prefix "aa". For [[, partial matching can be controlled via the exact argument which defaults to NA indicating that partial matching is allowed, but should result in a warning when it occurs. Setting exact to TRUE prevents partial matching from occurring, a FALSE value allows it and does not issue any warnings. Note that [ always requires an exact match. The string "" is treated specially: it indicates ‘no name’ and matches no element (not even those without a name). Note that partial matching is only used when extracting and not when replacing.
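The extraction rules quoted above can be checked with a minimal sketch. The data frame `df` and the list names below are made up for illustration; the column names mirror the question. Note that the data frame method for `[[` uses exact matching unless `exact = FALSE` is passed.

```r
# Small data frame with the same column names as in the question
df <- data.frame(y.rnorm = c(0.1, 0.2, 0.3), x.rlnorm = c(1, 2, 3))

# `$` falls back to prefix matching when no column is named exactly "y"
via_dollar <- df$y

# The dot plays no role: any unique prefix matches
no_dot <- list(yrnorm = 1)$y

# For `[[`, exact = FALSE allows partial matching without a warning
via_partial <- df[["y", exact = FALSE]]

# exact = TRUE turns partial matching off, so nothing is found
via_exact <- df[["y", exact = TRUE]]

# `[` always needs the full name; a prefix is an error here
via_bracket <- tryCatch(df[, "y"], error = function(e) "error")

# An ambiguous prefix matches nothing: "ab1" and "ab2" both start with "ab"
ambiguous <- list(ab1 = 1, ab2 = 2)$ab

# Replacement never partially matches: this adds a new column named "y"
df2 <- df
df2$y <- 0
new_names <- names(df2)
```

Here `via_dollar` and `via_partial` hold the `y.rnorm` column, `via_exact` and `ambiguous` are NULL, and `new_names` contains a third entry, "y".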

    and, for function arguments:

    4.3.2 Partial matching on tags. Each remaining named supplied argument is compared to the remaining formal arguments using partial matching. If the name of the supplied argument matches exactly with the first part of a formal argument then the two arguments are considered to be matched. It is an error to have multiple partial matches. Notice that if f <- function(fumble, fooey) fbody, then f(f = 1, fo = 2) is illegal, even though the 2nd actual argument only matches fooey. f(f = 1, fooey = 2) is legal though since the second argument matches exactly and is removed from consideration for partial matching. If the formal arguments contain ‘...’ then partial matching is only applied to arguments that precede it.
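The argument-matching rules in this passage can be tried directly. The sketch below reuses the formals `fumble` and `fooey` from the quote; the function bodies and the second function `g` are hypothetical, added only to show the `...` rule.

```r
# Function with the formals named in the quoted passage
f <- function(fumble, fooey) c(fumble = fumble, fooey = fooey)

# Legal: `fooey` matches exactly and is removed from consideration,
# leaving `f` to partially match `fumble`
legal <- f(f = 1, fooey = 2)

# Illegal: `f` is a prefix of both formals, so the call fails
illegal <- tryCatch(f(f = 1, fo = 2), error = function(e) "error")

# Formals that come after `...` must be spelled out in full
g <- function(..., verbose = FALSE) verbose
abbreviated <- g(verb = TRUE)     # `verb` is absorbed into `...`
full_name   <- g(verbose = TRUE)  # exact name reaches the formal
```

Here `abbreviated` stays FALSE because `verb` never reaches `verbose`, while `full_name` is TRUE.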

    Update

    As Uwe noted, the R Language Definition may be due for an update, because the behaviour of [[ partial matching has changed. The R NEWS file lists the following under "Deprecated and defunct" for the 3.1.0 release:

    Partial matching when using the $ operator on data frames now throws a warning and may become defunct in the future. If partial matching is intended, replace foo$bar by foo[["bar", exact = FALSE]]
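A minimal sketch of the replacement suggested in the release note, plus one way to surface accidental partial matches. The data frame `df` is made up for illustration. `warnPartialMatchDollar` is a standard option documented in ?options; whether `$` warns without it may depend on the R version, so the sketch sets it explicitly.

```r
df <- data.frame(y.rnorm = c(0.1, 0.2, 0.3))

# Intended partial match, written the way the release note recommends
intended <- df[["y", exact = FALSE]]

# Ask R to warn whenever `$` falls back to a partial match
old <- options(warnPartialMatchDollar = TRUE)
warned <- tryCatch({ df$y; FALSE }, warning = function(w) TRUE)
options(old)  # restore the previous setting

# Checking the name first avoids relying on partial matching at all
has_exact <- "y" %in% names(df)
```

With the option set, `warned` is TRUE, and `has_exact` is FALSE because no column is named exactly "y".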