
R: Why does data.frame only give me nice column names if I use the = operator?


These four ways of creating a dataframe look pretty similar to me:

myData1 <- data.frame(a <- c(1,2), b <- c(3, 4))
myData2 <- data.frame(a = c(1,2), b = c(3,4))
myData3 <- data.frame(`<-`(a,c(1,2)), `<-`(b,c(3, 4)))
myData4 <- data.frame(`=`(a,c(1,2)), `=`(b,c(3,4)))

But if I print out the column names, I only get the nice column names I would hope for when I use the = operator. In all the other cases, the whole expression becomes the column name, with all the non-alphanumerics replaced by periods:

> colnames(myData1)
[1] "a....c.1..2." "b....c.3..4."
> colnames(myData2)
[1] "a" "b"
> colnames(myData3)
[1] "a....c.1..2." "b....c.3..4."
> colnames(myData4)
[1] "a...c.1..2." "b...c.3..4."

I've read about differences between <- and = when used in function calls in terms of variable scope, but as far as I can reason (possibly not very far), that doesn't explain this particular behavior.

  1. What accounts for the difference between = and <-?
  2. What accounts for the difference between the prefix and infix versions of =?

Solution

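  • When you call a function, including data.frame, a top-level = is not used as an assignment operator. It simply names an argument: it binds the value on its right to the parameter (or, for data.frame, the column) named on its left. The parser handles this specially, so only the infix form written directly in the argument list works that way. The prefix form `=`(a, c(1,2)) is an ordinary call to the assignment function `=`, passed to data.frame as an unnamed argument.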
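
    To see that difference at the parse level, here is a small sketch that only quotes a call and inspects it, without evaluating anything. The function name f is a hypothetical placeholder that is never called; names() on a call returns one entry per element, with "" for the function slot and for unnamed arguments.

    ```r
    # Quote (parse without evaluating) a call that mixes both forms.
    # `f` is a hypothetical placeholder; it is never looked up or called.
    e <- quote(f(a = c(1, 2), `=`(b, c(3, 4))))

    names(e)                   # "" "a" "": function slot, named argument, unnamed argument
    as.character(e[[3]][[1]])  # "=": the unnamed argument is itself a call to `=`
    e[["a"]]                   # c(1, 2): the expression bound to the argument name a
    ```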

    Apart from data.frame(a = c(1,2), b = c(3,4)), each of these calls evaluates <- or `=` as a normal assignment, which creates the variables a and b in your environment:

    > ls()
    character(0)
    > myData1 <- data.frame(a <- c(1,2), b <- c(3, 4))
    > ls()
    [1] "a"       "b"       "myData1"
    > rm(list=ls())
    > ls()
    character(0)
    > myData3 <- data.frame(`<-`(a,c(1,2)), `<-`(b,c(3, 4)))
    > ls()
    [1] "a"       "b"       "myData3"
    > rm(list=ls())
    > ls()
    character(0)
    > myData4 <- data.frame(`=`(a,c(1,2)), `=`(b,c(3,4)))
    > ls()
    [1] "a"       "b"       "myData4"
    

    The data frame still gets the expected values because <- and = both return the assigned value, invisibly:

    > foo <- `=`(a,c(1,2))
    > foo
    [1] 1 2
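
    The "invisibly" part can be checked with base R's withVisible(), which returns both the value of an expression and whether R would auto-print it. A short sketch:

    ```r
    # withVisible() evaluates its argument and reports the value plus a
    # `visible` flag telling whether R would auto-print it.
    res <- withVisible(a <- c(1, 2))
    res$value    # the assigned value, c(1, 2), is what the call returns
    res$visible  # FALSE: returned invisibly, so nothing is printed

    # The prefix form of `=` behaves the same way.
    res2 <- withVisible(`=`(b, c(3, 4)))
    res2$visible # FALSE
    ```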
    

    Because of that, apart from the side effect of assigning a and b, your data.frame calls pass the same values as the unnamed call

    > data.frame(c(1,2), c(3, 4))
      c.1..2. c.3..4.
    1       1       3
    2       2       4
    

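    hence the results you see. Because none of these arguments is named, data.frame builds each column name by deparsing the argument expression (the whole a <- c(1, 2), not just c(1, 2)) and, since check.names = TRUE by default, passing that text through make.names(), which replaces every character that is not valid in a syntactic R name with a period.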
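
    The naming rule can be reproduced by hand. This sketch assumes the rule described above (deparse the unnamed argument, then apply make.names()) and compares the result with the column names from the question; the last lines show that check.names = FALSE keeps the raw deparsed text, and that a named argument skips the deparsing altogether.

    ```r
    # Deparse each argument expression, then sanitise it with make.names(),
    # mirroring what data.frame() does for unnamed arguments.
    make.names(deparse(quote(a <- c(1, 2))))     # "a....c.1..2." as in myData1
    make.names(deparse(quote(`=`(a, c(1, 2)))))  # "a...c.1..2."  as in myData4
    make.names(deparse(quote(c(1, 2))))          # "c.1..2."

    # data.frame() itself agrees for the unnamed assignment argument.
    colnames(data.frame(a <- c(1, 2)))           # "a....c.1..2."

    # With check.names = FALSE the raw deparsed text is kept.
    colnames(data.frame(a <- c(1, 2), check.names = FALSE))  # "a <- c(1, 2)"

    # A top-level `=` names the column directly.
    colnames(data.frame(a = c(1, 2)))            # "a"
    ```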