
Assignment in definition of data.frame


This is not strictly a problem, just something I ran into by accident, but I find it really intriguing.

I've run the following line in my console

sc_matrix <- data.frame(sc_start<-rpois(n=15, 0.4), sc_end<-rpois(n=15, 0.3))

and I was really surprised that the output was

head(sc_matrix, n=5)
#   sc_start....rpois.n...15..0.4. sc_end....rpois.n...15..0.3.
#1                               0                            1
#2                               0                            2
#3                               0                            0
#4                               1                            1
#5                               0                            0

First, I was surprised that the interpreter accepted this without even a warning: the data.frame was created even though I used the <- assignment inside the data.frame() call.

Second, the column names seem to be built by replacing every non-alphanumeric character with a . (dot) and using the result as the name.

After reading the discussion comparing the assignment operators, I guess my question is:

How does R handle that line of code? Since there is no = operator, does it evaluate each argument, e.g. sc_start<-rpois(n=15, 0.4), create a column name from it, and use the value of the right-hand side as the column?

It seems tricky, since I assumed the <- operator does not return any value, so I would have guessed that the created data.frame would contain something like NULL. I would appreciate any comments on this.


Solution

  • sc_matrix <- data.frame(sc_start<-rpois(n=15, 0.4), sc_end<-rpois(n=15, 0.3))
    

    To understand what happens here, you need to know that <-, like almost everything in R (except data objects), is actually a function. You can even call it in prefix form: `<-`(a, 1). This function returns a value invisibly, namely the value of the RHS of the assignment (see help("<-")). So your assumption that <- returns nothing is wrong, and each column simply receives the assigned vector.

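    If you don't pass column names to data.frame (as the LHS of =), it uses substitute to capture the unevaluated argument expressions and builds the names from them. These names are then sanitized (with make.names) if check.names = TRUE, which is the default. What you observe is essentially the same as what happens with data.frame(1), where the column ends up being named X1.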
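
Both points can be checked directly in the console. The following is only a minimal sketch: the names a, b, val, demo and demo_col are made up for illustration, and the seed is arbitrary and only makes the run reproducible.

```r
# 1. `<-` is a function; its (invisible) value is the assigned RHS.
`<-`(a, 1)                 # equivalent to: a <- 1
val <- (b <- 2)            # val receives the value returned by the inner assignment
print(val)

# 2. An assignment used as an argument is evaluated in the calling environment,
#    so demo_col exists afterwards and the column holds the same values.
set.seed(1)                # arbitrary seed, for reproducibility only
demo <- data.frame(demo_col <- rpois(n = 5, 0.4))
all(demo[[1]] == demo_col) # TRUE

# 3. The column name is the deparsed argument expression, passed through
#    make.names() because check.names = TRUE by default.
names(demo)
make.names("demo_col <- rpois(n = 5, 0.4)")

# With check.names = FALSE the deparsed expression is kept as is.
names(data.frame(demo_col <- 1:2, check.names = FALSE))

# Same mechanism with a plain constant:
names(data.frame(1))       # "X1"
```

As a side note, the original call also leaves sc_start and sc_end behind as ordinary variables in the workspace. If only named columns are wanted, data.frame(sc_start = rpois(n=15, 0.4), sc_end = rpois(n=15, 0.3)) gives them without creating those variables.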