
Behavior of assignment operators ('=' and '<-') inside a function in R


This is related to Assignment operators in R: '=' and '<-'; however, my question is not answered there.

The linked question and answers explain that using <- inside a function assigns the variable in the user workspace, so that the variable can be used after the function is called. (Ed. note: the linked answer does not actually state that, and if it did, it would be wrong. The statement might be correct if it were about the evaluation of argument lists and restricted to calls made from the global environment.)

This would seem to explain the following difference in behavior. The following code produces a data frame exactly as one might expect:

A <- data.frame(
  Sub = rep(c(1:3),each=3),
  Word = rep(c('Hap','Lap','Sap'),3),
  Vowel_Length = sample(c(1:100),9)
  )

The result is:

  Sub Word Vowel_Length
1   1  Hap           31
2   1  Lap            2
3   1  Sap           71
4   2  Hap           58
5   2  Lap           28
6   2  Sap           20
7   3  Hap           78
8   3  Lap           72
9   3  Sap           77

However, if we use <- inside the data.frame() call, as follows, we get a different result.

B <- data.frame(
  Sub <- rep(c(1:3),each=3),
  Word <- rep(c('Hap','Lap','Sap'),3),
  Vowel_Length <- sample(c(1:100),9)
  )

The result is:

  Sub....rep.c.1.3...each...3. Word....rep.c..Hap....Lap....Sap....3.
1                            1                                    Hap
2                            1                                    Lap
3                            1                                    Sap
4                            2                                    Hap
5                            2                                    Lap
6                            2                                    Sap
7                            3                                    Hap
8                            3                                    Lap
9                            3                                    Sap
  Vowel_Length....sample.c.1.100...9.
1                                  31
2                                  15
3                                   4
4                                   2
5                                  89
6                                  55
7                                  12
8                                  72
9                                  47

I assume that, because using <- inside a function declares the variable globally, the headers of the data frame are inherited from that global declaration, just as the linked question and answers would seem to indicate. [See the comments.]
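One way to test that assumption is a small sketch that makes the same kind of call from inside a function. The helper name build_frame is hypothetical and only there for illustration. If the assumption held, Sub would become a global variable and the column would be called Sub; the sketch instead checks whether Sub lands in the frame that called data.frame(), and whether the column name is taken from it.

```r
# Hypothetical helper: the call is made inside a function, so the function's
# local frame (not the global workspace) is the calling environment.
build_frame <- function() {
  B <- data.frame(Sub <- rep(c(1:3), each = 3))
  list(
    # Was Sub assigned, as a side effect, in this function's own frame?
    sub_is_local = exists("Sub", envir = environment(), inherits = FALSE),
    # Is the column simply called "Sub"?
    column_is_sub = identical(names(B), "Sub")
  )
}

result <- build_frame()
result$sub_is_local   # TRUE: the assignment happened in the caller's frame
result$column_is_sub  # FALSE: the argument was unnamed, so no column "Sub"
```

So the assignment is a side effect of evaluating the argument in the calling environment, and the column header does not come from the variable that was created.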

However, I'm curious why you get, for example, Sub....rep.c.1.3...each...3. as the header of the first column in the data frame instead of Sub <- rep(c(1:3),each=3), or even 1 1 1 2 2 2 3 3 3.

Update:

As @AnandaMahto pointed out in a deleted comment, setting check.names to FALSE produces the following behavior.

C <- data.frame(
  Sub <- rep(c(1:3),each=3),
  Word <- rep(c('Hap','Lap','Sap'),3),
  Vowel_Length <- sample(c(1:100),9),
  check.names=FALSE
)

Where the result is:

  Sub <- rep(c(1:3), each = 3) Word <- rep(c("Hap", "Lap", "Sap"), 3)
1                            1                                    Hap
2                            1                                    Lap
3                            1                                    Sap
4                            2                                    Hap
5                            2                                    Lap
6                            2                                    Sap
7                            3                                    Hap
8                            3                                    Lap
9                            3                                    Sap
  Vowel_Length <- sample(c(1:100), 9)
1                                  15
2                                   3
3                                  82
4                                  33
5                                  99
6                                  53
7                                  89
8                                  77
9                                  47

To clarify, my question is simply why this behavior happens. In particular, with check.names=TRUE, why do you get Sub....rep.c.1.3...each...3. as a header instead of Sub <- rep(c(1:3),each=3) or 1 1 1 2 2 2 3 3 3?

I suppose I'm now also curious why you get Sub <- rep(c(1:3), each = 3) as the header with check.names=FALSE.


Solution

  • It appears that your question is about the strange naming that R ends up using, and you're wondering why it doesn't have spaces, <, and so on.

    If that's your actual question, you should look at the check.names argument in data.frame.

    From ?data.frame:

    check.names logical. If TRUE then the names of the variables in the data frame are checked to ensure that they are syntactically valid variable names and are not duplicated. If necessary they are adjusted (by make.names) so that they are.

    Thus, you can get the names you were expecting by setting check.names to FALSE:

    B <- data.frame( Sub <- rep(c(1:3),each=3), 
                     Word <- rep(c('Hap','Lap','Sap'),3), 
                     Vowel_Length <- sample(c(1:100),9),
                     check.names = FALSE)
    B
    #   Sub <- rep(c(1:3), each = 3) Word <- rep(c("Hap", "Lap", "Sap"), 3)
    # 1                            1                                    Hap
    # 2                            1                                    Lap
    # 3                            1                                    Sap
    # 4                            2                                    Hap
    # 5                            2                                    Lap
    # 6                            2                                    Sap
    # 7                            3                                    Hap
    # 8                            3                                    Lap
    # 9                            3                                    Sap
    #   Vowel_Length <- sample(c(1:100), 9)
    # 1                                  33
    # 2                                  20
    # 3                                   5
    # 4                                  83
    # 5                                  99
    # 6                                  79
    # 7                                  58
    # 8                                  46
    # 9                                  44
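The two headers in the question can be reproduced outside of data.frame(), which may make the mechanism clearer. The sketch below assumes that data.frame() names an unnamed argument by deparsing the unevaluated expression and, when check.names is TRUE, passes that text through make.names(); the exact internals may differ between R versions. The variable names (arg, raw_name, checked_name, and so on) are only illustrative.

```r
# The unnamed argument as an unevaluated expression, written as in the question.
arg <- quote(Sub <- rep(c(1:3),each=3))

# deparse() turns the expression back into text, with normalised spacing.
# The header is built from this text, not from the value 1 1 1 2 2 2 3 3 3.
raw_name <- deparse(arg)
raw_name
# "Sub <- rep(c(1:3), each = 3)"

# make.names() replaces each character that is not allowed in a syntactic
# name (spaces, <, -, parentheses, :, commas, =) with a dot.
checked_name <- make.names(raw_name)
checked_name
# "Sub....rep.c.1.3...each...3."

# The same column built three ways.
unchecked <- data.frame(Sub <- rep(c(1:3), each = 3), check.names = FALSE)
checked   <- data.frame(Sub <- rep(c(1:3), each = 3))
named     <- data.frame(Sub = rep(c(1:3), each = 3))

names(unchecked)  # the deparsed expression
names(checked)    # the deparsed expression after make.names()
names(named)      # "Sub", because = inside a call names the argument
```

Inside a function call, = names an argument while <- is an ordinary assignment whose value is passed positionally, so only the = form gives data.frame() a column name to use.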