r stata dummy-variable

R equivalent of Stata `tabulate , generate( )` command


I want to mimic the behavior of Stata's tabulate , generate() command in R. The example below illustrates it; the command does two things. First, it produces a one-way table of frequency counts. Second, it generates a dummy variable for each value of the variable (var1), naming the dummies (d_1 - d_7) with the prefix (stubname) declared in the generate() option. My question concerns the second functionality. Base R solutions are preferred, but package-dependent ones are also welcome.

[Edit]: My final goal is to generate a data.frame() that emulates the last data set printed on the screen.

clear all
input var1 
0
1
2
2
2
2
42
42
777
888
999999
end
tabulate var1 ,gen(d_)

/*     var1 |      Freq.     Percent        Cum.
------------+-----------------------------------
          0 |          1        9.09        9.09
          1 |          1        9.09       18.18
          2 |          4       36.36       54.55
         42 |          2       18.18       72.73
        777 |          1        9.09       81.82
        888 |          1        9.09       90.91
     999999 |          1        9.09      100.00
------------+-----------------------------------
      Total |         11      100.00          */


list, sep(11)



/*   +--------------------------------------------------+
     |   var1   d_1   d_2   d_3   d_4   d_5   d_6   d_7 |
     |--------------------------------------------------|
  1. |      0     1     0     0     0     0     0     0 |
  2. |      1     0     1     0     0     0     0     0 |
  3. |      2     0     0     1     0     0     0     0 |
  4. |      2     0     0     1     0     0     0     0 |
  5. |      2     0     0     1     0     0     0     0 |
  6. |      2     0     0     1     0     0     0     0 |
  7. |     42     0     0     0     1     0     0     0 |
  8. |     42     0     0     0     1     0     0     0 |
  9. |    777     0     0     0     0     1     0     0 |
 10. |    888     0     0     0     0     0     1     0 |
 11. | 999999     0     0     0     0     0     0     1 |
     +--------------------------------------------------+ */

Solution

    set.seed(123)
    df = data.frame(var1 = factor(sample(10, 20, TRUE)))
    
    # `0 +` suppresses the intercept, so every level of var1 present in the data
    # gets its own dummy column; no base group is dropped.
    # (The value 1 is never drawn here, so there are 9 columns and d_1 marks var1 == 2.)
    df = data.frame(df, model.matrix(~0+var1, df))
    names(df)[-1] = paste0('d_', 1:(ncol(df)-1))
    df
        var1 d_1 d_2 d_3 d_4 d_5 d_6 d_7 d_8 d_9
    1     3   0   1   0   0   0   0   0   0   0
    2     3   0   1   0   0   0   0   0   0   0
    3    10   0   0   0   0   0   0   0   0   1
    4     2   1   0   0   0   0   0   0   0   0
    5     6   0   0   0   0   1   0   0   0   0
    6     5   0   0   0   1   0   0   0   0   0
    7     4   0   0   1   0   0   0   0   0   0
    8     6   0   0   0   0   1   0   0   0   0
    9     9   0   0   0   0   0   0   0   1   0
    10   10   0   0   0   0   0   0   0   0   1
    11    5   0   0   0   1   0   0   0   0   0
    12    3   0   1   0   0   0   0   0   0   0
    13    9   0   0   0   0   0   0   0   1   0
    14    9   0   0   0   0   0   0   0   1   0
    15    9   0   0   0   0   0   0   0   1   0
    16    3   0   1   0   0   0   0   0   0   0
    17    8   0   0   0   0   0   0   1   0   0
    18   10   0   0   0   0   0   0   0   0   1
    19    7   0   0   0   0   0   1   0   0   0
    20   10   0   0   0   0   0   0   0   0   1
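
The example above uses random data. As a minimal sketch, the same approach can be applied to the data from the question; the object names `f`, `dummies`, and `out` are only illustrative, and the sketch assumes var1 has no missing values (with the default `na.action`, `model.matrix()` drops rows containing `NA`, so the row counts would no longer match).

```r
# Data from the question's Stata example
df <- data.frame(var1 = c(0, 1, 2, 2, 2, 2, 42, 42, 777, 888, 999999))

# First functionality: one-way table of frequency counts
table(df$var1)

# Second functionality: one 0/1 column per distinct value.
# factor() sorts the distinct values, so d_1 is the smallest value, d_7 the largest.
f <- factor(df$var1)
dummies <- model.matrix(~ 0 + f)   # `0 +` suppresses the intercept: no level is dropped
colnames(dummies) <- paste0("d_", seq_len(nlevels(f)))

# Bind the dummies to the original column
out <- data.frame(df, dummies)
out
```

Because the columns follow the sorted order of the distinct values, `out` should have the same layout as the last Stata listing in the question: var1 followed by d_1 through d_7, with exactly one 1 per row.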