Creating temporary datasets in Stata


I am looking through lots of Stata documentation and still having a hard time finding simple examples of some basic tasks.

One item that is particularly difficult to understand is how to store the results of certain operations in variables (in the programming sense, not a field/column) so that I can compare them against one another. And I'm not talking about statistical models, for which I might use something like estimates.

Here's an example from R, in which I store the group means of two fields in two separate variables (again, in the programming sense):

library(dplyr)

category <- c('a', 'a', 'a', 'b', 'b', 'b', 'c', 'c', 'c')
first <- c(2, 1, 5, 3, 4, 2, 1, 3, 3) 
second <- c(3, 1, 6, 9, 12, 32, 113, 85, 123) 
df <- data.frame(category, first, second)

firstMean <- df %>% group_by(category) %>% summarise(mean = mean(first))
secondMean <- df %>% group_by(category) %>% summarise(mean = mean(second))

abs(firstMean[,2] - secondMean[,2])

# Results
# a 0.67
# b 14.67
# c 104.67

Questions:

  1. How can I accomplish the same task in Stata?

I am also reading about return list, but when I use the save command after calculating my mean, it seems to overwrite the previously run command.

  2. Is there some way I can name that in a temporary way?

I do not want to save these things to a file; I'm only looking to make quick, temporary data frames.


Solution

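  • You can put the results in a variable (Stata sense) with just one line. Displaying it without repetition is also fairly easy. Provided neither variable has missing values, the mean of the differences equals the difference of the means, so a single egen call is enough; the sign is kept here, whereas the R code takes the absolute value. @Joe Patten's answer is helpful, but it destroys the current dataset.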

    clear 
    input str1 category first second 
    a  2  3
    a  1  1
    a  5  6
    b  3  9 
    b  4  12
    b  2  32
    c  1  113
    c  3  85
    c  3  123
    end 
    
    egen difference = mean(first-second), by(category) 
    egen tag = tag(category) 
    
    list category difference if tag , noobs 
    
    tabdisp category, c(difference) format(%4.2f) 
    

    Here are the results:

    . list category difference if tag , noobs 
    
      +----------------------+
      | category   differe~e |
      |----------------------|
      |        a   -.6666667 |
      |        b   -14.66667 |
      |        c   -104.6667 |
      +----------------------+
    
    . tabdisp category, c(difference) format(%4.2f) 
    
    ----------------------
     category | difference
    ----------+-----------
            a |      -0.67
            b |     -14.67
            c |    -104.67
    ----------------------
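
    If a small table of group means is wanted as a dataset in its own right, without losing the data in memory, something like the following may serve. It is only a sketch, assuming the example data above are loaded and the current frame has the default name default; the names absdiff and means are made up for illustration. The second part needs Stata 16 or later, where frames were introduced.

    ```stata
    * Part 1: preserve/restore. The collapsed data exist only until restore.
    preserve
    collapse (mean) first second, by(category)
    * abs() mirrors the R code; absdiff is a hypothetical name
    generate absdiff = abs(first - second)
    list, noobs
    restore

    * Part 2 (Stata 16+): keep the summary in a separate frame in memory.
    * "means" is a hypothetical frame name; replace overwrites an old copy.
    frame copy default means, replace
    frame means {
        collapse (mean) first second, by(category)
        generate absdiff = abs(first - second)
        list, noobs
    }
    ```

    After restore the original nine observations are back in memory. The frame stays available alongside them until it is removed with frame drop means, and nothing is written to disk in either case.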
    

    What you are reaching for could be quite different, e.g. using local macros or scalars or using Mata. There are several ways of doing it in Stata, just as there will be in R.
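
    As one illustration of the scalar route, the sketch below copies r(mean) into named scalars immediately after each summarize, since the next r-class command replaces whatever is in r(). It assumes the example data above are in memory; the scalar names m1_a, m2_a and so on are invented for this example.

    ```stata
    * Loop over the three categories in the example data.
    foreach g in a b c {
        quietly summarize first if category == "`g'", meanonly
        * copy r(mean) before the next summarize overwrites it
        scalar m1_`g' = r(mean)
        quietly summarize second if category == "`g'", meanonly
        scalar m2_`g' = r(mean)
        display "`g'  " %6.2f abs(scalar(m1_`g') - scalar(m2_`g'))
    }
    ```

    Scalars persist for the rest of the session, or until scalar drop is used, so they can be compared later. A local macro would hold the same number but disappears when the do-file or program that defined it finishes.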

    As for documentation, there is no real substitute for reading the manuals, starting with the User's Guide [U]. You can waste a lot of time Googling, as there are many very limited or fragmentary tutorials on the internet. Mostly they serve the purpose their authors intended, but a fuller understanding is only possible through systematic reading.