Search code examples
statamultiple-databases

How to retrieve data from multiple Stata files?


I have 53 Stata .dta files each of them is 150 - 200 Mb and contain identical set of variables, but for different years. It is not useful to combine or merge them due to their size .

I need to retrieve some averaged values (percentages etc.) Therefore, I want to create a new Stata file New.dta and write a .do file that would run on that new Stata file in the following way: it should open each of those 53 Stata files, make certain calulations, and store the results in the new Stata file, New.dta.

I am not sure how i can keep two Stata file open simultaneuosly, and how can i store the calculated values?

When I open a second .dta file, how can i make the first one still be open? How can i store the calculated values in the global variable?


Solution

  • What springs to mind here is the use of postfile.

    Here is a simple example. First, I set up an example of several datasets. You already have this.

    clear
    
    forval i = 1/10 {
        set obs 100
        gen foo = `i' * runiform()
        save test`i'
        clear
    }
    

    Now I set up postfile. I need to set up a handle, what variables will be used, and what file will be used. Although I am using a numeric variable to hold file identifiers, it will perhaps be more typical to use a string variable. Also, looping over filenames may be a bit more challenging than this. fs from SSC is a convenience command that helps put a set of filenames into a local macro; its use is not illustrated here.

    postfile mypost what mean using alltest.dta
    
    forval i = 1/10 {
        use test`i', clear
        su foo, meanonly
        post mypost (`i')  (`r(mean)')
    }
    

    Now flush results

    postclose mypost
    

    and see what we have.

    u alltest
    
    list 
    
         +-----------------+
         | what       mean |
         |-----------------|
      1. |    1   .5110765 |
      2. |    2   1.016858 |
      3. |    3   1.425967 |
      4. |    4   2.144528 |
      5. |    5   2.438035 |
         |-----------------|
      6. |    6   3.030457 |
      7. |    7   3.356905 |
      8. |    8   4.449655 |
      9. |    9   4.381101 |
     10. |   10   5.017308 |
         +-----------------+
    

    I didn't use any global macros (not global variables) here; you should not need to.