How to retrieve data from multiple Stata files?

I have 53 Stata .dta files each of them is 150 - 200 Mb and contain identical set of variables, but for different years. It is not useful to combine or merge them due to their size .

I need to retrieve some averaged values (percentages etc.) Therefore, I want to create a new Stata file New.dta and write a .do file that would run on that new Stata file in the following way: it should open each of those 53 Stata files, make certain calulations, and store the results in the new Stata file, New.dta.

I am not sure how i can keep two Stata file open simultaneuosly, and how can i store the calculated values?

When I open a second .dta file, how can i make the first one still be open? How can i store the calculated values in the global variable?

Solution

What springs to mind here is the use of postfile.

Here is a simple example. First, I set up an example of several datasets. You already have this.

clear

forval i = 1/10 {
    set obs 100
    gen foo = `i' * runiform()
    save test`i'
    clear
}

Now I set up postfile. I need to set up a handle, what variables will be used, and what file will be used. Although I am using a numeric variable to hold file identifiers, it will perhaps be more typical to use a string variable. Also, looping over filenames may be a bit more challenging than this. fs from SSC is a convenience command that helps put a set of filenames into a local macro; its use is not illustrated here.

postfile mypost what mean using alltest.dta

forval i = 1/10 {
    use test`i', clear
    su foo, meanonly
    post mypost (`i')  (`r(mean)')
}

Now flush results

postclose mypost

and see what we have.

u alltest

list 

     +-----------------+
     | what       mean |
     |-----------------|
  1. |    1   .5110765 |
  2. |    2   1.016858 |
  3. |    3   1.425967 |
  4. |    4   2.144528 |
  5. |    5   2.438035 |
     |-----------------|
  6. |    6   3.030457 |
  7. |    7   3.356905 |
  8. |    8   4.449655 |
  9. |    9   4.381101 |
 10. |   10   5.017308 |
     +-----------------+

I didn't use any global macros (not global variables) here; you should not need to.