Search code examples
statastata-macros

How can I import specific files?


I am trying to import hundreds of U.S. county xls files together to form a complete dataset in Stata. The problem is that for every county, I have several files for different years, so that my list of file names looks like this:

county1-year1970.xls
county1-year1975.xls
county2-year1960.xls
county2-year1990.xls

For each county, I only want the file from the most recent year (which varies across counties).

So far, I have written code to loop through each possible file name, and if the file exists, to store the year in a local macro maxyear:

local years = 0
forvalues i = 1/500 {
    forvalues yr = 1900/2018 {
        capture confirm file county`i'-year`yr'.xls
        if _rc == 0 {
            local years `years' `yr'
        }
    }
    /* [code to extract the max value in `years'] */
    import excel county`i'-year`maxyear'.xls, clear
}

The loop seems to work, but it is still missing code that will extract the maximum value from the local list `years'. I want to use that maximum value to import the Excel sheet.

How can I identify the maximum value in a local macro or is there a simpler way to get what I want?


Solution

  • As you are looping over years from first possible to last possible, all you need is to keep track of the last valid year:

    forval i = 1/500 {
        local maxyear  
        forval yr = 1900/2018 {
            capture confirm file county`i'-year`yr'.xls
            if _rc == 0 local maxyear `yr'
        }
    
        if "`maxyear'" != "" {    
            import excel county`i'-year`maxyear'.xls, clear
        }
    }
    

    Otherwise put, keeping a record of all the years that were valid, and then finding the maximum over those, is more work than you need to do. (But notice that as you loop over increasing years, the maximum would just be the last item in your list.)

    This answer is close to the question, but @Pearly Spencer's answer is a neater solution in this case.