
How to tokenize an extended macro (local : dir)?


I know my title is confusing, since the tokenize command is meant to be applied to a string.

I have many folders that contain large numbers of separate, badly named Excel files (most of them scraped from a website). Selecting them manually is inconvenient, so I need to rely on Stata's extended macro function local : dir to read them.

My code looks as follows:

foreach file of local filelist {
    import excel "`file'", clear
    sxpose, clear 
    save "`file'.dta", replace
}

Such code generates many new dta files, so the directory fills up with them. I would prefer to create a single new data file from the first xlsx file and then append the others to it inside the foreach loop. So essentially, there's an if-else inside the loop.

We need an index of the macro filelist just created, so that we can write something like:

token `filelist'  // filelist is created in the former code

if "`i'" == `1' {
   import excel "`file'",clear
}
else {
   append using `i',clear
}

I know my code is inefficient and error-prone: the syntax of expression token 'filelist' is incorrect too (given that filelist is not a string). However, I still want to figure out the basic structure behind my pseudo code.

How could I correct my code and make it work?

A more efficient approach would also be very welcome.


Solution

  • Various techniques spring to mind, none of which entails tokenizing.

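    local count = 1 
    foreach file of local filelist {
        import excel "`file'", clear
        sxpose, clear 
    
        * first file: create alldata; later files: add alldata to the new data, then re-save
        if `count' == 1 {
            save alldata 
        }
        else {
            append using alldata 
            save alldata, replace
        }
    
        local ++count
    }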
    
    
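    local allothers "*" 
    foreach file of local filelist {
        import excel "`file'", clear
        sxpose, clear 
    
        * a macro holding "*" comments out its line; an empty macro leaves it active
        `firstonly'   save alldata 
        `allothers'   append using alldata 
        `allothers'   save alldata, replace
    
        local firstonly "*" 
        local allothers 
    }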
    

    In the second block, the point is that lines prefixed by * are treated as comments, so any command that * precedes is ignored ("commented out"). The append statement is commented out first time round the loop and the save statement is preceded by an undefined local macro, which evaluates to an empty string, so it is not ignored.

    After the first time round the loop, the commenting out is lifted from the append line and placed on the initial save instead.
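
    The commenting-out trick can be tried in isolation. A minimal sketch, in which the macro names skip and run are made up for illustration:

    ```stata
    * skip holds "*", so any line it prefixes becomes a comment
    local skip "*"

    * run is deliberately never defined, so it expands to nothing
    `skip' display "this line is ignored"
    `run'  display "this line is executed"
    ```

    Only the second display should produce output.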

    I don't think either of these approaches is more efficient than what you have in mind (works faster, uses less memory, is shorter, or whatever "efficient" means for you). The code clearly does presuppose that you have set up the file list correctly.
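
    For completeness, here is one way the file list might be set up before either loop. This is only a sketch: it assumes the Excel files sit in the current working directory and end in .xlsx, neither of which is stated in the question.

    ```stata
    * Assumption: files are in the current directory and match *.xlsx;
    * change the directory and the pattern to suit.
    local filelist : dir . files "*.xlsx"

    * Check what was found before looping over it
    local nfiles : word count `filelist'
    display "`nfiles' files found"
    foreach file of local filelist {
        display "`file'"
    }
    ```

    If the count is 0, the loops above simply do nothing, so it is worth checking the directory and pattern first.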