Importing files with different months in name


I have the following code:

local date "September"

global dir `c(pwd)'
global files "A B C" 

foreach x of global files { 
    import excel "${dir}/`x'_`date'.xlsx", sheet("1") cellrange(A3:O21) clear
    generate Store="`x'"
    save `x', replace
}

The problem is that not all of my file names contain September. Some contain August or May instead.

How can I incorporate a solution for this in the above script?

The idea is that the code should still run when no September file is found: it should fall back to August and, if that also fails, to May.


Solution

  • Suppose that the following files are stored in your working directory:

    A_September.xlsx
    B_August.xlsx
    C_May.xlsx
    

    You can use the macro extended function dir with wildcards to create a local macro, files, that contains the list of matching file names:

    local files : dir "`c(pwd)'" files "*_*.xlsx"                               
    
    foreach x of local files {
        display "`x'"
    }
    
    A_September.xlsx
    B_August.xlsx
    C_May.xlsx
    

    Typing help extended_fcn from Stata's command prompt will provide you with more information.

    Suppose now that in your working directory there are two additional files:

    ASeptember_34.xlsx
    C_May45.xlsx
    

    In this case, the two additional files will also be included in the list, because they match the wildcard pattern *_*.xlsx as well:

    local files : dir "`c(pwd)'" files "*_*.xlsx"                               
    
    foreach x of local files {
        display "`x'"
    }
    
    A_September.xlsx
    ASeptember_34.xlsx
    B_August.xlsx
    C_May.xlsx
    C_May45.xlsx
    

    In order to ignore these in your loop, you need to further filter the file names using a regular expression:

    local files : dir "`c(pwd)'" files "*_*.xlsx"                               
    
    foreach x of local files {
        * Anchor the pattern and escape the dot so that only names like A_September.xlsx match
        if ustrregexm("`x'", "^[A-Z]_([A-Z][a-z]+)\.xlsx$") display "`x'"
    }
    
    A_September.xlsx
    B_August.xlsx
    C_May.xlsx
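
    The same filter can be combined with the import step from the question. The following is a minimal sketch, not a tested solution: it assumes the files sit in the working directory, that the store name is a single capital letter, and that the sheet name and cell range from the question apply to every file. The locals store and month are names introduced here for illustration. On Windows, dir returns file names in lowercase unless the respectcase option is added, so the pattern may need adjusting there.

    ```stata
    * List candidate files, then keep only those named <Store>_<Month>.xlsx
    local files : dir "`c(pwd)'" files "*_*.xlsx"

    foreach f of local files {
        if ustrregexm("`f'", "^([A-Z])_([A-Z][a-z]+)\.xlsx$") {
            * Capture groups: 1 = store letter, 2 = month name
            local store = ustrregexs(1)
            local month = ustrregexs(2)
            display "Importing store `store' (`month')"

            import excel "`f'", sheet("1") cellrange(A3:O21) clear
            generate Store = "`store'"
            save "`store'", replace
        }
    }
    ```

    Because the month is taken from whatever file name is found, no month needs to be hard-coded. If a store has files for several months, each later file overwrites the dataset saved for the earlier one, so this sketch fits best when there is one file per store.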
    

    Note that the complexity of the required regular expression will depend on the patterns of the file names included in your working directory.
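
    If the explicit fallback order from the question matters (September first, then August, then May), an alternative is to test for each candidate file with confirm file. This is a sketch under the same assumptions as the question's code. The locals months, stores, and found are hypothetical names introduced here.

    ```stata
    * Months in order of preference; stores as in the question
    local months "September August May"
    local stores "A B C"

    foreach x of local stores {
        local found ""
        foreach m of local months {
            * confirm file returns a nonzero _rc when the file does not exist
            capture confirm file "`x'_`m'.xlsx"
            if _rc == 0 {
                local found "`m'"
                continue, break
            }
        }

        if "`found'" != "" {
            import excel "`x'_`found'.xlsx", sheet("1") cellrange(A3:O21) clear
            generate Store = "`x'"
            save "`x'", replace
        }
        else display as error "No file found for store `x'"
    }
    ```

    Here continue, break leaves the inner loop at the first month for which a file exists, so the order of the months list determines the priority.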