
Split or tokenize within Stata program with using statement?


I am trying to use a program to speed up a repetitive Stata task. This is the first part of my program:

program alphaoj 

syntax [varlist]  , using(string) occ_level(integer) ind_level(integer)


    import excel `using', firstrow
    display "`using'"
    split "`using'", parse(_)
    
    local year = `2'
    display "`year'"
    display `year'

When I run this program with the line alphaoj, ind_level(4) occ_level(5) using("nat4d_2002_dl.xls"), I receive the error: factor-variable and time-series operators not allowed r(101);

I am not quite sure what is being treated as a factor or time series operator.

I have also tried replacing split with tokenize, and parse(_) with parse("_"), but I still run into an error. In that case, it says: _ not found r(111);

Ideally, I would have it take the year from the filename and use that year as the local.

I am struggling with how I should perform this seemingly simple task.


Solution

  • The error occurs because the split command only operates on string variables in the dataset. You can't pass it a string literal, or a macro that expands to one. See help split for more details.

    You can achieve your goal of extracting the year from the filename and storing that as a local macro. See below:

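    program alphaoj 
        syntax [varlist], using(string)
    
        import excel `using', firstrow
        
        * Store the filename in a string variable so that split can work on it
        gen stringvar = "`using'"
    
        * Creates stringvar1, stringvar2, ... from the pieces between underscores
        split stringvar, parse(_)
        
        * Take the second piece (the year) from the first observation
        local year = stringvar2[1]
        display `year'
    end
    
    alphaoj, using("nat4d_2002_dl.xls")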
    

    The last line prints "2002" to the console.

    Alternative solution that avoids creating an extra variable:

    program alphaoj 
        syntax [varlist], using(string)
    
        import excel `using', firstrow
        
        * Take 4 characters starting at position 7 of the filename
        local year = substr("`using'",7,4)
        
        di `year'
    end
    
    alphaoj, using("nat4d_2002_dl.xls")
    

    Please note that this solution relies on every Excel filename having the year in the same character positions (characters 7 through 10).
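
As for the tokenize attempt in the question: tokenize works on a string directly, but it returns each parsing character as a token of its own. With parse("_"), the second token is therefore the underscore itself, and local year = `2' asks Stata to evaluate _ as an expression, which is the likely source of "_ not found". The year ends up in the third token. The lines below are a minimal sketch of that approach, assuming the filename always follows a prefix_year_suffix pattern; the macro name fname is only a stand-in for `using' inside the program.

```stata
* Hypothetical stand-in for the `using' macro inside the program
local fname "nat4d_2002_dl.xls"

* tokenize returns the parse character as its own token, so here:
*   `1' = nat4d   `2' = _   `3' = 2002   `4' = _   `5' = dl.xls
tokenize "`fname'", parse("_")

* Copy the third token as text rather than evaluating it as an expression
local year "`3'"
display "`year'"
```

Inside a program, the same two lines could follow the syntax statement, with `using' in place of `fname'. Unlike the substr() version, this depends only on where the underscores fall, not on the length of the prefix.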