
Dropping a subset of variables with similar names


I have to import and alter a large set of csv files. At some point in this process I want to use the following code

    local vlist materialcost* costofemployees* city country 

    foreach v in vlist{
        capture drop  `v'
    }

However, locals don't allow an open-ended description like materialcost*. Alternatively, if I try

    foreach v of var materialcost* costofemployees* city country {
        capture drop  `v'
    }

I run into a problem: Stata raises an error whenever it cannot find a variable whose name starts with materialcost.

So I want Stata to drop all variables that start with materialcost if they are actually present in the data.


Solution

  • First off, the statement that locals don't allow an open-ended description like materialcost* is incorrect, both in practice and in principle.

    I just did it in Stata:

    . local vlist materialcost* costofemployees* city country
    
    . di "`vlist'"
    materialcost* costofemployees* city country
    

    So, the assignment to a local can and does work in practice. The principle is that locals are just containers for text and are not fussy about what that text might be (or mean). The only restriction is a length limit, which does not bite here and rarely does.

    The reason you said that may be that this didn't do what you wanted.

    foreach v in vlist {
       capture drop  `v'
    }
    

    That syntax is legal, but it is not the way to get foreach to work through the contents of the local just defined: after in, foreach takes the word vlist literally rather than as the name of a local. That loop boils down to

    capture drop vlist 
    

    which will do nothing unless vlist is a variable name in your dataset.
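
    For comparison, here is a minimal sketch of how the loop could be made to walk through the contents of the local. It assumes the local vlist is defined as in the question; foreach's "of local" form takes the name of the macro, without quotes.

    ```stata
    * Define the local exactly as in the question; wildcards are stored as plain text
    local vlist materialcost* costofemployees* city country

    * "of local" expects the macro NAME, so vlist is written without quotes
    foreach v of local vlist {
        * each pass hands one word (e.g. materialcost*) to drop;
        * capture swallows the error if nothing matches
        capture drop `v'
    }
    ```

    Writing foreach v in `vlist' { with the macro dereferenced should behave the same way, since the local is expanded to its text before foreach sees the line.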

    Your main problem is this. If you say

    foreach v of var materialcost* costofemployees* city country { 
    

    then it's essential that what follows the var keyword is in fact a variable list. If it is not, foreach will throw you out without entering the loop. It doesn't ignore exceptions. The capture inside the loop can't help; Stata never gets that far. However, my guess is that

    foreach v in materialcost* costofemployees* city country {
        capture drop  `v'
    }
    

    should work for you because (a) you don't make a claim that may be wrong about what variables may exist and (b) capture does the work of catching any error that drop will raise if such variables don't exist.
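
    To check this behaviour without touching real data, one could try the loop on a toy dataset. The sketch below is only an illustration: the names materialcost2019, materialcost2020 and sales are made up for the test, and costofemployees* and country are deliberately absent.

    ```stata
    * Toy dataset with hypothetical variable names, for illustration only
    clear
    set obs 3
    generate materialcost2019 = 1
    generate materialcost2020 = 2
    generate city = "x"
    generate sales = 3

    * Each word is passed to drop on its own, so a pattern that matches
    * nothing fails alone (silently, thanks to capture) without
    * blocking the other drops
    foreach v in materialcost* costofemployees* city country {
        capture drop `v'
    }

    * Only sales should remain
    describe
    ```

    The loop is worth having because a single capture drop materialcost* costofemployees* city country would drop nothing at all whenever any one of the four items failed to match.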