
Using regular expressions or subinstr() to clean local macros


My aim is to clean a local macro by stripping the underscore, and all digits that follow it, from the end of each word. Assume that underscores followed by digits occur only at the end of words.

With subinstr(), I can remove one specific suffix such as _1 (and could add a second loop over the other numbers), but the resulting double-loop syntax seems overly complicated for such a task:

local list_x `" "rep78_3" "make_1" "price_1" "mpg_2" "'
local n_x : list sizeof list_x

forvalues j = 1/`n_x' {
    local varname: word `j' of `list_x'
    local clean_name: subinstr local varname "_1" "" 
    display "`clean_name'" 
}

I tried to look into regexm() and regexs(), but I am not quite sure how to set up the code.

I understand there might be multiple ways to solve this.

Maybe there is a simpler way to address the issue that I cannot see?


Solution

  • With the Unicode regex functions introduced in Stata 14, you can replace all matches at once: ustrregexra() removes every occurrence of the pattern in a single call, so no loop is needed.

    . local list_x `" "rep78_3" "make_1" "price_1" "mpg_2" "'
    
    . local fixed = ustrregexra(`"`list_x'"', "_[0-9]+","")
    
    . dis `"`fixed'"'
     "rep78" "make" "price" "mpg"
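
If you are on a version older than Stata 14, or want the pattern anchored to the end of each word, a per-word loop is one possible alternative. The following is only a sketch: it assumes the same list_x as in the question, the macro name cleaned is made up for illustration, and it uses the older regexr() function, which replaces only the first match in a string.

```stata
* Sketch: strip a trailing _<digits> from each word of a local macro.
* list_x comes from the question; the macro name cleaned is hypothetical.
local list_x `" "rep78_3" "make_1" "price_1" "mpg_2" "'
local cleaned
foreach w of local list_x {
    * regexr() replaces the first match; $ anchors it to the end of the word
    local w = regexr("`w'", "_[0-9]+$", "")
    local cleaned `cleaned' `w'
}
display "`cleaned'"
```

Because each word is handled separately, the $ anchor guarantees that only a trailing suffix is removed, at the cost of one loop. Unlike the one-line version above, the result is a plain space-separated list without the surrounding double quotes.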