Search code examples
stringstatarecode

Recode A String Variable


input VAR1 VAR2
A1 1
A2 0
A3 1
A4 1
A5 1
A6 1
A7 1
A8 1
A9 1
A10 1
A15 1
B7 0
A1 0
A16 1
A17 1
A18 1
A19 1
A20 0
A21 1
end

Say you have data such as the ones shown. I have VAR1 and wish to create from it VAR2 which takes values 1 if VAR1 contains at the beginning: A1, A3-A10, A15-A19, A21 or if not then it is zero. I believe for this you can use strpos(VAR1) but is it possible to say for example: strpos(VAR1, "A1, A3/A10, A15/A19, A21") ?


Solution

  • The following works if you have a small number of strings of interest. You may need an alternate approach if you are searching for a larger number of strings, where writing out the ranges of strings (e.g. A3-A10) is unfeasible.

    clear
    input str3 VAR1 VAR2
    A1 1
    A2 0
    A3 1
    A4 1
    A5 1
    A6 1
    A7 1
    A8 1
    A9 1
    A10 1
    A15 1
    B7 0
    A1 1
    A16 1
    A17 1
    A18 1
    A19 1
    A20 0
    A21 1
    end
    
    gen wanted = 0
    
    local mystrings = "A1 A3 A4 A5 A6 A7 A8 A9 A10 A15 A16 A17 A18 A19 A21"
    
    foreach string in `mystrings' {
        replace wanted = 1 if strpos(VAR1, "`string'") == 1
    }
    
    assert wanted == VAR2
    

    Note that in your example input, the second occurrence of A1 had a value of 0 but should have a value of 1 according to your post.

    Here is a more generalisable solution for larger ranges of strings:

    gen     A = 0
    replace A = 1 if strpos(VAR1,"A") == 1
    
    gen newvar = substr(VAR1,2,.)
    destring newvar, replace
    
    gen wanted = 0
    replace wanted = 1 if A == 1 & (inlist(newvar,1,21) | inrange(newvar,3,10) | inrange(newvar,15,19))
    
    assert wanted == VAR2