Variable names that differ only in case after append


I have appended multiple files into a single Stata dataset, which now has 335 variables. Some variable names differ only in case, such as almirah and ALMIRAH, which store the same information coming from different source datasets.

At the moment I combine each pair by hand, one at a time, like this:

count if mi(almirah)
local first=r(N)

count if mi(ALMIRAH)
local sec=r(N)

if first<sec {
    replace almirah=ALMIRAH if mi(almirah)
}
else {

}

How can I do this for all pairs of variables that are essentially the same variable but whose names differ only in upper and lower case?


Solution

  • Suppose you have variables frog, toad and newt, and also FROG, TOAD and NEWT. Let's decide that the variable with the lowercase name is definitive. Then a loop containing some or all of the following may help:

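    foreach v in frog toad newt { 
        local V = upper("`v'") 
        * use the uppercase value only where the lowercase one is missing
        generate `v'2 = cond(missing(`v'), `V', `v') 
        * count observations where both versions exist but disagree
        display as text "`v' vs `V':"
        count if !missing(`v', `V') & `v' != `V'
    }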
    

    I create a new variable rather than overwriting `v' because there may be other problems, such as observations where both versions are present but disagree. If there are, overwriting your data may hide them.
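    Once the check shows no conflicts, you may want the combined variable to take over the original name. Here is a minimal sketch, assuming that both versions of each pair are numeric (or both string) and using a small hypothetical dataset so that it runs on its own. It drops a pair and renames the combined variable only when the two versions never disagree; otherwise it keeps all three variables for inspection.

```stata
* Toy data standing in for the appended file (hypothetical values)
clear
input frog FROG newt NEWT
1 . 4 .
. 2 . 5
3 . 7 8
end

foreach v in frog newt {
    local V = upper("`v'")
    * combined variable: lowercase value, else the uppercase one
    generate `v'2 = cond(missing(`v'), `V', `v')
    * observations where both versions are present but disagree
    quietly count if !missing(`v', `V') & `v' != `V'
    if r(N) == 0 {
        * no conflicts: replace the pair by the combined variable
        drop `v' `V'
        rename `v'2 `v'
    }
    else display as error "`v'/`V' disagree in " r(N) " observations; kept all three"
}
```

    With these toy data, frog is merged cleanly, while newt, NEWT and newt2 are all kept because observation 3 holds 7 in one version and 8 in the other.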

    Note: In your code segment you need at least

     if `first' < `sec'
    

    to make it legal; otherwise Stata interprets first and sec as names of variables or scalars, not as local macros. But it's really not clear why the numbers of missing values matter at all. If I have 42 observations and then append 66 more, the result should be the same as appending them the other way round, so the counts of missing values should not decide which version to keep.
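    To avoid typing the list by hand for all 335 variables, the pairs can be found automatically. This is a sketch under stated assumptions: the definitive name is all lowercase and its twin is all uppercase (a name such as Almirah would not be caught), and no variable named like almirah2 already exists. The toy data below are hypothetical.

```stata
* Toy data (hypothetical): almirah/ALMIRAH form a pair, age has no twin
clear
input almirah ALMIRAH age
1 . 30
. 0 41
end

* names of all variables currently in memory
unab allvars : _all
foreach v of local allvars {
    local V = upper("`v'")
    * consider only all-lowercase names whose uppercase form differs
    if "`v'" == lower("`v'") & "`V'" != "`v'" {
        * does an exact uppercase twin exist?
        capture confirm variable `V', exact
        if _rc == 0 {
            generate `v'2 = cond(missing(`v'), `V', `v')
            quietly count if !missing(`v', `V') & `v' != `V'
            display as text "`v'/`V': " r(N) " conflicting observations"
        }
    }
}
```

    The exact option stops an uppercase name from matching a longer variable that it merely abbreviates, and capture lets the loop carry on when a variable has no uppercase twin.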