I have appended multiple files into a single Stata dataset. It now has 335 variables. Some variable names differ only in case, such as almirah and ALMIRAH, which store the same information from different datasets.
I am replacing these variables one by one, like this:
count if mi(almirah)
local first=r(N)
count if mi(ALMIRAH)
local sec=r(N)
if first<sec {
replace almirah=ALMIRAH if mi(almirah)
}
else {
}
How do I program this for all variables which are the same variable in essence but have upper and lower case issues like this?
Suppose you have frog toad newt and FROG TOAD NEWT. Let's decide that the variable with the lower-case name is definitive. Then a loop with some or all of this may be helpful:
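foreach v in frog toad newt {
    // build the upper-case counterpart of the name, e.g. frog -> FROG
    local V = upper("`v'")
    // keep the lower-case value; use the upper-case one where it is missing
    generate `v'2 = cond(missing(`v'), `V', `v')
    display
}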
I have created a new variable there because there may be other problems. If there are, overwriting your data may obscure what they are.
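One such problem would be observations where both versions are non-missing but disagree. A minimal sketch of a check for that, using the same hypothetical frog toad newt names and assuming each pair has the same storage kind (both numeric or both string):

```stata
foreach v in frog toad newt {
    local V = upper("`v'")
    // count observations where both versions are present but differ
    count if !missing(`v') & !missing(`V') & `v' != `V'
    display "`v': " r(N) " observations where the two versions disagree"
}
```

If every count is zero, combining the two versions loses nothing; otherwise the disagreeing observations are worth listing before anything is overwritten.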
Note: In your code segment you need at least
if `first' < `sec'
to make it legal, as references to first and sec will otherwise be interpreted as references to variables or scalars. But it's really not clear why the numbers of missing values are material at all. If I have 42 observations and then append 66 more, the result should be the same as the other way round.
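With 335 variables, typing each name is impractical. A rough sketch that finds the pairs automatically might look like the following. It is not tested against your data and assumes that every name containing upper-case letters has at most one lower-case twin, that both are numeric or both are string, and that the new name with the suffix 2 is free and within Stata's 32-character limit:

```stata
foreach V of varlist _all {
    local v = lower("`V'")
    // only names that contain at least one upper-case letter
    if "`V'" != "`v'" {
        // is there a variable with exactly the lower-case name?
        capture confirm variable `v', exact
        if _rc == 0 {
            generate `v'2 = cond(missing(`v'), `V', `v')
        }
    }
}
```

The exact option stops confirm variable from accepting an abbreviation of some other variable name. As before, the result goes into a new variable so that nothing in the original data is overwritten.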