My dataset contains multiple variables, avar_1 to bvar_10, referring to the history of an individual. For some reason, the history is not always complete and there are some "gaps" (e.g. avar_1 and avar_4 are non-missing, but avar_2 and avar_3 are missing). For each individual, I want to store the first non-missing value in a new variable called var1, the second non-missing value in var2, etc., so that I have a history without missing values.
I've tried the following code:
local x=1
foreach wave in a b {
    forval i=1/10 {
        capture drop var`x'
        generate var`x'=.
        capture replace var`x'=`wave'var`i' if !mi(`wave'`var'`i')
        if (!mi(var`x')) {
            local x=1+`x'
        }
    }
}
var1 is generated properly, but var2 contains only missing values and the following variables are not generated. However, I set trace on and saw that var2 is actually replaced for all variables from avar_1 to bvar_10.
My guess is that the local x is not correctly updated, as its value changes for the whole dataset but should be different for each observation.
Is that the problem and if so, how can I avoid it?
A concise concrete data example is worth more than a long explanation. Your description seems consistent with an example like this:
* Example generated by -dataex-. To install: ssc install dataex
clear
input str1 id float(avar_1 avar_2 avar_3 bvar_1 bvar_2)
"A" 1 . 6 8 10
"B" 2 4 . 9 .
"C" 3 5 7 . 11
end
* 4 is specific to this example: the avar_ variables end at 3, so bvar_1 and bvar_2 become avar_4 and avar_5.
rename (bvar_*) (avar_#), renumber(4)
reshape long avar_, i(id) j(which)
(note: j = 1 2 3 4 5)
Data wide -> long
-----------------------------------------------------------------------------
Number of obs. 3 -> 15
Number of variables 6 -> 3
j variable (5 values) -> which
xij variables:
avar_1 avar_2 ... avar_5 -> avar_
-----------------------------------------------------------------------------
* Drop the gaps, then renumber the remaining values 1, 2, ... within each id.
drop if missing(avar_)
bysort id (which) : replace which = _n
list, sepby(id)
+--------------------+
| id which avar_ |
|--------------------|
1. | A 1 1 |
2. | A 2 6 |
3. | A 3 8 |
4. | A 4 10 |
|--------------------|
5. | B 1 2 |
6. | B 2 4 |
7. | B 3 9 |
|--------------------|
8. | C 1 3 |
9. | C 2 5 |
10. | C 3 7 |
11. | C 4 11 |
+--------------------+
Positive points:
Your data layout cries out for some structure given by a rename and especially by a reshape long. I don't give code here for a reshape wide because, for the great majority of Stata purposes, you'd be better off with this layout.
Negative points:
When used with the if command, !mi(var`x') returns whether the first value of a variable is not missing. If foo were a variable in the dataset, !mi(foo) is evaluated as !mi(foo[1]). That is not what you want here. See https://www.stata.com/support/faqs/programming/if-command-versus-if-qualifier/ for the full story.
I'd recommend more evocative variable names.
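To make the point about !mi(foo) concrete, here is a minimal sketch contrasting the if command with the if qualifier. The variable foo and its three values are made up purely for illustration; they are not from the question's data.

```stata
clear
* Hypothetical toy data: the first value of foo is missing.
input float foo
.
1
2
end

* if command: foo is evaluated as foo[1], so only the first observation matters.
if !mi(foo) {
    display "first value of foo is not missing"
}
else {
    display "first value of foo is missing"
}

* if qualifier: the condition is evaluated separately for each observation.
count if !mi(foo)
```

The if command takes the else branch here because foo[1] is missing, whereas count with the if qualifier checks every observation and counts the two non-missing values. The loop in the question relies on the first behaviour, which is why the counter x does not track individual observations.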