My data are structured as follows (dataex output at the end, but it is confusing since it only shows the numeric time expressions):
id yearmo birthmo smoke surveytime health
1 2002m1 2003m11 0.8 0 .
1 2002m2 2003m11 0.7 0 .
[...]
1 2004m1 2003m11 0.5 1 "good"
I merged a panel data set containing yearly survey information (e.g. on my dependent variable health) with monthly information on smoking exposure (numeric). My time variable yearmo contains the year and month and is in %tm format. birthmo is the individual's birth month and year and has the same format.
I want to generate a variable containing the total smoke exposure during pregnancy, which is in the period birthmo[_n-1], birthmo[_n-10].
Is it possible to use egen prebirth_smoke = total(smoke) and refer it to this time period? I couldn't find anything so far. But since it is possible to calculate time differences like gen age2000 = (14610-birthday)/365.25 referring to a variable indicating the birthday, I thought there must be a solution for my problem as well...
My other approach would be to fill up the survey information for each month and use a command like by persnr: egen prebirth_smoke=total(smoke, smoke[_n-10]) if birthmo = moyear. Then I would have to copy this information to each month of the year again and collapse the data to yearly information. Is there any easier way?
* Example generated by -dataex-. To install: ssc install dataex
clear
input double persnr float(moyear birthmo surveytime) double smoke
23908 504 . 0 23.96554252199413
23908 505 . 0 16.531705948372615
23908 506 . 0 19.731182795698928
23908 507 . 0 15.172916666666667
23908 508 . 0 12.199596774193546
23908 509 . 0 12.218055555555557
23908 510 . 0 10.207416911045943
23908 511 . 0 11.54166666666667
23908 512 . 0 14.311111111111112
23908 513 . 0 16.728005865102638
23908 514 . 0 22.759722222222226
23908 515 . 0 21.10752688172043
23908 516 515 1 27.638440860215056
23908 517 515 1 24.914434523809522
23908 518 515 1 22.103515874027796
23908 519 515 1 16.881249999999998
23908 520 515 1 14.51930596285435
23908 521 515 1 10.573909068193176
23908 522 515 1 10.057123655913978
23908 523 515 1 12.2486559139785
end
format %tm moyear
format %tm birthmo
This works for your example data:
egen BIRTHMO = mean(birthmo), by(persnr)
egen exposure = total(inrange(BIRTHMO - moyear, 1, 10) * smoke), by(persnr)
For a survey of related technique, see https://www.stata-journal.com/sjpdf.html?articlenum=dm0055
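To see why the two lines above work, here is a minimal sketch on made-up data. The toy values, and the helper variable inwindow, are assumptions for illustration only and are not part of the original data. The first egen call copies the birth month to every row of the person, including the rows where birthmo is missing; inrange() then flags the months that lie 1 to 10 months before birth, and multiplying by that 0/1 flag keeps only those months in the total.

```stata
* Toy data (hypothetical): one person, 14 months, birth month 515 (2002m12)
clear
set obs 14
generate double persnr = 1
generate float moyear = 502 + _n                 // 503, ..., 516
generate float birthmo = 515 if moyear >= 516    // known only after birth, as in the example
generate double smoke = moyear - 500             // 3, ..., 16
format %tm moyear birthmo

* Spread the birth month to all rows of each person
egen BIRTHMO = mean(birthmo), by(persnr)

* 1 if the month lies 1 to 10 months before birth, 0 otherwise
generate byte inwindow = inrange(BIRTHMO - moyear, 1, 10)

* Months 505-514 are in the window: 5 + 6 + ... + 14 = 95
egen exposure = total(inwindow * smoke), by(persnr)

list moyear BIRTHMO inwindow smoke exposure, sepby(persnr)
```

The result is a constant within each person, so it is already in place on the survey rows and nothing needs to be copied across months afterwards.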
Small points:
There is nothing confusing about the dataex code. The point is that its implications are clear once you run it.
Your egen code wouldn't work, even in spirit. The total() argument is illegal. Typo: second = should be ==. Warning: the help for egen is explicit about not using subscripted expressions.
egen, sum() is undocumented as of Stata 9. Best to use and refer to total(). The code is equivalent, but the advice still holds.
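On the first small point, a quick sketch of how to read the numbers in the dataex listing: monthly dates are stored as the number of months since 1960m1, and the %tm format only changes how they are displayed. The value 515 is taken from the example data; the rest is illustration.

```stata
* 515 months after 1960m1 is December 2002
display %tm 515          // shows 2002m12
display tm(2002m12)      // shows 515
display ym(2002, 12)     // shows 515

* Differences between monthly dates are therefore plain month counts
display tm(2002m12) - tm(2002m2)   // shows 10
```

This is why BIRTHMO - moyear can be compared directly with 1 and 10.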