I have a list of person ids, and the types of medicines they got on specific dates.
I would like to create a variable count
whereby I can give the indicator 1 to the first drug that occurs, 2 to the second unique drug and 3 to the third unique drug. When the first drug occurs after the second and third, I want it to still have the indicator 1. Likewise for unique drug 2, it should maintain the value 2 throughout the person's whole medication history, and the same for drug 3.
+-------------------------------------+
| p_id date agent_~e count |
|-------------------------------------|
38. | 1001 13dec2001 thiazide 1|
39. | 1001 12apr2002 thiazide 1|
40. | 1001 15jul2002 thiazide 1|
41. | 1001 28aug2002 arb 2|
42. | 1001 26sep2002 CCB 3|
|-------------------------------------|
43. | 1001 26sep2002 arb 2|
44. | 1001 10oct2002 CCB 3|
45. | 1001 10oct2002 thiazide 1|
46. | 1001 10oct2002 arb 2|
47. | 1001 10dec2002 CCB 3|
|-------------------------------------|
48. | 1001 10dec2002 arb 2|
+-------------------------------------+
Because each person has a different set of drugs, I think I need quite a general solution as opposed to something like
gen count = 1 if agent_type == "thiazide".
For example, person two is below and they have a very different drug history to person one above.
+-------------------------------+
| p_id date agent_t~e |
|-------------------------------|
207. | 2001 08jul1999 ace_inhib |
208. | 2001 02aug1999 ace_inhib |
209. | 2001 25aug1999 ace_inhib |
210. | 2001 22oct1999 ace_inhib |
211. | 2001 18nov1999 CCB |
|-------------------------------|
212. | 2001 18nov1999 ace_inhib |
213. | 2001 14dec1999 CCB |
214. | 2001 12jan2000 CCB |
215. | 2001 03feb2000 CCB |
216. | 2001 03feb2000 arb |
|-------------------------------|
217. | 2001 02mar2000 CCB |
+-------------------------------+
"Unique" is a common misnomer here; strictly, it means occurring once only, which is not what you mean at all. "Distinct" is a much better word: for a discussion in Stata context, see here.
Please find out about dataex
from SSC to be able to show data examples that can be copied and pasted directly. Yours required some engineering to be made easy to use.
Your problem is already a Stata FAQ found here. It is a good idea to look through the FAQs before posting.
* Example generated by -dataex-. To install: ssc install dataex
clear
input float p_id str8 agent_type float(wanted date)
1001 "thiazide" 1 15322
1001 "thiazide" 1 15442
1001 "thiazide" 1 15536
1001 "arb" 2 15580
1001 "CCB" 3 15609
1001 "arb" 2 15609
1001 "CCB" 3 15623
1001 "thiazide" 1 15623
1001 "arb" 2 15623
1001 "CCB" 3 15684
1001 "arb" 2 15684
2001 "ace_inhi" 1 14433
2001 "ace_inhi" 1 14458
2001 "ace_inhi" 1 14481
2001 "ace_inhi" 1 14539
2001 "CCB" 2 14566
2001 "ace_inhi" 1 14566
2001 "CCB" 2 14592
2001 "CCB" 2 14621
2001 "CCB" 2 14643
2001 "arb" 3 14643
2001 "CCB" 2 14671
end
format date %td
bysort p_id agent_type (date) : gen firstdate = date[1]
egen group = group(p_id firstdate agent_type)
bysort p_id (group date agent_type): gen count = sum(group != group[_n-1])
assert count == wanted
Note that the code takes care of the possibility that two or more drugs are first used on the same day by the same person.