
Grouping observations with same ID


I've taken on a project in which the dataset stores the patient information and the disease information as two separate observations that share the same ID. Example:

ID: 12345 Patient age: 23, patient weight: 55, patient height: 180

ID: 12345 Disease progression: A, disease type: abc, Disease risk: 50

This goes on for every patient.

Now I want to run some statistics on this, and as far as I know I first have to consolidate the information into ONE observation per patient, so that the patient characteristics and the disease characteristics sit in the same observation. What would be the best way to do this? Bear in mind that the two observations have mutually exclusive variables, so they can simply be "grouped" somehow.

* Example generated by -dataex-. To install: ssc install dataex
clear
input str32 record_id byte(treatment gender pcos aneurysmal_finding sz_anu_a_basilaris sz_anu_a_basilaris_2 sz_anu_a_basilaris_4)
"8a36ac06e58a541430cd8b31df3aeef2" . 1 0 2 . . .
"8a36ac06e58a541430cd8b31df3aeef2" . . . . . . .
"2afc1f12901992a1f973cde814615349" . 1 0 2 . . .
"2afc1f12901992a1f973cde814615349" . . . . . . .
"1e00442745c25082a64197b96065f755" . 1 0 2 . . .
"1e00442745c25082a64197b96065f755" . . . . . . .
"c90aef04e29f38fc3e21b919d5106ce8" . 1 0 2 . . .
"c90aef04e29f38fc3e21b919d5106ce8" . . . . . . .
"7cac71f3d31c7e9ec26e6a885ad554ab" . 2 0 2 . . .
"7cac71f3d31c7e9ec26e6a885ad554ab" . . . . . . .
"53c1f08aff25ace9afc46aca3263e7ca" . 1 0 2 . . .
"53c1f08aff25ace9afc46aca3263e7ca" . . . . . . .
"cdbf4328e0724f30950e437bc6bbe262" . 2 0 2 . . .
"cdbf4328e0724f30950e437bc6bbe262" . . . . . . .
"50d722dca92aee72c39c846066850a22" 1 2 0 2 . . .
"50d722dca92aee72c39c846066850a22" . . . . . . .
"ffe78f8927a81a5521f098aa077a755f" . 1 0 1 . . .
"ffe78f8927a81a5521f098aa077a755f" . . . . . . .
"aa2309be5c9b76012462fce3f43a8249" . 1 0 1 . . .
"aa2309be5c9b76012462fce3f43a8249" . . . . . . .
"4917b3d300e195b895e573474be6ccb6" . 1 0 2 . . .
"4917b3d300e195b895e573474be6ccb6" . . . . . . .
"b88557884343831060297ff4b67aeb36" . 1 . 2 . . .
"b88557884343831060297ff4b67aeb36" . . . . . . .
"ebe8ab86719aa71b68d7f0df3e451ce5" . . . 2 . . .
"8dd5267472002c796ce621984f9024ed" . . . . . . 3
"0b3e110c9765e14a5c41fadcc3cfc300" . . . . . . .
"8f58545ef8d37f290d26881743137a72" 1 2 0 2 . . .
"8f58545ef8d37f290d26881743137a72" . . . . . . .
"dcb6a27d1d4f5f1228860a76fa29e5ba" . 1 0 2 . . .
"dcb6a27d1d4f5f1228860a76fa29e5ba" . . . . . . .
"baedce78f2e736fe4d54dbdbe0460694" . 2 0 2 . . .
"baedce78f2e736fe4d54dbdbe0460694" . . . . . . .
"bb1db3b0eca9652cff3c76060b06d60b" 1 2 0 2 . . .
"bb1db3b0eca9652cff3c76060b06d60b" . . . . . . .
"6741bd218feba9de630dfe409a4e50ee" 1 2 0 2 . . .
"6741bd218feba9de630dfe409a4e50ee" . . . . . . .
"1e1425d670466e1a2c6c752d9227df17" . 2 0 2 . . .
"1e1425d670466e1a2c6c752d9227df17" . . . . . . .
"4c6672a06addc8e01842d2741be1857d" . 1 0 2 . . .
"4c6672a06addc8e01842d2741be1857d" . . . . . . .
"f1be80fbb7e4e1f5582780e25bfc8a2c" . 2 0 2 . . .
"f1be80fbb7e4e1f5582780e25bfc8a2c" . . . . . . .
"9991ec586e5f510e161fcad93fb1d79f" . 1 0 2 . . .
"9991ec586e5f510e161fcad93fb1d79f" . . . . . . .
"5c1eb56eccf9cf67ae6065f82b6eb6ce" . 1 0 2 . . .
"5c1eb56eccf9cf67ae6065f82b6eb6ce" . . . . . . .
"f9d10d2eb1951fa2ebc8b0509bb25593" . 1 0 2 . . .
"f9d10d2eb1951fa2ebc8b0509bb25593" . . . . . . .
"fbdf663512805caffe7a99d14fc9561f" . 2 0 2 . . .
"fbdf663512805caffe7a99d14fc9561f" . . . . . . .
"3b55aebe1b4b22e0c77168acc4b775dd" . 1 0 2 . . .
"3b55aebe1b4b22e0c77168acc4b775dd" . . . . . . .
"5f28194ddef4f9d057db2e4fcb7b5cf0" . 1 0 2 . . .
"5f28194ddef4f9d057db2e4fcb7b5cf0" . . . . . . .
"0b8d8253a8415275dbc2619e039985bb" . 1 0 2 5 . .
"0b8d8253a8415275dbc2619e039985bb" . . . . . . .
"4fb152c8524750b65b6717282cceb805" . 1 0 2 . . .
"4fb152c8524750b65b6717282cceb805" . . . . . . .
"ff5136e64c2110c355debca6acb74a13" . 1 0 2 . . .
"ff5136e64c2110c355debca6acb74a13" . . . . . . .
"29534fe6f18b75090b9d18f853ed7ec1" . 1 0 2 5 5 .
"29534fe6f18b75090b9d18f853ed7ec1" . . . . . . .
"8c334d2225db0661b25cf5f2c65fbcb9" . 1 0 2 . . .
"8c334d2225db0661b25cf5f2c65fbcb9" . . . . . . .
"68cf4b9f2db11cb9cf44fd0e03c53f16" . 2 . 2 . . .
"68cf4b9f2db11cb9cf44fd0e03c53f16" . . . . . . .
"6a44e65e7b1f33a3603acf2532bb40f9" . 1 0 2 . . .
"6a44e65e7b1f33a3603acf2532bb40f9" . . . . . . .
"2ed013748bf88df47c39d83bd48d8040" . 1 0 2 . . .
"2ed013748bf88df47c39d83bd48d8040" . . . . . . .
"c2f32f5b61b97d658f7b042b49b8da96" . 1 0 2 . . .
"c2f32f5b61b97d658f7b042b49b8da96" . . . . . . .
"58e1e0b5c29dee7d3739ec582d62b84c" . . . . . . .
"58e1e0b5c29dee7d3739ec582d62b84c" . . . . . . .
"8635b098d70b200fe8eef5dbf7c1c156" . 2 0 2 . . .
"8635b098d70b200fe8eef5dbf7c1c156" . . . . . . .
"266f1f1517fb50bafca92fff39c259d5" . 1 0 2 . . .
"266f1f1517fb50bafca92fff39c259d5" . . . . . . .
"d3df754a7322c02ed89f1208977a19ae" 1 2 0 2 5 . .
"d3df754a7322c02ed89f1208977a19ae" . . . . . . .
"46598c5d2da10731582d6342944e9337" . 1 0 2 . . .
"46598c5d2da10731582d6342944e9337" . . . . . . .
"8c2c5aa9b02eb1092b34cf38c2b1c83d" . 1 0 2 2 . .
"8c2c5aa9b02eb1092b34cf38c2b1c83d" . . . . . . .
"797c9cf7caf53f514f0154f34895fa80" . 1 0 2 . . .
"797c9cf7caf53f514f0154f34895fa80" . . . . . . .
"9b28a68095c520edcb56bee8aa5737b6" . 1 0 2 . . .
"9b28a68095c520edcb56bee8aa5737b6" . . . . . . .
"09e03748da35e9d799dc5d8ddf1909b5" . 1 0 2 . . .
"09e03748da35e9d799dc5d8ddf1909b5" . . . . . . .
"75d5574d8804d24932e3d0d9cbfa4b11" . 1 0 2 . . .
"75d5574d8804d24932e3d0d9cbfa4b11" . . . . . . .
"b5bda504efd4bd3b3be68513ccbf99ef" . 1 0 2 . . .
"b5bda504efd4bd3b3be68513ccbf99ef" . . . . . . .
"dc289c2a5a31355521dde31c4abd4c83" . 1 0 2 . . .
"dc289c2a5a31355521dde31c4abd4c83" . . . . . . .
"76ce83dbd64f05556e903deb54959d22" . 1 0 2 . . .
"76ce83dbd64f05556e903deb54959d22" . . . . . . .
"830ee6dd656938201f4a712607739768" . 1 0 2 . . .
end
label values treatment treatment_
label def treatment_ 1 "I blodfortyndende behandling", modify
label values gender gender_
label def gender_ 1 "Kvinde", modify
label def gender_ 2 "Mand", modify
label values pcos pcos_
label def pcos_ 0 "Nej", modify
label values aneurysmal_finding aneurysmal_finding_
label def aneurysmal_finding_ 1 "Screening", modify
label def aneurysmal_finding_ 2 "Tilfældig fund", modify
label values sz_anu_a_basilaris sz_anu_a_basilaris_
label def sz_anu_a_basilaris_ 2 "7-12 mm", modify
label def sz_anu_a_basilaris_ 5 "Uoplyst", modify
label values sz_anu_a_basilaris_2 sz_anu_a_basilaris_2_
label def sz_anu_a_basilaris_2_ 5 "Uoplyst", modify
label values sz_anu_a_basilaris_4 sz_anu_a_basilaris_4_
label def sz_anu_a_basilaris_4_ 3 "13-25 mm", modify

Essentially, my problem could be solved with something like this (if the syntax were valid):

gen obs1 = .
replace every variableValue in obs1 == variableValue in 12345

And then just repeating that for thousands of observations.


Solution

  • Assuming that each identifier is represented by at most 2 observations, and that every other variable is numeric and non-missing in at most 1 of those observations, you can just collapse. The default statistic for collapse is the mean, and the mean of a single non-missing value is that value, so the default should work fine.

    ds record_id, not
    collapse `r(varlist)' , by(record_id)
    

    A more cautious approach would be to check the assumptions first!
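
One way those checks might look is sketched below. It is not part of the original answer: the toy data and the helper variable n_nonmiss are hypothetical, and the three assert/confirm steps are simply a direct reading of the assumptions stated above. Only record_id and the variable names come from the question.

```stata
* Toy data (hypothetical values), shaped like the dataex example in the question
clear
input str5 record_id byte(gender pcos aneurysmal_finding)
"a" 1 0 2
"a" . . .
"b" 2 0 .
"b" . . 1
"c" 1 . 2
end

* Assumption 1: no identifier occurs in more than 2 observations
bysort record_id: assert _N <= 2

* Assumption 2: every variable other than the identifier is numeric
ds record_id, not
local others `r(varlist)'
confirm numeric variable `others'

* Assumption 3: each variable is non-missing at most once per identifier
foreach v of local others {
    egen byte n_nonmiss = count(`v'), by(record_id)
    assert n_nonmiss <= 1
    drop n_nonmiss
}

* Only reached if every check above passed
collapse `others', by(record_id)
```

If any check fails, assert (or confirm) stops with an error, so a do-file containing these lines halts before the collapse and the data in memory are left unchanged for inspection.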