I have a dataset containing diagnosis columns (DIAGX1-DIAGX42) for patients, and I need to create a variable that sums weights for these diagnoses, with the weights taken from an external index.
import pandas as pd

patients = [('pat1', 'Z509', 'M33', 'M32', 'M315'),
            ('pat2', 'I099', 'I278', 'M05', 'F01'),
            ('pat3', 'N057', 'N057', 'N058', 'N057')]
labels = ['patient_num', 'DIAGX1', 'DIAGX2', 'DIAGX3', 'DIAGX4']
df_patients = pd.DataFrame.from_records(patients, columns=labels)
patient_num  DIAGX1  DIAGX2  DIAGX3  DIAGX4
pat1         Z509    M33     M32     M315
pat2         I099    I278    M05     F01
pat3         N057    N057    N058    N057

Desired output:

patient_num  DIAGX1  DIAGX2  DIAGX3  DIAGX4  Score
pat1         Z509    M33     M32     M315    1
pat2         I099    I278    M05     F01     6
pat3         N057    N057    N058    N057    0
The external index (external_index) is given below. If any diagnosis column for a patient contains a code from one of the groups, that group's weight is added to the patient's score. Each group contributes at most once per record/patient: for example, a patient with both F01 and F02, which are both in dementia, is allocated only 2. Weights are summed only when codes fall in different groups, e.g. F01 = 2 and I099 = 1 sum to 3.
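The rule above can be sketched in plain Python before bringing pandas into it. This is only an illustration under stated assumptions: the dementia codes come from the question, while `example_group` is a hypothetical stand-in that holds just I099 with the weight of 1 used in the example above, and `score` is a helper name invented for the sketch.

```python
# Cut-down, partly hypothetical index for illustration only.
groups = {
    "dementia": ["F01", "F02", "F03", "F051", "G30", "G311"],  # from the question
    "example_group": ["I099"],  # ASSUMPTION: stand-in group for the I099 example
}
weights = {"dementia": 2, "example_group": 1}

# Invert the index so each code points at the group it belongs to.
code_to_group = {code: name for name, members in groups.items() for code in members}


def score(codes):
    """Sum the weight of every group that is hit at least once by `codes`."""
    # A set keeps each group only once, however many of its codes appear.
    hit_groups = {code_to_group[c] for c in codes if c in code_to_group}
    return sum(weights[g] for g in hit_groups)


print(score(["F01", "F02"]))   # both codes are dementia, so the weight counts once: 2
print(score(["F01", "I099"]))  # two different groups: 2 + 1 = 3
print(score(["N057"]))         # code is in no group: 0
```

Collecting the matched groups in a set is what enforces "one contribution per group"; the sum then only ever runs across distinct groups.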
congestive_heart_failure = [
dementia = ["F01", "F02", "F03", "F051", "G30", "G311"]
chronic_pulmonary_disease = [
rheumatologic_disease = [
idx = {
    'dementia': dementia,
    'rheumatologic_disease': rheumatologic_disease,
    'congestive_heart_failure': congestive_heart_failure,
    'chronic_pulmonary_disease': chronic_pulmonary_disease,
}
mapping = {v: k for k, vals in idx.items() for v in vals}
weights = {
    'dementia': 2,
    'rheumatologic_disease': 1,
    'congestive_heart_failure': 2,
    'chronic_pulmonary_disease': 1,
}
# Convert the dataframe into long format
df = df_patients.melt('patient_num')
# Substitute the disease name in place of each code (unmapped codes become NaN)
df['value'] = df['value'].map(mapping)
# Drop dupes per patient and disease
df = df.drop_duplicates(['patient_num', 'value'])
# Map the weights assigned to diseases
df['value'] = df['value'].map(weights)
# Sum the weights per patient and map it back to original dataframe
df_patients['Score'] = df_patients['patient_num'].map(df.groupby('patient_num')['value'].sum())
  patient_num DIAGX1 DIAGX2 DIAGX3 DIAGX4  Score
0        pat1   Z509    M33    M32   M315    1.0
1        pat2   I099   I278    M05    F01    6.0
2        pat3   N057   N057   N058   N057    0.0
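The scores come back as floats because unmatched codes turn into NaN along the way. If whole-number scores are wanted, one possible variation is to drop the unmatched rows before summing and fill patients with no match with 0. The sketch below is self-contained and rests on an assumption: apart from dementia, the group lists are hypothetical stand-ins that contain only codes from the sample data, not the full external index. The names `long_df` and `scores` are introduced just for this example.

```python
import pandas as pd

patients = [('pat1', 'Z509', 'M33', 'M32', 'M315'),
            ('pat2', 'I099', 'I278', 'M05', 'F01'),
            ('pat3', 'N057', 'N057', 'N058', 'N057')]
labels = ['patient_num', 'DIAGX1', 'DIAGX2', 'DIAGX3', 'DIAGX4']
df_patients = pd.DataFrame.from_records(patients, columns=labels)

idx = {
    'dementia': ['F01', 'F02', 'F03', 'F051', 'G30', 'G311'],
    # ASSUMPTION: the three lists below are illustrative stand-ins,
    # limited to codes that appear in the sample data.
    'rheumatologic_disease': ['M05', 'M32', 'M33', 'M315'],
    'congestive_heart_failure': ['I099'],
    'chronic_pulmonary_disease': ['I278'],
}
weights = {
    'dementia': 2,
    'rheumatologic_disease': 1,
    'congestive_heart_failure': 2,
    'chronic_pulmonary_disease': 1,
}
mapping = {code: group for group, codes in idx.items() for code in codes}

# Long format: one row per (patient, diagnosis column)
long_df = df_patients.melt('patient_num', value_name='code')
long_df['group'] = long_df['code'].map(mapping)

# Keep matched codes only, and at most one row per patient and group
long_df = long_df.dropna(subset=['group']).drop_duplicates(['patient_num', 'group'])

# Sum the group weights per patient
scores = long_df['group'].map(weights).groupby(long_df['patient_num']).sum()

# Patients without any matched code are missing from `scores`, so fill with 0
df_patients['Score'] = df_patients['patient_num'].map(scores).fillna(0).astype(int)
print(df_patients)
```

With these stand-in lists the Score column is 1, 6 and 0 for pat1, pat2 and pat3, as integers rather than floats.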