I have the following (abbreviated) dictionary:
_d = {
"pain":
["C0030193",
"C0150055",
"C0151825",
"C0184567"],
"anxiety":
["C0003467",
"C0003469",
"C0027769",
"C0154587",
"C0231397",
"C0231401",
"C0231402"],
"depression":
["C0001539",
"C0005587",
"C0011579",
"C0011581",
"C0024517",
"C0086132"],
"fatigue":
["C0015672"]
}
and the following dataframe:
df =
cui
1 "C0015672"
2 "C0015634"
3 "C0011579"
4 "C0030193"
5 "C0031193"
6 "C0030193"
I want to match up the column df["cui"], such that if the value of df["cui"] is in any of the value lists in then dictionary then I want a new column "symptom" based on the dictionary key, otherwise it would remain null.
This is the desired output:
df =
cui symptom
1 C0015672 fatigue
2 C0015634 NaN
3 C0011579 depression
4 C0030193 pain
5 C0031193 NaN
6 C0030193 pain
I can do this by iterating over each row in the dataframe, but since there are 10s of millions of rows, it's super slow. I'm looking for a way to vectorize this.
You can iterate over the dictionary first, then map the dictionary to cui
column
d = {v: k
for k, lst in _d.items()
for v in lst}
df['symptom'] = df['cui'].map(d)