I am working through a DataFrame trying to clean things up, and came across a bit of an unusual element. Two test values are represented in the same results column posted below. I would like to split the CRP group into another 2 columns, tentatively titled "CRP_mgml" and "CRP_Result", but I am unsure how to find where my IL6 rows end and the CRP rows begin. I have a feeling I could use a regex function to parse through, but I'm pretty new to this so need some help. Thank you in advance.
df2[['Group_Id', 'Result']]
Out[46]:
Group_Id Result
0 LAB\(LLB63) DBNull\(LLB64) DBNull\INTERL6\BCIL... 400.0
1 LAB\(LLB63) DBNull\(LLB64) DBNull\INTERL6\BCIL... 400.0
2 LAB\(LLB63) DBNull\(LLB64) DBNull\INTERL6\BCIL... 400.0
3 LAB\(LLB63) DBNull\(LLB64) DBNull\INTERL6\BCIL... 392.0
4 LAB\(LLB63) DBNull\(LLB64) DBNull\INTERL6\BCIL... 400.0
5 LAB\(LLB63) DBNull\(LLB64) DBNull\INTERL6\BCIL... 400.0
6 LAB\(LLB63) DBNull\(LLB64) DBNull\INTERL6\BCIL... 400.0
7 LAB\(LLB63) DBNull\(LLB64) DBNull\INTERL6\BCIL... 400.0
**8 LAB\(LLB63) DBNull\(LLB64) DBNull\INTERL6\BCIL... 400.0
9 LAB\(LLB63) DBNull\(LLB64) DBNull\INTERL6\BCIL... 328.0
10 LAB\(LLB63) DBNull\(LLB64) DBNull\INTERL6\BCIL... 400.0
11 LAB\(LLB63) DBNull\(LLB64) DBNull\INTERL6\BCIL... 400.0
12 LAB\(LLB63) DBNull\(LLB64) DBNull\INTERL6\BCIL... 400.0
13 LAB\(LLB63) DBNull\(LLB64) DBNull\INTERL6\BCIL... 400.0
14 LAB\(LLB63) DBNull\(LLB64) DBNull\INTERL6\BCIL... 400.0
15 LAB\(LLB63) DBNull\(LLB64) DBNull\INTERL6\BCIL... 400.0
16 LAB\(LLB63) DBNull\(LLB64) DBNull\INTERL6\BCIL... 70.9
17 LAB\(LLB63) DBNull\(LLB64) DBNull\INTERL6\BCIL... 3.4
18 LAB\(LLB63) DBNull\(LLB64) DBNull\INTERL6\BCIL... 2.6
19 LAB\(LLB63) DBNull\(LLB64) DBNull\INTERL6\BCIL... 1.9
20 LAB\(LLB63) DBNull\(LLB64) DBNull\INTERL6\BCIL... 30.0
21 LAB\(LLB63) DBNull\(LLB64) DBNull\INTERL6\BCIL... 19.3
22 LAB\(LLB63) DBNull\(LLB64) DBNull\INTERL6\BCIL... 39.4
23 LAB\(LLB63) DBNull\(LLB64) DBNull\INTERL6\BCIL... 17.0
24 LAB\(LLB63) DBNull\(LLB64) DBNull\INTERL6\BCIL... 36.9
25 LAB\(LLB63) DBNull\(LLB64) DBNull\INTERL6\BCIL... 2.6
26 LAB\(LLB63) DBNull\(LLB64) DBNull\INTERL6\BCIL... 10.4
27 LAB\(LLB63) DBNull\(LLB64) DBNull\INTERL6\BCIL... 3.4
28 LAB\(LLB63) DBNull\(LLB64) DBNull\INTERL6\BCIL... 11.3
29 LAB\(LLB63) DBNull\(LLB64) DBNull\INTERL6\BCIL... 281.0
... ...
4558 CRP (mg/L) 1.5
4559 CRP (mg/L) 2.0
4560 CRP (mg/L) 5.3
4561 CRP (mg/L) 0.8
4562 CRP (mg/L) 1.2
4563 CRP (mg/L) 2.0
4564 CRP (mg/L) 2.6
4565 CRP (mg/L) 3.1
4566 CRP (mg/L) 3.9
4567 CRP (mg/L) 5.5
4568 CRP (mg/L) 7.8
4569 CRP (mg/L) 13.0
4570 CRP (mg/L) 20.2
4571 CRP (mg/L) 25.0
4572 CRP (mg/L) 24.8
4573 CRP (mg/L) 42.3
4574 CRP (mg/L) 71.0
4575 CRP (mg/L) 171.2
4576 CRP (mg/L) 271.6
4577 CRP (mg/L) 289.0
4578 CRP (mg/L) 148.7
4579 CRP (mg/L) 77.3
4580 CRP (mg/L) 69.0
4581 CRP (mg/L) 68.5
4582 CRP (mg/L) 69.7
4583 CRP (mg/L) 125.5
4584 CRP (mg/L) 41.9
4585 CRP (mg/L) 12.9
4586 CRP (mg/L) 4.1
4587 CRP (mg/L) 10.9
[4588 rows x 2 columns]**
It looks like your group ID for all of the CRP rows is the same. If this is true, you could create 2 new data frames based on group_id.
It would look something like:
crpDF = df2[df2.Group_Id == "CRP (mg/L)"]
il6DF = df2[~(df2.Group_Id == "CRP (mg/L)")]
the wavy line (~) is a negation symbol so it will take the series returned by (df.Group_Id == 'CRP (mg/L)') and return the opposite.
I like this solution because it doesn't depend on the df being sorted.