As most of these help questions begin, I'm new to Python and Pandas. I've been learning by doing, especially when I have a particular task to complete. I have searched the help pages and could not find an answer that addressed by specific problem and I could not devise a solution based on answers to similar problems.
I have a data set with 50K+ entries. The general format is:
code value
0 101 0.0
1 102 0.0
2 103 23.2
3 104 10.3
4 105 0.2
5 106 0.0
6 107 22.6
7 108 0.0
8 109 0.0
9 110 2.2
10 111 3.8
11 112 0.0
My first task was to segregate consecutive non-zero values. Through trial and error, I managed to condense my script to one line that accomplished this.
df[df['value'] != 0].groupby((df['value'] == 0).cumsum())
for grp, val in df[df['value'] != 0].groupby((df['value'] == 0).cumsum()):
print(f'[group {grp}]')
print(val)
The output is:
[group 2]
code value
2 103 23.2
3 104 10.3
4 105 0.2
[group 3]
code value
6 107 22.6
[group 5]
code value
9 110 2.2
10 111 3.8
I have other manipulations and calculations to do on this data set and I think the easiest way to access these data would be to transform the groupby object into a column (if that is even the correct terminology?), like so:
code value group
0 103 23.2 2
1 104 10.3 2
2 105 0.2 2
3 107 22.6 3
4 110 2.2 5
5 111 3.8 5
Obviously, I get a "Length of values does not match length of index" error. I searched the help pages and it seemed that I needed to do some type of reset_index method. I tried various syntax structures and many other coding solutions suggested in other threads the past day and a half without success. I finally decided to give up and ask for help when I returned from a short break and found my cat rolling on the keyboard, adding and deleting gobs of gibberish to the script snippets I had been testing.
If someone would be kind enough to help me with this script--to get the groupby object into a column, I would greatly appreciate it. Thanks.
This will give you the groups then drop the zero rows.
df = pd.DataFrame({'code': [101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111, 112],
'value': [0.0, 0.0, 23.2, 10.3, 0.2, 0.0, 22.6, 0.0, 0.0, 2.2, 3.8, 0.0]})
df['group'] = df.value.eq(0).cumsum()
df = df.loc[df.value.ne(0)]
Output
code value group
2 103 23.2 2
3 104 10.3 2
4 105 0.2 2
6 107 22.6 3
9 110 2.2 5
10 111 3.8 5