I have a dataframe (named df) as follows:
s01 s03 s06 s07 s08
0 1 1 1 1 1
1 1 1 1 1 1
2 0 1 1 0 1
3 0 0 1 1 0
4 0 0 0 1 1
I would like to replace every 1 with the index value of its row.
The final result should look like this:
s01 s03 s06 s07 s08
0 0 0 0 0 0
1 1 1 1 1 1
2 0 2 2 0 2
3 0 0 3 3 0
4 0 0 0 4 4
This is just a sample. The real dataframe has thousands of rows and thousands of columns, so the priority is efficient code that modifies the data as quickly as possible.
I have thought of three possible ways to solve this:
Using two 'for' loops and an 'if' statement, looping either over the pandas object directly or over the data converted to a 2D NumPy array.
Using some kind of built-in pandas filtering function on the dataframe.
Converting the dataframe into a 2D NumPy array and using some kind of built-in NumPy function to modify the data.
Which is the most time-efficient way?
Is there a more efficient way that I haven't thought of?
Thank you
You can do this with mask:
df.mask(df.eq(1), df.index)
Output:
s01 s03 s06 s07 s08
0 0 0 0 0 0
1 1 1 1 1 1
2 0 2 2 0 2
3 0 0 3 3 0
4 0 0 0 4 4
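To reproduce this end to end, here is a minimal self-contained sketch. It rebuilds the sample frame from the question and passes the index as a Series with axis=0, so the alignment along the rows is spelled out. That explicit form is my own assumption about the most portable spelling, because how a bare Index is broadcast as the replacement may differ between pandas versions. The name `result` is just for illustration.

```python
import pandas as pd

# Sample frame from the question (default RangeIndex 0..4)
df = pd.DataFrame(
    {
        "s01": [1, 1, 0, 0, 0],
        "s03": [1, 1, 1, 0, 0],
        "s06": [1, 1, 1, 1, 0],
        "s07": [1, 1, 0, 1, 1],
        "s08": [1, 1, 1, 0, 1],
    }
)

# Where a cell equals 1, take the value from that row's index label.
# Passing a Series together with axis=0 aligns it with the rows explicitly.
result = df.mask(df.eq(1), df.index.to_series(), axis=0)
print(result)
```

Cells that are not 1 are left untouched, so this form also works if the frame holds values other than 0 and 1.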
If your index is numeric, as in this sample, and the frame contains only 0s and 1s, you can also multiply each row by its index value:
df.mul(df.index, axis=0)
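Since the question also asks about the NumPy route, here is a rough sketch of that option next to the multiplication. It assumes a numeric index and the sample data from the question. The names `values`, `row_labels`, `by_where` and `by_mul` are mine, not part of any API. I have not benchmarked it against the pandas calls, so treat it as something to time on your real data (for example with `timeit`) rather than as the fastest option.

```python
import numpy as np
import pandas as pd

# Sample frame from the question (default RangeIndex 0..4)
df = pd.DataFrame(
    {
        "s01": [1, 1, 0, 0, 0],
        "s03": [1, 1, 1, 0, 0],
        "s06": [1, 1, 1, 1, 0],
        "s07": [1, 1, 0, 1, 1],
        "s08": [1, 1, 1, 0, 1],
    }
)

values = df.to_numpy()
# Column vector of row labels, shape (n_rows, 1), so it broadcasts across columns
row_labels = df.index.to_numpy()[:, None]

# General case: replace 1s with the row label and keep every other value
by_where = pd.DataFrame(
    np.where(values == 1, row_labels, values),
    index=df.index,
    columns=df.columns,
)

# 0/1-only data: multiplying each row by its label gives the same result
by_mul = df.mul(df.index.to_series(), axis=0)

print(by_where)
```

Both variants are fully vectorized, so either should be far quicker than nested Python loops on a frame with thousands of rows and columns.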