Replacing values in a Pandas Dataframe


I have a dataframe (named df) as follows:

    s01  s03  s06  s07  s08
0   1    1    1    1    1
1   1    1    1    1    1
2   0    1    1    0    1
3   0    0    1    1    0
4   0    0    0    1    1

I would like to replace every 1 with the index value of its row.

The final result should look like this:

    s01  s03  s06  s07  s08
0   0    0    0    0    0
1   1    1    1    1    1
2   0    2    2    0    2
3   0    0    3    3    0
4   0    0    0    4    4

This is just a sample. The real dataframe has thousands of rows and thousands of columns, so the priority is code that modifies the data as quickly as possible.

I have thought of 3 possible ways to solve this:

  • Using two 'for' loops and an 'if' statement, looping either over the pandas object directly or over the data converted to a 2D NumPy array.

  • Using some kind of pandas built-in filtering function on the dataframe.

  • Converting the dataframe into a 2D NumPy array and using some kind of NumPy built-in function to modify the data.

Which of these is the most time-efficient?

Is there another, more efficient way that I haven't thought of?

Thank you


Solution

  • You can do this with mask:

    df.mask(df.eq(1), df.index)
    

    Output:

       s01  s03  s06  s07  s08
    0    0    0    0    0    0
    1    1    1    1    1    1
    2    0    2    2    0    2
    3    0    0    3    3    0
    4    0    0    0    4    4
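
    If your pandas version does not accept a bare index as the replacement, a variant that spells out the row alignment is sketched below. It rebuilds the sample frame from the question and passes the index as a Series together with axis=0, so each row label is matched to its own row. Treat it as one possible spelling rather than the only correct one.

```python
import pandas as pd

# Sample frame from the question.
df = pd.DataFrame(
    {
        "s01": [1, 1, 0, 0, 0],
        "s03": [1, 1, 1, 0, 0],
        "s06": [1, 1, 1, 1, 0],
        "s07": [1, 1, 0, 1, 1],
        "s08": [1, 1, 1, 0, 1],
    }
)

# Where a cell equals 1, take the replacement from a Series of row labels.
# axis=0 aligns that Series with the rows of df.
result = df.mask(df.eq(1), df.index.to_series(), axis=0)
print(result)
```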
    

    If your index is numeric, as in this sample, and the data contains only 0s and 1s, you can also multiply each row by its index value:

    df.mul(df.index, axis=0)
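
    The question also asks about going through a 2D NumPy array. A minimal sketch of that route, assuming all columns share one integer dtype, uses np.where with the row labels reshaped into a column so that they broadcast across every column. The names values, row_labels and result_np are only illustrative. No timings are claimed here; which approach is fastest is worth measuring on your own data.

```python
import numpy as np
import pandas as pd

# Sample frame from the question.
df = pd.DataFrame(
    {
        "s01": [1, 1, 0, 0, 0],
        "s03": [1, 1, 1, 0, 0],
        "s06": [1, 1, 1, 1, 0],
        "s07": [1, 1, 0, 1, 1],
        "s08": [1, 1, 1, 0, 1],
    }
)

values = df.to_numpy()                      # shape (n_rows, n_cols)
row_labels = df.index.to_numpy()[:, None]   # shape (n_rows, 1), broadcasts over columns

# Pick the row label where the cell is 1, otherwise keep the original value.
replaced = np.where(values == 1, row_labels, values)

# Wrap the array back into a dataframe with the original labels.
result_np = pd.DataFrame(replaced, index=df.index, columns=df.columns)
print(result_np)
```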