Tags: python, pandas, slice, mask, argmax

Set value of first item in slice in python pandas


I would like to take a slice of a dataframe and then set the value of the first item in that slice, without copying the dataframe. For example:

import numpy
import pandas
df = pandas.DataFrame(numpy.random.rand(3,1))
df[df[0]>0][0] = 0

The condition here is irrelevant: it is only there for the example and returns the whole dataframe again. The point is that doing it this way triggers a SettingWithCopyWarning (understandably). I have also tried slicing first and then using iloc/ix/loc, and using iloc twice, i.e. something like:

df.iloc[df[0]>0,:][0] = 0
df[df[0]>0,:].iloc[0] = 0

Neither of these works. Again, I don't want to make a copy of the dataframe, even if it is just the sliced version.
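
For what it's worth, here is a small sketch of why the two-step attempts leave df untouched: boolean indexing hands back a new object, so the second indexing step writes to that object and not to df. The fixed values and the names `original` and `subset` are only for illustration, and the warning filter is there because some pandas versions warn on this kind of write.

```python
import warnings

import pandas as pd

# Fixed values (an assumption for reproducibility; the question uses random data).
df = pd.DataFrame([0.5, 0.2, 0.9])
original = df.copy()

mask = df[0] > 0      # boolean Series, one entry per row of df
subset = df[mask]     # boolean indexing returns a new object, not a view of df

with warnings.catch_warnings():
    # Some pandas versions emit SettingWithCopyWarning here; silence it for the demo.
    warnings.simplefilter("ignore")
    subset.iloc[0, 0] = 0   # modifies `subset` only

unchanged = df.equals(original)   # df still holds its original values
```

So the assignment has to happen in a single indexing call on df itself (for example one df.loc[...] = value) for the change to land in the original.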

EDIT: It seems there are two ways: using a mask or idxmax. The idxmax method works if your index is unique; the mask method works even if it is not. In my case the index is not unique, which I forgot to mention in the initial post.
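
A rough sketch of the idxmax route, and of why it depends on a unique index, might look like the following. The frames `df_unique` and `df_dup` and their values are made up for illustration. idxmax on a boolean Series returns the label of the first True, and loc then writes to every row carrying that label.

```python
import pandas as pd

# Unique index: the label returned by idxmax identifies exactly one row.
df_unique = pd.DataFrame({0: [1, 3, 0, 0, 3]})
mask = df_unique[0] == 0
if mask.any():                      # idxmax would return the first label if nothing matched
    df_unique.loc[mask.idxmax(), 0] = 1

# Duplicate labels: the same call writes to every row labelled "b".
df_dup = pd.DataFrame({0: [1, 3, 0, 0, 3]}, index=["a", "a", "b", "b", "c"])
mask_dup = df_dup[0] == 0
if mask_dup.any():
    df_dup.loc[mask_dup.idxmax(), 0] = 1

unique_result = df_unique[0].tolist()   # only the first 0 was replaced
dup_result = df_dup[0].tolist()         # both "b" rows were replaced
```

With the duplicate labels, both rows labelled "b" are overwritten, which is why a positional or mask-based approach is needed in that case.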


Solution

  • Using some of the answers, I managed to find a one-liner for this:

    import numpy as np
    import pandas as pd

    np.random.seed(1)
    df = pd.DataFrame(np.random.randint(4, size=(5,1)))
    print(df)
       0
    0  1
    1  3
    2  0
    3  0
    4  3
    # set the first row where column 0 equals 0
    df.loc[(df[0] == 0).cumsum()==1,0] = 1
    print(df)
       0
    0  1
    1  3
    2  1
    3  0
    4  3

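    Essentially, this builds the mask inline and uses cumsum to pick out the first match. Note that the cumulative sum stays at 1 until the second match, so any non-matching rows between the first and second match are selected too; in the example the two zeros are adjacent, so only one row is changed.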
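
If the matches are not adjacent, one possible variation is to AND the condition back in, so that only rows that actually match can be selected. This is a sketch under assumed data, not a drop-in replacement: the frame, the duplicate index labels and the names `cond`, `naive` and `first_only` are invented for illustration.

```python
import pandas as pd

# Non-unique index, with a non-matching row (5) between the two zeros.
df = pd.DataFrame({0: [1, 3, 0, 5, 0]}, index=["a", "a", "b", "b", "c"])

cond = df[0] == 0                        # rows that match the condition
naive = cond.cumsum() == 1               # also True for the 5 that follows the first 0
first_only = cond & (cond.cumsum() == 1) # True only at the first matching row

# Pass a plain boolean array so selection is purely positional.
df.loc[first_only.to_numpy(), 0] = 1

naive_rows = naive.tolist()
result = df[0].tolist()
```

Here `naive` selects two rows, while `first_only` selects just the first zero, so the 5 and the second zero are left alone.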