Tags: python, pandas, dataframe, csv, data-preprocessing

How to replace cost of an item with the previous cost of the same item in a dataframe using Pandas?


Suppose I have the following dataframe:

import pandas as pd

data = {'ID': ['A', 'B', 'C', 'A', 'C', 'O', 'B', 'A', 'B', 'O'], 'Item': ['Apple', 'Banana', 'Carrot', 'Apple', 'Carrot', 'Orange', 'Banana', 'Apple', 'Banana', 'Orange'], 'Cost': [10, 12, 15, 13, 54, 20, 73, 22, 19, 32]}
dataframe = pd.DataFrame(data)
dataframe

I want to replace each item's cost with the cost from the previous occurrence of the same item, and drop the first occurrence of each item, since it has no previous cost. The dataframe above would become:

data2 = {'ID': ['A', 'C', 'B', 'A', 'B', 'O'], 'Item': ['Apple', 'Carrot', 'Banana', 'Apple', 'Banana', 'Orange'], 'Cost': [10, 15, 12, 13, 73, 20]}
dataframe2 = pd.DataFrame(data2)
dataframe2

What's a good way to do it?


Solution

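  • You can group by Item and shift Cost down one row within each group, so every row takes the cost from the previous occurrence of the same item. The first occurrence of each item has no predecessor, so it is filled with 0 and then filtered out (this assumes no genuine cost is 0). The original row order is preserved, so the output comes out in the order you expected.

    dataframe['Cost'] = dataframe.groupby('Item')['Cost'].shift(fill_value=0)
    dataframe[dataframe['Cost'] != 0]

    This gives the expected output: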

      ID    Item  Cost
    3  A   Apple    10
    4  C  Carrot    15
    6  B  Banana    12
    7  A   Apple    13
    8  B  Banana    73
    9  O  Orange    20
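
If a real cost of 0 could appear in the data, filtering on `!= 0` would also drop legitimate rows. One possible variation, offered as a sketch rather than as part of the original answer, is to let `shift()` leave `NaN` in the first row of each group and drop those rows instead. The names `previous_cost` and `result` are illustrative, and the cast back to `int` assumes the costs are whole numbers:

```python
import pandas as pd

data = {
    'ID': ['A', 'B', 'C', 'A', 'C', 'O', 'B', 'A', 'B', 'O'],
    'Item': ['Apple', 'Banana', 'Carrot', 'Apple', 'Carrot',
             'Orange', 'Banana', 'Apple', 'Banana', 'Orange'],
    'Cost': [10, 12, 15, 13, 54, 20, 73, 22, 19, 32],
}
dataframe = pd.DataFrame(data)

# Shift Cost down one row within each Item group.
# With no fill_value, the first row of every group becomes NaN.
previous_cost = dataframe.groupby('Item')['Cost'].shift()

result = (
    dataframe.assign(Cost=previous_cost)  # leaves the original frame untouched
    .dropna(subset=['Cost'])              # drop the first occurrence of each item
    .astype({'Cost': int})                # NaN forced float; cast back (assumes integer costs)
)

print(result)
```

Appending `.reset_index(drop=True)` renumbers the rows from 0, which matches the index of `dataframe2` in the question.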