I have a DataFrame with the following structure:
# Import pandas library
import pandas as pd
# initialize list of lists
data = [['R.04T', 1, 2013, 23456, 22, 1 ], ['R.04T', 15, 2014,
23456, 22, 1], ['F.04T', 9, 2010, 75920, 00, 3], ['F.04T', 4,
2012, 75920, 00, 3], ['R.04T', 7, 2013, 20054, 13, 1],
['R.04T',12, 2014, 20058,13, 1]]
# Create the pandas DataFrame
df = pd.DataFrame(data, columns=['product_code', 'sold', 'year', 'city_number',
'district_number', 'number_of_the_department'])
print(df)
I want to know if the locations ('city_number' + 'district_number' + 'number_of_the_department') have increased or decreased the amount of sales per year, per article. Id thought about joining the columns to one location column like the following:
# join the locations
df['location'] = df['city_number'].astype(str) + ','+
df['district_number'].astype(str) + ','+ df['number_of_the_department'].astype(str)
But I'm not sure how to groupby? the df to get my answer of the question.
I want to know if the sales have increased or decreased (per year and item) by a certain percentage per year (p.ex. 2013 to 2014 x% decreased).
Maybe someone can help? :)
Try this:
df = df.assign(
pct_change_sold=df.sort_values(by="year")
.groupby(by=["city_number", "district_number", "number_of_the_department"])["sold"]
.pct_change()
.fillna(0)
)
product_code sold year city_number district_number number_of_the_department pct_change_sold
0 R.04T 1 2013 23456 22 1 0.000000
1 R.04T 15 2014 23456 22 1 14.000000
2 F.04T 9 2010 75920 0 3 0.000000
3 F.04T 4 2012 75920 0 3 -0.555556
4 R.04T 7 2006 75920 22 1 0.000000
5 U.90G 12 2005 75021 34 3 0.000000