I have this data:
Date Month ProductCategory Sales
1/1/2009 2009-Jan Clothing 1755
1/1/2009 2009-Jan Grossery 524
1/1/2009 2009-Jan Toys 936
2/1/2009 2009-Feb Clothing 1729
2/1/2009 2009-Feb Grossery 496
2/1/2009 2009-Feb Toys
I want it as this table:
Date Month Clothing Sales Grossery Sales Toys Sales Total Sales
1/1/2009 2009-Jan 1755 524 936 3215
2/1/2009 2009-Feb 1729 496 2225
I tried this code:
train_cross =pd.crosstab([df_train.Date,df_train.Sales],
df_train.ProductCategory, margins=True)
.rename_axis(None,1)
.reset_index()train_cross
.head()
I got these results:
Date Sales Grossery Toys Clothing All
1/1/2009 524 1 0 0 1
1/1/2009 936 0 1 0 1
1/1/2009 1755 0 0 1 1
2/1/2009 496 1 0 0 1
2/1/2009 1729 0 0 1 1
Where I am wrong?
Change first list to new index by columns Date
and Month
, add Sales
to values
, add aggregate function and specify column name of total column:
df = pd.crosstab(index=[df_train.Date,df_train.Month],
columns=df_train.ProductCategory,
values=df_train.Sales,
aggfunc='sum',
margins=True,
margins_name='Total Sales')
print (df)
ProductCategory Clothing Grossery Toys Total Sales
Date Month
1/1/2009 2009-Jan 1755.0 524.0 936.0 3215.0
2/1/2009 2009-Feb 1729.0 496.0 0.0 2225.0
Total Sales 3484.0 1020.0 936.0 5440.0
If necessary remove last row and convert MultiIndex
to columns:
df = df.iloc[:-1].reset_index().rename_axis(None, axis=1)
print (df)
Date Month Clothing Grossery Toys Total Sales
0 1/1/2009 2009-Jan 1755.0 524.0 936.0 3215.0
1 2/1/2009 2009-Feb 1729.0 496.0 0.0 2225.0
Solution with pivot_table
with no margins
:
df = df_train.pivot_table(index=['Date','Month'],
columns='ProductCategory',
values='Sales', aggfunc='sum')
df['Total Sales'] = df.sum(axis=1)
df = df.reset_index().rename_axis(None, axis=1)
print (df)
Date Month Clothing Grossery Toys Total Sales
0 1/1/2009 2009-Jan 1755.0 524.0 936.0 3215.0
1 2/1/2009 2009-Feb 1729.0 496.0 0.0 2225.0
And solution with margins
:
df = df_train.pivot_table(index=['Date','Month'],
columns='ProductCategory',
values='Sales',
aggfunc='sum',
margins=True,
margins_name='Total Sales')
print (df)
ProductCategory Clothing Grossery Toys Total Sales
Date Month
1/1/2009 2009-Jan 1755.0 524.0 936.0 3215.0
2/1/2009 2009-Feb 1729.0 496.0 0.0 2225.0
Total Sales 3484.0 1020.0 936.0 5440.0
df = df.iloc[:-1].reset_index().rename_axis(None, axis=1)