Given the dataframe below:
Name Activity Hour Month
A TT 5 1
A TT 2 1
A UU 1 1
A UU 1 2
A UU 1 3
B TT 40 3
C UU 10 1
D TT 2 2
D TT 2 2
D TT 2 2
D TT 5 1
The next step is to get the summation if the rows have identical value of the column Name
and Activity
.
For example, for the case Name: A
and Activity: TT
will give the summation of 7
Then, I would like to present it in a cross tab format which is group according to month
and activity
, as shown below
Month
1 2 3
TT UU TT UU TT UU
A 7 1 0 1 0 1
B 0 0 0 0 40 0
C 0 10 0 0 0 0
D 5 0 6 0 0 0
May I know whether this can achieve directly using pandas crosstab?
p.s., similar but different topic has been discussed separately here. Please let me know if this thread is considered to be a duplicate of the OP
Considering below df:
In [93]: df
Out[93]:
Name Activity Hour Month
0 A TT 5 1
1 A TT 2 1
2 A UU 1 1
3 A UU 1 2
4 A UU 1 3
5 B TT 40 3
6 C UU 10 1
7 D TT 2 2
8 D TT 2 2
9 D TT 2 2
10 D TT 5 1
Solution:1 If you want using pd.crosstab
, you can do this:
In [92]: pd.crosstab(df.Name, columns=[df.Month, df.Activity], values=df.Hour, aggfunc='sum').fillna(0)
Out[92]:
Month 1 2 3
Activity TT UU TT UU TT UU
Name
A 7.0 1.0 0.0 1.0 0.0 1.0
B 0.0 0.0 0.0 0.0 40.0 0.0
C 0.0 10.0 0.0 0.0 0.0 0.0
D 5.0 0.0 6.0 0.0 0.0 0.0
Solution:2
You can use df.pivot_table
:
In [89]: df.pivot_table(index='Name', columns=['Month', 'Activity'], values='Hour', aggfunc='sum', fill_value=0)
Out[89]:
Month 1 2 3
Activity TT UU TT UU TT UU
Name
A 7 1 0 1 0 1
B 0 0 0 0 40 0
C 0 10 0 0 0 0
D 5 0 6 0 0 0