I have a dataframe which looks like this:
My end goal is to calculate the percentage of each purpose within each origin,
so the final table should look like this:
Any assistance or direction will be greatly appreciated :)
Thanks in advance :)
Shei
Use transform to get exactly what you need (https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.transform.html).
An aggregation returns a reduced version of the input data, whereas transform returns output with the same shape as the input, so the result lines up row by row with the original dataframe:
df['%_values'] = 100 * (df['values'] / (df.groupby('Origin')['values'].transform('sum')))
The one-liner above divides each value by the sum of values for its Origin group and stores the result in a new column.
Printing df then gives:
  Origin Destination     Purpose  values   %_values
0      a           a    business    1490  37.664307
1      a           b    business    1301  32.886754
2      a           c    pleasure    1165  29.448938
3      b           a    pleasure    1777  57.064868
4      b           b    business     580  18.625562
5      b           c    pleasure     757  24.309570
6      c           a    business    1852  41.599281
7      c           b    pleasure    1949  43.778077
8      c           c  undeclared     651  14.622642
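For anyone who wants to reproduce this, here is a minimal, self-contained sketch. The dataframe is rebuilt from the printed table above (the original input was not shown, so this is an assumption), and the names agg_totals and row_totals are only illustrative. It also shows the shape difference between a plain aggregation and transform.

```python
import pandas as pd

# Assumption: data reconstructed from the printed output in this answer.
df = pd.DataFrame({
    "Origin": list("aaabbbccc"),
    "Destination": list("abcabcabc"),
    "Purpose": ["business", "business", "pleasure",
                "pleasure", "business", "pleasure",
                "business", "pleasure", "undeclared"],
    "values": [1490, 1301, 1165, 1777, 580, 757, 1852, 1949, 651],
})

# Plain aggregation: one total per Origin (reduced shape, 3 rows).
agg_totals = df.groupby("Origin")["values"].sum()

# transform: the group total repeated on every row (same shape as df, 9 rows).
row_totals = df.groupby("Origin")["values"].transform("sum")

# Each row's share of its Origin total, as a percentage.
df["%_values"] = 100 * df["values"] / row_totals

print(agg_totals)
print(df)
```

Because row_totals has the same index as df, the division is aligned row by row and no merge back onto the original dataframe is needed.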
To get the output with 0 (or any other number of) decimal places, you can use round:
df['%_values'] = round(100 * (df['values'] / (df.groupby('Origin')['values'].transform('sum'))))
  Origin Destination     Purpose  values  %_values
0      a           a    business    1490      38.0
1      a           b    business    1301      33.0
2      a           c    pleasure    1165      29.0
3      b           a    pleasure    1777      57.0
4      b           b    business     580      19.0
5      b           c    pleasure     757      24.0
6      c           a    business    1852      42.0
7      c           b    pleasure    1949      44.0
8      c           c  undeclared     651      15.0
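Note that the percentages above are per row. If what you want is one percentage per Purpose within each Origin (so that, for example, the two business rows for origin a are combined), a possible sketch is to sum by Origin and Purpose first and then apply the same transform step. This is an assumption about the desired output, since the expected table was not shown; the dataframe is again rebuilt from the printed output, and purpose_share is an illustrative name.

```python
import pandas as pd

# Assumption: data reconstructed from the printed output in this answer.
df = pd.DataFrame({
    "Origin": list("aaabbbccc"),
    "Destination": list("abcabcabc"),
    "Purpose": ["business", "business", "pleasure",
                "pleasure", "business", "pleasure",
                "business", "pleasure", "undeclared"],
    "values": [1490, 1301, 1165, 1777, 580, 757, 1852, 1949, 651],
})

# Step 1: total values for each (Origin, Purpose) pair.
purpose_share = df.groupby(["Origin", "Purpose"], as_index=False)["values"].sum()

# Step 2: divide each pair's total by its Origin total, as a percentage.
origin_totals = purpose_share.groupby("Origin")["values"].transform("sum")
purpose_share["%_values"] = (100 * purpose_share["values"] / origin_totals).round(1)

print(purpose_share)
```

Here round(1) is the Series.round method, which keeps one decimal place; pass 0 or another number to match the rounding you need.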