I have the following code that finds the total and unique sales for each employee using a group by with Employee_id
and aggregation with Customer_id
.
Sales.groupby('Employee_id').agg({
'Customer_id': [
('total_sales', 'count'),
('unique_sales', 'nunique')
]})
It is important to know that I will perform aggregations with other columns as well, but so far this is all I have written. So if you have a proposed solution, I ask that you please consider that in case it makes a difference.
While this does exactly what I want in terms of computing total and unique sales for each employee and creating two columns, it creates nested column names. So the column names look like, [('Customer_id', 'total_sales'), ('Customer_id', 'unique_sales')], which I don't want. Is there any way to easily get rid of the nested part to only include ['total_sales', 'unique_sales'], or is the easiest thing to just rename the columns once I have finished everything?
Thanks!
You could simply rename the columns:
import numpy as np
import pandas as pd
np.random.seed(2018)
df = pd.DataFrame(np.random.randint(10, size=(100, 3)), columns=['A','B','C'])
result = df.groupby('A').agg({'B': [('D','count'),('E','nunique')],
'C': [('F','first'),('G','max')]})
result.columns = result.columns.get_level_values(1)
print(result)
Alternatively, you could save the groupby
object, and use grouped[col].agg(...)
to produce sub-DataFrames which can then be pd.concat
'ed together:
import numpy as np
import pandas as pd
np.random.seed(2018)
df = pd.DataFrame(np.random.randint(10, size=(100, 3)), columns=['A','B','C'])
grouped = df.groupby('A')
result = pd.concat([grouped['B'].agg([('D','count'),('E','nunique')]),
grouped['C'].agg([('F','first'),('G','max')])], axis=1)
print(result)
both code snippets yield the following (though with columns perhaps in a different order):
D E F G
A
0 18 8 8 9
1 12 8 6 6
2 14 8 0 8
3 10 9 8 9
4 7 6 3 5
5 8 5 6 7
6 9 7 9 9
7 8 6 4 7
8 8 7 2 9
9 6 5 7 9
Overall, I think renaming the columns after-the-fact is the easiest and more readable option.