I would like to chain a sequence of operations in Python. One of these operations creates some variables and uses the groupby function.
Specifically, I want to create a new dataframe from my original dataset. I can do it in a few lines, but I would like something more concise using chaining. My original dataset is 'df'. First, I create a new binary variable indicating whether the feature 'var1' has a certain property: NaN or not NaN.
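import math

data = df  # note: this is the same object as df, not a copy
data['aux1'] = data['var1'].map(math.isnan)  # True where var1 is NaN
data['count'] = 1  # helper column of ones to sum per group
pie = data.groupby(['aux1'])['count'].sum()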
In R, I can do something like this:
pie = df %>% select(var1) %>% mutate(aux1 = is.na(var1), count = 1) %>%
  group_by(aux1) %>% summarise(count = sum(count))
Is there a way to chain operations like this in Python?
You can test column var1 for missing values with Series.isna and count the results with Series.value_counts:
pie = data['var1'].isna().value_counts()
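As a quick check, here is a minimal sketch of that one-liner on made-up data. The sample frame is purely hypothetical and only stands in for the asker's df:

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: two missing values out of five
data = pd.DataFrame({"var1": [1.0, np.nan, 3.0, np.nan, 5.0]})

# isna() gives a boolean Series; value_counts() tallies True and False
pie = data["var1"].isna().value_counts()

print(pie)
```

The result is a Series indexed by False and True. Note that value_counts sorts by frequency by default, so the order of the two rows depends on the data.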
Or create column aux1 with DataFrame.assign and aggregate with GroupBy.size; a helper column of 1s is not necessary:
pie = data.assign(aux1=data['var1'].isna()).groupby('aux1').size()
But creating the count column is also possible:
pie = data.assign(aux1=data['var1'].isna(), count=1).groupby('aux1')['count'].sum()
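If the goal is a chain that reads like the R pipeline, one option is to pass a lambda to assign, so the chain never needs to refer to an intermediate variable by name. The following is only a sketch on hypothetical sample data (the 'other' column is invented to mimic a wider frame):

```python
import numpy as np
import pandas as pd

# Hypothetical sample data standing in for the original df
df = pd.DataFrame({
    "var1": [1.0, np.nan, 3.0, np.nan, 5.0],
    "other": list("abcde"),
})

pie = (
    df[["var1"]]                              # roughly select(var1)
    .assign(
        aux1=lambda d: d["var1"].isna(),      # roughly mutate(aux1 = is.na(var1))
        count=1,                              # roughly mutate(count = 1)
    )
    .groupby("aux1")["count"]                 # roughly group_by(aux1)
    .sum()                                    # roughly summarise(count = sum(count))
)

print(pie)
```

Because assign returns a new DataFrame, df itself is not modified, unlike the step-by-step version that writes columns into the original frame.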