First time posting here. I've decided to learn Python whilst on Covid-19 forced holidays.
I'm trying to summarise some data from a pretty simple database and have been using the value_counts function.
Rather than running it on every column individually, I'd like to loop it over each one and return a summary table. I can do this using df.apply(pd.value_counts), but I can't work out how to pass parameters to value_counts, as I want to have dropna=False.
Basic example of data I have:
# Import libraries
import pandas as pd
import numpy as np
# create list of winners and runnerup
data = [['john', 'barry'], ['john','barry'], [np.nan,'barry'], ['barry','john'],['john',np.nan],['linda','frank']]
# Create the pandas DataFrame
df = pd.DataFrame(data, columns = ['winner', 'runnerup'])
# print dataframe.
df
How I was doing the value counts for each column:
#Who won the most?
df['winner'].value_counts(dropna=False)
Output:
john 3
linda 1
barry 1
NaN 1
Name: winner, dtype: int64
How can I pass dropna=False when using the apply function? I like the table it outputs below, but I want the NaN row to appear in it as well.
#value counts table
df.apply(pd.value_counts)
       winner  runnerup
barry     1.0       3.0
frank     NaN       1.0
john      3.0       1.0
linda     1.0       NaN
#value that is missing from list
#NaN 1.0 1.0
Any help would be appreciated!!
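df.apply forwards any extra keyword arguments to the function it calls on each column, so you can pass dropna=False directly, like this: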
df.apply(pd.value_counts, dropna=False)
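As far as I know, newer pandas releases (2.1 onward) flag the top-level pd.value_counts as deprecated, so here is a sketch of the same idea that calls the Series.value_counts method through a lambda instead. It reuses the example data from the question; the variable name counts is just for illustration.

```python
import numpy as np
import pandas as pd

# Same example data as in the question
data = [['john', 'barry'], ['john', 'barry'], [np.nan, 'barry'],
        ['barry', 'john'], ['john', np.nan], ['linda', 'frank']]
df = pd.DataFrame(data, columns=['winner', 'runnerup'])

# apply() calls the lambda once per column; each call returns that
# column's value counts, with NaN kept because dropna=False
counts = df.apply(lambda col: col.value_counts(dropna=False))
print(counts)
```

The result should be the same table as before with an extra NaN row. Names that appear in only one column still show NaN in the other column, since that column has no count for them.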