python pandas histogram categorical-data

histogram/value counts from pandas dataframe columns with categorical data and custom "bins"

Consider the following dataframe:

import pandas as pd

x = pd.DataFrame([[ 'a', 'b'], ['a', 'c'], ['c', 'b'], ['d', 'c']])
print(x)

   0  1
0  a  b
1  a  c
2  c  b
3  d  c

I would like to obtain the relative frequencies of the data in each column of the dataframe based on some custom "bins" which would be (a possible super-set of) the unique data values. For example, if:

b = ['a', 'b', 'c', 'd', 'e', 'f']

I would like to obtain:

Is there a one (or two) liner to achieve this?

Solution

Try apply value_counts, then reindex based on b:

import pandas as pd

x = pd.DataFrame([['a', 'b'], ['a', 'c'], ['c', 'b'], ['d', 'c']])

b = ['a', 'b', 'c', 'd', 'e', 'f']
df = x.apply(lambda s: s.value_counts()).reindex(b).fillna(0).astype(int)

print(df)

df:

A melt and crosstab option:

import pandas as pd

x = pd.DataFrame([['a', 'b'], ['a', 'c'], ['c', 'b'], ['d', 'c']])

b = ['a', 'b', 'c', 'd', 'e', 'f']
df = x.melt()
df = pd.crosstab(df['value'], df['variable']) \
    .reindex(b).fillna(0).astype(int) \
    .rename_axis(None, axis=1).rename_axis(None, axis=0)

print(df)

df: