How do I plot a block of histograms from a group of data in a dataframe? For example, given:
from pandas import DataFrame
import numpy as np
x = ['A']*300 + ['B']*400 + ['C']*300
y = np.random.randn(1000)
df = DataFrame({'Letter': x, 'N': y})
I tried:
df.groupby('Letter').hist()
...which failed with the error message:
TypeError: cannot concatenate 'str' and 'float' objects
I'm on a roll, just found an even simpler way to do it using the by keyword in the hist method:
df.hist('N', by='Letter')
That's a very handy little shortcut for quickly scanning your grouped data!
For future visitors, the product of this call is the following chart:
In answer to questions below, here's an example of specific tailoring of the histogram plots:
# import libraries
import pandas as pd
import numpy as np
# Create test dataframe
x = ['A']*300 + ['B']*400 + ['C']*300
y = np.random.randn(1000)
z = np.random.randn(1000)
df = pd.DataFrame({'Letter':x, 'N1':y, 'N2':z})
# Plot histograms
axes = df.hist(['N1','N2'], by='Letter',bins=10, layout=(2,2),
legend=True, yrot=90,sharex=True,sharey=True,
log=True, figsize=(6,6))
for ax in axes.flatten():
ax.set_xlabel('N')
ax.set_ylabel('Count')
ax.set_ylim(bottom=1,top=100)