
Plot pandas groupby boxplot and dataframe plot in the same figure


Why does the script below not work? How can I draw the groupby boxplot and the DataFrame line plot in the same figure so that they line up on the x-axis?

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

fig, axes = plt.subplots(1, 1, figsize=(15, 15))
n = 480
ts = pd.DataFrame(np.random.randn(n), index=pd.date_range(start="2014-02-01",periods=n,freq="H"))
ts.groupby(lambda x: x.strftime("%Y-%m-%d")).boxplot(subplots=False, rot=90, ax = axes)
ts.plot(ax = axes)
plt.show()

Solution

  • Why the boxplots end up crammed to the left and the line plot crammed to the right:

    Matplotlib internally converts strings/categories on the x-axis to integers starting from 0. Dates, however, are converted to float values equal to the number of days since 1970-01-01. The two plots therefore sit on very different x ranges: the boxes at small integers, the time series at around 16102, which is the value for 2014-02-01. That's why I use 16102 as the starting position for the boxes (I added 0.5 to put each box in the middle of its day instead of at the beginning).

    import numpy as np
    import pandas as pd
    import matplotlib.pyplot as plt

    fig, axes = plt.subplots(1, 1, figsize=(10, 5))
    n = 480  # 480 hourly samples = 20 days
    ts = pd.DataFrame(np.random.randn(n), index=pd.date_range(start="2014-02-01", periods=n, freq="H"))
    g = ts.groupby(lambda x: x.strftime("%Y-%m-%d"))
    # One box per day, centred on the day: 16102.5, 16103.5, ..., 16121.5
    g.boxplot(subplots=False, ax=axes, positions=np.arange(16102.5, 16122.5))
    ts.plot(ax=axes)

    # To format the x-tick labels
    labels = pd.date_range(start="2014-02-01", periods=20, freq="D")
    labels = [t.strftime('%Y-%m-%d') for t in labels]
    axes.set_xticklabels(labels=labels, rotation=70)
    plt.show()
    

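The hard-coded 16102.5 in the answer can also be derived from the start date instead of being looked up. Below is a minimal sketch of that arithmetic using only the standard library; `days_since_epoch` and `box_positions` are hypothetical helper names introduced here for illustration, not part of pandas or matplotlib, and the sketch assumes matplotlib's date epoch is 1970-01-01.

```python
from datetime import date

# Assumption: matplotlib counts dates as days since 1970-01-01.
EPOCH = date(1970, 1, 1)


def days_since_epoch(d):
    """Hypothetical helper: whole days between the epoch and date `d`."""
    return (d - EPOCH).days


def box_positions(first_day, n_days, offset=0.5):
    """Hypothetical helper: one x position per day, shifted by `offset` days.

    An offset of 0.5 places each box at noon, i.e. in the middle of its day.
    """
    start = days_since_epoch(first_day)
    return [start + offset + i for i in range(n_days)]


positions = box_positions(date(2014, 2, 1), 20)
print(positions[0], positions[-1])  # 16102.5 16121.5
```

Passing `positions=positions` to `g.boxplot(...)` should then give the same values as `np.arange(16102.5, 16122.5)`. As far as I know, `matplotlib.dates.date2num` returns the same day numbers when matplotlib uses its default epoch (1970-01-01 since matplotlib 3.3); older versions used a different epoch, so the numbers would differ there.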