Tags: python, pandas, matplotlib, pandas-groupby, python-datetime

pandas plot mixing bars and lines


I have the results of two groupby operations. The first one, m_y_count, has a MultiIndex (first level years, second level months):

2007    12    39
2008    1     3
        2     120
2009    6     1000
2010    1     86575
        2     726212
        3     2987954
        4     3598215
        6     160597

and the other one, y_count, only has years:

2007    69
2008    3792
2009    5
2010    791

My question is: how do I plot them in the same figure, with separate (log) y-axes, showing m_y_count as bars and y_count as a line with markers?

My attempt:

ax = y_count.plot(kind="bar", color='blue', log=True)
ax2 = ax.twinx()
m_y_count.plot(kind="bar", color='red', alpha=0.5, ax=ax2)

This produces the bars for both pandas Series, but when I try to change to kind="line" in the first line, no line appears.

Any hint on how to proceed? Thanks!


Solution

  • Edit:

    I forgot you wanted one as bars.

    Also, if you'd rather avoid the datetime handling, you can plot the years as plain numbers on the x-axis (with each month offset by a 1/12 fraction). Still, I find that datetime works well once everything is converted to a time object.
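
    The numeric-axis idea above can be sketched roughly as follows. This is only an illustration under assumptions: the Series are rebuilt by hand from the numbers in the question, the level names "year" and "month" are made up for the example, and the January-on-the-tick convention (year + (month - 1) / 12) is one possible choice, not the only one.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Data copied from the question: counts per (year, month) and per year.
# The level names "year" and "month" are assumptions for this sketch.
m_y_count = pd.Series(
    [39, 3, 120, 1000, 86575, 726212, 2987954, 3598215, 160597],
    index=pd.MultiIndex.from_tuples(
        [(2007, 12), (2008, 1), (2008, 2), (2009, 6), (2010, 1),
         (2010, 2), (2010, 3), (2010, 4), (2010, 6)],
        names=["year", "month"],
    ),
)
y_count = pd.Series([69, 3792, 5, 791], index=[2007, 2008, 2009, 2010])

# x position = year + (month - 1) / 12, so January sits exactly on the year tick
years = m_y_count.index.get_level_values("year").to_numpy()
months = m_y_count.index.get_level_values("month").to_numpy()
x_monthly = years + (months - 1) / 12

fig, ax = plt.subplots()
# each bar starts at its month's position and is one month (1/12 year) wide
ax.bar(x_monthly, m_y_count.to_numpy(), width=1 / 12, align="edge",
       color="red", alpha=0.5, label="m_y_count")
ax.set_yscale("log")
ax.set_ylabel("m_y_count", color="red")

ax2 = ax.twinx()  # second y-axis sharing the same x-axis
ax2.plot(y_count.index.to_numpy(), y_count.to_numpy(),
         color="blue", marker="o", label="y_count")
ax2.set_yscale("log")
ax2.set_ylabel("y_count", color="blue")

ax.set_xticks(list(y_count.index))  # one tick per whole year
```

    Call plt.show() or fig.savefig(...) afterwards to render the figure. Because both axes use the same numeric x values, the bars and the line line up without any date conversion.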


    I'm not very familiar with plotting straight from pandas, but this is easy to do in matplotlib. I couldn't copy your data in directly, though: to follow the example below you would have to convert your MultiIndex to a single DatetimeIndex, which should not be too hard.

    import datetime as dt
    import pandas as pd
    import numpy as np
    import matplotlib.pyplot as plt
    import matplotlib.dates as mdates
    
    # make fake data (values start at 1 because zero cannot be shown on a log axis)
    dates1 = pd.date_range('12-01-2007','06-01-2010',periods=9)
    data1 = np.random.randint(1,3598215,9)
    df1 = pd.DataFrame(data1,index=dates1,columns=['Values'])
    # 'Y' is a year-end frequency, so the first stamp is 2006-12-31, not 2006-01-01
    # (recent pandas versions prefer the alias 'YE')
    dates2 = pd.date_range('01-01-2006',periods=4,freq='1Y')
    df2 = pd.DataFrame([69,3000,5,791],index=dates2,columns=['Values'])
    
    #plotting
    fig, ax = plt.subplots()
    ax.bar(df2.index,df2['Values'],width=dt.timedelta(days=200),color='red',label='df2')
    ax.set_yscale('log')
    ax.set_ylabel('DF2 values',color='red')
    
    ax2 = ax.twinx()
    ax2.plot(df1.index,df1['Values'],color='blue',marker='o',label='df1')
    ax2.set_yscale('log')
    ax2.set_ylabel('DF1 values',color='blue')
    
    years = mdates.YearLocator() # put one major tick at the start of each year
    ax.xaxis.set_major_locator(years)
    xfmt = mdates.DateFormatter('%Y') # label each tick with just the year
    ax.xaxis.set_major_formatter(xfmt)
    
    ax.legend(loc=0)
    ax2.legend(loc=2)

    plt.show()


    I can elaborate if you can't port this to your case.
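
    For the MultiIndex conversion mentioned above, here is a minimal sketch of one way it might be done. It assumes the two Series look like the output in the question (rebuilt by hand here), pins each month to its first day and each year to 1 January, and then draws m_y_count as bars and y_count as a line with markers, as the question asks. The names m_y_dates and y_dates and the 25-day bar width are arbitrary choices for illustration.

```python
import datetime as dt

import matplotlib.pyplot as plt
import pandas as pd

# Data copied from the question (rebuilt by hand for this sketch)
m_y_count = pd.Series(
    [39, 3, 120, 1000, 86575, 726212, 2987954, 3598215, 160597],
    index=pd.MultiIndex.from_tuples(
        [(2007, 12), (2008, 1), (2008, 2), (2009, 6), (2010, 1),
         (2010, 2), (2010, 3), (2010, 4), (2010, 6)]
    ),
)
y_count = pd.Series([69, 3792, 5, 791], index=[2007, 2008, 2009, 2010])

# (year, month) pairs -> timestamp for the first day of that month
idx = m_y_count.index
month_starts = pd.to_datetime(pd.DataFrame({
    "year": idx.get_level_values(0),
    "month": idx.get_level_values(1),
    "day": 1,
}))
m_y_dates = pd.Series(m_y_count.to_numpy(), index=pd.DatetimeIndex(month_starts))

# plain integer years -> timestamp for 1 January of that year
y_dates = pd.Series(
    y_count.to_numpy(),
    index=pd.to_datetime(y_count.index.astype(str), format="%Y"),
)

fig, ax = plt.subplots()
# monthly counts as bars on the left (log) axis
ax.bar(m_y_dates.index, m_y_dates.to_numpy(), width=dt.timedelta(days=25),
       color="red", alpha=0.5, label="m_y_count")
ax.set_yscale("log")
ax.set_ylabel("m_y_count", color="red")

# yearly counts as a line with markers on the right (log) axis
ax2 = ax.twinx()
ax2.plot(y_dates.index, y_dates.to_numpy(), color="blue", marker="o", label="y_count")
ax2.set_yscale("log")
ax2.set_ylabel("y_count", color="blue")
```

    The tick locator and formatter lines from the example above can be applied to ax unchanged, since both Series now share a datetime x-axis.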