python, pandas, datetime, matplotlib, plot

pandas bar plot combined with line plot shows the time axis beginning at 1970


I am trying to draw a stock market graph: time series vs. closing price and time series vs. volume.

Somehow the x-axis shows the time in 1970.

The following is the graph and the code:

[image: the resulting graph]

The code is:

import pandas as pd

import matplotlib.pyplot as plt
import matplotlib.dates as mdates


pd_data = pd.DataFrame(data, columns=['id', 'symbol', 'volume', 'high', 'low', 'open', 'datetime','close','datetime_utc','created_at'])

pd_data['DOB'] = pd.to_datetime(pd_data['datetime_utc']).dt.strftime('%Y-%m-%d') 

pd_data.set_index('DOB')

print(pd_data)

print(pd_data.dtypes)

ax=pd_data.plot(x='DOB',y='close',kind = 'line')
ax.set_ylabel("price")

#ax.pd_data['volume'].plot(secondary_y=True,  kind='bar')
ax1=pd_data.plot(y='volume',secondary_y=True, ax=ax,kind='bar')
ax1.set_ylabel('Volume')


# Choose your xtick format string
date_fmt = '%d-%m-%y'

date_formatter = mdates.DateFormatter(date_fmt)
ax1.xaxis.set_major_formatter(date_formatter)

# set monthly locator
ax1.xaxis.set_major_locator(mdates.MonthLocator(interval=1))

# set font and rotation for date tick labels
plt.gcf().autofmt_xdate()

plt.show()

I also tried plotting the two graphs independently, without ax=ax:

ax=pd_data.plot(x='DOB',y='close',kind = 'line')
ax.set_ylabel("price")

ax1=pd_data.plot(y='volume',secondary_y=True,kind='bar')
ax1.set_ylabel('Volume')

Then the price graph shows the years properly, whereas the volume graph shows 1970.

And if I swap them:

ax1=pd_data.plot(y='volume',secondary_y=True,kind='bar')
ax1.set_ylabel('Volume')

ax=pd_data.plot(x='DOB',y='close',kind = 'line')
ax.set_ylabel("price")

Now the volume graph shows the years properly, whereas the price graph shows the years as 1970.

I tried removing secondary_y and also changing bar to line, but no luck.

Somehow, after the first graph is drawn, the pandas data seems to be changing the year.


Solution

    • I do not advise plotting a bar plot with such a large number of bars.
    • This answer explains why there is an issue with the xtick labels, and how to resolve the issue.
    • Plotting both series as lines with pandas.DataFrame.plot works without issue with .set_major_locator.
    • Tested in python 3.8.11, pandas 1.3.2, matplotlib 3.4.2
    import pandas as pd
    import matplotlib.pyplot as plt
    import matplotlib.dates as mdates
    import yfinance as yf  # conda install -c conda-forge yfinance or pip install yfinance --upgrade --no-cache-dir
    
    # download data
    df = yf.download('amzn', start='2015-02-21', end='2021-04-27')
    
    # plot
    ax = df.plot(y='Close', color='magenta', ls='-.', figsize=(10, 6), ylabel='Price ($)')
    
    ax1 = df.plot(y='Volume', secondary_y=True, ax=ax, alpha=0.5, rot=0, lw=0.5)
    ax1.set(ylabel='Volume')
    
    # format
    date_fmt = '%d-%m-%y'
    years = mdates.YearLocator()   # every year
    yearsFmt = mdates.DateFormatter(date_fmt)
    
    ax.xaxis.set_major_locator(years)
    ax.xaxis.set_major_formatter(yearsFmt)
    
    plt.setp(ax.get_xticklabels(), ha="center")
    plt.show()
    

    [image: plot produced by the code above]
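The snippet above needs a network download through yfinance. A self-contained sketch of the same two-line plot could look like this, assuming a hypothetical random-walk `Close` series and random `Volume` values in place of the AMZN data. Because `pd.bdate_range` produces an index with a regular business-day frequency (unlike the downloaded data), `x_compat=True` is passed so that pandas plots against matplotlib date numbers and the `mdates` locator and formatter apply.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for this sketch; remove to use plt.show()
import matplotlib.dates as mdates
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical stand-in for the downloaded prices: business days 2015-2021
rng = np.random.default_rng(0)
idx = pd.bdate_range("2015-02-23", "2021-04-26")
df = pd.DataFrame(
    {
        "Close": 300 + rng.normal(0, 2, len(idx)).cumsum(),
        "Volume": rng.integers(1_000_000, 10_000_000, len(idx)),
    },
    index=idx,
)

# x_compat=True: use matplotlib date numbers instead of pandas' period-based x values
ax = df.plot(y="Close", color="magenta", ls="-.", figsize=(10, 6),
             ylabel="Price ($)", x_compat=True)
ax1 = df.plot(y="Volume", secondary_y=True, ax=ax, alpha=0.5, rot=0,
              lw=0.5, x_compat=True)
ax1.set(ylabel="Volume")

# One major tick per year, labelled day-month-year
ax.xaxis.set_major_locator(mdates.YearLocator())
ax.xaxis.set_major_formatter(mdates.DateFormatter("%d-%m-%y"))
plt.setp(ax.get_xticklabels(), ha="center")

print(ax.get_xticks())  # date numbers (days since 1970-01-01), not 0, 1, 2, ...
```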


    • Why do the OP's x-tick labels start at 1970?
    • pandas places bar plots at 0-indexed integer positions (0, 1, 2, ...), and on a date axis the value 0 corresponds to 1970-01-01.
      • See Pandas bar plot changes date format
      • Most solutions for bar plots simply reformat the labels to the appropriate datetimes; however, this is cosmetic and does not align the locations of the line plot and the bar plot.
      • Solution 2 of the linked answer shows how to change the tick locators, but it is not worth the extra code when plt.bar can be used.
    print(pd.to_datetime(ax1.get_xticks()))
    
    DatetimeIndex([          '1970-01-01 00:00:00',
                   '1970-01-01 00:00:00.000000001',
                   '1970-01-01 00:00:00.000000002',
                   '1970-01-01 00:00:00.000000003',
                   ...
                   '1970-01-01 00:00:00.000001552',
                   '1970-01-01 00:00:00.000001553',
                   '1970-01-01 00:00:00.000001554',
                   '1970-01-01 00:00:00.000001555'],
                  dtype='datetime64[ns]', length=1556, freq=None)
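The cause can be reproduced on a tiny frame. This sketch uses hypothetical five-row data and shows that a pandas bar plot puts its bars at integer positions 0, 1, 2, ..., and that matplotlib's date functions read those numbers as days after the default epoch of 1970-01-01 (the default since matplotlib 3.3), which is what a DateFormatter ends up displaying.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for this sketch
import matplotlib.dates as mdates
import pandas as pd

# Hypothetical five days of volume with a DatetimeIndex
idx = pd.date_range("2021-01-04", periods=5, freq="D")
df = pd.DataFrame({"Volume": [10, 20, 15, 30, 25]}, index=idx)

ax = df.plot(y="Volume", kind="bar")
bar_positions = ax.get_xticks()
print(bar_positions)  # integer positions, one per bar, starting at 0

# A DateFormatter treats each position as "days since the epoch"
print([mdates.num2date(x).strftime("%Y-%m-%d") for x in bar_positions])
# -> ['1970-01-01', '1970-01-02', '1970-01-03', '1970-01-04', '1970-01-05']
```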
    
    ax = df.plot(y='Close', color='magenta', ls='-.', figsize=(10, 6), ylabel='Price ($)')
    print(ax.get_xticks())
    ax1 = df.plot(y='Volume', secondary_y=True, ax=ax, kind='bar')
    print(ax1.get_xticks())
    ax1.set_xlim(0, 18628.)
    
    date_fmt = '%d-%m-%y'
    years = mdates.YearLocator()   # every year
    yearsFmt = mdates.DateFormatter(date_fmt)
    
    ax.xaxis.set_major_locator(years)
    ax.xaxis.set_major_formatter(yearsFmt)
    
    [out]:
    [16071. 16436. 16801. 17167. 17532. 17897. 18262. 18628.]  ← ax tick locations
    [   0    1    2 ... 1553 1554 1555]  ← ax1 tick locations
    

    [image: plot produced by the code above]
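For comparison, the cosmetic fix mentioned in the bullets above, relabelling the integer bar positions with the dates from the index, can be sketched as follows. The data is hypothetical and the label step of 5 is an arbitrary choice. The labels become correct, but the bars stay at 0, 1, 2, ..., so a line plotted against real dates on the same axes still would not line up.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for this sketch
import numpy as np
import pandas as pd

# Hypothetical 20 business days of volume
idx = pd.bdate_range("2021-03-01", periods=20)
df = pd.DataFrame({"Volume": np.arange(20) * 1000 + 5000}, index=idx)

ax = df.plot(y="Volume", kind="bar", rot=0, legend=False)

# Label every 5th bar with its date; the bar positions themselves are unchanged
step = 5
positions = np.arange(len(df))[::step]
ax.set_xticks(positions)
ax.set_xticklabels(df.index[::step].strftime("%d-%m-%y"))

print(ax.get_xticks())  # still bar positions (0, 5, 10, 15), not date numbers
```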

    • With plt.bar (called here as ax1.bar on a twinx axes), the bars are placed at their datetime locations, so they share the x-axis locations of the line plot.
    ax = df.plot(y='Close', color='magenta', ls='-.', figsize=(10, 6), ylabel='Price ($)', rot=0)
    plt.setp(ax.get_xticklabels(), ha="center")
    print(ax.get_xticks())
    
    ax1 = ax.twinx()
    ax1.bar(df.index, df.Volume)
    print(ax1.get_xticks())
    
    date_fmt = '%d-%m-%y'
    years = mdates.YearLocator()   # every year
    yearsFmt = mdates.DateFormatter(date_fmt)
    
    ax.xaxis.set_major_locator(years)
    ax.xaxis.set_major_formatter(yearsFmt)
    
    [out]:
    [16071. 16436. 16801. 17167. 17532. 17897. 18262. 18628.]
    [16071. 16436. 16801. 17167. 17532. 17897. 18262. 18628.]
    

    [image: plot produced by the code above]

    • Likewise, sns.barplot(x=df.index, y=df.Volume, ax=ax1) places its xticks at [ 0 1 2 ... 1553 1554 1555], so the bar plot and the line plot do not align.
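A self-contained sketch of the Axes.bar approach could look like this, assuming hypothetical random data in place of the downloaded prices. Both series are drawn with matplotlib directly, so the x values on both axes are matplotlib date numbers. With a regular-frequency index like the one `pd.bdate_range` returns, `df.plot` for the line could otherwise switch to pandas' period-based x values unless `x_compat=True` is passed.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for this sketch; remove to use plt.show()
import matplotlib.dates as mdates
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical business-day prices and volumes
rng = np.random.default_rng(1)
idx = pd.bdate_range("2019-01-01", "2021-04-26")
df = pd.DataFrame(
    {
        "Close": 100 + rng.normal(0, 1, len(idx)).cumsum(),
        "Volume": rng.integers(1_000_000, 5_000_000, len(idx)),
    },
    index=idx,
)

fig, ax = plt.subplots(figsize=(10, 6))
ax.plot(df.index, df["Close"], color="magenta", ls="-.")
ax.set_ylabel("Price ($)")

# Bars on a twin y-axis, placed at the actual dates rather than at 0, 1, 2, ...
ax1 = ax.twinx()
ax1.bar(df.index, df["Volume"], alpha=0.3)
ax1.set_ylabel("Volume")

# One major tick per year; the x-axis is shared, so this applies to both axes
ax.xaxis.set_major_locator(mdates.YearLocator())
ax.xaxis.set_major_formatter(mdates.DateFormatter("%d-%m-%y"))

print(ax.get_xticks())
print(ax1.get_xticks())  # same date-number locations as ax
```

Since twinx shares the x-axis, the locator and formatter set on `ax` also govern the bar axes, which is why both printouts match.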