
Split up time series per year for plotting


I would like to plot a time series that starts in Oct-2015 and ends in Feb-2018 in one graph, with each year as a separate line. The time series consists of int64 values in a Pandas DataFrame, and the date is stored as datetime64[ns] in one of the DataFrame's columns.

How would I create a graph spanning Jan-Dec with 4 lines, one for each year?

The columns involved are graph['share_price'] and graph['date']. I have tried Grouper, but it somehow takes the Oct-2015 values and mixes them with the January values from all the other years.

This groupby is close to what I want, but I lose the information about which year each element of the list belongs to.

graph.groupby('date').agg({'share_price':lambda x: list(x)})

I then created a DataFrame with 4 columns, one for each year, but I still don't know how to group these 4 columns so that I can plot the graph the way I want.


Solution

    You can achieve this by:

    1. extracting the year from the date
    2. replacing each date with its month and day only, dropping the year
    3. setting the year as the columns and the month-day as the index

    At this point, each year will be a column, and each date within the year a row, so you can just plot normally.
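The three steps can also be wrapped in a small helper that leaves the original DataFrame untouched (the REPL session below overwrites the `date` column in place). This is only a sketch: the function name `split_by_year` and the intermediate `day` column are illustrative choices, and the default column names assume the `date`/`share_price` layout from the question.

```python
import pandas as pd


def split_by_year(df: pd.DataFrame, date_col: str = "date",
                  value_col: str = "share_price") -> pd.DataFrame:
    """Return a frame with one column per year and one row per 'MM-DD' day.

    Works on new objects, so df keeps its original datetime column.
    """
    dates = df[date_col]
    tidy = pd.DataFrame({
        "year": dates.dt.year,               # step 1: extract the year
        "day": dates.dt.strftime("%m-%d"),   # step 2: the date without the year
        value_col: df[value_col],
    })
    # step 3: years become columns, zero-padded 'MM-DD' strings become the
    # index (they sort chronologically because of the zero padding)
    return tidy.pivot(index="day", columns="year", values=value_col)
```

Because the month-day strings are zero-padded, sorting them as text gives calendar order, so the x-axis runs from 01-01 to 12-31.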

    Here's an example.

    Assuming that your DataFrame looks something like this:

    >>> import pandas as pd
    >>> import numpy as np
    >>> index = pd.date_range('2015-10-01', '2018-02-28')
    >>> values = np.random.randint(-3, 4, len(index)).cumsum()
    >>> df = pd.DataFrame({
    ...    'date': index,
    ...    'share_price': values
    ... })
    >>> df.head()
            date  share_price
    0 2015-10-01            0
    1 2015-10-02            3
    2 2015-10-03            2
    3 2015-10-04            5
    4 2015-10-05            4
    >>> df.set_index('date').plot()
    


    You would transform the DataFrame as follows:

    >>> df['year'] = df.date.dt.year
    >>> df['date'] = df.date.dt.strftime('%m-%d')
    >>> unstacked = df.pivot(index='date', columns='year', values='share_price')
    >>> unstacked.head()
    year   2015  2016  2017  2018
    date                         
    01-01   NaN  28.0 -16.0  21.0
    01-02   NaN  29.0 -14.0  22.0
    01-03   NaN  29.0 -16.0  22.0
    01-04   NaN  26.0 -15.0  23.0
    01-05   NaN  25.0 -16.0  21.0
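`pivot` needs each (day, year) pair to occur only once. If your data can contain more than one price per day (an assumption; the sample above has exactly one row per day), `pivot` raises a `ValueError`. A minimal sketch with `pivot_table`, which aggregates the duplicates instead; the hypothetical data and the choice of `mean` are only for illustration (`"last"`, for instance, would keep the final quote of the day):

```python
import pandas as pd

# Hypothetical data: two quotes on 2016-01-04, one on 2017-01-04
df = pd.DataFrame({
    "date": pd.to_datetime(["2016-01-04 09:30", "2016-01-04 16:00",
                            "2017-01-04 16:00"]),
    "share_price": [10, 14, 20],
})
df["year"] = df["date"].dt.year
df["day"] = df["date"].dt.strftime("%m-%d")

try:
    # Fails: ("01-04", 2016) appears twice
    df.pivot(index="day", columns="year", values="share_price")
except ValueError as err:
    print("pivot failed:", err)

# pivot_table combines duplicate (day, year) pairs using aggfunc
unstacked = df.pivot_table(index="day", columns="year",
                           values="share_price", aggfunc="mean")
print(unstacked)
```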
    

    And just plot normally:

    >>> unstacked.plot()
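With the `'%m-%d'` strings the x-axis is categorical text. If you would rather have a real date axis, one option (a different technique from the answer above, sketched here as an assumption-laden alternative) is to move every date into a single placeholder year and pivot on that. The helper name `pivot_on_common_year` and the placeholder year 2000 are hypothetical choices; 2000 is used because it is a leap year, so 29 February 2016 still maps to a valid date.

```python
import pandas as pd


def pivot_on_common_year(df, date_col="date", value_col="share_price",
                         base_year=2000):
    """Pivot so each year is a column and the index is a datetime in base_year."""
    dates = df[date_col]
    # Rebuild each date in the placeholder year from its month and day
    aligned = pd.to_datetime(pd.DataFrame({
        "year": base_year,
        "month": dates.dt.month,
        "day": dates.dt.day,
    }))
    tidy = pd.DataFrame({
        "aligned_date": aligned,
        "year": dates.dt.year,
        value_col: df[value_col],
    })
    # One column per original year, rows indexed by the aligned datetime
    return tidy.pivot(index="aligned_date", columns="year", values=value_col)
```

Calling `.plot()` on the result draws one line per year against a datetime axis. The tick labels may show the placeholder year, so you may want to adjust the axis formatting.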
    
