Search code examples
pythonpandasmatplotlibseaborn

how to make line charts by iterating pandas columns


I have weekly time-series data that I want to make a weekly line chart using matplotlib/seaborn. To do so, I did aggregate given time series data correctly and tried to make plots, but the output was not correct to me. Essentially, in my data, columns are the list of countries, and the index is the weekly time index. What I wanted to do is, first iterate pandas columns by each country then group it by year and week, so I could have a weekly linechart for each countries. The way of aggregating my data is bit inefficient, which I assume gave me the problem. Can anyone suggest me possible way of doing this? Any way to get line chart by iterating pandas columns where grouping its time index? Any idea?

my attempt and data

import pandas as pd
import matplotlib.pyplot as plt

url = 'https://gist.githubusercontent.com/adamFlyn/7c96d7f7c05f16abcc39befcd74f5ca8/raw/8997332cd3cdec7610aeaa0300a1b85f9daafccb/prod_sales.csv'
df = pd.read_csv(url, parse_dates=['date'])
df.drop(columns=['Unnamed: 0'], inplace=True)

df1_bf.index = pd.to_datetime(df1_bf.index,  errors="coerce")
df1_bf.index.name = 'date'
df1_bf.reset_index('date')
df1_bf['year'] = pd.DatetimeIndex(df1_bf.index).year
df1_bf['week'] = pd.DatetimeIndex(df1_bf.index).week

for i in df1_bf.columns:
    df_grp = df1.groupby(['year', 'week'])[i].sum().unstack()
    fig,ax1 = plt.subplots(nrows=1,ncols=1,squeeze=True,figsize=(16,10))
    for j in df_grp['year']:
        ax1.plot(df_grp.week, j, next(linecycler),linewidth=3)
plt.gcf().autofmt_xdate()
plt.style.use('ggplot')
plt.xticks(rotation=0)
plt.show()
plt.close()

but I couldn't get the correct plot by attempting the above. Seems I might wrong with data aggregation part for making plot data. Can anyone suggest me possible way of making this right? any thoughts?

desired output

This is the example plot that I want to make. I want to iterate pandas columns then group its timeindex, so I want to get line chart of weekly time series for each country in loop.

enter image description here

how should I get this desired plot? Is there any way of doing this right with matplotlib or seaborn? Any idea?


Solution

  • You need to melt your dataframe and then groupby. Then, use Seaborn to create a plot, passing the data, x, y and hue. Passing hue allows you to avoid looping and makes it a lot cleaner:

    import pandas as pd
    import matplotlib.pyplot as plt
    import seaborn as sns
    
    url = 'https://gist.githubusercontent.com/adamFlyn/7c96d7f7c05f16abcc39befcd74f5ca8/raw/8997332cd3cdec7610aeaa0300a1b85f9daafccb/prod_sales.csv'
    df = pd.read_csv(url, parse_dates=['Unnamed: 0'])
    df = df.rename({'Unnamed: 0' : 'date'}, axis=1)
    df['year'] = df['date'].dt.year
    df['week'] = df['date'].dt.week
    df = df.melt(id_vars=['date','week','year'])
    df = df.groupby(['year', 'week'], as_index=False)['value'].sum()
    
    fig, ax = plt.subplots(squeeze=True,figsize=(16,10))
    sns.lineplot(data=df, x='week', y='value', hue='year',linewidth=3)
    plt.show()
    

    enter image description here

    This is the first and last 5 rows of df before plotting:

         year  week    value
    0    2018     1   2268.0
    1    2019     1  11196.0
    2    2019     2      0.0
    3    2019     3      0.0
    4    2019     4      0.0
    ..    ...   ...      ...
    100  2020    49  17111.0
    101  2020    50  18203.0
    102  2020    51  12787.0
    103  2020    52  26245.0
    104  2020    53  11772.0
    

    Per your comment, you are looking for relplot and pass kind='line'. There are all sorts of formatting parameters you can pass with relplot or you can search how to loop through the axes to make more changes:

    import pandas as pd
    import matplotlib.pyplot as plt
    import seaborn as sns
    
    url = 'https://gist.githubusercontent.com/adamFlyn/7c96d7f7c05f16abcc39befcd74f5ca8/raw/8997332cd3cdec7610aeaa0300a1b85f9daafccb/prod_sales.csv'
    df = pd.read_csv(url, parse_dates=['Unnamed: 0'])
    df = df.rename({'Unnamed: 0' : 'date'}, axis=1)
    df['year'] = df['date'].dt.year
    df['week'] = df['date'].dt.isocalendar().week
    df = df.melt(id_vars=['date','week','year'], var_name='country')
    df = df.loc[df['value'] < 3000].groupby(['country', 'year', 'week'], as_index=False)['value'].sum()
    sns.relplot(data=df, x='week', y='value', hue='year', row='country', kind='line', facet_kws={'sharey': False, 'sharex': True})
    df
    

    enter image description here