python, pandas, matplotlib, seaborn, relplot

How to plot wide format dataframe with seaborn.relplot


I am trying to plot a line graph of dummy data for 5 cities (C1-C5).

The dataframe has already been imported (see Sample Data).

As I understand it, I need x="Year", y="Number of Employees", and hue="City". How should I set up the code for that? I tried the following, but it doesn't work:

Current Code

import seaborn as sns
import pandas as pd

Areas = r'C:\Users\Tachi\Desktop\City.xlsx'
df = pd.read_excel(Areas)
df.set_index('City', inplace=True)

sns.relplot(x="Year", y="Number of Employees", hue="City", kind="line", data=df)

Sample Data

data = {'City': ['C1', 'C2', 'C3', 'C4', 'C5'], 
        2015: [28564, 2585, 4679, 33227, 2000], 
        2016: [83659, 4429, 35834, 1447, 3454], 
        2017: [0, 453, 40903, 46826, 646], 
        2018: [39470, 8364, 29464, 36443, 8364]}
df = pd.DataFrame(data)
df.set_index('City', inplace=True)

       2015   2016   2017   2018
City                            
C1    28564  83659      0  39470
C2     2585   4429    453   8364
C3     4679  35834  40903  29464
C4    33227   1447  46826  36443
C5     2000   3454    646   8364

Solution

    • The call in the question fails because df has no columns named "Year" or "Number of Employees": the years are column headers and City is the index.
    • Given the sample dataframe, df, from the question, the easiest approach is to transpose it with pandas.DataFrame.transpose and pass the result to seaborn.relplot as wide-format data.
      • With wide-format data, seaborn uses the dataframe index for the x-axis and the column headers for hue.
      • The same visualization can be produced with sns.lineplot(data=df, marker='o') instead of relplot.
    # transpose the dataframe
    df = df.T
    
    # display(df)
    City     C1    C2     C3     C4    C5
    2015  28564  2585   4679  33227  2000
    2016  83659  4429  35834   1447  3454
    2017      0   453  40903  46826   646
    2018  39470  8364  29464  36443  8364
    
    # plot the dataframe
    sns.relplot(data=df, kind='line', marker='o')
    


    • The index values are an int dtype, so the x-axis is treated as continuous and may show intermediate, non-integer tick values (for example 2015.5).
      • One way to deal with this is to cast the index to a str dtype before plotting.
    # set the index of years to a str dtype
    df.index = df.index.astype(str)
    
    # plot the dataframe
    sns.relplot(data=df, kind='line', marker='o')
    
