Tags: python, python-3.x, pandas, plot, seaborn

How do I create a multiline plot using seaborn?


I am trying out Seaborn to make my plots look better than they do with plain matplotlib. My dataset has a column 'Year' that I want on the X-axis, and four columns, A, B, C and D, that I want on the Y-axis as differently coloured lines. I tried the sns.lineplot method, but it seems to accept only one variable for the X-axis and one for the Y-axis. This is what I tried:

sns.lineplot(x=data_preproc['Year'], y=data_preproc['A'], err_style=None)
sns.lineplot(x=data_preproc['Year'], y=data_preproc['B'], err_style=None)
sns.lineplot(x=data_preproc['Year'], y=data_preproc['C'], err_style=None)
sns.lineplot(x=data_preproc['Year'], y=data_preproc['D'], err_style=None)

But this way I don't get a legend in the plot to show which coloured line corresponds to what. I tried checking the documentation but couldn't find a proper way to do this.


Solution

  • Seaborn favors the "long format" as input. The key ingredient to convert your DataFrame from its "wide format" (one column per measurement type) into long format (one column for all measurement values, one column to indicate the type) is pandas.melt. Given a data_preproc structured like yours, filled with random values:

    import numpy as np
    import pandas as pd
    import seaborn as sns

    # Example data: one 'Year' column plus four random-walk series
    num_rows = 20
    years = list(range(1990, 1990 + num_rows))
    data_preproc = pd.DataFrame({
        'Year': years,
        'A': np.random.randn(num_rows).cumsum(),
        'B': np.random.randn(num_rows).cumsum(),
        'C': np.random.randn(num_rows).cumsum(),
        'D': np.random.randn(num_rows).cumsum()})

    # Convert the dataframe from wide to long format, keeping 'Year' as the identifier
    dfl = pd.melt(data_preproc, id_vars=['Year'])
    

    A single plot with four lines, one per measurement type, is obtained with

    sns.lineplot(data=dfl, x='Year', y='value', hue='variable')
    


    (Note that 'value' and 'variable' are the default column names returned by melt; you can change them with melt's value_name and var_name arguments.)