python pandas matplotlib seaborn scatter-plot

Two or more pandas columns on the same seaborn scatterplot


I am trying to make a scatter plot of the following data, with all columns in one plot.

[Image: data for the dataframe]

I imported this data from a CSV file into a dataframe df_inv, then assigned it to the variable tips:

tips = df_inv
sns.scatterplot(data=tips, x=df_inv.index, y = "a")
plt.show()

I want to add columns b, c, and d to the same plot, but I can't find the right code. I tried y = ["a", "b", "c", "d", "e"], but it didn't work. Ideally the result would look like the following, with markers that are not all circles but a mix of x, *, and other shapes.

[Image: ideal output]


Solution

  • You could reshape your data into a separate long-format dataframe with pandas.melt:

    df_inv = df_inv.reset_index()
    columns = ['index', 'a', 'b', 'c', 'd']
    df_to_plot = df_inv[columns]
    
    df_to_plot = pd.melt(frame = df_to_plot,
                         id_vars = 'index',
                         var_name = 'column_name',
                         value_name = 'value')
    

    In this way, you will get something like:

        index column_name  value
    0       0           a    315
    1       1           a    175
    2       2           a     65
    3       3           a    370
    4       4           a    419
    5       0           b    173
    6       1           b    206
    7       2           b    271
    8       3           b    463
    9       4           b    419
    10      0           c     58
    ...
    

    Now you can plot all columns with a single call (ax is the Axes object created with plt.subplots() in the complete code below):

    sns.scatterplot(ax = ax, data = df_to_plot, x = 'index', y = 'value', style = 'column_name', hue = 'column_name')
    

    Complete code

    import pandas as pd
    import matplotlib.pyplot as plt
    import numpy as np
    import seaborn as sns
    
    
    # Example data: N random integers per column, standing in for the CSV
    N = 5
    df_inv = pd.DataFrame()
    df_inv['a'] = np.random.randint(low = 50, high = 500, size = N)
    df_inv['b'] = np.random.randint(low = 50, high = 500, size = N)
    df_inv['c'] = np.random.randint(low = 50, high = 500, size = N)
    df_inv['d'] = np.random.randint(low = 50, high = 500, size = N)
    
    
    # Turn the index into a regular column so it can be used as the x values
    df_inv = df_inv.reset_index()
    columns = ['index', 'a', 'b', 'c', 'd']
    df_to_plot = df_inv[columns]
    
    # Reshape from wide to long: one row per (index, column_name) pair
    df_to_plot = pd.melt(frame = df_to_plot,
                         id_vars = 'index',
                         var_name = 'column_name',
                         value_name = 'value')
    
    
    fig, ax = plt.subplots()
    
    # hue and style both follow column_name, so each column gets its own color and marker
    sns.scatterplot(ax = ax, data = df_to_plot, x = 'index', y = 'value', style = 'column_name', hue = 'column_name')
    
    plt.show()
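
The complete code above lets seaborn pick the marker for each column. Since the question asks for specific shapes, here is a minimal sketch that passes an explicit mapping through the markers parameter of sns.scatterplot. The data is a random stand-in with the same layout as df_inv, and the marker choices in the dictionary are only an illustration. As far as I know, seaborn raises an error when filled and line-art markers are mixed in one scatterplot, so the sketch uses the filled 'X' rather than 'x'.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Stand-in data with the same layout as df_inv (random values, not the real CSV)
rng = np.random.default_rng(0)
df_inv = pd.DataFrame(rng.integers(50, 500, size=(5, 4)), columns=list("abcd"))

# Same reshape as in the answer: one row per (index, column_name) pair
df_to_plot = df_inv.reset_index().melt(
    id_vars="index", var_name="column_name", value_name="value"
)

# Illustrative marker choice (an assumption, pick any you like);
# all four are filled markers so they can be combined in one call
markers = {"a": "o", "b": "X", "c": "*", "d": "s"}

fig, ax = plt.subplots()
sns.scatterplot(
    ax=ax,
    data=df_to_plot,
    x="index",
    y="value",
    hue="column_name",
    style="column_name",
    markers=markers,  # maps each column_name value to a matplotlib marker
    s=80,             # larger points make the shapes easier to tell apart
)
```

Call plt.show() afterwards to display the figure. Any key missing from the markers dictionary would leave that column without a marker, so the dictionary should cover every value in column_name.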
    

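
As an alternative to the melt approach above, seaborn can also take the wide dataframe directly. This is a different technique from the answer, sketched here under the assumption that the dataframe index holds the x values: when neither x nor y is given, sns.scatterplot plots each column against the index and assigns hue and style per column. The data is again a random stand-in for df_inv.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Stand-in data with the same layout as df_inv (random values, not the real CSV)
rng = np.random.default_rng(0)
df_inv = pd.DataFrame(rng.integers(50, 500, size=(5, 4)), columns=list("abcd"))

fig, ax = plt.subplots()

# Wide-form input: no x or y, so every selected column is drawn against the index
sns.scatterplot(ax=ax, data=df_inv[["a", "b", "c", "d"]])

ax.set_xlabel("index")
ax.set_ylabel("value")
```

Call plt.show() afterwards to display the figure. The long-format version remains the more flexible option, for example when the x values live in a regular column rather than in the index.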