python pandas dataframe matplotlib subplot

Trying to make scatter plots in subplots using for-loops


I am trying to make subplots using a for loop that goes through the x variables in my dataframe. Every plot should be a scatter plot.

X-variable: 'Protein', 'Fat', 'Sodium', 'Fiber', 'Carbo', 'Sugars' 
y-variable: 'Cal'

This is where I am stuck:

plt.subplot(2, 3, 2)
for i in range(3):
     plt.scatter(i,sub['Cal'])



Solution

  • With this code:

    import matplotlib.pyplot as plt
    import pandas as pd
    
    df = pd.read_csv('data.csv')
    columns = list(df.columns)
    columns.remove('Cal')
    
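    # One row with one subplot per remaining column
    fig, ax = plt.subplots(1, len(columns), figsize = (20, 5))
    
    for idx, col in enumerate(columns):
        # The 'o' format draws unconnected markers, i.e. a scatter plot
        ax[idx].plot(df['Cal'], df[col], 'o')
        ax[idx].set_xlabel('Cal')
        ax[idx].set_title(col)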
    
    plt.show()
    

    This produces a row of scatter plots, one for each column, with 'Cal' on the x-axis of each.

    However, it may be a better choice to draw a single scatter plot and use marker color to distinguish the variables. See this code:

    import matplotlib.pyplot as plt
    import pandas as pd
    import seaborn as sns
    sns.set_style('darkgrid')
    
    df = pd.read_csv('data.csv')
    # df.drop(columns = ['Sodium'], inplace = True)  # <--- removes 'Sodium' column
    table = df.melt('Cal', var_name = 'Type')
    
    fig, ax = plt.subplots(1, 1, figsize = (10, 10))
    sns.scatterplot(data = table,
                    x = 'Cal',
                    y = 'value',
                    hue = 'Type',
                    s = 200,
                    alpha = 0.5)
    
    plt.show()
    

    This gives a single plot in which all the data are drawn together.

    The 'Sodium' values are on a very different scale from the others, so if you remove that column with this line, placed before the call to df.melt:

    df.drop(columns = ['Sodium'], inplace = True)
    

    you get a more readable plot.
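
The question asked for a 2 x 3 grid with each nutrient on the x-axis and 'Cal' on the y-axis, whereas the code above draws a single row with 'Cal' on the x-axis. The following is a minimal sketch of the grid layout, not the only way to do it. The helper name `scatter_grid` is hypothetical, and the small `sample` dataframe uses made-up values that stand in for `data.csv`; only the column names come from the question.

```python
import math

import matplotlib.pyplot as plt
import pandas as pd


def scatter_grid(df, y_col, n_cols=3):
    """Draw one scatter plot per column of df (except y_col) on a grid.

    Hypothetical helper for illustration: each remaining column goes on
    the x-axis of its own subplot, with y_col on the y-axis.
    """
    x_cols = [c for c in df.columns if c != y_col]
    n_rows = math.ceil(len(x_cols) / n_cols)

    # squeeze=False always returns a 2-D array of Axes, even for one row
    fig, axes = plt.subplots(n_rows, n_cols,
                             figsize=(5 * n_cols, 4 * n_rows),
                             squeeze=False)

    # Pair each Axes with one x column, row by row
    for ax, col in zip(axes.ravel(), x_cols):
        ax.scatter(df[col], df[y_col])
        ax.set_xlabel(col)
        ax.set_ylabel(y_col)

    # Hide grid cells that have no column to plot
    for ax in axes.ravel()[len(x_cols):]:
        ax.set_visible(False)

    fig.tight_layout()
    return fig, axes


# Made-up values standing in for data.csv (column names from the question)
sample = pd.DataFrame({
    "Cal": [70, 120, 50, 110, 100],
    "Protein": [4, 3, 4, 2, 2],
    "Fat": [1, 5, 0, 2, 0],
    "Sodium": [130, 15, 140, 200, 180],
    "Fiber": [10.0, 2.0, 14.0, 1.0, 1.5],
    "Carbo": [5.0, 8.0, 8.0, 14.0, 10.5],
    "Sugars": [6, 8, 0, 8, 10],
})

fig, axes = scatter_grid(sample, "Cal")

if __name__ == "__main__":
    plt.show()
```

Because every subplot has its own x-axis, the different scale of 'Sodium' does not squash the other panels, so no column has to be dropped in this layout. To use real data, replace `sample` with `pd.read_csv('data.csv')`.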