Tags: python, pandas, matplotlib, scatter

How to plot scatter subplots of columns from pandas


Let's say I have a dataframe with 100 rows and 40 columns, where column 40 holds the Y-axis values for the scatter plots. I would like to produce 39 scatter plots: column 40 as a function of column 1, column 40 as a function of column 2, and so on up to column 40 as a function of column 39. What would be the best way to produce such a set of subplots without creating each one manually?

For example, with a smaller dataframe, I tried to scatter plot column 3 as a function of column 1 and column 3 as a function of column 2 in subplots:

df = pd.DataFrame({'AAA' : [4,5,6,7], 'BBB' : [10,20,30,40],'CCC' : [100,50,-30,-50]})
df.plot(x=["AAA", "BBB"], y=["CCC"], kind="scatter", subplots=True, sharey=True)

Solution

  • One way would be to create the subplots externally and loop over the column names, creating a plot for each one of them.

    import pandas as pd
    import matplotlib.pyplot as plt
    
    df = pd.DataFrame({'AAA' : [4,5,6,7], 'BBB' : [10,20,30,40],'CCC' : [100,50,-30,-50]})
    
    fig, axes = plt.subplots(1, len(df.columns) - 1, sharey=True)
    
    # Plot the last column ("CCC") against each of the other columns
    for i, col in enumerate(df.columns[:-1]):
        df.plot(x=col, y="CCC", kind="scatter", ax=axes[i])
    
    plt.show()
    


    Another method that might work in pandas 0.19 is to use the subplots argument. According to the documentation:

    subplots : boolean, default False
    Make separate subplots for each column

    I interpret this to mean that the following should work; however, I haven't been able to test it.

    import pandas as pd
    import matplotlib.pyplot as plt
    
    df = pd.DataFrame({'AAA' : [4,5,6,7], 'BBB' : [10,20,30,40],'CCC' : [100,50,-30,-50]})
    
    df.plot(x=df.columns.values[:-1], y=["CCC" for _ in df.columns.values[:-1]], 
                                kind="scatter", subplots=True, sharey=True)
    plt.show()
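
For the 40-column case described in the question, a single row of 39 subplots would probably be too narrow to read. The loop approach from the first method can be extended to a grid of subplots. The following is only a sketch under stated assumptions: the helper name `scatter_grid`, the column names `col1` to `col40`, the random data, and the choice of 8 columns per row are all made up for illustration and are not part of the original answer.

```python
import math

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt


def scatter_grid(df, y_col, ncols=4):
    """Scatter-plot y_col against every other column on a grid of subplots.

    Hypothetical helper, written for illustration only.
    """
    x_cols = [c for c in df.columns if c != y_col]
    nrows = math.ceil(len(x_cols) / ncols)
    # squeeze=False keeps `axes` two-dimensional even for a single row or column
    fig, axes = plt.subplots(nrows, ncols, sharey=True, squeeze=False,
                             figsize=(3 * ncols, 2.5 * nrows))
    flat_axes = axes.flatten()
    # One scatter plot per X column, filling the grid row by row
    for ax, col in zip(flat_axes, x_cols):
        df.plot(x=col, y=y_col, kind="scatter", ax=ax)
    # Hide the leftover axes when the grid has more cells than X columns
    for ax in flat_axes[len(x_cols):]:
        ax.set_visible(False)
    fig.tight_layout()
    return fig, axes


# Made-up data: 100 rows and 40 columns, with the last column used as Y
rng = np.random.default_rng(0)
data = pd.DataFrame(rng.normal(size=(100, 40)),
                    columns=[f"col{i}" for i in range(1, 41)])
fig, axes = scatter_grid(data, y_col="col40", ncols=8)
```

Calling `plt.show()` afterwards displays the figure. Passing `squeeze=False` is a design choice so that the same flattening code works whether the grid ends up with one row or several.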