python pandas dataframe subplot

Plot specific dataframe columns as subplots via loop


I have a dataframe with columns including POSITION_x.VALUE, where x is 1 to 6. In a loop, I carry out some statistical analysis to create an OUTLIER_x column for each of them. I would like to plot POSITION_x.VALUE and OUTLIER_x for each position as one of six subplots in a single figure.

Each individual plot shows up in its own figure, and the main figure is left with blank subplots instead of having the plots added to it (7 figures, not 1). The code is structured like:

## Initialize stats & axes
SD = {}; LCL = {}; UCL = {}; ax = {}

for (name, values) in df.items():
    ## If the columnname includes '.VALUE'...
    if '.VALUE' in name:
        ## Get the position number
        pos = int(name[9])
        ## Add a moving range column for each position
        df['MR_'+str(pos)] = np.absolute(df[name].shift(1) - df[name])
        ## Create (temporary) mean of data & moving range
        Mean = df[name].mean()
        Mean_MR = df['MR_'+str(pos)].mean()
        ## Calculate StdDev, LCL and UCL for the VALUE column (using MR data to create StdDev)
        SD[pos-1] = Mean_MR/1.128
        LCL[pos-1] = df[name].mean() - SD[pos-1]*3
        UCL[pos-1] = df[name].mean() + SD[pos-1]*3
        ## Add conditional 'OUTLIER_x' column for outside of 3SD
        df['Outlier_' + str(pos)] = np.where((df[name] > UCL[pos-1]) | (df[name] < LCL[pos-1]), df[name], np.nan)
        ## Create axis for each VALUE & OUTLIER combination
        ax[pos-1].plot(use_index = True, y = [df[name], df['Outlier_' + str(pos)]], style = ['o', 'o'], color = ['blue', 'red'], markersize = 2)

fig, ax = plt.subplots(6, 1, sharex=True)

plt.show()

Each plot displays correctly on its own (image: Individual subplot), except that I don't want to plot them individually.

Then the main figure appears, but with no subplots (image: Empty figure).


Solution

  • Solved. The solution was to:

    1. Move fig, ax = plt.subplots(... to above the loop (thank you @Lfppfs) and rename ax to axes
    2. Change the ax[pos-1] command to
      df.plot(ax=axes[pos-1], use_index = True, y = [name, 'Outlier_' + str(pos)], style = ['x', 'o'], color = ['blue', 'red'], markersize = 2)
      

    The full code is now:

    import matplotlib.pyplot as plt
    import numpy as np

    ## Initialize stats (df is the existing dataframe)
    SD = {}; LCL = {}; UCL = {}
    ## Create the figure and its six axes once, before the loop
    fig, axes = plt.subplots(6, 1, sharex=True)

    for (name, values) in df.items():
        ## If the columnname includes '.VALUE'...
        if '.VALUE' in name:
            ## Get the position number
            pos = int(name[9])
            ## Replace '-' with null and convert to float
            df[name] = df[name].replace({'-': np.nan}).astype(float)
            ## Add a moving range column for each position
            df['MR_'+str(pos)] = np.absolute(df[name].shift(1) - df[name])
            ## Create (temporary) mean of data & moving range
            Mean = df[name].mean()
            Mean_MR = df['MR_'+str(pos)].mean()
            ## Calculate StdDev, LCL and UCL for the VALUE column (using MR data to create StdDev)
            SD[pos-1] = Mean_MR/1.128
            LCL[pos-1] = Mean - SD[pos-1]*3
            UCL[pos-1] = Mean + SD[pos-1]*3
            ## Add conditional 'OUTLIER_x' column for outside of 3SD
            df['Outlier_' + str(pos)] = np.where((df[name] > UCL[pos-1]) | (df[name] < LCL[pos-1]), df[name], np.nan)
            ## Draw each VALUE & OUTLIER combination on its own existing axis
            df.plot(ax=axes[pos-1], use_index = True, y = [name, 'Outlier_' + str(pos)], style = ['x', 'o'], color = ['blue', 'red'], markersize = 2)

    plt.show()
    

    Note: ax[index].plot(... has now changed to df.plot(ax=axes[index]... because df.plot only draws on an existing subplot when that subplot is passed through the ax argument. This now gives the desired result, although a couple of things still need tweaking (image: Figure with six subplots from dataframe columns).
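
For anyone who wants to try the approach without the original data, here is a minimal self-contained sketch. It is not the exact code above: the synthetic data, the plot_positions helper name, and the Agg backend line are assumptions added for illustration, and it uses Series.diff and Series.where in place of shift and np.where. It assumes the same POSITION_x.VALUE column naming with single-digit x.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd


def plot_positions(df, n_positions=6):
    """Plot each POSITION_x.VALUE column and its outliers on its own subplot.

    Hypothetical helper; adds an Outlier_x column to df for each position.
    """
    # Create the figure and all axes once, before any plotting
    fig, axes = plt.subplots(n_positions, 1, sharex=True)
    limits = {}
    # Iterate over a snapshot of the names, since columns are added in the loop
    for name in [c for c in df.columns if c.endswith(".VALUE")]:
        pos = int(name[len("POSITION_")])  # single-digit position number
        moving_range = df[name].diff().abs()
        sd = moving_range.mean() / 1.128  # StdDev estimated from the mean moving range
        mean = df[name].mean()
        lcl, ucl = mean - 3 * sd, mean + 3 * sd
        limits[pos] = (lcl, ucl)
        outlier = f"Outlier_{pos}"
        # Keep the value where it is outside the limits, NaN elsewhere
        df[outlier] = df[name].where((df[name] > ucl) | (df[name] < lcl))
        # Passing ax= makes pandas draw on the existing subplot
        df.plot(ax=axes[pos - 1], use_index=True, y=[name, outlier],
                style=["x", "o"], color=["blue", "red"], markersize=2)
    return fig, axes, limits


# Synthetic data (made up for this sketch), with one obvious outlier in position 3
rng = np.random.default_rng(0)
data = pd.DataFrame(
    {f"POSITION_{i}.VALUE": rng.normal(10.0, 1.0, 50) for i in range(1, 7)}
)
data.loc[25, "POSITION_3.VALUE"] = 100.0

plt.close("all")
fig, axes, limits = plot_positions(data)
```

After this runs, plt.get_fignums() should list a single figure, which is the point of passing ax: without it, each df.plot call opens a figure of its own. In an interactive session, drop the matplotlib.use("Agg") line and call plt.show(), or save the result with fig.savefig.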