Tags: python, matplotlib, boxplot, subplot, scatter

Python matplotlib - boxplot subplot + scatter plot


I am trying to draw a scatter point inside each boxplot of a subplot grid. For a single boxplot it works: I can place a specific point with a specific color inside the boxplot. The green dot (Image 1) represents a specific number in comparison with the boxplot values.

  count_G = count_L = count_E = 0  # points above / below / equal to the median
  for columnName in data_num.columns:
    plt.figure(figsize=(2, 2), dpi=100)
    bp = data_num.boxplot(column=columnName, grid=False)
    y = S[columnName]
    # row 5 of describe() is the 50% quantile, i.e. the median
    median = data_num[columnName].describe().iloc[5]
    # '.' instead of 'r.': the color comes from the color keyword
    if y > median:
      plt.plot(1, y, '.', alpha=0.7, color='green', markersize=12)
      count_G = count_G + 1
    elif y < median:
      plt.plot(1, y, '.', alpha=0.7, color='red', markersize=12)
      count_L = count_L + 1
    else:
      plt.plot(1, y, '.', alpha=0.7, color='yellow', markersize=12)
      count_E = count_E + 1

Image 1 - Scatter + 1 boxplot

I can create a subplot with boxplots.

  fig, axes = plt.subplots(6,10,figsize=(16,16)) # create figure and axes
  fig.subplots_adjust(hspace=0.6, wspace=1)

  for j,columnName in enumerate(list(data_num.columns.values)[:-1]):
    bp = data_num.boxplot(columnName,ax=axes.flatten()[j])

Image 2 - Subplots + Boxplots

But when I try to plot a specific number inside each boxplot, the point overwrites the entire subplot instead of being drawn on top of the boxplot.

  plt.subplot(6,10,j+1)
  if y > data_num[columnName].describe().iloc[5]:
    plt.plot(1, y, 'r.', alpha=0.7,color='green',markersize=12)
    count_G = count_G + 1
  elif y < data_num[columnName].describe().iloc[5]:
    plt.plot(1, y, 'r.', alpha=0.7,color='red',markersize=12)
    count_L = count_L + 1
  else:
    plt.plot(1, y, 'r.', alpha=0.7,color='black',markersize=12)
    count_E = count_E + 1

Image 3 - Subplots + scatter


Solution

  • It is not completely clear what is going wrong. The call to plt.subplot(6,10,j+1) is probably replacing or clearing the axes that already hold the boxplots. Such a call is not needed with the standard modern use of matplotlib, where the subplots are created via fig, axes = plt.subplots(). Be careful to use ax.plot() instead of plt.plot(): plt.plot() draws on the "current" axes, which is easy to lose track of when there are many subplots.
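
To illustrate the point about the "current" axes, here is a minimal sketch. It is not part of the original answer: the 2x3 grid and the plotted values are arbitrary, and the Agg backend is selected only so the sketch runs without a display. Right after plt.subplots(), pyplot's current axes is the last one created, so plt.plot() lands there, while ax.plot() targets exactly the axes it is called on.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless run; drop this line for interactive use
from matplotlib import pyplot as plt

fig, axes = plt.subplots(2, 3)

# plt.plot() draws on pyplot's "current" axes, here the last subplot created.
plt.plot(1, 2, '.', color='green', markersize=12)
on_last = len(axes[-1, -1].lines)   # lines on the bottom-right axes
on_first = len(axes[0, 0].lines)    # lines on the top-left axes

# ax.plot() draws on the axes it is called on, whatever the current axes is.
axes[0, 0].plot(1, 2, '.', color='red', markersize=12)
on_first_after = len(axes[0, 0].lines)

print(on_last, on_first, on_first_after)
```

Looping over the axes objects returned by plt.subplots() and calling ax.plot() on each one avoids having to track which axes is current.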

    The sample code below first creates some toy data (hopefully similar to the data in the question). Then the boxplots and the individual dots are drawn in a loop. To avoid repetition, the counts and the colors are stored in dictionaries. As data_num[columnName].describe().iloc[5] is the 50% quantile, i.e. the median, the code calculates the median directly for readability.

    from matplotlib import pyplot as plt
    import pandas as pd
    import numpy as np
    
    column_names = list('abcdef')
    S = {c: np.random.randint(2, 6) for c in column_names}
    data_num = pd.DataFrame({c: np.random.randint(np.random.randint(0, 3), np.random.randint(4, 8), 20)
                             for c in column_names})
    colors = {'G': 'limegreen', 'E': 'gold', 'L': 'crimson'}
    counts = {c: 0 for c in colors}
    
    fig, axes = plt.subplots(1, 6, figsize=(12, 3), gridspec_kw={'hspace': 0.6, 'wspace': 1})
    for columnName, ax in zip(data_num.columns, axes.flatten()):
        data_num.boxplot(column=columnName, grid=False, ax=ax)
        y = S[columnName]  # in case S would be a dataframe with one row: y = S[columnName].values[0]
        data_median = data_num[columnName].median()
        classification = 'G' if y > data_median else 'L' if y < data_median else 'E'
        ax.plot(1, y, '.', alpha=0.9, color=colors[classification], markersize=12)
        counts[classification] += 1
    print(counts)
    plt.show()
    

    example plot
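
As a quick check of the remark about describe().iloc[5], the following sketch uses made-up values (not taken from the question). It shows that for a numeric column the sixth row of describe() is the 50% quantile, which matches median().

```python
import pandas as pd

# Hypothetical sample values, chosen only for illustration.
values = pd.Series([1, 3, 4, 7, 10])
summary = values.describe()

# For numeric data the rows are: count, mean, std, min, 25%, 50%, 75%, max.
label = summary.index[5]
same = summary.iloc[5] == values.median()
print(label, same)
```

Calling median() directly states the intent and does not depend on the row order of describe().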