I am trying to draw a scatter point inside each boxplot of a grid of subplots. When I do it for just one boxplot, it works: I can place a specific point with a specific color inside the boxplot. The green ball (Image 1) represents a specific number compared with the boxplot values.
import matplotlib.pyplot as plt

# data_num (a DataFrame) and S (one reference value per column) come from my data
count_G = count_L = count_E = 0
for columnName in data_num.columns:
    plt.figure(figsize=(2, 2), dpi=100)
    bp = data_num.boxplot(column=columnName, grid=False)
    y = S[columnName]
    x = columnName
    if y > data_num[columnName].describe().iloc[5]:
        plt.plot(1, y, '.', alpha=0.7, color='green', markersize=12)
        count_G = count_G + 1
    elif y < data_num[columnName].describe().iloc[5]:
        plt.plot(1, y, '.', alpha=0.7, color='red', markersize=12)
        count_L = count_L + 1
    else:
        plt.plot(1, y, '.', alpha=0.7, color='yellow', markersize=12)
        count_E = count_E + 1
Image 1 - Scatter + 1 boxplot
I can also create a grid of subplots containing the boxplots.
fig, axes = plt.subplots(6, 10, figsize=(16, 16))  # create figure and axes
fig.subplots_adjust(hspace=0.6, wspace=1)
for j, columnName in enumerate(list(data_num.columns.values)[:-1]):
    bp = data_num.boxplot(columnName, ax=axes.flatten()[j])
Image 2 - Subplots + Boxplots
But when I try to plot a specific number inside each boxplot, it actually overwrites the entire plot.
# inside the loop above, with y = S[columnName] as in the first snippet
plt.subplot(6, 10, j + 1)
if y > data_num[columnName].describe().iloc[5]:
    plt.plot(1, y, '.', alpha=0.7, color='green', markersize=12)
    count_G = count_G + 1
elif y < data_num[columnName].describe().iloc[5]:
    plt.plot(1, y, '.', alpha=0.7, color='red', markersize=12)
    count_L = count_L + 1
else:
    plt.plot(1, y, '.', alpha=0.7, color='black', markersize=12)
    count_E = count_E + 1
Image 3 - Subplots + scatter
It is not completely clear what is going wrong. Probably the call to plt.subplot(6,10,j+1) is erasing some of what was already drawn. However, such a call is not necessary with the standard modern use of matplotlib, where the subplots are created via fig, axes = plt.subplots().

Be careful to use ax.plot() instead of plt.plot(): plt.plot() plots on the "current" ax, which can be confusing when there are many subplots.
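To make the "current ax" point concrete, here is a minimal sketch (using the non-interactive Agg backend and made-up line data, both assumptions chosen only so it runs anywhere). After plt.subplots(), the current axes is the last subplot created, so a bare plt.plot() lands there, while ax.plot() draws exactly on the axes it is called on.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend so the sketch runs without a display
import matplotlib.pyplot as plt

fig, axes = plt.subplots(1, 3)

# plt.plot() draws on the "current" axes; after plt.subplots()
# that is the last subplot created (the rightmost one here)
plt.plot([0, 1], [0, 1])

# ax.plot() draws on exactly the axes it is called on
axes[0].plot([0, 1], [1, 0])

print(plt.gca() is axes[-1])           # True
print([len(ax.lines) for ax in axes])  # [1, 0, 1]
```

So in a loop over many subplots, passing the axes explicitly avoids depending on which one happens to be current.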
The sample code below first creates some toy data (hopefully similar to the data in the question). Then the boxplots and the individual dots are drawn in a loop. To avoid repetition, the counts and the colors are stored in dictionaries. As data_num[columnName].describe().iloc[5] is the 50% percentile, i.e. the median, the code calculates that median directly for readability.
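The equivalence between describe().iloc[5] and the median can be checked directly: for a numeric Series, describe() returns count, mean, std, min, 25%, 50%, 75% and max in that order, so position 5 is the 50% percentile. The sketch below uses a small made-up Series for that check, then shows a vectorized variant of the G/L/E classification that compares every reference value with its column median at once. The toy data_num and S here are hypothetical, and this vectorized variant is an alternative, not the loop used in the answer.

```python
import numpy as np
import pandas as pd

# Hypothetical sample column, for illustration only
s = pd.Series([1, 2, 3, 4, 10])
desc = s.describe()
print(list(desc.index))            # ['count', 'mean', 'std', 'min', '25%', '50%', '75%', 'max']
print(desc.iloc[5] == s.median())  # True: position 5 is the 50% percentile

# Hypothetical data and reference values (one per column), mirroring data_num and S
data_num = pd.DataFrame({'a': [1, 2, 3], 'b': [4, 5, 6], 'c': [0, 7, 8]})
S = pd.Series({'a': 3, 'b': 5, 'c': 2})

# Compare each reference value with its column median in one step;
# the subtraction aligns S and the medians on the column names
diff = S - data_num.median()
labels = pd.Series(np.where(diff > 0, 'G', np.where(diff < 0, 'L', 'E')), index=diff.index)
counts = labels.value_counts().to_dict()
print(labels.to_dict())
print(counts)
```

The loop version remains more convenient when each point also has to be drawn on its own subplot; the vectorized form is mainly useful when only the counts are needed.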
from matplotlib import pyplot as plt
import pandas as pd
import numpy as np

column_names = list('abcdef')
S = {c: np.random.randint(2, 6) for c in column_names}
data_num = pd.DataFrame({c: np.random.randint(np.random.randint(0, 3), np.random.randint(4, 8), 20)
                         for c in column_names})

colors = {'G': 'limegreen', 'E': 'gold', 'L': 'crimson'}
counts = {c: 0 for c in colors}

fig, axes = plt.subplots(1, 6, figsize=(12, 3), gridspec_kw={'hspace': 0.6, 'wspace': 1})
for columnName, ax in zip(data_num.columns, axes.flatten()):
    data_num.boxplot(column=columnName, grid=False, ax=ax)
    y = S[columnName]  # in case S would be a dataframe with one row: y = S[columnName].values[0]
    data_median = data_num[columnName].median()
    classification = 'G' if y > data_median else 'L' if y < data_median else 'E'
    ax.plot(1, y, '.', alpha=0.9, color=colors[classification], markersize=12)
    counts[classification] += 1
print(counts)
plt.show()
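If the grid has more cells than there are columns to plot (the question's 6 x 10 grid has 60 cells), zip() stops at the last column and the leftover cells still show empty axes. One possible way to handle this, sketched below with hypothetical toy data (7 columns in a 2 x 4 grid) and the non-interactive Agg backend, is to hide every axes the loop did not reach with ax.set_visible(False).

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend; use plt.show() in an interactive session
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
column_names = list('abcdefg')  # 7 hypothetical columns for an 8-cell grid
data_num = pd.DataFrame({c: rng.integers(0, 8, 20) for c in column_names})
S = {c: int(rng.integers(2, 6)) for c in column_names}

colors = {'G': 'limegreen', 'E': 'gold', 'L': 'crimson'}
counts = {c: 0 for c in colors}

fig, axes = plt.subplots(2, 4, figsize=(10, 5), gridspec_kw={'hspace': 0.6, 'wspace': 1})
flat_axes = axes.flatten()
for columnName, ax in zip(data_num.columns, flat_axes):
    data_num.boxplot(column=columnName, grid=False, ax=ax)
    y = S[columnName]
    data_median = data_num[columnName].median()
    classification = 'G' if y > data_median else 'L' if y < data_median else 'E'
    ax.plot(1, y, '.', alpha=0.9, color=colors[classification], markersize=12)
    counts[classification] += 1

# zip() stopped after the last column; hide the grid cells that were not used
for ax in flat_axes[len(data_num.columns):]:
    ax.set_visible(False)

print(counts)
```

Hiding the axes keeps the grid layout intact; calling fig.delaxes(ax) instead would remove them from the figure entirely.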