Updated my question. See below.
I have a scatter plot with a lot of noise. I only want to plot the points above a density threshold.
I calculated the density of the points with gaussian_kde, but I don't know how to apply the threshold. I tried masking the points, but it doesn't work:
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import gaussian_kde
thresh = 10
x = x_data
y = y_data
xy = np.vstack([x, y])
z = gaussian_kde(xy)(xy)  # KDE density evaluated at every data point
x1 = np.ma.masked_where(z > thresh, x)  # mask points above threshold
y1 = np.ma.masked_where(z > thresh, y)  # mask points above threshold
fig, ax = plt.subplots()
ax.scatter(x1, y1, c=z, s=10)
I expected a plot with less noise, but nothing changes when I plot x1 and y1. I only want to see the points with high density.
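Two things in the attempt above would explain the unchanged plot. `np.ma.masked_where(condition, a)` hides the entries where `condition` is true, so `z > thresh` hides the dense points instead of the noise. And `gaussian_kde` returns probability density values, which integrate to 1 over the plane; for points spread over roughly 0 to 360 on both axes they are tiny (the thresholds picked later in this question are between 4.8e-6 and 3e-5), so `z > 10` is never true and nothing gets masked at all. The sketch below uses synthetic data (a dense blob plus uniform noise) as a stand-in for `x_data`/`y_data`, and a median threshold that is purely a hypothetical choice:

```python
import numpy as np
from scipy.stats import gaussian_kde

# Synthetic stand-in for x_data / y_data (assumption): a dense cluster plus uniform noise
rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(200, 10, 500), rng.uniform(0, 360, 500)])
y = np.concatenate([rng.normal(280, 10, 500), rng.uniform(0, 360, 500)])

xy = np.vstack([x, y])
z = gaussian_kde(xy)(xy)  # density estimate evaluated at every point

print(z.min(), z.max())  # both far below 10

# z > 10 is never true, so this masks nothing and the plot cannot change
unchanged = np.ma.masked_where(z > 10, x)
print(np.ma.count_masked(unchanged))  # 0

# To keep the dense points, mask where the density is BELOW the threshold
thresh = np.percentile(z, 50)  # hypothetical choice: keep the denser half
x1 = np.ma.masked_where(z < thresh, x).compressed()
y1 = np.ma.masked_where(z < thresh, y).compressed()

# Equivalent and simpler: plain boolean indexing
keep = z >= thresh
assert np.array_equal(x1, x[keep]) and np.array_equal(y1, y[keep])
```

If you plot the compressed arrays and colour by density, filter `z` the same way, e.g. `ax.scatter(x1, y1, c=z[keep], s=10)`; otherwise the colour array no longer matches the number of points.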
To reduce the noise, I am trying to cluster the points based on their density. The density was calculated with gaussian_kde.
I made a 3D scatter plot to estimate the thresholds to separate the clusters.
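import numpy as np
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D  # registers projection='3d' on older Matplotlib versions
from scipy.stats import gaussian_kde
x = x_data
y = y_data
xy = np.vstack([x, y])
z = gaussian_kde(xy)(xy)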
cI_t = 0.0000059
cI_x = np.ma.masked_where(z < cI_t, x).compressed()
cI_y = np.ma.masked_where(z < cI_t, y).compressed()
cII_t = 0.0000165
cII_x = np.ma.masked_where(z < cII_t, x).compressed()
cII_y = np.ma.masked_where(z < cII_t, y).compressed()
cII_above = cII_y >= 252  # compute the y condition once, after cII_y exists
cII_x_1 = cII_x[cII_above]
cII_y_1 = cII_y[cII_above]
cIII_t = 0.0000048
cIII_x = np.ma.masked_where(z < cIII_t, x).compressed()
cIII_y = np.ma.masked_where(z < cIII_t, y).compressed()
cIV_t = 0.00003
cIV_x = np.ma.masked_where(z < cIV_t, x).compressed()
cIV_y = np.ma.masked_where(z < cIV_t, y).compressed()
# 3D Density plot
fig = plt.figure()
ax = fig.add_subplot(111, projection='3d')
ax.scatter(x, y, z)
plt.show()
# Scatter plot cII and cIV
fig2, ax2 = plt.subplots()
#ax2.scatter(cI_x, cI_y)
ax2.scatter(cII_x, cII_y)
#ax2.scatter(cIII_x, cIII_y)
ax2.scatter(cIV_x, cIV_y)
ax2.axhline(y=255)
ax2.set_xlim(0, 360)
ax2.set_ylim(0, 360)
plt.show()
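The four threshold selections repeat the same `masked_where(...).compressed()` pair. A small helper could replace them; `select_dense` below is a hypothetical name, and the tiny arrays are made up for illustration. Because each selection keeps everything at or above its threshold, the selections are nested: cII (1.65e-5) also contains every cIV point (3e-5). The optional upper bound turns a threshold into a density band if you want the clusters to be disjoint:

```python
import numpy as np

def select_dense(x, y, z, lo, hi=np.inf):
    """Hypothetical helper: return the x and y values of points with lo <= z < hi."""
    keep = (z >= lo) & (z < hi)
    return x[keep], y[keep]

# Thresholds copied from the question
thresholds = {"cI": 0.0000059, "cII": 0.0000165, "cIII": 0.0000048, "cIV": 0.00003}

# Tiny made-up arrays standing in for x, y and the KDE values z
x = np.array([10.0, 200.0, 210.0, 300.0])
y = np.array([20.0, 260.0, 270.0, 100.0])
z = np.array([1e-6, 2e-5, 4e-5, 5e-6])

# Same result as the masked_where(...).compressed() pairs: everything at or above each threshold
clusters = {name: select_dense(x, y, z, t) for name, t in thresholds.items()}

# A density band instead: cII points that are not also cIV points
cII_only_x, cII_only_y = select_dense(x, y, z, thresholds["cII"], thresholds["cIV"])
```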
But now I need to select only the top blue points from the cII cluster. Is there a way to select only the points above the blue line? (Ignore the orange dots; they are the cIV cluster.)
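For a horizontal line, one way is to build a single boolean condition from the y values and apply it to both arrays, which keeps each (x, y) pair together. The coordinates below are made up; the cut at 252 follows the question's code, while the plotted line sits at y=255:

```python
import numpy as np

# Made-up cII coordinates for illustration
cII_x = np.array([150.0, 170.0, 200.0, 220.0, 300.0])
cII_y = np.array([100.0, 260.0, 240.0, 300.0, 50.0])

line_y = 252                   # cut taken from the question's code
above = cII_y >= line_y        # one boolean mask computed from y ...
cII_x_top = cII_x[above]       # ... applied to both arrays keeps the pairs aligned
cII_y_top = cII_y[above]
```

For a sloped line y = m*x + b, the same idea applies with `above = cII_y > m * cII_x + b`.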
Solution:
Example for cluster cII: I made a pandas DataFrame from the x and y data and then selected the points based on the values read off the scatter plot.
import pandas as pd
cII_t = 0.0000165
cII_x = np.ma.masked_where(z < cII_t, x).compressed()
cII_y = np.ma.masked_where(z < cII_t, y).compressed()
cII_df = pd.DataFrame({"x": cII_x, "y": cII_y})
# keep only the box read off the plot: 166 <= x <= 227 and 252 <= y <= 336
cII_df = cII_df[(cII_df["x"] >= 166) & (cII_df["x"] <= 227) & (cII_df["y"] >= 252) & (cII_df["y"] <= 336)]
cII_x = cII_df["x"]
cII_y = cII_df["y"]
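A possible variation, sketched here with made-up rows, is to put the KDE values into the DataFrame as well, so the density threshold and the box are applied in one filter and the rows stay aligned automatically. `Series.between` expresses each range check; it includes both ends by default:

```python
import pandas as pd

# Made-up data standing in for x, y and the KDE values z
df = pd.DataFrame({
    "x": [170.0, 200.0, 230.0, 180.0],
    "y": [260.0, 300.0, 280.0, 200.0],
    "z": [2e-5, 1e-5, 3e-5, 2e-5],
})

cII_t = 0.0000165
cII_df = df[
    (df["z"] >= cII_t)              # density threshold
    & df["x"].between(166, 227)     # box limits from the question, inclusive
    & df["y"].between(252, 336)
]
cII_x = cII_df["x"].to_numpy()
cII_y = cII_df["y"].to_numpy()
```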
The final plot: