I'm working with a rather large dataset with x, y, and z values. I plot x and y as a scatter plot, with z mapped to the colorbar. There are 24 distinct columns, and each column has ~20000 points. I'm trying to determine the dominant z value in relation to the y value, but I don't want to be misled by the clear coloring in the resulting image. Given that there are so many markers in one column, I want to know how Matplotlib decides which markers are drawn on top of others.
This may be hard to visualize, so here is an image of my code and output. If we look at hour ~24, we see predominantly low-elevation coloring, but I don't want to assume that the high-elevation values aren't simply being covered by the low-elevation ones. Is it wrong to conclude that low elevation is dominant in that time slot, or is there something I should try to make this clear? Remember that there are about 20000 points in that single column, so the possibility of points being hidden is nonzero.
I haven't found a clear-cut answer on this, so I would greatly appreciate any help.
It appears that the essence of your question is in your statement, "I'm trying to determine dominant z value in relation to the y value." This is a question of the relative frequency of z values for any given y value (implicitly for a fixed x value). Also, the size of your dataset fundamentally limits how confidently you can read z values off a colorbar scheme, because dense markers overlap and hide one another.
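To make "relative frequency of z for a given y" concrete, here is a minimal sketch that, for one x column, counts points in a (y, z) grid with NumPy and reports the most frequent z bin in each y bin. The function name `dominant_z_per_y`, the bin edges, and the synthetic data (low z below y = 50, high z above) are all hypothetical stand-ins for your real column.

```python
import numpy as np

def dominant_z_per_y(y, z, y_bins, z_bins):
    """Count points on a (y, z) grid and return, for each y bin, the center
    of the z bin with the most points (NaN where a y bin is empty)."""
    counts, y_edges, z_edges = np.histogram2d(y, z, bins=[y_bins, z_bins])
    z_centers = 0.5 * (z_edges[:-1] + z_edges[1:])
    dominant = z_centers[np.argmax(counts, axis=1)]
    dominant[counts.sum(axis=1) == 0] = np.nan  # no data in this y bin
    return counts, dominant

# Hypothetical data for a single x column (~20000 points).
rng = np.random.default_rng(0)
n = 20000
y = rng.uniform(0, 100, n)
z = np.where(y < 50, rng.normal(200, 50, n), rng.normal(2000, 100, n))
z = np.clip(z, 0, 2500)  # keep every value inside the z bin range

y_bins = np.linspace(0, 100, 11)   # 10 y bins
z_bins = np.linspace(0, 2500, 26)  # 25 z bins, 100 units wide
counts, dominant = dominant_z_per_y(y, z, y_bins, z_bins)
print(dominant)
```

Because this counts every point, it is unaffected by which markers happen to be drawn on top.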
My suggestion is, for each x value, to generate a 2D histogram of y and z that shows the dominant z for any given y. You can use hist2d to generate a color-coded 2D histogram. Or, if you prefer a "3D" display of the same kind of data, you can make a 3D bar graph.
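A minimal sketch of the hist2d approach, assuming x holds the hour of each point (so each distinct x value is one column). The helper `plot_yz_histograms`, the choice of hours 0, 12, and 23, the 30 bins, and the synthetic data are illustrative only. Each panel shows y on the horizontal axis, z on the vertical axis, and the number of points per cell as color.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; remove for interactive use
import matplotlib.pyplot as plt
import numpy as np

def plot_yz_histograms(x, y, z, x_values, bins=30):
    """One hist2d panel of (y, z) per selected x value; returns the figure
    and the count arrays so the numbers can be inspected directly."""
    fig, axes = plt.subplots(1, len(x_values), figsize=(4 * len(x_values), 4),
                             squeeze=False)
    all_counts = []
    for ax, xv in zip(axes[0], x_values):
        mask = x == xv  # select one column
        counts, _, _, image = ax.hist2d(y[mask], z[mask], bins=bins, cmap="viridis")
        ax.set_title(f"x = {xv}")
        ax.set_xlabel("y")
        ax.set_ylabel("z")
        fig.colorbar(image, ax=ax, label="count")
        all_counts.append(counts)
    fig.tight_layout()
    return fig, all_counts

# Hypothetical data: 24 hourly columns of 2000 points each.
rng = np.random.default_rng(1)
per_col = 2000
x = np.repeat(np.arange(24), per_col)
y = rng.uniform(0, 100, x.size)
z = rng.gamma(2.0, 300.0, x.size)

fig, all_counts = plot_yz_histograms(x, y, z, x_values=[0, 12, 23])
fig.savefig("yz_histograms.png")
```

Counting points per cell sidesteps overplotting entirely, at the cost of one figure panel per x value.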
Obviously, this method has the downside of increasing the dimensionality of your display by one. That may not be acceptable, but by examining the results for a few x values, you can probably answer your original question, namely, whether the colorbar is a valid indicator of z-dominance.
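As a complementary check on the draw-order part of the question: within a single scatter call, Matplotlib draws markers in the order they appear in the input arrays, so later points cover earlier ones. One quick test is to draw the same data twice, once with low z last and once with high z last, using a shared color normalization. If the two panels look very different, the apparent dominance in the original plot is largely an artifact of overplotting. The helper `draw_order_check`, the marker size, and the synthetic data are hypothetical.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; remove for interactive use
import matplotlib.pyplot as plt
from matplotlib.colors import Normalize
import numpy as np

def draw_order_check(x, y, z, s=4):
    """Plot the same points twice with opposite z-based draw orders."""
    fig, (ax_lo, ax_hi) = plt.subplots(1, 2, figsize=(10, 4), sharey=True)
    norm = Normalize(z.min(), z.max())  # identical color scale in both panels
    collections = []
    for ax, order, title in [
        (ax_lo, np.argsort(-z), "low z drawn last (on top)"),
        (ax_hi, np.argsort(z), "high z drawn last (on top)"),
    ]:
        sc = ax.scatter(x[order], y[order], c=z[order], s=s, norm=norm, cmap="viridis")
        ax.set_title(title)
        ax.set_xlabel("x")
        collections.append(sc)
    ax_lo.set_ylabel("y")
    fig.colorbar(collections[-1], ax=[ax_lo, ax_hi], label="z")
    return fig, collections

# Hypothetical data standing in for the real dataset.
rng = np.random.default_rng(2)
n = 5000
x = rng.integers(0, 24, n).astype(float)
y = rng.uniform(0, 100, n)
z = rng.uniform(0, 3000, n)

fig, (sc_lo, sc_hi) = draw_order_check(x, y, z)
fig.savefig("draw_order_check.png")
```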