I have a dataset that tracks a position over time, along with some values that depend on that position, and I would like to use a seaborn jointplot to show this data. The plot looks like this:
Here is the code that makes it. I can't share the dataset, but this should give you an idea of what I'm doing.
h = sns.jointplot(data=None, x=dimerDistance, y=Orientation,
                  kind='hex', cmap="gnuplot", ratio=4,
                  marginal_ticks=False, marginal_kws=dict(bins=25, fill=False))
plt.suptitle('Orientation Factor - Distance Histogram of Dimer')
plt.tight_layout()
plt.xlabel('Distance [Angstrom]')
plt.ylabel('k')
I would like to pick a bin that is generated by the hexbin function and extract the values that occupy that bin. For example, at around x=25 and y=1.7 is the bin with the highest count according to the colormap. I want to go to that bin with highest count, find the x values and the array index of x that are in this bin, and find the k values based on their shared index. Or you might say, I imagine that there would be something that would look like
bin[z]=[x[index1],x[index2]....x[indexn]]
where z is the index of the bin with the highest count so that I can make a new bin
newbin=[y[index1],y[index2],...,y[indexn]]
Since this data is time-dependent, these indices would tell me the time frames in which the system falls into the bin, which would be very useful to know. I searched Stack Overflow and found a post that seemed helpful: "Getting information for bins in matplotlib histogram function".
Is there a way I can access the information I want, like in this post?
Seaborn doesn't return this type of data. But the hex jointplot works similarly to plt.hexbin. Both create a PolyCollection from which you can extract the values and the centers.
Here is an example of how the data can be extracted (and displayed):
import matplotlib.pyplot as plt
import seaborn as sns

penguins = sns.load_dataset('penguins')
g = sns.jointplot(data=penguins, x="bill_length_mm", y="bill_depth_mm",
                  kind='hex', cmap="gnuplot", ratio=4,
                  marginal_ticks=False, marginal_kws=dict(bins=25, fill=False))

# The hexagons are stored in a PolyCollection on the joint axes
values = g.ax_joint.collections[0].get_array()  # count per hexagon
ind_max = values.argmax()  # index of the fullest hexagon
xy_max = g.ax_joint.collections[0].get_offsets()[ind_max]  # its center

# Annotate the fullest hexagon on the plot
g.ax_joint.text(xy_max[0], xy_max[1],
                f" Max: {values[ind_max]:.0f}\n x={xy_max[0]:.2f}\n y={xy_max[1]:.2f}",
                color='lime', ha='left', va='bottom', fontsize=14, fontweight='bold')
g.ax_joint.axvline(xy_max[0], color='red')
g.ax_joint.axhline(xy_max[1], color='red')
plt.tight_layout()
plt.show()

print(f"The highest bin contains {values[ind_max]:.0f} values")
print(f" and has as center: x={xy_max[0]:.2f}, y={xy_max[1]:.2f}")
The highest bin contains 18 values
and has as center: x=45.85, y=14.78
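The PolyCollection only stores the counts and the centers, not which points went into which hexagon. To get the array indices asked about in the question, one option is to reassign every point to its nearest hexagon center yourself. The following is only a sketch under a few assumptions: the random x and y arrays stand in for dimerDistance and Orientation, gridsize=20 is arbitrary, and the distance formula (dx² + 3·dy² in grid units) mirrors how current matplotlib versions of hexbin appear to assign points, which is an implementation detail rather than a documented guarantee. It also relies on the default mincnt, so that empty hexagons are still present in the collection. With seaborn you would take hb = g.ax_joint.collections[0] instead of calling ax.hexbin.

```python
import numpy as np
import matplotlib.pyplot as plt

# Synthetic stand-ins for the real data (assumption, not the asker's dataset)
rng = np.random.default_rng(0)
x = rng.normal(25, 5, 2000)
y = rng.normal(1.7, 0.4, 2000)

fig, ax = plt.subplots()
hb = ax.hexbin(x, y, gridsize=20)  # with seaborn: hb = g.ax_joint.collections[0]

counts = np.asarray(hb.get_array())     # count per hexagon
centers = np.asarray(hb.get_offsets())  # (n_hexagons, 2) centers in data units

# Distinct center coordinates are half a lattice step apart in each direction
sx = 2 * np.median(np.diff(np.unique(centers[:, 0])))
sy = 2 * np.median(np.diff(np.unique(centers[:, 1])))

# Scaled squared distance from every point to every hexagon center
dx = (x[:, None] - centers[None, :, 0]) / sx
dy = (y[:, None] - centers[None, :, 1]) / sy
bin_of_point = np.argmin(dx**2 + 3 * dy**2, axis=1)  # hexagon index per point

ind_max = counts.argmax()
idx_in_max = np.flatnonzero(bin_of_point == ind_max)  # indices into x and y
x_in_bin = x[idx_in_max]
y_in_bin = y[idx_in_max]

print(f"{len(idx_in_max)} points fall into the fullest hexagon")
```

Here idx_in_max plays the role of index1 ... indexn from the question, so for time-ordered data it gives the frames in which the system sits in that bin. It is worth checking that len(idx_in_max) matches counts[ind_max] on your own data before relying on the result, since points lying exactly on a hexagon edge could be assigned differently.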