I'm trying to create the chart in this question, using this answer. I'm open to any solution that works.
Visual borrowed from original question:
The difference from that question is that I've already calculated my bins and frequency values, so I don't use numpy or matplotlib to do so.
Here's my sample data; I refer to it as df_fd in my sample code below:
     low_bin   high_bin  frequency
0  13.142857  18.857143          3
1  18.857143  24.571429          5
2  24.571429  30.285714          8
3  30.285714  36.000000          8
4  36.000000  41.714286          7
5  41.714286  47.428571          7
6  47.428571  53.142857          1
7  53.142857  58.857143          1
Based on the cited question, here's my code (df_fd is the DataFrame above):
fig, ax = plt.subplots()
ax.bar(df_fd.low_bin, df_fd.frequency, width= df_fd.high_bin-df_fd.low_bin)
X,Y = np.meshgrid(bins, df_fd['frequency'])
Y = Y.astype(np.float)
Y[Y>df_fd['frequency']] = np.nan
plt.scatter(X,Y)
The statement Y[Y>df_fd['frequency']] = np.nan is what fails, and I don't know how to get around it. I understand what it's trying to do, and my best guess is that somehow mapping the matrix index to the DataFrame index would help, but I'm not sure how to do that.
Thank you for helping me!
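For the meshgrid approach in the question, a minimal sketch of one way the mask could be made to work. It rests on a few assumptions that are not in the original post: the undefined `bins` is replaced by the bin centers, the Y grid is built from the dot levels 1..max(frequency) rather than from the frequencies themselves, and the comparison is done against a plain NumPy array instead of the pandas Series. `np.float` is also swapped for the builtin `float`, since that alias was removed in NumPy 1.24. The names `freq`, `centers` and `levels` are made up for illustration.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Sample data from the question
df_fd = pd.DataFrame({
    "low_bin": [13.142857, 18.857143, 24.571429, 30.285714,
                36.000000, 41.714286, 47.428571, 53.142857],
    "high_bin": [18.857143, 24.571429, 30.285714, 36.000000,
                 41.714286, 47.428571, 53.142857, 58.857143],
    "frequency": [3, 5, 8, 8, 7, 7, 1, 1],
})

# Plain NumPy arrays avoid pandas index alignment in the comparison below
freq = df_fd["frequency"].to_numpy()
centers = ((df_fd["low_bin"] + df_fd["high_bin"]) / 2).to_numpy()

# One candidate dot per (level, bin): rows are levels 1..max, columns are bins
levels = np.arange(1, freq.max() + 1)
X, Y = np.meshgrid(centers, levels)

# NaN needs a float array; np.float no longer exists in recent NumPy
Y = Y.astype(float)

# freq has one entry per column, so it broadcasts across every row of Y;
# any dot above its bin's frequency becomes NaN and is not drawn
Y[Y > freq] = np.nan

fig, ax = plt.subplots()
ax.scatter(X, Y)
```

In a script, a call to `plt.show()` or `fig.savefig(...)` would still be needed to see the figure.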
One hacky solution using a scatter plot:
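import numpy as np  # df is the DataFrame from the question (df_fd)

(df.assign(bin=np.mean([df['low_bin'], df['high_bin']], axis=0))  # bin midpoint as X
   .loc[lambda d: d.index.repeat(d['frequency'])]                  # one row per observation
   .assign(Y=lambda d: d.groupby(level=0).cumcount())              # 0, 1, 2, ... within each bin
   .plot.scatter(x='bin', y='Y', s=600)
)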
It works by taking the average of the low/high edges as the X value, then repeating each row as many times as its "frequency" value, and numbering the repeated rows within each bin with groupby.cumcount.
Output:
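To see what the repeat and cumcount steps contribute, here is a small sketch that stops before plotting and keeps the intermediate frame. It uses only the first two rows of the question's sample data, and the names `small` and `dots` are made up for illustration.

```python
import pandas as pd

# First two rows of the sample data from the question
small = pd.DataFrame({
    "low_bin": [13.142857, 18.857143],
    "high_bin": [18.857143, 24.571429],
    "frequency": [3, 5],
})

dots = (
    small.assign(bin=(small["low_bin"] + small["high_bin"]) / 2)  # bin midpoint
         .loc[lambda d: d.index.repeat(d["frequency"])]           # row 0 three times, row 1 five times
         .assign(Y=lambda d: d.groupby(level=0).cumcount())       # running count per original row
)

# Each original row now appears `frequency` times, with Y counting up from 0
print(dots[["bin", "frequency", "Y"]])
```

Because cumcount starts at 0, the lowest dot in each bin sits at Y = 0. Adding 1 to the cumcount would make the top dot's height equal the bin's frequency, if that reads better on the axis.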