Tags: python, matplotlib, dot-plot

Frequency plot using dots instead of bars?


I'm trying to create the chart in this question, using this answer. I'm open to any solution that works.

Visual borrowed from the original question: [image]

The difference from that question is that I've already calculated my bins and frequency values, so I don't use numpy or matplotlib to compute them.

Here's my sample data; I refer to it as df_fd in my sample code below:

     low_bin   high_bin  frequency
0  13.142857  18.857143          3
1  18.857143  24.571429          5
2  24.571429  30.285714          8
3  30.285714  36.000000          8
4  36.000000  41.714286          7
5  41.714286  47.428571          7
6  47.428571  53.142857          1
7  53.142857  58.857143          1

Based on the cited question, here's my code (df_fd is the DataFrame above):

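import numpy as np
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
# align='edge' puts each bar's left edge at low_bin instead of centring the bar on it
ax.bar(df_fd.low_bin, df_fd.frequency, width=df_fd.high_bin - df_fd.low_bin, align='edge')
X, Y = np.meshgrid(bins, df_fd['frequency'])
Y = Y.astype(float)  # np.float was removed in NumPy 1.24; the builtin float is the same type
Y[Y > df_fd['frequency']] = np.nan  # <- this is the line that fails
plt.scatter(X, Y)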

The statement Y[Y>df_fd['frequency']] = np.nan is what fails, and I don't know how to get around it. I understand what it's trying to do; my best guess is that somehow mapping the matrix index to the DataFrame index would help, but I'm not sure how to do that.
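A sketch of how the mask can be made to work, assuming the goal is the cited answer's layout: one column of dots per bin, with heights running from 1 up to that bin's frequency. The likely problem is that meshgrid is given the frequencies themselves, so every row of Y is a constant, and the mask then compares a 2-D array with a pandas Series, which does not broadcast the way two NumPy arrays do. Building Y from the candidate heights 1..max(frequency) and comparing it with a plain NumPy array of frequencies lets NumPy broadcast the comparison column by column. Using the bin centres as x positions is an assumption, since `bins` is not defined in the snippet above.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Sample data from the question
df_fd = pd.DataFrame({
    "low_bin": [13.142857, 18.857143, 24.571429, 30.285714,
                36.000000, 41.714286, 47.428571, 53.142857],
    "high_bin": [18.857143, 24.571429, 30.285714, 36.000000,
                 41.714286, 47.428571, 53.142857, 58.857143],
    "frequency": [3, 5, 8, 8, 7, 7, 1, 1],
})

# Assumption: each column of dots sits at the centre of its bin
centers = ((df_fd["low_bin"] + df_fd["high_bin"]) / 2).to_numpy()
freq = df_fd["frequency"].to_numpy()  # plain ndarray, shape (n_bins,)

# Rows = candidate dot heights 1..max(freq); columns = bins
X, Y = np.meshgrid(centers, np.arange(1, freq.max() + 1))
Y = Y.astype(float)       # float dtype so NaN can be stored
Y[Y > freq] = np.nan      # (n_heights, n_bins) vs (n_bins,) broadcasts per column

fig, ax = plt.subplots()
ax.bar(df_fd["low_bin"], freq, width=df_fd["high_bin"] - df_fd["low_bin"],
       align="edge", alpha=0.3, edgecolor="k")
ax.scatter(X, Y)          # non-finite (NaN) points are masked and not drawn
```

Call plt.show() or fig.savefig(...) to view the result. Only the heights up to each bin's frequency survive the mask, so each bar gets exactly as many dots as its count.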

Thank you for helping me!


Solution

  • One hacky solution using a scatter plot:

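    import numpy as np
    import matplotlib.pyplot as plt

    # df_fd is the DataFrame from the question
    (df_fd.assign(bin=np.mean([df_fd['low_bin'], df_fd['high_bin']], axis=0))  # bin centre as x
       .loc[lambda d: d.index.repeat(d['frequency'])]      # one row per dot
       .assign(Y=lambda d: d.groupby(level=0).cumcount())  # 0, 1, 2, ... within each bin
       .plot.scatter(x='bin', y='Y', s=600)
    )
    plt.show()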
    

    It works by taking the mean of low_bin and high_bin as the x value, repeating each row as many times as its "frequency" value, and numbering the repeated rows within each bin with groupby(level=0).cumcount() to get the y value.

    Output:

    [image of the resulting plot]
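For comparison, the same repeat-and-number idea can be written with plain NumPy and Matplotlib, which may be easier to follow step by step. This is a sketch that hard-codes the question's table as arrays (the names `low`, `high`, `freq` are illustrative); like the cumcount version, dot heights start at 0.

```python
import numpy as np
import matplotlib.pyplot as plt

# Bin edges and counts copied from the question's table
low = np.array([13.142857, 18.857143, 24.571429, 30.285714,
                36.000000, 41.714286, 47.428571, 53.142857])
high = np.array([18.857143, 24.571429, 30.285714, 36.000000,
                 41.714286, 47.428571, 53.142857, 58.857143])
freq = np.array([3, 5, 8, 8, 7, 7, 1, 1])

centers = (low + high) / 2  # x position of each column of dots

# One entry per dot: repeat each centre `freq` times (like index.repeat) ...
x = np.repeat(centers, freq)
# ... and number the dots within each bin 0, 1, ..., freq-1 (like cumcount)
y = np.concatenate([np.arange(n) for n in freq])

fig, ax = plt.subplots()
ax.scatter(x, y, s=600)
```

Adding 1 to `y` would make the top dot of each column sit at the bin's frequency, which can make the y-axis easier to read.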