So I have a variable as below.
var <- c(0L, 5L, 4L, 115L, 0L, 0L, 0L, 2L, 365L, 4L, 20L, 61L, 365L,
0L, 365L, 0L, 14L, 0L, 0L, 72L, 0L, 0L, 6L, 105L, 150L, 0L, 365L,
0L, 1L, 28L, 161L, 6L, 0L, 2L, 12L, 0L, 10L, 49L, 7L, 2L, 51L,
0L, 0L, 11L, 0L, 0L, 17L, 0L, 0L, 7L, 0L, 28L, 0L, 0L, 0L, 44L,
0L, 3L, 0L, 0L, 0L, 1L, 1L, 0L, 4L, 87L, 0L, 321L, 0L, 0L, 0L,
0L, 9L, 0L, 0L, 0L, 140L, 0L, 0L, 0L, 0L, 0L, 1L, 8L, 20L, 0L,
4L, 14L, 3L, 0L, 0L, 0L, 39L, 4L, 9L, 0L, 0L, 0L, 1L, 7L)
I want to create bins of different sizes (or the same size, it doesn't matter) to categorize this variable and plot it as a bar chart.
I know it's possible to find automatic/recommended binning, but I am unsure how to do so in R.
I tried using the bin() function, to no avail. I read about the Jenks method as well, but is there a way to create the best possible bins in R?
I would like to use the result to draw a bar plot in ggplot.
Your description sounds like you want to plot a histogram of var. This can be done easily enough in ggplot using geom_histogram. The key here is that ggplot likes to have a data frame, so you just have to put your variable in a data frame first, which you can do inside the ggplot() function:
library(ggplot2)
ggplot(data.frame(var), aes(var)) + geom_histogram(color='black', alpha=0.2)
Gives you this:
The default is to use 30 bins, but you can specify either the number of bins via bins= or the width of the bins via binwidth=:
ggplot(data.frame(var), aes(var)) + geom_histogram(bins=10, color='black', alpha=0.2)
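If you would rather not pick the number of bins by hand, base R ships a few rule-of-thumb helpers that suggest one. The sketch below is only an illustration: x is a short stand-in for var (its first 13 values), and whether any of these rules gives the "best" bins for your data is a judgment call. The resulting count could then be passed to bins=.

```r
# Stand-in data: the first 13 values of var (assumption, for brevity)
x <- c(0L, 5L, 4L, 115L, 0L, 0L, 0L, 2L, 365L, 4L, 20L, 61L, 365L)

# Rule-of-thumb bin counts from base R (grDevices)
n_sturges <- nclass.Sturges(x)  # ceiling(log2(n) + 1)
n_fd      <- nclass.FD(x)       # Freedman-Diaconis rule, based on the IQR

# hist() can apply the same rules and return the break points without plotting
fd_breaks <- hist(x, breaks = "FD", plot = FALSE)$breaks

n_sturges
n_fd
fd_breaks
```

Note that hist() treats the suggested count as a hint and rounds the break points to "pretty" values, so the number of bins you get may differ slightly from the rule's output.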
If you want to plot the basic bar geom, then geom_histogram() works just fine. If you switch to the stat_bin() function instead, it performs the same binning, but you can then use a different geom if you want to:
ggplot(data.frame(var), aes(var)) +
stat_bin(geom='area', bins=10, alpha=0.2, color='black')
If you're looking to grab just the numbers/data from "binning" a variable like you have, one of the simplest ways might be to use cut(), which is part of base R (no extra package needed).
Use of cut() is pretty simple. You specify the vector and a breaks= argument. Breaks can be given as a vector of points where you want to "cut" (or "bin") your data, or you can just set breaks=10 and it will split the range of the data into 10 intervals of equal width. The result is a factor whose levels correspond to the range covered by each interval. In the case of var with breaks=10, you get the following:
> var_cut <- cut(var, breaks = 10)
> levels(var_cut)
[1] "(-0.365,36.5]" "(36.5,73]" "(73,110]" "(110,146]" "(146,182]" "(182,219]" "(219,256]"
[8] "(256,292]" "(292,328]" "(328,365]"
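Since the question asks for bins of different sizes and a bar plot, here is a minimal sketch of cut() with hand-picked breaks feeding geom_bar(). The break points and labels are arbitrary choices made for illustration, and x is again a short stand-in for var (its first 13 values).

```r
library(ggplot2)

# Stand-in data: the first 13 values of var (assumption, for brevity)
x <- c(0L, 5L, 4L, 115L, 0L, 0L, 0L, 2L, 365L, 4L, 20L, 61L, 365L)

# Unequal-width bins; right = FALSE makes each interval [lower, upper)
# These break points and labels are illustrative, not recommended values
x_cut <- cut(x,
             breaks = c(0, 1, 8, 31, 366),
             labels = c("0", "1-7", "8-30", "31-365"),
             right  = FALSE)

# Counts per bin
table(x_cut)

# Bar chart of the binned variable
p <- ggplot(data.frame(bin = x_cut), aes(bin)) +
  geom_bar(color = 'black', alpha = 0.2)
p
```

Because the bins are now factor levels rather than a continuous axis, geom_bar() draws one bar per bin regardless of how wide each interval is.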