I have a data frame. I would like a frequency table created that shows the bin frequency by "Group". If there is a bin with 0 entities, I want it to show that there are 0 entities in that bin.
If I use the table()
function, I get a frequency count of all bins in my data frame, but not by "Group". It also does not tell me that, for example, I do not have any rows within Group 1 Bin 3. I also looked into tabulate()
but that doesn't seem to be exactly what I need either. Somehow I need to tell it what the set of possible bins actually are.
Here is some example code.
df = as.data.frame(rbind(c(1,1.2), c(1,1.4), c(1,2.1), c(1,2.5), c(1,2.7), c(1,4.1), c(2,1.6), c(2,4.5), c(2,4.3), c(2,4.8), c(2,4.9)))
colnames(df) = c("Group", "Value")
df.in = split(df, df$Group)
FindBin = function(df){
maxbin = max(ceiling(df$Value),na.rm=TRUE)+1 #what is the maximum bin value.
bin = seq(from=0, to=maxbin, by=1) #Specify your bins: 0 to the maximum value by increments of 1
df$bin_index = findInterval(df$Value, bin, all.inside = TRUE) #Determine which bin the value is in
return(df)
}
df.out = lapply(names(df.in), function(x) FindBin(df.in[[x]]))
df.out2 = do.call(rbind.data.frame, df.out) #Row bind the list of dataframes to one dataframe
The output of the df.out2 looks like this:
Group Value bin_index
1 1 1.2 2
2 1 1.4 2
3 1 2.1 3
4 1 2.5 3
5 1 2.7 3
6 1 4.1 5
7 2 1.6 2
8 2 4.5 5
9 2 4.3 5
10 2 4.8 5
11 2 4.9 5
In addition to the output above, I'd like a summary output of my results that looks something like this:
Group Bin Freq
1 1 0
1 2 2
1 3 3
1 4 0
1 5 1
2 1 0
2 2 1
2 3 0
2 4 0
2 5 4
Any ideas?
Doesn't table
do what you want for your first question:
df$bin_index <- factor(df$bin_index, levels=1:5)
table(df[, c("Group", "bin_index")])
# bin_index
# Group 1 2 3 4 5
# 1 0 2 3 0 1
# 2 0 1 0 0 4
It shows the 0
entry for bin 3, group 2 (I presume that's what you meant, there are rows for bin 3 in group 1). Also, by setting the factor levels I was also able to get bin_index 1 to show up. For your second question, just use melt
:
library(reshape2)
melt(table(df[, c("Group", "bin_index")]))
# Group bin_index value
# 1 1 1 0
# 2 2 1 0
# 3 1 2 2
# 4 2 2 1
# 5 1 3 3
# 6 2 3 0
# 7 1 4 0
# 8 2 4 0
# 9 1 5 1
# 10 2 5 4