I have a data frame with 3 columns and thousands of rows. I need to calculate the frequency of variable #3, and for that I used the table() function in R. The problem is that table() returns a separate two-column table (the variable whose frequency is being calculated and the calculated frequency) with no reference to the original data frame, so I can't relate the frequency results to the other columns of the data frame.
I've tried 2 approaches with no luck (either solution would work for me, though one may be more efficient than the other):
1) Adding an ID column that refers to the original data frame (at least for the first occurrence of the factor being considered for frequency) to the output of table(). In my case, this would be something like applying cbind() to the frequency table and column #1 of the original df, but this won't work because the two objects have different numbers of rows.
2) Adding a new column to the original data frame with the frequency of a specific column (I've tried mutate() with no luck either).
Some example data:
dfg <- data.frame(f=c(1,2,3,4,5),v1=c("a","b","b","c","c"),v2=c("3r","3r","3r","gh","y"))
dfg
  f v1 v2
1 1  a 3r
2 2  b 3r
3 3  b 3r
4 4  c gh
5 5  c  y
Solution 1) would be:
  3r gh y
   3  1 1
f  1  4 5
Solution 2) would be:
  f v1 v2 freq(v2)
1 1  a 3r        3
2 2  b 3r        3
3 3  b 3r        3
4 4  c gh        1
5 5  c  y        1
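dfg <- data.frame(f = c(1, 2, 3, 4, 5),
                  v1 = c("a", "b", "b", "c", "c"),
                  v2 = c("3r", "3r", "3r", "gh", "y"))

#1: one column per value of v2, holding its count (n) and the f of its first row
library(dplyr)
dfg %>%
  group_by(v2) %>%
  summarise(n = n(),
            f = first(f)) %>%
  t() %>%                      # transpose: the v2 values become the first row
  as.data.frame() %>%
  janitor::row_to_names(1)     # promote that first row to column names
#>   3r gh y
#> n  3  1 1
#> f  1  4 5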
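
If you would rather not depend on janitor, the same layout can be sketched in base R. This is only a rough sketch and was not part of the reprex run; the names freq, first_f and res are made up for illustration, and it assumes that the first occurrence of each v2 value is the row you want to point back to.

```r
dfg <- data.frame(f = c(1, 2, 3, 4, 5),
                  v1 = c("a", "b", "b", "c", "c"),
                  v2 = c("3r", "3r", "3r", "gh", "y"))

# Counts per distinct value of v2 (table() sorts the values)
freq <- table(dfg$v2)

# match() returns the position of the FIRST occurrence of each value,
# so this picks the f of the first row in which each v2 value appears
first_f <- dfg$f[match(names(freq), dfg$v2)]

# Stack counts and first-row ids into a small matrix, one column per value
res <- rbind(n = as.vector(freq), f = first_f)
colnames(res) <- names(freq)
res
```

For the example data this gives counts 3, 1, 1 and first f values 1, 4, 5 for "3r", "gh" and "y". Unlike the transposed data frame above, res is a numeric matrix, so the values stay numbers rather than being converted to character by t().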
#2: ave() applies length() within each v2 group and repeats the result on every row
transform(dfg, freq_v2 = ave(dfg$f, dfg$v2, FUN = length))
#>   f v1 v2 freq_v2
#> 1 1  a 3r       3
#> 2 2  b 3r       3
#> 3 3  b 3r       3
#> 4 4  c gh       1
#> 5 5  c  y       1
Created on 2021-05-22 by the reprex package (v2.0.0)
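
Since the question mentions trying mutate() for Solution 2, here is a sketch of how a dplyr version might look. It was not part of the reprex above; it assumes a reasonably recent dplyr in which add_count() accepts a name argument, and the names with_freq, with_freq2 and freq_v2 are chosen only for illustration.

```r
library(dplyr)

dfg <- data.frame(f = c(1, 2, 3, 4, 5),
                  v1 = c("a", "b", "b", "c", "c"),
                  v2 = c("3r", "3r", "3r", "gh", "y"))

# Grouped mutate: inside a group, n() is the number of rows in that group,
# so every row receives the frequency of its own v2 value
with_freq <- dfg %>%
  group_by(v2) %>%
  mutate(freq_v2 = n()) %>%
  ungroup()

# add_count() is shorthand for the grouped mutate above
with_freq2 <- dfg %>%
  add_count(v2, name = "freq_v2")
```

The group_by() step is what matters here: without it, n() would return the total number of rows (5 in this example) on every row instead of the per-value frequency. Both versions keep all original rows in their original order.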