I have a data frame with repeated rows, and I have a function that calculates the frequency of identical rows. Here is my sample:
#############
###Sample####
#############
ID=seq(from=1,to=12,by=1)
var1=c(rep("a",12))
var2=c(rep("b",12))
var3=c("c","c","b","d","e","f","g","h","i","j","k","k")
df=data.frame(ID,var1,var2,var3)
ID var1 var2 var3
1 1 a b c
2 2 a b c
3 3 a b b
4 4 a b d
5 5 a b e
6 6 a b f
7 7 a b g
8 8 a b h
9 9 a b i
10 10 a b j
11 11 a b k
12 12 a b k
###############
# function ####
###############
library(dplyr)  # provides %>%, count() and mutate()
freq.f <- function(data){
  # every column except the ID in column 1
  vari = colnames(data[2:ncol(data)])
  data %>%
    dplyr::count(!!! rlang::syms(vari)) %>%
    mutate(frequency = n/sum(n))
}
# current output
freq.f(df)
var1 var2 var3 n frequency
1 a b b 1 0.08333333
2 a b c 2 0.16666667
3 a b d 1 0.08333333
4 a b e 1 0.08333333
5 a b f 1 0.08333333
6 a b g 1 0.08333333
7 a b h 1 0.08333333
8 a b i 1 0.08333333
9 a b j 1 0.08333333
10 a b k 2 0.16666667
What I want is to calculate this frequency while keeping all my records, because each ID is a different person even when two rows carry the same information. I also want to print the ID in my output to keep track of the individuals. So the desired output is:
# desired output
ID var1 var2 var3 n freq
1 1 a b c 2 0.16666667
2 2 a b c 2 0.16666667
3 3 a b b 1 0.08333333
4 4 a b d 1 0.08333333
5 5 a b e 1 0.08333333
6 6 a b f 1 0.08333333
7 7 a b g 1 0.08333333
8 8 a b h 1 0.08333333
9 9 a b i 1 0.08333333
10 10 a b j 1 0.08333333
11 11 a b k 2 0.16666667
12 12 a b k 2 0.16666667
I have looked through almost every post on SO about frequency but cannot find an answer. Thank you in advance for your help.
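Adding a join within your function gives the expected result: count the distinct combinations as before, then join those counts back onto the original data so that every ID is kept.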
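library(dplyr)
freq.f <- function(data){
  # every column except the ID in column 1
  vari = colnames(data[2:ncol(data)])
  counts = data %>%
    dplyr::count(!!! rlang::syms(vari)) %>%
    mutate(frequency = n/sum(n))
  # this is the new step: join the counts back onto the original rows;
  # naming the keys explicitly avoids the "Joining with `by = ...`" message
  inner_join(data, counts, by = vari)
}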
freq.f(df)
ID var1 var2 var3 n frequency
1 1 a b c 2 0.16666667
2 2 a b c 2 0.16666667
3 3 a b b 1 0.08333333
4 4 a b d 1 0.08333333
5 5 a b e 1 0.08333333
6 6 a b f 1 0.08333333
7 7 a b g 1 0.08333333
8 8 a b h 1 0.08333333
9 9 a b i 1 0.08333333
10 10 a b j 1 0.08333333
11 11 a b k 2 0.16666667
12 12 a b k 2 0.16666667
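As an alternative sketch, the same table can probably be built without an explicit join by using dplyr's add_count(), which appends the group size to every row instead of collapsing the rows. The function name freq.keep below is hypothetical, and the sketch assumes, as in the question, that the ID sits in the first column and that all remaining columns define what counts as "the same row". Note that the denominator has to be the number of rows: after add_count(), sum(n) would count duplicated groups more than once.

```r
library(dplyr)

# Sample data from the question
df <- data.frame(
  ID   = 1:12,
  var1 = rep("a", 12),
  var2 = rep("b", 12),
  var3 = c("c", "c", "b", "d", "e", "f", "g", "h", "i", "j", "k", "k")
)

# freq.keep is a hypothetical name; assumes column 1 is the ID
freq.keep <- function(data) {
  # every column except the ID defines a "similar" row
  vari <- colnames(data)[2:ncol(data)]
  data %>%
    # add_count() keeps all rows and adds the group size as column n
    add_count(!!!rlang::syms(vari)) %>%
    # divide by the total number of rows, not by sum(n)
    mutate(frequency = n / nrow(data))
}

res <- freq.keep(df)
print(res)
```

Because no rows are dropped or reordered, the ID column stays aligned with the original data frame.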