Let's say I have two columns:
A B
1 1
2 2
3 4
4 4
5 4
6 6
Is there a way to calculate the percentage of similarity, so that in the example above we find that columns A and B are 67% the same?
We could take the intersect of the elements in 'A' and 'B', get its length, and divide by the nrow of 'df1':
paste0(round(100*length(intersect(df1$A, df1$B))/nrow(df1)), "%")
#[1] "67%"
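The line above expects a data frame called df1 that the question never builds. A minimal sketch of the same steps, assuming df1 is constructed from the two columns shown in the question (the construction and the intermediate names common and pct are mine, for illustration):

```r
# Assumed construction of df1 from the question's columns
df1 <- data.frame(A = 1:6, B = c(1, 2, 4, 4, 4, 6))

# Distinct values that occur in both columns, regardless of position
common <- intersect(df1$A, df1$B)
common  # the values 1, 2, 4 and 6

# Share of rows that this overlap represents, rounded to a whole percent
pct <- round(100 * length(common) / nrow(df1))
paste0(pct, "%")
#[1] "67%"
```

Note that intersect drops duplicates, so the three 4s in 'B' count only once.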
If the comparison is between corresponding elements, use == instead of intersect, sum the TRUE values from the logical output, and divide by the number of rows:
paste0(round(100*with(df1, sum(A==B))/nrow(df1)), "%")
#[1] "67%"
Or just use mean, since the mean of a logical vector is the proportion of TRUE values:
paste0(round(100*with(df1, mean(A==B))), "%")
#[1] "67%"
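NOTE: The two approaches answer different questions. intersect counts the distinct values that occur in both columns regardless of position, while == compares the columns row by row. This is one of those examples where we happen to get the same result with either method.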
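To see that the agreement is a coincidence of this data, here is a small sketch with a made-up data frame (df2 and its values are hypothetical, not from the question) where the two columns hold the same values in a different order:

```r
# Hypothetical data (not from the question): same values, different order
df2 <- data.frame(A = c(1, 2, 3), B = c(3, 1, 2))

# Value overlap, ignoring position: all three values occur in both columns
by_value <- round(100 * length(intersect(df2$A, df2$B)) / nrow(df2))

# Row-by-row agreement: no row has A equal to B
by_position <- round(100 * mean(df2$A == df2$B))

paste0(by_value, "%")
#[1] "100%"
paste0(by_position, "%")
#[1] "0%"
```

Which one to use depends on whether "similarity" should mean shared values or matching rows.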