I want to get the number of values that are equal to every other row in dataframe:
library(tidyverse)
df <- tibble(
a = c(1, 1, 5, 1),
b = c(2, 3, 2, 8),
c = c(2, 6, 2, 2)
)
desired output:
# A tibble: 4 x 4
a b c desired_column
<dbl> <dbl> <dbl> <list>
1 1 2 2 <dbl [4]>
2 1 3 6 <dbl [4]>
3 5 2 2 <dbl [4]>
4 1 8 2 <dbl [4]>
In the column "desired_column": firt row: 3, 1, 2, 2:
3: is because the first row has the same three values compared to itself
1: is because there is one value with the same value in both rows and same column (first and second):
2: There are two values that are equal in first and third row and same column :
2: There are two values that are equal in first and fourth row and same column :
The second, third and fourth row of "desired_column" are results of the same process:
The i
th number in the result is the number of values in common between the current row and the i
th row
You can do this: in short, with each row of the dataframe, duplicate it to create a new dataframe with all values changed to that row, and compare that dataframe with the original (whether the values are the same). rowSums
of each of that comparison will give you the vectors you want.
# Create the desired output in list
lst <-
lapply(1:nrow(df), function(nr) {
rowSums(replicate(nrow(df), df[nr, ], simplify = FALSE) %>%
do.call("rbind", .) == df)})
# To create the desired dataframe
df %>% tibble(desired_column = I(lst))
In tibble
call in the last row, I()
is used to put in list output as a column.