The dataframe looks like the following:
var_1 var_2 var_3
5 2 2
5 2 1
6 3 4
dput of data
data <- structure(list(var_1 = c(5L, 5L, 6L), var_2 = c(2L, 2L, 3L),
var_3 = c(2L, 1L, 4L)), row.names = c(NA, -3L), class = "data.frame")
How do I remove duplicate values within a column, and repeat this for all columns so the data looks like:
var_1 var_2 var_3
5 2 2
6 3 1
4
Here's one approach, pivoting longer, filtering for distinct column-value combinations, then spreading:
library(tidyverse)
df %>%
pivot_longer(cols = everything()) %>%
distinct(name, value) %>%
group_by(name) %>%
mutate(row = row_number()) %>%
ungroup() %>%
pivot_wider(names_from = name, values_from = value)
Result
# A tibble: 3 x 4
row var_1 var_2 var_3
<int> <int> <int> <int>
1 1 5 2 2
2 2 6 3 1
3 3 NA NA 4
If you want the sum of distinct numbers, you could add as a last line:
%>% summarize(across(-row, ~sum(., na.rm = TRUE)))
Result:
# A tibble: 1 x 3
var_1 var_2 var_3
<int> <int> <int>
1 11 5 7