As input for a tree model I created an analysis table in SQL. Now I want to transfer it to R because the model which has this table as input is also running in R. One of the SQL-steps I'm not able to transform into R.
The analysis table has the following form:
df <- data.frame(
pseudonym = c("a", "a", "a", "b", "c", "c"),
var1 = c(1,1,0,1,1,0),
var2 = c(1,0,0,0,0,1),
var3 = c(0,0,0,0,0,1))
> df
pseudonym var1 var2 var3
1 a 1 1 0
2 a 1 0 0
3 a 0 0 0
4 b 1 0 0
5 c 1 0 0
6 c 0 1 1
In the next step I need the disctinct rows for pseudonym with keeping the information (1) from the other columns var1, var2, var3. (In SQL this is created through
max(case when...then 1 else 0 end) as var1
)
Thus the result df2 created from df1 should be
df2 <- data.frame(
pseudonym = c("a", "b", "c"),
var1 = c(1,1,1),
var2 = c(1,0,1),
var3 = c(0,0,1))
> df2
pseudonym var1 var2 var3
1 a 1 1 0
2 b 1 0 0
3 c 1 1 1
It would be very helpful if somebody has an idea.
Here's one way:
library(dplyr)
library(tidyr)
df <- data.frame(
pseudonym = c("a", "a", "a", "b", "c", "c"),
var1 = c(1,1,0,1,1,0),
var2 = c(1,0,0,0,0,1),
var3 = c(0,0,0,0,0,1))
df %>%
pivot_longer(cols = var1:var3) %>%
group_by(pseudonym, name) %>%
filter(max(value) == value) %>%
ungroup() %>%
distinct() %>%
pivot_wider(names_from = name, values_from = value)
#># A tibble: 3 x 4
#> pseudonym var1 var2 var3
#> <fct> <dbl> <dbl> <dbl>
#>1 a 1 1 0
#>2 b 1 0 0
#>3 c 1 1 1