I have a data frame that gets dynamically generated based on user inputs. Imagine two data frames:
df1 <- data.frame(A=c(1,2,1,4),
B=c(4,3,4,1),
C=c(3,2,3,4),
Y=c(2,7,6,3),
Z=c(5,8,9,10))
df2 <- data.frame(A=c(1,2,1,4),
B=c(4,3,4,1),
C=c(3,2,3,4),
D=c(2,1,2,3),
Y=c(2,7,6,3),
Z=c(5,8,9,10))
Row number 3 is a duplicate of row number 1 based on columns A,B,C, and D (for df2). I'm aware that normally I could use
library(dplyr)
df1 |> distinct(A,B,C, .keep_all=TRUE)
df2 |> distinct(A,B,C,D, .keep_all=TRUE)
but the issue is that columns A-D are dynamically generated. It may be columns A and B, it may be columns A-F... The only constants are columns Y and Z. Is there a way to use distinct()
, but for all columns EXCEPT Y and Z?
I thought to get the column names into a list, remove the two variables, and substitute that into the distinct call
names <- colnames(df2)
names <- names[names != c("Y", "Z")]
df2 |> distinct(names, .keep_all=TRUE)
But of course it is looking for a column called "names". Any help would be greatly appreciated.
Solution was provided by @Ben.
df2 |> distinct(pick(-c("Y", "Z")), .keep_all = TRUE)