Here is a small subset of my data.
I have:
Id var1 var2
1 POS NA
1 NA NEG
2 NEG NA
2 NA NEG
3 POS NA
3 NA NEG
4 POS POS
5 POS NA
My ideal output:
Id var1 var2
1 POS NEG
2 NEG NEG
3 POS NEG
4 POS POS
5 POS NA
I would simply like to remove the duplicated Ids and keep one row per unique Id, with the correct values in var1 and var2. Does anyone see how to do this? Any help would be greatly appreciated. Thank you!
You could try a solution with na.omit. This function will remove the NA values within each group. Assuming your data frame is df
...
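For a reproducible starting point, the sample from the question can be rebuilt as below. This is only a sketch: it assumes that var1 and var2 are character columns and that NA is a true missing value rather than the string "NA", neither of which the question states.

```r
# Example data copied from the question.
# Assumption: var1/var2 are character columns and NA is a real missing value.
df <- data.frame(
  Id   = c(1, 1, 2, 2, 3, 3, 4, 5),
  var1 = c("POS", NA, "NEG", NA, "POS", NA, "POS", "POS"),
  var2 = c(NA, "NEG", NA, "NEG", NA, "NEG", "POS", NA),
  stringsAsFactors = FALSE
)

# Inspect the column types before applying any of the approaches.
str(df)
```

If the real data stores the columns as factors, or the missing values as the text "NA", they would need converting first for na.omit to behave as described.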
In base R:
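aggregate(. ~ Id,
          data = df,
          FUN = function(x) {
            # drop the missing values within the group
            y <- na.omit(x)
            # if nothing is left, fall back to a single NA
            y[length(y) == 0] <- NA
            y
          },
          # keep rows containing NA so that FUN sees them
          na.action = "na.pass")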
Note that y[length(y) == 0] <- NA is included to ensure that cases like Id 5 and var2 are NA and not character(0).
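To see why the guard matters, here is a small sketch of the same idea written as a standalone function. The name first_or_na is hypothetical (it is not part of the answer above), and unlike the aggregate call it keeps only the first non-missing value.

```r
# Hypothetical helper, for illustration only.
first_or_na <- function(x) {
  y <- na.omit(x)                    # drop missing values
  if (length(y) == 0) NA else y[1]   # nothing left: fall back to NA
}

all_missing <- c(NA_character_, NA_character_)

length(na.omit(all_missing))   # na.omit leaves a zero-length vector here
first_or_na(all_missing)       # the guard turns that into a single NA
first_or_na(c(NA, "NEG"))      # otherwise the first non-missing value is kept
```

Without the guard, a group whose values are all missing would yield an empty vector instead of one NA.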
With dplyr:
library(dplyr)
df %>%
  group_by(Id) %>%
  summarise(across(everything(), ~ first(na.omit(.))))
Using first will keep the first value within each group after the NA values have been removed; if nothing is left, it returns NA. across(everything()) applies this to all non-grouping columns.
With data.table:
library(data.table)
setDT(df)[, lapply(.SD, na.omit), by = Id]
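Because na.omit can return a zero-length vector for a group (Id 5 and var2 in the sample), a variant that explicitly takes the first non-missing value may be easier to reason about. The sketch below is an alternative, not the answer's original code. It assumes at most one non-missing value per Id and column is wanted, and it rebuilds the sample data under the same assumptions about column types.

```r
library(data.table)

# Sample data from the question (assumed character columns, real NA values).
df <- data.frame(
  Id   = c(1, 1, 2, 2, 3, 3, 4, 5),
  var1 = c("POS", NA, "NEG", NA, "POS", NA, "POS", "POS"),
  var2 = c(NA, "NEG", NA, "NEG", NA, "NEG", "POS", NA),
  stringsAsFactors = FALSE
)

# as.data.table() makes a copy, so df itself is left unchanged
# (setDT() would convert df in place instead).
dt <- as.data.table(df)

# x[!is.na(x)] drops the missing values; [1L] takes the first one that is left
# and yields NA when nothing is left.
res <- dt[, lapply(.SD, function(x) x[!is.na(x)][1L]), by = Id]
res
```

Indexing with [1L] always returns exactly one value per group, so every Id produces a single row.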