Tags: r, dplyr, na

Removing cells with NA values (not columns, not rows, not trimming)


Say I have the following dataset:

A B C
1 NA NA
NA 2 NA
NA NA 3

and I am trying to get

A B C
1 2 3

The context is that in my dataset, wherever A has a value, B and C are always NA, and the same goes for B and C in turn. Moreover, A, B and C do not contain the same number of values.

I then use these values to calculate the overlap coefficient with overlap(x, y) from library(bayestestR). However, if I leave the NAs in, the result for overlap(a, a), i.e. a set compared with itself, is not 1 but .98 or something along those lines. I am not sure what to do. Please advise! Thanks in advance.

If someone has a tip on how to prompt overlap() to ignore NAs, that would work as well I guess!

I tried

df <- data.table(df)[, lapply(.SD, function(x) x[order(is.na(x))])]
df[!result[, Reduce(`&`, lapply(.SD, is.na))]]
df[complete.cases(df), ]

But it still left NA values, including plenty of rows where all values are NA! This code does move all the non-NA values to the top, but the NAs are still there.


Solution

  • If your data has only one non-NA numeric value in each column, you can use colSums with na.rm = TRUE. If you want the output in a data.frame, wrap it in data.frame(as.list(...)):

    data.frame(as.list(colSums(df, na.rm = TRUE)))
    
    #   A B C
    # 1 1 2 3
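
The question also mentions that A, B and C contain different numbers of values, which colSums would collapse into a single sum per column. For that case, here is a minimal sketch in base R that drops the NAs column by column and keeps the result as a list of vectors. The example data and the names vals, n and compact are made up for illustration; they are not from the question.

```r
# Made-up data in the shape described in the question:
# each row has a value in exactly one column, and the columns
# hold different numbers of non-NA values.
df <- data.frame(
  A = c(1, 4, NA, NA, NA, NA),
  B = c(NA, NA, 2, NA, NA, NA),
  C = c(NA, NA, NA, 3, 5, 6)
)

# Drop the NAs within each column. The result is a list, because
# the remaining vectors have different lengths.
vals <- lapply(df, function(x) x[!is.na(x)])

# Optional: pad back to a rectangular data.frame. Indexing past the
# end of a vector yields NA, so NAs only remain at the bottom of the
# shorter columns.
n <- max(lengths(vals))
compact <- as.data.frame(lapply(vals, function(x) x[seq_len(n)]))
```

The NA-free vectors in vals (for example vals$A and vals$C) can then be passed to overlap() directly, which is probably simpler than trying to make overlap() skip NAs itself. The padded compact data.frame still contains NAs wherever the columns differ in length, so it is mainly useful for display.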