Search code examples
rsummarize

R: summarise a dataframe with NAN in columns


I have the following data set:

  Data1      Data2      Data3
      3        NAN        NAN
      2        NAN        NAN
      3        NAN        NAN
    NAN          3        NAN
    NAN          5        NAN
    NAN          3        NAN
    NAN        NAN          7
    NAN        NAN          5
    NAN        NAN          1

I'd like to summarise it into this:

  Data1   Data2      Data3
      3       3          7
      2       5          5
      3       3          1

I've trying creating a row number, grouping by row number and applying summarise, but it's just filling the NaNs with 0.

data = data %>% 
  mutate(row = row_number()) %>% 
  dplyr::group_by(row) %>% 
  dplyr::summarise(across(c("Data1","Data2","Data3"), ~sum(., na.rm=T))) %>% 
  distinct(.)

Solution

  • If you have same number of NaN's in each column as shown in the example you can use na.omit to drop those values.

    library(dplyr)
    
    df %>% summarise(across(.fns = na.omit))
    #If in your data values are string 'NAN' then use the below
    #df %>% summarise(across(.fns = ~.x[.x!= 'NAN']))
    
    #  Data1 Data2 Data3
    #1     3     3     7
    #2     2     5     5
    #3     3     3     1
    

    In base R -

    as.data.frame(sapply(df, na.omit))
    

    data

    It is easier to help if you provide data in a reproducible format

    df <- structure(list(Data1 = c(3, 2, 3, NaN, NaN, NaN, NaN, NaN, NaN
    ), Data2 = c(NaN, NaN, NaN, 3, 5, 3, NaN, NaN, NaN), Data3 = c(NaN, 
    NaN, NaN, NaN, NaN, NaN, 7, 5, 1)), row.names = c(NA, -9L), class = "data.frame")