Search code examples
rtidyrspread

Spread and Gather table return duplicated rows with NA values


I have a table with categories and sub categories encoded in this format of columns name:

Date| Admissions__0 |Attendance__0 |Tri_1__0|Tri_2__0|... Tri_1__1|Tri_2__1|...|

and I would like to change it to this format of columns using spread and gather function of tidyverse:

Date| Country code| Admissions| Attendance| Tri_1|Tri_2|...

I tried a solution posted but the outcome actually return multiple rows with NA rather than a single row.

My code used:

temp <- data %>% gather(key="columns",value ="dt",-Date) 

temp <- temp %>% mutate(category = gsub(".*__","",columns)) %>% mutate(columns = gsub("__\\d","",columns))

temp %>%  mutate(row = row_number()) %>% spread(key="columns",value="dt") 

And my results is:

Date        country_code   row admissions attendance Tri_1 Tri_2 Tri_3 Tri_4 Tri_5
   <chr>       <chr>        <int> <chr>      <chr>      <chr> <chr> <chr> <chr> <chr>
 1 01-APR-2014 0              275 NA         209        NA    NA    NA    NA    NA   
 2 01-APR-2014 0              640 84         NA         NA    NA    NA    NA    NA   
 3 01-APR-2014 0             1005 NA         NA         5     NA    NA    NA    NA   
 4 01-APR-2014 0             1370 NA         NA         NA    33    NA    NA    NA   
 5 01-APR-2014 0             1735 NA         NA         NA    NA    62    NA    NA   
 6 01-APR-2014 0             2100 NA         NA         NA    NA    NA    80    NA   
 7 01-APR-2014 0             2465 NA         NA         NA    NA    NA    NA    29   
 8 01-APR-2014 1             2830 NA         138        NA    NA    NA    NA    NA   
 9 01-APR-2014 1             3195 66         NA         NA    NA    NA    NA    NA   
10 01-APR-2014 1             3560 NA         NA         N/A   NA    NA    NA    NA  

My expected results:

Date        country_code   row admissions attendance Tri_1 Tri_2 Tri_3 Tri_4 Tri_5
   <chr>       <chr>        <int> <chr>      <chr>      <chr> <chr> <chr> <chr> <chr>
 1 01-APR-2014 0              275 84         209        5    33    62    80    29   
 8 01-APR-2014 1             2830 66         138        66   ...   ...   ...   ...   

Solution

  • We can do a summarise_at coalesce to remove the NA elements after the spread

    library(tidyverse)
    data %>% 
      gather(key = "columns", val = "dt", -Date, na.rm = TRUE) %>%
      mutate(category = gsub(".*__","",columns)) %>%
      mutate(columns = gsub("__\\d","",columns)) %>% 
      group_by(Date, dt, columns, category) %>% 
      mutate(rn = row_number()) %>%
      spread(columns, dt) %>% 
      select(-V1) %>%
      summarise_at(vars(Admissions:Tri_5),list(~ coalesce(!!! .))) # %>%
      # filter if needed
      #filter_at(vars(Admissions:Tri_5), all_vars(!is.na(.)))