I have a table with categories and sub categories encoded in this format of columns name:
Date| Admissions__0 |Attendance__0 |Tri_1__0|Tri_2__0|... Tri_1__1|Tri_2__1|...|
and I would like to change it to this format of columns using spread and gather function of tidyverse:
Date| Country code| Admissions| Attendance| Tri_1|Tri_2|...
I tried a solution posted but the outcome actually return multiple rows with NA rather than a single row.
My code used:
temp <- data %>% gather(key="columns",value ="dt",-Date)
temp <- temp %>% mutate(category = gsub(".*__","",columns)) %>% mutate(columns = gsub("__\\d","",columns))
temp %>% mutate(row = row_number()) %>% spread(key="columns",value="dt")
And my results is:
Date country_code row admissions attendance Tri_1 Tri_2 Tri_3 Tri_4 Tri_5
<chr> <chr> <int> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
1 01-APR-2014 0 275 NA 209 NA NA NA NA NA
2 01-APR-2014 0 640 84 NA NA NA NA NA NA
3 01-APR-2014 0 1005 NA NA 5 NA NA NA NA
4 01-APR-2014 0 1370 NA NA NA 33 NA NA NA
5 01-APR-2014 0 1735 NA NA NA NA 62 NA NA
6 01-APR-2014 0 2100 NA NA NA NA NA 80 NA
7 01-APR-2014 0 2465 NA NA NA NA NA NA 29
8 01-APR-2014 1 2830 NA 138 NA NA NA NA NA
9 01-APR-2014 1 3195 66 NA NA NA NA NA NA
10 01-APR-2014 1 3560 NA NA N/A NA NA NA NA
My expected results:
Date country_code row admissions attendance Tri_1 Tri_2 Tri_3 Tri_4 Tri_5
<chr> <chr> <int> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
1 01-APR-2014 0 275 84 209 5 33 62 80 29
8 01-APR-2014 1 2830 66 138 66 ... ... ... ...
We can do a summarise_at
coalesce
to remove the NA
elements after the spread
library(tidyverse)
data %>%
gather(key = "columns", val = "dt", -Date, na.rm = TRUE) %>%
mutate(category = gsub(".*__","",columns)) %>%
mutate(columns = gsub("__\\d","",columns)) %>%
group_by(Date, dt, columns, category) %>%
mutate(rn = row_number()) %>%
spread(columns, dt) %>%
select(-V1) %>%
summarise_at(vars(Admissions:Tri_5),list(~ coalesce(!!! .))) # %>%
# filter if needed
#filter_at(vars(Admissions:Tri_5), all_vars(!is.na(.)))