Search code examples
rtidyverse

Create columns if missing in dataframe with R (tidyverse)


I have a df with this structure

id 1 2 3 5 
1  1 0 2 0
2  3 4 1 0
3  1 1 0 2

What I would like is to complete it as I was need it to conform to a format that goes on columns from 1 thru 6, so the expected result would be

id 1 2 3 4 5 6 
1  1 0 2 0 0 0
2  3 4 1 0 0 0
3  1 1 0 0 2 0

This is an example 'missing' columns can vary in this example it was 4 and 6, so the idea is that if the column is missing it will be created and filled with zeros.

Thanks!


Solution

  • One way you could do this would be to reshape long, use tidyr::complete to get the range of column names, then reshape wide. Since id is unknown for the new column, I also drop the id = NA row.

    Note, R doesn't always play well with numeric column names, and they are not considered syntactic. https://stat.ethz.ch/R-manual/R-devel/library/base/html/make.names.html

    A syntactically valid name consists of letters, numbers and the dot or underline characters and starts with a letter or the dot not followed by a number.

    But we can make a dataframe with numeric strings as the column names if we tell R not to check:

    library(tidyverse)
    data.frame(                 
      check.names = FALSE,          
               id = c(1L, 2L, 3L),
              `1` = c(1L, 3L, 1L),
              `2` = c(0L, 4L, 1L),
              `3` = c(2L, 1L, 0L),
              `5` = c(0L, 0L, 2L)
    ) %>% 
      pivot_longer(-id, names_transform = as.numeric) %>%
      complete(name = 1:6) %>%
      pivot_wider(names_from = name, values_from = value, values_fill = 0) %>%
      drop_na(id)
    

    Result

    # A tibble: 3 × 7
         id   `1`   `2`   `3`   `4`   `5`   `6`
      <int> <int> <int> <int> <int> <int> <int>
    1     1     1     0     2     0     0     0
    2     2     3     4     1     0     0     0
    3     3     1     1     0     0     2     0