I have a df with this structure
id 1 2 3 5
1 1 0 2 0
2 3 4 1 0
3 1 1 0 2
What I would like is to complete it as I was need it to conform to a format that goes on columns from 1 thru 6, so the expected result would be
id 1 2 3 4 5 6
1 1 0 2 0 0 0
2 3 4 1 0 0 0
3 1 1 0 0 2 0
This is an example 'missing' columns can vary in this example it was 4 and 6, so the idea is that if the column is missing it will be created and filled with zeros.
Thanks!
One way you could do this would be to reshape long, use tidyr::complete
to get the range of column names, then reshape wide. Since id
is unknown for the new column, I also drop the id = NA row.
Note, R doesn't always play well with numeric column names, and they are not considered syntactic. https://stat.ethz.ch/R-manual/R-devel/library/base/html/make.names.html
A syntactically valid name consists of letters, numbers and the dot or underline characters and starts with a letter or the dot not followed by a number.
But we can make a dataframe with numeric strings as the column names if we tell R not to check:
library(tidyverse)
data.frame(
check.names = FALSE,
id = c(1L, 2L, 3L),
`1` = c(1L, 3L, 1L),
`2` = c(0L, 4L, 1L),
`3` = c(2L, 1L, 0L),
`5` = c(0L, 0L, 2L)
) %>%
pivot_longer(-id, names_transform = as.numeric) %>%
complete(name = 1:6) %>%
pivot_wider(names_from = name, values_from = value, values_fill = 0) %>%
drop_na(id)
Result
# A tibble: 3 × 7
id `1` `2` `3` `4` `5` `6`
<int> <int> <int> <int> <int> <int> <int>
1 1 1 0 2 0 0 0
2 2 3 4 1 0 0 0
3 3 1 1 0 0 2 0