I have data that has the following structure:
data <- data.frame(
uniqueid = c(1, 1, 2, 2, 3, 3),
year = c(2010, 2011, 2010, 2011, 2010, 2011),
agency = c("SZ", "SZ", "SZ", NA, "SZ", "HE"),
switch = c(0, 0, 0, NA, 0, 1)
)
As you can see, the data is organized by uniqueids appearing in a given year. Keep in mind that for the agency column there can be 13 different unique strings appearing across different uniqueids. I would like the data to look like the following:
data <- data.frame(
uniqueid = c(1, 1, 1, 1, 2, 2, 2, 2, 2, 3, 3, 3, 3),
year = c(2010, 2010, 2011, 2011, 2010, 2010, 2011, 2011, 2011, 2010, 2010, 2011, 2011),
agency = c("SZ", "HE", "SZ", "HE", "SZ", "HE", "SZ", NA, "HE", "SZ", "HE", "SZ", "HE"),
switch = c(0, 0, 0, 0, 0, 0, NA, NA, NA, 0, 0, 0, 1)
)
In this transformation, the rows take on the different unique values of the agency variable, and the switch variable largely maps what it had previously looked like. I'm not really sure how to accomplish this in R, though I would prefer the solution to be within tidyverse. Thank you!
I've been trying something like the following but I don't seem to be getting exactly what I want:
data1 <- data %>%
pivot_wider(names_from = agency, values_from = lead, names_prefix = "agency_", values_fill = "0") %>%
gather(key = agency, value = lead, starts_with("agency_")) %>%
arrange(uniqueid, year, agency)
Here's yet another way:
Steps:
complete
the data (add a row for every variation of id, year, and just the "SZ" and "HE" agencies)data |>
complete(uniqueid, year, agency = c("SZ", "HE")) |>
mutate(switch = ifelse(!any(is.na(agency)) & is.na(switch), 0, switch), .by = c(uniqueid, year))
Output:
# A tibble: 13 × 4
uniqueid year agency switch
<dbl> <dbl> <chr> <dbl>
1 1 2010 HE 0
2 1 2010 SZ 0
3 1 2011 HE 0
4 1 2011 SZ 0
5 2 2010 HE 0
6 2 2010 SZ 0
7 2 2011 HE NA
8 2 2011 SZ NA
9 3 2010 HE 0
10 3 2010 SZ 0
11 3 2011 HE 1
12 3 2011 SZ 0
13 2 2011 NA NA