I have a dataset like this:
structure(list(`Frequency
Percent` = c("car", "window", "ball",
"ups"), AI = c("2\n0.00", "3\n0.00", "1\n0.00", "2\n0.00"), BLK = c("0\n0.00",
"218\n0.29", "48\n0.06", "0\n0.00"), HIANIC = c("1\n0.00", "8\n0.01",
"4\n0.01", "0\n0.00"), NATRICAN = c("9\n0.01", "7\n0.01", "8\n0.01",
"0\n0.00"), UNK = c("15\n0.02", "83\n0.11", "36\n0.05", "0\n0.00"
), yy = c("111\n0.15", "897\n1.20", "756\n1.02", "1\n0.00")), class = "data.frame", row.names = c(NA,
-4L))
How can I split each row by "" and remove n to make two new columns. For instance, car
and AI
cell (“2\n0.00”), I will have 2 and 0.00 in two different columns.
Using tidyr::separate_rows
and tidyr::pivot_wider
you could do:
library(tidyr)
library(dplyr)
dat |>
mutate(unit = c("n\npct")) |>
separate_rows(-1, sep = "\n") |>
pivot_wider(names_from = "unit", values_from = -1)
#> # A tibble: 4 × 15
#> Frequency\n…¹ AI_n AI_pct BLK_n BLK_pct HIANI…² HIANI…³ NATRI…⁴ NATRI…⁵ UNK_n
#> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
#> 1 car 2 0.00 0 0.00 1 0.00 9 0.01 15
#> 2 window 3 0.00 218 0.29 8 0.01 7 0.01 83
#> 3 ball 1 0.00 48 0.06 4 0.01 8 0.01 36
#> 4 ups 2 0.00 0 0.00 0 0.00 0 0.00 0
#> # … with 5 more variables: UNK_pct <chr>, yy_n <chr>, yy_pct <chr>,
#> # unit_n <chr>, unit_pct <chr>, and abbreviated variable names
#> # ¹`Frequency\n Percent`, ²HIANIC_n, ³HIANIC_pct, ⁴NATRICAN_n,
#> # ⁵NATRICAN_pct