Separate numeric columns without special characters in R


I want to split the variable "population" into two separate columns. The first one ("pop1") should contain the first two digits of each value; the second one ("pop2") should contain the last digit.

df <- dplyr::tibble(
  city = c("a", "a", "b", "b", "c", "c"), 
  sex = c(1,0,1,0,1,0),
  age = c(1,2,1,2,1,2),
  population = c(100, 123, 189, 234, 221, 435),
  accidents = c(87, 98, 79, 43,45,65)
)

Expected output


df <- dplyr::tibble(
  city = c("a", "a", "b", "b", "c", "c"), 
  sex = c(1,0,1,0,1,0),
  age = c(1,2,1,2,1,2),
  pop1 = c(10, 12, 18, 23, 22, 43),
  pop2 = c(0,3,9,4,1,5),
  accidents = c(87, 98, 79, 43,45,65)
)

Thanks


Solution

  • Another solution, based on tidyr::extract(). Note that pop1 and pop2 are returned as character columns:

    library(tidyr)
    
    df %>%
      extract(population,
              into = c("pop1", "pop2"),
              regex = "(\\d\\d)(\\d)")
    # A tibble: 6 × 6
      city    sex   age pop1  pop2  accidents
      <chr> <dbl> <dbl> <chr> <chr>     <dbl>
    1 a         1     1 10    0            87
    2 a         0     2 12    3            98
    3 b         1     1 18    9            79
    4 b         0     2 23    4            43
    5 c         1     1 22    1            45
    6 c         0     2 43    5            65
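
    The expected output in the question has numeric pop1 and pop2, whereas the result above holds them as character. Two possible variants are sketched below; neither is part of the original answer. The first passes convert = TRUE to extract(), which should turn the new columns into integers. The second skips the regex and uses integer division and modulo, under the assumption that every population value has exactly three digits (for longer values the two approaches would no longer agree). The names res_extract and res_arith are only illustrative.

```r
library(dplyr)
library(tidyr)

# Same data as in the question
df <- tibble(
  city = c("a", "a", "b", "b", "c", "c"),
  sex = c(1, 0, 1, 0, 1, 0),
  age = c(1, 2, 1, 2, 1, 2),
  population = c(100, 123, 189, 234, 221, 435),
  accidents = c(87, 98, 79, 43, 45, 65)
)

# Variant 1: same regex as above, but convert = TRUE asks extract()
# to type-convert the new columns, so they come back as integers
res_extract <- df %>%
  extract(population,
          into = c("pop1", "pop2"),
          regex = "(\\d\\d)(\\d)",
          convert = TRUE)

# Variant 2: plain arithmetic, assuming three-digit values only
# %/% 10 drops the last digit, %% 10 keeps only the last digit
res_arith <- df %>%
  mutate(pop1 = population %/% 10,
         pop2 = population %% 10,
         .after = age) %>%
  select(-population)
```

    Variant 1 keeps the digit positions explicit in the regex; variant 2 leaves the columns as doubles, like the other numeric columns in df.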