Tags: r, split, extract, tidyr

Splitting one column into three columns when the parts have uneven lengths in R


I tried to split a single column into three columns, but failed. I have the following data set:

> dat
 name
 Jhon Austin B 100kg
 Mick Gray C 110kg
 Tom Jef A 30kg

First, I tried to extract the last word using the following code:

library(tidyr)

   dt<-dat %>% separate(name, into = c('name', 'pack'), sep = -6, convert = TRUE)

I got the following result:

name           pack
Jhon Austin B  100kg
Mick Gray C    110kg
Tom Jef        A30kg

Here, A was merged with 30kg, although the two should be in separate columns. My final result should look like this:

name         class   pack
Jhon Austin   B      100kg
Mick Gray     C      110kg
Tom Jef       A      30kg

I will be grateful if anyone helps me. Thanks in advance.


Solution

    • Option 1

    You could try separate_wider_regex (available in tidyr 1.3.0 and later)

    library(tidyr)

    dat %>%
        separate_wider_regex(
            name,
            # named pieces become columns; unnamed pieces (" ") are matched and dropped
            patterns = c(name = ".*", " ", class = "\\w", " ", pack = "\\d+kg")
        )
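
To see how the pattern pieces in Option 1 line up with a single row, here is a minimal base R sketch. It only mimics the idea (named pieces become capture groups, unnamed pieces act as separators) and is not tidyr's actual implementation; the variable names `pieces`, `wrapped`, and `parts` are made up for illustration.

```r
x <- "Tom Jef A 30kg"

# Same pieces as in the separate_wider_regex() call above
pieces <- c(name = ".*", " ", class = "\\w", " ", pack = "\\d+kg")

# Wrap named pieces in capture groups; leave unnamed ones as plain separators
is_named <- names(pieces) != ""
wrapped <- ifelse(is_named, paste0("(", pieces, ")"), pieces)
pattern <- paste0("^", paste(wrapped, collapse = ""), "$")

# regexec() returns the full match first, then one entry per capture group
m <- regmatches(x, regexec(pattern, x))[[1]]
parts <- setNames(m[-1], names(pieces)[is_named])
print(parts)
```

Because ".*" is greedy, it takes as much as it can while still leaving a one-character class and a weight at the end, which is why names of different lengths are handled correctly.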
    
    • Option 2

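    With base R, you can combine sub and read.table: sub rewrites each string with "_" between the three parts, and read.table then splits on that delimiter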

    with(
        dat,
        setNames(
            read.table(
                text =
                    sub("^(.*)\\s(\\w)\\s(\\d+.*)$", "\\1_\\2_\\3", name),
                sep = "_"
            ),
            c("name", "class", "pack")
        )
    )
    

    Option 1 gives the tibble below (Option 2 returns the same values as a plain data.frame)

    # A tibble: 3 × 3
      name        class  pack
      <chr>       <chr> <chr>
    1 Jhon Austin B     100kg
    2 Mick Gray   C     110kg
    3 Tom Jef     A     30kg
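
As a variation on Option 2, the sketch below uses `strcapture()` from the utils package (R 3.4.0 or later) to go from the regex straight to a data.frame. This is not part of the original answer; it is offered on the assumption that you would rather not rely on "_" never appearing in the data. The pattern is tightened to `\\d+kg`, which assumes every weight ends in "kg".

```r
dat <- data.frame(
    name = c("Jhon Austin B 100kg", "Mick Gray C 110kg", "Tom Jef A 30kg"),
    stringsAsFactors = FALSE
)

# Greedy "(.*)" takes everything before the last two space-separated tokens;
# "(\\w)" is the one-character class and "(\\d+kg)" is the weight.
pattern <- "^(.*)\\s(\\w)\\s(\\d+kg)$"

# proto supplies the column names and types of the result
res <- strcapture(
    pattern,
    dat$name,
    proto = data.frame(
        name = character(),
        class = character(),
        pack = character(),
        stringsAsFactors = FALSE
    )
)
print(res)
```

Rows that do not match the pattern come back as NA in every column, which makes malformed input easy to spot.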
    

    Data

    dat <- data.frame(
        name = c(
            "Jhon Austin B 100kg",
            "Mick Gray C 110kg",
            "Tom Jef A 30kg"
        )
    )