
Text splitting in R


item<-c("A","B")
Type<-c("P400","1200C")

> test_df<-data.frame(item, Type)
> test_df
  item  Type
1    A  P400
2    B 1200C

test_df<-test_df |> tidyr::separate(Type, into = c("A", "B"), sep = "(?<=[0-9])(?=[A-Za-z])")

> test_df
  item    A    B
1    A P400 <NA>
2    B 1200    C

Hi all, with tidyr::separate I am able to split test_df$Type into number and text. The code splits 1200C into 1200 and C. However, for P400 I would have expected P and 400, yet NA is returned. Can you please give me a hand? Cheers.


Solution

  • Another approach that might work for you: first extract the numeric part into A, then remove it from the text to get B.

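    library(tidyverse)
    # Start from the original test_df, which still has the Type column
    # (separate() drops Type by default).
    test_df |> 
      mutate(A = parse_number(Type),                   # numeric part, e.g. 400
             B = Type |> str_remove(as.character(A)))  # what is left, e.g. "P"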
    
    
      item  Type    A B
    1    A  P400  400 P
    2    B 1200C 1200 C
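
As for why the original call returned NA: the pattern `(?<=[0-9])(?=[A-Za-z])` only matches a position with a digit before it and a letter after it. `1200C` has such a position, but `P400` goes from letter to digit, so there is no split point and the second column is filled with NA. A minimal sketch of an order-independent alternative follows, assuming each value contains a single run of digits and a single run of letters. The column names `number` and `text` are made up for illustration and are not from the original post.

```r
library(stringr)

# Same data as in the question
test_df <- data.frame(item = c("A", "B"), Type = c("P400", "1200C"))

# Extract each part by what it is, so it does not matter which comes first.
# `number` and `text` are hypothetical column names.
test_df$number <- as.numeric(str_extract(test_df$Type, "[0-9]+"))
test_df$text   <- str_extract(test_df$Type, "[A-Za-z]+")

test_df
```

Because each part is matched by its own pattern, this does not depend on whether the letters or the digits come first. `str_extract()` returns only the first match, so values with several digit or letter runs would need something more elaborate.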