r, string, tm

Deleting all non-Latin characters in R


Here are two strings:

*3472459 PIVO 何か-何か-何か/100х1,5g

*3472459 VINO 何か何か何か100х1,5g

How can I delete all non-Latin characters? The output should be:

PIVO
VINO

Solution

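  • Given the strings in text, str_extract from stringr or stri_extract from stringi returns the expected result. Both return the first run of letters in each string. Note that [:alpha:] matches letters of any script, not only Latin ones, so this works here because the Latin word is the first run of letters in each string.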

    text <- c("*3472459 PIVO 何か-何か-何か/100х1,5g",
              "*3472459 VINO 何か何か何か100х1,5g")
    
    stringr::str_extract(text, "[:alpha:]+")
    #> [1] "PIVO" "VINO"
    
    stringi::stri_extract(text, regex = "[:alpha:]+")
    #> [1] "PIVO" "VINO"
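
If the Latin word is not guaranteed to come before the other letters, one option is to restrict the match to Latin letters explicitly. The following is a minimal base R sketch, assuming that "Latin" here means the unaccented ASCII letters A-Z and a-z (that assumption, and the variable name first_latin, are mine and not from the question):

```r
text <- c("*3472459 PIVO 何か-何か-何か/100х1,5g",
          "*3472459 VINO 何か何か何か100х1,5g")

# regexpr() locates the first match in each string;
# regmatches() pulls out the matched substrings.
# Assumption: only ASCII letters count as Latin (accented letters are excluded).
first_latin <- regmatches(text, regexpr("[A-Za-z]+", text, perl = TRUE))
first_latin
#> [1] "PIVO" "VINO"
```

Like the stringr and stringi calls, this extracts the first matching run rather than deleting everything else, so the trailing g in 1,5g is not picked up. Note that regmatches silently drops strings that contain no match, so the result can be shorter than the input.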