r stringr stringi

How to use `regexp` to remove all characters that are not Chinese or English


Given `ori_string`, how can I use a regexp to remove every character that is not Chinese or English? Thanks!

ori_string<-"没a w t _ 中/国.sz"

The desired result is

  "没awt中国sz"

Solution

  • I wrote this in Python, since you didn't specify a language. The idea is as follows.

    import re

    def remove_non_english_chinese(text):
        # Match any character that is not an ASCII letter, a digit,
        # or a CJK Unified Ideograph in the range U+4E00 to U+9FFF
        pattern = r'[^a-zA-Z0-9\u4e00-\u9fff]'

        # Replace every matched character with an empty string
        return re.sub(pattern, '', text)
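As a quick check, here is a minimal sketch that applies the same idea to the string from the question. The function name `keep_chinese_english` is hypothetical, and it assumes digits should be removed as well, since the question asks for Chinese and English characters only. It also assumes that "Chinese" means the basic CJK Unified Ideographs block (U+4E00 to U+9FFF); characters in the extension blocks would be stripped.

```python
import re

# Assumption: keep only ASCII letters and CJK Unified Ideographs
# (U+4E00 to U+9FFF); digits, spaces and punctuation are dropped.
PATTERN = re.compile(r'[^a-zA-Z\u4e00-\u9fff]')

def keep_chinese_english(text):
    # Delete every character that falls outside the allowed ranges
    return PATTERN.sub('', text)

ori_string = "没a w t _ 中/国.sz"
print(keep_chinese_english(ori_string))  # prints 没awt中国sz
```

If digits should be kept, adding `0-9` back into the character class restores the behaviour of the function above.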