
How do I remove parts of strings using stringr and rebus?


I would like to use stringr and rebus to remove parts of strings in a data frame. Specifically, I want to remove everything from the first space that is followed by a digit through to the end of the string.

The following is my dataframe:

df<-data.frame(ID = 1:8, Medication = c("FOLIC ACID 5MG TABLET", "RIBAVIRIN 200MG TAB", "ACARBOSE 50MG TABLET", 
                                        "AmLODIPine 5MG TABLET", "MAGNESIUM TRISILICATE MIXTURE 200ML", 
                                        "RESONIUM 15G/60ML SUSPENSION", "CALCIUM & VIT D TABLET", NA))

My desired dataframe is:

df_new<-data.frame(ID = 1:8, Medication = c("FOLIC ACID", "RIBAVIRIN", "ACARBOSE", 
                                            "AmLODIPine", "MAGNESIUM TRISILICATE MIXTURE", 
                                            "RESONIUM", "CALCIUM & VIT D TABLET", NA))

I tried the following code, but it only removes the drug strength (e.g. 5MG), not the dosage form that follows it (e.g. TABLET):

df %>% mutate(Medication = str_replace(Medication, pattern = SPC %R% 
                                         one_or_more(DGT) %R% 
                                         one_or_more(WRD) %R%
                                         or(one_or_more(SPC), one_or_more(WRD)), 
                                       replacement = ""))

How can I work on this?


Solution

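The regex \\s\\d.* matches the first whitespace character that is immediately followed by a digit, plus everything after it, and sub() replaces that match with an empty string. Strings with no such match, and NA, come back unchanged:

    transform(df, Medication = sub("\\s\\d.*", "", Medication))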
      ID                    Medication
    1  1                    FOLIC ACID
    2  2                     RIBAVIRIN
    3  3                      ACARBOSE
    4  4                    AmLODIPine
    5  5 MAGNESIUM TRISILICATE MIXTURE
    6  6                      RESONIUM
    7  7        CALCIUM & VIT D TABLET
    8  8                          <NA>
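
To see how the pattern behaves on the edge cases, here is a minimal sketch in base R. The helper name `strip_strength` and the "VITAMIN B12 TABLET" string are made up for illustration; they are not part of the original question or answer.

```r
# Hypothetical helper (illustrative name, not from the original answer):
# drop everything from the first whitespace + digit to the end of the string.
strip_strength <- function(x) {
  sub("\\s\\d.*", "", x)
}

meds <- c("FOLIC ACID 5MG TABLET",
          "RESONIUM 15G/60ML SUSPENSION",
          "CALCIUM & VIT D TABLET",  # no whitespace + digit, so left unchanged
          "VITAMIN B12 TABLET",      # made-up example: the digit follows a letter, so no match
          NA)                        # NA stays NA

cleaned <- strip_strength(meds)
print(cleaned)
```

Because the pattern requires the digit to come right after whitespace, names that contain digits mid-word are left alone. If you prefer to stay in stringr, `str_remove(x, "\\s\\d.*")` should behave the same way, since it also removes only the first match.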