I would like to use stringr and rebus to remove parts of strings in a data frame. Specifically, I would like to remove everything from the first space that is followed by a digit through to the end of the string.
The following is my dataframe:
df<-data.frame(ID = 1:8, Medication = c("FOLIC ACID 5MG TABLET", "RIBAVIRIN 200MG TAB", "ACARBOSE 50MG TABLET",
"AmLODIPine 5MG TABLET", "MAGNESIUM TRISILICATE MIXTURE 200ML",
"RESONIUM 15G/60ML SUSPENSION", "CALCIUM & VIT D TABLET", NA))
My desired dataframe is:
df_new<-data.frame(ID = 1:8, Medication = c("FOLIC ACID", "RIBAVIRIN", "ACARBOSE",
"AmLODIPine", "MAGNESIUM TRISILICATE MIXTURE",
"RESONIUM", "CALCIUM & VIT D TABLET", NA))
I tried the following code, but it only removes the drug strength (e.g. 5MG), not the text that follows it (e.g. TABLET):
df %>% mutate(Medication = str_replace(Medication, pattern = SPC %R%
one_or_more(DGT) %R%
one_or_more(WRD) %R%
or(one_or_more(SPC), one_or_more(WRD)),
replace = ""))
How can I achieve this?
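In base R, sub() with the regex "\\s\\d.*" deletes everything from the first whitespace character that is followed by a digit to the end of the string. Strings without such a match, and NA, are left unchanged:
transform(df, Medication = sub("\\s\\d.*", "", Medication))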
  ID                    Medication
1  1                    FOLIC ACID
2  2                     RIBAVIRIN
3  3                      ACARBOSE
4  4                    AmLODIPine
5  5 MAGNESIUM TRISILICATE MIXTURE
6  6                      RESONIUM
7  7        CALCIUM & VIT D TABLET
8  8                          <NA>
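Since the question asks for stringr and rebus, the same regex can probably be written with those packages as well. The following is a minimal sketch, not a tested drop-in: it assumes dplyr, stringr (1.3.0 or later, for str_remove()) and rebus are installed, and the names `pattern` and `df_new` are only illustrative. The rebus pattern is meant to be equivalent to "\\s\\d.*".

```r
library(dplyr)
library(stringr)
library(rebus)

# Same data as in the question; stringsAsFactors = FALSE keeps Medication
# as character on R versions before 4.0.
df <- data.frame(
  ID = 1:8,
  Medication = c("FOLIC ACID 5MG TABLET", "RIBAVIRIN 200MG TAB",
                 "ACARBOSE 50MG TABLET", "AmLODIPine 5MG TABLET",
                 "MAGNESIUM TRISILICATE MIXTURE 200ML",
                 "RESONIUM 15G/60ML SUSPENSION",
                 "CALCIUM & VIT D TABLET", NA),
  stringsAsFactors = FALSE
)

# A space, then a digit, then any characters up to the end of the string.
# SPC is "\\s", DGT is "\\d", ANY_CHAR is "." in rebus.
pattern <- SPC %R% DGT %R% zero_or_more(ANY_CHAR)

# str_remove() replaces the first match with "" and passes NA through.
df_new <- df %>%
  mutate(Medication = str_remove(Medication, pattern))

print(df_new)
```

The key difference from the attempt in the question is `zero_or_more(ANY_CHAR)`: instead of listing word and space characters after the number, it consumes whatever follows (including the "/" in "15G/60ML") up to the end of the string.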