I am doing data preprocessing and am stuck on a problem. I have data like "Telma 2525 mg tablet" and I want to convert it to "Telma 25 mg tablet". Can this be done?
Thanks
gsub()
> x<-rep("Telma 2525 mg tablet",10)
> x
[1] "Telma 2525 mg tablet" "Telma 2525 mg tablet" "Telma 2525 mg tablet" "Telma 2525 mg tablet" "Telma 2525 mg tablet"
[6] "Telma 2525 mg tablet" "Telma 2525 mg tablet" "Telma 2525 mg tablet" "Telma 2525 mg tablet" "Telma 2525 mg tablet"
> gsub("Telma 2525 mg tablet","Telma 25 mg tablet",x)
[1] "Telma 25 mg tablet" "Telma 25 mg tablet" "Telma 25 mg tablet" "Telma 25 mg tablet" "Telma 25 mg tablet"
[6] "Telma 25 mg tablet" "Telma 25 mg tablet" "Telma 25 mg tablet" "Telma 25 mg tablet" "Telma 25 mg tablet"
where x is your data source.
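As a small sketch of how this behaves on mixed data (the second string here is made up for illustration), only elements containing the exact text are changed. Passing fixed = TRUE makes gsub treat the pattern as a literal string rather than a regular expression, which is probably what you want for a straight lookup-and-replace like this:

```r
# Hypothetical sample data: only the first element contains the exact string
x <- c("Telma 2525 mg tablet", "Telma 40 mg tablet")

# fixed = TRUE matches the pattern literally, not as a regular expression
y <- gsub("Telma 2525 mg tablet", "Telma 25 mg tablet", x, fixed = TRUE)
y
```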
EDIT - UPDATED TO MAKE IT GENERIC
d <- data.frame(t = c("blah blah 2525 mg", "blah blah 7272 mg"), stringsAsFactors = FALSE)
remdup <- function(s) {
  f <- regexec("[0-9]{4}", s)[[1]][1]  # start position of the first run of 4 digits
  sub(substr(s, f, f + 1), "", s)      # remove the first occurrence of those 2 digits in s
}
lapply(d$t, remdup)
#[[1]]
#[1] "blah blah 25 mg"
#
#[[2]]
#[1] "blah blah 72 mg"
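
An alternative that might be more robust, sketched here under some assumptions: a regular expression with a backreference can match a digit run only when it is the same block written twice, so numbers like 1234 or a lone 25 elsewhere in the string are left alone. The function name collapse_repeat and the third sample row are made up for illustration. Note that a genuine value such as "1010 mg" would also be collapsed, so check your data first.

```r
# Hypothetical helper: collapse a digit run that consists of one block
# repeated exactly twice (e.g. "2525" -> "25"). The lookarounds ensure the
# match covers the whole digit run, not part of a longer number.
collapse_repeat <- function(s) {
  gsub("(?<![0-9])([0-9]+)\\1(?![0-9])", "\\1", s, perl = TRUE)
}

d <- data.frame(
  t = c("blah blah 2525 mg", "blah blah 7272 mg", "blah 25 blah 1234 mg"),
  stringsAsFactors = FALSE
)

# gsub is vectorised, so no lapply is needed
cleaned <- collapse_repeat(d$t)
cleaned
```

Because gsub works on the whole vector at once, the result can be assigned straight back to the column, e.g. d$t <- collapse_repeat(d$t).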