Within a large data frame, I have a column containing character strings e.g. "1&27&32" representing a combination of codes. I'd like to split each element in the column, search for a particular code (e.g. "1"), and return the row number if that element does in fact contain the code of interest. I was thinking something along the lines of:
apply(df["MEDS"],2,function(x){x.split<-strsplit(x,"&")if(grep(1,x.split)){return(row(x))}})
But I can't figure out where to go from there since that gives me the error:
Error in apply(df["MEDS"], 2, function(x) { :
dim(X) must have a positive length
Any corrections or suggestions would be greatly appreciated, thanks!
I see a couple of problems here (in addition to the missing semicolon in the function).
df["MEDS"]
is more correctly written df[,"MEDS"]
. It is a single column. apply()
is meant to operate on each column/row of a matrix as if they were vectors. If you want to operate on a single column, you don't need apply()
strsplit()
returns a list of vectors. Since you are applying it to a row at a time, the list will have one element (which is a character vector). So you should extract that vector by indexing the list element strsplit(x,"&")[[1]]
.
You are returning row(x)
is if the input to your function is a matrix or knows what row it came from. It does not. apply()
will pull each row and pass it to your function as a vector, so row(x)
will fail.
There might be other issues as well. I didn't get it fully running.
As I mentioned, you don't need apply()
at all. You really only need to look at the 1 column. You don't even need to split it.
OneRows <- which(grepl('(^|&)1(&|$)', df$MEDS))
as Matthew suggested. Or if your intention is to subset the dataframe,
newdf <- df[grepl((^|&)1(&|$)', df$MEDS),]