
R: searching within split character strings with apply


Within a large data frame, I have a column containing character strings e.g. "1&27&32" representing a combination of codes. I'd like to split each element in the column, search for a particular code (e.g. "1"), and return the row number if that element does in fact contain the code of interest. I was thinking something along the lines of:

apply(df["MEDS"],2,function(x){x.split<-strsplit(x,"&")if(grep(1,x.split)){return(row(x))}})

But I can't figure out where to go from there since that gives me the error:

Error in apply(df["MEDS"], 2, function(x) { : 
  dim(X) must have a positive length

Any corrections or suggestions would be greatly appreciated, thanks!


Solution

  • I see a couple of problems here (in addition to the missing semicolon in the function).

    1. df["MEDS"] returns a one-column data frame; df[,"MEDS"] (or df$MEDS) returns the column itself as a vector, which is what you want here. apply() is meant to operate on each column/row of a matrix as if they were vectors. If you want to operate on a single column, you don't need apply().

    2. strsplit() returns a list of vectors, one per input string. If you apply it to one element at a time, the list will have one element (which is a character vector). So you should extract that vector by indexing the list element: strsplit(x,"&")[[1]].

    3. You are returning row(x) as if the input to your function were a matrix or knew what row it came from. It does not. apply() pulls out each row or column (each column, with MARGIN = 2) and passes it to your function as a plain vector, so row(x) will fail.

    There might be other issues as well. I didn't get it fully running.
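
    For reference, if you did want to keep the split-and-search idea, a version that follows the three points above might look roughly like this. It is only a sketch: the sample df below is made up (the real data isn't shown in the question), and the names code_list, has_code and split_rows are just for illustration.

```r
# Hypothetical sample data; the real df is not shown in the question
df <- data.frame(MEDS = c("1&27&32", "11&27", "27&1", "32"),
                 stringsAsFactors = FALSE)

# strsplit() on the whole column returns a list with one character vector per row
code_list <- strsplit(df$MEDS, "&", fixed = TRUE)

# For each row, test whether the code "1" is among that row's split codes
has_code <- vapply(code_list, function(codes) "1" %in% codes, logical(1))

# Row numbers of the matching rows (rows 1 and 3 in this sample)
split_rows <- which(has_code)
split_rows
```

    Because the comparison is done on whole codes after splitting, "11" is not mistaken for "1".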

    As I mentioned, you don't need apply() at all. You really only need to look at the 1 column. You don't even need to split it.

    OneRows <- which(grepl('(^|&)1(&|$)', df$MEDS))
    

    as Matthew suggested. Or if your intention is to subset the dataframe,

    newdf <- df[grepl('(^|&)1(&|$)', df$MEDS),]
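
    In case it isn't obvious why the pattern is wrapped in (^|&) and (&|$) instead of just searching for 1, here is a small sketch on made-up codes (the meds vector is hypothetical, not taken from the question):

```r
# Hypothetical code strings for illustration
meds <- c("1&27&32", "11&27", "27&1", "21&32", "1")

# Unanchored: matches any string that contains the character "1" anywhere
loose <- grepl("1", meds)

# Anchored: "1" must sit between the start of the string or an "&"
# and an "&" or the end of the string, so it is matched as a whole code
strict <- grepl("(^|&)1(&|$)", meds)

# Side-by-side comparison of the two results
data.frame(meds, loose, strict)
```

    The unanchored search flags every element, including "11&27" and "21&32", while the anchored one only keeps the elements where 1 appears as a code of its own.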