python, regex, findall

Failure to extract correct contents using re.findall() with Python


I am using Python 2 and I have a string

c = """
if ( is.data.frame(by) && ncol(by)>1 ) { by_by = sapply( by, paste.me, collapse, split) }
  ldat = sapply(xdat, by_by )
  out = data.table::rbindlist( lapply(ldat, FUN, ...) )
  return(out)
"""

I'd like to extract the second argument of each sapply() or lapply() call, so I expect to get ['paste.me', 'by_by', 'FUN']. Unfortunately, I get ['split', 'by_by', '...'] using the following code:

re.findall(r"\b[sl]apply\(.+,\s*([\w._]+?)[,\s)]", c)

I already used the non-greedy qualifier ?. Why is it still matching the longest possible pattern instead of stopping at the second comma?


Solution

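  • The trouble is that the greedy .+ consumes everything up to the last comma on the line, not the first. The ? you added only makes the capture group ([\w._]+?) non-greedy; it has no effect on the .+ before it. Change .+ to .+?, or better yet to [^,]+, which cannot run past a comma at all.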
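
To see the difference, here is a small sketch that runs the original pattern and the two suggested variants against the string from the question. The variable names (greedy, lazy, negated) are only illustrative, and the code is written so that it should run on both Python 2 and Python 3.

```python
import re

# The string from the question, unchanged.
c = """
if ( is.data.frame(by) && ncol(by)>1 ) { by_by = sapply( by, paste.me, collapse, split) }
  ldat = sapply(xdat, by_by )
  out = data.table::rbindlist( lapply(ldat, FUN, ...) )
  return(out)
"""

# Original pattern: the greedy .+ backtracks only as far as the LAST comma
# on the line, so the group captures the last argument.
greedy = re.findall(r"\b[sl]apply\(.+,\s*([\w._]+?)[,\s)]", c)

# Fix 1: make .+ lazy so it stops at the first comma it can.
lazy = re.findall(r"\b[sl]apply\(.+?,\s*([\w._]+?)[,\s)]", c)

# Fix 2: a negated character class that can never run past a comma.
negated = re.findall(r"\b[sl]apply\([^,]+,\s*([\w._]+?)[,\s)]", c)

print(greedy)   # ['split', 'by_by', '...']
print(lazy)     # ['paste.me', 'by_by', 'FUN']
print(negated)  # ['paste.me', 'by_by', 'FUN']
```

Both fixes give the same result here. The [^,]+ form is usually preferable because it states the intent directly (the first argument contains no comma) and avoids the backtracking that .+? needs. Note that neither version handles a first argument that itself contains a comma, such as a nested function call.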