
De-greedifying a regular expression in python


I'm trying to write a regular expression that will convert a full path filename to a short filename for a given filetype, minus the file extension.

For example, I'm trying to get just the name of the .bar file from a string using

match = re.search(r'/(.*?)\.bar$', '/def_params/param_1M56/param/foo.bar')

According to the Python re docs, *? is the non-greedy version of *, so I was expecting to get

'foo'

returned for match.group(1) but instead I got

'def_params/param_1M56/param/foo'

What am I missing here about greediness?


Solution

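  • What you're missing isn't so much about greediness as about how regular expression engines work: they scan from left to right and return the leftmost match, so the / matches at the first slash in the string and the .*? is then forced to expand from there until \.bar$ can match. Non-greediness only makes a quantifier prefer the shortest match from a given starting point; it does not move the starting point. In this case, the best regex doesn't rely on greediness at all (a lazy pattern depends on backtracking, which works but could take a long time to run if there are a lot of slashes). Use a more explicit pattern that keeps slashes out of the captured name: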

    r'/([^/]*)\.bar$'
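
To make the difference concrete, here is a minimal sketch that runs both patterns against the path from the question. The variable names (path, lazy, explicit) are only illustrative, and both patterns are written as raw strings so that \. reaches the regex engine unchanged.

```python
import re

path = '/def_params/param_1M56/param/foo.bar'

# Lazy quantifier: the match still starts at the leftmost '/', so the
# group keeps expanding across directories until '\.bar$' can match.
lazy = re.search(r'/(.*?)\.bar$', path)
print(lazy.group(1))      # def_params/param_1M56/param/foo

# Negated character class: the group cannot cross a '/', so the only
# attempt that reaches '.bar' at the end starts at the last slash.
explicit = re.search(r'/([^/]*)\.bar$', path)
print(explicit.group(1))  # foo
```

With the explicit pattern, each attempt that starts at an earlier slash fails as soon as it reaches the next slash, so the engine moves on without scanning the rest of the string.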