python, regex, jython, data-cleaning, openrefine

Return multiple results in OpenRefine using Python / Jython RegEx


I'm trying to extract dates formatted as dd.mm.yyyy.

Some of the cells contain only one date, while others contain two (for example "from dd.mm.yyyy to dd.mm.yyyy"), along with other text I don't care about.

I need to extract both dates in order to create two columns, "From" and "To", leaving "To" blank for events that happened on a single date.

I've tried the following expression in Python / Jython, but it only returns the first date for cells that contain more than one.

import re
g = re.search(r"([0-9])([0-9])\.([0-9])([0-9])\.([0-9])([0-9])([0-9])([0-9])", value)
return g.group()

How can I have both of the dates returned?

Thanks a lot!


Solution

  • re.search() stops at the first match, which is why you only get one date. Use re.findall() to get all matches. You can also simplify the regex by dropping the per-digit capturing groups and using \d{2} and \d{4} instead of repeating [0-9].

    import re
    # findall returns a list of every non-overlapping match (both dates of a range)
    g = re.findall(r"\d{2}\.\d{2}\.\d{4}", value)
    return g
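
Since the goal is two columns, here is a minimal sketch of how the list from findall could be split into a "From" and a "To" value. The function name extract_from_to and the sample strings are made up for illustration, and the code assumes the first date in a cell is the start and the second is the end. It is written as plain Python so it can be tried outside OpenRefine.

```python
import re

# Same pattern as in the answer: dd.mm.yyyy
DATE_PATTERN = re.compile(r"\d{2}\.\d{2}\.\d{4}")

def extract_from_to(value):
    """Return (from_date, to_date); missing dates come back as empty strings."""
    # Treat an empty/None cell as an empty string so findall does not fail
    dates = DATE_PATTERN.findall(value or "")
    from_date = dates[0] if len(dates) > 0 else ""
    to_date = dates[1] if len(dates) > 1 else ""
    return from_date, to_date

# Hypothetical sample cells
print(extract_from_to("from 01.02.2020 to 15.03.2020, city hall"))
print(extract_from_to("held on 05.06.2019"))
```

In OpenRefine itself, one way to apply this would be to create each column separately with "Add column based on this column", using the findall line as the expression body and returning dates[0] for "From" and the second element (or an empty string when there is none) for "To".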