
Python: parsing text of unknown length

I have a database full of strings such as:

as.web.product.viewed(AT)2018-01-28T19:00:52.032Z(THEN)as.web.product.viewed(AT)2018-01-28T19:02:20.132Z

(Another possible delimiter is "(WITH)", and another possible action is as.web.product.purchased, so ideally I need a solution that is as generic as possible.)

There can be any number of actions in a sequence, and in more or less any order. I need to be able to isolate the action name (such as as.web.product.viewed) and the time at which it happened, as well as maintain the order of the actions.

What would be the most Pythonic way of doing this?

EDIT: The desired output for the example above is two lists such as:

['as.web.product.viewed','as.web.product.viewed']
['2018-01-28T19:00:52.032Z','2018-01-28T19:02:20.132Z']

Solution

You could use a regular expression to split the string wherever text in round brackets occurs:

import re

# Delimiters are letters in round brackets, e.g. (AT), (THEN), (WITH)
pat = re.compile(r'\([A-Za-z]+\)')
s = "as.web.product.viewed(AT)2018-01-28T19:00:52.032Z(THEN)as.web.product.viewed(AT)2018-01-28T19:02:20.132Z"
r = pat.split(s)  # alternating action names and timestamps
print(list(zip(r[::2], r[1::2])))  # group pairwise if needed

This prints:

[('as.web.product.viewed', '2018-01-28T19:00:52.032Z'), ('as.web.product.viewed', '2018-01-28T19:02:20.132Z')]
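If you want the two separate lists from the question rather than pairs, the same split can be sliced directly. This is a minimal sketch, assuming every string strictly alternates action name, delimiter, timestamp, delimiter, and so on; the names `DELIMITER` and `split_actions` are made up for illustration.

```python
import re

# Any run of letters in round brackets, e.g. (AT), (THEN) or (WITH),
# is treated as a delimiter. This is an assumption about the data.
DELIMITER = re.compile(r"\([A-Za-z]+\)")


def split_actions(sequence):
    """Return (actions, timestamps) as two lists, preserving order.

    Assumes the pieces between delimiters alternate strictly:
    action, timestamp, action, timestamp, ...
    """
    parts = DELIMITER.split(sequence)
    actions = parts[::2]      # pieces at even positions
    timestamps = parts[1::2]  # pieces at odd positions
    return actions, timestamps


s = ("as.web.product.viewed(AT)2018-01-28T19:00:52.032Z"
     "(THEN)as.web.product.viewed(AT)2018-01-28T19:02:20.132Z")
actions, timestamps = split_actions(s)
print(actions)
print(timestamps)
```

Because the pattern matches any bracketed word, a string that uses "(WITH)" instead of "(THEN)" should split the same way, as long as the alternation assumption holds.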