I have a database full of strings such as:
as.web.product.viewed(AT)2018-01-28T19:00:52.032Z(THEN)as.web.product.viewed(AT)2018-01-28T19:02:20.132Z
(Another possible delimiter is "(WITH)", and another possible action is as.web.product.purchased,
so ideally I'd need a solution that is as generic as possible.)
There can be any number of actions in a sequence, in more or less any order. I need to isolate each action name (such as as.web.product.viewed) and the time at which it happened, while preserving the order of the actions.
What would be the most Pythonic way of doing this?
EDIT: the desired output (for the example above) is two lists:
['as.web.product.viewed','as.web.product.viewed']
['2018-01-28T19:00:52.032Z','2018-01-28T19:02:20.132Z']
You could use a regular expression to split the string wherever a word in round brackets, such as (AT), (THEN) or (WITH), occurs:
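import re
# Raw string, so "\(" reaches the regex engine instead of being an invalid Python escape.
# Matches any delimiter made of letters in round brackets, e.g. (AT), (THEN), (WITH).
pat = re.compile(r'\([A-Za-z]+\)')
s = "as.web.product.viewed(AT)2018-01-28T19:00:52.032Z(THEN)as.web.product.viewed(AT)2018-01-28T19:02:20.132Z"
r = pat.split(s)  # alternating action, timestamp, action, timestamp, ...
print(list(zip(r[::2], r[1::2])))  # group pairwise if needed!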
This prints:
[('as.web.product.viewed', '2018-01-28T19:00:52.032Z'), ('as.web.product.viewed', '2018-01-28T19:02:20.132Z')]
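To get the two separate lists asked for in the question's EDIT, you can slice the split result directly instead of zipping it. Below is a minimal sketch wrapping the same regex in a helper; the name `split_actions` and the odd-length check are my own additions, and it assumes the parts strictly alternate action, timestamp, action, timestamp. The sample string is hypothetical and just mixes in the (WITH) delimiter and the purchased action mentioned in the question.

```python
import re

# Any run of letters in round brackets, e.g. (AT), (THEN) or (WITH), is a delimiter.
DELIMITER = re.compile(r'\([A-Za-z]+\)')


def split_actions(s):
    """Return (actions, times) for one action-sequence string (hypothetical helper).

    Assumes the pieces between delimiters alternate action, timestamp, ...
    """
    parts = DELIMITER.split(s)
    if len(parts) % 2:
        # An odd count means some action has no timestamp (or vice versa).
        raise ValueError(f"expected action/timestamp pairs, got {parts!r}")
    # Even positions are action names, odd positions are timestamps; order is kept.
    return parts[::2], parts[1::2]


# Hypothetical input using both (WITH) and (THEN)-style delimiters.
s = ("as.web.product.viewed(AT)2018-01-28T19:00:52.032Z"
     "(WITH)as.web.product.purchased(AT)2018-01-28T19:02:20.132Z")
actions, times = split_actions(s)
print(actions)  # ['as.web.product.viewed', 'as.web.product.purchased']
print(times)    # ['2018-01-28T19:00:52.032Z', '2018-01-28T19:02:20.132Z']
```

Because the delimiter pattern only cares that something is a word in brackets, new delimiters need no code changes; the trade-off is that a timestamp or action name containing bracketed letters would also be split.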