I'm web scraping a site using Python. The returned results have the following format (https://regex101.com/r/irr14u/10), where everything works fine apart from the last case, where I get two matches for the dates (first match: Thur.-Sun., Tue., Wed.; second match: Mon.).
I'm using the following code to get the values I want. I use BeautifulSoup to get the movieDate string, but here I have hardcoded it.
movieDate="Thur.-Sun., Tue., Wed.: 20.50/ 23.00, Mon. 23.00"
weekDays=re.match(r',? *(?P<weekDays>[^\d:\n]+):? *(?P<startTime>[^,\n]+)', movieDate).groupdict()['weekDays']
startTime=re.match(r',? *(?P<weekDays>[^\d:\n]+):? *(?P<startTime>[^,\n]+)', movieDate).groupdict()['startTime']
I want to create a dictionary as follows (it has two keys because there are two startTime values): the first key will be Thur.-Sun., Tue., Wed. with value 20.50/ 23.00, and the second key will be Mon. with value 23.00. There might be cases with one key or more than two keys. So the dictionary will be as follows:
dictionary = {'Thur.-Sun., Tue., Wed.': '20.50/ 23.00', 'Mon.': '23.00'}
Any suggestions for achieving that in a non-buggy way?
You can achieve the desired output using the re.finditer function, which yields every match in the string rather than only the first, and adding each match's captured groups to a dict as you go.
Python snippet:
import re
movieDate = """
Thur.-Sun., Tue., Wed.: 20.50/ 23.00, Mon. 23.00
"""
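d = {}
# Raw string so that \d reaches the regex engine unchanged
r = re.compile(r',? *(?P<weekDays>[^\d:\n]+):? *(?P<startTime>[^,\n]+)')
# finditer yields every match in the string, not just the first
for m in r.finditer(movieDate):
    d[m.group('weekDays')] = m.group('startTime')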
print(d)
Prints:
{'Thur.-Sun., Tue., Wed.': '20.50/ 23.00', 'Mon. ': '23.00'}
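Note that the second key comes out as 'Mon. ' with a trailing space, because the weekDays group also consumes the space before the time. Here is a minimal sketch of one way to tidy that up, assuming it is acceptable to strip surrounding whitespace from both groups. The helper name parse_movie_dates is hypothetical and not part of the original code:

```python
import re

# Same pattern as in the snippet above, written as a raw string.
PATTERN = re.compile(r',? *(?P<weekDays>[^\d:\n]+):? *(?P<startTime>[^,\n]+)')

def parse_movie_dates(movie_date):
    """Map each weekday group to its start times (hypothetical helper)."""
    return {
        # strip() removes the stray space that the weekDays group picks up
        m.group('weekDays').strip(): m.group('startTime').strip()
        for m in PATTERN.finditer(movie_date)
    }

print(parse_movie_dates("Thur.-Sun., Tue., Wed.: 20.50/ 23.00, Mon. 23.00"))
```

Because the dict is built from however many matches finditer yields, the same function should handle strings with one entry or with more than two.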