I'm trying to parse a formatted string. I need to know how many hours, minutes, and seconds have been worked on each project I retrieve.
The data I receive is in this format, for example:
PT5H12M3S, which means 5 hours, 12 minutes, and 3 seconds.
However, if there is less than an hour of work, the hours part is simply left out:
PT12M3S, which means 12 minutes and 3 seconds.
Furthermore, if a project has not been worked on at all (or only for less than a minute), the data is displayed as:
PT0S
If a project only has full hours worked on it, it will be displayed as:
PT5H
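The four cases above follow one rule: each unit letter (H, M, S) is preceded by its number, and a missing unit counts as zero. One way to express that rule without split is to scan the string character by character. This is a minimal sketch; parse_duration_scan is a hypothetical name, and it assumes only integer H, M and S components follow the PT prefix (it does not enforce their order).

```python
def parse_duration_scan(s: str) -> tuple[int, int, int]:
    """Hypothetical helper: parse 'PT5H12M3S'-style strings into (hours, minutes, seconds).

    Assumes only integer H, M and S components follow 'PT'; missing units count as 0.
    """
    if not s.startswith("PT"):
        raise ValueError(f"invalid duration: {s!r}")
    totals = {"H": 0, "M": 0, "S": 0}
    digits = ""
    for ch in s[2:]:
        if ch in "0123456789":
            digits += ch          # accumulate the number that precedes a unit letter
        elif ch in totals and digits:
            totals[ch] = int(digits)
            digits = ""           # reset for the next component
        else:
            raise ValueError(f"invalid duration: {s!r}")
    if digits:                    # a trailing number with no unit letter is malformed
        raise ValueError(f"invalid duration: {s!r}")
    return totals["H"], totals["M"], totals["S"]


print(parse_duration_scan("PT5H"))     # (5, 0, 0)
print(parse_duration_scan("PT12M3S"))  # (0, 12, 3)
```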
I tried parsing the data with the following code:
estimated = track_data['project']['estimate']['estimate'].split('PT')[1]
estimated_hours = estimated.split('H')[0]
estimated_minutes = estimated_hours.split('M')[0]
estimated_seconds = estimated_minutes.split('S')[0]
but this solution only works when the data is in the PT5H12M3S format; every other format goes wrong. For example, if I get PT5H, the estimated hours will be 5, but the estimated minutes and seconds will also be 5, which is obviously not what I want.
Can anybody point me in the right direction? I tried some other approaches with split, but they don't seem to work: if split can't find the 'M' or 'S', it just keeps returning the same number.
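The repetition comes from how str.split behaves: when the separator is not in the string, it returns a one-element list containing the whole string, so indexing [0] hands back the same text again. A minimal sketch that stays close to the split idea uses str.partition instead, which returns an empty separator when it is not found, so each missing unit can default to 0. parse_with_partition is a hypothetical name, and the sketch assumes the components, when present, appear in H, M, S order.

```python
# str.split returns the whole string when the separator is missing,
# which is why the same number keeps repeating:
print("5H".split("M"))  # ['5H']


def parse_with_partition(value: str) -> tuple[int, int, int]:
    """Hypothetical helper: parse 'PT5H12M3S'-style strings with str.partition.

    Assumes the components, when present, appear in H, M, S order.
    """
    if not value.startswith("PT"):
        raise ValueError(f"invalid duration: {value!r}")
    rest = value[2:]
    parts = []
    for unit in ("H", "M", "S"):
        head, sep, tail = rest.partition(unit)
        if sep:                 # unit found: head is its number, continue with tail
            parts.append(int(head))
            rest = tail
        else:                   # unit missing: count it as zero, leave rest unchanged
            parts.append(0)
    if rest:                    # leftover text means the string was malformed
        raise ValueError(f"invalid duration: {value!r}")
    hours, minutes, seconds = parts
    return hours, minutes, seconds


print(parse_with_partition("PT5H"))  # (5, 0, 0)
```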
Hope this makes sense and thanks in advance.
You can use a regular expression in which every component is optional:
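import re

# Each component is optional; its captured group is None when the unit is missing.
PROJECT_TIME_REGEX = re.compile(r'PT(?:(\d+)H)?(?:(\d+)M)?(?:(\d+)S)?')

def get_project_time(s):
    # fullmatch (unlike match) also rejects strings with trailing garbage
    m = PROJECT_TIME_REGEX.fullmatch(s)
    if not m:
        raise ValueError('invalid string')
    # Missing components come back as None, so default them to 0
    hours, minutes, seconds = (int(g) if g is not None else 0 for g in m.groups())
    return hours, minutes, seconds

print(get_project_time('PT5H12M3S'))
# (5, 12, 3)
print(get_project_time('PT12M3S'))
# (0, 12, 3)
print(get_project_time('PT0S'))
# (0, 0, 0)
print(get_project_time('PT5H'))
# (5, 0, 0)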
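If the goal is to add up or compare project times, it can help to turn the parsed values into a datetime.timedelta. This is a minimal sketch that reuses the same regular expression (repeated so the snippet runs on its own); project_duration is a hypothetical name. It only covers the hours/minutes/seconds subset shown in the question, not days or fractional seconds.

```python
import re
from datetime import timedelta

PROJECT_TIME_REGEX = re.compile(r'PT(?:(\d+)H)?(?:(\d+)M)?(?:(\d+)S)?')


def project_duration(s: str) -> timedelta:
    """Hypothetical helper: convert 'PT5H12M3S'-style strings into a timedelta."""
    m = PROJECT_TIME_REGEX.fullmatch(s)
    if not m:
        raise ValueError(f'invalid string: {s!r}')
    # Missing components are None; treat them as 0
    hours, minutes, seconds = (int(g) if g is not None else 0 for g in m.groups())
    return timedelta(hours=hours, minutes=minutes, seconds=seconds)


# timedelta objects can be summed directly, e.g. to total several projects
total = project_duration('PT5H12M3S') + project_duration('PT12M3S')
print(total)                  # 5:24:06
print(total.total_seconds())  # 19446.0
```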