I need to calculate the duration from a specific date to now for Elasticsearch index cleaning. My job will run in Python. I have a configuration file:
indices:
  - name: test
    template: raw*
    liveLength: 1d
How can I parse a string such as "1d" or "2m" from the liveLength field into a valid time interval, so that I can calculate the duration from a specific date?
You could use a regular expression to extract the number/time unit parts and then look up a multiplier in a dictionary. This way, it is a bit shorter and probably a whole lot more readable than your manual parsing and if/elif chain.
>>> import re
>>> mult = {"s": 1, "m": 60, "h": 60*60, "d": 60*60*24}
>>> s = "2d 4h 13m 5.2s"
>>> re.findall(r"(\d+(?:\.\d+)?)([smhd])", s)
[('2', 'd'), ('4', 'h'), ('13', 'm'), ('5.2', 's')]
>>> sum(float(x) * mult[m] for x, m in _)
187985.2
As a function:
import re
from datetime import timedelta

def duration(string):
    # Seconds per supported unit
    mult = {"s": 1, "m": 60, "h": 60*60, "d": 60*60*24}
    # Each part is a number (optionally with decimals) followed by a unit
    parts = re.findall(r"(\d+(?:\.\d+)?)([smhd])", string)
    total_seconds = sum(float(x) * mult[m] for x, m in parts)
    return timedelta(seconds=total_seconds)

print(duration("2d 4h 13m 5.2s"))
# 2 days, 4:13:05.200000
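This will also ensure that the number part is actually a valid number (and not just any sequence of digits and dots). Note, however, that re.findall silently skips anything it cannot match, so a part with an unsupported unit such as "3w" is ignored rather than rejected. If you need an exception for invalid input, validate the whole string first, for example with re.fullmatch.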
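If invalid input should fail loudly, one option is to check the whole string before extracting the parts. The following is only a minimal sketch, not part of the function above: the name strict_duration and the accepted grammar (one or more parts, optionally separated by whitespace) are assumptions made for illustration.

```python
import re
from datetime import timedelta

MULT = {"s": 1, "m": 60, "h": 60 * 60, "d": 60 * 60 * 24}
PART = r"\d+(?:\.\d+)?[smhd]"
# The whole string must consist of one or more parts, optionally
# separated by whitespace; anything else is rejected.
FULL = re.compile(rf"\s*(?:{PART}\s*)+")
PARTS = re.compile(r"(\d+(?:\.\d+)?)([smhd])")


def strict_duration(string):
    """Hypothetical strict variant: raise ValueError on unparseable input."""
    if not FULL.fullmatch(string):
        raise ValueError(f"invalid duration: {string!r}")
    total_seconds = sum(float(x) * MULT[m] for x, m in PARTS.findall(string))
    return timedelta(seconds=total_seconds)
```

With this variant, strict_duration("1d") returns a one-day timedelta, while strict_duration("3w") or an empty string raises ValueError instead of quietly returning a zero duration.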
The function could be further optimized by pre-compiling the regex with re.compile outside of the function. When I tested it with IPython's %timeit, mine turned out to be a bit faster (2.1µs vs. 2.8µs for yours, both without the timedelta creation and with just float instead of Decimal). Also, I would consider this more readable because of its more declarative and less imperative style, but that's certainly a matter of taste and preference.
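To tie this back to the question, here is a rough sketch that compiles the pattern once at module level and uses the parsed liveLength to decide whether an index is due for cleanup. The helper name is_expired and its created/now parameters are hypothetical, and how you obtain an index's creation date from Elasticsearch is deliberately left out.

```python
import re
from datetime import datetime, timedelta

# Compiled once at import time instead of inside every call.
_PARTS = re.compile(r"(\d+(?:\.\d+)?)([smhd])")
_MULT = {"s": 1, "m": 60, "h": 60 * 60, "d": 60 * 60 * 24}


def duration(string):
    total_seconds = sum(float(x) * _MULT[m] for x, m in _PARTS.findall(string))
    return timedelta(seconds=total_seconds)


def is_expired(created, live_length, now=None):
    """Hypothetical helper: True if `created` is older than `live_length`."""
    now = now if now is not None else datetime.now()
    return now - created > duration(live_length)


# Fixed dates keep the example reproducible.
created = datetime(2024, 1, 1, 12, 0)
print(is_expired(created, "1d", now=datetime(2024, 1, 3)))  # True
print(is_expired(created, "1d", now=datetime(2024, 1, 2)))  # False
```

Passing now explicitly is a design choice that makes the check easy to test; in the actual job you would normally omit it and let it default to the current time.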