I want to calculate the number of unique values of some parameter at a given time, using two lists: one of values and one of timestamps. The timestamps contain millisecond info that is not really relevant and must be dropped when converting to seconds. Right now I have something like this:
timestamps = ['00:22:33:645', '00:22:33:655', '00:22:34:645','00:22:34:745']
values = [1, 1, 2, 3]
grouped = groupby(zip(values, timestamps), lambda x: timestamp_to_seconds(x[1]))
but it results in
{1353:[(1, '00:22:33:645'), (1, '00:22:33:655')], 1354:[(2, '00:22:34:645'), (3, '00:22:34:745')]}
and I would prefer to keep only
{1353:[1, 1], 1354:[2, 3]}
so that len(set(group)) would give an accurate count. Is there a way to pass the timestamps to the key function without putting them in the zip? Can the lambda be skipped?
You would have to post-process your groupby result. Alternatively, skip groupby and collect the values in a collections.defaultdict.
Given
import time
import datetime as dt
import collections as ct
timestamps = ["00:22:33:645", "00:22:33:655", "00:22:34:645","00:22:34:745"]
values = [1, 1, 2, 3]
# Helper
def timestamp_to_seconds(ts: str) -> int:
    """Return an int in total seconds from a timestamp."""
    # Drop the trailing milliseconds field, then parse what is left as HH:MM:SS.
    x = time.strptime(ts.rsplit(":", maxsplit=1)[0], "%H:%M:%S")
    res = dt.timedelta(hours=x.tm_hour, minutes=x.tm_min, seconds=x.tm_sec).total_seconds()
    return int(res)
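If you prefer to avoid time.strptime, plain string splitting also works. This is a minimal sketch, assuming every timestamp has exactly four colon-separated integer fields (HH:MM:SS:mmm); ts_to_seconds is a hypothetical name used only here. Unlike strptime, it does not reject out-of-range fields such as an hour of 25.

```python
def ts_to_seconds(ts: str) -> int:
    """Return whole seconds from an 'HH:MM:SS:mmm' timestamp, dropping the milliseconds."""
    # Assumption: exactly four integer fields; unpacking raises ValueError otherwise.
    hours, minutes, seconds, _ms = (int(part) for part in ts.split(":"))
    return hours * 3600 + minutes * 60 + seconds


print(ts_to_seconds("00:22:33:645"))  # 1353
```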
Code
def regroup(tstamps: list, vals: list) -> dict:
    """Return a dict of seconds-value pairs."""
    dd = ct.defaultdict(list)
    for t, v in zip(tstamps, vals):
        dd[timestamp_to_seconds(t)].append(v)
    return dict(dd)
Demo
regroup(timestamps, values)
# {1353: [1, 1], 1354: [2, 3]}
{k: len(set(g)) for k, g in regroup(timestamps, values).items()}
# {1353: 1, 1354: 2}
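Since only the number of unique values is needed, the groups can also be collected as sets, so duplicates are dropped while grouping and the count is a plain len. A minimal sketch: regroup_unique is a hypothetical name, the values are assumed to be hashable, and the helper is a string-splitting stand-in for timestamp_to_seconds above so the snippet runs on its own.

```python
import collections as ct


def timestamp_to_seconds(ts: str) -> int:
    """Whole seconds from 'HH:MM:SS:mmm' (hypothetical stand-in for the helper above)."""
    h, m, s, _ms = (int(p) for p in ts.split(":"))
    return h * 3600 + m * 60 + s


def regroup_unique(tstamps: list, vals: list) -> dict:
    """Return a dict mapping each second to the number of distinct values seen in it."""
    dd = ct.defaultdict(set)  # a set silently ignores repeated values
    for t, v in zip(tstamps, vals):
        dd[timestamp_to_seconds(t)].add(v)
    return {k: len(seen) for k, seen in dd.items()}


timestamps = ["00:22:33:645", "00:22:33:655", "00:22:34:645", "00:22:34:745"]
values = [1, 1, 2, 3]
print(regroup_unique(timestamps, values))  # {1353: 1, 1354: 2}
```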
See also a post on converting timestamps to seconds.
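To stay with itertools.groupby and post-process its result instead, one option is to convert the timestamps before zipping. The pairs are then (seconds, value), so the key can be operator.itemgetter(0) and the lambda is no longer needed (the zip remains). A minimal sketch, assuming the timestamps are already in chronological order: groupby only merges consecutive items with equal keys, so unsorted input would split one second into several groups and the dict comprehension would keep only the last one. The helper is again a hypothetical string-splitting stand-in.

```python
from itertools import groupby
from operator import itemgetter


def timestamp_to_seconds(ts: str) -> int:
    """Whole seconds from 'HH:MM:SS:mmm' (hypothetical stand-in for the helper above)."""
    h, m, s, _ms = (int(p) for p in ts.split(":"))
    return h * 3600 + m * 60 + s


timestamps = ["00:22:33:645", "00:22:33:655", "00:22:34:645", "00:22:34:745"]
values = [1, 1, 2, 3]

# Convert once up front; each pair is (seconds, value).
pairs = zip(map(timestamp_to_seconds, timestamps), values)

# Group on the seconds field and keep only the values of each group.
# Assumes the input is sorted by time, since groupby only merges consecutive keys.
grouped = {sec: [v for _, v in grp] for sec, grp in groupby(pairs, key=itemgetter(0))}
print(grouped)  # {1353: [1, 1], 1354: [2, 3]}

counts = {sec: len(set(vals)) for sec, vals in grouped.items()}
print(counts)  # {1353: 1, 1354: 2}
```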