
Passing arguments to the key function in itertools.groupby() to count unique values per key


I want to count the unique values of some parameter at each point in time, given two lists: one of values and one of timestamps (the timestamps include milliseconds, which are not relevant here, so they must be truncated to seconds). Right now I have something like this:

timestamps = ['00:22:33:645', '00:22:33:655', '00:22:34:645','00:22:34:745']
values = [1, 1, 2, 3]

grouped = groupby(zip(values, timestamps), lambda x: timestamp_to_seconds(x[1]))

but it results in

{1353:[(1, '00:22:33:645'), (1, '00:22:33:655')], 1354:[(2, '00:22:34:645'), (3, '00:22:34:745')]}

and I would prefer to keep only {1353:[1, 1], 1354:[2, 3]} so that len(set(group)) gives an accurate count. Is there a way to pass the timestamps to the key function without putting them in the zip? Can the lambda be skipped?

Edit: added an actual example.


Solution

  • You would have to post-process your groupby result to drop the timestamps. Alternatively, you can skip groupby and build the groups directly with a defaultdict.

    Given

    import time
    import datetime as dt
    import collections as ct
    
    
    timestamps = ["00:22:33:645", "00:22:33:655", "00:22:34:645","00:22:34:745"]
    values = [1, 1, 2, 3]
    
    
    # Helper
    def timestamp_to_seconds(ts: str) -> int:
        """Return an int in total seconds from a timestamp."""
        x = time.strptime(ts.rsplit(":", maxsplit=1)[0],"%H:%M:%S")
        res = dt.timedelta(hours=x.tm_hour, minutes=x.tm_min, seconds=x.tm_sec).total_seconds()
        return int(res)
    

    Code

    def regroup(tstamps: list, vals: list) -> dict:
        """Return a dict of seconds-value pairs."""
        dd = ct.defaultdict(list)
    
        for t, v in zip(tstamps, vals):        
            dd[timestamp_to_seconds(t)].append(v)
    
        return dict(dd)
    

    Demo

    regroup(timestamps, values)
    # {1353: [1, 1], 1354: [2, 3]}
    
    # Count unique values per second (set() drops duplicates)
    {k: len(set(g)) for k, g in regroup(timestamps, values).items()}
    # {1353: 1, 1354: 2}
    

    See also a post on converting timestamps to seconds.
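
If you would rather stay with groupby, the post-processing mentioned at the top can be sketched roughly as below. This is only an illustration, not the one right way to do it: `to_seconds` is a hypothetical arithmetic stand-in for `timestamp_to_seconds`, and it assumes every timestamp has the form HH:MM:SS:mmm. Converting the timestamps before zipping lets `operator.itemgetter` replace the lambda, but the timestamps still have to travel alongside the values, since the key function only ever sees the item being grouped.

```python
from itertools import groupby
from operator import itemgetter

timestamps = ["00:22:33:645", "00:22:33:655", "00:22:34:645", "00:22:34:745"]
values = [1, 1, 2, 3]


def to_seconds(ts: str) -> int:
    """Hypothetical helper: 'HH:MM:SS:mmm' -> whole seconds (milliseconds dropped)."""
    hours, minutes, seconds, _millis = (int(part) for part in ts.split(":"))
    return hours * 3600 + minutes * 60 + seconds


# Convert first, so the key function is a plain itemgetter instead of a lambda.
pairs = zip(map(to_seconds, timestamps), values)

# groupby only merges adjacent items, so this assumes the timestamps are sorted.
# The inner comprehension keeps the value and discards the seconds key.
grouped = {sec: [v for _, v in group] for sec, group in groupby(pairs, key=itemgetter(0))}

# Unique values per second.
unique_counts = {sec: len(set(vals)) for sec, vals in grouped.items()}

print(grouped)        # {1353: [1, 1], 1354: [2, 3]}
print(unique_counts)  # {1353: 1, 1354: 2}
```

If the timestamps are not guaranteed to be in order, the defaultdict version above is the safer choice, because groupby would start a new group each time the second changes and a dict built from it would silently keep only the last group for a repeated key.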