
Grouping items by time-series bins in Python


I have data that looks like:

[[datetime1, label1],
 [datetime2, label2],
 [datetime3, label3]]

The labels are strings. I have a binning parameter (delta) that's a datetime.timedelta.

What I'm trying to do:

  1. Come up with a set of datetime bins, equally spaced by delta. In other words, in the output below, datetimebin2 - datetimebin1 = datetimebin3 - datetimebin2 = delta.
  2. Bin the labels into those bins.

So I would end up with something like:

[[datetimebin1, [label1, label2]],
 [datetimebin2, []],
 [datetimebin3, []],
 [datetimebin4, [label3]]]

I've been pointed to pandas, but haven't found what I'm looking for. Any help is much appreciated!
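The two steps listed above can be sketched with only the standard library, since dividing one timedelta by another with // gives an integer bin index. This is a minimal sketch, not a pandas solution: the function name bin_labels and the sample timestamps are hypothetical, and it assumes bins start at the earliest timestamp. It does not require the input to be sorted, and it keeps empty bins so that consecutive bin starts always differ by exactly delta.

```python
from datetime import datetime, timedelta


def bin_labels(data, delta):
    """Group [datetime, label] pairs into consecutive bins of width delta.

    Hypothetical helper: bins start at the earliest timestamp, and empty
    bins are kept so consecutive bin starts differ by exactly delta.
    """
    if not data:
        return []
    start = min(t for t, _ in data)
    end = max(t for t, _ in data)
    # timedelta // timedelta is an int: the zero-based bin index
    nbins = (end - start) // delta + 1
    # step 1: equally spaced bin starts, each with an empty label list
    bins = [[start + i * delta, []] for i in range(nbins)]
    # step 2: drop each label into its bin (input order does not matter)
    for t, label in data:
        bins[(t - start) // delta][1].append(label)
    return bins


# Illustrative sample data (hypothetical timestamps and labels)
data = [[datetime(2024, 1, 1, 12, 0), "a"],
        [datetime(2024, 1, 1, 12, 4), "b"],
        [datetime(2024, 1, 1, 12, 31), "c"]]

for bin_start, labels in bin_labels(data, timedelta(minutes=10)):
    print(bin_start, labels)
# 2024-01-01 12:00:00 ['a', 'b']
# 2024-01-01 12:10:00 []
# 2024-01-01 12:20:00 []
# 2024-01-01 12:30:00 ['c']
```

With a 10-minute delta this yields four bins, the first and last non-empty, which matches the shape of the desired output above.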


Solution

  • Something along these lines should do:

    # data:  a list of [datetime, label] pairs, sorted by datetime
    # delta: bin width, a datetime.timedelta
    # res:   the output, a list of [bin_start, [labels]] pairs as in the question
    
    res = []
    # start of the first bin is the first (earliest) timestamp
    binstart = data[0][0]
    res.append([binstart, []])
    
    # iterate through the data items
    for d in data:
        # if the item falls in the current bin [binstart, binstart + delta),
        # append its label to that bin
        if d[0] < binstart + delta:
            res[-1][1].append(d[1])
            continue
    
        # otherwise, move to the next bin and add empty bins until the
        # item's timestamp is inside [binstart, binstart + delta)
        binstart += delta
        while d[0] >= binstart + delta:
            res.append([binstart, []])
            binstart += delta
    
        # create a new bin holding this item's label
        res.append([binstart, [d[1]]])
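To try the loop above, it can be wrapped in a function. The name bin_sorted and the sample timestamps are hypothetical. Because the loop only compares each item against the current bin, it assumes time-ordered input, so this sketch sorts a copy first and returns an empty list for empty input (where data[0] would otherwise fail).

```python
from datetime import datetime, timedelta


def bin_sorted(data, delta):
    """The loop from the answer, wrapped in a function (hypothetical name)."""
    if not data:
        return []
    # the loop relies on time-ordered input, so sort a copy by timestamp
    data = sorted(data, key=lambda row: row[0])
    res = []
    binstart = data[0][0]
    res.append([binstart, []])
    for d in data:
        # item belongs to the current bin
        if d[0] < binstart + delta:
            res[-1][1].append(d[1])
            continue
        # advance, emitting empty bins until the item fits
        binstart += delta
        while d[0] >= binstart + delta:
            res.append([binstart, []])
            binstart += delta
        res.append([binstart, [d[1]]])
    return res


# "c" lies exactly two deltas after "a", so an empty 12:10 bin must appear
data = [[datetime(2024, 1, 1, 12, 20), "c"],
        [datetime(2024, 1, 1, 12, 0), "a"]]
for bin_start, labels in bin_sorted(data, timedelta(minutes=10)):
    print(bin_start, labels)
# 2024-01-01 12:00:00 ['a']
# 2024-01-01 12:10:00 []
# 2024-01-01 12:20:00 ['c']
```

The >= in the while condition matters for timestamps that land exactly on a bin boundary: with >, the 12:20 item would be placed in the 12:10 bin.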