Tags: python, graph, aggregate-functions, analysis, data-analysis

Algorithm to sum/stack values from a time series graph where data points don't match on time


I have a graphing/analysis problem I can't quite get my head around. I can brute-force it, but that's too slow. Maybe someone has a better idea, or knows of a speedy library for Python?

I have 2+ time series data sets (x, y) that I want to aggregate (and subsequently plot). The issue is that the x values across the series don't match up, and I really don't want to resort to duplicating values into time bins.

So, given these 2 series:

S1: (1;100) (5;100) (10;100)
S2: (4;150) (5;100) (18;150)

Adding them together should result in:

ST: (1;100) (4;250) (5;200) (10;200) (18;250)

Logic:

x=1 s1=100, s2=None, sum=100
x=4 s1=100, s2=150, sum=250 (note s1 value from previous value)
x=5 s1=100, s2=100, sum=200
x=10 s1=100, s2=100, sum=200
x=18 s1=100, s2=150, sum=250

My current thinking is to iterate over a sorted list of keys (x), keep the previous value for each series, and ask each series whether it has a new y for that x.
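A rough sketch of that idea, assuming each series is a list of (x, y) tuples and that a series contributes 0 before its first point. The name `sum_series` is made up for illustration:

```python
def sum_series(*series):
    """Sum step-wise series given as lists of (x, y) tuples."""
    lookups = [dict(s) for s in series]   # x -> y for each series
    xs = sorted(set().union(*lookups))    # every x that occurs in any series
    last = [0] * len(series)              # previous y per series (assumed 0 before its first point)
    total = []
    for x in xs:
        for i, lookup in enumerate(lookups):
            if x in lookup:               # this series has a new y at x
                last[i] = lookup[x]
        total.append((x, sum(last)))
    return total


s1 = [(1, 100), (5, 100), (10, 100)]
s2 = [(4, 150), (5, 100), (18, 150)]
print(sum_series(s1, s2))
```

This builds the full set of x values up front, so it needs all series in memory at once.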

Any ideas would be appreciated!


Solution

  • Here's another way to do it, putting more of the behaviour on the individual data streams:

    class DataStream(object):
        """Wraps one (x, y) series and remembers its most recent y value."""

        def __init__(self, iterable):
            self.iterable = iter(iterable)
            self.next_item = (None, 0)
            self.next_x = None
            self.current_y = 0
            next(self)  # prime the stream so next_x holds the first x

        def __next__(self):
            if self.next_item is None:
                raise StopIteration()
            # The pending point becomes current; then look one point ahead.
            self.current_y = self.next_item[1]
            try:
                self.next_item = next(self.iterable)
                self.next_x = self.next_item[0]
            except StopIteration:
                # Exhausted: keep current_y so it still counts in later sums.
                self.next_item = None
                self.next_x = None
            return self.next_item

        def __iter__(self):
            return self


    class MergedDataStream(object):
        """Yields (x, sum of current y values) for every x in any stream."""

        def __init__(self, *iterables):
            self.streams = [DataStream(i) for i in iterables]
            self.outseq = []

        def __next__(self):
            xs = [stream.next_x for stream in self.streams if stream.next_x is not None]
            if not xs:
                raise StopIteration()
            next_x = min(xs)
            current_y = 0
            for stream in self.streams:
                # Advance only the streams that have a point at this x.
                if stream.next_x == next_x:
                    next(stream)
                current_y += stream.current_y
            self.outseq.append((next_x, current_y))
            return self.outseq[-1]

        def __iter__(self):
            return self


    if __name__ == '__main__':
        seqs = [
            [(1, 100), (5, 100), (10, 100)],
            [(4, 150), (5, 100), (18, 150)],
            ]

        sm = MergedDataStream(*seqs)
        for x, y in sm:
            print("%2s: %s" % (x, y))

        print(sm.outseq)
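For comparison, the same merge can probably be written more compactly with the standard library. This is only a sketch, assuming every series is already sorted by x. The names `merge_sum` and `_tagged` are made up for illustration:

```python
import heapq
from itertools import groupby
from operator import itemgetter


def _tagged(index, series):
    """Yield (x, index, y) so each point remembers which series it came from."""
    for x, y in series:
        yield x, index, y


def merge_sum(*series):
    """Lazily yield (x, total) for step-wise series, each sorted by x."""
    streams = [_tagged(i, s) for i, s in enumerate(series)]
    current = [0] * len(series)  # latest y per series (assumed 0 before its first point)
    merged = heapq.merge(*streams, key=itemgetter(0))
    for x, points in groupby(merged, key=itemgetter(0)):
        for _, i, y in points:   # every series that has a point at this x
            current[i] = y
        yield x, sum(current)


seqs = [
    [(1, 100), (5, 100), (10, 100)],
    [(4, 150), (5, 100), (18, 150)],
]
print(list(merge_sum(*seqs)))
```

`heapq.merge` consumes the inputs lazily, so this also works when the series are generators rather than lists.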