I have a graphing/analysis problem I can't quite get my head around. I can brute-force it, but that's too slow. Maybe someone has a better idea, or knows of a fast library for Python?
I have 2+ time series data sets (x, y) that I want to aggregate (and subsequently plot). The issue is that the x values across the series don't match up, and I really don't want to resort to duplicating values into time bins.
So, given these 2 series:
S1: (1;100) (5;100) (10;100)
S2: (4;150) (5;100) (18;150)
Adding them together should result in:
ST: (1;100) (4;250) (5;200) (10;200) (18;250)
Logic:
x=1 s1=100, s2=None, sum=100
x=4 s1=100, s2=150, sum=250 (note s1 value from previous value)
x=5 s1=100, s2=100, sum=200
x=10 s1=100, s2=100, sum=200
x=18 s1=100, s2=150, sum=250
My current thinking is to iterate over a sorted list of all the keys (x), keep the previous value for each series, and ask each series whether it has a new y for that x.
Any ideas would be appreciated!
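For reference, the idea described in the question (walk the sorted x values and carry each series' last y forward) could look roughly like the sketch below. It is only an illustration: `sum_series` is a name made up here, and it assumes a series contributes 0 before its first point, which matches the x=1 row in the example.

```python
def sum_series(*series):
    """Sum step-wise series given as lists of (x, y) pairs.

    Each series keeps its last y until its next x; a series that has
    not started yet contributes 0 (an assumption of this sketch).
    """
    lookups = [dict(s) for s in series]                # x -> y per series
    xs = sorted(set(x for s in series for x, _ in s))  # every distinct x
    last = [0] * len(series)                           # carried-forward y per series
    total = []
    for x in xs:
        for i, lookup in enumerate(lookups):
            if x in lookup:                            # series i has a new y at x
                last[i] = lookup[x]
        total.append((x, sum(last)))
    return total


s1 = [(1, 100), (5, 100), (10, 100)]
s2 = [(4, 150), (5, 100), (18, 150)]
print(sum_series(s1, s2))
# [(1, 100), (4, 250), (5, 200), (10, 200), (18, 250)]
```

This visits every series at every distinct x, so the cost grows with (number of distinct x values) times (number of series).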
Here's another way to do it, putting more of the behaviour on the individual data streams:
class DataStream(object):
    """Wraps one (x, y) series; tracks its current y and its upcoming x."""

    def __init__(self, iterable):
        self.iterable = iter(iterable)
        self.next_item = (None, 0)  # placeholder so the first next() primes the stream
        self.next_x = None
        self.current_y = 0          # contributes 0 until the first point is reached
        self.next()

    def next(self):
        # Make the pending point current, then look ahead to the following one.
        if self.next_item is None:
            raise StopIteration()
        self.current_y = self.next_item[1]
        try:
            self.next_item = next(self.iterable)
            self.next_x = self.next_item[0]
        except StopIteration:
            self.next_item = None
            self.next_x = None
        return self.next_item

    __next__ = next  # Python 3 iterator protocol

    def __iter__(self):
        return self


class MergedDataStream(object):
    def __init__(self, *iterables):
        self.streams = [DataStream(i) for i in iterables]
        self.outseq = []

    def next(self):
        xs = [stream.next_x for stream in self.streams if stream.next_x is not None]
        if not xs:
            raise StopIteration()
        next_x = min(xs)
        current_y = 0
        for stream in self.streams:
            # Advance only the streams that have a point at next_x;
            # the others keep contributing their previous y.
            if stream.next_x == next_x:
                stream.next()
            current_y += stream.current_y
        self.outseq.append((next_x, current_y))
        return self.outseq[-1]

    __next__ = next  # Python 3 iterator protocol

    def __iter__(self):
        return self


if __name__ == '__main__':
    seqs = [
        [(1, 100), (5, 100), (10, 100)],
        [(4, 150), (5, 100), (18, 150)],
    ]
    sm = MergedDataStream(*seqs)
    for x, y in sm:
        print("%2s: %s" % (x, y))
    print(sm.outseq)
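If each series is already sorted by x, the same merge can probably be written more compactly with the standard library's `heapq.merge`, which lazily interleaves sorted inputs. The following is a sketch under that assumption, not a drop-in replacement: `merge_sum` and `_tag` are names invented for this example, and a series is again taken to contribute 0 before its first point.

```python
import heapq
from itertools import groupby
from operator import itemgetter


def _tag(index, points):
    """Yield (x, series_index, y) so merged points remember their origin."""
    for x, y in points:
        yield x, index, y


def merge_sum(*series):
    """Lazily yield (x, total) for step-wise series, each sorted by x."""
    last = [0] * len(series)  # most recent y seen for each series
    total = 0                 # running sum of the values in `last`
    merged = heapq.merge(*(_tag(i, s) for i, s in enumerate(series)))
    for x, group in groupby(merged, key=itemgetter(0)):
        for _, i, y in group:
            total += y - last[i]  # adjust the sum by the change in series i
            last[i] = y
        yield x, total


s1 = [(1, 100), (5, 100), (10, 100)]
s2 = [(4, 150), (5, 100), (18, 150)]
print(list(merge_sum(s1, s2)))
# [(1, 100), (4, 250), (5, 200), (10, 200), (18, 250)]
```

Keeping a running total means each point only touches its own series, so the per-point cost is the heap merge rather than a pass over all the streams.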