python-3.x list datetime data-structures python-multithreading

How to execute python function at the given datetime

I have a list of dictionaries. Each item contains a datetime field in string format:

items = [{"Name":"Fooo","Time":"2 Jun, 7:20PM","Location":"LA"},
{"Name":"Yeam","Time":"27 Jun, 9:20PM","Location":"CA"},
{"Name":"Bar","Time":"12 Aug, 7:50PM","Location":"NY"},
{"Name":"Ahoy","Time":"20 Jul, 3:20AM","Location":"TX"}]

def myawesomefunc(item):
    # Do something awesome
    # and return the result
    pass

Now I want to invoke myawesomefunc for each item that satisfy:

datetime.now() >= datetime.strptime(item['Time'], '%d %b, %I:%M%p')

I can't sort items because it will changes continuously. Since the list may contain 30k+ items, iterating throw each item in items will be very time-consuming.

So how do I do this?

Solution

I suggest to use some sort of data structure that make search and insertion more efficient, for example Binary Search Tree (BST).

Let us define some notation:

SubTree(N) : Function that return the set of descendent nodes of N including N.

Parent(N) : Function that return the parent of N.

X.left, X.right : The left and right children of a node X.

In case of BST, your search key will be the timestamp of each item. At equal intervals you will search for a node X with a key less or equal to datetime.now(), and you will execute myawesomefunc for each node in the set S:

S = {X} ⋃ SubTree(X.left) ⋃ (SubTree(X.right) if X.right <= datetime.now() else {})

Then you have to update your tree to exclude all processed node:

Parent(X).left = None if X.right <= datetime.now() else X.right

The insertion of new item is straight forward (normal insertion as any BST).

Now regarding execution of myawesomefunc you have two cases:

myawesomefunc is IO/bound operation: use ThreadPoolExecutor.
myawesomefunc is CPU/bound operation: use ProcessPoolExecutor.