Search code examples
python-3.xlistdatetimedata-structurespython-multithreading

How to execute python function at the given datetime


I have a list of dictionaries. Each item contains a datetime field in string format:

items = [{"Name":"Fooo","Time":"2 Jun, 7:20PM","Location":"LA"},
{"Name":"Yeam","Time":"27 Jun, 9:20PM","Location":"CA"},
{"Name":"Bar","Time":"12 Aug, 7:50PM","Location":"NY"},
{"Name":"Ahoy","Time":"20 Jul, 3:20AM","Location":"TX"}]

def myawesomefunc(item):
    # Do something awesome
    # and return the result
    pass

Now I want to invoke myawesomefunc for each item that satisfy:

datetime.now() >= datetime.strptime(item['Time'], '%d %b, %I:%M%p')

I can't sort items because it will changes continuously. Since the list may contain 30k+ items, iterating throw each item in items will be very time-consuming.

So how do I do this?


Solution

  • I suggest to use some sort of data structure that make search and insertion more efficient, for example Binary Search Tree (BST).

    Let us define some notation:

    SubTree(N) : Function that return the set of descendent nodes of N including N.

    Parent(N) : Function that return the parent of N.

    X.left, X.right : The left and right children of a node X.

    In case of BST, your search key will be the timestamp of each item. At equal intervals you will search for a node X with a key less or equal to datetime.now(), and you will execute myawesomefunc for each node in the set S:

    S = {X} ⋃ SubTree(X.left) ⋃ (SubTree(X.right) if X.right <= datetime.now() else {})
    

    Then you have to update your tree to exclude all processed node:

    Parent(X).left = None if X.right <= datetime.now() else X.right 
    

    The insertion of new item is straight forward (normal insertion as any BST).

    Now regarding execution of myawesomefunc you have two cases:

    1. myawesomefunc is IO/bound operation: use ThreadPoolExecutor.
    2. myawesomefunc is CPU/bound operation: use ProcessPoolExecutor.