I have a list of dictionaries. Each item contains a datetime
field in string format:
items = [{"Name":"Fooo","Time":"2 Jun, 7:20PM","Location":"LA"},
{"Name":"Yeam","Time":"27 Jun, 9:20PM","Location":"CA"},
{"Name":"Bar","Time":"12 Aug, 7:50PM","Location":"NY"},
{"Name":"Ahoy","Time":"20 Jul, 3:20AM","Location":"TX"}]
def myawesomefunc(item):
# Do something awesome
# and return the result
pass
Now I want to invoke myawesomefunc
for each item
that satisfy:
datetime.now() >= datetime.strptime(item['Time'], '%d %b, %I:%M%p')
I can't sort items
because it will changes continuously. Since the list may contain 30k+ items, iterating throw each item in items
will be very time-consuming.
So how do I do this?
I suggest to use some sort of data structure that make search and insertion more efficient, for example Binary Search Tree (BST).
Let us define some notation:
SubTree(N)
: Function that return the set of descendent nodes of N
including N
.
Parent(N)
: Function that return the parent of N
.
X.left, X.right
: The left and right children of a node X
.
In case of BST, your search key will be the timestamp
of each item. At equal intervals you will search for a node X
with a key less or equal to datetime.now()
, and you will execute myawesomefunc
for each node in the set S
:
S = {X} ⋃ SubTree(X.left) ⋃ (SubTree(X.right) if X.right <= datetime.now() else {})
Then you have to update your tree to exclude all processed node:
Parent(X).left = None if X.right <= datetime.now() else X.right
The insertion of new item is straight forward (normal insertion as any BST).
Now regarding execution of myawesomefunc
you have two cases:
myawesomefunc
is IO/bound operation: use ThreadPoolExecutor
.myawesomefunc
is CPU/bound operation: use ProcessPoolExecutor
.