I am a bit frustrated about not being able to solve this seemingly simple problem:
I have a function that takes some time to load data:
def import_data(id):
time.sleep(5)
return 'data' + str(id)
A DataModel
class calls this function and manages two datasets.
class DataModel():
def __init__(self):
self._data_1 = import_data(1)
self._data_2 = import_data(2)
def retrieve_data_1(self):
return self._data_1
def retrieve_data_2(self):
return self._data_2
Now, the main UI creates the DataModel
, calling both import_data
functions, which blocks it.
def main_ui():
# This takes 5 seconds for each dataset and blocks the main UI thread
dm = DataModel()
# Other stuff is happening. This time could be used to load data in the background
time.sleep(2)
# Retrieve the first dataset
data_1 = dm.retrieve_data_1()
# User interaction. This time could be used to load even larger datasets
time.sleep(10)
# Retrieve the second dataset
data_2 = dm.retrieve_data_2()
I want the datasets to be loaded in the background to reduce the time the UI is blocked. My idea would be to implement it like this pseudocode:
class DataModel():
def __init__(self):
self._data_1 = Thread(import_data(1)).start()
self._data_2 = Thread(import_data(2)).start()
def retrieve_data_1(self):
return self._data_1.wait_for_result()
def retrieve_data_2(self):
return self._data_2.wait_for_result()
The import_data
functions are called in separate threads and return Future
objects.
The retrieve_data
functions either block the main thread waiting for the Future
to evaluate or return its result instantly.
Is there an easy way to implement this in Python 3.x with threading
and/or asyncio
? Thanks in advance!
(Edit: syntax correction)
Use the concurrent.futures
module which is designed exactly for that kind of usage:
_pool = concurrent.futures.ThreadPoolExecutor()
class DataModel():
def __init__(self):
self._data_1 = _pool.submit(import_data, 1)
self._data_2 = _pool.submit(import_data, 2)
def retrieve_data_1(self):
return self._data_1.result()
def retrieve_data_2(self):
return self._data_2.result()
If your functions are global, and your data serializable, you can even seamlessly switch from ThreadPoolExecutor
to ProcessPoolExecutor
and benefit from true (process-based) parallelism.