I'm using TinyDB, a user-friendly, JSON-based, document-oriented database. But I'm unable to add multiple pieces of data to my database because I'm making use of multiprocessing. After a while I get an error that id x already exists in the database (because two or more processes are trying to insert data at the same time). Is there any way to solve this?
On every run I insert new, unique params.
Example params:
params = {'id': 1, 'name': 'poop', 'age': 99}
Code:
from tinydb import TinyDB
from multiprocessing import Process

resultsDb = TinyDB('db/resultsDb.json')

def run(params):
    resultsDb.insert({'id': params['id'], 'name': params['name'], 'age': params['age']})

maxProcesses = 12  # cores in my PC
processes = []
for i in range(maxProcesses):
    processes.append(Process(target=run, args=(params,)))
for p in processes:
    p.start()
for p in processes:
    p.join()
I could not test this on the Linux system I have access to, because it is a shared server on which access to certain facilities required to run the code is restricted. Below is a Windows version. The key features are:
1. Use of a Lock to ensure that the insertions are serialized, which I believe is necessary for the code to run without error. This, of course, defeats the purpose of parallelizing your code, and one could conclude that there is really no point in using multiprocessing or multithreading here. (If you want to keep some parallelism, see the single-writer sketch after the output below.)
2. Moving the resultsDb = TinyDB('db/resultsDb.json') statement to within the run function, because on platforms where spawn is used to create new processes, such as Windows, a statement left at global scope would be executed anyway for each newly created process. On Linux, however, where fork is used to create new processes, it would not be executed for each new process; instead, each new process would inherit the single database opened by the main process. That might or might not work -- you can try it both ways, with the statement at global scope or not. If you put it back at global scope, you do not need the same statement towards the bottom of the source.
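To see the spawn/fork difference for yourself, here is a minimal sketch of my own (not part of the fix): under spawn each child re-imports the module, so module-level code runs again in every child; under fork it runs only once, in the parent.

import os
import multiprocessing as mp

# Under 'spawn' this line prints once per child process (plus once for the
# parent); under 'fork' it prints only once, in the parent.
print(f'module-level code ran in pid {os.getpid()}')

def child():
    pass

if __name__ == '__main__':
    mp.set_start_method('spawn')  # try 'fork' on Linux and compare the output
    procs = [mp.Process(target=child) for _ in range(2)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()

Now the full Windows version: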
from tinydb import TinyDB
from multiprocessing import Process, Lock

def run(lock, params):
    resultsDb = TinyDB('db/resultsDb.json')
    with lock:
        resultsDb.insert({'id': params['id'], 'name': params['name'], 'age': params['age']})
    print('Successfully inserted.')

# required by Windows:
if __name__ == '__main__':
    params = {'id': 1, 'name': 'poop', 'age': 99}
    maxProcesses = 12  # cores in my PC
    lock = Lock()
    processes = []
    for i in range(maxProcesses):
        processes.append(Process(target=run, args=(lock, params)))
    for p in processes:
        p.start()
    for p in processes:
        p.join()
    # remove the following statement if the first one is at global scope:
    resultsDb = TinyDB('db/resultsDb.json')
    print(resultsDb.all())
Prints:
Successfully inserted.
Successfully inserted.
Successfully inserted.
Successfully inserted.
Successfully inserted.
Successfully inserted.
Successfully inserted.
Successfully inserted.
Successfully inserted.
Successfully inserted.
Successfully inserted.
Successfully inserted.
[{'id': 1, 'name': 'poop', 'age': 99}, {'id': 1, 'name': 'poop', 'age': 99}, {'id': 1, 'name': 'poop', 'age': 99}, {'id': 1, 'name': 'poop', 'age': 99}, {'id': 1, 'name': 'poop', 'age': 99}, {'id': 1, 'name': 'poop', 'age': 99}, {'id': 1, 'name': 'poop', 'age': 99}, {'id': 1, 'name': 'poop', 'age': 99}, {'id': 1, 'name': 'poop', 'age': 99}, {'id': 1, 'name': 'poop', 'age': 99}, {'id': 1, 'name': 'poop', 'age': 99}, {'id': 1, 'name': 'poop', 'age': 99}]
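As mentioned in point 1, the lock serializes everything, so the extra processes gain you nothing while inserting. If the real work happens before the insert, one way to keep that work parallel is to funnel all inserts through a single writer process via a multiprocessing.Queue. This is only a sketch under that assumption; the writer/worker names and the placeholder work are mine, not from the question.

from multiprocessing import Process, Queue
from tinydb import TinyDB

def writer(queue):
    # The only process that ever touches the TinyDB file.
    resultsDb = TinyDB('db/resultsDb.json')
    while True:
        row = queue.get()
        if row is None:  # sentinel: no more rows are coming
            break
        resultsDb.insert(row)

def worker(queue, params):
    # ... do the expensive work in parallel here ...
    queue.put({'id': params['id'], 'name': params['name'], 'age': params['age']})

if __name__ == '__main__':
    queue = Queue()
    w = Process(target=writer, args=(queue,))
    w.start()
    workers = [Process(target=worker, args=(queue, {'id': i, 'name': 'poop', 'age': 99}))
               for i in range(12)]
    for p in workers:
        p.start()
    for p in workers:
        p.join()
    queue.put(None)  # tell the writer to shut down
    w.join()

Here the workers run in parallel, but only one process writes to the file, so no lock is needed around the TinyDB calls.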