I back up files with a backupFile function: each backed-up file is hashed and its hash is added to hashList, and before backing up a file I check hashList to see whether it has already been backed up. I back up multiple files at the same time using threading and queue, but I get a race condition because more than one thread works on the same hashList. I solved this with lock = threading.Lock(), but the lock kills the parallelism: while one thread runs, the others wait, which defeats my purpose of using threads in the first place, since I used them to save time.

How can I keep using threads and still check whether a file has already been backed up? Thanks for any ideas.
My code:
```python
import threading, hashlib, queue, os

def hashFile(fileName):
    with open(fileName, "rb") as f:
        sha256 = hashlib.sha256()
        while chunk := f.read(4096):
            sha256.update(chunk)
        return sha256.hexdigest()

def backupFile(q):
    while not q.empty():
        fileName = q.get()
        with lock:
            if hashFile(filesToBackupPath + fileName) in hashList:
                print(f"\033[33m{fileName} already backed up\033[0m")
            else:
                print(f"\033[32m{fileName} backed up\033[0m")
                hashList.append(hashFile(filesToBackupPath + fileName))
        q.task_done()

filesToBackupPath = "yedeklenecekDosyalar/"
fileList = os.listdir(filesToBackupPath)
hashList = []
q = queue.Queue()
for file in fileList:
    q.put(file)
lock = threading.Lock()
for i in range(20):
    t = threading.Thread(target=backupFile, args=(q,))
    t.start()
q.join()
print('\n', len(hashList))
```
There is no reason for you to be locking the call to hashFile.
```python
hash = hashFile(filesToBackupPath + fileName)
with lock:
    if hash in hashList:
        alreadyBackedUp = True
    else:
        alreadyBackedUp = False
        hashList.append(hash)
```
Everything else stays outside the lock.
The only place you need the lock is when accessing hashList.
Why are you using a list rather than a set?
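A set makes the membership test O(1) instead of O(n), which matters as the backup grows, and it keeps the critical section tiny. One way to package that (mark_seen and seen_hashes are hypothetical names, not from the question):

```python
import threading

seen_hashes = set()
seen_lock = threading.Lock()

def mark_seen(digest):
    """Record digest; return True if it was already seen, else False.

    The membership test and the insert happen under a single lock
    acquisition, so two threads can't both claim the same digest is new.
    """
    with seen_lock:
        if digest in seen_hashes:
            return True
        seen_hashes.add(digest)
        return False
```

A worker then just calls `mark_seen(digest)` after hashing and branches on the result; everything else stays lock-free.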