I am trying to use multiprocessing to iterate concurrently through files in separate folders. I have a function that starts the parallel processing:
from multiprocessing.dummy import Pool
lsFolders = ['Folder1', 'Folder2']
pool = Pool( processes = 6 )
iterateThroughFiles = IterateThroughFiles() # instantiated by call to pool.map()
pool.map( iterateThroughFiles.runProcess, lsFolders )
Then I have the implementation of the IterateThroughFiles class:
class IterateThroughFiles( object ):
    def runProcess( self, folder ):
        self.sessionId = uuid.uuid4()
        print( self.sessionId ) # Prints a correct sessionId
        logAtLevel( "INFO", "Session ID of: "
                            + str( self.sessionId )
                            + " has been generated for folder: "
                            + folder
                  )
        print( self.sessionId ) # Prints only the second generated
                                # session id for both threads
        print( folder ) # Prints the correct folder
When I generate the sessionId and print it directly after, the sessionId is correct. The logAtLevel() wrapper function also logs the correct value of the sessionId.
The next print statement, though, prints only the second session id in both threads, and the first sessionId is apparently forgotten.
Does anyone know why this is happening? I thought that when running in parallel, each thread was distinct in terms of the objects it created and its memory. Is this incorrect? Does this have something to do with the uuid generator?
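The issue is that you are creating only one instance of IterateThroughFiles, which is shared by both threads. Because the id is stored on that instance as self.sessionId, the second thread's assignment overwrites the first thread's value, and from then on both threads read the same id.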
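Instead, you want something like the following:
def factory(folder):
    return IterateThroughFiles().runProcess(folder)
and pass that factory function to pool.map() in place of iterateThroughFiles.runProcess. That way each call creates its own instance, so you will get two instances for the two folders.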
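For reference, here is a minimal runnable sketch of that approach. It assumes a stripped-down IterateThroughFiles without the logAtLevel() call, and it has runProcess return the folder and id so the result can be inspected; the run() helper and its workers parameter are hypothetical names introduced only for this example.

```python
import uuid
from multiprocessing.dummy import Pool  # thread-backed Pool with the multiprocessing.Pool API


class IterateThroughFiles(object):
    def runProcess(self, folder):
        # Stored on the instance: safe only while no other thread
        # uses this same instance at the same time.
        self.sessionId = uuid.uuid4()
        # ... per-file work would go here ...
        # Returning the values is an assumption made for this sketch.
        return folder, self.sessionId


def factory(folder):
    # A fresh instance per call, so no state is shared between threads.
    return IterateThroughFiles().runProcess(folder)


def run(folders, workers=2):
    # Hypothetical helper: maps factory over the folders and cleans up the pool.
    pool = Pool(processes=workers)
    try:
        return pool.map(factory, folders)
    finally:
        pool.close()
        pool.join()


if __name__ == "__main__":
    for folder, sessionId in run(["Folder1", "Folder2"]):
        print(folder, sessionId)
```

If the id is only needed inside runProcess, keeping it in a local variable rather than on self would also avoid the sharing, since local variables belong to the individual call.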