So I have a program in which I get a list of file paths from a database, delete those files on the filesystem and finally delete the file paths from the database. I put all operations inside a transaction to ensure that the paths would be deleted from the database iff all of the files are deleted in the filesystem.
Something like this
val result = for {
deletePath <- (fr""" select path from files""").query[String].stream //Stream[doobie.ConnectionIO,String]
_ <- Stream.eval(AsyncConnectionIO.liftIO(File(deletePath).delete()) //Stream[doobie.ConnectionIO,Unit]
_ <- Stream.eval(sql"delete from files where path = ${deletePath}".withUniqueGeneratedKeys)
}
result.compile.drain.transact(transactor)
Unfortunately, the file system is distributed which means individual operation is slow but it allows multiple operations at once.
So my question is, how do I parallelize the filesystem deletion operation here?
Yeah, you can. Just use appropriate combinators instead of the for
syntax.
val result =
(fr""" select path from files""")
.query[String]
.stream
.parEvalMapUnordered(maxConcurrent = 64) { deletePath =>
AsyncConnectionIO.liftIO(File(deletePath).delete()) >>
sql"delete from files where path = ${deletePath}".withUniqueGeneratedKeys
}
result.compile.drain.transact(transactor)
Remember to change the maxConcurrent
parameter to something that makes sense for your use case.
(I couldn't test the code so it may have some typos)