Search code examples
pythonneo4jconcurrencyparallel-processingpython-asyncio

Loading data to Neo4j with Python concurrently - asyncio function is blocked


I have code that reads from a json file and loads it into the neo4j database. Because the data is big, I want to do this with concurrent processing. I decided to use asyncio library but I'm having a problem. In the code below, session.run function blocks the running and I don't have parallelism. Is there a way to have parallelism with neo4j sessions?

def json_loading_function():
    ...
    asyncio.run(self.run_all_cqls(process_params), debug=True)

async def run_all_cqls(self, params):
    results = await asyncio.gather(*(self.run_cql(p['driver'], p['session_index'], p['cql'], p['rows_dict']) for p in params))
    return results

async def run_cql(self, session, sessionIndex,cql,dict):
    with self._driver.session(**self.db_config) as session:
        print('Running session %d' % sessionIndex)
        session.run(cql, dict=dict).consume()

Solution

  • I ended up using neo4j python driver version 5 alpha with async support. Here is my fork to pyingest repo : https://github.com/cuneyttyler/pyingest