apache-spark, pyspark, neo4j

Connect to an API, parse the result using PySpark, and store it in Neo4j


My requirement is straightforward. I have an API call which retrieves a huge amount of data. I want to use PySpark to convert the results into DataFrames and write them to Neo4j.

Convert the API result into DataFrames and store them in Neo4j. Could you please let me know if that's possible?


Solution

  • It is certainly possible. Are you aware of the Neo4j Spark connector?

    # Write each row of the DataFrame to Neo4j as a :Person node.
    # If the server requires auth, also set the connector's
    # authentication.basic.username / authentication.basic.password options.
    df.write \
      .format("org.neo4j.spark.DataSource") \
      .mode("ErrorIfExists") \
      .option("url", "bolt://localhost:7687") \
      .option("labels", ":Person") \
      .save()
    

    The above is an example of how you can save from Spark to Neo4j; in this case it saves :Person nodes.
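
    To get the API result into a DataFrame in the first place, a minimal sketch could look like the one below. The endpoint URL and the shape of the JSON response are assumptions for illustration; plug in your own API and credentials.

    import requests
    from pyspark.sql import Row, SparkSession

    spark = SparkSession.builder.appName("api-to-neo4j").getOrCreate()

    # Hypothetical endpoint returning a JSON array of objects.
    response = requests.get("https://example.com/api/people")
    response.raise_for_status()
    records = response.json()  # e.g. [{"name": "Alice", "age": 30}, ...]

    # Build a DataFrame, letting Spark infer the schema from the rows.
    df = spark.createDataFrame([Row(**r) for r in records])
    df.printSchema()

    Note that this pulls the whole response through the driver, so for a truly huge result you would paginate the API and parallelize the fetch rather than download everything in one call.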

    Of course, you will need to shape your DataFrame so that it makes sense as graph data.

    That could be relationship-style rows, something like source, destination, weight, or plain nodes as in the code above; see the sketch below for the relationship case.
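
    For the relationship-style layout, here is a hedged sketch using the connector's relationship write options. The column names source, destination and weight, the :Node label, and the CONNECTED_TO type are all made up for illustration:

    # df is assumed to have columns: source, destination, weight
    df.write \
      .format("org.neo4j.spark.DataSource") \
      .mode("Overwrite") \
      .option("url", "bolt://localhost:7687") \
      .option("relationship", "CONNECTED_TO") \
      .option("relationship.save.strategy", "keys") \
      .option("relationship.source.labels", ":Node") \
      .option("relationship.source.save.mode", "Overwrite") \
      .option("relationship.source.node.keys", "source:name") \
      .option("relationship.target.labels", ":Node") \
      .option("relationship.target.save.mode", "Overwrite") \
      .option("relationship.target.node.keys", "destination:name") \
      .option("relationship.properties", "weight") \
      .save()

    With the keys save strategy the connector merges the source and target nodes on the given keys and creates the relationship between them, storing weight as a relationship property.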

    Without knowing the schema of your data, I can't help any more than that, I'm afraid.