NebulaGraph Database: How to write data with spark-connector in pyspark?


The example I saw shows how to write data in Scala. Is there a way to write data to NebulaGraph from Python (PySpark)?

/spark/bin/pyspark --driver-class-path nebula-spark-connector-3.0.0.jar --jars nebula-spark-connector-3.0.0.jar

df = (
    spark.read.format("com.vesoft.nebula.connector.NebulaDataSource")
    .option("type", "vertex")
    .option("spaceName", "basketballplayer")
    .option("label", "player")
    .option("returnCols", "name,age")
    .option("metaAddress", "metad0:9559")
    .option("partitionNumber", 1)
    .load()
)

Solution

  • It seems that PySpark is already supported by nebula-spark-connector. The related request, GitHub issue #19, has been addressed and closed.

    If you search for "pyspark" in the project's README on GitHub, you'll find some examples.
    Just make sure that the path to the spark-connector jar file is set in SparkConf before you start your Spark application.

    An example taken from the README:

    df.write.format("com.vesoft.nebula.connector.NebulaDataSource").option(
        "type", "vertex").option(
        "spaceName", "basketballplayer").option(
        "label", "player").option(
        "vidPolicy", "").option(
        "vertexField", "_vertexId").option(
        "batch", 1).option(
        "metaAddress", "metad0:9559").option(
        "graphAddress", "graphd1:9669").option(
        "passwd", "nebula").option(
        "user", "root").save()
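
If you would rather not repeat the option chain, the same settings can be kept in plain dictionaries and applied in a loop. The following is only a minimal sketch, not part of the connector: the helper names (connector_jar_conf, vertex_write_options, apply_options) are made up for illustration, the option keys are copied from the README example above, and the two Spark settings are assumed to play the role of the --jars and --driver-class-path flags used in the question.

```python
from typing import Any, Dict

# Data source class name, as used in the examples above.
NEBULA_SOURCE = "com.vesoft.nebula.connector.NebulaDataSource"


def connector_jar_conf(jar_path: str) -> Dict[str, str]:
    """Hypothetical helper: Spark settings that point at the connector jar.

    Assumption: "spark.jars" and "spark.driver.extraClassPath" stand in for
    the --jars and --driver-class-path flags from the question.
    """
    return {
        "spark.jars": jar_path,
        "spark.driver.extraClassPath": jar_path,
    }


def vertex_write_options(
    space: str,
    tag: str,
    vertex_field: str,
    meta_address: str,
    graph_address: str,
    user: str,
    passwd: str,
    batch: int = 1,
    vid_policy: str = "",
) -> Dict[str, Any]:
    """Hypothetical helper: collect the vertex write options in one dict.

    The keys are exactly the ones shown in the README example above.
    """
    return {
        "type": "vertex",
        "spaceName": space,
        "label": tag,
        "vidPolicy": vid_policy,
        "vertexField": vertex_field,
        "batch": batch,
        "metaAddress": meta_address,
        "graphAddress": graph_address,
        "passwd": passwd,
        "user": user,
    }


def apply_options(target: Any, options: Dict[str, Any]) -> Any:
    """Call target.option(key, value) for every entry and return the result.

    Works with any object whose option() returns the next object in the
    chain, such as a PySpark DataFrameReader or DataFrameWriter.
    """
    for key, value in options.items():
        target = target.option(key, value)
    return target


# Sketch of how this might be used inside a PySpark application
# (left as comments because it needs a running NebulaGraph cluster):
#
#   from pyspark.sql import SparkSession
#   builder = SparkSession.builder
#   for key, value in connector_jar_conf("nebula-spark-connector-3.0.0.jar").items():
#       builder = builder.config(key, value)
#   spark = builder.getOrCreate()
#
#   opts = vertex_write_options("basketballplayer", "player", "_vertexId",
#                               "metad0:9559", "graphd1:9669", "root", "nebula")
#   apply_options(df.write.format(NEBULA_SOURCE), opts).save()
```

Keeping the options in a dictionary also makes it easy to print or log exactly what is sent to the connector when a write fails.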