Edge attribute filter on GraphFrames motif search not working


I have some sample data for a family graph that I want to query.

I'd like to use the find method on the GraphFrame object to query the motif A->B where the edge is of type "Mother".

Since GraphFrames uses a subset of Neo4j's Cypher language, I was wondering whether the following would be the correct query:

graph.find("(A)-[edge:Mother]->(B)").show

Or what would be the best way to implement this in GraphFrames?

GraphFrame(vertex, graph.edges.filter("attr=='Mother'")).vertices.show

This doesn't work because I cannot filter on the direction, and I only want to get the mothers :)

Any idea?


Solution

  • Suppose this is your test data:

    import org.graphframes.GraphFrame
    
    // Each row is one directed edge: (src, dst, relationship)
    val edgesDf = spark.createDataFrame(Seq(
      ("a", "b", "Mother"),
      ("b", "c", "Father"),
      ("d", "c", "Father"),
      ("e", "b", "Mother")
    )).toDF("src", "dst", "relationship")
    
    val graph = GraphFrame.fromEdges(edgesDf)
    graph.edges.show()
    
    +---+---+------------+
    |src|dst|relationship|
    +---+---+------------+
    |  a|  b|      Mother|
    |  b|  c|      Father|
    |  d|  c|      Father|
    |  e|  b|      Mother|
    +---+---+------------+
    

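    GraphFrames motifs do not support Cypher-style edge labels such as [edge:Mother], so name the edge in the motif and filter on its attribute afterwards: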

    graph.find("()-[e]->()").filter("e.relationship = 'Mother'").show()
    
    +------------+
    |           e|
    +------------+
    |[a,b,Mother]|
    |[e,b,Mother]|
    +------------+
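
    If the vertices on each side of the edge are needed as well, they can be named in the motif too. The following is a minimal sketch rather than a definitive solution. It assumes a spark-shell session started with the GraphFrames package, and the names a, b, motherPairs and the column aliases mother and child are illustrative only. It also assumes that src is the mother and dst the child, which the sample data does not state explicitly.

```scala
// Assumes spark-shell (so `spark` already exists) with GraphFrames on the classpath.
import org.graphframes.GraphFrame

val edgesDf = spark.createDataFrame(Seq(
  ("a", "b", "Mother"),
  ("b", "c", "Father"),
  ("d", "c", "Father"),
  ("e", "b", "Mother")
)).toDF("src", "dst", "relationship")

val graph = GraphFrame.fromEdges(edgesDf)

// Naming both endpoints makes them available as struct columns `a` and `b`
// next to the edge struct `e`. GraphFrame.fromEdges gives each vertex an `id` field.
val motherPairs = graph
  .find("(a)-[e]->(b)")
  .filter("e.relationship = 'Mother'")
  .selectExpr("a.id AS mother", "b.id AS child") // assumption: src = mother, dst = child

motherPairs.show()
```

    For the sample data this should yield the pairs (a, b) and (e, b), in no guaranteed order.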
    

    Or, since your case is relatively simple, you can apply a filter to the edges of the graph:

    graph.edges.filter("relationship = 'Mother'").show()
    
    +---+---+------------+
    |src|dst|relationship|
    +---+---+------------+
    |  a|  b|      Mother|
    |  e|  b|      Mother|
    +---+---+------------+
    

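    Here is some alternative syntax (each returns the same result as the edge filter above):

    import spark.implicits._  // already in scope in spark-shell; required in a compiled application
    graph.edges.filter($"relationship" === "Mother").show()
    graph.edges.filter('relationship === "Mother").show()  // symbol literals are deprecated since Scala 2.13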
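
    The attempt in the question, GraphFrame(vertex, filteredEdges).vertices, returns every vertex because the vertex DataFrame is passed through unchanged; filtering the edges does not prune it. One possible workaround, sketched here under the assumption that no vertex attributes need to be kept (GraphFrame.fromEdges only produces an id column), is to rebuild the graph from the filtered edges. The names motherEdges and motherGraph are illustrative.

```scala
// Assumes spark-shell (so `spark` already exists) with GraphFrames on the classpath.
import org.graphframes.GraphFrame

val edgesDf = spark.createDataFrame(Seq(
  ("a", "b", "Mother"),
  ("b", "c", "Father"),
  ("d", "c", "Father"),
  ("e", "b", "Mother")
)).toDF("src", "dst", "relationship")

val graph = GraphFrame.fromEdges(edgesDf)

// Keep only the "Mother" edges.
val motherEdges = graph.edges.filter("relationship = 'Mother'")

// fromEdges derives the vertex set from the src and dst columns of the
// edges it is given, so vertices that touch no "Mother" edge are left out.
val motherGraph = GraphFrame.fromEdges(motherEdges)

motherGraph.vertices.show()
```

    With the sample data, the remaining vertices should be a, b and e; c and d only take part in "Father" edges.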
    

    You mention filtering on direction, but the direction of each relationship is already encoded in the graph itself: every edge points from its source (src) to its destination (dst).
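
    Since direction is just the src and dst columns, picking one side of the relationship is a column selection. A minimal sketch, again assuming a spark-shell session with GraphFrames available and assuming that src holds the mother and dst the child (the sample data does not say which way round it is); mothers and children are illustrative names:

```scala
// Assumes spark-shell (so `spark` already exists) with GraphFrames on the classpath.
import org.graphframes.GraphFrame

val edgesDf = spark.createDataFrame(Seq(
  ("a", "b", "Mother"),
  ("b", "c", "Father"),
  ("d", "c", "Father"),
  ("e", "b", "Mother")
)).toDF("src", "dst", "relationship")

val graph = GraphFrame.fromEdges(edgesDf)

val motherEdges = graph.edges.filter("relationship = 'Mother'")

// Assumption: the edge points from mother (src) to child (dst).
val mothers  = motherEdges.select("src").distinct()
val children = motherEdges.select("dst").distinct()

mothers.show()
children.show()
```

    If the edges in your data point the other way, swap src and dst in the two select calls.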