python, pandas, neo4j, py2neo

Parsing py2neo paths into Pandas


We are returning paths from a Cypher query using py2neo and would like to parse the result into a pandas DataFrame. The Cypher query is similar to the following:

query = '''MATCH p = allShortestPaths((p1:Type1)-[r*..3]-(p2:Type1))
WHERE p1.ID = 123456
RETURN DISTINCT p'''
result = graph.run(query)

The result is a walkable object, i.e. it can be traversed element by element. Note that the nodes and the relationships do not have the same properties.
What would be the most pythonic way to iterate over the object? Is it necessary to process each path element by element, or, since the object is dictionary-like, is it possible to use pandas' `DataFrame.from_dict` method? A further complication is that the paths are not always the same length.
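On the `from_dict` question: pandas does not require every row to have the same length. A minimal sketch, assuming each path has already been flattened into a plain dict keyed by its position in the path (the keys and values below are hypothetical placeholders, not real query output). With `orient="index"`, positions that a shorter path lacks are filled with `NaN`:

```python
import pandas as pd

# Hypothetical, already-flattened paths of different lengths:
# one dict per path, keyed by position in the path.
paths = {
    0: {"Node0": "Type1: a", "Rel0": "KNOWS", "Node2": "Type1: b"},
    1: {"Node0": "Type1: a", "Rel0": "KNOWS", "Node2": "Type1: c",
        "Rel2": "KNOWS", "Node4": "Type1: d"},
}

# orient="index" turns each outer key into a row; keys missing from
# the shorter path become NaN instead of raising an error.
df = pd.DataFrame.from_dict(paths, orient="index")
print(df)
```

The flattening step itself still has to touch every element, but the unequal lengths need no special handling afterwards.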
Currently we enumerate the walk: if an element's index is even it is a Node, otherwise we process it as a Relationship.

for index, item in enumerate(paths):
    if index % 2 == 0:
        pass  # even positions in a walk are Nodes: process as Node
    else:
        pass  # odd positions are Relationships: process as Relationship

We could use the built-in `isinstance` function instead, e.g.

import py2neo.types  # py2neo v3 module layout

if isinstance(item, py2neo.types.Node):
    pass  # process as Node

But that still requires processing every element separately.
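Because a walk always alternates node, relationship, node, ..., an element's type follows from its position, so slicing can separate nodes from relationships without checking each element. A minimal sketch, using plain tuples as hypothetical stand-ins for py2neo `Node` and `Relationship` objects:

```python
# Stand-in walk: node, relationship, node, relationship, node.
walk_items = [
    ("Type1", {"ID": 1}),
    ("KNOWS", {"since": 2010}),
    ("Type1", {"ID": 2}),
    ("LIKES", {}),
    ("Type2", {"ID": 3}),
]

nodes = walk_items[0::2]  # even positions: nodes
rels = walk_items[1::2]   # odd positions: relationships

# Each hop as (start node, relationship, end node).
hops = list(zip(nodes, rels, nodes[1:]))

for start, rel, end in hops:
    print(f"{start[0]}-[{rel[0]}]-{end[0]}")
```

This only relies on the alternation of the walk, so it works for paths of any length.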


Solution

  • I solved the problem as follows.
    I wrote a function that receives a list of walked paths together with the property names of the nodes and relationships:

    from collections import OrderedDict

    from py2neo import Node, Relationship  # py2neo v3 API: labels() and type() are methods


    def neo4j_graph_to_dict(paths, node_properties, rels_properties):
        """Flatten walked paths into {path_id: {'Node0': ..., 'Rel0': ..., 'Node2': ...}}."""
        paths_dict = OrderedDict()
        for path_id, path in enumerate(paths):
            paths_dict[path_id] = {}
            for i, node_rel in enumerate(path):
                if isinstance(node_rel, Node):
                    # cell text: "<label>: <prop>: <value>| <prop>: <value>| ..."
                    values = [node_rel[p] for p in node_properties]
                    node_format = [p + ': {}|' for p in node_properties]
                    label = list(node_rel.labels())[0]
                    paths_dict[path_id]['Node' + str(i)] = ('{}: ' + ' '.join(node_format)).format(label, *values)
                elif isinstance(node_rel, Relationship):
                    # relationships sit at odd positions; key them by the preceding node's index
                    values = [node_rel[p] for p in rels_properties]
                    rel_format = [p + ': {}|' for p in rels_properties]
                    paths_dict[path_id]['Rel' + str(i - 1)] = ('{}: ' + ' '.join(rel_format)).format(node_rel.type(), *values)
        return paths_dict
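To see what a single cell produced by `neo4j_graph_to_dict` looks like, here is the same string-building step in isolation, with a hypothetical label, property names and values in a fixed order:

```python
# Hypothetical inputs: a node label and its property values, in a fixed order.
node_properties = ["ID", "name"]
label = "Type1"
values = [123456, "Alice"]

# Same template construction as in neo4j_graph_to_dict.
node_format = [p + ": {}|" for p in node_properties]
cell = ("{}: " + " ".join(node_format)).format(label, *values)
print(cell)  # Type1: ID: 123456| name: Alice|
```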
    

    Assuming the query returns the paths together with their nodes and relationships, we can run the following code:

    query = '''MATCH paths = allShortestPaths(
        (pr1:Type1 {ID: '123456'})-[r*1..9]-(pr2:Type2 {ID: '654321'}))
        RETURN paths, nodes(paths) AS nodes, relationships(paths) AS rels'''
    
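    import pandas as pd
    from py2neo.types import walk  # py2neo v3; `graph` is an existing py2neo Graph

    df_qf = pd.DataFrame(graph.data(query))
    # unique property names across all nodes / relationships, sorted for a stable order
    node_properties = sorted({k for node_list in df_qf.nodes for node in node_list for k in node.keys()})
    rels_properties = sorted({k for rel_list in df_qf.rels for rel in rel_list for k in rel.keys()})
    wg = [walk(path) for path in df_qf.paths]
    paths_dict = neo4j_graph_to_dict(wg, node_properties, rels_properties)
    df = pd.DataFrame.from_dict(paths_dict, orient='index')
    # keep positional column order (Node0, Rel0, Node2, ...) even when paths differ in length
    columns = list(dict.fromkeys(k for p in paths_dict.values() for k in p))
    df = df.reindex(columns=columns).drop_duplicates()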
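Since the paths can differ in length, an alternative to the wide one-row-per-path table (a different layout from the one above, not a replacement) is a long table with one row per path element, which needs no padding and keeps each property in its own column. A minimal sketch, assuming each walk has already been turned into a list of `(kind, label_or_type, properties)` tuples; the data below is hypothetical:

```python
import pandas as pd

# Hypothetical walks of different lengths, as (kind, label/type, properties).
walks = [
    [("node", "Type1", {"ID": 1}), ("rel", "KNOWS", {"since": 2010}),
     ("node", "Type1", {"ID": 2})],
    [("node", "Type1", {"ID": 1}), ("rel", "KNOWS", {"since": 2010}),
     ("node", "Type1", {"ID": 3}), ("rel", "LIKES", {}),
     ("node", "Type2", {"ID": 4})],
]

rows = []
for path_id, items in enumerate(walks):
    for position, (kind, name, props) in enumerate(items):
        # Properties become their own columns; absent ones end up as NaN.
        rows.append({"path_id": path_id, "position": position,
                     "kind": kind, "name": name, **props})

long_df = pd.DataFrame(rows)
print(long_df)
```

If a one-row-per-path view is needed afterwards, `long_df.pivot(index="path_id", columns="position", values="name")` reshapes it back into a wide table.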