pandas matplotlib neo4j cypher neo4j-python-driver

Show the results of Cypher query with Pandas Dataframe

I'm working with Neo4j from Python and I compare two graphs at the node level in this way:

query = """
  MATCH (n)-[r1]-(), (n)-[r2]-()
  WHERE r1.disease = 'MyLabel1'
  AND r2.disease = 'MyLabel2'
  RETURN DISTINCT n
"""

results = driver.execute_query(query)
results[0]

The results of the query are stored in one at the first index of a list, I would like to show the results using a Pandas Dataframe;
I would like to plot the count of nodes in each graph and the nodes in common using a Venn diagramm in matplotlib

Solution

Example 1: Counting using Cypher, creating 1-row DataFrame

import pandas as pd
from matplotlib_venn import venn2
import matplotlib.pyplot as plt
from neo4j import GraphDatabase

uri = "bolt://localhost:7687"
user = "neo4j"
password = "xxxxxx"

driver = GraphDatabase.driver(uri, auth=(user, password))

query = """
  MATCH (n1)-[r1]-()
  WHERE r1.disease = 'MyLabel1'
  WITH COLLECT(DISTINCT n1) AS n1s
  MATCH (n2)-[r2]-()
  WHERE r2.disease = 'MyLabel2'
  WITH n1s, COLLECT(DISTINCT n2) AS n2s
  RETURN SIZE(n1s) AS count1, SIZE(n2s) AS count2, SIZE([n IN n1s WHERE n IN n2s]) AS countIntersection
"""

with driver.session() as session:
    results = session.run(query)
    data = tuple(record.values() for record in results)

df = pd.DataFrame(data, columns=('count1', 'count2', 'countIntersection'))

if not df.empty:
    venn2(subsets=df.iloc[0], set_labels=('MyLabel1 nodes', 'MyLabel2 nodes'))
    plt.show()
else:
    print("no results")

And here is a sample resulting Venn diagram:

enter image description here ]

Notes:

This Cypher query just returns a single row of counts, so it is probably not actually worth putting in a DataFrame. Nevertheless, the example shows how to do create a DataFrame from the results and use it to generate the plot.
The Cypher query uses undirected relationship patterns (like your original query), so the nodes at each end of the same relationship are included in count1 and count2.
Breaking up your MATCH into 2 MATCHes in the way shown above (where the results from the first MATCH are aggregated before doing the second MATCH) avoids producing a cartesian product.

Example 2: Using DataFrames with all data

This query assumes that every node has a unique id property.

import pandas as pd
from matplotlib_venn import venn2
import matplotlib.pyplot as plt
from neo4j import GraphDatabase

uri = "bolt://localhost:7687"
user = "neo4j"
password = "xxxxxx"

driver = GraphDatabase.driver(uri, auth=(user, password))

query = """
  MATCH (n1)-[r1]-()
  WHERE r1.disease = 'MyLabel1'
  WITH COLLECT(DISTINCT {labels: labels(n1), properties: properties(n1)}) AS n1s
  MATCH (n2)-[r2]-()
  WHERE r2.disease = 'MyLabel2'
  RETURN n1s, COLLECT(DISTINCT {labels: labels(n2), properties: properties(n2)}) AS n2s
"""

with driver.session() as session:
    results = session.run(query)
    data = tuple(record.values() for record in results)

# Create list of dictionaries for every node.
n1s_data = []
for node in data[0][0]:
    node_dict = node['properties']
    node_dict['labels'] = node['labels']
    n1s_data.append(node_dict)

n2s_data = []
for node in data[0][1]:
    node_dict = node['properties']
    node_dict['labels'] = node['labels']
    n2s_data.append(node_dict)

df_n1s = pd.DataFrame(n1s_data)
df_n2s = pd.DataFrame(n2s_data)

n1s_ids = set(node['id'] for node in df_n1s.to_dict('records'))
n2s_ids = set(node['id'] for node in df_n2s.to_dict('records'))

venn2(subsets=(len(n1s_ids), len(n2s_ids), len(n1s_ids & n2s_ids)), set_labels=('MyLabel1 nodes', 'MyLabel2 nodes'))
plt.show()