
How to graph relationships between points with Geopandas and Neo4j data?


I have a simple Neo4j database of roads: cities (nodes) with latitude, longitude, and id properties, connected by roads modeled as GO relationships.

For example, MATCH (city) RETURN (city) LIMIT 5 returns:

    {"latitude":"41.974556","id":"0","longitude":"-121.904167"} 
    {"latitude":"41.974766","id":"1","longitude":"-121.902153"} 
    {"latitude":"41.988075","id":"2","longitude":"-121.896790"} 
    {"latitude":"41.998032","id":"3","longitude":"-121.889603"} 
    {"latitude":"42.008739","id":"4","longitude":"-121.886681"} 

and MATCH (n1)-[r]-(n2) RETURN "GO", n1.id, n2.id LIMIT 4 returns:

    "GO"  "n1.id"  "n2.id"
    "GO"  "0"      "1"
    "GO"  "0"      "3"
    "GO"  "1"      "0"
    "GO"  "1"      "2"

With the following code I can plot the nodes on a map:

    from py2neo import Graph
    import pandas as pd
    import geopandas
    import matplotlib.pyplot as plt    

    port = "7687"
    user = "****"
    pswd = "*****"    

    try:
        graph = Graph('bolt://localhost:'+port, auth=(user, pswd))
        print('SUCCESS: Connected to the Neo4j Database.')
    except Exception as e:
        print('ERROR: Could not connect to the Neo4j Database. See console for details.')
        raise SystemExit(e)    

    df = pd.DataFrame(graph.run("MATCH (n:Road) RETURN n.id, n.latitude, n.longitude").to_table(),columns=['ID','Latitude','Longitude']) 
    df.head()
    gdf = geopandas.GeoDataFrame(df, geometry=geopandas.points_from_xy(df.Longitude, df.Latitude))    

    world = geopandas.read_file(geopandas.datasets.get_path('naturalearth_lowres'))
    ax = world[world.continent == 'North America'].plot(color='white', edgecolor='black')    

    gdf.plot(ax=ax, color='red')
    plt.show()

I don't know how to plot the relationships between the nodes. Any suggestions? Thanks.


Solution

  • In your example of plotting using GeoPandas.GeoDataFrame.plot() the geometry of the GeoDataFrame is point data (lat/lon coordinates from the Road nodes stored in Neo4j). To plot the relationships using this method you'd need the geometry for lines connecting the road nodes (intersections) as well.
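
    For the road data in the question, building those line geometries might look like the following sketch. It assumes the node rows and the (n1.id, n2.id) pairs have already been fetched into plain Python structures; build_line_wkt is a hypothetical helper, and the sample values are copied from the question.

```python
# Sample rows copied from the question: Road nodes and GO relationships.
nodes = [
    {"id": "0", "latitude": "41.974556", "longitude": "-121.904167"},
    {"id": "1", "latitude": "41.974766", "longitude": "-121.902153"},
    {"id": "2", "latitude": "41.988075", "longitude": "-121.896790"},
    {"id": "3", "latitude": "41.998032", "longitude": "-121.889603"},
]
edges = [("0", "1"), ("0", "3"), ("1", "0"), ("1", "2")]


def build_line_wkt(nodes, edges):
    """Return one WKT LINESTRING per undirected edge (hypothetical helper)."""
    # WKT uses x y order, i.e. longitude first, then latitude.
    coords = {n["id"]: (n["longitude"], n["latitude"]) for n in nodes}
    seen = set()
    lines = []
    for a, b in edges:
        key = frozenset((a, b))
        if a == b or key in seen or a not in coords or b not in coords:
            continue  # skip self-loops, reverse duplicates, and unknown ids
        seen.add(key)
        (x1, y1), (x2, y2) = coords[a], coords[b]
        lines.append(f"LINESTRING ({x1} {y1}, {x2} {y2})")
    return lines


road_lines = build_line_wkt(nodes, edges)
```

    The resulting strings could then be parsed with geopandas.GeoSeries.from_wkt and plotted on the same axes as the points.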

    I just went through a similar exercise with airport and flight data; perhaps you can adapt it to your data.

    First, the Cypher query that fetches the data for the GeoDataFrame matches every flight with the pattern (:Airport)-[:FLIGHT_TO]->(:Airport), so that each row is a flight route between two airports. It also calculates the weighted degree centrality of each airport, so that node size can be styled relative to centrality in the plot, and generates WKT for each airport location (POINT) and each flight (LINESTRING).

    AIRPORT_QUERY = """
      MATCH (origin:Airport)-[f:FLIGHT_TO]->(dest:Airport)
      CALL {
       WITH origin
       MATCH (origin)-[f:FLIGHT_TO]-()
       RETURN sum(f.num_flights) AS origin_centrality
      }
      CALL {
       WITH dest
       MATCH (dest)-[f:FLIGHT_TO]-()
       RETURN sum(f.num_flights) AS dest_centrality
      }
      RETURN {
        origin_wkt: "POINT (" + origin.location.longitude + " " + origin.location.latitude + ")",
        origin_iata: origin.iata, 
        origin_city: origin.city, 
        origin_centrality: origin_centrality, 
        dest_centrality: dest_centrality,
        dest_wkt: "POINT (" + dest.location.longitude + " " + dest.location.latitude + ")",
        dest_iata: dest.iata, 
        dest_city: dest.city, 
        length: f.length,
        num_flights: f.num_flights,
        geometry: "LINESTRING (" + origin.location.longitude + " " + origin.location.latitude + "," + dest.location.longitude + " " + dest.location.latitude + ")"
        
        } 
      AS airport
    """
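
    Adapted to the road data in the question, the same idea might look like the sketch below. It is an assumption-laden adaptation, not a tested query: it assumes the Road label used in the question's code, a relationship type named GO, and string-valued latitude and longitude properties as shown in the sample output. ROAD_QUERY is a hypothetical name.

```python
# Hypothetical adaptation of the flight query to the question's road data.
# Assumes the Road label, the GO relationship type, and string-valued
# latitude/longitude properties, as the question's sample output suggests.
ROAD_QUERY = """
MATCH (a:Road)-[:GO]->(b:Road)
RETURN a.id AS source,
       b.id AS target,
       "LINESTRING (" + a.longitude + " " + a.latitude + ", "
                      + b.longitude + " " + b.latitude + ")" AS geometry
"""
```

    Matching in one direction returns each stored relationship once. If the database stores a road as two relationships, one per direction, each road will still appear twice.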
    

    The Neo4j Python driver's Result object has a to_df() method that converts the result set of the Cypher query into a pandas DataFrame. When creating the GeoPandas GeoDataFrame, the WKT returned by the Cypher statement can then be parsed into Shapely geometries.

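    def get_airport(tx):
        results = tx.run(AIRPORT_QUERY)
        # expand=True flattens the returned map into one column per key
        df = results.to_df(expand=True)
        # Rename the expanded columns; this assumes they arrive in exactly
        # this order, so check df.columns before relying on it
        df.columns=['origin_city','origin_wkt', 'dest_city', 'dest_wkt', 'origin_centrality', 'length', 'origin_iata', 'geometry','num_flights', 'dest_centrality', 'dest_iata']
        # Parse the WKT strings into Shapely geometries
        df['geometry'] = geopandas.GeoSeries.from_wkt(df['geometry'])
        df['origin_wkt'] = geopandas.GeoSeries.from_wkt(df['origin_wkt'])
        df['dest_wkt'] = geopandas.GeoSeries.from_wkt(df['dest_wkt'])
        # The flight LINESTRING column is the active geometry
        gdf = geopandas.GeoDataFrame(df, geometry='geometry')
        return gdf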
    

    A GeoDataFrame of US flights

    Now we're ready to plot the flights, using the GeoDataFrame returned by get_airport (called flights_gdf below). The marker size of each airport can be set dynamically from its weighted degree centrality column.

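    # Base map; note that geopandas.datasets was removed in GeoPandas 1.0,
    # so newer versions need a locally downloaded Natural Earth file instead
    world = geopandas.read_file(geopandas.datasets.get_path('naturalearth_lowres'))

    base = world[world.name == 'United States of America'].plot(color='white', edgecolor='black')

    # Airports: origin points, sized by weighted degree centrality
    flights_gdf = flights_gdf.set_geometry("origin_wkt")
    flights_gdf.plot(ax=base, markersize='origin_centrality')

    # Flights: thin lines between airports
    flights_gdf = flights_gdf.set_geometry("geometry")
    flights_gdf.plot(ax=base, markersize=0.1, linewidth=0.01)

    plt.show()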
    

    A plot of US airports and flights connecting them
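
    The centrality sums can span a wide range, so passing them straight to markersize may produce markers that are far too large or too small. One option, sketched here in plain Python, is to rescale them into a fixed size range first. scale_marker_sizes and its default bounds are hypothetical and not part of GeoPandas.

```python
def scale_marker_sizes(values, min_size=5.0, max_size=200.0):
    """Linearly rescale values into [min_size, max_size] (hypothetical helper)."""
    values = [float(v) for v in values]
    if not values:
        return []
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values equal: fall back to the midpoint of the size range.
        return [(min_size + max_size) / 2.0] * len(values)
    span = max_size - min_size
    return [min_size + (v - lo) / (hi - lo) * span for v in values]


sizes = scale_marker_sizes([10, 55, 100])
```

    The result could be stored in a new column, for example flights_gdf['marker_size'] = scale_marker_sizes(flights_gdf['origin_centrality']), and that column name passed as markersize instead of 'origin_centrality'.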

    Hope that helps.