Search code examples
sqlnetworkxgraph-databases

Create graph from sql query


I have this UML schema:

enter image description here

SQL:

    SELECT DISTINCT
    Name,
    ID,
    File,
    Date
FROM
    Table1
    INNER JOIN Table2 ON Table1.ID = Table2.ID
    INNER JOIN Table3 ON Table2.ID2 = Table3.ID2
    INNER JOIN Table4 ON Table3.ID3 = Table4.ID3

I would like to create a graph with edge and nodes to generate a relationship plot with networkx. How to convert this SQL query into edges and nodes? Can someone give me an example?


Solution

  • The process of modeling graphs is actually making the connection(mostly logically/non-physically connected in non-graph DB).

    Thus my advice on doing such transmission would be:

    • Find connections/edges/relationships that you care

      • Here, I think they are ID-ID2(table2) and ID2-ID3(table3)
    • Make related ends of the connection a node/vertex type

      • Here, ID(table1), ID2(table3), ID3(table4) are vertex types/label/tag depending your graph databases.
    • Put properties into edge and vertex, this becomes a property graph schema already

      • edge types, think of PURCHASED/ORDERED
        • ID-ID3(table2-relation)
          • number, think of ORDER_ID
        • ID2-ID3(table3-relation), think of DELIVERRED_BY
          • other property if needed, think of delivery_id
      • tags(vertex types)
        • ID(table1), think of Account
          • birthday
        • ID2(table3), think of Goods
          • price
          • category
        • ID3(table), think of delivery agency
    • Revisit your design, make adaptions based on how you would query the data/graph, this is about whether to make some of the properties a new vertex type/tag/label

      • i.e. the goods category could be a tag/vertex type if your system will query FIND all category: "foobar" Goods or not, I put some tradeoff examples here, please be noted we should carefully design it to avoid supernode if we could(like one category is connected to 1M goods)

    In Nebula Graph(an Open Source Distributed Graph Database, speaks cypher and excels at huge scales of data for high concurrent write/read, cloud-native).

    In Nebula Graph, for edge/relationship, there will be 4-tuple to identify a relationship/edge instance: (src, dst, edge_type, rank), where rank will enable multiple connections/edges between two vertex, think of multiple times to purchase one goods for one account.

    The schema(in nGQL DDL) would be:

    CREATE SPACE IF NOT EXISTS my_graph_space0(partition_num=100, replica_factor=3, vid_type=FIXED_STRING(32));
    USE my_graph_space0;
    # Create types of VERTEX/NODE
    CREATE TAG account(birthday date); # think of table1,vertexID will be ID
    CREATE TAG goods(price_us_dollar int); # think of table3, vertexID will be ID2
    CREATE TAG delivery_agency(name string); # think of table3, vertexID will be ID3
    
    # Create types of EDGE/RELATIONSHIP
    CREATE EDGE ORDERED(order_id int); # think of ID-ID3(table2-relation), order_id could be both property and a rank
    CREATE EDGE DELIVERRED_BY(delivery_id int); # think of ID2-ID3(table3-relation)
    

    Hopefully, it helps.

    BR, Wey