Search code examples
neo4jcyphergraph-databases

Bipartite to monopartite projection with multiple relationships


I'm new to Neo4j. I'm trying to create a monopartite projection from a bipartite graph. I've only got two types of nodes:

  • Post nodes (green): These are all pieces of content, such as tweet, reddit post, news article, etc.
  • Entity nodes (brown): These are the entities associated with the content

enter image description here

My challenge is that I have a handful of different relationships. Some examples:

  • (e1:Entity)-[r:TWEETED]->(p:Post)->[r:AT_MENTIONED]->(e2:Entity)
  • (e1:Entity)-[r:TWEETED]->(p1:Post)-->[r:QUOTE_TWEETED]->(p2:Post)<-[r:TWEETED]<-(e2:Entity)
  • (e1:Entity) -[r:PUBLISHED]->(p:Post)-[r:MENTIONS]->(e2:entity)

What I'm trying to do is

  1. Change this to a monopartite graph projection that has only the entities but infers a RELATED_TO edge based on all types of relations, not just a single type of relationship and
  2. Assigns an edge weight based on the number of times two entities co-occur.

In other words, using the examples above:

Example 1

  • Before: (e1:Entity)-[r:TWEETED]->(p:Post)->[r:AT_MENTIONED]->(e2:Entity)
  • After: (e1:Entity) -[r:RELATED_TO]-(e2:Entity)

Example 2

  • Before: (e1:Entity)-[r:TWEETED]->(p1:Post)-->[r:QUOTE_TWEETED]->(p2:Post)<-[r:TWEETED]<-(e2:Entity)
  • After: (e1:Entity) -[r:RELATED_TO]-(e2:Entity)

Example 3

  • Before: (e1:Entity)-[r:PUBLISHED]->(p:Post)-[r:MENTIONS]->(e2:entity)
  • After: (e1:Entity) -[r:RELATED_TO]-(e2:entity)

I can find examples online that convert only one type of relationship to a monopartite but can't seem to get anything to work for multiple relationship or relationships that have an intervening node of a different type (i.e. two post nodes between an entity node). I've done the graph data science training and couldn't find exactly what I was looking for there either.

Any advice?


Solution

  • Does this query work for you?

    MATCH (e1:Entity)-[*2..3]-(e2:Entity)
    WHERE id(e1) < id(e2)
    WITH e1, e2, count(*) as strength
    MERGE (e1)-[r:RELATED_TO]->(e2) 
    SET r.strength = strength
    

    Since we don't specify a type for the relationship between e1 and e2, any relationship in the graph will match. There can be between two and three relationships in the pattern, which would translate to one or two Post nodes between the Entity nodes.

    I assume that the direction of the relationships doesn't matter, so I left off the direction on the relationship arrows. I required the node id for e1 to be less than the node id for e2 to avoid creating the RELATED_TO relationship in both directions.

    If you need to look for paths longer than 3 relationships in the schema you described, you could consider using the apoc path expander to search for Entity-to-Entity paths with only Post nodes between.