
Sum paths in a weighted graph


I have a bidirectional graph, similar to this one: https://gremlify.com/6zxjsbstb5f, in which out-edges have a weight property. I want to compute a closeness relationship between articles, defined as the total weight of all paths between two articles.

So far I've been able to get the paths between articles, but the weighting I get is only the weight of each individual path. I would like the aggregate (summed) weight of all paths between the starting article and each article in the set returned by: repeat(outE().inV().simplePath()).until(hasLabel('article'))

g.V('70679').
  repeat(outE().inV().simplePath()).
  until(hasLabel('article')).as('a').
  path().as('p').
  map(unfold().coalesce(values('weight'),constant(0)).sum()).as('weighting').
  select('weighting', 'p')

Steps to create the sample graph (taken from Gremlify)

g.addV('article').as('1').
  addV('brand').as('2').
  addV('article').as('3').
  addV('category').as('4').
  addV('zone').as('5').
  addV('article').as('6').
  addV('article').as('7').
  addE('zone').from('1').to('5').  property('weight', 0.1).
  addE('category').from('1').to('4').property('weight', 0.5).
  addE('brand').from('1').to('2').property('weight', 0.8).
  addE('article').from('2').to('6').
  addE('article').from('2').to('1').
  addE('article').from('2').to('3').  
  addE('zone').from('3').to('5').property('weight', 0.1).  
  addE('category').from('3').to('4').property('weight', 0.3).
  addE('brand').from('3').to('2').property('weight', 0.4).
  addE('article').from('4').to('1').
  addE('article').from('4').to('3').
  addE('article').from('5').to('6').
  addE('article').from('5').to('7').
  addE('article').from('5').to('1').
  addE('article').from('5').to('3').
  addE('zone').from('6').to('5').property('weight', 0.1).
  addE('brand').from('6').to('2').property('weight', 0.6).
  addE('zone').from('7').to('5').property('weight', 0.1)   
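To sanity-check what the traversal should return, here is a minimal plain-Python model of the sample graph above (a sketch, not Gremlin). The vertex keys '1' to '7' are the step labels from the script, so '3', '6' and '7' are the other articles. It walks out-edges under the same rules as repeat(outE().inV().simplePath()).until(hasLabel('article')), counting a missing weight as 0, so it lists the per-path weights that the path()/map(...sum()) steps compute. The output order is this sketch's depth-first order, not necessarily the order Gremlin emits.

```python
from typing import Dict, List, Optional, Tuple

# Vertex labels from the Gremlify script, keyed by its step labels '1'..'7'.
LABELS: Dict[str, str] = {
    "1": "article", "2": "brand", "3": "article", "4": "category",
    "5": "zone", "6": "article", "7": "article",
}

# Out-edges as (from, to, weight); None means the edge has no 'weight' property.
EDGES: List[Tuple[str, str, Optional[float]]] = [
    ("1", "5", 0.1), ("1", "4", 0.5), ("1", "2", 0.8),
    ("2", "6", None), ("2", "1", None), ("2", "3", None),
    ("3", "5", 0.1), ("3", "4", 0.3), ("3", "2", 0.4),
    ("4", "1", None), ("4", "3", None),
    ("5", "6", None), ("5", "7", None), ("5", "1", None), ("5", "3", None),
    ("6", "5", 0.1), ("6", "2", 0.6),
    ("7", "5", 0.1),
]


def article_paths(start: str) -> List[Tuple[List[str], float]]:
    """Mimic repeat(outE().inV().simplePath()).until(hasLabel('article'))
    plus the per-path weight sum, treating a missing weight as 0."""
    results: List[Tuple[List[str], float]] = []

    def walk(vertex: str, path: List[str], weight: float) -> None:
        for src, dst, w in EDGES:
            # Only follow out-edges of the current vertex; simplePath() forbids revisits.
            if src != vertex or dst in path:
                continue
            new_path = path + [dst]
            new_weight = weight + (w if w is not None else 0.0)
            if LABELS[dst] == "article":
                # until(hasLabel('article')): stop here and emit this path.
                results.append((new_path, new_weight))
            else:
                walk(dst, new_path, new_weight)

    walk(start, [start], 0.0)
    return results


for path, weight in article_paths("1"):
    print(" -> ".join(path), round(weight, 6))
```

Starting from article '1', this finds three paths ending at '3', two ending at '6' and one ending at '7', which is the shape of the grouped lists shown further down.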

I've been able to write this query, which is close to what we need (8630 is an article ID in the graph):

g.V('8630')
    .repeat(outE().inV().simplePath())
    .until(hasLabel('article')).as('foundArticle')
    .path()
    .map(unfold().coalesce(values('weight'), constant(0)).sum()).as('pathWeight')
    .group().by(select('foundArticle').id()).as('grouping')

This produces results similar to:

[
  {
    "8634": [0.1, 0.5, 0.8]
  },
  {
    "8640": [0.1, 0.8]
  },
  {
    "8642": [0.1]
  }
]

More desirable would be a result set similar to:

[
  {
    "8634": 1.4
  },
  {
    "8640": 0.9
  },
  {
    "8642": 0.1
  }
]
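The desired totals are each list above reduced to its sum, for example 0.1 + 0.5 + 0.8 = 1.4 for 8634. A small Python sketch of that arithmetic; the rounding is added here only to hide binary floating-point noise in the sums.

```python
from typing import Dict, List

# Per-path weights grouped by found article, copied from the output above.
grouped: Dict[str, List[float]] = {
    "8634": [0.1, 0.5, 0.8],
    "8640": [0.1, 0.8],
    "8642": [0.1],
}

# Reduce each list to its sum; round() strips floating-point noise from the result.
totals = {article: round(sum(weights), 10) for article, weights in grouped.items()}
print(totals)  # prints {'8634': 1.4, '8640': 0.9, '8642': 0.1}
```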

Solution

  • Just to make things easier, I gave each article a custom ID. A1 plays the role of article 8630 in your example output.

    g.addV('article').as('1').property(id,'A1').
      addV('brand').as('2').
      addV('article').as('3').property(id,'A2').
      addV('category').as('4').
      addV('zone').as('5').
      addV('article').as('6').property(id,'A3').
      addV('article').as('7').property(id,'A4').
      addE('zone').from('1').to('5').  property('weight', 0.1).
      addE('category').from('1').to('4').property('weight', 0.5).
      addE('brand').from('1').to('2').property('weight', 0.8).
      addE('article').from('2').to('6').
      addE('article').from('2').to('1').
      addE('article').from('2').to('3').  
      addE('zone').from('3').to('5').property('weight', 0.1).  
      addE('category').from('3').to('4').property('weight', 0.3).
      addE('brand').from('3').to('2').property('weight', 0.4).
      addE('article').from('4').to('1').
      addE('article').from('4').to('3').
      addE('article').from('5').to('6').
      addE('article').from('5').to('7').
      addE('article').from('5').to('1').
      addE('article').from('5').to('3').
      addE('zone').from('6').to('5').property('weight', 0.1).
      addE('brand').from('6').to('2').property('weight', 0.6).
      addE('zone').from('7').to('5').property('weight', 0.1) 
    

    The query you had was actually very close to producing the result you wanted. I just added a second by() modulator to the group() step to sum up the values.

    g.V('A1').
      repeat(outE().inV().simplePath()).
      until(hasLabel('article')).as('foundArticle').
      path().
      map(unfold().coalesce(values('weight'), constant(0)).sum()).as('pathWeight').
      group().
        by(select('foundArticle').id()).
        by(sum()).
      unfold()
    

    This yields:

    {'A2': 1.4}
    {'A3': 0.9}
    {'A4': 0.1}
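Conceptually, the first by() picks the grouping key and the second by() is a reducing traversal applied to the values collected for each key. The Python sketch below is only an analogy for that two-modulator pattern, not how TinkerPop implements group(); the helper name group and the input pairs (the question's per-path weights, assumed to map onto this answer's IDs A2 to A4) are illustrative assumptions.

```python
from collections import defaultdict
from typing import Callable, Dict, Hashable, Iterable, List, Tuple, TypeVar

T = TypeVar("T")
K = TypeVar("K", bound=Hashable)
R = TypeVar("R")


def group(items: Iterable[T], key: Callable[[T], K], reduce: Callable[[List[T]], R]) -> Dict[K, R]:
    """Rough analogue of group().by(key).by(reduce): bucket items, then reduce each bucket."""
    buckets: Dict[K, List[T]] = defaultdict(list)
    for item in items:
        buckets[key(item)].append(item)
    return {k: reduce(v) for k, v in buckets.items()}


# Hypothetical (found article id, path weight) pairs, one per path from A1.
path_weights: List[Tuple[str, float]] = [
    ("A2", 0.1), ("A2", 0.5), ("A2", 0.8),
    ("A3", 0.1), ("A3", 0.8),
    ("A4", 0.1),
]

totals = group(
    path_weights,
    key=lambda pw: pw[0],                                        # like by(select('foundArticle').id())
    reduce=lambda bucket: round(sum(w for _, w in bucket), 10),  # like by(sum())
)
print(totals)  # prints {'A2': 1.4, 'A3': 0.9, 'A4': 0.1}
```

Swapping the reducer for lambda bucket: [w for _, w in bucket] gives back lists like the ones in the question, which mirrors what the original query returned with only the key by().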
    

    I think your query can also be simplified. If I come up with something simpler, I will add it to this answer.

    UPDATED

    Here's a version of the query that uses a sack to accumulate the weights as the traversal proceeds, which avoids having to collect the path and post-process it.

    g.withSack(0).
      V('A1').
      repeat(outE().sack(sum).by(coalesce(values('weight'),constant(0))).
             inV().simplePath()).
      until(hasLabel('article')).
      group().
        by(id()).
        by(sack().sum()).
      unfold()
    

    which again yields

    {'A2': 1.4}
    {'A3': 0.9}
    {'A4': 0.1}
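The sack version moves the arithmetic into the traversal itself: each traverser starts with a sack of 0, adds each edge's weight (or 0) as it crosses that edge, and group() then only has to sum the sacks of the traversers that ended on the same article. Below is a rough plain-Python model of that flow, assuming the same sample graph; the non-article vertex IDs brand, category and zone are made up here, and the frontier is processed level by level as a loose stand-in for how repeat() advances traversers.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Tuple

# Vertex id -> label; article ids follow this answer, the other ids are made up.
LABELS: Dict[str, str] = {
    "A1": "article", "A2": "article", "A3": "article", "A4": "article",
    "brand": "brand", "category": "category", "zone": "zone",
}

# Out-edges as (from, to, weight); None means the edge has no 'weight' property.
EDGES: List[Tuple[str, str, Optional[float]]] = [
    ("A1", "zone", 0.1), ("A1", "category", 0.5), ("A1", "brand", 0.8),
    ("brand", "A3", None), ("brand", "A1", None), ("brand", "A2", None),
    ("A2", "zone", 0.1), ("A2", "category", 0.3), ("A2", "brand", 0.4),
    ("category", "A1", None), ("category", "A2", None),
    ("zone", "A3", None), ("zone", "A4", None), ("zone", "A1", None), ("zone", "A2", None),
    ("A3", "zone", 0.1), ("A3", "brand", 0.6),
    ("A4", "zone", 0.1),
]

Traverser = Tuple[str, Tuple[str, ...], float]  # (current vertex, visited vertices, sack)


def closeness(start: str) -> Dict[str, float]:
    """Model of withSack(0)...repeat(outE().sack(sum)...inV().simplePath())
    .until(hasLabel('article')).group().by(id()).by(sack().sum())."""
    totals: Dict[str, float] = defaultdict(float)
    frontier: List[Traverser] = [(start, (start,), 0.0)]  # withSack(0)
    while frontier:
        next_frontier: List[Traverser] = []
        for vertex, visited, sack in frontier:
            for src, dst, w in EDGES:
                if src != vertex or dst in visited:  # outE() from this vertex, simplePath()
                    continue
                # sack(sum).by(coalesce(values('weight'), constant(0)))
                new_sack = sack + (w if w is not None else 0.0)
                if LABELS[dst] == "article":  # until(hasLabel('article'))
                    totals[dst] += new_sack   # group().by(id()).by(sack().sum())
                else:
                    next_frontier.append((dst, visited + (dst,), new_sack))
        frontier = next_frontier
    # Sort keys and round away floating-point noise for readable output.
    return {k: round(totals[k], 10) for k in sorted(totals)}


print(closeness("A1"))  # prints {'A2': 1.4, 'A3': 0.9, 'A4': 0.1}
```

In this sketch the visited tuple is kept only so simplePath() can be honored; the weight itself lives entirely in the sack, so nothing is rebuilt from the path at the end.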