Tags: gremlin, tinkerpop, gremlinpython

Merge vertices, including properties and incoming/outgoing edges


How can I merge vertices, including their incoming/outgoing edges and their properties? I used this answer, and it does almost exactly what I hoped for. However, when both vertices have a property with the same key, only one of the values ends up on the merged vertex. I would like the merged vertex to hold that property as a list, as happens when you add a property to a vertex that already has one with that key.

Example scenario:

g.V().has("author","book_id", 987).fold().
              coalesce(unfold(),
                       addV("author").property("book_id", 987))

g.V().hasLabel("author").has("book_id",987).property("author_id",123)


g.V().has("author","book_id", 654).fold().
              coalesce(unfold(),
                       addV("author").property("book_id", 654))

g.V().hasLabel("author").has("book_id",654).property("author_id",123)

After these four queries, the graph holds two vertices with author_id=123. Merging them with this query (as suggested in the answer):

g.V().has("author", "author_id", 123).fold().filter(count(local).is(gt(1))).unfold().
sideEffect(properties().group("p").by(key).by(value())).
sideEffect(outE().group("o").by(label).by(project("p", "iv").by(valueMap()).by(inV()).fold())).
sideEffect(inE().group("i").by(label).by(project("p", "ov").by(valueMap()).by(outV()).fold())).
sideEffect(drop()).
cap("p", "o", "i").as("poi").
addV("author").as("a").
sideEffect(
    select("poi").select("p").unfold().as("kv").
    select("a").property(select("kv").select(keys), select("kv").select(values))).
sideEffect(
    select("poi").select("o").unfold().as("x").select(values).
    unfold().addE(select("x").select(keys)).from(select("a")).to(select("iv"))).
sideEffect(
    select("poi").select("i").unfold().as("x").select(values).
    unfold().addE(select("x").select(keys)).from(select("ov")).to(select("a"))).iterate()

results in a single vertex, as expected, but valueMap shows that it has only one of the book_ids as a property:

{'book_id': [654], 'author_id': [123]}

How can I keep both, as in:

{'book_id': [654,987], 'author_id': [123]}
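
To make the difference between the two results concrete, here is a small plain-Python model of the property merge. It is not Gremlin and does not claim to reproduce how group() picks the surviving value; the function names group_single and group_multi are made up for this illustration. It only shows that a map holding one value per key must lose a book_id, while a map holding a list per key keeps both.

```python
# Plain-Python model of the two source vertices' properties (not Gremlin).
vertex_props = [
    [("book_id", 987), ("author_id", 123)],
    [("book_id", 654), ("author_id", 123)],
]


def group_single(vertices):
    """Keep one value per key: a later value overwrites an earlier one."""
    merged = {}
    for props in vertices:
        for key, value in props:
            merged[key] = value
    return merged


def group_multi(vertices):
    """Keep every distinct value per key, in first-seen order."""
    merged = {}
    for props in vertices:
        for key, value in props:
            values = merged.setdefault(key, [])
            if value not in values:
                values.append(value)
    return merged


single = group_single(vertex_props)
multi = group_multi(vertex_props)
print(single)  # {'book_id': 654, 'author_id': 123}
print(multi)   # {'book_id': [987, 654], 'author_id': [123]}
```

The order of values inside each list depends only on the order in which this model visits the vertices; the graph may return them in a different order.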

Solution

  • I found a solution. It is a bit hacky, but it works (I would be happy to get a better answer):

    In the second line of the query, I swap the two by() modulators, so that instead of getting a single value for each key I get a key for each value.

    sideEffect(properties().group("p").by(value).by(key()).unfold()).
    

    Then, when assigning these properties to the new vertex, I use the map's keys as property values and vice versa:

    sideEffect(
        select("poi").select("p").unfold().as("kv").
        select("a").property(select("kv").select(values), select("kv").select(keys))).
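
The swap can be pictured with a plain-Python model (again not Gremlin; invert_group, rebuild and the editor_id key are hypothetical names for this sketch). It assumes the inverted map keeps exactly one key per value, which is what the single-valued result in the question suggests. Under that assumption both book_ids survive, but two different property keys that happen to share a value would collide.

```python
def invert_group(vertices):
    """Map each property value to its key (one key per value)."""
    by_value = {}
    for props in vertices:
        for key, value in props:
            by_value[value] = key
    return by_value


def rebuild(by_value):
    """Swap back: the map's values become keys, its keys become values."""
    merged = {}
    for value, key in by_value.items():
        merged.setdefault(key, []).append(value)
    return merged


vertex_props = [
    [("book_id", 987), ("author_id", 123)],
    [("book_id", 654), ("author_id", 123)],
]

inverted = invert_group(vertex_props)
rebuilt = rebuild(inverted)
print(inverted)  # {987: 'book_id', 123: 'author_id', 654: 'book_id'}
print(rebuilt)   # {'book_id': [987, 654], 'author_id': [123]}

# Caveat of the inversion: two different keys with the same value collide,
# so only one of them survives in this model.
clash = invert_group([[("author_id", 123), ("editor_id", 123)]])
print(clash)     # {123: 'editor_id'}
```

If the vertices being merged can carry equal values under different keys, this is worth checking against real data before relying on the hack.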
    

    The complete query:

    g.V().has("author", "author_id", 123).fold().filter(count(local).is(gt(1))).unfold().
    sideEffect(properties().group("p").by(value).by(key()).unfold()).
    sideEffect(outE().group("o").by(label).by(project("p", "iv").by(valueMap()).by(inV()).fold())).
    sideEffect(inE().group("i").by(label).by(project("p", "ov").by(valueMap()).by(outV()).fold())).
    sideEffect(drop()).
    cap("p", "o", "i").as("poi").
    addV("author").as("a").
    sideEffect(
        select("poi").select("p").unfold().as("kv").
        select("a").property(select("kv").select(values), select("kv").select(keys))).
    sideEffect(
        select("poi").select("o").unfold().as("x").select(values).
        unfold().addE(select("x").select(keys)).from(select("a")).to(select("iv"))).
    sideEffect(
        select("poi").select("i").unfold().as("x").select(values).
        unfold().addE(select("x").select(keys)).from(select("ov")).to(select("a"))).iterate()
    
    

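    EDIT: here is a full query with an extra part that deduplicates edges. Note that this version uses the original property grouping from the question (by(key).by(value())); to keep multi-valued properties, substitute the two modified sideEffect steps described above.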

    g.V().has("author","author_id", 123).
      fold().filter(count(local).is(gt(1))).unfold().
      sideEffect(properties().group("p").by(key).by(value())).
      sideEffect(outE().group("o").by(label).by(project("p","iv").by(valueMap()).by(inV()).fold())).
      sideEffect(inE().group("i").by(label).by(project("p","ov").by(valueMap()).by(outV()).fold())).
      sideEffect(drop()).
      cap("p","o","i").as("poi").
      addV("author").as("a").
      sideEffect(
        select("poi").select("p").unfold().as("kv").
        select("a").property(select("kv").select(keys), select("kv").select(values))).
      sideEffect(
        select("poi").select("o").unfold().as("x").select(values).
        unfold().addE(select("x").select(keys)).from(select("a")).to(select("iv"))).
      sideEffect(
        select("poi").select("i").unfold().as("x").select(values).
        unfold().addE(select("x").select(keys)).from(select("ov")).to(select("a"))).
      sideEffect(
        select("a").outE().as("e").outV().id().as("final_ov").
        select("e").inV().id().as("final_iv").
        select("e", "final_ov", "final_iv").
        group().
          by(select("final_ov", "final_iv")).
          by(select("e")).
        select(values).
        store("unique_e")).
      select("a").
      outE().
      where(without("unique_e")).
      drop()
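
The idea behind the deduplication part, keeping one edge per (out-vertex, in-vertex) pair and dropping the rest, can be sketched in plain Python. This is only a model of the intent, not a translation of the traversal; dedupe_edges, the edge tuples and the ids are hypothetical, and keeping the first edge seen is an assumption of the sketch.

```python
def dedupe_edges(edges):
    """Split edges into (kept, dropped), keeping the first edge seen for
    each (out_vertex, in_vertex) pair.

    Each edge is a tuple: (edge_id, out_vertex, in_vertex, label).
    """
    kept_by_pair = {}
    dropped = []
    for edge in edges:
        _, out_v, in_v, _ = edge
        pair = (out_v, in_v)
        if pair in kept_by_pair:
            dropped.append(edge)       # same endpoints as a kept edge
        else:
            kept_by_pair[pair] = edge  # first edge for this pair
    return list(kept_by_pair.values()), dropped


# Hypothetical outgoing edges of the merged vertex "a".
edges = [
    ("e1", "a", "b1", "wrote"),
    ("e2", "a", "b1", "wrote"),  # duplicate of e1 after the merge
    ("e3", "a", "b2", "wrote"),
]

kept, dropped = dedupe_edges(edges)
print([e[0] for e in kept])     # ['e1', 'e3']
print([e[0] for e in dropped])  # ['e2']
```

As in the query above, the grouping key contains only the two endpoint ids, so edges with different labels between the same pair of vertices would also be treated as duplicates, and only the outgoing edges of the merged vertex are examined.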