Why is the coalesce step not behaving correctly?

I have a traversal that performs a coalesce. Here is some simplified code:

g.V()
.coalesce(
  __.has("color", "green"),
  __.has("color", "blue"),
  __.has("color", "yellow")
)

My data contains green, yellow, and blue vertices. My expectation based on the docs for coalesce is that this traversal should return all the vertices with a green color (the first traversal, in order, that yields a result). However, it returns yellow vertices instead (the last traversal).

If I remove the yellow step:

g.V()
.coalesce(
  __.has("color", "green"),
  __.has("color", "blue")
)

It will start returning only blue vertices.

If I remove the blue step:

g.V()
.coalesce(
  __.has("color", "green")
)

It will finally return green vertices. This seems inconsistent with the TinkerPop documentation for coalesce. I understand that coalesce isn't converted to a native Neptune operation, but I would still expect it to function as described.

I am running Neptune 1.1.0.0 and Gremlin 3.6.0. I am also able to reproduce this behavior within a SageMaker/Jupyter notebook.

Solution

"My expectation based on the docs for coalesce is that this traversal should return all the vertices with a green color (the first traversal, in order, that yields a result)."

That's not quite the right way to think about it. It won't return only "green" vertices. Instead, it will return every vertex that matches any of those three colors. The coalesce() is applied independently to each traverser (i.e. each vertex, in this case) that passes through it. It's not as though one traverser triggers the "green" path and all other traversers must then follow that path.

As you can see in the example below, the "red" vertices are filtered out while all the other traversers pass through:

gremlin> g.addV().property('color','yellow').
......1>   addV().property('color','green').
......2>   addV().property('color','red').
......3>   addV().property('color','yellow').
......4>   addV().property('color','blue').
......5>   addV().property('color','red').
......6>   addV().property('color','blue').
......7>   addV().property('color','green').
......8>   addV().property('color','red').
......9>   addV().property('color','yellow').iterate()
gremlin> g.V().coalesce(
......1>   __.has("color", "green"),
......2>   __.has("color", "blue"),
......3>   __.has("color", "yellow")).values('color')  
==>yellow
==>green
==>yellow
==>yellow
==>blue
==>blue
==>green
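
The per-traverser behavior shown above can be modeled outside of Gremlin. The following is only a rough Python sketch, not TinkerPop code: the `has` and `coalesce` functions and the dict-based vertices are hypothetical stand-ins that mimic the semantics described here. The result order follows the input list, so it may differ from the order the graph returns.

```python
from typing import Callable, Dict, Iterable, Iterator, List

# Hypothetical stand-in for a vertex: a dict of properties.
Vertex = Dict[str, str]
Branch = Callable[[Vertex], List[Vertex]]


def has(key: str, value: str) -> Branch:
    """Model of __.has(key, value): emit the vertex if it matches, else nothing."""
    return lambda v: [v] if v.get(key) == value else []


def coalesce(traversers: Iterable[Vertex], *branches: Branch) -> Iterator[Vertex]:
    """Model of coalesce(): for EACH incoming traverser, try the branches in
    order and emit the results of the first branch that yields anything."""
    for t in traversers:
        for branch in branches:
            results = branch(t)
            if results:
                yield from results
                break  # later branches are skipped for this traverser only


# Same colors, in the same insertion order, as the console session above.
colors = ["yellow", "green", "red", "yellow", "blue",
          "red", "blue", "green", "red", "yellow"]
vertices = [{"color": c} for c in colors]

result = [
    v["color"]
    for v in coalesce(
        vertices,
        has("color", "green"),
        has("color", "blue"),
        has("color", "yellow"),
    )
]
print(result)
```

Each vertex is tested on its own, so a green match for one vertex has no effect on which branch the next vertex takes; only the red vertices fall through every branch.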

In this context, coalesce() is probably not the right step to use. You'd be better off writing:

gremlin> g.V().has('color',within('green','blue','yellow')).values('color')
==>yellow
==>green
==>yellow
==>yellow
==>blue
==>blue
==>green

as the coalesce() here is basically just behaving like an or()-style query.

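Maybe an easy way to do what you're asking is to use choose(). The traversals below fold all the vertices into a list, take the first one, and then return every vertex that shares its color. In the first query the first vertex happens to be yellow; the second query orders by color first, so a blue vertex comes first: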

gremlin> g.V().fold().as('x').limit(local,1).
......1>   choose(values('color')).
......2>     option("green", select('x').unfold().has('color','green')).
......3>     option("blue", select('x').unfold().has('color','blue')).
......4>     option("yellow", select('x').unfold().has('color','yellow')).
......5>   values('color')
==>yellow
==>yellow
==>yellow
gremlin> g.V().order().by('color').
......1>   fold().as('x').limit(local,1).
......2>   choose(values('color')).
......3>     option("green", select('x').unfold().has('color','green')).
......4>     option("blue", select('x').unfold().has('color','blue')).
......5>     option("yellow", select('x').unfold().has('color','yellow')).
......6>   values('color')
==>blue
==>blue
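
To see why the two choose() queries above return different colors, here is a rough Python model of that pattern. It is a sketch under assumptions, not TinkerPop code: `has_color` and `choose_by_first` are hypothetical helpers, the list order stands in for the order in which the graph returns vertices, and the case where the first vertex's color has no matching option is deliberately not modeled.

```python
from typing import Callable, Dict, List

# Hypothetical stand-in for a vertex: a dict of properties.
Vertex = Dict[str, str]
Branch = Callable[[List[Vertex]], List[Vertex]]


def has_color(color: str) -> Branch:
    """Model of select('x').unfold().has('color', color): filter the full list."""
    return lambda vertices: [v for v in vertices if v["color"] == color]


def choose_by_first(vertices: List[Vertex], options: Dict[str, Branch]) -> List[Vertex]:
    """Model of fold().as('x').limit(local,1).choose(values('color')).option(...):
    the color of the FIRST vertex picks the branch, which then runs on all vertices."""
    first_color = vertices[0]["color"]
    # Raises KeyError when no option matches; that case is not modeled here.
    branch = options[first_color]
    return branch(vertices)


# Same colors, in the same insertion order, as the console session above.
colors = ["yellow", "green", "red", "yellow", "blue",
          "red", "blue", "green", "red", "yellow"]
vertices = [{"color": c} for c in colors]
options = {c: has_color(c) for c in ("green", "blue", "yellow")}

# First query: no ordering, so the first vertex (yellow) picks the branch.
unordered = [v["color"] for v in choose_by_first(vertices, options)]

# Second query: order().by('color') puts a blue vertex first.
by_color = sorted(vertices, key=lambda v: v["color"])
ordered = [v["color"] for v in choose_by_first(by_color, options)]

print(unordered)
print(ordered)
```

In this model, as in the console sessions, the outcome depends entirely on which vertex happens to come first, which is why adding an ordering step changes the result from yellow to blue.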