In Gremlin how does map() really work?

Why do these two traversals yield different results?

graph.traversal()
    .V().map(__.out("contains"))
    .valueMap(true).next(100)

compared to

graph.traversal()
    .V().out("contains")
    .valueMap(true).next(100)

Why do I prefer map() over calling out() directly? It lets me organize my code so that methods return traversals, which I can then "map" onto existing traversals.

Solution

To reason about this, recall that Gremlin works a bit like a processing pipeline: objects are pulled through each step, which applies some transformation, filter, etc. Reduced to its simplest form, your example gets all vertices and traverses their outgoing edges with out(), so you are comparing the following traversals and results:

gremlin> g = TinkerFactory.createModern().traversal()
==>graphtraversalsource[tinkergraph[vertices:6 edges:6], standard]
gremlin> g.V().out()
==>v[3]
==>v[2]
==>v[4]
==>v[5]
==>v[3]
==>v[3]
gremlin> g.V().map(out())
==>v[3]
==>v[5]
==>v[3]

Those traversals return different results because you are asking Gremlin for two different things. In the first case, out() is not a form of map() but a form of flatMap(): for each vertex traverser coming through the pipeline, it iterates all the outgoing edges and returns every adjacent vertex (i.e. a one-to-many transform). In the second case, you are asking Gremlin to do a simple map() of a vertex to another object (i.e. a one-to-one transform), which here is the result of out(), meaning only the first object in that child traversal's stream. A vertex with no outgoing edges produces no result at all, which is why only three of the six vertices yield anything.

To demonstrate, you could simply change map() to flatMap() as follows:

gremlin> g.V().flatMap(out())
==>v[3]
==>v[2]
==>v[4]
==>v[5]
==>v[3]
==>v[3]

Alternatively, fold() the results of out() into a single object to keep the one-to-one transform logic. Each vertex then maps to exactly one list, which is empty when it has no outgoing edges:

gremlin> g.V().map(out().fold())
==>[v[3],v[2],v[4]]
==>[]
==>[]
==>[v[5],v[3]]
==>[]
==>[v[3]]
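
As a rough analogy only (this is plain java.util.stream code, not TinkerPop), the three behaviours above can be mimicked with a hard-coded adjacency map. The class name MapVsFlatMap, the OUT map, and the method names are made up for illustration; the adjacency lists are copied from the fold() output above, assuming the vertices are iterated in id order 1 through 6.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class MapVsFlatMap {
    // Hypothetical stand-in for the "modern" toy graph: vertex id -> ids of
    // its out() neighbours, in the order shown by the fold() output above.
    static final Map<Integer, List<Integer>> OUT = new LinkedHashMap<>();
    static {
        OUT.put(1, List.of(3, 2, 4));
        OUT.put(2, List.of());
        OUT.put(3, List.of());
        OUT.put(4, List.of(5, 3));
        OUT.put(5, List.of());
        OUT.put(6, List.of(3));
    }

    // Analogue of g.V().flatMap(out()): one-to-many, every neighbour is emitted.
    static List<Integer> flatMapOut() {
        return OUT.values().stream()
                .flatMap(List::stream)
                .collect(Collectors.toList());
    }

    // Analogue of g.V().map(out()): at most one result per vertex (the first
    // neighbour); vertices without neighbours emit nothing.
    static List<Integer> mapOut() {
        return OUT.values().stream()
                .filter(neighbours -> !neighbours.isEmpty())
                .map(neighbours -> neighbours.get(0))
                .collect(Collectors.toList());
    }

    // Analogue of g.V().map(out().fold()): exactly one list per vertex,
    // possibly empty.
    static List<List<Integer>> mapOutFold() {
        return new ArrayList<>(OUT.values());
    }

    public static void main(String[] args) {
        System.out.println(flatMapOut()); // [3, 2, 4, 5, 3, 3]
        System.out.println(mapOut());     // [3, 5, 3]
        System.out.println(mapOutFold()); // [[3, 2, 4], [], [], [5, 3], [], [3]]
    }
}
```

For the original goal of getting traversals from methods and attaching them to an existing traversal, this suggests that flatMap() rather than map() is the step to reach for whenever the child traversal can return more than one object per input.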