The Scalding reference on GitHub (https://github.com/twitter/scalding/wiki/Fields-based-API-Reference#map-functions) says the following:
MapTo is equivalent to mapping and then projecting to the new fields, but is more efficient. Thus, the following two lines produce the same result:
pipe.mapTo(existingFields -> additionalFields){ ... }
pipe.map(existingFields -> additionalFields){ ... }.project(additionalFields)
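To make the claimed equivalence concrete without a Hadoop/Cascading setup, here is a minimal plain-Scala sketch. It assumes a tuple can be modeled as a Map from field name to value; the object MapToModel and its functions map, project and mapTo are hypothetical stand-ins that only mimic the fields semantics described above. They are not Scalding's API.

```scala
// Hypothetical model of Scalding's fields semantics, NOT the real API:
// a "tuple" is simply a Map from field name to value.
object MapToModel {
  type Tuple = Map[String, Int]

  // Like map: keep every existing field and append the new output field.
  def map(rows: Seq[Tuple], in: Seq[String], out: String)(f: Seq[Int] => Int): Seq[Tuple] =
    rows.map(r => r + (out -> f(in.map(r))))

  // Like project: keep only the listed fields.
  def project(rows: Seq[Tuple], keep: Seq[String]): Seq[Tuple] =
    rows.map(r => r.filter { case (k, _) => keep.contains(k) })

  // Like mapTo: build the output tuple directly from the input fields,
  // never copying the fields that would be discarded anyway.
  def mapTo(rows: Seq[Tuple], in: Seq[String], out: String)(f: Seq[Int] => Int): Seq[Tuple] =
    rows.map(r => Map(out -> f(in.map(r))))

  def main(args: Array[String]): Unit = {
    // Made-up input: field "c" is not needed in the result.
    val rows = Seq(Map("a" -> 1, "b" -> 2, "c" -> 10), Map("a" -> 3, "b" -> 4, "c" -> 20))
    val viaMap = project(map(rows, Seq("a", "b"), "s")(_.sum), Seq("s"))
    val viaMapTo = mapTo(rows, Seq("a", "b"), "s")(_.sum)
    println(viaMap == viaMapTo) // both yield Seq(Map(s -> 3), Map(s -> 7))
  }
}
```

Both paths yield the same rows; the difference is only in what exists in between.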
My question is:
Since you declare which fields remain, there is no need to carry the fields that are going to be discarded through the map operation.
Depending on the number of fields discarded and on the volume of data, the difference can be highly noticeable.
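A rough illustration of why the width of the discarded fields matters, as a plain-Scala sketch rather than Scalding code: it assumes a single made-up row with 50 input fields and counts how many values each approach holds just before the result is produced. The object IntermediateWidth and its helper wideRow are hypothetical names used only for this example.

```scala
// Hypothetical illustration, NOT Scalding code: compare how many field
// values exist per row before the final result is produced.
object IntermediateWidth {
  type Tuple = Map[String, Int]

  // A made-up wide row with `width` fields named f0, f1, ...
  def wideRow(width: Int): Tuple = (0 until width).map(i => s"f$i" -> i).toMap

  def main(args: Array[String]): Unit = {
    val row = wideRow(50)
    val s = row("f0") + row("f1")

    // map then project: all 50 inputs plus the new field exist
    // until project throws 50 of them away.
    val afterMap = row + ("s" -> s)
    // mapTo: only the declared output field is ever built.
    val afterMapTo = Map("s" -> s)

    println(s"map: ${afterMap.size} values, mapTo: ${afterMapTo.size} value")
  }
}
```

Per row the gap is just 51 versus 1 values, but multiplied over a large dataset it is the kind of overhead that mapTo lets you skip.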