c++ c++14 range-v3 std-ranges

Why can ranges not be used for the pipes library functionality?


Jonathan Boccara (author of Fluent C++) wrote a library called pipes.

The repository's main page says that this "piping" is not like the use of ranges, even though it looks the same: it is based on eager pushing rather than lazy pulling. It also states that the ranges library cannot be used to perform various "pipe" operations. For example:

  • unzip - Take a zipped input - a range of k-tuples essentially - and produce k separate, independent outputs.
  • fork - Produce multiple (independent) copies of a container/range.

I don't quite understand why, in principle, that is the case. (With the exception, of course, of ranges for which you can't get the end iterator/sentinel.)


Solution

  • What's being discussed is essentially the difference between a push-based processing methodology and a pull-based one. In a push-system like this pipes library, you establish a chain of processing, and each processing step pushes its data directly into the next. In a pull-system like ranges, you establish a representation of data, which you can access and modify as needed. Processing doesn't happen by itself; it happens only when someone attempts to consume the range.
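
    The contrast can be sketched without either library. The names below (push_doubled, DoubledView) are made up for illustration and are not part of pipes or of any ranges implementation; this is only a rough picture of who drives the work in each style.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Push style: the stage owns the loop and forwards each result to `sink`.
// All the work happens during this call, driven by the producer.
template <typename Sink>
void push_doubled(const std::vector<int>& input, Sink sink) {
    for (int x : input) {
        sink(x * 2);
    }
}

// Pull style: a lightweight view that computes a value only when asked.
// Constructing it does no work; the consumer drives the computation.
class DoubledView {
public:
    explicit DoubledView(const std::vector<int>& input) : input_(&input) {}
    std::size_t size() const { return input_->size(); }
    int operator[](std::size_t i) const { return (*input_)[i] * 2; }
private:
    const std::vector<int>* input_;  // non-owning: the input must outlive the view
};

// Small usage example for both styles.
inline void push_pull_example() {
    std::vector<int> input{1, 2, 3};

    std::vector<int> pushed;
    push_doubled(input, [&pushed](int v) { pushed.push_back(v); });
    assert((pushed == std::vector<int>{2, 4, 6}));

    DoubledView view(input);   // nothing computed yet
    assert(view[2] == 6);      // computed on access
}
```

    In the push version the output destination is an argument; in the pull version the result is the return value (the view object itself). That difference is what the rest of this answer builds on.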

    The unzip and fork operations are both one-to-many operations: they take a single input and map it to many processing operations.

    As a push system, the pipes library can handle one-to-many operations because of the structure of its API. An operation is represented by a function call; the input is implied by the point of use (using >>= or passing it to a processor). The parameters of the function define its output (ignoring parameters meant for the processor itself). And since C++ functions can have arbitrary numbers of parameters, a one-to-many mapping operation naturally falls out. You simply supply appropriate processors for the various outputs.
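
    To make that concrete, here is a minimal sketch of one-to-many push operations written by hand. fork_into and unzip_into are hypothetical helpers, not the pipes library's actual API; the only point is that each output is just another function parameter.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Hypothetical helper: push every element of `input` into each sink, in order.
// Any number of outputs can be supplied because they are ordinary parameters.
template <typename T, typename... Sinks>
void fork_into(const std::vector<T>& input, Sinks&&... sinks) {
    for (const T& value : input) {
        // C++14-friendly pack expansion: call each sink with the current value.
        int expand[] = {0, (static_cast<void>(sinks(value)), 0)...};
        (void)expand;
    }
}

// Hypothetical helper: split each pair and push the halves to separate sinks.
template <typename A, typename B, typename SinkA, typename SinkB>
void unzip_into(const std::vector<std::pair<A, B>>& input,
                SinkA&& sink_a, SinkB&& sink_b) {
    for (const auto& p : input) {
        sink_a(p.first);
        sink_b(p.second);
    }
}

// Usage: one input, two independent outputs with different processing.
inline void fork_example() {
    std::vector<int> input{1, 2, 3};
    std::vector<int> copy;
    int sum = 0;
    fork_into(input,
              [&copy](int v) { copy.push_back(v); },  // output 1: collect
              [&sum](int v) { sum += v; });           // output 2: accumulate
    assert(copy == input);
    assert(sum == 6);
}
```

    Each sink can itself be the head of a further chain, which is how a push pipeline branches without needing any composite return type.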

    As a pull system, ranges is based on return values. C++ has no language mechanism for returning multiple values, so the best we can do is return a "value" that represents multiple values.

    However, range adapter chaining is ultimately based on the inputs being ranges. And a "'value' that represents multiple values" is itself not a range. It may contain ranges, but that doesn't make it a range.

    So now you have to take this very definitely "not a range" type and make all of your range adapters work with it. Applying a range adapter must broadcast that operation across the type, creating a many-to-many operation. Doing that is not easy.

    But more importantly... that's probably not what you want. If you fork a range, then you almost certainly want to do different processing on each of the replicated ranges. And that completely shuts down any chance of using the | operator to do it. You'll have to build ways to apply adapters to specific parts of these range-tuples. And those ways are increasingly going to look like a push-based processor.

    At the end of the day, a pull-style system only has one output at each level. That's just part of the core concept of such an API: each processing step generates a range. This has its advantages (lazy processing) but representing one-to-many operations is one of its weak areas.

    Ranges can certainly have an unzip function (fork is really just copying the range). But it would not be a | style adapter; it would be a function that takes a range over some decomposable type and returns a tuple of ranges. If you want to do more processing with them, then you would need to store the tuple in a value, access the individual elements, and use them as you see fit.
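
    A rough sketch of what such a function could look like follows. This unzip is hypothetical and not part of range-v3 or the standard library, and it is eager (it returns containers, which are themselves ranges) purely to keep the example short and C++14-compatible; a lazy version would return a tuple of views instead.

```cpp
#include <cassert>
#include <string>
#include <tuple>
#include <utility>
#include <vector>

// Hypothetical free function: takes a range of pairs and returns a tuple
// holding one range per component. Note that it is called like a normal
// function, not chained with |, because its result is not itself a range.
template <typename A, typename B>
std::tuple<std::vector<A>, std::vector<B>>
unzip(const std::vector<std::pair<A, B>>& zipped) {
    std::vector<A> firsts;
    std::vector<B> seconds;
    firsts.reserve(zipped.size());
    seconds.reserve(zipped.size());
    for (const auto& p : zipped) {
        firsts.push_back(p.first);
        seconds.push_back(p.second);
    }
    return std::make_tuple(std::move(firsts), std::move(seconds));
}

// Usage: store the tuple, then pick out each range and process it separately.
inline void unzip_example() {
    std::vector<std::pair<int, std::string>> zipped{{1, "one"}, {2, "two"}};
    auto parts = unzip(zipped);            // store the tuple in a value
    auto& numbers = std::get<0>(parts);    // access the individual ranges
    auto& names = std::get<1>(parts);
    assert((numbers == std::vector<int>{1, 2}));
    assert(names.back() == "two");
}
```

    The extra step of naming the tuple and pulling out its elements is exactly the part that a | chain cannot express on its own.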