python, google-cloud-platform, pipeline, apache-beam, dataflow

Difference between using a list or a PCollection


I'm building a pipeline in Apache Beam and got curious about this: what's the difference between applying a PTransform to a list and to a PCollection? Is performance affected, or is it just that the PCollection is immutable? And is this a bad way to approach a pipeline with Apache Beam?


Solution

  • A PCollection is an immutable collection of elements that can be either bounded (a fixed, known size) or unbounded (continuously growing).

    The main difference from a list is that a PCollection can be unbounded, which is especially powerful when you are streaming data (from a large file, or from an unbounded source such as Pub/Sub).