
CDAP ingestion from PubSub


I'm trying to load data from PubSub messages to GCS files. Simple pipeline: PubSub source -> JSON Parser -> GCS sink.

Since PubSub only accepts the data argument as a bytestring (UTF-8-encoded in my case), how can I decode it in CDAP? Should I build a custom plugin implementing a decode function, or is it better to pass my data as strings using attributes in the PubSub message instead of 'data'?
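For reference, here is a minimal Python sketch (standard library only) of both options from the publisher's side. The helper names encode_payload, decode_payload and to_attributes are hypothetical. With the real client library, the bytes would be passed to PublisherClient.publish(topic_path, data, **attributes) from google.cloud.pubsub_v1, which takes data as bytes and attributes as string keyword arguments.

```python
import json
from typing import Any, Dict


def encode_payload(record: Dict[str, Any]) -> bytes:
    """Serialize a record as compact JSON and encode it as UTF-8 bytes,
    the form the Pub/Sub 'data' field expects."""
    return json.dumps(record, separators=(",", ":"), ensure_ascii=False).encode("utf-8")


def decode_payload(data: bytes) -> Dict[str, Any]:
    """Inverse of encode_payload: decode the UTF-8 bytes to text, then parse the JSON."""
    return json.loads(data.decode("utf-8"))


def to_attributes(record: Dict[str, Any]) -> Dict[str, str]:
    """Alternative approach: carry the fields as Pub/Sub attributes instead of 'data'.
    Attributes are a string-to-string map, so every value is stringified here."""
    return {str(key): str(value) for key, value in record.items()}


record = {"user": "José", "clicks": 3}
data = encode_payload(record)          # UTF-8 bytes suitable for the 'data' field
assert decode_payload(data) == record  # decoding restores the original record
attrs = to_attributes(record)
assert attrs["clicks"] == "3"          # attributes keep values only as strings
```

The attributes route avoids decoding but loses value types (the integer 3 becomes the string "3"), and Pub/Sub also caps the number and size of attributes, so keeping the JSON in 'data' and decoding it downstream is usually the simpler option.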


Solution

  • I solved the issue by using a Projector plugin instead of the JSON Parser between the PubSub source and the GCS sink. The Projector converts the message field of the PubSub source output, which is of type bytes, to a string (plain text).
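The conversion the Projector performs can be pictured with a small Python sketch. It does not use any CDAP API: project_message_to_string is a hypothetical helper, and the record layout (a bytes message field next to id, timestamp and attributes) is an assumption about the source's output schema, so check the schema shown in your own pipeline.

```python
from typing import Any, Dict


def project_message_to_string(record: Dict[str, Any], field: str = "message") -> Dict[str, Any]:
    """Hypothetical stand-in for a Projector 'convert to string' step:
    return a copy of the record with the bytes field decoded as UTF-8 text.
    All other fields are passed through unchanged."""
    projected = dict(record)  # shallow copy so the input record is not mutated
    value = projected.get(field)
    if isinstance(value, (bytes, bytearray)):
        projected[field] = bytes(value).decode("utf-8")
    return projected


# Assumed shape of one record from the Pub/Sub source (field names are an assumption).
source_record = {
    "id": "1234567890",
    "timestamp": 1700000000000,
    "message": b'{"user":"alice","clicks":3}',
    "attributes": {"origin": "web"},
}

projected = project_message_to_string(source_record)
print(projected["message"])  # {"user":"alice","clicks":3}
```

Because the converted field holds the original JSON text, a GCS sink writing plain text would presumably store one JSON document per message; if the fields need to be split into columns, a JSON Parser could in principle be placed after the conversion rather than directly after the source.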