google-cloud-platform google-bigquery google-cloud-dataflow apache-beam google-cloud-pubsub

How to append different Pub/Sub objects and flatten them to write them all together into BigQuery as a single JSON?


I want to write three parts of a Pub/Sub message (data, attributes, and publish time) to BigQuery, flattened so that all elements are written in a single row, for example:

data[0]  data[1]  attr[0]  attr[1]  key  publishTime
data     data     attr     attr     key  publishTime

I'm currently using the following code to decode and parse the message, but it only handles the data part of the Pub/Sub message:

import json

class decodeMessage:
    def decode_base64(self, element):
        """Decode the message's data bytes as UTF-8 and serialize the text as a JSON string."""
        return json.dumps(element.data.decode("utf-8"))

class parseMessage:
    def parseJsonMessage(self, element):
        """Parse a JSON string back into a Python object."""
        return json.loads(element)

I've also tried converting the two JSON objects to JSON strings and merging them, but it didn't go as planned. My ultimate goal is to bring all columns into a single JSON with the schema retained.

I hope my question is clear. Thanks!


Solution

  • The solution is simply to create a new Python dictionary and add all of the message's fields to it.

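    Example:

        import json

        # build_payload is an illustrative name; element is the Pub/Sub message.
        def build_payload(element):
            payload = dict()
            # Payload bytes decoded as UTF-8, then serialized as a JSON string
            data = json.dumps(element.data.decode('utf-8'))
            attributes = json.dumps(element.attributes)
            messageKey = element.message_id
            # Publish time converted to epoch milliseconds
            publish_time = (element.publish_time).timestamp() * 1000

            payload['et'] = publish_time
            payload['data'] = data
            payload['attributes'] = attributes
            payload['key'] = messageKey

            return payload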
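
The dictionary above stores data and attributes as JSON strings under single keys. To get closer to the one-row layout shown in the question, the same idea can be extended so that every field of the data and every attribute becomes its own key. The following is only a minimal sketch under some assumptions: the payload is assumed to be a UTF-8 encoded JSON object, `FakeMessage` is a hypothetical stand-in that exposes the same four fields the answer reads from the message, and `flatten_message` and the `data_` / `attr_` prefixes are illustrative names, not part of any library.

```python
import json
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class FakeMessage:
    """Hypothetical stand-in exposing the fields the answer reads."""
    data: bytes
    attributes: dict = field(default_factory=dict)
    message_id: str = ""
    publish_time: datetime = None


def flatten_message(element):
    """Return one flat dictionary (one row) for a Pub/Sub message."""
    row = {}
    # Assumption: the payload is a UTF-8 encoded JSON object.
    body = json.loads(element.data.decode("utf-8"))
    # Prefix the keys so data fields and attributes cannot collide.
    for name, value in body.items():
        row[f"data_{name}"] = value
    for name, value in (element.attributes or {}).items():
        row[f"attr_{name}"] = value
    row["key"] = element.message_id
    # Publish time as integer epoch milliseconds.
    row["publishTime"] = int(element.publish_time.timestamp() * 1000)
    return row


msg = FakeMessage(
    data=b'{"id": 7, "name": "abc"}',
    attributes={"source": "sensor"},
    message_id="123",
    publish_time=datetime(2021, 1, 1, tzinfo=timezone.utc),
)
row = flatten_message(msg)
print(json.dumps(row))
```

In a Beam pipeline, a function like this could be applied with `beam.Map` after reading with `beam.io.ReadFromPubSub(..., with_attributes=True)`, which yields message objects instead of raw bytes. The BigQuery table schema would then need one column per flattened key, so this approach fits best when the set of data fields and attributes is known in advance.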