firebase, google-cloud-platform, google-cloud-firestore, google-bigquery, firebase-extensions

Firestore -> BigQuery mirroring


I am currently working on a project where I need to mirror data from Firestore to BigQuery for further analysis and reporting purposes. However, I want to exclude certain fields from being mirrored into BigQuery.

I have explored the Stream Mirror extension for Firestore, which simplifies the mirroring process in real-time. However, it doesn't provide direct control over excluding specific fields during the mirroring process.

I am seeking guidance on whether there is a way to achieve this functionality. My goal is to customize the mirroring process to exclude specific fields from Firestore documents before they are stored in BigQuery.

I would like to know if there are any recommended approaches or techniques to accomplish this. Are there any available tools, libraries, or methods that can help me achieve selective field exclusion during the mirroring process?

I appreciate any insights or suggestions that the community can provide. Thank you in advance for your assistance!


Solution

  • As explained in the extension's documentation, the extension allows the use of a Transform Cloud Function to convert the Firestore data before it is written to BigQuery.

    The transform function should be an HTTP Cloud Function with the following logic: get the input object from the request, transform it, and send it back in the response, as shown in the Cloud Function skeleton below:

    const functions = require("firebase-functions");

    exports.bqTransform = functions.https.onRequest(async (req, res) => {

        const inputPayload = req.body; // JS object sent by the extension
        // ...
        // Transform the object
        // ...
        const outputPayload = { /* ... */ }; // JS object with the same structure

        res.send(outputPayload);
    });
    

    As explained in the doc, the inputPayload object (i.e. req.body) has a data property, which is an array containing a representation of the Firestore document, as shown below:

    { 
      data: [{
        insertId: int;
        json: {
          timestamp: int;
          event_id: int;
          document_name: string;
          document_id: int;
          operation: ChangeType;
          data: string;  // <= String containing the stringified object representing the Firestore document data
        },
      }]
    }
    

    The transformation implemented in your Cloud Function code must produce an object with the same structure (outputPayload in the skeleton example above), in which the data[0].json property is adapted according to your transformation requirements.


    Here is a very simple example in which we replace the whole content of the Firestore record with just the foo field of the Firestore document plus some static data.

    exports.bqTransform = functions.https.onRequest(async (req, res) => {

        const inputPayload = req.body;
        const inputData = inputPayload.data[0];

        // json.data is a stringified object (see the structure above),
        // so parse it before reading individual fields
        const documentData = JSON.parse(inputData.json.data);

        const outputPayload = [{
            insertId: inputData.insertId,
            json: {
                timestamp: inputData.json.timestamp,
                event_id: inputData.json.event_id,
                document_name: inputData.json.document_name,
                document_id: inputData.json.document_id,
                operation: inputData.json.operation,
                data: JSON.stringify({ foo: documentData.foo, array: ["a1", "a2"], name: "Transformed Name" })
            },
        }];

        res.send({ data: outputPayload });
    });
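
To come back to the original question (excluding specific fields), the same approach can probably be reduced to a small helper that strips fields from every row in the payload rather than rebuilding data[0] by hand. The following is only a sketch under assumptions: excludeFields is a hypothetical name, not part of the extension, the sample payload uses made-up values, and it assumes json.data is always a JSON string as described above. It only removes top-level fields.

```javascript
// Hypothetical helper (not part of the extension): removes the listed
// top-level fields from the document data of every row in the payload.
function excludeFields(inputPayload, fieldsToExclude) {
    const rows = inputPayload.data.map((row) => {
        // Assumption: json.data is a stringified object, as described above.
        const documentData = JSON.parse(row.json.data);

        // Guard against rows that carry no document data (parsed value is null).
        if (documentData !== null && typeof documentData === "object") {
            for (const field of fieldsToExclude) {
                delete documentData[field];
            }
        }

        // Copy the row and replace only json.data; other properties are kept.
        return {
            ...row,
            json: { ...row.json, data: JSON.stringify(documentData) },
        };
    });

    // Same outer structure as the input: { data: [...] }
    return { data: rows };
}

// Usage with a made-up payload
const sample = {
    data: [{
        insertId: 1,
        json: {
            timestamp: 1700000000,
            event_id: 10,
            document_name: "users/abc",
            document_id: 42,
            operation: "CREATE",
            data: JSON.stringify({ foo: "bar", email: "a@example.com", phone: "555-0100" }),
        },
    }],
};

const result = excludeFields(sample, ["email", "phone"]);
console.log(result.data[0].json.data); // {"foo":"bar"}
```

Inside the Cloud Function, the handler body would then come down to something like res.send(excludeFields(req.body, ["email", "phone"])), with the field list adapted to your documents. Mapping over the whole data array, instead of reading only data[0], is a deliberate choice here so that no row is dropped if the request contains more than one.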