Search code examples
google-bigquerygoogle-workspace

Google Workspace bigquery ip_address is not always base64 correct


I've set up a Google Workspace report/log export to BigQuery as per the official guide. In the activity log, the field ip_address is BYTES and contains Base64 encoded IP address associated with the record.

I want to search all records for a specific ip address, so I do:

SELECT * FROM `project.dataset.activity` 
WHERE  ip_address IS NOT NULL 
AND  FROM_BASE64(CAST(ip_address AS STRING))=b'1.2.3.4' 
ORDER BY time_usec ASC LIMIT 10

However some rows appear to contain an invalid ip_address and I get the error: "Failed to decode invalid base64 string"

  1. Is there a saner way to decode the IP addresses in this specific case?
  2. How do I know which row caused the error?
  3. How do I work around this error?

Solution

  • This will generate same error you faced

    -- sample data
    WITH `project.dataset.activity` AS (
      SELECT '[email protected]' email, CAST(TO_BASE64(b'1.2.3.4') AS BYTES) ip_address UNION ALL
      SELECT '[email protected]' email, b'MS4yLjMuNA##' ip_address  --> invalid base64 string
    )
    SELECT FROM_BASE64(CAST(ip_address AS STRING)) FROM `project.dataset.activity`;
    
    -- query result
    ☞ Failed to decode invalid base64 string
    
    1. How do I know which row caused the error?

    You can use SAFE. prefix for this purpose. If base64 decoding fails, SAFE. prefixed function will return null as a result instead of quitting job with an error. Therefore you can identify which row has a problem from the result. In the sample, you can see that the second row has a problem.

    SELECT *, SAFE.FROM_BASE64(CAST(ip_address AS STRING)) decoded FROM `project.dataset.activity`;
    

    query result

    enter image description here


    If applied to your query,

    -- sample data
    WITH `project.dataset.activity` AS (
      SELECT '[email protected]' email, CAST(TO_BASE64(b'1.2.3.4') AS BYTES) ip_address UNION ALL
      SELECT '[email protected]' email, b'MS4yLjMuNA##' ip_address  --> invalid base64 string
    )
    -- query starts here
    SELECT * FROM `project.dataset.activity` 
     WHERE ip_address IS NOT NULL 
       AND SAFE.FROM_BASE64(CAST(ip_address AS STRING)) = b'1.2.3.4';
    
    -- query result
    +---------+------------------+
    |  email  |    ip_address    |
    +---------+------------------+
    | [email protected] | TVM0eUxqTXVOQT09 |
    +---------+------------------+