I've set up a Google Workspace report/log export to BigQuery as per the official guide. In the activity log, the field ip_address is BYTES and contains Base64 encoded IP address associated with the record.
I want to search all records for a specific ip address, so I do:
SELECT * FROM `project.dataset.activity`
WHERE ip_address IS NOT NULL
AND FROM_BASE64(CAST(ip_address AS STRING))=b'1.2.3.4'
ORDER BY time_usec ASC LIMIT 10
However some rows appear to contain an invalid ip_address and I get the error: "Failed to decode invalid base64 string"
This will generate same error you faced
-- sample data
WITH `project.dataset.activity` AS (
SELECT 'a@b.com' email, CAST(TO_BASE64(b'1.2.3.4') AS BYTES) ip_address UNION ALL
SELECT 'b@c.com' email, b'MS4yLjMuNA##' ip_address --> invalid base64 string
)
SELECT FROM_BASE64(CAST(ip_address AS STRING)) FROM `project.dataset.activity`;
-- query result
☞ Failed to decode invalid base64 string
- How do I know which row caused the error?
You can use SAFE. prefix for this purpose. If base64 decoding fails, SAFE.
prefixed function will return null
as a result instead of quitting job with an error. Therefore you can identify which row has a problem from the result. In the sample, you can see that the second row has a problem.
SELECT *, SAFE.FROM_BASE64(CAST(ip_address AS STRING)) decoded FROM `project.dataset.activity`;
query result
If applied to your query,
-- sample data
WITH `project.dataset.activity` AS (
SELECT 'a@b.com' email, CAST(TO_BASE64(b'1.2.3.4') AS BYTES) ip_address UNION ALL
SELECT 'b@c.com' email, b'MS4yLjMuNA##' ip_address --> invalid base64 string
)
-- query starts here
SELECT * FROM `project.dataset.activity`
WHERE ip_address IS NOT NULL
AND SAFE.FROM_BASE64(CAST(ip_address AS STRING)) = b'1.2.3.4';
-- query result
+---------+------------------+
| email | ip_address |
+---------+------------------+
| a@b.com | TVM0eUxqTXVOQT09 |
+---------+------------------+