Search code examples
google-bigquerygoogle-query-language

Google BigQuery Repeated field


I currently have a table that looks like this:

Id // Key
General 
  platformId  
  platformName
Products [
    Repeated Product {
    Country
    URL
      Offers [
         Repeated Offer 
          Type
          Price
          Currency
      ]
    }
]

I need to convert it to a different format:

Record ID // Key
Country 
Providers [
  Repeated provider
  platformName
  Offers [
    Repeated Offer 
      Type
      Price
      Currency 
  ]
]

I originally flatten the table and get something like this:

id,platformId,platformName,products.product.country,products.product.offers.offer.price,products.product.offers.offer.type,products.product.offers.offer.currency
1,123,AWS,US,1.99,CPU,USD
1,123,AWS,US,1.99,HDD,USD
1,123,AWS,US,1.99,RAM,USD
2,123,AWS,CA,2.99,CPU,CAN
2,123,AWS,CA,2.99,HDD,CAN
2,123,AWS,CA,2.99,RAM,CAN
3,123,GOOG,US,3.99,CPU,GBP
3,123,GOOG,US,3.99,HDD,GBP
3,123,GOOG,US,3.99,RAM,GBP

I would like to group the following fields by country and by platform name:

1,123,AWS,US,1.99,CPU,USD
1,123,AWS,US,1.99,HDD,USD
1,123,AWS,US,1.99,RAM,USD
3,123,GOOG,US,1.99,CPU,GBP
3,123,GOOG,US,1.99,HDD,GBP
3,123,GOOG,US,1.99,RAM,GBP

The field structure should look like this:

123,US,AWS
        CPU,1.99,USD
        HDD,1.99,USD
        RAM,1.99,USD
       GOOG
        CPU,3.99,USD
        HDD,3.99,USD
        RAM,3.99,USD

Any pointers? Currently Im not able to group by the country:

+---------+---------------+--------+--------+----------+
| country | platformName  | type   | price  | currency |
+---------+---------------+--------+--------+----------+
| US      | AWS           | CPU    |   1.99 | USD      |
|         |               | HDD    |   1.99 | USD      |
|         |               | RAM    |   1.99 | USD      |
| CA      | AWS           | CPU    |   2.99 | CAN      |
|         |               | HDD    |   2.99 | CAN      |
|         |               | RAM    |   2.99 | CAN      |
| US      | GOOG          | CPU    |   3.99 | USD      |
|         |               | HDD    |   3.99 | USD      |
|         |               | RAM    |   3.99 | USD      |
--------------------------------------------------------

This is my query

SELECT    
  country,
  platformName,
  NEST(type) AS type,
  NEST(price) AS price,
  CASE         
        WHEN NEST(currency) = '' THEN NULL         
        ELSE NEST(currency) 
  END AS currency,
FROM 
  tbl
WHERE
  master_id = 123 
GROUP BY 
  platform_name,
  country

Solution

  • Below is for BigQuery Standard SQL

    #standardSQL
    SELECT product.country, general.platformName, ARRAY_AGG(offer) AS offers
    FROM data, UNNEST(products) AS product, UNNEST(offers) AS offer
    WHERE id = 123
    GROUP BY product.country, general.platformName
    

    Hope i got your schema correctly

    I keep getting: Values referenced in UNNEST must be arrays for offers.

    That's totally 100% correct. As I mentioned - I hoped I got your schema correctly.
    So the query above works for schema as below (which I thought represents what you presented in question) enter image description here

    You can test it with below dummy data:

    #standardSQL
    WITH data AS (
      SELECT 1 AS Id, 
        STRUCT<platformId INT64, platformName STRING>(123, 'name 1') AS general,
        ARRAY<STRUCT<country STRING, url STRING, offers ARRAY<STRUCT<type STRING, price FLOAT64, currentcy STRING>>>>
        [
          ('US', 'google.com', [STRUCT<type STRING, price FLOAT64, currentcy STRING>('offer 1', 1.99, 'USD'), ('offer 2', 2.99, 'USD'),('offer 3', 3.99, 'USD')]),
          ('CA', 'yahoo.com', [STRUCT<type STRING, price FLOAT64, currentcy STRING>('offer 4', 1.99, 'USD'), ('offer 5', 2.99, 'USD')]),
          ('EU', 'apple.com', [STRUCT<type STRING, price FLOAT64, currentcy STRING>('offer 6', 1.99, 'USD')])
        ] AS products UNION ALL
      SELECT 2 AS Id, 
        STRUCT<platformId INT64, platformName STRING>(123, 'name 2') AS general,
        ARRAY<STRUCT<country STRING, url STRING, offers ARRAY<STRUCT<type STRING, price FLOAT64, currentcy STRING>>>>
        [
          ('US', 'google.com', [STRUCT<type STRING, price FLOAT64, currentcy STRING>('offer 7', 1.99, 'USD'), ('offer 8', 2.99, 'USD'),('offer 9', 3.99, 'USD')]),
          ('MX', 'yahoo.com', [STRUCT<type STRING, price FLOAT64, currentcy STRING>('offer 10', 1.99, 'USD'), ('offer 11', 2.99, 'USD')]),
          ('CA', 'apple.com', [STRUCT<type STRING, price FLOAT64, currentcy STRING>('offer 12', 1.99, 'USD')])
        ] AS products 
    )
    SELECT product.country, general.platformName, ARRAY_AGG(offer) AS offers
    FROM data, UNNEST(products) AS product, UNNEST(offers) AS offer
    WHERE id = 1
    GROUP BY product.country, general.platformName
    

    which produces result as below enter image description here

    Of course if your real schema is different - you should dig a little and try to adjust above to your particular case. I hope you will do this :o)