Search code examples
sqlgoogle-bigquerygoogle-cloud-spanner

SELECT AS STRUCT/VALUES


I am wondering what the possible use cases are for the SELECT AS STRUCT|VALUES modifier for GoogleSQL beyond the sort of textbook examples in the documentation.

The AS STRUCT reduces a non-scalar row into a scalar `STRUCT, so something like:

SELECT "David" AS name, 20 AS age

Could be converted into a subquery-able (scalar) item by doing something like:

SELECT  (
    SELECT AS STRUCT "David" AS name, 20 AS age
)

Or adding a bunch of debug info into a single (scalar) column by doing something like:

SELECT "David" AS name, 20 AS age, 
    (SELECT AS STRUCT 1 AS a, 2 AS b) debug
FROM (SELECT NULL)

But beyond that I can't see too much use, and I've never used it outside of trivial debugging queries myself. I was wondering if hopefully someone on the BigQuery team can explain:

  • What are some actual examples when you use or find these two modifier keywords being used?
  • Do any other SQL dialects use that construction (or a similar construction) and if not, why did GoogleSQL need to support it?

Solution

  • The STRUCT type is a container of ordered fields where each field has a type and name (optional). It can be used to combine multiple fields under one name.

    But beyond that I can't see too much use, and I've never used it outside of trivial debugging queries myself.

    Struct can be used to structure your data. For example:

    CREATE TEMPORARY TABLE rectangles (
      name string,
      coordinates STRUCT<
        topLeft STRUCT<x Int64, y int64>,
        bottomRight STRUCT<x Int64, y int64>
      >
    );
    
    INSERT INTO rectangles (name, coordinates) VALUES
    ('rectangle 1', ((10, 10), (90, 90)));
    

    Here is the resulting table schema and data.

    What are some actual examples when you use or find these two modifier keywords being used?

    SELECT AS STRUCT (and other struct constructors) create a struct from multiple values. This could be used to bypass some restrictions, for example, when the data must consist of 1 column x n rows:

    SELECT ARRAY(
      SELECT AS STRUCT gender, name
      FROM bigquery-public-data.usa_names.usa_1910_2013
      WHERE year = 2001 AND STate = 'WA'
      ORDER BY number
      LIMIT 5
    )
    

    Without struct, you will get the following error:

    ARRAY subquery cannot have more than one column unless
    using SELECT AS STRUCT to build STRUCT values
    

    Another example where the IN clause subquery needs to return more than one column for tuple comparison:

    SELECT *
    FROM bigquery-public-data.usa_names.usa_1910_2013
    WHERE (state, year, number) IN (
      SELECT AS STRUCT state, year, MAX(number)
      FROM bigquery-public-data.usa_names.usa_1910_2013
      WHERE year BETWEEN 2001 AND 2003
      GROUP BY state, year
    )
    

    Without struct, you will get the following error:

    Subquery of type IN must have only one output column
    

    Note that an IN clause subquery CAN return more than one column according to SQL standard (more on this below).

    SELECT AS VALUE operates on a one column select and that column must be a struct.

    While the input for AS STRUCT and AS VALUE is different, the result for both is a value table — a one column table where that column is a struct type and has no name. You may substitute SELECT AS STRUCT in the previous section with SELECT AS VALUE and make necessary changes to the columns.

    Do any other SQL dialects use that construction (or a similar construction) and if not, why did GoogleSQL need to support it?

    The answer depends on what you're trying to do.

    • You can think of struct as a user defined type (see the coordinates type in rectangles example above). Only PostgreSQL CREATE TYPE with Composite Type comes close.
    • You may use JSON functions to convert arbitrary data to a JSON object. All major RDBMS vendors support JSON functions.
    • You may use XML functions to convert arbitrary data to an XML document. SQL Server supports XML.
    • You may use Array functions to combine multiple values into one. But most RDBMS require same data type for all items in the array.
    • If it is just tuple comparison, the SQL standard allows you to compare tuples like SELECT (1, 1, 0) < (1, 1, 1) and WHERE (t1.a, t1.b, t1.c) IN (SELECT t2.a, t2.b, t2.c FROM t2 ...). Not all RDMBS implement this.