I have a Cloud Spanner query that is the UNION ALL
of potentially many subqueries, which are all identical except for a few parameters that vary with each subquery:
(SELECT t.FirstName, t.BirthDate FROM Singers AS t WHERE SingerId=2 AND t.LastName="Trentor")
UNION ALL
(SELECT t.FirstName, t.BirthDate FROM Singers AS t WHERE SingerId=3 AND t.LastName="Smith")
UNION ALL
...
It is sometimes taking several seconds for this query to execute. Is there something I can do to make it faster?
It might help for you to parameterize your query. This will also make your query more concise. Spanner has a query cache that stores the shapes of recent queries so that it can use the same execution plan if it sees another query of similar shape. A UNION ALL of 20 subqueries has a different shape than the UNION ALL of 21 subqueries, so this may be thwarting the cache. It's a bit tricky to pass in an array of structs (i.e. n-tuples) as query parameters, but here's an approach that works:
# bind seq1 to [2,3] in your query parameters
# bind seq2 to ["Trentor","Smith"] in your query parameters
SELECT Singers.FirstName,Singers.BirthDate FROM
(SELECT *
FROM
(SELECT id, x_1 FROM UNNEST(@seq1) AS id WITH OFFSET AS x_1)
JOIN
(SELECT name, x_2 FROM UNNEST(@seq2) AS name WITH OFFSET AS x_2)
ON x_1 = x_2) AS params
JOIN Singers
ON params.id=Singers.SingerId AND params.name=Singers.LastName
The inner query that produces params
is zipping seq1 and seq2 together, producing a table containing correlated entries from seq1 and seq2.