Search code examples
sqlpostgresqlpivotdynamic-sqlcrosstab

How to rotate a two-column table?


This might be a novice question – I'm still learning. I'm on PostgreSQL 9.6 with the following query:

SELECT locales, count(locales) FROM (
    SELECT lower((regexp_matches(locale, '([a-z]{2,3}(-[a-z]{2,3})?)', 'i'))[1]) 
    AS locales FROM users) 
AS _ GROUP BY locales

My query returns the following dynamic rows:

locales count
en 10
fr 7
de 3
n additional locales (~300)... n-count

I'm trying to rotate it so that locale values end up as columns with a single row, like this:

en fr de n additional locales (~300)...
10 7 3 n-count

I'm having to do this to play nice with a time-series db/app

I've tried using crosstab(), but all the examples show better defined tables with 3 or more columns.

I've looked at examples using join, but I can't figure out how to do it dynamically.


Solution

  • Base query

    In Postgres 10 or later you could use the simpler and faster regexp_match() instead of regexp_matches(). (Since you only take the first match per row anyway.) But don't bother and use the even simpler substring() instead:

    SELECT lower(substring(locale, '(?i)[a-z]{2,3}(?:-[a-z]{2,3})?')) AS locale
         , count(*)::int AS ct
    FROM   users
    WHERE  locale ~* '[a-z]{2,3}'  -- eliminate NULL, allow index support
    GROUP  BY 1
    ORDER  BY 2 DESC, 1
    

    Simpler and faster than your original base query.

    About those ordinal numbers in GROUP BY and ORDER BY:

    Subtle difference: regexp_matches() returns no row for no match, while substring() returns null. I added a WHERE clause to eliminate non-matches a-priori - and allow index support if applicable, but I don't expect indexes to help here.

    Note the prefixed (?i), that's a so-called "embedded option" to use case-insensitive matching.

    Added a deterministic ORDER BY clause. You'd need that for a simple crosstab().

    Aside: you might need _ in the pattern instead of - for locales like "en_US".

    Pivot

    Try as you might, SQL does not allow dynamic result columns in a single query. You need two round trips to the server. See;

    You can use a dynamically generated crosstab() query. Basics:

    Dynamic query:

    But since you generate a single row of plain integer values, I suggest a simple approach:

    SELECT 'SELECT ' || string_agg(ct || ' AS ' || quote_ident(locale), ', ')
    FROM  (
       SELECT lower(substring(locale, '(?i)[a-z]{2,3}(?:-[a-z]{2,3})?')) AS locale
            , count(*)::int AS ct
       FROM   users
       WHERE  locale ~* '[a-z]{2,3}'
       GROUP  BY 1
       ORDER  BY 2 DESC, 1
       ) t
    

    Generates a query of the form:

    SELECT 10 AS en, 7 AS fr, 3 AS de, 3 AS "de-at"
    

    Execute it to produce your desired result.

    In psql you can append \gexec to the generating query to feed the generated SQL string back to the server immediately. See: