Search code examples
sqlamazon-web-servicesdatabase-schemaamazon-redshiftmultiple-tables

AWS Redshift query multiple schema's


Is it possible to query multiple schema's within my Redshift database and combine the results?

Something like this:

SELECT DISTINCT name, SUM(usage) AS totalUsage
FROM schemaOne.terms, schemaTwo.terms

will throw an error:

Amazon Invalid operation: column reference "name" is ambiguous;

So, is this possible and would this be a good practice? I would like to create a schema for each client which will have it's own terms (tags) table and other tables.

For example

SchemaOne: terms

id,  name,  description,  usage
_______________________________
1    css    CSS is bla.   14
2    html   HTML rocks!   9

SchemaTwo: terms

id,  name,  description,  usage
_______________________________
1    css    CSS is cool.  8
2    other  x             4

Now I would like to combine the results, which could output something like:

name   totalUsage
_________________
css    22
html   9
other  4

Solution

  • First do UNION ALL in a derived table, then apply GROUP BY:

    select name, sum(usage) as totalUsage
    from
    (
        select name, usage
        from SchemaOne.terms
        union all
        select name, usage
        from SchemaTwo.terms
    ) dt
    group by name
    

    Note that some products have name and usage as reserved words, so you may need to delimit them as "name" and/or "usage".

    Perhaps better performance version:

    select name, sum(usage) as totalUsage
    from
    (
        select name, sum(usage) as usage
        from SchemaOne.terms
        group by name
        union all
        select name, sum(usage) as usage
        from SchemaTwo.terms
        group by name
    ) dt
    group by name
    

    Original answer:

    If the same column name exists in more than one table, you have to qualify that column, i.e put the table name before the column name:

    SELECT DISTINCT schemaOne.terms.name, SUM(usage) as totalUsage
    FROM schemaOne.terms, schemaTwo.terms
    

    Table aliases are convenient:

    SELECT DISTINCT s1t.name, SUM(usage) as totalUsage
    FROM schemaOne.terms s1t, schemaTwo.terms s2t