Search code examples
postgresqlunique

PostgreSQL select uniques from three different columns


I have one large table 100m+ rows and two smaller ones 2m rows ea. All three tables have a column of company names that need to be sent out to an API for matching. I want to select the strings from each column and then combine into a single column of unique strings.

I'm using a version of this response, but unsurprisingly the performance is very slow. Combined 2 columns into one column SQL

SELECT DISTINCT 
    unnest(string_to_array(upper(t.buyer) || '#' || upper(a.aw_supplier_name) || '#' || upper(b.supplier_source_string), '#')) 
FROM
    tenders t,
    awards a,
    banking b
;

Any ideas on a more performant way to achieve this?

Update: the banking table is the largest table with 100m rows.


Solution

  • Assuming PostgreSQL 9.6 and borrowing the select from rd_nielsen's answer, the following should give you a comma delimited string of the distinct names.

    WITH cte
    AS (
        SELECT UPPER(T.buyer) NAMES
        FROM tenders T
    
        UNION
    
        SELECT UPPER(A.aw_supplier_name) NAMES
        FROM awards A
    
        UNION
    
        SELECT UPPER(b.supplier_source_string) NAMES
        FROM banking b
        )
    SELECT array_to_string(ARRAY_AGG(cte.names), ',')
    FROM cte