Search code examples
sqlpostgresqlaggregate-functionscorrelated-subquerylateral

Subquery that matches column with several ranges defined in table


I've got a pretty common setup for an address database: a person is tied to a company with a join table, the company can have an address and so forth.

All pretty normalized and easy to use. But for search performance, I'm creating a materialized, rather denormalized view. I only need a very limited set of information and quick queries. Most of everything that's usually done via a join table is now in an array. Depending on the query, I can either search it directly or join it via unnest.

As a complement to my zipcodes column (varchar[]), I'd like to add a states column that has the (German fedaral) states already precomputed, so that I don't have to transform a query to include all kinds of range comparisons.

My mapping date is in a table like this:

CREATE TABLE zip2state (
    state TEXT NOT NULL,
    range_start CHARACTER VARYING(5) NOT NULL,
    range_end CHARACTER VARYING(5) NOT NULL
)

Each state has several ranges, and ranges can overlap (one zip code can be for two different states). Some ranges have range_start = range_end.

Now I'm a bit at wit's end on how to get that into a materialized view all at once. Normally, I'd feel tempted to just do it iteratively (via trigger or on the application level). Or as we're just talking about 5 digits, I could create a big table mapping zip to state directly instead of doing it via a range (my current favorite, yet something ugly enough that it prompted me to ask whether there's a better way)

Any way to do that in SQL, with a table like the above (or something similar)? I'm at postgres 9.3, all features allowed...

For completeness' sake, here's the subquery for the zip codes:

    (select array_agg(distinct address.zipcode)
      from  affiliation
      join  company
        on  affiliation.ins_id = company.id
      join  address
        on  address.com_id = company.id
     where  affiliation.per_id = person.id) AS zipcodes,

Solution

  • I suggest a LATERAL join instead of the correlated subquery to conveniently compute both columns at once. Could look like this:

    SELECT p.*, z.*
    FROM   person p
    LEFT   JOIN LATERAL (
       SELECT array_agg(DISTINCT d.zipcode) AS zipcodes
            , array_agg(DISTINCT z.state)   AS states
       FROM   affiliation    a
       -- JOIN   company     c ON a.ins_id = c.id  -- suspect you don't need this
       JOIN   address        d ON d.com_id = a.ins_id  -- c.id
       LEFT   JOIN zip2state z ON d.zipcode BETWEEN z.range_start AND z.range_end
       WHERE  a.per_id = p.id
       ) z ON true;
    

    If referential integrity is guaranteed, you don't need to join to the table company at all. I took the shortcut.

    Be aware that varchar or text behaves differently than expected for numbers. For example: '333' > '0999'. If all zip codes have 5 digits you are fine.

    Related: