Search code examples
sqlpostgresqlpostgis

Creating a view that contains all records from one table, that match the comma separated field content in another table


I have two tables au_postcodes and groups.

  1. Table groups contains a field called PostCodeFootPrint that contains the postcode set making up the footprint.
  2. Table au_postcodes contains a field called poa_code that contains a single postcode.

The records in groups.PostCodeFootPrint look like:

PostCodeFootPrint
2529,2530,2533,2534,2535,2536,2537,2538,2539,2540,2541,2575,2576,2577,2580
2640
3844
2063, 2064, 2065, 2066, 2067, 2068, 2069, 2070, 2071, 2072, 2073, 2074, 2075, 2076, 2077, 2079, 2080, 2081, 2082, 2083, 2119, 2120, 2126, 2158, 2159
2848, 2849, 2850, 2852

Some records have only one postcode, some have multiple separated by a "," or ", " (comma and space).

The records in au_postcode.poa_code look like:

poa_code
2090
2092
2093
829
830
836
2080
2081

Single postcode (always).

The objective is to:

Get all records from au_postcode, where the poa_code appears in groups.*PostCodeFootPrint into a view.

I tried:

SELECT
    au_postcodes.poa_code, 
    groups."NameOfGroup"
FROM
    groups,
    au_postcodes
WHERE
    groups."PostcodeFootprint" LIKE '%au_postcodes.poa_code%'

But no luck


Solution

  • You can use regex for this. Take a look at this fiddle:

    https://dbfiddle.uk/?rdbms=postgres_14&fiddle=739592ef262231722d783670b46bd7fa

    Where I form a regex from the poa_code and the word boundary (to avoid partial matches) and compare that to the PostCodeFootPrint.

    select p.poa_code, g.PostCodeFootPrint
    from groups g
    join au_postcode p 
    on g.PostCodeFootPrint ~ concat('\y', p.poa_code, '\y')
    

    Depending on your data, this may be performant enough. I also believe that in postGres you have access to the array data type, and so it might be better to store the post code lists as arrays.

    https://dbfiddle.uk/?rdbms=postgres_14&fiddle=ae24683952cb2b0f3832113375fbb55b

    Here I stored the post code lists as arrays, then used ANY to join with.

    select p.poa_code, g.PostCodeFootPrint
    from groups g
    join au_postcode p 
    on p.poa_code = any(g.PostCodeFootPrint);
    

    In these two fiddles I use explain to show the cost of the queries, and while the array solution is more expensive, I imagine it might be easier to maintain.

    https://dbfiddle.uk/?rdbms=postgres_14&fiddle=7f16676825e10625b90eb62e8018d78e

    https://dbfiddle.uk/?rdbms=postgres_14&fiddle=e96e0fc463f46a7c467421b47683f42f

    I changed the underlying data type to integer in this fiddle, expecting it to reduce the cost, but it didn't, which seems strange to me.

    https://dbfiddle.uk/?rdbms=postgres_14&fiddle=521d6a7d0eb4c45471263214186e537e

    It is possible to reduce the query cost with the # operator (see the last query here: https://dbfiddle.uk/?rdbms=postgres_14&fiddle=edc9b07e9b22ee72f856e9234dbec4ba):

    select p.poa_code, g.PostCodeFootPrint
    from groups g
    join au_postcode p 
    on (g.PostCodeFootPrint # p.poa_code) > 0;
    

    but it is still more expensive than the regex. However, I think you can probably rearrange the way the tables are set up and radically change performance. See the first and second queries in the fiddle, where I take each post code in the footprint and insert it as a row in a table, along with an identifier for the group it was in:

    select p.poa_code, g.which
    from groups2 g
    join au_postcode p 
    on g.footprint = p.poa_code;
    

    The explain plan for this indicates that query cost drops significantly (from 60752.50 to 517.20, or two orders of magnitude) and the execution times go from 0.487 to 0.070. So it might be worth looking into changing the table structure.