I have two tables au_postcodes and groups.
The records in groups.PostCodeFootPrint look like:
PostCodeFootPrint |
---|
2529,2530,2533,2534,2535,2536,2537,2538,2539,2540,2541,2575,2576,2577,2580 |
2640 |
3844 |
2063, 2064, 2065, 2066, 2067, 2068, 2069, 2070, 2071, 2072, 2073, 2074, 2075, 2076, 2077, 2079, 2080, 2081, 2082, 2083, 2119, 2120, 2126, 2158, 2159 |
2848, 2849, 2850, 2852 |
Some records have only one postcode, some have multiple separated by a "," or ", " (comma and space).
The records in au_postcode.poa_code look like:
poa_code |
---|
2090 |
2092 |
2093 |
829 |
830 |
836 |
2080 |
2081 |
Single postcode (always).
The objective is to:
Get all records from au_postcode, where the poa_code appears in groups.*PostCodeFootPrint into a view.
I tried:
SELECT
au_postcodes.poa_code,
groups."NameOfGroup"
FROM
groups,
au_postcodes
WHERE
groups."PostcodeFootprint" LIKE '%au_postcodes.poa_code%'
But no luck
You can use regex for this. Take a look at this fiddle:
https://dbfiddle.uk/?rdbms=postgres_14&fiddle=739592ef262231722d783670b46bd7fa
Where I form a regex from the poa_code and the word boundary (to avoid partial matches) and compare that to the PostCodeFootPrint.
select p.poa_code, g.PostCodeFootPrint
from groups g
join au_postcode p
on g.PostCodeFootPrint ~ concat('\y', p.poa_code, '\y')
Depending on your data, this may be performant enough. I also believe that in postGres you have access to the array data type, and so it might be better to store the post code lists as arrays.
https://dbfiddle.uk/?rdbms=postgres_14&fiddle=ae24683952cb2b0f3832113375fbb55b
Here I stored the post code lists as arrays, then used ANY to join with.
select p.poa_code, g.PostCodeFootPrint
from groups g
join au_postcode p
on p.poa_code = any(g.PostCodeFootPrint);
In these two fiddles I use explain to show the cost of the queries, and while the array solution is more expensive, I imagine it might be easier to maintain.
https://dbfiddle.uk/?rdbms=postgres_14&fiddle=7f16676825e10625b90eb62e8018d78e
https://dbfiddle.uk/?rdbms=postgres_14&fiddle=e96e0fc463f46a7c467421b47683f42f
I changed the underlying data type to integer in this fiddle, expecting it to reduce the cost, but it didn't, which seems strange to me.
https://dbfiddle.uk/?rdbms=postgres_14&fiddle=521d6a7d0eb4c45471263214186e537e
It is possible to reduce the query cost with the # operator (see the last query here: https://dbfiddle.uk/?rdbms=postgres_14&fiddle=edc9b07e9b22ee72f856e9234dbec4ba):
select p.poa_code, g.PostCodeFootPrint
from groups g
join au_postcode p
on (g.PostCodeFootPrint # p.poa_code) > 0;
but it is still more expensive than the regex. However, I think you can probably rearrange the way the tables are set up and radically change performance. See the first and second queries in the fiddle, where I take each post code in the footprint and insert it as a row in a table, along with an identifier for the group it was in:
select p.poa_code, g.which
from groups2 g
join au_postcode p
on g.footprint = p.poa_code;
The explain plan for this indicates that query cost drops significantly (from 60752.50 to 517.20, or two orders of magnitude) and the execution times go from 0.487 to 0.070. So it might be worth looking into changing the table structure.