I have a problem using a subquery in PostgreSQL to compute a calculated column value; it reports:
[21000] ERROR: more than one row returned by a subquery used as an expression
I have two tables:
Accounts table (subset of columns):

| name | postal_code | state |
|---|---|---|
| Cust One | 00020 | NULL |
| Cust Two | 63076 | CD |
Postal Code to State table:

| pc_from | pc_to | state |
|---|---|---|
| 10 | 30 | AB |
| 63000 | 63100 | CD |
The `accounts` table has rows where the state value may be unknown, but there is a postal code value. The postal code field is char (but that is incidental).
The `postal_code_to_state` table has rows with postcode (integer) from & to columns, i.e. the low/high numbers of the postal code range for a state.
There is no common field to join on. To get the state from the `postal_code_to_state` table, the char field is cast to INT and the `BETWEEN` operator is used, e.g.
SELECT state
FROM postal_code_to_state
WHERE CAST('00020' AS INT) BETWEEN pc_from AND pc_to
This works OK; there is also a unique index on `pc_from` and `pc_to`.
But I need to run a query that selects from the `accounts` table and populates the state column from the state column in `postal_code_to_state`, using the postal_code from `accounts` to select the appropriate row.
I can't figure out why PostgreSQL is complaining about the subquery returning multiple rows. This is the query I am currently using:
SELECT id,
name,
postal_code,
state,
(SELECT state
FROM postal_code_to_state
WHERE CAST(accounts.postal_code AS INT) BETWEEN pc_from AND pc_to) AS new_state
FROM accounts
WHERE postal_code IS NOT NULL ;
If I use `LIMIT 1` in the subquery it is OK, and it returns the correct state value from `postal_code_to_state`, but I would like it to work without needing to do that.
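Before resorting to `LIMIT 1`, it may help to find out which accounts actually match more than one range. A diagnostic sketch, assuming the table and column names above: count the ranges each account's postal code falls into, and any count above 1 identifies the rows that make the scalar subquery fail.

```sql
-- For each account, count how many postal code ranges match.
-- Rows with match_count > 1 are the ones that break the scalar subquery.
SELECT a.id,
       a.postal_code,
       COUNT(p.state) AS match_count
FROM accounts AS a
JOIN postal_code_to_state AS p
  ON CAST(a.postal_code AS INT) BETWEEN p.pc_from AND p.pc_to
WHERE a.postal_code IS NOT NULL
GROUP BY a.id, a.postal_code
HAVING COUNT(p.state) > 1;
```

An empty result here would mean the error comes from somewhere else; any rows returned point directly at overlapping ranges in `postal_code_to_state`.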
@Adrian - thanks for the query to find duplicates. I had to change your query a little: the `!= 'empty'` to `!= FALSE`.
When I run it on the data I get this; each group of two rows (1 & 2, 3 & 4, etc.) shows the overlapping ranges.
| state | pc_from | pc_to |
|---|---|---|
| CA | 9010 | 9134 |
| OR | 9070 | 9170 |
| UD | 33010 | 33100 |
| PN | 33070 | 33170 |
| TS | 34010 | 34149 |
| GO | 34070 | 34170 |
| CB | 86010 | 86100 |
| IS | 86070 | 86170 |
So if I run...
SELECT pc_from,
pc_to,
state
FROM postal_code_to_state
WHERE int4range(pc_from, pc_to, '[]') @> 9070;
I get...
| pc_from | pc_to | state |
|---|---|---|
| 9010 | 9134 | CA |
| 9070 | 9170 | OR |
So, from the PostgreSQL side, the problem is clear: it is the data. On the point of the data, what a site with Italian ZIP code information shows is interesting:
https://zip-codes.nonsolocap.it/cap?k=12071&b=&c=
This was one of the dupes I had already removed.
The exact same ZIP code is used in two completely different provinces (states) - go figure! Given that the ZIP code is meant to resolve down to the street level, I can't see how one code can be valid for two localities.
Try this query to find duplicates:
select
a_tbl.state, a_tbl.pc_from, a_tbl.pc_to
from
postal_code_to_state as a_tbl,
(select * from postal_code_to_state) as b_tbl
where
a_tbl.state != b_tbl.state
and
int4range(a_tbl.pc_from, a_tbl.pc_to, '[]') && int4range(b_tbl.pc_from, b_tbl.pc_to, '[]') != 'empty';
If there are duplicates, then after clearing them you can do:
alter table
postal_code_to_state
add
constraint exclude_test EXCLUDE USING GIST (int4range(pc_from, pc_to, '[]') WITH &&);
This will set up an exclusion constraint to prevent overlapping ranges.
So:
insert into postal_code_to_state values (10, 30, 'AB'), (63000, 63100, 'CD');
insert into postal_code_to_state values (25, 40, 'SK');
ERROR: conflicting key value violates exclusion constraint "exclude_test"
insert into postal_code_to_state values (31, 40, 'SK');
INSERT 0 1
select * from postal_code_to_state ;
pc_from | pc_to | state
---------+-------+-------
10 | 30 | AB
63000 | 63100 | CD
31 | 40 | SK
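With the overlaps cleaned up and the exclusion constraint in place, the correlated subquery can return at most one row, so the `LIMIT 1` workaround is no longer needed. As a sketch assuming the same schema, the lookup can also be written with range containment instead of `BETWEEN`, matching the semantics the constraint enforces:

```sql
-- Same lookup as the original query, expressed with an inclusive
-- int4range so it mirrors the exclusion constraint's definition.
SELECT a.id,
       a.name,
       a.postal_code,
       a.state,
       (SELECT p.state
        FROM postal_code_to_state AS p
        WHERE int4range(p.pc_from, p.pc_to, '[]') @> CAST(a.postal_code AS INT)) AS new_state
FROM accounts AS a
WHERE a.postal_code IS NOT NULL;
```

Because the constraint guarantees the ranges never overlap, this subquery is safe as a scalar expression.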