I have a problem using a subquery in PostgreSQL to compute a calculated column value; it reports:
[21000] ERROR: more than one row returned by a subquery used as an expression
I have two tables:
Accounts table (subset of columns):

| name | postal_code | state |
|---|---|---|
| Cust One | 00020 | NULL |
| Cust Two | 63076 | CD |
Postal Code to State table:

| pc_from | pc_to | state |
|---|---|---|
| 10 | 30 | AB |
| 63000 | 63100 | CD |
The `accounts` table has rows where the state value may be unknown, but there is a postal code value. The postal code field is char (but that is incidental).
The `postal_code_to_state` table has rows with postcode (integer) from & to columns, i.e. the low/high numbers of the postal code range for a state.
There is no common field to join on. To get the state from the `postal_code_to_state` table, the char field is cast to INT and the `BETWEEN` operator is used, e.g.
SELECT state
FROM postal_code_to_state
WHERE CAST('00020' AS INT) BETWEEN pc_from AND pc_to
This works OK; there is also a unique index on `pc_from` and `pc_to`.
But I need to run a query that selects from the `accounts` table and populates the state column from the state column in `postal_code_to_state`, using the postal_code from `accounts` to select the appropriate row.
I can't figure out why PostgreSQL is complaining about the subquery returning multiple rows. This is the query I am currently using:
SELECT id,
name,
postal_code,
state,
(SELECT state
FROM postal_code_to_state
WHERE CAST(accounts.postal_code AS INT) BETWEEN pc_from AND pc_to) AS new_state
FROM accounts
WHERE postal_code IS NOT NULL ;
If I use `LIMIT 1` in the subquery it is OK, and it returns the correct state value from `postal_code_to_state`, but I would like it to work without needing to do that.
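Before resorting to `LIMIT 1`, it may help to find out which accounts actually match more than one range. A diagnostic sketch, assuming the table and column names above: count the ranges each account's postal code falls into, and any count above 1 identifies the rows that make the scalar subquery fail.

```sql
-- For each account, count how many postal code ranges match.
-- Rows with match_count > 1 are the ones that break the scalar subquery.
SELECT a.id,
       a.postal_code,
       COUNT(p.state) AS match_count
FROM accounts AS a
JOIN postal_code_to_state AS p
  ON CAST(a.postal_code AS INT) BETWEEN p.pc_from AND p.pc_to
WHERE a.postal_code IS NOT NULL
GROUP BY a.id, a.postal_code
HAVING COUNT(p.state) > 1;
```

An empty result here would mean the error comes from somewhere else; any rows returned point directly at overlapping ranges in `postal_code_to_state`.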
@Adrian - thanks for the query to find duplicates. I had to change your query a little: the `!= 'empty'` to `!= FALSE`.
When I run it on the data I get this; each group of two rows (1 & 2, 3 & 4, etc.) shows the overlapping ranges.
| state | pc_from | pc_to |
|---|---|---|
| CA | 9010 | 9134 |
| OR | 9070 | 9170 |
| UD | 33010 | 33100 |
| PN | 33070 | 33170 |
| TS | 34010 | 34149 |
| GO | 34070 | 34170 |
| CB | 86010 | 86100 |
| IS | 86070 | 86170 |
So if I run...
SELECT pc_from,
pc_to,
state
FROM postal_code_to_state
WHERE int4range(pc_from, pc_to, '[]') @> 9070;
I get...
| pc_from | pc_to | state |
|---|---|---|
| 9010 | 9134 | CA |
| 9070 | 9170 | OR |
So, from the PostgreSQL side, the problem is clear: it is the data. On the point of the data, what a site with Italian ZIP code information shows is interesting:
https://zip-codes.nonsolocap.it/cap?k=12071&b=&c=
This was one of the dupes I had already removed.
The exact same ZIP code is used in two completely different provinces (states) - go figure! Given that the ZIP code is meant to resolve down to the street level, I can't see how one code can be valid for two localities.
Try this query to find duplicates:
select
a_tbl.state, a_tbl.pc_from, a_tbl.pc_to
from
postal_code_to_state as a_tbl,
(select * from postal_code_to_state) as b_tbl
where
a_tbl.state != b_tbl.state
and
int4range(a_tbl.pc_from, a_tbl.pc_to, '[]') && int4range(b_tbl.pc_from, b_tbl.pc_to, '[]') != 'empty';
If there are duplicates, then after clearing them you can do:
alter table
postal_code_to_state
add
constraint exclude_test EXCLUDE USING GIST (int4range(pc_from, pc_to, '[]') WITH &&);
This will set up an exclusion constraint to prevent overlapping ranges.
So:
insert into postal_code_to_state values (10, 30, 'AB'), (63000, 63100, 'CD');
insert into postal_code_to_state values (25, 40, 'SK');
ERROR: conflicting key value violates exclusion constraint "exclude_test"
insert into postal_code_to_state values (31, 40, 'SK');
INSERT 0 1
select * from postal_code_to_state ;
pc_from | pc_to | state
---------+-------+-------
10 | 30 | AB
63000 | 63100 | CD
31 | 40 | SK
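With the overlaps cleaned up and the exclusion constraint in place, the correlated subquery can return at most one row, so the `LIMIT 1` workaround is no longer needed. As a sketch assuming the same schema, the lookup can also be written with range containment instead of `BETWEEN`, matching the semantics the constraint enforces:

```sql
-- Same lookup as the original query, expressed with an inclusive
-- int4range so it mirrors the exclusion constraint's definition.
SELECT a.id,
       a.name,
       a.postal_code,
       a.state,
       (SELECT p.state
        FROM postal_code_to_state AS p
        WHERE int4range(p.pc_from, p.pc_to, '[]') @> CAST(a.postal_code AS INT)) AS new_state
FROM accounts AS a
WHERE a.postal_code IS NOT NULL;
```

Because the constraint guarantees the ranges never overlap, this subquery is safe as a scalar expression.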