I'm trying to identify duplicate state_num
that are failing validation. The R is causing issues with validation, but I want to just search the first 7 characters and find the duplicate values, so that it returns the row that has an R in the string and the row that doesn't. The column is a type: char(15) But when trying to run a query it is not finding the matching 7 characters. My table only showing how it should look, its not showing what is actually being returned. It basically is just finding the state and only finding non R state_num
in results. It should be returning around 480 rows but is returning like 20k rows and not just showing the duplicates
I've tried querying a bunch of different ways but i've spen the last hour only being able to return the R row if i ad AND state_num[8] = 'R'
to the end of the query. Which defeats what I'm trying to find the duplicate first 7 characters. This is an informix db.
My Query:
SELECT id_ref, cont_ref, formatted, state_num, type, state
FROM state_form sf1
WHERE EXISTS (select cont_ref, san
FROM state_form sf2
WHERE sf1.cont_ref = sf2.cont_ref and left(sf1.state_num,7) = LEFT(sf2.state_num,7)
GROUP BY cont_ref, state_num
HAVING COUNT(state_num) > 1)
AND state = 'MT';
This is what I'd like my results to return:
id_ref | cont_ref | formatted | state_num | type | state |
---|---|---|---|---|---|
658311 | 5237 | 71-75011R | 7175011R | Y | MT |
1459 | 5237 | 71-75011 | 7175011 | I | MT |
7501 | 555678 | 99-67894 | 9967894 | I | MT |
345443 | 555678 | 99-67894R | 9967894R | Y | MT |
Here are a couple options producing the same results. This may need to be changed if you need to identify the 8th character as something such as a Letter. That is, this will also catch 12345678 and 1234567.
create table my_data (
id_ref integer,
cont_ref integer,
state_num varchar(20),
type varchar(5),
state varchar(5)
);
insert into my_data values
(1, 5237, '7175011R', 'Y', 'MT'),
(2, 5237, '7175011', 'I', 'MT'),
(3, 6789, '7878787', 'Y', 'CA'),
(4, 6789, '7878787R', 'I', 'CA'),
(5, 555678, '9967894', 'I', 'MT'),
(6, 555678, '9967894R', 'Y', 'MT'),
(7, 98765, '123456', 'I', 'MT');
Query #1
with dupes as (
select cont_ref
from my_data
where state = 'MT'
group by cont_ref, left(state_num, 7)
having count(*) > 1
)
select m.id_ref, m.cont_ref, m.state_num, m.type, m.state
from my_data m
join dupes d
on m.cont_ref = d.cont_ref;
Query #2
select m.id_ref, m.cont_ref, m.state_num, m.type, m.state
from my_data m
where m.cont_ref in (
select cont_ref
from my_data
where state = 'MT'
group by cont_ref, left(state_num, 7)
having count(*) > 1
);
id_ref | cont_ref | state_num | type | state |
---|---|---|---|---|
1 | 5237 | 7175011R | Y | MT |
2 | 5237 | 7175011 | I | MT |
5 | 555678 | 9967894 | I | MT |
6 | 555678 | 9967894R | Y | MT |
UPDATE
If Informix does not want to group by left(column, 7), then you could get the target cont_ref values using this. Here's the CTE method, but you could also do with sub-query.
with dupes as (
select cont_ref
from (
select cont_ref, left(state_num, 7) as left_seven
from my_data
where state = 'MT'
)z
group by cont_ref
having count(*) > 1
)
select m.*
from my_data m
join dupes d
on m.cont_ref = d.cont_ref;