Search code examples
sqlinformix

Find matching first 7 chars to identify duplicates


I'm trying to identify duplicate state_num that are failing validation. The R is causing issues with validation, but I want to just search the first 7 characters and find the duplicate values, so that it returns the row that has an R in the string and the row that doesn't. The column is a type: char(15) But when trying to run a query it is not finding the matching 7 characters. My table only showing how it should look, its not showing what is actually being returned. It basically is just finding the state and only finding non R state_num in results. It should be returning around 480 rows but is returning like 20k rows and not just showing the duplicates

I've tried querying a bunch of different ways but i've spen the last hour only being able to return the R row if i ad AND state_num[8] = 'R' to the end of the query. Which defeats what I'm trying to find the duplicate first 7 characters. This is an informix db.

My Query:

SELECT id_ref, cont_ref, formatted, state_num, type, state
FROM state_form sf1
WHERE EXISTS (select cont_ref, san
  FROM state_form sf2
  WHERE sf1.cont_ref = sf2.cont_ref and left(sf1.state_num,7) = LEFT(sf2.state_num,7)
 GROUP BY cont_ref, state_num
 HAVING COUNT(state_num) > 1)
 AND state = 'MT';

This is what I'd like my results to return:

id_ref cont_ref formatted state_num type state
658311 5237 71-75011R 7175011R Y MT
1459 5237 71-75011 7175011 I MT
7501 555678 99-67894 9967894 I MT
345443 555678 99-67894R 9967894R Y MT

Solution

  • Here are a couple options producing the same results. This may need to be changed if you need to identify the 8th character as something such as a Letter. That is, this will also catch 12345678 and 1234567.

    create table my_data (
      id_ref integer, 
      cont_ref integer, 
      state_num varchar(20), 
      type varchar(5), 
      state varchar(5)
      );
      
     insert into my_data values 
     (1, 5237, '7175011R', 'Y', 'MT'),
     (2, 5237, '7175011', 'I', 'MT'),
     (3, 6789, '7878787', 'Y', 'CA'),
     (4, 6789, '7878787R', 'I', 'CA'),
     (5, 555678, '9967894', 'I', 'MT'),
     (6, 555678, '9967894R', 'Y', 'MT'),
     (7, 98765, '123456', 'I', 'MT');
    

    Query #1

    with dupes as (
      select cont_ref
      from my_data
      where state = 'MT'
      group by cont_ref, left(state_num, 7)
      having count(*) > 1
      )
    select m.id_ref, m.cont_ref, m.state_num, m.type, m.state
    from my_data m
    join dupes d
      on m.cont_ref = d.cont_ref;
    

    Query #2

    select m.id_ref, m.cont_ref, m.state_num, m.type, m.state
    from my_data m
    where m.cont_ref in (
      select cont_ref
      from my_data
      where state = 'MT'
      group by cont_ref, left(state_num, 7)
      having count(*) > 1
      );
    
    id_ref cont_ref state_num type state
    1 5237 7175011R Y MT
    2 5237 7175011 I MT
    5 555678 9967894 I MT
    6 555678 9967894R Y MT

    View on DB Fiddle

    UPDATE

    If Informix does not want to group by left(column, 7), then you could get the target cont_ref values using this. Here's the CTE method, but you could also do with sub-query.

    with dupes as (
     select cont_ref
     from (
       select cont_ref, left(state_num, 7) as left_seven
       from my_data
       where state = 'MT'
       )z
     group by cont_ref
     having count(*) > 1
    )
    select m.*
    from my_data m
    join dupes d
      on m.cont_ref = d.cont_ref;