How can I group results where string has the same 6 characters from a total of 7 in MySQL/MariaDB?

I have a table that consists of vehicle plates, where all the plates have always 7 characters. Sometimes a plate was miswritten and so my SELECT query must be able to consider that "AAU1234" and "AAV1234" must be the same vehicle. I don't want to create rules between specific characters, like "U" and "V" or "I" and "1", but a rule able to group plates where 6 of the 7 characters are equal and in the same position.

eg.:

In this case, ids 1, 2 and 5 should appear only once.

It would be great if, when grouped, all the grouped plates were informed, concatened in another column.

More information: This is a big table and currently identical plates are grouped when inside the same date-time range of 15 minutes.

My query looks like this:

select plate, floor(unix_timestamp(date)/(15 * 60)) as timekey 
from table 
group by plate, timekey 
order by date desc

Following the image example above, my goal is to group ids 1, 2 and 5 in the same row, since id 1 and id 2 matches 6 of 7 strings and so id 1 and id 5.

The result could be something like:

Or:

The final date information is not important here, the most important thing here is to group the similar plates.

Update1 Questions and answers on "staging ground":

RafaelZG
This table receives data from a plate recognition camera software and I have a front end that shows all the entries, grouping identical plates in the same time range, because a vehicle can have 1, 2 or 10 photos. The problem is that sometimes the software can't read a characater correctly and it generates separate entries in my front end, so my goal is to reduce this problem as much as possible, grouping these plates where only 1 character was wrongly registered. This rule doesn't need to solve 100% of the wrong entries.

Shadow
Row with id=5 differs by only one character from id=1, but differs by 2 characters from row with id=2. So, these rows would only be grouped together if the starting row is id=1. How do you determine which row is the starting one from grouping perspective? Btw, you should look at levenshtein distance, but this question is a way more complicated than you think and mysql may not be the best tool to accomplish your task!

RafaelZG
@shadow Great point! Things start to get even more complicated, but since ids 1 and 2 are grouped, and id 5 has 6 char matches with id 1, it should be also added to the group. The limiting point should be a date range, so the query would need run the entire table infinitely.

RafaelZG
I'll try to answer again more objectively: if id 1 and id 2 are grouped and id 5 is 6-char similar to id 1, then all the 3 should be grouped. eg: ABC1234, ABG1234, ABO1234, ABOI234, ABOI284 should be all grouped, since each of them is 6-char-similar to at least another one.

Solution

See example with test data. The data is somewhat expanded compared to the examples in the question.

create table test (id int, plate varchar(7));
insert into test values
 (1,'AAU1234')
,(2,'AAV1234')
,(3,'BKP5678')
,(4,'CMD9081')
,(5,'A4U1234')
,(6,'ABC1234')
,(7,'ABG1234')
,(8,'ABO1234')
,(9,'ABOI234')
,(10,'ABOI284')
,(11,'ABGI234')
,(12,'ABGI284')
,(14,'CMD9031')
;

First, let's simple compare plate's and find the rows, where 6 characters match. In JOIN condition and (t1.s1=t2.s1 or t1.s2=t2.s2) - at least 1 must match.
Wait for matchn=6

with recursive
tn as(
  select *,substring(plate,1,1) s1,substring(plate,2,1) s2,substring(plate,3,1) s3
    ,substring(plate,4,1) s4,substring(plate,5,1) s5,substring(plate,6,1) s6,substring(plate,7,1) s7
  from test
)
,cmp as(
select t1.id,t1.plate,t2.id id2,t2.plate plate2
select t1.id,t1.plate,t2.id id2,t2.plate plate2
  ,case when (t1.s1=t2.s1) then 1 else 0 end +case when (t1.s2=t2.s2) then 1 else 0 end
  +case when (t1.s3=t2.s3) then 1 else 0 end +case when (t1.s4=t2.s4) then 1 else 0 end
  +case when (t1.s5=t2.s5) then 1 else 0 end +case when (t1.s6=t2.s6) then 1 else 0 end
  +case when (t1.s7=t2.s7) then 1 else 0 end matchn
from tn t1
left join tn t2 on t1.id<>t2.id and (t1.s1=t2.s1 or t1.s2=t2.s2)
)

Output (part of result)

id	plate	id2	plate2	matchn
1	AAU1234	2	AAV1234	6
1	AAU1234	5	A4U1234	6
4	CMD9081	14	CMD9031	6
6	ABC1234	7	ABG1234	6
6	ABC1234	8	ABO1234	6
7	ABG1234	8	ABO1234	6
7	ABG1234	11	ABGI234	6
8	ABO1234	9	ABOI234	6
9	ABOI234	10	ABOI284	6
9	ABOI234	11	ABGI234	6
10	ABOI284	12	ABGI284	6
11	ABGI234	12	ABGI284	6

This table can be considered as a description of a graph.
The resulting graph is not oriented because we have set the condition (t1.id <> t2.id ).

       (6)ABC1234                       (1)AAU1234
        /       \                          /  \
      C->G      C->O                     A->4 U->V
      /           \                      /      \
(7)ABG1234--G->O--(8)ABO1234            /   (2)AAV1234   
                         \         (5)A4U1234
                         1->I
                           \
                      (9)ABOI234                  (4)CMD9081
                          /    \                          |
                        O->G   3->8                      8->3
                        /        \                        |
                       / (10)ABOI284             (14)CMD9031
                      /         \
                     /          O->G
                    /              \ 
             (11)ABGI234-3->8-(12)ABGI284

Then we recursively traverse the directed graph to find all nodes starting from the vertex of the graph.

,r as(
  select distinct 0 lvl,id,plate p0,plate,id id2,plate plate2
    ,cast('' as char(1000)) as path
  from cmp
  where  matchn=6 

  union all

  select lvl+1,r.id,p0,r.plate2,t.id2 id2,t.plate2
    ,concat(r.path,',',r.plate) as path
  from r inner join cmp t on t.plate=r.plate2 -- find_in_set(t.plate,r.list)>0
  where t.matchn=6 
    and  find_in_set(t.plate,r.path)=0 
    -- and lvl<9 -- for debug only
)
,d as(
  select distinct id,p0,id2,plate2
  from r
)
,chains as(
  select min(id2) groupid,id,p0 plate
     ,group_concat(id2 order by id2) ids
     ,group_concat(plate2 order by id2) plates
  from d
  group by id,p0
)

Chains output

groupid	id	plate	ids	plates
1	1	AAU1234	1,2,5	AAU1234,AAV1234,A4U1234
1	2	AAV1234	1,2,5	AAU1234,AAV1234,A4U1234
1	5	A4U1234	1,2,5	AAU1234,AAV1234,A4U1234
4	4	CMD9081	4,14	CMD9081,CMD9031
4	14	CMD9031	4,14	CMD9081,CMD9031
6	6	ABC1234	6,7,8,9,10,11,12	ABC1234,ABG1234,ABO1234,ABOI234,ABOI284,ABGI234,ABGI284
6	7	ABG1234	6,7,8,9,10,11,12	ABC1234,ABG1234,ABO1234,ABOI234,ABOI284,ABGI234,ABGI284
6	8	ABO1234	6,7,8,9,10,11,12	ABC1234,ABG1234,ABO1234,ABOI234,ABOI284,ABGI234,ABGI284
6	9	ABOI234	6,7,8,9,10,11,12	ABC1234,ABG1234,ABO1234,ABOI234,ABOI284,ABGI234,ABGI284
6	10	ABOI284	6,7,8,9,10,11,12	ABC1234,ABG1234,ABO1234,ABOI234,ABOI284,ABGI234,ABGI284
6	11	ABGI234	6,7,8,9,10,11,12	ABC1234,ABG1234,ABO1234,ABOI234,ABOI284,ABGI234,ABGI284
6	12	ABGI284	6,7,8,9,10,11,12	ABC1234,ABG1234,ABO1234,ABOI234,ABOI284,ABGI234,ABGI284

Now we can merge the rows or just assign a group to each row.

,newGr as(
select t.*,coalesce(c.groupid,t.id) groupid
  ,c.ids,c.plates 
from test t
left join chains c on t.id=c.id
)
--   select * from cmp order by id,id2;
--   select * from r order by id,lvl,id2;
--  select * from d order by id,id2;
--  select * from chains order by groupid;
--  select * from newGr order by id;
select groupid,min(id) id -- ,grPlate
  ,group_concat(id) ids
  ,group_concat(plate) plates
from newGr
group by groupid

Output

id	grPlate	ids	plates
1	AAU1234	1,2,5	AAU1234,AAV1234,A4U1234
6	ABC1234	6,7,8,9,10,11,12	ABC1234,ABG1234,ABO1234,ABOI234,ABOI284,ABGI234,ABGI284
3	BKP5678	3	BKP5678
4	CMD9081	4,14	CMD9081,CMD9031

New groups for rows

id	plate	grPlate
1	AAU1234	AAU1234
2	AAV1234	AAU1234
3	BKP5678	BKP5678
4	CMD9081	CMD9081
5	A4U1234	AAU1234
6	ABC1234	ABC1234
7	ABG1234	ABC1234
8	ABO1234	ABC1234
9	ABOI234	ABC1234
10	ABOI284	ABC1234
11	ABGI234	ABC1234
12	ABGI284	ABC1234
14	CMD9031	CMD9081

Demo
and old Demo