Search code examples
mysqlsqlselectmariadbrecursive-query

MySQL - select distinct value from two column


I have a table with the following structure:

IdM|IdS
-------
 1 | 2
 1 | 3
 1 | 4
 2 | 1
 2 | 3
 2 | 4
 3 | 1
 3 | 2
 3 | 3
 3 | 4

How could I make a select statement on this table, which will return some rows of this table, where in each row, a specific id appears only one, indifferent on which column it is specified?

For the above result set, I would like a query that would return:

-------
 1 | 2
 3 | 4
-------

To give another example, if you would omit the first row in the original dataset:

IdM|IdS
-------
 1 | 3
 1 | 4
 2 | 1
 2 | 3
 2 | 4
 3 | 1
 3 | 2
 3 | 3
 3 | 4

the result set should be:

-------
 1 | 3
 2 | 4
-------

Solution

  • That's actually an interesting problem. If I follow you correctly, you want to iterate through the dataset and only retain rows where both values were never seen before. You could use a recursive query:

    with recursive 
        data as (
            select idm, ids, row_number() over(order by idm, ids) rn 
            from mytable 
            where idm <> ids
        ),
        cte as (
            select idm, ids, rn, 1 as to_keep , concat(idm, ',', ids) visited from data where rn = 1
            union all
            select d.idm, d.ids, d.rn,
                (not find_in_set(d.idm, c.visited) and not find_in_set(d.ids, c.visited)),
                case when (not find_in_set(d.idm, c.visited) and not find_in_set(d.ids, c.visited))
                    then concat_ws(',', c.visited, d.idm, d.ids)
                    else c.visited
                end
            from cte c
            inner join data d on d.rn = c.rn + 1
        )
    select idm, ids from cte where to_keep
    

    The first CTE enumerates the rows ordered by both columns. Then the recursive query walks the resultset, checks if both values are new, and sets a flag accordingly of the columns. Flagged numbers are retained to be used for filtering in the following iteration.

    Demo on DB Fiddle

    Note that, given your requirement, not all values may appear in the resultset. Consider the following dataset:

     idm   ids
    +-----+---
     1     2
     1     3
     1     4
    

    Your logic will only return the first row.