Eliminate duplicates before union

I need to run a query, that select two columns from a big table (3m+ rows, with selecting two columns, the result set is around 6-7m) and returns a list. So I used union to merge the columns into a list and also to eliminate duplicates. The problem is that I cant return the result in one query, I need to partition it, so I applied a LIMIT ?,? to the subqueries, which the application layer sets via Prepared Statements.

SELECT val
FROM 
(
    (SELECT fs.smr as val
    FROM `fr_search` as fs
    ORDER BY val LIMIT ?,?)

    UNION

    (SELECT fs.dmr as val
    FROM `fr_search` as fs
    ORDER BY val LIMIT ?,?)
) as vals
GROUP BY val

The problem: The union eliminates the duplicates, but only after the LIMIT is applied. Meaning If the two query returns 100+100=200 rows and most of them is a duplicate, I only return <200 rows. How can I apply a limit to such a query, that I can return a specific amount of rows? (If I apply the LIMIT after the subqueries, It will take more than two minutes to run, so It will not solve the problem. )

Solution

You don't actually need a subquery for this. The following will work for the first 100 rows:

 (SELECT DISTINCT fs.smr as val
  FROM `fr_search` as fs
  ORDER BY val
  LIMIT 100
 )
 UNION
 (SELECT DISTINCT fs.dmr as val
  FROM `fr_search` as fs
  ORDER BY val
  LIMIT 100
 )
 ORDER BY val
 LIMIT 100;

However, once you start putting in offset, it gets more complicated. For the next 100 rows:

 (SELECT DISTINCT fs.smr as val
  FROM `fr_search` as fs
  ORDER BY val
  LIMIT 200
 )
 UNION
 (SELECT DISTINCT fs.dmr as val
  FROM `fr_search` as fs
  ORDER BY val
  LIMIT 200
 )
 ORDER BY val
 LIMIT 100, 100;

The problem is that you don't know where the second set will come from.

If you actually need to page through the result set, I would suggest that you store it in a temporary table and page off of the temporary table.