Search code examples
sqlsqlitesql-updatesql-order-bywindow-functions

Subquery with order by random does not work on database with 500.000.000 rows


I want to perform an update query on a offline database with DB Browser for SQLite.

I tested my query on a few rows and its working perfectly there, but not with my database which has 500.000.000 rows +. It looks like the random subquery is not executed at all there and the first rows of the group by are taken.

The query:

UPDATE  
table
SET typ = 3 WHERE id IN (
  SELECT id FROM (
    SELECT * FROM table ORDER BY RANDOM()
  )  
  WHERE typ = 1 GROUP BY idg HAVING COUNT(idg) > 5
)

Sample data:

id |idg| typ
1  | 1 | 1
2  | 1 | 1
3  | 1 | 1
4  | 1 | 1
5  | 1 | 1
6  | 1 | 1
7  | 1 | 1
8  | 2 | 1
9  | 2 | 1
10 | 2 | 1
11 | 2 | 1
12 | 2 | 1
13 | 2 | 1
14 | 2 | 1
15 | 2 | 1

Is there any fix or workaround to execute my query successfully ?


Solution

  • If your version of SQLite is 3.33.0+, you can use the UPDATE ... FROM... syntax, so that you can join to the table a query that uses window function ROW_NUMBER() to check if a specific idg has more than 5 rows and returns a random id:

    WITH cte AS (
      SELECT *, ROW_NUMBER() OVER (PARTITION BY idg ORDER BY RANDOM()) rn
      FROM tablename
      WHERE typ = 1
    )
    UPDATE tablename AS t
    SET typ = 3 
    FROM cte AS c
    WHERE t.id = c.id AND c.rn = 6; -- rn = 6 makes sure that there are at least 6 rows
    

    See the demo.

    For SQLite 3.25.0+ use the operator IN with ROW_NUMBER() window function:

    UPDATE tablename
    SET typ = 3 
    WHERE id IN (
      SELECT id 
      FROM (
        SELECT *, ROW_NUMBER() OVER (PARTITION BY idg ORDER BY RANDOM()) rn
        FROM tablename
        WHERE typ = 1
      )
      WHERE rn = 6 -- rn = 6 makes sure that there are at least 6 rows 
    );
    

    See the demo.