Search code examples
sqlpostgresqlpgadmin-4

Count(*) returns different results at each execution


Please note: this question is about PostgreSQL and pgAdmin4, so it is not a duplicate of Using COUNT returns different results, which is about MySQL.


I have a ~4,000,000 records table. The following SQL query:

SELECT COUNT(*) FROM (SELECT * FROM log LIMIT 40000)a WHERE some_column = true

returns a different value each time I execute it. What is the problem and how could I fix it?


Solution

  • This is your query:

    SELECT COUNT(*)
    FROM (SELECT * FROM log LIMIT 40000)a
    WHERE some_column = true
    

    The subquery is returning an arbitrary set of 40,000 rows. This set is different each time you execute the query. If you want a canonical set, you need ORDER BY with unique sorting keys.

    You are filtering afterwards, so the number that match your condition is arbitrary.

    If you want an arbitrary set of 40,000 rows where the condition is true, simply do:

    SELECT COUNT(*)
    FROM (SELECT l.*
          FROM log
          WHERE some_column = true
          LIMIT 40000
         ) ;
    

    If you want a random set of 40,000 rows, you can do:

    SELECT COUNT(*)
    FROM (SELECT l.*
          FROM log
          WHERE some_column = true
          ORDER BY random() 
          LIMIT 40000
         ) ;
    

    Arbitrary <> random. (Note: this is expensive; there are cheaper ways to get a random set.).

    If you want a repeatable sample, well, there are various methods to do that. One is to sort by something like a unique id and take the first 40,000 rows:

    SELECT COUNT(*)
    FROM (SELECT l.*
          FROM log l
          WHERE some_column = true
          ORDER BY primary_key
          LIMIT 40000
         ) ;
    

    This is not a random sample, just a repeatable sample.