Search code examples
sqlpostgresqlquery-optimization

How to optimize query which count last rows up to a certain condition?


I have table test: create table test

(
    id         serial
        primary key,
    status_id  integer,
    created_at timestamp with time zone default now() not null,
    client_id  integer
);

and I need to count all last rows with status_id != 3 sorted by created_at desc. Such table:

id, status_id, created_at, client_id
(1, 3, '2023-04-23 19:34:08.454000 +00:00', 2),
(2, 3, '2023-04-23 19:34:07.493000 +00:00', 2),
(3, 3, '2023-04-23 19:34:06.459000 +00:00', 2),
(4, 2, '2023-04-23 19:34:05.838000 +00:00', 2),
(5, 3, '2023-04-23 19:34:04.782000 +00:00', 2),
(6, 3, '2023-04-23 19:34:03.768000 +00:00', 2),
(7, 2, '2023-04-23 19:34:02.768000 +00:00', 2),
(8, 3, '2023-03-23 19:34:02.768000 +00:00', 1),
(9, 2, '2023-02-23 19:34:02.768000 +00:00', 1),
(10, 3, '2022-04-23 19:34:02.768000 +00:00', 1);

result for client with id 1 must be 3 and for client with id 2 will be 1:

client_id, count
    2,3
    1,1

I have this query:

select count(*), i.client_id
from public.test i
where i.created_at > (select x.created_at
                      from public.test x
                      where x.status_id != 3
                        and x.client_id = i.client_id
                      order by x.created_at desc
                      limit 1)
group by i.client_id;

its works but very slow. tried this index:

create index test_status_id_client_id_created_at_index
on public.test (status_id, client_id, created_at);

but index does not help.

Is there a way to optimize and speed up this query, or will it be necessary to make some kind of application that will export this data in a ready-made form in small time intervals?

explain:

GroupAggregate  (cost=12.80..12.85 rows=3 width=12) (actual time=0.149..0.152 rows=2 loops=1)
  Group Key: i.client_id
  ->  Sort  (cost=12.80..12.81 rows=3 width=4) (actual time=0.144..0.145 rows=4 loops=1)
        Sort Key: i.client_id
        Sort Method: quicksort  Memory: 25kB
        ->  Seq Scan on test i  (cost=0.00..12.77 rows=3 width=4) (actual time=0.054..0.091 rows=4 loops=1)
              Filter: (created_at > (SubPlan 1))
              Rows Removed by Filter: 6
              SubPlan 1
                ->  Limit  (cost=1.16..1.16 rows=1 width=8) (actual time=0.007..0.007 rows=1 loops=10)
                      ->  Sort  (cost=1.16..1.16 rows=1 width=8) (actual time=0.007..0.007 rows=1 loops=10)
                            Sort Key: x.created_at DESC
                            Sort Method: quicksort  Memory: 25kB
                            ->  Seq Scan on test x  (cost=0.00..1.15 rows=1 width=8) (actual time=0.001..0.002 rows=2 loops=10)
                                  Filter: ((status_id <> 3) AND (client_id = i.client_id))
                                  Rows Removed by Filter: 8
Planning Time: 0.826 ms
Execution Time: 0.314 ms

I use postgresql 14 version.

sqlfiddle


Solution

  • I'd replace the correlated sub-query with a regular sub-query, and join on it.

    SELECT
      last.client_id,
      COUNT(*)
    FROM
    (
      SELECT
        client_id,
        MAX(created_at)   created_at
      FROM
        public.test
      WHERE
        status_id <> 3
      GROUP BY
        client_id
    )
      AS last
    INNER JOIN
      public.test
        ON  test.client_id  = last.client_id
        AND test.created_at > last.created_at
    GROUP BY
      last.client_id;
    

    Then ensure an index on (client_id, created_at) and a partial index on (client_id, created_at) WHERE status_id <> 3