Search code examples
sqlpostgresqlquery-optimization

PostgreSQL Update Statement Performance


I have a table with these columns:

id (int) 
col1 (int) 
col2 (varchar) 
date1 (date) 
col3 (int) 
cumulative_col3 (int) 

and about 750k rows.

I want to update the cumulative_col3 with the sum of col3 of same col1, col2 and previous to date of date1.

I have indexes on (date1), (date1, col1, col2) and (col1, col2).

I have tried the following query but it takes a long time to complete.

update table_name
set cumulative_col3 = (select sum(s.col3)
                       from table_name s
                       where s.date1 <= table_name.date1
                         and s.col1 = table_name.col1
                         and s.col2 = table_name.col2);

What can I do to improve the performance of this query?


Solution

  • You can try to calculate the running sum in a derived table instead:

    update table_name
      set cumulative_col3 = t.cum_sum
    from (
      select id,
             sum(s.col3) over (partition by col1, col2 order by date1) as cum_sum
      from table_name
    ) s  
    where s.id = table_name.id;
    

    This assumes that id is the primary key of the table.