Search code examples
sqlpostgresqldatabase-performancedatabase-administration

Update using join on big table - performance tips?


Been struggling with this update, that never finishes

update votings v
set voter_id = (select pv.number from voters pv WHERE pv.person_id = v.person_id);

Table being currently has 96M records

select count(0) from votings;
  count   
----------
 96575239
(1 registro)

Update apparently is using index

explain update votings v                             
set voter_id = (select pv.number from voters pv WHERE pv.rl_person_id = v.person_id);
                                                    QUERY PLAN                                                     
-------------------------------------------------------------------------------------------------------------------
 Update on votings v  (cost=0.00..788637465.40 rows=91339856 width=1671)
   ->  Seq Scan on votings v  (cost=0.00..788637465.40 rows=91339856 width=1671)
         SubPlan 1
           ->  Index Scan using idx_voter_rl_person_id on voters pv  (cost=0.56..8.58 rows=1 width=9)
                 Index Cond: (rl_person_id = v.person_id)
(5 registros)

Here are the indexes I have for votings

Índices:
    "votings_pkey" PRIMARY KEY, btree (id)
    "votings_election_id_voter_id_key" UNIQUE CONSTRAINT, btree (election_id, person_id)
    "votings_external_id_external_source_key" UNIQUE CONSTRAINT, btree (external_id, external_source)
    "idx_votings_updated_at" btree (updated_at DESC)
    "idx_votings_vote_party" btree (vote_party)
    "idx_votings_vote_state_vote_party" btree (vote_state, vote_party)
    "idx_votings_voter_id" btree (person_id)
Restrições de chave estrangeira:
    "votings_election_id_fkey" FOREIGN KEY (election_id) REFERENCES elections(id)
    "votings_voter_id_fkey" FOREIGN KEY (person_id) REFERENCES people_all(id)

Guys, any ideia who plays the biggest part on the update running slowly? the number of rows or the join being used?


Solution

  • Updating all the rows in the table is going to be really expensive. I would suggest re-creating the table:

    create temp_votings as
        select v.*, vv.vote_id
        from votings v join
             voters vv
             on vv.person_id = v.person_id;
    

    For this query, you want an index on votes(person_id, vote_id). I am guessing that person_id might already be the primary key; if so, no additional index is needed.

    Then, you can replace the existing table -- but back it up first:

    truncate table votings;
    
    insert into votings ( . . . )    -- list columns here
        select . . .                 -- and the same columns here
        from temp_votings;