Delete duplicate rows from a BigQuery table

I have a table with >1M rows of data and 20+ columns.

Within my table (tableX) I have identified duplicate records (~80k) in one particular column (troubleColumn).

If possible I would like to retain the original table name and remove the duplicate records from my problematic column otherwise I could create a new table (tableXfinal) with the same schema but without the duplicates.

I am not proficient in SQL or any other programming language so please excuse my ignorance.

delete from Accidents.CleanedFilledCombined 
where Fixed_Accident_Index 
in(select Fixed_Accident_Index from Accidents.CleanedFilledCombined 
group by Fixed_Accident_Index 
having count(Fixed_Accident_Index) >1);


  • You can remove duplicates by running a query that rewrites your table (you can use the same table as the destination, or you can create a new table, verify that it has what you want, and then copy it over the old table).

    A query that should work is here:

    SELECT *
    FROM (
              OVER (PARTITION BY Fixed_Accident_Index)
      FROM Accidents.CleanedFilledCombined
    WHERE row_number = 1