Search code examples
google-bigqueryrows

How to create a new table that only keeps rows with more than 5 data records under the same id in Bigquery


I have a table like this:

Id Date Steps Distance
1 2016-06-01 1000 1

There are over 1000 records and 50 Ids in this table, most ids have about 20 records, and some ids only have 1, or 2 records which I think are useless. I want to create a table that excludes those ids with less than 5 records. I wrote this code to find the ids that I want to exclude:

SELECT  
  Id,
  COUNT(Id) AS num_id
FROM `table` 
GROUP BY 
  Id
ORDER BY
  num_id 

Since there are only two ids I need to exclude, I use WHERE clause:

CREATE TABLE `` AS
SELECT  
  *
FROM ``
WHERE
  Id <> 2320127002
  AND Id <> 7007744171

Although I can get the result I want, I think there are better ways to solve this kind of problem. For example, if there are over 20 ids with less than 5 records in this table, what shall I do? Thank you.


Solution

  • Consider this:

    CREATE TABLE `filtered_table` AS
    SELECT *
      FROM `table` 
     WHERE TRUE QUALIFY COUNT(*) OVER (PARTITION BY Id) >= 5
    

    Note: You can remove WHERE TRUE if it runs successfully without it.