Search code examples
sqlgroup-byaverageamazon-athenasql-null

How to filter rows with condition in SQL?


I am trying to retrieve data from Athena with the following query:

SELECT DISTINCT cop.shop_id,
         cop.product_id,
         avg(cop.position) AS avg_position,
         cp.kes
FROM data_1 AS cop
JOIN data_2 AS cp
    ON cop.product_id = cp.product_id
WHERE cop.site_id = 1
        AND cop.product_id IS NOT NULL
GROUP BY  cop.shop_id, cop.product_id, cp.kes 

However, there are four columns in the data: product_id, shop_id, avg_position, kes.

Some rows come with both NA and non-NA in the kes column. I simply want to manipulate the data with the following condition:

  • If product_id , shop_id , and avg_position are the same, and if there are NA and Non-NA in the kes column, just leave the Non-NA row and remove the rows which include NA in the kes .
  • However, if there is no non-NA in the kes , don't remove it.

How can I do this?


Solution

  • I think that you want aggregation on kes:

    SELECT 
        cop.shop_id,
        cop.product_id,
        AVG(cop.position) AS avg_position,
        MAX(cp.kes) kes
    FROM data_1 AS cop
    JOIN data_2 AS cp
        ON cop.product_id = cp.product_id
    WHERE cop.site_id = 1 AND cop.product_id IS NOT NULL
    GROUP BY cop.shop_id, cop.product_id 
    

    Aggregate functions ignore null values; so MAX(cp.kes) gives you the highest non-null value of cp.kes. On the other hand, if all values of cp.kes are null within the group, max() gives null.

    Side note: DISTINCT and GROUP BY do not make sense together (although that's still valid SQL); GROUP BY guarantees no duplicate values in the SELECT list already.