sql group-by average amazon-athena sql-null

How to filter rows with condition in SQL?

I am trying to retrieve data from Athena with the following query:

SELECT DISTINCT cop.shop_id,
         cop.product_id,
         avg(cop.position) AS avg_position,
         cp.kes
FROM data_1 AS cop
JOIN data_2 AS cp
    ON cop.product_id = cp.product_id
WHERE cop.site_id = 1
        AND cop.product_id IS NOT NULL
GROUP BY  cop.shop_id, cop.product_id, cp.kes

However, there are four columns in the data: product_id, shop_id, avg_position, kes.

Some rows come with both NA and non-NA in the kes column. I simply want to manipulate the data with the following condition:

If product_id , shop_id , and avg_position are the same, and if there are NA and Non-NA in the kes column, just leave the Non-NA row and remove the rows which include NA in the kes .
However, if there is no non-NA in the kes , don't remove it.

How can I do this?

Solution

I think that you want aggregation on kes:

SELECT 
    cop.shop_id,
    cop.product_id,
    AVG(cop.position) AS avg_position,
    MAX(cp.kes) kes
FROM data_1 AS cop
JOIN data_2 AS cp
    ON cop.product_id = cp.product_id
WHERE cop.site_id = 1 AND cop.product_id IS NOT NULL
GROUP BY cop.shop_id, cop.product_id

Aggregate functions ignore null values; so MAX(cp.kes) gives you the highest non-null value of cp.kes. On the other hand, if all values of cp.kes are null within the group, max() gives null.

Side note: DISTINCT and GROUP BY do not make sense together (although that's still valid SQL); GROUP BY guarantees no duplicate values in the SELECT list already.