Presto | Athena: Duplicated values in not key column

I have the following table:

CD_CLIENTE	DE_CLIENTE
12345	Jhon One
23456	Jhon Two
34567	Jhon Three
12345	Jhon ONEe

Even using DISTINCT for CD_CLIENT I have it two times in my Result because the CD_CLIENT is equal, but DE_CLIENTE is different

My question:

select distinct taxid as CD_CLIENTE, name as DE_CLIENTE
FROM my_table_name
group by distinct taxid, name

I need this query to achive this result, without the duplicated CD_CLIENTE:

CD_CLIENTE	DE_CLIENTE
12345	Jhon One
23456	Jhon Two
34567	Jhon Three

I tryed this:

select distinct taxid as CD_CLIENTE, name as DE_CLIENTE
FROM my_table_name
group by distinct taxid, name

CD_CLIENTE	DE_CLIENTE
12345	Jhon One
23456	Jhon Two
34567	Jhon Three
12345	Jhon ONEe

And need this:

CD_CLIENTE	DE_CLIENTE
12345	Jhon One
23456	Jhon Two
34567	Jhon Three

Solution

You are grouping both by taxid, name so you will get all unique combinations of taxid and name in the output. DISTINCT is irrelevant here since it uses all columns in the select to perform comparisons for uniqueness (i.e. you can remove either group by or distinct the result will be the same).

One option is to remove name from the group by clause and use aggregate like arbitrary or max:

-- sample data
WITH dataset(taxid, name) AS (
    VALUES (12345,  'Jhon One'),
        (23456, 'Jhon Two'),
        (34567, 'Jhon Three'),
        (12345, 'Jhon ONEe')
)

-- query
select taxid, max(name)
from dataset
group by taxid;

Output:

taxid	_col1
12345	Jhon One
34567	Jhon Three
23456	Jhon Two

if you have some third column you can use to resolve "collisions" for name you can use max_by/min_by (max_by(name, some_3rd_col)).