Search code examples
sqlhive

SQL: Distinct and Distinct count(*) With Hive Tables


I am getting two very different numbers for these seemingly similar queries on (hive) tables:

select count(*) from test
# result: 2609173
select distinct count(*) from test
# result: 2609173
insert into testToo
select distinct * from test
# result: inserted 673065 rows

Any recommendations on how I might be able to discern what is going on? Am I using distinct somehow differently in the first few queries?


Solution

  • You want select count(distinct *) from test rather than select distinct count(*) from test

    The former means "select the count of distinct rows" and the latter means "select the distinct values of count(*)" (and there's only one value so it is semantically the same as select count(*) from test)