Search code examples
mysqlsqlcountsubstringimpala

Impala Query: Combine multiple COUNT DISTINCT WHERE clauses


On impala, I'm trying to count the number of distinct sample id's that contain "101", "102" or "103".

Here is an example of my data:

| sample_id | 
 ___________
| 101-345-5 | 
| 101-345-6 | 
| 101-345-6 | 
| 102-345-5 | 
| 103-345-5 | 
| 103-345-8 | 
| 103-345-8 | 

I'd like to get a count of how many distinct sample id's are in each study group:

| Study | Count | 
 _______________
| 101   |   2   | 
| 102   |   1   | 
| 103   |   2   | 

I can easily create separate queries to find the number in each group:

SELECT COUNT(DISTINCT ill.sample_id) as 101_count
FROM illumina_data ill
WHERE SUBSTRING(ill.sample_id, 1,3) = "101" 

But I'd really like to figure out how to use a single query to find a count for 101, 102 and 103.


Solution

  • A group by will do the trick http://sqlfiddle.com/#!9/1d75f/6

    SELECT SUBSTRING(sample, 1,3) , COUNT(DISTINCT sample)
    FROM samples 
    group by SUBSTRING(sample, 1,3);