Search code examples
sqlcountsasdatasetfrequency

How to get observation frequency counts from multiple dataset into one table?


I have a bunch of large datasets.

DS_1 (contains all unique IDs and names):

ID   Name
1    Apple
2    Banana
3    Cherry

DS_2:

ID   Observation
1    Apple detail
1    Apple detail
1    Apple detail
2    Banana detail
2    Banana detail
3    Cherry detail
3    Cherry detail
3    Cherry detail

DS_3:

ID   Observation
2    Banana detail
2    Banana detail
3    Cherry detail

I'm looking to create a new dataset that shows frequency counts across the datasets (and lastly calculates Total_Obs). I would output something like this:

ID   Name      DS_2    DS_3   Total_Obs
1    Apple     3       0      3
2    Banana    2       2      4
3    Cherry    3       1      4

The datasets are fairly large. Is there a more efficient way to do this apart from concatenating the datasets and doing a frequency table? Or having to creating a bunch of sorted frequency tables, then merging by ID across all the datasets?


Solution

  • You can do below -

    Select t1.id As id
          ,t1.name As name
          ,coalesce(DS_2_obs,0) as DS_2_obs
          ,coalesce(DS_3_obs,0) as DS_3_obs
          ,coalesce(DS_2_obs,0) + coalesce(DS_3_obs,0) As Total_Obs
    from DS_1 t1
    left join (Select id, count(1) as DS_2_obs from DS_2 group by id) t2
    on t1.id = t2.id
    left join (Select id, count(1) as DS_3_obs from DS_3 group by id) t3
    on t1.id = t3.id;
    

    Also, you should always tag which Database you are using.

    If the above SQL takes a lot of time, instead of t2 and t3 as inline queries you can consider creating aggregate observation tables with frequency/counts and having an index on id. That way when you join the observation aggregates with the primary table, the join can be faster based on indexes.