Search code examples
sqlpostgresqlinner-join

Joining same table based on whether employees were at location or not


I have a table as follows:

 year_id | loc_id | emp_id | freq 
 ---------------------------------
  1990   |   LA   | fred_01| 55   
  1990   |   LA   | mury_01| 34
  1990   |   BA   | fred_01| 20
  1990   |   CA   | mury_01| 40
  1990   |   MO   | fred_01| 100
  1990   |   PR   | bill_01| 500

I want everything grouped by the year and loc, but the tricky part is I want to get the sum of all "freq" at each location, then also get the sum of all the employees "freq" that worked at that location, when they're at another location (but not including original loc amount).

Desired results of first two rows:

 year_id | loc_id | sum(freq) loc | sum(freq)away 
 ------------------------------------------------
 1990       LA    |   89          |  160
 1990       BA    |   20          |  155 

I can do one at a time with an INNER JOIN, but when i try to get both sums, everything gets messed up:

select t1.loc_id , t1.year_id,sum(t2.freq),sum(t3.freq) from emp_freq t1
inner join emp_freq t2 
  on t1.year_id=t2.year_id and t1.emp_id=t2.emp_id and t1.loc_id=t2.loc_id 

inner join emp_freq t3 
on t1.year_id=t3.year_id and t1.emp_id=t3.emp_id and t1.loc_id<>t3.loc_id

group by t1.loc_id,t1.year_id
order by t1.loc_id,t1.year_id 

If i comment out the second inner join, the result for the grouped by loc is correct.


Solution

  • Here is another solution, that uses two levels of aggregation.

    First, aggregate records by year, location and employee, with an inline correlated subquery that compute the time spent by the employee this year at another location.

    SELECT 
        year_id, 
        loc_id, 
        emp_id, 
        SUM(freq) freq,
        (SELECT SUM(freq) FROM emp_freq t1 WHERE t1.year_id = t1.year_id AND t1.emp_id = t.emp_id AND t1.loc_id <> t.loc_id) other_freq
    FROM emp_freq t
    GROUP BY year_id, emp_id, loc_id
    

    Then, we just need to group once again the resultset, this time without the employee in the non-aggregated columns, to compute the totals:

    SELECT year_id, loc_id, SUM(freq) freq, SUM(other_freq) other_freq
    FROM (
        SELECT 
            year_id, 
            loc_id, 
            emp_id, 
            SUM(freq) freq,
            (SELECT SUM(freq) FROM emp_freq t1 WHERE t1.year_id = t1.year_id AND t1.emp_id = t.emp_id AND t1.loc_id <> t.loc_id) other_freq
        FROM emp_freq t
        GROUP BY year_id, emp_id, loc_id
    ) x GROUP BY year_id, loc_id
    

    This demo on DB Fiddle with your sample data returns :

    | year_id | loc_id | freq | other_freq |
    | ------- | ------ | ---- | ---------- |
    | 1990    | LA     | 89   | 160        |
    | 1990    | BA     | 20   | 155        |
    | 1990    | MO     | 100  | 75         |
    | 1990    | CA     | 40   | 34         |
    | 1990    | PR     | 500  |            |