This is a sample of my database:
I'm trying to write a query that returns the following:
worker_id,
avg salary for 2020 (basic_salary+bonus),
basic_salary for 2021,
ever been in level C (yes/no),
average basic_salary \>6000 (yes/no),
how many salaries the employee had until today.
here's a query i wrote:
// select worker_id, AVG(CASE WHEN year(month)=2020
THEN BASIC_SALARY+bonus
ELSE NULL END) AS avg_2020,
avg(case when year(month)=2021
then BASIC_SALARY
else null end) as avg_2021,
case when t1.level='C' then 'YES'
else 'NO'
end as 'Level C',
CASE WHEN AVG(BASIC_SALARY) >6000
THEN 'yes'
ELSE 'no'
END AS 'above_6000',
count(BASIC_SALARY) 'HowManySalaries'
from [workers_table] t1
join data t2
on t1.level=t2.level
group by worker_id,t1.level
order by WORKER_ID
//
and the results:
worker_id avg_2020 avg_2021 Level C above_6000 HowManySalaries
2011-11-11 NULL 3500 NO no 4
2011-11-11 NULL 5000 NO no 2
2011-11-11 NULL 4000 YES yes 3
2011-11-12 8666 4500 NO yes 4
2011-11-12 9666 NULL NO yes 3
2011-11-12 7000 6000 YES no 2
2011-11-12 9000 NULL NO no 1
2011-11-13 NULL 10000 NO yes 3
2011-11-13 9000 4500 NO no 3
2011-11-13 10000 18000 YES yes 5
2011-11-13 12500 6750 NO yes 5
2011-11-14 NULL 3600 NO no 1
2011-11-14 NULL 7200 NO yes 2
2011-11-14 NULL 4800 YES no 1
maybe there is a better way to write the query? i need to get 1 line for each worker_id.
Here's my solution. I just briefly checked through the result/values, so it may not be 100% correct, but seems correct. Double check for correctness before using this. :)
select
worker_id,
AVG(CASE WHEN year(w.month)=2020 THEN w.BASIC_SALARY+b.bonus ELSE NULL END) AS avg_2020,
AVG(CASE WHEN year(w.month)=2021 THEN w.BASIC_SALARY ELSE NULL END) AS avg_2021,
CASE WHEN MAX(CASE WHEN w.level='C' THEN 1 ELSE 0 END)=1 THEN 'YES' ELSE 'NO' END AS 'Level C',
CASE WHEN AVG(w.BASIC_SALARY) > 6000 THEN 'yes' ELSE 'no' END AS 'above_6000',
COUNT(w.BASIC_SALARY) 'HowManySalaries'
FROM workers_table w
JOIN data b on b.level=w.level
GROUP BY w.worker_id
In your query you're grouping by both worker id and the bonus level. So your rows end up with all possible combinations between those two values. Limiting it to just the worker_id results in one row per worker.
Though this will cause your "in level C" column to not work properly since a basic CASE WHEN
isn't paying attention to all rows given for a specific worker_id. Expanding the CASE WHEN
with a MAX
-aggregation will fix this though.