I have annual earnings (loneink
) for 2007 for each individial ID (personlopnr
), and I want to calculate for each company ID (peorglopnr
) how much of the total wage bill was paid to worker born after 1980 (birth year is fodelsear
), in aggregate. However, the code below produces a share
column which is 0 for a vast majority of the cases and 1 for the rest. (To be clear, the code with WHERE loneink > 0
as below produces only 1s — the zero comeback without that condition and having a NULLIF
to make sure I never divide by zero.) While there are many firms with no young workers, it is clearly not the case that all the other firms are young-only.
What is wrong here? This wasn't the way to generate a "young-wage" variable where for older workers earnings are zero, so the sum is only for the young? Or in theory this is OK, but I got the CASE WHEN
wrong? Or the SUM
/SUM
misbehaves with GROUP BY
?
What is a better way to do this?
CREATE VIEW sys.over26_2007 (personlopnr,peorglopnr,loneink,below26_loneink) AS (
SELECT personlopnr,peorglopnr,loneink, CASE WHEN fodelsear < 1981 THEN 0 ELSE loneink END AS below26_loneink
FROM sys.ds_chocker_lev_lisaindivid_2007 WHERE loneink > 0
);
SELECT COUNT(*) FROM over26_2007;
CREATE VIEW sys.share26_2007 (peorglopnr,share26_2007) AS (
SELECT peorglopnr, SUM(below26_loneink)/SUM(loneink)
FROM sys.over26_2007
WHERE loneink > 0
GROUP BY peorglopnr
);
My actual use case is in MonetDB, so hopefully we can stick to SQL:2003 solutions only, no mySQL or Oracle extensions.
First of all, when you create the view, you need to use 0.0 in your case statement. This will make sure that the column in the view is created using a correct data type (double in your case):
CREATE VIEW sys.over26_2007 (personlopnr,peorglopnr,loneink,below26_loneink) AS (
SELECT personlopnr,peorglopnr,loneink,
CASE WHEN fodelsear < 1981 THEN 0.0 ELSE loneink END AS below26_loneink
FROM sys.ds_chocker_lev_lisaindivid_2007 WHERE loneink > 0.0
);
Next, in your other view, CAST the sum to double as well:
CREATE VIEW sys.share26_2007 (peorglopnr,share26_2007) AS (
SELECT peorglopnr, CAST(SUM(below26_loneink) AS double) / CAST(SUM(loneink) AS double)
FROM sys.over26_2007
WHERE loneink > 0
GROUP BY peorglopnr
);