I have 5 variables:
Academicyear,
dept,
InstrGID
curenr
credit_hours
I need the mean of credit hours grouped by Academic year, dept, and InstrGID. The mean must exclude the zeros when "curenr" equals 0. However, the zero values should remain in the newly created dataset. The below script runs the average excluding the zeros, but the created dataset now deletes the 0 values.
How I can create the mean variable without deleting the 0 values from the dataset?
Dataset example:
Academicyear | Dept | INSTRGID | curenr | Credit_hours | mean_credit_hours |
---|---|---|---|---|---|
2022 | CS | G11111111 | 10 | 20 | 25 |
2022 | CS | G11111111 | 10 | 30 | 25 |
2023 | GDA | G11111111 | 20 | 10 | 50 |
2023 | GDA | G11111111 | 20 | 90 | 50 |
2023 | GDA | G22222222 | 0 | 0 | - |
2023 | GDA | G22222222 | 20 | 30 | 35 |
2023 | GDA | G22222222 | 20 | 40 | 35 |
My current script:
proc sql;
create table avg_cred_hrs as
select *,
mean(Credit_Hours) as mean_credit_hours
from work.QUERY_FOR_XLSTCOUNT
where CURENR >0
group by AcademicYear, dept , INSTRGID;
quit;
Use a CASE
statement that returns missing when CURENR
is 0, and Credit_Hours
otherwise. mean()
will exclude all missing values.
proc sql;
create table avg_cred_hrs as
select *,
mean(CASE(curenr)
when(0) then .
else Credit_Hours
END
) as mean_credit_hours
from work.QUERY_FOR_XLSTCOUNT
group by AcademicYear, dept, INSTRGID
;
quit;
Academicyear Dept INSTRGID curenr Credit_hours mean_credit_hours
2022 CS G1111111 10 30 25
2022 CS G1111111 10 20 25
2023 GDA G1111111 20 10 50
2023 GDA G1111111 20 90 50
2023 GDA G2222222 20 40 35
2023 GDA G2222222 20 30 35
2023 GDA G2222222 0 0 35