I am using MS SQL, and I have a table like this:
dogid incident_id incident_flags
1 1 a
1 1 c
1 1 d
1 20 b
1 20 a
2 12 NA
2 14 a
2 14 b
I would like to find out the average number of incident_flags per incident_id for each dogid. So for instance, I would like this output to look like:
dogid av_flags
1 2.5
2 1
These were found by:
For dogid 1, we have an incidentid with 3 flags, and an incidentid with 2 flags. Av(3, 2) = 2.5
For dogid 2, we have an incidentid with 0 flags (NA should be counted as 0, it will never occur with another incident_flag on the same incidentid), and an incident with 2 flags. Av(0, 2) = 1
Incident_id is unique to each dogid (you will never get, say, incident_id 1 under both dogid 1 and another dogid). Incident_flags will not be repeated for one incident (you can't have "a" twice under incident_id 1) but can be repeated for other incidents eg. can get incident_flag "a" for incident 1 and incident 20.
How would I go about this?
Using aggregate functions:
SELECT
dogid,
av_flags = SUM(CASE WHEN incident_flags <> 'NA' THEN 1 ELSE 0 END)/
(COUNT(DISTINCT incident_id) * 1.0)
FROM tbl
GROUP BY dogid