I'm very new to SQL, thank you for bearing with me.
I'm using PostgreSQL, trying to produce a simple summary statistic on the percentage of home team or away team victories in a dataset of soccer penalty shootouts. I've tried doing this with a few common table expressions, which I then try to perform some operations on (divide one from another) like so:
WITH home_team_wins (match) AS (
SELECT match
FROM penalties
WHERE neutral_stadium = 0
AND attacker_home = 1
AND match_winner = 1
GROUP BY match
),
away_team_wins (match) AS (
SELECT match
FROM penalties
WHERE neutral_stadium = 0
AND attacker_home = 0
AND match_winner = 1
),
num_home_wins (match) AS (
SELECT COUNT(DISTINCT match)
FROM home_team_wins
),
num_away_wins (match) AS (
SELECT COUNT(DISTINCT match)
FROM away_team_wins
),
total_matches (match_count) AS (
SELECT COUNT(DISTINCT match)
FROM penalties
WHERE neutral_stadium = 0
)
SELECT num_home_wins.match / total_matches.match_count, num_away_wins.match / total_matches.match_count
FROM num_home_wins, num_away_wins, total_matches;
I'm confused by the results, and don't fully understand what's happening beneath the hood. I expect to see a couple decimal values (in this case, 0.485 and 0.515), but instead both results come out to zero:
If I don't operate on the CTE variables and simply select them, I see the expected numbers:
SELECT num_home_wins.match, num_away_wins.match, total_matches.match_count
FROM num_home_wins, num_away_wins, total_matches;
Obviously, dividing these values shouldn't equal zero. Why does operating on them produce this weird result?
I won't be surprised if there's a more efficient way to do this, which will also be helpful to know, but he main question above still stands. Thank you!
Try something like this to make your query more compact :
SELECT COUNT(DISTINCT match) FILTER (WHERE attacker_home = 1 AND match_winner = 1) :: numeric / COUNT(DISTINCT match)
, COUNT(DISTINCT match) FILTER (WHERE attacker_home = 0 AND match_winner = 1) :: numeric / COUNT(DISTINCT match)
FROM penalties
WHERE neutral_stadium = 0
and as stated by @klin, you must cast to numeric to avoid zero as the result of an integer division.