Search code examples
sqldataexplorer

SQL for bucketing counts


I'm trying to create a histogram of people's reputations who ask questions on the site using StackExchange data explorer.

The following gives error message:

Each GROUP BY expression must contain at least one column that 
is not an outer reference.
Invalid column name 'lt_100'. ...

Suggestions appreciated

select
  case when Reputation < 100    then "lt_100"
       when Reputation >= 100 and Reputation < 200   then "100_199"
       when Reputation >= 200 and Reputation < 300   then "200_299"
       when Reputation >= 300 and Reputation < 400   then "300_399"
       when Reputation >= 400 and Reputation < 500   then "400_499"
       when Reputation >= 500 and Reputation < 600   then "500_599"
       when Reputation >= 600 and Reputation < 700   then "600_699"
       when Reputation >= 700 and Reputation < 800   then "700_799"
       when Reputation >= 800 and Reputation < 900   then "800_899"
       when Reputation >= 900 and Reputation < 1000  then "900_999"
       else "over 1000"
  end  ReputationRange,
  count(*) as TotalWithinRange
FROM Users
JOIN Posts ON Users.Id = Posts.OwnerUserId 
JOIN PostTags ON PostTags.PostId = Posts.Id
JOIN Tags on Tags.Id = PostTags.TagId
WHERE PostTypeId = 1 and Posts.CreationDate > '9/1/2010'
Group by 
1

Solution

  • You should use single-quotes for the ranges being classified. If you use " " it would be treated as a column name. Also you should include the calculation in the group by clause.

    Demo

    select
      case when Reputation < 100    then 'lt_100'
           when Reputation >= 100 and Reputation < 200   then '100_199'
           when Reputation >= 200 and Reputation < 300   then '200_299'
           when Reputation >= 300 and Reputation < 400   then '300_399'
           when Reputation >= 400 and Reputation < 500   then '400_499'
           when Reputation >= 500 and Reputation < 600   then '500_599'
           when Reputation >= 600 and Reputation < 700   then '600_699'
           when Reputation >= 700 and Reputation < 800   then '700_799'
           when Reputation >= 800 and Reputation < 900   then '800_899'
           when Reputation >= 900 and Reputation < 1000  then '900_999'
           else 'over 1000'
      end ReputationRange,
      count(*) as TotalWithinRange
    FROM Users
    JOIN Posts ON Users.Id = Posts.OwnerUserId 
    JOIN PostTags ON PostTags.PostId = Posts.Id
    JOIN Tags on Tags.Id = PostTags.TagId
    WHERE PostTypeId = 1 and Posts.CreationDate > '9/1/2010'
    Group by 
    case when Reputation < 100    then 'lt_100'
           when Reputation >= 100 and Reputation < 200   then '100_199'
           when Reputation >= 200 and Reputation < 300   then '200_299'
           when Reputation >= 300 and Reputation < 400   then '300_399'
           when Reputation >= 400 and Reputation < 500   then '400_499'
           when Reputation >= 500 and Reputation < 600   then '500_599'
           when Reputation >= 600 and Reputation < 700   then '600_699'
           when Reputation >= 700 and Reputation < 800   then '700_799'
           when Reputation >= 800 and Reputation < 900   then '800_899'
           when Reputation >= 900 and Reputation < 1000  then '900_999'
           else 'over 1000'
    end