Search code examples
sqlsqlitegroup-bycounthaving

Syntax error after adding WHERE. SQLite in R


I have a data of flights that includes Orgin, Dest and Month. I'm trying to be able to get the freq of flights from Origin to Dest in a month across the year.

I've done:

dbGetQuery(conn,"
                 SELECT
                  Month AS Month,
                  COUNT(OriDes) AS Freq,
                  OriDes 
                 FROM(
                  SELECT Origin || '-' || Dest AS OriDes,
                  Month AS Month
                  FROM flights
                  )
                 GROUP BY OriDes, Month
                 ")

by first combining Origin and Dest from database then counting the frequency of it to get output of:

   Month Freq  OriDes
1      1  123 ABE-ATL
2      2  140 ABE-ATL
3      3  117 ABE-ATL
4      4  112 ABE-ATL
5      5  111 ABE-ATL
6      6  120 ABE-ATL
7      7  113 ABE-ATL
8      8  124 ABE-ATL
9      9   94 ABE-ATL
10    10  119 ABE-ATL
11    11  116 ABE-ATL
12    12  124 ABE-ATL
13     6    1 ABE-AVP
14     1   66 ABE-CLE
15     2   72 ABE-CLE
16     3   99 ABE-CLE
17     4   89 ABE-CLE
18     5   91 ABE-CLE
19     6   97 ABE-CLE
20     7   92 ABE-CLE

but because this is a large data I hope to filter and only deal with Freq > 500. So I tried:

dbGetQuery(conn,"
                 SELECT
                  Month AS Month,
                  COUNT(OriDes) AS Freq,
                  OriDes 
                 FROM(
                  SELECT Origin || '-' || Dest AS OriDes,
                  Month AS Month
                  FROM flights
                  )
                 GROUP BY OriDes, Month
                 WHERE Freq > 500
                 ")

but I'll get an error:

Error: near "WHERE": syntax error

Can someone explain to me what I've done wrong?


Solution

  • You can aggregate directly in the table without a subquery and all you need is a HAVING clause to filter the results of the aggregation and not a WHERE clause which filters the table before the aggregation:

    SELECT Month,
           COUNT(*) AS Freq,
           Origin || '-' || Dest AS OriDes
    FROM flights
    GROUP BY Month, OriDes
    HAVING Freq > 500;
    

    Note that since you are using OriDes in the GROUP BY clause there is no point to use it inside COUNT(). You can count with COUNT(*).