Search code examples
sqldata-analysisdata-processing

SQL Query For Grouping Data Based On Several Particular Pattern?


Let's say that I have a table like this:

id |stop_duration | moving_duration | zone
1  |           20 |              10 | WAREHOUSE
2  |            5 |             100 | HAUL
3  |           15 |               5 | CHECKPOINT
4  |            3 |             130 | HAUL
5  |           30 |              10 | WAREHOUSE
6  |            6 |              95 | HAUL
7  |           15 |               5 | CHECKPOINT
8  |            8 |              90 | HAUL
9  |           25 |              10 | WAREHOUSE

So I would like to group it by a trip. A trip is consists of a WAREHOUSE row until before another WAREHOUSE row.

So for that table, a trip is consists of rows with an id of 1,2,3, and 4. Another trip is consists of rows with an id 5,6,7, and 8.

After grouping it, I would like to calculate the total and average of stop_duration and moving_duration.

Is that possible to do that using only a SQL Query? If it is not, what kind of tools that I need to use?


Solution

  • It is and there are plenty of ways, for example..

    SELECT
      *, 
      SUM(CASE WHEN zone = 'WAREHOUSE' THEN 1 ELSE 0 END)
        OVER(ORDER BY id) as tripId 
    FROM t
    

    You'll get a column with a number that increments only when WAREHOUSE is encountered. Grouping and averaging are then very easy using standard group by, after turning it into a subquery or a CTE using WITH. if you want to keep the rows of detail too, use eg AVG(stop_duration) OVER(PARTITION BY tripId)

    If you're using a database that doesn't support window functions you can achieve a similar effect by asking for the max id from a warehouse row, that is less than or equal the current row's id

    SELECT *,
      (SELECT MAX(id) FROM t sub WHERE sub.id <= main.id AND zone = 'WAREHOUSE') 
    FROM t main
    

    The column will go like 1,1,1,1,5,5,5,5 but for what you need it for it will be fine