Search code examples
sqloracle-databasegroup-byinner-query

limit the result of a query based on group by


I'm doing a query on oracle to update a token in a table to do some work on the marked rows.

My problem is that I want to limit the number of rows updated each time to a specific batch size and also to a group by result.

It is kinda hard for me to explain in abstract terms so I'll materialize it.

Say that I have a table called 'staging' with the following fields: (Id, IdFile Grouping, Name, Address1, Address2, Address3, Country, Token)

What I'm trying to do is create a query to do batches with this info.

For each batch, I update the token so that those records are marked to work later on. I want a batch by file, and each batch has a size limit, so i'm using the rownum to limit it. Another limitation I have is grouping by batch.

Let's say that my batch size is 5, and If I have 3 and only 3 rows with grouping column filled I have to create a batch with only that 3 rows.

This is my biggest pet peeve as when I'm doing the group by, I get as a result in just one batch results with and without grouping. (If I had a grouping, I had to see if the destiny address is the same for all rows and create a batch with it.)

Can you help me understand how I have to create this query?

What I'm using right now is:

update schema.staging set
token = 'token4'
where id in ( select t.id
               from (
    select stage.id, stage.grouping 
    from (select idFile 
            from (select a.idFile
                  from schema.staging a
                  inner join schema.config c on c.id = a.idcfgpriority
                  where nvl2(a.token,0,1) = 1
                  group by a.idFile, c.order
                  order by c.order desc) files 
            where rownum = 1 ) priorityFile
          inner join schema.staging stage on stage.idFile = priorityFile.idFile
        where nvl2(stage.token,0,1) = 1 
        group by stage.idFile, stage.token, stage.id, grouping sets ( (stage.grouping),
                                                                          (stage.name, stage.Address1,stage.Address2,stage.Address3,stage.Country))
        order by stage.grouping
    ) t
where rownum <= 5 )

For the following data:

 Id IdFile  Grouping    Name    Address1    Address2    Address3    Country     Token
 ====================================================================================
 1  1       null        Name1   Address1    Address1    Address1    Country     null
 2  1       1           Name1   Address1    Address1    Address1    Country     null
 3  1       1           Name1   Address1    Address1    Address1    Country     null
 4  1       1           Name1   Address1    Address1    Address1    Country     null
 5  1       2           Name1   Address1    Address1    Address1    Country     null
 6  1       2           LALAL   XPTO        Address1    Address1    Country     null
 7  1       null        Name1   Address1    Address1    Address1    Country     null
 8  1       null        Name1   Address1    Address1    Address1    Country     null
 9  1       null        Name1   Address1    Address1    Address1    Country     null
 10 2       null        Name1   Address1    Address1    Address1    Country     null
 11 2       null        Name1   Address1    Address1    Address1    Country     null
 12 2       null        Name1   Address1    Address1    Address1    Country     null
 13 2       null        Name1   Address1    Address1    Address1    Country     null
 14 2       null        Name1   Address1    Address1    Address1    Country     null
 15 2       null        Name1   Address1    Address1    Address1    Country     null

I would be expecting 6 batches.

batch#1: 1, 7, 8, 9
batch#2: 2, 3, 4
batch#3: 5
batch#4: 6
batch#5: 10, 11, 12, 13, 14
batch#6: 15

Sooo... Does anyone have any ideas?

Cheers


Solution

  • I'm not sure that fully understand all requreiments in details, but based on your sample data and desired output following query will work:

    SELECT dense_rank() over(ORDER BY "IdFile",
                                      "Grouping" nulls FIRST,
                                      "Name" desc,
                                      "Address1",
                                      "Address2",
                                      "Address3",
                                      "Country",
                                      "Token",
                                      row_num) batch_num,
           q.*
      FROM (SELECT trunc((row_number()
                          over(partition BY "IdFile",
                               "Grouping",
                               "Name",
                               "Address1",
                               "Address2",
                               "Address3",
                               "Country",
                               "Token" ORDER BY "Id") - 1) / 5) row_num,
                   s.*
              FROM sample s) q
     ORDER BY "Id";
    

    Result:

    | BATCH_NUM | ROW_NUM | ID | IDFILE | GROUPING |  NAME | ADDRESS1 | ADDRESS2 | ADDRESS3 | COUNTRY |  TOKEN |
    |-----------|---------|----|--------|----------|-------|----------|----------|----------|---------|--------|
    |         1 |       0 |  1 |      1 |   (null) | Name1 | Address1 | Address1 | Address1 | Country | (null) |
    |         2 |       0 |  2 |      1 |        1 | Name1 | Address1 | Address1 | Address1 | Country | (null) |
    |         2 |       0 |  3 |      1 |        1 | Name1 | Address1 | Address1 | Address1 | Country | (null) |
    |         2 |       0 |  4 |      1 |        1 | Name1 | Address1 | Address1 | Address1 | Country | (null) |
    |         3 |       0 |  5 |      1 |        2 | Name1 | Address1 | Address1 | Address1 | Country | (null) |
    |         4 |       0 |  6 |      1 |        2 | LALAL |     XPTO | Address1 | Address1 | Country | (null) |
    |         1 |       0 |  7 |      1 |   (null) | Name1 | Address1 | Address1 | Address1 | Country | (null) |
    |         1 |       0 |  8 |      1 |   (null) | Name1 | Address1 | Address1 | Address1 | Country | (null) |
    |         1 |       0 |  9 |      1 |   (null) | Name1 | Address1 | Address1 | Address1 | Country | (null) |
    |         5 |       0 | 10 |      2 |   (null) | Name1 | Address1 | Address1 | Address1 | Country | (null) |
    |         5 |       0 | 11 |      2 |   (null) | Name1 | Address1 | Address1 | Address1 | Country | (null) |
    |         5 |       0 | 12 |      2 |   (null) | Name1 | Address1 | Address1 | Address1 | Country | (null) |
    |         5 |       0 | 13 |      2 |   (null) | Name1 | Address1 | Address1 | Address1 | Country | (null) |
    |         5 |       0 | 14 |      2 |   (null) | Name1 | Address1 | Address1 | Address1 | Country | (null) |
    |         6 |       1 | 15 |      2 |   (null) | Name1 | Address1 | Address1 | Address1 | Country | (null) |
    

    SQL Fiddle.

    I have played a little with ordering to fully simulate batch numbering you have provided. You can safely discard desc and nulls FIRST parts. Let me know if the query solves your problem.