
Selecting the first N rows of each group ordered by date


I'm trying to list the first N rows (say, the first 100) of each group, ordered by date and time, in a master-detail layout.

USE [Test]
Create Table [dbo].[Masters] (
    [MasterId] [nchar](36) NOT NULL PRIMARY KEY,
    [Tags] [nchar](100) NULL,
    [Numbers] [int] NOT NULL
);

Create Table [dbo].[Details] (
    [DetailId] [nchar](36) NOT NULL PRIMARY KEY,
    [MasterId] [nchar](36) FOREIGN KEY REFERENCES Masters(MasterId),
    [Date_Time] [datetime2](7) NOT NULL,
    [Value] [int] NOT NULL
);


INSERT INTO Masters (MasterId, Tags, Numbers) VALUES ('M0', 'Tag0,Tag1', 6);
INSERT INTO Masters (MasterId, Tags, Numbers) VALUES ('M1', 'Tag1,Tag2', 5);
INSERT INTO Masters (MasterId, Tags, Numbers) VALUES ('M2', 'Tag0,Tag2', 6);

INSERT INTO Details (DetailId, MasterId, Date_Time, Value) VALUES ('D0', 'M0', '20190101 00:00:00 AM', 0);
INSERT INTO Details (DetailId, MasterId, Date_Time, Value) VALUES ('D1', 'M0', '20200101 11:00:00 AM', 1);
INSERT INTO Details (DetailId, MasterId, Date_Time, Value) VALUES ('D2', 'M0', '20200701 01:00:00 AM', 2);
INSERT INTO Details (DetailId, MasterId, Date_Time, Value) VALUES ('D3', 'M0', '20210715 10:00:00 AM', 3);
INSERT INTO Details (DetailId, MasterId, Date_Time, Value) VALUES ('D4', 'M0', '20210715 11:00:00 AM', 4);
INSERT INTO Details (DetailId, MasterId, Date_Time, Value) VALUES ('D5', 'M0', '20210715 11:00:00 AM', 5);

INSERT INTO Details (DetailId, MasterId, Date_Time, Value) VALUES ('D10', 'M1', '20190101 00:00:00 AM', 6);
INSERT INTO Details (DetailId, MasterId, Date_Time, Value) VALUES ('D11', 'M1', '20200101 01:00:00 AM', 7);
INSERT INTO Details (DetailId, MasterId, Date_Time, Value) VALUES ('D12', 'M1', '20200701 09:00:00 AM', 8);
INSERT INTO Details (DetailId, MasterId, Date_Time, Value) VALUES ('D13', 'M1', '20210101 10:00:00 AM', 9);
INSERT INTO Details (DetailId, MasterId, Date_Time, Value) VALUES ('D14', 'M1', '20210701 10:00:00 AM', 10);

INSERT INTO Details (DetailId, MasterId, Date_Time, Value) VALUES ('D20', 'M2', '20190101 00:00:00 AM', 11);
INSERT INTO Details (DetailId, MasterId, Date_Time, Value) VALUES ('D21', 'M2', '20190101 01:30:00 AM', 12);
INSERT INTO Details (DetailId, MasterId, Date_Time, Value) VALUES ('D22', 'M2', '20200101 01:30:00 AM', 13);
INSERT INTO Details (DetailId, MasterId, Date_Time, Value) VALUES ('D23', 'M2', '20200701 08:30:00 AM', 14);
INSERT INTO Details (DetailId, MasterId, Date_Time, Value) VALUES ('D24', 'M2', '20210101 01:30:00 AM', 15);
INSERT INTO Details (DetailId, MasterId, Date_Time, Value) VALUES ('D25', 'M2', '20210701 01:30:00 AM', 16);

Select * from Masters;
Select * from Details;
--


Here is my query so far:

SELECT m.MasterId, d.DetailId, m.Numbers, d.Date_Time, d.Value from Details AS d
INNER JOIN Masters AS m ON m.MasterId = d.MasterId
WHERE 
m.Tags LIKE '%Tag2%' AND 
d.Date_Time >= Convert(datetime, '2020-01-01' ) 
ORDER BY m.MasterId DESC, d.Date_Time;

But how do I introduce TOP 3 (more likely 50 or 100 in the real situation) into this query? I would like to get only 3 rows per MasterId.


The expected result has only six rows: three for each of the two masters whose tags include Tag2. Please help me fix my query.


Solution

  • Besides the ROW_NUMBER approach, another option is CROSS APPLY (SELECT TOP ...):

    SELECT m.masterid,
           d.detailid,
           m.numbers,
           d.date_time,
           d.value
    FROM masters AS m
    CROSS APPLY (
        -- TOP needs its own ORDER BY; without one, which 3 rows come back is arbitrary
        SELECT TOP (3) *
        FROM details AS d
        WHERE d.date_time >= '2020-01-01'
          AND d.masterid = m.masterid
        ORDER BY d.date_time
    ) AS d
    WHERE m.tags LIKE '%Tag2%'
    ORDER BY m.masterid DESC,
             d.date_time;
    

    Whether this is faster or slower than ROW_NUMBER depends mostly on cardinality (how many rows are involved) and on indexing.

    With good indexing and only a small number of rows being selected, CROSS APPLY will usually be faster. If the inner table has to be sorted, or you are selecting most of its rows anyway, use ROW_NUMBER instead.
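
For comparison, here is a minimal sketch of the ROW_NUMBER alternative mentioned above, together with an index that would likely help the CROSS APPLY form. The index name IX_Details_MasterId_Date_Time, the derived-table alias t and the column alias rn are made up for illustration, and using DetailId as a tie-breaker for equal Date_Time values is an assumption, not something the question asks for.

```sql
-- Optional supporting index (hypothetical name): lets the engine seek to a
-- master's rows already ordered by Date_Time instead of sorting them.
CREATE INDEX IX_Details_MasterId_Date_Time
    ON dbo.Details (MasterId, Date_Time)
    INCLUDE (Value);

-- ROW_NUMBER variant: number the detail rows within each master by date,
-- then keep only the first 3 per master.
SELECT t.MasterId,
       t.DetailId,
       t.Numbers,
       t.Date_Time,
       t.Value
FROM (
    SELECT m.MasterId,
           d.DetailId,
           m.Numbers,
           d.Date_Time,
           d.Value,
           ROW_NUMBER() OVER (
               PARTITION BY m.MasterId
               ORDER BY d.Date_Time, d.DetailId  -- DetailId only breaks ties (assumption)
           ) AS rn
    FROM dbo.Masters AS m
    INNER JOIN dbo.Details AS d
        ON d.MasterId = m.MasterId
    WHERE m.Tags LIKE '%Tag2%'
      AND d.Date_Time >= '20200101'
) AS t
WHERE t.rn <= 3   -- change 3 to 50 or 100 for the real data
ORDER BY t.MasterId DESC,
         t.Date_Time;
```

Against the sample data in the question this should return six rows: D22, D23, D24 for M2, followed by D11, D12, D13 for M1.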