I have this table in SQL:
CREATE TABLE books_returned
(
name VARCHAR(50),
book_id VARCHAR(50),
year_book_returned INT
);
INSERT INTO books_returned (name, book_id, year_book_returned)
VALUES
('john', 'julius ceasar', 2010),
('john', 'julius caesar', 2010),
('john', 'hamlet', 2010),
('john', 'hamlet', 2010),
('john', 'othello', 2009),
('john', 'othello', 2009),
('kevin', 'macbeth', 2015),
('kevin', 'tempest', 2020),
('david', 'romeojuliet', 2010),
('david', 'romeojuliet', 2010),
('david', 'romeojuliet', 2010),
('david', 'king lear', 2005);
name book_id year_book_returned
------------------------------------------------
john julius ceasar 2010
john julius caesar 2010
john hamlet 2010
john hamlet 2010
john othello 2009
john othello 2009
kevin macbeth 2015
kevin tempest 2020
david romeojuliet 2010
david romeojuliet 2010
david romeojuliet 2010
david king lear 2005
For each name, I want to keep all columns and all rows for the book with the largest year.
Since there are ties, I do not care which set of duplicates are picked. Both of the following options are OK for me:
Output #1: acceptable
name book_id year_book_returned
------------------------------------------
john hamlet 2010
john hamlet 2010
evin tempest 2020
david romeojuliet 2010
david romeojuliet 2010
david romeojuliet 2010
Output #2: acceptable
name book_id year_book_returned
--------------------------------------------------
john julius ceasar 2010
john julius caesar 2010
kevin tempest 2020
david romeojuliet 2010
david romeojuliet 2010
david romeojuliet 2010
I tried this query that first finds the max year in a subquery and joins it back to the original table:
SELECT br.*
FROM books_returned br
JOIN (
SELECT name, MAX(year_book_returned) as max_year
FROM books_returned
GROUP BY name
) as subquery
ON br.name = subquery.name AND br.year_book_returned = subquery.max_year;
But this is incorrect: this shows all books for John :
name book_id year_book_returned
john julius ceasar 2010
john julius caesar 2010
john hamlet 2010
john hamlet 2010
kevin tempest 2020
david romeojuliet 2010
david romeojuliet 2010
david romeojuliet 2010
Can someone please show me how to do this correctly? I was thinking of using the partition row_number order by random()
statements to create a rank variable and then select the lowest rank?
One option is to use Max() Over() analytic function for year (used as join condition in ON clause) and Row_Number() Over() analytic functions twice (per name and book_id both ordered by year descending) in a subquery and then filter the result using those row numbers.
-- S Q L :
SELECT br.name, br.book_id, br.year_book_returned
FROM books_returned br
INNER JOIN ( Select name, book_id,
Max(year_book_returned) Over(Partition By name, book_id) as max_yr,
Row_Number() Over(Partition By name
Order By year_book_returned desc) as rn,
Row_Number() Over(Partition By book_id
Order By year_book_returned desc) as rn_book_id
From books_returned
) yr ON( yr.name = br.name And
yr.book_id = br.book_id and
yr.max_yr = br.year_book_returned)
WHERE yr.rn = 1 and yr.rn_book_id = 1;
/* R e s u l t :
NAME BOOK_ID YEAR_BOOK_RETURNED
------- -------------- ------------------
john hamlet 2010
john hamlet 2010
kevin tempest 2020
david romeojuliet 2010
david romeojuliet 2010
david romeojuliet 2010 */