I have this table in SQL (called "myt") about library books that are borrowed by different names:
CREATE TABLE myt (
name VARCHAR(10),
date_library_book_borrowed DATE
);
INSERT INTO myt (name, date_library_book_borrowed) VALUES
('red', '2010-01-01'),
('red', '2010-05-05'),
('red', '2011-01-01'),
('blue', '2015-01-01'),
('blue', '2015-09-01'),
('green', '2020-01-01'),
('green', '2021-01-01'),
('yellow', '2012-01-01');
Based on this table, I am trying to answer the following question:
I tried to do this using LEAD and LAG functions:
WITH RankedBorrowings AS (
SELECT
name,
date_library_book_borrowed,
ROW_NUMBER() OVER (PARTITION BY name ORDER BY date_library_book_borrowed) AS borrowing_rank
FROM
myt
),
BorrowingPairs AS (
SELECT
name,
borrowing_rank AS from_rank,
LEAD(borrowing_rank) OVER (PARTITION BY name ORDER BY date_library_book_borrowed) AS to_rank,
date_library_book_borrowed AS current_borrowing_date,
LEAD(date_library_book_borrowed) OVER (PARTITION BY name ORDER BY date_library_book_borrowed) AS next_borrowing_date
FROM
RankedBorrowings
)
SELECT
from_rank,
to_rank,
AVG(next_borrowing_date - current_borrowing_date) AS avg_days_between_borrowings
FROM
BorrowingPairs
WHERE
next_borrowing_date IS NOT NULL
GROUP BY
from_rank,
to_rank
ORDER BY
from_rank,
to_rank;
Can someone please tell me if this is the correct way to approach this problem? Or is it better to join the table to itself and then perform the same calculations?
Thanks!
Your approach is correct. You don't need to join the table to itself, it's not efficient.
But your solution is overcomplicated. to_rank
always equals to from_rank + 1
So, you can calculate all analytical functions in one step:
WITH BorrowingPairs AS (
SELECT
name,
ROW_NUMBER() OVER (PARTITION BY name ORDER BY date_library_book_borrowed) AS from_rank,
date_library_book_borrowed AS current_borrowing_date,
LEAD(date_library_book_borrowed) OVER (PARTITION BY name ORDER BY date_library_book_borrowed) AS next_borrowing_date
FROM
myt
)
SELECT
from_rank,
from_rank + 1 AS to_rank,
AVG(next_borrowing_date - current_borrowing_date) AS avg_days_between_borrowings
FROM
BorrowingPairs
WHERE
next_borrowing_date IS NOT NULL
GROUP BY
from_rank
ORDER BY
from_rank