I have created a table, `trial`:
create table trial(
  started int,
  ended int
);
insert into trial values(1, 7),
(5, 6),
(9, 14);
-- select * from trial;
Now, for each row, I want to generate new rows where a new column (`nr`) starts at the `started` value and increments until it reaches the `ended` value.
In SQL Server, I can write the following to get the desired results.
with cte as (
  select started, ended, started as nr from trial
  union all
  select started, ended, nr + 1 from cte
  where nr < ended
)
select started, ended, nr from cte
order by started;
Output:

| started | ended | nr |
|---------|-------|----|
| 1       | 7     | 1  |
| 1       | 7     | 2  |
| 1       | 7     | 3  |
| 1       | 7     | 4  |
| 1       | 7     | 5  |
| 1       | 7     | 6  |
| 1       | 7     | 7  |
| 5       | 6     | 5  |
| 5       | 6     | 6  |
| 9       | 14    | 9  |
| 9       | 14    | 10 |
| 9       | 14    | 11 |
| 9       | 14    | 12 |
| 9       | 14    | 13 |
| 9       | 14    | 14 |
Currently I'm getting an error when I run the same query in Spark.
How can I achieve the same results in Spark SQL and PySpark? Can we write code that achieves the same thing?
You can use the `sequence` function for this. Spark SQL (up to and including 3.x) does not support recursive CTEs, which is why the query errors out, but `sequence(start, stop)` generates an array of consecutive values from `start` to `stop`, and `explode` turns each array element into its own row:
select
  started,
  ended,
  -- sequence(started, ended) builds the array [started, started + 1, ..., ended];
  -- explode emits one output row per array element
  explode(sequence(started, ended)) as nr
from trial
order by started, nr;
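
The same idea carries over to the PySpark DataFrame API. Here is a minimal sketch, assuming the `trial` table from the question is already registered in the catalog:

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()

# Read the existing table; assumes `trial` is registered in the catalog.
df = spark.table("trial")

# F.sequence builds the array [started, ..., ended] for each row;
# F.explode flattens that array into one row per element.
result = df.select(
    "started",
    "ended",
    F.explode(F.sequence("started", "ended")).alias("nr"),
).orderBy("started", "nr")

result.show()
```

Both `sequence` and `explode` have been available since Spark 2.4. Note that `sequence` steps downward by default when the start value is greater than the stop value, so reversed ranges don't need special handling.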