Tags: apache-spark, pyspark, apache-spark-sql

CTE query to generate new rows based on existing column in spark SQL


I have created a table, trial:

create table trial(
  started int,
  ended int
);

insert into trial values(1, 7),
(5, 6),
(9, 14);

-- select * from trial;

Now, for each row, I want to generate new rows in which a new column ('nr') starts at the started value and increments until it reaches the ended value.

In SQL Server, I can write the following recursive CTE to get the desired result.

with cte as (
  select started, ended, started as nr from trial
  union all
  select started, ended, nr + 1 from cte
  where nr < ended
)
select started, ended, nr from cte
order by started;

Output:

started ended nr
1 7 1
1 7 2
1 7 3
1 7 4
1 7 5
1 7 6
1 7 7
5 6 5
5 6 6
9 14 9
9 14 10
9 14 11
9 14 12
9 14 13
9 14 14

Currently I'm getting this error when I run the same query in Spark SQL.

Screenshot of error message

How can I achieve the same results in Spark SQL and PySpark? Is there a way to write this with a CTE?


Solution

  • Spark SQL (as of the 3.x line) does not support recursive CTEs, which is why the self-referencing union all fails. You can instead use the sequence function together with explode to generate the same rows:

    select 
      started, 
      ended,
      explode(sequence(started, ended)) as nr
    from trial
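The same can be done from PySpark. Below is a minimal sketch: the commented-out part shows the DataFrame API equivalent (assuming an active SparkSession named `spark` with the `trial` table available, and Spark >= 2.4, where `sequence` was introduced), while the plain-Python loop that follows only illustrates the rows that `explode(sequence(started, ended))` produces:

```python
# PySpark DataFrame API equivalent of the SQL above (assumes an active
# SparkSession named `spark` with the `trial` table registered):
#
#   from pyspark.sql import functions as F
#
#   result = (spark.table("trial")
#             .withColumn("nr", F.explode(F.sequence("started", "ended")))
#             .orderBy("started", "nr"))
#   result.show()
#
# Plain Python illustrating what sequence + explode generates per row:
rows = [(1, 7), (5, 6), (9, 14)]       # (started, ended) pairs from trial
expanded = [(s, e, nr)                 # one output row per value of nr
            for s, e in rows
            for nr in range(s, e + 1)]  # sequence(started, ended) is inclusive
print(expanded[:3])  # → [(1, 7, 1), (1, 7, 2), (1, 7, 3)]
```

Note that `sequence(started, ended)` is inclusive on both ends, matching the `where nr < ended` recursion that stops once nr equals ended.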