python, apache-spark, pyspark

Shifting column values to the left based on the values in them


I have a business scenario with an id column and multiple value columns, such as value_1, value_2, etc.

The idea is to shift the values in the value columns to the left.

The Input is as below:

Input
+------------+-----------+---------+----------+---------+---------+---------+
|          id|    value_1|  value_2|   value_3|  value_4|  value_5|  value_6|
+------------+-----------+---------+----------+---------+---------+---------+
|           1|       1011|     1012|      1011|     2018|     null|     1011|
|           2|       null|     1012|      null|     2018|     null|     2022|
|           7|       1011|     1011|      1016|     2018|     null|     null|
|           8|       1011|     1012|      1014|     2018|     null|     null|
+------------+-----------+---------+----------+---------+---------+---------+
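
As a minimal sketch, assuming a local SparkSession named spark, the input above can be recreated like this:

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Explicit schema, since value_5 is null in every sample row and cannot be inferred
schema = 'id long, value_1 long, value_2 long, value_3 long, value_4 long, value_5 long, value_6 long'
data = [
    (1, 1011, 1012, 1011, 2018, None, 1011),
    (2, None, 1012, None, 2018, None, 2022),
    (7, 1011, 1011, 1016, 2018, None, None),
    (8, 1011, 1012, 1014, 2018, None, None),
]
df = spark.createDataFrame(data, schema)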

The next step is to shift the values to the left so that the nulls are pushed to the right: every valid value should end up in the leftmost columns, and the columns holding null should be shifted to the right. The intermediate output looks as below:

Output1
+------------+-----------+---------+----------+---------+---------+---------+
|          id|    value_1|  value_2|   value_3|  value_4|  value_5|  value_6|
+------------+-----------+---------+----------+---------+---------+---------+
|           1|       1011|     1012|      1011|     2018|     1011|     null|
|           2|       1012|     2018|      2022|     null|     null|     null|
|           7|       1011|     1011|      1016|     2018|     null|     null|
|           8|       1011|     1012|      1014|     2018|     null|     null|
+------------+-----------+---------+----------+---------+---------+---------+

Then, duplicate values within each row must be removed and, where possible, the remaining values shifted further to the left. The sample output looks as below:

Output2
+------------+-----------+---------+----------+---------+---------+---------+
|          id|    value_1|  value_2|   value_3|  value_4|  value_5|  value_6|
+------------+-----------+---------+----------+---------+---------+---------+
|           1|       1011|     1012|      2018|     null|     null|     null|
|           2|       1012|     2018|      2022|     null|     null|     null|
|           7|       1011|     1016|      2018|     null|     null|     null|
|           8|       1011|     1012|      1014|     2018|     null|     null|
+------------+-----------+---------+----------+---------+---------+---------+

Any leads on this would be helpful. Thanks.


Solution

  • Code

    from pyspark.sql import functions as F

    cols = ['value_1', 'value_2', 'value_3',
            'value_4', 'value_5', 'value_6']

    # Deduplicate the values and drop the nulls, keeping one array per row
    arr = F.array_compact(F.array_distinct(F.array(*cols)))

    # Unpack the array back into the original column names; indexing past the end yields null
    df1 = df.select('id', *[arr[i].alias(c) for i, c in enumerate(cols)])
    

    How this works

    Create an array from the value columns, remove the duplicate values with array_distinct, and then drop the null values with array_compact. Finally, use a select expression to pull the elements out of the resulting array and assign them back to the original column names; positions beyond the end of the array come back as null, which is what fills the trailing columns.
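
    To see each stage on its own, a small sketch along these lines (the column names raw, deduped and compacted are only illustrative) keeps every intermediate result as a separate array column:

    steps = df.select(
        'id',
        F.array(*cols).alias('raw'),                         # all value columns packed into one array
        F.array_distinct(F.array(*cols)).alias('deduped'),   # duplicates removed; a null may remain
        arr.alias('compacted'),                              # duplicates removed and nulls dropped
    )
    steps.show(truncate=False)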

    Result

    df1.show()
    
    +---+-------+-------+-------+-------+-------+-------+
    |id |value_1|value_2|value_3|value_4|value_5|value_6|
    +---+-------+-------+-------+-------+-------+-------+
    |1  |1011   |1012   |2018   |null   |null   |null   |
    |2  |1012   |2018   |2022   |null   |null   |null   |
    |7  |1011   |1016   |2018   |null   |null   |null   |
    |8  |1011   |1012   |1014   |2018   |null   |null   |
    +---+-------+-------+-------+-------+-------+-------+
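
    One caveat: array_compact is only available from Spark 3.4 onwards. On earlier versions, a rough equivalent (a sketch, not tested against older releases) is to drop the nulls with the higher-order filter function, available since Spark 3.1:

    # Assumed alternative for Spark < 3.4: filter out nulls instead of array_compact
    arr = F.filter(F.array_distinct(F.array(*cols)), lambda x: x.isNotNull())
    df1 = df.select('id', *[arr[i].alias(c) for i, c in enumerate(cols)])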