I am applying `row_number` over a window partitioned by one column. How do I reset the row number back to 1 once it reaches 9999 in PySpark?
| partcode | row number |
|---|---|
| 12345 | 0001 |
| 12345 | 0002 |
| 12345 | 0003 |
| 12345 | 9999 (reaches max) |
| 12345 | 0001 |
| 12345 | 0002 |
After generating the row number:

1. Subtract 1 from the row number to get a row number starting from zero.
2. Take the result modulo the cut-off value (9999).
3. Add 1 to the modulo result to get a row number that starts from 1 and wraps back to 1 at row number cut-off value + 1.

In my example I have used a smaller cut-off value of 3. A quick sanity check of the formula in plain Python (see below) shows the wrap-around before applying it in Spark.
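    # sanity check of (n - 1) % cut_off + 1 with cut_off = 3
    cut_off = 3
    for n in range(1, 8):
        print(n, '->', (n - 1) % cut_off + 1)
    # 1 -> 1, 2 -> 2, 3 -> 3, 4 -> 1, 5 -> 2, 6 -> 3, 7 -> 1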
from pyspark.sql import Window
from pyspark.sql import functions as F

# 10 rows, all with the same partcode
df = spark.createDataFrame([(12345,) for _ in range(10)], ("partcode",))

window_spec = Window.partitionBy("partcode").orderBy("partcode")
cut_off = 3

# generate the row number, then remap it into the range 1..cut_off
# and left-pad with zeros to 4 digits
df.withColumn("row number", F.row_number().over(window_spec))\
  .withColumn("row number",
              F.lpad(((F.col("row number") - F.lit(1)) % F.lit(cut_off)) + F.lit(1), 4, '0'))\
  .show()
+--------+----------+
|partcode|row number|
+--------+----------+
| 12345| 0001|
| 12345| 0002|
| 12345| 0003|
| 12345| 0001|
| 12345| 0002|
| 12345| 0003|
| 12345| 0001|
| 12345| 0002|
| 12345| 0003|
| 12345| 0001|
+--------+----------+
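For the real use case, set `cut_off = 9999`. Note that ordering by the partition column itself, as in the demo, leaves the order of rows within each `partcode` arbitrary; in practice you would order by whatever column defines the sequence. A minimal sketch, assuming a hypothetical ordering column `event_time`:

    from pyspark.sql import Window
    from pyspark.sql import functions as F

    cut_off = 9999  # wrap back to 0001 after 9999

    # event_time is a hypothetical column; replace it with whatever
    # defines the row order within each partcode in your data.
    window_spec = Window.partitionBy("partcode").orderBy("event_time")

    df = df.withColumn("row number", F.row_number().over(window_spec)) \
           .withColumn("row number",
                       F.lpad(((F.col("row number") - F.lit(1)) % F.lit(cut_off)) + F.lit(1), 4, "0"))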