Tags: r, apache-spark-sql, sparkr, locf

Equivalent of na.locf in SparkR


I am new to R and am trying to rewrite some R code in SparkR. One of the operations on a data.table named costTbl (which has 5 other columns) is:

costTbl[,cost:=na.locf(cost,na.rm=FALSE),by=product_id]
costTbl[,cost:=na.locf(cost,na.rm=FALSE, fromLast=TRUE),by=product_id]

I am unable to find an equivalent operation in SparkR. I thought gapply could be used by grouping the DataFrame on product_id and performing this operation on each group, but I have not been able to make the code work.

Is gapply the right approach? Is there some other way of achieving this?


Solution

  • I was finally able to perform LOCF with SparkR UDFs while reusing the existing native R code. gapply works for this use case: group the DataFrame on the product_id column and apply the fill to each group.

    I have shared my findings here: https://shbhmrzd.medium.com/stl-and-holt-from-r-to-sparkr-1815bacfe1cc
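For readers who want to see the shape of the gapply approach, here is a minimal sketch. It is not the code from the linked article. It assumes a hypothetical ordering column named `ts` (rows inside a group arrive in no guaranteed order, and LOCF only makes sense on ordered rows), and the function names `fill_cost` and `fill_cost_by_product` are made up for illustration. The fill itself is written in base R so that nothing extra has to be installed on the workers; `zoo::na.locf` could be used instead if zoo is available on every worker node.

```r
# Per-group function in the form gapply expects: it receives the grouping key
# and the group's rows as a local R data.frame, and returns a data.frame.
fill_cost <- function(key, pdf) {
  # Base R last-observation-carried-forward; leading NAs are kept,
  # like na.locf(x, na.rm = FALSE). Defined inside so it ships with the closure.
  locf <- function(x, from_last = FALSE) {
    if (from_last) return(rev(locf(rev(x))))
    idx <- cumsum(!is.na(x))        # how many non-NA values seen so far (0 = none yet)
    c(NA, x[!is.na(x)])[idx + 1]    # position 1 holds NA for the "none yet" case
  }
  pdf <- pdf[order(pdf$ts), ]                   # ASSUMPTION: `ts` defines row order
  pdf$cost <- locf(pdf$cost)                    # forward fill
  pdf$cost <- locf(pdf$cost, from_last = TRUE)  # then backward fill what is left
  pdf
}

# Wrapper around SparkR::gapply. The output has the same columns as the input,
# so the input schema is reused as the output schema.
fill_cost_by_product <- function(sdf) {
  SparkR::gapply(sdf, "product_id", fill_cost, SparkR::schema(sdf))
}
```

With an active SparkR session and a SparkDataFrame holding the cost table, the call would be `fill_cost_by_product(costDf)`, where `costDf` is a placeholder name. Reusing the input schema only works while the returned columns keep their names, order, and types; if the function adds or drops columns, the schema passed to gapply has to be written out to match.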