SparkR : How To make a "INDEX MATCH / VLOOKUP"

INDEX MATCH / VLOOKUP IN SPARKR

I'm Trying to make a "Index match / vlookup" at Rstudio with the package SparkR, Do any one have already done this before ? I'm trying to use the function locate to aproach the same method with the function match at normal RStudio but I receive a message that is not able to locate with two columns, so I did a loop , but also didn't worked ... do anyone have any clue on this ?

iUWYData <- SparkR::nrow(UWYData)

for(i in iUWYData){
UWYData[i,2] = LoBUWY_Data[SparkR::locate(UWYData[i,1], LoBUWY_Data$ICRF_ID),"Version"]

}

UWYData$Version = LoBUWY_Data[SparkR::locate(UWYData$ICRF_ID, LoBUWY_Data$ICRF_ID),"Version"]

But I get these messages :

Error in UWYData[i, 1]: Expressions other than filtering predicates are not supported in the first parameter of extract operator [ or subset() method. traceback: eval(parse(text = code), envir = envir) test(Premium_rate_Index_script_v1 = Premium_rate_Index_script_v1) LoBUWY_Data[SparkR::locate(UWYData[i, 1], LoBUWY_Data$ICRF_ID), "Version"] LoBUWY_Data[SparkR::locate(UWYData[i, 1], LoBUWY_Data$ICRF_ID), "Version"] SparkR::locate(UWYData[i, 1], LoBUWY_Data$ICRF_ID) UWYData[i, 1] UWYData[i, 1] stop(paste0("Expressions other than filtering predicates are not supported ", "in the first parameter of extract operator [ or subset() method."))

And

Error in (function (classes, fdef, mtable) : unable to find an inherited method for function ‘locate’ for signature ‘"Column", "Column"’

traceback: eval(parse(text = code), envir = envir) test(Premium_rate_Index_script_v1 = Premium_rate_Index_script_v1) LoBUWY_Data[SparkR::locate(UWYData$ICRF_ID, LoBUWY_Data$ICRF_ID), "Version"] LoBUWY_Data[SparkR::locate(UWYData$ICRF_ID, LoBUWY_Data$ICRF_ID), "Version"] SparkR::locate(UWYData$ICRF_ID, LoBUWY_Data$ICRF_ID) (function (classes, fdef, mtable)

Solution

Use SparkR::join.

UWYData <- SparkR::createDataFrame(data.frame(
  id = 1:5
))
LoBUWY_Data <- SparkR::createDataFrame(data.frame(
  ICRF_ID = 1:4,
  Version = 11:14
))

result <- SparkR::join(
  UWYData,
  LoBUWY_Data,
  UWYData$id == LoBUWY_Data$ICRF_ID,
  joinType = "left_outer"
)

SparkR::collect(result)
#>   id ICRF_ID Version
#> 1  1       1      11
#> 2  3       3      13
#> 3  5      NA      NA
#> 4  4       4      14
#> 5  2       2      12