INDEX MATCH / VLOOKUP IN SPARKR
I'm Trying to make a "Index match / vlookup" at Rstudio with the package SparkR, Do any one have already done this before ? I'm trying to use the function locate to aproach the same method with the function match at normal RStudio but I receive a message that is not able to locate with two columns, so I did a loop , but also didn't worked ... do anyone have any clue on this ?
iUWYData <- SparkR::nrow(UWYData)
for(i in iUWYData){
UWYData[i,2] = LoBUWY_Data[SparkR::locate(UWYData[i,1], LoBUWY_Data$ICRF_ID),"Version"]
}
OR
UWYData$Version = LoBUWY_Data[SparkR::locate(UWYData$ICRF_ID, LoBUWY_Data$ICRF_ID),"Version"]
But I get these messages :
Error in UWYData[i, 1]: Expressions other than filtering predicates are not supported in the first parameter of extract operator [ or subset() method. traceback: eval(parse(text = code), envir = envir) test(Premium_rate_Index_script_v1 = Premium_rate_Index_script_v1) LoBUWY_Data[SparkR::locate(UWYData[i, 1], LoBUWY_Data$ICRF_ID), "Version"] LoBUWY_Data[SparkR::locate(UWYData[i, 1], LoBUWY_Data$ICRF_ID), "Version"] SparkR::locate(UWYData[i, 1], LoBUWY_Data$ICRF_ID) UWYData[i, 1] UWYData[i, 1] stop(paste0("Expressions other than filtering predicates are not supported ", "in the first parameter of extract operator [ or subset() method."))
And
Error in (function (classes, fdef, mtable) : unable to find an inherited method for function ‘locate’ for signature ‘"Column", "Column"’
traceback: eval(parse(text = code), envir = envir) test(Premium_rate_Index_script_v1 = Premium_rate_Index_script_v1) LoBUWY_Data[SparkR::locate(UWYData$ICRF_ID, LoBUWY_Data$ICRF_ID), "Version"] LoBUWY_Data[SparkR::locate(UWYData$ICRF_ID, LoBUWY_Data$ICRF_ID), "Version"] SparkR::locate(UWYData$ICRF_ID, LoBUWY_Data$ICRF_ID) (function (classes, fdef, mtable)
Use SparkR::join
.
UWYData <- SparkR::createDataFrame(data.frame(
id = 1:5
))
LoBUWY_Data <- SparkR::createDataFrame(data.frame(
ICRF_ID = 1:4,
Version = 11:14
))
result <- SparkR::join(
UWYData,
LoBUWY_Data,
UWYData$id == LoBUWY_Data$ICRF_ID,
joinType = "left_outer"
)
SparkR::collect(result)
#> id ICRF_ID Version
#> 1 1 1 11
#> 2 3 3 13
#> 3 5 NA NA
#> 4 4 4 14
#> 5 2 2 12