Search code examples
rsparkr

SparkR : How To make a "INDEX MATCH / VLOOKUP"


INDEX MATCH / VLOOKUP IN SPARKR

I'm Trying to make a "Index match / vlookup" at Rstudio with the package SparkR, Do any one have already done this before ? I'm trying to use the function locate to aproach the same method with the function match at normal RStudio but I receive a message that is not able to locate with two columns, so I did a loop , but also didn't worked ... do anyone have any clue on this ?

iUWYData <- SparkR::nrow(UWYData)

for(i in iUWYData){
UWYData[i,2] = LoBUWY_Data[SparkR::locate(UWYData[i,1], LoBUWY_Data$ICRF_ID),"Version"]

}

OR

UWYData$Version = LoBUWY_Data[SparkR::locate(UWYData$ICRF_ID, LoBUWY_Data$ICRF_ID),"Version"]

But I get these messages :

Error in UWYData[i, 1]: Expressions other than filtering predicates are not supported in the first parameter of extract operator [ or subset() method. traceback: eval(parse(text = code), envir = envir) test(Premium_rate_Index_script_v1 = Premium_rate_Index_script_v1) LoBUWY_Data[SparkR::locate(UWYData[i, 1], LoBUWY_Data$ICRF_ID), "Version"] LoBUWY_Data[SparkR::locate(UWYData[i, 1], LoBUWY_Data$ICRF_ID), "Version"] SparkR::locate(UWYData[i, 1], LoBUWY_Data$ICRF_ID) UWYData[i, 1] UWYData[i, 1] stop(paste0("Expressions other than filtering predicates are not supported ", "in the first parameter of extract operator [ or subset() method."))

And

Error in (function (classes, fdef, mtable) : unable to find an inherited method for function ‘locate’ for signature ‘"Column", "Column"’

traceback: eval(parse(text = code), envir = envir) test(Premium_rate_Index_script_v1 = Premium_rate_Index_script_v1) LoBUWY_Data[SparkR::locate(UWYData$ICRF_ID, LoBUWY_Data$ICRF_ID), "Version"] LoBUWY_Data[SparkR::locate(UWYData$ICRF_ID, LoBUWY_Data$ICRF_ID), "Version"] SparkR::locate(UWYData$ICRF_ID, LoBUWY_Data$ICRF_ID) (function (classes, fdef, mtable)


Solution

  • Use SparkR::join.

    UWYData <- SparkR::createDataFrame(data.frame(
      id = 1:5
    ))
    LoBUWY_Data <- SparkR::createDataFrame(data.frame(
      ICRF_ID = 1:4,
      Version = 11:14
    ))
    
    result <- SparkR::join(
      UWYData,
      LoBUWY_Data,
      UWYData$id == LoBUWY_Data$ICRF_ID,
      joinType = "left_outer"
    )
    
    SparkR::collect(result)
    #>   id ICRF_ID Version
    #> 1  1       1      11
    #> 2  3       3      13
    #> 3  5      NA      NA
    #> 4  4       4      14
    #> 5  2       2      12