After using sdf_pivot I was left with a huge number of NaN values. To proceed with my analysis I need to replace the NaN values with 0. I have tried this:
data <- data %>%
spark_apply(function(e) ifelse(is.nan(e),0,e))
This generates the following error:
Error in file(con, "r") : cannot open the connection
In addition: Warning message:
In file(con, "r") :
cannot open file
'C:\.........\file18dc5a1c212e_spark.log':Permission denied
I'm using Spark 2.2.0 and the latest version of sparklyr.
Does anyone have an idea how to fix this issue? Thanks.
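You seem to have two different problems here:
1. A permissions issue: the error shows that the Spark log file cannot be opened (Permission denied). Make sure you have the required permissions on that location, and adjust them if necessary.
2. NULL replacement.
The latter can be solved using built-in functions, and there is no need for the inefficient spark_apply: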
df <- copy_to(sc,
data.frame(id=c(1, 1, 2, 3), key=c("a", "b", "a", "d"), value=1:4))
pivoted <- sdf_pivot(df, id ~ key)
# Source: table<sparklyr_tmp_f0550e429aa> [?? x 4]
# Database: spark_connection
     id     a     b     d
  <dbl> <dbl> <dbl> <dbl>
1     1     1     1   NaN
2     3   NaN   NaN     1
3     2     1   NaN   NaN
pivoted %>% na.replace(0)
# Source: table<sparklyr_tmp_f0577e16bf1> [?? x 4]
# Database: spark_connection
     id     a     b     d
  <dbl> <dbl> <dbl> <dbl>
1     1     1     1     0
2     3     0     0     1
3     2     1     0     0
Tested with sparklyr
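If only some of the pivoted columns should be changed, a per-column variant may be useful. The following is a minimal sketch rather than a tested solution for this exact setup. It assumes a local Spark installation (`master = "local"` is an assumption), and it relies on dplyr passing the unknown function `isnan` through to Spark SQL, where `isnan` is a built-in. If the missing cells turn out to be NULL rather than NaN, `is.na(d)` would be the check to use instead.

```r
library(sparklyr)
library(dplyr)

# Assumption: a local Spark installation is available.
sc <- spark_connect(master = "local")

# Same toy data as in the example above.
local_df <- data.frame(id = c(1, 1, 2, 3),
                       key = c("a", "b", "a", "d"),
                       value = 1:4)
df <- copy_to(sc, local_df, overwrite = TRUE)
pivoted <- sdf_pivot(df, id ~ key)

# isnan() is not evaluated in R; it is translated as-is and run by Spark SQL.
# Only column d is modified; columns a and b keep their NaN values.
pivoted %>%
  mutate(d = ifelse(isnan(d), 0, d))

spark_disconnect(sc)
```

Like na.replace, this runs entirely inside Spark, so it avoids the R worker processes that spark_apply starts.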