
How to store data in a Spark cluster using sparklyr?


If I connect to a Spark cluster, copy some data to it, and disconnect, ...

library(dplyr)
library(sparklyr)
sc <- spark_connect("local")
copy_to(sc, iris)
src_tbls(sc)
## [1] "iris"
spark_disconnect(sc)

then the next time I connect to Spark, the data is not there.

sc <- spark_connect("local")
src_tbls(sc)
## character(0)
spark_disconnect(sc)

This is different from working with a database, where the data is simply there regardless of how many times you connect.

How do I persist data in the Spark cluster between connections?

I thought sdf_persist() might be what I want, but it appears not to be.


Solution

  • Spark is an engine that runs on a computer or cluster to execute tasks; it is not a database or a file system. Data held in a Spark session is gone once you disconnect. When you are done, save the data out to a file system (for example as Parquet files) and load it back in during your next session; a sketch is shown after the link below.

    https://en.wikipedia.org/wiki/Apache_Spark
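As a minimal sketch of that idea, you could write the table to Parquet with spark_write_parquet() before disconnecting and read it back with spark_read_parquet() in the next session. The /tmp/iris_parquet path and the iris data are just placeholders for your own data and storage location.

    library(sparklyr)
    library(dplyr)

    # First session: copy the data into Spark, then write it out to disk
    sc <- spark_connect(master = "local")
    iris_tbl <- copy_to(sc, iris)
    spark_write_parquet(iris_tbl, path = "/tmp/iris_parquet")
    spark_disconnect(sc)

    # Next session: read the Parquet files back into the new Spark session
    sc <- spark_connect(master = "local")
    iris_tbl <- spark_read_parquet(sc, name = "iris", path = "/tmp/iris_parquet")
    src_tbls(sc)
    ## [1] "iris"
    spark_disconnect(sc)

In a local setup the Parquet files live on your machine's disk; on a real cluster you would point the path at shared storage (e.g. HDFS or S3) so every session can reach it.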