
SparklyR removing a tbl from Spark Context

Similar to: SparklyR removing a Table from Spark Context, but different because:

The question above asks how to remove a "table" from Spark, there created by the copy_to function. If the spark_read_csv() function is used instead, the resulting object appears to have a different class.

my_csv <- spark_read_csv(sc, "name")

Calling db_drop_table() on the result then fails with:

Error in UseMethod("db_drop_table") : 
  no applicable method for 'db_drop_table' applied to an object of class "c('tbl_spark', 'tbl_sql', 'tbl_lazy', 'tbl')"

This further indicates that the object created here is not a table but a tbl, Hadley's data type of choice.

Therefore, how can I remove a specific tbl and only that tbl from the memory/session without exiting the full session?

Bonus: is there a button in the RStudio Server interface that I've missed that will perform this process for me? I can't see an obvious way to do this in the Spark connection tab.


  • In general sparklyr:

    • Creates temporary views - this just creates corresponding entries in the metastore but doesn't occupy any resources
    • By default eagerly caches the data (memory parameter for reader is set to TRUE).

    You can remove a temporary view from the metastore using the dropTempView method:

    sc %>% spark_session() %>% invoke("catalog") %>%
      invoke("dropTempView", "my_table")

    or clear cache with clearCache method:

    sc %>% spark_session() %>% invoke("catalog") %>%
      invoke("clearCache")

    Unless you're worried about name clashes, you should probably focus on the second one. That said, I'd recommend avoiding eager caching altogether unless it is strictly necessary.
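
Since the question asks about removing one specific tbl rather than everything, here is a minimal sketch of how the two catalog calls might be combined for a single name. Note that clearCache clears every cached table, whereas Spark's Catalog also has an uncacheTable method that takes a table name. The helper name drop_spark_tbl, the view name "my_table" and the file path in the usage comments are hypothetical, chosen only for illustration, and the sketch assumes an open sparklyr connection sc.

```r
# Hypothetical helper: remove a single temporary view, and any data cached
# for it, by name. Other views and cached tables are left untouched.
drop_spark_tbl <- function(sc, name) {
  # Get the Catalog object from the underlying SparkSession.
  catalog <- sparklyr::invoke(sparklyr::spark_session(sc), "catalog")

  # Release cached data for this table only (Catalog.uncacheTable).
  # This must run before the view is dropped, while the name still resolves.
  sparklyr::invoke(catalog, "uncacheTable", name)

  # Remove the temporary view entry itself (Catalog.dropTempView).
  sparklyr::invoke(catalog, "dropTempView", name)
}

# Usage (assumes a live connection `sc`; the path is a placeholder):
# my_csv <- sparklyr::spark_read_csv(sc, "my_table", "path/to/file.csv",
#                                    memory = FALSE)  # skip eager caching
# drop_spark_tbl(sc, "my_table")
# rm(my_csv)  # the R-side tbl reference is a separate object
```

Reading with memory = FALSE follows the advice above about avoiding eager caching. sparklyr also exports tbl_uncache(sc, name), which may be a simpler option if only the cached data needs to be released and the view itself should stay.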