Tags: r, tidyverse, sparklyr, databricks

Convert sql data table to sparklyr dataframe


I uploaded data.csv with Microsoft Azure Storage Explorer, then copied the URL and created a table in Databricks:

%sql 
DROP TABLE IF EXISTS data; 
CREATE TABLE IF NOT EXISTS data 
USING CSV 
OPTIONS (header "true", inferSchema "true") 
LOCATION "url/data.csv" 

Now I want to use sparklyr to manipulate "data".

How should I convert the above table to a sparklyr dataframe so I can use the full potential of sparklyr?


Solution

  • First you must initialise your sparklyr session as follows:

sc <- spark_connect(method = 'databricks')
    

    you can then read directly from your SQL tables using:

sdf_sql(sc, 'SELECT * FROM ...')
    

    and then perform all of the usual sparklyr/dplyr work as normal.
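    Putting the steps above together, a minimal sketch (assuming the table created in the question is named `data`; the column names used in your own pipeline would differ):

    ```r
    library(sparklyr)
    library(dplyr)

    # Connect to the Spark session of the current Databricks cluster
    sc <- spark_connect(method = "databricks")

    # Materialise the SQL table as a sparklyr (tbl_spark) data frame
    data_sdf <- sdf_sql(sc, "SELECT * FROM data")

    # Alternatively, reference the registered table lazily without a query:
    # data_sdf <- tbl(sc, "data")

    # dplyr verbs are translated to Spark SQL and executed on the cluster
    data_sdf %>%
      count()
    ```

    `sdf_sql()` runs the query eagerly and returns a Spark data frame, while `tbl(sc, "data")` builds a lazy reference that is only evaluated when you collect or compute; either works as input to the usual dplyr pipeline.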

    Note that Databricks clusters do not come pre-loaded with sparklyr, as Databricks steers you towards the SparkR API for interacting with your data instead. If you wish to use the sparklyr API, you must install and load sparklyr each time you start the cluster.
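
    For example, in a notebook cell run once after each cluster start (or move these lines into a cluster init script so they run automatically):

    ```r
    # sparklyr is not pre-installed on the cluster, so install it first
    install.packages("sparklyr")

    # Then load it and connect to the cluster's Spark session
    library(sparklyr)
    sc <- spark_connect(method = "databricks")
    ```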