apache-spark-sql apache-zeppelin sparkr

Apache Zeppelin - Can't load a DataFrame from a Hive table using SparkR


I need to load a DataFrame from a Hive table, and for that I followed these instructions from the Apache Spark 2.3 docs (https://spark.apache.org/docs/latest/sparkr.html). I'm running this in a Zeppelin notebook.

Can someone please explain how to create a DataFrame from a Hive table using SparkR, or what I'm doing wrong? Any answer is appreciated.

Documentation

Queries can be expressed in HiveQL.

    results <- sql("FROM src SELECT key, value")

My code:

    sp_df <- sql("SELECT * FROM sparkr_test")

Results of my code:

    head(sp_df)
    [1] "SELECT * FROM sparkr_test"


Solution

  • Where is your data located, and have you registered the source data as a table? You need to run something like:

    sql("CREATE TABLE IF NOT EXISTS sparkr_test (column1 INT, column2 STRING ...) USING hive")
    sql("LOAD DATA LOCAL INPATH 'path/to/data/data.txt' INTO TABLE sparkr_test")

    before you can query the table.
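A minimal end-to-end sketch of the steps above in SparkR, assuming Spark 2.3 with Hive support and an R session (for example a Zeppelin SparkR paragraph) where `library(SparkR)` loads. The schema (`column1 INT, column2 STRING`) and the file path are hypothetical placeholders; replace them with your own. The last step checks that `sql()` returned a `SparkDataFrame` rather than a plain character string like the one shown in the question's output.

```r
# Assumption: SparkR for Spark 2.3 is installed and Hive support is available.
library(SparkR)

# sparkR.session() gets the existing SparkSession or initializes a new one;
# enableHiveSupport = TRUE lets sql() see tables in the Hive metastore.
sparkR.session(enableHiveSupport = TRUE)

# Hypothetical schema and path: substitute your real columns and data file.
sql("CREATE TABLE IF NOT EXISTS sparkr_test (column1 INT, column2 STRING) USING hive")
sql("LOAD DATA LOCAL INPATH 'path/to/data/data.txt' INTO TABLE sparkr_test")

# Once the table is registered, this query returns a SparkDataFrame.
sp_df <- sql("SELECT * FROM sparkr_test")

# Sanity check: a SparkDataFrame, not a character vector, is expected here.
print(inherits(sp_df, "SparkDataFrame"))
head(sp_df)
```

If the check prints `FALSE`, the `sql()` being called is not producing a SparkDataFrame, so the session or interpreter setup is worth inspecting before looking at the table itself.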