I need to load a dataframe from a Hive table, and for that I followed the instructions in the Apache Spark 2.3 docs (https://spark.apache.org/docs/latest/sparkr.html). I'm doing this in a Zeppelin notebook.
Can someone please explain how to create a dataframe using SparkR, or what I'm doing wrong? Any answer is appreciated.
Queries can be expressed in HiveQL.
results <- sql("FROM src SELECT key, value")
sp_df <- sql("SELECT * FROM sparkr_test")
head(sp_df)
[1] "SELECT * FROM sparkr_test"
Where is your data located, and have you registered the source data as a table? You need to run something like:
sql("CREATE TABLE IF NOT EXISTS sparkr_test (column1 INT, column2 STRING ...) USING hive")
sql("LOAD DATA LOCAL INPATH 'path/to/data/data.txt' INTO TABLE sparkr_test")
before you can query the table.
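Putting the steps together, a minimal sketch of the whole sequence might look like the following. It assumes Spark was built with Hive support, that the data file really exists at the placeholder path, and that the table has just the two hypothetical columns shown (replace them with your real schema). In Zeppelin a session usually already exists, so the sparkR.session() call may be unnecessary there. The calls are written as SparkR::sql only as a precaution, in case another sql function is masking SparkR's in the notebook.

```r
library(SparkR)

# Start (or reuse) a Spark session with Hive support enabled.
# Assumption: Spark was built with Hive; in Zeppelin a session may already exist.
sparkR.session(enableHiveSupport = TRUE)

# Register the table. The two-column schema is hypothetical; use your own columns.
SparkR::sql("CREATE TABLE IF NOT EXISTS sparkr_test (column1 INT, column2 STRING) USING hive")

# Load the source data. The path is a placeholder, not a real location.
SparkR::sql("LOAD DATA LOCAL INPATH 'path/to/data/data.txt' INTO TABLE sparkr_test")

# Query the table; the result is a SparkDataFrame.
sp_df <- SparkR::sql("SELECT * FROM sparkr_test")

# head() collects the first rows into a local R data.frame.
local_df <- head(sp_df)
print(local_df)
```

If this works, class(sp_df) should report a SparkDataFrame rather than a character string.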