python apache-spark pyspark apache-spark-sql

Spark read table from a specific location


I have saved a dataframe as a table using the following code:

yearly_calltype.write.option("path", "/home/user/tables/firstProject").saveAsTable('yearly_calltype_count')

But how do I read this table from this location?

When I try:

spark.read.table("/home/user/tables/firstProject/yearly_calltype_count")

I get this error:

[PARSE_SYNTAX_ERROR] Syntax error at or near '/'.(line 1, pos 0)

== SQL ==
/home/user/tables/firstProject/yearly_calltype_count
^^^

I believe we cannot specify a location when reading a table, and that Spark looks for the table under the default /home/user/spark-warehouse location. That location can be changed through the spark.sql.warehouse.dir config, but I do not want to do that. Is there a way to read this table by passing its location to read.table?


Solution

  • The signature is def table(tableName: String): org.apache.spark.sql.DataFrame, so table accepts only a table name, not a table path.

    You can access the table data in any of the following ways:

    spark
    .read
    .option("path","/home/user/tables/firstProject")
    .table("yearly_calltype_count")
    .show(false)
    

    OR

    spark
    .read
    .table("yearly_calltype_count")
    .show(false)
    

    OR

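    // Read the files directly. The data files sit directly under the
    // "path" given at write time, not in a subdirectory named after the
    // table. This assumes the table was written in the default parquet format.
    spark
    .read
    .parquet("/home/user/tables/firstProject")
    .show(false)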
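
Since the question uses PySpark and the snippets above are Scala, here is a minimal Python sketch of the same two approaches: look the table up by name in the catalog, or read the files directly from the path. The helper name read_saved_table and its parameters are hypothetical, not part of Spark. It assumes an existing SparkSession is passed in and that the table was written in the default parquet format.

```python
def read_saved_table(spark, table_name=None, path=None, fmt="parquet"):
    """Hypothetical helper: read a saved table by catalog name or by path.

    `spark` is an existing SparkSession. Pass exactly one of
    `table_name` or `path`.
    """
    if (table_name is None) == (path is None):
        raise ValueError("pass exactly one of table_name or path")
    if table_name is not None:
        # Catalog lookup: the table's location was recorded by saveAsTable,
        # so only the name is needed here.
        return spark.read.table(table_name)
    # Direct file read: bypasses the catalog and reads the data files.
    return spark.read.format(fmt).load(path)


# Usage, assuming an active SparkSession named `spark`:
# df = read_saved_table(spark, table_name="yearly_calltype_count")
# df = read_saved_table(spark, path="/home/user/tables/firstProject")
# df.show(truncate=False)
```

If you are unsure where a table's files live, running spark.sql("DESCRIBE FORMATTED yearly_calltype_count") should list its Location among the table details.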