I am trying to read in data from the Databricks Hive_Metastore with PySpark. In the screenshot below, I am trying to read in the table called 'trips', which is located in the database nyctaxi.
Typically, if this table were located on an Azure SQL server, I would use code like the following:
df = spark.read.format("jdbc")\
.option("url", jdbcUrl)\
.option("dbtable", tableName)\
.load()
Or, if the table was in ADLS, I would use code similar to the following:
df = spark.read.csv("adl://mylake.azuredatalakestore.net/tableName.csv",header=True)
Can someone let me know how I would read in the table using PySpark from the Databricks database below:
The additional screenshot may also help.
OK, I've just realized that I think I should be asking how to read tables from the "samples" metastore. In any case, I would like help reading in the "trips" table from the nyctaxi database, please.
The samples catalog can be accessed using spark.table("catalog.schema.table").
So you should be able to access the table using:
df = spark.table("samples.nyctaxi.trips")
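If you prefer SQL, the same table can also be read with spark.sql; this is just an equivalent sketch using the samples.nyctaxi.trips name from your question:

# Equivalent read via a SQL query against the same three-level name
df = spark.sql("SELECT * FROM samples.nyctaxi.trips")
df.show(5)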
Note also that if you are working directly in Databricks notebooks, the Spark session is already available as spark, so there is no need to get or create one.
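If the same code needs to run outside a Databricks notebook (for example, in a standalone PySpark script), a minimal sketch would be to get or create the session first, assuming your environment is already configured to reach the workspace and its metastore:

from pyspark.sql import SparkSession

# Get the existing Spark session, or create one if none is active
spark = SparkSession.builder.getOrCreate()

df = spark.table("samples.nyctaxi.trips")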