I am trying to read data from the Databricks Hive_Metastore with PySpark. In the screenshot below, I am trying to read the table called 'trips', which is located in the database nyctaxi.
Typically, if this table were located on an Azure SQL server, I would use code like the following:
df = spark.read.format("jdbc")\
    .option("url", jdbcUrl)\
    .option("dbtable", tableName)\
    .load()
Or, if the table were in ADLS, I would use code similar to the following:
df = spark.read.csv("adl://mylake.azuredatalakestore.net/tableName.csv",header=True)
Can someone let me know how I would read the table below from the Databricks database using PySpark:
The additional screenshot may also help.
OK, I've just realized that I should probably be asking how to read tables from the "samples" metastore.
In any case, I would like help reading the "trips" table from the nyctaxi database, please.
The samples catalog can be accessed using spark.table("catalog.schema.table").
So you should be able to access the table using:
df = spark.table("samples.nyctaxi.trips")
Note also that if you are working directly in Databricks notebooks, the Spark session is already available as spark, so there is no need to get or create one.
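If you read from several catalogs, it may help to wrap the three-level name in a small helper. This is only a sketch: full_table_name and read_table are hypothetical names, not part of PySpark, and the session is passed in as an argument so the same code also works outside a notebook.

```python
def full_table_name(catalog, schema, table):
    """Build a three-level name such as 'samples.nyctaxi.trips'."""
    return ".".join([catalog, schema, table])


def read_table(spark, catalog, schema, table):
    """Return the table as a DataFrame by calling spark.table() on the full name."""
    return spark.table(full_table_name(catalog, schema, table))


# In a Databricks notebook, `spark` is already defined, so usage would be:
# df = read_table(spark, "samples", "nyctaxi", "trips")
# df.show(5)
```

Assuming the workspace exposes the legacy metastore as a catalog named hive_metastore, the same helper should work for those tables by passing "hive_metastore" as the catalog.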