
Reading / Extracting Data from Databricks Database (hive_metastore) with PySpark


I am trying to read in data from the Databricks hive_metastore with PySpark. In the screenshot below, I am trying to read in the table called 'trips', which is located in the database nyctaxi.

Typically, if this table were located on an Azure SQL server, I would use code like the following:

df = spark.read.format("jdbc")\
    .option("url", jdbcUrl)\
    .option("dbtable", tableName)\
    .load()

Or if the table were in ADLS (Azure Data Lake Storage), I would use code similar to the following:

df = spark.read.csv("adl://mylake.azuredatalakestore.net/tableName.csv", header=True)

Can someone let me know how I would read in the table below using PySpark from the Databricks database:

[screenshot]

This additional screenshot may also help:

[screenshot]

OK, I've just realized that I think I should be asking how to read tables from the "samples" metastore.

In any case I would like help reading in the "trips" table from the nyctaxi database please.


Solution

  • The samples catalog can be accessed using spark.table("catalog.schema.table").

    So you should be able to access the table using:

    df = spark.table("samples.nyctaxi.trips")
    

    Note also that if you are working directly in Databricks notebooks, the Spark session is already available as spark, so there is no need to call getOrCreate().
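
    To illustrate how spark.table() resolves a registered table name, here is a minimal local sketch. It builds its own SparkSession (which you would not do on Databricks) and registers a small hypothetical "trips" DataFrame as a temp view standing in for samples.nyctaxi.trips, since the real samples catalog is only available inside a Databricks workspace:

    ```python
    from pyspark.sql import SparkSession

    # Local session for illustration only; on Databricks, `spark` already exists.
    spark = (
        SparkSession.builder
        .master("local[1]")
        .appName("trips-demo")
        .getOrCreate()
    )

    # Hypothetical mini "trips" data standing in for samples.nyctaxi.trips.
    trips = spark.createDataFrame(
        [(1.1, 10.0), (2.5, 25.0)],
        ["trip_distance", "fare_amount"],
    )
    trips.createOrReplaceTempView("trips")

    # Resolve the table by name, exactly as you would with
    # spark.table("samples.nyctaxi.trips") on Databricks.
    df = spark.table("trips")
    df.show()
    ```

    The equivalent SQL form, spark.sql("SELECT * FROM samples.nyctaxi.trips"), returns the same DataFrame on Databricks.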