What is the best way to find the full path (S3 key) to data stored via the AWS Glue Data Catalog using Spark (or PySpark)?
For example, if I saved data the following way:
my_spark_dataframe \
    .write.mode("overwrite") \
    .format("parquet") \
    .saveAsTable("database_name.table_name")
One way is to query the table's formatted metadata and extract the Location row:
from pyspark.sql.functions import col

full_s3_path = spark_session \
    .sql("describe formatted database_name.table_name") \
    .filter(col("col_name") == "Location") \
    .select("data_type").head()[0]
This will return something like:
# full_s3_path=s3://some_s3_bucket/key_to_table_name
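
Alternatively, since the table metadata lives in the Glue Data Catalog, you can ask Glue directly with boto3 instead of going through Spark SQL; the S3 path is stored on the table's StorageDescriptor. A minimal sketch, assuming the same placeholder database and table names as above and that your AWS credentials/region are picked up from the usual configuration chain:

import boto3

# Glue Data Catalog client (credentials/region resolved from the standard AWS config)
glue = boto3.client("glue")

# Fetch the table definition from the Glue Data Catalog
response = glue.get_table(DatabaseName="database_name", Name="table_name")

# The S3 location is recorded on the table's storage descriptor
full_s3_path = response["Table"]["StorageDescriptor"]["Location"]

This variant does not need a Spark session at all, which can be convenient in lightweight contexts such as a Lambda function or a small Glue job.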