apache-spark hive apache-spark-sql

How to get the value of the location for a Hive table using a Spark object?


I am interested in retrieving the location of a Hive table given a Spark object (SparkSession). One way to obtain this value is to parse the Location field from the output of the following SQL query:

describe formatted <table name>

I was wondering if there is another way to obtain the location without having to parse this output. An API would be preferable, since the output format of the above command may change between Hive versions. If an external dependency is needed, which one would it be? Is there sample Spark code that can obtain the location?
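For reference, the parsing approach from the question does not have to work on printed text: in Spark SQL, DESCRIBE FORMATTED returns a DataFrame, so the Location row can be selected by column. This is a minimal sketch assuming Spark's output columns are named col_name and data_type and that the row is labelled "Location" (older releases may print "Location:"); the object and method names are hypothetical. It still depends on the output layout, which is exactly the fragility the question wants to avoid.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, trim}

object DescribeLocation {
  // Hypothetical helper: returns the Location value from DESCRIBE FORMATTED,
  // or None if no such row is present (for example, for some views).
  def locationFromDescribe(spark: SparkSession, table: String): Option[String] = {
    spark.sql(s"DESCRIBE FORMATTED $table")
      // Accept both the current label and the older "Location:" form.
      .filter(trim(col("col_name")).isin("Location", "Location:"))
      .select(trim(col("data_type")))
      .collect()
      .headOption
      .map(_.getString(0))
  }
}
```

Usage: DescribeLocation.locationFromDescribe(spark, "zen.intent_master").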


Solution

  • First approach

    You can use the input_file_name function on a DataFrame.

    It returns the absolute path of the file each row was read from (for example, a part file inside the table directory).

    import org.apache.spark.sql.functions.input_file_name
    spark.read.table("zen.intent_master").select(input_file_name()).take(1)
    

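    Then extract the table path from it: strip the file name and, for a partitioned table, the partition directories. This only works if the table contains at least one file.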
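    The path extraction can be sketched as plain string handling. This is a minimal sketch, assuming you know how many partition columns the table has; TablePathFromFile, tableDir and partitionDepth are hypothetical names, and the example paths are illustrative only.

    ```scala
    object TablePathFromFile {
      /** Hypothetical helper: drops the file name plus `partitionDepth` partition
        * directories (such as "dt=2020-01-01") from a fully qualified file path. */
      def tableDir(filePath: String, partitionDepth: Int = 0): String = {
        require(partitionDepth >= 0, "partitionDepth must be non-negative")
        var dir = filePath
        // The first iteration removes the file name, each further one a partition directory.
        for (_ <- 0 to partitionDepth) {
          val cut = dir.lastIndexOf('/')
          require(cut > 0, s"path too short: $filePath")
          dir = dir.substring(0, cut)
        }
        dir
      }
    }
    ```

    For example, pass the string from spark.read.table("zen.intent_master").select(input_file_name()).head.getString(0) to tableDir. For an unpartitioned table stored at hdfs://nn:8020/user/hive/warehouse/zen.db/intent_master, this yields that directory. Note that head throws if the table has no files.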

  • Second approach

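    It is more of a hack: the class has to be placed in the org.apache.spark.sql.hive package because it relies on Spark's internal Hive client, which is not a public API and may change between Spark versions.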

    package org.apache.spark.sql.hive

    import java.net.URI

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.catalyst.catalog.{InMemoryCatalog, SessionCatalog}
    import org.apache.spark.sql.catalyst.parser.ParserInterface
    import org.apache.spark.sql.internal.{SessionState, SharedState}

    // Lives in org.apache.spark.sql.hive so it can reach the package-private Hive client.
    class TableDetail {
      def getTableLocation(table: String, spark: SparkSession): URI = {
        val sessionState: SessionState = spark.sessionState
        val sharedState: SharedState = spark.sharedState
        val catalog: SessionCatalog = sessionState.catalog
        val sqlParser: ParserInterface = sessionState.sqlParser

        // Only a Hive-backed catalog exposes a Hive client. On Spark 2.4+ externalCatalog
        // is an ExternalCatalogWithListener wrapper; match on its .unwrapped value there.
        val client = sharedState.externalCatalog match {
          case hiveCatalog: HiveExternalCatalog => hiveCatalog.client
          case _: InMemoryCatalog => throw new IllegalArgumentException(
            "In-memory catalog doesn't support the Hive client API")
          case other => throw new IllegalArgumentException(
            s"Unsupported catalog: ${other.getClass.getName}")
        }

        // Parse "db.table" (or just "table") into a TableIdentifier.
        val idtfr = sqlParser.parseTableIdentifier(table)
        require(catalog.tableExists(idtfr), s"$idtfr does not exist")

        // Resolve an unqualified name against the current database, as tableExists does.
        val db = idtfr.database.getOrElse(catalog.getCurrentDatabase)
        val rawTable = client.getTable(db, idtfr.table)
        rawTable.location
      }
    }
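On Spark 2.2 or later, a possibly simpler variant of the second approach skips the Hive client and asks Spark's session catalog for the table metadata directly, so it does not need to live in a Spark package. This is a sketch under that version assumption: SparkSession.sessionState is public there but marked unstable, so it may change between releases. The object and method names (CatalogTableLocation, tableLocation) are hypothetical.

```scala
import java.net.URI

import org.apache.spark.sql.SparkSession

object CatalogTableLocation {
  // Sketch assuming Spark 2.2+, where SparkSession.sessionState is public (unstable API).
  def tableLocation(spark: SparkSession, table: String): URI = {
    val state = spark.sessionState
    // Accepts "db.table" or "table" (resolved against the current database).
    val ident = state.sqlParser.parseTableIdentifier(table)
    // getTableMetadata throws NoSuchDatabaseException / NoSuchTableException if missing.
    state.catalog.getTableMetadata(ident).location
  }
}
```

Because it goes through the session catalog, this should also work without Hive support (the in-memory catalog). Views have no storage location, so location should throw an AnalysisException for them.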