
How to show the schema (including types) of a Parquet file from the command line or spark-shell?


I have worked out how to use spark-shell to show the field names, but the output is ugly and does not include the types:

val sqlContext = new org.apache.spark.sql.SQLContext(sc)

println(sqlContext.parquetFile(path))

prints:

ParquetTableScan [cust_id#114,blar_field#115,blar_field2#116], (ParquetRelation /blar/blar), None

Solution

  • You should be able to do this:

    sqlContext.read.parquet(path).printSchema()
    

    From the Spark docs:

    // Print the schema in a tree format
    df.printSchema()
    // root
    // |-- age: long (nullable = true)
    // |-- name: string (nullable = true)
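
If you want the names and types as values rather than a printed tree, something along these lines might work. This is only a sketch: it assumes you are in spark-shell on Spark 1.4 or later (so `sqlContext` is already defined and `read` is available), and it reuses the `/blar/blar` path from the question as a placeholder.

```scala
// Paste into spark-shell (assumed: Spark 1.4+, where `sqlContext` is predefined).
// The path is the placeholder from the question; substitute your own file or directory.
val path = "/blar/blar"
val df = sqlContext.read.parquet(path)

// df.schema is a StructType; each StructField carries name, data type, and nullability.
df.schema.fields.foreach { f =>
  println(s"${f.name}: ${f.dataType.simpleString} (nullable = ${f.nullable})")
}

// df.dtypes returns (columnName, typeName) pairs as plain strings,
// which is handy if you only need a flat name/type listing.
df.dtypes.foreach { case (name, tpe) => println(s"$name\t$tpe") }
```

Both loops only read the Parquet metadata needed to build the schema, so they should return quickly even for large files.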