Tags: apache-spark, hive, apache-spark-sql, parquet

Query data in subdirectories in Hive Partitions using Spark SQL


How can I force Spark SQL to recursively read data stored in Parquet format from subdirectories? In Hive, I can achieve this by setting a few Hive configs:

set hive.input.dir.recursive=true;
set hive.mapred.supports.subdirectories=true;
set hive.supports.subdirectories=true;
set mapred.input.dir.recursive=true;

I tried to set these configs through Spark SQL queries, but I always get 0 records, whereas Hive returns the expected results. I also put these configs in the hive-site.xml file, but nothing changed. How can I handle this issue?
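For reference, the failing attempt looks roughly like this (a sketch; the app name and `partitioned_table` are placeholders, and it assumes a Hive-enabled Spark 2.x session):

```python
from pyspark.sql import SparkSession

# Hive-enabled session (Spark 2.x); names below are placeholders
spark = SparkSession.builder \
    .appName("recursive-read") \
    .enableHiveSupport() \
    .getOrCreate()

# Setting the Hive recursion configs through Spark SQL...
spark.sql("set hive.input.dir.recursive=true")
spark.sql("set hive.mapred.supports.subdirectories=true")
spark.sql("set hive.supports.subdirectories=true")
spark.sql("set mapred.input.dir.recursive=true")

# ...still yields 0 rows for a Parquet table whose partition
# directories contain further subdirectories, because Spark's
# native Parquet reader does not honour these Hive settings
spark.sql("SELECT COUNT(*) FROM partitioned_table").show()
```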

Spark version: 2.1.0, with Hive 2.1.1 on emr-5.3.1.

By the way, this issue only appears with Parquet files; with JSON it works fine.


Solution

  • One solution to this problem is to force Spark to use the Hive Parquet reader by using a Hive-enabled context, which makes Spark able to read the files recursively.
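A minimal sketch of that approach (the app name and `my_table` are placeholders): setting `spark.sql.hive.convertMetastoreParquet` to `false` tells Spark to hand Parquet tables to Hive's SerDe instead of its native Parquet reader, so the Hive recursive-input settings take effect.

```python
from pyspark.sql import SparkSession

# Hive-enabled session; app and table names are placeholders
spark = SparkSession.builder \
    .appName("recursive-parquet") \
    .enableHiveSupport() \
    .getOrCreate()

# Fall back to Hive's Parquet SerDe instead of Spark's built-in reader
spark.conf.set("spark.sql.hive.convertMetastoreParquet", "false")

# Hive-side settings that enable recursive directory listing
spark.sql("set mapred.input.dir.recursive=true")
spark.sql("set hive.mapred.supports.subdirectories=true")

# Rows stored in subdirectories of the partition paths are now visible
df = spark.sql("SELECT * FROM my_table")
df.show()
```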