apache-spark, parquet

Read several Parquet files at the same time in Spark


I can read several JSON files at the same time using * (star) as a wildcard:

sqlContext.jsonFile('/path/to/dir/*.json')

Is there any way to do the same thing for Parquet? The star doesn't work there.


Solution

  • See this issue on the Spark JIRA. Wildcards in Parquet paths are supported from Spark 1.4 onwards.

    Without upgrading to 1.4, you could either point at the top-level directory:

    sqlContext.parquetFile('/path/to/dir/')
    

    which will load all files in the directory. Alternatively, you could use the HDFS API to find the files you want and pass them to parquetFile, which accepts varargs.
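
A minimal sketch of the second option, under some loud assumptions: it uses Python's standard glob module on a local filesystem as a stand-in for the HDFS API mentioned above, and the helper names list_parquet_files and load_parquet_files are hypothetical. It also assumes the parquetFile on your Spark version accepts several paths, as described above.

```python
import glob
import os


def list_parquet_files(directory, pattern="*.parquet"):
    """Return the sorted paths in `directory` that match `pattern`.

    Assumption: the files live on a local filesystem. For HDFS you would
    list the paths through the HDFS API instead of the glob module.
    """
    return sorted(glob.glob(os.path.join(directory, pattern)))


def load_parquet_files(sql_context, directory, pattern="*.parquet"):
    """Expand the wildcard ourselves and hand every path to parquetFile."""
    paths = list_parquet_files(directory, pattern)
    if not paths:
        raise ValueError("no files match %s in %s" % (pattern, directory))
    # parquetFile takes varargs, so unpack the list into separate arguments.
    return sql_context.parquetFile(*paths)
```

With an existing sqlContext this would be called as load_parquet_files(sqlContext, '/path/to/dir'). Expanding the pattern yourself keeps the filtering explicit, so you can load only some of the files in a directory.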