Tags: scala, apache-spark, rdd

Loading files based on pattern matching in Spark


I have 31 input files, named from date=2018-01-01 through date=2018-01-31.

I am able to load all of these files into an RDD like this:

val input = sc.textFile("hdfs://user/cloudera/date=*")

But what if I want to load only one week's worth of files (date=2018-01-15 through date=2018-01-22)?


Solution

  • You can pass the files to textFile individually by joining their paths with commas (,):

    val files = (15 to 22).map(
      day => "hdfs://user/cloudera/date=2018-01-" + "%02d".format(day)
    ).mkString(",")
    

    which produces:

    hdfs://user/cloudera/date=2018-01-15,hdfs://user/cloudera/date=2018-01-16,hdfs://user/cloudera/date=2018-01-17,hdfs://user/cloudera/date=2018-01-18,hdfs://user/cloudera/date=2018-01-19,hdfs://user/cloudera/date=2018-01-20,hdfs://user/cloudera/date=2018-01-21,hdfs://user/cloudera/date=2018-01-22
    

    and you can call it this way:

    val input = sc.textFile(files)
    

    Notice the "%02d".format(day) formatting of the day, which adds a leading 0 for days 1 through 9.
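
  • Alternatively, paths passed to textFile are resolved with Hadoop's glob matching, which supports brace alternation ({a,b,...}). Here is a sketch of the same selection as a single pattern string, assuming the same HDFS layout as above:

    // Build a Hadoop glob with brace alternation: {15,16,...,22}.
    // Glob syntax has no numeric ranges, so the day list is generated in Scala.
    val pattern = "hdfs://user/cloudera/date=2018-01-{" +
      (15 to 22).map("%02d".format(_)).mkString(",") + "}"

    val input = sc.textFile(pattern)

    Hadoop expands the braces when listing the input directories, so this should load the same eight days as the comma-joined list; which form to use is mostly a readability choice.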