I have 31 input files, named date=2018-01-01 through date=2018-01-31.
I can load all of these files into an RDD this way:
val input = sc.textFile("hdfs://user/cloudera/date=*")
But what if I want to load the files for only 1 week? (files from date=2018-01-15 to date=2018-01-22).
You can pass the files individually to textFile by joining their paths with a comma (,):
val files = (15 to 22).map(
day => "hdfs://user/cloudera/date=2018-01-" + "%02d".format(day)
).mkString(",")
which produces:
hdfs://user/cloudera/date=2018-01-15,hdfs://user/cloudera/date=2018-01-16,hdfs://user/cloudera/date=2018-01-17,hdfs://user/cloudera/date=2018-01-18,hdfs://user/cloudera/date=2018-01-19,hdfs://user/cloudera/date=2018-01-20,hdfs://user/cloudera/date=2018-01-21,hdfs://user/cloudera/date=2018-01-22
and you can call it this way:
val input = sc.textFile(files)
Notice the formatting of the day ("%02d".format(day)), which adds the leading 0 to days between 1 and 9.
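If the range you need crosses a month boundary, building the day numbers by hand gets awkward. Here is a minimal sketch that derives the paths from real dates with java.time instead. The helper name datePaths and the example range are assumptions made for illustration; only the base path comes from the question.

```scala
import java.time.LocalDate
import java.time.temporal.ChronoUnit

// Hypothetical helper: builds a comma-separated list of paths,
// one per day, for every date from `start` to `end` inclusive.
def datePaths(base: String, start: LocalDate, end: LocalDate): String = {
  // Number of whole days separating the two dates
  val days = ChronoUnit.DAYS.between(start, end)
  (0L to days)
    // LocalDate.toString yields ISO format (yyyy-MM-dd), so months
    // and days are already zero-padded
    .map(offset => base + start.plusDays(offset))
    .mkString(",")
}

// Example range chosen to span the end of January
val files = datePaths(
  "hdfs://user/cloudera/date=",
  LocalDate.of(2018, 1, 29),
  LocalDate.of(2018, 2, 2)
)

// With a SparkContext in scope, load it the same way as before:
// val input = sc.textFile(files)
```

Because the dates are generated by LocalDate, the zero-padding and the month rollover are handled for you, and an end date earlier than the start date simply gives an empty string.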