I have a file such as C:/aaa a+b[1234].res.1.txt and I am trying to process it using SparkContext, e.g.:
...
val cache = sc.textFile(filename).cache()
val count = cache.filter(line => line.contains("e")).count()
...
Unfortunately this raises an exception:
Input Pattern file:/C:/aaa a+b[1234].res.1.txt matches 0 files
org.apache.hadoop.mapred.InvalidInputException: Input Pattern file:/C:/aaa a+b[1234].res.1.txt matches 0 files
at org.apache.hadoop.mapred.FileInputFormat.listStatus(FileInputFormat.java:251)
at org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:270)
This error is probably due to the brackets "[" and "]" in the filename: if I simplify the filename, I get results. How can I encode or escape the filename so that the request succeeds?
OK, after Kiran's suggestion I came up with a possible solution:
sc.textFile(filename.replace("[","?").replace("]","?"))
The '?' matches any single character. Although that might work in my use case, I wonder if there isn't anything better, since the wildcard could obviously match two files when I only want to read one.
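
Note that Hadoop treats the path passed to textFile as a glob pattern, which is why "[1234]" is interpreted as a character class rather than as literal text. A more precise alternative than widening the pattern with '?' is to escape the glob metacharacters, so the pattern can only ever match the one literal file. Below is a sketch, assuming Hadoop's glob syntax honors backslash escapes (as org.apache.hadoop.fs.GlobPattern does); escapeGlob is a hypothetical helper name:

// Escape the Hadoop glob metacharacters [ ] { } * ? so the path is taken
// literally. '\' is deliberately left unescaped so Windows-style path
// separators are not mangled.
def escapeGlob(path: String): String =
  path.replaceAll("""([\[\]{}*?])""", """\\$1""")

val cache = sc.textFile(escapeGlob(filename)).cache()
val count = cache.filter(line => line.contains("e")).count()

// Optional sanity check (assumption: globStatus resolves the same pattern
// that textFile will see): confirm that exactly one file matches.
import org.apache.hadoop.fs.{FileSystem, Path}
val fs = FileSystem.get(sc.hadoopConfiguration)
val matched = fs.globStatus(new Path(escapeGlob(filename)))
assert(matched.length == 1, s"expected 1 file, matched ${matched.length}")

Unlike the '?' replacement, the escaped pattern cannot accidentally pick up a second file.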