My data directory has the following structure:
+data
|-2014080700_00.txt
|-2014080700_01.txt
|-2014080701_00.txt
|- ...
|-2014080723_00.txt
|-2014080800_00.txt
|- ...
|-2014090800_00.txt
I know I can read all the files inside the data directory with a Tap like this:
Tap inTap = new Hfs( new TextLine(), "/path/to/data");
But I only want a specific part of the directory, for example only the files for the date 20140807, i.e. all files with the prefix 20140807. Is there any way to do this with Cascading? Or is there any way to do it with Scalding?
I don't think you can do it using Hfs, but it's possible using GlobHfs.
Try the following:
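Tap inTap = new GlobHfs( new TextLine(), "/path/to/data/*", new GlobFilter("20140807*"));
This creates a globbing tap: the path pattern "/path/to/data/*" expands to every entry inside the data directory, and the GlobFilter then keeps only those paths whose names match the "20140807*" glob. The filter is applied to the paths the pattern matches, so the pattern has to reach the files themselves (hence the trailing "*"); a bare "/path/to/data/" would only match the directory. You can also put the glob straight into the path pattern and skip the filter:
Tap inTap = new GlobHfs( new TextLine(), "/path/to/data/20140807*");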
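To preview which of the files from the question a given pattern would pick up, here is a plain-Java sketch that runs the question's file names through the JDK's glob PathMatcher. It does not touch Cascading or Hadoop at all, and it assumes that for simple patterns like a literal prefix followed by "*", or a "{7,8}" alternation, the JDK glob syntax behaves the same as the Hadoop glob syntax used by GlobHfs and GlobFilter. The class GlobPreview and its select method are hypothetical helpers made up for illustration.

```java
import java.nio.file.FileSystems;
import java.nio.file.PathMatcher;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

public class GlobPreview {

    // Hypothetical helper: returns the names accepted by the glob, in input order.
    // Assumption: the JDK "glob:" syntax matches Hadoop's glob syntax for these
    // simple patterns (literal prefix + '*', and {a,b} alternation).
    static List<String> select(String glob, List<String> names) {
        PathMatcher matcher = FileSystems.getDefault().getPathMatcher("glob:" + glob);
        List<String> selected = new ArrayList<>();
        for (String name : names) {
            // Each name is a single relative path component, like a file inside /path/to/data
            if (matcher.matches(Paths.get(name))) {
                selected.add(name);
            }
        }
        return selected;
    }

    public static void main(String[] args) {
        // File names taken from the directory listing in the question
        List<String> files = List.of(
                "2014080700_00.txt",
                "2014080700_01.txt",
                "2014080701_00.txt",
                "2014080723_00.txt",
                "2014080800_00.txt",
                "2014090800_00.txt");

        // Only the files for 2014-08-07
        System.out.println(select("20140807*", files));

        // Braces select several days at once: 2014-08-07 and 2014-08-08
        System.out.println(select("2014080{7,8}*", files));
    }
}
```

Running main prints the four 20140807 files first, then those four plus 2014080800_00.txt; 2014090800_00.txt is never selected. The same brace form should work in the GlobHfs path pattern if you need more than one day, though it's worth checking against your Hadoop version.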