With HDFS's Java API, it's straightforward to read a file sequentially, one block at a time. What I want is to read the file one block at a time using something like HDFS's FileSplits. The end goal is to read a file in parallel across multiple machines, with each machine reading its own zone of blocks. Given an HDFS Path, how can I get the FileSplits or blocks?
MapReduce and other processing frameworks are not involved. This is strictly a file-system-level operation.
This is how you would get the block locations of a file in HDFS:
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.BlockLocation;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

FileSystem fs = FileSystem.get(new Configuration());
Path dataset = new Path(fs.getHomeDirectory(), <path-to-file>);
FileStatus datasetFile = fs.getFileStatus(dataset);

// Ask the NameNode for the block locations covering the whole file
BlockLocation[] myBlocks = fs.getFileBlockLocations(datasetFile, 0, datasetFile.getLen());
for (BlockLocation b : myBlocks) {
    System.out.println("Length " + b.getLength());
    for (String host : b.getHosts()) {
        System.out.println("host " + host);
    }
}
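Since the end goal is for each machine to read its own zone of blocks, here is a minimal sketch of that read step, building on the snippet above. It assumes the fs, dataset, and myBlocks variables from before; blockIndex is a hypothetical placeholder for whatever block your own assignment logic gives this machine:

import org.apache.hadoop.fs.FSDataInputStream;

int blockIndex = 0; // hypothetical: chosen by your own assignment logic
BlockLocation block = myBlocks[blockIndex];

// At the default 128 MB block size a block fits in an int-sized buffer;
// unusually large custom block sizes would need chunked reads instead.
byte[] buffer = new byte[(int) block.getLength()];

try (FSDataInputStream in = fs.open(dataset)) {
    // Positioned read: start at this block's offset and read exactly one
    // block's worth of bytes, independently of what other machines read.
    in.readFully(block.getOffset(), buffer);
}

Note that getFileBlockLocations only returns offsets, lengths, and hosts; HDFS doesn't hand out block-aligned readers directly, so each worker simply opens the file and reads its assigned byte range.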