Error when getting original images from Hadoop sequenceFile


I first pack all my images into a Hadoop SequenceFile:

FSDataInputStream in = null;    
in = fs.open(new Path(uri)); //uri is the image location in HDFS
byte buffer[] = new byte[in.available()];
in.read(buffer);
context.write(imageID, new BytesWritable(buffer));

Then, in the reducer, I want to get the original images back from the SequenceFile:

BufferedImage imag;    
imag = ImageIO.read(new ByteArrayInputStream(value.getBytes())); 

But the image is not decoded correctly; I get this error:

Error: javax.imageio.IIOException: Error reading PNG image data
Caused by: java.io.EOFException: Unexpected end of ZLIB input stream

How can I get the original images back from a SequenceFile in Hadoop?


Solution

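  • The problem was the way I read the stream. in.available() is not guaranteed to return the full file length, and a single in.read(buffer) call may return before the buffer is filled, so the bytes stored in the SequenceFile can be a truncated PNG. Reading the stream to the end fixes it: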

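    import org.apache.commons.io.IOUtils;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataInputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    Configuration confHadoop = new Configuration();
    FileSystem fs = FileSystem.get(confHadoop);
    Path file = new Path(fs.getUri().toString() + "/" + fileName);
    FSDataInputStream in = fs.open(file);
    // toByteArray keeps reading until end of stream, so the whole file is captured
    byte[] buffer = IOUtils.toByteArray(in);
    in.close();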
    

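    The buffer can then be written to the SequenceFile as new BytesWritable(buffer). Read the value back from the SequenceFile as a BytesWritable in the same way; only the first getLength() bytes of getBytes() are valid data, since the backing array may be larger.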
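To see why the single read() in the question can lose data, here is a minimal sketch that uses only the Java standard library, so it runs without a Hadoop cluster. All names in it (ReadFullyDemo, ChunkedStream, readOnce, readFully, samplePng) are hypothetical and made up for illustration. ChunkedStream only imitates a stream that hands out data in small pieces; it is an assumption, not a model of how HDFS actually sizes its reads.

```java
import java.awt.image.BufferedImage;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import javax.imageio.ImageIO;

public class ReadFullyDemo {

    /** Hypothetical stand-in for a stream that returns at most `chunk` bytes per read. */
    public static final class ChunkedStream extends InputStream {
        private final byte[] data;
        private final int chunk;
        private int pos = 0;

        public ChunkedStream(byte[] data, int chunk) {
            this.data = data;
            this.chunk = chunk;
        }

        @Override
        public int read() {
            return pos < data.length ? (data[pos++] & 0xff) : -1;
        }

        @Override
        public int read(byte[] b, int off, int len) {
            if (len == 0) return 0;
            if (pos >= data.length) return -1;
            int n = Math.min(Math.min(len, chunk), data.length - pos);
            System.arraycopy(data, pos, b, off, n);
            pos += n;
            return n;
        }

        @Override
        public int available() {
            // Only what can be read without blocking, not the total length.
            return Math.min(chunk, data.length - pos);
        }
    }

    /** The pattern from the question: size the buffer with available(), read once. */
    public static byte[] readOnce(InputStream in) throws IOException {
        byte[] buffer = new byte[in.available()];
        in.read(buffer);
        return buffer;
    }

    /** Keep reading until read() reports end of stream (-1). */
    public static byte[] readFully(InputStream in) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] chunk = new byte[4096];
        int n;
        while ((n = in.read(chunk)) != -1) {
            out.write(chunk, 0, n);
        }
        return out.toByteArray();
    }

    /** Builds a small 64x64 PNG in memory to use as test data. */
    public static byte[] samplePng() throws IOException {
        ImageIO.setUseCache(false); // avoid temp-file caching
        BufferedImage img = new BufferedImage(64, 64, BufferedImage.TYPE_INT_RGB);
        for (int y = 0; y < 64; y++) {
            for (int x = 0; x < 64; x++) {
                img.setRGB(x, y, ((x * 4) << 16) | ((y * 4) << 8));
            }
        }
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        ImageIO.write(img, "png", out);
        return out.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[] png = samplePng();
        byte[] partial = readOnce(new ChunkedStream(png, 16));
        byte[] full = readFully(new ChunkedStream(png, 16));
        System.out.println("original=" + png.length
                + " readOnce=" + partial.length
                + " readFully=" + full.length);

        // Only the fully read bytes are a complete PNG.
        BufferedImage decoded = ImageIO.read(new ByteArrayInputStream(full));
        System.out.println(decoded.getWidth() + "x" + decoded.getHeight());
    }
}
```

In this sketch readOnce returns only the first chunk, while readFully returns every byte and the result decodes again. IOUtils.toByteArray(in) in the answer plays the role of readFully here.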