I first pack all my images into a Hadoop SequenceFile:
FSDataInputStream in = null;
in = fs.open(new Path(uri)); //uri is the image location in HDFS
byte buffer[] = new byte[in.available()];
in.read(buffer);
context.write(imageID, new BytesWritable(buffer));
Then, in the reducer, I want to get my original images back from the SequenceFile:
BufferedImage imag;
imag = ImageIO.read(new ByteArrayInputStream(value.getBytes()));
But the image is not decoded correctly, and I get this error:
Error: javax.imageio.IIOException: Error reading PNG image data
Caused by: java.io.EOFException: Unexpected end of ZLIB input stream
My question is: how do I get the original images back from a SequenceFile in Hadoop?
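The problem was that I was reading the stream the wrong way: in.available() only returns an estimate of the bytes that can be read without blocking, and a single in.read(buffer) call is not guaranteed to fill the buffer, so the image bytes can end up truncated. Here is the right way: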
import org.apache.commons.io.IOUtils;
Configuration confHadoop = new Configuration();
FileSystem fs = FileSystem.get(confHadoop);
Path file = new Path(fs.getUri().toString() + "/" + fileName);
in = fs.open(file);
byte[] buffer = IOUtils.toByteArray(in);
Then the buffer can be written to the SequenceFile as new BytesWritable(buffer).
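For anyone who would rather not pull in commons-io, here is a minimal JDK-only sketch of the same idea: keep calling read() until it returns -1 instead of trusting available() or a single read(). The class name StreamBytes and the method readFully are just names I made up for illustration, and the 8192-byte chunk size is an arbitrary choice.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical helper (not part of Hadoop or commons-io).
final class StreamBytes {
    /** Reads every remaining byte of the stream into a new array. */
    static byte[] readFully(InputStream in) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] chunk = new byte[8192]; // arbitrary chunk size
        int n;
        // read() may return fewer bytes than requested,
        // so loop until it signals end of stream with -1.
        while ((n = in.read(chunk)) != -1) {
            out.write(chunk, 0, n);
        }
        return out.toByteArray();
    }
}
```

With this, byte[] buffer = StreamBytes.readFully(in); could stand in for the IOUtils.toByteArray(in) call, assuming in is the stream returned by fs.open(file).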
The same applies when you read from the SequenceFile.
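One more thing that may matter on the reading side: BytesWritable.getBytes() returns the backing array, which can be longer than the valid data, and only the first getLength() bytes are meaningful. A small sketch of decoding with that bound, using only the JDK so it can be tried without a cluster (ImageBytes and decode are hypothetical names, not Hadoop API):

```java
import java.awt.image.BufferedImage;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import javax.imageio.ImageIO;

// Hypothetical helper (not part of Hadoop).
final class ImageBytes {
    /** Decodes an image from the first `length` bytes of `backing`. */
    static BufferedImage decode(byte[] backing, int length) throws IOException {
        // Bound the stream to the valid region so padding is never read.
        // ImageIO.read returns null if no registered reader understands the data.
        return ImageIO.read(new ByteArrayInputStream(backing, 0, length));
    }
}
```

In the reducer this would be called as ImageBytes.decode(value.getBytes(), value.getLength()), assuming value is the BytesWritable read from the SequenceFile.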