Tags: java, apache, inputstream, apache-tika

Detecting file extension using Apache Tika corrupts the file


I am trying to detect the file extension of a file passed as an InputStream. The extension is detected correctly, but the file tends to become corrupted afterwards. Here is my method for detecting the extension:

public static Optional<String> detectFileExtension(InputStream inputStream) {

    // To provide mark/reset functionality to the stream required by Tika.
    InputStream bufferedInputStream = new BufferedInputStream(inputStream);

    String extension = null;
    try {
        MimeTypes mimeRepository = getMimeRepository();

        MediaType mediaType = mimeRepository.detect(bufferedInputStream, new Metadata());
        MimeType mimeType = mimeRepository.forName(mediaType.toString());
        extension = mimeType.getExtension();
        log.info("File Extension detected: {}", extension);

        // Need to reset input stream pos marker since it was updated while detecting the extension
        inputStream.reset();
        bufferedInputStream.close();

    } catch (MimeTypeException | IOException ignored) {
        log.error("Unable to detect extension of the file from the provided stream");
    }
    return Optional.ofNullable(extension);
}

private static MimeTypes getMimeRepository() {
    TikaConfig config = TikaConfig.getDefaultConfig();
    return config.getMimeRepository();
}

After extension detection, I try to save the file using the same InputStream, like this:

byte[] documentContentByteArray = IOUtils.toByteArray(inputStream);

Optional<String> extension = FileTypeHelper.detectFileExtension(inputStream);
if (extension.isPresent()) {
    fileName = fileName + extension.get();
} else {
    log.warn("File: {} does not have a valid extension", fileName);
}
File file = new File("/tmp/" + fileName);
FileUtils.writeByteArrayToFile(file, documentContentByteArray);

This creates a file, but a corrupted one. My guess is that after detectFileExtension consumes the stream, the stream is not being reset properly. If anyone has done this before, some guidance would be great. Thanks in advance.
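One mechanism that could explain a stream "not being reset properly" can be shown without Tika. When a detector peeks through a BufferedInputStream wrapper, mark/reset rewinds only the wrapper. The wrapped stream has already handed over up to a whole internal buffer of bytes, and calling reset() on it does not bring them back. The sketch below is an illustration under assumptions, not a confirmed diagnosis of the code above: `peek` is a hypothetical stand-in for a detector, the class and method names are invented for the example, and it needs Java 11 or newer for `readNBytes`.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

class BufferedOverReadDemo {

    // Hypothetical stand-in for a detector: inspects the first n bytes,
    // then rewinds the stream it was given via mark/reset.
    static byte[] peek(InputStream in, int n) throws IOException {
        in.mark(n);
        byte[] head = in.readNBytes(n);
        in.reset();
        return head;
    }

    // Counts the bytes still readable from the ORIGINAL stream after the
    // detector has peeked at it through a BufferedInputStream wrapper.
    static int remainingInOriginalAfterPeek(byte[] data, int peekSize) throws IOException {
        InputStream original = new ByteArrayInputStream(data);
        InputStream buffered = new BufferedInputStream(original);
        peek(buffered, peekSize); // rewinds 'buffered', not 'original'
        return original.readAllBytes().length;
    }

    public static void main(String[] args) throws IOException {
        byte[] data = new byte[10_000];
        int remaining = remainingInOriginalAfterPeek(data, 4);
        // Fewer than 10000 bytes remain: the wrapper pre-read a chunk into its buffer.
        System.out.println("Bytes left in original stream: " + remaining);
    }
}
```

Anything that keeps reading from the original stream after such a peek starts partway into the data, which would produce a truncated or corrupted file.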


Solution

  • I fixed it by not reusing the same input stream. I read the original stream into a byte array once, created a new stream over that array for extension detection, and used the array itself to create the file.

    byte[] documentContentByteArray = IOUtils.toByteArray(inputStream);
    
    //extension detection
    InputStream extensionDetectionInputStream = new ByteArrayInputStream(documentContentByteArray);
    // Pass the fresh stream over the byte array; the original stream is already exhausted
    Optional<String> extension = FileTypeHelper.detectFileExtension(extensionDetectionInputStream);
    if (extension.isPresent()) {
        fileName = fileName + extension.get();
    } else {
        log.warn("File: {} does not have a valid extension", fileName);
    }
    extensionDetectionInputStream.close();
    
    //File creation
    File file = new File("/tmp/" + fileName);
    FileUtils.writeByteArrayToFile(file, documentContentByteArray);
    

    If there is a better way to do this that reuses the same stream, I would gladly accept that answer. For now, I am marking this as the accepted answer.
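On reusing a single stream: one possible approach is to wrap the source once in a mark-capable stream, hand that wrapper to the detector, and then read the file content from the same wrapper rather than from the original stream. Tika's Detector contract describes detectors marking the stream and resetting it before returning, so with Tika the peek step would be a call such as `mimeRepository.detect(in, new Metadata())`. The sketch below is Tika-free so it runs on its own: `peek` is a hypothetical stand-in for the detector, the class and method names are invented for the example, and it assumes Java 11 or newer.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

class SingleStreamDemo {

    // Hypothetical stand-in for a detector: inspects the first n bytes,
    // then rewinds the stream it was given via mark/reset.
    static byte[] peek(InputStream in, int n) throws IOException {
        in.mark(n);
        byte[] head = in.readNBytes(n);
        in.reset();
        return head;
    }

    // Peeks at the header and then reads the complete content, both through
    // the SAME mark-capable stream, so no bytes are lost in between.
    static byte[] detectThenReadAll(InputStream source, int peekSize) throws IOException {
        // Wrap only when the source cannot mark/reset on its own.
        InputStream in = source.markSupported() ? source : new BufferedInputStream(source);
        peek(in, peekSize);       // with Tika: mimeRepository.detect(in, new Metadata())
        return in.readAllBytes(); // keep using 'in', never 'source', from here on
    }

    public static void main(String[] args) throws IOException {
        byte[] data = "%PDF-1.7 example payload".getBytes();
        byte[] copy = detectThenReadAll(new ByteArrayInputStream(data), 8);
        System.out.println("Bytes read after detection: " + copy.length);
    }
}
```

The trade-off against the byte-array approach is memory: this version never needs the whole file in memory just for detection, but every later read must go through the wrapper, and the wrapper should be closed only once the content has been written.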