BASICS
This is a Java 1.8, Spring Boot 1.5 application.
It currently uses Apache Tika 1.22 to read MIME type information, but this can easily be changed.
SUMMARY
There is a mapping that the user calls to download files. These files come from an external URL, separate from the application. The file may be any of a variety of types (Excel, PDF, text, etc.), and the application has no way of knowing which until it pulls the file down.
ISSUE
In order to return the file download to the user with the appropriate title, extension, and content type, the application uses Apache Tika to pull that information. Unfortunately, because detection consumes the header of the InputStream, the file is incomplete when the application then writes that InputStream to the HttpServletResponse.
This means that, in order to function at present, the application closes the first InputStream and then opens a second InputStream to return to the user.
That's not good, because it means that the URL is being called twice, wasting system resources.
What is the proper way to have this function?
CODE EXAMPLE
@GetMapping("/My/Download/")
public void doDownload(HttpServletResponse httpServletResponse) {
    String externalFileURL = "http://www.pdf995.com/samples/pdf.pdf";
    try {
        // First request: used only to detect the media type.
        InputStream firstStream = new URL(externalFileURL).openStream();
        TikaConfig tikaConfig = new TikaConfig();
        MediaType mediaType = tikaConfig.getDetector().detect(TikaInputStream.get(firstStream), new Metadata());
        firstStream.close();
        // Second request: used to send the content to the user.
        InputStream secondStream = new URL(externalFileURL).openStream();
        httpServletResponse.setHeader("Content-Disposition", String.format("attachment; filename=\"%s\"", "DownloadMe." + mediaType.getSubtype()));
        httpServletResponse.setContentType(mediaType.getBaseType().toString());
        FileCopyUtils.copy(secondStream, httpServletResponse.getOutputStream());
        httpServletResponse.flushBuffer();
    } catch (Exception e) {
        // Exceptions are silently ignored here.
    }
}
The Javadoc of detect() says:
"The given stream is guaranteed to support the mark feature and the detector is expected to mark the stream before reading any bytes from it, and to reset the stream before returning."
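The mark/reset contract described above can be illustrated without Tika. The following is a minimal sketch using only the JDK: sniff() is a hypothetical stand-in for a detector (it is not Tika's implementation, and it only recognises the "%PDF" magic bytes), and the class and method names are made up for illustration. It marks the stream, peeks at the first bytes, and resets, so the caller can still read the whole content afterwards.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

class MarkResetSniffer {

    // Hypothetical detector: peeks at up to 4 bytes, then rewinds the stream.
    public static String sniff(InputStream in) throws IOException {
        if (!in.markSupported()) {
            throw new IllegalArgumentException("stream must support mark/reset");
        }
        in.mark(4); // remember the current position
        byte[] head = new byte[4];
        int n = 0;
        while (n < head.length) {
            int r = in.read(head, n, head.length - n);
            if (r < 0) {
                break; // stream shorter than 4 bytes
            }
            n += r;
        }
        in.reset(); // rewind to the marked position before returning
        String magic = new String(head, 0, n, StandardCharsets.US_ASCII);
        return magic.equals("%PDF") ? "pdf" : "octet-stream";
    }

    // Reads the stream to the end and returns the number of bytes seen.
    public static int countBytes(InputStream in) throws IOException {
        byte[] buffer = new byte[1024];
        int total = 0;
        int r;
        while ((r = in.read(buffer)) != -1) {
            total += r;
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        byte[] data = "%PDF-1.4 sample".getBytes(StandardCharsets.US_ASCII);
        try (InputStream in = new BufferedInputStream(new ByteArrayInputStream(data))) {
            String type = sniff(in);
            int remaining = countBytes(in);
            // All bytes are still readable after the peek.
            System.out.println(type + " " + remaining + "/" + data.length);
        }
    }
}
```

Because the same stream object is both sniffed and read, nothing is lost and nothing has to be fetched twice.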
The Javadoc of TikaInputStream says:
"The created TikaInputStream instance keeps track of the original resource used to create it, while behaving otherwise just like a normal, buffered InputStream. A TikaInputStream instance is also guaranteed to support the mark(int) feature."
This means you should read the content through the TikaInputStream itself, and use try-with-resources to close it:
try (InputStream tikaStream = TikaInputStream.get(new URL(externalFileURL))) {
    TikaConfig tikaConfig = new TikaConfig();
    // detect() marks and resets tikaStream, so it is still positioned at the start afterwards.
    MediaType mediaType = tikaConfig.getDetector().detect(tikaStream, new Metadata());
    httpServletResponse.setHeader("Content-Disposition", String.format("attachment; filename=\"%s\"", "DownloadMe." + mediaType.getSubtype()));
    httpServletResponse.setContentType(mediaType.getBaseType().toString());
    // Copy from the same stream that was used for detection; the URL is opened only once.
    FileCopyUtils.copy(tikaStream, httpServletResponse.getOutputStream());
    httpServletResponse.flushBuffer();
}
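To see why the copy has to go through the wrapper rather than the original stream, here is a small JDK-only sketch. It assumes a plain BufferedInputStream as a stand-in for TikaInputStream (both are buffered wrappers that support mark); the class and method names are hypothetical. Resetting the wrapper rewinds the wrapper's own buffer, not the underlying stream, which has already been read ahead.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

class WrapperVersusRaw {

    // Reads the stream to the end and returns how many bytes were still available.
    public static int drain(InputStream in) throws IOException {
        byte[] buffer = new byte[1024];
        int total = 0;
        int r;
        while ((r = in.read(buffer)) != -1) {
            total += r;
        }
        return total;
    }

    // Minimal form of a header peek: mark, read one byte, reset.
    public static void peek(InputStream in) throws IOException {
        in.mark(1);
        in.read();
        in.reset();
    }

    // Peek through the wrapper, then keep reading from the wrapper.
    public static int remainingViaWrapper(byte[] data) throws IOException {
        InputStream raw = new ByteArrayInputStream(data);
        InputStream wrapper = new BufferedInputStream(raw);
        peek(wrapper);
        return drain(wrapper);
    }

    // Peek through the wrapper, then (incorrectly) read from the raw stream.
    public static int remainingViaRaw(byte[] data) throws IOException {
        InputStream raw = new ByteArrayInputStream(data);
        InputStream wrapper = new BufferedInputStream(raw);
        peek(wrapper);
        return drain(raw); // the wrapper has already buffered bytes out of raw
    }

    public static void main(String[] args) throws IOException {
        byte[] data = new byte[100];
        System.out.println("via wrapper: " + remainingViaWrapper(data));
        System.out.println("via raw:     " + remainingViaRaw(data));
    }
}
```

Reading via the wrapper yields the full payload, while reading via the raw stream yields fewer bytes (for a small payload, typically none), which matches the incomplete file described in the question.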