Unable to download file with content type text/html


I have a URL that downloads a PDF file when I open it directly in the browser. But when I use the same URL to download the file from Java code (opening a URLConnection and copying its stream into a FileOutputStream), the content type of the response is text/html instead of application/pdf, so the downloaded file cannot be opened as a PDF.

This is where the confusion comes in: how is the browser able to download the file if the content type is not application/pdf?

Is anything wrong with the code?

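import java.io.FileOutputStream;
import java.io.InputStream;
import java.net.URL;
import java.net.URLConnection;

String pdfUrl = service.getPdfUrl(bpaRequest);
URL url1 = new URL(pdfUrl);
System.out.print("Connecting to " + url1 + " ... ");
URLConnection urlConn = url1.openConnection();

// Checking whether the URL contains a PDF; getContentType() may return null
// or carry parameters such as "; charset=...", so compare only the media type
String contentType = urlConn.getContentType();
if (contentType == null || !contentType.split(";")[0].trim().equalsIgnoreCase("application/pdf")) {
    throw new CustomException("INVALID_CONTENT", "contentType is not pdf: " + contentType);
}

// Read from the same connection (url1.openStream() would send a second request)
// and create the output file only after the check; try-with-resources closes both streams
byte[] ba1 = new byte[8192];
int baLength;
try (InputStream is1 = urlConn.getInputStream();
     FileOutputStream fos1 = new FileOutputStream(fileName)) {
    while ((baLength = is1.read(ba1)) != -1) {
        fos1.write(ba1, 0, baLength);
    }
}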
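A Content-Type header may carry parameters (for example `application/pdf; charset=binary`), and `URLConnection.getContentType()` returns null when the type is not known, so comparing the raw header string with `equalsIgnoreCase` can reject a real PDF or throw a NullPointerException. A minimal sketch of a more tolerant check follows; the class name `ContentTypes` and its methods are hypothetical helpers, not part of any library.

```java
import java.util.Locale;

// Hypothetical helper for comparing Content-Type headers by media type only.
public final class ContentTypes {

    private ContentTypes() {}

    /**
     * Returns the media type without parameters, lower-cased,
     * e.g. "Application/PDF; charset=binary" becomes "application/pdf".
     * Returns an empty string when the header is null (type unknown).
     */
    public static String mediaType(String contentType) {
        if (contentType == null) {
            return "";
        }
        int semicolon = contentType.indexOf(';');
        String type = semicolon >= 0 ? contentType.substring(0, semicolon) : contentType;
        return type.trim().toLowerCase(Locale.ROOT);
    }

    /** True only when the media type is exactly application/pdf. */
    public static boolean isPdf(String contentType) {
        return mediaType(contentType).equals("application/pdf");
    }
}
```

With this helper, the check becomes `if (!ContentTypes.isPdf(urlConn.getContentType()))`. Note that it only tolerates header formatting; it does not help when the server really returns an HTML page.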

Solution

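  • In your case it looks like the URL is redirected to another URL, from which the real content is downloaded. A browser follows such redirects automatically. HttpURLConnection also follows them by default, but not when the redirect changes protocol (for example from http to https); in that case you get the redirect response itself, whose body is usually a small HTML page, hence text/html.

    You need to check the Location header: if it is non-null, read its value, close the connection and open a new one to that link.

    Then, when you invoke getContentType() on the new connection, it should return application/pdf.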
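A minimal sketch of that approach, assuming the URL uses http or https. The class name `RedirectingPdfDownloader`, its methods, and the limit of five redirects are hypothetical choices for illustration. It turns off automatic redirects with `setInstanceFollowRedirects(false)` so every 3xx response is visible, resolves a possibly relative Location value against the URL that returned it, and only checks the content type on the final response.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URI;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical downloader that follows redirects by hand, including http -> https.
public final class RedirectingPdfDownloader {

    // Hypothetical cap on redirect hops, to avoid redirect loops.
    static final int MAX_REDIRECTS = 5;

    private RedirectingPdfDownloader() {}

    /** 3xx status codes whose Location header points to the real resource. */
    public static boolean isRedirect(int status) {
        return status == HttpURLConnection.HTTP_MOVED_PERM   // 301
            || status == HttpURLConnection.HTTP_MOVED_TEMP   // 302
            || status == HttpURLConnection.HTTP_SEE_OTHER    // 303
            || status == 307 || status == 308;               // no constants in HttpURLConnection
    }

    /** Resolves a possibly relative Location value against the current URL. */
    public static URI resolveLocation(URI current, String location) {
        return current.resolve(location);
    }

    /** Opens the URL, following up to MAX_REDIRECTS redirects, and returns the final connection. */
    public static HttpURLConnection openFollowingRedirects(URI start) throws IOException {
        URI current = start;
        for (int hop = 0; hop <= MAX_REDIRECTS; hop++) {
            HttpURLConnection conn = (HttpURLConnection) current.toURL().openConnection();
            conn.setInstanceFollowRedirects(false); // handle 3xx responses ourselves
            int status = conn.getResponseCode();
            if (!isRedirect(status)) {
                return conn; // final response: caller checks content type and reads the body
            }
            String location = conn.getHeaderField("Location");
            conn.disconnect(); // close this hop before opening the next one
            if (location == null) {
                throw new IOException("Redirect " + status + " without Location header");
            }
            current = resolveLocation(current, location);
        }
        throw new IOException("Too many redirects starting from " + start);
    }

    /** Downloads the final resource to target, rejecting anything that is not application/pdf. */
    public static void downloadPdf(String pdfUrl, Path target) throws IOException {
        HttpURLConnection conn = openFollowingRedirects(URI.create(pdfUrl));
        try {
            String type = conn.getContentType();
            if (type == null || !type.split(";")[0].trim().equalsIgnoreCase("application/pdf")) {
                throw new IOException("Expected application/pdf but got " + type);
            }
            try (InputStream in = conn.getInputStream()) {
                Files.copy(in, target, StandardCopyOption.REPLACE_EXISTING);
            }
        } finally {
            conn.disconnect();
        }
    }
}
```

Usage would look like `RedirectingPdfDownloader.downloadPdf(pdfUrl, Path.of(fileName));`. Following redirects by hand, rather than relying on the default behaviour, is what lets the sketch cross from http to https; the hop limit keeps a misconfigured server from looping forever.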