java selenium io fileutils

How to save a .pdf from a browser?


I tried to save a .pdf file using several methods I found on Stack Overflow, including FileUtils from Commons IO, but the saved file was always damaged. When I opened the damaged file in Notepad, I found the following:

<HEAD>

    <TITLE>
        09010b129fasdf558a-
    </TITLE>

</HEAD>


<HTML>

<SCRIPT language="javascript" src="./js/windowClose.js"></SCRIPT>

<LINK href="./theme/default.css" rel="stylesheet" type="text/css">
<LINK href="./theme/additions.css" rel="stylesheet" type="text/css">

<BODY leftmargin="0" topmargin="0">

<TABLE cellpadding="0" cellspacing="0" width="100%">
    <TR>
        <TD class="mainSectionHeader">
            <A href="javascript:windowClose()" class="allLinks">
                CLOSE
            </A>
        </TD>

    </TR>

</TABLE>

                <script language='javaScript'>
                    alert('Session timed out. Please login again.\n');
                    window.close();
                </script>



</BODY>

</HTML>

Later, I tried to save the .pdf file from the browser using the answer provided by @BalusC. That solution was very helpful: it got rid of the session issues. However, it still produces a damaged .pdf. When I open it in Notepad, the content is completely different, and the session timeout message no longer appears:

<HTML>

    <HEAD>

        <TITLE>
            Evidence System
        </TITLE>

    </HEAD>

<LINK href="./theme/default.css" rel="stylesheet" type="text/css">

<TABLE cellpadding="0" cellspacing="0" class="tableWidth760" align="center">
    <TR>
        <TD class="headerTextCtr">
            Evidence System
        </TD>
    </TR>
    <TR>
        <TD colspan="2">
            <HR size="1" noshade>
        </TD>
    </TR>
    <TR>
        <TD colspan="2">



<HTML>
<HEAD>
<link href="./theme/default.css" rel="stylesheet" type="text/css">
<script language="JavaScript">

function trim(str)
{
    var trmd_str

    if(str != "")
    {
        trmd_str = str.replace(/\s*/, "")
        if (trmd_str != ""){

            trmd_str = trmd_str.replace(/\s*$/, "")
        }

    }else{
        trmd_str = str
    }
    return trmd_str
}  

function validate(frm){
    //check for User name 
    var msg="";
    if(trim(frm.userName.value)==""){
        msg += "Please enter your user id.\n";
        frm.userName.focus();
    }

    if(trim(frm.password.value)==""){
        msg += "Please enter your password.\n";
        frm.userName.focus();
    }

    if (trim(msg)==""){
        frm.submit();
    }else{
        alert(msg);
    }
}

function numCheck(event,frm){
    if( event.keyCode == 13){
            validate(frm);  
    }
}

</script>
</HEAD>

<BODY onLoad="document.frmLogin.userName.focus();">

<FORM name='frmLogin' method='post' action='./ServletVerify'>
    <TABLE width="100%" cellspacing="20">
        <tr>
            <td class="mainTextRt">
                Username
                <input type="text" name="userName" maxlength="32" tabindex="1" value="" 
                onKeyPress="numCheck(event,this.form)" class="formTextField120">
            </TD>
            <td class="mainTextLt">
                Password
                <input type="password" name="password" maxlength="32" tabindex="2" value="" 
                onKeyPress="numCheck(event,this.form)" class="formTextField120">
            </TD>
        </TR>

        <tr>                    
            <td colspan="2" class="mainTextCtr" style="color:red">
                Unknown Error
            </td>
        </tr>

        <tr>
            <td colspan="2" class="mainTextCtr">
                <input type="button" tabindex="3" value="Submit" onclick="validate(this.form)" >
            </TD>
        </TR>
    </TABLE>

    <INPUT TYPE="hidden" NAME="actionFlag" VALUE="inbox">
</FORM>

</BODY>
</HTML>

        </TD>
    </TR>
    <TR>
        <TD height="2"></TD>
    </TR>
    <TR>
        <TD colspan="2">
            <HR size="1" noshade>
        </TD>
    </TR>
    <TR>
        <TD colspan="2">
            <LINK href="./theme/default.css" rel="stylesheet" type="text/css">

<TABLE width="80%" align="center" cellspacing="0" cellpadding="0">
    <TR>
        <TD class="footerSubtext">
            Evidence Management System
        </TD>
    </TR>

    <!-- For development builds, change the date accordingly when sending EAR files out to Wal-Mart -->
    <TR>
        <TD class="footerSubtext">
            Build:&nbsp;&nbsp;v3.1
        </TD>
    </TR>

</TABLE>
        </TD>
    </TR>
</TABLE>

</HTML>

What other options do I have?

PS: When I save the file manually with Ctrl+Shift+S, it is saved correctly.


Solution

  • From the erroneous response, which appears to be just an HTML error page:

    alert('Session timed out. Please login again.\n');

    It thus appears that the PDF download must take place within a valid HTTP session. The HTTP session is backed by a cookie, and on the server side it usually holds information about the currently active and/or logged-in user.

    The Selenium web driver manages cookies by itself, fully transparently. You can obtain them programmatically as follows:

    Set<Cookie> cookies = driver.manage().getCookies();
    

    When you use java.net.URL directly, outside the control of Selenium, you must make sure yourself that the URL connection sends the same cookies (and thus stays in the same HTTP session). You can set cookies on the URL connection as follows:

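    // Collect all cookies into a single Cookie header value, separated by "; ".
    StringBuilder cookieHeader = new StringBuilder();
    for (Cookie cookie : driver.manage().getCookies()) {
        if (cookieHeader.length() > 0) {
            cookieHeader.append("; ");
        }
        cookieHeader.append(cookie.getName()).append('=').append(cookie.getValue());
    }

    URLConnection connection = new URL(driver.getCurrentUrl()).openConnection();
    connection.setRequestProperty("Cookie", cookieHeader.toString());

    InputStream input = connection.getInputStream(); // Write this to file.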
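
The "write this to file" step can be sketched as follows. This is only a minimal sketch under stated assumptions, not a definitive implementation: the class name `PdfDownloadSketch`, the helpers `cookieHeader`, `looksLikePdf` and `download`, and the `pdfUrl` parameter are all hypothetical names introduced for illustration. Cookies are passed as a plain name/value map so the sketch compiles without Selenium. It also checks that the saved file starts with `%PDF-`, since in the question the "damaged PDF" turned out to be an HTML page.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URL;
import java.net.URLConnection;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical helper class; all names here are illustrative assumptions.
class PdfDownloadSketch {

    // Joins name/value pairs into one Cookie header value, e.g. "a=1; b=2".
    static String cookieHeader(Map<String, String> cookies) {
        StringJoiner joiner = new StringJoiner("; ");
        for (Map.Entry<String, String> entry : cookies.entrySet()) {
            joiner.add(entry.getKey() + "=" + entry.getValue());
        }
        return joiner.toString();
    }

    // A PDF file normally starts with the bytes "%PDF-"; an HTML login or
    // error page saved under a .pdf name does not.
    static boolean looksLikePdf(byte[] content) {
        byte[] magic = "%PDF-".getBytes(StandardCharsets.US_ASCII);
        if (content.length < magic.length) {
            return false;
        }
        for (int i = 0; i < magic.length; i++) {
            if (content[i] != magic[i]) {
                return false;
            }
        }
        return true;
    }

    // Downloads pdfUrl with the given cookies and saves the response to target.
    // Throws IOException if the saved content does not look like a PDF.
    static void download(String pdfUrl, Map<String, String> cookies, Path target)
            throws IOException {
        URLConnection connection = new URL(pdfUrl).openConnection();
        connection.setRequestProperty("Cookie", cookieHeader(cookies));

        try (InputStream input = connection.getInputStream()) {
            Files.copy(input, target, StandardCopyOption.REPLACE_EXISTING);
        }

        // Reads the whole file back for simplicity; fine for a sketch.
        if (!looksLikePdf(Files.readAllBytes(target))) {
            throw new IOException("Response saved to " + target
                    + " is not a PDF (possibly a login or error page)");
        }
    }

    public static void main(String[] args) {
        // Offline demonstration of the two pure helpers; no network access.
        Map<String, String> cookies = new LinkedHashMap<>();
        cookies.put("JSESSIONID", "abc123"); // made-up example values
        cookies.put("lang", "en");
        System.out.println(cookieHeader(cookies)); // prints "JSESSIONID=abc123; lang=en"

        byte[] html = "<HTML>".getBytes(StandardCharsets.US_ASCII);
        System.out.println(looksLikePdf(html)); // prints "false"
    }
}
```

With Selenium, the map would be filled from `driver.manage().getCookies()` by putting `cookie.getName()` and `cookie.getValue()` for each cookie before calling `download`. If the check fails, the saved file is most likely another HTML page, and opening it in a text editor shows what the server actually returned.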