Search code examples
javaexcelvbscriptntlmbrowser-automation

Automate download of a dynamically generated file after NTLM Authentication using vbscript/java


I am trying to automate to download a dynamically generated file on a server, so here are the steps what I have to carry out manually -

  1. Login to the site using my credentials - It uses NTLM to authenticate.
  2. After login fill in the details in a form, which uses post method send the details to server and generates a file on server and revert back response to server.

So first I thought of doing it using VBScript:

  • Get a IE Automation object

  • sendkeys to send userword/password

  • Navigate to the page and download the file

  • Sendkeys to switch to save button - But here i got stuck because IE prompts me for save location which I can't determine. Also I thought of opening it but I am not able to get the automation object of Excel file opened this way :(

Even wget can't work as I have to post some data and basis of what it will provide me the resultant Excel file.

So after searching on web I found I can do it using MSXML2.xmlhttp object or Java sockets and download the page using get method, but again I have to provide my credentials to open the page.

So can anybody please help me how to authenticate the user and download the file in either way.

EDIT: Code

--Download the page

objmsXML.Open "GET", url, False
objmsXML.setRequestHeader "Content-Type", "application/x-www-form-urlencoded"
objmsXML.setRequestHeader "Referer", url
objmsXML.setRequestHeader "User-Agent", "Mozilla/5.0 (Windows NT 5.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/27.0.1453.110 Safari/537.36"
objmsXML.send

--Fill in the details and post data

If objmsXML.Status = 200 Then
  'First Response received.
  'get all the response headers
  responseHeaders = objmsXML.getAllResponseHeaders()
  responseBody = objmsXML.responseText    
End If

Dim viewState As String
Dim eventTarget, eventArgument, txtProjectID, btnSubmit, grdReportPostDataValue
Dim eventValidation As String

viewState = Split(Split(responseBody, "__VIEWSTATE")(2), """")(2)
viewState = URLEncode(viewState, False)

eventTarget = ""
eventArgument = ""
txtProjectID = projectID
btnSubmit = URLEncode("Submit")
grdReportPostDataValue = ""
eventValidation = Split(Split(responseBody, "__EVENTVALIDATION")(2), """")(2)
eventValidation = URLEncode(eventValidation, False)

objmsXML.Open "POST", url, False

Dim postData
postData = "__EVENTTARGET=" & eventTarget & "&__EVENTARGUMENT=" _
  & eventArgument & "&__VIEWSTATE=" & viewState & "&txtProjectID=" _
  & txtProjectID & "&btnSubmit=" & btnSubmit & "&grdReportPostDataValue" _
  & grdReportPostDataValue & "&__EVENTVALIDATION=" & eventValidation

objmsXML.send postData

But the problem is it is not giving me the second page as a result of form post. I believe this might be because I am not able to track the session cookie. Please help.


Solution

  • You can't authenticate a user without providing credentials. Manually download the file once while inspecting the session with something like Fiddler. That will reveal the headers required for the POST request. Then automate the request like this:

    url = "http://..."
    
    user = "..."
    pass = "..."
    credentials = "username=" & user & "&password=" & pass
    
    Set req = CreateObject("Msxml2.XMLHttp.6.0")
    req.open "POST", url, False
    req.setRequestHeader "Content-Type", "application/x-www-form-urlencoded"
    req.send credentials
    

    Modify credentials according to what Fiddler revealed. You may need to encode the values (user and pass in the example above) with something like this:

    Function Encode(ByVal str)
      Set re = New RegExp
      re.Pattern = "[^a-zA-Z0-9_.~-]"
    
      enc = ""
      For i = 1 To Len(str)
        c = Mid(str, i, 1)
        If re.Test(c) Then c = "%" & Right("0" & Hex(Asc(c)), 2)
        enc = enc & c
      Next
    
      Encode = enc
    End Function
    

    Save the responseBody to a file like this:

    filename = "C:\your\output.xls"
    
    Set stream = CreateObject("ADODB.Stream")
    
    If req.status = 200 Then
      stream.Open
      stream.Type = 1 'binary
      stream.Write req.responseBody
      stream.SaveToFile filename, 2
      stream.Close
    End If
    

    If the filename is dynamically generated you may need to read the response and/or status text and send a second request to actually retrieve the file. However, that depends on the actual server response (as revealed by Fiddler).

    Edit: A solution I found here suggests using a WinHttpRequest object:

    Set req = CreateObject("WinHttp.WinHttpRequest.5.1")
    req.SetAutoLogonPolicy 0
    
    req.Open "POST", url, False
    req.SetRequestHeader "Content-Type", "application/x-www-form-urlencoded"
    req.Send credentials
    

    I can't test this, though, because I don't have a web server using NTLM authentication at hand.