I'm at the hair-pulling stage with this one, so I'm hoping someone can spot what I'm doing wrong.
I'm trying to POST some form data to a website using an Azure Data Factory Web activity. I do get a response (the page and some headers), but it is different from the response I get when I make the exact same request using C# and HttpClient. I've used Fiddler to view the request posted by my C# script, and according to the request information shown in Data Factory the two are exactly the same: same headers, same content format, etc.
The POST request logs in to a website that has a custom login mechanism, so no OAuth or anything like that unfortunately. It is supposed to return a cookie, which it does if I use my C# script. If I make the same POST request using the Data Factory Web activity, I get different HTML back (it just returns the same login screen) and a different set of response headers in the "ADFWebActivityResponseHeaders" part of the activity output. These are the response headers returned in the Web activity output:
"ADFWebActivityResponseHeaders": {
    "Pragma": "no-cache",
    "Vary": "Accept-Encoding",
    "X-Frame-Options": "DENY",
    "Cache-Control": "no-store, must-revalidate, no-cache, post-check=0, pre-check=0",
    "Date": "Wed, 09 Sep 2020 08:09:30 GMT",
    "Server": "Microsoft-IIS/8.5"
}
If I do this via C# I also get a 'Set-Cookie' header. Strangely, if I make a GET request for the homepage of this site through Data Factory I do get a 'Set-Cookie' in the response, but never for the login POST. I'm struggling to see how this is possible unless Data Factory is modifying my request in some fashion. Below is my C# code, which is pretty simple and standard:
using System;
using System.Collections.Generic;
using System.Net;
using System.Net.Http;

// Runs inside an async method.
var handler = new HttpClientHandler
{
    CookieContainer = new CookieContainer(),
    UseCookies = true,
    UseDefaultCredentials = false
};
// Create the HTTP client that will perform our web requests
var client = new HttpClient(handler);
client.BaseAddress = new Uri("**REMOVED**");
// Some of the extracts take a LONG time, so raise the timeout to 30 minutes
client.Timeout = TimeSpan.FromMinutes(30);
// The form parameters we're going to POST to the server
var parameters = new Dictionary<string, string>
{
    { "username", "**REMOVED**" },
    { "password", "**REMOVED**" }
};
// URL-encode the parameters as application/x-www-form-urlencoded
var content = new FormUrlEncodedContent(parameters);
// Submit our POST with the parameters
var response = await client.PostAsync("**REMOVED**", content);
Running this code with Fiddler attached, I see the following request. These are the only headers:
Content-Length: 80
Content-Type: application/x-www-form-urlencoded
username=REMOVED&password=REMOVED
The 'input' side of the Web activity shows the details of the request. I added the headers in the Web activity myself, and they match:
"method": "POST",
"headers": {
    "Content-Type": "application/x-www-form-urlencoded",
    "Content-Length": 80
},
"body": "username=REMOVED&password=REMOVED"
Note that in Data Factory I'm using a self-hosted integration runtime, because this website blocks requests that do not come from the specific external IP addresses used by our on-prem network/firewall. I know that is not the problem, as I'm getting a response with the normal login page from the site (if I use the Azure integration runtime I get a denied response).
Here is a screen shot of the web activity in data factory:-
Really hope someone out there can see what I'm missing or whatever...
Turns out this does work, and the cookies are listed in the JSON output of the activity. Note that they are found in the output of the ADF activity, so you would pick up the cookie with an expression a bit like @activity('Login and get cookie').output.ADFWebActivityResponseHeaders['Set-Cookie'] (ADF expressions use single quotes for string literals).
However, in my case the URL I was POSTing to was responding with a 302 (a temporary redirect), but the 'Location' header that accompanies it is not included in ADFWebActivityResponseHeaders, which is why I missed it. I made the request from Chrome with the developer tools open and looked at the response directly, which is where I found the 302 status code. After that, I simply pointed the Web activity at the new URL given in the 'Location' response header that I found in the browser dev tools.
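If you would rather find the redirect from code than from the browser dev tools, a minimal sketch along these lines should do it. It turns off HttpClient's automatic redirect handling (which is on by default and is why the C# version appeared to "just work") and prints the status code and the resolved 'Location' target. The names RedirectProbe, ResolveRedirect and ProbeLoginAsync are made up for illustration, and the form field names are the ones from the question; treat it as a starting point rather than a definitive implementation.

```csharp
using System;
using System.Collections.Generic;
using System.Net.Http;
using System.Threading.Tasks;

// Hypothetical helper, not part of ADF or any library.
public static class RedirectProbe
{
    // Returns the absolute redirect target when the response is a 3xx
    // carrying a Location header; returns null otherwise.
    public static Uri ResolveRedirect(Uri requestUri, HttpResponseMessage response)
    {
        int status = (int)response.StatusCode;
        Uri location = response.Headers.Location;
        if (status < 300 || status > 399 || location == null)
        {
            return null;
        }
        // Location may be relative, so resolve it against the request URL.
        return location.IsAbsoluteUri ? location : new Uri(requestUri, location);
    }

    // POSTs the login form WITHOUT following redirects and reports what came back.
    public static async Task<Uri> ProbeLoginAsync(Uri loginUri, string username, string password)
    {
        // AllowAutoRedirect defaults to true; disable it to see the raw 302.
        var handler = new HttpClientHandler { AllowAutoRedirect = false };
        using (var client = new HttpClient(handler))
        {
            var content = new FormUrlEncodedContent(new Dictionary<string, string>
            {
                { "username", username },
                { "password", password }
            });
            using (var response = await client.PostAsync(loginUri, content))
            {
                Console.WriteLine($"Status: {(int)response.StatusCode} {response.StatusCode}");
                Uri target = ResolveRedirect(loginUri, response);
                Console.WriteLine(target == null ? "No redirect" : $"Redirects to: {target}");
                return target;
            }
        }
    }
}
```

Call ProbeLoginAsync with the login URL and credentials from an async method. If it reports a redirect, that target is the URL to try in the Web activity instead.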
Unfortunately, at the time of writing, the Azure Data Factory Web activity does not follow redirects (and doesn't list all the response headers either!), so anyone who hits the same problem will need to find the URLs for any redirects manually. In other words, if a request doesn't work in ADF, try it in a browser or a tool like Postman and look at the response... you might find there is a redirect going on :-)
There is a feature request logged for this here, be sure to add your vote :)