I'm developing an application that logs in to a website. My problem is that when I inspect the browser's request headers, I see a cookie that the browser sends with the request. I need to know how to do the same in my application, that is, have the connection set the request cookies by itself when it starts. I tried CookieHandler.setDefault( new CookieManager( null, CookiePolicy.ACCEPT_ALL ) ); but it didn't work.
Source:
CookieHandler.setDefault( new CookieManager( null, CookiePolicy.ACCEPT_ALL ) );
URL url2 = new URL("https://m.example.com.br/login.jhtml");
HttpURLConnection conn = (HttpURLConnection) url2.openConnection();
conn.setRequestProperty("Content-Type","application/x-www-form-urlencoded");
conn.setRequestMethod("POST");
conn.setRequestProperty("User-Agent","User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:58.0) Gecko/20100101 Firefox/58.0");
conn.setRequestProperty("Content-Length", Integer.toString(parameters.getBytes().length));
conn.setInstanceFollowRedirects(true);
conn.setDoInput(true);
conn.setDoOutput(true);
conn.setUseCaches(false);
DataOutputStream wr = new DataOutputStream(conn.getOutputStream());
wr.writeBytes(parameters);
wr.flush();
wr.close();
if (conn.getResponseCode()== 200){
InputStream in = conn.getInputStream();
BufferedReader rd = new BufferedReader(new InputStreamReader(in));
String line=null;
StringBuffer response = new StringBuffer();
while((line = rd.readLine()) != null) {
response.append(line);
response.append('\r');
}
rd.close();
System.out.println(response.toString());
}
Request headers sent by my application:
Content-Type: application/x-www-form-urlencoded
User-Agent: User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:58.0) Gecko/20100101 Firefox/58.0
Connection: Keep-Alive
Accept-Encoding: gzip
Cookie: TS0163e05c="01ed0a5ec20a04efb37decf4185e55cfe68e06164c32f1a95d1d5b8f12c72abbee029ed64985c09681a55832e444c61821a1eb6fb22d6ed9880314fa0c342074316e309642";$Path="/";$Domain="example.com"; ps-website-switching-v2=%7B%22ps-website-switching%22%3A%22ps-website%22%7D; TS015a85bd=01ed0a5ec25aecf271e4e08c02f852e9ea6199a117a0a8e0339b3e98fd1d51518e5f09ead481039d4891f66e9cc48a13ced14792de
Content-Length: 198
Request headers sent by the browser:
Host: m.example.com
Connection: keep-alive
Content-Length: 197
Cache-Control: max-age=0
Upgrade-Insecure-Requests: 1
Content-Type: application/x-www-form-urlencoded
User-Agent: Mozilla/5.0 (Linux; Android 5.0.2; LG-D337 Build/LRX22G) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/64.0.3282.137 Mobile Safari/537.36
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8
Accept-Encoding: gzip, deflate, br
Accept-Language: en-US,en;q=0.9
Cookie: _ga=GA1.3.313511484.1517525889; _gid=GA1.3.507266479.1517525889; DEretargeting=563; CSASF=; JS_SESS=; BT=%3B106%3B; DN....
Look at the cookies: why are they so different? What can I do to send cookies like the browser's without setting them manually via conn.setRequestProperty("Cookie",cookie); ?
HttpURLConnection is not a very reliable way to scrape or interact with websites, for the following reasons:
CookieHandler only works for cookies that are passed to you directly in the HTTP response headers. If anything within the content of the site (including embedded content like images) would cause more cookies to be created, you're not getting them with CookieHandler, because it doesn't understand HTML/JS/etc.
You should use Selenium instead. Selenium automates a real web browser (or at least something closer to a real web browser) that can parse HTML and behaves according to the expectations of the web standards.
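To make the CookieHandler limitation concrete, here is a minimal sketch that feeds a simulated Set-Cookie response header to a CookieManager without touching the network. The class name, the helper storedCookieNames, and the shortened cookie value are made up for illustration; only the cookie names come from the question. It shows that the manager stores what arrives in response headers, while a cookie such as _ga, which the browser's JavaScript would create, never reaches it.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.CookieManager;
import java.net.CookiePolicy;
import java.net.HttpCookie;
import java.net.URI;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;

class CookieManagerDemo {

    // Hypothetical helper: feeds one simulated Set-Cookie response header to a
    // fresh CookieManager and returns the names of the cookies it stored.
    static List<String> storedCookieNames(String url, String setCookieHeader) {
        CookieManager manager = new CookieManager(null, CookiePolicy.ACCEPT_ALL);
        Map<String, List<String>> responseHeaders =
                Collections.singletonMap("Set-Cookie", Collections.singletonList(setCookieHeader));
        try {
            // This is the same entry point HttpURLConnection uses after a response arrives.
            manager.put(URI.create(url), responseHeaders);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        List<String> names = new ArrayList<>();
        for (HttpCookie cookie : manager.getCookieStore().getCookies()) {
            names.add(cookie.getName());
        }
        return names;
    }

    public static void main(String[] args) {
        // Assumed response: one cookie arrives in a header. A page script such as
        // document.cookie = "_ga=..." would only run in a real browser, so the
        // CookieManager never hears about that cookie.
        List<String> names = storedCookieNames(
                "https://m.example.com/login.jhtml", "TS0163e05c=abc123; Path=/");
        System.out.println("Stored cookies: " + names);
        System.out.println("Has _ga: " + names.contains("_ga"));
    }
}
```

This matches what the two header dumps in the question suggest: the application sends only the TS... style cookies it received in headers, while the browser also sends cookies like _ga and _gid that were created client-side.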
As far as which browser driver (backend) to use, here are a few options:
The difference between "headless" and "not headless" (or "headed", if you prefer) is that a headless browser does not create any GUI windows. If you're running on a headless Linux box, this is practically a requirement unless you want to set up an Xvfb virtual X server or something similar. If you're running this from a computer with a graphical interface (Windows, macOS, or desktop Linux), it's up to you whether you want to see the browser pop up when you run your code.
Headless browsers do tend to be relatively faster, and you can scale out more instances of them in parallel because they aren't taking up any graphics resources on your system as you use them. They just use the browser engine itself to process the web content and allow you to access/drive it through Selenium.
If you do want headless, but you need the very latest web platform features and standards support, look into using Headless Chrome or Headless Firefox.
Headless Chrome intro: https://developers.google.com/web/updates/2017/04/headless-chrome
Headless Firefox intro: https://developer.mozilla.org/en-US/Firefox/Headless_mode