I have an application that needs to read nodes from an HTML page on a website. The problem is that the page requires the user to be logged in. I looked for topics about logging in to websites, but they mostly cover forms with two fields: login and password.
In my case the form also has a combo box with a list of cities (see the login form screenshot).
My current code:
class Program
{
    static void Main(string[] args)
    {
        var client = new CookieAwareWebClient();
        client.BaseAddress = @"https://mystat.itstep.org/ru/login";
        var loginData = new NameValueCollection();
        loginData.Add("login", "login");
        loginData.Add("password", "password");
        client.UploadValues("login.php", "POST", loginData);
        string htmlSource = client.DownloadString("index.php");
        Console.WriteLine("Logged in!");
    }
}
public class CookieAwareWebClient : WebClient
{
    private CookieContainer cookie = new CookieContainer();
    protected override WebRequest GetWebRequest(Uri address)
    {
        WebRequest request = base.GetWebRequest(address);
        if (request is HttpWebRequest)
        {
            (request as HttpWebRequest).CookieContainer = cookie;
        }
        return request;
    }
}
How can I choose one of the cities in this list via C#?
You'll have to do an initial GET first, to get the cookies and the CSRF token that you need for your first POST. The CSRF token needs to be parsed out of the first HTML response so you can supply it together with your username and password.
This is what your main flow should look like:
var client = new CookieAwareWebClient();
client.BaseAddress = @"https://mystat.itstep.org/en/login";
// do an initial GET so that the server sends you its cookies,
// initiates a server session,
// and returns the page that contains the CSRF token
var login = client.DownloadString("/");
string csrf;
// parse the response and look for the CSRF token
ParseLogin(login, out csrf);
var loginData = new NameValueCollection();
loginData.Add("login", "someusername");
loginData.Add("password", "somepassword");
loginData.Add("city_id", "29"); // I picked this value from the raw html
loginData.Add("_csrf", csrf);
var loginResult = client.UploadValues("login.php", "POST", loginData);
// get the string from the received bytes
Console.WriteLine(Encoding.UTF8.GetString(loginResult));
// your task is to make sense of this result
Console.WriteLine("Logged in!");
The parsing can be as simple or as complex as you need. I only implemented something that gets you the CSRF token. I leave the parsing of the cities (hint: they start with <select and then have an <option on each line until you find a </select>) for you to implement as an advanced exercise. Don't bother asking me for it.
Here is the csrf parsing logic:
// requires: using System.IO; using System.Linq;
// static so that it can be called from the static Main method
static void ParseLogin(string html, out string csrf)
{
    csrf = null;
    // read each line of the html
    using (var sr = new StringReader(html))
    {
        string line;
        while ((line = sr.ReadLine()) != null)
        {
            // parse for csrf by looking for the input tag
            // (the tag has to be at the very start of the line)
            if (line.StartsWith(@"<input type=""hidden"" name=""_csrf""") && csrf == null)
            {
                // string split by space
                csrf = line
                    .Split(' ') // split to array of strings
                    .Where(s => s.StartsWith("value")) // value="what we need is here">
                    .Select(s => s.Substring(7, s.Length - 9)) // remove value=" and the last ">
                    .First();
            }
        }
    }
}
If you feel adventurous, you can write an HTML parser, go crazy with string methods, try some regex, or use a library.
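As a rough illustration of the regex route, here is a minimal sketch that pulls out both the CSRF token and the city options. It rests on assumptions that are not confirmed by the real page: that the `<select>` element is named `city_id` (guessed from the POST field name), that `name` comes before `value` in the hidden input, and that attribute values are double-quoted. The class name `LoginPageParser`, its methods, and the sample markup are made up for this example.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class LoginPageParser
{
    // Returns the value of the hidden _csrf input, or null if none is found.
    // Assumes the name attribute appears before the value attribute.
    public static string FindCsrf(string html)
    {
        var match = Regex.Match(
            html,
            @"<input[^>]*\bname=""_csrf""[^>]*\bvalue=""([^""]*)""",
            RegexOptions.IgnoreCase);
        return match.Success ? match.Groups[1].Value : null;
    }

    // Returns city id -> city name for every <option> inside
    // <select name="city_id"> (the element name is an assumption).
    public static Dictionary<string, string> FindCities(string html)
    {
        var cities = new Dictionary<string, string>();
        var select = Regex.Match(
            html,
            @"<select[^>]*\bname=""city_id""[^>]*>(.*?)</select>",
            RegexOptions.IgnoreCase | RegexOptions.Singleline);
        if (!select.Success) return cities;

        var options = Regex.Matches(
            select.Groups[1].Value,
            @"<option[^>]*\bvalue=""([^""]*)""[^>]*>([^<]*)",
            RegexOptions.IgnoreCase);
        foreach (Match option in options)
        {
            cities[option.Groups[1].Value] = option.Groups[2].Value.Trim();
        }
        return cities;
    }
}

public static class Demo
{
    public static void Main()
    {
        // Made-up markup for illustration only, not the real login page
        const string html = @"
<form>
  <input type=""hidden"" name=""_csrf"" value=""abc123"">
  <select name=""city_id"">
    <option value=""29"">City A</option>
    <option value=""30"">City B</option>
  </select>
</form>";
        Console.WriteLine(LoginPageParser.FindCsrf(html));
        foreach (var city in LoginPageParser.FindCities(html))
        {
            Console.WriteLine(city.Key + " = " + city.Value);
        }
    }
}
```

Regex is brittle against markup changes (single quotes, reordered attributes, nested tags), so a proper HTML parsing library is likely the safer choice if the page layout is not under your control.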
Keep in mind that scraping websites might be against the terms of service of the site. Verify that what you're doing is allowed / doesn't interfere with their operation.