c# web-scraping scrapysharp

How to find a form in ScrapySharp when it has no name or id attribute


I am new to ScrapySharp and to web scraping in general. I am trying to scrape a site that sits behind a login screen. The form element has no name or id attribute, only action and method, so I have been unable to figure out how to load the form with the code below. Any insight is greatly appreciated!

C#:

ScrapingBrowser browser = new ScrapingBrowser();
var homepage = browser.NavigateToPage(new Uri("https://somedomain.com/ProviderLogin.action/"));
var form1 = homepage.Find("form", ScrapySharp.Html.By.Text("form"));
var form2 = homepage.FindFormById("form[action='provider-login']");

HTML:

   <form action="provider-login" method="post">           
        <div class="login-box">   
            <input type="text" name="username" id="username" autocomplete="false" placeholder="Username" 
                   class="form-control input-lg login-input login-input-username" value="" />                   
            <input type="password" id="password" name="password" placeholder="Password" type="password"
             class="form-control input-lg login-input login-input-password" />
            <button name="login" type="submit" class="btn btn-primary btn-block btn-md login-btn" >
                Login
            </button>            
        </div>
    </form>

Solution

  • You can't do that with ScrapySharp's "By" helper, because it supports only four element search kinds:

    {
       Text,
       Id,
       Name,
       Class
    }
    

    Your form has none of these, so use "CssSelect" to match on the action attribute instead:

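    // Requires: using ScrapySharp.Extensions;
    // CssSelect returns the matching nodes (IEnumerable<HtmlNode>), not a PageWebForm.
    // Exact match on the action attribute:
    var form = homepage.Html.CssSelect("form[action='provider-login']");
    // Or, match any action that contains "provider-login":
    var formContains = homepage.Html.CssSelect("form[action*='provider-login']");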
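
Note that CssSelect gives you HtmlNode objects rather than a PageWebForm, so to actually log in you still have to wrap the node yourself. Here is a minimal sketch, not a tested implementation. It assumes your ScrapySharp version exposes the public PageWebForm(HtmlNode, ScrapingBrowser) constructor and that Submit() resolves the relative action against the current page. The class name and the credentials are placeholders.

```csharp
using System;
using System.Linq;
using HtmlAgilityPack;
using ScrapySharp.Extensions;
using ScrapySharp.Html.Forms;
using ScrapySharp.Network;

class LoginSketch
{
    static void Main()
    {
        var browser = new ScrapingBrowser();
        WebPage homepage = browser.NavigateToPage(
            new Uri("https://somedomain.com/ProviderLogin.action/"));

        // CssSelect returns every matching node, so take the first one (or null).
        HtmlNode formNode = homepage.Html
            .CssSelect("form[action='provider-login']")
            .FirstOrDefault();

        if (formNode == null)
        {
            Console.WriteLine("Login form not found.");
            return;
        }

        // Assumption: this constructor is public in the installed ScrapySharp version.
        var form = new PageWebForm(formNode, browser);
        form["username"] = "your-username"; // placeholder
        form["password"] = "your-password"; // placeholder
        form.Method = HttpVerb.Post;        // matches method="post" in the markup

        WebPage result = form.Submit();

        // Rough check: if the login form is still on the page, the login probably failed.
        bool stillOnLogin = result.Html
            .CssSelect("form[action='provider-login']")
            .Any();
        Console.WriteLine(stillOnLogin ? "Still on the login page." : "Login form gone.");
    }
}
```

If Submit() throws on the relative action in your version, try setting form.Action to the absolute URL before submitting.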