
Alternative to Selenium Webdriver for SSO in PHP


I want to scrape a website that requires logging in through SSO (single sign-on). I have found a working solution using Selenium WebDriver in Python, and I wonder whether it is possible to do the same in PHP.

Maybe someone has already had the same problem and can help me...


Solution

  • PhantomJS is a headless WebKit browser, and Chrome will soon support a headless mode. (I mention PhantomJS even though its lead developer recently announced that they were giving up development of the project.)

    PhantomJS is a language-agnostic solution built on a full browser stack. It can handle a variety of tasks, including screenshots and advanced web scraping. The downside is that scraping performance drops significantly because the whole page is loaded: images, JavaScript, iframes, etc. In my opinion, while it works, PhantomJS is overkill for most scraping tasks, where only a subset of the information on the page is useful.

    For nearly all of my web scraping and server-to-server communication needs in PHP, I use the Ultimate Web Scraper Toolkit, which I wrote and actively maintain. It comes with everything needed: content retrieval and data extraction tools. It works with almost everything I've thrown at it, including some websites with really hairy Word-generated HTML.

    Getting through SSO can be quite tricky, especially with providers that may present a CAPTCHA and/or require two-factor authentication (e.g. Google or Facebook). Once you are signed in, it is a good idea to save the session (i.e. the website's cookies) for later use. That way you only have to authenticate once and can then keep the session alive by communicating with the remote host regularly.
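The "sign in once, save the cookies, reuse them later" idea can be sketched in plain PHP. This is a minimal sketch, not the toolkit's API: the helper names (`parseSetCookieHeaders`, `saveSession`, `loadSession`, `buildCookieHeader`) and the JSON file format are hypothetical. It also assumes a single host and ignores cookie attributes such as `Domain`, `Path` and `Expires`.

```php
<?php
// Hypothetical helpers for persisting a session's cookies between runs.
// Assumption: one host only; Domain/Path/Expires attributes are ignored.

/** Extract name => value pairs from raw "Set-Cookie:" response header lines. */
function parseSetCookieHeaders(array $headerLines): array
{
    $cookies = [];
    foreach ($headerLines as $line) {
        if (stripos($line, 'Set-Cookie:') !== 0) {
            continue; // not a cookie header (header names are case-insensitive)
        }
        // Keep only the leading "name=value" pair; attributes follow after ';'.
        $pair = trim(explode(';', substr($line, strlen('Set-Cookie:')), 2)[0]);
        $parts = explode('=', $pair, 2);
        if (count($parts) === 2 && trim($parts[0]) !== '') {
            $cookies[trim($parts[0])] = trim($parts[1]);
        }
    }
    return $cookies;
}

/** Write the cookies to a JSON file so a later run can skip the SSO login. */
function saveSession(string $path, array $cookies): void
{
    file_put_contents($path, json_encode($cookies, JSON_PRETTY_PRINT));
}

/** Read saved cookies; an empty array means "no session yet, authenticate first". */
function loadSession(string $path): array
{
    if (!is_file($path)) {
        return [];
    }
    $data = json_decode((string) file_get_contents($path), true);
    return is_array($data) ? $data : [];
}

/** Build the "Cookie:" request header that replays the saved session. */
function buildCookieHeader(array $cookies): string
{
    $pairs = [];
    foreach ($cookies as $name => $value) {
        $pairs[] = $name . '=' . $value;
    }
    return 'Cookie: ' . implode('; ', $pairs);
}
```

To keep the session alive, merge any cookies the server sends back on later requests, e.g. `array_replace($saved, parseSetCookieHeaders($newHeaderLines))`, and save the result again. If you use PHP's cURL extension instead, pointing both `CURLOPT_COOKIEFILE` and `CURLOPT_COOKIEJAR` at the same file lets cURL read and write the cookies for you, which is usually the simpler choice.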
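The "retrieve the content, then extract only the subset you need" approach described above does not require a browser stack. The sketch below illustrates it with PHP's built-in DOM extension (`DOMDocument` and `DOMXPath`). It is a swapped-in standard-library stand-in, not the Ultimate Web Scraper Toolkit's API, and the function name `extractLinks` is hypothetical.

```php
<?php
// Lightweight extraction sketch using PHP's built-in DOM extension.
// Only the HTML string is parsed: no JavaScript runs, no images or iframes are fetched.

/** Return a list of ['text' => ..., 'href' => ...] for every <a href="..."> in the HTML. */
function extractLinks(string $html): array
{
    $doc = new DOMDocument();
    // Real-world HTML is often invalid; collect parser warnings quietly instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    $links = [];
    foreach ($xpath->query('//a[@href]') as $a) {
        $links[] = [
            'text' => trim($a->textContent),
            'href' => $a->getAttribute('href'),
        ];
    }
    return $links;
}
```

In a real scraper the HTML string would come from an HTTP request made with the saved session cookies, for example via cURL.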