Tags: c#, selenium-webdriver, twitter, selenium-chromedriver, socks5

JS is disabled when trying to get page via Selenium "--headless=new"


The Twitter API is now paid, so I need to write a parser for tweet pages. I am using a SOCKS5 proxy.

So my first step was to fetch the tweet page directly through the SOCKS5 proxy. I got a 302 response and an endless redirect loop.

Then I tried adding cookies and got a "Please enable JS" page.
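For reference, this first step (plain HTTP through SOCKS5, with cookies) can be sketched without Selenium: .NET 6+ supports socks5:// proxies natively in SocketsHttpHandler. The proxy address and the cookie name/value below are placeholders, not working credentials:

using System.Net;
using System.Net.Http;

// SOCKS5 proxies are supported by SocketsHttpHandler since .NET 6.
var handler = new SocketsHttpHandler
{
    Proxy = new WebProxy("socks5://host:port"), // placeholder proxy
    UseProxy = true,
    CookieContainer = new CookieContainer(),
    AllowAutoRedirect = true,
};
// Placeholder cookie - substitute whatever your session actually needs.
handler.CookieContainer.Add(
    new Cookie("auth_token", "placeholder", "/", ".twitter.com"));

using var client = new HttpClient(handler);
// This only returns the server-rendered shell; the tweet itself is
// injected by JavaScript, which HttpClient cannot execute - hence the
// "Please enable JS" page.
string html = await client.GetStringAsync(
    "https://twitter.com/ElonMuskAOC/status/1677171220184469505");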

So I decided to use Selenium to get this page. Without --headless=new there is no problem, but as soon as I add that argument, the "Please enable JS" page reappears.

What I've tried: different user agents, different Selenium packages, explicitly setting the path to the ChromeDriver (driver v114.0.5735.90 with Google Chrome v114.0.5735.199), and a different browser (Edge). JS itself was enabled in every case.
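Setting a custom user agent in the new headless mode is just another Chrome argument. This is roughly what I tried; the UA string is an example, not a magic value:

var options = new ChromeOptions();
options.AddArguments("--headless=new");
// Override the default UA, which advertises "HeadlessChrome" and is an
// easy signal for bot detection.
options.AddArguments("--user-agent=Mozilla/5.0 (Windows NT 10.0; Win64; x64) " +
    "AppleWebKit/537.36 (KHTML, like Gecko) Chrome/114.0.0.0 Safari/537.36");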

I use the latest version of the Selenium library; the language is C#.

I created a simple console app for easy debugging - the basic code below should work (I believe):

using OpenQA.Selenium;
using OpenQA.Selenium.Chrome;

// Route all browser traffic through the SOCKS5 proxy.
Proxy proxy = new Proxy();
proxy.Kind = ProxyKind.Manual;
proxy.SocksVersion = 5;
proxy.SocksProxy = "host:port";

var options = new ChromeOptions();
options.AddArguments("--headless=new"); // the new headless mode (Chrome 109+)
options.Proxy = proxy;

string pageSource = "";
using (var driver = new ChromeDriver(options))
{
    driver.Navigate().GoToUrl("https://twitter.com/ElonMuskAOC/status/1677171220184469505");
    pageSource = driver.PageSource; // captured immediately, before JS has rendered
}
Console.ReadLine();

Solution

  • Everything is fine - the JS just needs time to execute. The noscript tag (the "Please enable JS" markup) is always present in the page source and requires no extra time to appear, while the actual tweet content only shows up after the scripts have run. Wait for a JS-rendered element before reading PageSource.
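A minimal sketch of that fix, assuming the tweet text lives inside an element with data-testid="tweetText" (the attribute Twitter uses at the time of writing; verify against the live DOM). WebDriverWait comes from the Selenium.Support package:

using System;
using System.Linq;
using OpenQA.Selenium;
using OpenQA.Selenium.Chrome;
using OpenQA.Selenium.Support.UI; // Selenium.Support package

var options = new ChromeOptions();
options.AddArguments("--headless=new");

using var driver = new ChromeDriver(options);
driver.Navigate().GoToUrl("https://twitter.com/ElonMuskAOC/status/1677171220184469505");

// Poll for up to 15 seconds until JS has rendered the tweet body.
// Checking for the noscript message proves nothing - it is in the
// initial HTML - so wait for JS-rendered content instead.
var wait = new WebDriverWait(driver, TimeSpan.FromSeconds(15));
IWebElement tweet = wait.Until(d =>
    d.FindElements(By.CssSelector("[data-testid='tweetText']")).FirstOrDefault());

string pageSource = driver.PageSource; // now includes the rendered tweet

The lambda passed to Until returns null while the element is absent, which keeps WebDriverWait polling instead of throwing on the first miss.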