I am trying to start headless Chrome from .NET Core using the code shared below. According to https://developer.chrome.com/docs/chromium/new-headless, Chrome should write the WebSocket URL to standard output. After starting the process I added a delay to give Chrome time to start, then read both standard output and standard error, but both are empty. I have tried some trial and error based on the link above but still face the issue.
Apart from this, which is the better approach for scraping websites whose HTML is modified by JavaScript: using "--dump-dom", or using this WebSocket URL from .NET Core?
static void Main(string[] args)
{
    StartBrowser().GetAwaiter().GetResult();
}

public static async Task StartBrowser()
{
    using (var process = new Process())
    {
        process.StartInfo.FileName = "C:\\Program Files\\Google\\Chrome\\Application\\chrome";
        process.StartInfo.CreateNoWindow = false;
        process.StartInfo.UseShellExecute = false;
        process.StartInfo.RedirectStandardOutput = true;
        process.StartInfo.RedirectStandardError = true;
        process.StartInfo.RedirectStandardInput = true;
        string arguments = "--headless=new --remote-debugging-port=0 https://developer.chrome.com/";
        process.StartInfo.Arguments = arguments;
        Console.WriteLine("Start Time:" + DateTime.Now.ToString());
        process.Start();
        await Task.Delay(20000);
        Task<string> output = process.StandardOutput.ReadToEndAsync();
        Task<string> error = process.StandardError.ReadToEndAsync();
        var task = await Task.WhenAll(output, error);
        Console.WriteLine(process.StandardOutput.ReadToEnd());
        Console.WriteLine(process.StandardError.ReadToEnd());
        Console.WriteLine("End Time: " + DateTime.Now.ToString());
        process.WaitForExit();
    }
}
Besides that, when I connect to the CDP socket using Postman and try commands such as "Page.enable" and "Network.enable", many of them do not work; only the "Target"-related commands work.
ReadToEnd() is the culprit! It tries to read to the end of the stream, which only happens when Chrome exits.
You can remove all the awaits/Task/delay etc. and use this instead:
// your existing code till process.Start()
while (true)
{
    // ReadLine() returns as soon as Chrome writes a full line,
    // and returns null only once the stream is closed.
    string? stdErrorLine = process.StandardError.ReadLine();
    if (stdErrorLine != null)
    {
        Console.WriteLine(stdErrorLine);
    }
    else
    {
        break;
    }
}
Console.WriteLine("End Time: " + DateTime.Now.ToString());
process.WaitForExit();
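If you only need the WebSocket URL, you can stop reading at the first matching line instead of looping until Chrome exits. The sketch below assumes Chrome announces the endpoint on standard error in the form "DevTools listening on ws://..."; the class and method names (DevToolsEndpoint, TryParse, WaitForUrl) are hypothetical helpers, not part of any library.

```csharp
using System.Diagnostics;
using System.Text.RegularExpressions;

public static class DevToolsEndpoint
{
    // Assumed format of the line Chrome writes to stderr:
    //   DevTools listening on ws://127.0.0.1:<port>/devtools/browser/<id>
    private static readonly Regex Pattern =
        new Regex(@"DevTools listening on (ws://\S+)");

    // Returns the WebSocket URL if the line contains one, otherwise null.
    public static string? TryParse(string line)
    {
        Match m = Pattern.Match(line);
        return m.Success ? m.Groups[1].Value : null;
    }

    // Reads stderr line by line and returns at the first endpoint line,
    // so the caller does not block until Chrome exits.
    public static string? WaitForUrl(Process process)
    {
        string? line;
        while ((line = process.StandardError.ReadLine()) != null)
        {
            string? url = TryParse(line);
            if (url != null)
            {
                return url;
            }
        }
        return null; // stderr closed before Chrome printed the endpoint
    }
}
```

Call DevToolsEndpoint.WaitForUrl(process) right after process.Start(). Since standard output and standard error are both redirected, it is probably worth continuing to drain them afterwards (or not redirecting standard output at all), so that Chrome does not stall on a full pipe.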
Now, to shed some light on Page.enable etc. not working, this may help.
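One likely explanation, stated here as an assumption rather than something confirmed above: the URL Chrome prints (/devtools/browser/...) is the browser-level endpoint, which serves browser-wide domains such as Target, while Page and Network belong to an individual page target. Each page target has its own WebSocket URL, listed as webSocketDebuggerUrl in the JSON returned by http://127.0.0.1:<port>/json/list. A minimal sketch for picking that URL out of the list follows; PageTargets and FirstPageSocketUrl are hypothetical names.

```csharp
using System.Text.Json;

public static class PageTargets
{
    // Returns the webSocketDebuggerUrl of the first target whose type is "page",
    // or null if the list contains no such target.
    public static string? FirstPageSocketUrl(string targetsJson)
    {
        using JsonDocument doc = JsonDocument.Parse(targetsJson);
        foreach (JsonElement target in doc.RootElement.EnumerateArray())
        {
            if (target.TryGetProperty("type", out JsonElement type) &&
                type.GetString() == "page" &&
                target.TryGetProperty("webSocketDebuggerUrl", out JsonElement url))
            {
                return url.GetString();
            }
        }
        return null;
    }
}
```

The JSON itself can be fetched with HttpClient.GetStringAsync, using the port taken from the ws:// URL that Chrome printed. Connecting Postman to the returned URL instead of the browser-level one should let Page.enable and Network.enable respond.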