c# cloudflare cefsharp

CefSharp GetSourceAsync() as byte array


I have searched for hours on GitHub, StackOverflow and Google without success, so I think I'm stuck.

I use CefSharp to get the source of a page located behind CloudFlare, with this method:

public static string GetCefSource(string url)
{
    Browser.Load(url); //An instance of ChromiumWebBrowser
    Thread.Sleep(15000); //Debatable (it would be better to wait for the real content instead of waiting for "some random time until CloudFlare does its thing")
    string source = Browser.GetSourceAsync().ConfigureAwait(true).GetAwaiter().GetResult(); //Don't ask me the purpose of ConfigureAwait, NetAnalyzers asks for it (I'll look into it)
    return source;
}

It has worked every time so far, but what if I use this method to fetch an image that I want to write to disk?

With the method above, I get something like

"�PNG\r\n\u001a\n\0\0\0\rIHDR\0\0\u0001�\0\0\0�\b\u0006..."

which is to be expected. As you can see, it's the PNG image I want, but decoded as a Unicode string, which is not the format I need.

I would like to manipulate the image before writing it to the disk, so I need the source as a byte array, in order to use it with

using MagickImage image = new(data);

So the question is:

How can I get a remote file as a byte array, like I do using HttpClient with

HttpContent.ReadAsByteArrayAsync().GetAwaiter().GetResult().ToArray()

but with CefSharp, because of CloudFlare?

Thanks!


Solution

  • Thanks to amaitland for his fast answer about DownloadUrlAsync().

    I can now get any resource located behind CloudFlare, as a string or a byte[], with this:

    public static object GetCefSource(string url, bool asByteArray = false)
    {
        Browser.Load(url); //An instance of ChromiumWebBrowser
        Thread.Sleep(15000); //Wait for CloudFlare's Javascript to execute

        if (asByteArray)
        {
            //Download the URL through the main frame to get the raw bytes
            IFrame mainFrame = Browser.GetMainFrame();
            return mainFrame.DownloadUrlAsync(url).ConfigureAwait(true).GetAwaiter().GetResult();
        }

        //Otherwise return the page source as a string
        return Browser.GetSourceAsync().ConfigureAwait(true).GetAwaiter().GetResult();
    }

    I would probably rewrite it with an alternative to Thread.Sleep() and without the result boxed in an object.
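
    A minimal sketch of what that rewrite might look like, assuming the caller can be made async. The names CefFetch, GetCefBytesAsync and ChallengeDelay are made up for illustration. The wait is still a fixed delay, only non-blocking, so it does not detect when CloudFlare has actually finished.

    ```csharp
    using System;
    using System.Threading.Tasks;
    using CefSharp;

    public static class CefFetch
    {
        // Hypothetical setting: same fixed 15 second wait as above.
        private static readonly TimeSpan ChallengeDelay = TimeSpan.FromSeconds(15);

        // Returns the page source as a string, no boxing in object.
        public static async Task<string> GetCefSourceAsync(IWebBrowser browser, string url)
        {
            browser.Load(url);
            // Task.Delay waits without blocking the calling thread, unlike Thread.Sleep.
            await Task.Delay(ChallengeDelay).ConfigureAwait(true);
            return await browser.GetSourceAsync().ConfigureAwait(true);
        }

        // Returns the raw bytes of the resource, e.g. an image.
        public static async Task<byte[]> GetCefBytesAsync(IWebBrowser browser, string url)
        {
            browser.Load(url);
            await Task.Delay(ChallengeDelay).ConfigureAwait(true);
            IFrame mainFrame = browser.GetMainFrame();
            return await mainFrame.DownloadUrlAsync(url).ConfigureAwait(true);
        }
    }
    ```

    The byte[] from GetCefBytesAsync(Browser, url) could then be passed straight to new MagickImage(data), as in the question.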

    Thanks!