javascript html delphi chromium-embedded

CEF4Delphi re-triggering HTML Retrieve after page change


I'm confused by the HTML retrieval methods in CEF4Delphi. No matter how I use them, I get practically the same HTML that was loaded at the beginning, even after the page has changed a lot. Consider the following example.

I load https://www.wp.pl/ or https://www.onet.pl/, where I want to read the article headlines every day. After the page loads, only the first headlines are visible, but I want to read all of them. So I scroll the page, wait until the new elements are loaded and visible on screen, and read the content again. However, the HTML I get from the TChromium browser is still practically the same as at the beginning, apart from perhaps a few changed lines related to ads.

I use the TChromium component, and the page is displayed through a CEFWindowParent component. I have tried roughly two approaches for reading the HTML. The first is this procedure:

procedure GetHtml;
begin
  Form1.Chromium1.Browser.MainFrame.GetSourceProc(
    procedure(const aSource: ustring)
    begin
      html_result := aSource;
    end
  );
end;

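and the second is:

procedure GetSourceCEF4;
begin
  // RetrieveHTML is asynchronous: the HTML is delivered later in the
  // OnTextResultAvailable event, so there is nothing to return here.
  Form1.Chromium1.RetrieveHTML;
end;

with this event handler: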

procedure TForm1.Chromium1TextResultAvailable(Sender: TObject; const aText: ustring);
begin
  html_result := aText;
end;

This one is also possible, but it is not suitable for my application because the event is not triggered after scrolling:

procedure TForm1.ChromiumBrowserLoadEnd(Sender: TObject; const browser: ICefBrowser;
  const frame: ICefFrame; httpStatusCode: Integer);
begin
  frame.GetSourceProc(
    procedure(const aSource: ustring)
    begin
      html_result := aSource;
    end
  );
end;

Both approaches are familiar to CEF users and are described in more detail elsewhere. Of course, I wait several seconds (sometimes dozens) before reading the HTML.

As I wrote, I have to scroll the page. But even when the new elements are fully loaded and visible in the browser, all of the procedures above still return the same code as immediately after loading the page, as if no scrolling had happened. I have tried every kind of scrolling, both manual (with the mouse) and programmatic:

  • SimulateKeyPress(Form1.Chromium1.Browser, VK_NEXT);
  • Form1.Chromium1.ExecuteJavaScript('window.scrollBy(0,500)', 'about:blank', 0);

I also tried various JavaScript snippets I found, but none of them returns anything to the TChromium browser:

Form1.Chromium1.Browser.MainFrame.ExecuteJavaScript(
  'window.cefQuery({request: document.documentElement.outerHTML});', 'about:blank', 0);

Form1.Chromium1.Browser.MainFrame.ExecuteJavaScript(
  'cefQuery = function() {' +
  '  var message = cef.createMessage("htmlContent");' +
  '  message.setArgument(0, document.documentElement.outerHTML);' +
  '  cef.sendMessage(message);' +
  '};' +
  'cefQuery();', 'about:blank', 0);

None of them triggers any event, for example OnProcessMessageReceived.

What am I doing wrong? Why can't I access the full HTML code?

PS. Just in case: I know that the source shown from the browser's own context menu (right-click on the window) is obtained by a completely different method, and copying it to the clipboard is of no use to me (from what I can see, it would also be extremely difficult to program).


Solution

  • TChromiumCore.RetrieveHTML and ICefFrame.GetSourceProc use ICefFrame.GetSource internally.

    The MiniBrowser demo shows how to simulate a key press.

    In your case, you should send VK_NEXT by simulating WM_KEYDOWN, WM_CHAR and WM_KEYUP like this:

    var 
      TempKeyEvent : TCefKeyEvent;
    begin
      // WM_KEYDOWN
      TempKeyEvent.kind                    := KEYEVENT_RAWKEYDOWN;
      TempKeyEvent.modifiers               := 0;
      TempKeyEvent.windows_key_code        := VK_NEXT;
      TempKeyEvent.native_key_code         := 0;
      TempKeyEvent.is_system_key           := ord(False);
      TempKeyEvent.character               := #0;
      TempKeyEvent.unmodified_character    := #0;
      TempKeyEvent.focus_on_editable_field := ord(False);
      Chromium1.SendKeyEvent(@TempKeyEvent);
      // WM_CHAR
      TempKeyEvent.kind := KEYEVENT_CHAR;
      Chromium1.SendKeyEvent(@TempKeyEvent);
      // WM_KEYUP
      TempKeyEvent.kind := KEYEVENT_KEYUP;
      Chromium1.SendKeyEvent(@TempKeyEvent);
    end;
    

    I've tested TChromiumCore.RetrieveHTML on wp.pl after simulating several VK_NEXT key presses, and the returned HTML is different.

    If you prefer to use JavaScript, download CEF4Delphi again and try the DOMVisitor demo. I just updated that demo to get the HTML using JS from its context menu.

    It uses the console trick to send the document.documentElement.outerHTML result.

    The result is received in the TChromiumCore.OnConsoleMessage event.
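
    A minimal sketch of the console trick described above; it is not the actual DOMVisitor demo code. The HTML_PREAMBLE constant, the RequestHtmlViaConsole method and the FHtmlResult field are hypothetical names. The OnConsoleMessage signature shown (with the level parameter) is what I would expect from recent CEF4Delphi releases; older releases may differ, so let the IDE generate the handler for you.

    ```delphi
    const
      // Hypothetical marker used to tell our message apart from other console output.
      HTML_PREAMBLE = 'MYAPP_HTML:';

    // Ask the page to log its current DOM as HTML, prefixed with the marker.
    procedure TForm1.RequestHtmlViaConsole;
    begin
      Chromium1.ExecuteJavaScript(
        'console.log("' + HTML_PREAMBLE + '" + document.documentElement.outerHTML);',
        'about:blank', 0);
    end;

    // Assumed signature: double-click the event in the Object Inspector to get
    // the exact one for your CEF4Delphi version.
    procedure TForm1.Chromium1ConsoleMessage(Sender: TObject;
      const browser: ICefBrowser; level: TCefLogSeverity;
      const message, source: ustring; line: Integer; out Result: Boolean);
    begin
      Result := False; // keep the default console handling

      // Only react to messages that start with our marker.
      if Copy(message, 1, Length(HTML_PREAMBLE)) = HTML_PREAMBLE then
        // FHtmlResult is a hypothetical ustring field of the form.
        // This event may run in a CEF thread rather than the main VCL thread,
        // so only store the text here and notify the form (for example with
        // PostMessage) before touching any controls.
        FHtmlResult := Copy(message, Length(HTML_PREAMBLE) + 1, Length(message));
    end;
    ```

    Because outerHTML is read when the script runs, this should reflect the DOM as it is after scrolling, provided the new elements have already been inserted into the page.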

    See this answer for more details about sending information from JavaScript to Delphi.

    Edit: One important detail that can cause problems when you simulate keyboard events is that the browser must have the focus while you send the key events.
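
    Putting the pieces together, here is a rough sketch (not taken from the demos) that focuses the browser, sends one VK_NEXT key press and reads the HTML again after a delay. ScrollAndRetrieve, Timer1 and the 2-second delay are assumptions for illustration; lazy-loaded content may need more or less time, and depending on your form you may also need to give the focus to the CEFWindowParent control.

    ```delphi
    // Hypothetical helper: scroll one page down, then request the HTML again.
    procedure TForm1.ScrollAndRetrieve;
    var
      TempKeyEvent : TCefKeyEvent;
    begin
      // The browser must have the focus or the key events may be ignored.
      Chromium1.SetFocus(True);

      // Same key event sequence as in the answer above.
      TempKeyEvent.kind                    := KEYEVENT_RAWKEYDOWN;
      TempKeyEvent.modifiers               := 0;
      TempKeyEvent.windows_key_code        := VK_NEXT;
      TempKeyEvent.native_key_code         := 0;
      TempKeyEvent.is_system_key           := ord(False);
      TempKeyEvent.character               := #0;
      TempKeyEvent.unmodified_character    := #0;
      TempKeyEvent.focus_on_editable_field := ord(False);
      Chromium1.SendKeyEvent(@TempKeyEvent);

      TempKeyEvent.kind := KEYEVENT_CHAR;
      Chromium1.SendKeyEvent(@TempKeyEvent);

      TempKeyEvent.kind := KEYEVENT_KEYUP;
      Chromium1.SendKeyEvent(@TempKeyEvent);

      // Timer1 is a hypothetical TTimer dropped on the form (Enabled = False).
      // It gives the page time to load the new elements before we read the HTML.
      Timer1.Interval := 2000;
      Timer1.Enabled  := True;
    end;

    procedure TForm1.Timer1Timer(Sender: TObject);
    begin
      Timer1.Enabled := False;
      // Asynchronous: the HTML arrives in the OnTextResultAvailable event.
      Chromium1.RetrieveHTML;
    end;
    ```

    Calling ScrollAndRetrieve repeatedly (waiting for each OnTextResultAvailable before the next call) would walk down the page one screen at a time.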