Google Chrome’s reading mode side panel is pretty good. It seems (to me) to outperform readability.js, dom-distiller and trafilatura on Google Search and ACS. So I wanted to feed its output to GPT-4, but it currently seems impossible to copy HTML from the panel. I’m looking for a solution.
I have tried popular alternative Chrome reader extensions, other browsers' built-in readers and open source solutions. None seemed as capable, and none could parse Google Search results.
Old Chrome’s copyable reading mode (based on dom-distiller instead of read-anything) could still be reproduced on Chrome version 104 with chrome-distiller://<UUID>_<HASH>/?url=<URL>, but it seemed less satisfactory and less stable.
The reading mode side panel exposes an accessibility tree (visible e.g. via Accessibility Inspector on macOS). In theory it should be possible to retrieve and parse it programmatically, though I have no experience with UI automation.
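If the tree can be dumped into some structured form, turning it back into rough HTML is the easy part. A minimal sketch, assuming the dump is a flat list of nodes shaped like the Chrome DevTools Protocol's AXNode records (nodeId, role, name, childIds, ignored). Whether that protocol, or any other tool, can actually reach the side panel's tree is an open assumption, and the role-to-tag mapping below is hypothetical.

```python
import html

# Hypothetical role -> tag mapping (an assumption, not Chrome's own logic).
# Roles not listed here are unwrapped: only their children are emitted.
ROLE_TO_TAG = {
    "heading": "h2",
    "paragraph": "p",
    "list": "ul",
    "listitem": "li",
}


def ax_tree_to_html(nodes):
    """Rebuild rough HTML from a flat list of CDP-style AXNode dicts."""
    by_id = {n["nodeId"]: n for n in nodes}
    # Roots are nodes that never appear as anyone's child.
    child_ids = {c for n in nodes for c in n.get("childIds", [])}
    roots = [n for n in nodes if n["nodeId"] not in child_ids]

    def render(node):
        role = node.get("role", {}).get("value", "")
        # Leaf text nodes carry their text in "name"; escape it for HTML.
        if role == "StaticText":
            return html.escape(node.get("name", {}).get("value", ""))
        inner = "".join(
            render(by_id[c]) for c in node.get("childIds", []) if c in by_id
        )
        # Ignored nodes contribute only their children, never a wrapper tag.
        if node.get("ignored"):
            return inner
        tag = ROLE_TO_TAG.get(role)
        return f"<{tag}>{inner}</{tag}>" if tag else inner

    return "".join(render(r) for r in roots)
```

For example, a root containing a heading "Title" and a paragraph "a < b" renders as `<h2>Title</h2><p>a &lt; b</p>`. Heading levels, links and images are deliberately ignored to keep the sketch short.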
In Chromium’s source, the ReadAnythingUntrustedPageHandler::OnCopy function (src) seems to call main_observer_->web_contents()->Copy(), i.e. it copies from the main page's web contents rather than the distilled ones (which might be web_ui_->GetWebContents()?). Perhaps the source code could be modified and rebuilt, though I have little experience with C++ or with large software projects.
The chrome-untrusted://read-anything-side-panel.top-chrome page and the <read-anything-app /> element seem to be involved. Perhaps there is a hack to access either of them?
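One possible first step is to check whether the side panel page shows up as a remote-debugging target at all. A minimal sketch, assuming Chrome was launched with --remote-debugging-port=9222 so that its /json/list HTTP endpoint is available; whether the read-anything side panel is actually listed there is an untested assumption.

```python
import json
import urllib.request

# URL prefix of the side panel page, taken from the question above.
SIDE_PANEL_PREFIX = "chrome-untrusted://read-anything-side-panel.top-chrome"


def find_side_panel_targets(targets, prefix=SIDE_PANEL_PREFIX):
    """Keep only DevTools targets whose URL starts with the side panel URL."""
    return [t for t in targets if t.get("url", "").startswith(prefix)]


def list_targets(port=9222):
    """Fetch all targets from Chrome's remote-debugging HTTP endpoint.

    Assumes Chrome is running with --remote-debugging-port=<port>.
    """
    with urllib.request.urlopen(f"http://localhost:{port}/json/list") as resp:
        return json.load(resp)
```

Usage: `find_side_panel_targets(list_targets())`. If a match appears, its webSocketDebuggerUrl could in principle be used with the Runtime.evaluate command to read the contents of <read-anything-app /> from inside the page. That element may keep its content in a shadow root, so plain document.querySelector might not be enough.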
In theory there should be no inherent difficulty in copying the distilled HTML, especially since its source code is available. Since it seems to be one of the best text extraction solutions, there is good reason to try using it in research and production.
So how difficult would it be to copy HTML from Chrome’s reading mode side panel, and how might I go about it?
For me, the following worked on macOS: