
Removing <body> with a userscript


I'm sorry, but I am not very good at programming. I am trying to fix an irritating bug on my school's website with a userscript. I have tested the regex on several pages, and at least that part works; now I need the userscript to actually remove the parts I don't want to see. Below is a snippet of the website's source, with the lines that need to be removed marked with '//'.

<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
//<html><head>
//<title>404 Not Found</title>
//</head><body>
//<h1>Not Found</h1>
//<p>The requested URL /get.php was not found on this server.</p>
//</body></html>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en-gb" lang="en-gb" >
<head>
<meta http-equiv="content-type" content="text/html; charset=utf-8" />
<meta name="robots" content="index, follow" />

Here is my userscript, which does not work. I know it reflects my skills as a programmer; please don't hate.

var REGEX = /<HTML>(.*?)([^\n]*?\n+?)+?<\/BODY><\/HTML>/ig;
document.body.innerHTML=document.body.innerHTML.replace(REGEX, '');
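For reference, the pattern above does remove the 404 block when it is applied to the raw page source as a plain string, which is presumably how it was tested. A minimal check you can run in Node; `rawSource` is abridged from the snippet above (with the '//' markers dropped), and `fragment` is a hypothetical string standing in for markup that lacks the literal `<html>` and `</body></html>` tags:

```javascript
// The question's pattern, unchanged.
const REGEX = /<HTML>(.*?)([^\n]*?\n+?)+?<\/BODY><\/HTML>/ig;

// Raw page source, abridged from the snippet above ('//' markers removed).
const rawSource = [
  '<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">',
  '<html><head>',
  '<title>404 Not Found</title>',
  '</head><body>',
  '<h1>Not Found</h1>',
  '<p>The requested URL /get.php was not found on this server.</p>',
  '</body></html>',
  '<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en-gb" lang="en-gb" >',
].join('\n');

// On the raw text, everything from "<html>" through "</body></html>" is removed;
// the second "<html xmlns=...>" tag is kept because "<HTML>" needs ">" right after it.
const cleanedSource = rawSource.replace(REGEX, '');

// Hypothetical markup without the outer tags: the pattern finds nothing to replace.
const fragment =
  '<h1>Not Found</h1>\n<p>The requested URL /get.php was not found on this server.</p>';
const cleanedFragment = fragment.replace(REGEX, '');

console.log(cleanedSource.includes('404 Not Found')); // false
console.log(cleanedFragment === fragment);            // true
```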

Solution

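  • This markup is invalid, but browsers (at least Chrome and Firefox) merge the two <html> sections into a single document as best they can: the 404 page's <h1> and <p> end up inside the real page's <body>. By the time your userscript runs, document.body.innerHTML is a serialization of that merged tree, so it never contains the literal <html> ... </body></html> text your regex looks for. Rewriting document.body.innerHTML with a regex is therefore probably not what you want.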

    Doing something like this will visually fix the issue:

    document.querySelector('h1')?.remove(); // remove the first <h1> ("Not Found"), if there is one
    document.querySelector('p')?.remove();  // remove the first <p> ("The requested URL ..."), if there is one
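The two lines above remove whichever <h1> and <p> come first in the document, so on a page where the 404 block is absent they would hide the page's own heading and paragraph instead. A slightly more defensive sketch, assuming the stray text always reads exactly as in the question, removes only elements whose text matches. The `@match` URL is a hypothetical placeholder for the school site, and `removeStray404` / `STRAY_404` are illustrative names:

```javascript
// ==UserScript==
// @name        Hide stray 404 block (sketch)
// @match       https://example.com/*
// @run-at      document-end
// ==/UserScript==
// NOTE: the @match URL above is a hypothetical placeholder; use your school site's URL.

// Selector and exact text of each stray element, taken from the question's snippet.
const STRAY_404 = [
  ['h1', /^Not Found$/],
  ['p', /^The requested URL \S+ was not found on this server\.$/],
];

// Hypothetical helper: removes only elements whose trimmed text matches the
// stray 404 message, leaving the page's own headings and paragraphs alone.
// Returns the number of elements removed.
function removeStray404(root) {
  let removed = 0;
  for (const [selector, pattern] of STRAY_404) {
    // querySelectorAll returns a static NodeList, so removing while iterating is safe.
    for (const el of root.querySelectorAll(selector)) {
      if (pattern.test(el.textContent.trim())) {
        el.remove();
        removed += 1;
      }
    }
  }
  return removed;
}

// Run only in a browser; outside one (e.g. Node) there is no `document`.
if (typeof document !== 'undefined') {
  removeStray404(document);
}
```

Matching on text costs a couple of regex tests, but if the server ever stops emitting the stray block, the script simply finds nothing to remove.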