Tags: javascript, html, dom, static, bulk

Applying DOM Manipulations to HTML and saving the result?


I have about 100 static HTML pages that all follow the same HTML structure. I want to apply some DOM manipulations to each of these files and then save the resulting HTML.

These are the manipulations I want to apply:

// [start]
$("h1.title, h2.description", this).wrap("<hgroup>");
if ( $("h1.title").height() < 200 ) {
  $("div.content").addClass('tall');
}
// [end]
// then save the new HTML

The first line (.wrap()) I could easily do with a find-and-replace, but it gets tricky when I have to determine the calculated height of an element, which can't easily be determined without JavaScript.

Does anyone know how I can achieve this? Thanks!


Solution

  • The first part could indeed be solved in "text mode" using regular expressions, or with a more complete DOM implementation in JavaScript. The second part (the height calculation) depends on actual layout, so you'll need a real, full browser or a headless engine like PhantomJS.

    From the PhantomJS homepage:

    PhantomJS is a command-line tool that packs and embeds WebKit. Literally it acts like any other WebKit-based web browser, except that nothing gets displayed to the screen (thus, the term headless). In addition to that, PhantomJS can be controlled or scripted using its JavaScript API.


    A rough outline (which I admit is not tested) follows.

    In your modification script (say, modify-html-file.js), open an HTML page, modify its DOM tree, and console.log the HTML of the root element:

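    var page = require('webpage').create();
    var system = require('system');

    // system.args[0] is the script name; system.args[1] is the HTML file.
    // The path should be absolute, since it is appended to 'file://'.
    page.open(encodeURI('file://' + system.args[1]), function (status) {
        if (status === 'success') {
            var html = page.evaluate(function () {
                // your DOM manipulation here (this function runs inside the page)
                // note: outerHTML of the root element does not include the doctype
                return document.documentElement.outerHTML;
            });
            console.log(html);
        }
        phantom.exit();
    });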
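For the "DOM manipulation here" placeholder, the two jQuery lines from the question could be written with plain DOM calls, so nothing has to be injected into the page. What follows is only a sketch under some assumptions: applyManipulations is a made-up name, each page is assumed to contain a single h1.title, h2.description and div.content, and the two headings are assumed to belong in one shared <hgroup> (jQuery's .wrap() would give each its own wrapper). It also uses offsetHeight, which includes padding and border, whereas jQuery's .height() does not, so the 200px threshold may need adjusting.

```javascript
// Plain-DOM sketch of the question's jQuery snippet (hypothetical helper name).
// Takes a document, modifies it in place, and returns the root element's HTML.
function applyManipulations(doc) {
    var h1 = doc.querySelector('h1.title');
    var h2 = doc.querySelector('h2.description');
    var content = doc.querySelector('div.content');

    // Wrap both headings in one <hgroup>, placed where the h1 used to be.
    // appendChild moves a node that is already in the tree.
    if (h1 && h2) {
        var hgroup = doc.createElement('hgroup');
        h1.parentNode.insertBefore(hgroup, h1);
        hgroup.appendChild(h1);
        hgroup.appendChild(h2);
    }

    // offsetHeight is the rendered height in pixels; this is the part that
    // needs a real layout engine. Same condition as in the question.
    if (h1 && content && h1.offsetHeight < 200) {
        content.className += ' tall';
    }

    return doc.documentElement.outerHTML;
}
```

Because page.evaluate runs its callback inside the page, the callback cannot see functions from the outer PhantomJS script. The function would therefore have to be defined inside the callback and called there as applyManipulations(document).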
    

    Next, save the new HTML by redirecting your script's output to a file:

    #!/bin/bash
    
    mkdir -p modified
    for i in *.html; do
        # pass an absolute path, since the script prepends 'file://'
        phantomjs modify-html-file.js "$PWD/$i" > "modified/$i"
    done