Tags: html, scripting, include, transformation, server-side-includes

Split HTML file by <section>, into separate include files?


I want to split an HTML file into separate files, one per <section> tag.

An example might be:

mypage.html

<!DOCTYPE html>
<html>
    <head>
         ...
    </head>
<body>
    <!-- Section 1 -->
    <section class="foo">
        ...
    </section>

    <!-- Section 2 -->
    <section class="bar">
        ...
    </section>

    <!-- Section 3 -->
    ...
</body>
</html>

The desired output would then look like this:

/mypage.html            # (original file)
/mypage-split.html      # (original file, with placeholders to replace the section back in)

# component/include files (that of course will not be valid HTML, since it's just a portion and won't start with `DOCTYPE` or `html`)
/sections/mypage-1.htmlinc      # (section 1 markup)
/sections/mypage-2.inc          # (section 2 markup)
...
/sections/mypage-n.html

How can I perform this split?

A shell script might be the easiest way, but my scripting skill is very limited.

Or is there a web standard, supported by browsers or web servers, for keeping components of HTML pages in separate files without having to resort to a web programming language (server or client side)?


Solution

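  • tag=section
    # Print every line from an opening <section ...> tag through its closing tag.
    # "[ >]" lets the opening tag carry attributes such as class="foo".
    sed -n "/<$tag[ >]/,/<\/$tag>/p" mypage.html > section.inc
    

    This should be a starting point for you:
    set the tag shell variable to the name of the HTML tag you want to extract;
    sed prints every line from each opening tag through its closing tag, and the redirect writes the result to section.inc.
    Note that all matching sections end up in that single file, so producing one file per section takes a further step.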
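
    To get one include file per section plus a copy of the page with placeholders, as the question asks, a small awk-based shell function might be enough. This is only a sketch under several assumptions: the function name split_sections is made up for illustration, sections are not nested, each opening tag starts on its own line, and the script is run from the directory that contains the page. The placeholder it writes uses the Server Side Includes directive form (`<!--#include virtual="..." -->`), which only has an effect if your web server has SSI enabled; swap in any marker you prefer.

```shell
#!/bin/sh
# split_sections FILE [TAG] [OUTDIR]   (hypothetical helper, not a standard tool)
# Writes each <TAG>...</TAG> block of FILE to OUTDIR/<name>-N.inc and writes
# <name>-split.html, a copy of FILE with one placeholder line per block.
# Assumes non-nested blocks and that FILE ends in .html.
split_sections() {
    file=$1
    tag=${2:-section}
    outdir=${3:-sections}
    base=$(basename "$file" .html)
    mkdir -p "$outdir" || return 1

    awk -v tag="$tag" -v outdir="$outdir" -v base="$base" '
        # Opening tag (with or without attributes): start a new include file
        # and emit a placeholder line in its place.
        !inside && ($0 ~ ("<" tag "([ \t>]|$)")) {
            inside = 1
            n++
            part = outdir "/" base "-" n ".inc"
            indent = $0
            sub(/[^ \t].*/, "", indent)      # keep only the leading whitespace
            printf "%s<!--#include virtual=\"%s\" -->\n", indent, part
        }
        # While inside a block, copy lines to the include file only.
        inside {
            print > part
            if ($0 ~ ("</" tag ">")) { inside = 0; close(part) }
            next
        }
        # Everything outside a block goes to the split page unchanged.
        { print }
    ' "$file" > "${base}-split.html"
}
```

    Calling `split_sections mypage.html` from the page's directory should leave mypage.html untouched and create mypage-split.html together with sections/mypage-1.inc, sections/mypage-2.inc, and so on. The original comment lines such as `<!-- Section 1 -->` stay in the split page, directly above each placeholder.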