
Indenting generated markup in Jekyll/Ruby


This is probably kind of a silly question, but is there any way to make the markup generated by Jekyll preserve the indentation of the Liquid tag? The world doesn't end if it isn't solvable. I'm just curious, since I like my code to look tidy even when compiled. :)

For example I have these two:

base.html:

<body>
    <div id="page">
        {{content}}
    </div>
</body>

index.md:

---
layout: base
---
<div id="recent_articles">
    {% for post in site.posts %}
    <div class="article_puff">
        <img src="/resources/images/fancyi.jpg" alt="" />
        <h2><a href="{{post.url}}">{{post.title}}</a></h2>
        <p>{{post.description}}</p>
        <a href="{{post.url}}" class="read_more">Read more</a>
    </div>
    {% endfor %}    
</div>

The problem is that the imported {{content}} tag is rendered without the indentation used above.

So instead of

<body>
    <div id="page">
        <div id="recent_articles">  
            <div class="article_puff">
                <img src="/resources/images/fancyimage.jpg" alt="" />
                <h2><a href="/articles/2012/11/14/gettin-down-with-rwd.html">Gettin' down with responsive web design</a></h2>
                <p>Everyone's talking about it. Your client wants it. You need to code it.</p>
                <a href="/articles/2012/11/14/gettin-down-with-rwd.html" class="read_more">Read more</a>
            </div>
        </div>
    </div>
</body>

I get

<body>
    <div id="page">
        <div id="recent_articles">  
<div class="article_puff">
<img src="/resources/images/fancyimage.jpg" alt="" />
    <h2><a href="/articles/2012/11/14/gettin-down-with-rwd.html">Gettin' down with responsive web design</a></h2>
    <p>Everyone's talking about it. Your client wants it. You need to code it.</p>
    <a href="/articles/2012/11/14/gettin-down-with-rwd.html" class="read_more">Read more</a>
</div>
</div>
    </div>
</body>

Seems like only the first line is indented correctly. The rest starts at the beginning of the line... So, multiline liquid-templating import? :)


Solution

    Using a Liquid filter

    I managed to make this work using a liquid filter. There are a few caveats:

    • Your input must be clean. I had some curly quotes and non-printable chars that looked like whitespace in a few files (copypasta from Word or some such) and was seeing "Invalid byte sequence in UTF-8" as a Jekyll error.

    • It could break some things. I was using <i class="icon-file"></i> icons from Twitter Bootstrap. The filter replaced the empty tag with <i class="icon-file"/>, and Bootstrap did not like that. It also screws up the Octopress {% codeblock %}s in my content; I didn't really look into why.

    • While this will clean the output of a Liquid variable such as {{ content }}, it does not actually solve the problem in the original post, which is to indent the HTML in the context of the surrounding HTML. It produces well-formatted HTML, but as a fragment that is not indented relative to the tags above it. If you want to format everything in context, use the Rake task instead of the filter.

    require 'rubygems'
    require 'json'
    require 'nokogiri'
    require 'nokogiri-pretty'
    
    module Jekyll
      module PrettyPrintFilter
        def pretty_print(input)
          #seeing some ASCII-8 come in
          input = input.encode("UTF-8")
    
          #Parsing with nokogiri first cleans up some things the XSLT can't handle
          content = Nokogiri::HTML::DocumentFragment.parse input
          parsed_content = content.to_html
    
          #Unfortunately nokogiri-pretty can't use DocumentFragments...
          html = Nokogiri::HTML parsed_content
          pretty = html.human
    
          #...so now we need to remove the stuff it added to make valid HTML
          output = PrettyPrintFilter.strip_extra_html(pretty)
          output
        end
    
        def PrettyPrintFilter.strip_extra_html(html)
          #type declaration
          html = html.sub('<?xml version="1.0" encoding="ISO-8859-1"?>','')
    
          #second <html> tag
          first = true
          html = html.gsub('<html>') do |match|
            if first == true
              first = false
              next
            else
              ''
            end
          end
    
          #first </html> tag
          html = html.sub('</html>','')
    
          #second <head> tag
          first = true
          html = html.gsub('<head>') do |match|
            if first == true
              first = false
              next
            else
              ''
            end
          end
    
          #first </head> tag
          html = html.sub('</head>','')
    
          #second <body> tag
          first = true
          html = html.gsub('<body>') do |match|
            if first == true
              first = false
              next
            else
              ''
            end
          end
    
          #first </body> tag
          html = html.sub('</body>','')
    
          html
        end
      end
    end
    
    Liquid::Template.register_filter(Jekyll::PrettyPrintFilter)
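
The last caveat above (the filter formats the fragment but does not indent it relative to the surrounding layout) could also be approached by re-indenting the fragment itself. Below is a minimal sketch, not part of the original filter: `IndentFilter` and `indent_lines` are made-up names, and it assumes the rendered content contains no `<pre>` blocks or other whitespace-sensitive markup that extra indentation would damage.

```ruby
# Hypothetical helper, not part of the original answer: re-indents a
# rendered fragment so every line after the first gets the same prefix.
module IndentFilter
  # input  - the rendered fragment (e.g. what {{ content }} expands to)
  # spaces - how many spaces the tag itself is indented by in the layout
  def indent_lines(input, spaces = 4)
    prefix = " " * spaces.to_i
    first, *rest = input.to_s.lines
    return "" if first.nil?
    # The layout already supplies the indentation in front of the tag,
    # so the first line is left alone; blank lines stay blank.
    first + rest.map { |line| line.strip.empty? ? line : prefix + line }.join
  end
end

# In a Jekyll plugin file this would be registered like the filter above
# (requires Liquid to be loaded):
#   Liquid::Template.register_filter(IndentFilter)

# Stand-alone demonstration without Jekyll, using plain string substitution
# in place of the {{content}} tag:
include IndentFilter
fragment = "<div id=\"recent_articles\">\n    <p>Hi</p>\n</div>"
layout   = "<body>\n    <div id=\"page\">\n        CONTENT\n    </div>\n</body>\n"
puts layout.sub("CONTENT") { indent_lines(fragment, 8) }
```

Registered like the filter above, it would be called from the layout as {{ content | indent_lines: 8 }}, where 8 is the column the tag sits at in base.html. How this interacts with the Markdown converter's own output is untested here.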
    

    Using a Rake task

    I use a task in my Rakefile to pretty-print the output after the Jekyll site has been generated.

    require 'nokogiri'
    require 'nokogiri-pretty'
    
    desc "Pretty print HTML output from Jekyll"
    task :pretty_print do
      #change public to _site or wherever your output goes
      html_files = File.join("**", "public", "**", "*.html")
    
      Dir.glob html_files do |html_file|
        puts "Cleaning #{html_file}"
    
        contents = File.read(html_file)
    
        begin
          #we're gonna parse it as XML so we can apply an XSLT
          html = Nokogiri::XML(contents)
    
          #the human() method is from nokogiri-pretty. Just an XSL transform on the XML.
          pretty_html = html.human
        rescue StandardError => msg
          puts "Failed to pretty print #{html_file}: #{msg}"
          #skip this file so a failed parse doesn't overwrite it with nothing
          next
        end
    
        #Yep, we're overwriting the file. Potentially destructive.
        File.write(html_file, pretty_html)
      end
    end
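
Since the task above overwrites each file in place, one way to make it less destructive is to keep a copy of the original before writing. This is only a sketch using the standard library: `write_with_backup` and the `.bak` suffix are made-up names, not something the original task uses.

```ruby
require 'fileutils'
require 'tmpdir'

# Hypothetical helper, not part of the original task: copies the file to
# "<path>.bak" before replacing its contents. Returns false (and leaves the
# file untouched) when there is nothing to write.
def write_with_backup(path, new_contents, suffix = ".bak")
  return false if new_contents.nil? || new_contents.empty?
  FileUtils.cp(path, path + suffix)
  File.write(path, new_contents)
  true
end

# Stand-alone demonstration in a temporary directory:
Dir.mktmpdir do |dir|
  page = File.join(dir, "index.html")
  File.write(page, "<p>old</p>")
  write_with_backup(page, "<p>new</p>")
  puts File.read(page)          # the new contents
  puts File.read(page + ".bak") # the original contents
end
```

Inside the task, the final write could call this helper instead. Note that the backup files then sit in the output directory, so they would need to be cleaned up or excluded before deploying.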