Tags: ruby-on-rails, ruby, gsub, actiontext

Simple Table of Contents in Rails is not working as intended


I had the idea of building a simple table of contents in Rails. I created a new Rails 8 app, installed Action Text, and scaffolded a Post model with a title and a body. The body is rich text edited with the Trix editor, and post.rb has the required "has_rich_text :body". I tried a few different approaches and finally settled on grabbing the body in the model with a "before_save" callback and making the adjustments there. The concept is straightforward: when you create a new post and add a "heading" (which gives you an h1 tag), I look through the body, grab the h1 tags, take their content, build an anchor link from that content, and modify each h1 to include an id attribute derived from the same content.

I added a migration that adds "toc" to the Post model as toc:rich_text, made sure "has_rich_text :toc" is in post.rb, added :toc to the strong params, and added the field to _post.html.erb to display it. Here is my frustration: it only half works. Once the post is created, the view shows the TOC links built from the headings in the post body, but the h1 tags do not have the id attributes. If I log it, the body inside before_save does contain the modified HTML; after the update it is not there, just the original post body.

Here are the methods I am working on in post.rb:

class Post < ApplicationRecord
  has_rich_text :body
  has_rich_text :toc
  # has_rich_text :toc_body
  before_save :process_body

  private

  def process_body
    return if body.blank?

    # Convert the ActionText::RichText to a string
    body_content = body.to_s

    toc_data = generate_toc(body_content)

    # Log the modified body content
    # Rails.logger.debug("Modified Body: #{toc_data[:content]}")

    self.body = toc_data[:content]  # Store the modified body

    # Log the body after update
    # Rails.logger.debug("Body after update: #{self.toc_body}")

    self.toc = toc_data[:toc]        # Store the TOC
  end

  # Method to generate TOC and modify body
  def generate_toc(body)
    headings = body.scan(/<h1[^>]*>(.*?)<\/h1>/).flatten
    return { toc: "", content: body } if headings.empty?

    toc = "<ul>"
    ids = []  # Array to store generated IDs

    headings.each do |heading|
      id = heading.gsub(/\s+/, "-").downcase
      toc += "<li><a href='##{id}'>#{heading}</a></li>"
      ids << { heading: heading, id: id }  # Store heading and corresponding ID
    end

    # Manually build the new body content
    ids.each do |item|
      body.gsub!(/<h1>(#{Regexp.escape(item[:heading])})<\/h1>/, "<h1 id='#{item[:id]}'>\\1</h1>")
    end

    toc += "</ul>"

    { toc: toc.html_safe, content: body.html_safe }
  end
end

Here are the logs. I put bold ** markers around the points of interest:

Started POST "/posts" for ::1 at 2024-11-29 15:15:00 -0500
Processing by PostsController#create as TURBO_STREAM
  Parameters: {"authenticity_token"=>"[FILTERED]", "post"=>{"title"=>"Ruby Blog", "body"=>"<h1>Chapter 1</h1><div>Rails is sunshine and lollipops</div><h1>Chapter 2</h1><div>Not everyone like Brussels Sprouts&nbsp;</div>"}, "commit"=>"Create Post"}
  Rendered vendor/bundle/ruby/3.3.0/gems/actiontext-8.0.0/app/views/action_text/contents/_content.html.erb within layouts/action_text/contents/_content (Duration: 4.2ms | GC: 0.0ms)
**Modified Body:** <!-- BEGIN app/views/layouts/action_text/contents/_content.html.erb --><div class="trix-content">
  <!-- BEGIN vendor/bundle/ruby/3.3.0/gems/actiontext-8.0.0/app/views/action_text/contents/_content.html.erb -->**<h1 id='chapter-1'>**Chapter 1</h1><div>Rails is sunshine and lollipops</div>**<h1 id='chapter-2'>**Chapter 2</h1><div>Not everyone like Brussels Sprouts&nbsp;</div>
<!-- END vendor/bundle/ruby/3.3.0/gems/actiontext-8.0.0/app/views/action_text/contents/_content.html.erb --></div>
<!-- END app/views/layouts/action_text/contents/_content.html.erb -->
  Rendered vendor/bundle/ruby/3.3.0/gems/actiontext-8.0.0/app/views/action_text/contents/_content.html.erb within layouts/action_text/contents/_content (Duration: 2.7ms | GC: 0.0ms)
**Body after update:** <!-- BEGIN app/views/layouts/action_text/contents/_content.html.erb --><div class="trix-content">
  <!-- BEGIN vendor/bundle/ruby/3.3.0/gems/actiontext-8.0.0/app/views/action_text/contents/_content.html.erb --><div class="trix-content">
  **<h1>**Chapter 1</h1><div>Rails is sunshine and lollipops</div>**<h1>**Chapter 2</h1><div>Not everyone like Brussels Sprouts&nbsp;</div>
</div>

<!-- END vendor/bundle/ruby/3.3.0/gems/actiontext-8.0.0/app/views/action_text/contents/_content.html.erb --></div>
<!-- END app/views/layouts/action_text/contents/_content.html.erb -->

Someone on a Slack channel suggested using Nokogiri, so I installed the gem and rewrote the post.rb methods with it. I am still getting the same result. Another helpful person suggested that the Post model may not be the best place for this, but the best place I could think of to intercept and modify the body before it is saved was the model. I will think on this. Any advice would be appreciated. Here is the rewrite with Nokogiri:

class Post < ApplicationRecord
  # ActionText body
  has_rich_text :body
  # ActionText toc
  has_rich_text :toc

  # Process the body before every save (both create and update)
  before_save :process_body

  private

  def process_body
    # Extract the HTML content from the rich text body
    body_content = body.to_trix_html

    # Make modifications and create toc
    result = generate_toc(body_content)

    self.toc = result[:toc]  # Update the TOC

    # Log result before save
    Rails.logger.debug("MODIFIED BODY BEFORE SAVE: #{result[:content]}")
    self.body = result[:content]  # Update the body with modified content
    # Log the body after the assignment (this is still before the save)
    Rails.logger.debug("BODY AFTER SAVE: #{self.body}")
  end

  def generate_toc(body)
    # Parse the HTML body with Nokogiri
    doc = Nokogiri::HTML::DocumentFragment.parse(body)

    # Find all <h1> tags
    headings = doc.css("h1")
    return { toc: "", content: body } if headings.empty?

    toc = "<ul>"

    headings.each do |heading|
      # Generate an ID for the heading
      id = heading.text.gsub(/\s+/, "-").downcase
      toc += "<li><a href='##{id}'>#{heading.text}</a></li>"

      # Set the ID attribute on the heading
      heading["id"] = id
    end

    toc += "</ul>"

    # Return the TOC and the modified content
    { toc: toc.html_safe, content: doc.to_html.html_safe }
  end
end
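A side issue, separate from the missing ids: the Nokogiri version interpolates heading.text straight into the TOC string. Nokogiri's text returns decoded text, so a heading written as "Q&amp;A" comes back as "Q&A" and lands in the TOC unescaped, and a heading whose text contains "<b>" would inject markup. Below is a minimal, standard-library-only sketch that escapes both the id and the link text with ERB::Util.html_escape. build_toc is a hypothetical helper name, and it assumes the same { id:, text: } pairs the Nokogiri loop computes.

```ruby
require "erb"

# Hypothetical helper (not from the question): builds the TOC markup from
# [{ id:, text: }] pairs, HTML-escaping both the anchor id and the link text.
def build_toc(headers)
  items = headers.map do |h|
    id   = ERB::Util.html_escape(h[:id])
    text = ERB::Util.html_escape(h[:text])
    "<li><a href=\"##{id}\">#{text}</a></li>"
  end
  "<ul>#{items.join}</ul>"
end

# Heading text as Nokogiri's #text would return it (entities already decoded)
headers = [
  { id: "q-a", text: "Q&A" },
  { id: "why-b-tags", text: "Why <b> tags?" }
]

puts build_toc(headers)
```

Inside Rails you would normally get the same escaping from the tag helpers (tag.a, content_tag) instead of escaping by hand.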

I would love any help here. At first I thought it was an issue with the loop and gsub, but now I am not so sure.
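One way to test the suspicion about the loop and gsub is to run the string logic on its own. The sketch below is a condensed, plain-Ruby copy of the first version's generate_toc (the two loops merged into one, no ActionText, no html_safe, and working on a copy of the string), run on the body taken from the request parameters in the log:

```ruby
# Condensed copy of the question's first generate_toc, runnable outside Rails.
def generate_toc(body)
  body = body.dup # work on a copy instead of mutating the caller's string
  headings = body.scan(/<h1[^>]*>(.*?)<\/h1>/).flatten
  return { toc: "", content: body } if headings.empty?

  toc = "<ul>"
  headings.each do |heading|
    id = heading.gsub(/\s+/, "-").downcase
    toc += "<li><a href='##{id}'>#{heading}</a></li>"
    # Same substitution as the question: only bare <h1> tags are matched
    body.gsub!(/<h1>(#{Regexp.escape(heading)})<\/h1>/, "<h1 id='#{id}'>\\1</h1>")
  end
  { toc: toc + "</ul>", content: body }
end

# Body from the "Parameters:" line of the log
input = "<h1>Chapter 1</h1><div>Rails is sunshine and lollipops</div>" \
        "<h1>Chapter 2</h1><div>Not everyone like Brussels Sprouts&nbsp;</div>"

result = generate_toc(input)
puts result[:content]
puts result[:toc]
```

If this prints the h1 tags with their ids, as the "Modified Body" log line already suggests, the regex and gsub are doing their job, and the ids are being lost after the modified HTML is handed back to ActionText.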


Solution

  • Don't make your model do everything. It already has more than enough responsibilities without adding HTML processing and generation to the list.

    Start with the job of extracting out the headers from the body:

    # Extracts the headers from a chunk of HTML and adds ids for anchor links
    class Outliner
      def initialize(html)
        @document = Nokogiri::HTML::DocumentFragment.parse(html)
      end
    
      # not the most elegant signature ever but works
      def perform
         headers = @document.css("h1")
                 .map do |node|
                    id = title_to_anchor(node.text)
                    node[:id] = id # mutates the document
                    { id: id, text: node.text }
                 end
        [ headers, @document ]
      end
    
      def self.perform(html)
        new(html).perform
      end
    
      private
      # This is an extremely naive implementation for a pretty complex problem - YMMV
      def title_to_anchor(title)
        title.parameterize
      end
    end
    
    # require "test_helper"
    class OutlinerTest < ActiveSupport::TestCase
      setup do
        @html = <<~HEREDOC
          <div>
            <h1>Header 1</h1>
            <h1>Header 2</h1>
          </div>
        HEREDOC
      end
    
      test "it extracts the text and a parameterized id" do
        expected =  [
                      { text: "Header 1", id: "header-1" },
                      { text: "Header 2", id: "header-2" }
                    ]
        assert_equal expected, Outliner.perform(@html).first
      end
      test "it adds ids to the headers" do
        doc = Outliner.perform(@html).last
        assert_equal "Header 2", doc.css("#header-2").text
      end
    end
    

    This is just a Plain Old Ruby Object and is easy to test because it does one thing: HTML parsing.

    Modifying the document in the same pass is a bit dirty, but it is a compromise so that we don't have to parse the HTML fragment twice, which can be expensive.

    Note that just parameterizing the header text is a very naive implementation: it will cause issues when two headers have the same title or when a title collides with the id of another element on the page.

    And then we will give generating the table of contents the same treatment:

    # Creates a Table of Contents from a list of headers
    class OutlineGenerator
    
      def initialize(headers)
        @headers = headers
        @context = ApplicationController.new.view_context
      end
    
      # This isn't the most elegant solution ever;
      # alternatives would be to render an ERB partial with ApplicationController.render
      # or to use view components
      def perform
        @context.tag.ul do |builder|
          @headers.each do |h|
            a =  builder.a(h[:text], href: "#" + h[:id])
            @context.concat(builder.li(a)) # writes the content into the string buffer
          end
        end
      end
    
      def self.perform(headers)
        new(headers).perform
      end
    end
    
    require "test_helper"
    class OutlineGeneratorTest < ActiveSupport::TestCase
      setup do
        @headers = [
          { text: "Header 1", id: "header-1" },
          { text: "Header 2", id: "header-2" }
        ]
      end
    
      test "it generates a list" do
        html = '<ul><li><a href="#header-1">Header 1</a></li><li><a href="#header-2">Header 2</a></li></ul>'
        assert_equal html, OutlineGenerator.perform(@headers)
      end
    end
    

    This isn't the most elegant solution ever. Alternatives would be to render an ERB partial with ApplicationController.render, or to use a library like ViewComponent. The important part is that your model should not have to care about the details of how the HTML is generated.

    Then the model just has to worry about updating the attributes:

    class Post < ApplicationRecord
      # ...
    
      def process_body
        headers, document = Outliner.perform(body.to_trix_html)
        return unless headers.any?
        self.body = document.to_html
        self.toc = OutlineGenerator.perform(headers)
      end
    end
    

    Note that a callback isn't the only way (and maybe not the best way) to go about this. You might, for example, move the processing into an Active Job so that it runs in a background process instead of making the web thread wait.
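The answer's note that parameterizing header text is naive can be made concrete: two headings with the same title get the same id, and a title can also match an id already used elsewhere on the page. Below is a small plain-Ruby sketch of one way to handle both, assuming a hypothetical AnchorGenerator class that could stand in for Outliner#title_to_anchor. Its slug rule is only a rough ASCII approximation of ActiveSupport's parameterize (non-ASCII letters are dropped rather than transliterated), and repeated slugs get -2, -3, ... suffixes.

```ruby
require "set"

# Hypothetical helper (not part of Rails or the answer): turns heading text
# into anchor ids that are unique within one document.
class AnchorGenerator
  # reserved_ids: ids already used on the page (e.g. by the layout)
  def initialize(reserved_ids = [])
    @used = Set.new(reserved_ids)
  end

  def call(text)
    # Rough ASCII-only slug: lowercase, collapse non-alphanumeric runs to "-",
    # then trim a leading/trailing "-"
    base = text.downcase.gsub(/[^a-z0-9]+/, "-").delete_prefix("-").delete_suffix("-")
    base = "section" if base.empty? # fallback for headings with no usable characters

    id = base
    suffix = 1
    # Append -2, -3, ... until the id has not been used yet
    while @used.include?(id)
      suffix += 1
      id = "#{base}-#{suffix}"
    end
    @used << id
    id
  end
end

anchors = AnchorGenerator.new(["main"])
ids = ["Intro", "Intro", "Main", "What's new?"].map { |title| anchors.call(title) }
p ids
```

Create one generator per document so the uniqueness check covers every heading in that post, and pass any ids the surrounding layout already uses as reserved_ids.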