I had an idea for a simple table of contents in Rails. I created a new Rails 8 app, installed Action Text, and scaffolded a Post model with a title and a body. The body is rich text and uses the Trix editor, and the post.rb model has the required "has_rich_text :body". I tried a few different approaches but finally landed on grabbing the body in the model using "before_save" and making adjustments there. The concept is straightforward: when you create a new post and add a "heading" in the editor (which gives you an h1 tag), I look through the body, grab those h1 tags, take their content, create an anchor link from that content, and modify each h1 to include an id attribute built from the same content.
I added a migration to add "toc" to the posts model, where toc:rich_text. I ensured "has_rich_text :toc" was in the post.rb model and that :toc was added to the strong params, and I added the field to _post.html.erb to show it. Here is my frustration: it only half works. Once the post is created, I get the view with the links built from the headings in the post body, but the h1 tags do not have the id attributes. If I log it, the body before save does contain the modified markup; after save the change is gone and I am back to the original post body.
Here are the methods I am working on in post.rb:
class Post < ApplicationRecord
  has_rich_text :body
  has_rich_text :toc
  # has_rich_text :toc_body
  before_save :process_body

  private

  def process_body
    return if body.blank?
    # Convert the ActionText::RichText to a string
    body_content = body.to_s
    toc_data = generate_toc(body_content)
    # Log the modified body content
    # Rails.logger.debug("Modified Body: #{toc_data[:content]}")
    self.body = toc_data[:content] # Store the modified body
    # Log the body after update
    # Rails.logger.debug("Body after update: #{self.toc_body}")
    self.toc = toc_data[:toc] # Store the TOC
  end

  # Method to generate TOC and modify body
  def generate_toc(body)
    headings = body.scan(/<h1[^>]*>(.*?)<\/h1>/).flatten
    return { toc: "", content: body } if headings.empty?
    toc = "<ul>"
    ids = [] # Array to store generated IDs
    headings.each do |heading|
      id = heading.gsub(/\s+/, "-").downcase
      toc += "<li><a href='##{id}'>#{heading}</a></li>"
      ids << { heading: heading, id: id } # Store heading and corresponding ID
    end
    # Manually build the new body content
    ids.each do |item|
      body.gsub!(/<h1>(#{Regexp.escape(item[:heading])})<\/h1>/, "<h1 id='#{item[:id]}'>\\1</h1>")
    end
    toc += "</ul>"
    { toc: toc.html_safe, content: body.html_safe }
  end
end
Here are the logs. I put bold ** markers around the points of interest to help:
Started POST "/posts" for ::1 at 2024-11-29 15:15:00 -0500
Processing by PostsController#create as TURBO_STREAM
Parameters: {"authenticity_token"=>"[FILTERED]", "post"=>{"title"=>"Ruby Blog", "body"=>"<h1>Chapter 1</h1><div>Rails is sunshine and lollipops</div><h1>Chapter 2</h1><div>Not everyone like Brussels Sprouts </div>"}, "commit"=>"Create Post"}
Rendered vendor/bundle/ruby/3.3.0/gems/actiontext-8.0.0/app/views/action_text/contents/_content.html.erb within layouts/action_text/contents/_content (Duration: 4.2ms | GC: 0.0ms)
**Modified Body:** <!-- BEGIN app/views/layouts/action_text/contents/_content.html.erb --><div class="trix-content">
<!-- BEGIN vendor/bundle/ruby/3.3.0/gems/actiontext-8.0.0/app/views/action_text/contents/_content.html.erb -->**<h1 id='chapter-1'>**Chapter 1</h1><div>Rails is sunshine and lollipops</div>**<h1 id='chapter-2'>**Chapter 2</h1><div>Not everyone like Brussels Sprouts </div>
<!-- END vendor/bundle/ruby/3.3.0/gems/actiontext-8.0.0/app/views/action_text/contents/_content.html.erb --></div>
<!-- END app/views/layouts/action_text/contents/_content.html.erb -->
Rendered vendor/bundle/ruby/3.3.0/gems/actiontext-8.0.0/app/views/action_text/contents/_content.html.erb within layouts/action_text/contents/_content (Duration: 2.7ms | GC: 0.0ms)
**Body after update:** <!-- BEGIN app/views/layouts/action_text/contents/_content.html.erb --><div class="trix-content">
<!-- BEGIN vendor/bundle/ruby/3.3.0/gems/actiontext-8.0.0/app/views/action_text/contents/_content.html.erb --><div class="trix-content">
**<h1>**Chapter 1</h1><div>Rails is sunshine and lollipops</div>**<h1>**Chapter 2</h1><div>Not everyone like Brussels Sprouts </div>
</div>
<!-- END vendor/bundle/ruby/3.3.0/gems/actiontext-8.0.0/app/views/action_text/contents/_content.html.erb --></div>
<!-- END app/views/layouts/action_text/contents/_content.html.erb -->
Someone on a Slack channel suggested using Nokogiri. I installed the gem and rewrote the post.rb methods using it, but I am still getting the same result. A helpful sort has suggested that the post.rb model may not be the best place for this. The model was the best place I could think of to intercept and modify the body before save. I will think on this. Any advice would be appreciated. Here is the rewrite with Nokogiri:
class Post < ApplicationRecord
  # ActionText body
  has_rich_text :body
  # ActionText toc
  has_rich_text :toc
  # Use separate callbacks for create and update
  before_save :process_body

  private

  def process_body
    # Extract the HTML content from the rich text body
    body_content = body.to_trix_html
    # Make modifications and create toc
    result = generate_toc(body_content)
    self.toc = result[:toc] # Update the TOC
    # Log result before save
    Rails.logger.debug("MODIFIED BODY BEFORE SAVE: #{result[:content]}")
    self.body = result[:content] # Update the body with modified content
    # Log result after save
    Rails.logger.debug("BODY AFTER SAVE: #{self.body}")
  end

  def generate_toc(body)
    # Parse the HTML body with Nokogiri
    doc = Nokogiri::HTML::DocumentFragment.parse(body)
    # Find all <h1> tags
    headings = doc.css("h1")
    return { toc: "", content: body } if headings.empty?
    toc = "<ul>"
    headings.each do |heading|
      # Generate an ID for the heading
      id = heading.text.gsub(/\s+/, "-").downcase
      toc += "<li><a href='##{id}'>#{heading.text}</a></li>"
      # Set the ID attribute on the heading
      heading["id"] = id
    end
    toc += "</ul>"
    # Return the TOC and the modified content
    { toc: toc.html_safe, content: doc.to_html.html_safe }
  end
end
I would love any help here. At first I thought it was an issue with the loop and gsub, but now I am not so sure.
Don't make your model do everything. It already has more than enough responsibilities without adding HTML processing and generation to the list.
Start with the job of extracting out the headers from the body:
# Extracts the headers from a chunk of HTML and adds ids for anchor links
class Outliner
  def initialize(html)
    @document = Nokogiri::HTML::DocumentFragment.parse(html)
  end

  # not the most elegant signature ever but works
  def perform
    headers = @document.css("h1")
                       .map do |node|
      id = title_to_anchor(node.text)
      node[:id] = id # mutates the document
      { id: id, text: node.text }
    end
    [ headers, @document ]
  end

  def self.perform(html)
    new(html).perform
  end

  private

  # This is an extremely naive implementation for a pretty complex problem - YMMV
  def title_to_anchor(title)
    title.parameterize
  end
end
require "test_helper"
class OutlinerTest < ActiveSupport::TestCase
  setup do
    @html = <<~HEREDOC
      <div>
        <h1>Header 1</h1>
        <h1>Header 2</h1>
      </div>
    HEREDOC
  end

  test "it extracts the text and a parameterized id" do
    expected = [
      { text: "Header 1", id: "header-1" },
      { text: "Header 2", id: "header-2" }
    ]
    assert_equal expected, Outliner.perform(@html).first
  end

  test "it adds id's to the headers" do
    doc = Outliner.perform(@html).last
    assert_equal "Header 2", doc.css("#header-2").text
  end
end
This is just a Plain Old Ruby Object and is easy to test, as it does just one thing: HTML parsing.
Modifying the document at the same time is a bit dirty, but it is a compromise so that we don't have to process the HTML fragment twice, which can be expensive.
Note that just parameterizing the header text is a very naive implementation and will cause issues if/when the generated ids collide with each other or with other elements on the page.
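One way to reduce the collision risk noted above is to track the ids already handed out and add a numeric suffix to repeats. This is only a sketch under some assumptions: `AnchorRegistry`, `anchor_for` and `slugify` are hypothetical names that are not part of the code above, and `slugify` is a plain-Ruby, ASCII-only stand-in for `parameterize` so the example runs without Rails.

```ruby
require "set"

# Hypothetical helper (not part of the Outliner above): hands out anchor ids
# that stay unique within a single document.
class AnchorRegistry
  def initialize
    @taken = Set.new # ids that have already been handed out
  end

  # Returns an id for the title, appending -2, -3, ... until it is unused.
  def anchor_for(title)
    base = slugify(title)
    candidate = base
    n = 1
    candidate = "#{base}-#{n += 1}" while @taken.include?(candidate)
    @taken << candidate
    candidate
  end

  private

  # ASCII-only stand-in for ActiveSupport's String#parameterize.
  def slugify(title)
    slug = title.downcase.gsub(/[^a-z0-9]+/, "-").gsub(/\A-+|-+\z/, "")
    slug.empty? ? "section" : slug
  end
end

registry = AnchorRegistry.new
anchors = ["Chapter 1", "Summary", "Summary"].map { |t| registry.anchor_for(t) }
```

Inside `Outliner` the registry would probably replace the direct call to `title.parameterize`. It only guards against duplicate headings in the same body, not against ids used elsewhere on the page, so a prefix may still be worth adding.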
And then we will give generating the table of contents the same treatment:
# Creates a Table of Contents from a list of headers
class OutlineGenerator
  def initialize(headers)
    @headers = headers
    @context = ApplicationController.new.view_context
  end

  # This isn't the most elegant solution ever
  # alternatives would be to use @context.render to render an ERB template
  # or view components
  def perform
    @context.tag.ul do |builder|
      @headers.each do |h|
        a = builder.a(h[:text], href: "#" + h[:id])
        @context.concat(builder.li(a)) # writes the content into the string buffer
      end
    end
  end

  def self.perform(headers)
    new(headers).perform
  end
end
require "test_helper"

class OutlineGeneratorTest < ActiveSupport::TestCase
  setup do
    @headers = [
      { text: "Header 1", id: "header-1" },
      { text: "Header 2", id: "header-2" }
    ]
  end

  test "it generates a list" do
    html = '<ul><li><a href="#header-1">Header 1</a></li><li><a href="#header-2">Header 2</a></li></ul>'
    assert_equal html, OutlineGenerator.perform(@headers)
  end
end
This isn't the most elegant solution ever. Alternatives would be to use @context.render to render an ERB template, or a library like ViewComponent. The important part here is just that your model should not have to care about the details of how the HTML is generated.
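The ERB-template alternative can be sketched with Ruby's standard-library ERB. This is an illustration only, not how Rails renders views: `TOC_TEMPLATE` and `render_toc` are hypothetical names, and in a real app the template would more likely live in a partial rendered through the view context. Values are escaped by hand with `ERB::Util.html_escape` because plain ERB, unlike Rails templates, does not escape output automatically.

```ruby
require "erb"

# Hypothetical template for the list of anchor links.
# trim_mode "-" lets <%- and -%> swallow the whitespace around control lines.
TOC_TEMPLATE = ERB.new(<<~'TEMPLATE', trim_mode: "-")
  <ul>
  <%- headers.each do |h| -%>
    <li><a href="#<%= ERB::Util.html_escape(h[:id]) %>"><%= ERB::Util.html_escape(h[:text]) %></a></li>
  <%- end -%>
  </ul>
TEMPLATE

# Renders the table of contents for an array of { id:, text: } hashes.
def render_toc(headers)
  TOC_TEMPLATE.result_with_hash(headers: headers)
end

toc_html = render_toc([
  { id: "header-1", text: "Header 1" },
  { id: "header-2", text: "Tom & Jerry" }
])
```

Keeping the markup in a template means the list structure can change without touching Ruby code, at the cost of output that contains whitespace the tag-builder version does not.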
Then the model just has to worry about updating the attributes:
class Post < ApplicationRecord
  # ...
  def process_body
    headers, document = Outliner.perform(body.to_trix_html)
    return unless headers.any?
    self.body = document.to_html
    self.toc = OutlineGenerator.perform(headers)
  end
end
Note that using a callback isn't the only way (and maybe not the best way) to go about this. You might want to consider moving it into an ActiveJob job, for example, so that you can offload the processing to a background process instead of making the web thread wait.