I am trying to write an Apps Script function that interprets the text in a Google document, recognizes Markdown syntax, styles the text accordingly, and finally removes the Markdown syntax characters.
I have been successful in converting "# some headline" to a heading 1, and "## sub header" to a heading 2.
I would also like to generate bold elements, but this is proving challenging because bold text can occur in the middle of a sentence, and (I believe) Google Docs splits such a line into separate elements in the DOM.
The contents of my Google document look like this:
# Headline 1
## Secondary headline
Some **bold** text and some *italic text*
Some more text on a new line.
My Apps Script function:
function parseGoogleDocsFromMarkdown() {
  var doc = DocumentApp.getActiveDocument();
  var body = doc.getBody();
  // Get the paragraphs in the document
  var paragraphs = body.getParagraphs();
  // Define the regular expression patterns to match Markdown styles
  var heading1Pattern = /^# (.*)$/;
  var heading2Pattern = /^## (.*)$/;
  var boldPattern = /\*\*(.*?)\*\*/g;
  // Process each paragraph for Markdown styles
  for (var i = 0; i < paragraphs.length; i++) {
    var paragraph = paragraphs[i];
    var text = paragraph.getText();
    // Process heading styles
    if (heading1Pattern.test(text)) {
      var modifiedText = text.replace(heading1Pattern, "$1");
      paragraph.setHeading(DocumentApp.ParagraphHeading.HEADING1);
      paragraph.setText(modifiedText);
    } else if (heading2Pattern.test(text)) {
      var modifiedText = text.replace(heading2Pattern, "$1");
      paragraph.setHeading(DocumentApp.ParagraphHeading.HEADING2);
      paragraph.setText(modifiedText);
    }
    // Process bold
    var boldMatches = text.matchAll(boldPattern);
    for (var match of boldMatches) {
      var originalText = match[0];
      var modifiedText = match[1];
      var text = paragraph.getText();
      var newText = text.replace(originalText, modifiedText); // remove the asterisk markers
      paragraph.setText(newText); // update paragraph
      paragraph.setBold(true); // This sets the entire line of text to bold, not just the word
    }
  }
}
In my function, where I update the text of the paragraph, I believe I somehow need to split it into several elements and set only the desired one(s) to bold, but I can't figure out how to go about this, especially because there could be many bold words in one sentence separated by non-bold words. Any suggestions for an approach?
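One way to think about the problem described above is in terms of character offsets inside a single paragraph rather than separate paragraphs. The sketch below is plain JavaScript and does not touch the document; `findBoldRanges` is a hypothetical helper name, not part of the original script. It strips the `**` markers and reports where each bold run ends up in the cleaned text, using an inclusive end offset.

```javascript
// Hypothetical helper (not from the original script): removes **...** markers
// and records where each bold run lands in the cleaned text.
function findBoldRanges(text) {
  const boldPattern = /\*\*(.*?)\*\*/g;
  const ranges = [];
  let cleaned = "";
  let lastIndex = 0;
  for (const match of text.matchAll(boldPattern)) {
    // Copy the plain text between the previous match and this one.
    cleaned += text.slice(lastIndex, match.index);
    const start = cleaned.length;
    cleaned += match[1];
    // Record an inclusive end offset; skip empty runs such as "****".
    if (match[1].length > 0) {
      ranges.push({ start, end: cleaned.length - 1 });
    }
    lastIndex = match.index + match[0].length;
  }
  // Copy whatever follows the last match.
  cleaned += text.slice(lastIndex);
  return { text: cleaned, ranges };
}

const result = findBoldRanges("Some **bold** text");
console.log(result.text); // Some bold text
console.log(JSON.stringify(result.ranges)); // [{"start":5,"end":8}]
```

In Apps Script, these offsets could then presumably be applied to one paragraph with `paragraph.setText(result.text)` followed by `paragraph.editAsText().setBold(r.start, r.end, true)` for each range `r`, since `setBold` on a Text element accepts a start offset and an inclusive end offset. This is an untested outline of the idea, not a drop-in replacement for the function above.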
From your following reply,
"But for now, the most important is headlines (1-6), and bold text. italic is not very important this moment"
I believe your goal is as follows.
- You want to convert # Header1, ## Header2, ... to heading 1, heading 2, ... respectively.
- You want to convert **bold** to the bold style.
In this case, how about the following sample script?
function sample() {
  // Heading patterns for levels 1-6, reversed so the longest marker is searched first.
  const headers = [...Array(6)].map((_, i) => ({ type: "header", search: `^#{${i + 1}} (.*)$`, c: i + 1, style: DocumentApp.ParagraphHeading[`HEADING${i + 1}`] })).reverse();
  // Inline patterns: reversed so that **bold** is handled before *italic*.
  const texts = [...Array(2)].map((_, i) => ({ type: "text", search: `\\*{${i + 1}}(.*?)\\*{${i + 1}}`, c: i + 1, style: i == 0 ? "setItalic" : "setBold" })).reverse();
  const searchPatterns = [...headers, ...texts];
  // Find every match of one pattern in the body, style it, and remove the markers.
  const search = (body, { type, search, c, style }) => {
    let s = body.findText(search);
    while (s) {
      const e = s.getElement();
      const start = s.getStartOffset();
      const end = s.getEndOffsetInclusive();
      if (type == "header") {
        const p = e.getParent();
        if (p.getType() == DocumentApp.ElementType.PARAGRAPH) {
          // deleteText takes an inclusive end offset, so (0, c) removes the c "#" characters and the space after them.
          p.asParagraph().setHeading(style).editAsText().deleteText(0, c);
        }
      } else if (type == "text") {
        // Style only the matched range, then delete the closing markers before the opening ones so the offsets stay valid.
        e.asText()[style](start, end, true).deleteText(end - (c - 1), end).deleteText(start, start + (c - 1));
      }
      s = body.findText(search, s);
    }
  };
  const body = DocumentApp.getActiveDocument().getBody();
  searchPatterns.forEach(o => search(body, o));
}
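To make the pattern-building lines at the top of the script easier to follow, the same template strings can be evaluated in plain JavaScript, without the DocumentApp styles. The names `headerPatterns`, `textPatterns`, `italicFirst` and `boldFirst` are only for this illustration, and JavaScript's `RegExp` is used here as a stand-in for the engine behind `findText`, which may not behave identically.

```javascript
// Same template strings as in the script, minus the DocumentApp styles.
const headerPatterns = [...Array(6)].map((_, i) => `^#{${i + 1}} (.*)$`).reverse();
const textPatterns = [...Array(2)].map((_, i) => `\\*{${i + 1}}(.*?)\\*{${i + 1}}`).reverse();

console.log(headerPatterns[0]); // ^#{6} (.*)$
console.log(textPatterns[0]);   // \*{2}(.*?)\*{2}
console.log(textPatterns[1]);   // \*{1}(.*?)\*{1}

const line = "Some **bold** text";

// If the single-asterisk pattern ran first, it would match the opening "**"
// as an empty italic run and consume the bold markers.
const italicFirst = new RegExp(textPatterns[1]).exec(line);
console.log(italicFirst[0]); // **

// Running the double-asterisk pattern first captures the intended word.
const boldFirst = new RegExp(textPatterns[0]).exec(line);
console.log(boldFirst[1]); // bold
```

This suggests why the `texts` array is reversed before use: the bold pattern has to be applied and its markers removed before the italic pattern is searched.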
When the sample() function is run, the following result is obtained.