I am getting markdown text from my API like this:
{
name:'Onur',
surname:'Gule',
biography:'## Computers
I like **computers** so much.
I wanna *be* a computer.',
membership:1
}
The biography field contains a Markdown string like the one above:
## Computers
I like **computers** so much.
I wanna *be* a computer.
I want to take this Markdown text and render it as formatted text in a docx file for my reports.
In my docx template:
{{markdownText|mark2html}}
{{simpleText}}
I am using the Python 3 docxtpl package to create the docx, and it works for plain text.
My current code:
import docx
from docxtpl import DocxTemplate, RichText
import markdown
import jinja2
import markupsafe
from bs4 import BeautifulSoup
import pypandoc
def safe_markdown(text):
    return markupsafe.Markup(markdown.markdown(text))

def mark2html(value):
    html = markdown.markdown(value)
    soup = BeautifulSoup(html, features='html.parser')
    output = pypandoc.convert_text(value, 'rtf', format='md')
    return RichText(value)  # tried soup and pandoc..

def from_template(template):
    template = DocxTemplate(template)
    context = {
        'simpleText': 'Simple text test.',
        'markdownText': 'Markdown **text** test.'
    }
    jenv = jinja2.Environment()
    jenv.filters['markdown'] = safe_markdown
    jenv.filters["mark2html"] = mark2html
    template.render(context, jenv)
    template.save('new_report.docx')
So, how can I add rendered Markdown to an existing docx, or while creating one, perhaps with a jinja2 filter?
I solved it without any shortcut. I convert the Markdown to HTML with the markdown package, parse that HTML with BeautifulSoup, and then process each paragraph according to its tag name.
In my word template:
{% if markdownText != None %}
{% for mt in markdownText|mark2html %}
{{mt}}
{% endfor %}
{% endif %}
My template filter:
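def mark2html(value):
    if value is None:
        return '-'
    html = markdown.markdown(value)
    soup = BeautifulSoup(html, features='html.parser')
    paragraphs = []
    # `doc` must be the module-level DocxTemplate; parseHtmlToDoc needs it for InlineImage.
    global doc
    # find_all(True) yields every tag; only block-level tags are converted here.
    for tag in soup.find_all(True):
        if tag.name in ('p', 'h1', 'h2', 'h3', 'h4', 'h5', 'h6'):
            paragraphs.extend(parseHtmlToDoc(tag))
    return paragraphs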
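To see what the filter actually walks over, here is a minimal sketch that runs the sample biography through the same two steps (markdown, then BeautifulSoup) and lists the child node types of each block tag. The variable names `sample` and `blocks` are only for illustration and are not part of my report code.

```python
import markdown
from bs4 import BeautifulSoup
from bs4.element import Tag

# Sample text taken from the question.
sample = "## Computers\nI like **computers** so much.\nI wanna *be* a computer."

html = markdown.markdown(sample)
soup = BeautifulSoup(html, features="html.parser")

# For each block tag, record whether each child is plain text or an inline tag.
blocks = []
for tag in soup.find_all(["h1", "h2", "h3", "h4", "h5", "h6", "p"]):
    kinds = [c.name if isinstance(c, Tag) else "text" for c in tag.contents]
    blocks.append((tag.name, kinds))

for name, kinds in blocks:
    print(name, kinds)
```

The plain text children are the ones that end up in the `else` branch of the conversion function, and the inline tags (strong, em, img) go to the tag-specific branches.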
My code that converts the HTML tags into docx content:
from bs4.element import Tag
from docxtpl import InlineImage, RichText
# `settings` is assumed to be Django's settings object (from django.conf import settings).

def parseHtmlToDoc(org_tag):
    """Convert the children of one block tag into RichText/InlineImage items."""
    pars = []
    for con in org_tag.contents:
        # True when the previous item is a RichText that can be extended.
        last_is_rich = bool(pars) and isinstance(pars[-1], RichText)
        if isinstance(con, Tag):
            tag = con
            if tag.name in ('strong', 'h1', 'h2', 'h3', 'h4', 'h5', 'h6'):
                if last_is_rich:
                    # Continue the current run in bold.
                    pars[-1].add(con.contents[0], bold=True)
                else:
                    source = RichText("")
                    source.add(con.contents[0], bold=True)
                    pars.append(source)
            elif tag.name == 'img':
                # `doc` is the module-level DocxTemplate the image is attached to.
                imagen = InlineImage(doc, settings.MEDIA_ROOT + tag['src'])
                pars.append(imagen)
            elif tag.name == 'em':
                # Italic text always starts a new item.
                source = RichText("")
                source.add(con.contents[0], italic=True)
                pars.append(source)
        else:
            # Plain text node.
            if last_is_rich:
                pars[-1].add(con)
            else:
                source = RichText("")
                if org_tag.name == 'h2':
                    source.add(con, bold=True, size=40)
                else:
                    source.add(con)
                pars.append(source)  # always append?
    return pars
It processes HTML tags such as strong, em, img, and the headings. You can add more tags to handle. Solving it this way needs no additional file conversion step such as html2docx.
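If you want to add more tags, one option is a lookup table instead of more elif branches. The following is only a sketch of that idea, not my report code: it uses the standard library `html.parser` instead of BeautifulSoup, and `TAG_STYLES`, `RunCollector` and `html_to_runs` are hypothetical names. Each resulting `(text, style)` pair could then be passed on as `RichText.add(text, **style)`, assuming the style keys match keyword arguments that `add` accepts.

```python
from html.parser import HTMLParser

# Hypothetical mapping from inline HTML tags to formatting flags.
TAG_STYLES = {
    "strong": {"bold": True},
    "b": {"bold": True},
    "em": {"italic": True},
    "i": {"italic": True},
}


class RunCollector(HTMLParser):
    """Collect (text, style) runs while tracking which inline tags are open."""

    def __init__(self):
        super().__init__()
        self.open_tags = []
        self.runs = []

    def handle_starttag(self, tag, attrs):
        if tag in TAG_STYLES:
            self.open_tags.append(tag)

    def handle_endtag(self, tag):
        # Close the most recently opened tag with this name, if any.
        for idx in range(len(self.open_tags) - 1, -1, -1):
            if self.open_tags[idx] == tag:
                del self.open_tags[idx]
                break

    def handle_data(self, data):
        # Merge the styles of every tag that is currently open.
        style = {}
        for tag in self.open_tags:
            style.update(TAG_STYLES[tag])
        self.runs.append((data, style))


def html_to_runs(fragment):
    parser = RunCollector()
    parser.feed(fragment)
    parser.close()  # flush any text still buffered after the last tag
    return parser.runs


runs = html_to_runs("I like <strong>computers</strong> so <em>much</em>.")
for text, style in runs:
    print(repr(text), style)
```

Supporting a new inline tag then means adding one entry to `TAG_STYLES`, and nested tags such as bold inside italic get both flags.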
I used this process in my code like this:
report_context = {'reportVariables': report_variables}
template = DocxTemplate('report_format.docx')
jenv = jinja2.Environment()
jenv.filters["mark2html"] = mark2html
template.render(report_context,jenv)
template.save('exported_1.docx')
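The filter mechanics can be checked without a Word template. Below is a small sketch with plain jinja2, where `split_lines` is a hypothetical stand-in for `mark2html` that also returns a list, so the same `for` loop pattern from the template applies.

```python
import jinja2


def split_lines(value):
    """Hypothetical stand-in for mark2html: one list item per non-empty line."""
    if value is None:
        return []
    return [line.strip() for line in value.splitlines() if line.strip()]


jenv = jinja2.Environment()
jenv.filters["split_lines"] = split_lines

# Same loop shape as in the Word template, rendered from a plain string.
template = jenv.from_string(
    "{% for item in text|split_lines %}[{{ item }}]{% endfor %}"
)
rendered = template.render(text="first\nsecond")
print(rendered)
```

Because the filter returns a list, the loop emits one piece of output per item, which is how each converted paragraph gets its own slot in the report.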