angular, parsing, markdown, editor, rich-text-editor

How to parse HTML to Markdown, and in reverse Markdown to HTML, in Angular?


I need a parser that can convert the text from Angular Editor, which is a string field filled with HTML, to Markdown.

I also need the reverse action: converting Markdown text back into a string field with HTML.

Thank you in advance.


Solution

  • Narek.

    I had the same problem with converting an HTML string to Markdown. There were a few libraries, but each could convert in only one direction, and even then they did not handle all the elements.

    After a lot of searching and disappointment, I decided to create a service that does both: html => markdown and markdown => html.

    Here is the service I created for my project; maybe it can help you too.

    import { Injectable } from '@angular/core';
    import * as Markdown from 'marked';
    
    @Injectable()
    export class MarkdownHtmlParserService {
    
      public parseHtmlToMarkdown(html: string): string {
        if (!html) {
          return '';
        }
        html = this.setBreaksToHtml(html);
    
        let markdown = html;
        // Parse into a detached element so the anchors can be read through the DOM.
        const snipped = document.createElement('div');
        snipped.innerHTML = markdown;
        const links = snipped.getElementsByTagName('a');
        for (let i = 0; i < links.length; i++) {
          // Swap each anchor's serialized HTML for the [text](url) form.
          const marked = `[${links[i].innerText}](${links[i].href})`;
          markdown = markdown.replace(links[i].outerHTML, marked);
        }
    
        markdown = markdown.replace(/<h1>/g, '# ').replace(/<\/h1>/g, '');
        markdown = markdown.replace(/<h2>/g, '## ').replace(/<\/h2>/g, '');
        markdown = markdown.replace(/<h3>/g, '### ').replace(/<\/h3>/g, '');
        markdown = markdown.replace(/<h4>/g, '#### ').replace(/<\/h4>/g, '');
        markdown = this.parseAll(markdown, 'strong', '**');
        markdown = this.parseAll(markdown, 'b', '**');
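        // Standard Markdown reads __text__ as bold; here it serves as a private
        // italic marker that setItalicSymbols() turns back into <i> tags.
        markdown = this.parseAll(markdown, 'em', '__');
        markdown = this.parseAll(markdown, 'i', '__');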
        markdown = this.parseAll(markdown, 's', '~~');
        markdown = markdown.replace(/<p><br><\/p>/g, '\n');
        markdown = markdown.replace(/<br>/g, '\n');
        markdown = markdown.replace(/<p>/g, '').replace(/<\/p>/g, '  \n');
        markdown = markdown.replace(/<div>/g, '').replace(/<\/div>/g, '  \n');
        markdown = markdown
          .replace(/<blockquote>/g, '> ')
          .replace(/<\/blockquote>/g, '');
    
        markdown = this.parseList(markdown, 'ol', '1.');
        markdown = this.parseList(markdown, 'ul', '-');
    
        return markdown;
      }
    
      public parseMarkdownToHtml(markdown: string): string {
        markdown = this.setItalicSymbols(markdown);
        return Markdown.parse(markdown);
      }
    
      private setItalicSymbols(markdown: string): string {
        // Rewrite every __text__ as <i>text</i> in one pass; $1 is the captured text.
        return markdown.replace(/__(.*?)__/g, '<i>$1</i>');
      }
    
      private parseAll(
        html: string,
        htmlTag: string,
        markdownEquivalent: string
      ): string {
        const regEx = new RegExp(`<\/?${htmlTag}>`, 'g');
        return html.replace(regEx, markdownEquivalent);
      }
    
      private parseList(
        html: string,
        listType: 'ol' | 'ul',
        identifier: string
      ): string {
        let parsedHtml = html;
    
        const getNextListRegEx = new RegExp(`<${listType}>.+?<\/${listType}>`);
    
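        // Convert one list per iteration until no list of this type remains.
        let matchedList: RegExpMatchArray | null;
        while ((matchedList = parsedHtml.match(getNextListRegEx)) !== null) {
          // Pass the matched string itself, not the whole match array.
          const elements = this.htmlToElements(matchedList[0]);
          const listItems: string[] = [];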
    
          elements[0].childNodes.forEach((listItem) => {
            let parsedListItem = `${identifier} ${listItem.textContent}`;
    
            // The last '-'-separated part of the class name is read as the nesting level.
            const className = (listItem as HTMLElement).className;
            if (className) {
              const splittedClassName = className.split('-');
              const numberOfLevel =
                parseInt(splittedClassName[splittedClassName.length - 1], 10) || 0;
    
              for (let i = 0; i < numberOfLevel; i++) {
                parsedListItem = `   ${parsedListItem}`;
              }
            }
    
            listItems.push(parsedListItem);
          });
    
          parsedHtml = parsedHtml.replace(
            getNextListRegEx,
            listItems.join('\n') + '\n\n'
          );
        }
    
        return parsedHtml;
      }
    
      private htmlToElements(html: string): NodeListOf<ChildNode> {
        const template = document.createElement('template');
        template.innerHTML = html;
        return template.content.childNodes;
      }
    
      private setBreaksToHtml(html: string): string {
        return html.replace(/<p>/g, '<br> ').replace(/<\/p>/g, '');
      }
    }
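
The regex-based steps of parseHtmlToMarkdown() can be tried without a browser or Angular. The sketch below is only an illustration: replaceTag mirrors parseAll(), while convertHeadings, the sample string, and the newline emitted after each heading are assumptions that are not part of the service above.

```typescript
// Standalone sketch of the regex-based steps; no DOM or Angular needed.

// Same idea as parseAll(): swap both the opening and the closing tag for one marker.
function replaceTag(html: string, tag: string, marker: string): string {
  return html.replace(new RegExp(`</?${tag}>`, 'g'), marker);
}

// Hypothetical helper: builds each heading pattern from its level, so the
// closing tag always matches the opening one. Emitting '\n' after the heading
// is an assumption; the service above emits nothing there.
function convertHeadings(html: string): string {
  let result = html;
  for (let level = 1; level <= 4; level++) {
    result = result
      .replace(new RegExp(`<h${level}>`, 'g'), '#'.repeat(level) + ' ')
      .replace(new RegExp(`</h${level}>`, 'g'), '\n');
  }
  return result;
}

const sampleHtml = '<h2>Title</h2><strong>bold</strong> and <s>gone</s>';
let converted = convertHeadings(sampleHtml);
converted = replaceTag(converted, 'strong', '**');
converted = replaceTag(converted, 's', '~~');
console.log(converted);
```

Because each pattern requires `>` directly after the tag name, replacing `b` leaves `<br>` alone and replacing `s` leaves `<strong>` alone, which is why the order of the parseAll() calls does not matter.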
    

    The only library you need to install is marked; pick a version that matches your version of Angular.

    There are two extra functions I created, setItalicSymbols() and setBreaksToHtml(). In my case AngularEditor did not turn <p></p> tags into \n line breaks, so I call setBreaksToHtml() before parsing to Markdown. Marked doesn't convert text between double underscores (for example '__text__') into text between <i> or <em> tags, so I call setItalicSymbols() before parsing with the marked.parse() function.
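
To see what the two helper steps do on their own, here is a minimal sketch of them as free functions. The sample strings are made up, and setItalicSymbols is written as a single replace() call, which is an assumption about equivalent behaviour rather than a copy of the method in the service.

```typescript
// Pre-step for HTML -> Markdown: turn paragraph tags into explicit <br> markers.
function setBreaksToHtml(html: string): string {
  return html.replace(/<p>/g, '<br> ').replace(/<\/p>/g, '');
}

// Pre-step for Markdown -> HTML: rewrite __text__ as <i>text</i>
// before the string is handed to marked.
function setItalicSymbols(markdown: string): string {
  return markdown.replace(/__(.*?)__/g, '<i>$1</i>');
}

console.log(setBreaksToHtml('<p>first</p><p>second</p>')); // <br> first<br> second
console.log(setItalicSymbols('plain __slanted__ plain')); // plain <i>slanted</i> plain
```

The `<br>` markers produced by the first step are what the later `/<br>/g` replacement in parseHtmlToMarkdown() turns into `\n`.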

    Hope you will find it useful.