Search code examples
javascriptnode.jsmarkdownabstract-syntax-tree

Extract first p and h1 tag content from markdown


Suppose I have the following markdown content

# What is super?
Super is a keyword used to pass props to the upper classes.

Let's begin
...
..
etc..

For any blog, the title and description are really important.

For title, I would like to get the first # h1 tag and for description, I would like to get the first paragraph p tag.

I can not use browser API because the code is running on NodeJs.

In the case of the above content, the title and description should be

  • title - What is super?
  • description - Super is a keyword used to pass props to the upper classes.

How can I get the title and description?


Solution

  • Answer

    const regex = {
      title: /^#\s+.+/,
      heading: /^#+\s+.+/,
      custom: /\$\$\s*\w+/,
      ol: /\d+\.\s+.*/,
      ul: /\*\s+.*/,
      task: /\*\s+\[.]\s+.*/,
      blockQuote: /\>.*/,
      table: /\|.*/,
      image: /\!\[.+\]\(.+\).*/,
      url: /\[.+\]\(.+\).*/,
      codeBlock: /\`{3}\w+.*/,
    };
    
    const isTitle = (str) => regex.title.test(str);
    const isHeading = (str) => regex.heading.test(str);
    const isCustom = (str) => regex.custom.test(str);
    const isOl = (str) => regex.ol.test(str);
    const isUl = (str) => regex.ul.test(str);
    const isTask = (str) => regex.task.test(str);
    const isBlockQuote = (str) => regex.blockQuote.test(str);
    const isImage = (str) => regex.image.test(str);
    const isUrl = (str) => regex.url.test(str);
    const isCodeBlock = (str) => regex.codeBlock.test(str);
    
    export function getMdTitle(md) {
      if (!md) return "";
      let tokens = md.split("\n");
      for (let i = 0; i < tokens.length; i++) {
        if (isTitle(tokens[i])) return tokens[i];
      }
      return "";
    }
    
    export function getMdDescription(md) {
      if (!md) return "";
      let tokens = md.split("\n");
      for (let i = 0; i < tokens.length; i++) {
        if (
          isHeading(tokens[i]) ||
          isCustom(tokens[i]) ||
          isOl(tokens[i]) ||
          isUl(tokens[i]) ||
          isTask(tokens[i]) ||
          isBlockQuote(tokens[i]) ||
          isImage(tokens[i]) ||
          isUrl(tokens[i]) ||
          isCodeBlock(tokens[i])
        )
          continue;
    
        return tokens[i];
      }
      return "";
    }