javascript, regex, text-parsing

JavaScript Markup Language Parser


I'm trying to build a custom markup language parser using JavaScript.

For example:

  • **bold** ==> bold

  • __italics__ ==> italics

To display the parsed text, I'm trying to replace the special characters with HTML tags and assign the result to a label's innerHTML.

  • **bold** ==> <b>bold</b>

My first approach was to just use the replace function, but that was not ideal, as it can only replace the starting tag, and not the ending tag.
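A quick sketch of why a plain replace falls short here, using only the built-in `String.prototype.replace`. The sample input and the `withString` / `withGlobalRegex` names are made up for illustration: a string pattern replaces only the first occurrence, and even a global regex on the marker alone cannot tell an opening `**` from a closing one.

```javascript
const input = '**bold** and **more**';

// A string pattern replaces only the first occurrence.
const withString = input.replace('**', '<b>');

// A global regex replaces every occurrence, but each one gets the same opening tag.
const withGlobalRegex = input.replace(/\*\*/g, '<b>');

console.log(withString);      // <b>bold** and **more**
console.log(withGlobalRegex); // <b>bold<b> and <b>more<b>
```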

Then I found a temporary solution using a JS regex:

const bold = /\*\*([A-z0-9]+)\*\*/gi
const italics = /\_\_([A-z0-9]+)\_\_/gi

const updateTextMessage = () => {
    let text = $('#textParser').val()
    text = text.replace(bold, '<b>$1</b>')
    text = text.replace(italics, '<i>$1</i>')
    $('#parsedText').html(text)
}

body {
    display: flex;
    flex-direction: column;
}

<script src="https://cdnjs.cloudflare.com/ajax/libs/jquery/3.3.1/jquery.min.js"></script>
<span id="parsedText">Parsed text will appear here</span>
<textarea id="textParser" oninput="updateTextMessage()" type="text" placeholder="Type Here"></textarea>

But the problem with this code is that it doesn't work when the text between the markers is a whole sentence:

**This sentence should be bold** ==> This sentence should be bold

This doesn't work.

I know it is the regex that's preventing this, and I also know a workaround regex, i.e. /\*\*(.*)\*\*/gim

but this also converts cases that I don't want converted.

For example, I don't want these to be valid syntax.

** hello**

** hello, this a sentence**

(The difference is the space between the special characters and the text. It's similar to how WhatsApp's text formatting works.)
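To make the over-matching concrete, here is a minimal sketch that applies the permissive workaround pattern to two inputs. `greedyBold` is a hypothetical helper name, and the first sample string is invented for the demonstration.

```javascript
// Hypothetical helper: applies the permissive workaround pattern from the question.
const greedyBold = (text) => text.replace(/\*\*(.*)\*\*/gim, '<b>$1</b>');

// The greedy .* runs to the last ** on the line, so two pairs collapse into one.
console.log(greedyBold('**one** and **two**')); // <b>one** and **two</b>

// Nothing stops the content from starting with whitespace.
console.log(greedyBold('** hello**'));          // <b> hello</b>
```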

How do I solve this issue with regex? I'm also interested in other methods.


Solution

  • You were pretty close! You just need a few changes to guarantee that the characters directly next to the double * or _ markers are not whitespace.

    const bold = /\*{2}([A-Z0-9][A-Z0-9\s]+[A-Z0-9])\*{2}/gi

    This breaks down into:

    1. Check for two * characters.
    2. Check that the first character after them is alphanumeric.
    3. Check that it is followed by one or more alphanumeric or whitespace characters.
    4. Check that this is followed by an alphanumeric character.
    5. Finally, check that this ends with two * characters.
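The steps above can be checked against the inputs from the question with a small sketch like the one below. `boldPattern` and `toBold` are hypothetical names, and the last two sample strings are invented to show where this pattern stops matching: it accepts only letters, digits, and whitespace, and it needs at least three characters between the markers.

```javascript
// The pattern from the breakdown above; `toBold` is a hypothetical helper name.
const boldPattern = /\*{2}([A-Z0-9][A-Z0-9\s]+[A-Z0-9])\*{2}/gi;
const toBold = (text) => text.replace(boldPattern, '<b>$1</b>');

console.log(toBold('**This sentence should be bold**')); // <b>This sentence should be bold</b>
console.log(toBold('** hello**'));                        // ** hello**  (unchanged: leading space)
console.log(toBold('**hello, this is a sentence**'));     // unchanged: the comma is not alphanumeric
console.log(toBold('**hi**'));                            // unchanged: fewer than three characters
```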

    const bold = /\*{2}([A-Za-z0-9][A-Za-z0-9\s]+[A-Za-z0-9])\*{2}/gi
    const italics = /_{2}([A-Za-z0-9][A-Za-z0-9\s]+[A-Za-z0-9])_{2}/gi
    
    const updateTextMessage = () => {
      let text = $('#textParser').val()
      text = text.replace(bold, '<b>$1</b>')
      text = text.replace(italics, '<i>$1</i>')
      $('#parsedText').html(text)
    };

    body {
      display: flex;
      flex-direction: column;
    }

    <script src="https://cdnjs.cloudflare.com/ajax/libs/jquery/3.3.1/jquery.min.js"></script>
    <span id="parsedText">Parsed text will appear here</span>
    <textarea id="textParser" oninput="updateTextMessage()" type="text" placeholder="Type Here"></textarea>
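The patterns above accept only letters, digits, and whitespace between the markers, so sentences with punctuation and one- or two-character words are left unformatted. If that turns out to be too strict, one possible looser variant is sketched below. It is an assumption, not part of the original answer: it only requires that the content starts and ends with a non-whitespace character, and it matches lazily so that separate pairs on one line stay separate. The names `looseBold`, `looseItalics`, and `parseMarkup` are hypothetical.

```javascript
// Content must start and end with a non-whitespace character;
// the lazy .*? keeps each match as short as possible.
const looseBold = /\*\*(\S(?:.*?\S)?)\*\*/g;
const looseItalics = /__(\S(?:.*?\S)?)__/g;

// Hypothetical helper: returns the HTML string instead of touching the DOM.
const parseMarkup = (text) =>
  text
    .replace(looseBold, '<b>$1</b>')
    .replace(looseItalics, '<i>$1</i>');

console.log(parseMarkup('**hello, this is a sentence**')); // <b>hello, this is a sentence</b>
console.log(parseMarkup('**one** and __two__'));           // <b>one</b> and <i>two</i>
console.log(parseMarkup('** hello**'));                    // ** hello**  (unchanged)
```

Because `.` does not match line breaks, a pair of markers in this sketch has to open and close on the same line. The returned string could be passed to `$('#parsedText').html(...)` in the same way as in the snippet above.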