Tags: javascript, regex, regex-group

Grabbing key/value pairs of an object-like string (similar to JSON) using regex


Given a string like {float: 'null', another: 'foo'}, I'd like to capture each key/value pair, so that the groups yield float and null, then another and foo. My current regex is /\{(?<set>(?<key>\w*)\s*\:\s*(?<value>.*)?\s?)*\}/g. It captures the key correctly, but everything from the first value through to the end of the string ends up in the value group. I'm using named groups mainly for clarity. I can't figure out how to extract each key/value pair, especially when there are several. Thanks for any help.

I am currently trying /\{(?<set>(?<key>\w*)\s*\:\s*(?<value>.*)?\s?)*\}/g, but the output is:

the group 'set': float: 'null', another: 'foo' (correct)

the group 'key': float (correct)

the group 'value': 'null', another: 'foo' (incorrect, I want just null)

I would like it to capture all key/value pairs, if possible.


Edit for more clarity:

My specific use case is parsing Markdown and plugging it into custom Svelte components, where I want to gather props from the Markdown syntax on an image. From what I've gathered online about putting attributes on an image, it should look something like:

![Alt Text](https://<fullurl>.jpg "This is hover text"){prop1: 'foo', prop2: 'bar', float: true}

The reason for using a regex is that I'm parsing the Markdown string. It's not JSON, and I don't really gain anything by following JSON semantics (double quotes around the keys).


Solution

  • Have a go with this long JavaScript regex:

    /(?<key>\w*)\s*:\s*(?<value>(?<quote>["'])(?:\\.|.)*?\k<quote>|(?<number>[-+]?[0-9]*\.?[0-9]+([eE][-+]?[0-9]+)?)|(?<constant>true|false|null))/g
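
As a quick check against the string from the question, here is a minimal sketch using String.prototype.matchAll(). The names keyValueRegex, sample and pairs are only illustrative. Note that quoted values still carry their quotes at this stage; unquoting them is a separate step.

```javascript
// Same regex as above, applied to the string from the question.
const keyValueRegex = /(?<key>\w*)\s*:\s*(?<value>(?<quote>["'])(?:\\.|.)*?\k<quote>|(?<number>[-+]?[0-9]*\.?[0-9]+([eE][-+]?[0-9]+)?)|(?<constant>true|false|null))/g;

const sample = "{float: 'null', another: 'foo'}";

// matchAll() requires the g flag and yields one match object per pair.
const pairs = [...sample.matchAll(keyValueRegex)].map(
  (m) => [m.groups.key, m.groups.value]
);

// pairs is [['float', "'null'"], ['another', "'foo'"]]
console.log(pairs);
```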
    

    In action:

    const regexKeyValue = /(?<key>\w*)\s*:\s*(?<value>(?<quote>["'])(?:\\.|.)*?\k<quote>|(?<number>[-+]?[0-9]*\.?[0-9]+([eE][-+]?[0-9]+)?)|(?<constant>true|false|null))/g;
    
    document.getElementById('search').addEventListener('click', function () {
      const input = document.getElementById('input').value;
    
      let match,
          i = 1,
          output = [];
    
      while ((match = regexKeyValue.exec(input)) !== null) {
        console.log(`Match n°${i} : ` + match[0]);
        console.log('match.groups =', match.groups);
    
        // If the value starts with a quote, unquote it and also replace
        // all the escape sequences (ex: "\\n" should become "\n").
        let value = match.groups.value;
        // If it's double-quoted, use JSON.parse() as it will handle everything.
        if (value.match(/^"/)) {
          value = JSON.parse(value);
        }
        // If it's single-quoted, we can't use JSON.parse() directly, so we have
        // to convert it to a double-quoted string first.
        else if (value.match(/^'/)) {
          value = value
            // 1) Remove the surrounding single quotes.
            .replace(/^'|'$/g, '')
            // 2) Replace all \' by ' and all \" by " (step 3 re-escapes the latter).
            // We have to search for all backslashes to also handle an escaped backslash.
            .replace(/\\(.)/g, function (fullMatch, afterBackslash) {
              if (afterBackslash === "'" || afterBackslash === '"') {
                return afterBackslash;
              } else {
                return fullMatch;
              }
            })
            // 3) Escape all double quotes (" becomes \").
            .replace(/"/g, '\\"');
          // 4) Now use JSON.parse().
          value = JSON.parse(`"${value}"`);
        }
        
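        // If it's a number or a constant, convert the string to the real JS value.
        // Number() is used for numbers because the regex accepts forms such as
        // "+5" or ".5", which JSON.parse() rejects.
        if (typeof match.groups.number !== 'undefined') {
          value = Number(match.groups.number);
        } else if (typeof match.groups.constant !== 'undefined') {
          value = JSON.parse(match.groups.constant);
        }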
    
        console.log('value =', value);
        
        output.push(
          `Match n°${i++} :\n` +
          `  Key   : ${match.groups.key}\n` +
          `  Value : ${value}\n`
        );
      }
    
      document.getElementById('output').innerText = output.join("\n");
      document.getElementById('label').classList.remove('hidden');
    });

    The CSS:

    textarea {
      box-sizing: border-box;
      width: 100%;
    }
    
    pre {
      overflow-y: scroll;
    }
    
    .hidden {
      display: none;
    }

    The HTML:

    <textarea id="input" rows="10">{
      float: 'null',
      another: "foo",
      age: 45,
      type: '"simple" \' quote',
      comment: "Hello,\nA backslash \\, a tab \t and a \"dummy\" word.\nOk?",
      important: true,
      weight: 69.7,
      negative: -2.5
    }</textarea>
    
    <button id="search">Search for key-value pairs</button>
    
    <p id="label" class="hidden">Matches:</p>
    <pre><code id="output"></code></pre>
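
For use outside a browser page (for example in a Markdown preprocessing step), the same logic can be sketched as a plain function that returns an object. This is only a sketch: parseProps is a hypothetical name, not something from the snippet above, and it assumes the escape sequences inside strings are valid JSON escapes, otherwise JSON.parse() throws.

```javascript
// Hypothetical helper: parseProps() is an illustrative name, not an existing API.
function parseProps(input) {
  // A fresh regex per call, so no lastIndex state is shared between calls.
  const regexKeyValue = /(?<key>\w*)\s*:\s*(?<value>(?<quote>["'])(?:\\.|.)*?\k<quote>|(?<number>[-+]?[0-9]*\.?[0-9]+([eE][-+]?[0-9]+)?)|(?<constant>true|false|null))/g;
  const props = {};

  for (const match of input.matchAll(regexKeyValue)) {
    const { key, value, quote, number, constant } = match.groups;

    if (quote === '"') {
      // Double-quoted: JSON.parse() handles the escape sequences.
      props[key] = JSON.parse(value);
    } else if (quote === "'") {
      // Single-quoted: rewrite as a double-quoted JSON string first.
      const inner = value
        .slice(1, -1)
        // Unescape \' and \" and keep every other escape sequence as is.
        .replace(/\\(.)/g, (full, ch) => (ch === "'" || ch === '"' ? ch : full))
        // Escape the double quotes for JSON.
        .replace(/"/g, '\\"');
      props[key] = JSON.parse(`"${inner}"`);
    } else if (number !== undefined) {
      // Number() also accepts "+5" or ".5", which JSON.parse() rejects.
      props[key] = Number(number);
    } else if (constant !== undefined) {
      // true, false or null.
      props[key] = JSON.parse(constant);
    }
  }
  return props;
}

const exampleProps = parseProps(
  `{prop1: 'foo', prop2: "bar", float: true, weight: 69.7, n: null}`
);
// exampleProps is { prop1: 'foo', prop2: 'bar', float: true, weight: 69.7, n: null }
console.log(exampleProps);
```

Returning an object keeps the conversion rules in one place, so the props can be spread directly onto a component.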

    The same regular expression with comments, using the x (extended) flag that PCRE offers. JavaScript does not support this flag, so this version is for reading only:

    /
    (?<key>\w*)        # The key.
    \s*:\s*            # : with optional spaces around.
    (?<value>          # The value.
      # A string value, single or double-quoted:
      (?<quote>["'])   # Capture the double or single quote.
        (?:\\.|.)*?    # Backslash followed by anything or any char, ungreedy.
      \k<quote>        # The double or single quote captured before.
    |
      # Int and float numbers:
      (?<number>[-+]?[0-9]*\.?[0-9]+([eE][-+]?[0-9]+)?)
    |
      # true, false and null (or other constants):
      (?<constant>true | false | null)
    )
    /gx
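
To keep comments next to each part in JavaScript itself, one possible approach is to assemble the pattern from commented pieces and pass it to the RegExp constructor. This is a sketch of that idea, not part of the original answer; keyValueSource and keyValueRegex are illustrative names.

```javascript
// JavaScript has no x flag, so the pattern is assembled from commented
// pieces instead. String.raw keeps the backslashes exactly as written.
const keyValueSource = [
  String.raw`(?<key>\w*)`,      // The key.
  String.raw`\s*:\s*`,          // : with optional spaces around.
  String.raw`(?<value>`,        // The value, one of:
  String.raw`(?<quote>["'])`,   //   a quoted string: the opening quote,
  String.raw`(?:\\.|.)*?`,      //   escaped or plain chars, lazily,
  String.raw`\k<quote>`,        //   then the same quote again;
  String.raw`|(?<number>[-+]?[0-9]*\.?[0-9]+([eE][-+]?[0-9]+)?)`, // an int or float;
  String.raw`|(?<constant>true|false|null)`,                      // or a constant.
  String.raw`)`,
].join('');

const keyValueRegex = new RegExp(keyValueSource, 'g');

console.log('{age: 45}'.match(keyValueRegex));
// [ 'age: 45' ]
```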
    

    Or better, on regex101, you'll have the colours and the explanation on the right column: https://regex101.com/r/bBPvUd/2
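
For the Markdown use case from the question, the props block first has to be isolated from the image syntax. A minimal sketch, assuming the props follow the closing parenthesis directly and that no string value contains a } character; imageWithProps is a hypothetical pattern and the URL is a placeholder:

```javascript
// Hypothetical pattern for ![alt](url "title"){props}. It assumes the props
// block contains no "}" inside its string values.
const imageWithProps =
  /!\[(?<alt>[^\]]*)\]\((?<url>\S+?)(?:\s+"(?<title>[^"]*)")?\)\{(?<props>[^}]*)\}/;

// The key/value regex from the answer.
const keyValue = /(?<key>\w*)\s*:\s*(?<value>(?<quote>["'])(?:\\.|.)*?\k<quote>|(?<number>[-+]?[0-9]*\.?[0-9]+([eE][-+]?[0-9]+)?)|(?<constant>true|false|null))/g;

const markdown =
  `![Alt Text](https://example.com/pic.jpg "This is hover text"){prop1: 'foo', prop2: 'bar', float: true}`;

const image = markdown.match(imageWithProps);

// Raw key/value strings found inside the braces.
const rawPairs = [...image.groups.props.matchAll(keyValue)].map(
  (m) => [m.groups.key, m.groups.value]
);

console.log(image.groups.alt, image.groups.url, image.groups.title);
// rawPairs is [['prop1', "'foo'"], ['prop2', "'bar'"], ['float', 'true']]
console.log(rawPairs);
```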