javascript, regex, typescript

How to match {0} while allowing proper escaping?


I want to create a simple text templating function that allows defining placeholders such as {0}, similar to what .NET does with its String.Format method.

Basically I want this:

      format("{0}", 42), // output `42`
      format("{0} {1}", 42, "bar"), // output `42 bar`
      format("{1} {1}", 42, "bar"), // output `bar bar` ({0} ignored)
      format("{{0", 42), // output `{0` (`{{` is an escaped `{`)
      format("{{{0}", 42), // output `{42` : an escaped brace and the formatted value
      format("Mix {{0}} and {0}", 42), // outputs `Mix {0} and 42`
      format("Invalid closing brace }"), // should fail, since the closing brace does not close an opening one
      format("Invalid placeholder {z}"), // should fail, not an integer
      format("{0}", "With { in value"), // output `With { in value`, inner { should not break the format

I'm trying to play with regex and backtracking to deal with the escaped braces.

    function format(template: string, ...values: unknown[]): string {
      const regex = /(?!({{)+){(\d+)}(?<!(}}))/gm;
      return template.replace(regex, ([, index]) => {
        const valueIndex = parseInt(index, 10);
        if (valueIndex >= values.length) throw new Error("Not enough arguments")
        return String(values[valueIndex]);
      });
    }

    console.log([
      format("{0}", 42), // output `42`
      format("{0} {1}", 42, "bar"), // output `42 bar`
      format("{1} {1}", 42, "bar"), // output `bar bar` ({0} ignored)
      format("{{0", 42), // output `{0` (`{{` is an escaped `{`)
      format("{{{0}", 42), // output `{42` : an escaped brace and the formatted value
      format("Mix {{0}} and {0}", 42), // outputs `Mix {0} and 42`
      format("Invalid closing brace }"), // should fail, since the closing brace does not close an opening one
      format("Invalid placeholder {z}"), // should fail, not an integer
      format("{0}", "With { in value"), // output `With { in value`, inner { should not break the format
    ]);

    try {
      format("{0} {1}", 42); // throws an error because not enough arguments are passed
    } catch (e) {
      console.log(e.message);
    }

However, I'm struggling to properly replace the escaped braces with a single brace.

How can I fix it?


Solution

  • I would suggest replacing the doubled braces in the same replace call. To catch the "syntax" errors, I would suggest throwing whenever a single brace is left over, that is, a brace that is neither part of a doubled pair nor part of a placeholder:

    function format(template, ...values) {
      const regex = /{(\d+)}|([{}])(\2?)/g;
      return template.replace(regex, (_, index, brace, escaped) => {
        if (escaped) return brace;
        if (brace) throw new Error("Unescaped literal brace");
        if (+index >= values.length) throw new Error("Not enough arguments");
        return values[+index];
      });
    }
    
    
    const tests = [
      ["{0}", 42], // output `42`
      ["{0} {1}", 42, "bar"], // output `42 bar`
      ["{1} {1}", 42, "bar"], // output `bar bar` ({0} ignored)
      ["{{0", 42], // output `{0` (`{{` is an escaped `{`)
      ["{{{0}", 42], // output `{42` : an escaped brace and the formatted value
      ["Mix {{0}} and {0}", 42], // outputs `Mix {0} and 42`
      ["Invalid closing brace }"], // should fail, since the closing brace does not close an opening one
      ["Invalid placeholder {z}"], // should fail, not an integer
      ["{0}", "With { in value"], // output `With { in value`, inner { should not break the format
      ["{0} {1}", 42], // throws an error because not enough arguments are passed
    ];
    
    for (const [test, ...args] of tests) {
      try {
        console.log(format(test, ...args));
      } catch (e) {
        console.log("Error:", e.message);
      }
    }

    There is no need for look-around assertions, because there is no risk that backtracking splits these paired braces in the wrong way.
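As a quick check of that claim, here is a minimal sketch that runs the same function on longer runs of braces. The inputs with three and four braces are my own additions, not part of the question's test list, and `String(...)` is added only so the return type is explicit.

```javascript
// Same regex and callback as in the answer, repeated so the snippet runs on its own.
function format(template, ...values) {
  const regex = /{(\d+)}|([{}])(\2?)/g;
  return template.replace(regex, (_, index, brace, escaped) => {
    if (escaped) return brace; // doubled brace: emit one literal brace
    if (brace) throw new Error("Unescaped literal brace"); // lone brace
    if (+index >= values.length) throw new Error("Not enough arguments");
    return String(values[+index]);
  });
}

// Three braces: one escaped pair, then the placeholder, then one escaped pair.
console.log(format("{{{0}}}", 42));   // {42}
// Four braces: two escaped pairs on each side, so the 0 stays literal.
console.log(format("{{{{0}}}}", 42)); // {{0}}
```

Matching proceeds left to right and each match consumes its characters, so the pairs are always taken from the start of a run of braces.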

    The \2 backreference matches a second copy of the brace captured by the second group, which indicates an escaped brace (either {{ or }}). The ? quantifier is greedy, so a duplicate is consumed whenever one is present; if there is no such duplicate, we have captured a single brace that violates the syntax.
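To make the roles of the capture groups visible, the sketch below lists what each match of the regex captures. `describe` is a hypothetical helper introduced only for this illustration; the regex is the one from the answer.

```javascript
// Hypothetical helper: reports how the answer's regex tokenizes a template.
function describe(template) {
  const regex = /{(\d+)}|([{}])(\2?)/g;
  return [...template.matchAll(regex)].map(([, index, brace, escaped]) => {
    // Group 1 is set only when the first alternative ({digits}) matched.
    if (index !== undefined) return `placeholder ${index}`;
    // Group 3 holds the duplicate brace, or "" when the brace was not doubled.
    return escaped ? `escaped ${brace}` : `stray ${brace}`;
  });
}

console.log(describe("{{{0}"));             // [ 'escaped {', 'placeholder 0' ]
console.log(describe("Mix {{0}} and {0}")); // [ 'escaped {', 'escaped }', 'placeholder 0' ]
console.log(describe("{z}"));               // [ 'stray {', 'stray }' ]
```

The "stray" entries are exactly the cases where `format` throws "Unescaped literal brace".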