javascript regex

Select code blocks but ignore all curly braces inside these blocks


I'm trying to post-process auto-generated TypeScript code. The generated files contain interfaces, and the properties of the interfaces have doc comments. Some of these doc comments include RegEx patterns with curly braces.

I need a RegEx pattern that selects the individual interfaces, but not the blank lines or comments between them. What I'm struggling with is the curly braces inside the comments: they make it very difficult to find a pattern that matches a whole interface, from its signature to its closing brace.

The files I'm trying to post-process look like this:

export interface SomeInterface {
  /**
   * Some comment on property1
   */
  property1: string;
  /**
   * Some comment on property2, including RegEx pattern with curly braces such as [a-z1-9]{2}
   */
  property2: string;
  /**
   * Some comment on property3
   */
  property3: string;

  // some more properties and doc comments, some of which have curly braces inside too
}

// This comment has to be excluded

export interface AnotherInterface {
  // internally very similar to 'SomeInterface' above
}

What I tried so far is

/export interface .*\{([^}])+\}/g

and

/export interface .*\{([^])+\}/g

Neither works. The first one only selects the substring from the start of the first interface's signature to the first closing curly brace in the doc comments. The second one selects all interfaces at once (i.e., from the signature of the first interface to the closing curly brace of the last one, including everything in between), which is not what I want.

Any help and suggestions are highly appreciated.


Solution

  • A regex matches text and has no notion of programming language structures.

    Using a pattern to pick out code blocks is therefore best effort only and can have numerous edge cases.

    If the structure of the data is always the same, this pattern matches each interface and allows optional leading spaces:

    ^[^\S\n]*export interface .*\{[^]*?\n[^\S\n]*}
    

    The pattern matches:

    • ^ Start of a line (this needs the m flag; add the g flag to match every interface)
    • [^\S\n]* Match optional whitespace chars without newlines
    • export interface .* Match the literal text and the rest of the signature, without crossing a newline
    • \{ Match {
    • [^]*? Optionally repeat matching any char including newlines (JavaScript notation), as few as possible
    • \n Match a newline
    • [^\S\n]* Match optional whitespace chars without newlines
    • } Match a literal }
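As a rough sketch of how the pattern could be applied in JavaScript: the sample text below is a shortened version of the file from the question, and the names `source` and `interfacePattern` are made up for illustration. It assumes the g and m flags and that each interface's closing brace sits on its own line.

```javascript
// Sample input modeled on the generated file from the question (shortened).
const source = [
  "export interface SomeInterface {",
  "  /**",
  "   * Pattern with curly braces such as [a-z1-9]{2}",
  "   */",
  "  property2: string;",
  "}",
  "",
  "// This comment has to be excluded",
  "",
  "export interface AnotherInterface {",
  "  property1: string;",
  "}",
].join("\n");

// g: find every interface; m: let ^ match at the start of each line.
const interfacePattern = /^[^\S\n]*export interface .*\{[^]*?\n[^\S\n]*}/gm;

// match() returns null when nothing is found, so fall back to an empty array.
const interfaces = source.match(interfacePattern) ?? [];

for (const block of interfaces) {
  console.log(block);
  console.log("---");
}
```

The lazy `[^]*?` stops at the first `}` that follows a newline and optional indentation, so the `{2}` inside the doc comment is skipped because it appears in the middle of a line.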


    If there are no leading/trailing spaces:

    ^export interface .*\{[^]*?\n}
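To illustrate the difference between the two variants, here is a small sketch on a made-up input in which the second interface is indented. The input and the names `strict` and `tolerant` are assumptions for the example only, and both patterns are used with the g and m flags.

```javascript
// Hypothetical input: the second interface is indented by two spaces.
const source = [
  "export interface Flush {",
  "  id: string;",
  "}",
  "",
  "  export interface Indented {",
  "    id: string;",
  "  }",
].join("\n");

// Variant without whitespace handling: signature and closing brace must start a line.
const strict = /^export interface .*\{[^]*?\n}/gm;
// Variant that allows optional leading spaces or tabs.
const tolerant = /^[^\S\n]*export interface .*\{[^]*?\n[^\S\n]*}/gm;

const strictMatches = source.match(strict) ?? [];
const tolerantMatches = source.match(tolerant) ?? [];

console.log(strictMatches.length);   // only the unindented interface
console.log(tolerantMatches.length); // both interfaces
```

Either variant still ends a match at the first line that consists only of whitespace and `}`, so an interface containing a nested multi-line object type would be cut short. That is one of the edge cases to keep in mind.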