Get list of struct names that are outside of 'package' and 'endpackage' optional strings

I am trying to get the names of the structs that lie outside an optional package ... endpackage block. If the input contains no package and endpackage keywords, the script should return all the struct names.

This is my script:

import re

a = """
package new;

typedef struct packed
{
    logic a;
    logic b;
} abc_y;

typedef struct packed
{
    logic a;
    logic b;
} abc_t;

endpackage

typedef struct packed
{
    logic a;
    logic b;
} abc_x;

"""

print(re.findall(r'(?!package)*.*?typedef\s+struct\s+packed\s*{.*?}\s*(\w+);.*?(?!endpackage)*', a, re.MULTILINE|re.DOTALL))

This is the output:

['abc_y', 'abc_t', 'abc_x']

Expected output:

['abc_x']

I am missing something in the regex, but I can't figure out what. Can someone please help me fix this? Thanks in advance.


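  • Use

        \bpackage.*?\bendpackage\b|typedef\s+struct\s+packed\s*{[^{}]*}\s*(\w+);

    The first alternative consumes each whole package ... endpackage block without capturing anything, so re.findall returns an empty string for it. The second alternative captures the struct name. Filtering out the empty strings leaves only the structs outside the package.

    Explanation: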

      \b                       the boundary between a word char (\w) and
                               something that is not a word char
      package                  'package'
      .*?                      any character, including \n because of
                               re.DOTALL (0 or more times (matching the
                               least amount possible))
      \b                       the boundary between a word char (\w) and
                               something that is not a word char
      endpackage               'endpackage'
      \b                       the boundary between a word char (\w) and
                               something that is not a word char
      |                        OR
      typedef                  'typedef'
      \s+                      whitespace (\n, \r, \t, \f, and " ") (1 or
                               more times (matching the most amount
                               possible))
      struct                   'struct'
      \s+                      whitespace (\n, \r, \t, \f, and " ") (1 or
                               more times (matching the most amount
                               possible))
      packed                   'packed'
      \s*                      whitespace (\n, \r, \t, \f, and " ") (0 or
                               more times (matching the most amount
                               possible))
      {                        '{'
      [^{}]*                   any character except: '{', '}' (0 or more
                               times (matching the most amount possible))
      }                        '}'
      \s*                      whitespace (\n, \r, \t, \f, and " ") (0 or
                               more times (matching the most amount
                               possible))
      (                        group and capture to \1:
        \w+                      word characters (a-z, A-Z, 0-9, _) (1 or
                                 more times (matching the most amount
                                 possible))
      )                        end of \1
      ;                        ';'

    Python code (a is the string from the question, and re is already imported):

    print(list(filter(None, re.findall(r'\bpackage.*?\bendpackage\b|typedef\s+struct\s+packed\s*{[^{}]*}\s*(\w+);', a, re.DOTALL))))

    Results: ['abc_x']
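
    For reference, here is a minimal self-contained sketch of the same approach that also covers the "no package at all" case from the question. The helper name structs_outside_packages and the two shortened sample inputs are made up for illustration; only the pattern itself comes from the answer above.

```python
import re

# Same pattern as in the answer, split over two lines for readability.
# The names PATTERN and structs_outside_packages are hypothetical.
PATTERN = re.compile(
    r'\bpackage.*?\bendpackage\b'                        # consume a whole package block, no capture
    r'|typedef\s+struct\s+packed\s*{[^{}]*}\s*(\w+);',   # capture the struct name in group 1
    re.DOTALL,
)

def structs_outside_packages(source):
    # findall returns '' whenever the first alternative matched,
    # because group 1 took no part in that match; drop those entries.
    return [name for name in PATTERN.findall(source) if name]

# Shortened sample inputs (assumed, modelled on the question's string).
with_package = """
package new;
typedef struct packed {
    logic a;
} abc_y;
endpackage

typedef struct packed {
    logic a;
} abc_x;
"""

without_package = """
typedef struct packed {
    logic a;
} abc_y;

typedef struct packed {
    logic a;
} abc_x;
"""

print(structs_outside_packages(with_package))     # ['abc_x']
print(structs_outside_packages(without_package))  # ['abc_y', 'abc_x']
```

    Note that [^{}]* does not handle nested braces, so a struct that contains another braced block inside its body would need a different approach.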