I am trying to get the names of the structs that lie outside the optional package and endpackage keywords. If the input contains no package and endpackage keywords, the script should return all the struct names.
This is my script:
import re
a = """
package new;
typedef struct packed
{
logic a;
logic b;
} abc_y;
typedef struct packed
{
logic a;
logic b;
} abc_t;
endpackage
typedef struct packed
{
logic a;
logic b;
} abc_x;
"""
print(re.findall(r'(?!package)*.*?typedef\s+struct\s+packed\s*{.*?}\s*(\w+);.*?(?!endpackage)*', a, re.MULTILINE|re.DOTALL))
This is the output:
['abc_y', 'abc_t', 'abc_x']
Expected output:
['abc_x']
I am missing something in the regex, but I can't figure out what. Can someone please help me fix this? Thanks in advance.
Use
\bpackage.*?\bendpackage\b|typedef\s+struct\s+packed\s*{[^{}]*}\s*(\w+);
See regex proof.
EXPLANATION
--------------------------------------------------------------------------------
\b the boundary between a word char (\w) and
something that is not a word char
--------------------------------------------------------------------------------
package 'package'
--------------------------------------------------------------------------------
.*? any character, including \n because the
re.DOTALL flag is passed (0 or more times
(matching the least amount possible))
--------------------------------------------------------------------------------
\b the boundary between a word char (\w) and
something that is not a word char
--------------------------------------------------------------------------------
endpackage 'endpackage'
--------------------------------------------------------------------------------
\b the boundary between a word char (\w) and
something that is not a word char
--------------------------------------------------------------------------------
| OR
--------------------------------------------------------------------------------
typedef 'typedef'
--------------------------------------------------------------------------------
\s+ whitespace (\n, \r, \t, \f, and " ") (1 or
more times (matching the most amount
possible))
--------------------------------------------------------------------------------
struct 'struct'
--------------------------------------------------------------------------------
\s+ whitespace (\n, \r, \t, \f, and " ") (1 or
more times (matching the most amount
possible))
--------------------------------------------------------------------------------
packed 'packed'
--------------------------------------------------------------------------------
\s* whitespace (\n, \r, \t, \f, and " ") (0 or
more times (matching the most amount
possible))
--------------------------------------------------------------------------------
{ '{'
--------------------------------------------------------------------------------
[^{}]* any character except: '{', '}' (0 or more
times (matching the most amount possible))
--------------------------------------------------------------------------------
} '}'
--------------------------------------------------------------------------------
\s* whitespace (\n, \r, \t, \f, and " ") (0 or
more times (matching the most amount
possible))
--------------------------------------------------------------------------------
( group and capture to \1:
--------------------------------------------------------------------------------
\w+ word characters (a-z, A-Z, 0-9, _) (1 or
more times (matching the most amount
possible))
--------------------------------------------------------------------------------
) end of \1
--------------------------------------------------------------------------------
; ';'
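To see why the alternation helps, it may be useful to look at what re.findall returns before any filtering. The following is a minimal sketch on a shortened input; the sample string and the variable names are illustrative and not taken from the question. Whenever the package branch matches, the capture group does not participate, so findall reports an empty string for that match.

```python
import re

# Shortened, illustrative input: one struct inside a package, one outside.
sample = """
package new;
typedef struct packed
{
logic a;
} abc_y;
endpackage
typedef struct packed
{
logic a;
} abc_x;
"""

pattern = r'\bpackage.*?\bendpackage\b|typedef\s+struct\s+packed\s*{[^{}]*}\s*(\w+);'

# The first branch consumes the whole package block without capturing anything,
# so that match shows up as '' in the result list.
raw = re.findall(pattern, sample, re.DOTALL)
print(raw)  # ['', 'abc_x']
```

The empty entries are what the filter(None, ...) call removes.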
print(list(filter(None,re.findall(r'\bpackage.*?\bendpackage\b|typedef\s+struct\s+packed\s*{[^{}]*}\s*(\w+);', a, re.DOTALL))))
Results: ['abc_x']
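As a possible variation, the same pattern can be wrapped in a small helper that uses re.finditer and skips the matches where the capture group is unset. This is only a sketch: the function name and the no_package test string are assumptions made for illustration, and the pattern still assumes struct bodies without nested braces.

```python
import re

# Same pattern as in the answer, compiled once with DOTALL so .*? can span lines.
STRUCT_OUTSIDE_PACKAGE = re.compile(
    r'\bpackage.*?\bendpackage\b|typedef\s+struct\s+packed\s*{[^{}]*}\s*(\w+);',
    re.DOTALL,
)

def struct_names_outside_packages(source):
    """Return names of packed structs not enclosed in package ... endpackage."""
    # m.group(1) is None when the package branch matched, so those are skipped.
    return [m.group(1) for m in STRUCT_OUTSIDE_PACKAGE.finditer(source) if m.group(1)]

# Hypothetical input without any package/endpackage: every struct should be returned.
no_package = """
typedef struct packed
{
logic a;
} abc_y;
typedef struct packed
{
logic b;
} abc_x;
"""
print(struct_names_outside_packages(no_package))  # ['abc_y', 'abc_x']
```

When there is no package block at all, only the second branch of the alternation ever matches, so all struct names come back, which covers the "optional" case from the question.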