
How to match encrypted text between two strings? Any idea what's wrong with my regex?


I'm trying to match Verilog text that begins with

`pragma protect begin_protected

and ends with

`pragma protect end_protected

using this regular expression:

`[pP][rR][aA][gG][mM][aA]([\t ]+)[pP][rR][oO][tT][eE][cC][tT]([\t ]+)[bB][eE][gG][iI][nN]_[pP][rR][oO][tT][eE][cC][tT][eE][dD]([\t ]*)(.*)`[pP][rR][aA][gG][mM][aA]([\t ]+)[pP][rR][oO][tT][eE][cC][tT]([\t ]+)[eE][nN][dD]_[pP][rR][oO][tT][eE][cC][tT][eE][dD]([\t ]*)

The text between the beginning and the end may contain encrypted data or other `pragma protect directives, for example:

`pragma protect begin_protected
`pragma protect key_method = "rsa"
`pragma protect data_block
k6tBDqTQakg3qSojs4OdAY/r7tL9Wk8+Lk4xS2WtANXqdpfHexLIrZni6F2envAE
v9eGY1ay/TPE7dDRrRdDZil14xYec+5kwCIVbhdp27A5RrHQEq6NAppieMJBc0wG
GqzAPujU338BKb0H7BuWz5r6ZmRTYnDhch/aqFldGfNi3rCwSQPsrniPi2s8QCkz
5SdoKv7QTIQh0VH4ic37jd+4XnADws0Z+5FM3SnQ8wkD7x8X8Y5Owq9wXG82xngI
SthAiqHFEP2RFSM8iVuX7cIPGxNLy8Dz9IEFno1TOqdStm4YPQDHTHfUL1IIfHbu
t6SQDCRiXGudU9g8e9GlTLhHVdkU+D5D5tWKli+1b8lqzVUUWfuqeT+VAyd+nwBd
5dkCjixV30IZ9/xRMDGrmVXYK85chb2X7OYlHMZdX/alWPuCfEHbLE7vvAijoYYg
gQSTPZaMqAyME3TmDJaquo0hbLVFD/OhKnoD3vFxN1K1L7UOnOQH4PeDdjSYAech
Bpu3uwdlqs3/Smad8zwAt8+e4Ws6gN80q2E/pekbx0MCz7HddPCrb59q9Co4uD2Q
JHjKuYePcscc1Hz15HpH4dLozz2t6AeV5ZdgnXJhtEGiBVESpBySE89jgFYnD70K
Oybe6YUbRltG0qYn2WE+aohbmb0oNjBXVlx3ESwdYd7nD1Bt2+0OQPZWwzWi1kbU
RRQopy6x5abqr8EnBgO4sh0iMRnmZs7/vYFq0GEoDIRNgtlfULiq8mhVwdmRmtob
9FpHBDYkA8Mjs/O9e5CXU9eqgiW8ogCL/JUYJlVAnFkjv26CmgCRyzEoNasyPY3z
G0mKQjZ4ACXr7DGi3dAeg/QrYxUk9VxgPQxlK1KOy5UHwZ71UPli/5xhQzz+uh0I
IQtUUXDOQKFAUhovlRxgeqh89BhQ4R0DxEW5rp2eec9Pvxb1kFA3YQ3sS65DYjNz
ybPNLg4FkJ2ET1Q1ArbNBDnbBWexgTuHEEs/GB9XHDBskvz9iFOFE7j8AI3l/nSr
I4/c24GboZi1EYKN4CJFQoCo7daSJ4lv/QPBG6vtSzizx5mL3Eq6C+LBa08DvlsC
a900HyJ6IUAzWX59VzsTfi3BEqiDOIvUx2Hm0g94ghfqxbLZA3zro/gHC0uwgfwH
YocG13UKR/m5iPny98aFlJST9TmMojh3QKLg4VZEy6Btu4dEyMOQOEwUEvG1oz3R
lFHl3pSHs6oLqnH5DAjg+SzLO2n4VmcCWG8M0a+a4GqyLEZA5KGf+ubcKEU4n9ur
/15l4XeSsBFMRwUmvx1jgZyRIh6P+2qAYtFfkxqQN7oCQN4VNxr2wVIGTt86cjmz
Caj19qu1P4M9ljhnlJCLsg==
`pragma protect end_protected

Solution

  • In lex and flex, . matches any character except newline. Since your text spans several lines, you need (.|\n)* instead of .* for the match to cross line boundaries.

    However, that is not your only problem, at least if your input can contain more than one protected block. As written, your rule will match everything up to the last `pragma protect end_protected line, because lex and flex always match as much input as possible for a rule. So if you have two protected blocks, they will be seen as one block. Since neither lex nor flex has non-greedy quantifiers (there is no *? as in Perl-style regexes), you can't fix this simply by changing the regex.

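    Instead, you can use start conditions. With this approach, one rule matches the beginning of a protected block and switches to the start condition PROTECTED; a second rule, active only in PROTECTED, matches the end of the block and restores the default condition (INITIAL). Further rules active in PROTECTED consume the text in between, so each block ends at its own end_protected line instead of the last one in the file.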
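A minimal sketch of the start-condition approach, assuming flex rather than classic lex (it relies on %option, <<EOF>> and yyterminate(), which are flex features). It also swaps the question's [pP][rR]... character classes for %option case-insensitive, which makes the whole scanner ignore case. The helper block_append and the printed message are hypothetical stand-ins for whatever your parser actually needs to do with each block.

```lex
%{
/* Sketch: collect each `pragma protect begin_protected ... end_protected
 * block as one unit by switching to an exclusive start condition.
 * Assumes flex (for %option, <<EOF>> and yyterminate). */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

static char  *block     = NULL;  /* text of the block being collected */
static size_t block_len = 0;

/* Hypothetical helper: append len bytes of text to the block buffer. */
static void block_append(const char *text, size_t len)
{
    char *p = realloc(block, block_len + len + 1);
    if (p == NULL) {
        perror("realloc");
        exit(EXIT_FAILURE);
    }
    block = p;
    memcpy(block + block_len, text, len);
    block_len += len;
    block[block_len] = '\0';
}
%}

%option noyywrap nounput noinput
%option case-insensitive
%x PROTECTED

WS  [ \t]

%%

"`"pragma{WS}+protect{WS}+begin_protected{WS}*  {
        /* Start of a block: reset the buffer and switch conditions. */
        block_len = 0;
        block_append(yytext, yyleng);
        BEGIN(PROTECTED);
    }

[^`]+|"`"  { /* Outside any block: ignore the text. */ }

<PROTECTED>"`"pragma{WS}+protect{WS}+end_protected{WS}*  {
        /* First end marker after the begin marker: the block is complete. */
        block_append(yytext, yyleng);
        printf("protected block: %zu bytes\n", block_len);
        BEGIN(INITIAL);
    }

<PROTECTED>[^`]+|"`"  {
        /* Encrypted data or other `pragma protect lines: keep collecting. */
        block_append(yytext, yyleng);
    }

<PROTECTED><<EOF>>  {
        /* Input ended before the end marker was seen. */
        fprintf(stderr, "error: unterminated protected block\n");
        yyterminate();
    }

%%

int main(void)
{
    yylex();
    free(block);
    return 0;
}
```

To try it, save it under an illustrative name such as protect.l, build with flex protect.l && cc lex.yy.c -o protect, and run ./protect < input.v; each protected block should produce its own line of output. The catch-all rules use [^`]+ (a negated class, which in flex also matches newlines) rather than .|\n, so long runs of encrypted data are consumed in large chunks. A lone backtick that does not start the end marker falls through to the "`" alternative, while flex's longest-match rule lets the full end-marker pattern win whenever it applies.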