I have the following two use cases of parsing an expression inside a SQL query:
[some_date:week]
should result in week
. week
here is one of a few keywords that are known in advance.[some_expression|condition]
should result in 1
or any other unrelated character, doesn't matter here.I first came up with a following regex:
r"\[(.*)\:(sec|min|hour|day|date|week|month|year|aggregate)\]"
However, it is not working properly when there's multiple occurrences of such an expression.
I then only found a way to limit the length of expressions inside brackets using {,30}
so for the second use case it becomes:
r"\[.{,30}\|.{,30}\]"
I don't feel that this is an optimal way to do it and looking forward to your suggestions!
Edit: here's some examples of how it works now and how it should work.
txt = """select [first:week] as time,
[next:month] as date
from table"""
exp = r"\[(.*)\:(sec|min|hour|day|date|week|month|year|aggregate)\]"
print(re.sub(exp, r"\1", txt, flags = re.S|re.I))
>> select first:week] as time,
[next as date
from table
Should be:
select first as time,
next as date
from table
You may use
r"\[([^][:]*):(sec|min|hour|day|date|week|month|year|aggregate)]"
See the regex demo.
The main point is to use [^][:]*
- any zero or more characters other than ]
, [
and :
- instead of .*
.
Note that .*?
won't work here either: if there is a non-matching substring between two matching ones, the first match will be longer than expected.