Search code examples
sqlregexsplit

"The big SQL RegEx": How do I RegEx split an easy SQL query (SELECT ... INTO ... FROM ... WHERE ... GROUP BY ... HAVING ... ORDER BY)?


Background

I am working on an overview over all tables and queries in all Access databases in a given folder, and my aim is to split the standardized MS Access SQL code behind the queries into the standard SQL blocks with RegEx. Mind that these queries are all of the same setting, no CTE:s, easy SELECT * FROM..., subjoins in the FROM block can happen but that is still just in the full FROM-block in the MS Access output, therefore we talk about easily understood SQL code that is not scattered but split into linebreaks by standard SQL keywords. Thus, I try to build a Regex that splits the SQL by the main SQL keywords and their blocks for the standard MS Access SQL format. Spoiler: I later found out how to do this with scattered code that does not have a standardized linebreak format, but that was not the main aim of the question.

See the code to fetch all of the objects from MS an Access database at How do I get the sql code for a query in MS Access if the "SQL" attribute of the query object is saved as a String of just 255 characters? - Super User.

This is a follow-up question on A working RegEx match that begins at the first of two OR-words takes the unwanted last OR-word instead if I place more RegEx before it [duplicate]. Thanks go to the helpful remarks of this user and this user under that question.

RegEx so far

I got:

(?>=SELECT\s+|,\s*)(.+)\s+AS\s+Error.?(?<=WHERE|HAVING)(.)

Or without any lookarounds:

(?:=SELECT\s+|,\s*)(.+?)\s+AS\s+Error.?((?:WHERE|HAVING).)

as the needed RegEx for:

SELECT a, bc d AS Error FROM y WHERE 1=1 HAVING 1=1

to be split into two matching groups:

  1. bc d as the code that builds the column called "Error". This is dummy code, do not care about the SQL.

  2. Anything after one of the two words "WHERE" or "HAVING".

enter image description here

See Regex101.

MS Access VBA: "Run-time error '5017': Application-defined or object-defined error" when parsing lookarounds in RegEx

Having tested everything with the default language and flavour of www.regex101.com, which is PCRE2 (PHP >=7.3) at the time of writing, it does not work in MS Access VBA, which throws for any pattern with a lookaround:

enter image description here

Microsoft Visual Basic - Run-time error '5017': Application-defined or object-defined error.

With Lookarounds:

regexPattern = "(?>=SELECT\s+|,\s*)(.+)\s+AS\s+Error.*?(?<=WHERE|HAVING)(.*)" '-> error 5017 pop up
regexPattern = "(?<=WHERE|HAVING)(.*)" '-> error 5017 pop up
regexPattern = "(?>=SELECT\s+|,\s*)(.+)\s+AS\s+Error.*" '-> error 5017 pop up

Without lookarounds, it seems to work in MS Access:

regexPattern = "(?:=SELECT\s+|,\s*)(.+?)\s+AS\s+Error.*?((?:WHERE|HAVING).*)" ' -> no error 5017, but no matches either
regexPattern = "((?:WHERE|HAVING).*)" '-> working
regexPattern = "(?:=SELECT\s+|,\s*)(.+?)\s+AS\s+Error" '-> working

It seems as if MS Access VBA cannot deal with lookarounds. There might be something like a double early/late RegEx match to chew on which is hard to deal with if lookarounds (lookaheads and lookbehinds) are not allowed in MS Access VBA (WRONG, see the end of the answer with lookahead RegEx that works in MS Access VBA).

+++ MIND: THIS WAS A WRONG GUESS. WITH THE WORKING LOOKAHEAD-REGEX FOR MS ACCESS IN THE ANSWER BELOW, I KNOW NOW THAT LOOKAHEADS CAN BE READ BY MS ACCESS VBA +++ (I do not know why this did not work but the other does)

"The big RegEx for SQL" is needed

I need a Regex that finds the two matches as shown above and stops before the order by, and works in MS Access VBA:

What is the RegEx:

  • to get the two matching groups of above:
    • Error column definition (a column that is always like 'my_error_text' AS Error, and without the AS Error tail,
    • WHERE/HAVING clause, beginning with the earliest of the two, (this is not must, you can also output all of the keyword sets with each in one group, only in my project the aim was to get those conditions that filter the rows for errors)
  • that allows some or all of the keyword sets not to be there at all if the SQL code does not have the needed keywords,
  • and that stops before ORDER BY? (this is not a must)

And the question is not only for MS Access VBA. I search for a RegEx code that can split any SQL code by its SQL keywords, both in MS Access and in any other SQL style. Thus, the list above is only the core aim, but "the full SQL RegEx" would be better.

Thus, in short, the question comes down to an all-in-one SQL RegEx: How do I split a SQL query with RegEx?

  • Mind that I need at least one RegEx that works without lookarounds.
  • I dropped the tags vba and ms-access since the question is helpful for any RegEx task aiming at splitting SQL code or picking from it.

Scope of the question: trivial SQL code (SELECT ... INTO ... FROM ... WHERE ... GROUP BY ... HAVING ... ORDER BY)

As to remarks saying that you should not try any RegEx with SQL at all since it is prone to errors of all sorts coming from dialects, styles and non-trivial code and since it does not perform well: The "big" in "The big RegEx for SQL" is just a RegEx for a trivial SQL code like SELECT ... INTO ... FROM ... WHERE ... GROUP BY ... HAVING ... ORDER BY. Any non-trivial SQL code is likely not a good playground for a full RegEx that checks everything. Then, you need to split up the RegEx tasks instead. And perhaps non-trivial SQL is not even a playground for any RegEx at all since you can never rely on it if your code may have some SQL outliers.
The RegEx is for less than 1000 tables and queries that are spread over the MS Access databases in always the same style. For such a small sample, performance does not play a role. But if you think of running this on a big scale, for example for hundred-thousands of SQL queries, do not try it with RegEx.

Warning: Splitting SQL with RegEx is a rabbit hole, do not think that this is a good way to go

I had this question already many years ago but never came to set it up: there is no RegEx on the net to read out a trivial SQL SELECT query (up to now). Yet, this RegEx would not have been asked for in 2024, but decades ago, if there was no caveat to this.

Even if the answer shows that you can do everything with one RegEx, see the second last screenshot for the run in MS Access, which takes about 15 s for 1000 objects, it will often be easier if you try other ways. For example, if you work in MS Access, try the built-in functions to split text and/or write your own function in VBA, which gives you the chance to loop, double-loop, and split easily, with a better understanding of what is going on and to debug.

On the other hand, since 15 seconds for 1000 objects is fast enough, and since it works for trivial SQL code, and since I tell you here clearly that you should rather avoid RegEx for SQL parsing, I do not see why this question should be closed. The fear is that some might think from the question that RegEx is a good way to go. Yet, just to fear that is not fair since I could work with RegEx, this work is proof enough to say that in some smaller datasets with just trivial SQL, RegEx can indeed help.

In short:

Do not think from reading this question that splitting SQL with RegEx is a good way to go. RegEx-parsing SQL is prone to errors and does not perform well, you are cleaner and faster with SQL or other programming langues if you:

  • split the SQL code into smaller blocks in helper columns,
  • and then, if you still have something to split left, pick what you need from there with the help of SQL, VBA, any language, or a much smaller RegEx.

SQL Code for the Regex101 and MS Access VBA checks

MS Access test SQL (main keyword sets begin and end in one-liners)

SELECT t1.dummy1, t1.dummy2 AS Error, t1.dummy3 INTO testResult
FROM (test1 t1 INNER JOIN test2 t2 ON t1.dummy4 = t2.dummy4) INNER JOIN test3 t3 ON t2.dummy9 = t3.dummy9 
WHERE t2.dummy6 = 'abc' AND t3.dummy10 <> 'def' AND t1.dummy11 = 'ghi' AND t2.dummy12 IS NULL
GROUP BY t1.dummy1
HAVING 1=1
ORDER BY 1;

SQL with many line breaks for non-MS Access style SQL tests

SELECT
    t1.dummy1,
    t1.dummy2 AS Error, 
    t1.dummy3
INTO
    testResult
FROM
    (test1 t1
    INNER JOIN test2 t2 ON t1.dummy4 = t2.dummy4)
    INNER JOIN test3 t3 ON t2.dummy9 = t3.dummy9
WHERE
    t2.dummy6 = 'abc'
    AND t3.dummy10 <> 'def'
    AND t1.dummy11 = 'ghi'
    AND t2.dummy12 IS NULL
GROUP BY t1.dummy1
HAVING 1=1
ORDER BY 1;

Solution

  • RegEx without lookarounds to pick a column definition and a WHERE/HAVING block from a SQL query

    The easiest RegEx for the MS Access SQL default style cannot deal with missing keywords (not working)

    Two tricks:

    • One trick is to make each of the found groups lazy so that the first of the findings is the one that is taken.
    • The other trick is to take care of each default line that MS Access SQL is split into by default. By this, I can find the whole tail of the SQL keyword just with .*, and there is no need to check any random line breaks since MS Access will always stick to its default line breaks.

    Here is the RegEx that embeds these two tricks (but mind, it will fail as soon as one keyword is missing):

    THIS IS A WRONG REGEX (works only if all keywords are hit):

    SELECT (.*) (?:INTO (.*)).*\sFROM (.*)(?:\sWHERE (.*)).*(?:\sHAVING (.*)).*(?:\sORDER BY (.*)).*
    

    By this, you can put many commands in a row, always knowing when to stop, so that you will not need any lookaheads. You can always make something lazy by making it a non-capturing group and writing .* (or what you want it to eat) behind it. I learnt this from the helpful remarks below A working RegEx match that begins at the first of two OR-words takes the unwanted last OR-word instead if I place more RegEx before it [duplicate].

    https://regex101.com/r/3bItXm/1

    enter image description here

    Thus, if I drop the aim of putting together WHERE and HAVING in one group, I get them both much easier like this, and then, I can also split the ORDER BY from the SQL. If I still want some of them to be put together in the output, I can do this afterwards by concatenating the two group outputs in the VBA code.

    This is the RegEx that also reads out the "Error" column:

    THIS IS A WRONG REGEX (works only if all keywords are hit):

    SELECT .*(?:, (.*) AS Error).* (?:INTO (.*)).*\sFROM (.*)(?:\sWHERE (.*)).*(?:\sHAVING (.*)).*(?:\sORDER BY (.*)).*
    

    https://regex101.com/r/b7YaKq/1

    enter image description here

    Yet, as soon as one of the keywords is not found, the whole RegEx fails. The lazy non-capturing groups do not seem to work as they should.

    THIS IS A WRONG REGEX (works only if all keywords are hit):

    https://regex101.com/r/Jf3eth/1

    enter image description here

    If just one keyword is not hit, it fails:

    https://regex101.com/r/dEFHKV/1

    enter image description here

    I tried to make all of the groups optional with ?, but nothing worked: you could do it only with the very last ORDER BY but that is not enough.

    RegEx that gets "Error" column and "WHERE till the end up to ORDER BY" (working)

    This is the fastest RegEx you can get for the task at hand. It needs only 400 steps, but it does not drop ORDER BY at the end so that the last WHERE/HAVING block is spread over all lines till ORDER BY.

    ++ Answer for the column definition and the WHERE/HAVING... block (with ORDER BY at the end) in a SQL code in MS Access SQL style in 400 steps (without lookarounds) ++

    ^\s*(?:(?:=SELECT\s+|.*,\s*)([^,]+?)\s+AS\s+Error[\s\S]*?)?(?:(?:WHERE|HAVING)\s*([\s\S]*)?)$
    

    https://regex101.com/r/mLQRVu/1

    enter image description here

    Mind that just putting another non-capturing group (?:ORDER.*) at the end does not get rid of the rather unwanted ORDER BY (see the aims of the question) since that will clearly not work if ORDER BY is missing:

    THIS IS A WRONG REGEX (works only if ORDER BY is hit):

    (?:(?:=SELECT\s+|,\s*)([^,]+?)\s+AS\s+Error[\s\S]*?)?(?:((?:WHERE|HAVING)[\s\S]*)?)(?:ORDER.*)
    

    enter image description here

    Without the ORDER BY keyword, nothing is found since it is not optional:

    enter image description here

    And mind that putting a ? at the end in (?:ORDER.*)? does not work either since the [\s\S]* in the block before will then already eat it the ORDER BY as it will eat till it reaches the end:

    THIS REGEX WITH (?:ORDER.*)? IS UNNEEDED (always gets eaten before it is reached):

    (?:(?:=SELECT\s+|,\s*)([^,]+?)\s+AS\s+Error[\s\S]*?)?(?:((?:WHERE|HAVING)[\s\S]*)?)(?:ORDER.*)?
    

    enter image description here

    RegEx to get "Error" column, INTO, FROM, and as one block: "WHERE till the end up to ORDER BY" (working)

    I ended up with this Regex so that I got the definition of that chosen column, the "into", the "from" and the "where/having/order by". It is not easy to build this code since lookarounds are not allowed (WRONG GUESS HERE, see the working lookahead-RegEx at the end). It takes 700 steps.

    ++ Answer for the column definition, INTO, FROM, and the WHERE/HAVING... block (with ORDER BY at the end) in a SQL code in MS Access SQL style in 700 steps (without lookarounds) ++

    ^(?:(?:\s*?SELECT\s+|,\s*)[\s\S]*?\s?([^,]+?)\s+AS\s+Error[\s\S]*?)?[\s\S]*?(?:(?:\s+INTO\s+([\S]+)))?(?:(?:\s+FROM\s+([\S+].*)))?(?:\s+((?:WHERE|HAVING)[\s\S]*))?$
    

    enter image description here

    I did not know how to avoid the ORDER BY, though, but I can live with the outcome. Also, hitting the full "from" set is luck, it depends on having everything in one line - but that seems to be the default in Access. .* stays inside the line, only [\s\S]* (or shorter: [^*]*) will eat linebreaks. For the WHERE/HAVING, I cannot get rid of the ORDER BY since that would work only if I checked WHERE and HAVING both on their own, not as a set. Now I cannot just take .* since that would drop the HAVING:

    enter image description here

    And the queries often do not have one or more of the WHERE, HAVING or ORDER BY so that I cannot just search for these linebreaks step by step.

    In short, ORDER BY must be taken as well with [^*]*, else you lose HAVING if there is a WHERE and you fetch the code with .*.

    Wrapping up

    • In MS Access, lookarounds are not allowed (WRONG, see the working lookahead-RegEx at the end). I changed any needed lookaround like ?>=... to a RegEx without lookarounds, that is, a non-capturing group followed by .* to make it lazy with (?:...).*.
    • I ended up with .* which stays only in the line, but that will work only if you have no line breaks inside a set and a line break after it, for example if FROM ... is not cut into line breaks, and the good news is that MS Access puts line breaks only between the main commands so that for example a long FROM ... ON ... = ... set will not be cut in more than one line by MS Access.

    RegEx without and with lookarounds for a full SQL query

    RegEx for the MS Access SQL style (= default line breaks) without lookarounds

    column definition, INTO, FROM, WHERE, HAVING, ORDER BY

    This RegEx takes 459 steps.

    ++ Answer for the column definition, INTO, FROM, WHERE, HAVING, ORDER BY in a SQL code in MS Access SQL style in 459 steps (without lookarounds) ++

    ^(?:(?:\s*?SELECT\s+|,\s*)[\s\S]*?\s?([^,]+?)\s+AS\s+Error[\s\S]*?)?[\s\S]*?(?:(?:\s+INTO\s+([\S]+)))?(?:(?:\s+FROM\s+(.*)))?(?:(?:\s+WHERE\s+(.*)))?(?:(?:\s+GROUP\s+BY\s+(.*)))?(?:(?:\s+HAVING\s+(.*)))?(?:(?:\s+ORDER BY\s+(.*)))?$
    

    https://regex101.com/r/YPtvxA/1

    enter image description here

    And it also works if keywords are missing:

    enter image description here

    SELECT (+ INTO), FROM, WHERE, HAVING, GROUP BY, ORDER BY

    Or if you need the whole SELECT instead of fishing just one column definition, but then you cannot fish the INTO from the SELECT block since the INTO is in the same line:

    ++ Answer for SELECT (+ INTO), FROM, WHERE, HAVING, GROUP BY, ORDER BY in a SQL code in MS Access SQL style in 100 steps (without lookarounds) ++

    ^(?:(?:\s*SELECT\s+(.*)))?[\s\S]*?(?:(?:\s+FROM\s+(.*)))?(?:(?:\s+WHERE\s+(.*)))?(?:(?:\s+GROUP\s+BY\s+(.*)))?(?:(?:\s+HAVING\s+(.*)))?(?:(?:\s+ORDER BY\s+(.*)))?$
    

    https://regex101.com/r/t2K2tH/1

    enter image description here

    Only to make it readable (no true RegEx):

    ^
    (?:(?:\s*SELECT\s+(.*)))?[\s\S]*?
    (?:(?:\s+FROM\s+(.*)))?
    (?:(?:\s+WHERE\s+(.*)))?
    (?:(?:\s+GROUP\s+BY\s+(.*)))?
    (?:(?:\s+HAVING\s+(.*)))?
    (?:(?:\s+ORDER BY\s+(.*)))?
    $
    

    It works if keywords are missing. The main trick for that and the very few steps is the double question marks in ?[\s\S]*? after the SELECT block - do not ask me why...:

    enter image description here

    It does not work with the random SQL style (random line breaks), but that was not the aim in this chapter:

    enter image description here

    SELECT, INTO, FROM, WHERE, HAVING, GROUP BY, ORDER BY

    Mind that catching the INTO as well takes a lot more steps. It cannot just be done like this:

    THIS IS A REGEX WITH UNNEEDED CODE (it does not catch INTO but still writes it):

    ^(?:(?:\s*SELECT\s+(.*)))?[\s\S]*?(?:(?:\s+INTO\s+([\S]+)))?(?:(?:\s+FROM\s+(.*)))?(?:(?:\s+WHERE\s+(.*)))?(?:(?:\s+GROUP\s+BY\s+(.*)))?(?:(?:\s+HAVING\s+(.*)))?(?:(?:\s+ORDER BY\s+(.*)))?$
    

    Thus, instead of ?, it needs .* as lazy bindings, and I also need placeholders for the gap that comes up if one of the keywords in a line is not known or found and must be jumped over, see for example [\s\S]*\s+ORDER where [\s\S]* could also be written in short as [^*]*. This leads to 45.000 steps to catch the INTO:

    ++ Answer for a full SQL code (SELECT, INTO, FROM, WHERE, HAVING, GROUP BY, ORDER BY) in MS Access SQL style in 47.000 steps (without lookarounds) ++

    ^(?:(?:\s*SELECT\s+(.*))).*(?:(?:[\s\S]*\s+INTO\s+([\S]+))).*(?:(?:[\s\S]*\s+FROM\s+(.*))).*(?:(?:[\s\S]*\s+WHERE\s+(.*)))?.*(?:(?:[\s\S]*\s+GROUP\s+BY\s+(.*)))?.*(?:(?:[\s\S]*\s+HAVING\s+(.*)))?.*(?:(?:[\s\S]*\s+ORDER\s+BY\s+(.*)))?.*$
    

    https://regex101.com/r/Gpj0Wa/1

    enter image description here

    Only to make it readable (no true RegEx):

    ^
    (?:(?:\s*SELECT\s+(.*))).*
    (?:(?:[\s\S]*\s+INTO\s+([\S]+))).*
    (?:(?:[\s\S]*\s+FROM\s+(.*))).*
    (?:(?:[\s\S]*\s+WHERE\s+(.*)))?.*
    (?:(?:[\s\S]*\s+GROUP\s+BY\s+(.*)))?.*
    (?:(?:[\s\S]*\s+HAVING\s+(.*)))?.*
    (?:(?:[\s\S]*\s+ORDER\s+BY\s+(.*)))?.*
    $
    

    It works with missing keywords:

    enter image description here

    And it does not work with a random SQL style, but this was not the aim of this chapter:

    enter image description here

    Picking the "Error" column definition on its own

    You can now fish the definition of the column "Error" as a Regex on its own.

    I think you cannot put this Regex check for an "Error" column together with the Regex check for the full "SELECT" block since the former Regex is embedded in the latter Regex: the SELECT ... matching group in "the big SQL RegEx" above already eats the characters that are needed here. But it is no big coding to check a SQL code a second time, thus, if you need to cut out the "Error" column (or any other named column that you search for), you can do it in a step on its own.

    Getting the column definition of the "Error" column takes 168 steps:

    ^(?:(?:\s*SELECT\s+|,\s*)[\s\S]*?\s?([^,]+?)\s+AS\s+Error[\s\S]*?)?[\s\S]*$
    

    enter image description here

    Or shorter, in 161 steps, if you can stop the matching after the first hit, which should be the best way to go:

    https://regex101.com/r/09CVwP/1

    (?:SELECT\s+|,\s*)[\s\S]*?\s?([^,]+?)\s+AS\s+Error.*
    

    enter image description here

    ++ Answer for the "Error" column definition of a SQL code in MS Access or random SQL style in 172 steps (without lookarounds) ++

    (If you are fine with stopping the Matching at the first "Error" column, this is the recommended Regex.)

    This is the same Regex as above, it works also in a random style SQL, meaning:

    "abc" AS Error
    

    ..not in one line like:

    "abc" 
    AS 
    Error
    

    Thus, the same Regex works:

    (?:SELECT\s+|,\s*)[\s\S]*?\s?([^,]+?)\s+AS\s+Error.*
    

    https://regex101.com/r/jeACCV/1

    enter image description here

    ++ Answer for the "Error" column definition of a SQL code in MS Access or random SQL style in 179 steps (without lookarounds, matching the whole code) ++

    (This is not needed if you can stop the matching after the "Error" column.)

    The following Regex will match with the whole code in any random style SQL with line breaks everywhere between the column definition and the column name, taking 179 steps:

    https://regex101.com/r/fhe98K/1

    ^(?:(?:\s*SELECT\s+|,\s*)[\s\S]*?\s?([^,]+?)\s+AS\s+Error[\s\S]*?)?[\s\S]*$
    

    enter image description here

    Changing the working RegEx with lazy bindings to a RegEx with lookaheads

    Links, to begin with:

    The Regex with lookaheads (?>=, same as ?=) instead of lazy bindings ((?:...).* or (?:...)[^*]*) is much longer, but easier to understand: you just check for the next keyword coming until you reach the end of the SQL code. If you find another keyword, you stop, and the lookahead does not eat the found characters but leaves them for the next group as the beginning keyword.

    They are not only easier to grasp for the human being, but also for the machine: The lookaheads need much less steps than the lazy bindings.

    Random SQL style (line breaks and spaces wherever you want)

    This RegEx needs 2900 steps:

    ++ Answer for the full SQL code in random SQL style in 2900 steps (with lookarounds) ++

    ^(?=\s*SELECT\s+([\s\S]*?)(?=\s+(?:INTO|FROM|WHERE|GROUP\s+BY|HAVING|ORDER\s+BY)\s))(?=(?:[\s\S]*?\s+INTO\s+([\S]+)\s*)?)(?=(?:[\s\S]*?\s+FROM\s+([\s\S]*?)(?=\s+(?:WHERE|GROUP\s+BY|HAVING|ORDER\s+BY)\s)))(?=(?:[\s\S]*?\s+WHERE\s+([\s\S]*?)(?=\s+(?:GROUP\s+BY|HAVING|ORDER\s+BY)\s)))(?=(?:[\s\S]*?\s+GROUP\s+BY\s+([\s\S]*?)(?=\s+(?:HAVING|ORDER\s+BY)\s))?)(?=(?:[\s\S]*?\s+HAVING\s+([\s\S]*?)(?=\s+ORDER\s+BY\s))?)(?=(?:[\s\S]*?\s+ORDER\s+BY\s+([\s\S]*?)(?=\s*$))).*$
    

    https://regex101.com/r/LCI3OB/2

    enter image description here

    Here it is in readable keyword blocks. The one-liner for INTO is not a mistake, it needs to eat only the next table name.

    Only to make it readable (no true RegEx):

    ^
    (?=\s*SELECT\s+([\s\S]*?)
    (?=\s+(?:INTO|FROM|WHERE|GROUP\s+BY|HAVING|ORDER\s+BY)\s))
    
    (?=(?:[\s\S]*?\s+INTO\s+([\S]+)\s*)?)
    
    (?=(?:[\s\S]*?\s+FROM\s+([\s\S]*?)
    (?=\s+(?:WHERE|GROUP\s+BY|HAVING|ORDER\s+BY)\s)))
    
    (?=(?:[\s\S]*?\s+WHERE\s+([\s\S]*?)
    (?=\s+(?:GROUP\s+BY|HAVING|ORDER\s+BY)\s)))
    
    (?=(?:[\s\S]*?\s+GROUP\s+BY\s+([\s\S]*?)
    (?=\s+(?:HAVING|ORDER\s+BY)\s))?)
    
    (?=(?:[\s\S]*?\s+HAVING\s+([\s\S]*?)
    (?=\s+ORDER\s+BY\s))?)
    
    (?=(?:[\s\S]*?\s+ORDER\s+BY\s+([\s\S]*?)
    (?=\s*$))).*
    $
    

    MS Access VBA test for the random SQL style fails with a new error

    This should not work since it has lookarounds in it, see the question (MIND: I WAS WRONG IN GUESSING THAT ACCESS CANNOT DEAL WITH LOOKAROUNDS). Strangely, this RegEx throws another error than the error in the question, which was thrown when parsing the SQL with regex.Execute(obj_SQL):

    Microsoft Visual Basic - Run-time error '5017': Application-defined or object-defined error.

    Now, the error pops up one line after regex.Execute(obj_SQL):

    enter image description here

    Run/time error '5': Invalid procedure call or argument

    Though the RegEx works on the Regex101 website, it cannot find any group at all in MS Access, the error comes up whenever you call matches(0).SubMatches(0) or any other of the group numbers, ending at matches(0).SubMatches(6).

    That means that we cannot run this RegEx in MS Access and strangely, there the error has changed.

    MS Access VBA SQL style (= main keyword sets begin and end in one-liners)

    Mind that this is just for the SQL that MS Access puts out or anything else that sticks to such a SQL default.

    First try in MS Access VBA fails

    This lookahead RegEx has more .* in it which cannot pass a line break. The following RegEx needs 3000 steps:

    THIS IS WRONG REGEX:

    ^(?=\s*SELECT\s+(.*?)(?=\s+INTO|\s+FROM|$))(?=(?:.*?\s+INTO\s+([\S]+)\s+)?)(?=(?:.*?\s+FROM\s+(.*?)(?=\s+WHERE|\s+GROUP\s+BY|\s+HAVING|\s+ORDER\s+BY|$)))(?=(?:[\s\S]*?\s+WHERE\s+(.*?)(?=\s+GROUP\s+BY|\s+HAVING|\s+ORDER\s+BY|$))?)(?=(?:[\s\S]*?\s+GROUP\s+BY\s+(.*?)(?=\s+HAVING|\s+ORDER\s+BY|$))?)(?=(?:[\s\S]*?\s+HAVING\s+(.*?)(?=\s+ORDER\s+BY|$))?)(?=(?:[\s\S]*?\s+ORDER\s+BY\s+(.*?)(?=\s*$))?).*$
    

    https://regex101.com/r/hMBtdd/2

    enter image description here

    Yet, this is a wrong RegEx. It fills the table only in the columns "Error", "Select", "Into". The rest stays empty. From the output for the working three "Error", "Select", "Into" we see that MS Access VBA can deal with lookarounds, which is good news.

    Checking the RegEx flavor

    It is then perhaps just quite unclear which flavor it needs. I checked all of them, and only the languages Golang and Rust do not work, and they do not work at all, nothing is found:

    enter image description here

    The rest of the flavors work. This does not seem to tell why MS Access might not understand the RegEx.

    Working lookahead-RegEx in MS Access

    It turned out that the match ("Match 1", but there is just one match anyway) stopped before the FROM. Without a match in the background, you cannot have a matching group in the foreground, even if Regex101 finds the group. Making "Match 1" as long as the code again made it work, and it runs in only 1700 steps:

    ++ Answer for the full SQL code in MS Access SQL style in 1700 steps (with lookarounds) ++

    ^(?:SELECT\s+([\s\S]+?)(?=\s+INTO\s+|\s+FROM|$))(?:.*?\s+INTO\s+([\S]+)\s*)?(?:\s+FROM\s+)([\s\S]+?)(?=\s+WHERE\s+|\s+GROUP\sBY\s+|\s+HAVING\s+|\s+ORDER\sBY|$)(?:.*?\s+WHERE\s+([\s\S]+?)(?=\s+GROUP\sBY\s+|\s+HAVING\s+|\s+ORDER\sBY|$))?(?:.*?\s+GROUP\sBY\s+([\s\S]+?)(?=\s+HAVING\s+|\s+ORDER\sBY|$))?(?:.*?\s+HAVING\s+([\s\S]+?)(?=\s+ORDER\sBY\s+|$))?(?:.*?\s+ORDER\sBY\s+([\s\S]+?)(?=\s*$))?$
    

    https://regex101.com/r/Pn04Gp/1

    enter image description here

    Here you see the same code with a line break for each keyword block. Mind that the INTO cannot find more than one target table and therefore does not need the lookaheads to check where the next keyword is coming, it just eats the one next table name.

    Only to make it readable (no true RegEx):

    ^
    (?:SELECT\s+([\s\S]+?)
    (?=\s+INTO\s+|\s+FROM|$))
    
    (?:.*?\s+INTO\s+([\S]+)\s*)?
    
    (?:\s+FROM\s+)([\s\S]+?)
    (?=\s+WHERE\s+|\s+GROUP\sBY\s+|\s+HAVING\s+|\s+ORDER\sBY|$)
    
    (?:.*?\s+WHERE\s+([\s\S]+?)
    (?=\s+GROUP\sBY\s+|\s+HAVING\s+|\s+ORDER\sBY|$))?
    
    (?:.*?\s+GROUP\sBY\s+([\s\S]+?)
    (?=\s+HAVING\s+|\s+ORDER\sBY|$))?
    
    (?:.*?\s+HAVING\s+([\s\S]+?)
    (?=\s+ORDER\sBY\s+|$))?
    
    (?:.*?\s+ORDER\sBY\s+([\s\S]+?)
    (?=\s*$))?
    $
    

    Thus, this is the proof that lookarounds do work also in MS Access VBA, and that the question was wrong in guessing that lookaheads cannot be read.

    MS Access output:

    enter image description here

    Mind that the same RegEx will not work for other SQL output styles:

    enter image description here

    For that to work, there is already the other working RegEx further above. You cannot run that in MS Access VBA, but it works.