Tokenizing a string with a regular expression

Suppose I have a string like "abc def ghi jkl " (I put a space at the end for the sake of simplicity, but it doesn't really matter to me), and I want to capture its "chunks" as follows:

abc
def
ghi
jkl

if and only if there are 1-4 "chunks" in the string. I have already tried the following regex:

^([^ ]+ ){1,4}$

I tested it at Regex101.com, but it only captures the last occurrence, and the tool issues this warning:

A repeated capturing group will only capture the last iteration. Put a capturing group around the repeated group to capture all iterations or use a non-capturing group instead if you're not interested in the data
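Python's built-in re module shows the same behavior, so a minimal sketch can reproduce what the warning describes. It assumes the trailing-space input from the question: the whole pattern matches, but group 1 only keeps the text of its last iteration.

```python
import re

# The pattern from the question: one capturing group repeated 1 to 4 times.
pattern = re.compile(r"^([^ ]+ ){1,4}$")

m = pattern.match("abc def ghi jkl ")
print(repr(m.group(0)))  # the overall match covers all four chunks: 'abc def ghi jkl '
print(repr(m.group(1)))  # the group keeps only the last iteration: 'jkl '

# Five chunks exceed {1,4}, so the pattern does not match at all.
print(pattern.match("a b c d e "))  # None
```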

How can I correct the regular expression to achieve my goal?

Solution

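Since you have no access to the code, you cannot collect the chunks programmatically, so the only option is a single regex that returns each chunk as a separate match. It combines the \G anchor, which only allows consecutive matches, with a lookahead anchored at the start of the string that requires 1 to 4 non-whitespace chunks. Note that \G and \K are supported by PCRE but not by every regex engine.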

(?:^(?=\s*\S+(?:\s+\S+){0,3}\s*$)|\G(?!^))\s*\K\S+
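Python's built-in re module supports neither \G nor \K, so the following is only a sketch of the same logic under that assumption, not the regex itself running in Python. The lookahead gate is checked once at index 0; each later match is anchored at the end of the previous one with Pattern.match(string, pos), which plays the role of \G; and a capturing group stands in for \K. The function name tokenize is hypothetical.

```python
import re

# Gate: the ^(?=...) alternative. Requires 1 to 4 whitespace-separated
# chunks, with optional leading and trailing whitespace.
GATE = re.compile(r"^(?=\s*\S+(?:\s+\S+){0,3}\s*$)")

# One step of the \G loop: skip whitespace, then capture a chunk.
# The group replaces \K, which Python's re does not support.
STEP = re.compile(r"\s*(\S+)")


def tokenize(s):
    """Hypothetical helper: return the chunks, or [] if the gate fails."""
    if GATE.match(s) is None:
        return []  # like the regex: no matches at all
    chunks = []
    pos = 0
    while True:
        # Pattern.match(s, pos) only matches starting exactly at pos,
        # emulating \G (continue where the previous match ended).
        m = STEP.match(s, pos)
        if m is None:
            break
        chunks.append(m.group(1))
        pos = m.end()
    return chunks


print(tokenize("abc def ghi jkl "))  # ['abc', 'def', 'ghi', 'jkl']
print(tokenize("a b c d e"))         # []
```

The gate only needs to run once because every later step is forced to start where the previous one stopped, which is exactly why the original pattern pairs ^(?=...) with \G(?!^).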

See the regex demo

Details:

(?:^(?=\s*\S+(?:\s+\S+){0,3}\s*$)|\G(?!^)) - a custom boundary that checks whether the current position is either:
- ^(?=\s*\S+(?:\s+\S+){0,3}\s*$) - the start of the string (^), followed by 1 to 4 non-whitespace chunks separated by one or more whitespace characters; leading and trailing whitespace is allowed, too
- | - or
- \G(?!^) - the position where the previous successful match ended (\G also matches at the start of the string, so the negative lookahead (?!^) excludes that position, which is already handled by the separate check in the first alternative)
\s* - zero or more whitespace characters
\K - a match reset operator that discards all the text matched so far, so only the chunk itself is returned
\S+ - one or more non-whitespace characters
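To see how the custom boundary's first alternative enforces the 1 to 4 limit on its own, the lookahead can be tried in Python's re, which does support lookaheads. This is a sketch assuming test strings built from a placeholder chunk "x" with a trailing space; the helper name gate_passes is hypothetical.

```python
import re

# The ^(?=...) alternative of the custom boundary. A lookahead consumes
# nothing, so only success or failure of the match matters.
GATE = re.compile(r"^(?=\s*\S+(?:\s+\S+){0,3}\s*$)")


def gate_passes(s):
    """Hypothetical helper: True if s contains 1 to 4 chunks."""
    return GATE.match(s) is not None


# Strings with 0 to 5 chunks, each followed by a space.
for n in range(6):
    s = " ".join(["x"] * n) + " "
    print(n, gate_passes(s))
```

Only the strings with 1 to 4 chunks pass; with 0 chunks the mandatory \S+ fails, and with 5 chunks the {0,3} repetition cannot reach the end of the string, so $ fails.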