The key words are "*OR" or "*AND".
Suppose I have the string below:
This is a t3xt with special characters like !#. *AND and this is another text with special characters *AND this repeats *OR do not repeat *OR have more strings *AND finish with this string.
I want the following
group1 "This is a t3xt with special characters like !#."
group2 "*AND"
group3 "and this is another text with special characters"
group4 "*AND"
group5 "this repeats"
group6 "*OR"
group7 "do not repeat"
group8 "*OR"
group9 "have more strings"
group10 "*AND"
group11 "finish with this string."
I have tried like this:
(.+?)(\*AND\*OR)
but it only captures the first string, so I would have to keep repeating the pattern to collect the others. The problem is that some strings contain only one *AND or only one *OR, while others contain dozens of them; the count is unpredictable. The regex below does not work either:
((.+?)(\*AND\*OR))+
For example:
This is a t3xt with special characters like !#. *AND and this is another text with special characters
PHP has a preg_split function for this sort of thing. preg_split allows you to split a string by a delimiter that you define as a regex pattern. In addition, its flags argument accepts PREG_SPLIT_DELIM_CAPTURE, which includes the matched delimiter in the split results, provided the delimiter is wrapped in a capturing group.
So, instead of writing a regex to match the full text, the regex is for the delimiter itself.
Example:
$string = "This is a t3xt with special characters like !#. *AND and this is another text with special characters *AND this repeats *OR do not repeat *OR have more strings *AND finish with this string.";
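// The capturing group around the delimiter, together with PREG_SPLIT_DELIM_CAPTURE, keeps the delimiters in the result.
$string = preg_split('~(\*(?:AND|OR))~', $string, 0, PREG_SPLIT_DELIM_CAPTURE);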
print_r($string);
Output:
Array
(
[0] => This is a t3xt with special characters like !#.
[1] => *AND
[2] => and this is another text with special characters
[3] => *AND
[4] => this repeats
[5] => *OR
[6] => do not repeat
[7] => *OR
[8] => have more strings
[9] => *AND
[10] => finish with this string.
)
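One thing the print_r output above does not make visible: the text pieces keep the spaces that surrounded each delimiter (element 0 ends with a space, element 2 starts with one, and so on). If that matters, here is a minimal sketch of a variation that lets the pattern consume the surrounding whitespace. The helper name split_on_keywords is only for illustration, and adding PREG_SPLIT_NO_EMPTY is my assumption about what you would want when a string starts or ends with a keyword.

```php
<?php
// Hypothetical helper (not a PHP built-in): splits on *AND / *OR and keeps the delimiters.
function split_on_keywords(string $text): array
{
    // The \s* parts sit outside the capturing group, so the whitespace around
    // a delimiter is consumed by the split but not returned.
    return preg_split(
        '~\s*(\*(?:AND|OR))\s*~',
        $text,
        -1,
        PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY
    );
}

// The short example from the question, with a single *AND.
$parts = split_on_keywords('This is a t3xt with special characters like !#. *AND and this is another text with special characters');
print_r($parts);
```

Because the pattern only describes the delimiter, the same call handles a string with one keyword or with dozens of them.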
But if you really want to stick with preg_match, you will need to use preg_match_all instead, which is similar to preg_match (what you tagged in your question), except that it does global/repeated matches.
Example:
$string = "This is a t3xt with special characters like !#. *AND and this is another text with special characters *AND this repeats *OR do not repeat *OR have more strings *AND finish with this string.";
// Match either a run of characters that does not start a delimiter, or a delimiter itself.
preg_match_all('~(?:(?:(?!\*(?:AND|OR)).)+)|(?:\*(?:AND|OR))~', $string, $matches);
print_r($matches);
Output:
Array
(
[0] => Array
(
[0] => This is a t3xt with special characters like !#.
[1] => *AND
[2] => and this is another text with special characters
[3] => *AND
[4] => this repeats
[5] => *OR
[6] => do not repeat
[7] => *OR
[8] => have more strings
[9] => *AND
[10] => finish with this string.
)
)
First, note that unlike preg_split, which returns a flat array, preg_match_all fills $matches with a multidimensional array, so the list you want is in $matches[0]. Second, the pattern I used could technically be simplified a bit, but the cost would be that the matched text and the matched delimiters end up in separate sub-arrays, which you would then have to loop through and interleave. In other words, there would be additional cleanup to get a final single array containing both sets of matches, as above.
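To make that trade-off concrete, here is a rough sketch of what the simpler pattern and its cleanup loop might look like. The pattern, the shortened sample string, and the $result variable are my own illustration, and the sketch assumes single-line input in which some text precedes every delimiter.

```php
<?php
$string = 'this repeats *OR do not repeat *AND finish with this string.';

// Simplified pattern: group 1 is a run of text, group 2 is the delimiter that
// ends it (or the end of the string for the last piece).
preg_match_all('~(.+?)(\*(?:AND|OR)|$)~', $string, $matches);

// $matches[1] holds the text pieces and $matches[2] the delimiters,
// so the two sub-arrays have to be interleaved by hand.
$result = [];
foreach ($matches[1] as $i => $text) {
    $result[] = trim($text);
    if ($matches[2][$i] !== '') { // the last piece ends at $, not at a delimiter
        $result[] = $matches[2][$i];
    }
}
print_r($result);
```

The pattern is shorter, but the loop is exactly the extra bookkeeping that preg_split with PREG_SPLIT_DELIM_CAPTURE does for you.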
I only show this method because you technically asked for it in your question, but I recommend using preg_split, as it takes away a lot of this overhead; that is why it was created in the first place (to better solve scenarios like this).