php regex expression preg-match cpu-word

Regex: capture everything except the last word, then everything up to a given text

I have data in this format:

  1  DOPPEL TYP I MEERBLICK           HALBPENSION
 FRÜHBUCHER 20%
 INKL. REISELEITUNG UND TRANSFER AB/BIS
 FLUGHAFEN
 KEIN INFO-TREFFEN IM HOTEL! REISELEITUNG  IST TELEFONISCH ZU ERREICHEN UND AUF  ANFRAGE
 F367655  HERR WILKAT, CHRISTINE                           O 05.01.15
 F367655  HERR LEBEDIES, HANS-JOACHIM                      O 05.01.15

I want to capture 1, DOPPEL TYP I MEERBLICK, and all the text from "FRÜHBUCHER..." to "ANFRAGE" (so F367655 is the end delimiter) as separate matches. However, the regex I have captures 1 and HALBPENSION:

$re = "/\\s(\\d{1})(\\w+\\W{1,2})*/"; 
$str = " 1  DOPPEL TYP I MEERBLICK           HALBPENSION\n FRÜHBUCHER 20%\n INKL. REISELEITUNG UND TRANSFER AB/BIS\n FLUGHAFEN\n KEIN INFO-TREFFEN IM HOTEL! REISELEITUNG  IST TELEFONISCH ZU ERREICHEN UND AUF  ANFRAGE\n F367655  HERR WILKAT, CHRISTINE                           O 05.01.15\n F367655  HERR LEBEDIES, HANS-JOACHIM                      O 05.01.15"; 

preg_match_all($re, $str, $matches);

I am testing here: Regex101

So instead of capturing the last word ("HALBPENSION"), I want to capture everything except the last word. I also want what comes after HALBPENSION (which may be a different word) and before something like F367655 ("FRÜHBUCHER 20% INKL. REISELEITUNG UND TRANSFER AB/BIS FLUGHAFEN KEIN INFO-TREFFEN IM HOTEL! REISELEITUNG IST TELEFONISCH ZU ERREICHEN UND AUF ANFRAGE").

I have tried several solutions, but I cannot get it to work.

Thank you in advance, for your help!

Solution

You can capture all three values with a single preg_match call using the following pattern:

 '~^\s*(\d+)\s*(.*\S) .*\R((?s:.*?))\R\h*F\d{6}~um'

See the regex demo

Details:

^ - start of a line (the m modifier makes ^ match at the start of every line, not only at the start of the string)
\s* - 0+ leading whitespaces
(\d+) - Group 1 capturing 1+ digits
\s* - 0+ whitespaces
(.*\S) - Group 2 capturing 0+ any chars but a newline, as many as possible, up to and including the last non-whitespace char that is followed by
(space) - 1 literal space (not inside Group 2); \h could be used instead to accept any horizontal whitespace
.* - the rest of the line
\R - a line break
((?s:.*?)) - Group 3 capturing 0+ any characters as few as possible up to the first
\R\h*F\d{6} - linebreak, 0+ horizontal whitespaces, F and 6 digits.
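
To make the breakdown above easier to follow, here is a minimal sketch of the same pattern written with the x (extended) modifier, so each part carries its own comment. The group names count, room and notes, and the shortened sample string, are assumptions made for illustration and are not part of the original data or answer. In x mode a literal space has to be written as [ ].

```php
<?php
// Hypothetical sample input, shortened from the data in the question.
$str = " 1  DOPPEL TYP I MEERBLICK           HALBPENSION\n FRÜHBUCHER 20%\n FLUGHAFEN\n F367655  HERR WILKAT, CHRISTINE";

// Same structure as the pattern above; names are illustrative only.
$re = '~
    ^\s*(?<count>\d+)      # Group 1: leading number
    \s*(?<room>.*\S)       # Group 2: rest of the line, without the last word
    [ ].*\R                # one space, the last word, and the line break
    (?<notes>(?s:.*?))     # Group 3: following lines, as few as possible
    \R\h*F\d{6}            # stop before a line starting with F and 6 digits
~umx';

if (preg_match($re, $str, $m)) {
    echo $m['count'], "\n";
    echo $m['room'], "\n";
    echo $m['notes'], "\n";
}
```

Note that the line starting with FLUGHAFEN does not end Group 3, because F must be followed by six digits.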

See PHP demo:

$str = " 1  DOPPEL TYP I MEERBLICK           HALBPENSION\n FRÜHBUCHER 20%\n INKL. REISELEITUNG UND TRANSFER AB/BIS\n FLUGHAFEN\n KEIN INFO-TREFFEN IM HOTEL! REISELEITUNG  IST TELEFONISCH ZU ERREICHEN UND AUF  ANFRAGE\n F367655  HERR WILKAT, CHRISTINE                           O 05.01.15\n F367655  HERR LEBEDIES, HANS-JOACHIM                      O 05.01.15"; 
preg_match('~^\s*(\d+)\s*(.*\S) .*\R((?s:.*?))\R\h*F\d{6}~um', $str, $m);
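array_shift($m); // drop the full match so only the three captured groups remain
print_r($m);     // $m[0] is the number, $m[1] the room text, $m[2] the multi-line block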
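
Group 3 keeps the original line breaks and repeated spaces. If the single-line form shown in the question is wanted, one possible follow-up step is to collapse the whitespace after matching. This is a sketch under that assumption; the variable name $notes is made up for the example.

```php
<?php
$str = " 1  DOPPEL TYP I MEERBLICK           HALBPENSION\n FRÜHBUCHER 20%\n INKL. REISELEITUNG UND TRANSFER AB/BIS\n FLUGHAFEN\n KEIN INFO-TREFFEN IM HOTEL! REISELEITUNG  IST TELEFONISCH ZU ERREICHEN UND AUF  ANFRAGE\n F367655  HERR WILKAT, CHRISTINE                           O 05.01.15\n F367655  HERR LEBEDIES, HANS-JOACHIM                      O 05.01.15";
$re = '~^\s*(\d+)\s*(.*\S) .*\R((?s:.*?))\R\h*F\d{6}~um';

$notes = null;
// preg_match returns 1 on a match, 0 on no match, false on error.
if (preg_match($re, $str, $m) === 1) {
    // Replace every run of whitespace (including line breaks) with one space.
    $notes = trim(preg_replace('~\s+~u', ' ', $m[3]));
    echo $notes, "\n";
} else {
    echo "No match\n";
}
```

Checking the return value before reading $m avoids undefined-index notices when the input does not follow the expected layout.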