Tags: php, regex, preg-replace, regex-lookarounds, lookbehind

Use String for Pattern but Exclude it from Being Removed


I'm pretty new to regex. I have picked up a few things along the way, but my knowledge is still poor.

So I'd like to ask for clarification on how it works.

Assume I have the following strings. As you can see, they can be formatted slightly differently from one another, but they are very similar:

DTSTART;TZID="America/Chicago":20030819T000000
DTEND;TZID="America/Chicago":20030819T010000
DTSTART;TZID=US/Pacific
DTSTART;VALUE=DATE

Now I want to remove everything between the first A-Z block and the colon, so that I would keep, for example:

DTSTART:20030819T000000
DTEND:20030819T010000
DTSTART
DTSTART

So, with my very limited knowledge, I have worked out this rough regex:

preg_replace( '/^[A-Z](?!;[A-Z]=[\w\W]+):$/m' , '' , $data );

But I'm sure this regex will not work. Why?

Please help me!

PS: As the title of the question explains, I also want to know how to use a well-known string block to match another part without removing it, for example:

preg_replace( '/^[DTSTART](?!;[A-Z]=[\w\W]+):$/m' , '' , $data );

...without deleting DTSTART.

Thanks for your time!

Regards, Luca Filosofi


Solution

  • You could use a relatively simple regex like the following.

    $subject = 'DTSTART;TZID="America/Chicago":20030819T000000
    DTEND;TZID="America/Chicago":20030819T010000
    DTSTART;TZID=US/Pacific
    DTSTART;VALUE=DATE';
    
    echo preg_replace('/^[A-Z]+\K[^:\n]*/m', '', $subject) . PHP_EOL;
    

    It looks for a series of capital letters at the start of a line, resets the match starting point to the end of those letters (that's what \K does), and then matches anything that is not a colon or a newline (i.e. the parts you want to remove). Those matched parts are then replaced with an empty string, so the leading letters themselves are left untouched.

    The output from the above would be

    DTSTART:20030819T000000
    DTEND:20030819T010000
    DTSTART
    DTSTART
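
    If \K is unfamiliar, roughly the same result can be had with a capturing group that is written back in the replacement. This is a minimal sketch, assuming the same four sample lines as above; the variable name $result is only for illustration.

    ```php
    <?php
    // Same sample data as in the answer, joined with newlines.
    $subject = implode("\n", [
        'DTSTART;TZID="America/Chicago":20030819T000000',
        'DTEND;TZID="America/Chicago":20030819T010000',
        'DTSTART;TZID=US/Pacific',
        'DTSTART;VALUE=DATE',
    ]);

    // ([A-Z]+)  captures the leading capital letters (the part to keep).
    // [^:\n]*   consumes everything up to the colon or the end of the line.
    // '$1'      puts the captured letters back, so only the parameters vanish.
    $result = preg_replace('/^([A-Z]+)[^:\n]*/m', '$1', $subject);

    echo $result . PHP_EOL;
    ```

    Here the whole prefix is matched and then partly restored, whereas \K keeps the prefix out of the match in the first place. For this input both approaches should print the same four lines.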
    

    If the lines that you are interested in will only ever start with DTSTART or DTEND, then we could be more precise about what to match (e.g. ^DT(?:START|END)), but [A-Z]+ covers both of those.
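
    As a sketch of that narrower variant: the pattern below only touches lines that begin with DTSTART or DTEND. The SUMMARY line is a made-up sample added purely to show that other lines are left alone. Note that a literal keyword is written as plain text or as a group; [DTSTART] from the question is a character class and matches just one character out of D, T, S, A and R.

    ```php
    <?php
    // Two lines from the question plus one hypothetical line that must not change.
    $subject = implode("\n", [
        'DTSTART;TZID="America/Chicago":20030819T000000',
        'DTEND;TZID="America/Chicago":20030819T010000',
        'SUMMARY;LANGUAGE=en:Team meeting', // hypothetical sample, not from the question
    ]);

    // ^DT(?:START|END)  matches the literal keyword DTSTART or DTEND at line start.
    // \K                drops the keyword from the match, so it is not replaced.
    // [^:\n]*           matches the parameters up to the colon or end of line.
    $result = preg_replace('/^DT(?:START|END)\K[^:\n]*/m', '', $subject);

    echo $result . PHP_EOL;
    ```

    Only the first two lines are shortened; the third does not start with either keyword, so the pattern never matches on it.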