
How to switch from preg_match to preg_replace?


See the code below (from http://www.damnsemicolon.com/php/php-parse-email-body-email-piping):

//get rid of any quoted text in the email body
$body_array = explode("\n",$body);
$message = "";
foreach($body_array as $key => $value){

    //remove hotmail sig
    if($value == "_________________________________________________________________"){
        break;

    //original message quote
    } elseif(preg_match("/^-*(.*)Original Message(.*)-*/i",$value,$matches)){
        break;

    //check for date wrote string
    } elseif(preg_match("/^On(.*)wrote:(.*)/i",$value,$matches)) {
        break;

    //check for From Name email section
    } elseif(preg_match("/^On(.*)$fromName(.*)/i",$value,$matches)) {
        break;

    //check for To Name email section
    } elseif(preg_match("/^On(.*)$toName(.*)/i",$value,$matches)) {
        break;

    //check for To Email email section
    } elseif(preg_match("/^(.*)$toEmail(.*)wrote:(.*)/i",$value,$matches)) {
        break;

    //check for From Email email section
    } elseif(preg_match("/^(.*)$fromEmail(.*)wrote:(.*)/i",$value,$matches)) {
        break;

    //check for quoted ">" section
    } elseif(preg_match("/^>(.*)/i",$value,$matches)){
        break;

    //check for date wrote string with dashes
    } elseif(preg_match("/^---(.*)On(.*)wrote:(.*)/i",$value,$matches)){
        break;

    //add line to body
    } else {
        $message .= "$value\n";
    }

}

//compare before and after
echo "$body<br><br><br>$message";

$body contains the complete email body, including the quoted area if this is a reply. The loop removes the quoted area so that only the new reply is left in $message. But as suggested there, the loop is slow and it is better to use preg_replace instead. How can I do that?

What should the patterns be replaced with? Should I remove the foreach loop too? I wrote the code below without the foreach loop, but it seems wrong. Please advise.

$patterns = array(
"_________________________________________________________________",
"/^-*(.*)Original Message(.*)-*/i",
"/^On(.*)wrote:(.*)/i",
"/^On(.*)$fromName(.*)/i",
"/^On(.*)$toName(.*)/i",
"/^(.*)$toEmail(.*)wrote:(.*)/i",
"/^(.*)$fromEmail(.*)wrote:(.*)/i",
"/^>(.*)/i",
"/^---(.*)On(.*)wrote:(.*)/i");

$message = preg_replace($patterns, '', $body);

Solution

  • You already narrowed it down to a workable solution. Only a few things to fix:

    1. As @mario commented, you need to set the /m modifier so that each ^ matches at the beginning of every line.
    2. Your first pattern needs to be enclosed in delimiters, and anchored to ^ and to the end of the line, to maintain the same meaning as in the original code.
    3. Include the newline chars in order to remove the whole line.
    4. Make sure the variables $fromName, $fromEmail, etc. are set.
    5. Once you get a match, match everything from there to the end of the body with (?s:.*).
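Points 1 and 5 can be seen in isolation with a small sketch. The sample body below is made up for illustration, and only one pattern is used:

```php
<?php
// Hypothetical sample body; the text is an assumption for illustration only.
$body = "Thanks, that works.\nOn Mon, Jan 1, Bob wrote:\n> old text\n> more old text\n";

// Without /m, ^ only matches at the very start of the string.
// The body does not start with "On", so nothing is removed.
$noMultiline = preg_replace('/^On .* wrote:(?s:.*)/i', '', $body);

// With /m, ^ matches at the start of every line. (?s:.*) then lets the dot
// cross newlines, so the match runs to the end of the body.
$multiline = preg_replace('/^On .* wrote:(?s:.*)/im', '', $body);

var_dump($noMultiline === $body);                  // unchanged without /m
var_dump($multiline === "Thanks, that works.\n");  // quoted part cut with /m
```

The first `.*` has no /s, so it stays on the "On ... wrote:" line; only the inline `(?s:.*)` is allowed to span the remaining lines.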

    Code:

    $patterns = array(
        "/^_{30,}$(?s:.*)/m",
        "/^.*Original Message(?s:.*)/im",
        "/^(?:---.*)?On .* wrote:(?s:.*)/im",
        "/^On .* $fromName(?s:.*)/im",
        "/^On .* $toName(?s:.*)/im",
        "/^.*$toEmail(.*)wrote:(?s:.*)/im",
        "/^.*$fromEmail.* wrote:(?s:.*)/im",
        "/^>.*/ims",
    );
    $message = preg_replace($patterns, '', $body);
    echo "$body<br><br><br>$message";
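A self-contained variant of the same idea is sketched below. It is not part of the original answer: the function name stripQuotedText, the sample names and addresses, and the use of preg_quote are all assumptions added for illustration. preg_quote escapes regex metacharacters (such as the dots in an email address) and the "/" delimiter before the values are placed into the patterns.

```php
<?php
// Hypothetical helper: cuts the body at the first line that looks like the
// start of quoted text. Function name and parameters are assumptions.
function stripQuotedText(
    string $body,
    string $fromName,
    string $toName,
    string $fromEmail,
    string $toEmail
): string {
    // Escape user-supplied values so "." or "/" in them cannot alter the regex.
    $fromName  = preg_quote($fromName, '/');
    $toName    = preg_quote($toName, '/');
    $fromEmail = preg_quote($fromEmail, '/');
    $toEmail   = preg_quote($toEmail, '/');

    $patterns = [
        '/^_{30,}$(?s:.*)/m',                        // Hotmail-style signature rule
        '/^.*Original Message(?s:.*)/im',            // "Original Message" header
        '/^(?:---.*)?On .* wrote:(?s:.*)/im',        // "On <date> ... wrote:"
        '/^On .* ' . $fromName . '(?s:.*)/im',
        '/^On .* ' . $toName . '(?s:.*)/im',
        '/^.*' . $toEmail . '.*wrote:(?s:.*)/im',
        '/^.*' . $fromEmail . '.*wrote:(?s:.*)/im',
        '/^>(?s:.*)/m',                              // first ">" quoted line
    ];

    // preg_replace returns null on error; fall back to the untouched body.
    return preg_replace($patterns, '', $body) ?? $body;
}

// Made-up sample data.
$body = "Sounds good, see you then.\n\n"
      . "On Tue, Jan 2, 2024, Alice Example wrote:\n"
      . "> Are we still on for Friday?\n";

$message = stripQuotedText(
    $body,
    'Alice Example',
    'Bob Example',
    'alice@example.com',
    'bob@example.com'
);

var_dump($message === "Sounds good, see you then.\n\n");
```

As point 4 says, the name and email values still need to be non-empty: an empty name would turn its pattern into one that matches almost any line starting with "On ".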
    


    A word of advice:

    Take into account that it will also strip lines like:

    only thing I wrote: ...
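This kind of false positive can be checked with preg_match before trusting a pattern on real mail. The sketch below uses made-up sample lines and compares the question's "On ... wrote:" pattern with a version that requires a space after "On":

```php
<?php
// Sample lines are assumptions made up for this check.
$lines = [
    'only thing I wrote: a short note',             // ordinary sentence
    'On reflection, this is what I wrote: nothing', // ordinary sentence
    'On Mon, Jan 1, Bob wrote:',                    // typical quote header
];

$loose = '/^On(.*)wrote:(.*)/i'; // pattern from the question
$tight = '/^On .* wrote:/i';     // requires a space after "On"

foreach ($lines as $line) {
    printf(
        "%-46s loose=%d tight=%d\n",
        $line,
        preg_match($loose, $line),
        preg_match($tight, $line)
    );
}
```

The looser pattern matches all three lines. Requiring the space rules out the first line, but the second, an ordinary sentence that happens to start with "On ", still matches, so some false positives remain with either form.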