
Removing line breaks between 2 different character sequences


I'm editing a CSV file that contains hidden line breaks. When I apply the following PHP script, the line breaks are removed from the entire file.

$csvFileNew = str_replace(array("\r", "\n"), '', $csvFileOld);

But I only want to remove these line breaks from within certain sections of the file - between all occurrences of the following (enclosed in single quotes):

$start = ',""';
$end = '",';

My current script looks like this:

$csvFileOld = "pre text appears here....,\"\"key words appear here\"\",....post text appears here";
//line break always appears between the final 2 quotation marks

$csvFileNew = preg_replace_callback('`,""([^"]*)",`', function($matches)
    {
        return str_replace(array("\r", "\n"), '', $matches[1]);

    },$csvFileOld);

Unfortunately, this script doesn't remove the line break - I'm assuming the regex I've used doesn't grab enough. Can anyone suggest an elegant solution?

I know it won't be possible for answers to include a working example because of the line break; I'm really just after a solution that grabs the correct content between the delimiters.


Solution

  • You can use

    <?php
    
    $csvFileOld = "pre text appears here....,\"\"key\n\n words\r\n appear\r\n here\"\",....post text appears here";
    //line break always appears between the final 2 quotation marks
    
    $csvFileNew = preg_replace_callback('`,""(.*?)",`s', function($matches)
    {
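        // Return the whole match ($matches[0]) so the ,"" and ", delimiters stay in the output;
        // returning only $matches[1] would strip them from the CSV.
        return str_replace(array("\r", "\n"), '', $matches[0]);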
    },$csvFileOld);
    echo $csvFileNew;
    

    See the PHP demo.

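    The ,""(.*?)", regex now matches from ,"" up to the first occurrence of the ", substring, because .*? is lazy. The original [^"]* could not get that far: it stops at the first quotation mark, and when a line break rather than a comma follows that quotation mark, the match fails.

    The s modifier (PCRE_DOTALL) is added so that . also matches line break characters, which lets the match span several lines.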
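
The difference between the two patterns can be checked with a short sketch. The sample string and the helper name stripBreaksBetween are assumptions made for illustration and do not come from the original post. The helper returns the full match so that the delimiters stay in place, and it leaves line breaks outside the delimited sections untouched.

```php
<?php

// Hypothetical helper: removes \r and \n only inside sections that
// start with ,"" and end with the next ", substring.
function stripBreaksBetween(string $csv): string
{
    $result = preg_replace_callback(
        '`,""(.*?)",`s',                 // s: let . match line breaks too
        function (array $m): string {
            // $m[0] is the whole match, delimiters included
            return str_replace(["\r", "\n"], '', $m[0]);
        },
        $csv
    );

    // preg_replace_callback() returns null on error; fall back to the input
    return $result ?? $csv;
}

// Assumed sample: a line break sits between the two closing quotation marks.
$sample = "pre,\"\"key\r\nwords\"\n\",post";

// The original pattern finds nothing: after the first closing quotation mark
// comes "\n", not the comma the pattern requires.
var_dump(preg_match('`,""([^"]*)",`', $sample));   // int(0)

echo stripBreaksBetween($sample), "\n";            // pre,""keywords"",post
```

If the delimiters ever need to change, passing them through preg_quote() before building the pattern would keep special characters from being read as regex syntax.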