
preg_split with an array of delimiters does not split on the comma


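I have an array of rows, each containing a string that lists one or more artists. I need to split each string into individual artist names so that I can then search for a given value.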

$artista_inserito = 'DEN HARROW';
$tutti_artisti_data_ora = [
    ['time_artisti' => '18:31:00', 'artista_artisti' => 'LUIS RODRIGUEZ & DEN HARROW', 'data_artisti' => '2020-04-09'],
    ['time_artisti' => '18:32:00', 'artista_artisti' => 'J BALVIN', 'data_artisti' => '2020-04-09'],
    ['time_artisti' => '18:33:00', 'artista_artisti' => 'THE BLACK EYED PEAS VS. J BALVIN', 'data_artisti' => '2020-04-08'],
    ['time_artisti' => '18:34:00', 'artista_artisti' => 'THE BLACK EYED PEAS FT J BALVIN', 'data_artisti' => '2020-04-09'],
    ['time_artisti' => '18:35:00', 'artista_artisti' => 'J BALVIN, DEN HARROW', 'data_artisti' => '2020-04-09'],
];
// list of delimiters
$databaseDelimiters = array('FEAT', 'feat', 'FT', 'ft', '+', 'AND', 'and', 'E', 'e', 'VS', 'vs', 'FEAT.', 'feat.', 'FT.', 'ft.', 'VS.', 'vs.', ',', '&', 'X', 'x', ', ', ',');

$artistDelimiters = '~ (?:' . implode('|', array_map(function ($v) {
    return preg_quote($v, '~');
}, $databaseDelimiters)) . ') ~';

$artists = array_flip(preg_split($artistDelimiters, $artista_inserito));
$result = [];
$autore_duplicato_stringa = '';
foreach ($tutti_artisti_data_ora as $row) {
    foreach (preg_split($artistDelimiters, $row['artista_artisti']) as $artist) {
// print the output with every artist
        echo $artist . '<br>';
    }
}

At the moment the output is each artista_artisti value split by the delimiters:

LUIS RODRIGUEZ
DEN HARROW
J BALVIN
THE BLACK EYED PEAS
J BALVIN
THE BLACK EYED PEAS
J BALVIN
J BALVIN, DEN HARROW

What's wrong? The last row should be split into:

J BALVIN
DEN HARROW

Why is the comma not recognized? Thanks.


Solution

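  • The literal spaces just inside the regex ~ delimiters are the problem. The pattern ~ (?:...) ~ only matches a delimiter that has a space on both sides, and in 'J BALVIN, DEN HARROW' there is no space before the comma, so the comma never matches. Instead, put spaces around the individual delimiter terms that need them, and remove the spaces next to the outer ~.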

    // Put spaces only where needed
    $databaseDelimiters = array(' FEAT ',  ' feat ', ' FT ', ' ft ', ' + ', ' AND ', ' and ', ' E ', ' e ', ' VS ', ' vs ', ' FEAT. ', ' feat. ', ' FT. ',  ' ft. ', ' VS. ', ' vs. ', ',', '&', ' X ', ' x ', ', ', ',');
    
    // Remove the outer spaces from the pattern string
    $artistDelimiters = '~(?:' . implode('|', array_map(function ($v) {
    //-------------------^^^
        return preg_quote($v, '~');
    }, $databaseDelimiters)) . ')~';
    //--------------------------^^^
    

    This produces output like:

    LUIS RODRIGUEZ <br> DEN HARROW<br>J BALVIN<br>THE BLACK EYED PEAS<br>J BALVIN<br>THE BLACK EYED PEAS<br>J BALVIN<br>J BALVIN<br> DEN HARROW<br>
    

    Values split on & or , keep their surrounding spaces (for example ' DEN HARROW'), so you can trim() the individual values before appending the <br> if necessary.
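
As a possible variation on the approach above, the spacing rules can be generated instead of typed into every delimiter. This is only a sketch under stated assumptions: splitArtists() is a hypothetical helper name that does not appear in the original answer, and it assumes that delimiters starting with a letter (FEAT, VS., E, X) must have whitespace on both sides, while symbols (',', '&', '+') may have optional whitespace around them. The results are also passed through trim().

```php
<?php
// Hypothetical helper (an assumption, not part of the original answer).
function splitArtists(string $names, array $delimiters): array
{
    $alternatives = [];
    // trim() + array_unique() collapse ', ' and ',' into a single entry.
    foreach (array_unique(array_map('trim', $delimiters)) as $delimiter) {
        if ($delimiter === '') {
            continue; // skip entries that were only whitespace
        }
        $quoted = preg_quote($delimiter, '~');
        // Word-like delimiters need whitespace on both sides, otherwise
        // 'E' would split inside 'DEN'. Symbols allow optional whitespace.
        $alternatives[] = preg_match('~^[A-Za-z]~', $delimiter) === 1
            ? '\s+' . $quoted . '\s+'
            : '\s*' . $quoted . '\s*';
    }
    $pattern = '~(?:' . implode('|', $alternatives) . ')~';

    // PREG_SPLIT_NO_EMPTY drops empty pieces; trim() removes stray edge spaces.
    return array_map('trim', preg_split($pattern, $names, -1, PREG_SPLIT_NO_EMPTY));
}

$databaseDelimiters = array('FEAT', 'feat', 'FT', 'ft', '+', 'AND', 'and', 'E', 'e', 'VS', 'vs', 'FEAT.', 'feat.', 'FT.', 'ft.', 'VS.', 'vs.', ',', '&', 'X', 'x', ', ', ',');

foreach (['LUIS RODRIGUEZ & DEN HARROW', 'J BALVIN, DEN HARROW'] as $names) {
    echo implode('<br>', splitArtists($names, $databaseDelimiters)) . '<br>';
}
// Prints: LUIS RODRIGUEZ<br>DEN HARROW<br>J BALVIN<br>DEN HARROW<br>
```

With this sketch the original, unpadded $databaseDelimiters list from the question can be used as-is, because the whitespace is added while the pattern is built.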