
How to parse a mostly consistent filename into meaningful parts?


I have filenames like:

1234_56_78 A_FAIRLY_SHORT_TITLE_D.pdf

Luckily, the file naming is pretty consistent, but I can't absolutely guarantee that someone didn't use a space where they should have used an underscore.

With this in mind, I want to parse the string and extract the following details:

$project_no = '1234';
$series_no = '56';
$sheet_no = '78';
$revision = 'D';
$title = 'A Fairly Short Title';

Presently, I use the following to grab this info:

$filename = $_FILES['file']['name'][$i];
$filename = preg_replace('/\\.[^.\\s]{3,4}$/', '', $filename);
$parts = preg_split('/[_ ]/', $filename);
$project_no = $parts[0];
$series_no = $parts[1];
$sheet_no = $parts[2];
$revision = end($parts);

$title is simply everything that's left over after removing $parts[0], $parts[1], $parts[2], and end($parts), but how should I express that?

I thought I might be able to use

$title = implode(' ',\array_diff_key($parts, [0,1,2,end($parts)]));

But this doesn't remove the $revision bit at the end...

$title = FLOOR AS PROPOSED D

What am I missing, and am I unnecessarily over-complicating this?


Solution

  • array_diff_key() compares the keys of its arguments, not their values. end($parts) moves the array's internal pointer to the last element and returns that element's value ('D' here), so inside [0,1,2,end($parts)] it is just a fourth value stored under key 3. It tells array_diff_key() nothing about the last key of $parts.

    The current comparison therefore behaves like

    array_diff_key([0,1,2,3,4,5,6,7], [0,1,2,'D'])
    

    which, comparing keys only, is equivalent to:

    array_diff_key([0,1,2,3,4,5,6,7], [0,1,2,3])
    

    Hence, implode() joins the values at keys 4, 5, 6 and 7: the first title word (key 3) is dropped and the revision at the last key is kept.

    To have the values of the second array treated as keys, pass it through array_flip(), which swaps keys and values, and use count($parts)-1 to name the last index:

    $title = implode(' ',\array_diff_key($parts, array_flip([0,1,2,count($parts)-1])));
    

    Demo: https://3v4l.org/J6b5r
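
As for whether this is over-complicated: a minimal sketch of a shorter route, not part of the answer above, is to take the title with array_slice() instead of diffing keys. It assumes PHP 7.1+ (for the short list syntax), a hard-coded sample filename in place of the $_FILES value, and at least five parts after splitting. The pathinfo() call and the ucwords(strtolower(...)) step are illustrative choices. The $broken and $fixed lines reproduce the behaviour described above for comparison.

```php
<?php
// Hypothetical sample input; in the question this comes from $_FILES.
$filename = '1234_56_78 A_FAIRLY_SHORT_TITLE_D.pdf';

// Drop the extension, then split on any run of underscores or spaces.
$basename = pathinfo($filename, PATHINFO_FILENAME);
$parts = preg_split('/[_ ]+/', $basename);

// Original attempt: the second array has keys 0..3, so key 3 (the first
// title word) is removed and the last element (the revision) survives.
$broken = implode(' ', array_diff_key($parts, [0, 1, 2, end($parts)]));

// Fix from the answer: array_flip() turns the listed positions into keys.
$fixed = implode(' ', array_diff_key($parts, array_flip([0, 1, 2, count($parts) - 1])));

// Alternative sketch: the first three parts are the numbers, the last part
// is the revision, and everything in between is the title.
[$project_no, $series_no, $sheet_no] = $parts;
$revision = end($parts);
$title = ucwords(strtolower(implode(' ', array_slice($parts, 3, -1))));

echo $title, PHP_EOL;
```

array_slice($parts, 3, -1) starts at index 3 and stops one element before the end, so it works for any title length without computing keys. If a filename might have fewer than five parts, checking count($parts) first would avoid an empty title or undefined values.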