I'm looking for a more efficient way to find and group multipart archives in an array of filenames.
For example, given this input:
array(
books.part1.rar,
books.part3.rar,
00000114.rar,
svoy_20ostrov.rar,
svoy_20ostrov.rar,
koncert_20v_20dk_20mir.rar,
koncert_20v_20centralnom_20teatre_20kukol.rar,
LP_LIVE_PR_Tampa.part2.rar,
koncert_20v_20dk_20vami.rar,
koncert_20v_20dk_20kommuna_20chast1.rar,
books.part2.rar,
koncert_20v_20dk_20kommuna_20chast2.rar,
books.part4.rar,
recedivist.rar,
LP_LIVE_PR_Tampa.part1.rar
)
I'm looking for this output:
array(
array(
books.part1.rar
books.part2.rar
books.part3.rar
books.part4.rar ) ,
00000114.rar
svoy_20ostrov.rar
koncert_20v_20dk_20mir.rar
koncert_20v_20centralnom_20teatre_20kukol.rar
koncert_20v_20dk_20vami.rar
array(
koncert_20v_20dk_20kommuna_20chast1.rar
koncert_20v_20dk_20kommuna_20chast2.rar
)
recedivist.rar
array (
LP_LIVE_PR_Tampa.part1.rar
LP_LIVE_PR_Tampa.part2.rar
)
)
I'm using PHP, by the way.
One idea was to use a regular expression to match files like (.+).part1.rar and, when one is found, match all the other part([0-9]+).rar files (which requires another foreach over the whole array), then unset() those entries and add them to the newly constructed array.
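The regular-expression idea can also be done in a single pass by using the captured base name as an array key, which avoids the nested foreach and the unset() calls. The following is only a sketch: the function name groupMultipartArchives is hypothetical, and the pattern assumes that parts are named either "<base>.partN.rar" or "<base>chastN.rar", as in the sample input. Groups come out in order of first appearance, so the ordering may differ slightly from the desired output shown above.

```php
<?php
// Hypothetical helper: groups multipart archives by their base name.
// Assumption: parts look like "<base>.partN.rar" or "<base>chastN.rar".
function groupMultipartArchives(array $files): array
{
    $groups = [];
    foreach (array_unique($files) as $file) {
        // Capture the base name and the numeric part index, if present.
        if (preg_match('/^(.+?)(?:\.part|chast)(\d+)\.rar$/i', $file, $m)) {
            // Prefixed key keeps multipart groups apart from standalone files.
            $groups['multi:' . $m[1]][(int) $m[2]] = $file;
        } else {
            $groups['single:' . $file] = $file;
        }
    }

    $result = [];
    foreach ($groups as $group) {
        if (is_array($group)) {
            ksort($group);                    // numeric order: part2 before part10
            $result[] = array_values($group);
        } else {
            $result[] = $group;               // standalone archive stays a string
        }
    }
    return $result;
}
```

Each filename is examined once, so the cost grows linearly with the size of the array, apart from sorting the parts inside each group.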
I would sort the array first and then loop through it, calling levenshtein() on each entry and the one after it.
$rars = array(
    'books.part1.rar',
    'books.part3.rar',
    '00000114.rar',
    'svoy_20ostrov.rar',
    'svoy_20ostrov.rar',
    'koncert_20v_20dk_20mir.rar',
    'koncert_20v_20centralnom_20teatre_20kukol.rar',
    'LP_LIVE_PR_Tampa.part2.rar',
    'koncert_20v_20dk_20vami.rar',
    'koncert_20v_20dk_20kommuna_20chast1.rar',
    'books.part2.rar',
    'koncert_20v_20dk_20kommuna_20chast2.rar',
    'books.part4.rar',
    'recedivist.rar',
    'LP_LIVE_PR_Tampa.part1.rar'
);
sort($rars);
$current = 0;
$rars_complete = array();
$count = count($rars);
foreach ($rars as $i => $rar) {
    // Index of the next entry, or false when $rar is the last one.
    $next = ($i + 1 < $count) ? $i + 1 : false;
    $rars_complete[$current][] = $rar;
    // Consecutive parts such as part1/part2 differ by exactly one character.
    if ($next !== false && levenshtein($rar, $rars[$next]) == 1)
        continue;
    else
        $current++;
}
Note, this is not tested.
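Two limitations of the distance check are worth keeping in mind. The distance between "books.part9.rar" and "books.part10.rar" is 2, and sort() places "part10" before "part2", so sets with ten or more parts would be split. Unrelated names that happen to differ by one character would also be grouped together. A possible variant, sketched below with the hypothetical helpers samePartFamily and groupAdjacent, keeps the same sort-then-compare-neighbours structure but uses natsort() and compares the names with their last run of digits masked out. It assumes the part number is the last number in the filename.

```php
<?php
// Hypothetical helper: true when two names differ only in their last run of digits.
function samePartFamily(string $a, string $b): bool
{
    $mask = '/\d+(?=\D*$)/';   // the last run of digits in the name
    return $a !== $b
        && preg_replace($mask, '#', $a) === preg_replace($mask, '#', $b);
}

// Hypothetical helper: groups neighbouring entries of the naturally sorted list.
function groupAdjacent(array $files): array
{
    $files = array_unique($files);
    natsort($files);                  // natural order: part2 before part10
    $files = array_values($files);    // natsort() preserves keys, so reindex

    $groups = [];
    $current = 0;
    foreach ($files as $i => $file) {
        $groups[$current][] = $file;
        // Start a new group unless the next entry belongs to the same family.
        if (!isset($files[$i + 1]) || !samePartFamily($file, $files[$i + 1])) {
            $current++;
        }
    }
    return $groups;
}
```

As with the loop above, every entry ends up inside an array, including archives that have no other parts.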