How to optimize an algorithm for matching multipart RAR files from input in PHP


I'm looking for a more efficient way to find and group multipart archives in an array of filenames.

For example, given this input:

array(
books.part1.rar,
books.part3.rar,
00000114.rar,
svoy_20ostrov.rar,
svoy_20ostrov.rar,
koncert_20v_20dk_20mir.rar,
koncert_20v_20centralnom_20teatre_20kukol.rar,
LP_LIVE_PR_Tampa.part2.rar,
koncert_20v_20dk_20vami.rar,
koncert_20v_20dk_20kommuna_20chast1.rar,
books.part2.rar,
koncert_20v_20dk_20kommuna_20chast2.rar,
books.part4.rar,
recedivist.rar,
LP_LIVE_PR_Tampa.part1.rar
)

I'd like to get this output:

array(
    array(
        books.part1.rar,
        books.part2.rar,
        books.part3.rar,
        books.part4.rar
    ),
    00000114.rar,
    svoy_20ostrov.rar,
    koncert_20v_20dk_20mir.rar,
    koncert_20v_20centralnom_20teatre_20kukol.rar,
    koncert_20v_20dk_20vami.rar,
    array(
        koncert_20v_20dk_20kommuna_20chast1.rar,
        koncert_20v_20dk_20kommuna_20chast2.rar
    ),
    recedivist.rar,
    array(
        LP_LIVE_PR_Tampa.part1.rar,
        LP_LIVE_PR_Tampa.part2.rar
    )
)

I'm using PHP, by the way.

One idea was to match filenames like (.+).part1.rar with a regular expression. When one is found, a second foreach loops through the whole array to match the other part([0-9]+).rar files, unset()s those entries, and adds them to the newly constructed array.
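
The regular-expression idea above can avoid the nested loop by collecting parts in an associative array keyed by base name, so the input is scanned only once. The following is a minimal sketch, not a tested production solution. The function name group_multipart is hypothetical, and the pattern assumes parts are named either "<base>.partN.rar" or "<base>chastN.rar" (the latter inferred from the expected output). Duplicate filenames are dropped, as in the expected output.

```php
<?php
// Hypothetical helper: groups multipart archive names in a single pass.
// Assumption: parts look like "<base>.partN.rar" or "<base>chastN.rar".
function group_multipart(array $files): array
{
    $groups = []; // base name => [part number => filename]
    $order  = []; // remembers where each group or single file first appeared

    foreach (array_unique($files) as $file) {
        if (preg_match('/^(.+?)(?:\.part|chast)(\d+)\.rar$/i', $file, $m)) {
            $base = $m[1];
            if (!isset($groups[$base])) {
                $groups[$base] = [];
                $order[] = ['group', $base];
            }
            $groups[$base][(int) $m[2]] = $file;
        } else {
            $order[] = ['single', $file];
        }
    }

    $result = [];
    foreach ($order as [$kind, $value]) {
        if ($kind === 'single') {
            $result[] = $value;
        } else {
            ksort($groups[$value]); // order the parts by their number
            $result[] = array_values($groups[$value]);
        }
    }
    return $result;
}
```

Groups appear at the position of their first part in the input, and the parts inside each group are sorted numerically, so part10 follows part9 rather than part1.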


Solution

  • I would sort the array first and then loop through it, comparing each entry with the next one using the levenshtein() function. Entries that differ by a single character go into the same group.

    $rars = array(
        'books.part1.rar',
        'books.part3.rar',
        '00000114.rar',
        'svoy_20ostrov.rar',
        'svoy_20ostrov.rar',
        'koncert_20v_20dk_20mir.rar',
        'koncert_20v_20centralnom_20teatre_20kukol.rar',
        'LP_LIVE_PR_Tampa.part2.rar',
        'koncert_20v_20dk_20vami.rar',
        'koncert_20v_20dk_20kommuna_20chast1.rar',
        'books.part2.rar',
        'koncert_20v_20dk_20kommuna_20chast2.rar',
        'books.part4.rar',
        'recedivist.rar',
        'LP_LIVE_PR_Tampa.part1.rar'
    );
    
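    // Sorting places the parts of one archive next to each other.
    sort($rars);
    $current = 0;
    $rars_complete = array();
    $count = count($rars);
    foreach ($rars as $i => $rar) {
        $rars_complete[$current][] = $rar;
        // Stay in the current group while the next name differs by exactly one character.
        $hasNext = ($i + 1) < $count;
        if ($hasNext && levenshtein($rar, $rars[$i + 1]) == 1) {
            continue;
        }
        $current++;
    }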
    

    Note, this is not tested.
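
One caveat with comparing neighbours by edit distance: it relies on consecutive part names differing by exactly one character. The short check below is an illustration with made-up filenames, not part of the original answer. It shows that the distance grows to 2 between part9 and part10, and that plain string sorting places part10 before part2. natsort() is one possible way to get numeric ordering if archives with ten or more parts are expected.

```php
<?php
// Illustration only: the filenames below are made up for this check.
$d1 = levenshtein('books.part1.rar', 'books.part2.rar');  // one substitution
$d2 = levenshtein('books.part9.rar', 'books.part10.rar'); // substitution plus insertion

$parts = ['books.part2.rar', 'books.part10.rar', 'books.part1.rar'];
sort($parts);                      // plain string order: part1, part10, part2

$natural = $parts;
natsort($natural);                 // natural order: part1, part2, part10
$natural = array_values($natural); // natsort() keeps keys, so reindex

var_dump($d1, $d2, $parts, $natural);
```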