
Merging file chunks in PHP


For educational purposes, I wanted to implement a chunked file upload. How do you know when all of the chunks have been uploaded?

I tried moving the chunks out of the temp directory and renaming them so they are in the correct order, then merging them together when the last chunk arrives. However, the last chunk sent is apparently not the last chunk received. So fopen() fails on chunks that have not been created yet, and I get a final file whose size is exactly the size of the last chunk.

I believe I could send the chunks one by one using the .onload event on the xhr; that way I wouldn't even have to move them out of the PHP temp directory. But I'm wondering whether there are other solutions.

Some basic code to please you:

function upload(file) {
  var BYTES_PER_CHUNK = 2097152, // 2 MB per chunk
  size = file.size,
  NUM_CHUNKS = Math.max(Math.ceil(size / BYTES_PER_CHUNK), 1),
  start = 0, end = BYTES_PER_CHUNK, num = 1;

  var chunkUpload = function(blob) {
    var fd = new FormData();
    var xhr = new XMLHttpRequest();

    fd.append('upload', blob, file.name);
    fd.append('num', num);
    fd.append('num_chunks', NUM_CHUNKS);
    xhr.open('POST', '/somedir/upload.php', true);
    xhr.send(fd);
  }

  while (start < size) {
    chunkUpload(file.slice(start, end));
    start = end;
    end = start + BYTES_PER_CHUNK;
    num++;
  }
}
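
For comparison, the one-at-a-time variant mentioned above could be sketched roughly like this. It is only a sketch under assumptions: `uploadSequentially`, `xhrSendChunk`, and the `sendChunk(blob, num, total, done)` callback shape are hypothetical names, not part of the original code. Each chunk is sent only after the previous request has finished, so the server receives them in order.

```javascript
// Sends the chunks of `file` strictly one after another.
// `sendChunk(blob, num, total, done)` is a hypothetical transport function;
// it must call done(err) once the server has answered (err is null on success).
function uploadSequentially(file, bytesPerChunk, sendChunk, onFinish) {
  var total = Math.max(Math.ceil(file.size / bytesPerChunk), 1);
  var num = 1;

  function next(err) {
    // Stop on the first error, or when every chunk has been sent.
    if (err || num > total) {
      onFinish(err || null);
      return;
    }
    var start = (num - 1) * bytesPerChunk;
    var blob = file.slice(start, start + bytesPerChunk);
    var current = num++;
    sendChunk(blob, current, total, next);
  }

  next(null);
}

// One possible transport built on XMLHttpRequest, using the same
// form fields as the question ('upload', 'num', 'num_chunks').
function xhrSendChunk(url, fileName) {
  return function (blob, num, total, done) {
    var fd = new FormData();
    var xhr = new XMLHttpRequest();
    fd.append('upload', blob, fileName);
    fd.append('num', num);
    fd.append('num_chunks', total);
    xhr.open('POST', url, true);
    xhr.onload = function () {
      var ok = xhr.status >= 200 && xhr.status < 300;
      done(ok ? null : new Error('HTTP ' + xhr.status));
    };
    xhr.onerror = function () {
      done(new Error('network error'));
    };
    xhr.send(fd);
  };
}
```

A call might look like `uploadSequentially(file, 2097152, xhrSendChunk('/somedir/upload.php', file.name), function (err) { /* done */ });`. The trade-off is that the chunks no longer upload in parallel.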

And PHP:

$target_path = ROOT.'/upload/';

$tmp_name = $_FILES['upload']['tmp_name'];
$filename = $_FILES['upload']['name'];
$target_file = $target_path.$filename;
$num = $_POST['num'];
$num_chunks = $_POST['num_chunks'];

move_uploaded_file($tmp_name, $target_file.$num);

if ($num === $num_chunks) {
  for ($i = 1; $i <= $num_chunks; $i++) {

    $file = fopen($target_file.$i, 'rb');
    $buff = fread($file, 2097152);
    fclose($file);

    $final = fopen($target_file, 'ab');
    $write = fwrite($final, $buff);
    fclose($final);

    unlink($target_file.$i);
  }
}

Solution

  • Sorry for my previous comments, I misunderstood the question. This question is interesting and fun to play with.

    The expression you are looking for is this:

    $target_path = ROOT.'/upload/';
    
    $tmp_name = $_FILES['upload']['tmp_name'];
    $filename = $_FILES['upload']['name'];
    $target_file = $target_path.$filename;
    $num = $_POST['num'];
    $num_chunks = $_POST['num_chunks'];
    
    move_uploaded_file($tmp_name, $target_file.$num);
    
    // count the number of uploaded chunks
    $chunksUploaded = 0;
    for ( $i = 1; $i <= (int) $num_chunks; $i++ ) {
        if ( file_exists( $target_file.$i ) ) {
             ++$chunksUploaded;
        }
    }
    
    // and THAT's what you were asking for
    // when this triggers, all of your chunks have been uploaded
    // (the cast matters: $_POST values are strings, so === against an int never matches)
    if ($chunksUploaded === (int) $num_chunks) {
    
        /* here you can reassemble chunks together */
        for ($i = 1; $i <= $num_chunks; $i++) {
    
          $file = fopen($target_file.$i, 'rb');
          $buff = fread($file, 2097152);
          fclose($file);
    
          $final = fopen($target_file, 'ab');
          $write = fwrite($final, $buff);
          fclose($final);
    
          unlink($target_file.$i);
        }
    }
    

    And this must be mentioned:

    The weak point of my version shows up when you expect the files

    • 'tmp-1',

    • 'tmp-2',

    • 'tmp-3'

    but suppose the upload is interrupted after 'tmp-2' is sent: that leftover 'tmp-2' pollutes the tmp folder and will interfere with future uploads of the same filename. That is a time bomb waiting to go off.

    To counter that, you must find a way to change the 'tmp' prefix to something more unique.

    • 'tmp-ABCew-1',

    • 'tmp-ABCew-2',

    • 'tmp-ABCew-3'

    is a bit better, where 'ABCew' could be called 'chunksSessionId': you generate it randomly and provide it when sending your POST. Still, collisions are possible as the space of random names gets used up. You could add the time to the equation; for example, you can see that

    • 'tmp-ABCew-2016-03-17-00-11-22--1',

    • 'tmp-ABCew-2016-03-17-00-11-22--2',

    • 'tmp-ABCew-2016-03-17-00-11-22--3'

    is much more collision-resistant, but it is difficult to implement and opens a whole can of worms. The client's date and time are controlled by the client and can be spoofed, so this data is unreliable.

    So making the tmp name unique is a complex task. Designing a system that does it reliably is an interesting problem ^ ^ You can play with that.
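
As a rough illustration of the 'chunksSessionId' idea, the client could generate a random id once per upload and send it along with every chunk. This is only a sketch: `makeUploadId` and `chunkName` are hypothetical names that do not appear in the code above, and the server would still have to validate the id before using it in a file name, since it comes from the client.

```javascript
// Hypothetical helper: builds a per-upload identifier on the client.
// Uses crypto.randomUUID() where the environment provides it, otherwise
// falls back to a timestamp plus a random suffix (not cryptographically strong).
function makeUploadId() {
  var c = typeof crypto !== 'undefined' ? crypto : null;
  if (c && typeof c.randomUUID === 'function') {
    return c.randomUUID();
  }
  return Date.now().toString(36) + '-' + Math.random().toString(36).slice(2, 10);
}

// Hypothetical naming scheme following the 'tmp-<id>-<n>' pattern above.
function chunkName(uploadId, num) {
  return 'tmp-' + uploadId + '-' + num;
}
```

In the upload function, the id would be created once before the loop and added to every request, for example with `fd.append('upload_id', uploadId)`, where the field name `upload_id` is again just an assumption.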