Tags: perl, google-cloud-storage, corruption, cp, gsutil

gsutil cp: concurrent execution leads to local file corruption


I have a Perl script which calls 'gsutil cp' to copy a selected file from GCS to a local folder:

$cmd = "[bin-path]/gsutil cp -n gs://[gcs-file-path] [local-folder]";
$output = `$cmd 2>&1`;

The script is called via HTTP and hence can be initiated multiple times (e.g. by double-clicking on a link). When this happens, the local file can end up being exactly double the correct size, and hence obviously corrupt. Three things appear odd:

  1. gsutil seems not to be locking the local file while it is writing to it, allowing another thread (in this case another instance of gsutil) to write to the same file.

  2. The '-n' (no-clobber) flag seems to have no effect. I would have expected it to prevent the second instance of gsutil from attempting the copy action.

  3. The MD5 signature check is failing: normally gsutil deletes the target file if there is a signature mismatch, but this is clearly not always happening.

The files in question are larger than 2MB (typically around 5MB) so there may be some interaction with the automated resume feature. The Perl script only calls gsutil if the local file does not already exist, but this doesn't catch a double-click (because of the time lag for the GCS transfer authentication).

gsutil version: 3.42 on FreeBSD 8.2

Anyone experiencing a similar problem? Anyone with any insights?

Edward Leigh


Solution

  • 1) You're right, I don't see a lock in the source.

    2) This can be caused by a race condition: Process 1 checks and sees the file is not there; Process 2 checks and sees the file is not there; then Process 1 begins the copy, and Process 2 begins the copy as well. The docs say the '-n' check is an existence test performed before the actual transfer -- that check is not atomic with the transfer itself, so both processes can pass it.

    3) No input on this.
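The race in point 2 is easy to reproduce outside gsutil. Below is a minimal shell sketch (paths are hypothetical, and the sleep merely widens the race window): two processes both pass the existence check before either writes, so both writes land and the file ends up exactly double the expected size -- matching the corruption described in the question.

```shell
#!/bin/sh
# Check-then-act race demo: both processes pass the "does the
# file exist?" test before either has written anything.
TARGET="/tmp/race-demo.out"        # hypothetical path
rm -f "$TARGET"

copy_if_missing() {
    if [ ! -e "$TARGET" ]; then    # check ...
        sleep 1                    # ... window in which the other process also checks
        printf 'AAAA' >> "$TARGET" # ... act: both appends land
    fi
}

copy_if_missing &
copy_if_missing &
wait
wc -c < "$TARGET"    # prints 8, not 4: the file is exactly double
```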

    You can fix the issue by having your script maintain an atomic lock of some sort on the file prior to initiating the transfer - i.e. your check would be something along the lines of:

    use Lock::File qw(lockfile);

    # Non-blocking attempt: lockfile() returns undef immediately
    # if another process already holds the lock.
    if (my $lock = lockfile("$localfile.lock", { blocking => 0 })) {
        # ... perform transfer ...
        undef $lock;    # release the lock explicitly
    }
    else {
        die "Unable to retrieve $localfile, file is locked";
    }