
django collectstatic with django-storages recopying all files


I am using django.contrib.staticfiles together with django-storages to deploy my static files to Amazon S3. I am on Django 1.10.4 and django-storages 1.5.2.

When I run collectstatic, it re-copies every file from the local system to S3, even when none of the local files have changed. Looking at the collectstatic management command code, I see this in the delete_file method:

    # The full path of the target file
    if self.local:
        full_path = self.storage.path(prefixed_path)
    else:
        full_path = None
    # Skip the file if the source file is younger
    # Avoid sub-second precision (see #14665, #19540)
    if (target_last_modified.replace(microsecond=0) >= source_last_modified.replace(microsecond=0) and
            full_path and not (self.symlink ^ os.path.islink(full_path))):
        if prefixed_path not in self.unmodified_files:
            self.unmodified_files.append(prefixed_path)
        self.log("Skipping '%s' (not modified)" % path)
        return False

While debugging, I saw that target_last_modified >= source_last_modified is true, but full_path is None, so the check fails and collectstatic ends up deleting the file on the remote storage and uploading it again. I am not sure what I am doing wrong, or whether I have missed a setting that causes the re-upload. Interestingly, if I remove the extra checks from the code above and keep only the timestamp comparison:

    if (target_last_modified.replace(microsecond=0) >= source_last_modified.replace(microsecond=0)):

it works fine: unchanged files are skipped instead of being re-uploaded.
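To check this reasoning outside Django, here is a minimal standalone sketch that copies the skip condition quoted from delete_file into a plain function, next to the timestamp-only version from the workaround. The function names (original_can_skip, relaxed_can_skip) and the timestamps are hypothetical; this is an illustration, not an importable Django API.

```python
import os
from datetime import datetime, timezone


def original_can_skip(target_last_modified, source_last_modified, full_path, symlink=False):
    """Hypothetical copy of the skip condition quoted from delete_file (Django 1.10)."""
    return bool(
        target_last_modified.replace(microsecond=0) >= source_last_modified.replace(microsecond=0)
        # For non-local storages full_path is None, so the chain stops here (falsy).
        and full_path
        and not (symlink ^ os.path.islink(full_path))
    )


def relaxed_can_skip(target_last_modified, source_last_modified):
    """Hypothetical version of the workaround: compare timestamps only."""
    return target_last_modified.replace(microsecond=0) >= source_last_modified.replace(microsecond=0)


# Hypothetical timestamps: the S3 copy is newer than the local source file.
source = datetime(2017, 1, 10, 12, 0, 0, 500000, tzinfo=timezone.utc)
target = datetime(2017, 1, 10, 12, 5, 0, tzinfo=timezone.utc)

print(original_can_skip(target, source, full_path=None))  # False: file is deleted and re-uploaded
print(relaxed_can_skip(target, source))                   # True: file would be skipped
```

Because full_path is None for a non-local storage such as S3, the and chain evaluates to a falsy value regardless of the timestamp result, which is consistent with what the debugger showed.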

I have seen similar questions on SO, but those are mostly caused by a timezone difference between S3 and the local system. In my case, my local timezone and the S3 bucket's zone are the same. In any case, the hack above shows that the problem is not a timezone difference: the timestamp comparison on its own already passes.
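For contrast, here is a minimal sketch of one way a timezone mismatch can break a last-modified comparison, the failure mode in those other questions. It uses plain Python datetime values with hypothetical timestamps, not Django or boto code. Timezone-aware datetimes compare by the instant they represent, so different offsets are harmless; naive datetimes recorded in different zones are compared by wall-clock value and can make a newer upload look older.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical scenario: the local file was modified at 12:00 in a UTC+05:30 zone,
# and the S3 object was uploaded 10 minutes later (timestamp reported in UTC).
local_tz = timezone(timedelta(hours=5, minutes=30))
source_aware = datetime(2017, 1, 10, 12, 0, tzinfo=local_tz)
target_aware = source_aware.astimezone(timezone.utc) + timedelta(minutes=10)

# Aware datetimes compare by instant: 06:40 UTC is after 06:30 UTC.
print(target_aware >= source_aware)  # True

# With tzinfo stripped, the raw wall-clock values are compared instead:
# 06:40 (UTC) vs 12:00 (local) makes the newer upload look older.
source_naive = source_aware.replace(tzinfo=None)
target_naive = target_aware.replace(tzinfo=None)
print(target_naive >= source_naive)  # False
```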


Solution

  • Our solution was to use Collectfast:

    https://github.com/jazzband/collectfast

    It caches MD5 checksums of the files and compares them before uploading, so unchanged files are not copied again. We would still like to know the root cause of the issue, but this resolved the slowness.
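The checksum idea can be sketched in a few lines. This is a minimal illustration of checksum-based skipping, not Collectfast's code or API: remote_checksums is a hypothetical dict standing in for a checksum cache, and needs_upload is a hypothetical helper. In a real S3 setup, the remote value might come from the object's ETag, which for simple (non-multipart) uploads is typically the MD5 of the content.

```python
import hashlib


def md5_hexdigest(data: bytes) -> str:
    """Hex MD5 digest of a file's contents."""
    return hashlib.md5(data).hexdigest()


def needs_upload(path: str, data: bytes, remote_checksums: dict) -> bool:
    """Hypothetical helper: upload only if the remote checksum is missing or differs."""
    return remote_checksums.get(path) != md5_hexdigest(data)


# Hypothetical cache of checksums for files already in the bucket.
remote_checksums = {"css/site.css": md5_hexdigest(b"body { margin: 0; }")}

print(needs_upload("css/site.css", b"body { margin: 0; }", remote_checksums))    # False: unchanged
print(needs_upload("css/site.css", b"body { margin: 1px; }", remote_checksums))  # True: content changed
print(needs_upload("js/app.js", b"console.log(1);", remote_checksums))           # True: not in bucket yet
```

Unlike the modification-time check, this comparison does not depend on full_path, symlinks, or timezones; it only asks whether the bytes differ.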