Tags: linux, bash, amazon-web-services, amazon-s3, nfs

Sync files from Amazon S3 to local


I have an Amazon S3 bucket with multiple directories. These directories contain all sorts of important files.

I want to back up my S3 bucket and save it to a NAS server on my local network. I have written a bash script which runs once every day.

The script's most important part is this line:

sudo aws s3 sync s3://$s3bucket/$s3folder $localpath --size-only >> $LOG_DIR/$LOG_FILE
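For context, the rest of the script just sets these variables before the sync runs; the values below are placeholders, not my real bucket name or paths:

#!/bin/bash
# Placeholder values - the real script points these at my actual bucket,
# source folder, NAS mount point and log location.
s3bucket="my-bucket"
s3folder="subfolder"
localpath="/mnt/nas/backup/subfolder"
LOG_DIR="/var/log/s3-backup"
LOG_FILE="sync.log"

# Mirror the S3 folder onto the NAS mount, comparing files by size only,
# and append the output to the log.
sudo aws s3 sync s3://$s3bucket/$s3folder $localpath --size-only >> $LOG_DIR/$LOG_FILE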

I am certain that all variables are set and correct. The files are copied to the right locations, but for one subfolder I get this error:

warning: Skipping file /fullPathToLocalLocation/bucket/subfolder. File/Directory is not readable.

The permissions for this folder are exactly the same as for the other subfolders.

When I execute this command:

ls -l

I get this error:

ls: reading directory .: Too many levels of symbolic links

And when I compare the output of:

ls -l | wc -l

with the output of:

ls -1 | sort | uniq | wc -l

the result is different (309774 vs. 309772). The console also displays this error:

ls: reading directory .: Too many levels of symbolic links
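To see which names are actually duplicated in the listing, rather than just counting lines, a quick check like this should print only the entries that appear more than once (uniq -d keeps only duplicated lines):

ls -1 | sort | uniq -d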

I've also checked the output of

dmesg | tail

and it had this error:

[11823.884616] NFS: directory bucket/subfolder contains a readdir loop. Please contact your server vendor.  The file: randomfilename.pdf has duplicate cookie 124416205
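To collect every file the kernel has reported with a duplicate cookie, not just the last few lines that dmesg | tail shows, the whole kernel log can be filtered; the exact wording of the message may differ between kernel versions:

dmesg | grep 'duplicate cookie'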

I've deleted the file with the duplicate cookie from my NAS and retried the sync between my S3 bucket and my local NAS; this time it worked. But the second time I tried, it displayed the same error as before.

Is it possible that I have the same file twice on S3, once with the extension in uppercase and once with it in lowercase, and that this causes the issue?

Every time this sync error occurs, it re-downloads the whole subfolder from S3 instead of just syncing it. I only noticed this after a couple of days, by which time it had already downloaded 2 TB from Amazon by continuously overwriting the files on my local NAS. Because of this, I received a very high bill from Amazon.
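In hindsight, a dry run would have shown the repeated transfers without actually downloading anything; aws s3 sync supports a --dryrun flag that only prints what it would copy:

aws s3 sync s3://$s3bucket/$s3folder $localpath --size-only --dryrun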

Has anyone else experienced an issue like this or knows a way to solve it? An obvious solution might be to delete the files causing this issue from S3 itself, but I can't seem to list more than one file per filename using the S3 command line tools or Cyberduck. Maybe they only show one file when there are multiple with the same name.
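One way I could try to check for keys that differ only in case is to list everything recursively and compare the names case-insensitively; this is only a rough check and assumes the key names contain no spaces (awk prints just the fourth column):

aws s3 ls s3://$s3bucket/$s3folder --recursive | awk '{print $4}' | tr '[:upper:]' '[:lower:]' | sort | uniq -d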

Thank you for reading this till the end.


Solution

  • You might be running in to this issue: https://bugzilla.kernel.org/show_bug.cgi?id=46671

    It's a problem with NFS, not AWS. Try running your sync script directly on the NAS to see if that resolves the problem.

    Apparently this problem has been fixed with ext4 on newer Linux kernels - you might be able to update your NAS to pick up this fix.
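    A quick way to see what kernel the NAS is currently running before deciding whether an update is needed (the user and hostname here are placeholders for your own NAS):

    ssh admin@nas 'uname -r'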