Tags: ftp, wget, lftp

Speeding up lftp mirroring with many directories


I am trying to mirror a public FTP server to a local directory. When I use wget -m {url}, wget quite quickly skips files that have already been downloaded (and for which no newer version exists). When I use lftp open -u user,pass {url}; mirror, however, lftp sends an MDTM request for every file before deciding whether to download it. With 2 million+ files in 50 thousand+ directories this is very slow; I also get error messages saying that the MDTM of directories could not be obtained.

The manual says that set sync-mode off makes lftp send all requests at once instead of waiting for each response. When I do that, I get error messages from the server saying there are too many connections from my IP address.

I tried running wget first to download only the newer files, then following up with lftp to delete the files that were removed from the FTP server. However, lftp still sends MDTM for each file, so this approach gains nothing.

If I use set ftp:use-mdtm off, then it seems that lftp just downloads all files again.

Could someone suggest the correct settings for lftp with a large number of directories/files (specifically, so that it skips directories that were not updated, as wget seems to do)?


Solution

  • Use set ftp:use-mdtm off and mirror --ignore-time for the first invocation to avoid re-downloading all the files.

    You can also try upgrading lftp and/or using set ftp:use-mlsd on; in that case lftp gets precise file modification times from the MLSD command output (provided that the server supports the command).
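
Putting the two suggestions together, a minimal sketch of the invocation might look like the following. The host, credentials, and paths are placeholders; --delete is included because the question also wants files removed on the server to be deleted locally, and the MLSD line is commented out since it only helps if the server supports that command:

```shell
# Sketch only: replace user, pass, ftp.example.com, and the paths
# with your own values before running.
lftp -u user,pass ftp.example.com <<'EOF'
# Skip per-file MDTM requests (the main source of slowness here)
set ftp:use-mdtm off
# Uncomment if the server supports MLSD; gives precise mtimes
# without per-file MDTM round trips:
# set ftp:use-mlsd on
# --ignore-time: don't re-download everything on the first run
# --delete: remove local files that no longer exist on the server
mirror --ignore-time --delete /remote/path /local/path
exit
EOF
```

On subsequent runs you can drop --ignore-time if the server supports MLSD, since lftp will then have reliable timestamps to compare against.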