ftp wget mirroring

How to archive an entire FTP server where many of the filenames seem to include illegal characters


I am trying to use wget -m <address> to download the contents of an FTP server. Much of the content is Icelandic, so the filenames contain non-ASCII characters that I think are causing problems, as I keep seeing:

Incomplete or invalid multibyte sequence encountered

I have tried adding flags such as --restrict-file-names=nocontrol, but to no avail.

I have also tried using lftp, but it doesn't seem to make any difference.


Solution

  • According to the wget manual:

    If you specify ‘nocontrol’, then the escaping of the control characters is also switched off.

    That is, nocontrol is actually more permissive than the default. The "bunch of weird characters" suggests the filename encoding on the server does not match your local one, so the ascii mode seems to be the best fit for your use case:

    The ‘ascii’ mode is used to specify that any bytes whose values are outside the range of ASCII characters (that is, greater than 127) shall be escaped. This can be useful when saving filenames whose encoding does not match the one used locally.

    I am not able to test this myself, so please try it and report the result.
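
For reference, the suggestion above can be sketched as the following command. It is untested against a real server, and ftp://ftp.example.org/ is only a placeholder (an assumption) that you would replace with the actual address. The script prints the command by default and only starts the mirror when RUN=1 is set, so you can check it first.

```shell
#!/bin/sh
# Placeholder address (an assumption); substitute the real FTP server.
FTP_URL="${FTP_URL:-ftp://ftp.example.org/}"

# -m mirrors the server recursively.
# --restrict-file-names=ascii escapes every byte above 127 in saved filenames.
set -- wget -m --restrict-file-names=ascii "$FTP_URL"

if [ "${RUN:-0}" = "1" ]; then
  "$@"        # actually run the mirror
else
  echo "$@"   # dry run: only show the command
fi
```

With this mode the Icelandic letters should no longer appear literally in the local filenames; they are stored in escaped form instead, which sidesteps the mismatch between the server's encoding and the local one.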