Tags: ftp, paging, filezilla, ftps, yield-return

Does FTP support paging?


During some testing, one of our teams reported timeouts attempting to access a directory via FTP. The cause was a bug in their code which had caused millions of tiny files to be created.

As I understand it, the timeout occurs because the request asks for the directory's contents to be listed and then waits for a single response containing all the files.

If instead the server immediately started returning results as they were found (think: yield return vs return), this would stave off the timeout. Similarly, if there were some option to return paged data, that may give us a workaround.

Since FTP is request-response rather than request-response-response-..., I imagine the yield return scenario is not possible, but some form of paging may be. That said, paging might not solve the problem either, since it implies some form of sorting, which would itself incur an overhead that scales with the number of files.

NB: This question is asked out of curiosity; our real issue is resolved, as I simply purged the directory (https://stackoverflow.com/a/6208144/361842). However, my thinking is that if there were an option to drip-feed results back, the number of items in the folder would cease to be a potential issue (so long as we're not sorting, filtering, etc. the results before they're returned). We're using FileZilla Server and a .NET client (System.Net.FtpWebRequest), but since this is theoretical I'm more interested in generic answers than in those specific to our implementation.


Solution

  • FTP does not have any explicit paging support, and the protocol does not concern itself with the issue you describe. For a directory listing, a separate TCP data connection is opened, and everything from the first byte to the last byte on that connection is taken to be the directory listing.

    So the server is free to stream back the directory listing however and whenever it likes, and a client is free to display the data it receives however and whenever it likes. The server can send a few entries, wait a bit, send a few more, and so on; the client can display the listing as it arrives, or display it all at once when the entire response has been received.

    But note that FTP servers are normally bound by the OS API for listing files. Depending on the OS and the filesystem, the API call that lists the files in a directory could block for a very long time on a directory with many small files, and then return all the entries to the FTP server in one go.
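To illustrate the client side of the point above, here is a minimal sketch in Python (not the .NET client from the question) that handles listing lines as they are read from the data connection instead of collecting the whole response first. It relies on the standard library's `ftplib.FTP.retrlines`, which calls a callback once per line; the helper name `stream_listing` and the `/incoming` path used below are made up for the example.

```python
from ftplib import FTP
from typing import Callable


def stream_listing(ftp: FTP, path: str, on_entry: Callable[[str], None]) -> int:
    """Call on_entry for each listing line as it is read; return the line count.

    Hypothetical helper for illustration: `ftp` is assumed to be a connected,
    logged-in ftplib.FTP instance.
    """
    count = 0

    def handle(line: str) -> None:
        nonlocal count
        count += 1
        on_entry(line)

    # retrlines reads the data connection line by line and invokes the
    # callback per line, so entries can be handled before the transfer ends.
    ftp.retrlines(f"LIST {path}", handle)
    return count
```

Called as `stream_listing(ftp, "/incoming", print)`, this would print each entry as soon as it is read. Whether it avoids a timeout still depends on the server: if the server only starts sending after the OS has finished enumerating the directory, the client waits just as long for the first byte.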