Tags: c#, search, streamreader, ftpwebrequest

Speed up searching/reading from StreamReader


I'm trying to download the list of file names present on an FTP server. Once I have retrieved the listing, I use a StreamReader and search through all the file names to check whether any of them contains a particular substring.

For example, if file names are like

0000730970-0633788104-20140422073022-0633788104.PDF

0000730970-0633789720-20140422101011-0633789720.PDF

0000730970-0633798535-20140425075011-0633798535.PDF

0000730970-0633798536-20140425075011-0633798536.PDF

0000730970-0633804266-20140428124147-0633804266.PDF

0000730970-0633805880-20140429065011-0633805880.PDF

I search for "0633798535" (the second or the last dash-separated part), because that's the only information I have about the files on that FTP server; I don't know the full file name. Below is the code I'm using to do this:

try
{
    browseRequest = (FtpWebRequest)FtpWebRequest.Create(ftpAddress);

    browseRequest.Credentials = new NetworkCredential(username, password);
    browseRequest.UsePassive = true;
    browseRequest.UseBinary = true;
    browseRequest.KeepAlive = true;

    browseRequest.Method = WebRequestMethods.Ftp.ListDirectory;
    response = (FtpWebResponse)browseRequest.GetResponse();
    responseStream = response.GetResponseStream();

    if (responseStream != null)
    {
        using (StreamReader reader = new StreamReader(responseStream))
        {
            while (!reader.EndOfStream && !isDownloaded)
            {
                string fileName = reader.ReadLine().ToString();
                if (fileName.Contains(subStringToBeFind)) //search for the first encounter
                {
                    //download the file
                    isDownloaded = true; //initially false
                }
            }
        }
    }
}

Here I'm using a sequential search to find the file name. The problem is that when the number of files is large the search becomes slow: with around 82,000 file names, if I'm looking for the last file it takes about 2 minutes, which makes the whole application slow. So I need help speeding up the search. Is there any way to use a binary search, or something else, to improve the search time?


Solution

  • You can only use a binary search if you already have all the data (and if it's sorted, which it looks like it may be here). I strongly suspect that the bottleneck isn't the Contains method here - I would expect it to be the data transfer. That already looks like it's reasonably efficient, although I would make three changes:

    • Use the fact that ReadLine() returns null at the end of input rather than using EndOfStream
    • Use the fact that ReadLine() is declared to return string - you don't need to call ToString. (This won't be hurting your performance, but it's ugly.)
    • Use a using statement for the response and the response stream. You may be okay because you've got a using statement for the reader, but you should at least have one for the response itself (see the fuller sketch after the loop below).

    So:

    string line;
    while (!isDownloaded && (line = reader.ReadLine()) != null)
    {
        if (line.Contains(target))
        {
            isDownloaded = true;
        }
    }
    
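    Putting those three changes together, the surrounding code might look something like this (just a sketch - browseRequest, ftpAddress, username, password, isDownloaded and subStringToBeFind are the names from your own snippet):

    browseRequest = (FtpWebRequest)WebRequest.Create(ftpAddress);
    browseRequest.Credentials = new NetworkCredential(username, password);
    browseRequest.Method = WebRequestMethods.Ftp.ListDirectory;

    // Dispose of the response and its stream as well as the reader
    using (FtpWebResponse response = (FtpWebResponse)browseRequest.GetResponse())
    using (Stream responseStream = response.GetResponseStream())
    using (StreamReader reader = new StreamReader(responseStream))
    {
        string line;
        while (!isDownloaded && (line = reader.ReadLine()) != null)
        {
            if (line.Contains(subStringToBeFind))
            {
                // download the file here
                isDownloaded = true;
            }
        }
    }
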

    To validate that it really is the network that's the issue rather than the Contains call, try separating the two (just for diagnostic purposes; you don't want to do this in reality, because you want to be able to stop as soon as you've found the file):

    • Fetch all the filenames, and store them in a file (or in memory)
    • Search through the filenames

    Time both steps - I would be astonished if you didn't find that the first step took almost all the time. Searching through 82000 strings using Contains should be very, very quick.
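
    For example, a rough way to time the two steps separately (purely a diagnostic sketch; reader is the StreamReader over the listing and subStringToBeFind is the value you're searching for, as in your own code):

    var fetchTimer = System.Diagnostics.Stopwatch.StartNew();
    var fileNames = new System.Collections.Generic.List<string>();
    string line;
    while ((line = reader.ReadLine()) != null)
    {
        fileNames.Add(line);   // step 1: just pull the whole listing into memory
    }
    fetchTimer.Stop();

    var searchTimer = System.Diagnostics.Stopwatch.StartNew();
    string match = null;
    foreach (string name in fileNames)   // step 2: search the in-memory list
    {
        if (name.Contains(subStringToBeFind))
        {
            match = name;
            break;
        }
    }
    searchTimer.Stop();

    Console.WriteLine("Fetch: {0} ms, search: {1} ms, match: {2}",
                      fetchTimer.ElapsedMilliseconds, searchTimer.ElapsedMilliseconds, match);

    On a listing of 82,000 names, the search time should be a few milliseconds at most; if the fetch time dominates, the network transfer is the real bottleneck.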