I'm trying to download the list of file names present on an FTP server, and once I have retrieved all the names I use a StreamReader object to search through them and check whether any file name contains a given substring.
For example, if the file names are
0000730970-0633788104-20140422073022-0633788104.PDF
0000730970-0633789720-20140422101011-0633789720.PDF
0000730970-0633798535-20140425075011-0633798535.PDF
0000730970-0633798536-20140425075011-0633798536.PDF
0000730970-0633804266-20140428124147-0633804266.PDF
0000730970-0633805880-20140429065011-0633805880.PDF
I'll search for "0633798535" (the second dash-separated substring, which is repeated as the last one), because that's the only information I have about the files on that FTP server; I don't know the full file names. Below is the code I'm using to do this:
try
{
    browseRequest = (FtpWebRequest)FtpWebRequest.Create(ftpAddress);
    browseRequest.Credentials = new NetworkCredential(username, password);
    browseRequest.UsePassive = true;
    browseRequest.UseBinary = true;
    browseRequest.KeepAlive = true;
    browseRequest.Method = WebRequestMethods.Ftp.ListDirectory;

    response = (FtpWebResponse)browseRequest.GetResponse();
    responseStream = response.GetResponseStream();
    if (responseStream != null)
    {
        using (StreamReader reader = new StreamReader(responseStream))
        {
            while (!reader.EndOfStream && !isDownloaded)
            {
                string fileName = reader.ReadLine().ToString();
                if (fileName.Contains(subStringToBeFind)) // search for the first encounter
                {
                    // download the file
                    isDownloaded = true; // initially false
                }
            }
        }
    }
}
Here I'm using a sequential search to find the file name. The problem is that when the number of files is large, the search becomes slow: for 82000 file names, if I'm looking for the last file it takes about 2 minutes. This makes the application slow, so I need help speeding up the search. Is there any way to use binary search, or something else, to improve the search time?
You can only use a binary search if you already have all the data (and if it's sorted, which it looks like it may be here). I strongly suspect that the bottleneck isn't the Contains method here - I would expect it to be the data transfer. That already looks like it's reasonably efficient, although I would make three changes:

1. Loop until ReadLine() returns null at the end of input, rather than using EndOfStream.
2. ReadLine() is declared to return string - you don't need to call ToString(). (This won't be hurting your performance, but it's ugly.)
3. Use a using statement for the response and the response stream. You may be okay because you've got a using statement for the reader, but you should at least have one for the response itself.

So:
string line;
while (!isDownloaded && (line = reader.ReadLine()) != null)
{
    if (line.Contains(target))
    {
        isDownloaded = true;
    }
}
To validate that it really is the network that's the issue rather than the Contains call, try separating the two steps (just for diagnostic purposes; you don't want to do this in reality, because you want to be able to stop as soon as you've found the file):
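For example, something like this (a diagnostic sketch only; it assumes the same reader and target as above, with Stopwatch from System.Diagnostics):

// Step 1: pull the whole listing over the network (network-bound).
Stopwatch stopwatch = Stopwatch.StartNew();
List<string> lines = new List<string>();
string line;
while ((line = reader.ReadLine()) != null)
{
    lines.Add(line);
}
stopwatch.Stop();
Console.WriteLine("Download: {0} ms", stopwatch.ElapsedMilliseconds);

// Step 2: search the in-memory list (CPU-bound).
stopwatch.Restart();
bool found = false;
foreach (string candidate in lines)
{
    if (candidate.Contains(target))
    {
        found = true;
        break;
    }
}
stopwatch.Stop();
Console.WriteLine("Search: {0} ms, found: {1}", stopwatch.ElapsedMilliseconds, found);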
Time both steps - I would be astonished if you didn't find that the first step took almost all the time. Searching through 82000 strings using Contains
should be very, very quick.