We have a server that does image manipulation based on a query string and then renders the result. The result is also cached for 90 days. Because of the complexity of some manipulations, rendering can take 6-7 seconds.
A marketplace where we list some of our products has recently reduced their timeout when fetching images to a low value, causing most of the items in any given feed to fail the first time with (their error message) "Image Timeout". When we resubmit the feed there are no such problems, since our image server has the images cached by then.
Please do not suggest asking the marketplace to change their timeout. They are ridiculously inflexible and uncooperative. Also, please do not suggest getting a more powerful image server. It is actually a massive farm and is not in my team's control.
That leaves me with one option. I need to "prime the cache" before sending the feed to the marketplace. The problem is that a feed can contain up to 5000 items which have at least 2 images each. That means 10,000 images.
I am using a `HEAD` request, since we don't need the image returned to us. I have tried using `WebRequest` and even `Socket` in the .NET Framework, called inside an async `Task` (via `Task.Run()`), but the CLR will only spin up somewhere around 20 tasks at a time. Since each image takes about 4 seconds on average (some up to 6-7 seconds, some only 1 second), that works out to 10,000 / 20 = 500 batches * 4 seconds = 2,000 seconds = 33 1/3 minutes, which is not an acceptable wait on our end before we send the feed.
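One knob I know of that caps outbound `HttpWebRequest` concurrency on the .NET Framework is `ServicePointManager.DefaultConnectionLimit` (just 2 per host by default for console clients), along with the thread pool minimums. Raising them looks roughly like this sketch (the 100s are arbitrary placeholder values, not a recommendation):

using System.Net;
using System.Threading;

// Sketch: lift the per-host connection cap and the thread-pool floor
// before issuing requests. 100/100 are arbitrary placeholder values.
ServicePointManager.DefaultConnectionLimit = 100;
ThreadPool.SetMinThreads(100, 100);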
Since we don't actually need the reply from our image server, I tried firing the request asynchronously without awaiting it. That gets through the `foreach` in record time but, as I found out, a fire-and-forget request like that is not guaranteed to have even been triggered by the time the code that spins up all the tasks finishes, so that doesn't help.
We use AWS, so I have considered Lambdas; they would add extra complexity and expense, but the massive parallelism available there sounds like it would do the trick.
How can I fix this?
Test Server
using System;
using System.Threading;
using System.Web.Mvc;

public class HomeController : Controller {
    private readonly Random random;

    public HomeController() {
        random = new Random(DateTime.UtcNow.Millisecond);
    }

    // Simulates an image manipulation taking between 0.1 and 6.9 seconds.
    public ActionResult Index(string url) {
        var wait = random.Next(1, 70);
        Thread.Sleep(wait * 100);
        return Content(wait + " : " + url);
    }
}
Test Client
using System;
using System.Collections.Generic;
using System.Net;
using System.Threading.Tasks;

class Program {
    static void Main(string[] args) {
        var tasks = new List<Task>();
        for (var i = 0; i < 200; i++) {
            Console.WriteLine(i);
            var task = SendRequest("http://test.local.com/Home/Index?url=" + i);
            tasks.Add(task);
        }
        Task.WaitAll(tasks.ToArray());
    }

    private static async Task SendRequest(string url) {
        try {
            var myWebRequest = WebRequest.Create(url);
            myWebRequest.Method = "HEAD";
            var foo = await myWebRequest.GetResponseAsync();
            //var foo = myWebRequest.GetResponseAsync(); // fire-and-forget variant
            //var foo = myWebRequest.GetResponse();      // synchronous variant
            foo.Dispose();
        }
        catch { }
    }
}
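For completeness, a pattern I have seen suggested (but have not verified at this scale) is to throttle a single shared `HttpClient` with a `SemaphoreSlim` instead of spawning raw tasks. A minimal sketch, where the cap of 100 concurrent requests is an arbitrary assumption:

using System.Collections.Generic;
using System.Linq;
using System.Net.Http;
using System.Threading;
using System.Threading.Tasks;

class ThrottledPrimer {
    private static readonly HttpClient client = new HttpClient();
    private static readonly SemaphoreSlim gate = new SemaphoreSlim(100); // arbitrary cap

    public static Task PrimeAsync(IEnumerable<string> urls) {
        return Task.WhenAll(urls.Select(SendHeadAsync));
    }

    private static async Task SendHeadAsync(string url) {
        await gate.WaitAsync(); // hold a slot for the duration of the request
        try {
            using (var request = new HttpRequestMessage(HttpMethod.Head, url))
            using (await client.SendAsync(request)) { }
        }
        catch { } // swallow errors, same as the test client above
        finally {
            gate.Release();
        }
    }
}

Note that on the .NET Framework `HttpClient` still honors `ServicePointManager.DefaultConnectionLimit`, so that limit would need raising as well.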
I hate answering my own question, but I want to share what I ended up doing in case anyone else runs into the same problem. Basically, I encapsulated the code that calls the image service into its own tiny executable and then I use `Process.Start()` to run that executable. I definitely expected to see an increase in performance, but I was surprised by just how much of a boost I saw: approximately a factor of 20, with CPU usage on the machine only going up to 20-40% depending on how many concurrent batches I ran and how big the batches were.
In the code below, please keep in mind that I have removed `try {} catch {}` blocks to keep the code compact.
Separate executable (the project is named `ImageCachePrimer`)
using System.Collections.Generic;
using System.Net;
using System.Threading.Tasks;

class Program {
    static void Main(string[] args) {
        // Each command-line argument is a URL to request.
        var tasks = new List<Task>(args.Length);
        foreach (var url in args) {
            tasks.Add(Task.Run(async () => await SendRequest(url)));
        }
        Task.WaitAll(tasks.ToArray());
    }

    private static async Task SendRequest(string url) {
        var myWebRequest = WebRequest.Create(url);
        myWebRequest.Method = "HEAD"; // we only need the cache primed, not the bytes
        var foo = await myWebRequest.GetResponseAsync();
        foo.Dispose();
    }
}
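The executable takes the URLs as plain command-line arguments, e.g. (hypothetical URLs):

ImageCachePrimer.exe "http://img.example.com/1.jpg?w=640" "http://img.example.com/2.jpg?w=640"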
Method to call the executable.
private static Process CreateProcess(IEnumerable<string> urls)
{
    // Pass the batch of URLs as space-separated command-line arguments.
    var args = urls.Aggregate("", (current, url) => current + url + " ");
    var start = new ProcessStartInfo {
        Arguments = args,
        FileName = "ImageCachePrimer.exe",
        WindowStyle = ProcessWindowStyle.Hidden,
        CreateNoWindow = false,
        UseShellExecute = true
    };
    return Process.Start(start);
}
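One caveat of my own on this approach: Windows limits a single command line to roughly 32K characters, so `batchSize` multiplied by your longest URL has to stay under that. A trivial guard inside `CreateProcess`, with the limit hard-coded as an assumption:

// Assumption: ~32,767-char command-line limit on Windows (CreateProcess).
if (args.Length > 32000) {
    throw new ArgumentException("Batch command line too long; reduce batchSize.");
}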
Method which calls the above method
private static void PrimeImageCache(IReadOnlyCollection<string> urls) {
    var distinctUrls = urls.Distinct().ToList();
    const int concurrentBatches = 20;
    const int batchSize = 15;
    var processes = new List<Process>(concurrentBatches);
    foreach (var batch in distinctUrls.FormIntoBatches(batchSize)) {
        processes.Add(CreateProcess(batch));
        // Throttle: wait for a slot to free up before launching the next batch.
        while (processes.Count >= concurrentBatches) {
            Thread.Sleep(500);
            processes.RemoveAll(process => process.HasExited);
        }
    }
    // Let the processes still running after the final batch was queued finish.
    while (processes.Count > 0) {
        Thread.Sleep(500);
        processes.RemoveAll(process => process.HasExited);
    }
}
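`FormIntoBatches` is a small helper extension method whose implementation I haven't shown. A minimal sketch of an equivalent, which splits a sequence into consecutive batches of at most `batchSize` items, could look like this:

using System.Collections.Generic;

public static class BatchingExtensions {
    // Yields consecutive batches of at most batchSize items each.
    public static IEnumerable<IEnumerable<T>> FormIntoBatches<T>(
            this IEnumerable<T> source, int batchSize) {
        var batch = new List<T>(batchSize);
        foreach (var item in source) {
            batch.Add(item);
            if (batch.Count == batchSize) {
                yield return batch;
                batch = new List<T>(batchSize);
            }
        }
        if (batch.Count > 0) {
            yield return batch;
        }
    }
}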
The separate executable and the method that calls it are pretty straightforward, but I would like to explain a couple of nuances in the final method. First, I initially tried `foreach (var process in processes) { process.WaitForExit(); }`, but that meant every process in the batch had to finish before I could launch a new one. It also caused the CPU to spike to 100% (I guess internally it does a near-empty loop to see whether the process is finished). So I "rolled my own", as seen in the first `while` loop.
Second, I had to add the final `while` loop to make sure that the processes still running after I queued up the final batch in the preceding `foreach` had a chance to finish.
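An alternative to polling `HasExited` that I didn't go with would be to turn each process into an awaitable task via the `Exited` event (with `EnableRaisingEvents = true`). A rough sketch, which would slot into the same class as `PrimeImageCache`:

// Sketch: await process exit via the Exited event instead of polling.
private static Task WaitForExitAsync(Process process) {
    var tcs = new TaskCompletionSource<bool>();
    process.EnableRaisingEvents = true;
    process.Exited += (sender, e) => tcs.TrySetResult(true);
    if (process.HasExited) {
        tcs.TrySetResult(true); // cover the race where it already exited
    }
    return tcs.Task;
}

With that, the final loop could become `Task.WaitAll(processes.Select(WaitForExitAsync).ToArray());`.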
Hope this helps someone else.