I'm in the process of implementing different algorithms on CPUs and GPUs. What struck me as odd was that a very primitive example (sequentially - i.e. with one thread - creating a histogram of an array with 100*1024*1024 elements) takes 200% - 300% longer on a server CPU (which is admittedly slightly lower clocked and one generation older) than it does on a workstation CPU. Both machines use DDR3 memory: 16GB dual channel on the workstation (FSB:DRAM 1:6) and 512GB quad channel on the server (FSB:DRAM 1:12), both running at an 800 MHz DRAM clock rate.
On my workstation the histogram calculation takes less than 100 ms (90 ms on average), while on the server it takes 300 ms on average, though on sporadic occasions it takes only around 150 ms.
I'm using the same build on both machines (Any CPU, prefer 32-bit, release build).
A related question: why is a pure 64-bit build slower on both machines by at least 25%?
public static void Main(string[] args) {
    // the array size: 100 * 1024^2 elements, i.e. 100 megapixels
    const int Size = 100 * 1024 * 1024;

    // define a buffer to hold the random data
    var buffer = new byte[Size];

    // fill the buffer with random bytes
    var rndXorshift = new RndXorshift();
    rndXorshift.NextBytes(buffer);

    // start a stopwatch to time the histogram creation
    var stopWatch = new Stopwatch();
    stopWatch.Start();

    // declare a variable for the histogram
    var histo = new uint[256];

    // for every element of the array,
    // increment the histogram at the position
    // of the current array value
    for (int i = 0; i < Size; i++) {
        histo[buffer[i]]++;
    }

    // get the histogram count. must be equal
    // to the total number of elements in the array
    long histoCount = 0;
    for (int i = 0; i < 256; i++) {
        histoCount += histo[i];
    }

    // stop the stopwatch
    stopWatch.Stop();
    var et1 = stopWatch.ElapsedMilliseconds;

    // output the results
    Console.WriteLine("Histogram Sum: {0}", histoCount);
    Console.WriteLine("Elapsed Time1: {0}ms", et1);
    Console.ReadLine();
}
Server CPU:
Workstation CPU:
Server CPU clock shows 1177 MHz, while workstation has 3691 MHz clock. That would explain the difference.
It seems your server CPU either slows down when it is not under stress, to conserve energy, or the multipliers in the BIOS are set to very low values.
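One way to confirm this hypothesis without touching the BIOS is to time the loop twice: once cold, and once after busy-spinning the core for a moment so the frequency governor has time to ramp up. If power management is the cause, the second measurement on the server should close most of the gap. A minimal sketch of that idea, assuming the standard `System.Random` in place of your `RndXorshift` so it is self-contained:

```csharp
using System;
using System.Diagnostics;

class WarmupDemo {
    // busy-spin for the given duration so the frequency
    // governor ramps the core up to its full clock rate
    static void WarmUp(int milliseconds) {
        var sw = Stopwatch.StartNew();
        long sink = 0;
        while (sw.ElapsedMilliseconds < milliseconds) {
            sink += sw.ElapsedTicks; // arbitrary work to keep the core busy
        }
        GC.KeepAlive(sink); // keep the spin loop from being optimized away
    }

    // time one pass of the same histogram loop as in the question
    static long TimeHistogram(byte[] buffer) {
        var histo = new uint[256];
        var sw = Stopwatch.StartNew();
        for (int i = 0; i < buffer.Length; i++) {
            histo[buffer[i]]++;
        }
        sw.Stop();
        return sw.ElapsedMilliseconds;
    }

    static void Main() {
        var buffer = new byte[100 * 1024 * 1024];
        new Random(42).NextBytes(buffer);

        Console.WriteLine("cold: {0} ms", TimeHistogram(buffer));
        WarmUp(500);
        Console.WriteLine("warm: {0} ms", TimeHistogram(buffer));
    }
}
```

If the "warm" time on the server drops toward the workstation's numbers, the culprit is almost certainly a power-saving governor rather than memory or the CPU generation.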