I am going to benchmark several implementations of a numerical simulation software on a high-performance computer, primarily with regard to run time, though other resources such as memory usage, inter-process communication etc. could be interesting as well.
As of now, I have no knowledge of general guidelines on how to benchmark software (in this area). Nor do I know how much measurement noise is reasonable to expect, or how many tests one usually carries out. Although these issues are system dependent, of course, I am pretty sure there exist some standards considered reasonable.
Can you provide me with such (introductory) information?
If a test doesn't take much time, then I repeat it (e.g. 10,000 times) to make it take several seconds.
I then do that multiple times (e.g. 5 times) to see whether the test results are reproducible (or whether they're highly variable).
There are limits to this approach (e.g. it's testing with a 'warm' cache), but it's better than nothing, and it's especially good at comparing similar code, e.g. for seeing whether or not a performance tweak to some existing code did in fact improve performance (i.e. for doing 'before' and 'after' testing).
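The repeat-and-rerun approach above can be sketched with the Python standard library's `timeit` module; this is a minimal illustration, not the poster's actual setup, and `kernel` is a placeholder stand-in for whatever code is under test:

```python
import statistics
import timeit

def kernel():
    # Placeholder workload; replace with the real code under test.
    s = 0.0
    for i in range(1000):
        s += i * i
    return s

REPEATS = 10_000  # inner repetitions, so one trial takes a measurable time
TRIALS = 5        # outer trials, to gauge run-to-run variability

# timeit.repeat returns one total (the time for REPEATS calls) per trial.
trials = timeit.repeat(kernel, number=REPEATS, repeat=TRIALS)
per_call = [t / REPEATS for t in trials]

mean = statistics.mean(per_call)
spread = statistics.stdev(per_call)
print(f"mean {mean:.3e} s/call, stdev {spread:.3e} s/call "
      f"({100 * spread / mean:.1f}% relative)")
```

A large relative spread across the 5 trials is the signal that the results are not reproducible and the measurement (or the machine's load) needs a closer look.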