I am trying to build an online judge for programming problems (like UVA OJ). When programs are judged, their efficiency (i.e. how fast they process the test inputs) needs to be tested. However, server CPUs are usually very powerful and can run even badly coded programs really fast. Moreover, a program may get more CPU time when traffic is low and less when traffic is high, which is unfair.
So I was wondering: is there a way to measure program efficiency regardless of which CPU it is running on? Maybe with some CPU cycle calculation or something like that?
Note: any Linux or PHP solution would be awesome.
Environment: As I understand your problem, you want to measure the performance of a program written by a test taker, and then compare that performance against either a reference or against other test takers’ programs. These programs may run on different web servers. The test taker accesses the testing program through a browser, and the test takers are distributed over some network (local to a lab? a campus? the world?). The test input comes from a file. The expected runtime of the programs is under 5 seconds, with a median of 1 second.
Metric: CPU time will not help you, because it means different things on different hardware. For example, suppose you compare the same CPU-bound program on a Haswell-generation Intel Xeon server versus a first-generation Pentium. The program executes equally efficiently in both cases, but the run on the Pentium accumulates far more CPU time purely because of the hardware it runs on. Even if you got down to cycles (see PAPI), you would have the same issue: the counts still depend on the architecture. The key is that you need to compare the performance (runtime) of the programs against some standard reference.
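To make the problem concrete, here is a minimal Python sketch (my illustration, not part of the question; the `workload` function is a hypothetical stand-in for a judged program) that reads per-process CPU time. The absolute number it prints differs from machine to machine, which is exactly why raw CPU time cannot be compared across servers:

```python
import time

def cpu_seconds(fn, *args):
    """Return the CPU time consumed by fn(*args) on this machine.

    The absolute value depends on the host CPU, so it cannot be
    compared across different servers.
    """
    start = time.process_time()
    fn(*args)
    return time.process_time() - start

def workload(n):
    """Hypothetical CPU-bound workload standing in for a judged program."""
    total = 0
    for i in range(n):
        total += i * i
    return total

print(f"CPU time on this machine: {cpu_seconds(workload, 200_000):.4f}s")
```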
Solution: Here is a possible solution. It is theoretically sound but may not be practical given the limits of web technology. You create a standard reference program (std_pgm) and run it simultaneously with the test taker’s program (tt_pgm). The key word is ‘simultaneously’: you want both tt_pgm and std_pgm to execute in the same environment (processor, OS, load, etc.). Then the ratio of their runtimes gives you a relative performance measure that you can compare across different environments.
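A rough sketch of this idea in Python (the names std_pgm/tt_pgm come from the answer; using `multiprocessing` with a toy CPU-bound function standing in for the real judged binaries is my assumption, not the answer’s prescription):

```python
import multiprocessing as mp
import time

def timed_run(fn, arg, out, key):
    """Run fn(arg) in this process and record its wall-clock runtime."""
    start = time.perf_counter()
    fn(arg)
    out[key] = time.perf_counter() - start

def burn(n):
    """Toy CPU-bound workload; stand-in for exec'ing std_pgm / tt_pgm."""
    total = 0
    for i in range(n):
        total += i * i

def relative_score(tt_work, std_work):
    """Start the 'submission' and the 'reference' simultaneously and
    return tt_time / std_time.

    Because both ran at the same moment under the same load, the ratio
    is (roughly) comparable across machines, unlike raw runtimes.
    """
    with mp.Manager() as mgr:
        out = mgr.dict()
        procs = [
            mp.Process(target=timed_run, args=(burn, tt_work, out, "tt")),
            mp.Process(target=timed_run, args=(burn, std_work, out, "std")),
        ]
        for p in procs:
            p.start()
        for p in procs:
            p.join()
        return out["tt"] / out["std"]

if __name__ == "__main__":
    # Equal workloads should score near 1.0 on an evenly loaded machine.
    print(f"relative score: {relative_score(300_000, 300_000):.2f}")
```

In a real judge you would replace `burn` with `subprocess` calls to the compiled submission and reference, but the simultaneous-start-and-ratio structure stays the same.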
Other issues: (a) You need to ensure that the programs execute simultaneously under the same background load. They don’t necessarily have to run on the same core as long as the cores are equally loaded. (b) Try to minimize process setup time and file I/O time relative to the execution time of the programs. (c) Run the programs multiple times, ideally under the same single process. This serves two purposes: it makes tt_pgm and std_pgm easier to compare, and it tells you whether their execution environments really are the same. (If the performance of tt_pgm relative to std_pgm varies significantly between runs, something is happening in the background to one program and not the other.)
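Point (c) can be sketched as a simple stability check over the repeated tt/std runtime ratios (the function name and the 25% tolerance are arbitrary assumptions of mine, not from the answer):

```python
import statistics

def scores_are_stable(scores, tolerance=0.25):
    """Check whether repeated tt_pgm/std_pgm runtime ratios are consistent.

    If the spread of the ratios exceeds `tolerance` times their median,
    something in the background likely perturbed one program but not the
    other, and the measurement should be discarded and repeated.
    """
    spread = max(scores) - min(scores)
    return spread <= tolerance * statistics.median(scores)

print(scores_are_stable([1.10, 1.05, 1.12]))  # consistent runs -> True
print(scores_are_stable([1.00, 2.00, 1.10]))  # one outlier run -> False
```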
I won’t guarantee this will work, but it seems reasonable to me.