I'm a newbie with profiling. I'd like to optimize my code to satisfy timing constraints. I use Visual C++ 2008 Express, and so had to download a profiler; for me it's Very Sleepy. I did some searching but found no decent tutorial on Sleepy, so here is my question: how do I use it properly?

I grasped the general idea of profiling, so I sorted by % exclusive time to find my bottlenecks. At the top of this list I have ZwWaitForSingleObject, RtlEnterCriticalSection, operator new, RtlLeaveCriticalSection, printf, some iterators... and only after these, which together take something like 60%, comes my first own function, the first entry with child calls. Can someone explain why the functions mentioned above show up, what they mean, and how I can optimize my code if I have no access to this critical 60%? (Their "source file" is listed as unknown.)

Also, for my own function I'd expect to get a time for each line, but that's not the case; e.g., some arithmetic and some function calls have no timing at all (and they're not nested inside untaken "if" clauses). And one last thing: how do I find out that some line executes super fast but is called thousands of times, making it the actual bottleneck?
Finally, is Sleepy any good, or is there a better free alternative for my platform?

Any help is very much appreciated! Cheers!
I have found another version of the profiler, called plain Sleepy. It shows how many times each snippet was called, plus a line number (I guess it points to the critical line). So in my case KiFastSystemCallRet takes 50%! That means it's waiting for some data, right? How can I improve that? Is there a decent approach to trace what causes these repeated calls, so I can eventually remove or change them?
"I'd like to optimize my code to satisfy timing constraints"
You're running smack into a persistent issue in this business. You want to find ways to make your code take less time, and you (and many people) assume (and have been taught) that the only way to do that is by taking various sorts of measurements.
There's a minority view, and the only thing it has to recommend it is actual significant results (plus an ironclad theory behind it).
If you've got a "bottleneck" (and you do, probably several), it's taking some fraction of time, like 30%.
Just treat it as a bug to be found.
Randomly halt the program with the pause button, and look carefully to see what the program is doing and why it's doing it. Ask if it's something that could be gotten rid of. Do this 10 times. On average you will see the problem on 3 of the pauses. Any activity you see more than once, if it's not truly necessary, is a speed bug. This does not tell you precisely how much the problem costs, but it does tell you precisely what the problem is, and that it's worth fixing. You'll see things this way that no profiler can find, because profilers are only programs, and cannot be broad-minded about what constitutes an opportunity.
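To make that concrete, here is a minimal contrived sketch (not your code; every identifier in it is invented for illustration) of the kind of speed bug your profile hints at: every source line looks cheap, but two of them spend their time deep inside the runtime, under operator new and printf. The exact mix of runtime functions you see will depend on your own code, of course.

    #include <cstdio>
    #include <vector>

    int main() {
        long total = 0;
        for (int i = 0; i < 100000; ++i) {
            // Heap allocation on every pass: shows up as time in
            // operator new and the heap's RtlEnterCriticalSection.
            std::vector<int> tmp(16, i);

            // The "real work" -- nearly free, so a line timer shows
            // nothing interesting here.
            total += tmp[0];

            // Console output takes the CRT stream lock and ends in a
            // kernel call; sampled time lands in routines like
            // KiFastSystemCallRet.
            printf("%d\n", tmp[0]);
        }
        printf("total = %ld\n", total);
        return 0;
    }

Pause this ten times under the VC++ debugger and nearly every call stack will bottom out in the heap or in console output, with the two guilty lines of main sitting right above them on the stack. That is also why the profiler's top entries say "source file: unknown": they are runtime and kernel routines you didn't write. You don't optimize those; you optimize your own lines that call them, e.g. hoist the vector out of the loop and batch the printing.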
Some folks are risk-averse, thinking it might not give enough speedup to be worth it. Granted, there is a small chance of a low payoff, but it's like investing. The theory says on average it will be worthwhile, and there's also a small chance of a high payoff. In any case, if you're worried about the risks, a few more samples will settle your fears.
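If you want a number on that risk: each random pause independently misses a 30% bottleneck with probability 0.7, so

    P(miss on all 10 pauses) = 0.7^10 ≈ 0.03
    P(see it at least once)  ≈ 0.97

and if 10 samples somehow all miss it, another 10 drive the miss probability below 0.1%.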
After you fix the problem, the remaining bottlenecks each take a larger percent, because they didn't get smaller but the overall program did. So they will be easier to find when you repeat the whole process.
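Made-up numbers to show the magnification: suppose the program takes 100 seconds and the problem you just fixed was costing 30 of them. A second problem costing 20 seconds hasn't changed, but

    before the fix: 20 s / 100 s = 20% of the run
    after the fix:  20 s /  70 s ≈ 29% of the run

so the next round of pauses catches it even more often, and the speedups compound.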
There's lots of literature about profiling, but very little that actually says how much speedup it achieves in practice. Here's a concrete example with almost 3 orders of magnitude speedup.