Sometimes code can load device drivers to the point where the system becomes unresponsive.
I recently optimized some WIN32/VC++ code that made the system almost unresponsive, even though CPU usage was very low. The cause was thousands of creations and destructions of GDI objects (pens, brushes, etc.). Once I refactored the code to create each object only once, the system became responsive again.
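The refactor described above amounts to caching GDI objects instead of recreating them on every paint. Below is a minimal, portable sketch of that pattern, assuming pens are identified by (style, width, color). `PenHandle`, `create_pen`, and `delete_pen` are hypothetical stand-ins for `HPEN`, `CreatePen`, and `DeleteObject` so the sketch compiles off Windows; it is not the original code.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <tuple>

// Hypothetical stand-in for a GDI pen handle (HPEN in real Win32 code).
struct PenHandle {
    int id;
};

// Counts how often the expensive creation path actually runs.
int g_creations = 0;

// Hypothetical stand-in for CreatePen(style, width, color).
PenHandle create_pen(int /*style*/, int /*width*/, std::uint32_t /*color*/) {
    ++g_creations;
    return PenHandle{g_creations};
}

// Hypothetical stand-in for DeleteObject(pen).
void delete_pen(PenHandle /*pen*/) {}

// Creates each distinct pen once and hands out the cached handle afterwards.
class PenCache {
public:
    PenCache() = default;
    PenCache(const PenCache&) = delete;             // owns handles: no copies
    PenCache& operator=(const PenCache&) = delete;

    ~PenCache() {
        // Release every pen exactly once, when the cache goes away.
        for (auto& entry : pens_) delete_pen(entry.second);
    }

    PenHandle get(int style, int width, std::uint32_t color) {
        assert(width >= 0);
        const auto key = std::make_tuple(style, width, color);
        auto it = pens_.find(key);
        if (it != pens_.end()) return it->second;  // cache hit: no creation
        PenHandle pen = create_pen(style, width, color);
        pens_.emplace(key, pen);
        return pen;
    }

private:
    std::map<std::tuple<int, int, std::uint32_t>, PenHandle> pens_;
};
```

With this cache, a paint loop that requests the same two pens a thousand times creates only two objects. In real Win32 code the cache would store `HPEN` values and call `DeleteObject` in the destructor; each pen must first be deselected from any device context, because deleting an object that is still selected into a DC is not allowed.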
This leads me to the question: is there a way to measure the CPU/IO usage of device drivers (GPU, disk, etc.) attributable to a given program, function, or line of code?
You can use various tools from the Sysinternals utilities (now a Microsoft product, see http://technet.microsoft.com/en-us/sysinternals/bb545027) to get a basic idea before diving in. In your case, Process Explorer (procexp) and Process Monitor (procmon) do a decent job: they tell you what kind of slowness you are dealing with before you drill down with a profiler.
Then you can use xperf (http://msdn.microsoft.com/en-us/performance/default) to drill down. With the correct setup, this tool can point you to the exact function that causes the slowness without injecting profiling code into your existing program. There is a PDC video on how to use it (http://www.microsoftpdc.com/2009/CL16), and I highly recommend this tool. In my experience, it's always better to observe with procexp/procmon first and then target your suspects with xperf, because xperf can generate an overwhelming amount of information if it is not filtered carefully.
In certain hard cases involving lock contention, Debugging Tools for Windows (windbg) is very handy, and there are dedicated books on its usage. These books typically focus on hang detection, and quite a few of their techniques can be used to detect slowness too (e.g. the !runaway extension, which reports how much time each thread has consumed).