I am new to Intel VTune, so I have a general question.
I am trying to profile an application with VTune and would like to know where VTune itself runs, i.e. which cores it is placed on.
How many cores does VTune take up while profiling an application?
Does it depend on the OS?
Collecting data from hardware PMU events just requires a bit of work in interrupt handlers on the cores running the code being profiled. That's intentionally fairly light-weight, like only triggering when a counter wraps around. That's a "sample" if you're running something equivalent to `perf record` instead of `perf stat`: the CPU has to associate that event with an instruction address, even for events like `cycles` where the CPU is busy with hundreds of instructions in-flight.
A profiler will adjust the wrapping threshold to generate events with a useful frequency (so you get some samples even for rarer events, but for common events you're not spending all the CPU time handling interrupts).
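To make that trade-off concrete, here is a rough sketch of how a sampling period could be chosen. It is not how VTune or `perf` actually do it. The function names, the target of 1000 samples per second, the 2 microsecond handler cost, and the event rates are all assumptions made up for illustration.

```python
def pick_sample_period(events_per_second, target_samples_per_second=1000):
    """Return how many events to count between counter-overflow interrupts.

    Hypothetical helper: a higher event rate gets a larger period, so every
    event type ends up producing roughly the same number of samples per second.
    """
    if events_per_second <= 0:
        return 1
    # Never go below 1: a period of 1 means "interrupt on every event".
    return max(1, round(events_per_second / target_samples_per_second))


def interrupt_overhead(events_per_second, period, handler_cost_seconds=2e-6):
    """Estimate the fraction of one core's time spent handling overflow interrupts.

    handler_cost_seconds is an assumed per-interrupt cost, not a measured one.
    """
    samples_per_second = events_per_second / period
    return samples_per_second * handler_cost_seconds


# Assumed event rates for a single busy core (illustrative numbers only).
rates = {"cycles": 3_000_000_000, "cache-misses": 2_000_000}

for name, rate in rates.items():
    period = pick_sample_period(rate)
    overhead = interrupt_overhead(rate, period)
    print(f"{name}: period={period}, estimated overhead={overhead:.2%}")
```

With these assumed numbers, both the common event (`cycles`) and the rarer one (`cache-misses`) produce about 1000 samples per second and cost a fraction of a percent of the core. With a period of 1 on the rarer event, `interrupt_overhead(2_000_000, 1)` comes out well above 1.0, meaning the handler alone would need more than a whole core.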
IDK if VTune does any real-time visualization of that data while a profile is being collected; if so that would happen in the VTune process itself, whatever core(s) that ends up running on, according to the OS scheduling it.