Tags: visual-studio, cuda, nvidia, nsight

How to make sense of the Memory Statistics section of Nsight profiling?


I'm using a GeForce 820M with:

  • GPU Clock rate: 1124 MHz (1.12 GHz)
  • Memory Clock rate: 900 MHz
  • Memory Bus Width: 64-bit
  • L2 Cache Size: 1048576 bytes

I used Nsight performance analysis to profile the memory transactions of my application and got to the Memory Statistics view, which shows something like this memory stats screenshot.

How could I tell whether I'm achieving the maximum memory throughput this card can deliver? Is there a percentage value like occupancy, but for memory throughput? And how can I make use of / assign meaning to these numbers?


Solution

  • The peak theoretical device memory bandwidth on your GPU is given by

    900 MHz * 2 (DDR) * 8 bytes/transfer (64-bit width) = 14.4 GB/s

    The observed (utilized) memory bandwidth in this case is given by the number on the link between "L2 Cache" and "Device Memory": 856.7 MB/s (i.e., less than 1 GB/s).
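
    If you want to reproduce the peak figure programmatically rather than from the spec sheet, the CUDA runtime exposes the memory clock and bus width through cudaGetDeviceProperties. Below is a minimal sketch (device 0 assumed) that performs the same calculation as above:

        #include <cstdio>
        #include <cuda_runtime.h>

        int main() {
            cudaDeviceProp prop;
            cudaGetDeviceProperties(&prop, 0);   // device 0 assumed

            // memoryClockRate is reported in kHz, memoryBusWidth in bits.
            // The factor of 2 accounts for DDR (two transfers per clock).
            double peakGBps = 2.0 * prop.memoryClockRate * 1e3
                              * (prop.memoryBusWidth / 8.0) / 1e9;

            printf("Memory clock: %d kHz, bus width: %d bits\n",
                   prop.memoryClockRate, prop.memoryBusWidth);
            printf("Peak theoretical bandwidth: %.1f GB/s\n", peakGBps);
            return 0;
        }

    For a 900 MHz memory clock and a 64-bit bus this prints 14.4 GB/s, matching the hand calculation.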

    How could I tell whether I'm achieving the maximum memory throughput this card can deliver?

    If you compare those two numbers, you'll get an idea of how close you are. However, the peak theoretical bandwidth calculated above is generally not achievable in practice. A better proxy for what "real" codes can achieve as a maximum is the CUDA bandwidthTest sample code, specifically its "device-to-device" bandwidth measurement. In any event, that number should still be in the several-gigabytes-per-second range (perhaps around 10 GB/s for your device), so you still have considerable headroom.
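
    If you would rather measure this yourself than run the sample, a rough sketch along the lines of bandwidthTest's device-to-device measurement is to time a large cudaMemcpy between two device buffers. The buffer size and repetition count below are arbitrary choices:

        #include <cstdio>
        #include <cuda_runtime.h>

        int main() {
            const size_t bytes = 64 << 20;      // 64 MB buffers (arbitrary)
            const int reps = 20;                // repetitions (arbitrary)
            char *src, *dst;
            cudaMalloc(&src, bytes);
            cudaMalloc(&dst, bytes);

            cudaEvent_t start, stop;
            cudaEventCreate(&start);
            cudaEventCreate(&stop);

            cudaEventRecord(start);
            for (int i = 0; i < reps; ++i)
                cudaMemcpy(dst, src, bytes, cudaMemcpyDeviceToDevice);
            cudaEventRecord(stop);
            cudaEventSynchronize(stop);

            float ms = 0.0f;
            cudaEventElapsedTime(&ms, start, stop);

            // Each device-to-device copy reads and writes `bytes`,
            // so the traffic is counted twice per repetition.
            double gbps = 2.0 * bytes * reps / (ms / 1e3) / 1e9;
            printf("Device-to-device bandwidth: %.1f GB/s\n", gbps);

            cudaFree(src);
            cudaFree(dst);
            return 0;
        }

    Whatever number this (or bandwidthTest) reports is the more realistic ceiling to compare your measured 856.7 MB/s against.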

    Is there a percentage value like occupancy, but for memory throughput?

    The profilers have metrics such as dram_utilization which may be of interest. You could also aggregate dram_read_throughput and dram_write_throughput to get a more precise number.
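
    For example, assuming the command-line profiler nvprof is available for your setup and these metrics are supported on your GPU, you could collect them with something like the following, where ./my_app stands in for your application:

        nvprof --metrics dram_utilization,dram_read_throughput,dram_write_throughput ./my_app

    Adding dram_read_throughput and dram_write_throughput together gives the total device-memory traffic you can compare against the peak figure above.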