Search code examples
cudanvcc

Register usage count of kernel different with and without -lineinfo flag


I have a simple matrix multiplication kernel running on CUDA.

When compiling using -lineinfo command along with --ptxas-options -v the register count is displayed as 28, whereas without the -lineinfo option, the register count is 20.

Exact commands used:

nvcc -g -G --ptxas-options -v -arch=sm_86 -o mmul_dbg mmul.cu

and

nvcc -lineinfo --ptxas-options -v -arch=sm_86 -o mmul_ncu mmul.cu

I also checked with

nvcc --ptxas-options -v -arch=sm_86 -o mmul_dbg mmul.cu

and it yields 20 registers.

__global__ void matrixMul(const int *a, const int *b, int *c, int N) {
  // Compute each thread's global row and column index
  int row = blockIdx.y * blockDim.y + threadIdx.y;
  int col = blockIdx.x * blockDim.x + threadIdx.x;

  // Iterate over row, and down column
  c[row * N + col] = 0;
  for (int k = 0; k < N; k++) {
    // Accumulate results for a single element
    c[row * N + col] += a[row * N + k] * b[k * N + col];
  }
}

What could be the reason for the increased register count?

Edit: nvcc is 12.3
Edit (2): removed image and added textual output

$ nvcc --ptxas-options -v -lineinfo -o wlineinfo -arch=sm_86 m mul.cu  
ptxas info    : 0 bytes gmem 
ptxas info    : Compiling entry function '_Z9matrixMulPKiS0_Pii' for 'sm_86' 
ptxas info    : Function properties for _Z9matrixMulPKiS0_Pii
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads   
ptxas info    : Used 28 registers, 380 bytes cmem[0]

$ nvcc --ptxas-options -v -g -G -o wlineinfo -arch=sm_86 mmul.cu 
ptxas info    : 0 bytes gmem
ptxas info    : Compiling entry function '_Z9matrixMulPKiS0_Pii' for 'sm_86'
ptxas info    : Function properties for _Z9matrixMulPKiS0_Pii
    0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info    : Used 20 registers, 380 bytes cmem[0]

Solution

  • The reason for the difference is the use of the -G switch. This selects compilation in debug mode. Once we acknowledge these statements, then we can say that the observation has nothing to do with the use of -lineinfo.

    In debug mode, many/most optimizations are disabled. One optimization the compiler may use but is disabled is loop unrolling.

    In the non--G case, the compiler implements loop unrolling. The overall number of instructions in the kernel is substantially higher, and a possible side effect of loop unrolling for performance is increased register pressure.

    So due to loop unrolling in the non--G case, the compiler has chosen a different register footprint to carry data. The character limits in the answer prevent me from providing full output for both cases, but you can get it yourself with the cuobjdump tool. Here is a portion (first part) of the output from the unrolled/non--G case:

    # cuobjdump -sass wlineinfo
    
    Fatbin elf code:
    ================
    arch = sm_86
    code version = [1,7]
    host = linux
    compile_size = 64bit
    identifier = t128.cu
    
            code for sm_86
                    Function : _Z9matrixMulPKiS0_Pii
            .headerflags    @"EF_CUDA_TEXMODE_UNIFIED EF_CUDA_64BIT_ADDRESS EF_CUDA_SM86 EF_CUDA_VIRTUAL_SM(EF_CUDA_SM86)"
            /*0000*/                   MOV R1, c[0x0][0x28] ;                       /* 0x00000a0000017a02 */
                                                                                    /* 0x000fc40000000f00 */
            /*0010*/                   S2R R9, SR_CTAID.Y ;                         /* 0x0000000000097919 */
                                                                                    /* 0x000e220000002600 */
            /*0020*/                   MOV R7, 0x4 ;                                /* 0x0000000400077802 */
                                                                                    /* 0x000fe20000000f00 */
            /*0030*/                   ULDC.64 UR4, c[0x0][0x118] ;                 /* 0x0000460000047ab9 */
                                                                                    /* 0x000fe40000000a00 */
            /*0040*/                   S2R R0, SR_TID.Y ;                           /* 0x0000000000007919 */
                                                                                    /* 0x000e280000002200 */
            /*0050*/                   S2R R8, SR_CTAID.X ;                         /* 0x0000000000087919 */
                                                                                    /* 0x000e680000002500 */
            /*0060*/                   S2R R3, SR_TID.X ;                           /* 0x0000000000037919 */
                                                                                    /* 0x000e620000002100 */
            /*0070*/                   IMAD R9, R9, c[0x0][0x4], R0 ;               /* 0x0000010009097a24 */
                                                                                    /* 0x001fe200078e0200 */
            /*0080*/                   MOV R0, c[0x0][0x178] ;                      /* 0x00005e0000007a02 */
                                                                                    /* 0x000fc60000000f00 */
            /*0090*/                   IMAD R9, R9, c[0x0][0x178], RZ ;             /* 0x00005e0009097a24 */
                                                                                    /* 0x000fe200078e02ff */
            /*00a0*/                   ISETP.GE.AND P0, PT, R0, 0x1, PT ;           /* 0x000000010000780c */
                                                                                    /* 0x000fe20003f06270 */
            /*00b0*/                   IMAD R8, R8, c[0x0][0x0], R3 ;               /* 0x0000000008087a24 */
                                                                                    /* 0x002fca00078e0203 */
            /*00c0*/                   IADD3 R2, R8, R9, RZ ;                       /* 0x0000000908027210 */
                                                                                    /* 0x000fca0007ffe0ff */
            /*00d0*/                   IMAD.WIDE R2, R2, R7, c[0x0][0x170] ;        /* 0x00005c0002027625 */
                                                                                    /* 0x000fca00078e0207 */
            /*00e0*/                   STG.E [R2.64], RZ ;                          /* 0x000000ff02007986 */
                                                                                    /* 0x0001e2000c101904 */
            /*00f0*/              @!P0 EXIT ;                                       /* 0x000000000000894d */
                                                                                    /* 0x000fea0003800000 */
            /*0100*/                   IADD3 R4, R0, -0x1, RZ ;                     /* 0xffffffff00047810 */
                                                                                    /* 0x000fe40007ffe0ff */
            /*0110*/                   MOV R15, RZ ;                                /* 0x000000ff000f7202 */
                                                                                    /* 0x000fe40000000f00 */
            /*0120*/                   ISETP.GE.U32.AND P0, PT, R4, 0x3, PT ;       /* 0x000000030400780c */
                                                                                    /* 0x000fe40003f06070 */
            /*0130*/                   LOP3.LUT R6, R0, 0x3, RZ, 0xc0, !PT ;        /* 0x0000000300067812 */
                                                                                    /* 0x000fe400078ec0ff */
            /*0140*/                   MOV R11, RZ ;                                /* 0x000000ff000b7202 */
                                                                                    /* 0x000fd20000000f00 */
            /*0150*/              @!P0 BRA 0xc80 ;                                  /* 0x00000b2000008947 */
                                                                                    /* 0x000fea0003800000 */
            /*0160*/                   IADD3 R10, -R6, c[0x0][0x178], RZ ;          /* 0x00005e00060a7a10 */
                                                                                    /* 0x000fe20007ffe1ff */
            /*0170*/                   IMAD.WIDE R4, R9, R7.reuse, c[0x0][0x160] ;  /* 0x0000580009047625 */
                                                                                    /* 0x080fe200078e0207 */
            /*0180*/                   MOV R15, RZ ;                                /* 0x000000ff000f7202 */
                                                                                    /* 0x000fe40000000f00 */
            /*0190*/                   ISETP.GT.AND P0, PT, R10, RZ, PT ;           /* 0x000000ff0a00720c */
                                                                                    /* 0x000fe20003f04270 */
            /*01a0*/                   IMAD.WIDE R12, R8, R7, c[0x0][0x168] ;       /* 0x00005a00080c7625 */
                                                                                    /* 0x000fe200078e0207 */
            /*01b0*/                   IADD3 R4, P1, R4, 0x8, RZ ;                  /* 0x0000000804047810 */
                                                                                    /* 0x000fe40007f3e0ff */
            /*01c0*/                   MOV R11, RZ ;                                /* 0x000000ff000b7202 */
                                                                                    /* 0x000fe40000000f00 */
            /*01d0*/                   IADD3.X R5, RZ, R5, RZ, P1, !PT ;            /* 0x00000005ff057210 */
                                                                                    /* 0x000fce0000ffe4ff */
            /*01e0*/              @!P0 BRA 0xad0 ;                                  /* 0x000008e000008947 */
                                                                                    /* 0x000fea0003800000 */
            /*01f0*/                   ISETP.GT.AND P1, PT, R10, 0xc, PT ;          /* 0x0000000c0a00780c */
                                                                                    /* 0x000fe40003f24270 */
            /*0200*/                   PLOP3.LUT P0, PT, PT, PT, PT, 0x80, 0x0 ;    /* 0x000000000000781c */
                                                                                    /* 0x000fd60003f0f070 */
            /*0210*/              @!P1 BRA 0x7a0 ;                                  /* 0x0000058000009947 */
                                                                                    /* 0x000fea0003800000 */
            /*0220*/                   PLOP3.LUT P0, PT, PT, PT, PT, 0x8, 0x0 ;     /* 0x000000000000781c */
                                                                                    /* 0x000fc40003f0e170 */
            /*0230*/                   LDG.E R14, [R12.64] ;                        /* 0x000000040c0e7981 */
                                                                                    /* 0x000ea8000c1e1900 */
            /*0240*/                   LDG.E R16, [R4.64+-0x8] ;                    /* 0xfffff80404107981 */
                                                                                    /* 0x000ea4000c1e1900 */
            /*0250*/                   IMAD R19, R14, R16, R15 ;                    /* 0x000000100e137224 */
                                                                                    /* 0x004fe400078e020f */
            /*0260*/                   IMAD.WIDE R14, R0, 0x4, R12 ;                /* 0x00000004000e7825 */
                                                                                    /* 0x008fc600078e020c */
            /*0270*/                   STG.E [R2.64], R19 ;                         /* 0x0000001302007986 */
                                                                                    /* 0x0003e8000c101904 */
            /*0280*/                   LDG.E R16, [R14.64] ;                        /* 0x000000040e107981 */
                                                                                    /* 0x000ea8000c1e1900 */
            /*0290*/                   LDG.E R17, [R4.64+-0x4] ;                    /* 0xfffffc0404117981 */
                                                                                    /* 0x000ea4000c1e1900 */
            /*02a0*/                   IMAD R21, R16, R17, R19 ;                    /* 0x0000001110157224 */
                                                                                    /* 0x004fc400078e0213 */
            /*02b0*/                   IMAD.WIDE R16, R0, 0x4, R14 ;                /* 0x0000000400107825 */
                                                                                    /* 0x000fc600078e020e */
            /*02c0*/                   STG.E [R2.64], R21 ;                         /* 0x0000001502007986 */
                                                                                    /* 0x0005e8000c101904 */
            /*02d0*/                   LDG.E R18, [R16.64] ;                        /* 0x0000000410127981 */
                                                                                    /* 0x000ee8000c1e1900 */
            /*02e0*/                   LDG.E R12, [R4.64] ;                         /* 0x00000004040c7981 */
                                                                                    /* 0x000ee4000c1e1900 */
            /*02f0*/                   IMAD R23, R18, R12, R21 ;                    /* 0x0000000c12177224 */
                                                                                    /* 0x008fc400078e0215 */
            /*0300*/                   IMAD.WIDE R12, R0, 0x4, R16 ;                /* 0x00000004000c7825 */
                                                                                    /* 0x000fc600078e0210 */
            /*0310*/                   STG.E [R2.64], R23 ;                         /* 0x0000001702007986 */
                                                                                    /* 0x0007e8000c101904 */
            /*0320*/                   LDG.E R18, [R12.64] ;                        /* 0x000000040c127981 */
                                                                                    /* 0x000e68000c1e1900 */
            /*0330*/                   LDG.E R14, [R4.64+0x4] ;                     /* 0x00000404040e7981 */
                                                                                    /* 0x000e64000c1e1900 */
            /*0340*/                   IMAD R19, R18, R14, R23 ;                    /* 0x0000000e12137224 */
                                                                                    /* 0x002fc400078e0217 */
            /*0350*/                   IMAD.WIDE R14, R0, 0x4, R12 ;                /* 0x00000004000e7825 */
                                                                                    /* 0x000fc600078e020c */
            /*0360*/                   STG.E [R2.64], R19 ;                         /* 0x0000001302007986 */
                                                                                    /* 0x0003e8000c101904 */
            /*0370*/                   LDG.E R18, [R14.64] ;                        /* 0x000000040e127981 */
                                                                                    /* 0x000ea8000c1e1900 */
            /*0380*/                   LDG.E R16, [R4.64+0x8] ;                     /* 0x0000080404107981 */
                                                                                    /* 0x000ea4000c1e1900 */
            /*0390*/                   IMAD R21, R18, R16, R19 ;                    /* 0x0000001012157224 */
                                                                                    /* 0x004fc400078e0213 */
            /*03a0*/                   IMAD.WIDE R16, R0, 0x4, R14 ;                /* 0x0000000400107825 */
                                                                                    /* 0x000fc600078e020e */
            /*03b0*/                   STG.E [R2.64], R21 ;                         /* 0x0000001502007986 */
                                                                                    /* 0x0005e8000c101904 */
            /*03c0*/                   LDG.E R18, [R16.64] ;                        /* 0x0000000410127981 */
                                                                                    /* 0x000ee8000c1e1900 */
            /*03d0*/                   LDG.E R12, [R4.64+0xc] ;                     /* 0x00000c04040c7981 */
                                                                                    /* 0x000ee4000c1e1900 */
            /*03e0*/                   IMAD R23, R18, R12, R21 ;                    /* 0x0000000c12177224 */
                                                                                    /* 0x008fc400078e0215 */
            /*03f0*/                   IMAD.WIDE R12, R0, 0x4, R16 ;                /* 0x00000004000c7825 */
                                                                                    /* 0x000fc600078e0210 */
            /*0400*/                   STG.E [R2.64], R23 ;                         /* 0x0000001702007986 */
                                                                                    /* 0x0007e8000c101904 */
            /*0410*/                   LDG.E R18, [R12.64] ;                        /* 0x000000040c127981 */
                                                                                    /* 0x000e68000c1e1900 */
            /*0420*/                   LDG.E R14, [R4.64+0x10] ;                    /* 0x00001004040e7981 */
                                                                                    /* 0x000e64000c1e1900 */
            /*0430*/                   IMAD R19, R18, R14, R23 ;                    /* 0x0000000e12137224 */
                                                                                    /* 0x002fc400078e0217 */
            /*0440*/                   IMAD.WIDE R14, R0, 0x4, R12 ;                /* 0x00000004000e7825 */
                                                                                    /* 0x000fc600078e020c */
            /*0450*/                   STG.E [R2.64], R19 ;                         /* 0x0000001302007986 */
                                                                                    /* 0x0003e8000c101904 */
            /*0460*/                   LDG.E R18, [R14.64] ;                        /* 0x000000040e127981 */
                                                                                    /* 0x000ea8000c1e1900 */
            /*0470*/                   LDG.E R16, [R4.64+0x14] ;                    /* 0x0000140404107981 */
                                                                                    /* 0x000ea4000c1e1900 */
            /*0480*/                   IMAD R21, R18, R16, R19 ;                    /* 0x0000001012157224 */
                                                                                    /* 0x004fc400078e0213 */
            /*0490*/                   IMAD.WIDE R16, R0, 0x4, R14 ;                /* 0x0000000400107825 */
                                                                                    /* 0x000fc600078e020e */
            /*04a0*/                   STG.E [R2.64], R21 ;                         /* 0x0000001502007986 */
                                                                                    /* 0x0005e8000c101904 */
            /*04b0*/                   LDG.E R18, [R16.64] ;                        /* 0x0000000410127981 */
                                                                                    /* 0x000ee8000c1e1900 */
            /*04c0*/                   LDG.E R12, [R4.64+0x18] ;                    /* 0x00001804040c7981 */
                                                                                    /* 0x000ee4000c1e1900 */
            /*04d0*/                   IMAD R23, R18, R12, R21 ;                    /* 0x0000000c12177224 */
                                                                                    /* 0x008fc400078e0215 */
            /*04e0*/                   IMAD.WIDE R12, R0, 0x4, R16 ;                /* 0x00000004000c7825 */
                                                                                    /* 0x000fc600078e0210 */
            /*04f0*/                   STG.E [R2.64], R23 ;                         /* 0x0000001702007986 */
                                                                                    /* 0x0007e8000c101904 */
            /*0500*/                   LDG.E R18, [R12.64] ;                        /* 0x000000040c127981 */
                                                                                    /* 0x000e68000c1e1900 */
            /*0510*/                   LDG.E R14, [R4.64+0x1c] ;                    /* 0x00001c04040e7981 */
                                                                                    /* 0x000e64000c1e1900 */
            /*0520*/                   IMAD R19, R18, R14, R23 ;                    /* 0x0000000e12137224 */
                                                                                    /* 0x002fc400078e0217 */
            /*0530*/                   IMAD.WIDE R14, R0, 0x4, R12 ;                /* 0x00000004000e7825 */
                                                                                    /* 0x000fc600078e020c */
            /*0540*/                   STG.E [R2.64], R19 ;                         /* 0x0000001302007986 */
                                                                                    /* 0x000fe8000c101904 */
            /*0550*/                   LDG.E R18, [R14.64] ;                        /* 0x000000040e127981 */
                                                                                    /* 0x000ea8000c1e1900 */
            /*0560*/                   LDG.E R16, [R4.64+0x20] ;                    /* 0x0000200404107981 */
                                                                                    /* 0x000ea4000c1e1900 */
            /*0570*/                   IMAD R21, R18, R16, R19 ;                    /* 0x0000001012157224 */
                                                                                    /* 0x004fc400078e0213 */
            /*0580*/                   IMAD.WIDE R16, R0, 0x4, R14 ;                /* 0x0000000400107825 */
                                                                                    /* 0x000fc600078e020e */
            /*0590*/                   STG.E [R2.64], R21 ;                         /* 0x0000001502007986 */
                                                                                    /* 0x0003e8000c101904 */
            /*05a0*/                   LDG.E R18, [R16.64] ;                        /* 0x0000000410127981 */
                                                                                    /* 0x000ee8000c1e1900 */
            /*05b0*/                   LDG.E R12, [R4.64+0x24] ;                    /* 0x00002404040c7981 */
                                                                                    /* 0x000ee4000c1e1900 */
            /*05c0*/                   IMAD R23, R18, R12, R21 ;                    /* 0x0000000c12177224 */
                                                                                    /* 0x008fc400078e0215 */
            /*05d0*/                   IMAD.WIDE R12, R0, 0x4, R16 ;                /* 0x00000004000c7825 */
                                                                                    /* 0x000fc600078e0210 */
            /*05e0*/                   STG.E [R2.64], R23 ;                         /* 0x0000001702007986 */
                                                                                    /* 0x0005e8000c101904 */
            /*05f0*/                   LDG.E R18, [R12.64] ;                        /* 0x000000040c127981 */
                                                                                    /* 0x000ee8000c1e1900 */
            /*0600*/                   LDG.E R14, [R4.64+0x28] ;                    /* 0x00002804040e7981 */
                                                                                    /* 0x000ee4000c1e1900 */
            /*0610*/                   IMAD R25, R18, R14, R23 ;                    /* 0x0000000e12197224 */
                                                                                    /* 0x008fc400078e0217 */
            /*0620*/                   IMAD.WIDE R14, R0, 0x4, R12 ;                /* 0x00000004000e7825 */
                                                                                    /* 0x000fc600078e020c */
            /*0630*/                   STG.E [R2.64], R25 ;                         /* 0x0000001902007986 */
                                                                                    /* 0x0007e8000c101904 */
            /*0640*/                   LDG.E R18, [R14.64] ;                        /* 0x000000040e127981 */
                                                                                    /* 0x000e68000c1e1900 */
            /*0650*/                   LDG.E R16, [R4.64+0x2c] ;                    /* 0x00002c0404107981 */
                                                                                    /* 0x000e64000c1e1900 */
            /*0660*/                   IMAD R21, R18, R16, R25 ;                    /* 0x0000001012157224 */
                                                                                    /* 0x002fc400078e0219 */
            /*0670*/                   IMAD.WIDE R16, R0, 0x4, R14 ;                /* 0x0000000400107825 */
                                                                                    /* 0x000fc600078e020e */
            /*0680*/                   STG.E [R2.64], R21 ;                         /* 0x0000001502007986 */
                                                                                    /* 0x0007e8000c101904 */
            /*0690*/                   LDG.E R18, [R16.64] ;                        /* 0x0000000410127981 */
                                                                                    /* 0x000ea8000c1e1900 */
            /*06a0*/                   LDG.E R12, [R4.64+0x30] ;                    /* 0x00003004040c7981 */
                                                                                    /* 0x000ea2000c1e1900 */
            /*06b0*/                   IADD3 R10, R10, -0x10, RZ ;                  /* 0xfffffff00a0a7810 */
                                                                                    /* 0x000fe20007ffe0ff */
            /*06c0*/                   IMAD R23, R18, R12, R21 ;                    /* 0x0000000c12177224 */
                                                                                    /* 0x004fc400078e0215 */
            /*06d0*/                   IMAD.WIDE R18, R0, 0x4, R16 ;                /* 0x0000000400127825 */
                                                                                    /* 0x000fc600078e0210 */
            /*06e0*/                   STG.E [R2.64], R23 ;                         /* 0x0000001702007986 */
                                                                                    /* 0x0007e8000c101904 */
            /*06f0*/                   LDG.E R12, [R18.64] ;                        /* 0x00000004120c7981 */
                                                                                    /* 0x000ea8000c1e1900 */
            /*0700*/                   LDG.E R15, [R4.64+0x34] ;                    /* 0x00003404040f7981 */
                                                                                    /* 0x0002a2000c1e1900 */
            /*0710*/                   ISETP.GT.AND P1, PT, R10, 0xc, PT ;          /* 0x0000000c0a00780c */
                                                                                    /* 0x000fe40003f24270 */
            /*0720*/                   IADD3 R14, P2, R4, 0x40, RZ ;                /* 0x00000040040e7810 */
                                                                                    /* 0x000fc40007f5e0ff */
            /*0730*/                   IADD3 R11, R11, 0x10, RZ ;                   /* 0x000000100b0b7810 */
                                                                                    /* 0x000fe40007ffe0ff */
            /*0740*/                   IADD3.X R5, RZ, R5, RZ, P2, !PT ;            /* 0x00000005ff057210 */
                                                                                    /* 0x002fe400017fe4ff */
            /*0750*/                   MOV R4, R14 ;                                /* 0x0000000e00047202 */
                                                                                    /* 0x000fe20000000f00 */
            /*0760*/                   IMAD R15, R12, R15, R23 ;                    /* 0x0000000f0c0f7224 */
                                                                                    /* 0x004fe400078e0217 */
            /*0770*/                   IMAD.WIDE R12, R0, 0x4, R18 ;                /* 0x00000004000c7825 */
                                                                                    /* 0x000fc600078e0212 */
            /*0780*/                   STG.E [R2.64], R15 ;                         /* 0x0000000f02007986 */
                                                                                    /* 0x0007e2000c101904 */
            /*0790*/               @P1 BRA 0x230 ;                                  /* 0xfffffa9000001947 */
                      
    

    At the tail end of the above listing, you will find a sequence of instructions that repeats, roughly like this:

    LDG  // load A element
    LDG  // load B element
    IMAD // 64-bit
    IMAD // integer multiply of A and B
    STG  // store C element
    

    That repeating sequence represents the unrolled loop body. If you use the cuobjdump tool to study the -G code, you will find: 1. a fewer number of instructions overall, 2. no repeating sequence as indicated above.

    I acknowledge this answer does not provide a detailed, precise description of the reason for the increased register use in the optimized case. That would require more careful analysis and counting, as well as probably some conjecture about compiler behavior.

    Loop unrolling by itself does not necessarily/automatically imply increased register usage, but the two are often related.