Search code examples
What does %f, %rd mean in ptx assembly...


assembly opencl cpu-registers ptx

How to compile cuda code with calling one function twice inside one method?...


cuda ptx

Why does addition without overflow set CC.CF to 1?...


cuda ptx

What does thread-count mean for bar.arrive PTX barrier synchronization instruction?...


cuda ptx

PTX - what is a CTA?...


cuda nvidia gpu ptx

Why does PTX show 32 bit load operation for a 128 bit struct assignment?...


cuda gpu ptx

In asm volatile inline PTX instructions, why also specify "memory" side effects?...


cuda language-lawyer inline-assembly redundancy ptx

Why is this NVIDIA CUDA PTX not working as intended?...


c++ cuda ptx

Differences between NVCC and NVRTC on compilation to PTX...


c++ cuda ptx nvrtc

LLVM IR of OpenCL kernel to PTX to binary...


clang opencl llvm ptx

How to pass compiler flags to nvcc from clang...


c++ cuda cross-compiling clang++ ptx

Understanding cuobjdump output...


linux cuda gpu nvcc ptx

What is the correct way to support `__shfl()` and `__shfl_sync()` instructions?...


cuda ptx ptxas

What can I use instead of LOP3 instructions for working with uint64_t data types and do 3 operand logic...


cuda nvidia bitwise-operators logical-operators ptx

Raise x to power of y in ptx nvidia cuda (assembly)...


cuda nvidia ptx

Some intrinsics named with `_sync()` appended in CUDA 9; semantics same?...


cuda ptx gpu-warp

How do I check for overflow of integer arithmetic in CUDA?...


cuda integer-overflow ptx

Compile constant memory array to immediate value in CUDA...


cuda gpu ptx

Optimizing register usage in dot product...


cuda ptx

Linking a kernel to a PTX function...


c++ cuda ptx

CUDA device properties and compute capability when compiling...


cuda nvcc ptx compute-capability

Funnel shift - what is it?...


cuda intrinsics ptx

Can I prefetch specific data to a specific cache level in a CUDA kernel?...


caching cuda gpgpu prefetch ptx

Can my kernel code tell how much shared memory it has available?...


cuda gpgpu ptx gpu-shared-memory

Is inline PTX more efficient than C/C++ code?...


optimization cuda ptx

Should I look into PTX to optimize my kernel? If so, how?...


performance cuda gpgpu ptx loop-unrolling

How to explain inline PTX Internal Compiler Error of CUDA...


c++ cuda ptx

How to understand the result of SASS analysis in CUDA/GPU...


assembly cuda gpu ptx

Loading a PTX programmatically returns error 209 when run against device with CUDA capability 5.0...


cuda gpu ptx

CUDA disable L1 cache only for one variable...


caching assembly cuda cpu-cache ptx
