Compute bit parity in CUDA

CUDA has popcount intrinsics for 32-bit and 64-bit types: __popc() and __popcll().

Does CUDA also have intrinsics to get the parity of 32-bit and 64-bit types? (Parity refers to whether an integer has an even or odd number of 1-bits.)

For example, GCC has __builtin_parityl() for unsigned long (64 bits wide on LP64 platforms) and __builtin_parityll() for unsigned long long.

And here's a C function that does the same thing:

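#include <stdint.h>

/* Returns 1 if n has an odd number of 1-bits, 0 otherwise. */
static inline unsigned parity64(uint64_t n){
  n ^= n >> 1;  /* bit 0 of each pair now holds the pair's parity */
  n ^= n >> 2;  /* bit 0 of each nibble now holds the nibble's parity */
  n  = (n & 0x1111111111111111ULL) * 0x1111111111111111ULL;  /* sum the nibble parities into the top nibble */
  return (n >> 60) & 1;  /* low bit of that sum is the overall parity */
}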

Solution

I'm not aware of a parity intrinsic for CUDA.

However, you should be able to create a fairly simple function to do it using either the __popc() (32-bit unsigned case) or __popcll() (64-bit unsigned case) intrinsic.

For example, the following function should indicate whether the number of 1 bits in a 64-bit unsigned quantity is odd (true) or even (false):

__device__ bool my_parity(unsigned long long d){
  return (__popcll(d) & 1);
}
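
The same idea extends to the 32-bit case with __popc(). Below is a minimal sketch, not a definitive implementation, of wrappers that can be called from both host and device code. The names parity_u32, parity_u64 and the HD macro are hypothetical helpers introduced here for illustration; they are not part of CUDA. The sketch assumes the file is compiled either by nvcc or by an ordinary C++ compiler: in device code it uses the popcount intrinsics, and in host code it falls back to a portable XOR-fold.

```cpp
#include <cstdint>

// HD is a hypothetical convenience macro (an assumption of this sketch):
// it expands to the CUDA qualifiers only when compiling with nvcc.
#ifdef __CUDACC__
#define HD __host__ __device__
#else
#define HD
#endif

// True if x has an odd number of 1 bits.
HD inline bool parity_u32(std::uint32_t x) {
#ifdef __CUDA_ARCH__
    return __popc(x) & 1;   // device path: low bit of the population count
#else
    x ^= x >> 16;           // host path: fold the word onto itself with XOR
    x ^= x >> 8;
    x ^= x >> 4;
    x ^= x >> 2;
    x ^= x >> 1;
    return x & 1u;          // bit 0 now holds the parity of all 32 bits
#endif
}

// True if x has an odd number of 1 bits.
HD inline bool parity_u64(std::uint64_t x) {
#ifdef __CUDA_ARCH__
    return __popcll(x) & 1; // device path: 64-bit population count
#else
    // host path: XOR the two halves, then reuse the 32-bit fold
    return parity_u32(static_cast<std::uint32_t>(x ^ (x >> 32)));
#endif
}
```

Inside a kernel, a call such as parity_u64(value) takes the __popcll() path, while the host fallback makes it possible to check results on the CPU against the same function names.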
