Compiling CUDA samples with nvcc from Windows PowerShell

I'm trying to compile a simple example from NVIDIA's cuda-samples repository on GitHub using Windows PowerShell¹. This is my nvcc version:

$ nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2024 NVIDIA Corporation
Built on Fri_Jun_14_16:44:19_Pacific_Daylight_Time_2024
Cuda compilation tools, release 12.6, V12.6.20
Build cuda_12.6.r12.6/compiler.34431801_0

When I try to compile the example, I get the following error:

$ git clone https://github.com/NVIDIA/cuda-samples.git
$ cd .\cuda-samples\Samples\0_Introduction\simpleAssert\
$ nvcc -I..\..\..\Common\ .\simpleAssert.cu
simpleAssert.cu
nvcc error   : 'cudafe++' died with status 0xC0000005 (ACCESS_VIOLATION)

This error has been reported on the NVIDIA developer forums, but it remains unresolved there.

¹ I have an NVIDIA GeForce RTX 3050 (4 GB) Laptop GPU.

Solution

I'm adding an answer that builds on Stefan's, with more details. I took Stefan's advice and looked at the command line that Visual Studio generates for a new CUDA project. Here it is in its entirety (it is very long):

"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.6\bin\nvcc.exe" -gencode=arch=compute_52,code=\"sm_52,compute_52\" --use-local-env -ccbin "C:\Program Files\Microsoft Visual Studio\2022\Community\VC\Tools\MSVC\14.38.33130\bin\HostX64\x64" -x cu   -I"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.6\include" -I"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.6\include"  -G   --keep-dir x64\Debug  -maxrregcount=0   --machine 64 --compile -cudart static  -g  -DWIN32 -DWIN64 -D_DEBUG -D_CONSOLE -D_MBCS -Xcompiler "/EHsc /W3 /nologo /Od /FS /Zi /RTC1 /MDd " -Xcompiler "/Fdx64\Debug\vc143.pdb" -o C:\Users\tuna_\source\repos\CudaRuntime2\x64\Debug\kernel.cu.obj "C:\Users\tuna_\source\repos\CudaRuntime2\kernel.cu"

I removed the flags that seemed irrelevant one by one until I found that the -ccbin flag, which points nvcc at the 64-bit MSVC host compiler, is the one that makes the difference¹:

$ nvcc -ccbin C:\'Program Files'\'Microsoft Visual Studio'\2022\Community\VC\Tools\MSVC\14.38.33130\bin\HostX64\x64 -I..\..\..\Common\ .\simpleAssert.cu
simpleAssert.cu
tmpxft_0000a6b8_00000000-10_simpleAssert.cudafe1.cpp
   Creating library a.lib and object a.exp

Compilation succeeds (a.exe is generated), and running it seems fine² too:

$ .\a.exe
simpleAssert starting...
GPU Device 0: "Ampere" with compute capability 8.6

Launch kernel to generate assertion failures

-- Begin assert output

C:\Users\tuna_\GitHub\cuda-samples\Samples\0_Introduction\simpleAssert\simpleAssert.cu:63: block: [1,0,0], thread: [28,0,0] Assertion `gtid < N` failed.
C:\Users\tuna_\GitHub\cuda-samples\Samples\0_Introduction\simpleAssert\simpleAssert.cu:63: block: [1,0,0], thread: [29,0,0] Assertion `gtid < N` failed.
C:\Users\tuna_\GitHub\cuda-samples\Samples\0_Introduction\simpleAssert\simpleAssert.cu:63: block: [1,0,0], thread: [30,0,0] Assertion `gtid < N` failed.
C:\Users\tuna_\GitHub\cuda-samples\Samples\0_Introduction\simpleAssert\simpleAssert.cu:63: block: [1,0,0], thread: [31,0,0] Assertion `gtid < N` failed.

-- End assert output

Device assert failed as expected, CUDA error message is: device-side assert triggered

simpleAssert completed, returned OK
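
The -ccbin path above hard-codes the MSVC toolset version (14.38.33130), so it stops working once Visual Studio updates the toolset. Below is a minimal Python sketch of how the command could be assembled without typing the version by hand. It assumes the standard Visual Studio directory layout seen in the path above. The helper names `newest_msvc_bin` and `build_nvcc_command` are hypothetical, and choosing the highest-numbered toolset is my own assumption: nvcc may reject a host compiler that is newer than the CUDA release supports.

```python
from pathlib import Path


def newest_msvc_bin(vs_root):
    """Return the HostX64/x64 bin directory of the highest-versioned MSVC toolset.

    vs_root is the Visual Studio install directory, e.g.
    C:/Program Files/Microsoft Visual Studio/2022/Community (assumed layout).
    """
    tools = Path(vs_root) / "VC" / "Tools" / "MSVC"
    # Keep only directories whose names look like dotted version numbers.
    versions = [
        d for d in tools.iterdir()
        if d.is_dir() and all(part.isdigit() for part in d.name.split("."))
    ]
    if not versions:
        raise FileNotFoundError(f"no MSVC toolset found under {tools}")
    # Compare versions numerically, so that 14.38 sorts above 14.9.
    newest = max(versions, key=lambda d: tuple(int(p) for p in d.name.split(".")))
    return newest / "bin" / "HostX64" / "x64"


def build_nvcc_command(ccbin, source, include_dirs=()):
    """Build the nvcc argument list with an explicit 64-bit host compiler."""
    cmd = ["nvcc", "-ccbin", str(ccbin)]
    cmd += [f"-I{inc}" for inc in include_dirs]
    cmd.append(source)
    return cmd
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` from the sample's directory. Passing the arguments as a list also sidesteps the quoting needed for the spaces in "Program Files".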

¹ This is consistent with Stefan's answer about the x64 vs. x86 issue.

² The only thing that bothers me a bit is that PowerShell hangs after each run.