NVIDIA CUDA : Install
2024/02/21

	Install NVIDIA CUDA (Compute Unified Device Architecture).
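[1]	Run PowerShell with administrator privileges for all of the following steps. First, download and install a C++ compiler (Visual Studio 2022 Build Tools), because nvcc uses the MSVC compiler cl.exe as its host compiler on Windows.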

Windows PowerShell
Copyright (C) Microsoft Corporation. All rights reserved.

PS C:\Users\Administrator> Invoke-WebRequest -Uri https://aka.ms/vs/17/release/vs_BuildTools.exe -OutFile "vs_BuildTools.exe" 

# install in silent mode
PS C:\Users\Administrator> ./vs_BuildTools.exe `
--add Microsoft.Component.MSBuild `
--add Microsoft.VisualStudio.Component.CoreBuildTools `
--add Microsoft.VisualStudio.Component.VC.CoreBuildTools `
--add Microsoft.VisualStudio.Component.VC.Tools.x86.x64 `
--add Microsoft.VisualStudio.Component.VC.Redist.14.Latest `
--add Microsoft.VisualStudio.Component.VC.CoreIde `
--add Microsoft.VisualStudio.Component.Windows11SDK.22621 `
--add Microsoft.VisualStudio.ComponentGroup.NativeDesktop.Core `
--add Microsoft.VisualStudio.Workload.MSBuildTools `
--add Microsoft.VisualStudio.Workload.VCTools `
--includeRecommended --quiet --wait 

# the installer processes are running while installation is in progress
PS C:\Users\Administrator> Get-Process -Name "vs_*", "setup*" 

Handles  NPM(K)    PM(K)      WS(K)     CPU(s)     Id  SI ProcessName
-------  ------    -----      -----     ------     --  -- -----------
    369      17     3460      15508       0.66   3132   0 vs_BuildTools
    890      64    30168      61180       8.03   4048   0 vs_setup_bootstrapper

# once installation has finished, the processes above exit and nothing is listed
PS C:\Users\Administrator> Get-Process -Name "vs_*" 


# confirm where the C++ compiler (cl.exe) was installed
PS C:\Users\Administrator> Get-ChildItem "C:\Program Files (x86)\Microsoft Visual Studio\2022\BuildTools\VC\Tools\MSVC\*\bin\Hostx64\x64\cl.exe" 

    Directory: C:\Program Files (x86)\Microsoft Visual Studio\2022\BuildTools\VC\Tools\MSVC\14.39.33519\bin\Hostx64\x64


Mode                 LastWriteTime         Length Name
----                 -------------         ------ ----
-a----         2/20/2024   5:00 PM         843248 cl.exe
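The next step hard-codes the toolset directory 14.39.33519, which changes whenever Build Tools is updated. The sketch below, assuming Python 3 is available (this guide does not install it), picks the newest toolset directory that actually contains cl.exe by comparing version numbers numerically. The function name find_latest_cl_dir is hypothetical.

```python
from pathlib import Path
from typing import Optional

def find_latest_cl_dir(msvc_root: str) -> Optional[Path]:
    """Return <version>\\bin\\Hostx64\\x64 for the newest MSVC toolset that contains cl.exe."""
    root = Path(msvc_root)
    if not root.is_dir():
        return None
    candidates = []
    for child in root.iterdir():
        parts = child.name.split(".")
        # Only version-like names such as 14.39.33519 are toolset directories
        if not (child.is_dir() and all(p.isdigit() for p in parts)):
            continue
        bin_dir = child / "bin" / "Hostx64" / "x64"
        if (bin_dir / "cl.exe").is_file():
            # Compare numerically so that 14.39 sorts above 14.4
            candidates.append((tuple(int(p) for p in parts), bin_dir))
    if not candidates:
        return None
    return max(candidates, key=lambda c: c[0])[1]

if __name__ == "__main__":
    msvc = r"C:\Program Files (x86)\Microsoft Visual Studio\2022\BuildTools\VC\Tools\MSVC"
    print(find_latest_cl_dir(msvc))
```

Numeric comparison matters because a plain string sort would rank "14.4" above "14.39".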

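# append the cl.exe directory to the system-wide (Machine) Path environment variable
# (replace 14.39.33519 with the version directory shown above)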
PS C:\Users\Administrator> $currentPath = [Environment]::GetEnvironmentVariable("Path", "Machine") 
PS C:\Users\Administrator> $currentPath += ";C:\Program Files (x86)\Microsoft Visual Studio\2022\BuildTools\VC\Tools\MSVC\14.39.33519\bin\Hostx64\x64" 
PS C:\Users\Administrator> [Environment]::SetEnvironmentVariable("Path", $currentPath, "Machine") 
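Running the three commands above a second time appends the same directory to Path again. As a sketch, assuming Python 3 is available, this is the check you could apply to the Path string before writing it back; add_path_entry is a hypothetical helper, not a Windows or PowerShell API. It treats entries case-insensitively and ignores a trailing backslash, as Windows does when resolving directories.

```python
def add_path_entry(path_value: str, entry: str) -> str:
    """Append entry to a ';'-separated Windows Path string unless it is already present.

    Empty segments (e.g. from ';;') are dropped from the result.
    """
    def norm(p: str) -> str:
        # Windows paths are case-insensitive; a trailing backslash names the same directory
        return p.strip().rstrip("\\").lower()

    parts = [p for p in path_value.split(";") if p.strip()]
    if any(norm(p) == norm(entry) for p in parts):
        return ";".join(parts)
    return ";".join(parts + [entry])

if __name__ == "__main__":
    current = r"C:\Windows\system32;C:\Windows"
    cl_dir = r"C:\Program Files (x86)\Microsoft Visual Studio\2022\BuildTools\VC\Tools\MSVC\14.39.33519\bin\Hostx64\x64"
    updated = add_path_entry(current, cl_dir)
    # A second call leaves the value unchanged, so re-running the step is harmless
    print(updated == add_path_entry(updated, cl_dir))
```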

# reload Path into the current session so the change takes effect without restarting PowerShell
PS C:\Users\Administrator> $env:Path = [System.Environment]::GetEnvironmentVariable("Path","Machine") + ";" + [System.Environment]::GetEnvironmentVariable("Path","User") 

PS C:\Users\Administrator> cl.exe 
Microsoft (R) C/C++ Optimizing Compiler Version 19.39.33519 for x64
Copyright (C) Microsoft Corporation.  All rights reserved.

usage: cl [ option... ] filename... [ /link linkoption... ]
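Running cl.exe with no arguments prints a version banner. The compiler version 19.39 corresponds to the 14.39 toolset directory used above. A sketch, assuming Python 3 is available, that extracts the version and target architecture from that banner so a script can confirm the expected compiler is on Path (parse_cl_banner is a hypothetical name). It reads both output streams so it does not depend on which one the banner is written to.

```python
import re
import shutil
import subprocess
from typing import Optional, Tuple

BANNER_RE = re.compile(r"Compiler Version (\d+)\.(\d+)\.(\d+)(?:\.\d+)? for (\S+)")

def parse_cl_banner(banner: str) -> Optional[Tuple[Tuple[int, int, int], str]]:
    """Return ((major, minor, build), target) from cl.exe's banner, or None if absent."""
    m = BANNER_RE.search(banner)
    if m is None:
        return None
    major, minor, build = (int(g) for g in m.groups()[:3])
    return (major, minor, build), m.group(4)

if __name__ == "__main__":
    if shutil.which("cl.exe"):
        r = subprocess.run(["cl.exe"], capture_output=True, text=True)
        print(parse_cl_banner(r.stdout + r.stderr))
```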

[2]	Download and install CUDA. Check which CUDA version you want to install on the official archive site below. ⇒ https://developer.nvidia.com/cuda-toolkit-archive

PS C:\Users\Administrator> Invoke-WebRequest -Uri "https://developer.download.nvidia.com/compute/cuda/12.3.2/local_installers/cuda_12.3.2_546.12_windows.exe" -OutFile "cuda_12.3.2_546.12_windows.exe" 
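The installer file name combines the CUDA release (12.3.2) with the bundled driver version (546.12). The sketch below, assuming Python 3 is available, rebuilds the URL from those two values. The pattern is inferred from this single example and is an assumption; other releases may be named differently, so copy the exact link from the archive page whenever possible. cuda_installer_url is a hypothetical helper.

```python
def cuda_installer_url(cuda_version: str, driver_version: str) -> str:
    """Build a Windows local-installer URL in the same shape as the 12.3.2 example (assumed pattern)."""
    filename = f"cuda_{cuda_version}_{driver_version}_windows.exe"
    return ("https://developer.download.nvidia.com/compute/cuda/"
            f"{cuda_version}/local_installers/{filename}")

if __name__ == "__main__":
    url = cuda_installer_url("12.3.2", "546.12")
    # The -OutFile name used above is the last path segment of the URL
    print(url.rsplit("/", 1)[1])
```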

# install in silent mode
PS C:\Users\Administrator> ./cuda_12.3.2_546.12_windows.exe -s 

# the installer processes are running while installation is in progress
PS C:\Users\Administrator> Get-Process -Name "cuda*", "setup*" 

Handles  NPM(K)    PM(K)      WS(K)     CPU(s)     Id  SI ProcessName
-------  ------    -----      -----     ------     --  -- -----------
    318      20     2912      15564     149.67   3972   0 cuda_12.3.2_546.12_windows
    471      30    36488      63616     158.52   3524   0 setup

# once installation has finished, the processes above exit and nothing is listed
PS C:\Users\Administrator> Get-Process -Name "cuda*", "setup*" 
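The silent installer returns control before setup has finished, which is why the transcript re-runs Get-Process by hand until the processes disappear. A generic polling helper, sketched in Python under the assumption that Python 3 is available, automates that wait. The names wait_until and done are hypothetical; on Windows, done would wrap whatever process check you prefer, and that check is left to you.

```python
import time
from typing import Callable

def wait_until(done: Callable[[], bool], timeout_s: float = 3600, interval_s: float = 10) -> bool:
    """Call done() every interval_s seconds until it returns True (-> True) or timeout_s elapses (-> False)."""
    deadline = time.monotonic() + timeout_s  # monotonic clock is unaffected by system time changes
    while True:
        if done():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval_s)
```

Returning False on timeout, rather than raising, lets the caller decide whether a stuck installer is an error.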


# reload Path into the current session so the change takes effect without restarting PowerShell
PS C:\Users\Administrator> $env:Path = [System.Environment]::GetEnvironmentVariable("Path","Machine") + ";" + [System.Environment]::GetEnvironmentVariable("Path","User") 

PS C:\Users\Administrator> nvcc --version 
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2023 NVIDIA Corporation
Built on Wed_Nov_22_10:30:42_Pacific_Standard_Time_2023
Cuda compilation tools, release 12.3, V12.3.107
Build cuda_12.3.r12.3/compiler.33567101_0
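The "release 12.3, V12.3.107" line is the part of the nvcc output worth checking in a script. A sketch, assuming Python 3 is available, that extracts the major/minor release and the full version string (parse_nvcc_version is a hypothetical name):

```python
import re
import shutil
import subprocess
from typing import Optional, Tuple

RELEASE_RE = re.compile(r"release (\d+)\.(\d+), V(\d+(?:\.\d+)*)")

def parse_nvcc_version(output: str) -> Optional[Tuple[Tuple[int, int], str]]:
    """Return ((major, minor), full_version) from `nvcc --version` output, or None."""
    m = RELEASE_RE.search(output)
    if m is None:
        return None
    return (int(m.group(1)), int(m.group(2))), m.group(3)

if __name__ == "__main__":
    if shutil.which("nvcc"):
        out = subprocess.run(["nvcc", "--version"], capture_output=True, text=True).stdout
        print(parse_nvcc_version(out))
```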

PS C:\Users\Administrator> nvidia-smi 
Tue Feb 20 17:39:49 2024
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 546.12                 Driver Version: 546.12       CUDA Version: 12.3     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                     TCC/WDDM  | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  NVIDIA GeForce RTX 3060      WDDM  | 00000000:07:00.0 Off |                  N/A |
|  0%   40C    P8               9W / 170W |     21MiB / 12288MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+

+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
|    0   N/A  N/A       800    C+G   C:\Windows\System32\dwm.exe               N/A      |
|    0   N/A  N/A       964    C+G   C:\Windows\System32\LogonUI.exe           N/A      |
+---------------------------------------------------------------------------------------+
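The "CUDA Version" in the nvidia-smi header is the newest CUDA version the installed driver supports, not necessarily the toolkit version; here both are 12.3. A sketch, assuming Python 3 is available, reads the header and applies a conservative check that the toolkit release (from nvcc) does not exceed what the driver reports. NVIDIA's compatibility rules allow more than this simple comparison in some cases, so treat a False result as "check the compatibility documentation", not as a definite failure. The function names are hypothetical.

```python
import re
import shutil
import subprocess
from typing import Optional, Tuple

def parse_smi_header(text: str) -> Optional[Tuple[str, Tuple[int, int]]]:
    """Return (driver_version, max_cuda_version) from nvidia-smi's header, or None."""
    driver = re.search(r"Driver Version:\s*([\d.]+)", text)
    cuda = re.search(r"CUDA Version:\s*(\d+)\.(\d+)", text)
    if not (driver and cuda):
        return None
    return driver.group(1), (int(cuda.group(1)), int(cuda.group(2)))

def toolkit_within_driver(toolkit: Tuple[int, int], driver_max_cuda: Tuple[int, int]) -> bool:
    """Conservative check: the toolkit release is not newer than the driver's reported CUDA version."""
    return toolkit <= driver_max_cuda  # tuple comparison orders (major, minor) correctly

if __name__ == "__main__":
    if shutil.which("nvidia-smi"):
        out = subprocess.run(["nvidia-smi"], capture_output=True, text=True).stdout
        print(parse_smi_header(out))
```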

[3]	Build and run the CUDA sample programs to verify the installation.

PS C:\Users\Administrator> Invoke-WebRequest -Uri "https://github.com/NVIDIA/cuda-samples/archive/refs/heads/master.zip" -OutFile "master.zip" 
PS C:\Users\Administrator> Expand-Archive -Path ./master.zip 

PS C:\Users\Administrator> cd ./master/cuda-samples-master/Samples/1_Utilities/deviceQuery 
PS C:\Users\Administrator\master\cuda-samples-master\Samples\1_Utilities\deviceQuery> nvcc -I ../../../Common deviceQuery.cpp -o deviceQuery 
deviceQuery.cpp
   Creating library deviceQuery.lib and object deviceQuery.exp
PS C:\Users\Administrator\master\cuda-samples-master\Samples\1_Utilities\deviceQuery> ./deviceQuery.exe 
C:\Users\Administrator\master\cuda-samples-master\Samples\1_Utilities\deviceQuery\deviceQuery.exe Starting...

 CUDA Device Query (Runtime API) version (CUDART static linking)

Detected 1 CUDA Capable device(s)

Device 0: "NVIDIA GeForce RTX 3060"
  CUDA Driver Version / Runtime Version          12.3 / 12.3
  CUDA Capability Major/Minor version number:    8.6
  Total amount of global memory:                 12288 MBytes (12884377600 bytes)
  (028) Multiprocessors, (128) CUDA Cores/MP:    3584 CUDA Cores
  GPU Max Clock rate:                            1777 MHz (1.78 GHz)
  Memory Clock rate:                             7501 Mhz
  Memory Bus Width:                              192-bit
  L2 Cache Size:                                 2359296 bytes
  Maximum Texture Dimension Size (x,y,z)         1D=(131072), 2D=(131072, 65536), 3D=(16384, 16384, 16384)
  Maximum Layered 1D Texture Size, (num) layers  1D=(32768), 2048 layers
  Maximum Layered 2D Texture Size, (num) layers  2D=(32768, 32768), 2048 layers
  Total amount of constant memory:               65536 bytes
  Total amount of shared memory per block:       49152 bytes
  Total shared memory per multiprocessor:        102400 bytes
  Total number of registers available per block: 65536
  Warp size:                                     32
  Maximum number of threads per multiprocessor:  1536
  Maximum number of threads per block:           1024
  Max dimension size of a thread block (x,y,z): (1024, 1024, 64)
  Max dimension size of a grid size    (x,y,z): (2147483647, 65535, 65535)
  Maximum memory pitch:                          2147483647 bytes
  Texture alignment:                             512 bytes
  Concurrent copy and kernel execution:          Yes with 5 copy engine(s)
  Run time limit on kernels:                     Yes
  Integrated GPU sharing Host Memory:            No
  Support host page-locked memory mapping:       Yes
  Alignment requirement for Surfaces:            Yes
  Device has ECC support:                        Disabled
  CUDA Device Driver Mode (TCC or WDDM):         WDDM (Windows Display Driver Model)
  Device supports Unified Addressing (UVA):      Yes
  Device supports Managed Memory:                Yes
  Device supports Compute Preemption:            Yes
  Supports Cooperative Kernel Launch:            Yes
  Supports MultiDevice Co-op Kernel Launch:      No
  Device PCI Domain ID / Bus ID / location ID:   0 / 7 / 0
  Compute Mode:
     < Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >

deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 12.3, CUDA Runtime Version = 12.3, NumDevs = 1
Result = PASS
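deviceQuery reports the total core count as multiprocessors × cores per multiprocessor (28 × 128 = 3584 here), where the cores-per-multiprocessor value is looked up from the compute capability (8.6). A sketch, assuming Python 3 is available, that pulls those numbers and the PASS/FAIL verdict out of captured deviceQuery output so a script can check the arithmetic and the result (summarize_device_query is a hypothetical name):

```python
import re

CORES_RE = re.compile(r"\((\d+)\) Multiprocessors, \((\d+)\) CUDA Cores/MP:\s+(\d+) CUDA Cores")

def summarize_device_query(output: str) -> dict:
    """Extract core counts and the PASS/FAIL verdict from deviceQuery's text output."""
    summary = {"passed": "Result = PASS" in output}
    m = CORES_RE.search(output)
    if m:
        sms, per_sm, total = (int(g) for g in m.groups())  # int("028") == 28
        summary.update(sms=sms, cores_per_sm=per_sm, total_cores=total,
                       # the printed total should equal SMs x cores-per-SM
                       consistent=(sms * per_sm == total))
    return summary
```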


PS C:\Users\Administrator\master\cuda-samples-master\Samples\1_Utilities\deviceQuery> cd ~/master/cuda-samples-master/Samples/1_Utilities/bandwidthTest 
PS C:\Users\Administrator\master\cuda-samples-master\Samples\1_Utilities\bandwidthTest> nvcc -I ../../../Common bandwidthTest.cu -o bandwidthTest 
bandwidthTest.cu
tmpxft_000015a0_00000000-10_bandwidthTest.cudafe1.cpp
   Creating library bandwidthTest.lib and object bandwidthTest.exp
PS C:\Users\Administrator\master\cuda-samples-master\Samples\1_Utilities\bandwidthTest> ./bandwidthTest.exe 
[CUDA Bandwidth Test] - Starting...
Running on...

 Device 0: NVIDIA GeForce RTX 3060
 Quick Mode

 Host to Device Bandwidth, 1 Device(s)
 PINNED Memory Transfers
   Transfer Size (Bytes)        Bandwidth(GB/s)
   32000000                     12.0

 Device to Host Bandwidth, 1 Device(s)
 PINNED Memory Transfers
   Transfer Size (Bytes)        Bandwidth(GB/s)
   32000000                     12.8

 Device to Device Bandwidth, 1 Device(s)
 PINNED Memory Transfers
   Transfer Size (Bytes)        Bandwidth(GB/s)
   32000000                     324.4

Result = PASS

NOTE: The CUDA Samples are not meant for performance measurements. Results may vary when GPU Boost is enabled.
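As a rough sanity check on the device-to-device figure, the theoretical peak memory bandwidth can be estimated from the deviceQuery values (memory clock 7501 MHz, 192-bit bus). The sketch below assumes Python 3 is available and applies a ×2 double-data-rate factor, following the convention in NVIDIA's CUDA C++ Best Practices Guide: 7501 × 10^6 × (192 / 8) × 2 / 10^9 ≈ 360 GB/s, so the measured 324.4 GB/s is about 90% of that estimate. As the NOTE says, treat this as a plausibility check, not a benchmark. theoretical_bandwidth_gbs is a hypothetical name.

```python
def theoretical_bandwidth_gbs(mem_clock_mhz: float, bus_width_bits: int, data_rate: int = 2) -> float:
    """Peak DRAM bandwidth in GB/s (1 GB = 1e9 bytes): clock x bytes per transfer x transfers per clock."""
    return mem_clock_mhz * 1e6 * (bus_width_bits / 8) * data_rate / 1e9

if __name__ == "__main__":
    peak = theoretical_bandwidth_gbs(7501, 192)  # values reported by deviceQuery above
    measured = 324.4                             # Device to Device result from bandwidthTest
    print(f"peak {peak:.1f} GB/s, measured {measured} GB/s ({measured / peak:.0%} of peak)")
```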
