CentOS Stream 8

NVIDIA HPC SDK: Install 2022/02/10

Install the NVIDIA HPC SDK.
[2] Install the NVIDIA HPC SDK packages together with Environment Modules.
[root@dlp ~]#
dnf group -y install "Development tools"
[root@dlp ~]#
wget https://developer.download.nvidia.com/hpc-sdk/22.1/nvhpc-22-1-22.1-1.x86_64.rpm https://developer.download.nvidia.com/hpc-sdk/22.1/nvhpc-2022-22.1-1.x86_64.rpm
[root@dlp ~]#
dnf -y install ./nvhpc-22-1-22.1-1.x86_64.rpm ./nvhpc-2022-22.1-1.x86_64.rpm environment-modules
[root@dlp ~]#
vi /etc/environment-modules/modulespath
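# add the SDK modulefiles directory as follows

# This file defines the initial setup for the modulefiles search path
# Each line containing one or multiple paths delimited by ':' will be
# added to the MODULEPATH environment variable.
# add this line at the end of the file
/opt/nvidia/hpc_sdk/modulefiles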
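
The edit to the modulespath file can also be scripted rather than made by hand in vi. The following is a minimal sketch, not part of the original procedure: `add_modulepath` is a hypothetical helper name, and it assumes the directory to register is `/opt/nvidia/hpc_sdk/modulefiles`, the one that `module avail` lists in this section. It appends the path only when no line already matches it exactly, so running it twice is harmless.

```shell
# Append a directory to a modulespath file only if it is not already listed.
# Usage: add_modulepath <modulespath-file> <directory>
# Limitation: the check matches whole lines only, so a directory that already
# appears inside a ':'-delimited line is not detected.
add_modulepath() {
    mp_file="$1"
    mp_dir="$2"
    # -x: match the whole line, -F: treat the path as a fixed string
    if grep -qxF -- "$mp_dir" "$mp_file" 2>/dev/null; then
        echo "already present: $mp_dir"
    else
        printf '%s\n' "$mp_dir" >> "$mp_file"
        echo "added: $mp_dir"
    fi
}

# Example (run as root against the real file):
# add_modulepath /etc/environment-modules/modulespath /opt/nvidia/hpc_sdk/modulefiles
```

For a one-off session, `module use /opt/nvidia/hpc_sdk/modulefiles` adds the directory to the search path without touching the file.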

[root@dlp ~]#
source /etc/profile.d/modules.sh

[root@dlp ~]#
module avail

------------------------ /usr/share/Modules/modulefiles ------------------------
dot  module-git  module-info  modules  null  use.own

----------------------- /opt/nvidia/hpc_sdk/modulefiles ------------------------
nvhpc-byo-compiler/22.1  nvhpc-nompi/22.1  nvhpc/22.1

[root@dlp ~]#
module load nvhpc/22.1
[root@dlp ~]#
nvc --version

nvc 22.1-0 64-bit target on x86-64 Linux -tp haswell
NVIDIA Compilers and Tools
Copyright (c) 2022, NVIDIA CORPORATION & AFFILIATES.  All rights reserved.
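
Besides `nvc`, the test programs later in this section use `nvc++` and `nvfortran`, so it can be useful to confirm that all three drivers resolve on PATH once the module is loaded. A small sketch under that assumption; `check_tools` is a hypothetical helper name, not something the SDK provides.

```shell
# Report which of the given commands can be found on PATH.
# Returns 0 if every command is found, 1 if at least one is missing.
check_tools() {
    rc=0
    for tool in "$@"; do
        if command -v "$tool" >/dev/null 2>&1; then
            echo "ok: $tool"
        else
            echo "missing: $tool"
            rc=1
        fi
    done
    return "$rc"
}

# Example, after `module load nvhpc/22.1`:
# check_tools nvc nvc++ nvfortran
```

A "missing" line usually means the module was not loaded in the current shell.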

[root@dlp ~]#
nvaccelinfo

CUDA Driver Version:           11060
NVRM version:                  NVIDIA UNIX x86_64 Kernel Module  510.47.03  Mon Jan 24 22:58:54 UTC 2022

Device Number:                 0
Device Name:                   NVIDIA GeForce GTX 1060 6GB
Device Revision Number:        6.1
Global Memory Size:            6373376000
Number of Multiprocessors:     10
Concurrent Copy and Execution: Yes
Total Constant Memory:         65536
Total Shared Memory per Block: 49152
Registers per Block:           65536
Warp Size:                     32
Maximum Threads per Block:     1024
Maximum Block Dimensions:      1024, 1024, 64
Maximum Grid Dimensions:       2147483647 x 65535 x 65535
Maximum Memory Pitch:          2147483647B
Texture Alignment:             512B
Clock Rate:                    1847 MHz
Execution Timeout:             No
Integrated Device:             No
Can Map Host Memory:           Yes
Compute Mode:                  default
Concurrent Kernels:            Yes
ECC Enabled:                   No
Memory Clock Rate:             4004 MHz
Memory Bus Width:              192 bits
L2 Cache Size:                 1572864 bytes
Max Threads Per SMP:           2048
Async Engines:                 2
Unified Addressing:            Yes
Managed Memory:                Yes
Concurrent Managed Memory:     Yes
Preemption Supported:          Yes
Cooperative Launch:            Yes
  Multi-Device:                Yes
Default Target:                cc61
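
The `Default Target` line at the end of the listing is the compute capability the compilers will target on this machine (`cc61` here). As an illustration, the value can be pulled out of text in this format with `awk`; `default_target` is a hypothetical helper name, and the sketch assumes the field layout shown above and takes only the first match when several devices are listed.

```shell
# Read text in the format shown above on stdin and print the value of the
# first "Default Target" line (for example: cc61).
default_target() {
    awk -F':' '/^Default Target:/ { gsub(/[ \t]/, "", $2); print $2; exit }'
}

# Example on a machine with the SDK loaded (prog.c is a hypothetical source file):
# cc=$(nvaccelinfo | default_target)
# nvc -acc -gpu="$cc" -o prog prog.c
```

Passing the target explicitly with `-gpu` is mainly useful when building on one machine for a GPU installed in another.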
[3] As any regular user, create test programs and verify that the compilers work.
[cent@dlp ~]$
module load nvhpc/22.1
### C program

[cent@dlp ~]$
vi helloworld.c
# create new

#include <stdio.h>
int main() {
  printf("Hello World\n");
  return 0;
}

# compile

[cent@dlp ~]$
nvc -o helloworld helloworld.c
# run

[cent@dlp ~]$
./helloworld

Hello World
### C++ program

[cent@dlp ~]$
vi helloworld.cpp
# create new

#include <iostream>
int main() {
  std::cout << "Hello World!\n";
  return 0;
}

# compile

[cent@dlp ~]$
nvc++ -o helloworld2 helloworld.cpp
# run

[cent@dlp ~]$
./helloworld2

Hello World!
### Fortran program

[cent@dlp ~]$
vi helloworld.f90
# create new

program helloworld
  print *, 'Hello World!'
end program helloworld

# compile

[cent@dlp ~]$
nvfortran -o helloworld3 helloworld.f90
# run

[cent@dlp ~]$
./helloworld3

Hello World!
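
The three tests above follow the same pattern: pick the compiler driver that matches the source file extension, build, then run. A rough sketch of that mapping as a shell function follows; `compiler_for` is a hypothetical helper name, it covers only the three extensions used in this section, and the `.out` output names in the commented loop are illustrative.

```shell
# Map a source file name to the compiler driver used for it in this section.
# Prints the driver name, or returns 1 for an extension not covered here.
compiler_for() {
    case "$1" in
        *.c)   echo nvc ;;
        *.cpp) echo nvc++ ;;
        *.f90) echo nvfortran ;;
        *)     echo "unknown source type: $1" >&2; return 1 ;;
    esac
}

# Example: build and run each test program (requires `module load nvhpc/22.1`).
# for src in helloworld.c helloworld.cpp helloworld.f90; do
#     drv=$(compiler_for "$src") || continue
#     "$drv" -o "${src%.*}.out" "$src" && "./${src%.*}.out"
# done
```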