TensorFlow : Docker イメージ (GPU) を利用する2022/08/10

	機械学習ライブラリー, TensorFlow をインストールします。当例では、TensorFlow 公式の Docker イメージをダウンロードして、コンテナーから TensorFlow を利用します。 Docker イメージは GPU サポート有りのイメージを使用します。
[1]	こちらを参考に、NVIDIA Container Toolkit をインストールしておきます。
[2]	TensorFlow Docker (GPU) の root ユーザーでの利用例です。

# TensorFlow GPU イメージを Pull

[root@dlp ~]#

podman pull docker.io/tensorflow/tensorflow:latest-gpu

[root@dlp ~]#

podman images

REPOSITORY                       TAG         IMAGE ID      CREATED       SIZE
docker.io/tensorflow/tensorflow  latest-gpu  c8d4e2940044  36 hours ago  6 GB
docker.io/tensorflow/tensorflow  latest      976c17ec6daa  36 hours ago  1.48 GB

# [nvidia-smi] 動作確認

[root@dlp ~]#

podman run -e NVIDIA_VISIBLE_DEVICES=all --rm docker.io/tensorflow/tensorflow:latest-gpu nvidia-smi

Thu Sep  8 08:08:15 2022
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 515.65.01    Driver Version: 515.65.01    CUDA Version: 11.7     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA GeForce ...  Off  | 00000000:05:00.0 Off |                  N/A |
|  0%   53C    P5    15W / 120W |      0MiB /  6144MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+

# TensorFlow 動作確認

[root@dlp ~]#

podman run -e NVIDIA_VISIBLE_DEVICES=all --rm docker.io/tensorflow/tensorflow:latest-gpu \
python -c "import tensorflow as tf; print(tf.reduce_sum(tf.random.normal([1000, 1000])))"

2022-09-08 08:09:00.910339: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2022-09-08 08:09:01.070418: E tensorflow/stream_executor/cuda/cuda_blas.cc:2981] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2022-09-08 08:09:02.828431: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:980] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-09-08 08:09:02.835621: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:980] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-09-08 08:09:02.838364: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:980] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-09-08 08:09:02.841856: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2022-09-08 08:09:02.846371: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:980] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-09-08 08:09:02.849344: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:980] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-09-08 08:09:02.852309: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:980] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-09-08 08:09:03.511643: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:980] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-09-08 08:09:03.511945: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:980] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-09-08 08:09:03.512086: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:980] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-09-08 08:09:03.512262: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1616] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 5381 MB memory:  -> device: 0, name: NVIDIA GeForce GTX 1060 6GB, pci bus id: 0000:05:00.0, compute capability: 6.1
tf.Tensor(-194.77834, shape=(), dtype=float32)

[3]	SELinux を有効にしている場合は、ポリシーの変更が必要です。

[root@dlp ~]#

vi my-python.te

# 以下の内容で新規作成

module my-python 1.0;

require {
        type container_t;
        type xserver_misc_device_t;
        type device_t;
        class chr_file { getattr ioctl map open read write };
}

#============= container_t ==============
allow container_t device_t:chr_file map;
allow container_t device_t:chr_file { getattr ioctl open read write };
allow container_t xserver_misc_device_t:chr_file map;

[root@dlp ~]#

checkmodule -m -M -o my-python.mod my-python.te

[root@dlp ~]#

semodule_package --outfile my-python.pp --module my-python.mod

[root@dlp ~]#

semodule -i my-python.pp

[4]	一般ユーザーで実行したい場合は、設定変更が必要です。

[root@dlp ~]#

vi /etc/nvidia-container-runtime/config.toml

disable-require = false
#swarm-resource = "DOCKER_RESOURCE_GPU"

[nvidia-container-cli]
#root = "/run/nvidia/driver"
#path = "/usr/bin/nvidia-container-cli"
environment = []
#debug = "/var/log/nvidia-container-toolkit.log"
#ldcache = "/etc/ld.so.cache"
load-kmods = true
# コメント解除して [true] に変更
no-cgroups = true
#user = "root:video"
ldconfig = "@/sbin/ldconfig"
#alpha-merge-visible-devices-envvars = false

[nvidia-container-runtime]
#debug = "/var/log/nvidia-container-runtime.log"

# 任意の一般ユーザーでログインして動作確認

[cent@dlp ~]$

podman pull docker.io/tensorflow/tensorflow:latest-gpu

[cent@dlp ~]$

podman images

REPOSITORY                       TAG             IMAGE ID      CREATED       SIZE
docker.io/tensorflow/tensorflow  latest-gpu      c8d4e2940044  36 hours ago  6 GB
docker.io/tensorflow/tensorflow  latest-jupyter  c94342dbd1e8  36 hours ago  1.72 GB
docker.io/tensorflow/tensorflow  latest          976c17ec6daa  36 hours ago  1.48 GB

# [nvidia-smi] 動作確認

[cent@dlp ~]$

podman run --rm --security-opt=label=disable \
--hooks-dir=/usr/share/containers/oci/hooks.d/ \
docker.io/tensorflow/tensorflow:latest-gpu /usr/bin/nvidia-smi

Thu Sep  8 08:20:51 2022
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 515.65.01    Driver Version: 515.65.01    CUDA Version: 11.7     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA GeForce ...  Off  | 00000000:05:00.0 Off |                  N/A |
|  0%   53C    P5    15W / 120W |      0MiB /  6144MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+

# Hello World テストスクリプトで動作確認

[cent@dlp ~]$

podman run -e NVIDIA_VISIBLE_DEVICES=all --rm --security-opt=label=disable \
--hooks-dir=/usr/share/containers/oci/hooks.d/ \
docker.io/tensorflow/tensorflow:latest-gpu \
python -c "import tensorflow as tf; hello = tf.constant('Hello, TensorFlow World'); tf.print(hello)"

2022-09-08 08:21:43.417819: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2022-09-08 08:21:43.580492: E tensorflow/stream_executor/cuda/cuda_blas.cc:2981] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2022-09-08 08:21:45.287472: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:980] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-09-08 08:21:45.292619: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:980] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-09-08 08:21:45.293005: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:980] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-09-08 08:21:45.293908: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2022-09-08 08:21:45.294298: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:980] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-09-08 08:21:45.294581: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:980] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-09-08 08:21:45.294902: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:980] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-09-08 08:21:45.892204: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:980] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-09-08 08:21:45.892465: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:980] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-09-08 08:21:45.892657: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:980] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-09-08 08:21:45.892873: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1616] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 5381 MB memory:  -> device: 0, name: NVIDIA GeForce GTX 1060 6GB, pci bus id: 0000:05:00.0, compute capability: 6.1
Hello, TensorFlow World