Kubernetes : Add Worker Node (GPU)
2026/05/18
Add new GPU-equipped Worker Nodes to an existing Kubernetes cluster.
This example is based on the following cluster environment.
+----------------------+   +----------------------+
|  [ ctrl.srv.world ]  |   |   [ dlp.srv.world ]  |
|     Manager Node     |   |     Control Plane    |
+-----------+----------+   +-----------+----------+
        eth0|10.0.0.25             eth0|10.0.0.30
            |                          |
------------+--------------------------+-----------
            |                          |
        eth0|10.0.0.51             eth0|10.0.0.52
+-----------+----------+   +-----------+----------+
| [ node01.srv.world ] |   | [ node02.srv.world ] |
|     Worker Node#1    |   |     Worker Node#2    |
+----------------------+   +----------------------+
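
[1] On the new GPU node, install the NVIDIA driver, refer to here.

[2] Add the new GPU node to your Kubernetes cluster, refer to here.

[3] Install the GPU Operator so that Pods can use the GPU. Because the NVIDIA driver is already installed on the node in step [1], the Operator's own driver deployment is disabled with driver.enabled=false.
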
ubuntu@ctrl:~$ helm repo add nvidia https://helm.ngc.nvidia.com/nvidia
"nvidia" has been added to your repositories

# create a namespace for the GPU Operator
ubuntu@ctrl:~$ kubectl create namespace gpu-operator
namespace/gpu-operator created

ubuntu@ctrl:~$ helm install gpu-operator -n gpu-operator nvidia/gpu-operator --set driver.enabled=false
NAME: gpu-operator
LAST DEPLOYED: Mon May 18 03:54:45 2026
NAMESPACE: gpu-operator
STATUS: deployed
REVISION: 1
TEST SUITE: None

ubuntu@ctrl:~$ kubectl get pods -n gpu-operator
NAME                                                         READY   STATUS      RESTARTS   AGE
gpu-feature-discovery-vtdml                                  1/1     Running     0          62s
gpu-operator-7bcbd487f5-qjqq4                                1/1     Running     0          85s
gpu-operator-node-feature-discovery-gc-847bb8f7b6-7d4k5      1/1     Running     0          85s
gpu-operator-node-feature-discovery-master-d98f944cd-z678d   1/1     Running     0          85s
gpu-operator-node-feature-discovery-worker-7r8h7             1/1     Running     0          85s
gpu-operator-node-feature-discovery-worker-7skgj             1/1     Running     0          85s
gpu-operator-node-feature-discovery-worker-8nmhq             1/1     Running     0          85s
gpu-operator-node-feature-discovery-worker-92sgw             1/1     Running     0          85s
nvidia-container-toolkit-daemonset-8mw6r                     1/1     Running     0          63s
nvidia-cuda-validator-qs76g                                  0/1     Completed   0          30s
nvidia-dcgm-exporter-8hnfn                                   0/1     Running     0          62s
nvidia-device-plugin-daemonset-sdnmq                         1/1     Running     0          62s
nvidia-operator-validator-w6fk2                              1/1     Running     0          63s

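Reading the pod list by eye gets tedious while the Operator is still starting up. The following is a minimal sketch of a filter that prints only the pods that are not yet usable. The function name pods_not_ready is a hypothetical helper, not part of kubectl or the GPU Operator, and it assumes the default five-column output of kubectl get pods with --no-headers.

```shell
# Hypothetical helper: print the name of every pod that is not yet usable.
# Reads `kubectl get pods --no-headers` output on stdin
# (columns: NAME READY STATUS RESTARTS AGE).
pods_not_ready() {
  awk '
    # One-shot pods such as the CUDA validator finish as "Completed"; skip them.
    $3 == "Completed" { next }
    {
      # The READY column looks like "1/1": ready containers / total containers.
      split($2, r, "/")
      if ($3 != "Running" || r[1] != r[2]) print $1
    }
  '
}

# Usage against a live cluster (assumes kubectl is configured on this host):
# kubectl get pods -n gpu-operator --no-headers | pods_not_ready
```

An empty result means every pod is either fully ready or has completed. A pod shown as 0/1 Running, like nvidia-dcgm-exporter in the listing above, would still be reported until its container becomes ready.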
[4] Verify that a Pod can use the GPU. Save the following manifest as cuda.yaml; it requests one GPU through the nvidia.com/gpu resource limit.

apiVersion: v1
kind: Pod
metadata:
  name: cuda
spec:
  containers:
  - name: cuda
    image: nvidia/cuda:13.1.2-cudnn-runtime-ubuntu24.04
    command: ["/bin/sleep"]
    args: ["600"]
    resources:
      limits:
        nvidia.com/gpu: 1

ubuntu@ctrl:~$ kubectl apply -f cuda.yaml
pod/cuda created

ubuntu@ctrl:~$ kubectl get pods
NAME   READY   STATUS    RESTARTS   AGE
cuda   1/1     Running   0          12s

ubuntu@ctrl:~$ kubectl exec cuda -- /usr/bin/nvidia-smi
Mon May 18 04:43:23 2026
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 595.58.03              Driver Version: 595.58.03      CUDA Version: 13.2     |
+-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA GeForce RTX 3060        Off |   00000000:05:00.0 Off |                  N/A |
|  0%   47C    P8             16W /  170W |       1MiB /  12288MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI              PID   Type   Process name                        GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|  No running processes found                                                             |
+-----------------------------------------------------------------------------------------+
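
For a scripted check, the nvidia-smi table can be swapped for its CSV query mode, which prints one line per GPU. The following is a minimal sketch that compares the number of GPUs visible in the Pod with the number requested in the manifest. The function name gpu_count_matches is a hypothetical helper, and the sketch assumes the input contains one non-empty line per GPU.

```shell
# Hypothetical helper: succeed only if the number of GPUs listed on stdin
# equals the expected count passed as $1.
# Expects one GPU per line, as printed by:
#   nvidia-smi --query-gpu=name,memory.total --format=csv,noheader
gpu_count_matches() {
  expected=$1
  # Count non-empty lines; "n + 0" prints 0 when there is no input at all.
  actual=$(awk 'NF { n++ } END { print n + 0 }')
  echo "GPUs visible: $actual (expected $expected)"
  [ "$actual" -eq "$expected" ]
}

# Usage against the Pod created above (assumes it is still running):
# kubectl exec cuda -- nvidia-smi --query-gpu=name,memory.total --format=csv,noheader \
#   | gpu_count_matches 1
```

The function returns a non-zero exit status when the counts differ, so it can be chained with && or used in an if statement.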