Installing Kubernetes on DGX using Kubeadm (Updated for TAO 5.0.0)

Created : 18/05/2023
Status: Draft


These are simplified instructions for installing Kubernetes (k8) and related services

my installed k8 version

echo 'deb [signed-by=/etc/apt/keyrings/kubernetes-apt-keyring.gpg] /' | sudo tee /etc/apt/sources.list.d/kubernetes.list
apt-cache policy kubectl 
sudo apt-get install -y kubelet=1.24.2-1.1 kubeadm=1.24.2-1.1  kubectl=1.24.2-1.1
sudo apt-mark hold kubelet kubeadm kubectl
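
To confirm the pin took effect, the hold status can be read back with apt-mark showhold. A minimal sketch; the helper name check_held is hypothetical and it only reads the list of held packages from stdin:

```shell
# check_held: read held package names (one per line) from stdin and
# report which of the packages given as arguments are NOT on hold.
# Returns 0 if all are held, 1 otherwise.
check_held() {
  held="$(cat)"
  missing=0
  for pkg in "$@"; do
    if ! printf '%s\n' "$held" | grep -qxF "$pkg"; then
      echo "not held: $pkg"
      missing=1
    fi
  done
  return "$missing"
}
```

Usage: apt-mark showhold | check_held kubelet kubeadm kubectl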

install 1.23.5-00

  1. add key (this method may be deprecated)
curl -s | sudo apt-key add -
  2. update sources list
cat << EOF | sudo tee /etc/apt/sources.list.d/kubernetes.list
deb kubernetes-xenial main
EOF
  3. update package list and check apt cache to make sure we have the version we are looking for
sudo apt update
apt-cache policy kubeadm

when you scroll down you’ll see

1.23.5-00 500
        500 kubernetes-xenial/main amd64 Packages

  4. install the version you need (the first command installs 1.24.14-00, the second 1.23.5-00)
sudo apt-get install -y kubelet=1.24.14-00 kubeadm=1.24.14-00 kubectl=1.24.14-00 --allow-downgrades --allow-change-held-packages

sudo apt-get install -y kubelet=1.23.5-00 kubeadm=1.23.5-00 kubectl=1.23.5-00 --allow-downgrades --allow-change-held-packages

If you had previous installations

official guidance can be found here

drain all nodes 
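
The notes do not give a drain command. One possible sketch, assuming kubectl still has access to the cluster; the function name drain_all_nodes is hypothetical, and --delete-emptydir-data discards pod-local emptyDir data, so check that this is acceptable first:

```shell
# drain_all_nodes: cordon every node and evict its pods.
# `kubectl get nodes -o name` prints entries like "node/worker1",
# so the "node/" prefix is stripped before calling drain.
drain_all_nodes() {
  kubectl get nodes -o name | while read -r node; do
    kubectl drain "${node#node/}" --ignore-daemonsets --delete-emptydir-data
  done
}
```

Usage: drain_all_nodes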

reset kubeadm

normal setup

sudo kubeadm reset 

specifying the CRI socket

sudo kubeadm reset --cri-socket=unix:///var/run/cri-dockerd.sock

reset changes to networking

sudo rm -rf /etc/cni/net.d
rm -rf $HOME/.kube

Clear IP tables

basic command

sudo iptables -F 
sudo iptables -t nat -F  
sudo iptables -t mangle -F 
sudo iptables -X

if it doesn’t work

Set default policies

sudo iptables -P INPUT ACCEPT
sudo iptables -P FORWARD ACCEPT
sudo iptables -P OUTPUT ACCEPT

Flush all rules

sudo iptables -t filter -F
sudo iptables -t nat -F
sudo iptables -t mangle -F
sudo iptables -t raw -F
sudo iptables -t security -F

Delete all non-default chains

sudo iptables -t filter -X
sudo iptables -t nat -X
sudo iptables -t mangle -X
sudo iptables -t raw -X
sudo iptables -t security -X
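
The three steps above (default policies, flush, delete chains) can also be written as one loop over the tables. This is only a sketch with a hypothetical function name; it flushes and deletes per table instead of flushing all tables first, which should be equivalent because chains are only referenced from within their own table:

```shell
# flush_iptables: open the default policies, then flush all rules and
# delete all non-default chains in every table.
flush_iptables() {
  for chain in INPUT FORWARD OUTPUT; do
    sudo iptables -P "$chain" ACCEPT
  done
  for table in filter nat mangle raw security; do
    sudo iptables -t "$table" -F   # flush all rules in the table
    sudo iptables -t "$table" -X   # delete all non-default chains
  done
}
```

Usage: flush_iptables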

Fresh install

run the command on the master node (control plane)

without specifying the socket

sudo kubeadm init --pod-network-cidr= 

with specifying the socket

sudo kubeadm init --pod-network-cidr= --cri-socket=unix:///var/run/cri-dockerd.sock 

for a specific k8 version try:

sudo kubeadm init --pod-network-cidr= --kubernetes-version=v1.23.5

Note: cri-dockerd is not compatible with version 1.23


mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config

setup network stuff

install calico

install calico and then install calicoctl


my values.yaml

imagePullSecrets: {}

  enabled: true
    type: Calico

  enabled: true


resources: {}

- effect: NoExecute
  operator: Exists
- effect: NoSchedule
  operator: Exists

nodeSelector: linux

podAnnotations: {}

podLabels: {}

  image: tigera/operator
  version: v1.29.3
  tag: v3.25.1

  bgp: Enabled
  - cidr:
    encapsulation: VXLAN
    natOutgoing: Enabled
    nodeSelector: all()

confirm all pods are working

watch kubectl get pods -n calico-system
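
As a non-interactive alternative to watch, kubectl wait can block until the pods report Ready. A small sketch; the wrapper name wait_for_calico and the 300s default timeout are assumptions, not values from these notes:

```shell
# wait_for_calico: block until every pod in calico-system is Ready,
# or fail once the timeout (first argument, default 300s) expires.
wait_for_calico() {
  kubectl wait --namespace calico-system --for=condition=Ready pods --all \
    --timeout="${1:-300s}"
}
```

Usage: wait_for_calico 600s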

check the IP pool

kubectl calico ipam show

or if the cluster and calicoctl versions do not match

kubectl-calico ipam show --allow-version-mismatch

you will get something like

| IP Pool  | |     65536 | 7 (0%)     | 65529 (100%) |
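
The numbers in that row can be sanity-checked with a little arithmetic: 65536 addresses is the size of a /16 IPv4 block, and 65536 total minus 7 in use leaves 65529 free. A sketch with hypothetical helper names:

```shell
# pool_size: number of addresses in an IPv4 block with the given prefix length.
pool_size() {
  echo $(( 1 << (32 - $1) ))
}

# free_ips: addresses left after subtracting the ones in use.
free_ips() {
  echo $(( $1 - $2 ))
}
```

Usage: pool_size 16 prints 65536, and free_ips 65536 7 prints 65529.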

get the calicoctl version with kubectl-calico version or calicoctl version

The aim is to make sure the cluster and the client have the same version. I get something like:

kubectl-calico version
Client Version:    v3.25.1
Git commit:        82dadbce1
Cluster Version:   v3.25.1
Cluster Type:      typha,kdd,k8s,operator,bgp,kubeadm
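
The comparison can be scripted by parsing that output. A minimal sketch, assuming the output keeps the "Client Version:" and "Cluster Version:" labels shown above; the function name is hypothetical:

```shell
# calico_versions_match: read `kubectl-calico version` output on stdin and
# succeed only if the client and cluster versions are present and identical.
calico_versions_match() {
  awk -F': *' '
    /^Client Version/  { client = $2 }
    /^Cluster Version/ { cluster = $2 }
    END { exit !(client != "" && client == cluster) }
  '
}
```

Usage: kubectl-calico version | calico_versions_match && echo "versions match"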

on the worker node (do this before resetting the master)

if the node was previously used, ssh into it and run

sudo kubeadm reset

to reset the node, then clean up networking configs

sudo rm -rf /etc/cni/net.d
rm -rf ~/.kube

if that fails

  1. stop the services
    sudo systemctl stop kubelet
    sudo systemctl stop <container-runtime-service>   # containerd or docker

delete settings

sudo rm -rf /etc/kubernetes
sudo rm -rf /var/lib/kubelet
sudo rm -rf /var/lib/etcd
sudo rm -rf /var/lib/cni
sudo rm -rf /etc/cni/net.d
sudo rm -rf /var/run/kubernetes

clear iptables and firewall

sudo iptables -P INPUT ACCEPT
sudo iptables -P FORWARD ACCEPT
sudo iptables -P OUTPUT ACCEPT
sudo iptables -t filter -F
sudo iptables -t nat -F
sudo iptables -t mangle -F
sudo iptables -t raw -F
sudo iptables -t security -F
sudo iptables -t filter -X
sudo iptables -t nat -X
sudo iptables -t mangle -X
sudo iptables -t raw -X
sudo iptables -t security -X

list all containers (running and stopped)

sudo crictl --runtime-endpoint unix:///run/containerd/containerd.sock ps -a

list only the container IDs

sudo crictl --runtime-endpoint unix:///run/containerd/containerd.sock ps -aq

delete all containers (one liner, this is not tested enough)

sudo crictl --runtime-endpoint unix:///run/containerd/containerd.sock ps -aq | xargs -r -I {} sudo crictl --runtime-endpoint unix:///run/containerd/containerd.sock rm {}

if everything above fails try

clear downloaded containers (this step is not really needed)

sudo rm -rf /var/lib/docker
sudo rm -rf /var/lib/containerd


sudo reboot and try kubeadm reset again

Once the node is properly reset, get a join command from the master node and run it on the worker node

kubeadm token create --print-join-command
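
The printed command starts with kubeadm join and has to be run with sudo on the worker. A hedged sketch of a small guard around that step; run_join is a hypothetical helper, and how the command gets from the master to the worker (ssh, copy and paste) is up to you:

```shell
# run_join: read one join command from stdin and run it with sudo,
# refusing anything that does not start with "kubeadm join".
run_join() {
  read -r join_cmd
  case "$join_cmd" in
    "kubeadm join "*)
      # Intentionally unquoted so the command is split into its arguments.
      sudo $join_cmd
      ;;
    *)
      echo "unexpected join command: $join_cmd" >&2
      return 1
      ;;
  esac
}
```

Usage: paste the output of kubeadm token create --print-join-command into run_join on the worker. If the worker needs a specific CRI socket, append --cri-socket to the join command as in the reset step.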

to be able to use kubectl from the worker node, copy $HOME/.kube/config from the master to the worker (optional)

modified /etc/containerd/config.toml (to be able to use the gpu operator)

an example can also be found at the nvidia repo

disabled_plugins = []
imports = []
oom_score = 0
plugin_dir = ""
required_plugins = []
root = "/var/lib/containerd"
state = "/run/containerd"
temp = ""
version = 2

[cgroup]
  path = ""

[debug]
  address = ""
  format = ""
  gid = 0
  level = ""
  uid = 0

[grpc]
  address = "/run/containerd/containerd.sock"
  gid = 0
  max_recv_message_size = 16777216
  max_send_message_size = 16777216
  tcp_address = ""
  tcp_tls_ca = ""
  tcp_tls_cert = ""
  tcp_tls_key = ""
  uid = 0

[metrics]
  address = ""
  grpc_histogram = false

[plugins]

  [plugins."io.containerd.gc.v1.scheduler"]
    deletion_threshold = 0
    mutation_threshold = 100
    pause_threshold = 0.02
    schedule_delay = "0s"
    startup_delay = "100ms"

  [plugins."io.containerd.grpc.v1.cri"]
    device_ownership_from_security_context = false
    disable_apparmor = false
    disable_cgroup = false
    disable_hugetlb_controller = true
    disable_proc_mount = false
    disable_tcp_service = true
    enable_selinux = false
    enable_tls_streaming = false
    enable_unprivileged_icmp = false
    enable_unprivileged_ports = false
    ignore_image_defined_volumes = false
    max_concurrent_downloads = 3
    max_container_log_line_size = 16384
    netns_mounts_under_state_dir = false
    restrict_oom_score_adj = false
    sandbox_image = ""
    selinux_category_range = 1024
    stats_collect_period = 10
    stream_idle_timeout = "4h0m0s"
    stream_server_address = ""
    stream_server_port = "0"
    systemd_cgroup = false
    tolerate_missing_hugetlb_controller = true
    unset_seccomp_profile = ""

    [plugins."io.containerd.grpc.v1.cri".cni]
      bin_dir = "/opt/cni/bin"
      conf_dir = "/etc/cni/net.d"
      conf_template = ""
      ip_pref = ""
      max_conf_num = 1

    [plugins."io.containerd.grpc.v1.cri".containerd]
      default_runtime_name = "nvidia"
      disable_snapshot_annotations = true
      discard_unpacked_layers = false
      ignore_rdt_not_enabled_errors = false
      no_pivot = false
      snapshotter = "overlayfs"

      [plugins."io.containerd.grpc.v1.cri".containerd.default_runtime]
        base_runtime_spec = ""
        cni_conf_dir = ""
        cni_max_conf_num = 0
        container_annotations = []
        pod_annotations = []
        privileged_without_host_devices = false
        runtime_engine = ""
        runtime_path = ""
        runtime_root = ""
        runtime_type = ""

        [plugins."io.containerd.grpc.v1.cri".containerd.default_runtime.options]

      [plugins."io.containerd.grpc.v1.cri".containerd.runtimes]

        [plugins."io.containerd.grpc.v1.cri".containerd.runtimes.nvidia]
          base_runtime_spec = ""
          cni_conf_dir = ""
          cni_max_conf_num = 0
          container_annotations = []
          pod_annotations = []
          privileged_without_host_devices = false
          runtime_engine = ""
          runtime_path = ""
          runtime_root = ""
          runtime_type = "io.containerd.runc.v2"

          [plugins."io.containerd.grpc.v1.cri".containerd.runtimes.nvidia.options]
            BinaryName = "/usr/local/nvidia/toolkit/nvidia-container-runtime"
            CriuImagePath = ""
            CriuPath = ""
            CriuWorkPath = ""
            IoGid = 0
            IoUid = 0
            NoNewKeyring = false
            NoPivotRoot = false
            Root = ""
            ShimCgroup = ""
            SystemdCgroup = true

        [plugins."io.containerd.grpc.v1.cri".containerd.runtimes.nvidia-cdi]
          base_runtime_spec = ""
          cni_conf_dir = ""
          cni_max_conf_num = 0
          container_annotations = []
          pod_annotations = []
          privileged_without_host_devices = false
          runtime_engine = ""
          runtime_path = ""
          runtime_root = ""
          runtime_type = "io.containerd.runc.v2"

          [plugins."io.containerd.grpc.v1.cri".containerd.runtimes.nvidia-cdi.options]
            BinaryName = "/usr/local/nvidia/toolkit/nvidia-container-runtime.cdi"
            CriuImagePath = ""
            CriuPath = ""
            CriuWorkPath = ""
            IoGid = 0
            IoUid = 0
            NoNewKeyring = false
            NoPivotRoot = false
            Root = ""
            ShimCgroup = ""
            SystemdCgroup = true

        [plugins."io.containerd.grpc.v1.cri".containerd.runtimes.nvidia-experimental]
          base_runtime_spec = ""
          cni_conf_dir = ""
          cni_max_conf_num = 0
          container_annotations = []
          pod_annotations = []
          privileged_without_host_devices = false
          runtime_engine = ""
          runtime_path = ""
          runtime_root = ""
          runtime_type = "io.containerd.runc.v2"

          [plugins."io.containerd.grpc.v1.cri".containerd.runtimes.nvidia-experimental.options]
            BinaryName = "/usr/local/nvidia/toolkit/nvidia-container-runtime.experimental"
            CriuImagePath = ""
            CriuPath = ""
            CriuWorkPath = ""
            IoGid = 0
            IoUid = 0
            NoNewKeyring = false
            NoPivotRoot = false
            Root = ""
            ShimCgroup = ""
            SystemdCgroup = true

        [plugins."io.containerd.grpc.v1.cri".containerd.runtimes.nvidia-legacy]
          base_runtime_spec = ""
          cni_conf_dir = ""
          cni_max_conf_num = 0
          container_annotations = []
          pod_annotations = []
          privileged_without_host_devices = false
          runtime_engine = ""
          runtime_path = ""
          runtime_root = ""
          runtime_type = "io.containerd.runc.v2"

          [plugins."io.containerd.grpc.v1.cri".containerd.runtimes.nvidia-legacy.options]
            BinaryName = "/usr/local/nvidia/toolkit/nvidia-container-runtime.legacy"
            CriuImagePath = ""
            CriuPath = ""
            CriuWorkPath = ""
            IoGid = 0
            IoUid = 0
            NoNewKeyring = false
            NoPivotRoot = false
            Root = ""
            ShimCgroup = ""
            SystemdCgroup = true

        [plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runc]
          base_runtime_spec = ""
          cni_conf_dir = ""
          cni_max_conf_num = 0
          container_annotations = []
          pod_annotations = []
          privileged_without_host_devices = false
          runtime_engine = ""
          runtime_path = ""
          runtime_root = ""
          runtime_type = "io.containerd.runc.v2"

          [plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runc.options]
            BinaryName = ""
            CriuImagePath = ""
            CriuPath = ""
            CriuWorkPath = ""
            IoGid = 0
            IoUid = 0
            NoNewKeyring = false
            NoPivotRoot = false
            Root = ""
            ShimCgroup = ""
            SystemdCgroup = true

      [plugins."io.containerd.grpc.v1.cri".containerd.untrusted_workload_runtime]
        base_runtime_spec = ""
        cni_conf_dir = ""
        cni_max_conf_num = 0
        container_annotations = []
        pod_annotations = []
        privileged_without_host_devices = false
        runtime_engine = ""
        runtime_path = ""
        runtime_root = ""
        runtime_type = ""

        [plugins."io.containerd.grpc.v1.cri".containerd.untrusted_workload_runtime.options]

    [plugins."io.containerd.grpc.v1.cri".image_decryption]
      key_model = "node"

    [plugins."io.containerd.grpc.v1.cri".registry]
      config_path = ""

      [plugins."io.containerd.grpc.v1.cri".registry.auths]

      [plugins."io.containerd.grpc.v1.cri".registry.configs]

      [plugins."io.containerd.grpc.v1.cri".registry.headers]

      [plugins."io.containerd.grpc.v1.cri".registry.mirrors]

    [plugins."io.containerd.grpc.v1.cri".x509_key_pair_streaming]
      tls_cert_file = ""
      tls_key_file = ""

  [plugins."io.containerd.internal.v1.opt"]
    path = "/opt/containerd"

  [plugins."io.containerd.internal.v1.restart"]
    interval = "10s"

  [plugins."io.containerd.internal.v1.tracing"]
    sampling_ratio = 1.0
    service_name = "containerd"

  [plugins."io.containerd.metadata.v1.bolt"]
    content_sharing_policy = "shared"

  [plugins."io.containerd.monitor.v1.cgroups"]
    no_prometheus = false

  [plugins."io.containerd.runtime.v1.linux"]
    no_shim = false
    runtime = "runc"
    runtime_root = ""
    shim = "containerd-shim"
    shim_debug = false

  [plugins."io.containerd.runtime.v2.task"]
    platforms = ["linux/amd64"]
    sched_core = false

  [plugins."io.containerd.service.v1.diff-service"]
    default = ["walking"]

  [plugins."io.containerd.service.v1.tasks-service"]
    rdt_config_file = ""

  [plugins."io.containerd.snapshotter.v1.aufs"]
    root_path = ""

  [plugins."io.containerd.snapshotter.v1.btrfs"]
    root_path = ""

  [plugins."io.containerd.snapshotter.v1.devmapper"]
    async_remove = false
    base_image_size = ""
    discard_blocks = false
    fs_options = ""
    fs_type = ""
    pool_name = ""
    root_path = ""

  [plugins."io.containerd.snapshotter.v1.native"]
    root_path = ""

  [plugins."io.containerd.snapshotter.v1.overlayfs"]
    root_path = ""
    upperdir_label = false

  [plugins."io.containerd.snapshotter.v1.zfs"]
    root_path = ""

  [plugins."io.containerd.tracing.processor.v1.otlp"]
    endpoint = ""
    insecure = false
    protocol = ""

[proxy_plugins]

[stream_processors]

  [stream_processors."io.containerd.ocicrypt.decoder.v1.tar"]
    accepts = ["application/vnd.oci.image.layer.v1.tar+encrypted"]
    args = ["--decryption-keys-path", "/etc/containerd/ocicrypt/keys"]
    env = ["OCICRYPT_KEYPROVIDER_CONFIG=/etc/containerd/ocicrypt/ocicrypt_keyprovider.conf"]
    path = "ctd-decoder"
    returns = "application/vnd.oci.image.layer.v1.tar"

  [stream_processors."io.containerd.ocicrypt.decoder.v1.tar.gzip"]
    accepts = ["application/vnd.oci.image.layer.v1.tar+gzip+encrypted"]
    args = ["--decryption-keys-path", "/etc/containerd/ocicrypt/keys"]
    env = ["OCICRYPT_KEYPROVIDER_CONFIG=/etc/containerd/ocicrypt/ocicrypt_keyprovider.conf"]
    path = "ctd-decoder"
    returns = "application/vnd.oci.image.layer.v1.tar+gzip"

[timeouts]
  "" = "0s"
  "io.containerd.timeout.shim.cleanup" = "5s"
  "io.containerd.timeout.shim.load" = "5s"
  "io.containerd.timeout.shim.shutdown" = "3s"
  "io.containerd.timeout.task.state" = "2s"

[ttrpc]
  address = ""
  gid = 0
  uid = 0
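
The setting that matters most for the gpu operator in this file is default_runtime_name = "nvidia". A small sketch for reading it back after editing; the helper name default_runtime is hypothetical and the path defaults to /etc/containerd/config.toml:

```shell
# default_runtime: print the value of default_runtime_name from a
# containerd config file (first argument, default /etc/containerd/config.toml).
default_runtime() {
  sed -n 's/^[[:space:]]*default_runtime_name[[:space:]]*=[[:space:]]*"\(.*\)".*/\1/p' \
    "${1:-/etc/containerd/config.toml}"
}
```

Usage: default_runtime should print nvidia. After changing the file, restart the service with sudo systemctl restart containerd so the new config is loaded.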

Install the gpu operator

follow these instructions to install gpu-operator.

make sure to wait until the node is in the Ready state beforehand, and wait for the gpu-operator pods to be installed on all the nodes.

hint: if networking pods throw an error, e.g. “kubernetes-worker-node-is-notready-due-to-cni-plugin-not-initialized”, restart containerd or dockerd, e.g. sudo systemctl restart containerd

helm install --wait --generate-name \
     -n gpu-operator --create-namespace \
      nvidia/gpu-operator \
      --set driver.enabled=false \
      --set toolkit.enabled=false
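
After the install finishes, it may help to check the operator pods and whether each node advertises GPUs. A sketch under the assumption that the device plugin exposes the nvidia.com/gpu resource; the function name gpu_report is hypothetical:

```shell
# gpu_report: list the gpu-operator pods, then show how many GPUs each
# node reports as allocatable (empty or <none> means none are advertised yet).
gpu_report() {
  kubectl get pods -n gpu-operator
  kubectl get nodes \
    "-o=custom-columns=NAME:.metadata.name,GPU:.status.allocatable.nvidia\.com/gpu"
}
```

Usage: gpu_report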

Installing the k8 metrics server

helm repo add metrics-server
helm upgrade --install metrics-server metrics-server/metrics-server --namespace k8-metrics --create-namespace

once installed make sure the api-service is running:

kubectl get apiservices

if it is not: edit the deployment and add the following

 - --kubelet-insecure-tls

the deployment will then look like

      - args:
        - --secure-port=10250
        - --cert-dir=/tmp
        - --kubelet-preferred-address-types=InternalIP,ExternalIP,Hostname
        - --kubelet-use-node-status-port
        - --metric-resolution=15s
        - --kubelet-insecure-tls
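
Instead of editing the deployment by hand, the same argument can be appended with a JSON patch. A sketch that assumes the deployment is called metrics-server, lives in the k8-metrics namespace used above, and has the metrics-server container first in the pod spec; the function name is hypothetical:

```shell
# add_insecure_tls: append --kubelet-insecure-tls to the args of the first
# container of the metrics-server deployment (namespace: first argument,
# default k8-metrics).
add_insecure_tls() {
  kubectl -n "${1:-k8-metrics}" patch deployment metrics-server --type=json \
    --patch='[{"op":"add","path":"/spec/template/spec/containers/0/args/-","value":"--kubelet-insecure-tls"}]'
}
```

Usage: add_insecure_tls, then check kubectl get apiservices again. Note that a later helm upgrade may overwrite a manual patch like this.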

installing the k8 dashboard

helm install kubernetes-dashboard kubernetes-dashboard/kubernetes-dashboard --namespace kubernetes-dashboard -f values.yaml --create-namespace

this values.yaml creates a NodePort service that can be reached on the configured port. tokenTTL is the lifetime of the auth token in seconds (28800 is 8 hours). When the existing token expires, run kubectl -n kubernetes-dashboard create token admin-user to create a new one.

  type: NodePort
  nodePort: 30001 # You can change the port number according to your needs
  tokenTTL: 28800

  enabled: false


I wanted to modify the yaml templates that take overriding values from values.yaml, so I made a backup file (e.g. ingress.yaml to ingress.yaml.backup). But I noticed that the created ingresses had the incorrect class name. After I renamed the backup to ingress-yaml.backup it all worked. This suggests it only looks for the filename.yaml part of the name, so a backup that still contains ingress.yaml gets picked up as well.

Check the next topic

