二进制部署 k8s 集群 1.27.3 版本

通过本文的指导,读者可以了解如何通过二进制的方式部署 Kubernetes 1.27.3 版本集群。二进制部署可以加深对 Kubernetes 各组件的理解,可以灵活地将各个组件部署到不同的机器,以满足自身的要求。但是需要注意的是,二进制部署需要手动配置各个组件,需要一定的技术水平和经验。


1. 环境准备

虽然 kubeadm、kops、kubespray 以及 rke、kubesphere 等工具可以快速部署 k8s 集群,但是依然会有很多人热衷于使用二进制方式部署 k8s 集群。

二进制部署可以加深对 k8s 各组件的理解,可以灵活地将各个组件部署到不同的机器,以满足自身的要求。还可以生成超长有效期的自签证书(比如 99 年),避免因忘记更新过期证书而引发生产事故。

1.1 书写约定

  • 命令行输入,均以 ➜ 符号表示
  • 注释使用 # 或 // 表示
  • 执行命令输出结果,以空行分隔
  • 如无特殊说明,命令需要在全部集群节点执行

1.2 机器规划

1.2.1 操作系统

Rocky-8.6-x86_64-minimal.iso

1.2.2 集群节点

| 角色 | 主机名 | IP | 组件 |
| --- | --- | --- | --- |
| master | master1 | 10.128.170.21 | etcd, kube-apiserver, kube-controller-manager, kubelet, kube-proxy, kube-scheduler |
| worker | worker1 | 10.128.170.131 | kubelet, kube-proxy |
| worker | worker2 | 10.128.170.132 | kubelet, kube-proxy |
| worker | worker3 | 10.128.170.133 | kubelet, kube-proxy |

1.2.3 测试节点(可选)

| 角色 | 主机名 | IP | 组件 |
| --- | --- | --- | --- |
| registry | registry | 10.128.170.235 | docker |

1.3 环境配置

1.3.1 基础设置

  • 设置主机名

    # 在 master1 节点执行
    ➜ hostnamectl set-hostname master1
    # 在 worker1 节点执行
    ➜ hostnamectl set-hostname worker1
    # 在 worker2 节点执行
    ➜ hostnamectl set-hostname worker2
    # 在 worker3 节点执行
    ➜ hostnamectl set-hostname worker3
  • 主机名解析

    ➜ cat >> /etc/hosts << EOF
    10.128.170.21 master1 master1.local
    10.128.170.131 worker1 worker1.local
    10.128.170.132 worker2 worker2.local
    10.128.170.133 worker3 worker3.local
    EOF
  • 免密登录

    # 在 master1 节点上执行(允许 master1 免密登录其他节点)
    ➜ ssh-keygen -t rsa
    ➜ ssh-copy-id worker1
    ➜ ssh-copy-id worker2
    ➜ ssh-copy-id worker3
  • 设置 yum 源

    https://developer.aliyun.com/mirror/rockylinux

    # 执行以下命令替换默认源
    ➜ sed -e 's|^mirrorlist=|#mirrorlist=|g' \
    -e 's|^#baseurl=http://dl.rockylinux.org/$contentdir|baseurl=https://mirrors.aliyun.com/rockylinux|g' \
    -i.bak \
    /etc/yum.repos.d/Rocky-*.repo

    ➜ dnf makecache
  • 设置 epel 源

    https://developer.aliyun.com/mirror/epel

    ➜ yum install -y https://mirrors.aliyun.com/epel/epel-release-latest-8.noarch.rpm
    ➜ sed -i 's|^#baseurl=https://download.example/pub|baseurl=https://mirrors.aliyun.com|' /etc/yum.repos.d/epel*
    ➜ sed -i 's|^metalink|#metalink|' /etc/yum.repos.d/epel*
  • 安装必要工具

    ➜ yum install -y vim wget htop
  • 创建下载文件存放目录

    ➜ mkdir -p ~/Downloads

1.3.2 k8s 环境设置

  • 关闭防火墙

    ➜ systemctl stop firewalld
    ➜ systemctl disable firewalld
  • 关闭 selinux

    # 临时
    ➜ setenforce 0
    # 永久
    ➜ sed -i 's/enforcing/disabled/' /etc/selinux/config
  • 关闭 swap

    # 临时
    ➜ swapoff -a
    # 永久
    ➜ sed -ri 's/.*swap.*/#&/' /etc/fstab

    sed 的 -r 选项启用扩展正则表达式,提供了更强大、灵活的模式匹配方式。

    使用正则表达式 .*swap.* 匹配包含 swap 字符串的行,并在行首添加 # 符号,& 表示匹配到的整个字符串。

  • 设置文件描述符限制

    # 临时
    ➜ ulimit -SHn 65535
    # 永久
    ➜ echo "* - nofile 65535" >>/etc/security/limits.conf

    用于设置当前用户的最大文件描述符数限制。具体来说,它的作用是将当前用户的软限制和硬限制都设置为 65535。

  • 时间同步

    # 设置时区
    ➜ timedatectl set-timezone Asia/Shanghai
    # 安装时间同步服务
    ➜ yum install -y chrony
    ➜ systemctl enable --now chronyd
  • 创建 kubernetes 证书存放目录

    ➜ mkdir -p /etc/kubernetes/pki

1.3.3 网络环境设置

  • 转发 IPv4 并让 iptables 看到桥接流量

    # 加载 br_netfilter 和 overlay 模块
    ➜ cat > /etc/modules-load.d/k8s.conf << EOF
    overlay
    br_netfilter
    EOF
    ➜ modprobe overlay
    ➜ modprobe br_netfilter
    # 设置所需的 sysctl 参数,参数在重新启动后保持不变
    ➜ cat > /etc/sysctl.d/k8s.conf << EOF
    net.bridge.bridge-nf-call-ip6tables = 1
    net.bridge.bridge-nf-call-iptables = 1
    net.ipv4.ip_forward = 1
    EOF
    # 应用 sysctl 参数而不重新启动
    ➜ sysctl --system

    通过以下命令确认 br_netfilter 和 overlay 模块被加载:

    ➜ lsmod | grep overlay
    ➜ lsmod | grep br_netfilter

    通过以下命令确认 net.bridge.bridge-nf-call-iptables、net.bridge.bridge-nf-call-ip6tables 和 net.ipv4.ip_forward 系统变量在你的 sysctl 配置中被设置为 1:

    ➜ sysctl net.bridge.bridge-nf-call-iptables net.bridge.bridge-nf-call-ip6tables net.ipv4.ip_forward
  • 加载 ipvs 模块

    ➜ yum install -y ipset ipvsadm
    ➜ cat > /etc/sysconfig/modules/ipvs.modules << "EOF"
    #!/bin/bash
    modprobe -- ip_vs
    modprobe -- ip_vs_rr
    modprobe -- ip_vs_wrr
    modprobe -- ip_vs_sh
    #modprobe -- nf_conntrack_ipv4
    modprobe -- nf_conntrack
    EOF
    ➜ chmod +x /etc/sysconfig/modules/ipvs.modules
    ➜ /bin/bash /etc/sysconfig/modules/ipvs.modules
    #➜ lsmod | grep -e ip_vs -e nf_conntrack_ipv4
    ➜ lsmod | grep -e ip_vs -e nf_conntrack
    • modprobe -- ip_vs: 加载 ip_vs 内核模块,该模块提供了 Linux 内核中的 IP 负载均衡功能。
    • modprobe -- ip_vs_rr: 加载 ip_vs_rr 内核模块,该模块提供了基于轮询算法的 IP 负载均衡策略。
    • modprobe -- ip_vs_wrr: 加载 ip_vs_wrr 内核模块,该模块提供了基于加权轮询算法的 IP 负载均衡策略。
    • modprobe -- ip_vs_sh: 加载 ip_vs_sh 内核模块,该模块提供了基于哈希算法的 IP 负载均衡策略。
    • modprobe -- nf_conntrack/nf_conntrack_ipv4: 加载 nf_conntrack/nf_conntrack_ipv4 内核模块,该模块提供了 Linux 内核中的网络连接跟踪功能,用于跟踪网络连接的状态。

    这些命令通常用于配置 Linux 系统中的负载均衡和网络连接跟踪功能。在加载这些内核模块之后,就可以使用相应的工具和命令来配置和管理负载均衡和网络连接跟踪。例如,可以使用 ipvsadm 命令来配置 IP 负载均衡,使用 conntrack 命令来查看和管理网络连接跟踪表(本节末尾附有一个简单的验证示例)。

    如果提示如下错误:

    "modprobe: FATAL: Module nf_conntrack_ipv4 not found in directory /lib/modules/4.18.0-372.9.1.el8.x86_64"

    则需要将 nf_conntrack_ipv4 修改为 nf_conntrack,然后重新执行命令,因为在高版本内核中已经把 nf_conntrack_ipv4 替换为 nf_conntrack。

    nf_conntrack_ipv4 和 nf_conntrack 都是 Linux 内核中的网络连接跟踪模块,用于跟踪网络连接的状态。它们的区别在于:

    • nf_conntrack_ipv4 模块只能跟踪 IPv4 协议的网络连接,而 nf_conntrack 模块可以跟踪 IPv4 和 IPv6 协议的网络连接。
    • nf_conntrack_ipv4 模块是 nf_conntrack 模块的一个子模块,它提供了 IPv4 协议的网络连接跟踪功能。因此,如果要使用 nf_conntrack_ipv4 模块,必须先加载 nf_conntrack 模块。

    这两个模块通常用于 Linux 系统中的网络安全和网络性能优化。它们可以被用于防火墙、负载均衡、网络流量分析等场景中,以便对网络连接进行跟踪、监控和控制。例如,可以使用 iptables 命令和 nf_conntrack 模块来实现基于连接状态的防火墙规则,或者使用 ipvsadm 命令和 nf_conntrack 模块来实现 IP 负载均衡。
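
    针对上面提到的 ipvsadm 和 conntrack 工具,下面给出一个简单的验证示例(其中 conntrack 命令需要额外安装 conntrack-tools 包,属于可选步骤,并非本文部署所必需):

    # 列出当前的 IPVS 转发规则,刚加载模块时通常为空
    ➜ ipvsadm -Ln
    # 再次确认 ip_vs 相关模块已加载
    ➜ lsmod | grep ip_vs
    # 可选:安装 conntrack-tools 后查看连接跟踪表
    #➜ yum install -y conntrack-tools && conntrack -L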

1.3.4 重启系统

  • 重启

    ➜ reboot

1.4 下载二进制包

https://kubernetes.io/zh-cn/releases/download/

从官方发布地址下载二进制包,下载 Server Binaries 即可,这个包含了所有所需的二进制文件。

https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.27.md

解压后,将二进制文件 kubeadm、kube-apiserver、kube-controller-manager、kubectl、kubelet、kube-proxy、kube-scheduler 复制到 master 节点 /usr/local/bin 目录下(其中 kubeadm 仅用于后面查看依赖的镜像版本和生成示例配置),将 kubelet、kube-proxy 复制到 worker 节点 /usr/local/bin 目录下。

在 master1 节点执行以下命令:

➜ cd ~/Downloads
# 下载
➜ wget -c https://dl.k8s.io/v1.27.3/kubernetes-server-linux-amd64.tar.gz
# 解压
➜ tar -zxf kubernetes-server-linux-amd64.tar.gz
# 复制到 master1 节点 /usr/local/bin 目录
➜ cp kubernetes/server/bin/{kubeadm,kube-apiserver,kube-controller-manager,kubectl,kubelet,kube-proxy,kube-scheduler} /usr/local/bin
# 查看复制结果
➜ ls -lh /usr/local/bin/kube*

-rwxr-xr-x 1 root root 46M Jul 10 14:28 /usr/local/bin/kubeadm
-rwxr-xr-x 1 root root 112M Jul 10 14:28 /usr/local/bin/kube-apiserver
-rwxr-xr-x 1 root root 104M Jul 10 14:28 /usr/local/bin/kube-controller-manager
-rwxr-xr-x 1 root root 47M Jul 10 14:28 /usr/local/bin/kubectl
-rwxr-xr-x 1 root root 102M Jul 10 14:28 /usr/local/bin/kubelet
-rwxr-xr-x 1 root root 51M Jul 10 14:28 /usr/local/bin/kube-proxy
-rwxr-xr-x 1 root root 52M Jul 10 14:28 /usr/local/bin/kube-scheduler
# 复制到 worker1 节点 /usr/local/bin 目录
➜ scp kubernetes/server/bin/{kubelet,kube-proxy} root@worker1:/usr/local/bin
# 复制到 worker2 节点 /usr/local/bin 目录
➜ scp kubernetes/server/bin/{kubelet,kube-proxy} root@worker2:/usr/local/bin
# 复制到 worker3 节点 /usr/local/bin 目录
➜ scp kubernetes/server/bin/{kubelet,kube-proxy} root@worker3:/usr/local/bin

1.5 查看镜像版本

在 master1 节点执行以下命令:

# 查看依赖的镜像版本
➜ kubeadm config images list

registry.k8s.io/kube-apiserver:v1.27.3
registry.k8s.io/kube-controller-manager:v1.27.3
registry.k8s.io/kube-scheduler:v1.27.3
registry.k8s.io/kube-proxy:v1.27.3
registry.k8s.io/pause:3.9
registry.k8s.io/etcd:3.5.7-0
registry.k8s.io/coredns/coredns:v1.10.1

2. 容器运行时

本节概述了使用 containerd 作为 CRI 运行时的必要步骤。

https://blog.51cto.com/lajifeiwomoshu/5428345

2.1 安装 containerd

参考 Getting started with containerd 在各节点安装 containerd。

  • 设置 repository

    # 安装 yum-utils
    ➜ yum install -y yum-utils
    # 添加 repository
    ➜ yum-config-manager --add-repo https://mirrors.aliyun.com/docker-ce/linux/centos/docker-ce.repo

    或者

    # 添加 repository
    ➜ dnf config-manager --add-repo https://mirrors.aliyun.com/docker-ce/linux/centos/docker-ce.repo
  • 查看当前镜像源中支持的 containerd 版本

    ➜ yum list containerd.io --showduplicates
  • 安装特定版本的 containerd

    ➜ yum install -y --setopt=obsoletes=0 containerd.io-1.6.21
  • 添加 containerd 配置

    https://kubernetes.io/docs/setup/production-environment/container-runtimes/#containerd

    https://github.com/containerd/containerd/blob/main/docs/cri/config.md

    This document provides the description of the CRI plugin configuration. The CRI plugin config is part of the containerd config (default path: /etc/containerd/config.toml).

    See here for more information about containerd config.

    Note that the [plugins."io.containerd.grpc.v1.cri"] section is specific to CRI, and not recognized by other containerd clients such as ctr, nerdctl, and Docker/Moby.

    # 创建 containerd 配置文件
    ➜ cat > /etc/containerd/config.toml << EOF
    # https://github.com/containerd/containerd/blob/main/docs/cri/config.md
    disabled_plugins = []
    imports = []
    version = 2

    [plugins."io.containerd.grpc.v1.cri"]
    sandbox_image = "registry.k8s.io/pause:3.9"
    # https://github.com/containerd/containerd/issues/6964
    [plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runc]
    runtime_type = "io.containerd.runc.v2"
    [plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runc.options]
    SystemdCgroup = true
    [plugins."io.containerd.grpc.v1.cri".registry]
    config_path = "/etc/containerd/certs.d"
    EOF
    # 创建镜像仓库配置目录
    ➜ mkdir -p /etc/containerd/certs.d
  • 启动 containerd

    ➜ systemctl daemon-reload
    ➜ systemctl enable --now containerd
    # 查看 containerd 状态
    ➜ systemctl status containerd
  • 查看当前 containerd 使用的配置

    ➜ containerd config dump
  • 测试 containerd(在任意一个集群节点测试即可)

    # 拉取 redis 镜像
    ➜ ctr images pull docker.io/library/redis:alpine
    # 创建 redis 容器并运行
    ➜ ctr run docker.io/library/redis:alpine redis
    # 删除 redis 镜像
    ➜ ctr images delete docker.io/library/redis:alpine

2.2 安装 containerd cli 工具

There are several command line interface (CLI) projects for interacting with containerd:

| Name | Community | API | Target | Web site |
| --- | --- | --- | --- | --- |
| ctr | containerd | Native | For debugging only | (None, see ctr --help to learn the usage) |
| nerdctl | containerd (non-core) | Native | General-purpose | https://github.com/containerd/nerdctl |
| crictl | Kubernetes SIG-node | CRI | For debugging only | https://github.com/kubernetes-sigs/cri-tools/blob/master/docs/crictl.md |

2.2.1 ctr

While the ctr tool is bundled together with containerd, it should be noted the ctr tool is solely made for debugging containerd. The nerdctl tool provides stable and human-friendly user experience.

2.2.2 crictl

crictl 是 CRI 兼容的容器运行时命令行接口,可以使用它来检查和调试 k8s 节点上的容器运行时和应用程序。主要是用于 kubernetes, 默认操作的命名空间是 k8s.io,而且看到的对象是 pod。

如果是 k8s 环境的话,可以参考下方链接,在每个节点部署 containerd 的时候也部署下 crictl 工具。

kubernetes-sigs/cri-tools: CLI and validation tools for Kubelet Container Runtime Interface (CRI) .

在 master1 节点执行以下命令:

➜ cd ~/Downloads
# 下载
➜ wget -c https://hub.gitmirror.com/https://github.com/kubernetes-sigs/cri-tools/releases/download/v1.27.1/crictl-v1.27.1-linux-amd64.tar.gz
# 解压到 master1 节点 /usr/local/bin 目录
➜ tar -zxvf crictl-v1.27.1-linux-amd64.tar.gz -C /usr/local/bin
# 复制到 worker1 节点 /usr/local/bin 目录
➜ scp /usr/local/bin/crictl root@worker1:/usr/local/bin
# 复制到 worker2 节点 /usr/local/bin 目录
➜ scp /usr/local/bin/crictl root@worker2:/usr/local/bin
# 复制到 worker3 节点 /usr/local/bin 目录
➜ scp /usr/local/bin/crictl root@worker3:/usr/local/bin

创建 crictl 配置文件:

➜ cat > /etc/crictl.yaml << EOF
runtime-endpoint: unix:///var/run/containerd/containerd.sock
image-endpoint: unix:///var/run/containerd/containerd.sock
timeout: 30
debug: false
EOF
➜ scp /etc/crictl.yaml root@worker1:/etc
➜ scp /etc/crictl.yaml root@worker2:/etc
➜ scp /etc/crictl.yaml root@worker3:/etc

测试 crictl 工具:

➜ crictl info --output go-template --template '{{.config.sandboxImage}}'

registry.k8s.io/pause:3.9

➜ crictl inspecti --output go-template --template '{{.status.pinned}}' registry.k8s.io/pause:3.9

FATA[0000] no such image "registry.k8s.io/pause:3.9" present
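
上面 inspecti 报 no such image 属于预期结果,因为此时还没有通过 CRI 拉取过 pause 镜像。下面再列出几个日常排查时常用的 crictl 子命令作为参考(当前集群尚未部署完成,输出多半为空):

# 查看节点上已有的镜像
➜ crictl images
# 查看所有容器(包括已退出的)
➜ crictl ps -a
# 查看 Pod 沙箱列表
➜ crictl pods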

2.2.3 nerdctl(推荐)

https://github.com/containerd/nerdctl

对于单机节点来说,推荐使用 nerdctl,使用方式和 docker 类似,基本没有学习成本,把 docker 换成 nerdctl 即可,对普通用户比较友好。如果需要通过 API 或程序化方式操作 containerd,则可以选择其他客户端。

nerdctl 有两种版本:

  • Minimal (nerdctl-1.4.0-linux-amd64.tar.gz): nerdctl only
  • Full (nerdctl-full-1.4.0-linux-amd64.tar.gz): Includes dependencies such as containerd, runc, and CNI

这里我选择的是 Minimal 的版本,直接到 https://github.com/containerd/nerdctl/releases 下载。

在 master1 节点执行以下命令:

➜ cd ~/Downloads
# 下载
➜ wget -c https://hub.gitmirror.com/https://github.com/containerd/nerdctl/releases/download/v1.4.0/nerdctl-1.4.0-linux-amd64.tar.gz
# 解压到 master1 节点 /usr/local/bin 目录
➜ tar Cxzvvf /usr/local/bin nerdctl-1.4.0-linux-amd64.tar.gz
# 复制到 worker1 节点 /usr/local/bin 目录
➜ scp /usr/local/bin/{containerd-rootless-setuptool.sh,containerd-rootless.sh,nerdctl} root@worker1:/usr/local/bin
# 复制到 worker2 节点 /usr/local/bin 目录
➜ scp /usr/local/bin/{containerd-rootless-setuptool.sh,containerd-rootless.sh,nerdctl} root@worker2:/usr/local/bin
# 复制到 worker3 节点 /usr/local/bin 目录
➜ scp /usr/local/bin/{containerd-rootless-setuptool.sh,containerd-rootless.sh,nerdctl} root@worker3:/usr/local/bin

测试 nerdctl(在任意一个集群节点测试即可)

# 拉取镜像
➜ nerdctl image pull redis:alpine
# 查看镜像
➜ nerdctl image ls
# 删除镜像
➜ nerdctl image rm redis:alpine
# 运行 nginx 服务(需要 CNI plugin)
#➜ nerdctl run -d --name nginx -p 80:80 nginx:alpine

2.3 设置公共仓库镜像源(可选)

https://github.com/containerd/containerd/blob/main/docs/cri/registry.md

https://github.com/containerd/containerd/blob/main/docs/hosts.md

由于某些因素,在国内拉取公共镜像仓库的速度是极慢的,为了节约拉取时间,需要为 containerd 配置镜像仓库的 mirror。

containerd 的镜像仓库 mirror 与 docker 相比有两个区别:

  • containerd 只支持通过 CRI 拉取镜像的 mirror,也就是说,只有通过 crictl,nerdctl 或者 kubernetes 调用时 mirror 才会生效,要想使用 ctr 拉取也生效的话需要指定 --hosts-dir

    可以通过 nerdctl --debug pull 来观察

  • docker 只支持为 Docker Hub 配置 mirror,而 containerd 支持为任意镜像仓库配置 mirror。

    registry.aliyuncs.com/google_containers 虽然有 k8s.gcr.io 的镜像,但不是加速站点

拉取镜像时,默认都是从 docker hub 上拉取,如果镜像名前不加 registry 地址的话默认会给你加上 docker.io/library

需要注意的是:

  • 如果 hosts.toml 文件中的 capabilities 中不加 resolve 的话,无法加速镜像
  • 配置无需重启服务,即可生效
  • 要配保底的加速站点,否则可能会导致下载失败
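
前文提到可以通过 nerdctl --debug pull 观察 mirror 是否生效,下面给出一个简单的用法示例(需先完成下文 2.3.1 的配置,redis:alpine 仅作演示):

# debug 日志中会打印实际请求的 host,以及补全后的完整镜像名 docker.io/library/redis:alpine
➜ nerdctl --debug image pull redis:alpine
# 验证完成后清理测试镜像
➜ nerdctl image rm redis:alpine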

2.3.1 docker.io

https://yeasy.gitbook.io/docker_practice/install/mirror

# 创建 docker.io 目录
➜ mkdir -p /etc/containerd/certs.d/docker.io
# 创建 docker.io 仓库配置文件
➜ cat > /etc/containerd/certs.d/docker.io/hosts.toml << EOF
# https://blog.csdn.net/qq_44797987/article/details/112681224
server = "https://docker.io"

[host."https://dockerproxy.com"]
capabilities = ["pull", "resolve"]
[host."https://ccr.ccs.tencentyun.com"]
capabilities = ["pull", "resolve"]
[host."https://hub-mirror.c.163.com"]
capabilities = ["pull", "resolve"]
[host."https://mirror.baidubce.com"]
capabilities = ["pull", "resolve"]
[host."https://registry-1.docker.io"]
capabilities = ["pull", "resolve", "push"]
EOF

2.3.2 registry.k8s.io

# 创建 registry.k8s.io 目录
➜ mkdir -p /etc/containerd/certs.d/registry.k8s.io
# 创建 registry.k8s.io 仓库配置文件
➜ cat > /etc/containerd/certs.d/registry.k8s.io/hosts.toml << EOF
server = "https://registry.k8s.io"

[host."https://registry.aliyuncs.com/v2/google_containers"]
capabilities = ["pull", "resolve"]
override_path = true
EOF

registry.aliyuncs.com/google_containers 这个镜像仓库站点不是 registry.k8s.io 的 mirror,只是存放了 registry.k8s.io 的部分镜像,这就是 registry.k8s.io 的某些镜像在 registry.aliyuncs.com/google_containers 中找不到的原因。

通过执行以下命令并观察输出可以知道,从 "registry.aliyuncs.com/google_containers" 拉取 pause 镜像正确的请求是 "https://registry.cn-hangzhou.aliyuncs.com/v2/google_containers/pause/manifests/3.9"。

# 在 master1 节点进行测试
➜ ctr --debug images pull -k registry.cn-hangzhou.aliyuncs.com/google_containers/pause:3.9

...
DEBU[0000] do request host=registry.cn-hangzhou.aliyuncs.com request.header.accept="application/vnd.docker.distribution.manifest.v2+json, application/vnd.docker.distribution.manifest.list.v2+json, application/vnd.oci.image.manifest.v1+json, application/vnd.oci.image.index.v1+json, */*" request.header.user-agent=containerd/1.6.21 request.method=HEAD url="https://registry.cn-hangzhou.aliyuncs.com/v2/google_containers/pause/manifests/3.9"
...

使用 ctr 命令拉取 "registry.k8s.io/pause:3.9" 镜像,如果最终拼接出的请求不是 "https://registry.cn-hangzhou.aliyuncs.com/v2/google_containers/pause/manifests/3.9",则会导致拉取失败。

根据文档 https://github.com/containerd/containerd/blob/main/docs/hosts.md 可以知道:

pull [registry_host_name|IP address][:port][/v2][/org_path]<image_name>[:tag|@DIGEST]

拉取请求格式中的 /v2 部分指的是分发 API 的版本。如果拉取请求中未包含 /v2,默认情况下会为符合上述分发规范的客户端自动添加 /v2;对于不符合该规范(例如路径中缺少 /v2 前缀)的 OCI registry,这种自动补全可能会导致请求路径不正确。

以拉取 "registry.k8s.io/pause:3.9" 镜像为例,在 master1 节点上进行测试,分析几种配置情况下实际的拉取请求 URL 及拉取结果:

2.3.2.1 配置一(错误)
# 设置 registry.k8s.io 配置
➜ cat > /etc/containerd/certs.d/registry.k8s.io/hosts.toml << EOF
server = "https://registry.k8s.io"

[host."https://registry.aliyuncs.com/google_containers"]
capabilities = ["pull", "resolve"]
EOF
# 测试镜像拉取
➜ ctr --debug images pull --hosts-dir "/etc/containerd/certs.d" registry.k8s.io/pause:3.9

DEBU[0000] fetching image="registry.k8s.io/pause:3.9"
DEBU[0000] loading host directory dir=/etc/containerd/certs.d/registry.k8s.io
DEBU[0000] resolving host=registry.aliyuncs.com
DEBU[0000] do request host=registry.aliyuncs.com request.header.accept="application/vnd.docker.distribution.manifest.v2+json, application/vnd.docker.distribution.manifest.list.v2+json, application/vnd.oci.image.manifest.v1+json, application/vnd.oci.image.index.v1+json, */*" request.header.user-agent=containerd/1.6.21 request.method=HEAD url="https://registry.aliyuncs.com/google_containers/v2/pause/manifests/3.9?ns=registry.k8s.io"
DEBU[0000] fetch response received host=registry.aliyuncs.com response.header.content-length=19 response.header.content-type="text/plain; charset=utf-8" response.header.date="Wed, 12 Jul 2023 14:40:38 GMT" response.header.docker-distribution-api-version=registry/2.0 response.header.x-content-type-options=nosniff response.status="404 Not Found" url="https://registry.aliyuncs.com/google_containers/v2/pause/manifests/3.9?ns=registry.k8s.io"
INFO[0000] trying next host - response was http.StatusNotFound host=registry.aliyuncs.com
...

由输出可知,因为实际请求的 URL "https://registry.aliyuncs.com/google_containers/v2/pause/manifests/3.9?ns=registry.k8s.io" 不正确,所以拉取失败。

2.3.2.2 配置二(错误)
# 设置 registry.k8s.io 配置
➜ cat > /etc/containerd/certs.d/registry.k8s.io/hosts.toml << EOF
server = "https://registry.k8s.io"

[host."https://registry.aliyuncs.com/v2/google_containers"]
capabilities = ["pull", "resolve"]
EOF
# 测试镜像拉取
➜ ctr --debug images pull --hosts-dir "/etc/containerd/certs.d" registry.k8s.io/pause:3.9

DEBU[0000] fetching image="registry.k8s.io/pause:3.9"
DEBU[0000] loading host directory dir=/etc/containerd/certs.d/registry.k8s.io
DEBU[0000] resolving host=registry.aliyuncs.com
DEBU[0000] do request host=registry.aliyuncs.com request.header.accept="application/vnd.docker.distribution.manifest.v2+json, application/vnd.docker.distribution.manifest.list.v2+json, application/vnd.oci.image.manifest.v1+json, application/vnd.oci.image.index.v1+json, */*" request.header.user-agent=containerd/1.6.21 request.method=HEAD url="https://registry.aliyuncs.com/v2/google_containers/v2/pause/manifests/3.9?ns=registry.k8s.io"
DEBU[0000] fetch response received host=registry.aliyuncs.com response.header.content-length=169 response.header.content-type="application/json; charset=utf-8" response.header.date="Wed, 12 Jul 2023 15:02:21 GMT" response.header.docker-distribution-api-version=registry/2.0 response.header.www-authenticate="Bearer realm=\"https://dockerauth.cn-hangzhou.aliyuncs.com/auth\",service=\"registry.aliyuncs.com:cn-hangzhou:26842\",scope=\"repository:google_containers/v2/pause:pull\"" response.status="401 Unauthorized" url="https://registry.aliyuncs.com/v2/google_containers/v2/pause/manifests/3.9?ns=registry.k8s.io"
DEBU[0000] Unauthorized header="Bearer realm=\"https://dockerauth.cn-hangzhou.aliyuncs.com/auth\",service=\"registry.aliyuncs.com:cn-hangzhou:26842\",scope=\"repository:google_containers/v2/pause:pull\"" host=registry.aliyuncs.com
DEBU[0000] do request host=registry.aliyuncs.com request.header.accept="application/vnd.docker.distribution.manifest.v2+json, application/vnd.docker.distribution.manifest.list.v2+json, application/vnd.oci.image.manifest.v1+json, application/vnd.oci.image.index.v1+json, */*" request.header.user-agent=containerd/1.6.21 request.method=HEAD url="https://registry.aliyuncs.com/v2/google_containers/v2/pause/manifests/3.9?ns=registry.k8s.io"
DEBU[0000] fetch response received host=registry.aliyuncs.com response.header.content-length=169 response.header.content-type="application/json; charset=utf-8" response.header.date="Wed, 12 Jul 2023 15:02:21 GMT" response.header.docker-distribution-api-version=registry/2.0 response.header.www-authenticate="Bearer realm=\"https://dockerauth.cn-hangzhou.aliyuncs.com/auth\",service=\"registry.aliyuncs.com:cn-hangzhou:26842\",scope=\"repository:google_containers/v2/pause:pull\",error=\"insufficient_scope\"" response.status="401 Unauthorized" url="https://registry.aliyuncs.com/v2/google_containers/v2/pause/manifests/3.9?ns=registry.k8s.io"
DEBU[0000] Unauthorized header="Bearer realm=\"https://dockerauth.cn-hangzhou.aliyuncs.com/auth\",service=\"registry.aliyuncs.com:cn-hangzhou:26842\",scope=\"repository:google_containers/v2/pause:pull\",error=\"insufficient_scope\"" host=registry.aliyuncs.com
...

由输出可知,因为实际请求的 URL "https://registry.aliyuncs.com/v2/google_containers/v2/pause/manifests/3.9?ns=registry.k8s.io" 不正确,所以拉取失败。

2.3.2.3 配置三(正确)
# 设置 registry.k8s.io 配置
➜ cat > /etc/containerd/certs.d/registry.k8s.io/hosts.toml << EOF
server = "https://registry.k8s.io"

[host."https://registry.aliyuncs.com/v2/google_containers"]
capabilities = ["pull", "resolve"]
override_path = true
EOF
# 测试镜像拉取
➜ ctr --debug images pull --hosts-dir "/etc/containerd/certs.d" registry.k8s.io/pause:3.9

DEBU[0000] fetching image="registry.k8s.io/pause:3.9"
DEBU[0000] loading host directory dir=/etc/containerd/certs.d/registry.k8s.io
DEBU[0000] resolving host=registry.aliyuncs.com
DEBU[0000] do request host=registry.aliyuncs.com request.header.accept="application/vnd.docker.distribution.manifest.v2+json, application/vnd.docker.distribution.manifest.list.v2+json, application/vnd.oci.image.manifest.v1+json, application/vnd.oci.image.index.v1+json, */*" request.header.user-agent=containerd/1.6.21 request.method=HEAD url="https://registry.aliyuncs.com/v2/google_containers/pause/manifests/3.9?ns=registry.k8s.io"
DEBU[0000] fetch response received host=registry.aliyuncs.com response.header.content-length=166 response.header.content-type="application/json; charset=utf-8" response.header.date="Wed, 12 Jul 2023 15:04:06 GMT" response.header.docker-distribution-api-version=registry/2.0 response.header.www-authenticate="Bearer realm=\"https://dockerauth.cn-hangzhou.aliyuncs.com/auth\",service=\"registry.aliyuncs.com:cn-hangzhou:26842\",scope=\"repository:google_containers/pause:pull\"" response.status="401 Unauthorized" url="https://registry.aliyuncs.com/v2/google_containers/pause/manifests/3.9?ns=registry.k8s.io"
DEBU[0000] Unauthorized header="Bearer realm=\"https://dockerauth.cn-hangzhou.aliyuncs.com/auth\",service=\"registry.aliyuncs.com:cn-hangzhou:26842\",scope=\"repository:google_containers/pause:pull\"" host=registry.aliyuncs.com
DEBU[0000] do request host=registry.aliyuncs.com request.header.accept="application/vnd.docker.distribution.manifest.v2+json, application/vnd.docker.distribution.manifest.list.v2+json, application/vnd.oci.image.manifest.v1+json, application/vnd.oci.image.index.v1+json, */*" request.header.user-agent=containerd/1.6.21 request.method=HEAD url="https://registry.aliyuncs.com/v2/google_containers/pause/manifests/3.9?ns=registry.k8s.io"
DEBU[0000] fetch response received host=registry.aliyuncs.com response.header.content-length=2405 response.header.content-type=application/vnd.docker.distribution.manifest.list.v2+json response.header.date="Wed, 12 Jul 2023 15:04:07 GMT" response.header.docker-content-digest="sha256:7031c1b283388d2c2e09b57badb803c05ebed362dc88d84b480cc47f72a21097" response.header.docker-distribution-api-version=registry/2.0 response.header.etag="\"sha256:7031c1b283388d2c2e09b57badb803c05ebed362dc88d84b480cc47f72a21097\"" response.status="200 OK" url="https://registry.aliyuncs.com/v2/google_containers/pause/manifests/3.9?ns=registry.k8s.io"
DEBU[0000] resolved desc.digest="sha256:7031c1b283388d2c2e09b57badb803c05ebed362dc88d84b480cc47f72a21097" host=registry.aliyuncs.com
...

由输出可知,因为实际请求的 URL "https://registry.aliyuncs.com/v2/google_containers/pause/manifests/3.9?ns=registry.k8s.io" 正确,所以拉取成功。

https://github.com/containerd/containerd/blob/main/docs/hosts.md#override_path-field

override_path is used to indicate the host's API root endpoint is defined in the URL path rather than by the API specification. This may be used with non-compliant OCI registries which are missing the /v2 prefix. (Defaults to false)

2.3.3 quay.io

# 创建 quay.io 目录
➜ mkdir -p /etc/containerd/certs.d/quay.io
# 创建 quay.io 仓库配置文件
➜ cat > /etc/containerd/certs.d/quay.io/hosts.toml << EOF
server = "https://quay.io"

[host."https://quay-mirror.qiniu.com"]
capabilities = ["pull", "resolve"]
EOF

2.4 设置私有仓库(可选)

2.4.1 部署私有仓库

在用于测试的 registry 节点部署私有仓库,以测试 containerd 使用私有仓库的场景。

2.4.1.1 安装 docker

参考文档 Install Docker Engine on CentOS,在 registry 节点安装 docker:

  • 切换镜像源

    ➜ wget https://mirrors.aliyun.com/docker-ce/linux/centos/docker-ce.repo -O /etc/yum.repos.d/docker-ce.repo

    或者

    # 添加 repository
    ➜ dnf config-manager --add-repo https://mirrors.aliyun.com/docker-ce/linux/centos/docker-ce.repo
  • 查看当前镜像源中支持的 docker 版本

    ➜ yum list docker-ce --showduplicates
  • 安装特定版本的 docker

    ➜ yum install -y --setopt=obsoletes=0 docker-ce-20.10.24

    --setopt=obsoletes=0 是 yum 包管理器的一个选项,作用是禁用 obsoletes(软件包替代)处理,防止安装时被更高版本的替代包顶替,从而可以安装指定的旧版本。

  • 添加配置文件

    docker 在默认情况下使用的 cgroup driver 为 cgroupfs,而 kubernetes 推荐使用 systemd 来替代 cgroupfs。

    ➜ mkdir /etc/docker
    ➜ cat > /etc/docker/daemon.json << EOF
    {
    "exec-opts": ["native.cgroupdriver=systemd"],
    "registry-mirrors": ["https://mirror.baidubce.com", "https://hub-mirror.c.163.com"]
    }
    EOF
  • 启动 docker

    ➜ systemctl daemon-reload
    ➜ systemctl enable --now docker
    ➜ systemctl status docker
2.4.1.2 部署 registry

https://docs.docker.com/registry/deploying/

  • 拉取 registry 镜像

    https://hub.docker.com/_/registry

    ➜ docker pull registry:2.8.1
  • 创建 registry 数据目录

    ➜ mkdir /data
  • 运行 registry 服务

    https://yeasy.gitbook.io/docker_practice/repository/registry

    ➜ docker run -d \
    --name registry.local \
    --publish 5000:5000 \
    --restart always \
    --volume /data:/var/lib/registry \
    registry:2.8.1
  • 拉取、推送测试镜像

    # 拉取镜像
    ➜ docker pull redis:alpine
    # 标记镜像
    ➜ docker tag redis:alpine registry.local:5000/redis:alpine
    # 推送镜像
    ➜ docker push registry.local:5000/redis:alpine
    # 查看镜像
    ➜ curl registry.local:5000/v2/_catalog

2.4.2 registry.local

# 创建 registry.local 目录
➜ mkdir -p /etc/containerd/certs.d/registry.local
# 创建 registry.local 仓库配置文件
➜ cat > /etc/containerd/certs.d/registry.local/hosts.toml << EOF
server = "https://registry.local"

[host."http://registry.local:5000"]
capabilities = ["pull", "resolve", "push"]
skip_verify = true
EOF
# 追加 /etc/hosts 配置
➜ cat >> /etc/hosts << EOF
10.128.170.235 registry.local
EOF

2.4.3 测试私有仓库

在任意一个集群节点测试即可:

# 拉取镜像
➜ ctr --debug images pull --hosts-dir "/etc/containerd/certs.d" registry.local/redis:alpine

需要注意的是:

  • ctr image pull 的时候需要注意镜像名称需要完整,否则无法拉取,格式如下:

    [registry_host_name|IP address][:port][/v2][/org_path]<image_name>[:tag|@DIGEST]

  • 因为 ctr 不使用 CRI,所以默认不会使用 config.toml 中 cri 的配置,如果拉取镜像时希望使用 mirror,则需要指定 --hosts-dir

3. 创建 ca 证书

证书操作如无特殊说明,只需在 master1 节点执行即可。

3.1 安装 cfssl

cfssl 是一款证书签署工具,使用 cfssl 可以大大简化证书签署过程,方便颁发自签证书。

CloudFlare distributes cfssl source code on its GitHub page and binaries on the cfssl website.

Our documentation assumes that you will run cfssl on your local x86_64 Linux host.

https://github.com/cloudflare/cfssl/releases/tag/v1.6.4

# 下载并重命名
➜ curl -L -o /usr/local/bin/cfssl https://download.nuaa.cf/cloudflare/cfssl/releases/download/v1.6.4/cfssl_1.6.4_linux_amd64
➜ curl -L -o /usr/local/bin/cfssljson https://download.nuaa.cf/cloudflare/cfssl/releases/download/v1.6.4/cfssljson_1.6.4_linux_amd64
# 赋予可执行权限
➜ chmod +x /usr/local/bin/{cfssl,cfssljson}

离线安装的情况,直接把两个文件下载下来重命名即可。

3.2 创建 ca 证书

创建的证书统一放到 /etc/kubernetes/ssl 目录,创建后复制到 /etc/kubernetes/pki 目录。

# 创建 /etc/kubernetes/ssl 目录
➜ mkdir -p /etc/kubernetes/ssl

# 进入 /etc/kubernetes/ssl 目录
➜ cd /etc/kubernetes/ssl

# ca 证书创建申请
➜ cat > ca-csr.json << EOF
{
"CN": "kubernetes",
"key": {
"algo": "rsa",
"size": 2048
},
"names": [
{
"C": "CN",
"ST": "GuangDong",
"L": "ShenZhen",
"O": "k8s",
"OU": "system"
}
],
"ca": {
"expiry": "87600h"
}
}
EOF

# 创建 ca 证书
➜ cfssl gencert -initca ca-csr.json | cfssljson -bare ca

# 验证结果,会生成两个证书文件
➜ ls -lh ca*pem

-rw------- 1 root root 1.7K Jul 13 12:38 ca-key.pem
-rw-r--r-- 1 root root 1.4K Jul 13 12:38 ca.pem

# 复制 ca 证书到 /etc/kubernetes/pki
➜ cp ca*pem /etc/kubernetes/pki

ca-csr.json 是 Kubernetes 集群根证书(CA)的签名请求(CSR)配置文件,用于定义 CA 证书的签署信息。

在这个配置文件中,CN 字段指定了证书的通用名称为 "kubernetes",key 字段指定了证书的密钥算法为 RSA,密钥长度为 2048 位。names 字段定义了证书的其他信息,如国家、省份、城市、组织和组织单位等。ca 字段指定了证书的过期时间为 87600 小时(即 10 年)。

这个配置文件用于创建 Kubernetes 集群中的 CA 证书,以便对集群中的其他证书进行签名和认证。

  • CN(Common Name): kube-apiserver 从证书中提取该字段作为请求的用户名 (User Name)
  • names[].O(Organization): kube-apiserver 从证书中提取该字段作为请求用户所属的组 (Group)

由于这里是 CA 证书,是签发其它证书的根证书,这个证书密钥不会分发出去作为 client 证书,所有组件使用的 client 证书都是由 CA 证书签发而来,所以 CA 证书的 CN 和 O 的名称并不重要,后续其它签发出来的证书的 CN 和 O 的名称才是有用的。
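
可以用 openssl 或 cfssl certinfo 查看刚生成的 CA 证书,确认主题中的 CN、O 字段以及有效期与上面的签署申请一致(以下命令仅用于验证,不影响部署流程):

# 查看证书主题、颁发者和有效期
➜ openssl x509 -in /etc/kubernetes/ssl/ca.pem -noout -subject -issuer -dates
# 查看证书完整信息(JSON 输出)
➜ cfssl certinfo -cert /etc/kubernetes/ssl/ca.pem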

3.3 创建签发配置文件

由于各个组件都需要配置证书,并且依赖 CA 证书来签发证书,所以我们首先要生成好 CA 证书以及后续的签发配置文件。

创建用于签发其它证书的配置文件:

# 进入 /etc/kubernetes/ssl 目录
➜ cd /etc/kubernetes/ssl

# 证书签发配置文件
➜ cat > ca-config.json << EOF
{
"signing": {
"default": {
"expiry": "87600h"
},
"profiles": {
"kubernetes": {
"usages": [
"signing",
"key encipherment",
"server auth",
"client auth"
],
"expiry": "87600h"
}
}
}
}
EOF

ca-config.json 这个文件是签发其它证书的配置文件,用于定义签名配置和证书配置。其中,signing 字段定义了签名配置,profiles 字段定义了不同场景下的证书配置。

在这个配置文件中,default 配置指定了默认的证书过期时间为 87600 小时(即 10 年),profiles 配置定义了一个名为 "kubernetes" 的证书配置,它指定了证书的用途(签名、密钥加密、服务器认证和客户端认证)和过期时间。

这个配置文件用于创建 Kubernetes 集群中的证书和密钥,以便对集群进行安全认证和加密通信。

  • signing:定义了签名配置,包括默认的签名过期时间和各个证书配置的签名过期时间。
  • profiles:定义了不同场景下的证书配置,包括证书的用途、过期时间和其他属性。

在使用 cfssl gencert 命令生成证书时,可以使用 -config 参数指定配置文件,以便根据配置文件中的规则生成符合要求的证书。如果不指定 -config 参数,则 cfssl gencert 命令将使用默认的配置文件。

4. 部署 etcd

根据前面 kubeadm 输出的镜像列表可知,Kubernetes 1.27.3 对应的 etcd 版本是 3.5.7。etcd 只需在 master 节点(也即 master1 节点)部署即可。

https://github.com/etcd-io/etcd/releases/tag/v3.5.7

4.1 颁发证书

# 进入 /etc/kubernetes/ssl 目录
➜ cd /etc/kubernetes/ssl

# etcd 证书签署申请
# hosts 字段中,IP 为所有 etcd 集群节点地址,这里可以做好规划,预留几个 IP,以备后续扩容。
➜ cat > etcd-csr.json << EOF
{
"CN": "etcd",
"hosts": [
"127.0.0.1",
"10.128.170.21",
"10.128.170.22",
"10.128.170.23",
"localhost",
"master1",
"master2",
"master3",
"master1.local",
"master2.local",
"master3.local"
],
"key": {
"algo": "rsa",
"size": 2048
},
"names": [
{
"C": "CN",
"ST": "GuangDong",
"L": "ShenZhen",
"O": "k8s",
"OU": "system"
}
]
}
EOF

# 签署 etcd 证书
➜ cfssl gencert -ca=ca.pem -ca-key=ca-key.pem -config=ca-config.json -profile=kubernetes etcd-csr.json | cfssljson -bare etcd

# 验证结果,会生成两个证书文件
➜ ls -lh etcd*pem

-rw------- 1 root root 1.7K Jul 13 23:35 etcd-key.pem
-rw-r--r-- 1 root root 1.6K Jul 13 23:35 etcd.pem

# 复制 etcd 证书到 /etc/kubernetes/pki
➜ cp etcd*pem /etc/kubernetes/pki

4.2 部署 etcd

下载二进制包 https://github.com/etcd-io/etcd/releases/tag/v3.5.7 并解压,将二进制程序 etcd 和 etcdctl 复制到 /usr/local/bin 目录下。

➜ cd ~/Downloads
# 下载
➜ wget -c https://hub.gitmirror.com/https://github.com/etcd-io/etcd/releases/download/v3.5.7/etcd-v3.5.7-linux-amd64.tar.gz
# 解压
➜ tar -zxf etcd-v3.5.7-linux-amd64.tar.gz
# 复制到 master1 节点 /usr/local/bin 目录
➜ cp etcd-v3.5.7-linux-amd64/{etcd,etcdctl} /usr/local/bin
# 查看复制结果
➜ ls -lh /usr/local/bin/etcd*

-rwxr-xr-x 1 root root 22M Jul 13 23:49 /usr/local/bin/etcd
-rwxr-xr-x 1 root root 17M Jul 13 23:49 /usr/local/bin/etcdctl

编写服务配置文件:

➜ mkdir /etc/etcd

➜ cat > /etc/etcd/etcd.conf << EOF
ETCD_NAME="etcd1"
ETCD_DATA_DIR="/var/lib/etcd/default.etcd"
ETCD_LISTEN_PEER_URLS="https://10.128.170.21:2380"
ETCD_LISTEN_CLIENT_URLS="https://10.128.170.21:2379"
ETCD_INITIAL_ADVERTISE_PEER_URLS="https://10.128.170.21:2380"
ETCD_ADVERTISE_CLIENT_URLS="https://10.128.170.21:2379"
ETCD_INITIAL_CLUSTER="etcd1=https://10.128.170.21:2380"
ETCD_INITIAL_CLUSTER_TOKEN="etcd-cluster"
ETCD_INITIAL_CLUSTER_STATE="new"
EOF

配置文件解释:

  • ETCD_NAME:节点名称,集群中唯一
  • ETCD_DATA_DIR: 数据保存目录
  • ETCD_LISTEN_PEER_URLS:集群内部通信监听地址
  • ETCD_LISTEN_CLIENT_URLS:客户端访问监听地址
  • ETCD_INITIAL_ADVERTISE_PEER_URLS:集群通告地址
  • ETCD_ADVERTISE_CLIENT_URLS:客户端通告地址
  • ETCD_INITIAL_CLUSTER:集群节点地址列表
  • ETCD_INITIAL_CLUSTER_TOKEN:集群通信 token
  • ETCD_INITIAL_CLUSTER_STATE:加入集群的当前状态,new 是新集群,existing 表示加入已有集群
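
如果后续把 etcd 扩容为多节点集群,ETCD_INITIAL_CLUSTER 需要列出全部成员。下面是一个三节点时的示例写法(仅作参考,10.128.170.22/23 为前文证书中预留的规划地址,本文实际只部署单节点):

# ETCD_INITIAL_CLUSTER="etcd1=https://10.128.170.21:2380,etcd2=https://10.128.170.22:2380,etcd3=https://10.128.170.23:2380"
# 新成员加入已有集群时,还需将 ETCD_INITIAL_CLUSTER_STATE 改为 "existing"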

编写服务启动脚本:

# 创建数据目录
➜ mkdir -p /var/lib/etcd

# 创建系统服务
➜ cat > /lib/systemd/system/etcd.service << "EOF"
[Unit]
Description=etcd server
After=network.target
After=network-online.target
Wants=network-online.target

[Service]
Type=notify
EnvironmentFile=/etc/etcd/etcd.conf
WorkingDirectory=/var/lib/etcd
ExecStart=/usr/local/bin/etcd \
--cert-file=/etc/kubernetes/pki/etcd.pem \
--key-file=/etc/kubernetes/pki/etcd-key.pem \
--trusted-ca-file=/etc/kubernetes/pki/ca.pem \
--peer-cert-file=/etc/kubernetes/pki/etcd.pem \
--peer-key-file=/etc/kubernetes/pki/etcd-key.pem \
--peer-trusted-ca-file=/etc/kubernetes/pki/ca.pem \
--peer-client-cert-auth \
--client-cert-auth
Restart=on-failure
RestartSec=5
LimitNOFILE=65535

[Install]
WantedBy=multi-user.target
EOF

启动 etcd 服务:

➜ systemctl daemon-reload

➜ systemctl enable --now etcd

# 验证结果
➜ systemctl status etcd

# 查看日志
➜ journalctl -u etcd
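
由于 etcd 开启了客户端证书认证,可以用 etcdctl 携带证书做一次健康检查来确认服务可用(以下命令仅用于验证):

# 输出中包含 is healthy 即表示 etcd 正常
➜ ETCDCTL_API=3 etcdctl \
--endpoints=https://10.128.170.21:2379 \
--cacert=/etc/kubernetes/pki/ca.pem \
--cert=/etc/kubernetes/pki/etcd.pem \
--key=/etc/kubernetes/pki/etcd-key.pem \
endpoint health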

5. 部署 kube-apiserver

只需在 master 节点(也即 master1 节点)部署即可。

5.1 颁发证书

# 进入 /etc/kubernetes/ssl 目录
➜ cd /etc/kubernetes/ssl

# kube-apiserver 证书签署申请
# hosts 字段中,IP 为所有 kube-apiserver 节点地址,这里可以做好规划,预留几个 IP,以备后续扩容。
# 10.96.0.1 是 service 网段的第一个 IP
# kubernetes.default.svc.cluster.local 是 kube-apiserver 的 service 域名
➜ cat > kube-apiserver-csr.json << EOF
{
"CN": "kubernetes",
"hosts": [
"127.0.0.1",
"10.128.170.21",
"10.128.170.22",
"10.128.170.23",
"10.96.0.1",
"localhost",
"master1",
"master2",
"master3",
"master1.local",
"master2.local",
"master3.local",
"kubernetes",
"kubernetes.default",
"kubernetes.default.svc",
"kubernetes.default.svc.cluster",
"kubernetes.default.svc.cluster.local"
],
"key": {
"algo": "rsa",
"size": 2048
},
"names": [
{
"C": "CN",
"ST": "GuangDong",
"L": "ShenZhen",
"O": "k8s",
"OU": "system"
}
]
}
EOF

# 签署 kube-apiserver 证书
➜ cfssl gencert -ca=ca.pem -ca-key=ca-key.pem -config=ca-config.json -profile=kubernetes kube-apiserver-csr.json | cfssljson -bare kube-apiserver

# 验证结果,会生成两个证书文件
➜ ls -lh kube-apiserver*pem

-rw------- 1 root root 1.7K Jul 14 00:07 kube-apiserver-key.pem
-rw-r--r-- 1 root root 1.8K Jul 14 00:07 kube-apiserver.pem

# 复制 kube-apiserver 证书到 /etc/kubernetes/pki
➜ cp kube-apiserver*pem /etc/kubernetes/pki

5.2 部署 kube-apiserver

编写服务配置文件:

# 可以使用 kubeadm 生成示例配置文件,然后进行修改
#➜ kubeadm init phase control-plane apiserver --dry-run -v 4

#...
#[control-plane] Creating static Pod manifest for "kube-apiserver"
#...
#I0717 00:13:59.260175 6660 manifests.go:154] [control-plane] wrote static Pod manifest for component "kube-apiserver" to "/etc/kubernetes/tmp/kubeadm-init-dryrun365914376/kube-apiserver.yaml"
#...

➜ cat > /etc/kubernetes/kube-apiserver.conf << EOF
KUBE_APISERVER_OPTS="--enable-admission-plugins=NamespaceLifecycle,NodeRestriction,LimitRanger,ServiceAccount,DefaultStorageClass,ResourceQuota \
--anonymous-auth=false \
--bind-address=0.0.0.0 \
--secure-port=6443 \
--authorization-mode=Node,RBAC \
--runtime-config=api/all=true \
--enable-bootstrap-token-auth \
--service-cluster-ip-range=10.96.0.0/16 \
--token-auth-file=/etc/kubernetes/token.csv \
--service-node-port-range=30000-32767 \
--tls-cert-file=/etc/kubernetes/pki/kube-apiserver.pem \
--tls-private-key-file=/etc/kubernetes/pki/kube-apiserver-key.pem \
--client-ca-file=/etc/kubernetes/pki/ca.pem \
--kubelet-client-certificate=/etc/kubernetes/pki/kube-apiserver.pem \
--kubelet-client-key=/etc/kubernetes/pki/kube-apiserver-key.pem \
--service-account-key-file=/etc/kubernetes/pki/ca-key.pem \
--service-account-signing-key-file=/etc/kubernetes/pki/ca-key.pem \
--service-account-issuer=https://kubernetes.default.svc.cluster.local \
--etcd-cafile=/etc/kubernetes/pki/ca.pem \
--etcd-certfile=/etc/kubernetes/pki/etcd.pem \
--etcd-keyfile=/etc/kubernetes/pki/etcd-key.pem \
--etcd-servers=https://10.128.170.21:2379 \
--allow-privileged=true \
--apiserver-count=1 \
--audit-log-maxage=30 \
--audit-log-maxbackup=3 \
--audit-log-maxsize=100 \
--audit-log-path=/var/log/kube-apiserver-audit.log \
--event-ttl=1h \
--v=4"
EOF

如果 etcd 是一个集群,则 --etcd-servers 可以添加多个,例如:--etcd-servers=https://10.128.170.21:2379,https://10.128.170.22:2379,https://10.128.170.23:2379

生成 token 文件:

➜ cat > /etc/kubernetes/token.csv << EOF
$(head -c 16 /dev/urandom | od -An -t x | tr -d ' '),kubelet-bootstrap,10001,"system:node-bootstrapper"
EOF

在这个命令中,head -c 16 /dev/urandom | od -An -t x | tr -d ' ' 生成了一个 16 字节的随机字符串,并将其转换为十六进制格式。这个字符串将作为令牌的值。

  • kubelet-bootstrap 是令牌的用户名
  • 10001 是令牌的 UID
  • system:node-bootstrapper 是令牌的组名。

这些值将用于 kubelet 节点的身份验证和授权。
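
生成后可以查看 token.csv,其格式为“token,用户名,UID,用户组”。token 是 32 位十六进制随机串,下面的取值仅为示意:

➜ cat /etc/kubernetes/token.csv

0123456789abcdef0123456789abcdef,kubelet-bootstrap,10001,"system:node-bootstrapper"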

编写服务启动脚本:

➜ cat > /usr/lib/systemd/system/kube-apiserver.service << "EOF"
[Unit]
Description=Kubernetes API Server
Documentation=https://github.com/kubernetes/kubernetes
After=network.target network-online.target
Wants=network-online.target

[Service]
Type=notify
EnvironmentFile=/etc/kubernetes/kube-apiserver.conf
ExecStart=/usr/local/bin/kube-apiserver $KUBE_APISERVER_OPTS
Restart=on-failure
RestartSec=5
LimitNOFILE=65535

[Install]
WantedBy=multi-user.target
EOF

启动 kube-apiserver 服务:

➜ systemctl daemon-reload

➜ systemctl enable --now kube-apiserver

# 验证结果
➜ systemctl status kube-apiserver

# 查看日志
➜ journalctl -u kube-apiserver

6. 配置 kubectl

部署完 kube-apiserver 后,就可以配置 kubectl 了,通过 kubectl 可以验证 kube-apiserver 是否已经正常工作。

只需在 master 节点(也即 master1 节点)配置即可。

6.1 颁发证书

# 进入 /etc/kubernetes/ssl 目录
➜ cd /etc/kubernetes/ssl

# kubectl 证书签署申请
# O 参数的值必须为 system:masters,因为这是 kube-apiserver 一个内置好的角色,拥有集群管理的权限
➜ cat > kubectl-csr.json << EOF
{
"CN": "clusteradmin",
"key": {
"algo": "rsa",
"size": 2048
},
"names": [
{
"C": "CN",
"ST": "GuangDong",
"L": "ShenZhen",
"O": "system:masters",
"OU": "system"
}
]
}
EOF

# 签署 kubectl 证书
➜ cfssl gencert -ca=ca.pem -ca-key=ca-key.pem -config=ca-config.json -profile=kubernetes kubectl-csr.json | cfssljson -bare kubectl

# 验证结果,会生成两个证书文件
➜ ls -lh kubectl*pem

-rw------- 1 root root 1.7K Jul 14 00:45 kubectl-key.pem
-rw-r--r-- 1 root root 1.4K Jul 14 00:45 kubectl.pem

6.2 生成配置文件

➜ kubectl config set-cluster kubernetes --certificate-authority=ca.pem --embed-certs=true --server=https://10.128.170.21:6443 --kubeconfig=kube.config

➜ kubectl config set-credentials clusteradmin --client-certificate=kubectl.pem --client-key=kubectl-key.pem --embed-certs=true --kubeconfig=kube.config

➜ kubectl config set-context kubernetes --cluster=kubernetes --user=clusteradmin --kubeconfig=kube.config

➜ kubectl config use-context kubernetes --kubeconfig=kube.config

➜ mkdir ~/.kube

➜ cp kube.config ~/.kube/config

以上命令用于在本地创建一个 Kubernetes 配置文件 kube.config,并将其复制到 ~/.kube/config 文件中,以便使用 kubectl 命令与 Kubernetes 集群进行交互。

kubectl config set-cluster 命令设置了一个名为 kubernetes 的集群,指定了以下参数:

  • --certificate-authority=ca.pem:指定 CA 证书文件的路径。
  • --embed-certs=true:将 CA 证书嵌入到配置文件中。
  • --server=https://10.128.170.21:6443:指定 API Server 的地址和端口。
  • --kubeconfig=kube.config:指定要写入的配置文件路径。

这些参数将用于创建一个名为 kubernetes 的集群配置,并将其写入到 kube.config 文件中。

kubectl config set-credentials 命令设置了一个名为 clusteradmin 的用户,指定了以下参数:

  • --client-certificate=kubectl.pem:指定客户端证书文件的路径。
  • --client-key=kubectl-key.pem:指定客户端私钥文件的路径。
  • --embed-certs=true:将客户端证书和私钥嵌入到配置文件中。
  • --kubeconfig=kube.config:指定要写入的配置文件路径。

这些参数将用于创建一个名为 clusteradmin 的用户配置,并将其写入到 kube.config 文件中。

kubectl config set-context 命令设置了一个名为 kubernetes 的上下文,指定了以下参数:

  • --cluster=kubernetes:指定要使用的集群。
  • --user=clusteradmin:指定要使用的用户。
  • --kubeconfig=kube.config:指定要写入的配置文件路径。

这些参数将用于创建一个名为 kubernetes 的上下文配置,并将其写入到 kube.config 文件中。

kubectl config use-context 命令将当前上下文设置为 kubernetes,指定了以下参数:

  • --kubeconfig=kube.config:指定要使用的配置文件路径。

这个命令将当前上下文设置为 kubernetes,以便 kubectl 命令可以使用 kube.config 文件与 Kubernetes 集群进行交互。
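
配置完成后,可以通过以下命令确认上下文和集群信息已正确写入(kubectl config view 会把嵌入的证书显示为 DATA+OMITTED):

# 查看当前使用的上下文
➜ kubectl config current-context
# 查看合并后的 kubeconfig 内容
➜ kubectl config view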

6.3 获取集群信息

➜ kubectl cluster-info

Kubernetes control plane is running at https://10.128.170.21:6443

To further debug and diagnose cluster problems, use 'kubectl cluster-info dump'.

➜ kubectl get all -A

NAMESPACE NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
default service/kubernetes ClusterIP 10.96.0.1 <none> 443/TCP 22m

➜ kubectl get cs

Warning: v1 ComponentStatus is deprecated in v1.19+
NAME STATUS MESSAGE ERROR
controller-manager Unhealthy Get "https://127.0.0.1:10257/healthz": dial tcp 127.0.0.1:10257: connect: connection refused
scheduler Unhealthy Get "https://127.0.0.1:10259/healthz": dial tcp 127.0.0.1:10259: connect: connection refused
etcd-0 Healthy {"health":"true","reason":""}

6.4 设置 kubectl 自动补全

查看 kubectl 命令自动补全帮助:

➜ kubectl completion --help

安装 bash-completion:

➜ yum install -y bash-completion

设置 kubectl 自动补全配置:

➜ echo "source <(kubectl completion bash)" >> ~/.bashrc

使配置生效:

➜ source ~/.bashrc

7. 部署 kube-controller-manager

只需在 master 节点(也即 master1 节点)部署即可。

7.1 颁发证书

# 进入 /etc/kubernetes/ssl 目录
➜ cd /etc/kubernetes/ssl

# kube-controller-manager 证书签署申请
# hosts 字段中,IP 为所有节点地址,这里可以做好规划,预留几个 IP,以备后续扩容。
➜ cat > kube-controller-manager-csr.json << EOF
{
"CN": "system:kube-controller-manager",
"hosts": [
"127.0.0.1",
"10.128.170.21",
"10.128.170.22",
"10.128.170.23",
"10.128.170.131",
"10.128.170.132",
"10.128.170.133",
"10.128.170.134",
"10.128.170.135",
"localhost",
"master1",
"master2",
"master3",
"worker1",
"worker2",
"worker3",
"worker4",
"worker5",
"master1.local",
"master2.local",
"master3.local",
"worker1.local",
"worker2.local",
"worker3.local",
"worker4.local",
"worker5.local"
],
"key": {
"algo": "rsa",
"size": 2048
},
"names": [
{
"C": "CN",
"ST": "GuangDong",
"L": "ShenZhen",
"O": "system:kube-controller-manager",
"OU": "system"
}
]
}
EOF

# 签署 kube-controller-manager 证书
➜ cfssl gencert -ca=ca.pem -ca-key=ca-key.pem -config=ca-config.json -profile=kubernetes kube-controller-manager-csr.json | cfssljson -bare kube-controller-manager

# 验证结果,会生成两个证书文件
➜ ls -lh kube-controller-manager*pem

-rw------- 1 root root 1.7K Jul 14 00:55 kube-controller-manager-key.pem
-rw-r--r-- 1 root root 1.8K Jul 14 00:55 kube-controller-manager.pem

# 复制 kube-controler-manager 证书到 /etc/kubernetes/pki
➜ cp kube-controller-manager*pem /etc/kubernetes/pki

system:kube-controller-manager 是 Kubernetes 中的一个预定义 RBAC 角色,用于授权 kube-controller-manager 组件对 Kubernetes API 的访问。详细介绍请参考官方文档:https://kubernetes.io/zh-cn/docs/reference/access-authn-authz/rbac/#default-roles-and-role-bindings
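
可以用 kubectl 查看这个内置角色及其绑定,确认证书 CN 为 system:kube-controller-manager 的客户端能够获得相应权限(以下命令仅用于查看):

# 查看内置的 ClusterRole
➜ kubectl get clusterrole system:kube-controller-manager
# 查看对应的 ClusterRoleBinding 详情
➜ kubectl describe clusterrolebinding system:kube-controller-manager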

7.2 部署 kube-controller-manager

编写服务配置文件:

# 可以使用 kubeadm 生成示例配置文件,然后进行修改
#➜ kubeadm init phase control-plane controller-manager --dry-run -v 4

#...
#[control-plane] Creating static Pod manifest for "kube-controller-manager"
#...
#I0717 00:18:23.798277 6694 manifests.go:154] [control-plane] wrote static Pod manifest for component "kube-controller-manager" to "/etc/kubernetes/tmp/kubeadm-init-dryrun963226442/kube-controller-manager.yaml"
#...

➜ cat > /etc/kubernetes/kube-controller-manager.conf << EOF
KUBE_CONTROLLER_MANAGER_OPTS="--secure-port=10257 \
--kubeconfig=/etc/kubernetes/kube-controller-manager.kubeconfig \
--service-cluster-ip-range=10.96.0.0/16 \
--cluster-name=kubernetes \
--cluster-signing-cert-file=/etc/kubernetes/pki/ca.pem \
--cluster-signing-key-file=/etc/kubernetes/pki/ca-key.pem \
--cluster-signing-duration=87600h \
--tls-cert-file=/etc/kubernetes/pki/kube-controller-manager.pem \
--tls-private-key-file=/etc/kubernetes/pki/kube-controller-manager-key.pem \
--service-account-private-key-file=/etc/kubernetes/pki/ca-key.pem \
--root-ca-file=/etc/kubernetes/pki/ca.pem \
--leader-elect=true \
--controllers=*,bootstrapsigner,tokencleaner \
--use-service-account-credentials=true \
--horizontal-pod-autoscaler-sync-period=10s \
--allocate-node-cidrs=true \
--cluster-cidr=10.240.0.0/12 \
--v=4"
EOF

生成 kubeconfig:

➜ kubectl config set-cluster kubernetes --certificate-authority=ca.pem --embed-certs=true --server=https://10.128.170.21:6443 --kubeconfig=kube-controller-manager.kubeconfig

➜ kubectl config set-credentials kube-controller-manager --client-certificate=kube-controller-manager.pem --client-key=kube-controller-manager-key.pem --embed-certs=true --kubeconfig=kube-controller-manager.kubeconfig

➜ kubectl config set-context default --cluster=kubernetes --user=kube-controller-manager --kubeconfig=kube-controller-manager.kubeconfig

➜ kubectl config use-context default --kubeconfig=kube-controller-manager.kubeconfig

➜ cp kube-controller-manager.kubeconfig /etc/kubernetes/

编写服务启动脚本:

➜ cat > /usr/lib/systemd/system/kube-controller-manager.service << "EOF"
[Unit]
Description=Kubernetes controller manager
Documentation=https://github.com/kubernetes/kubernetes
After=network.target network-online.target
Wants=network-online.target

[Service]
EnvironmentFile=/etc/kubernetes/kube-controller-manager.conf
ExecStart=/usr/local/bin/kube-controller-manager $KUBE_CONTROLLER_MANAGER_OPTS
Restart=on-failure
RestartSec=5
LimitNOFILE=65535

[Install]
WantedBy=multi-user.target
EOF

启动 kube-controller-manager 服务:

➜ systemctl daemon-reload

➜ systemctl enable --now kube-controller-manager

# 验证结果
➜ systemctl status kube-controller-manager

# 查看日志
➜ journalctl -u kube-controller-manager

查看组件状态:

➜ kubectl get cs

Warning: v1 ComponentStatus is deprecated in v1.19+
NAME STATUS MESSAGE ERROR
scheduler Unhealthy Get "https://127.0.0.1:10259/healthz": dial tcp 127.0.0.1:10259: connect: connection refused
controller-manager Healthy ok
etcd-0 Healthy {"health":"true","reason":""}

8. 部署 kube-scheduler

只需在 master 节点(也即 master1 节点)部署即可。

8.1 颁发证书

# 进入 /etc/kubernetes/ssl 目录
➜ cd /etc/kubernetes/ssl

# kube-scheduler 证书签署申请
# hosts 字段中,IP 为所有节点地址,这里可以做好规划,预留几个 IP,以备后续扩容。
➜ cat > kube-scheduler-csr.json << EOF
{
"CN": "system:kube-scheduler",
"hosts": [
"127.0.0.1",
"10.128.170.21",
"10.128.170.22",
"10.128.170.23",
"10.128.170.131",
"10.128.170.132",
"10.128.170.133",
"10.128.170.134",
"10.128.170.135",
"localhost",
"master1",
"master2",
"master3",
"worker1",
"worker2",
"worker3",
"worker4",
"worker5",
"master1.local",
"master2.local",
"master3.local",
"worker1.local",
"worker2.local",
"worker3.local",
"worker4.local",
"worker5.local"
],
"key": {
"algo": "rsa",
"size": 2048
},
"names": [
{
"C": "CN",
"ST": "GuangDong",
"L": "ShenZhen",
"O": "system:kube-scheduler",
"OU": "system"
}
]
}
EOF

# 签署 kube-scheduler 证书
➜ cfssl gencert -ca=ca.pem -ca-key=ca-key.pem -config=ca-config.json -profile=kubernetes kube-scheduler-csr.json | cfssljson -bare kube-scheduler

# 验证结果,会生成两个证书文件
➜ ls -lh kube-scheduler*pem

-rw------- 1 root root 1.7K Jul 14 01:06 kube-scheduler-key.pem
-rw-r--r-- 1 root root 1.8K Jul 14 01:06 kube-scheduler.pem

# 复制 kube-scheduler 证书到 /etc/kubernetes/pki
➜ cp kube-scheduler*pem /etc/kubernetes/pki

8.2 部署 kube-scheduler

编写服务配置文件:

# 可以使用 kubeadm 生成示例配置文件,然后进行修改
#➜ kubeadm init phase control-plane scheduler --dry-run -v 4

#...
#[control-plane] Creating static Pod manifest for "kube-scheduler"
#...
#I0717 00:26:08.548412 6903 manifests.go:154] [control-plane] wrote static Pod manifest for component "kube-scheduler" to "/etc/kubernetes/tmp/kubeadm-init-dryrun1609159078/kube-scheduler.yaml"
#...

➜ cat > /etc/kubernetes/kube-scheduler.conf << EOF
KUBE_SCHEDULER_OPTS="--bind-address=127.0.0.1 \
--kubeconfig=/etc/kubernetes/kube-scheduler.kubeconfig \
--leader-elect=true \
--v=4"
EOF

生成 kubeconfig:

➜ kubectl config set-cluster kubernetes --certificate-authority=ca.pem --embed-certs=true --server=https://10.128.170.21:6443 --kubeconfig=kube-scheduler.kubeconfig

➜ kubectl config set-credentials kube-scheduler --client-certificate=kube-scheduler.pem --client-key=kube-scheduler-key.pem --embed-certs=true --kubeconfig=kube-scheduler.kubeconfig

➜ kubectl config set-context default --cluster=kubernetes --user=kube-scheduler --kubeconfig=kube-scheduler.kubeconfig

➜ kubectl config use-context default --kubeconfig=kube-scheduler.kubeconfig

➜ cp kube-scheduler.kubeconfig /etc/kubernetes/

编写服务启动脚本:

➜ cat > /usr/lib/systemd/system/kube-scheduler.service << "EOF"
[Unit]
Description=Kubernetes scheduler
Documentation=https://github.com/kubernetes/kubernetes
After=network.target network-online.target
Wants=network-online.target

[Service]
EnvironmentFile=/etc/kubernetes/kube-scheduler.conf
ExecStart=/usr/local/bin/kube-scheduler $KUBE_SCHEDULER_OPTS
Restart=on-failure
RestartSec=5
LimitNOFILE=65535

[Install]
WantedBy=multi-user.target
EOF

启动 kube-scheduler 服务:

➜ systemctl daemon-reload

➜ systemctl enable --now kube-scheduler

# 验证结果
➜ systemctl status kube-scheduler

# 查看日志
➜ journalctl -u kube-scheduler

查看组件状态:

➜ kubectl get cs

Warning: v1 ComponentStatus is deprecated in v1.19+
NAME STATUS MESSAGE ERROR
controller-manager Healthy ok
scheduler Healthy ok
etcd-0 Healthy {"health":"true","reason":""}

9. 部署 kubelet

先在 master 节点完成部署,后续添加 worker 节点时从 master 节点复制配置并调整即可。

master 节点上部署 kubelet 是可选的,一旦部署 kubelet,master 节点也可以运行 Pod,如果不希望 master 节点上运行 Pod,则可以给 master 节点打上污点。

master 节点部署 kubelet 是有好处的,一是可以通过诸如 kubectl get node 等命令查看节点信息,二是可以在上面部署监控系统,日志采集系统等。
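
如上文所述,如果不希望 master 节点运行业务 Pod,可以在节点注册成功(即本章后面批准 CSR、节点出现在 kubectl get node 之后)为其打上污点,下面是一种常见写法(污点键名可按需调整):

# 给 master1 打上 NoSchedule 污点,阻止普通 Pod 调度到该节点
➜ kubectl taint nodes master1 node-role.kubernetes.io/control-plane=:NoSchedule
# 如需恢复调度,删除该污点即可
➜ kubectl taint nodes master1 node-role.kubernetes.io/control-plane-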

9.1 授权 kubelet 允许请求证书

授权 kubelet-bootstrap 用户允许请求证书:

# 进入 /etc/kubernetes/ssl 目录
➜ cd /etc/kubernetes/ssl

➜ kubectl create clusterrolebinding kubelet-bootstrap \
--clusterrole=system:node-bootstrapper \
--user=kubelet-bootstrap

clusterrolebinding.rbac.authorization.k8s.io/kubelet-bootstrap created

9.2 部署 kubelet

编写服务配置文件:

# 可以使用 kubeadm 生成示例配置文件,然后进行修改
#➜ kubeadm init phase kubelet-start --dry-run

#[kubelet-start] Writing kubelet environment file with flags to file "/etc/kubernetes/tmp/kubeadm-init-dryrun3282605628/kubeadm-flags.env"
#[kubelet-start] Writing kubelet configuration to file "/etc/kubernetes/tmp/kubeadm-init-dryrun3282605628/config.yaml"

# https://github.com/kubernetes-sigs/sig-windows-tools/issues/323
# https://github.com/kubernetes/kubernetes/pull/118544
➜ cat > /etc/kubernetes/kubelet.conf << EOF
KUBELET_OPTS="--bootstrap-kubeconfig=/etc/kubernetes/kubelet-bootstrap.kubeconfig \
--config=/etc/kubernetes/kubelet.yaml \
--kubeconfig=/etc/kubernetes/kubelet.kubeconfig \
--cert-dir=/etc/kubernetes/pki \
--container-runtime-endpoint=unix:///var/run/containerd/containerd.sock \
--pod-infra-container-image=registry.k8s.io/pause:3.9 \
--v=4"
EOF

➜ cat > /etc/kubernetes/kubelet.yaml << EOF
kind: KubeletConfiguration
apiVersion: kubelet.config.k8s.io/v1beta1
address: 0.0.0.0
port: 10250
readOnlyPort: 0
authentication:
  anonymous:
    enabled: false
  webhook:
    cacheTTL: 2m0s
    enabled: true
  x509:
    clientCAFile: /etc/kubernetes/pki/ca.pem
authorization:
  mode: Webhook
  webhook:
    cacheAuthorizedTTL: 5m0s
    cacheUnauthorizedTTL: 30s
cgroupDriver: systemd
clusterDNS:
- 10.96.0.10
clusterDomain: cluster.local
healthzBindAddress: 127.0.0.1
healthzPort: 10248
rotateCertificates: true
evictionHard:
  imagefs.available: 15%
  memory.available: 100Mi
  nodefs.available: 10%
  nodefs.inodesFree: 5%
maxOpenFiles: 1000000
maxPods: 110
EOF

生成 kubeconfig:

➜ kubectl config set-cluster kubernetes --certificate-authority=ca.pem --embed-certs=true --server=https://10.128.170.21:6443 --kubeconfig=kubelet-bootstrap.kubeconfig

➜ kubectl config set-credentials kubelet-bootstrap --token=$(awk -F, '{print $1}' /etc/kubernetes/token.csv) --kubeconfig=kubelet-bootstrap.kubeconfig

➜ kubectl config set-context default --cluster=kubernetes --user=kubelet-bootstrap --kubeconfig=kubelet-bootstrap.kubeconfig

➜ kubectl config use-context default --kubeconfig=kubelet-bootstrap.kubeconfig

➜ cp kubelet-bootstrap.kubeconfig /etc/kubernetes/

编写服务启动脚本:

➜ cat > /usr/lib/systemd/system/kubelet.service << "EOF"
[Unit]
Description=Kubernetes kubelet
After=network.target network-online.target containerd.service
Requires=containerd.service

[Service]
EnvironmentFile=/etc/kubernetes/kubelet.conf
ExecStart=/usr/local/bin/kubelet $KUBELET_OPTS
Restart=on-failure
RestartSec=5
LimitNOFILE=65535

[Install]
WantedBy=multi-user.target
EOF

启动 kubelet 服务:

➜ systemctl daemon-reload

➜ systemctl enable --now kubelet

# 验证结果
➜ systemctl status kubelet

# 查看日志
➜ journalctl -u kubelet

批准节点加入集群:

➜ kubectl get csr

NAME AGE SIGNERNAME REQUESTOR REQUESTEDDURATION CONDITION
csr-h92vn 59s kubernetes.io/kube-apiserver-client-kubelet kubelet-bootstrap <none> Pending

➜ kubectl certificate approve csr-h92vn

certificatesigningrequest.certificates.k8s.io/csr-h92vn approved

➜ kubectl get csr

NAME AGE SIGNERNAME REQUESTOR REQUESTEDDURATION CONDITION
csr-h92vn 2m27s kubernetes.io/kube-apiserver-client-kubelet kubelet-bootstrap <none> Approved,Issued

查看节点:

➜ kubectl get node

NAME STATUS ROLES AGE VERSION
master1 NotReady <none> 71s v1.27.3

# 此时节点状态还是 NotReady,因为还没有安装网络插件,正确安装网络插件后,状态会变为 Ready.

10. 部署 kube-proxy

先在 master 节点完成部署,后续添加 worker 节点时从 master 节点复制配置并调整即可。

10.1 颁发证书

# 进入 /etc/kubernetes/ssl 目录
➜ cd /etc/kubernetes/ssl

# kube-proxy 证书签署申请
➜ cat > kube-proxy-csr.json << EOF
{
  "CN": "system:kube-proxy",
  "key": {
    "algo": "rsa",
    "size": 2048
  },
  "names": [
    {
      "C": "CN",
      "ST": "GuangDong",
      "L": "ShenZhen",
      "O": "k8s",
      "OU": "system"
    }
  ]
}
EOF

# 签署 kube-proxy 证书
➜ cfssl gencert -ca=ca.pem -ca-key=ca-key.pem -config=ca-config.json -profile=kubernetes kube-proxy-csr.json | cfssljson -bare kube-proxy

# 验证结果,会生成两个证书文件
➜ ls -lh kube-proxy*pem

-rw------- 1 root root 1.7K Jul 15 18:36 kube-proxy-key.pem
-rw-r--r-- 1 root root 1.4K Jul 15 18:36 kube-proxy.pem

# 复制 kube-proxy 证书到 /etc/kubernetes/pki
➜ cp kube-proxy*pem /etc/kubernetes/pki

10.2 部署 kube-proxy

编写服务配置文件:

➜ cat > /etc/kubernetes/kube-proxy.conf << EOF
KUBE_PROXY_OPTS="--config=/etc/kubernetes/kube-proxy.yaml \
--v=4"
EOF

➜ cat > /etc/kubernetes/kube-proxy.yaml << EOF
kind: KubeProxyConfiguration
apiVersion: kubeproxy.config.k8s.io/v1alpha1
clientConnection:
  kubeconfig: /etc/kubernetes/kube-proxy.kubeconfig
bindAddress: 0.0.0.0
clusterCIDR: 10.240.0.0/12
healthzBindAddress: 0.0.0.0:10256
metricsBindAddress: 0.0.0.0:10249
mode: ipvs
ipvs:
  scheduler: "rr"
EOF
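
kube-proxy 以 ipvs 模式运行时依赖 ip_vs 等内核模块,如果前面没有加载,可以先确认一下(下面只是一个简单的检查示意,若无输出需要先用 modprobe 加载相应模块):

➜ lsmod | grep -e ip_vs -e nf_conntrack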

生成 kubeconfig:

➜ kubectl config set-cluster kubernetes --certificate-authority=ca.pem --embed-certs=true --server=https://10.128.170.21:6443 --kubeconfig=kube-proxy.kubeconfig

➜ kubectl config set-credentials kube-proxy --client-certificate=kube-proxy.pem --client-key=kube-proxy-key.pem --embed-certs=true --kubeconfig=kube-proxy.kubeconfig

➜ kubectl config set-context default --cluster=kubernetes --user=kube-proxy --kubeconfig=kube-proxy.kubeconfig

➜ kubectl config use-context default --kubeconfig=kube-proxy.kubeconfig

➜ cp kube-proxy.kubeconfig /etc/kubernetes/

编写服务启动脚本:

➜ cat > /usr/lib/systemd/system/kube-proxy.service << "EOF"
[Unit]
Description=Kubernetes Proxy
Documentation=https://github.com/kubernetes/kubernetes
After=network.target network-online.target
Wants=network-online.target

[Service]
EnvironmentFile=-/etc/kubernetes/kube-proxy.conf
ExecStart=/usr/local/bin/kube-proxy $KUBE_PROXY_OPTS
Restart=on-failure
RestartSec=5
LimitNOFILE=65535

[Install]
WantedBy=multi-user.target
EOF

启动 kube-proxy 服务:

➜ systemctl daemon-reload

➜ systemctl enable --now kube-proxy

# 验证结果
➜ systemctl status kube-proxy

# 查看日志
➜ journalctl -u kube-proxy

11. 部署集群网络

只需在 master 节点(也即 master1 节点)部署即可。

11.1 部署 calico

参考文档 Install Calico networking and network policy for on-premises deployments

➜ cd ~/Downloads

➜ curl -k https://cdn.jsdelivr.net/gh/projectcalico/calico@v3.26.1/manifests/calico.yaml -O

# 找到 CALICO_IPV4POOL_CIDR 变量,取消注释并修改 Pod IP 地址段
➜ sed -i 's/# \(- name: CALICO_IPV4POOL_CIDR\)/\1/' calico.yaml
➜ sed -i 's/# value: "192.168.0.0\/16"/ value: "10.240.0.0\/12"/' calico.yaml

➜ kubectl apply -f calico.yaml

poddisruptionbudget.policy/calico-kube-controllers created
serviceaccount/calico-kube-controllers created
serviceaccount/calico-node created
serviceaccount/calico-cni-plugin created
configmap/calico-config created
customresourcedefinition.apiextensions.k8s.io/bgpconfigurations.crd.projectcalico.org created
customresourcedefinition.apiextensions.k8s.io/bgpfilters.crd.projectcalico.org created
customresourcedefinition.apiextensions.k8s.io/bgppeers.crd.projectcalico.org created
customresourcedefinition.apiextensions.k8s.io/blockaffinities.crd.projectcalico.org created
customresourcedefinition.apiextensions.k8s.io/caliconodestatuses.crd.projectcalico.org created
customresourcedefinition.apiextensions.k8s.io/clusterinformations.crd.projectcalico.org created
customresourcedefinition.apiextensions.k8s.io/felixconfigurations.crd.projectcalico.org created
customresourcedefinition.apiextensions.k8s.io/globalnetworkpolicies.crd.projectcalico.org created
customresourcedefinition.apiextensions.k8s.io/globalnetworksets.crd.projectcalico.org created
customresourcedefinition.apiextensions.k8s.io/hostendpoints.crd.projectcalico.org created
customresourcedefinition.apiextensions.k8s.io/ipamblocks.crd.projectcalico.org created
customresourcedefinition.apiextensions.k8s.io/ipamconfigs.crd.projectcalico.org created
customresourcedefinition.apiextensions.k8s.io/ipamhandles.crd.projectcalico.org created
customresourcedefinition.apiextensions.k8s.io/ippools.crd.projectcalico.org created
customresourcedefinition.apiextensions.k8s.io/ipreservations.crd.projectcalico.org created
customresourcedefinition.apiextensions.k8s.io/kubecontrollersconfigurations.crd.projectcalico.org created
customresourcedefinition.apiextensions.k8s.io/networkpolicies.crd.projectcalico.org created
customresourcedefinition.apiextensions.k8s.io/networksets.crd.projectcalico.org created
clusterrole.rbac.authorization.k8s.io/calico-kube-controllers created
clusterrole.rbac.authorization.k8s.io/calico-node created
clusterrole.rbac.authorization.k8s.io/calico-cni-plugin created
clusterrolebinding.rbac.authorization.k8s.io/calico-kube-controllers created
clusterrolebinding.rbac.authorization.k8s.io/calico-node created
clusterrolebinding.rbac.authorization.k8s.io/calico-cni-plugin created
daemonset.apps/calico-node created
deployment.apps/calico-kube-controllers created

# 查看网络 pod
➜ kubectl -n kube-system get pod

NAME READY STATUS RESTARTS AGE
calico-kube-controllers-85578c44bf-7v87p 1/1 Running 0 25m
calico-node-v924z 1/1 Running 0 25m

# 查看 node 状态
➜ kubectl get node

NAME STATUS ROLES AGE VERSION
master1 Ready <none> 30m v1.27.3

# 查看 ipvs 模式
➜ ipvsadm -Ln

IP Virtual Server version 1.2.1 (size=4096)
Prot LocalAddress:Port Scheduler Flags
-> RemoteAddress:Port Forward Weight ActiveConn InActConn
TCP 10.96.0.1:443 rr
-> 10.128.170.21:6443 Masq 1 5 0

如果 node 状态仍然是 NotReady,基本上是镜像未拉取完成或拉取失败导致的,如果一段时间后仍拉取失败,则尝试手动拉取镜像。
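
例如,假设容器运行时为 containerd,可以在对应节点上手动预拉取 calico 镜像(镜像名称与版本以 calico.yaml 中实际引用为准,下面仅为示意):

➜ ctr -n k8s.io images pull docker.io/calico/cni:v3.26.1
➜ ctr -n k8s.io images pull docker.io/calico/node:v3.26.1
➜ ctr -n k8s.io images pull docker.io/calico/kube-controllers:v3.26.1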

11.2 授权 kube-apiserver 访问 kubelet

Using RBAC Authorization

应用场景:例如 kubectl exec/run/logs

➜ cd ~/Downloads

➜ cat > apiserver-to-kubelet-rbac.yaml << EOF
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  annotations:
    rbac.authorization.kubernetes.io/autoupdate: "true"
  labels:
    kubernetes.io/bootstrapping: rbac-defaults
  name: system:kube-apiserver-to-kubelet
rules:
- apiGroups:
  - ""
  resources:
  - nodes/proxy
  - nodes/stats
  - nodes/log
  - nodes/spec
  - nodes/metrics
  - pods/log
  verbs:
  - "*"
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: system:kube-apiserver
  namespace: ""
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: system:kube-apiserver-to-kubelet
subjects:
- apiGroup: rbac.authorization.k8s.io
  kind: User
  name: kubernetes
EOF

➜ kubectl apply -f apiserver-to-kubelet-rbac.yaml

clusterrole.rbac.authorization.k8s.io/system:kube-apiserver-to-kubelet created
clusterrolebinding.rbac.authorization.k8s.io/system:kube-apiserver created

➜ kubectl -n kube-system logs calico-kube-controllers-85578c44bf-7v87p

11.3 部署 coredns

https://github.com/coredns/deployment/blob/master/kubernetes/coredns.yaml.sed

coredns.yaml.sed 原始文件见附录章节 "16.1 coredns.yaml.sed",该 yaml 指定使用的 coredns 的版本是 1.9.4。

➜ cd ~/Downloads

# 下载 yaml 文件
➜ curl https://raw.kgithub.com/coredns/deployment/master/kubernetes/coredns.yaml.sed -o coredns.yaml

修改配置:

# "coredns/coredns:1.9.4" 替换为 "coredns/coredns:1.10.1"
➜ sed -i 's/coredns\/coredns:1.9.4/coredns\/coredns:1.10.1/g' coredns.yaml
# "CLUSTER_DOMAIN" 替换为 "cluster.local"
➜ sed -i 's/CLUSTER_DOMAIN/cluster.local/g' coredns.yaml
# "REVERSE_CIDRS" 替换为 "in-addr.arpa ip6.arpa"
➜ sed -i 's/REVERSE_CIDRS/in-addr.arpa ip6.arpa/g' coredns.yaml
# "UPSTREAMNAMESERVER" 替换为 "/etc/resolv.conf"(或当前网络所使用的 DNS 地址)
➜ sed -i 's/UPSTREAMNAMESERVER/\/etc\/resolv.conf/g' coredns.yaml
# "STUBDOMAINS" 替换为 ""
➜ sed -i 's/STUBDOMAINS//g' coredns.yaml
# "CLUSTER_DNS_IP" 替换为 "10.96.0.10"(与 kubelet.yaml 配置的 clusterDNS 一致)
➜ sed -i 's/CLUSTER_DNS_IP/10.96.0.10/g' coredns.yaml
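
替换完成后,可以简单确认 yaml 中已经不存在未替换的占位符:

# 若没有输出,说明占位符已全部替换
➜ grep -nE 'CLUSTER_DOMAIN|REVERSE_CIDRS|UPSTREAMNAMESERVER|STUBDOMAINS|CLUSTER_DNS_IP' coredns.yaml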

安装:

➜ kubectl apply -f coredns.yaml

serviceaccount/coredns created
clusterrole.rbac.authorization.k8s.io/system:coredns created
clusterrolebinding.rbac.authorization.k8s.io/system:coredns created
configmap/coredns created
deployment.apps/coredns created
service/kube-dns created

验证(如果 calico 的 pod 未就绪,请检查是否是镜像拉取未完成或镜像拉取失败)

➜ kubectl -n kube-system get deploy,pod,svc

NAME READY UP-TO-DATE AVAILABLE AGE
deployment.apps/calico-kube-controllers 1/1 1 1 23h
deployment.apps/coredns 1/1 1 1 43s

NAME READY STATUS RESTARTS AGE
pod/calico-kube-controllers-85578c44bf-7v87p 1/1 Running 0 23h
pod/calico-node-v924z 1/1 Running 0 23h
pod/coredns-db5667c87-zg9s8 1/1 Running 0 41s

NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
service/kube-dns ClusterIP 10.96.0.10 <none> 53/UDP,53/TCP,9153/TCP 43s

dig 测试:

➜ yum -y install bind-utils

➜ dig -t A www.baidu.com @10.96.0.10 +short

www.a.shifen.com.
14.119.104.254
14.119.104.189

pod 测试:

➜ kubectl run busybox -it --rm --image=busybox:1.28.3 -- sh

If you don't see a command prompt, try pressing enter.
/ # cat /etc/resolv.conf
search default.svc.cluster.local svc.cluster.local cluster.local
nameserver 10.96.0.10
options ndots:5
/ # nslookup kubernetes.default
Server: 10.96.0.10
Address 1: 10.96.0.10 kube-dns.kube-system.svc.cluster.local

Name: kubernetes.default
Address 1: 10.96.0.1 kubernetes.default.svc.cluster.local
/ # ping -c 4 www.baidu.com
PING www.baidu.com (14.119.104.254): 56 data bytes
64 bytes from 14.119.104.254: seq=0 ttl=127 time=14.193 ms
64 bytes from 14.119.104.254: seq=1 ttl=127 time=12.848 ms
64 bytes from 14.119.104.254: seq=2 ttl=127 time=18.553 ms
64 bytes from 14.119.104.254: seq=3 ttl=127 time=23.581 ms

--- www.baidu.com ping statistics ---
4 packets transmitted, 4 packets received, 0% packet loss
round-trip min/avg/max = 12.848/17.293/23.581 ms

12. 添加 worker 节点

worker 节点需要部署 kubelet 和 kube-proxy 两个组件。

在 master 节点执行,从 master 节点上复制以下文件到 worker 节点:

➜ scp /etc/kubernetes/pki/ca.pem \
/etc/kubernetes/pki/kube-proxy.pem \
/etc/kubernetes/pki/kube-proxy-key.pem \
root@worker1:/etc/kubernetes/pki/

➜ scp /etc/kubernetes/kubelet.conf \
/etc/kubernetes/kubelet.yaml \
/etc/kubernetes/kubelet-bootstrap.kubeconfig \
/etc/kubernetes/kube-proxy.conf \
/etc/kubernetes/kube-proxy.yaml \
/etc/kubernetes/kube-proxy.kubeconfig \
root@worker1:/etc/kubernetes/

➜ scp /usr/lib/systemd/system/kubelet.service \
/usr/lib/systemd/system/kube-proxy.service \
root@worker1:/usr/lib/systemd/system/

➜ scp /usr/local/bin/kubelet \
/usr/local/bin/kube-proxy \
root@worker1:/usr/local/bin/
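
上面以 worker1 为例。如果有多个 worker 节点,也可以用一个简单的循环一次性分发(仅为示意,假设各 worker 节点上的目标目录已存在,且已配置免密登录):

➜ for node in worker1 worker2 worker3; do
    scp /etc/kubernetes/pki/{ca.pem,kube-proxy.pem,kube-proxy-key.pem} root@${node}:/etc/kubernetes/pki/
    scp /etc/kubernetes/{kubelet.conf,kubelet.yaml,kubelet-bootstrap.kubeconfig,kube-proxy.conf,kube-proxy.yaml,kube-proxy.kubeconfig} root@${node}:/etc/kubernetes/
    scp /usr/lib/systemd/system/{kubelet.service,kube-proxy.service} root@${node}:/usr/lib/systemd/system/
    scp /usr/local/bin/{kubelet,kube-proxy} root@${node}:/usr/local/bin/
  done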

在 worker 节点执行,worker 节点启动 kube-proxy 服务:

➜ systemctl daemon-reload

➜ systemctl enable --now kube-proxy

# 验证结果
➜ systemctl status kube-proxy

# 查看日志
➜ journalctl -u kube-proxy

在 worker 节点执行,worker 节点启动 kubelet 服务:

➜ systemctl daemon-reload

➜ systemctl enable --now kubelet

# 验证结果
➜ systemctl status kubelet

# 查看日志
➜ journalctl -u kubelet

(master 节点执行)批准 worker 节点加入集群

➜ kubectl get csr
NAME AGE SIGNERNAME REQUESTOR REQUESTEDDURATION CONDITION
csr-9twkl 85s kubernetes.io/kube-apiserver-client-kubelet kubelet-bootstrap <none> Pending

➜ kubectl certificate approve csr-9twkl

certificatesigningrequest.certificates.k8s.io/csr-9twkl approved

➜ kubectl get csr

NAME AGE SIGNERNAME REQUESTOR REQUESTEDDURATION CONDITION
csr-9twkl 2m15s kubernetes.io/kube-apiserver-client-kubelet kubelet-bootstrap <none> Approved,Issued
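
如果同时有多个节点的 CSR 处于 Pending 状态,也可以批量批准(仅为示意,批准前请先确认这些 CSR 均来自预期加入的节点):

➜ kubectl get csr | grep Pending | awk '{print $1}' | xargs kubectl certificate approve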

在 master 节点执行,查看节点:

➜ kubectl get node

NAME STATUS ROLES AGE VERSION
master1 Ready <none> 47h v1.27.3
worker1 Ready <none> 26m v1.27.3

如果 worker1 的状态仍是 NotReady,请检查是否是镜像拉取未完成或镜像拉取失败。

13. 禁止 master 节点运行 pod

至此 1 master 1 worker 的 k8s 二进制集群已搭建完毕。

此外,还可以给节点打上角色标签,使得查看节点信息更加直观:

# 给 master 节点打上 master,etcd 角色标签
➜ kubectl label node master1 node-role.kubernetes.io/master=true node-role.kubernetes.io/etcd=true

# 给 worker 节点打上 worker 角色标签
➜ kubectl label node worker1 node-role.kubernetes.io/worker=true

# 查看标签
➜ kubectl get node --show-labels

# 删除标签
#➜ kubectl label node master1 node-role.kubernetes.io/etcd-

如果不希望 master 节点运行 Pod,则给 master 打上污点:

# 添加污点
➜ kubectl taint node master1 node-role.kubernetes.io/master=true:NoSchedule

# 查看污点
➜ kubectl describe node master1 | grep Taints

Taints: node-role.kubernetes.io/master=true:NoSchedule

# 查看全部节点的污点
#➜ kubectl get nodes -o jsonpath='{range .items[*]}{.metadata.name}{"\n"}{.spec.taints}{"\n\n"}{end}'

# 删除污点
#➜ kubectl taint node master1 node-role.kubernetes.io/master-

后续可以新增 2 个 etcd 节点组成 etcd 集群,新增 2 个控制平面,避免单点故障。

14. 测试应用服务部署

创建 namespace:

➜ kubectl create namespace dev

namespace/dev created

➜ kubectl get namespace

NAME STATUS AGE
default Active 2d
dev Active 7m
kube-node-lease Active 2d
kube-public Active 2d
kube-system Active 2d

创建 deployment:

➜ mkdir -p /etc/kubernetes/demo

➜ cat > /etc/kubernetes/demo/nginx-deployment.yaml << EOF
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-deployment
  namespace: dev
spec:
  replicas: 1
  selector:
    matchLabels:
      app: nginx-pod
  template:
    metadata:
      labels:
        app: nginx-pod
    spec:
      containers:
      - name: nginx
        image: nginx:latest
EOF

➜ kubectl apply -f /etc/kubernetes/demo/nginx-deployment.yaml

deployment.apps/nginx-deployment created

➜ kubectl -n dev get pod

NAME READY STATUS RESTARTS AGE
nginx-deployment-6d76bcb866-zl52j 1/1 Running 0 23s

创建 service:

➜ cat > /etc/kubernetes/demo/nginx-service.yaml << EOF
apiVersion: v1
kind: Service
metadata:
  name: nginx-service
  namespace: dev
spec:
  selector:
    app: nginx-pod
  type: NodePort
  ports:
  - port: 80
    targetPort: 80
    nodePort: 30001
EOF

➜ kubectl apply -f /etc/kubernetes/demo/nginx-service.yaml

service/nginx-service created

➜ kubectl -n dev get svc

NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
nginx-service NodePort 10.96.195.221 <none> 80:30001/TCP 13s

测试服务访问:

➜ curl 10.128.170.21:30001 -I

HTTP/1.1 200 OK
Server: nginx/1.25.1
Date: Tue, 18 Jul 2023 16:56:37 GMT
Content-Type: text/html
Content-Length: 615
Last-Modified: Tue, 13 Jun 2023 15:08:10 GMT
Connection: keep-alive
ETag: "6488865a-267"
Accept-Ranges: bytes
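
测试完成后,如不再需要这些演示资源,可以清理掉(仅为示意):

➜ kubectl delete -f /etc/kubernetes/demo/nginx-service.yaml
➜ kubectl delete -f /etc/kubernetes/demo/nginx-deployment.yaml
➜ kubectl delete namespace dev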

15. 部署 Dashboard

在 Kubernetes 社区中,有一个很受欢迎的 Dashboard 项目,它可以给用户提供一个可视化的 Web 界面来查看当前集群的各种信息。用户可以用 Kubernetes Dashboard 部署容器化的应用、监控应用的状态、执行故障排查任务以及管理 Kubernetes 各种资源。

官方参考文档:https://github.com/kubernetes/dashboard

使用 nodeport 方式将 dashboard 服务暴露在集群外,指定使用 30443 端口。

➜ cd ~/Downloads

# 下载相关 yaml 文件
# https://github.com/kubernetes/dashboard/blob/v2.7.0/aio/deploy/recommended.yaml
➜ curl https://fastly.jsdelivr.net/gh/kubernetes/dashboard@v2.7.0/aio/deploy/recommended.yaml -o kubernetes-dashboard.yaml

# 修改 Service 部分
➜ vim kubernetes-dashboard.yaml
kind: Service
apiVersion: v1
metadata:
  labels:
    k8s-app: kubernetes-dashboard
  name: kubernetes-dashboard
  namespace: kubernetes-dashboard
spec:
  type: NodePort        # 新增
  ports:
    - port: 443
      targetPort: 8443
      nodePort: 30443   # 新增
  selector:
    k8s-app: kubernetes-dashboard

# 部署
➜ kubectl apply -f kubernetes-dashboard.yaml

namespace/kubernetes-dashboard created
serviceaccount/kubernetes-dashboard created
service/kubernetes-dashboard created
secret/kubernetes-dashboard-certs created
secret/kubernetes-dashboard-csrf created
secret/kubernetes-dashboard-key-holder created
configmap/kubernetes-dashboard-settings created
role.rbac.authorization.k8s.io/kubernetes-dashboard created
clusterrole.rbac.authorization.k8s.io/kubernetes-dashboard created
rolebinding.rbac.authorization.k8s.io/kubernetes-dashboard created
clusterrolebinding.rbac.authorization.k8s.io/kubernetes-dashboard created
deployment.apps/kubernetes-dashboard created
service/dashboard-metrics-scraper created
deployment.apps/dashboard-metrics-scraper created

# 查看 kubernetes-dashboard 下的资源
➜ kubectl -n kubernetes-dashboard get deploy

NAME READY UP-TO-DATE AVAILABLE AGE
dashboard-metrics-scraper 1/1 1 1 5m26s
kubernetes-dashboard 1/1 1 1 5m28s

➜ kubectl -n kubernetes-dashboard get pod

NAME READY STATUS RESTARTS AGE
dashboard-metrics-scraper-5cb4f4bb9c-8qpbc 1/1 Running 0 6m22s
kubernetes-dashboard-6967859bff-fvhkv 1/1 Running 0 6m22s

➜ kubectl get svc -n kubernetes-dashboard

NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
dashboard-metrics-scraper ClusterIP 10.96.6.178 <none> 8000/TCP 6m37s
kubernetes-dashboard NodePort 10.96.8.70 <none> 443:30443/TCP 6m41s

如果 kubernetes-dashboard 下的资源一直未就绪,请检查是否是正在拉取镜像或者镜像一直拉取失败。

例如:

➜ kubectl -n kubernetes-dashboard describe pod kubernetes-dashboard-546cbc58cd-hzvhr

...
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 6m20s default-scheduler Successfully assigned kubernetes-dashboard/kubernetes-dashboard-546cbc58cd-hzvhr to worker1
Normal Pulling 6m20s kubelet Pulling image "kubernetesui/dashboard:v2.5.0"

➜ kubectl -n kubernetes-dashboard describe pod kubernetes-dashboard-546cbc58cd-hzvhr

Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 10m default-scheduler Successfully assigned kubernetes-dashboard/kubernetes-dashboard-546cbc58cd-hzvhr to worker1
Warning Failed 2m1s kubelet Failed to pull image "kubernetesui/dashboard:v2.5.0": rpc error: code = Unknown desc = dial tcp 104.18.124.25:443: i/o timeout
Warning Failed 2m1s kubelet Error: ErrImagePull
Normal SandboxChanged 2m kubelet Pod sandbox changed, it will be killed and re-created.
Normal BackOff 118s (x3 over 2m) kubelet Back-off pulling image "kubernetesui/dashboard:v2.5.0"
Warning Failed 118s (x3 over 2m) kubelet Error: ImagePullBackOff
Normal Pulling 106s (x2 over 10m) kubelet Pulling image "kubernetesui/dashboard:v2.5.0"
Normal Pulled 25s kubelet Successfully pulled image "kubernetesui/dashboard:v2.5.0" in 1m21.608630166s
Normal Created 22s kubelet Created container kubernetes-dashboard
Normal Started 21s kubelet Started container kubernetes-dashboard

创建 service account 并绑定默认 cluster-admin 管理员集群角色:

# 下面创建了一个叫 admin-user 的服务账号,放在 kubernetes-dashboard 命名空间下,并将 cluster-admin 角色绑定到 admin-user 账户,这样 admin-user 账户就有了管理员的权限。
# 默认情况下,集群完成 RBAC 引导后就已经存在 cluster-admin 这个 ClusterRole,我们直接绑定即可。
➜ cat > dashboard-admin-user.yaml << EOF
apiVersion: v1
kind: ServiceAccount
metadata:
  name: admin-user
  namespace: kubernetes-dashboard

---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: admin-user
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: cluster-admin
subjects:
- kind: ServiceAccount
  name: admin-user
  namespace: kubernetes-dashboard

---
# https://github.com/kubernetes/kubernetes/issues/110113#issuecomment-1130412032
apiVersion: v1
kind: Secret
type: kubernetes.io/service-account-token
metadata:
  name: admin-user
  namespace: kubernetes-dashboard
  annotations:
    kubernetes.io/service-account.name: admin-user
EOF

# 应用资源配置清单
➜ kubectl apply -f dashboard-admin-user.yaml

serviceaccount/admin-user created
clusterrolebinding.rbac.authorization.k8s.io/admin-user created
secret/admin-user created

查看 admin-user 账户的 token:

➜ kubectl -n kubernetes-dashboard describe secret $(kubectl -n kubernetes-dashboard get secret | grep admin-user | awk '{print $1}')

Name: admin-user
Namespace: kubernetes-dashboard
Labels: <none>
Annotations: kubernetes.io/service-account.name: admin-user
kubernetes.io/service-account.uid: 630430bb-4ea5-4026-81ef-9d4c39089bca

Type: kubernetes.io/service-account-token

Data
====
ca.crt: 1318 bytes
namespace: 20 bytes
token: eyJhbGciOiJSUzI1NiIsImtpZCI6IlRpS0VJZ0pkRW5va3Bsb2lKOUxVRXVtM3l6RFNtaFNzUkFFLW1zcXBHS2sifQ.eyJpc3MiOiJrdWJlcm5ldGVzL3NlcnZpY2VhY2NvdW50Iiwia3ViZXJuZXRlcy5pby9zZXJ2aWNlYWNjb3VudC9uYW1lc3BhY2UiOiJrdWJlcm5ldGVzLWRhc2hib2FyZCIsImt1YmVybmV0ZXMuaW8vc2VydmljZWFjY291bnQvc2VjcmV0Lm5hbWUiOiJhZG1pbi11c2VyIiwia3ViZXJuZXRlcy5pby9zZXJ2aWNlYWNjb3VudC9zZXJ2aWNlLWFjY291bnQubmFtZSI6ImFkbWluLXVzZXIiLCJrdWJlcm5ldGVzLmlvL3NlcnZpY2VhY2NvdW50L3NlcnZpY2UtYWNjb3VudC51aWQiOiI2MzA0MzBiYi00ZWE1LTQwMjYtODFlZi05ZDRjMzkwODliY2EiLCJzdWIiOiJzeXN0ZW06c2VydmljZWFjY291bnQ6a3ViZXJuZXRlcy1kYXNoYm9hcmQ6YWRtaW4tdXNlciJ9.C-HqHea6OsWhQ9-yjPo0DGLhrgvtQ1cdaXOeGBOZqDKbPU4s-8VO31Ihw9Fbxo6vQnLJUyzFvRVB45eKr_95sJUht1lnD4pZOJHqnvSAa9SzkHbt4FcylHHG723wplLJc3fvnyKr1u3g74hHRUfLAE3q_VghMVwHi6hRyOalYN3KiFzQXKLVyovCxxAGwaEwJg9ftiawYMkDSzxLKkI17BBwrU_zt_xAKrLn229f9eEKsTeBMju0QMyhoWKCSVbV0chfw-sbJSUMAj7a8Ff5-uY1tru-QqUGI6RzlSKlI4E5hpsUVEFuU0HIHzrwxElTmNJnLZtcotFTLrsdHIXj2w
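
除了读取上面手动创建的 Secret,在 Kubernetes 1.24 及以上版本也可以直接为 ServiceAccount 签发一个有时效的 token(有效期参数可按需调整,下面仅为示意):

# 生成一个 24 小时有效的 token
➜ kubectl -n kubernetes-dashboard create token admin-user --duration=24h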

使用输出的 token 登录 Dashboard:

https://10.128.170.21:30443

16. 附录

16.1 coredns.yaml.sed

apiVersion: v1
kind: ServiceAccount
metadata:
  name: coredns
  namespace: kube-system
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  labels:
    kubernetes.io/bootstrapping: rbac-defaults
  name: system:coredns
rules:
- apiGroups:
  - ""
  resources:
  - endpoints
  - services
  - pods
  - namespaces
  verbs:
  - list
  - watch
- apiGroups:
  - discovery.k8s.io
  resources:
  - endpointslices
  verbs:
  - list
  - watch
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  annotations:
    rbac.authorization.kubernetes.io/autoupdate: "true"
  labels:
    kubernetes.io/bootstrapping: rbac-defaults
  name: system:coredns
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: system:coredns
subjects:
- kind: ServiceAccount
  name: coredns
  namespace: kube-system
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: coredns
  namespace: kube-system
data:
  Corefile: |
    .:53 {
        errors
        health {
            lameduck 5s
        }
        ready
        kubernetes CLUSTER_DOMAIN REVERSE_CIDRS {
            fallthrough in-addr.arpa ip6.arpa
        }
        prometheus :9153
        forward . UPSTREAMNAMESERVER {
            max_concurrent 1000
        }
        cache 30
        loop
        reload
        loadbalance
    }STUBDOMAINS
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: coredns
  namespace: kube-system
  labels:
    k8s-app: kube-dns
    kubernetes.io/name: "CoreDNS"
    app.kubernetes.io/name: coredns
spec:
  # replicas: not specified here:
  # 1. Default is 1.
  # 2. Will be tuned in real time if DNS horizontal auto-scaling is turned on.
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 1
  selector:
    matchLabels:
      k8s-app: kube-dns
      app.kubernetes.io/name: coredns
  template:
    metadata:
      labels:
        k8s-app: kube-dns
        app.kubernetes.io/name: coredns
    spec:
      priorityClassName: system-cluster-critical
      serviceAccountName: coredns
      tolerations:
      - key: "CriticalAddonsOnly"
        operator: "Exists"
      nodeSelector:
        kubernetes.io/os: linux
      affinity:
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
          - labelSelector:
              matchExpressions:
              - key: k8s-app
                operator: In
                values: ["kube-dns"]
            topologyKey: kubernetes.io/hostname
      containers:
      - name: coredns
        image: coredns/coredns:1.9.4
        imagePullPolicy: IfNotPresent
        resources:
          limits:
            memory: 170Mi
          requests:
            cpu: 100m
            memory: 70Mi
        args: [ "-conf", "/etc/coredns/Corefile" ]
        volumeMounts:
        - name: config-volume
          mountPath: /etc/coredns
          readOnly: true
        ports:
        - containerPort: 53
          name: dns
          protocol: UDP
        - containerPort: 53
          name: dns-tcp
          protocol: TCP
        - containerPort: 9153
          name: metrics
          protocol: TCP
        securityContext:
          allowPrivilegeEscalation: false
          capabilities:
            add:
            - NET_BIND_SERVICE
            drop:
            - all
          readOnlyRootFilesystem: true
        livenessProbe:
          httpGet:
            path: /health
            port: 8080
            scheme: HTTP
          initialDelaySeconds: 60
          timeoutSeconds: 5
          successThreshold: 1
          failureThreshold: 5
        readinessProbe:
          httpGet:
            path: /ready
            port: 8181
            scheme: HTTP
      dnsPolicy: Default
      volumes:
      - name: config-volume
        configMap:
          name: coredns
          items:
          - key: Corefile
            path: Corefile
---
apiVersion: v1
kind: Service
metadata:
  name: kube-dns
  namespace: kube-system
  annotations:
    prometheus.io/port: "9153"
    prometheus.io/scrape: "true"
  labels:
    k8s-app: kube-dns
    kubernetes.io/cluster-service: "true"
    kubernetes.io/name: "CoreDNS"
    app.kubernetes.io/name: coredns
spec:
  selector:
    k8s-app: kube-dns
    app.kubernetes.io/name: coredns
  clusterIP: CLUSTER_DNS_IP
  ports:
  - name: dns
    port: 53
    protocol: UDP
  - name: dns-tcp
    port: 53
    protocol: TCP
  - name: metrics
    port: 9153
    protocol: TCP

16.2 helm 安装 coredns

https://github.com/coredns/helm

安装 helm:

➜ cd ~/Downloads
➜ curl -O https://mirrors.huaweicloud.com/helm/v3.12.2/helm-v3.12.2-linux-amd64.tar.gz
➜ tar -zxvf helm-v3.12.2-linux-amd64.tar.gz
➜ cp linux-amd64/helm /usr/local/bin

添加 chart 仓库:

➜ helm repo add coredns https://coredns.github.io/helm
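
添加仓库后,可以先更新一下本地的 chart 索引:

➜ helm repo update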

查看可安装的版本:

➜ helm search repo -l

NAME CHART VERSION APP VERSION DESCRIPTION
coredns/coredns 1.24.1 1.10.1 CoreDNS is a DNS server that chains plugins and...
coredns/coredns 1.24.0 1.10.1 CoreDNS is a DNS server that chains plugins and...
coredns/coredns 1.23.0 1.10.1 CoreDNS is a DNS server that chains plugins and...
coredns/coredns 1.22.0 1.10.1 CoreDNS is a DNS server that chains plugins and...
coredns/coredns 1.21.0 1.10.1 CoreDNS is a DNS server that chains plugins and...
coredns/coredns 1.20.2 1.9.4 CoreDNS is a DNS server that chains plugins and...
coredns/coredns 1.20.1 1.9.4 CoreDNS is a DNS server that chains plugins and...
coredns/coredns 1.20.0 1.9.4 CoreDNS is a DNS server that chains plugins and...
...

安装 coredns:

➜ helm --namespace=kube-system install coredns coredns/coredns
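
如果需要让 helm 安装的 coredns 使用与前文一致的 clusterIP(10.96.0.10)或调整副本数,可以通过 --set 传入参数(参数名请以该 chart 的 values.yaml 为准,以下仅为示意):

➜ helm --namespace=kube-system install coredns coredns/coredns \
  --set service.clusterIP=10.96.0.10 \
  --set replicaCount=2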

References

K8s 高可用集群架构(二进制)部署及应用

二进制部署 k8s 集群 1.23.1 版本

部署一套完整的企业级k8s集群