Starting the k8s Master Node (k8s Practice - 1)

Concepts

The official k8s site explains the concepts in detail; here I'll focus on just a few of them.

Master nodes

The master nodes are the collection of system services that make up the cluster's control plane. In production, 3 or 5 master nodes are usually recommended to keep management highly available (HA). The master node services are:

  • API Server: provides the communication channel for all components
  • Cluster store: e.g. etcd
  • Controller manager: runs all of the background control loops, monitoring the cluster's nodes and responding to events, with the goal of keeping the cluster's current state equal to its desired state
  • Scheduler: watches the API Server for new work tasks and assigns them to suitable nodes

There is also the cloud controller manager, a component that manages clusters running on cloud platforms (such as AWS or Azure).
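
On a kubeadm-built cluster you can actually see these control-plane services running as static Pods in the kube-system namespace:

kubectl get pods -n kube-system
# on the master you should find kube-apiserver-<node>, kube-controller-manager-<node>,
# kube-scheduler-<node> and etcd-<node> among the Pods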

Worker nodes will be covered in a follow-up article.

Starting the k8s master node

Getting it to start successfully took quite a while; first, the references I used:

https://kubernetes.io/zh/docs/setup/

https://kubernetes.io/zh/docs/tasks/tools/

https://blog.csdn.net/professorman/article/details/118150688

https://stackoverflow.com/questions/52119985/kubeadm-init-shows-kubelet-isnt-running-or-healthy

https://bbs.huaweicloud.com/forum/thread-76599-1-1.html

https://www.cnblogs.com/potato-chip/p/13973760.html

https://blog.51cto.com/zhangxueliang/2980956

Prerequisites

Before installing k8s (meaning, loosely, the environment and components needed to run it), my setup was:

  • An Ubuntu 20.04 VM (initially 18.04, see below) with 4 GB of RAM (4 GB is best; with 2 GB the VM tended to freeze on me)

Pay special attention here. I originally used Ubuntu 18.04, but in the next article of this k8s series, when starting a Service to access the cluster, problems appeared. This seems to be a compatibility issue on Ubuntu 18.04: my environment is Kubernetes 1.22.4, which configures Service rules via iptables, and the iptables version Kubernetes uses is newer than the one Ubuntu 18.04 ships, so commands from the newer iptables cannot be executed by the older version and the Service configuration cannot complete.

Switching to Ubuntu 20.04 resolved the problem, so I recommend running these experiments on Ubuntu 20.04 (see the version check after this list).

  • Docker installed and configured
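
If you are unsure which iptables your host ships, a quick check (the versions in the comments are what I understand the two releases to carry, so treat them as indicative):

iptables --version
# Ubuntu 18.04 ships the older 1.6.x series;
# 1.8.x (Ubuntu 20.04) also prints its backend, e.g. iptables v1.8.4 (legacy)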

Disable swap

k8s strongly recommends this, and virtually every tutorial I've seen turns swap off; by default the kubelet will not even start with swap enabled.

sudo swapoff -a
sudo sed -i '/ swap / s/^/#/' /etc/fstab

swapoff takes effect immediately; commenting out the fstab entry keeps swap disabled across reboots.
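
To confirm swap really is off, both of the following should report no swap in use:

# "Swap:" should show 0B, and swapon should print nothing
free -h
swapon --show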

Open ports

On both control-plane and worker nodes, a few ports must be open so the k8s components can communicate:

  • Control-plane node

Protocol  Direction  Port Range   Purpose                   Used By
TCP       Inbound    6443         Kubernetes API server     All
TCP       Inbound    2379-2380    etcd server client API    kube-apiserver, etcd
TCP       Inbound    10250        Kubelet API               kubelet itself, control plane
TCP       Inbound    10251        kube-scheduler            kube-scheduler itself
TCP       Inbound    10252        kube-controller-manager   kube-controller-manager itself
  • Worker node

Protocol  Direction  Port Range   Purpose                   Used By
TCP       Inbound    10250        Kubelet API               kubelet itself, control plane
TCP       Inbound    30000-32767  NodePort Services†        All

† The default port range for NodePort Services.

I opened these ports with iptables.
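
As a minimal sketch (illustrative rules only, not a hardened firewall setup), opening the ports above can look like this:

# control-plane ports
sudo iptables -A INPUT -p tcp --dport 6443 -j ACCEPT
sudo iptables -A INPUT -p tcp --dport 2379:2380 -j ACCEPT
sudo iptables -A INPUT -p tcp --dport 10250:10252 -j ACCEPT
# on worker nodes, additionally the NodePort range
sudo iptables -A INPUT -p tcp --dport 30000:32767 -j ACCEPT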

Enable IP forwarding

# enable IP forwarding: add (or uncomment) this line in /etc/sysctl.conf
vim /etc/sysctl.conf
net.ipv4.ip_forward=1

# reload and verify
sysctl -p
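
If you'd rather not open an editor, an equivalent non-interactive version is:

echo 'net.ipv4.ip_forward=1' | sudo tee -a /etc/sysctl.conf
sudo sysctl -p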

Set the iptables forwarding policy

This was a pitfall I discovered later: on some machines the iptables FORWARD chain policy is DROP (shown as Chain FORWARD (policy DROP)), which makes Pods deployed on other nodes unreachable via cluster addresses and breaks Services as well. To be safe, check whether forwarding is allowed, and if not, set it to ACCEPT.

iptables -P FORWARD ACCEPT
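
You can check the current policy first; the first line of output shows it:

sudo iptables -L FORWARD -n | head -1
# e.g. Chain FORWARD (policy DROP)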

Switch Docker's cgroup driver

kubeadm configures the kubelet's cgroup driver as systemd, while Docker defaults to cgroupfs, so add this setting in /etc/docker/daemon.json:

{
    "exec-opts": ["native.cgroupdriver=systemd"]
}

Then restart Docker:

sudo systemctl daemon-reload
sudo systemctl restart docker
# I only hit this error after installing the kubelet, so I restarted the kubelet too
sudo systemctl restart kubelet
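
You can confirm Docker picked up the new driver:

docker info | grep -i 'cgroup driver'
# expected: Cgroup Driver: systemd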

Installing k8s

Add the repository key

curl https://mirrors.aliyun.com/kubernetes/apt/doc/apt-key.gpg | apt-key add - 

Add the apt source

cat <<EOF >/etc/apt/sources.list.d/kubernetes.list
deb https://mirrors.aliyun.com/kubernetes/apt/ kubernetes-xenial main
EOF  

apt-get update

Install

By default, this installs the latest version of k8s:

apt-get install -y kubelet kubeadm kubectl

You can also pin a specific version:

apt-get install -y kubelet=1.18.4-00 kubeadm=1.18.4-00 kubectl=1.18.4-00
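
The official install docs also suggest holding the packages so an automatic upgrade doesn't move the cluster to a new version unexpectedly:

apt-mark hold kubelet kubeadm kubectl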

Fetch the images k8s needs

First, list the required image versions:

lzl@lzl-b:~$ kubeadm config images list
k8s.gcr.io/kube-apiserver:v1.22.4
k8s.gcr.io/kube-controller-manager:v1.22.4
k8s.gcr.io/kube-scheduler:v1.22.4
k8s.gcr.io/kube-proxy:v1.22.4
k8s.gcr.io/pause:3.5
k8s.gcr.io/etcd:3.5.0-0
k8s.gcr.io/coredns/coredns:v1.8.4

Pulling these directly from k8s.gcr.io is blocked from China, so the recommended route is to pull them from a domestic mirror with Docker first and then re-tag them. The whole flow as a shell script:

docker pull registry.cn-hangzhou.aliyuncs.com/google_containers/kube-apiserver:v1.22.4
docker pull registry.cn-hangzhou.aliyuncs.com/google_containers/kube-controller-manager:v1.22.4
docker pull registry.cn-hangzhou.aliyuncs.com/google_containers/kube-scheduler:v1.22.4
docker pull registry.cn-hangzhou.aliyuncs.com/google_containers/kube-proxy:v1.22.4
docker pull registry.cn-hangzhou.aliyuncs.com/google_containers/pause:3.5
docker pull registry.cn-hangzhou.aliyuncs.com/google_containers/etcd:3.5.0-0
docker pull registry.cn-hangzhou.aliyuncs.com/google_containers/coredns:v1.8.4

docker tag registry.cn-hangzhou.aliyuncs.com/google_containers/kube-apiserver:v1.22.4 k8s.gcr.io/kube-apiserver:v1.22.4
docker tag registry.cn-hangzhou.aliyuncs.com/google_containers/kube-controller-manager:v1.22.4 k8s.gcr.io/kube-controller-manager:v1.22.4
docker tag registry.cn-hangzhou.aliyuncs.com/google_containers/kube-scheduler:v1.22.4 k8s.gcr.io/kube-scheduler:v1.22.4
docker tag registry.cn-hangzhou.aliyuncs.com/google_containers/kube-proxy:v1.22.4 k8s.gcr.io/kube-proxy:v1.22.4
docker tag registry.cn-hangzhou.aliyuncs.com/google_containers/pause:3.5 k8s.gcr.io/pause:3.5
docker tag registry.cn-hangzhou.aliyuncs.com/google_containers/etcd:3.5.0-0 k8s.gcr.io/etcd:3.5.0-0
docker tag registry.cn-hangzhou.aliyuncs.com/google_containers/coredns:v1.8.4 k8s.gcr.io/coredns/coredns:v1.8.4
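
If you prefer, the same pull-and-retag flow can be written as a loop (equivalent to the commands above):

MIRROR=registry.cn-hangzhou.aliyuncs.com/google_containers
for img in kube-apiserver:v1.22.4 kube-controller-manager:v1.22.4 \
           kube-scheduler:v1.22.4 kube-proxy:v1.22.4 pause:3.5 etcd:3.5.0-0; do
  docker pull $MIRROR/$img
  docker tag  $MIRROR/$img k8s.gcr.io/$img
done
# coredns lives one level deeper in the k8s.gcr.io namespace
docker pull $MIRROR/coredns:v1.8.4
docker tag  $MIRROR/coredns:v1.8.4 k8s.gcr.io/coredns/coredns:v1.8.4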

Then check that everything is in place:

lzl@lzl:~/Desktop$ docker image ls
REPOSITORY                                                                    TAG          IMAGE ID       CREATED         SIZE
k8s.gcr.io/kube-apiserver                                                     v1.22.4      8a5cc299272d   10 days ago     128MB
registry.cn-hangzhou.aliyuncs.com/google_containers/kube-apiserver            v1.22.4      8a5cc299272d   10 days ago     128MB
k8s.gcr.io/kube-scheduler                                                     v1.22.4      721ba97f54a6   10 days ago     52.7MB
registry.cn-hangzhou.aliyuncs.com/google_containers/kube-scheduler            v1.22.4      721ba97f54a6   10 days ago     52.7MB
k8s.gcr.io/kube-controller-manager                                            v1.22.4      0ce02f92d3e4   10 days ago     122MB
registry.cn-hangzhou.aliyuncs.com/google_containers/kube-controller-manager   v1.22.4      0ce02f92d3e4   10 days ago     122MB
k8s.gcr.io/kube-proxy                                                         v1.22.4      edeff87e4802   10 days ago     104MB
registry.cn-hangzhou.aliyuncs.com/google_containers/kube-proxy                v1.22.4      edeff87e4802   10 days ago     104MB
nginx                                                                         latest       ea335eea17ab   11 days ago     141MB
alpine                                                                        <none>       0a97eee8041e   2 weeks ago     5.61MB
counter-app-master_web-fe                                                     latest       1e3f0e452820   2 weeks ago     52.5MB
carrotliduo/web                                                               latest       8b05e3c03d63   3 weeks ago     77MB
test                                                                          latest       8b05e3c03d63   3 weeks ago     77MB
web                                                                           latest       8b05e3c03d63   3 weeks ago     77MB
python                                                                        3.6-alpine   c5aebf5e06c5   4 weeks ago     40.8MB
ubuntu                                                                        latest       ba6acccedd29   6 weeks ago     72.8MB
redis                                                                         alpine       e24d2b9deaec   7 weeks ago     32.3MB
alpine                                                                        latest       14119a10abf4   3 months ago    5.6MB
nigelpoulton/tu-demo                                                          latest       c610c6a38555   4 months ago    58.1MB
nigelpoulton/tu-demo                                                          v2           c610c6a38555   4 months ago    58.1MB
nigelpoulton/tu-demo                                                          v1           6ba12825d092   4 months ago    58.1MB
nigelpoulton/pluralsight-docker-ci                                            latest       1c201f15a046   5 months ago    79.5MB
registry.cn-hangzhou.aliyuncs.com/google_containers/etcd                      3.5.0-0      004811815584   5 months ago    295MB
k8s.gcr.io/etcd                                                               3.5.0-0      004811815584   5 months ago    295MB
k8s.gcr.io/coredns/coredns                                                    v1.8.4       8d147537fb7d   6 months ago    47.6MB
registry.cn-hangzhou.aliyuncs.com/google_containers/coredns                   v1.8.4       8d147537fb7d   6 months ago    47.6MB
k8s.gcr.io/pause                                                              3.5          ed210e3e4a5b   8 months ago    683kB
registry.cn-hangzhou.aliyuncs.com/google_containers/pause                     3.5          ed210e3e4a5b   8 months ago    683kB
nigelpoulton/tu-demo                                                          v2-old       d5e1e48cf932   20 months ago   104MB
nigelpoulton/tu-demo                                                          v1-old       6852022de69d   20 months ago   104MB
dockersamples/atseasampleshopapp_reverse_proxy                                <none>       32b8411b497a   3 years ago     18.6MB
dockersamples/visualizer

Now we can initialize the master node

Note that this needs to be run as root.

root@lzl:/home/lzl# kubeadm init --kubernetes-version=v1.22.4 --pod-network-cidr=10.0.20.0/24 --ignore-preflight-errors=Swap
[init] Using Kubernetes version: v1.22.4
[preflight] Running pre-flight checks
	[WARNING HTTPProxy]: Connection to "https://10.0.20.25" uses proxy "http://10.0.20.17:1080/". If that is not intended, adjust your proxy settings
	[WARNING HTTPProxyCIDR]: connection to "10.96.0.0/12" uses proxy "http://10.0.20.17:1080/". This may lead to malfunctional cluster setup. Make sure that Pod and Services IP ranges specified correctly as exceptions in proxy configuration
	[WARNING HTTPProxyCIDR]: connection to "10.0.20.0/24" uses proxy "http://10.0.20.17:1080/". This may lead to malfunctional cluster setup. Make sure that Pod and Services IP ranges specified correctly as exceptions in proxy configuration
[preflight] Pulling images required for setting up a Kubernetes cluster
[preflight] This might take a minute or two, depending on the speed of your internet connection
[preflight] You can also perform this action in beforehand using 'kubeadm config images pull'
[certs] Using certificateDir folder "/etc/kubernetes/pki"
[certs] Generating "ca" certificate and key
[certs] Generating "apiserver" certificate and key
[certs] apiserver serving cert is signed for DNS names [kubernetes kubernetes.default kubernetes.default.svc kubernetes.default.svc.cluster.local lzl] and IPs [10.96.0.1 10.0.20.25]
[certs] Generating "apiserver-kubelet-client" certificate and key
[certs] Generating "front-proxy-ca" certificate and key
[certs] Generating "front-proxy-client" certificate and key
[certs] Generating "etcd/ca" certificate and key
[certs] Generating "etcd/server" certificate and key
[certs] etcd/server serving cert is signed for DNS names [localhost lzl] and IPs [10.0.20.25 127.0.0.1 ::1]
[certs] Generating "etcd/peer" certificate and key
[certs] etcd/peer serving cert is signed for DNS names [localhost lzl] and IPs [10.0.20.25 127.0.0.1 ::1]
[certs] Generating "etcd/healthcheck-client" certificate and key
[certs] Generating "apiserver-etcd-client" certificate and key
[certs] Generating "sa" key and public key
[kubeconfig] Using kubeconfig folder "/etc/kubernetes"
[kubeconfig] Writing "admin.conf" kubeconfig file
[kubeconfig] Writing "kubelet.conf" kubeconfig file
[kubeconfig] Writing "controller-manager.conf" kubeconfig file
[kubeconfig] Writing "scheduler.conf" kubeconfig file
[kubelet-start] Writing kubelet environment file with flags to file "/var/lib/kubelet/kubeadm-flags.env"
[kubelet-start] Writing kubelet configuration to file "/var/lib/kubelet/config.yaml"
[kubelet-start] Starting the kubelet
[control-plane] Using manifest folder "/etc/kubernetes/manifests"
[control-plane] Creating static Pod manifest for "kube-apiserver"
[control-plane] Creating static Pod manifest for "kube-controller-manager"
[control-plane] Creating static Pod manifest for "kube-scheduler"
[etcd] Creating static Pod manifest for local etcd in "/etc/kubernetes/manifests"
[wait-control-plane] Waiting for the kubelet to boot up the control plane as static Pods from directory "/etc/kubernetes/manifests". This can take up to 4m0s
[apiclient] All control plane components are healthy after 7.778428 seconds
[upload-config] Storing the configuration used in ConfigMap "kubeadm-config" in the "kube-system" Namespace
[kubelet] Creating a ConfigMap "kubelet-config-1.22" in namespace kube-system with the configuration for the kubelets in the cluster
[upload-certs] Skipping phase. Please see --upload-certs
[mark-control-plane] Marking the node lzl as control-plane by adding the labels: [node-role.kubernetes.io/master(deprecated) node-role.kubernetes.io/control-plane node.kubernetes.io/exclude-from-external-load-balancers]
[mark-control-plane] Marking the node lzl as control-plane by adding the taints [node-role.kubernetes.io/master:NoSchedule]
[bootstrap-token] Using token: y5u12k.h101qh26f94557u7
[bootstrap-token] Configuring bootstrap tokens, cluster-info ConfigMap, RBAC Roles
[bootstrap-token] configured RBAC rules to allow Node Bootstrap tokens to get nodes
[bootstrap-token] configured RBAC rules to allow Node Bootstrap tokens to post CSRs in order for nodes to get long term certificate credentials
[bootstrap-token] configured RBAC rules to allow the csrapprover controller automatically approve CSRs from a Node Bootstrap Token
[bootstrap-token] configured RBAC rules to allow certificate rotation for all node client certificates in the cluster
[bootstrap-token] Creating the "cluster-info" ConfigMap in the "kube-public" namespace
[kubelet-finalize] Updating "/etc/kubernetes/kubelet.conf" to point to a rotatable kubelet client certificate and key
[addons] Applied essential addon: CoreDNS
[addons] Applied essential addon: kube-proxy

Your Kubernetes control-plane has initialized successfully!

To start using your cluster, you need to run the following as a regular user:

  mkdir -p $HOME/.kube
  sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
  sudo chown $(id -u):$(id -g) $HOME/.kube/config

Alternatively, if you are the root user, you can run:

  export KUBECONFIG=/etc/kubernetes/admin.conf

You should now deploy a pod network to the cluster.
Run "kubectl apply -f [podnetwork].yaml" with one of the options listed at:
  https://kubernetes.io/docs/concepts/cluster-administration/addons/

Then you can join any number of worker nodes by running the following on each as root:

kubeadm join 10.0.20.25:6443 --token y5u12k.h101qh26f94557u7 \
	--discovery-token-ca-cert-hash sha256:ef50610dda443d0dc461f3a74e8e73921c2e86dd24a2f39519b4f315a018d7f8 
root@lzl:/home/lzl# 

Seeing "Your Kubernetes control-plane has initialized successfully!" means the first stage of configuration is done.

Continuing the setup

Switch back to a regular user and set up the environment as the output above instructs:

lzl@lzl:~$ mkdir -p $HOME/.kube
lzl@lzl:~$ sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
[sudo] password for lzl: 
lzl@lzl:~$ sudo chown $(id -u):$(id -g) $HOME/.kube/config
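
A quick sanity check that kubectl can now reach the API server (the node will report NotReady until a Pod network add-on is installed):

kubectl get nodes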

Then check the status of each component:

lzl@lzl:~$ kubectl get componentstatus
Warning: v1 ComponentStatus is deprecated in v1.19+
NAME                 STATUS      MESSAGE                                                                                       ERROR
scheduler            Unhealthy   Get "http://127.0.0.1:10251/healthz": dial tcp 127.0.0.1:10251: connect: connection refused   
controller-manager   Healthy     ok                                                                                            
etcd-0               Healthy     {"health":"true","reason":""}

The scheduler is shown as unhealthy.

This happens because kube-controller-manager.yaml and kube-scheduler.yaml under /etc/kubernetes/manifests/ set the default port to 0; the fix is to comment out the corresponding port flag.

Following what other tutorials do, in kube-scheduler.yaml:

...
spec:
  containers:
  - command:
    - kube-scheduler
    - --authentication-kubeconfig=/etc/kubernetes/scheduler.conf
    - --authorization-kubeconfig=/etc/kubernetes/scheduler.conf
    - --bind-address=127.0.0.1
    - --kubeconfig=/etc/kubernetes/scheduler.conf
    - --leader-elect=true
#    - --port=0
    env:
    - name: HTTP_PROXY
      value: http://10.0.20.17:1080/
    - name: FTP_PROXY
      value: http://10.0.20.17:1080/
    - name: https_proxy
      value: http://10.0.20.17:1080/
...

Check the component status again:

lzl@lzl:/etc/kubernetes/manifests$ kubectl get componentstatus
Warning: v1 ComponentStatus is deprecated in v1.19+
NAME                 STATUS    MESSAGE                         ERROR
scheduler            Healthy   ok                              
controller-manager   Healthy   ok                              
etcd-0               Healthy   {"health":"true","reason":""}  

With that, all three components are online.
