Concepts
The official documentation covers the k8s concepts in detail; here I only want to highlight a few.
Master node
The master node is the set of system services that make up the cluster's control plane. In production, 3 or 5 masters are usually recommended so that management stays highly available (HA). The master runs the following services:
- API Server: the communication hub for all components
- Cluster store: typically etcd
- Controller manager: runs all of the background control loops, watching the cluster's nodes and reacting to events so that the current state of the cluster converges on the desired state
- Scheduler: watches the API Server for new work tasks and assigns them to suitable nodes
In addition, there is the cloud controller manager, a component that handles integration for clusters running on cloud platforms such as AWS or Azure.
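On a kubeadm-built cluster (like the one set up below), these control-plane services run as static Pods in the kube-system namespace, so once the cluster is up you can inspect them with kubectl:

```bash
# Lists kube-apiserver, kube-controller-manager, kube-scheduler, etcd,
# plus add-ons such as coredns and kube-proxy
kubectl get pods -n kube-system
```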
The worker node will be covered in a later article.
Starting the k8s master node
Getting it to start successfully took quite a bit of time. Here are the references I used:
https://kubernetes.io/zh/docs/setup/
https://kubernetes.io/zh/docs/tasks/tools/
https://blog.csdn.net/professorman/article/details/118150688
https://stackoverflow.com/questions/52119985/kubeadm-init-shows-kubelet-isnt-running-or-healthy
https://bbs.huaweicloud.com/forum/thread-76599-1-1.html
Prerequisites
Before installing k8s (meaning the environment and components needed to run it), my setup looked like this:
- An Ubuntu 20.04 VM (originally 18.04, see the note below) with 4 GB of RAM (4 GB is best; when I tried 2 GB the VM tended to freeze)
  A special note here: I originally used Ubuntu 18.04, but in the next article of this k8s series, things broke when starting a Service to access the cluster. This appears to be caused by the iptables that ships with Ubuntu 18.04: my cluster runs Kubernetes 1.22.4, which programs Service rules through iptables, and the iptables version Kubernetes uses is newer than the one in Ubuntu 18.04, so some newer iptables commands cannot be executed by the older version and the Service rules never get applied.
  Switching the environment to Ubuntu 20.04 solved the problem, so I recommend running these experiments on Ubuntu 20.04. (A quick way to check which iptables a machine has is sketched right after this list.)
- A working Docker environment
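Since the Ubuntu note above comes down to the iptables build, here is a quick check you can run on the host (a minimal check; the exact version strings depend on your distro):

```bash
# Ubuntu 18.04 ships iptables 1.6.x, Ubuntu 20.04 ships 1.8.x.
# On 1.8.x the output also names the active backend, e.g. "iptables v1.8.4 (legacy)".
iptables --version
```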
Disable swap
Kubernetes strongly recommends this, and nearly every tutorial I've seen disables swap; besides the performance argument, the kubelet by default refuses to run with swap enabled.
sudo swapoff -a
sudo sed -i '/ swap / s/^/#/' /etc/fstab
swapoff -a takes effect immediately; the sed line comments out the swap entry in /etc/fstab so that swap stays off after a reboot.
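To confirm swap really is off (an optional quick check):

```bash
# Prints nothing when no swap device is active
swapon --show
# The "Swap:" line should show 0B
free -h
```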
Open ports
Both the control-plane node and the worker nodes need a few ports opened so that the k8s components can communicate:
- Control-plane node
Protocol | Direction | Port Range | Purpose | Used By |
---|---|---|---|---|
TCP | Inbound | 6443 | Kubernetes API server | All components |
TCP | Inbound | 2379-2380 | etcd server client API | kube-apiserver, etcd |
TCP | Inbound | 10250 | Kubelet API | kubelet itself, control plane components |
TCP | Inbound | 10251 | kube-scheduler | kube-scheduler itself |
TCP | Inbound | 10252 | kube-controller-manager | kube-controller-manager itself |
- Worker node
Protocol | Direction | Port Range | Purpose | Used By |
---|---|---|---|---|
TCP | Inbound | 10250 | Kubelet API | kubelet itself, control plane components |
TCP | Inbound | 30000-32767 | NodePort Services† | All components |
I opened the ports above with iptables; a minimal sketch of the rules follows.
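This is just the shape of the rules I mean, assuming you manage the firewall with raw iptables (adjust accordingly if you use ufw or firewalld):

```bash
# Control-plane node: ports from the first table
sudo iptables -A INPUT -p tcp --dport 6443 -j ACCEPT          # Kubernetes API server
sudo iptables -A INPUT -p tcp --dport 2379:2380 -j ACCEPT     # etcd server client API
sudo iptables -A INPUT -p tcp --dport 10250:10252 -j ACCEPT   # kubelet, kube-scheduler, kube-controller-manager

# Worker node: ports from the second table
sudo iptables -A INPUT -p tcp --dport 10250 -j ACCEPT         # Kubelet API
sudo iptables -A INPUT -p tcp --dport 30000:32767 -j ACCEPT   # NodePort Services
```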
Enable IP forwarding
# Enable IP forwarding: edit /etc/sysctl.conf and set net.ipv4.ip_forward=1
vim /etc/sysctl.conf
net.ipv4.ip_forward=1
# Reload the file and print the values now in effect
sysctl -p
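A non-interactive alternative, if you would rather not edit /etc/sysctl.conf by hand (the drop-in file name here is just my choice):

```bash
# Write the setting to a dedicated sysctl drop-in, then reload all sysctl config files
echo 'net.ipv4.ip_forward=1' | sudo tee /etc/sysctl.d/99-k8s-ip-forward.conf
sudo sysctl --system
```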
Set the iptables FORWARD policy
This is a pitfall I only discovered later: on some machines the default policy of the FORWARD chain is Chain FORWARD (policy DROP), which means Pods deployed on other nodes cannot be reached through their cluster addresses and Services stop working as well. To be safe, check whether forwarding is allowed, and if not, set the policy to ACCEPT:
iptables -P FORWARD ACCEPT
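To look at the current policy before changing it:

```bash
# The first line of output shows the chain policy, e.g. "Chain FORWARD (policy ACCEPT)"
sudo iptables -L FORWARD -n | head -n 1
```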
Change Docker's cgroup driver
Kubernetes (the kubelet) uses the systemd cgroup driver, but Docker defaults to cgroupfs, so add the following setting to /etc/docker/daemon.json:
{
"exec-opts": ["native.cgroupdriver=systemd"]
}
Then restart Docker:
sudo systemctl daemon-reload
sudo systemctl restart docker
# I only hit this error after installing kubelet, so I restarted kubelet as well
sudo systemctl restart kubelet
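To verify that the driver actually changed:

```bash
# Should report "Cgroup Driver: systemd"
docker info | grep -i "cgroup driver"
```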
Install k8s
Add the repository signing key
curl https://mirrors.aliyun.com/kubernetes/apt/doc/apt-key.gpg | apt-key add -
Add the apt source
cat <<EOF >/etc/apt/sources.list.d/kubernetes.list
deb https://mirrors.aliyun.com/kubernetes/apt/ kubernetes-xenial main
EOF
apt-get update
Install
By default, this installs the latest version of k8s:
apt-get install -y kubelet kubeadm kubectl
You can also install a specific version:
apt-get install -y kubelet=1.18.4-00 kubeadm=1.18.4-00 kubectl=1.18.4-00
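The official install guide also recommends holding the three packages so a routine apt upgrade doesn't move them to an incompatible version:

```bash
sudo apt-mark hold kubelet kubeadm kubectl
```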
Get the images needed to bootstrap k8s
First, check which image versions are required:
lzl@lzl-b:~$ kubeadm config images list
k8s.gcr.io/kube-apiserver:v1.22.4
k8s.gcr.io/kube-controller-manager:v1.22.4
k8s.gcr.io/kube-scheduler:v1.22.4
k8s.gcr.io/kube-proxy:v1.22.4
k8s.gcr.io/pause:3.5
k8s.gcr.io/etcd:3.5.0-0
k8s.gcr.io/coredns/coredns:v1.8.4
Pulling these directly from k8s.gcr.io is blocked by the firewall, so the recommended approach is to pull the images from a domestic mirror with Docker first and then re-tag them with the names kubeadm expects. The whole flow as a shell file:
docker pull registry.cn-hangzhou.aliyuncs.com/google_containers/kube-apiserver:v1.22.4
docker pull registry.cn-hangzhou.aliyuncs.com/google_containers/kube-controller-manager:v1.22.4
docker pull registry.cn-hangzhou.aliyuncs.com/google_containers/kube-scheduler:v1.22.4
docker pull registry.cn-hangzhou.aliyuncs.com/google_containers/kube-proxy:v1.22.4
docker pull registry.cn-hangzhou.aliyuncs.com/google_containers/pause:3.5
docker pull registry.cn-hangzhou.aliyuncs.com/google_containers/etcd:3.5.0-0
docker pull registry.cn-hangzhou.aliyuncs.com/google_containers/coredns:v1.8.4
docker tag registry.cn-hangzhou.aliyuncs.com/google_containers/kube-apiserver:v1.22.4 k8s.gcr.io/kube-apiserver:v1.22.4
docker tag registry.cn-hangzhou.aliyuncs.com/google_containers/kube-controller-manager:v1.22.4 k8s.gcr.io/kube-controller-manager:v1.22.4
docker tag registry.cn-hangzhou.aliyuncs.com/google_containers/kube-scheduler:v1.22.4 k8s.gcr.io/kube-scheduler:v1.22.4
docker tag registry.cn-hangzhou.aliyuncs.com/google_containers/kube-proxy:v1.22.4 k8s.gcr.io/kube-proxy:v1.22.4
docker tag registry.cn-hangzhou.aliyuncs.com/google_containers/pause:3.5 k8s.gcr.io/pause:3.5
docker tag registry.cn-hangzhou.aliyuncs.com/google_containers/etcd:3.5.0-0 k8s.gcr.io/etcd:3.5.0-0
docker tag registry.cn-hangzhou.aliyuncs.com/google_containers/coredns:v1.8.4 k8s.gcr.io/coredns/coredns:v1.8.4
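The same pull-and-retag flow can also be written as a loop; a sketch, assuming the image list matches the output of kubeadm config images list for your version (note that coredns is retagged into the k8s.gcr.io/coredns/ sub-path):

```bash
#!/usr/bin/env bash
set -e
MIRROR=registry.cn-hangzhou.aliyuncs.com/google_containers
images=(
  kube-apiserver:v1.22.4
  kube-controller-manager:v1.22.4
  kube-scheduler:v1.22.4
  kube-proxy:v1.22.4
  pause:3.5
  etcd:3.5.0-0
  coredns:v1.8.4
)
for img in "${images[@]}"; do
  docker pull "$MIRROR/$img"
  if [[ $img == coredns:* ]]; then
    # kubeadm expects k8s.gcr.io/coredns/coredns:v1.8.4
    docker tag "$MIRROR/$img" "k8s.gcr.io/coredns/$img"
  else
    docker tag "$MIRROR/$img" "k8s.gcr.io/$img"
  fi
done
```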
Then check that nothing is missing:
lzl@lzl:~/Desktop$ docker image ls
REPOSITORY TAG IMAGE ID CREATED SIZE
k8s.gcr.io/kube-apiserver v1.22.4 8a5cc299272d 10 days ago 128MB
registry.cn-hangzhou.aliyuncs.com/google_containers/kube-apiserver v1.22.4 8a5cc299272d 10 days ago 128MB
k8s.gcr.io/kube-scheduler v1.22.4 721ba97f54a6 10 days ago 52.7MB
registry.cn-hangzhou.aliyuncs.com/google_containers/kube-scheduler v1.22.4 721ba97f54a6 10 days ago 52.7MB
k8s.gcr.io/kube-controller-manager v1.22.4 0ce02f92d3e4 10 days ago 122MB
registry.cn-hangzhou.aliyuncs.com/google_containers/kube-controller-manager v1.22.4 0ce02f92d3e4 10 days ago 122MB
k8s.gcr.io/kube-proxy v1.22.4 edeff87e4802 10 days ago 104MB
registry.cn-hangzhou.aliyuncs.com/google_containers/kube-proxy v1.22.4 edeff87e4802 10 days ago 104MB
nginx latest ea335eea17ab 11 days ago 141MB
alpine <none> 0a97eee8041e 2 weeks ago 5.61MB
counter-app-master_web-fe latest 1e3f0e452820 2 weeks ago 52.5MB
carrotliduo/web latest 8b05e3c03d63 3 weeks ago 77MB
test latest 8b05e3c03d63 3 weeks ago 77MB
web latest 8b05e3c03d63 3 weeks ago 77MB
python 3.6-alpine c5aebf5e06c5 4 weeks ago 40.8MB
ubuntu latest ba6acccedd29 6 weeks ago 72.8MB
redis alpine e24d2b9deaec 7 weeks ago 32.3MB
alpine latest 14119a10abf4 3 months ago 5.6MB
nigelpoulton/tu-demo latest c610c6a38555 4 months ago 58.1MB
nigelpoulton/tu-demo v2 c610c6a38555 4 months ago 58.1MB
nigelpoulton/tu-demo v1 6ba12825d092 4 months ago 58.1MB
nigelpoulton/pluralsight-docker-ci latest 1c201f15a046 5 months ago 79.5MB
registry.cn-hangzhou.aliyuncs.com/google_containers/etcd 3.5.0-0 004811815584 5 months ago 295MB
k8s.gcr.io/etcd 3.5.0-0 004811815584 5 months ago 295MB
k8s.gcr.io/coredns/coredns v1.8.4 8d147537fb7d 6 months ago 47.6MB
registry.cn-hangzhou.aliyuncs.com/google_containers/coredns v1.8.4 8d147537fb7d 6 months ago 47.6MB
k8s.gcr.io/pause 3.5 ed210e3e4a5b 8 months ago 683kB
registry.cn-hangzhou.aliyuncs.com/google_containers/pause 3.5 ed210e3e4a5b 8 months ago 683kB
nigelpoulton/tu-demo v2-old d5e1e48cf932 20 months ago 104MB
nigelpoulton/tu-demo v1-old 6852022de69d 20 months ago 104MB
dockersamples/atseasampleshopapp_reverse_proxy <none> 32b8411b497a 3 years ago 18.6MB
dockersamples/visualizer
Now we can initialize the master node. Note that kubeadm init must be run as root.
root@lzl:/home/lzl# kubeadm init --kubernetes-version=v1.22.4 --pod-network-cidr=10.0.20.0/24 --ignore-preflight-errors=Swap
[init] Using Kubernetes version: v1.22.4
[preflight] Running pre-flight checks
[WARNING HTTPProxy]: Connection to "https://10.0.20.25" uses proxy "http://10.0.20.17:1080/". If that is not intended, adjust your proxy settings
[WARNING HTTPProxyCIDR]: connection to "10.96.0.0/12" uses proxy "http://10.0.20.17:1080/". This may lead to malfunctional cluster setup. Make sure that Pod and Services IP ranges specified correctly as exceptions in proxy configuration
[WARNING HTTPProxyCIDR]: connection to "10.0.20.0/24" uses proxy "http://10.0.20.17:1080/". This may lead to malfunctional cluster setup. Make sure that Pod and Services IP ranges specified correctly as exceptions in proxy configuration
[preflight] Pulling images required for setting up a Kubernetes cluster
[preflight] This might take a minute or two, depending on the speed of your internet connection
[preflight] You can also perform this action in beforehand using 'kubeadm config images pull'
[certs] Using certificateDir folder "/etc/kubernetes/pki"
[certs] Generating "ca" certificate and key
[certs] Generating "apiserver" certificate and key
[certs] apiserver serving cert is signed for DNS names [kubernetes kubernetes.default kubernetes.default.svc kubernetes.default.svc.cluster.local lzl] and IPs [10.96.0.1 10.0.20.25]
[certs] Generating "apiserver-kubelet-client" certificate and key
[certs] Generating "front-proxy-ca" certificate and key
[certs] Generating "front-proxy-client" certificate and key
[certs] Generating "etcd/ca" certificate and key
[certs] Generating "etcd/server" certificate and key
[certs] etcd/server serving cert is signed for DNS names [localhost lzl] and IPs [10.0.20.25 127.0.0.1 ::1]
[certs] Generating "etcd/peer" certificate and key
[certs] etcd/peer serving cert is signed for DNS names [localhost lzl] and IPs [10.0.20.25 127.0.0.1 ::1]
[certs] Generating "etcd/healthcheck-client" certificate and key
[certs] Generating "apiserver-etcd-client" certificate and key
[certs] Generating "sa" key and public key
[kubeconfig] Using kubeconfig folder "/etc/kubernetes"
[kubeconfig] Writing "admin.conf" kubeconfig file
[kubeconfig] Writing "kubelet.conf" kubeconfig file
[kubeconfig] Writing "controller-manager.conf" kubeconfig file
[kubeconfig] Writing "scheduler.conf" kubeconfig file
[kubelet-start] Writing kubelet environment file with flags to file "/var/lib/kubelet/kubeadm-flags.env"
[kubelet-start] Writing kubelet configuration to file "/var/lib/kubelet/config.yaml"
[kubelet-start] Starting the kubelet
[control-plane] Using manifest folder "/etc/kubernetes/manifests"
[control-plane] Creating static Pod manifest for "kube-apiserver"
[control-plane] Creating static Pod manifest for "kube-controller-manager"
[control-plane] Creating static Pod manifest for "kube-scheduler"
[etcd] Creating static Pod manifest for local etcd in "/etc/kubernetes/manifests"
[wait-control-plane] Waiting for the kubelet to boot up the control plane as static Pods from directory "/etc/kubernetes/manifests". This can take up to 4m0s
[apiclient] All control plane components are healthy after 7.778428 seconds
[upload-config] Storing the configuration used in ConfigMap "kubeadm-config" in the "kube-system" Namespace
[kubelet] Creating a ConfigMap "kubelet-config-1.22" in namespace kube-system with the configuration for the kubelets in the cluster
[upload-certs] Skipping phase. Please see --upload-certs
[mark-control-plane] Marking the node lzl as control-plane by adding the labels: [node-role.kubernetes.io/master(deprecated) node-role.kubernetes.io/control-plane node.kubernetes.io/exclude-from-external-load-balancers]
[mark-control-plane] Marking the node lzl as control-plane by adding the taints [node-role.kubernetes.io/master:NoSchedule]
[bootstrap-token] Using token: y5u12k.h101qh26f94557u7
[bootstrap-token] Configuring bootstrap tokens, cluster-info ConfigMap, RBAC Roles
[bootstrap-token] configured RBAC rules to allow Node Bootstrap tokens to get nodes
[bootstrap-token] configured RBAC rules to allow Node Bootstrap tokens to post CSRs in order for nodes to get long term certificate credentials
[bootstrap-token] configured RBAC rules to allow the csrapprover controller automatically approve CSRs from a Node Bootstrap Token
[bootstrap-token] configured RBAC rules to allow certificate rotation for all node client certificates in the cluster
[bootstrap-token] Creating the "cluster-info" ConfigMap in the "kube-public" namespace
[kubelet-finalize] Updating "/etc/kubernetes/kubelet.conf" to point to a rotatable kubelet client certificate and key
[addons] Applied essential addon: CoreDNS
[addons] Applied essential addon: kube-proxy
Your Kubernetes control-plane has initialized successfully!
To start using your cluster, you need to run the following as a regular user:
mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config
Alternatively, if you are the root user, you can run:
export KUBECONFIG=/etc/kubernetes/admin.conf
You should now deploy a pod network to the cluster.
Run "kubectl apply -f [podnetwork].yaml" with one of the options listed at:
https://kubernetes.io/docs/concepts/cluster-administration/addons/
Then you can join any number of worker nodes by running the following on each as root:
kubeadm join 10.0.20.25:6443 --token y5u12k.h101qh26f94557u7 \
--discovery-token-ca-cert-hash sha256:ef50610dda443d0dc461f3a74e8e73921c2e86dd24a2f39519b4f315a018d7f8
root@lzl:/home/lzl#
Seeing "Your Kubernetes control-plane has initialized successfully!" means the first stage of configuration is done.
Continue setting up the environment
First switch back to a regular user, then set up the environment as the output above instructs:
lzl@lzl:~$ mkdir -p $HOME/.kube
lzl@lzl:~$ sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
[sudo] password for lzl:
lzl@lzl:~$ sudo chown $(id -u):$(id -g) $HOME/.kube/config
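At this point kubectl should already be able to reach the cluster (the node usually shows NotReady until a Pod network add-on is applied):

```bash
# The control-plane node should be listed; expect NotReady before a CNI plugin is installed
kubectl get nodes
```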
Then check the status of the components:
lzl@lzl:~$ kubectl get componentstatus
Warning: v1 ComponentStatus is deprecated in v1.19+
NAME STATUS MESSAGE ERROR
scheduler Unhealthy Get "http://127.0.0.1:10251/healthz": dial tcp 127.0.0.1:10251: connect: connection refused
controller-manager Healthy ok
etcd-0 Healthy {"health":"true","reason":""}
The scheduler is reported as unhealthy.
This happens because kube-controller-manager.yaml and kube-scheduler.yaml under /etc/kubernetes/manifests/ set the (insecure) port to 0 by default; the fix is simply to comment out the corresponding --port line. Following the steps from other tutorials, kube-scheduler.yaml ends up looking like this:
```yaml
spec:
  containers:
  - command:
    - kube-scheduler
    - --authentication-kubeconfig=/etc/kubernetes/scheduler.conf
    - --authorization-kubeconfig=/etc/kubernetes/scheduler.conf
    - --bind-address=127.0.0.1
    - --kubeconfig=/etc/kubernetes/scheduler.conf
    - --leader-elect=true
    # - --port=0
    env:
    - name: HTTP_PROXY
      value: http://10.0.20.17:1080/
    - name: FTP_PROXY
      value: http://10.0.20.17:1080/
    - name: https_proxy
      value: http://10.0.20.17:1080/
```
Check the component status again:
lzl@lzl:/etc/kubernetes/manifests$ kubectl get componentstatus
Warning: v1 ComponentStatus is deprecated in v1.19+
NAME STATUS MESSAGE ERROR
scheduler Healthy ok
controller-manager Healthy ok
etcd-0 Healthy {"health":"true","reason":""}
With that, all three components are online.