Using Rook-Ceph as Persistent Storage (PV) for Kubernetes
```
$ lsblk -f
NAME                  FSTYPE      LABEL UUID                                   MOUNTPOINT
vda
└─vda1                LVM2_member       >eSO50t-GkUV-YKTH-WsGq-hNJY-eKNf-3i07IB
  ├─ubuntu--vg-root   ext4              c2366f76-6e21-4f10-a8f3-6776212e2fe4   /
  └─ubuntu--vg-swap_1 swap              9492a3dc-ad75-47cd-9596-678e8cf17ff9   [SWAP]
vdb
```
```
fdisk /dev/vdb
>> d
>> w
$ lsblk -f
```
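Rook only consumes devices that carry no filesystem, so it is worth confirming that after deleting the partition. The helper below is a minimal sketch: the function name `is_raw_device` is made up for illustration, and it simply checks that `lsblk` reports an empty FSTYPE column for the device and all of its partitions.

```shell
# Hypothetical helper (not part of Rook): succeeds when neither the device
# nor any of its partitions reports a filesystem type.
is_raw_device() {
  # -n drops the header row; -o FSTYPE prints only the filesystem column.
  lsblk_out="$(lsblk -n -o FSTYPE "$1")" || return 2
  # Anything left after removing whitespace is a filesystem signature.
  [ -z "$(printf '%s' "$lsblk_out" | tr -d '[:space:]')" ]
}

# Usage, with the device name used in this walkthrough:
#   is_raw_device /dev/vdb && echo "/dev/vdb has no filesystem"
```

A return code of 2 means `lsblk` itself failed, for example because the device does not exist.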
Admission Controller
Add the Rook Admission Controller. This step installs cert-manager first:
```shell
kubectl apply -f https://github.com/jetstack/cert-manager/releases/download/v1.7.1/cert-manager.yaml
```
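Install or update the LVM package
```shell
sudo yum install -y lvm2
# On Debian/Ubuntu hosts the equivalent is: sudo apt-get install -y lvm2
```
Clone the Rook repository and create the operator and the cluster:
```shell
git clone https://github.com/rook/rook.git
cd rook/deploy/examples
kubectl create -f crds.yaml -f common.yaml -f operator.yaml
kubectl create -f cluster.yaml
```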
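Before continuing, it helps to confirm that the operator is actually up. This is a minimal sketch that assumes the namespace `rook-ceph` and the pod label `app=rook-ceph-operator` used by the example manifests; the function name is hypothetical.

```shell
# Hypothetical helper: wait until the Rook operator pod is Ready, then list
# every pod in the rook-ceph namespace.
# Assumes the namespace and label from Rook's example manifests.
wait_for_rook_operator() {
  kubectl -n rook-ceph wait --for=condition=Ready pod \
    -l app=rook-ceph-operator --timeout="${1:-300s}" || return 1
  kubectl -n rook-ceph get pods
}

# Usage:
#   wait_for_rook_operator        # default timeout of 300s
#   wait_for_rook_operator 600s   # custom timeout
```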
Because we use CephFS, we need the StorageClass under csi/cephfs/. Before that, we also need to create the filesystem from the examples directory.
```shell
kubectl create -f filesystem.yaml
kubectl create -f filesystem-ec.yaml
```
```shell
cd /root/rook/deploy/examples/csi/cephfs/
kubectl create -f storageclass.yaml
kubectl create -f storageclass-ec.yaml
```
Checking the StorageClass
```shell
kubectl get sc
```
```
root@iZ0xi8e6m9i2dxn2mfu8tzZ:~
NAME                             PROVISIONER                       RECLAIMPOLICY   VOLUMEBINDINGMODE      ALLOWVOLUMEEXPANSION   AGE
alibabacloud-cnfs-nas            nasplugin.csi.alibabacloud.com    Delete          Immediate              true                   9d
alicloud-disk-efficiency         diskplugin.csi.alibabacloud.com   Delete          Immediate              true                   9d
alicloud-disk-essd               diskplugin.csi.alibabacloud.com   Delete          Immediate              true                   9d
alicloud-disk-ssd                diskplugin.csi.alibabacloud.com   Delete          Immediate              true                   9d
alicloud-disk-topology-alltype   diskplugin.csi.alibabacloud.com   Delete          WaitForFirstConsumer   true                   9d
rook-cephfs                      rook-ceph.cephfs.csi.ceph.com     Delete          Immediate              true                   9d
```
Once rook-cephfs has been made the default StorageClass, it is listed with (default):
```
root@k8s-manage:~
NAME                             PROVISIONER                       RECLAIMPOLICY   VOLUMEBINDINGMODE      ALLOWVOLUMEEXPANSION   AGE
alibabacloud-cnfs-nas            nasplugin.csi.alibabacloud.com    Delete          Immediate              true                   9d
alicloud-disk-efficiency         diskplugin.csi.alibabacloud.com   Delete          Immediate              true                   9d
alicloud-disk-essd               diskplugin.csi.alibabacloud.com   Delete          Immediate              true                   9d
alicloud-disk-ssd                diskplugin.csi.alibabacloud.com   Delete          Immediate              true                   9d
alicloud-disk-topology-alltype   diskplugin.csi.alibabacloud.com   Delete          WaitForFirstConsumer   true                   9d
rook-cephfs (default)            rook-ceph.cephfs.csi.ceph.com     Delete          Immediate              true                   9d
```
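The command that turned rook-cephfs into the default class is not shown in this walkthrough. One common way, sketched here as an assumption rather than as what was actually run, is to set the standard `storageclass.kubernetes.io/is-default-class` annotation with `kubectl patch`. The function name is hypothetical.

```shell
# Hypothetical helper: mark the given StorageClass as the cluster default by
# setting the standard is-default-class annotation.
set_default_storageclass() {
  kubectl patch storageclass "$1" -p \
    '{"metadata":{"annotations":{"storageclass.kubernetes.io/is-default-class":"true"}}}'
}

# Usage:
#   set_default_storageclass rook-cephfs
#   kubectl get sc   # rook-cephfs should now show "(default)"
```

If another class already carries the annotation, set it to "false" there first so that only one default remains.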
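Sample PVC deployment:
```shell
kubectl create -f kube-registry.yaml
```
```shell
kubectl get pv
kubectl get pvc
```
If you use Alibaba Cloud, you can also find these in the web console under Persistent Volumes and Persistent Volume Claims.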
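Install the Toolbox so that you can log in and inspect the state of Ceph.
```shell
kubectl create -f deploy/examples/toolbox.yaml
```
Inside the toolbox pod, the following commands report the cluster status:
```shell
ceph status
ceph osd status
ceph df
rados df
```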
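The status commands can also be run without opening an interactive shell. This small wrapper is a sketch that assumes the toolbox Deployment is named `rook-ceph-tools` in the `rook-ceph` namespace, as in the `kubectl exec` command used later in this post; the wrapper name is made up.

```shell
# Hypothetical wrapper: run any ceph subcommand inside the toolbox pod.
# Assumes the toolbox Deployment is rook-ceph-tools in namespace rook-ceph.
ceph_in_toolbox() {
  kubectl -n rook-ceph exec deploy/rook-ceph-tools -- ceph "$@"
}

# Usage:
#   ceph_in_toolbox status
#   ceph_in_toolbox osd status
#   ceph_in_toolbox df
```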
Web-based management: Ceph Dashboard
```
root@k8s-ceph:~
NAME                         TYPE           CLUSTER-IP   EXTERNAL-IP   PORT(S)             AGE
rook-ceph-exporter           ClusterIP                   <none>        9926/TCP            12d
rook-ceph-mgr                ClusterIP                   <none>        9283/TCP            12d
rook-ceph-mgr-dashboard      ClusterIP                   <none>        8443/TCP            8m7s
rook-ceph-mon-a              ClusterIP                   <none>        6789/TCP,3300/TCP   19h
rook-ceph-mon-c              ClusterIP                   <none>        6789/TCP,3300/TCP   12d
rook-ceph-mon-d              ClusterIP                   <none>        6789/TCP,3300/TCP   23h
```
After a LoadBalancer service is added for the dashboard, rook-ceph-mgr-dashboard-lb appears in the list:
```
root@k8s-ceph:~
NAME                         TYPE           CLUSTER-IP   EXTERNAL-IP   PORT(S)             AGE
rook-ceph-exporter           ClusterIP                   <none>        9926/TCP            12d
rook-ceph-mgr                ClusterIP                   <none>        9283/TCP            12d
rook-ceph-mgr-dashboard      ClusterIP                   <none>        8443/TCP            8m7s
rook-ceph-mgr-dashboard-lb   LoadBalancer                              8443:31474/TCP      21m
rook-ceph-mon-a              ClusterIP                   <none>        6789/TCP,3300/TCP   19h
rook-ceph-mon-c              ClusterIP                   <none>        6789/TCP,3300/TCP   12d
rook-ceph-mon-d              ClusterIP                   <none>        6789/TCP,3300/TCP   23h
```
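How the rook-ceph-mgr-dashboard-lb service was created is not shown here. One possible way, offered as an assumption, is `kubectl expose`, which copies the selector of the existing rook-ceph-mgr-dashboard service into a new LoadBalancer service. The function name is hypothetical; the service names and port come from the listings in this post.

```shell
# Hypothetical helper: create a LoadBalancer service that reuses the selector
# of the existing ClusterIP dashboard service.
expose_dashboard_lb() {
  kubectl -n rook-ceph expose service rook-ceph-mgr-dashboard \
    --type=LoadBalancer --name=rook-ceph-mgr-dashboard-lb \
    --port=8443 --target-port=8443
}

# Usage:
#   expose_dashboard_lb
#   kubectl -n rook-ceph get svc rook-ceph-mgr-dashboard-lb
```

On a cloud provider the external address may take a minute or two to be assigned.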
The username is admin. Retrieve the password as follows:
```shell
kubectl -n rook-ceph get secret rook-ceph-dashboard-password -o jsonpath="{['data']['password']}" | base64 --decode && echo
```
To change the replica count of the CephFS data pool, open a shell in the toolbox pod, list the pools, and set the size:
```shell
kubectl -n rook-ceph exec -it deploy/rook-ceph-tools -- bash
bash-5.1$ ceph osd pool ls
.mgr
myfs-metadata
myfs-replicated
bash-5.1$ ceph osd pool set myfs-replicated size 4
```
Test application 1.1: nginx deployment
The kind: PersistentVolumeClaim section can also be split out into its own YAML file.
```shell
kubectl apply -f nginx-with-pvc.yaml
```
```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: cephfs-pvc-nginx
  namespace: default
spec:
  accessModes:
    - ReadWriteMany
  resources:
    requests:
      storage: 2Gi
  storageClassName: rook-cephfs
---
apiVersion: apps/v1
kind: Deployment
metadata:
  annotations:
    deployment.kubernetes.io/revision: "1"
  generation: 1
  labels:
    app: nginx
  name: nginx
  namespace: default
spec:
  progressDeadlineSeconds: 600
  replicas: 2
  revisionHistoryLimit: 10
  selector:
    matchLabels:
      app: nginx
  strategy:
    rollingUpdate:
      maxSurge: 25%
      maxUnavailable: 50%
    type: RollingUpdate
  template:
    metadata:
      creationTimestamp: null
      labels:
        app: nginx
    spec:
      containers:
        - image: nginx
          imagePullPolicy: Always
          name: nginx
          volumeMounts:
            - name: nginx-html
              mountPath: /usr/share/nginx/html/
          ports:
            - containerPort: 80
              protocol: TCP
          resources: {}
          terminationMessagePath: /dev/termination-log
          terminationMessagePolicy: File
      volumes:
        - name: nginx-html
          persistentVolumeClaim:
            claimName: cephfs-pvc-nginx
            readOnly: false
      dnsPolicy: ClusterFirst
      restartPolicy: Always
      schedulerName: default-scheduler
      securityContext: {}
      terminationGracePeriodSeconds: 30
```
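Test application 1.2: nginx loading nginx.conf from a ConfigMap
If we enter the pod with kubectl exec -it <pod-name> -- /bin/bash and run ls /etc/nginx/, we can see nginx.conf. But that file lives inside the image, so how do we load a customized nginx.conf?
First, create nginx-configMap.yaml from the output of cat /etc/nginx/nginx.conf. The content of nginx.conf has to be indented two extra levels so that it is valid inside the YAML block.
Next, add the text This is Test to the file (for example as a comment) so that we can verify whether the ConfigMap was loaded.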
```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: nginx-config
  namespace: default
data:
  nginx.conf: |+
    user  nginx;
    worker_processes  auto;

    error_log  /var/log/nginx/error.log notice;
    pid        /var/run/nginx.pid;

    events {
        worker_connections  1024;
    }

    http {
        include       /etc/nginx/mime.types;
        default_type  application/octet-stream;

        log_format  main  '$remote_addr - $remote_user [$time_local] "$request" '
                          '$status $body_bytes_sent "$http_referer" '
                          '"$http_user_agent" "$http_x_forwarded_for"';

        access_log  /var/log/nginx/access.log  main;

        sendfile        on;

        keepalive_timeout  65;

        include /etc/nginx/conf.d/*.conf;
    }
```
```shell
kubectl apply -f nginx-configMap.yaml
```
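Indenting nginx.conf by hand is easy to get wrong, so the ConfigMap can also be generated. The sketch below reads an nginx.conf on standard input and prints a manifest with the same name and namespace as above; the function name `make_nginx_configmap` is made up for illustration.

```shell
# Hypothetical helper: wrap an nginx.conf read from stdin in a ConfigMap
# manifest, indenting every non-empty line by four spaces.
make_nginx_configmap() {
  cat <<'EOF'
apiVersion: v1
kind: ConfigMap
metadata:
  name: nginx-config
  namespace: default
data:
  nginx.conf: |
EOF
  # "&" re-inserts the matched first character, so empty lines stay empty.
  sed 's/^./    &/'
}

# Usage:
#   make_nginx_configmap < nginx.conf > nginx-configMap.yaml
#   kubectl apply -f nginx-configMap.yaml
```

If no manifest file is needed, `kubectl create configmap nginx-config --from-file=nginx.conf` builds the same kind of object directly from the file.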
After applying the ConfigMap, modify deployment.yaml: add the ConfigMap entries to volumeMounts under spec.template.spec.containers and to volumes under spec.template.spec. The result looks like this:
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  annotations:
    deployment.kubernetes.io/revision: "1"
  generation: 1
  labels:
    app: nginx
  name: nginx
  namespace: default
spec:
  progressDeadlineSeconds: 600
  replicas: 2
  revisionHistoryLimit: 10
  selector:
    matchLabels:
      app: nginx
  strategy:
    rollingUpdate:
      maxSurge: 25%
      maxUnavailable: 50%
    type: RollingUpdate
  template:
    metadata:
      creationTimestamp: null
      labels:
        app: nginx
    spec:
      containers:
        - image: nginx
          imagePullPolicy: Always
          name: nginx
          volumeMounts:
            - name: nginx-html
              mountPath: /usr/share/nginx/html/
            - name: nginx-config-vol
              mountPath: /etc/nginx/nginx.conf
              subPath: nginx.conf
          ports:
            - containerPort: 80
              protocol: TCP
          resources: {}
          terminationMessagePath: /dev/termination-log
          terminationMessagePolicy: File
      volumes:
        - name: nginx-html
          persistentVolumeClaim:
            claimName: cephfs-pvc-nginx
            readOnly: false
        - name: nginx-config-vol
          configMap:
            name: nginx-config
      dnsPolicy: ClusterFirst
      restartPolicy: Always
      schedulerName: default-scheduler
      securityContext: {}
      terminationGracePeriodSeconds: 30
```
Afterwards, enter the pod and check whether the file now contains This is Test.
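The same check can be scripted instead of done interactively. This sketch assumes the Deployment is named `nginx` in the current namespace, as in the manifest above; the function name is hypothetical.

```shell
# Hypothetical helper: succeeds when the marker text is present in the
# nginx.conf that is mounted inside a pod of the nginx Deployment.
configmap_marker_present() {
  kubectl exec deploy/nginx -- grep -q 'This is Test' /etc/nginx/nginx.conf
}

# Usage:
#   configmap_marker_present && echo "ConfigMap is mounted"
```

Because the file is mounted with subPath, running pods do not pick up later ConfigMap edits on their own; restart the Deployment after changing the ConfigMap.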
Test application 1.3: nginx loading the config files under conf.d from a ConfigMap
```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: nginx-config-server
  namespace: default
data:
  server1.conf: |+
    server {
        listen       80;
        listen  [::]:80;
        server_name  localhost;

        location / {
            root   /usr/share/nginx/html;
            index  index.html index.htm;
        }

        error_page   500 502 503 504  /50x.html;
        location = /50x.html {
            root   /usr/share/nginx/html;
        }
    }
```
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  annotations:
    deployment.kubernetes.io/revision: "1"
  generation: 1
  labels:
    app: nginx
  name: nginx
  namespace: default
spec:
  progressDeadlineSeconds: 600
  replicas: 2
  revisionHistoryLimit: 10
  selector:
    matchLabels:
      app: nginx
  strategy:
    rollingUpdate:
      maxSurge: 25%
      maxUnavailable: 50%
    type: RollingUpdate
  template:
    metadata:
      creationTimestamp: null
      labels:
        app: nginx
    spec:
      containers:
        - image: nginx
          imagePullPolicy: Always
          name: nginx
          volumeMounts:
            - name: nginx-html
              mountPath: /usr/share/nginx/html/
            - name: nginx-config-vol
              mountPath: /etc/nginx/nginx.conf
              subPath: nginx.conf
            - name: nginx-config-server-vol
              mountPath: /etc/nginx/conf.d/
          ports:
            - containerPort: 80
              protocol: TCP
          resources: {}
          terminationMessagePath: /dev/termination-log
          terminationMessagePolicy: File
      volumes:
        - name: nginx-html
          persistentVolumeClaim:
            claimName: cephfs-pvc-nginx
            readOnly: false
        - name: nginx-config-vol
          configMap:
            name: nginx-config
        - name: nginx-config-server-vol
          configMap:
            name: nginx-config-server
      dnsPolicy: ClusterFirst
      restartPolicy: Always
      schedulerName: default-scheduler
      securityContext: {}
      terminationGracePeriodSeconds: 30
```
Test application 1.4: exposing nginx through a Service
```yaml
apiVersion: v1
kind: Service
metadata:
  name: nginx-svc
spec:
  type: NodePort
  selector:
    app: nginx
  ports:
    - port: 80
      targetPort: 80
```
If you run on a cloud product, the LoadBalancer type is recommended, and it is easiest to configure it directly in the web console. Remember to set the selector (app: nginx).
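With a NodePort service, Kubernetes picks the external port, so it has to be looked up before testing. A minimal sketch, assuming the service is named `nginx-svc` in the current namespace as in the manifest above; the function name is hypothetical and `<node-ip>` is a placeholder.

```shell
# Hypothetical helper: print the node port that was allocated to nginx-svc.
nginx_node_port() {
  kubectl get svc nginx-svc -o jsonpath='{.spec.ports[0].nodePort}'
}

# Usage (replace <node-ip> with the address of any cluster node):
#   curl "http://<node-ip>:$(nginx_node_port)/"
```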
Test application 2: alpine
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: alpine-deployment
spec:
  replicas: 2
  selector:
    matchLabels:
      app: alpine
  strategy:
    rollingUpdate:
      maxSurge: 25%
      maxUnavailable: 50%
  template:
    metadata:
      labels:
        app: alpine
    spec:
      containers:
        - name: alpine
          image: alpine:latest
          command: ["/bin/sh"]
          args: ["-c", "while true; do echo \"$(date): hello\" >> /mnt/alpine/datetime.log; sleep 15; done"]
          volumeMounts:
            - name: dataval1
              mountPath: /mnt/alpine
          resources:
            requests:
              memory: "64Mi"
              cpu: "250m"
            limits:
              memory: "128Mi"
              cpu: "500m"
      volumes:
        - name: dataval1
          persistentVolumeClaim:
            claimName: cephfs-pvc-nginx
```
If the PVC does not exist yet, create a separate YAML file for it, for example pvc-nginx.yaml. To grow the volume later, edit the storage value in that file and run kubectl apply -f pvc-nginx.yaml again.
```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: cephfs-pvc-nginx
  namespace: default
spec:
  accessModes:
    - ReadWriteMany
  resources:
    requests:
      storage: 2Gi
  storageClassName: rook-cephfs
```
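Growing the claim can also be done with a patch instead of editing the file. This is a sketch under the assumption that the StorageClass allows expansion (the listing earlier shows ALLOWVOLUMEEXPANSION true for rook-cephfs); both function names and the 5Gi value are illustrative.

```shell
# Hypothetical helper: build the JSON patch body for a new requested size.
pvc_resize_patch() {
  printf '{"spec":{"resources":{"requests":{"storage":"%s"}}}}' "$1"
}

# Hypothetical helper: apply the patch to a PVC in the default namespace.
# $1 = PVC name, $2 = new size (must be larger than the current one).
resize_pvc() {
  kubectl -n default patch pvc "$1" -p "$(pvc_resize_patch "$2")"
}

# Usage:
#   resize_pvc cephfs-pvc-nginx 5Gi
#   kubectl get pvc cephfs-pvc-nginx
```

A PVC can only grow; a request for a smaller size is rejected by the API server.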
First run kubectl get pods to get the pod name, then open a shell inside the pod.
```shell
kubectl exec -it alpine-deployment-59b86bb64-hndhq -- sh
```
There we can, for example, run touch 1.txt or vi 1.txt.
```yaml
apiVersion: v1
kind: Pod
metadata:
  name: nginx-bash-pod
spec:
  restartPolicy: OnFailure
  containers:
    - command: ["/bin/bash"]
      args: ["-c", "sleep 365d; exit 0"]
      image: nginx
      imagePullPolicy: IfNotPresent
      name: nginx-bash-pod
      volumeMounts:
        - name: dataval0
          mountPath: /mnt/data
  volumes:
    - name: dataval0
      persistentVolumeClaim:
        claimName: cephfs-pvc-nginx
```
We can enter this pod in the same way, but here only touch is available (vi is not), so try for example touch 2.txt.
```shell
kubectl exec -it nginx-bash-pod -- bash
```
An ordinary nginx deployment pod can be entered with the same kind of command:
```shell
kubectl exec -it nginx-deployment-pod01 -- /bin/bash
```
Because CephFS is shared storage, it does not matter whether the volume is mounted at /mnt/data or at /mnt/alpine: both 1.txt and 2.txt end up in the same CephFS data pool and are visible from either mount.
Besides reading and writing with vi and touch from different applications, you can also confirm that the volume is mounted with ls -l and df -h.
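To see the sharing at a glance, both mounts can be listed in one go. The sketch below assumes the pod `nginx-bash-pod` and the Deployment `alpine-deployment` from the manifests above are running in the current namespace; the function name is made up.

```shell
# Hypothetical helper: list the shared CephFS volume as seen from both
# workloads; the file lists should be identical.
list_shared_files() {
  echo "== nginx-bash-pod:/mnt/data =="
  kubectl exec nginx-bash-pod -- ls -l /mnt/data
  echo "== alpine-deployment:/mnt/alpine =="
  kubectl exec deploy/alpine-deployment -- ls -l /mnt/alpine
}

# Usage:
#   list_shared_files
```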
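Building on the earlier nginx test application, we can also edit the server1.conf entry in the ConfigMap and change the default root /usr/share/nginx/html to /usr/share/nginx/html/cfswww.
We can then make a visible change to index.html, such as adding an h1 heading like Test 1, and restart the deployment so that the pods pick up the new configuration:
```shell
kubectl rollout restart deployment <deployment-name> -n <namespace>
```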
We found that with 2 nodes in AZ A and 1 node in AZ B, having only a single node in AZ B causes pods to hang in ContainerCreating.
```
$ kubectl describe pod alpine-deployment-6fc7446598-982mr
... ...
Events:
  Type     Reason       Age                  From               Message
  ----     ------       ----                 ----               -------
  Normal   Scheduled    11m                  default-scheduler  Successfully assigned default/alpine-deployment-6fc7446598-982mr to us-east-
  Warning  FailedMount  78s (x4 over 9m26s)  kubelet            MountVolume.SetUp failed for volume "pvc-ed5a9231-9d49-4091-9c7b-a3b3924f5f67" : rpc error: code = DeadlineExceeded desc = context deadline exceeded
```
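A word of warning: if you forcibly take down a node and then delete it, some workloads will still try to schedule onto the deleted node. Exporting the pod to YAML (kubectl get pod -o yaml), or listing its owners with
```shell
kubectl get rs -n <namespace>
kubectl get deploy -n <namespace>
```
and then describing them, exporting them to YAML, and re-applying does not help either.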
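We can configure Topology Spread Constraints to achieve high availability across availability zones. Pod Anti-Affinity is another way to get the same result.
Alibaba Cloud ACK documentation - Node pool high availability with topology spread constraints
```yaml
topologySpreadConstraints:
  - maxSkew: 1
    topologyKey: "topology.kubernetes.io/zone"
    whenUnsatisfiable: DoNotSchedule
```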
The node a pod runs on can be found with describe pod, but checking pods one by one is tedious. The following lists all of them at once:
```shell
kubectl get pods -o custom-columns='NAME:.metadata.name,NODE:.spec.nodeName'

kubectl get pods -n rook-ceph -o custom-columns='NAME:.metadata.name,NODE:.spec.nodeName'
```
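When there are many pods, a per-node count is easier to read than the raw list. This is a small sketch that post-processes the two-column output of the commands above; the function name `pods_per_node` is made up.

```shell
# Hypothetical helper: read "NAME NODE" rows on stdin (header on line 1) and
# print how many pods run on each node, sorted by node name.
pods_per_node() {
  awk 'NR > 1 { count[$2]++ } END { for (n in count) print n, count[n] }' | sort
}

# Usage:
#   kubectl get pods -n rook-ceph \
#     -o custom-columns='NAME:.metadata.name,NODE:.spec.nodeName' | pods_per_node
```

Pods that are not scheduled yet show up under `<none>`. To map nodes to zones, `kubectl get nodes -L topology.kubernetes.io/zone` adds the zone label as a column.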
For pods that are already running, there is no need to go through kubectl edit deploy -n rook-ceph or to export the object to YAML and apply it again; a patch is enough. For example, csi-cephfsplugin-provisioner has only two replicas by default:
```shell
kubectl -n rook-ceph patch deployment csi-cephfsplugin-provisioner --patch '{
  "spec": {
    "template": {
      "spec": {
        "topologySpreadConstraints": [
          {
            "maxSkew": 1,
            "topologyKey": "topology.kubernetes.io/zone",
            "whenUnsatisfiable": "DoNotSchedule"
          }
        ]
      }
    }
  }
}'
```
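The constraint above carries no labelSelector. According to the Kubernetes documentation, the pods counted in each topology domain are the ones that match the constraint's labelSelector, so a constraint without one may not spread anything. The sketch below builds a patch that includes a selector. The function name is hypothetical, and the label `app=csi-cephfsplugin-provisioner` is an assumption; check the real labels with `kubectl -n rook-ceph get pods --show-labels` first.

```shell
# Hypothetical helper: print a patch that adds a zone spread constraint with
# a labelSelector for the given value of the "app" label.
spread_patch() {
  cat <<EOF
{"spec":{"template":{"spec":{"topologySpreadConstraints":[{
  "maxSkew": 1,
  "topologyKey": "topology.kubernetes.io/zone",
  "whenUnsatisfiable": "DoNotSchedule",
  "labelSelector": {"matchLabels": {"app": "$1"}}
}]}}}}
EOF
}

# Usage (the label value is an assumption, verify it before applying):
#   kubectl -n rook-ceph patch deployment csi-cephfsplugin-provisioner \
#     --patch "$(spread_patch csi-cephfsplugin-provisioner)"
```

Keep in mind that the operator owning this Deployment may overwrite manual changes when it reconciles.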
In practice, Pod Anti-Affinity felt more effective.
References for implementing Pod Anti-Affinity:
Alibaba Cloud ACK documentation - Configuring Pod anti-affinity
Kubernetes Multi AZ deployments using pod anti-affinity
If you follow the English article on multi-AZ deployments, note that the failure-domain.beta.kubernetes.io/zone label it uses has been deprecated since Kubernetes 1.17 and replaced by topology.kubernetes.io/zone.
So combine the ACK documentation with the Soft Pod Anti-Affinity example from the English article:
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: app-run-per-node
spec:
  replicas: 3
  selector:
    matchLabels:
      app: app-run-per-node
  template:
    metadata:
      labels:
        app: app-run-per-node
    spec:
      containers:
        - name: app-container
          image: app-image
      affinity:
        podAntiAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
            - podAffinityTerm:
                labelSelector:
                  matchExpressions:
                    - key: app
                      operator: In
                      values:
                        - app-run-per-node
                topologyKey: "topology.kubernetes.io/zone"
              weight: 100
```
However, each MDS pod has its own Deployment, so the MDS pods cannot be spread across availability zones this way. In that case you can use nodeAffinity to schedule a pod only into a specific availability zone. (It is worth reading through every Ceph YAML file before deploying; high availability appears to be configurable there.)
K8S Docs - Assign Pod to Node
```yaml
spec:
  affinity:

    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
          - matchExpressions:
              - key: topology.kubernetes.io/zone
                operator: In
                values:
                  - us-east-1a

    podAntiAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        - labelSelector:
            matchExpressions:
              - key: app
                operator: In
                values:
                  - rook-ceph-mds
          topologyKey: kubernetes.io/hostname
```