PI4 Stories¶
Raspberry Pi 4 cluster Series - Troubleshooting pod issues¶
PODs are stuck in terminating status¶
If you ever run into the situation where some pods get stuck in "terminating" status [1] for one reason or another (upgrade, delete, ...) and simply stay there, what can you do?
gdha@n1:~$ kubectl get pods -A
NAMESPACE NAME READY STATUS RESTARTS AGE
ingress-nginx ingress-nginx-admission-patch-w5dhz 0/1 Completed 0 107d
ingress-nginx ingress-nginx-admission-create-lgbdc 0/1 Completed 0 107d
monitoring grafana-544f695579-g246k 0/1 Terminating 0 3d20h
graphite graphite-0 0/1 Terminating 1 3d20h
monitoring node-exporter-dvzxs 2/2 Running 54 (17m ago) 105d
celsius celsius-779654f68d-d56q6 1/1 Running 24 (17m ago) 92d
ntopng ntopng-7d5d578657-9wj7g 1/1 Running 7 (17m ago) 6d23h
No panic, just do the following trick - force a termination without waiting on anything:
gdha@n1:~$ kubectl delete pod graphite-0 --grace-period=0 --force --namespace graphite
Warning: Immediate deletion does not wait for confirmation that the running resource has been terminated. The resource may continue to run on the cluster indefinitely.
pod "graphite-0" force deleted
And, there you go - one pod is gone:
gdha@n1:~$ kubectl get pods -A
NAMESPACE NAME READY STATUS RESTARTS AGE
ingress-nginx ingress-nginx-admission-patch-w5dhz 0/1 Completed 0 107d
ingress-nginx ingress-nginx-admission-create-lgbdc 0/1 Completed 0 107d
monitoring grafana-544f695579-g246k 0/1 Terminating 0 3d20h
monitoring node-exporter-dvzxs 2/2 Running 54 (33m ago) 105d
celsius celsius-779654f68d-d56q6 1/1 Running 24 (33m ago) 92d
ntopng ntopng-7d5d578657-9wj7g 1/1 Running 7 (33m ago) 6d23h
longhorn-system engine-image-ei-fc06c6fb-hml5v 1/1 Running 31 (33m ago) 107d
longhorn-system csi-resizer-847d645569-hncv2 1/1 Running 1 (33m ago) 3d14h
longhorn-system csi-snapshotter-59ff78b47b-jk6rm 1/1 Running 1 (33m ago) 3d14h
longhorn-system csi-attacher-689fdb475f-ttnzx 1/1 Running 2 (33m ago) 3d14h
longhorn-system csi-attacher-689fdb475f-rxblc 1/1 Running 2 (33m ago) 3d14h
longhorn-system csi-attacher-689fdb475f-x9zj7 1/1 Running 2 (33m ago) 3d14h
longhorn-system csi-provisioner-65dd48db7b-x8hgf 1/1 Running 2 (33m ago) 3d14h
longhorn-system csi-snapshotter-59ff78b47b-mpsk7 1/1 Running 1 (33m ago) 3d14h
celsius celsius-779654f68d-jbxg8 1/1 Running 1 (33m ago) 92d
longhorn-system csi-provisioner-65dd48db7b-r4dwc 1/1 Running 2 (33m ago) 3d14h
monitoring node-exporter-6gdxp 2/2 Running 2 (33m ago) 3d14h
celsius celsius-779654f68d-9lmq8 1/1 Running 1 (33m ago) 92d
longhorn-system csi-snapshotter-59ff78b47b-8jf5c 1/1 Running 1 (33m ago) 3d14h
monitoring node-exporter-2pt9t 2/2 Running 2 (33m ago) 105d
longhorn-system longhorn-conversion-webhook-c6f4bf8b6-jgjph 1/1 Running 1 (33m ago) 107d
longhorn-system csi-resizer-847d645569-lsk97 1/1 Running 2 (33m ago) 3d14h
celsius celsius-779654f68d-j6m6t 1/1 Running 1 (33m ago) 92d
kube-system traefik-59497f77c-gxj8z 1/1 Running 15 (33m ago) 6d23h
monitoring node-exporter-7m984 2/2 Running 2 (33m ago) 105d
celsius celsius-779654f68d-5vs8r 1/1 Running 1 (33m ago) 6d23h
cert-manager cert-manager-cainjector-5fcd49c96-ghvkt 1/1 Running 1 (33m ago) 107d
kube-system metallb-speaker-wvfzm 1/1 Running 1 (33m ago) 3d14h
longhorn-system csi-resizer-847d645569-cfsng 1/1 Running 2 (33m ago) 3d14h
kube-system local-path-provisioner-957fdf8bc-zwhws 1/1 Running 1 (33m ago) 3d16h
monitoring node-exporter-rbv7f 2/2 Running 2 (33m ago) 105d
longhorn-system engine-image-ei-fc06c6fb-k6ssm 1/1 Running 1 (33m ago) 3d14h
longhorn-system longhorn-conversion-webhook-c6f4bf8b6-q6qlc 1/1 Running 1 (33m ago) 6d23h
cert-manager cert-manager-6ffb79dfdb-gqlnm 1/1 Running 1 (33m ago) 107d
monitoring prometheus-operator-754546bd85-pmkcc 1/1 Running 1 (33m ago) 105d
kube-system metallb-speaker-6v4gs 1/1 Running 1 (33m ago) 107d
longhorn-system engine-image-ei-fc06c6fb-445sw 1/1 Running 1 (33m ago) 107d
longhorn-system engine-image-ei-fc06c6fb-8nwh8 1/1 Running 1 (33m ago) 107d
kube-system metallb-speaker-t2b5d 1/1 Running 1 (33m ago) 107d
longhorn-system longhorn-recovery-backend-786b877b6c-tfcg9 1/1 Running 1 (33m ago) 107d
kube-system coredns-77ccd57875-h9dfk 1/1 Running 1 (33m ago) 3d16h
longhorn-system longhorn-recovery-backend-786b877b6c-nr5r5 1/1 Running 28 (33m ago) 107d
logging loki-stack-promtail-qtvj5 1/1 Running 1 (33m ago) 97d
kube-system metallb-speaker-55mxm 1/1 Running 1 (33m ago) 107d
kube-system metallb-controller-5b89f7554c-8rhzv 1/1 Running 1 (33m ago) 107d
longhorn-system longhorn-ui-788dd6f547-zwjfl 1/1 Running 3 (32m ago) 107d
longhorn-system longhorn-ui-788dd6f547-fw5rq 1/1 Running 14 (32m ago) 6d23h
ingress-nginx ingress-nginx-controller-65dc77f88f-kh8sn 1/1 Running 1 (33m ago) 107d
cert-manager cert-manager-webhook-796ff7697b-ptmhq 1/1 Running 14 (32m ago) 6d23h
longhorn-system engine-image-ei-fc06c6fb-jpfwq 1/1 Running 1 (33m ago) 107d
kube-system metrics-server-7b67f64457-bk84t 1/1 Running 60 (32m ago) 107d
longhorn-system longhorn-admission-webhook-777c6cb845-fb6pl 1/1 Running 1 (33m ago) 107d
kube-system metallb-speaker-nhzj6 1/1 Running 65 (32m ago) 107d
monitoring kube-state-metrics-6f8578cffd-cmks6 1/1 Running 37 (32m ago) 105d
longhorn-system longhorn-admission-webhook-777c6cb845-snjmd 1/1 Running 1 107d
longhorn-system instance-manager-e-f7ebd11fee14cab3eb5a0c0884af9492 1/1 Running 0 32m
longhorn-system instance-manager-r-f7ebd11fee14cab3eb5a0c0884af9492 1/1 Running 0 32m
longhorn-system longhorn-manager-sp98j 1/1 Running 1 (33m ago) 107d
longhorn-system longhorn-manager-v6c78 1/1 Running 1 (33m ago) 107d
longhorn-system longhorn-manager-25889 1/1 Running 1 (33m ago) 3d14h
longhorn-system instance-manager-e-1f203b7579bacda190651c7f40a2cea4 1/1 Running 0 31m
longhorn-system longhorn-csi-plugin-28jqn 3/3 Running 5 (32m ago) 3d14h
longhorn-system instance-manager-r-11dfece26197f3792b18d4fa97734e04 1/1 Running 0 31m
longhorn-system longhorn-manager-hx647 1/1 Running 28 (33m ago) 107d
longhorn-system instance-manager-r-1f203b7579bacda190651c7f40a2cea4 1/1 Running 0 31m
longhorn-system instance-manager-e-1548f3f3f04de3f9fad47c6ce8941357 1/1 Running 0 31m
longhorn-system instance-manager-e-11dfece26197f3792b18d4fa97734e04 1/1 Running 0 31m
longhorn-system instance-manager-r-1548f3f3f04de3f9fad47c6ce8941357 1/1 Running 0 31m
longhorn-system longhorn-csi-plugin-bmzpb 3/3 Running 5 (32m ago) 3d14h
longhorn-system longhorn-driver-deployer-bf975c845-sxtsq 1/1 Running 2 (31m ago) 107d
longhorn-system longhorn-csi-plugin-fblxn 3/3 Running 5 (32m ago) 3d14h
longhorn-system longhorn-csi-plugin-dgtqs 3/3 Running 5 (32m ago) 3d14h
longhorn-system longhorn-csi-plugin-6dfv6 3/3 Running 5 (32m ago) 3d14h
longhorn-system instance-manager-r-07ae318e5cf6bed1004f251d3c792412 1/1 Running 0 31m
longhorn-system longhorn-manager-69zkg 1/1 Running 1 (33m ago) 107d
longhorn-system instance-manager-e-07ae318e5cf6bed1004f251d3c792412 1/1 Running 0 31m
longhorn-system csi-provisioner-65dd48db7b-k4lcb 1/1 Running 2 (31m ago) 3d14h
monitoring prometheus-prometheus-persistant-0 2/2 Running 0 29m
nfs-server nfs-server-f48df7cdc-2zkkq 1/1 Running 0 30m
monitoring grafana-544f695579-6jcfz 1/1 Running 0 30m
logging loki-stack-promtail-pcrjw 1/1 Running 25 (33m ago) 97d
logging loki-stack-0 1/1 Running 0 28m
logging loki-stack-promtail-9hls8 1/1 Running 1 (33m ago) 97d
logging loki-stack-promtail-zfpdt 1/1 Running 1 (33m ago) 3d14h
uptime-kuma uptime-kuma-0 1/1 Running 6 (28m ago) 3d14h
logging loki-stack-promtail-nh2m7 1/1 Running 1 (33m ago) 97d
graphite graphite-0 1/1 Running
Repeat these steps for the remaining pods which are stuck:
gdha@n1:~$ kubectl delete pod grafana-544f695579-g246k --grace-period=0 --force --namespace monitoring
Warning: Immediate deletion does not wait for confirmation that the running resource has been terminated. The resource may continue to run on the cluster indefinitely.
pod "grafana-544f695579-g246k" force deleted
PODs are stuck in ContainerCreating status¶
Sometimes we run into the situation where a pod is stuck in ContainerCreating status after restarting the cluster (e.g. after a reboot) - for example:
$ kubectl get pods -n graphite graphite-0 -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
graphite-0 0/1 ContainerCreating 0 23m <none> n5 <none> <none>
The reason is not obvious as the logs show only:
$ kubectl logs -n graphite graphite-0
Error from server (BadRequest): container "graphite" in pod "graphite-0" is waiting to start: ContainerCreating
We could suspect that it failed to pull a fresh copy of the image, but that always worked like a charm before.
Another way to dig a bit deeper to find the real issue is to use the command:
$ kubectl describe pod -n graphite graphite-0
Name: graphite-0
Namespace: graphite
Priority: 0
Service Account: default
Node: n5/192.168.0.205
Start Time: Wed, 12 Jul 2023 08:36:05 +0200
Labels: app.kubernetes.io/instance=graphite
app.kubernetes.io/name=graphite
controller-revision-hash=graphite-645d65f6fb
statefulset.kubernetes.io/pod-name=graphite-0
workload.user.cattle.io/workloadselector=statefulSet-graphite-graphite
Annotations: <none>
Status: Pending
IP:
IPs: <none>
Controlled By: StatefulSet/graphite
Containers:
graphite:
Container ID:
Image: ghcr.io/gdha/graphite:v1.2
Image ID:
Ports: 2003/TCP, 80/TCP
Host Ports: 0/TCP, 0/TCP
State: Waiting
Reason: ContainerCreating
Ready: False
Restart Count: 0
Limits:
memory: 512Mi
Requests:
memory: 400Mi
Liveness: http-get http://:liveness-port/metrics/find%3Fquery=%2A delay=0s timeout=2s period=2s #success=1 #failure=3
Readiness: http-get http://:liveness-port/metrics/find%3Fquery=%2A delay=0s timeout=2s period=2s #success=2 #failure=3
Startup: http-get http://:liveness-port/metrics/find%3Fquery=%2A delay=0s timeout=1s period=10s #success=1 #failure=30
Environment Variables from:
graphite Secret Optional: false
Environment: <none>
Mounts:
/opt/graphite/storage from graphite-storage (rw)
/var/log from bind-mount--logs (rw)
/var/run/secrets from graphite-secrets (rw)
/var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-kxvv5 (ro)
Conditions:
Type Status
Initialized True
Ready False
ContainersReady False
PodScheduled True
Volumes:
bind-mount--logs:
Type: HostPath (bare host directory volume)
Path: /app/util/graphite/logs
HostPathType: DirectoryOrCreate
graphite-secrets:
Type: Secret (a volume populated by a Secret)
SecretName: graphite
Optional: false
graphite-storage:
Type: PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
ClaimName: graphite
ReadOnly: false
kube-api-access-kxvv5:
Type: Projected (a volume that contains injected data from multiple sources)
TokenExpirationSeconds: 3607
ConfigMapName: kube-root-ca.crt
ConfigMapOptional: <nil>
DownwardAPI: true
QoS Class: Burstable
Node-Selectors: <none>
Tolerations: node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 30m default-scheduler Successfully assigned graphite/graphite-0 to n5
Warning FailedMount 7m17s (x19 over 29m) kubelet MountVolume.MountDevice failed for volume "pvc-14e6e723-0fa2-4b8f-8619-ae02f2b74cdc" : rpc error: code = Internal desc = mount failed: exit status 32
Mounting command: mount
Mounting arguments: -t ext4 -o defaults /dev/longhorn/pvc-14e6e723-0fa2-4b8f-8619-ae02f2b74cdc /var/lib/kubelet/plugins/kubernetes.io/csi/driver.longhorn.io/8a5f0348fbb5570ffccf61701ff8b83081168509f5904af61a35649c9f2cc7d9/globalmount
Output: mount: /var/lib/kubelet/plugins/kubernetes.io/csi/driver.longhorn.io/8a5f0348fbb5570ffccf61701ff8b83081168509f5904af61a35649c9f2cc7d9/globalmount: /dev/longhorn/pvc-14e6e723-0fa2-4b8f-8619-ae02f2b74cdc already mounted or mount point busy.
Warning FailedMount 3m15s (x12 over 28m) kubelet Unable to attach or mount volumes: unmounted volumes=[graphite-storage], unattached volumes=[], failed to process volumes=[]: timed out waiting for the condition
Okay, here we clearly see the error "/dev/longhorn/pvc-14e6e723-0fa2-4b8f-8619-ae02f2b74cdc already mounted or mount point busy".
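Before touching the pod it can help to confirm on the node itself (n5 in this example) what is keeping the device busy - assuming SSH access to the node, something like:
gdha@n5:~$ mount | grep pvc-14e6e723-0fa2-4b8f-8619-ae02f2b74cdc
gdha@n5:~$ findmnt -n -o TARGET,SOURCE | grep longhorn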
So, let us try the following:
$ kubectl delete pod -n graphite graphite-0
pod "graphite-0" deleted
$ kubectl get pods -n graphite graphite-0 -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
graphite-0 0/1 ContainerCreating 0 12m <none> n5 <none> <none>
Unfortunately, that did not fix it. So we removed the PVC and recreated it, but no luck. To see the latest events, use the command:
$ kubectl get events --all-namespaces --sort-by='.metadata.creationTimestamp'
NAMESPACE LAST SEEN TYPE REASON OBJECT MESSAGE
graphite 14m Warning FailedMount pod/graphite-0 MountVolume.MountDevice failed for volume "pvc-90ed5ccf-3152-4e0b-b795-cb985a7d83be" : rpc error: code = Internal desc = format of disk "/dev/longhorn/pvc-90ed5ccf-3152-4e0b-b795-cb985a7d83be" failed: type:("ext4") target:("/var/lib/kubelet/plugins/kubernetes.io/csi/driver.longhorn.io/d3ebefb7681c1a1f9472812339e5a4342cca0dee55d743334be457f6a9cf1fb3/globalmount") options:("defaults") errcode:(exit status 1) output:(mke2fs 1.46.4 (18-Aug-2021)...
graphite 5m8s Warning FailedMount pod/graphite-0 Unable to attach or mount volumes: unmounted volumes=[graphite-storage], unattached volumes=[], failed to process volumes=[]: timed out waiting for the condition
monitoring 36m Warning Unhealthy pod/grafana-544f695579-st45c Readiness probe failed: Get "http://10.42.4.117:3000/api/health": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
After some research we came across the page Troubleshooting: MountVolume.SetUp failed for volume due to multipathd on the node [2], which looked like the solution for our problem: after rebooting the complete cluster the graphite PVC was mounted correctly, but then the loki-stack PVC failed to mount with the exact same error. We can therefore conclude that there is nothing wrong with the PVCs themselves; it is an interaction clash with the multipath daemon.
We updated our k3s ansible playbook to also adapt the /etc/multipath.conf file so that /dev/sd* devices are blacklisted.
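For reference, the snippet that ends up in /etc/multipath.conf is essentially the one from the Longhorn knowledge base article [2] (a minimal example; adjust the regular expression if some /dev/sd* disks on your nodes should stay under multipath control):
blacklist {
    devnode "^sd[a-z0-9]+"
}
followed by a restart of the daemon with systemctl restart multipathd.service.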
POD stuck in ImagePullBackOff status¶
For example, when we tried to upgrade our ntopng image with helm we got an error like:
$ kubectl get pods -n ntopng
NAME READY STATUS RESTARTS AGE
ntopng-6586866d8b-w668c 0/1 ImagePullBackOff 0 69m
The first thing we think of is a mismatch of our GitHub Container Registry (ghcr) Personal Access Token (PAT). How can we verify that the PAT our Kubernetes cluster knows is still the same as the one listed in our ~/.ghcr-token file?
As we know in which namespace to look (in our case ntopng), we can start digging as follows:
$ kubectl get secret -n ntopng
NAME TYPE DATA AGE
ntopng-ghrc kubernetes.io/dockerconfigjson 1 67m
ntopng Opaque 0 67m
sh.helm.release.v1.ntopng.v1 helm.sh/release.v1 1 67m
Right, as we suspect the ntopng-ghrc secret, let's extract it from our cluster:
$ kubectl get secret -n ntopng ntopng-ghrc --output="jsonpath={.data.\.dockerconfigjson}" | base64 --decode
{"auths":{"ghcr.io":{"username":"gdha","password":"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","auth":"Z2RoYTpnaHBfdXlPNElwUnJ6TmpFc25JTnN3NEQ1dVZRVWdkYVpSNDR2NGM2"}}}
There is our PAT key: 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, and indeed it is an old one which we revoked a while ago.
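To compare just the password field against the token file without scrolling through the JSON, something like the following also works (assuming ~/.ghcr-token contains only the PAT and jq is installed):
$ diff <(kubectl get secret -n ntopng ntopng-ghrc --output="jsonpath={.data.\.dockerconfigjson}" | base64 --decode | jq -r '.auths."ghcr.io".password') ~/.ghcr-token
An empty diff means the cluster still holds the same token as the file; in our case it did not.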
To fix this we must update our ghcr-secret.yaml file, and especially its .dockerconfigjson data section. How? Well, as an example we take the revoked old ghcr PAT key:
$ kubectl create secret docker-registry dockerconfigjson-github-com --docker-server=ghcr.io --docker-username=gdha --docker-password=$(echo -n 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) --dry-run=client -oyaml
apiVersion: v1
data:
.dockerconfigjson: eyJhdXRocyI6eyJnaGNyLmlvIjp7InVzZXJuYW1lIjoiZ2RoYSIsInBhc3N3b3JkIjoiZ2hwX3V5TzRJcFJyek5qRXNuSU5zdzRENXVWUVVnZGFaUjQ0djRjNiIsImF1dGgiOiJaMlJvWVRwbmFIQmZkWGxQTkVsd1VuSjZUbXBGYzI1SlRuTjNORVExZFZaUlZXZGtZVnBTTkRSMk5HTTIifX19
kind: Secret
metadata:
creationTimestamp: null
name: dockerconfigjson-github-com
type: kubernetes.io/dockerconfigjson
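With a fresh PAT in hand, the same command can regenerate and apply the secret in one go - a sketch assuming the secret keeps its existing name ntopng-ghrc and that ~/.ghcr-token now holds the new token:
$ kubectl create secret docker-registry ntopng-ghrc \
    --docker-server=ghcr.io --docker-username=gdha \
    --docker-password=$(cat ~/.ghcr-token) \
    --namespace ntopng --dry-run=client -o yaml | kubectl apply -f -
Afterwards, deleting the pod that sits in ImagePullBackOff lets the deployment retry the image pull with the refreshed credentials.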
How to debug a POD¶
An interesting topic, as we stumbled over the pi4-ntopng POD which refused to start with this error:
Starting redis-server:
*** FATAL CONFIG FILE ERROR ***
Reading the configuration file, at line 171
>>> 'logfile /var/log/redis/redis-server.log'
Can't open the log file: Permission denied
failed
29/Sep/2023 15:30:47 [Redis.cpp:98] ERROR: Connection error [Connection refused]
29/Sep/2023 15:30:48 [Redis.cpp:80] Redis has disconnected, reconnecting [remaining attempts: 14]
Of course, the POD exits immediately. So, how can we debug our error? It might just be that there is more wrong than only the permission denied error... We found a good article [3] which gives some more insight on how to dig deeper.
We need to override the "entrypoint", as it behaves like init (PID 1) and therefore we cannot just pass a bash command to it. Execute the following:
$ docker run -it --entrypoint /bin/bash ghcr.io/gdha/pi4-ntopng:5.7.0
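If docker is not at hand on a workstation, the same entrypoint override can be done directly on the cluster with kubectl run - a sketch (the pod name debug-ntopng is just an example, and the default service account in the namespace needs the ghcr pull secret for the image pull to succeed):
$ kubectl run debug-ntopng -n ntopng -it --rm --restart=Never \
    --image=ghcr.io/gdha/pi4-ntopng:5.7.0 --command -- /bin/bash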
Either way, we are dropped into a shell from where we can peek around in the container to see what might be wrong. Indeed, as the error above mentioned, the ownership of /var/log/redis was wrong:
root@131f7e6411c5:/var/log# ls -l
total 692
-rw-r--r-- 1 root root 4901 Sep 29 15:13 alternatives.log
drwxr-xr-x 1 root root 4096 Sep 29 15:12 apt
-rw-r--r-- 1 root root 58584 Jan 26 2023 bootstrap.log
-rw-rw---- 1 root utmp 0 Jan 26 2023 btmp
-rw-r--r-- 1 root root 282997 Sep 29 15:14 dpkg.log
-rw-r--r-- 1 root root 32000 Sep 29 15:14 faillog
-rw-r--r-- 1 root root 484 Sep 29 15:12 fontconfig.log
drwxr-sr-x 2 root root 4096 Sep 29 15:11 journal
-rw-rw-r-- 1 root root 296000 Sep 29 15:14 lastlog
drwx------ 2 root root 4096 Sep 29 15:11 private
drwxr-s--- 1 root root 4096 Sep 29 15:12 redis
-rw-rw-r-- 1 root utmp 0 Jan 26 2023 wtmp
root@131f7e6411c5:/var/log# chown -R redis:redis /var/log/redis
However, as we are inside the container, let's try to start redis:
root@131f7e6411c5:~# /etc/init.d/redis-server start
Starting redis-server: failed
Hmm, not 100% okay it seems. However, as the redis log file should now be available, we can check it out:
root@131f7e6411c5:~# cat /var/log/redis/redis-server.log
19:C 02 Oct 2023 07:11:40.148 # Can't chdir to '/var/lib/redis': Permission denied
33:C 02 Oct 2023 07:13:22.884 # Can't chdir to '/var/lib/redis': Permission denied
69:C 02 Oct 2023 07:17:39.861 # Can't chdir to '/var/lib/redis': Permission denied
Indeed still another directory to be tackled:
root@131f7e6411c5:~# chown -R redis:redis /var/lib/redis
We can start redis again and see what it says:
root@131f7e6411c5:~# /usr/bin/redis-server
37:C 02 Oct 2023 07:15:29.235 # oO0OoO0OoO0Oo Redis is starting oO0OoO0OoO0Oo
37:C 02 Oct 2023 07:15:29.236 # Redis version=5.0.7, bits=64, commit=00000000, modified=0, pid=37, just started
37:C 02 Oct 2023 07:15:29.236 # Warning: no config file specified, using the default config. In order to specify a config file use /usr/bin/redis-server /path/to/redis.conf
_._
_.-``__ ''-._
_.-`` `. `_. ''-._ Redis 5.0.7 (00000000/0) 64 bit
.-`` .-```. ```\/ _.,_ ''-._
( ' , .-` | `, ) Running in standalone mode
|`-._`-...-` __...-.``-._|'` _.-'| Port: 6379
| `-._ `._ / _.-' | PID: 37
`-._ `-._ `-./ _.-' _.-'
|`-._`-._ `-.__.-' _.-'_.-'|
| `-._`-._ _.-'_.-' | http://redis.io
`-._ `-._`-.__.-'_.-' _.-'
|`-._`-._ `-.__.-' _.-'_.-'|
| `-._`-._ _.-'_.-' |
`-._ `-._`-.__.-'_.-' _.-'
`-._ `-.__.-' _.-'
`-._ _.-'
`-.__.-'
37:M 02 Oct 2023 07:15:29.258 # Server initialized
37:M 02 Oct 2023 07:15:29.258 # WARNING you have Transparent Huge Pages (THP) support enabled in your kernel. This will create latency and memory usage issues with Redis. To fix this issue run the command 'echo never > /sys/kernel/mm/transparent_hugepage/enabled' as root, and add it to your /etc/rc.local in order to retain the setting after a reboot. Redis must be restarted after THP is disabled.
37:M 02 Oct 2023 07:15:29.259 * Ready to accept connections
Yep, it seems to start, but the warning about transparent_hugepage is quite interesting and something we can take care of in our Dockerfile.
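Transparent Huge Pages is a kernel setting of the node rather than of the container, so besides the Dockerfile the fix ultimately has to land on the Raspberry Pi hosts themselves; the Redis warning spells it out, e.g. run as root on each node (or via the ansible playbook):
echo never > /sys/kernel/mm/transparent_hugepage/enabled
and add the same line to /etc/rc.local so the setting survives a reboot.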
References¶
[1] PODs are stuck in terminating status
[2] Troubleshooting: MountVolume.SetUp failed for volume due to multipathd on the node