Logging and Troubleshooting
Course Reading
Learning objectives
- Understand Kubernetes doesn't have integrated logging yet
- Learn what external tools are used to aggregate logs
- Learn basic troubleshooting flow
- Discuss the use of side-cars for in-pod logs
Overview
Because Kubernetes depends on API calls, it is sensitive to networking issues. Standard Linux tools can be the best for debugging issues on a cluster. If the pod you are trying to troubleshoot does not have a shell, you may need to deploy a sidecar container in that pod that has a shell to debug. DNS configuration tools like dig
are a good start, but as issues become more complex you may need to install more powerful tools like tcpdump
.
Large and diverse clusters have can become difficult to track, so it is essential to be able to monitor. By monitoring we mean being able to collect the most important metrics about cluster health, like CPU, memory, and disk usage, as well as network bandwidth. Above that, we may also want to monitor the key aspects of the application deployed in the cluster.
The Metric Server can be installed on the cluster to expose a standard API which can be consumed by other agents. Once installed the endpoint can be found on the cp node at /apis/metrics/k8s.io/
.
Logging the activity across all nodes is also not something currently built into Kubernetes. For this, something like Fluentd is useful to implement a unified logging layer on the cluster. Having an aggregated logging layer can allow for better visualization of a potential issue and gives the ability to globally search logs from the cluster. This is a good place to start looking if networking issues aren't the root cause.
Another options is Prometheus, which combines logging, metrics, and alerting capabilities. Prometheus also integrates well with dashboarding tools like Grafana.
There are also a handful of very useful commands withing kubectl
to help in troubleshooting issues across the cluster.
Basic troubleshooting steps
Troubleshoot should start with the obvious. If there are errors coming from the command line, those should be addressed first. The symptoms of the issue will often determine the next steps if troubleshooting continues past the first things checked. Often it is a good idea to debug from the running container as long as it has a shell that can be accessed, typically by running
ubuntu@cp:~ $ kubectl exec -it <pod_name> -- /bin/sh
If the pod is running, it is good to check the logs with
ubuntu@cp:~ $ kubectl logs <pod_name>
This will give the standard output of the running container. If you have no logs, you may want to deploy a sidecar to generate and handle logs.
The next step would usually be networking, which would include DNS firewalls, and general connectivity. This is done with the traditional Linux tools to debug networking issues.
If networking was not the root cause, then security may be the next thing to check. RBAC (which is discussed later), SELinux, and AppArmor are all common issues, especially with network-centric applications.
Newer Kubernetes features allow for auditing on the kube-apiserver, which is useful for viewing actions after API calls have been accepted.
In summary, the general troubleshooting flow is:
- Errors in the command line
- Pod logs and status
- Network troubleshooting in the pod's container
- Checking node logs to ensure enough resources are available
- Security (RBAC, SELinux, AppArmor)
- API calls to and from the API
- Enable kube-apiserver auditing
- Networking issues between nodes (DNS and firewall)
- cp node controllers (controlling pods in a pending or errored state)
Ephemeral containers
Added in v1.16 (beta as of v1.23, and stable as of v1.25), Kubernetes allow for a container to be added to an already running pod. This allows us to deploy a fully-featured container to an existing pod so we do not need to terminate and re-create. This is helpful, especially if the issue is intermittent or difficult to reproduce.
Ephemeral container are added via the ephemeralcontainers
handler via API call, not via the podSpec. Therefore, kubectl edit
cannot be used.
Instead you could use a kubectl attach
to join an existing process in the container. This may be helpful instead of a kubectl exec
. The functionality depend on the process being attached to.
You can deploy a new ephemeral container with kubectl debug
.
Cluster start sequence
Cluster startup beings with systemd (when originally built with kubeadm
, may use another method for other initialization tools). You can use systemctl status kubelet.service
to see the current state and which configuration files are used to run the kubelet. The service uses the /etc/systemd/kubelet.service.d/10-kubeadm.conf
to start-up the kubelet.
The service status shows /var/lib/kubelet/config.yaml
as the configuration file use to set certain setting of the kubelet binary. One important setting is the staticPodPath
which specifies where the kubelet will ready every YAML file to start a pod. By default it is set to /etc/kubernetes/manifests/
. Adding a yaml file here can be useful for troubleshooting the scheduler, as any request to the scheduler should create that pod.
Four default YAMLs start the base pods necessary for the kube-apiserver, etcd, kube-controller-manager, and kube-scheduler.
Once these four are up and running, the rest of the configured objects are created.
Monitoring
Monitoring can be summarized as collecting metrics from the infrastructure and applications running on it.
Kubernetes used to use the now depreciated Heapster. Instead, Kubernetes now has an integrated metrics server, which once set up, exposes a standard API for other agent to leverage. Custom metrics can also be exposed and used. Once example might be an autoscaler using a custom metric to determine when a new node needs to be scaled.
Tools like Prometheus, OpenTelemetry, and Jaegar are often used for these use cases.
Using krew
kubectl
has the ability to be extended with plugins. Plugins cannot override existing kubectl
commands or add subcommands to them. Declaration of namespace or container must come after the plugin name, for example:
ubuntu@cp:~ $ kubectl sniff some-pod -c primary-container -n different-namespace
One of the best ways to discover and install new plugins is with krew
, which is a plugin manager for kubectl
. You can find out more about installing it on the krew
Github repo.
Once installed you could run kubectl krew help
to view the documentation on how to use krew
.
More info about extending kubectl
can be found in the official docs.
Once krew
is installed, make sure to add it to your PATH
ubuntu@cp:~ $ export PATH="${KREW_ROOT:-$HOME/.krew}/bin:$PATH"
Subcommand | Usage |
---|---|
help | Explain usage of rest of commands |
info | Shows info about kubectl plugin specified |
install | Add new plugin to kubectl |
list | Show installed plugins |
search | Search for new plugins |
uninstall | Uninstall an existing plugin |
update | Update local plugin index |
upgrade | Upgrades local plugin index to newer version |
version | Get version info for krew |
Example: Sniffing traffic with Wireshark
Cluster network traffic is encrypted which makes debugging potential networking issues more challenging. The sniff
plugin for kubectl
allows you to view traffic from within. It requires Wireshark to view graphical info.
sniff
uses the first container found unless specified with the -c
option. Here's an example of its use:
ubuntu@cp:~ $ kubectl sniff <pod name> -c <container name>
Logging tools
Logging, like monitoring, is a vast topic with many available tools.
Typically logs will be collected and aggregated locally before being ingested into some search engine. Many software stacks exist for this purpose, with the most common being the ELK stack (Elasticsearch, Logstash, Kibana).
The kubelet writes logs to local files, which the kubectl log
command retrieves. For cluster-wide log aggregation, something like `Fluentd can be used. The official Kubernetes docs also describe how to architect a logging solution.
Additional troubleshooting resources
Lab Exercises
Lab 13.1 - Review log file locations
If using a systemd based cluster, you can view the node level kubelet logs with journalctl
. Each node will have different logs as they are specific to each node.
ubuntu@cp:~ $ journalctl -u kubelet | less
Many major processes for the cluster are containerized. An example of finding and then viewing those logs for the kube-apiserver would be:
ubuntu@cp:~ $ sudo find / -name "*apiserver*log"
ubuntu@cp:~ $ sudo less <path to log file>
You could do something simlar to find logs for things like the kube-proxy, CoreDNS, the kube-scheduler, and other controllers.
If you are not running a systemd cluster and logs aren't collected via systemd, then you can view the logs in the following locations on the cp node:
/var/log/kube-apiserver.log
for kube-apiserver logs/var/log/kube-scheduler.log
for kube-scheduler logs/var/log/kube-controller-manager.log
for kube-controller-manager logs/var/log/containers/
is the directory for any running container logs/var/log/pods/
is the directory for any running pod logs
On the worker node, you would also be able to see /var/log/kubelet.log
and /var/log/kube-proxy.log
.
13.2 - Viewing log output
To view logs from running pods you can also use kubectl
like so:
ubuntu@cp:~ $ kubectl get pods --all-namespaces
ubuntu@cp:~ $ kubectl -n <namespace of pod> logs <pod name from desired ns>
13.3 - Adding tools for monitoring and metrics
Start by cloning the metrics-server repo and viewing the README
ubuntu@cp:~ $ git clone https://github.com/kubernetes-sigs/metrics-server.git
ubuntu@cp:~ $ cd metrics-server
ubuntu@cp:metrics-server $ cat README.md
ubuntu@cp:metrics-server $ cd ..
Once you've read the README, install the required objects and verify they are there. Replace latest
in the URL with a specific version if you need an older version.
ubuntu@cp:~ $ kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml
ubuntu@cp:~ $ kubectl -n kube-system get pods
Now edit the deployment to allow for insecure TLS. Do this by adding - --kubelet-insecure-tls
to the end of the container args. You may also need to add - --kubelet-preferred-address-types=InternalIP,ExternalIP,Hostname
as well. Once edited, make sure the pod is still running and not showing any errors in the logs.
ubuntu@cp:~ $ kubectl -n kube-system edit deployment metrics-server
ubuntu@cp:~ $ kubectl -n kube-system get pods
ubuntu@cp:~ $ kubectl -n kube-system logs <metrics-server pod name>
Now check that the metrics for the nods and pods. Sometimes it may take a few minutes for the metrics to populate and not throw an error.
ubuntu@cp:~ $ sleep 120
ubuntu@cp:~ $ kubectl top pods --all-namespaces
ubuntu@cp:~ $ kubectl top nodes
Configure dashboard
Firstly, it should be noted the dashboard is not necessarily as fully featured as the CLI, as those who develop the tooling typically use the CLI instead.
Search the artifact hub for the kubernetes-dashboard
chart from helm
. Edit the values.yaml
file and change the service type to a NodePort
. Then install the chart with the name dashboard
.
ubuntu@cp:~ $ helm repo add kubernetes-dashboard https://kubernetes.github.io/dashboard/
ubuntu@cp:~ $ helm fetch kubernetes-dashboard/kubernetes-dashboard --untar
ubuntu@cp:~ $ cd kubernetes-dashboard
ubuntu@cp:kubernetes-dashboard $ vim values.yaml
ubuntu@cp:kubernetes-dashboard $ helm install dashboard .
ubuntu@cp:kubernetes-dashboard $ cd ..
We will still need to give the dashboard admin access to the rest of the cluster. In normal production cases, you'd probably not want to do this. First we'll find the right securityaccount
(more about these later), then create a clusterrolebinding
.
ubuntu@cp:~ $ kubectl get serviceaccount
ubuntu@cp:~ $ kubectl create clusterrolebinding dashaccess --clusterrole=cluster-admin --serviceaccount=default:dashboard-kubernetes-dashboard
Now navigate the the dashboard in your browser using the public IP of the server and the high number port (you can find this with a kubectl get svc
).
Select the token sign-in option, you can get this from the dashboard token. Once logged in, explore the dashboard's features.
ubuntu@cp:~ $ kubectl describe secret dashboard-kubernetes-dashboard-token-<unique id>
Knowledge check
- Kubernetes does not have built-in, cluster-wide logging
- If a container does not generate logs, you could use a sidecar container to generate and handle pod logging