Using Admission Controllers to Detect Container Drift at Runtime
Author: Saifuding Diliyaer (Box)
At Box, we use Kubernetes (K8s) to manage hundreds of microservices that enable Box to stream data at petabyte scale. When it comes to the deployment process, we run
To solve this problem, we developed our own K8s component called
Admission control for interactive kubectl commands

Once a request is sent to K8s, it needs to be authenticated and authorized by the API server to proceed. Additionally, K8s has a separate layer of protection called admission controllers, which can intercept the request before an object is persisted in etcd. There are various predefined admission controls compiled into the API server binary (e.g. ResourceQuota to enforce hard resource usage limits per namespace). There are also two dynamic admission controls named MutatingAdmissionWebhook and ValidatingAdmissionWebhook, used for mutating or validating K8s requests respectively. The latter is what we adopted to detect container drift at runtime caused by interactive kubectl commands. This whole process can be divided into three steps, as explained in detail below.

1. Admit interactive kubectl command requests

First of all, we needed to enable a validating webhook that sends qualified requests to kube-exec-controller. To add the new validation mechanism applying to interactive kubectl commands specifically, we configured the webhook's rules with resources as [pods/exec, pods/attach], and operations as CONNECT. These rules tell the cluster's API server that all exec and attach requests should be subject to our admission control webhook. In the ValidatingAdmissionWebhook that we configured, we specified a service reference (could also be replaced with url that gives the location of the webhook) and caBundle to allow validating its X.509 certificate, both under the clientConfig stanza.

Here is a short example of what our ValidatingWebhookConfiguration object looks like:

    apiVersion: admissionregistration.k8s.io/v1
    kind: ValidatingWebhookConfiguration
    metadata:
      name: example-validating-webhook-config
    webhooks:
      - name: validate-pod-interaction.example.com
        sideEffects: None
        rules:
          - apiGroups: ["*"]
            apiVersions: ["*"]
            operations: ["CONNECT"]
            resources: ["pods/exec", "pods/attach"]
        failurePolicy: Fail
        clientConfig:
          service:
            # reference to kube-exec-controller service deployed inside the K8s cluster
            name: example-service
            namespace: kube-exec-controller
            path: "/admit-pod-interaction"
          caBundle: "{{VALUE}}" # PEM encoded CA bundle to validate kube-exec-controller's certificate
        admissionReviewVersions: ["v1", "v1beta1"]

2. Label the target Pod with potentially mutated containers

Once a kubectl exec request comes in, kube-exec-controller makes an internal note to label the associated Pod. The added labels mean that we can not only query all the affected Pods, but also enable the security mechanism to retrieve previously identified Pods, in case the controller service itself gets restarted.

The admission control process cannot directly modify the targeted Pod in its admission response. This is because the pods/exec request is against a subresource of the Pod API, and the API kind for that subresource is PodExecOptions. As a result, there is a separate process in kube-exec-controller that patches the labels asynchronously. The admission control always permits the exec request, then acts as a client of the K8s API to label the target Pod and to log related events. Developers can check whether their Pods are affected or not using kubectl or similar tools. For example:

    $ kubectl get pod --show-labels
    NAME       READY   STATUS    RESTARTS   AGE   LABELS
    test-pod   1/1     Running   0          2s    box.com/podInitialInteractionTimestamp=1632524400,box.com/podInteractorUsername=username-1,box.com/podTTLDuration=1h0m0s

    $ kubectl describe pod test-pod
    ...
    Events:
    Type      Reason           Age   From                           Message
    ----      ------           ----  ----                           -------
    Warning   PodInteraction   5s    admission-controller-service   Pod was interacted with 'kubectl exec' command by user 'username-1' initially at time 2021-09-24 16:00:00 -0800 PST
    Warning   PodInteraction   5s    admission-controller-service   Pod will be evicted at time 2021-09-24 17:00:00 -0800 PST (in about 1h0m0s).

3. Evict the target Pod after a predefined period

As you can see in the above event messages, the affected Pod is not evicted immediately. At times, developers may need to get into their running containers to debug live issues. Therefore, we define a time to live (TTL) for affected Pods based on the environment of the cluster they are running in. In particular, we allow a longer time in our dev clusters, as it is more common to run kubectl exec or other interactive commands for active development. For our production clusters, we specify a lower time limit so that impacted Pods do not keep serving traffic indefinitely. The kube-exec-controller internally sets and tracks a timer for each Pod that matches the associated TTL. Once the timer is up, the controller evicts that Pod using the K8s API.

The controller evicts the Pod, rather than deleting it, to ensure service availability, since the cluster respects any configured PodDisruptionBudget (PDB). For example, if a user has defined x Pods as critical in their PDB, the eviction (as requested by kube-exec-controller) does not proceed when it would leave the target workload with fewer than x Pods running.

[Figure: sequence diagram of the entire workflow described above]

A new kubectl plugin for better user experience

Our admission controller component works great for solving the container drift issue we had on the platform. It is also able to submit all related Events to the target Pod that has been affected. However, K8s clusters don't retain Events very long (the default retention period is one hour). We need to provide other ways for developers to get their Pod interaction activity. A kubectl plugin is a perfect choice for us to expose this information. We named our plugin kubectl pi (short for pod-interaction) and provide two subcommands: get and extend.

When the get subcommand is called, the plugin checks the metadata attached by our admission controller and translates it into human-readable information. Here is an example output from running kubectl pi get:

    $ kubectl pi get test-pod
    POD-NAME   INTERACTOR   POD-TTL   EXTENSION   EXTENSION-REQUESTER   EVICTION-TIME
    test-pod   username-1   1h0m0s    /           /                     2021-09-24 17:00:00 -0800 PST

The plugin can also be used to extend the TTL for a Pod that is marked for future eviction. This is useful in case developers need extra time to debug ongoing issues. To achieve this, a developer uses the kubectl pi extend subcommand, where the plugin patches the relevant annotations for the given Pod. These annotations include the duration and the username of whoever made the extension request, for transparency (both are displayed in the table returned from the kubectl pi get command). Correspondingly, there is another webhook defined in kube-exec-controller which admits valid annotation updates. Once admitted, those updates reset the eviction timer of the target Pod as requested. An example of requesting the extension from the developer side would be:

    $ kubectl pi extend test-pod --duration=30m
    Successfully extended the termination time of pod/test-pod with a duration=30m

    $ kubectl pi get test-pod
    POD-NAME   INTERACTOR   POD-TTL   EXTENSION   EXTENSION-REQUESTER   EVICTION-TIME
    test-pod   username-1   1h0m0s    30m         username-2            2021-09-24 17:30:00 -0800 PST

Future improvement

Although our admission controller service works great in handling interactive requests to a Pod, it may also evict a Pod even when the commands in these requests are no-ops. For instance, developers sometimes run kubectl exec merely to check their service logs stored on hosts. Nevertheless, the target Pods would still get bounced despite the state of their containers not changing at all. One of the improvements here could be adding the ability to distinguish the commands that are passed to the interactive requests, so that no-op commands do not always force a Pod eviction. However, this becomes challenging when developers get a shell to a running container and execute commands inside the shell, since those commands are no longer visible to our admission controller service.

Another item worth pointing out here is the choice of using K8s labels and annotations. In our design, we decided to have all immutable metadata attached as labels, to better enforce immutability in our admission control. Yet some of this metadata could fit better as annotations. For instance, we had a label with the key box.com/podInitialInteractionTimestamp used to list all affected Pods in kube-exec-controller code, although its value is unlikely to ever be queried. As a more ideal design in the K8s world, a single label could be preferable in our case for identification, with the other metadata applied as annotations instead.

Summary

With the power of admission controllers, we are able to secure our K8s clusters by detecting potentially mutated containers at runtime, and evicting their Pods without affecting service availability. We also use kubectl plugins to make the eviction time flexible, giving service owners a better, more self-service experience. We are proud to announce that we have open-sourced the whole project for the community to leverage in their own K8s clusters. Any contribution is more than welcome and appreciated. You can find this project hosted on GitHub at

Special thanks to Ayush Sobti and Ethan Goldblum for their technical guidance on this project.
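
To make the three steps described in this article more concrete, here is a minimal Python sketch of the decision logic only. It is not the kube-exec-controller implementation. It assumes the AdmissionReview v1 request and response shapes (uid, userInfo.username, allowed), reuses the label keys shown in the example output, and uses hypothetical names (TTL_BY_ENV, review_interaction, eviction_time) and hypothetical TTL values. It also assumes an extension is simply added on top of the original TTL, which matches the example table but is not confirmed by the text. Patching labels and calling the eviction API are left out.

```python
from datetime import datetime, timedelta, timezone
from typing import Tuple

# Hypothetical TTLs: the article only says dev clusters get a longer TTL than production.
TTL_BY_ENV = {"dev": timedelta(hours=12), "prod": timedelta(hours=1)}


def format_duration(d: timedelta) -> str:
    """Render a whole-second duration in the 1h0m0s style shown in the Pod labels."""
    total = int(d.total_seconds())
    hours, rest = divmod(total, 3600)
    minutes, seconds = divmod(rest, 60)
    if hours:
        return f"{hours}h{minutes}m{seconds}s"
    if minutes:
        return f"{minutes}m{seconds}s"
    return f"{seconds}s"


def review_interaction(review: dict, env: str, now: datetime) -> Tuple[dict, dict]:
    """Always admit the request; return the response and the labels to patch later."""
    request = review["request"]
    response = {
        "apiVersion": "admission.k8s.io/v1",
        "kind": "AdmissionReview",
        # The response must echo the request uid; the exec request is never denied.
        "response": {"uid": request["uid"], "allowed": True},
    }
    labels = {
        "box.com/podInitialInteractionTimestamp": str(int(now.timestamp())),
        "box.com/podInteractorUsername": request["userInfo"]["username"],
        "box.com/podTTLDuration": format_duration(TTL_BY_ENV[env]),
    }
    return response, labels


def eviction_time(first_seen: datetime, ttl: timedelta,
                  extension: timedelta = timedelta(0)) -> datetime:
    """Assumed rule: eviction is due once the TTL plus any extension has elapsed."""
    return first_seen + ttl + extension


if __name__ == "__main__":
    review = {"request": {"uid": "example-uid", "name": "test-pod",
                          "subResource": "exec", "operation": "CONNECT",
                          "userInfo": {"username": "username-1"}}}
    now = datetime(2021, 9, 24, 23, 0, tzinfo=timezone.utc)
    response, labels = review_interaction(review, "prod", now)
    print(response["response"], labels)
    print(eviction_time(now, TTL_BY_ENV["prod"], timedelta(minutes=30)))
```

Keeping the admit decision separate from the labeling mirrors the point made in step 2: the webhook response cannot modify the Pod, so the labels are only computed here and would be applied afterwards by a separate client of the K8s API.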