Hardening Kubernetes

The previous section cataloged the variety of security challenges facing developers and administrators who deploy and maintain Kubernetes clusters. In this section, we will home in on the design aspects, mechanisms, and features Kubernetes offers to address some of those challenges. You can get to a pretty good state of security by judicious use of capabilities such as service accounts, network policies, authentication, authorization, admission control, AppArmor, and secrets.

Remember that a Kubernetes cluster is one part of a bigger system that includes other software systems, people, and processes. Kubernetes can't solve all problems. You should always keep in mind general security principles, such as defense in depth, a need-to-know basis, and the principle of least privilege. In addition, log everything you think may be useful in the event of an attack, and set up alerts for early detection when the system deviates from its expected state. It may be just a bug or it may be an attack. Either way, you want to know about it and respond.

Understanding service accounts in Kubernetes

Kubernetes has regular users that are managed outside the cluster for humans connecting to the cluster (for example, via the kubectl command), and it has service accounts.

Regular user accounts are global and can access multiple namespaces in the cluster. Service accounts are constrained to one namespace. This is important. It ensures namespace isolation, because whenever the API server receives a request from a pod, its credentials will apply only to its own namespace.

Kubernetes manages service accounts on behalf of the pods. Whenever Kubernetes instantiates a pod, it assigns the pod a service account. The service account identifies all the pod processes when they interact with the API server. Each service account has a set of credentials mounted in a secret volume. Each namespace has a default service account called default. When you create a pod, it is automatically assigned the default service account unless you specify a different service account.

You can create additional service accounts. Create a file called custom-service-account.yaml with the following content:

apiVersion: v1
kind: ServiceAccount
metadata:
  name: custom-service-account

Now type the following:

$ kubectl create -f custom-service-account.yaml
serviceaccount/custom-service-account created

Here is the service account listed alongside the default service account:

$ kubectl get serviceAccounts
NAME                                    SECRETS   AGE
custom-service-account                  1         39s
default                                 1         18d

Note that a secret was created automatically for your new service account.

To get more detail, type the following:

$ kubectl get serviceAccounts/custom-service-account -o yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  creationTimestamp: "2020-06-01T01:24:24Z"
  name: custom-service-account
  namespace: default
  resourceVersion: "654316"
  selfLink: /api/v1/namespaces/default/serviceaccounts/custom-service-account
  uid: 69393e47-c3b2-11e9-bb43-0242ac130002
secrets:
- name: custom-service-account-token-kdwhs

You can see the secret itself, which includes a ca.crt file and a token, by typing the following:

$ kubectl get secret custom-service-account-token-kdwhs -o yaml
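
To have a pod use the new service account instead of the default one, specify it in the pod spec via the serviceAccountName field. Here is a minimal sketch (the pod name and image are just placeholders):

apiVersion: v1
kind: Pod
metadata:
  name: custom-sa-pod
spec:
  serviceAccountName: custom-service-account
  containers:
  - name: main
    image: busybox
    command: ["sleep", "3600"]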

How does Kubernetes manage service accounts?

The API server has a dedicated component called the service account admission controller. It is responsible for checking, at pod creation time, whether the pod specifies a custom service account and, if it does, that the custom service account exists. If no service account is specified, it assigns the default service account.

It also ensures the pod has ImagePullSecrets, which are necessary when images need to be pulled from a remote image registry. If the pod spec doesn't specify any ImagePullSecrets, it uses the service account's ImagePullSecrets.

Finally, it adds a volume containing an API token for API access, mounted at /var/run/secrets/kubernetes.io/serviceaccount.
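
For example, assuming you have any running pod, you can list the mounted credentials (these are the files Kubernetes mounts by default):

$ kubectl exec <pod-name> -- ls /var/run/secrets/kubernetes.io/serviceaccount
ca.crt
namespace
token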

The API token is created and added to the secret by another component called the Token Controller whenever a service account is created. The Token Controller also monitors secrets and adds or removes tokens whenever secrets are added to or removed from a service account.

The service account controller ensures the default service account exists for every namespace.

Accessing the API server

Accessing the API server requires a chain of steps that include authentication, authorization, and admission control. At each stage, the request may be rejected. Each stage consists of multiple plugins that are chained together.

The following diagram illustrates this:

Figure 4.1: Accessing the API server

Authenticating users

When you first create the cluster, some keys and certificates are created for you to authenticate against the cluster. Kubectl uses them to authenticate itself to the API server and vice versa over TLS (an encrypted HTTPS connection). You can view your configuration using this command:

$ kubectl config view
apiVersion: v1
clusters:
- cluster:
    certificate-authority-data: DATA+OMITTED
    server: https://localhost:6443
  name: default
contexts:
- context:
    cluster: default
    user: default
  name: default
current-context: default
kind: Config
preferences: {}
users:
- name: default
  user:
    password: DATA+OMITTED
    username: admin

This is the configuration for a k3d cluster. It may look different for other types of clusters.

Note that if multiple users need to access the cluster, the creator should provide the necessary client certificates and keys to the other users in a secure manner.

This is just establishing basic trust with the Kubernetes API server itself. You're not authenticated yet. Various authentication modules may look at the request and check for various additional client certificates, passwords, bearer tokens, and JWT tokens (for service accounts). Most requests require an authenticated user (either a regular user or a service account), although there are some anonymous requests too. If a request fails to authenticate with all the authenticators it will be rejected with a 401 HTTP status code (unauthorized, which is a bit of a misnomer).

The cluster administrator determines what authentication strategies to use by providing various command-line arguments to the API server:

  • --client-ca-file= (for x509 client certificates specified in a file)
  • --token-auth-file= (for bearer tokens specified in a file; a sample of the file format follows this list)
  • --basic-auth-file= (for user/password pairs specified in a file)
  • --enable-bootstrap-token-auth (for bootstrap tokens used by kubeadm)
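
As a sketch of the static token file format used with --token-auth-file, each line is a comma-separated record of token, username, user uid, and an optional quoted list of groups (all values here are hypothetical):

31ada4fd-adec-460c-809a-9e56ceb75269,alice,1001,"developers,admins"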

Service accounts use an automatically loaded authentication plugin. The administrator may provide two optional flags:

  • --service-account-key-file= (A PEM-encoded key for signing bearer tokens. If unspecified, the API server's TLS private key will be used.)
  • --service-account-lookup (If enabled, tokens that are deleted from the API will be revoked.)

There are several other methods, such as OpenID Connect, webhooks, Keystone (the OpenStack identity service), and an authenticating proxy. The main theme is that the authentication stage is extensible and can support any authentication mechanism.

The various authentication plugins will examine the request and, based on the provided credentials, will associate the following attributes:

  • username (a user-friendly name)
  • uid (a unique identifier and more consistent than the username)
  • groups (a set of group names the user belongs to)
  • extra fields (these map string keys to string values)

In Kubernetes 1.11, kubectl gained the ability to use credential plugins to receive an opaque token from a provider such as an organizational LDAP server. These credentials are sent by kubectl to the API server that typically uses a webhook token authenticator to authenticate the credentials and accept the request.
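
For illustration, a kubeconfig user entry that delegates to such a credential plugin might look like the following sketch (the plugin binary name and arguments are placeholders for whatever your identity provider supplies):

users:
- name: ldap-user
  user:
    exec:
      apiVersion: client.authentication.k8s.io/v1beta1
      command: get-ldap-token
      args:
      - --server=ldap.example.org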

The authenticators have no knowledge whatsoever of what a particular user is allowed to do. They just map a set of credentials to a set of identities. The authenticators run in an unspecified order; the first authenticator to accept the passed credentials will associate an identity with the incoming request and the authentication is considered successful. If all authenticators reject the credentials then authentication failed.

Impersonation

It is possible for users to impersonate different users (with proper authorization). For example, an admin may want to troubleshoot some issue as a different user with fewer privileges. This requires passing impersonation headers to the API request. The headers are as follows:

  • Impersonate-User: The username to act as.
  • Impersonate-Group: A group name to act as. Can be provided multiple times to set multiple groups. Optional. Requires Impersonate-User.
  • Impersonate-Extra-(extra name): A dynamic header used to associate extra fields with the user. Optional. Requires Impersonate-User.

With kubectl, you pass --as and --as-group parameters.
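
For example, to see what a hypothetical user jack in the developers group would see, you could run:

$ kubectl get deployments --as jack --as-group developers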

Authorizing requests

Once a user is authenticated, authorization commences. Kubernetes has generic authorization semantics. A set of authorization modules receives the request, which includes information such as the authenticated username and the request's verb (list, get, watch, create, and so on). Unlike authentication, all authorization plugins will get a shot at any request. If a single authorization plugin rejects the request or no plugin had an opinion then it will be rejected with a 403 HTTP status code (forbidden). A request will continue only if at least one plugin accepts it and no other plugin rejected it.

The cluster administrator determines what authorization plugins to use by specifying the --authorization-mode command-line flag, which is a comma-separated list of plugin names.

The following modes are supported:

  • --authorization-mode=AlwaysDeny rejects all requests. This is useful during testing.
  • --authorization-mode=AlwaysAllow allows all requests. Use if you don't need authorization. This is useful during testing.
  • --authorization-mode=ABAC allows for a simple, local file-based, user-configured authorization policy. ABAC stands for Attribute-Based Access Control.
  • --authorization-mode=RBAC is a role-based mechanism where authorization policies are stored and driven by the Kubernetes API. RBAC stands for Role-Based Access Control.
  • --authorization-mode=Node is a special mode designed to authorize API requests made by kubelets.
  • --authorization-mode=Webhook allows for authorization to be driven by a remote service using REST.

You can add your own custom authorization plugin by implementing the following straightforward Go interface:

type Authorizer interface {
  Authorize(a Attributes) (authorized bool, reason string, err error)
}

The Attributes input argument is also an interface that provides all the information you need to make an authorization decision:

type Attributes interface {
  GetUser() user.Info
  GetVerb() string
  IsReadOnly() bool
  GetNamespace() string
  GetResource() string
  GetSubresource() string
  GetName() string
  GetAPIGroup() string
  GetAPIVersion() string
  IsResourceRequest() bool
  GetPath() string
}

You can find the source code at https://github.com/kubernetes/apiserver/blob/master/pkg/authorization/authorizer/interfaces.go.

Using the kubectl auth can-i command, you can check what actions you can perform, and even do so while impersonating other users:

$ kubectl auth can-i create deployments
yes
$ kubectl auth can-i create deployments --as jack
no

Using admission control plugins

OK. The request was authenticated and authorized, but there is one more step before it can be executed. The request must go through a gauntlet of admission-control plugins. Similar to the authorizers, if a single admission controller rejects a request, it is denied.

Admission controllers are a neat concept. The idea is that there may be global cluster concerns that could be grounds for rejecting a request. Without admission controllers, all authorizers would have to be aware of these concerns and reject the request. But, with admission controllers, this logic can be performed once. In addition, an admission controller may modify the request. Admission controllers run in either validating mode or mutating mode. As usual, the cluster administrator decides which admission control plugins run by providing a command-line argument called admission-control. The value is a comma-separated and ordered list of plugins. Here is the list of recommended plugins for Kubernetes >= 1.9 (the order matters):

--admission-control=NamespaceLifecycle,LimitRanger,ServiceAccount,PersistentVolumeLabel,DefaultStorageClass,MutatingAdmissionWebhook,ValidatingAdmissionWebhook,ResourceQuota,DefaultTolerationSeconds

Let's look at some of the available plugins (more are added all the time):

  • DefaultStorageClass: Adds a default storage class to requests for the creation of a PersistentVolumeClaim that doesn't specify a storage class.
  • DefaultTolerationSeconds: Sets the default toleration of pods for taints (if not set already): notready:NoExecute and notreachable:NoExecute.
  • EventRateLimit: Limits flooding of the API server with events (new in Kubernetes 1.9).
  • ExtendedResourceToleration: Combines taints on nodes that have special resources, such as GPUs and Field Programmable Gate Arrays (FPGAs), with tolerations on pods that request those resources. The end result is that the nodes with the extra resources will be dedicated to pods with the proper toleration.
  • ImagePolicyWebhook: This complicated plugin connects to an external backend to decide whether a request should be rejected based on the image.
  • LimitPodHardAntiAffinityTopology: Denies any pod that defines an AntiAffinity topology key other than kubernetes.io/hostname in requiredDuringSchedulingRequiredDuringExecution.
  • LimitRanger: Rejects requests that violate resource limits.
  • MutatingAdmissionWebhook: Calls registered mutating webhooks that are able to modify their target object. Note that there is no guarantee that the change will be effective due to potential changes by other mutating webhooks.
  • NamespaceAutoProvision: Creates the namespace in the request if it doesn't exist already.
  • NamespaceLifecycle: Rejects object creation requests in namespaces that are in the process of being terminated or don't exist.
  • PodSecurityPolicy: Rejects a request if the request security context doesn't conform to pod security policies.
  • ResourceQuota: Rejects requests that violate the namespace's resource quota.
  • ServiceAccount: Automation for service accounts.
  • ValidatingAdmissionWebhook: This admission controller calls any validating webhooks that match the request. Matching webhooks are called in parallel; if any of them rejects the request, the request fails.

As you can see, the admission control plugins have very diverse functionality. They support namespace-wide policies and enforce the validity of requests, mostly from the resource management and security points of view. This frees up the authorization plugins to focus on valid operations. ImagePolicyWebhook is the gateway to validating images, which is a big challenge. MutatingAdmissionWebhook and ValidatingAdmissionWebhook are the gateways to dynamic admission control, where you can deploy your own admission controller without compiling it into Kubernetes. Dynamic admission control is suitable for tasks like semantic validation of resources (do all pods have the standard set of labels?).

The division of responsibility for validating an incoming request through the separate stages of authentication, authorization, and admission, each with its own plugins, makes a complicated process much easier to understand and use.

The mutating admission controllers provide a lot of flexibility and the ability to automatically enforce certain policies without burdening the users (for example, creating a namespace automatically if it doesn't exist).
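
As a sketch of dynamic admission control, here is a hypothetical ValidatingWebhookConfiguration that sends pod creation requests to a webhook service for label validation (the webhook name, service, and path are placeholders; a real setup also needs a caBundle or a cluster-trusted certificate for the service):

apiVersion: admissionregistration.k8s.io/v1
kind: ValidatingWebhookConfiguration
metadata:
  name: require-standard-labels
webhooks:
- name: labels.example.com
  admissionReviewVersions: ["v1"]
  sideEffects: None
  failurePolicy: Fail
  rules:
  - apiGroups: [""]
    apiVersions: ["v1"]
    operations: ["CREATE"]
    resources: ["pods"]
  clientConfig:
    service:
      namespace: admission
      name: label-checker
      path: /validate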

Securing pods

Pod security is a major concern, since Kubernetes schedules the pods and lets them run. There are several independent mechanisms for securing pods and containers. Together these mechanisms support defense in depth, where, even if an attacker (or a mistake) bypasses one mechanism, it will get blocked by another.

Using a private image repository

This approach gives you a lot of confidence that your cluster will only pull images that you have previously vetted, and you can manage upgrades better. You can configure $HOME/.docker/config.json on each node. But, on many cloud providers, you can't do this because nodes are provisioned automatically for you.

ImagePullSecrets

This approach is recommended for clusters on cloud providers. The idea is that the credentials for the registry will be provided by the pod, so it doesn't matter what node it is scheduled to run on. This circumvents the problem with .dockercfg at the node level.

First, you need to create a secret object for the credentials:

$ kubectl create secret docker-registry the-registry-secret \
  --docker-server=<docker registry server> \
  --docker-username=<username> \
  --docker-password=<password> \
  --docker-email=<email>
secret/the-registry-secret created

You can create secrets for multiple registries (or multiple users for the same registry) if needed. The kubelet will combine all ImagePullSecrets.

But, since pods can access secrets only in their own namespace, you must create a secret on each namespace where you want the pod to run.

Once the secret is defined, you can add it to the pod spec and run some pods on your cluster. The pod will use the credentials from the secret to pull images from the target image registry:

apiVersion: v1
kind: Pod
metadata:
  name: cool-pod
  namespace: the-namespace
spec:
  containers:
    - name: cool-container
      image: cool/app:v1
  imagePullSecrets:
    - name: the-registry-secret
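
Alternatively, you can attach the image pull secret to the namespace's default service account (or any service account your pods use), so pods in that namespace don't need to specify imagePullSecrets explicitly. A sketch:

$ kubectl -n the-namespace patch serviceaccount default \
  -p '{"imagePullSecrets": [{"name": "the-registry-secret"}]}'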

Specifying a security context

A security context is a set of operating-system-level security settings such as UID, GID, capabilities, and SELinux role. These settings are applied at the container level as a container security context. You can specify a pod security context that will apply to all the containers in the pod. The pod security context can also apply its security settings (in particular, fsGroup and seLinuxOptions) to volumes.

Here is a sample pod security context:

apiVersion: v1
kind: Pod
metadata:
  name: hello-world
spec:
  containers:
    ...
  securityContext:
    fsGroup: 1234
    supplementalGroups: [5678]
    seLinuxOptions:
      level: 's0:c123,c456'

The container security context is applied to each container and it overrides the pod security context. It is embedded in the containers section of the pod manifest. Container context settings can't be applied to volumes, which remain at the pod level.

Here is a sample container security context:

apiVersion: v1
kind: Pod
metadata:
  name: hello-world
spec:
  containers:
    - name: hello-world-container
      # The container definition
      # ...
      securityContext:
        privileged: true
        seLinuxOptions:
          level: 's0:c123,c456'

Protecting your cluster with AppArmor

AppArmor is a Linux kernel security module. With AppArmor, you can restrict a process running in a container to a limited set of resources such as network access, Linux capabilities, and file permissions. You configure AppArmor through profiles.

Requirements

AppArmor support was added as beta in Kubernetes 1.4. It is not available for every operating system, so you must choose a supported OS distribution in order to take advantage of it. Ubuntu and SUSE Linux support AppArmor and enable it by default. Other distributions have optional support. To check if AppArmor is enabled, type the following:

cat /sys/module/apparmor/parameters/enabled
Y

If the result is Y then it's enabled.

The profile must be loaded into the kernel. Check the following file:

/sys/kernel/security/apparmor/profiles

Also, only the Docker runtime supports AppArmor at this time.

Securing a pod with AppArmor

Since AppArmor is still in beta, you specify the metadata as annotations and not as bona fide fields. When it gets out of beta, this will change.

To apply a profile to a container, add the following annotation:

container.apparmor.security.beta.kubernetes.io/<container-name>: <profile-ref>

The profile reference can be either the default runtime profile, runtime/default, or a profile loaded on the host, referenced as localhost/<profile-name>.

Here is a sample profile that prevents writing to files:

#include <tunables/global>
profile k8s-apparmor-example-deny-write flags=(attach_disconnected) {
  #include <abstractions/base>
  file,
  # Deny all file writes.
  deny /** w,
}

AppArmor is not a Kubernetes resource, so the format is not the YAML or JSON you're familiar with.
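
Assuming the k8s-apparmor-example-deny-write profile above has been loaded on the node where the pod will run, a pod could reference it like this (the pod and container names are placeholders):

apiVersion: v1
kind: Pod
metadata:
  name: hello-apparmor
  annotations:
    container.apparmor.security.beta.kubernetes.io/hello: localhost/k8s-apparmor-example-deny-write
spec:
  containers:
  - name: hello
    image: busybox
    command: ["sh", "-c", "echo 'Hello AppArmor!' && sleep 1h"]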

To verify the profile was attached correctly, check the attributes of process 1:

kubectl exec <pod-name> -- cat /proc/1/attr/current

Pods can be scheduled on any node in the cluster by default. This means the profile should be loaded into every node. This is a classic use case for DaemonSet.

Writing AppArmor profiles

Writing profiles for AppArmor by hand is not trivial. There are some tools that can help: aa-genprof and aa-logprof can generate a profile for you and assist in fine-tuning it by running your application with AppArmor in complain mode. The tools keep track of your application's activity and AppArmor warnings, and create a corresponding profile. This approach works, but it feels clunky.

My favorite tool is bane (https://github.com/jessfraz/bane), which generates AppArmor profiles from a simpler profile language based on the TOML syntax. Bane profiles are very readable and easy to grasp. Here is a snippet from a bane profile:

Name = 'nginx-sample'
[Filesystem]
# read only paths for the container
ReadOnlyPaths = [
  '/bin/**',
  '/boot/**',
  '/dev/**',
]
# paths where you want to log on write
LogOnWritePaths = [
  '/**'
]
# allowed capabilities
[Capabilities]
Allow = [
  'chown',
  'setuid',
]
[Network]
Raw = false
Packet = false
Protocols = [
  'tcp',
  'udp',
  'icmp'
]

The generated AppArmor profile is pretty gnarly.

Pod security policies

Pod security policies (PSPs) have been available in beta since Kubernetes 1.4. They must be enabled, and you must also enable the PodSecurityPolicy admission controller to use them. A PSP is defined at the cluster level and defines the security context for pods. There are several differences between using a PSP and directly specifying a security context in the pod manifest, as we did earlier. A PSP lets you:

  • Apply the same policy to multiple pods or containers
  • Let the administrator control pod creation so users don't create pods with inappropriate security contexts
  • Dynamically generate a different security context for a pod via the admission controller

PSPs really scale the concept of security contexts. Typically, you'll have a relatively small number of security policies compared to the number of pods (or rather, pod templates). This means that many pod templates and containers will have the same security policy. Without PSPs, you have to manage it individually for each pod manifest.

Here is a sample PSP that allows everything:

kind: PodSecurityPolicy
apiVersion: policy/v1beta1
metadata:
  name: permissive
spec:
  seLinux:
    rule: RunAsAny
  supplementalGroups:
    rule: RunAsAny
  runAsUser:
    rule: RunAsAny
  fsGroup:
    rule: RunAsAny
  volumes:
  - '*'

As you can see, it is much more human-readable than an AppArmor profile, and it is available on every OS and runtime.

Authorizing pod security policies via RBAC

This is the recommended way to enable the use of policies. Let's create a ClusterRole (Role works too) to grant access to use the target policies. It should look like the following:

kind: ClusterRole
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: <role name>
rules:
- apiGroups: ['policy']
  resources: ['podsecuritypolicies']
  verbs:   ['use']
  resourceNames:
  - <list of policies to authorize>

Then, we need to bind the cluster role to the authorized users:

kind: ClusterRoleBinding
apiVersion: rbac.authorization.k8s.io/v1
metadata:
 name: <binding name>
roleRef:
 kind: ClusterRole
 name: <role name>
 apiGroup: rbac.authorization.k8s.io
subjects:
 - < list of authorized service accounts >

Here is a specific service account:

- kind: ServiceAccount
  name: <authorized service account name>
  namespace: <authorized pod namespace>

You can also authorize specific users, but it's not recommended:

- kind: User
  apiGroup: rbac.authorization.k8s.io
  name: <authorized user name>

If using a role binding instead of cluster role binding, then it will apply only to pods in the same namespace as the binding. This can be paired with system groups to grant access to all pods run in the namespace:

- kind: Group
  apiGroup: rbac.authorization.k8s.io
  name: system:serviceaccounts

Or equivalently, granting access to all authenticated users in a namespace is done as follows:

- kind: Group
  apiGroup: rbac.authorization.k8s.io
  name: system:authenticated

Managing network policies

Node, pod, and container security is imperative, but it's not enough. Network segmentation is critical to design secure Kubernetes clusters that allow multi-tenancy, as well as to minimize the impact of security breaches. Defense in depth mandates that you compartmentalize parts of the system that don't need to talk to each other, while also carefully managing the direction, protocols, and ports of traffic.

Network policies allow the fine-grained control and proper network segmentation of your cluster. At the core, a network policy is a set of firewall rules applied to a set of namespaces and pods selected by labels. This is very flexible because labels can define virtual network segments and be managed as a Kubernetes resource.

This is a huge improvement over trying to segment your network using traditional approaches like IP address ranges and subnet masks, where you often run out of IP addresses or allocate too many just in case.

Choosing a supported networking solution

Some networking backends (network plugins) don't support network policies. For example, the popular Flannel can't be used to apply policies. This is critical to understand: you will be able to define network policies even if your network plugin doesn't support them, but they will simply have no effect, giving you a false sense of security.

Here is a list of network plugins that support network policies (both ingress and egress):

  • Calico
  • WeaveNet
  • Canal
  • Cilium
  • Kube-Router
  • Romana
  • Contiv

If you run your cluster on a managed Kubernetes service then the choice has already been made for you.

We will explore the ins and outs of network plugins in Chapter 10, Exploring Advanced Networking. Here we focus on network policies.

Defining a network policy

You define a network policy using a standard YAML manifest.

Here is a sample policy:

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
 name: the-network-policy
 namespace: default
spec:
  podSelector:
    matchLabels:
      role: db
  ingress:
  - from:
    - namespaceSelector:
        matchLabels:
          project: cool-project
    - podSelector:
        matchLabels:
          role: frontend
    ports:
    - protocol: TCP
      port: 6379

The spec part has two important parts, the podSelector and the ingress. The podSelector governs which pods this network policy applies to. The ingress governs which namespaces and pods can access these pods and which protocols and ports they can use.

In the preceding sample network policy, the pod selector specifies that the target of the network policy is all the pods labeled role: db. The ingress section has a from sub-section with a namespace selector and a pod selector. All the namespaces in the cluster that are labeled project: cool-project, and within these namespaces, all the pods that are labeled role: frontend, can access the target pods labeled role: db. The ports section defines a list of pairs (protocol and port) that further restrict what protocols and ports are allowed. In this case, the protocol is TCP and the port is 6379 (the standard Redis port).

Note that the network policy is cluster-wide, so pods from multiple namespaces in the cluster can access the target namespace. The current namespace is always included, so even if it doesn't have the project: cool-project label, pods with role: frontend can still have access.

It's important to realize that the network policy operates in a whitelist fashion. By default, all access is forbidden, and the network policy can open certain protocols and ports to certain pods that match the labels. However, the whitelist nature of the network policy applies only to pods that are selected for at least one network policy. If a pod is not selected it will allow all access. Always make sure all your pods are covered by a network policy.

Another implication of the whitelist nature is that, if multiple network policies exist, then the unified effect of all the rules applies. If one policy gives access to port 1234 and another gives access to port 5678 for the same set of pods, then a pod may be accessed through either 1234 or 5678.

To use network policies responsibly, consider starting with a deny-all network policy:

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
 name: deny-all
spec:
 podSelector: {}
 policyTypes:
  - Ingress
  - Egress

Then, start adding network policies to allow ingress to specific pods explicitly. Note that you must apply the deny-all policy for each namespace:

$ kubectl -n <namespace> create -f deny-all-network-policy.yaml

Limiting egress to external networks

Kubernetes 1.8 added egress network policy support, so you can control outbound traffic too. Here is an example (using a network provider's extended policy format rather than the standard NetworkPolicy resource) that prevents access to the external IP 1.2.3.4. The order: 999 ensures the policy is applied before other policies:

apiVersion: v1
kind: policy
metadata:
  name: default-deny-egress
spec:
  order: 999
  egress:
  - action: deny
    destination:
      net: 1.2.3.4
    source: {}
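
For comparison, here is a sketch of an egress rule using the standard networking.k8s.io/v1 API that only allows the selected pods to reach a specific CIDR over HTTPS (the labels and CIDR are hypothetical):

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-limited-egress
spec:
  podSelector:
    matchLabels:
      role: frontend
  policyTypes:
  - Egress
  egress:
  - to:
    - ipBlock:
        cidr: 10.0.0.0/24
    ports:
    - protocol: TCP
      port: 443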

Cross-namespace policies

If you divide your cluster into multiple namespaces, it can sometimes be handy for pods to communicate across namespaces. You can specify a namespaceSelector under the ingress from section of your network policy to enable access from multiple namespaces. This is useful, for example, if you have production and staging namespaces and you periodically populate your staging environments with snapshots of your production data.
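
Here is a minimal sketch of such a cross-namespace policy, assuming the staging namespace is labeled env: staging and the production database pods are labeled role: db:

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-from-staging
  namespace: production
spec:
  podSelector:
    matchLabels:
      role: db
  ingress:
  - from:
    - namespaceSelector:
        matchLabels:
          env: staging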

Using secrets

Secrets are paramount in secure systems. They can be credentials such as usernames and passwords, access tokens, API keys, certificates, or crypto keys. Secrets are typically small. If you have large amounts of data you want to protect, you should encrypt it and keep the encryption/decryption keys as secrets.

Storing secrets in Kubernetes

Kubernetes stores secrets in etcd as plaintext by default. This means that direct access to etcd should be limited and carefully guarded. Starting with Kubernetes 1.7, you can encrypt your secrets at rest (when they're stored by etcd).

Secrets are managed at the namespace level. Pods can mount secrets either as files via secret volumes or as environment variables. From a security standpoint, this means that any user or service that can create a pod in a namespace can have access to any secret managed for that namespace. If you want to limit access to a secret, put it in a namespace accessible to a limited set of users or services.

When a secret is mounted into a container, it is never written to disk. It is stored in tmpfs. When the kubelet communicates with the API server, it normally uses TLS, so the secret is protected in transit.

Configuring encryption at rest

You need to pass this argument when you start the API server:

--encryption-provider-config

Here is a sample encryption config. Note that the providers are tried in order when decrypting, but new secrets are encrypted with the first provider in the list; since identity stores data in plaintext, put a real encryption provider such as aescbc first if you want secrets actually encrypted at rest:

apiVersion: apiserver.config.k8s.io/v1
kind: EncryptionConfiguration
resources:
  - resources:
    - secrets
    providers:
    - identity: {}
    - aesgcm:
        keys:
        - name: key1
          secret: c2VjcmV0IGlzIHNlY3VyZQ==
        - name: key2
          secret: dGhpcyBpcyBwYXNzd29yZA==
    - aescbc:
        keys:
        - name: key1
          secret: c2VjcmV0IGlzIHNlY3VyZQ==
        - name: key2
          secret: dGhpcyBpcyBwYXNzd29yZA==
    - secretbox:
        keys:
        - name: key1
          secret: YWJjZGVmZ2hpamtsbW5vcHFyc3R1dnd4eXoxMjM0NTY=

Creating secrets

Secrets must be created before you try to create a pod that requires them; if the secret doesn't exist, the pod creation will fail.

You can create secrets with the following command: kubectl create secret.

Here I create a generic secret called hush-hush, which contains two keys, a username and password:

$ kubectl create secret generic hush-hush --from-literal=username=tobias --from-literal=password=cutoffs
secret/hush-hush created

The resulting secret is opaque:

$ kubectl describe secrets/hush-hush
Name:         hush-hush
Namespace:    default
Labels:       <none>
Annotations:  <none>
Type:  Opaque
Data
====
password:  7 bytes
username:  6 bytes

You can create secrets from files using --from-file instead of --from-literal, and you can also create secrets manually if you encode the secret value as base64.
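
For example, a manifest equivalent to the hush-hush secret, with the values encoded manually as base64, would look something like this:

apiVersion: v1
kind: Secret
metadata:
  name: hush-hush
type: Opaque
data:
  username: dG9iaWFz      # base64 of "tobias"
  password: Y3V0b2Zmcw==  # base64 of "cutoffs"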

Key names inside a secret must follow the rules for DNS sub-domains (without the leading dot).

Decoding secrets

To get the content of a secret you can use kubectl get secret:

$ kubectl get secrets/hush-hush -o yaml
apiVersion: v1
data:
  password: Y3V0b2Zmcw==
  username: dG9iaWFz
kind: Secret
metadata:
  creationTimestamp: "2020-06-01T06:57:07Z"
  name: hush-hush
  namespace: default
  resourceVersion: "56655"
  selfLink: /api/v1/namespaces/default/secrets/hush-hush
  uid: 8d50c767-c705-11e9-ae89-0242ac120002
type: Opaque

The values are base64-encoded. You need to decode them yourself:

$ echo 'Y3V0b2Zmcw==' | base64 --decode
cutoffs

Using secrets in a container

Containers can access secrets as files by mounting volumes from the pod. Another approach is to access the secrets as environment variables. Finally, a container (given that its service account has the permission) can access the Kubernetes API directly or use kubectl get secret.
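
For the environment variable approach, here is a sketch that exposes the hush-hush keys to a container (the environment variable names are arbitrary):

apiVersion: v1
kind: Pod
metadata:
  name: pod-with-secret-env
spec:
  containers:
  - name: container-with-secret-env
    image: g1g1/py-kube:0.2
    command: ["/bin/bash", "-c", "while true ; do sleep 10 ; done"]
    env:
    - name: HUSH_USERNAME
      valueFrom:
        secretKeyRef:
          name: hush-hush
          key: username
    - name: HUSH_PASSWORD
      valueFrom:
        secretKeyRef:
          name: hush-hush
          key: password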

To use a secret mounted as a volume, the pod manifest should declare the volume and it should be mounted in the container's spec:

apiVersion: v1
kind: Pod
metadata:
  name: pod-with-secret
spec:
  containers:
  - name: container-with-secret
    image: g1g1/py-kube:0.2
    command: ["/bin/bash", "-c", "while true ; do sleep 10 ; done"]
    volumeMounts:
    - name: secret-volume
      mountPath: "/mnt/hush-hush"
      readOnly: true
  volumes:
  - name: secret-volume
    secret:
      secretName: hush-hush

The volume name (secret-volume) binds the pod volume to the mount in the container. Multiple containers can mount the same volume. When this pod is running, the username and password are available as files under /mnt/hush-hush:

$ kubectl create -f pod-with-secret.yaml
$ kubectl exec pod-with-secret -- cat /mnt/hush-hush/username
tobias
$ kubectl exec pod-with-secret -- cat /mnt/hush-hush/password
cutoffs