Kubernetes Cluster Hardening: API Server, RBAC, and Control Plane Security

Most Teams Secure the App. They Forget the Cluster.

This is a pattern I have seen over and over in the enterprise Kubernetes setups I work with: teams spend weeks developing the application, they put the firewalls in place, they scan the container images. Then they ship it all to a cluster where the API server is reachable by far more identities than it needs to be, RBAC roles are as wide as a truck, and the control plane components are still running with the default settings nobody has touched since setup.

This is not a niche concern. It is probably the most prevalent security weakness in production Kubernetes deployments today.

This article is for engineers, DevOps practitioners, and anyone running workloads on Kubernetes who wants to understand cluster hardening beyond the checklists: why each layer is recommended, how things go wrong in practice, and how to build a posture that holds up over time.

Why the API Server Is Your Biggest Attack Surface

The Kubernetes API server is the front gate to everything: every kubectl command goes through it, every internal component talks to it, and every service account token that does anything is presented to it.

That centrality is exactly what makes it dangerous when left undisciplined.

Anonymous access is still on by default in some distributions

Anonymous authentication may be enabled or disabled depending on how the cluster was provisioned. When it is enabled, the API server will accept requests that carry no authentication information at all and still serve a response. Disabling it is easy:

--anonymous-auth=false

On kubeadm-provisioned clusters, add that flag to the API server manifest (usually /etc/kubernetes/manifests/kube-apiserver.yaml). Do not assume managed (cloud-provisioned) clusters get this right either; check them too.
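For orientation, here is roughly where the flag lands in a kubeadm static pod manifest. This is a trimmed sketch; your manifest will contain many more flags, and those should be left as they are:

apiVersion: v1
kind: Pod
metadata:
  name: kube-apiserver
  namespace: kube-system
spec:
  containers:
    - name: kube-apiserver
      command:
        - kube-apiserver
        - --anonymous-auth=false
        # ...the rest of your existing flags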

Limit what the API server binds to

By default the API server can listen on all interfaces. Restricting it to a management network or designated interfaces limits exposure. This goes hand in hand with broader Kubernetes network security principles: network policies and the API server bind address complement each other, but neither works in isolation.
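On a self-managed control plane this usually means setting the bind address explicitly instead of leaving the default of all interfaces. The address below is illustrative; use the node's management-network IP:

--bind-address=10.0.0.10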

Audit logging is non-negotiable

Investigating incidents at the API layer without audit logs is working backwards. Enable audit logging with a policy file that captures, at a minimum:

  • Metadata for all requests
  • Request and response bodies for sensitive resource types (secrets, configmaps, roles, rolebindings)

Audit logs give you visibility; without them you are working blind.
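A minimal policy along those lines might look like the sketch below; the exact resource list and file paths are assumptions to adapt:

apiVersion: audit.k8s.io/v1
kind: Policy
rules:
  # Full request/response bodies for sensitive resources
  - level: RequestResponse
    resources:
      - group: ""
        resources: ["secrets", "configmaps"]
      - group: "rbac.authorization.k8s.io"
        resources: ["roles", "rolebindings"]
  # Metadata for everything else
  - level: Metadata

Point the API server at it with flags such as:

--audit-policy-file=/etc/kubernetes/audit-policy.yaml
--audit-log-path=/var/log/kubernetes/audit.log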

Admission controllers – the underused gatekeepers

Admission controllers intercept requests to the API server after authentication and authorization, but before the object is persisted. Several are worth checking, whether or not they are enabled by default:

  • NodeRestriction – limits a kubelet to modifying only its own Node object and the pods bound to it, so one compromised node cannot tamper with others.
  • PodSecurity – enforces the Pod Security Standards (PodSecurityPolicy was removed in 1.25).
  • AlwaysPullImages – forces an image pull on every pod start, so a pod cannot reuse a locally cached image it has no registry credentials for.

In my experience, teams that skip admission controller configuration end up patching privilege escalation issues one at a time when they could have prevented that class of problem structurally.
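To check or adjust what is enabled on a kubeadm cluster, the relevant API server flag looks like this (the plugin list here is illustrative, not exhaustive):

--enable-admission-plugins=NodeRestriction,AlwaysPullImages,PodSecurity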

RBAC – The Part Everyone Thinks They Understand

The documentation for RBAC in Kubernetes is well written. RBAC is also constantly misconfigured. The cause is typically not a lack of understanding of what RBAC is; it is permissions drifting over time, plus a few specific anti-patterns that show up in almost every mature cluster I’ve seen.

The wildcard problem

Roles with “*” in their verbs or resources are the RBAC equivalent of chmod 777. They get created for convenience early in development and never get tightened. Audit your roles:

kubectl get clusterroles -o json | jq '.items[] | select(.rules[]?.verbs[]? == "*")'

If it isn’t a system role, scrutinize it. This is one of the easiest things to miss in a fast-moving Kubernetes security posture.

Cluster-admin bindings outside kube-system

cluster-admin is the highest-privilege role in a Kubernetes cluster. It should be bound to almost nothing beyond break-glass accounts. Check who has it:

kubectl get clusterrolebindings -o json | jq '.items[] | select(.roleRef.name == "cluster-admin")'

In production environments I have found CI/CD pipeline service accounts, monitoring agents, and even application workloads bound to cluster-admin. None of them needed it.
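Instead of cluster-admin, scope the binding to what the workload actually does. A hypothetical example for a CI deployer that only needs to manage Deployments in one namespace (all names here are illustrative):

apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  namespace: staging
  name: deploy-manager
rules:
  - apiGroups: ["apps"]
    resources: ["deployments"]
    verbs: ["get", "list", "watch", "create", "update", "patch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  namespace: staging
  name: deploy-manager-binding
subjects:
  - kind: ServiceAccount
    name: ci-deployer
    namespace: staging
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: deploy-manager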

Service account token automounting

By default, Kubernetes mounts a service account token into every pod, so pods that never call the API still carry API credentials (and before 1.24 those tokens were long-lived). The fix:

automountServiceAccountToken: false

Set this at the service account level (or on the pod spec for workloads that do not need API access). It is directly relevant to secrets management in Kubernetes too: tokens are secrets, and a secret exposed unnecessarily is a liability.
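At the service account level, that looks like this (names are illustrative):

apiVersion: v1
kind: ServiceAccount
metadata:
  name: web-frontend
  namespace: production
automountServiceAccountToken: false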

Least privilege isn’t a one-time task

RBAC drift is real. Permissions added for convenience are rarely removed. Establish a recurring process (at least quarterly) to audit role and binding assignments. Tools such as rbac-tool and kubectl-who-can make this far less painful than manual review.
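For example, with the kubectl-who-can plugin installed (typically via krew), asking who can read secrets in a namespace is a one-liner; the verb and namespace here are illustrative:

kubectl who-can get secrets -n production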

Control Plane Hardening – What Happens Beyond the API Server

While the API server gets most of the attention, the control plane is made up of several components, each with its own attack surface.

etcd – treat it like a root password

The entire state of the cluster lives in etcd: every workload spec, every piece of configuration, every secret. If an attacker gains read access to etcd, they have everything.

Here are two points of importance:

Encryption at rest – etcd data is not encrypted by default. Encrypt sensitive resources (particularly secrets) using AES-GCM or AES-CBC with a strong key, configured through an EncryptionConfiguration. One thing I have noticed in many published “production ready” cluster guides: TLS for traffic to and from etcd is covered, but encryption of the data at rest rarely is.
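A minimal EncryptionConfiguration sketch for secrets is shown below; the key is a placeholder you generate yourself (for example, 32 random bytes, base64-encoded), and the file is referenced from the API server with --encryption-provider-config:

apiVersion: apiserver.config.k8s.io/v1
kind: EncryptionConfiguration
resources:
  - resources:
      - secrets
    providers:
      - aescbc:
          keys:
            - name: key1
              secret: <base64-encoded-32-byte-key>
      - identity: {}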

Network and client isolation – only the API server needs to talk to etcd. No other component requires direct access. Lock it down at the network level and require client certificates:

--trusted-ca-file
--cert-file
--key-file
--client-cert-auth=true

These flags in the etcd configuration enforce mutual TLS for all client connections.

Scheduler and controller manager exposure

The kube-scheduler and kube-controller-manager expose HTTP endpoints. In older cluster versions these were reachable without authentication. Verify yours are not:

--bind-address=127.0.0.1

This limits both components to listening on localhost, so they cannot be reached from other nodes, without affecting their ability to talk to the API server.

Kubelet hardening

The kubelet runs on every node and carries real power: it can create, delete, and modify pods on that node. Key settings:

  • --anonymous-auth=false – the same principle as on the API server
  • --authorization-mode=Webhook – authorization decisions are delegated to the API server instead of granting every authenticated request full access
  • --read-only-port=0 – disables the unauthenticated read-only port (10255)

These kubelet settings fall under what the 4Cs of Kubernetes Security calls cluster-level security, and it is worth remembering that the cluster is more than the API: it is every node in the fleet.
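On clusters that manage the kubelet through a configuration file rather than command-line flags, the equivalent settings look roughly like this sketch:

apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
authentication:
  anonymous:
    enabled: false
  webhook:
    enabled: true
authorization:
  mode: Webhook
readOnlyPort: 0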

Pod Security – Where Cluster Hardening Meets Workload Reality

Cluster hardening isn’t limited to control plane components. What pods are allowed to do largely determines whether a compromised workload can break out of its boundaries.

Pod Security Standards

PodSecurityPolicy was removed in Kubernetes 1.25 and replaced by Pod Security Admission, which is based on three profiles:

  • Privileged – unrestricted (use only for system components that explicitly require it)
  • Baseline – blocks known privilege escalations while staying broadly compatible
  • Restricted – heavily constrained; some applications need adjustments to run under it

Apply the baseline or restricted profile at the namespace level. Label namespaces explicitly instead of relying on cluster-wide defaults, so that each namespace declares what it actually needs.
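With Pod Security Admission, that labeling looks like this (the namespace name is illustrative):

apiVersion: v1
kind: Namespace
metadata:
  name: payments
  labels:
    pod-security.kubernetes.io/enforce: baseline
    pod-security.kubernetes.io/warn: restricted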

Dropping capabilities

Linux capabilities give processes fine-grained slices of root privilege. Most containerized applications need none of them. Drop them all and add back only what is necessary:

securityContext:
  capabilities:
    drop:
      - ALL

Combined with readOnlyRootFilesystem: true and running the container as a non-root user, this sharply limits what a compromised container can do on the underlying host.
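Putting those pieces together, a container-level securityContext sketch might look like:

securityContext:
  runAsNonRoot: true
  allowPrivilegeEscalation: false
  readOnlyRootFilesystem: true
  capabilities:
    drop:
      - ALL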

This loops straight back to the container security lessons: the cluster is only as secure as the containers it is willing to run.

Two Things Most Hardening Guides Don’t Cover Well

Validating Admission Webhooks as a security boundary

Admission webhooks that intercept and potentially mutate API requests are powerful, and frequently overlooked as a security risk. A compromised or malicious webhook can modify workload specs, add or remove environment variables, and change the security context of a pod definition.

Review webhook configurations:

kubectl get validatingwebhookconfigurations
kubectl get mutatingwebhookconfigurations

For every webhook, know what it does, who controls it, and whether its failurePolicy is Fail or Ignore. With Ignore, requests proceed when the webhook is unavailable; that can be a reasonable availability trade-off, but it is a security decision and should be documented explicitly.
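One quick way to see those failure policies at a glance (assuming jq is available):

kubectl get validatingwebhookconfigurations -o json | jq '.items[] | {name: .metadata.name, failurePolicies: [.webhooks[]?.failurePolicy]}'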

Supply chain exposure at the cluster level

Most conversations about Kubernetes security focus on runtime. But cluster hardening also means controlling what is allowed into the cluster in the first place. A mature hardening posture increasingly includes image signing (Cosign/Sigstore), policies that require images to be pinned by digest rather than by tag, and software bill of materials (SBOM) verification.

Tags are mutable. nginx:latest today might not be nginx:latest tomorrow. Pinning to digests (nginx@sha256:...) means what you deployed is what you get — and what you can audit.
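In a pod spec, digest pinning looks like this; the digest value is a placeholder for the exact image you verified:

containers:
  - name: web
    image: nginx@sha256:<digest-of-the-image-you-verified>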

Hardening Is a Process, Not a Deployment Step

The truth is, even a fully hardened cluster starts to drift the moment it launches. Teams add service accounts, operators install CRDs with wide-ranging RBAC requirements, and admission controllers get bypassed because they are blocking something important.

Initial configuration matters, but so does continuous validation. Tools worth knowing:

  • kube-bench — runs the CIS Benchmark checks against cluster nodes and the control plane
  • Falco — runtime threat detection; alerts on suspicious syscalls and API activity
  • Trivy — scans container images and cluster configuration for vulnerabilities and misconfigurations

Running kube-bench alone almost always surfaces quick wins: the kinds of issues that are easy to overlook but quick to fix once something flags them clearly.
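A low-friction way to start, assuming the upstream job manifest still lives at this path, is to run kube-bench as a Job and read its log output:

kubectl apply -f https://raw.githubusercontent.com/aquasecurity/kube-bench/main/job.yaml
kubectl logs job/kube-bench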

Who Should Prioritize This (And Where to Start)

If your workloads handle user data, financial information, or anything regulated, this is not optional. Even if you are only running internal tools or a dev environment on Kubernetes, the same hardening practices translate.

Start with:

  • Disable anonymous auth and enable audit logging on the API server.
  • Run a full audit of clusterrolebindings that grant cluster-admin.
  • Disable service account token automounting in all non-system namespaces.

These steps address most of the commonly exploited cluster misconfigurations.

External references worth reading:

  • CIS Kubernetes Benchmark – the standard baseline for cluster hardening configuration
  • NSA Kubernetes Hardening Guide – practical, in-depth, and free

Kubernetes cluster hardening isn’t glamorous work. It is the discipline of re-checking permissions as the cluster grows, reading audit logs, and keeping configuration files honest. But it is also the foundation that container security, network policies, secrets management, and everything else depend on. If the bottom layers are not in place, it doesn’t matter what the rest of the structure looks like.

The clusters that hold up are not the ones with the most advanced security tooling. They are the ones where somebody made a deliberate decision about each control plane component and kept making those decisions over time.
