System Hardening
Required knowledge for the CKS certification.
Last reviewed: — verified against Kubernetes 1.36.
System hardening is the host-layer half of Kubernetes security. A compromised node or kernel reaches every pod scheduled on it, so the controls covered here — minimal OS images, Linux capabilities, MAC frameworks, seccomp, kernel parameters — bound the blast radius of any container compromise. This page is the head reference for Domain 3 of the CKS exam (System Hardening, 10%).
Why System Hardening Matters
Almost every container escape ends with the same outcome: code running as root on the host with full access to every other pod's filesystem and network. The defences against that outcome live below Kubernetes itself, in the host OS:
- A minimal distribution removes the package surface that a foothold could exploit
- Dropping Linux capabilities prevents the most common in-container privilege primitives
- A MAC framework (AppArmor or SELinux) constrains what a process can touch even if it gains root
- Seccomp filters the syscall surface a malicious container can reach
- A sandboxed runtime (gVisor, Kata) puts a second isolation boundary between container and host
The practices below assume the Pod Security Admission (PSA) restricted profile is already enforced; system hardening is the layer that catches what makes it past admission.
Hardening Domains at a Glance
| Domain | Primary Risk | Key Control | Reference |
|---|---|---|---|
| OS minimisation | Unused packages and services widen the foothold | Bottlerocket / Flatcar / Talos / minimal Ubuntu | See "Minimal Node Distributions Compared" below |
| Linux capabilities | Container root has more privileges than the workload needs | drop: [ALL] and add only what is required | Linux Capabilities |
| Seccomp | Unrestricted syscall access enables kernel exploits | RuntimeDefault profile or custom JSON | Seccomp in Pods |
| AppArmor / SELinux | No file-path or label-based confinement | Per-workload profile in securityContext | AppArmor Profiles |
| Kernel parameters | Permissive sysctl defaults aid attackers | Locked-down sysctl on host and pod | Sysctl Security |
| Sandboxed runtimes | Kernel surface shared between host and container | gVisor / Kata via RuntimeClass | Pod Sandboxing |
Topics Covered in This Section
Operating System Security
Use minimal, immutable OS distributions for Kubernetes nodes. Disable unnecessary services, restrict SSH, and ship OS patches as image rebuilds rather than in-place updates.
Node Hardening
Restrict access to the kubelet, harden kernel parameters, and run a sandboxed container runtime on workloads that handle untrusted code.
Filesystem and Data Protection
Mount filesystems read-only where possible (readOnlyRootFilesystem: true), use tmpfs or emptyDir for ephemeral state, and rely on KMS-backed encryption for any persistent volume that holds sensitive data.
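The read-only root plus ephemeral scratch pattern can be sketched as a Pod manifest. The example below is a minimal illustration, not a reference implementation: it builds the manifest as a Python dict and prints JSON, which kubectl accepts in place of YAML. The function name, pod name, image, mount path, and size limit are all hypothetical.

```python
import json


def read_only_pod(name: str, image: str) -> dict:
    """Build a Pod manifest with a read-only root filesystem and tmpfs scratch space."""
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name},
        "spec": {
            "containers": [{
                "name": "app",
                "image": image,
                "securityContext": {"readOnlyRootFilesystem": True},
                # The only writable path is the emptyDir mounted at /tmp.
                "volumeMounts": [{"name": "scratch", "mountPath": "/tmp"}],
            }],
            "volumes": [{
                "name": "scratch",
                # medium: Memory backs the emptyDir with tmpfs; sizeLimit caps its size.
                "emptyDir": {"medium": "Memory", "sizeLimit": "64Mi"},
            }],
        },
    }


if __name__ == "__main__":
    # Hypothetical name and image; the JSON can be piped to `kubectl apply -f -`.
    print(json.dumps(read_only_pod("ro-demo", "registry.example.com/app:1.0"), indent=2))
```

Workloads that write outside the mounted scratch path will fail at runtime under this configuration, so it is worth testing in a staging namespace first.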
Confinement Frameworks: AppArmor vs SELinux vs Seccomp
These three frameworks address different layers of the same problem — what a process can do once it is running. They are complementary, not alternatives.
| Aspect | AppArmor | SELinux | Seccomp |
|---|---|---|---|
| Scope | Per-path file access, capabilities, networking | Per-label MAC across the whole system | Syscall filtering |
| Default on | Ubuntu, Debian, SUSE | RHEL, CentOS Stream, Fedora, Bottlerocket | Any modern Linux kernel |
| Policy unit | Profile per binary | Type Enforcement labels | BPF program / JSON profile |
| Ergonomics | Path-based, easier to author | Label-based, steeper learning curve | Syscall allowlist/denylist; tooling support is good |
| Enforcement granularity | File paths, network, capabilities | Files, processes, network, IPC | Syscalls only |
| Kubernetes integration | securityContext.appArmorProfile (GA 1.31) | securityContext.seLinuxOptions | securityContext.seccompProfile |
| Bypass without disabling | Hard | Hard | Easy if the runtime does not enforce |
Best fit:
- AppArmor for Ubuntu / Debian fleets that need per-binary path confinement.
- SELinux for RHEL / Fedora / Bottlerocket clusters where label-based MAC is already first-class.
- Seccomp on every workload, regardless of MAC choice — it is the cheapest and most portable layer.
Read more: AppArmor Profiles · Seccomp in Pods
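As a rough illustration of how the layers combine, the sketch below builds a container securityContext that always sets seccomp and then adds exactly one MAC framework. The helper name is hypothetical, and `container_t` is only an example SELinux type; the right value depends on the node's policy.

```python
import json
from typing import Optional


def confined_security_context(selinux_type: Optional[str] = None) -> dict:
    """Return a container securityContext layering seccomp with one MAC framework."""
    ctx = {
        # Seccomp is set on every workload, regardless of the MAC choice.
        "seccompProfile": {"type": "RuntimeDefault"},
    }
    if selinux_type is None:
        # AppArmor nodes: request the container runtime's default profile.
        ctx["appArmorProfile"] = {"type": "RuntimeDefault"}
    else:
        # SELinux nodes: set the process type label instead.
        ctx["seLinuxOptions"] = {"type": selinux_type}
    return ctx


if __name__ == "__main__":
    print(json.dumps(confined_security_context(), indent=2))
    print(json.dumps(confined_security_context("container_t"), indent=2))
```

The two branches are mutually exclusive here only to keep the example small; a node runs one MAC framework, while seccomp applies in both cases.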
Minimal Node Distributions Compared
The host OS is a security control. Picking a distribution designed for container hosting removes packages, services, and update workflows that would otherwise have to be hardened manually.
| Distribution | Package Manager | Update Model | Notable Properties | Best Fit |
|---|---|---|---|---|
| Bottlerocket | None (image-based) | A/B partition rollback | Read-only root, SELinux on, container-runtime only | EKS, generic Kubernetes on AWS / VMware |
| Flatcar | Image-based | A/B partition rollback | systemd-managed, Ignition for provisioning | Successor to CoreOS Container Linux |
| Talos | None (API-driven) | A/B partition rollback | No SSH; managed entirely via gRPC API | Air-gapped or hardened bare-metal clusters |
| Ubuntu Minimal | apt | In-place upgrade | AppArmor on by default, smallest Ubuntu footprint | Mixed-use clusters that need standard tooling |
| RHEL CoreOS | rpm-ostree | Atomic upgrade | SELinux enforcing, immutable / atomic | OpenShift |
For new clusters, prefer an immutable distribution (Bottlerocket, Flatcar, Talos, RHEL CoreOS). The "no general-purpose package manager" property removes a whole class of post-exploitation tooling.
Sandboxed Runtimes: gVisor vs Kata Containers
Sandboxed runtimes add a second isolation boundary between the container and the host kernel. Use them on workloads that run untrusted or multi-tenant code.
| Aspect | gVisor (runsc) | Kata Containers |
|---|---|---|
| Isolation model | User-space kernel intercepts syscalls | Lightweight VM per pod (KVM / Firecracker / Cloud Hypervisor) |
| Performance overhead | Higher per-syscall cost; lower per-pod | Higher per-pod cost (VM boot); near-native syscall cost |
| Compatibility | Runs most workloads; some seccomp/ptrace gaps | Near-native — runs anything a VM can |
| Hardware requirements | Standard Linux | Nested virt or bare-metal KVM |
| Kubernetes integration | RuntimeClass: gvisor | RuntimeClass: kata |
| Best fit | Multi-tenant SaaS, untrusted code, lower throughput | High-isolation enterprise workloads, regulated multi-tenancy |
Read more: Pod Sandboxing
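The RuntimeClass wiring from the table can be sketched in two objects: a cluster-scoped RuntimeClass that names a node-level handler, and a Pod that opts in through `runtimeClassName`. This is an assumption-laden example: it presumes the `runsc` handler is already installed and configured in the container runtime on the target nodes, and the pod name and image are hypothetical.

```python
import json


def runtime_class(name: str, handler: str) -> dict:
    """RuntimeClass mapping a cluster-visible name to a handler installed on nodes."""
    return {
        "apiVersion": "node.k8s.io/v1",
        "kind": "RuntimeClass",
        "metadata": {"name": name},
        "handler": handler,
    }


def sandboxed_pod(name: str, image: str, runtime_class_name: str) -> dict:
    """Pod that requests the sandboxed runtime by RuntimeClass name."""
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name},
        "spec": {
            "runtimeClassName": runtime_class_name,
            "containers": [{"name": "app", "image": image}],
        },
    }


if __name__ == "__main__":
    rc = runtime_class("gvisor", "runsc")
    pod = sandboxed_pod("untrusted-job", "registry.example.com/job:1.0", rc["metadata"]["name"])
    for obj in (rc, pod):
        print(json.dumps(obj, indent=2))
```

Swapping the handler for `kata` gives the Kata Containers variant, provided that handler exists on the nodes.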
Try It: Live YAML Security Analyzer
The system-hardening primitives map directly to fields in a Pod spec — securityContext.capabilities, seccompProfile, appArmorProfile, readOnlyRootFilesystem. Paste a manifest below to see whether it ships those fields with sane values.
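For offline checks, the same four fields can be audited with a few lines of code. The sketch below is a hypothetical helper, not the analyzer this page refers to: it takes a Pod manifest already parsed into a dict, looks only at `spec.containers` (init and ephemeral containers are ignored), and treats an unset AppArmor profile as acceptable.

```python
from typing import List


def audit_pod(manifest: dict) -> List[str]:
    """Return one finding per container for each hardening field with a weak value."""
    spec = manifest.get("spec", {})
    pod_ctx = spec.get("securityContext", {})
    findings: List[str] = []
    for container in spec.get("containers", []):
        name = container.get("name", "<unnamed>")
        ctx = container.get("securityContext", {})
        if "ALL" not in ctx.get("capabilities", {}).get("drop", []):
            findings.append(f"{name}: capabilities.drop does not include ALL")
        # seccompProfile and appArmorProfile may be set at pod level and inherited.
        seccomp = ctx.get("seccompProfile") or pod_ctx.get("seccompProfile") or {}
        if seccomp.get("type") not in ("RuntimeDefault", "Localhost"):
            findings.append(f"{name}: seccompProfile is unset or Unconfined")
        # Only an explicit Unconfined is flagged; an unset field is not, in this sketch.
        apparmor = ctx.get("appArmorProfile") or pod_ctx.get("appArmorProfile") or {}
        if apparmor.get("type") == "Unconfined":
            findings.append(f"{name}: appArmorProfile is Unconfined")
        if ctx.get("readOnlyRootFilesystem") is not True:
            findings.append(f"{name}: readOnlyRootFilesystem is not true")
    return findings


if __name__ == "__main__":
    bare = {"spec": {"containers": [{"name": "app", "image": "registry.example.com/app:1.0"}]}}
    for finding in audit_pod(bare):
        print(finding)
```

A manifest with no securityContext at all produces three findings per container, one each for capabilities, seccomp, and the root filesystem.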
Version-Specific Notes (Kubernetes 1.36)
The system-hardening surface has tightened in recent Kubernetes versions:
- AppArmor via `securityContext.appArmorProfile`: GA in 1.31. The structured field replaces the legacy `container.apparmor.security.beta.kubernetes.io/<container>` annotation pattern.
- `seccompProfile.type: RuntimeDefault`: the `SeccompDefault` feature has been GA since 1.27, but it takes effect only on nodes whose kubelet enables it (`--seccomp-default`, or `seccompDefault: true` in the kubelet configuration). On those nodes, every container without an explicit override gets the runtime's default seccomp profile.
- User namespaces (`hostUsers: false`): enabled by default since 1.33. In-pod root maps to an unprivileged UID on the host, neutralising many root-required exploit primitives.
- Sidecar containers: GA since 1.33. Replaces ad-hoc shared `process.namespaces` patterns for security sidecars (e.g., on-host log shippers, secret rotators).
- Pod Sandboxing via `RuntimeClass`: Stable. `RuntimeClass` references an installed handler (`runsc`, `kata`) on each node; admission can require it for selected namespaces.
Always check the Kubernetes deprecation guide before upgrading; some kernel-related feature gates require matching kubelet flags.
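Two of the features above are single-field changes in a Pod spec. The sketch below shows one possible shape, assuming a node and container runtime that support user namespaces: `hostUsers: false` opts the pod out of the host user namespace, and an init container with `restartPolicy: Always` is how a native sidecar is declared. The function, container names, and images are hypothetical.

```python
import json


def pod_with_userns_and_sidecar(name: str, app_image: str, sidecar_image: str) -> dict:
    """Pod that runs in its own user namespace and declares a native sidecar."""
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name},
        "spec": {
            # In-pod UIDs, including root, map to unprivileged UIDs on the host.
            "hostUsers": False,
            "initContainers": [{
                "name": "log-shipper",
                "image": sidecar_image,
                # restartPolicy: Always on an init container marks it as a sidecar.
                "restartPolicy": "Always",
            }],
            "containers": [{"name": "app", "image": app_image}],
        },
    }


if __name__ == "__main__":
    pod = pod_with_userns_and_sidecar(
        "userns-demo",
        "registry.example.com/app:1.0",
        "registry.example.com/log-shipper:1.0",
    )
    print(json.dumps(pod, indent=2))
```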
Hardening Principles for the Host Layer
Secure by Default
Pick a distribution whose defaults already match your security posture. SELinux enforcing, AppArmor profiles loaded, seccomp RuntimeDefault, and read-only root filesystem should be on without operator intervention.
Least Privilege
Drop all Linux capabilities and re-add only what the workload requires. Run as a non-root UID, with runAsNonRoot: true and readOnlyRootFilesystem: true. Avoid hostNetwork, hostPID, hostIPC, and hostPath unless explicitly justified.
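A minimal sketch of these rules in code, under stated assumptions: the first helper builds a container securityContext that drops every capability and re-adds only the ones passed in, and the second lists the host-access fields a pod spec turns on. Both function names and the UID 10001 are hypothetical; capability names are written without the `CAP_` prefix, as Kubernetes expects.

```python
from typing import List, Sequence


def least_privilege_context(add_caps: Sequence[str] = (), uid: int = 10001) -> dict:
    """Container securityContext: non-root, read-only root, all capabilities dropped."""
    capabilities = {"drop": ["ALL"]}
    if add_caps:
        # Re-add only what the workload requires, e.g. NET_BIND_SERVICE.
        capabilities["add"] = list(add_caps)
    return {
        "runAsNonRoot": True,
        "runAsUser": uid,
        "readOnlyRootFilesystem": True,
        "capabilities": capabilities,
    }


def host_access_findings(pod_spec: dict) -> List[str]:
    """List the host-level fields a pod spec enables and that need justification."""
    findings = [f for f in ("hostNetwork", "hostPID", "hostIPC") if pod_spec.get(f) is True]
    if any("hostPath" in volume for volume in pod_spec.get("volumes", [])):
        findings.append("hostPath")
    return findings


if __name__ == "__main__":
    print(least_privilege_context(["NET_BIND_SERVICE"]))
    print(host_access_findings({"hostPID": True, "containers": []}))
```

An empty result from `host_access_findings` means the spec uses none of the four host-access mechanisms named above.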
Defense in Depth
Stack syscall filtering (seccomp), MAC (AppArmor / SELinux), and runtime isolation (gVisor / Kata). A single bypass should not give an attacker the full host kernel surface.
Continuous Verification
Re-image nodes on a regular cadence rather than patching in place. Validate the host-level controls with kube-bench and watch for runtime drift via Falco or Tetragon.
Conclusion
System hardening closes the gap between admission-time policy and what a compromised pod can actually do at runtime. Combine a minimal node OS, dropped capabilities, seccomp, a MAC framework, and (for high-risk workloads) a sandboxed runtime. The articles linked above walk through each control with executable examples.