
System Hardening

Required knowledge for the CKS certification.

Last reviewed: — verified against Kubernetes 1.36.

System hardening is the host-layer half of Kubernetes security. A compromised node or kernel reaches every pod scheduled on it, so the controls covered here — minimal OS images, Linux capabilities, MAC frameworks, seccomp, kernel parameters — bound the blast radius of any container compromise. This page is the head reference for Domain 3 of the CKS exam (System Hardening, 10%).


Why System Hardening Matters

Almost every container escape ends with the same outcome: code running as root on the host with full access to every other pod's filesystem and network. The defences against that outcome live below Kubernetes itself, in the host OS:

  • A minimal distribution removes the package surface that a foothold could exploit
  • Dropping Linux capabilities prevents the most common in-container privilege primitives
  • A MAC framework (AppArmor or SELinux) constrains what a process can touch even if it gains root
  • Seccomp filters the syscall surface a malicious container can reach
  • A sandboxed runtime (gVisor, Kata) puts a second isolation boundary between container and host

The practices below assume PSA restricted is already in place — system hardening is the layer that catches what makes it past admission.
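For context, PSA restricted is enforced per namespace with labels; a minimal sketch (the namespace name is illustrative):

```yaml
# Enforce the "restricted" Pod Security Standard on one namespace.
# The namespace name "prod" is a placeholder.
apiVersion: v1
kind: Namespace
metadata:
  name: prod
  labels:
    pod-security.kubernetes.io/enforce: restricted
    pod-security.kubernetes.io/enforce-version: latest
```

Anything that passes this admission gate still runs on a shared kernel, which is where the controls below take over.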


Hardening Domains at a Glance

| Domain | Primary Risk | Key Control | Reference |
|---|---|---|---|
| OS minimisation | Unused packages and services widen the foothold | Bottlerocket / Flatcar / Talos / minimal Ubuntu | See "Minimal Distros" below |
| Linux capabilities | Container root has more privileges than the workload needs | `drop: [ALL]` and add only what is required | Linux Capabilities |
| Seccomp | Unrestricted syscall access enables kernel exploits | RuntimeDefault profile or custom JSON | Seccomp in Pods |
| AppArmor / SELinux | No file-path or label-based confinement | Per-workload profile in securityContext | AppArmor Profiles |
| Kernel parameters | Permissive sysctl defaults aid attackers | Locked-down sysctl on host and pod | Sysctl Security |
| Sandboxed runtimes | Kernel surface shared between host and container | gVisor / Kata via RuntimeClass | Pod Sandboxing |
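Several of these controls land in a single Pod spec. A sketch of a hardened baseline (pod name and image are placeholders):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: hardened-demo                       # placeholder name
spec:
  securityContext:
    runAsNonRoot: true
    seccompProfile:
      type: RuntimeDefault                  # runtime's default syscall filter
  containers:
    - name: app
      image: registry.example.com/app:1.0   # placeholder image
      securityContext:
        allowPrivilegeEscalation: false
        readOnlyRootFilesystem: true
        capabilities:
          drop: ["ALL"]                     # re-add individual capabilities only if required
```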

Topics Covered in This Section

Operating System Security

Use minimal, immutable OS distributions for Kubernetes nodes. Disable unnecessary services, restrict SSH, and ship OS patches as image rebuilds rather than in-place updates.

Node Hardening

Restrict access to the kubelet, harden kernel parameters, and run a sandboxed container runtime on workloads that handle untrusted code.
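A typical kubelet-hardening baseline, sketched as a KubeletConfiguration fragment (exact values depend on your distribution and bootstrap tooling):

```yaml
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
authentication:
  anonymous:
    enabled: false            # reject unauthenticated kubelet API requests
  webhook:
    enabled: true             # delegate authentication to the API server
authorization:
  mode: Webhook               # delegate authorization to the API server
readOnlyPort: 0               # disable the legacy unauthenticated read-only port
protectKernelDefaults: true   # fail if kernel sysctls differ from kubelet expectations
```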

Filesystem and Data Protection

Mount filesystems read-only where possible (readOnlyRootFilesystem: true), use tmpfs or emptyDir for ephemeral state, and rely on KMS-backed encryption for any persistent volume that holds sensitive data.
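The read-only-root plus tmpfs-scratch pattern, as a sketch (names and image are illustrative):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: readonly-demo                       # illustrative name
spec:
  containers:
    - name: app
      image: registry.example.com/app:1.0   # placeholder image
      securityContext:
        readOnlyRootFilesystem: true        # the container image is immutable at runtime
      volumeMounts:
        - name: scratch
          mountPath: /tmp                   # writable scratch space only where needed
  volumes:
    - name: scratch
      emptyDir:
        medium: Memory                      # tmpfs: ephemeral, never written to node disk
```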


Confinement Frameworks: AppArmor vs SELinux vs Seccomp

These three frameworks address different layers of the same problem — what a process can do once it is running. They are complementary, not alternatives.

| Aspect | AppArmor | SELinux | Seccomp |
|---|---|---|---|
| Scope | Per-path file access, capabilities, networking | Per-label MAC across the whole system | Syscall filtering |
| Default on | Ubuntu, Debian, SUSE | RHEL, CentOS Stream, Fedora, Bottlerocket | Any modern Linux kernel |
| Policy unit | Profile per binary | Type Enforcement labels | BPF program / JSON profile |
| Ergonomics | Path-based, easier to author | Label-based, steeper learning curve | Syscall whitelist/blacklist; tooling support is good |
| Enforcement granularity | File paths, network, capabilities | Files, processes, network, IPC | Syscalls only |
| Kubernetes integration | securityContext.appArmorProfile (GA 1.31) | securityContext.seLinuxOptions | securityContext.seccompProfile |
| Bypass without disabling | Hard | Hard | Easy if the runtime does not enforce |

Best fit:

  • AppArmor for Ubuntu / Debian fleets that need per-binary path confinement.
  • SELinux for RHEL / Fedora / Bottlerocket clusters where label-based MAC is already first-class.
  • Seccomp on every workload, regardless of MAC choice — it is the cheapest and most portable layer.

Read more: AppArmor Profiles · Seccomp in Pods
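Stacked together in a Pod spec, the seccomp and AppArmor layers look like this (a sketch; pod name and image are placeholders):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: mac-demo                            # illustrative name
spec:
  securityContext:
    seccompProfile:
      type: RuntimeDefault                  # syscall-filtering layer, pod-wide
  containers:
    - name: app
      image: registry.example.com/app:1.0   # placeholder image
      securityContext:
        appArmorProfile:
          type: RuntimeDefault              # MAC layer; use type: Localhost plus
                                            # localhostProfile for a custom profile
```

On SELinux hosts, `securityContext.seLinuxOptions` fills the equivalent MAC slot.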


Minimal Node Distributions Compared

The host OS is a security control. Picking a distribution designed for container hosting removes packages, services, and update workflows that would otherwise have to be hardened manually.

| Distribution | Package Manager | Update Model | Notable Properties | Best Fit |
|---|---|---|---|---|
| Bottlerocket | None (image-based) | A/B partition rollback | Read-only root, SELinux on, container-runtime only | EKS, generic Kubernetes on AWS / VMware |
| Flatcar | Image-based | A/B partition rollback | systemd-managed, Ignition for provisioning | Successor to CoreOS Container Linux |
| Talos | None (API-driven) | A/B partition rollback | No SSH; managed entirely via gRPC API | Air-gapped or hardened bare-metal clusters |
| Ubuntu Minimal | apt | In-place upgrade | AppArmor on by default, smallest Ubuntu footprint | Mixed-use clusters that need standard tooling |
| RHEL CoreOS | rpm-ostree | Atomic upgrade | SELinux enforcing, immutable / atomic | OpenShift |

For new clusters, prefer an immutable distribution (Bottlerocket, Flatcar, Talos, RHEL CoreOS). The "no general-purpose package manager" property removes a whole class of post-exploitation tooling.


Sandboxed Runtimes: gVisor vs Kata Containers

Sandboxed runtimes add a second isolation boundary between the container and the host kernel. Use them on workloads that run untrusted or multi-tenant code.

| Aspect | gVisor (runsc) | Kata Containers |
|---|---|---|
| Isolation model | User-space kernel intercepts syscalls | Lightweight VM per pod (KVM / Firecracker / Cloud Hypervisor) |
| Performance overhead | Higher per-syscall cost; lower per-pod | Higher per-pod cost (VM boot); near-native syscall cost |
| Compatibility | Runs most workloads; some seccomp/ptrace gaps | Near-native — runs anything a VM can |
| Hardware requirements | Standard Linux | Nested virt or bare-metal KVM |
| Kubernetes integration | RuntimeClass: gvisor | RuntimeClass: kata |
| Best fit | Multi-tenant SaaS, untrusted code, lower throughput | High-isolation enterprise workloads, regulated multi-tenancy |

Read more: Pod Sandboxing
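Wiring a sandboxed runtime into a cluster takes two objects: a RuntimeClass that names an installed handler, and a pod that opts in. A sketch (handler name must match your containerd configuration; pod name and image are placeholders):

```yaml
# RuntimeClass pointing at a gVisor handler installed on the nodes.
apiVersion: node.k8s.io/v1
kind: RuntimeClass
metadata:
  name: gvisor
handler: runsc                              # must match the containerd runtime handler
---
apiVersion: v1
kind: Pod
metadata:
  name: sandboxed-demo                      # illustrative name
spec:
  runtimeClassName: gvisor                  # schedule into the sandboxed runtime
  containers:
    - name: app
      image: registry.example.com/app:1.0   # placeholder image
```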




Version-Specific Notes (Kubernetes 1.36)

The system-hardening surface has tightened in recent Kubernetes versions:

  • AppArmor via securityContext.appArmorProfile — GA in 1.31. The structured field replaces the legacy container.apparmor.security.beta.kubernetes.io/<container> annotation pattern.
  • seccompProfile.type: RuntimeDefault — the SeccompDefault feature gate went GA in 1.27. With the kubelet's seccompDefault option enabled, every container without an explicit override gets the runtime's default seccomp profile.
  • User namespaces (hostUsers: false) — GA in 1.33. In-pod root maps to an unprivileged UID on the host, neutralising many root-required exploit primitives.
  • Sidecar containers — GA since 1.33. Replaces ad-hoc shareProcessNamespace patterns for security sidecars (e.g., on-host log shippers, secret rotators).
  • Pod Sandboxing via RuntimeClass — Stable. RuntimeClass references an installed handler (runsc, kata) on each node; admission can require it for selected namespaces.
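Of these, user namespaces require the least workload change; the opt-in is a single field (pod name and image are placeholders):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: userns-demo                         # illustrative name
spec:
  hostUsers: false                          # in-pod UID 0 maps to an unprivileged host UID
  containers:
    - name: app
      image: registry.example.com/app:1.0   # placeholder image
```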

Always check the Kubernetes deprecation guide before upgrading; some kernel-related feature gates require matching kubelet flags.


Hardening Principles for the Host Layer

Secure by Default

Pick a distribution whose defaults already match your security posture. SELinux enforcing, AppArmor profiles loaded, seccomp RuntimeDefault, and read-only root filesystem should be on without operator intervention.

Least Privilege

Drop all Linux capabilities and re-add only what the workload requires. Run as a non-root UID, with runAsNonRoot: true and readOnlyRootFilesystem: true. Avoid hostNetwork, hostPID, hostIPC, and hostPath unless explicitly justified.

Defense in Depth

Stack syscall filtering (seccomp), MAC (AppArmor / SELinux), and runtime isolation (gVisor / Kata). A single bypass should not give an attacker the full host kernel surface.

Continuous Verification

Re-image nodes on a regular cadence rather than patching in place. Validate the host-level controls with kube-bench and watch for runtime drift via Falco or Tetragon.


Conclusion

System hardening closes the gap between admission-time policy and what a compromised pod can actually do at runtime. Combine a minimal node OS, dropped capabilities, seccomp, a MAC framework, and (for high-risk workloads) a sandboxed runtime. The articles linked above walk through each control with executable examples.