Monitoring, Logging, and Runtime Security
Required knowledge for the CKS certification.
Last reviewed: — verified against Kubernetes 1.36.
Monitoring, logging, and runtime security close the loop on every other layer in this site. By continuously collecting and analysing data from the cluster, operators can detect anomalies, unauthorized access, and active attacks before they escalate. This page is the head reference for Domain 6 of the CKS exam (Monitoring, Logging & Runtime Security, 20%) and covers the four practical pillars: audit logging, log aggregation, metrics, and runtime threat detection.
Why Runtime Security Matters
Static admission catches what is known to be bad at deploy time. Runtime security catches what was unknown then but is observable now: a process that should not exist in this container, a network connection to a country you do not operate in, a syscall pattern that matches a known exploit. The articles in this section cover the controls that produce that visibility:
- API server audit logs that record every authenticated call
- Centralised log aggregation across pods, nodes, and the control plane
- Runtime detection via eBPF or kernel modules
- Metric pipelines that turn anomalies into alerts before users notice
Pair this layer with the attack vector section to validate that the techniques you care about are actually visible in your tooling.
Runtime Domains at a Glance
| Domain | Primary Risk | Key Control | Reference |
|---|---|---|---|
| Audit logging | Cluster activity not traceable to an identity | API server audit policy with structured backend | Kubernetes Audit Logging |
| Runtime detection | Compromised pod operates undetected | Falco, Tetragon, or Tracee at the node level | Falco · Tetragon · Tracee |
| Log aggregation | Logs lost or unsearchable across the fleet | Centralised, immutable log store (Loki, ELK) | See "Log Aggregation" below |
| Metrics and alerting | Resource exhaustion or DoS goes unnoticed | Prometheus + Alertmanager | See "Metrics" below |
| Vulnerability scanning | New CVEs reach deployed images | Trivy Operator for continuous scans | Trivy |
Topics Covered in This Section
Audit Logging
Configure a structured audit policy on the API server, route logs to an immutable backend, and tune verbosity so the signal-to-noise ratio is acceptable. Audit logs are the only authoritative record of who did what to the cluster.
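A minimal policy sketch, assuming the standard `audit.k8s.io/v1` API; the specific rules and thresholds are illustrative and should be tuned to your own noise budget:

```yaml
# Illustrative audit policy: full bodies for Secret access, nothing for
# high-volume health probes, metadata for everything else.
apiVersion: audit.k8s.io/v1
kind: Policy
omitStages:
  - RequestReceived        # log only the completed call, not its arrival
rules:
  - level: RequestResponse # who read/wrote which Secret, with payloads
    resources:
      - group: ""
        resources: ["secrets"]
  - level: None            # drop super-chatty read-only probes
    nonResourceURLs: ["/healthz*", "/readyz*", "/livez*"]
  - level: Metadata        # identity + verb + object for everything else
```

Rule order matters: the first matching rule wins, so the catch-all `Metadata` rule goes last. The policy is referenced on the API server via `--audit-policy-file`, with a backend configured through flags such as `--audit-log-path`.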
Runtime Threat Detection
Run an eBPF or kernel-module agent on every node that observes process executions, network connections, and file activity inside containers. Use it to detect post-exploitation behaviour that admission policy cannot see.
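As a concrete sketch of what such an agent detects, here is an illustrative Falco-style rule (not taken from the default ruleset) that fires when an interactive shell starts inside a container; the field names follow Falco's standard syscall event fields:

```yaml
# Hypothetical rule: alert on interactive shells spawned in containers,
# a common post-exploitation signal that admission policy cannot see.
- rule: Shell spawned in container
  desc: An interactive shell was started inside a running container
  condition: >
    spawned_process and container and proc.name in (bash, sh, zsh, ash)
  output: >
    Shell in container (user=%user.name container=%container.name
    image=%container.image.repository command=%proc.cmdline)
  priority: WARNING
  tags: [container, shell]
```

Expect to tune rules like this per workload: build pipelines and debug sessions legitimately spawn shells, so exceptions belong in the rule, not in the alert-handling habit of ignoring it.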
Log Aggregation
Ship pod, node, and control-plane logs to a centralised, immutable store. Apply retention and rotation policies that match your compliance posture.
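A typical implementation is a node-level DaemonSet shipper. The sketch below uses Fluent Bit's classic config format with a Loki output; the service name `loki.logging.svc` and the label values are assumptions for illustration:

```ini
# Sketch of a node-level Fluent Bit pipeline: tail container logs,
# enrich with Kubernetes metadata, ship to a (hypothetical) Loki service.
[INPUT]
    Name    tail
    Path    /var/log/containers/*.log
    Tag     kube.*
    Parser  cri

[FILTER]
    Name    kubernetes
    Match   kube.*

[OUTPUT]
    Name    loki
    Match   kube.*
    Host    loki.logging.svc
    Port    3100
    Labels  job=fluent-bit, cluster=prod
```

Whatever shipper you choose, run it with the same scrutiny as any other privileged DaemonSet: it reads every log on the node.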
Metrics and Alerting
Use Prometheus for cluster and application metrics, Grafana for visualisation, and Alertmanager to page on the security-relevant signals (failed authn, audit-log gaps, runtime alerts, certificate expiry).
Key Articles
Runtime Detection: Falco vs Tetragon vs Tracee
Three actively maintained projects cover runtime detection in Kubernetes. The head-term page compares Falco vs Tetragon at a higher level; the table below extends that with Tracee for completeness.
| Aspect | Falco | Tetragon | Tracee |
|---|---|---|---|
| CNCF status | Graduated | Incubating | Sandbox |
| Maintainer | Falco / Sysdig | Isovalent (Cilium) | Aqua Security |
| Sensor | eBPF or kernel module | eBPF (in-kernel hooks) | eBPF |
| Primary purpose | Runtime threat alerting | Observability + in-kernel enforcement | Runtime detection + forensics |
| Rule format | YAML rules with expression DSL | TracingPolicy CRDs | Signatures (Rego / Go) |
| Enforcement | Alerts only | Can kill processes / send signals in-kernel | Alerts only |
| Best fit | SOC integration, mature ruleset | Cilium environments, in-kernel response | Forensic captures, signature library |
Read more: Falco · Tetragon · Tracee
Log Aggregation: Loki vs Elastic vs OpenSearch
Centralised log storage is non-negotiable for incident response. The three practical choices in the Kubernetes ecosystem are Grafana Loki, the Elastic Stack, and OpenSearch.
| Aspect | Grafana Loki | Elastic Stack | OpenSearch |
|---|---|---|---|
| Indexing model | Labels only; payload is grep-style | Full-text index of every field | Full-text index of every field |
| Storage cost | Lowest (label index + object storage) | Highest (full inverted index) | High (full inverted index) |
| Query language | LogQL (PromQL-style) | KQL / Elasticsearch DSL | KQL / OpenSearch DSL |
| Ecosystem fit | Native Grafana / Prometheus integration | Kibana, deep APM tooling | Kibana fork, AWS-native |
| License | AGPLv3 | Elastic License v2 (source-available) | Apache 2.0 |
| Best fit | Cost-sensitive, label-driven queries | Rich full-text search, large query workloads | AWS-managed deployments wanting Apache licensing |
For audit logs specifically, prefer write-once / immutable storage on top of any of the three (e.g., S3 object lock for the underlying bucket).
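Whichever store you choose, rehearse the incident-response queries before you need them. A hedged LogQL sketch against Loki, assuming audit events are shipped under a `job="apiserver-audit"` label (the label name and field paths are assumptions that depend on your shipping pipeline):

```logql
# Who deleted Secrets? LogQL's json parser flattens nested audit-event
# fields, so objectRef.resource becomes objectRef_resource.
{job="apiserver-audit"} | json | verb="delete", objectRef_resource="secrets"
```

Because Loki indexes only labels, the `json` parsing happens at query time; keep the label selector tight so the scan stays cheap.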
Metrics and Alerting: Prometheus + Alertmanager
The default open-source stack on Kubernetes:
- Prometheus scrapes cluster and application metrics at a configurable interval.
- Alertmanager routes alerts to PagerDuty / Slack / email and handles deduplication and silencing.
- Grafana visualises Prometheus data; pair with Loki for unified logs + metrics.
Security-relevant alerts to wire up on day one:
- API server audit-log gap (no events for N seconds)
- Spike in `apiserver_request_total{code=~"4.."}` — failed authn or authz
- Falco / Tetragon `WARN` and above
- Certificate expiry within 14 days (cluster CA, etcd, kubelet)
- Pod restart loops in `kube-system` or other security-critical namespaces
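The second alert in that list can be sketched as a prometheus-operator `PrometheusRule`; the threshold, window, and namespace are assumptions to tune against your baseline traffic:

```yaml
# Illustrative PrometheusRule for a 4xx spike on the API server.
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: security-alerts
  namespace: monitoring          # assumed monitoring namespace
spec:
  groups:
    - name: security
      rules:
        - alert: ApiServerAuthFailureSpike
          expr: sum(rate(apiserver_request_total{code=~"4.."}[5m])) > 10
          for: 5m                # sustained, not a single burst
          labels:
            severity: warning
          annotations:
            summary: Elevated 4xx rate on the API server (failed authn/authz)
```

The other alerts follow the same shape; the audit-log-gap alert is usually an `absent()` or `rate(...) == 0` expression over whatever metric your log pipeline exports.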
Version-Specific Notes (Kubernetes 1.36)
The runtime and observability surface has tightened in recent Kubernetes versions:
- Structured authentication and authorization configuration — GA in 1.30+. The new `AuthenticationConfiguration` and `AuthorizationConfiguration` files produce stable, easily audited identity configuration for the audit log to reference.
- KMS v2 encryption providers — GA since 1.29. Audit logs that record Secret access reflect KMS v2 events (key versioning, rotation) as first-class fields.
- Validating Admission Policy — GA since 1.30. CEL policies emit consistent admission decisions to the audit log without requiring a webhook to be reachable.
- Sidecar containers — GA since 1.33. Init containers with `restartPolicy: Always` are the supported pattern for log shippers and runtime agents that must outlive their target containers.
- Pod sandboxing via `RuntimeClass` — Stable. A namespace can require a sandboxed runtime; runtime detection tooling should be aware of which workloads are sandboxed and which share the host kernel.
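The sidecar pattern for log shippers can be sketched as follows; the image names and mount paths are placeholders:

```yaml
# Sketch: a log shipper as a restartable init container (sidecar), so it
# starts before and outlives the main application container.
apiVersion: v1
kind: Pod
metadata:
  name: app-with-log-shipper
spec:
  initContainers:
    - name: log-shipper
      image: example.com/log-shipper:latest   # placeholder image
      restartPolicy: Always                   # marks this init container as a sidecar
      volumeMounts:
        - name: app-logs
          mountPath: /var/log/app
  containers:
    - name: app
      image: example.com/app:latest           # placeholder image
      volumeMounts:
        - name: app-logs
          mountPath: /var/log/app
  volumes:
    - name: app-logs
      emptyDir: {}
```

Unlike a plain init container, the `restartPolicy: Always` sidecar keeps running for the pod's lifetime and is restarted independently if it crashes.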
Always check the Kubernetes deprecation guide before upgrading.
Hardening Principles for Runtime Operations
Secure by Default
Turn on audit logging on the first day of a cluster. Default-deny rules for runtime detection are easier to relax later than to introduce after an incident.
Least Privilege
The runtime agent itself is a privileged workload. Scope its RBAC tightly, run it under a dedicated ServiceAccount, and forward its findings to a backend outside the cluster it is monitoring, so a compromised cluster cannot erase the evidence of its own compromise.
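A minimal RBAC sketch for such an agent; the resource list is an assumption (most agents only need read access to pod and namespace metadata for event enrichment) and should be trimmed to what your agent actually queries:

```yaml
# Dedicated identity for the runtime agent with read-only metadata access.
apiVersion: v1
kind: ServiceAccount
metadata:
  name: runtime-agent
  namespace: runtime-security     # assumed dedicated namespace
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: runtime-agent-read
rules:
  - apiGroups: [""]
    resources: ["pods", "namespaces"]
    verbs: ["get", "list", "watch"]   # read-only: no create/update/delete
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: runtime-agent-read
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: runtime-agent-read
subjects:
  - kind: ServiceAccount
    name: runtime-agent
    namespace: runtime-security
```

The agent's privilege lives mostly at the node layer (eBPF, host PID); keeping its cluster-layer permissions read-only limits what an attacker gains by compromising it.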
Defense in Depth
Pair audit logging (cluster-layer) with runtime detection (node-layer) and metric alerting (cluster + workload). A bypass at one layer should still produce a signal in another.
Continuous Verification
Treat runtime alerts as actionable, not informational. Re-tune detection rules after every incident; treat sustained "WARN" noise as a bug to fix, not a baseline to live with.
Conclusion
Runtime security is the layer where every other control in this site is verified or invalidated. Stack audit logging, runtime detection, centralised logs, and metrics so that nothing happens in the cluster without an authoritative record. Combine the practices linked here with the attack vectors and cluster hardening sections to design end-to-end coverage.