Monitor cluster-wide Kubernetes events to track resource state changes, failures, and system-level activities across all namespaces.
Events are time-limited records of state changes and notable occurrences in your cluster. They provide visibility into what's happening behind the scenes with your resources.
Event Characteristics:
Access cluster-wide events from the Kubernetes overview:
| Column | Description |
|---|---|
| Type | Normal or Warning |
| Reason | Event reason code (Scheduled, Pulling, Failed, etc.) |
| Object | Affected resource (pod, node, deployment, etc.) |
| Namespace | Kubernetes namespace |
| Message | Human-readable event description |
| Count | How many times this event occurred |
| First Seen | When the event first occurred |
| Last Seen | Most recent occurrence |

Common Normal Events:

| Reason | Object Type | Meaning |
|---|---|---|
| Scheduled | Pod | Pod assigned to node successfully |
| Pulling | Pod | Downloading container image |
| Pulled | Pod | Container image downloaded successfully |
| Created | Pod | Container created |
| Started | Pod | Container started successfully |
| Killing | Pod | Container being terminated gracefully |
| ScalingReplicaSet | Deployment | Replica count adjusted |
| SuccessfulCreate | ReplicaSet | New pod created successfully |

Common Warning Events:

| Reason | Object Type | Meaning | Action Needed |
|---|---|---|---|
| Failed | Pod | Image pull, container creation, or container start failed | Read the event message; check the image name and container logs |
| BackOff | Pod | Container restarting after crash | Investigate crash cause (logs, events) |
| FailedScheduling | Pod | Cannot find node meeting requirements | Check resource requests, node capacity, taints/tolerations |
| FailedMount | Pod | Volume mount failed | Verify PVC exists and is bound |
| ImagePullBackOff | Pod | Cannot pull container image | Check image name, registry credentials |
| Unhealthy | Pod | Readiness/liveness probe failed | Check probe configuration and app health endpoint |
| Evicted | Pod | Pod removed due to resource pressure | Increase node capacity or reduce resource usage |
| FailedKillPod | Pod | Could not terminate pod | Check node status, may require manual cleanup |

Narrow the event list to find specific issues:

| Filter | Shows |
|---|---|
| Type = Normal | Successful operations and routine activity |
| Type = Warning | Errors, failures, and issues requiring attention |
| Object Kind = Pod | Pod events only |
| Object Kind = Node | Node-level events (NotReady, DiskPressure, etc.) |
| Namespace = production | Events in the production namespace only |
| Namespace = kube-system | System component events |
| Reason = FailedScheduling | All pod scheduling failures |
| Reason = BackOff | Container crash loops and image pull back-offs |
| Time range: Last 1 hour, Last 24 hours, Last 7 days | Events within the selected window |
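
To make the filter semantics concrete, here is a minimal Python sketch that applies the same filters to an in-memory list of events. The `Event` record and `filter_events` function are hypothetical (not part of ops0 or any Kubernetes client), and combining several filters with AND is an assumption about how the UI behaves.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Iterable, List, Optional


@dataclass
class Event:
    """Hypothetical flattened event record mirroring the event list columns."""
    type: str          # "Normal" or "Warning"
    reason: str        # e.g. "FailedScheduling"
    kind: str          # kind of the affected object, e.g. "Pod" or "Node"
    namespace: str
    last_seen: datetime


def filter_events(
    events: Iterable[Event],
    *,
    event_type: Optional[str] = None,
    kind: Optional[str] = None,
    namespace: Optional[str] = None,
    reason: Optional[str] = None,
    within: Optional[timedelta] = None,
    now: Optional[datetime] = None,
) -> List[Event]:
    """Keep events matching every filter that is set (assumed AND semantics)."""
    now = now or datetime.now(timezone.utc)
    matched = []
    for e in events:
        if event_type is not None and e.type != event_type:
            continue
        if kind is not None and e.kind != kind:
            continue
        if namespace is not None and e.namespace != namespace:
            continue
        if reason is not None and e.reason != reason:
            continue
        # Time-range filter: drop events last seen before the window start.
        if within is not None and now - e.last_seen > within:
            continue
        matched.append(e)
    return matched


now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
events = [
    Event("Warning", "FailedScheduling", "Pod", "production", now - timedelta(minutes=10)),
    Event("Warning", "BackOff", "Pod", "production", now - timedelta(hours=3)),
    Event("Normal", "Scheduled", "Pod", "kube-system", now - timedelta(minutes=5)),
]
# Warning events in the production namespace from the last hour.
recent = filter_events(events, event_type="Warning", namespace="production",
                       within=timedelta(hours=1), now=now)
print([e.reason for e in recent])
```

Passing `now` explicitly keeps the time-range filter deterministic, which is handy when replaying stored events.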

Troubleshooting FailedScheduling:

```text
Event:
  Type:    Warning
  Reason:  FailedScheduling
  Message: 0/5 nodes are available: 3 Insufficient memory,
           2 node(s) had taints that pod didn't tolerate.
```

Investigation Steps:

1. Insufficient memory: 3 nodes lack enough free memory for the pod's memory request. Solution: reduce resources.requests.memory or add nodes with more capacity.
2. Untolerated taints: the other 2 nodes have taints that prevent the pod from being scheduled. Solution: add matching tolerations to the pod spec or remove the taints from the nodes.

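
The summary in a FailedScheduling message already contains the per-node breakdown the steps above rely on. A rough Python sketch that splits it into per-reason node counts; `parse_failed_scheduling` is a hypothetical helper, and it assumes the exact wording shown in this example (other Kubernetes versions phrase the message differently, and a reason that itself contains commas would be split incorrectly).

```python
import re
from typing import Dict, Tuple


def parse_failed_scheduling(message: str) -> Tuple[int, int, Dict[str, int]]:
    """Split '0/5 nodes are available: 3 X, 2 Y.' into (available, total, {reason: nodes})."""
    m = re.match(r"\s*(\d+)/(\d+) nodes are available:\s*(.*)", message, re.DOTALL)
    if m is None:
        raise ValueError("message does not look like a FailedScheduling summary")
    available, total, rest = int(m.group(1)), int(m.group(2)), m.group(3)
    reasons: Dict[str, int] = {}
    # Each comma-separated part is assumed to be "<node count> <reason>".
    for part in rest.rstrip(". \n").split(","):
        part = " ".join(part.split())  # collapse line breaks and repeated spaces
        pm = re.match(r"(\d+) (.+)", part)
        if pm:
            reasons[pm.group(2)] = reasons.get(pm.group(2), 0) + int(pm.group(1))
    return available, total, reasons


msg = ("0/5 nodes are available: 3 Insufficient memory,\n"
       "2 node(s) had taints that pod didn't tolerate.")
available, total, reasons = parse_failed_scheduling(msg)
print(available, total, reasons)
```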

Example Fix:

```yaml
spec:
  containers:
    - name: app
      resources:
        requests:
          memory: "512Mi"  # Was 8Gi (too large)
  tolerations:             # Add if nodes are tainted
    - key: "dedicated"
      operator: "Equal"
      value: "production"
      effect: "NoSchedule"
```

Troubleshooting BackOff (Crash Loops):

```text
Event:
  Type:    Warning
  Reason:  BackOff
  Message: Back-off restarting failed container
  Count:   15
```
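
Investigation Steps:

1. Check logs: view the container logs to see why it is crashing (click the pod → Logs tab).
2. Check the exit code to identify how the container exited:
   - Exit Code 1: application error
   - Exit Code 137: killed by SIGKILL (128 + 9), most often OOMKilled (out of memory)
   - Exit Code 143: terminated by SIGTERM (128 + 15), a graceful shutdown request
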
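
The exit codes above follow the Unix convention that a process killed by signal N exits with 128 + N. A small Python sketch of that rule; `describe_exit_code` is a hypothetical helper, and treating 137 as an OOM kill is only a heuristic, since the container's terminated reason (e.g. OOMKilled) is the authoritative signal.

```python
import signal


def describe_exit_code(code: int) -> str:
    """Interpret a container exit code using the 128 + signal-number convention."""
    if code == 0:
        return "completed successfully"
    if code > 128:
        signum = code - 128
        try:
            name = signal.Signals(signum).name  # e.g. 15 -> "SIGTERM"
        except ValueError:
            name = f"signal {signum}"  # not a signal known on this platform
        hints = {
            9: "; most often OOMKilled (out of memory)",
            15: "; a graceful shutdown was requested",
        }
        return f"terminated by {name}{hints.get(signum, '')}"
    return f"application error (exit code {code})"


for code in (1, 137, 143):
    print(code, "->", describe_exit_code(code))
```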

Troubleshooting Image Pull Failures:

```text
Event:
  Type:    Warning
  Reason:  Failed
  Message: Failed to pull image "myapp:v2.0.1":
           rpc error: code = Unknown desc = Error response
           from daemon: pull access denied for myapp,
           repository does not exist or may require 'docker login'
```
Common Causes:
| Cause | Solution |
|---|---|
| Image name typo | Verify image name and tag are correct |
| Private registry | Add imagePullSecrets to pod spec |
| Image doesn't exist | Push image to registry or fix tag |
| Registry credentials expired | Update secret with new credentials |

Adding an Image Pull Secret:

```yaml
spec:
  imagePullSecrets:
    - name: registry-credentials
  containers:
    - name: app
      image: myregistry.com/myapp:v2.0.1
```
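
The registry-credentials Secret referenced above must have type kubernetes.io/dockerconfigjson and live in the same namespace as the pod. To show what it contains, here is a Python sketch that builds the manifest as a dict; the helper name, registry user, token, and namespace are placeholders, not values from this guide. `kubectl create secret docker-registry` produces a Secret of the same type.

```python
import base64
import json


def registry_secret(name: str, server: str, username: str, password: str,
                    namespace: str = "default") -> dict:
    """Build a kubernetes.io/dockerconfigjson Secret manifest (hypothetical helper)."""
    # The "auth" field is base64("username:password"), as in a Docker config file.
    auth = base64.b64encode(f"{username}:{password}".encode()).decode()
    docker_config = {"auths": {server: {"username": username,
                                        "password": password,
                                        "auth": auth}}}
    return {
        "apiVersion": "v1",
        "kind": "Secret",
        "metadata": {"name": name, "namespace": namespace},
        "type": "kubernetes.io/dockerconfigjson",
        # Values under "data" must be base64-encoded.
        "data": {".dockerconfigjson": base64.b64encode(
            json.dumps(docker_config).encode()).decode()},
    }


# Placeholder credentials; never commit real ones.
secret = registry_secret("registry-credentials", "myregistry.com",
                         "ci-bot", "example-token", namespace="production")
print(json.dumps(secret, indent=2))  # e.g. pipe into: kubectl apply -f -
```

When registry credentials expire, regenerating and re-applying the same Secret name updates it without touching the pod spec.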

Troubleshooting NodeNotReady:

```text
Event:
  Type:    Warning
  Reason:  NodeNotReady
  Object:  Node/ip-10-0-1-45
  Message: Node ip-10-0-1-45 status is now: NodeNotReady
```

Investigation:

1. Open the node detail page and check its conditions: Ready, DiskPressure, MemoryPressure, PIDPressure.
2. SSH to the node (if accessible) and follow the kubelet logs:

```shell
journalctl -u kubelet -f
```
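
On a healthy node, Ready is True and every pressure condition is False; a Ready status of Unknown usually means the kubelet has stopped reporting. The same check over a Node's status.conditions list, as a minimal Python sketch (`node_problems` is hypothetical, and the sample conditions are illustrative, not taken from ip-10-0-1-45):

```python
from typing import Dict, List

# Conditions that indicate a problem when their status is "True".
PRESSURE_CONDITIONS = {"DiskPressure", "MemoryPressure", "PIDPressure", "NetworkUnavailable"}


def node_problems(conditions: List[Dict[str, str]]) -> List[str]:
    """Return the unhealthy entries of a Node's status.conditions list."""
    problems = []
    for c in conditions:
        ctype, status = c["type"], c["status"]
        unhealthy = (ctype == "Ready" and status != "True") or (
            ctype in PRESSURE_CONDITIONS and status == "True")
        if unhealthy:
            problems.append(f"{ctype}={status} ({c.get('reason', 'no reason given')})")
    return problems


conditions = [
    {"type": "Ready", "status": "False", "reason": "KubeletNotReady"},
    {"type": "DiskPressure", "status": "True", "reason": "KubeletHasDiskPressure"},
    {"type": "MemoryPressure", "status": "False", "reason": "KubeletHasSufficientMemory"},
]
print(node_problems(conditions))
```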

Pattern: Scale-Up Blocked by FailedScheduling

Event Sequence:

```text
10:30:00  Normal   ScalingReplicaSet  Scaled up replica set to 3
10:30:05  Normal   SuccessfulCreate   Created pod: app-new-abc12
10:30:10  Warning  FailedScheduling   0/5 nodes available
10:30:15  Warning  FailedScheduling   0/5 nodes available
...
```

Diagnosis: New pods cannot be scheduled because no node has enough free resources.

Fix: Increase cluster capacity or reduce resource requests.
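
Whether reducing requests helps is simple arithmetic: the scheduler places pods by their requests against each node's unrequested allocatable capacity, not against live usage. A Python sketch that counts how many pods fit, assuming memory is the only constraint (CPU, taints, affinity, and pod-count limits are ignored) and using made-up free-memory figures:

```python
from typing import List


def pods_that_fit(free_mib_per_node: List[int], request_mib: int) -> int:
    """Count pods with a given memory request that fit on the listed nodes.

    free = node allocatable memory minus the memory requests of pods
    already scheduled there.
    """
    if request_mib <= 0:
        raise ValueError("request must be positive")
    # Each node holds floor(free / request) pods; a pod cannot span nodes.
    return sum(free // request_mib for free in free_mib_per_node)


free = [1024, 2048, 300]           # hypothetical free MiB on three schedulable nodes
print(pods_that_fit(free, 8192))   # an 8Gi request
print(pods_that_fit(free, 512))    # a 512Mi request
```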

Pattern: Missing Volume Claim

Event Sequence:

```text
10:25:00  Normal   Scheduled    Pod assigned to node-3
10:25:05  Normal   Pulling      Pulling image
10:25:20  Normal   Pulled       Image pulled
10:25:25  Warning  FailedMount  MountVolume.SetUp failed:
                                PVC "data-pvc" not found
```

Diagnosis: The PVC doesn't exist, or it exists in a different namespace from the pod.

Fix: Create the PVC in the pod's namespace, or correct the PVC name in the pod spec.
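
A missing or mis-namespaced claim can be caught before the pod is created by checking each persistentVolumeClaim volume against the claims in the pod's own namespace. A minimal Python sketch over plain dicts; `check_pvc_refs` and the shape of its lookup table are hypothetical (in practice the phases would come from listing PVCs in the cluster).

```python
from typing import Dict, List, Tuple


def check_pvc_refs(pod: dict, pvc_phases: Dict[Tuple[str, str], str]) -> List[str]:
    """Report PVC references in a pod manifest that are missing or not Bound.

    pvc_phases maps (namespace, claim name) -> PVC status.phase
    ("Pending", "Bound" or "Lost").
    """
    namespace = pod["metadata"].get("namespace", "default")
    issues = []
    for volume in pod["spec"].get("volumes", []):
        claim = volume.get("persistentVolumeClaim")
        if claim is None:
            continue  # not a PVC-backed volume
        name = claim["claimName"]
        # Claims are resolved only in the pod's own namespace.
        phase = pvc_phases.get((namespace, name))
        if phase is None:
            issues.append(f'PVC "{name}" not found in namespace "{namespace}"')
        elif phase != "Bound":
            issues.append(f'PVC "{name}" is {phase}, not Bound')
    return issues


pod = {
    "metadata": {"name": "app", "namespace": "production"},
    "spec": {"volumes": [{"name": "data",
                          "persistentVolumeClaim": {"claimName": "data-pvc"}}]},
}
# The claim exists, but only in another namespace.
print(check_pvc_refs(pod, {("staging", "data-pvc"): "Bound"}))
```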

Pattern: Liveness Probe Restarts

Event Sequence:

```text
10:40:00  Normal   Started    Container started
10:40:30  Warning  Unhealthy  Liveness probe failed:
                              HTTP probe failed with statuscode: 500
10:40:40  Warning  Unhealthy  Liveness probe failed (2nd time)
10:40:50  Warning  Unhealthy  Liveness probe failed (3rd time)
10:40:51  Normal   Killing    Killing container due to liveness probe failure
10:41:00  Normal   Pulled     Container image pulled
10:41:05  Normal   Started    Container restarted
```

Diagnosis: The liveness probe is too aggressive, or the app starts slowly and is killed before it becomes healthy.

Fix: Increase initialDelaySeconds or adjust failureThreshold. For slow-starting apps, a startupProbe can hold off liveness checks until startup completes.

```yaml
livenessProbe:
  httpGet:
    path: /healthz
    port: 8080
  initialDelaySeconds: 60  # Wait longer for the app to start
  periodSeconds: 10
  timeoutSeconds: 5
  failureThreshold: 3      # Restart after 3 consecutive failures
```
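
Probe settings translate directly into how long a failing container survives. A Python sketch of that arithmetic, approximate because it ignores timeoutSeconds and scheduling jitter. The function names are hypothetical, and the 30-second initial delay used for the timeline above is inferred from its timestamps, not stated in the events.

```python
def earliest_restart_s(initial_delay: int, period: int, failure_threshold: int) -> int:
    """Seconds from container start to a liveness restart when every probe fails.

    The first probe runs after initial_delay, later probes run `period`
    seconds apart, and the restart happens on the failure_threshold-th
    consecutive failure.
    """
    return initial_delay + (failure_threshold - 1) * period


def max_detection_s(period: int, failure_threshold: int) -> int:
    """Upper bound on how long a previously healthy app can fail before restart."""
    return failure_threshold * period


print(earliest_restart_s(30, 10, 3))  # timeline above: failures at +30s, +40s, +50s
print(earliest_restart_s(60, 10, 3))  # the recommended configuration
print(max_detection_s(10, 3))
```

With the recommended settings, an app that never becomes healthy is restarted about 80 seconds after starting instead of 50, which gives a slow startup more room.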

Event Retention:

- Kubernetes default: events expire after 1 hour (the default of the kube-apiserver --event-ttl flag).
- ops0: events are retained for 30 days for historical troubleshooting.

Benefits:
Events automatically feed into incident detection:

1. Kubernetes emits a warning event (e.g., BackOff for a container in CrashLoopBackOff).
2. ops0 detects the event pattern and creates an incident.
3. All related events are attached to the incident for context.
4. The event timeline shows the sequence leading up to the incident.

Example Incident Timeline:

```text
Incident #1247: CrashLoopBackOff in api-gateway
─────────────────────────────────────────────────
10:45:00  Normal   Started         Container started
10:45:15  Warning  Unhealthy       Readiness probe failed
10:45:15  Normal   Killing         Container exited (code 1)
10:45:30  Normal   Started         Container restarted (attempt 1)
10:45:45  Normal   Killing         Container exited (code 1)
10:46:00  Normal   Started         Container restarted (attempt 2)
10:46:15  Normal   Killing         Container exited (code 1)
10:46:30  Warning  BackOff         Back-off restarting failed container
10:47:00  -        IncidentCreated Incident #1247 opened
```
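
The exact rules ops0 uses to open an incident are not described here. As an illustration only, here is a Python sketch of one plausible crash-loop rule applied to events like those in the timeline above: flag an object that has a BackOff warning or at least two restarts. The `Event` record, the object key format, and the threshold are all assumptions.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class Event:
    time: str    # "HH:MM:SS", as in the timeline
    type: str    # "Normal" or "Warning"
    reason: str
    obj: str     # hypothetical object key, e.g. "pod/api-gateway"


def crash_looping(events: Iterable[Event], restart_threshold: int = 2) -> List[str]:
    """Flag objects with a BackOff warning or at least `restart_threshold` restarts.

    Restarts are counted as "Started" events after the first one per object.
    """
    starts: Counter = Counter()
    flagged = set()
    for e in events:
        if e.reason == "Started":
            starts[e.obj] += 1
        elif e.reason == "BackOff" and e.type == "Warning":
            flagged.add(e.obj)
    flagged.update(obj for obj, n in starts.items() if n - 1 >= restart_threshold)
    return sorted(flagged)


timeline = [
    Event("10:45:00", "Normal", "Started", "pod/api-gateway"),
    Event("10:45:05", "Normal", "Started", "pod/web"),
    Event("10:45:15", "Warning", "Unhealthy", "pod/api-gateway"),
    Event("10:45:30", "Normal", "Started", "pod/api-gateway"),
    Event("10:46:00", "Normal", "Started", "pod/api-gateway"),
    Event("10:46:30", "Warning", "BackOff", "pod/api-gateway"),
]
print(crash_looping(timeline))
```

Keying on both signals means an incident can open on the restart count alone, before Kubernetes starts emitting BackOff warnings.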