Events
Monitor cluster-wide Kubernetes events to track resource state changes, failures, and system-level activities across all namespaces.
What are Kubernetes Events?
Events are time-limited records of state changes and notable occurrences in your cluster. They provide visibility into what's happening behind the scenes with your resources.
Event Characteristics:
- Automatic: Kubernetes creates events automatically for resource state changes
- Temporary: Events expire after 1 hour by default (retained longer in ops0)
- Informational: Describe what happened, when, and why
- Diagnostic: Essential for troubleshooting pod failures and scheduling issues
Event Types
Every event is either Normal (routine, successful operations) or Warning (failures and conditions that need attention).
Viewing Events
Access cluster-wide events from the Kubernetes overview:
Event List
| Column | Description |
|---|---|
| Type | Normal or Warning |
| Reason | Event reason code (Scheduled, Pulling, Failed, etc.) |
| Object | Affected resource (pod, node, deployment, etc.) |
| Namespace | Kubernetes namespace |
| Message | Human-readable event description |
| Count | How many times this event occurred |
| First Seen | When event first occurred |
| Last Seen | Most recent occurrence |
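These columns map directly onto fields of the core v1 Event object itself. A representative example, with hypothetical pod and namespace names:

```yaml
apiVersion: v1
kind: Event
type: Warning                           # Maps to the Type column
reason: FailedScheduling                # Maps to the Reason column
involvedObject:                         # Maps to the Object column
  kind: Pod
  name: api-gateway-7d4b9c-x2k8f        # Hypothetical pod name
  namespace: production                 # Maps to the Namespace column
message: "0/5 nodes are available: 3 Insufficient memory."
count: 4                                # Maps to the Count column
firstTimestamp: "2024-01-15T10:30:00Z"  # First Seen
lastTimestamp: "2024-01-15T10:32:15Z"   # Last Seen
metadata:
  name: api-gateway-7d4b9c-x2k8f.17a8b3
  namespace: production
```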
Common Event Reasons
Normal Events (Success)
| Reason | Object Type | Meaning |
|---|---|---|
| Scheduled | Pod | Pod assigned to node successfully |
| Pulling | Pod | Downloading container image |
| Pulled | Pod | Container image downloaded successfully |
| Created | Pod | Container created |
| Started | Pod | Container started successfully |
| Killing | Pod | Container being terminated gracefully |
| ScalingReplicaSet | Deployment | Replica count adjusted |
| SuccessfulCreate | ReplicaSet | New pod created successfully |
Warning Events (Failures)
| Reason | Object Type | Meaning | Action Needed |
|---|---|---|---|
| Failed | Pod | Container exited with non-zero code | Check container logs for errors |
| BackOff | Pod | Container restarting after crash | Investigate crash cause (logs, events) |
| FailedScheduling | Pod | Cannot find node meeting requirements | Check resource requests, node capacity, taints/tolerations |
| FailedMount | Pod | Volume mount failed | Verify PVC exists and is bound |
| ImagePullBackOff | Pod | Cannot pull container image | Check image name, registry credentials |
| Unhealthy | Pod | Readiness/liveness probe failed | Check probe configuration and app health endpoint |
| Evicted | Pod | Pod removed due to resource pressure | Increase node capacity or reduce resource usage |
| FailedKillPod | Pod | Could not terminate pod | Check node status, may require manual cleanup |
Filtering Events
Narrow event list to find specific issues:
By Type
Normal Events Only:
Filter: Type = Normal
Shows: Successful operations, routine activities
Warning Events Only:
Filter: Type = Warning
Shows: Errors, failures, issues requiring attention
By Resource Type
Filter: Object Kind = Pod
Shows: Events related to pods only
Filter: Object Kind = Node
Shows: Node-level events (NotReady, DiskPressure, etc.)
By Namespace
Filter: Namespace = production
Shows: Events in production namespace only
Filter: Namespace = kube-system
Shows: System component events
By Reason
Filter: Reason = FailedScheduling
Shows: All pod scheduling failures
Filter: Reason = BackOff
Shows: All container crash loops
By Time Range
Filter: Last 1 hour
Filter: Last 24 hours
Filter: Last 7 days
Investigating Common Issues
Pod Won't Start: FailedScheduling
Event:

```
Type:    Warning
Reason:  FailedScheduling
Message: 0/5 nodes are available: 3 Insufficient memory,
         2 node(s) had taints that the pod didn't tolerate.
```
Investigation Steps:
Check Resource Requests
The pod is requesting more memory than any node has available.
Solution: Reduce resources.requests.memory or add nodes with more capacity.
Check Node Taints
Nodes have taints preventing pod scheduling.
Solution: Add tolerations to pod spec or remove taints from nodes.
Example Fix:

```yaml
spec:
  containers:
    - name: app
      resources:
        requests:
          memory: "512Mi"   # Was 8Gi (too large)
  tolerations:              # Add if nodes are tainted
    - key: "dedicated"
      operator: "Equal"
      value: "production"
      effect: "NoSchedule"
```
Container Crash Loop: BackOff
Event:

```
Type:    Warning
Reason:  BackOff
Message: Back-off restarting failed container
Count:   15
```
Investigation Steps:
Check Container Logs
View the logs to see why the container is crashing.
Command: Click pod → Logs tab
Check Exit Code
Identify how the container exited:
- Exit Code 1: Application error
- Exit Code 137: OOMKilled (out of memory)
- Exit Code 143: SIGTERM (graceful shutdown)
Fix Root Cause
- Application error: Fix code bug
- OOMKilled: Increase memory limit
- Missing dependency: Fix configuration or add init container
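For the OOMKilled case (exit code 137), the fix is a larger memory limit on the container. A minimal sketch; the limit value is illustrative:

```yaml
spec:
  containers:
    - name: app
      resources:
        requests:
          memory: "512Mi"
        limits:
          memory: "1Gi"   # Raised from a lower limit that triggered exit code 137
```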
Image Pull Failed: ImagePullBackOff
Event:

```
Type:    Warning
Reason:  Failed
Message: Failed to pull image "myapp:v2.0.1":
         rpc error: code = Unknown desc = Error response
         from daemon: pull access denied for myapp,
         repository does not exist or may require 'docker login'
```
Common Causes:
| Cause | Solution |
|---|---|
| Image name typo | Verify image name and tag are correct |
| Private registry | Add imagePullSecrets to pod spec |
| Image doesn't exist | Push image to registry or fix tag |
| Registry credentials expired | Update secret with new credentials |
Adding Image Pull Secret:

```yaml
spec:
  imagePullSecrets:
    - name: registry-credentials
  containers:
    - name: app
      image: myregistry.com/myapp:v2.0.1
```
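The registry-credentials secret referenced above can be defined as a dockerconfigjson secret. A sketch with placeholder credentials (the registry hostname and all credential values are assumptions to replace):

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: registry-credentials
type: kubernetes.io/dockerconfigjson
stringData:
  .dockerconfigjson: |
    {"auths": {"myregistry.com": {"username": "<user>", "password": "<password>"}}}
```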
Node Issues: NodeNotReady
Event:

```
Type:    Warning
Reason:  NodeNotReady
Object:  Node/ip-10-0-1-45
Message: Node ip-10-0-1-45 status is now: NodeNotReady
```
Investigation:
Check Node Conditions
View node detail page for conditions: DiskPressure, MemoryPressure, PIDPressure.
Check Node Logs
SSH to the node (if accessible) and check the kubelet logs:

```
journalctl -u kubelet -f
```
Common Fixes
- DiskPressure: Clean up unused images and logs
- MemoryPressure: Increase node size or evict pods
- Network issues: Check CNI plugin status
- Kubelet crash: Restart kubelet service
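DiskPressure and MemoryPressure conditions are driven by the kubelet's eviction thresholds, which can be tuned in the kubelet configuration. A sketch; the threshold values are illustrative, not recommendations:

```yaml
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
evictionHard:
  memory.available: "200Mi"   # Node reports MemoryPressure below this
  nodefs.available: "10%"     # Node reports DiskPressure below this
  imagefs.available: "15%"
```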
Event Patterns for Troubleshooting
Deployment Rollout Stuck
Event Sequence:

```
10:30:00  Normal   ScalingReplicaSet  Scaled up replica set to 3
10:30:05  Normal   SuccessfulCreate   Created pod: app-new-abc12
10:30:10  Warning  FailedScheduling   0/5 nodes available
10:30:15  Warning  FailedScheduling   0/5 nodes available
...
```
Diagnosis: New pods cannot schedule due to insufficient resources.
Fix: Increase cluster capacity or reduce resource requests.
Persistent Volume Issues
Event Sequence:

```
10:25:00  Normal   Scheduled    Pod assigned to node-3
10:25:05  Normal   Pulling      Pulling image
10:25:20  Normal   Pulled       Image pulled
10:25:25  Warning  FailedMount  MountVolume.SetUp failed:
                                PVC "data-pvc" not found
```
Diagnosis: PVC doesn't exist or is in different namespace.
Fix: Create PVC or fix PVC name in pod spec.
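If the PVC is genuinely missing, creating it in the pod's own namespace resolves the FailedMount. A minimal sketch; the namespace, access mode, and size are assumptions:

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: data-pvc          # Must match the claimName in the pod spec
  namespace: production   # Must be the pod's own namespace
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 10Gi       # Illustrative size
```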
Liveness Probe Killing Healthy Pods
Event Sequence:

```
10:40:00  Normal   Started    Container started
10:40:30  Warning  Unhealthy  Liveness probe failed:
                              HTTP probe failed with statuscode: 500
10:40:40  Warning  Unhealthy  Liveness probe failed (2nd time)
10:40:50  Warning  Unhealthy  Liveness probe failed (3rd time)
10:40:51  Normal   Killing    Killing container due to liveness probe failure
10:41:00  Normal   Pulled     Container image pulled
10:41:05  Normal   Started    Container restarted
```
Diagnosis: Liveness probe too aggressive or app has slow startup.
Fix: Increase initialDelaySeconds or adjust probe threshold.
```yaml
livenessProbe:
  httpGet:
    path: /healthz
    port: 8080
  initialDelaySeconds: 60   # Wait longer for app to start
  periodSeconds: 10
  timeoutSeconds: 5
  failureThreshold: 3       # Allow 3 failures before killing
```
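For apps with genuinely slow startup, a startupProbe is often a better fit than a large initialDelaySeconds: the liveness probe is suspended until the startup probe first succeeds, so slow boots don't trigger restarts. A sketch reusing the same endpoint:

```yaml
startupProbe:
  httpGet:
    path: /healthz
    port: 8080
  periodSeconds: 10
  failureThreshold: 30   # Tolerates up to 300s of startup before the pod is killed
livenessProbe:
  httpGet:
    path: /healthz
    port: 8080
  periodSeconds: 10
  failureThreshold: 3
```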
Event Retention
Kubernetes Default: Events expire after 1 hour.
ops0 Retention: Events retained for 30 days for historical troubleshooting.
Benefits:
- Investigate issues that occurred outside the 1-hour window
- Compare event patterns across deployments
- Track recurring issues over time
Using Events with Incidents
Events automatically feed into incident detection:
Event Occurs
Kubernetes emits a warning event (e.g., CrashLoopBackOff).
Incident Created
ops0 detects event pattern and creates incident.
Events Attached
All related events attached to incident for context.
Timeline Built
Event timeline shows sequence leading to incident.
Example Incident Timeline:

```
Incident #1247: CrashLoopBackOff in api-gateway
─────────────────────────────────────────────────
10:45:00  Normal   Started          Container started
10:45:15  Warning  Unhealthy        Readiness probe failed
10:45:15  Normal   Killing          Container exited (code 1)
10:45:30  Normal   Started          Container restarted (attempt 1)
10:45:45  Normal   Killing          Container exited (code 1)
10:46:00  Normal   Started          Container restarted (attempt 2)
10:46:15  Normal   Killing          Container exited (code 1)
10:46:30  Warning  BackOff          Back-off restarting failed container
10:47:00  -        IncidentCreated  Incident #1247 opened
```