
Events

Monitor cluster-wide Kubernetes events to track resource state changes, failures, and system-level activities across all namespaces.

What are Kubernetes Events?

Events are time-limited records of state changes and notable occurrences in your cluster. They provide visibility into what's happening behind the scenes with your resources.

Event Characteristics:

  • Automatic: Kubernetes creates events automatically for resource state changes
  • Temporary: Events expire after 1 hour by default (retained longer in ops0)
  • Informational: Describe what happened, when, and why
  • Diagnostic: Essential for troubleshooting pod failures and scheduling issues
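Outside the ops0 UI, the same events are exposed by the Kubernetes API and can be listed with kubectl (a minimal sketch, assuming kubectl is already configured against your cluster):

```shell
# List recent events across all namespaces, oldest first
kubectl get events --all-namespaces --sort-by=.metadata.creationTimestamp
```

Note that this only shows events still within the cluster's retention window (1 hour by default).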

Event Types

  • Normal: Expected state transitions (pod scheduled, image pulled, container started)
  • Warning: Issues and errors (failed health checks, evictions, scheduling failures)

Viewing Events

Access cluster-wide events from the Kubernetes overview:

Event List

Column      Description
Type        Normal or Warning
Reason      Event reason code (Scheduled, Pulling, Failed, etc.)
Object      Affected resource (pod, node, deployment, etc.)
Namespace   Kubernetes namespace
Message     Human-readable event description
Count       How many times this event occurred
First Seen  When event first occurred
Last Seen   Most recent occurrence

Common Event Reasons

Normal Events (Success)

Reason             Object Type  Meaning
Scheduled          Pod          Pod assigned to node successfully
Pulling            Pod          Downloading container image
Pulled             Pod          Container image downloaded successfully
Created            Pod          Container created
Started            Pod          Container started successfully
Killing            Pod          Container being terminated gracefully
ScalingReplicaSet  Deployment   Replica count adjusted
SuccessfulCreate   ReplicaSet   New pod created successfully

Warning Events (Failures)

Reason            Object Type  Meaning                                Action Needed
Failed            Pod          Container exited with non-zero code    Check container logs for errors
BackOff           Pod          Container restarting after crash       Investigate crash cause (logs, events)
FailedScheduling  Pod          Cannot find node meeting requirements  Check resource requests, node capacity, taints/tolerations
FailedMount       Pod          Volume mount failed                    Verify PVC exists and is bound
ImagePullBackOff  Pod          Cannot pull container image            Check image name, registry credentials
Unhealthy         Pod          Readiness/liveness probe failed        Check probe configuration and app health endpoint
Evicted           Pod          Pod removed due to resource pressure   Increase node capacity or reduce resource usage
FailedKillPod     Pod          Could not terminate pod                Check node status, may require manual cleanup

Filtering Events

Narrow the event list to find specific issues:

By Type

Normal Events Only:

Filter: Type = Normal
Shows: Successful operations, routine activities

Warning Events Only:

Filter: Type = Warning
Shows: Errors, failures, issues requiring attention

By Resource Type

Filter: Object Kind = Pod
Shows: Events related to pods only

Filter: Object Kind = Node
Shows: Node-level events (NotReady, DiskPressure, etc.)

By Namespace

Filter: Namespace = production
Shows: Events in production namespace only

Filter: Namespace = kube-system
Shows: System component events

By Reason

Filter: Reason = FailedScheduling
Shows: All pod scheduling failures

Filter: Reason = BackOff
Shows: All container crash loops

By Time Range

Filter: Last 1 hour
Filter: Last 24 hours
Filter: Last 7 days
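The type, reason, and object filters above can be approximated at the command line with kubectl field selectors (a sketch; event objects support only a limited set of selectable fields, and the pod name is an example):

```shell
# Warning events only
kubectl get events --all-namespaces --field-selector type=Warning

# Events for one pod in one namespace
kubectl get events -n production \
  --field-selector involvedObject.kind=Pod,involvedObject.name=my-pod

# All scheduling failures in a namespace
kubectl get events -n production --field-selector reason=FailedScheduling
```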

Investigating Common Issues

Pod Won't Start: FailedScheduling

Event:

Type:    Warning
Reason:  FailedScheduling
Message: 0/5 nodes are available: 3 Insufficient memory,
         2 node(s) had taints that pod didn't tolerate.

Investigation Steps:

Check Resource Requests

The pod is requesting more memory than any node has available.

Solution: Reduce resources.requests.memory or add nodes with more capacity.

Check Node Taints

Nodes have taints preventing pod scheduling.

Solution: Add tolerations to pod spec or remove taints from nodes.

Example Fix:

spec:
  containers:
  - name: app
    resources:
      requests:
        memory: "512Mi"  # Was 8Gi (too large)
  tolerations:  # Add if nodes are tainted
  - key: "dedicated"
    operator: "Equal"
    value: "production"
    effect: "NoSchedule"
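To confirm which constraint is actually blocking the pod, node capacity and taints can be inspected directly (a sketch; output format varies by kubectl version):

```shell
# Show taints and allocatable resources for each node
kubectl describe nodes | grep -E -A 2 'Taints|Allocatable'

# One line per node with its taints
kubectl get nodes -o custom-columns=NAME:.metadata.name,TAINTS:.spec.taints
```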

Container Crash Loop: BackOff

Event:

Type:    Warning
Reason:  BackOff
Message: Back-off restarting failed container
Count:   15

Investigation Steps:

Check Container Logs

View the logs to see why the container is crashing.

Command: Click pod → Logs tab

Check Exit Code

Identify how container exited.

  • Exit Code 1: Application error
  • Exit Code 137: OOMKilled (out of memory)
  • Exit Code 143: SIGTERM (graceful shutdown)

Fix Root Cause

  • Application error: Fix code bug
  • OOMKilled: Increase memory limit
  • Missing dependency: Fix configuration or add init container
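The exit-code mapping above can be captured in a small shell helper for triage scripts (a sketch; the function name is illustrative, and codes outside the three listed above vary by application):

```shell
# Map a container exit code to a likely cause (based on the codes above)
explain_exit_code() {
  case "$1" in
    1)   echo "Application error" ;;
    137) echo "OOMKilled (out of memory)" ;;        # 128 + 9 (SIGKILL)
    143) echo "SIGTERM (graceful shutdown)" ;;      # 128 + 15 (SIGTERM)
    *)   echo "Unknown: check application docs" ;;
  esac
}

explain_exit_code 137   # prints: OOMKilled (out of memory)
```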

Image Pull Failed: ImagePullBackOff

Event:

Type:    Warning
Reason:  Failed
Message: Failed to pull image "myapp:v2.0.1":
         rpc error: code = Unknown desc = Error response
         from daemon: pull access denied for myapp,
         repository does not exist or may require 'docker login'

Common Causes:

Cause                         Solution
Image name typo               Verify image name and tag are correct
Private registry              Add imagePullSecrets to pod spec
Image doesn't exist           Push image to registry or fix tag
Registry credentials expired  Update secret with new credentials

Adding Image Pull Secret:

spec:
  imagePullSecrets:
  - name: registry-credentials
  containers:
  - name: app
    image: myregistry.com/myapp:v2.0.1
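The registry-credentials secret referenced above can be created with kubectl (the registry server, username, and password are placeholders for your own registry):

```shell
# Create a docker-registry secret in the pod's namespace
kubectl create secret docker-registry registry-credentials \
  --docker-server=myregistry.com \
  --docker-username=<user> \
  --docker-password=<password>
```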

Node Issues: NodeNotReady

Event:

Type:    Warning
Reason:  NodeNotReady
Object:  Node/ip-10-0-1-45
Message: Node ip-10-0-1-45 status is now: NodeNotReady

Investigation:

Check Node Conditions

View node detail page for conditions: DiskPressure, MemoryPressure, PIDPressure.

Check Node Logs

SSH to node (if accessible) and check kubelet logs:

journalctl -u kubelet -f

Common Fixes

  • DiskPressure: Clean up unused images and logs
  • MemoryPressure: Increase node size or evict pods
  • Network issues: Check CNI plugin status
  • Kubelet crash: Restart kubelet service
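While investigating, a common remediation flow is to fence off the node so workloads move elsewhere (a sketch; the node name matches the example event above):

```shell
# Stop new pods landing on the node
kubectl cordon ip-10-0-1-45

# Evict existing pods (DaemonSet pods are skipped; emptyDir data is lost)
kubectl drain ip-10-0-1-45 --ignore-daemonsets --delete-emptydir-data

# Re-enable scheduling once the node recovers
kubectl uncordon ip-10-0-1-45
```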

Event Patterns for Troubleshooting

Deployment Rollout Stuck

Event Sequence:

10:30:00  Normal   ScalingReplicaSet  Scaled up replica set to 3
10:30:05  Normal   SuccessfulCreate   Created pod: app-new-abc12
10:30:10  Warning  FailedScheduling   0/5 nodes available
10:30:15  Warning  FailedScheduling   0/5 nodes available
...

Diagnosis: New pods cannot schedule due to insufficient resources.

Fix: Increase cluster capacity or reduce resource requests.
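Rollout progress and the blocking pod's events can also be checked from the CLI (the deployment and pod names come from the example sequence above):

```shell
# Watch the rollout; reports if new pods never become available
kubectl rollout status deployment/app --timeout=120s

# Show just the Events section for the stuck pod
kubectl describe pod app-new-abc12 | sed -n '/Events:/,$p'
```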

Persistent Volume Issues

Event Sequence:

10:25:00  Normal   Scheduled       Pod assigned to node-3
10:25:05  Normal   Pulling         Pulling image (success)
10:25:20  Normal   Pulled          Image pulled
10:25:25  Warning  FailedMount     MountVolume.SetUp failed:
                                   PVC "data-pvc" not found

Diagnosis: PVC doesn't exist or is in different namespace.

Fix: Create PVC or fix PVC name in pod spec.
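To verify the claim exists and is bound in the pod's namespace (a sketch; "data-pvc" comes from the event above, and the namespace is an assumption):

```shell
# Check the PVC exists in the same namespace as the pod and is Bound
kubectl get pvc data-pvc -n production

# If it's missing, look everywhere to catch a namespace mismatch
kubectl get pvc --all-namespaces | grep data-pvc
```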

Liveness Probe Killing Healthy Pods

Event Sequence:

10:40:00  Normal   Started          Container started
10:40:30  Warning  Unhealthy        Liveness probe failed:
                                    HTTP probe failed with statuscode: 500
10:40:40  Warning  Unhealthy        Liveness probe failed (2nd time)
10:40:50  Warning  Unhealthy        Liveness probe failed (3rd time)
10:40:51  Normal   Killing          Killing container due to liveness probe failure
10:41:00  Normal   Pulled           Container image pulled
10:41:05  Normal   Started          Container restarted

Diagnosis: Liveness probe too aggressive or app has slow startup.

Fix: Increase initialDelaySeconds or adjust probe threshold.

livenessProbe:
  httpGet:
    path: /healthz
    port: 8080
  initialDelaySeconds: 60  # Wait longer for app to start
  periodSeconds: 10
  timeoutSeconds: 5
  failureThreshold: 3  # Allow 3 failures before killing

Event Retention

Kubernetes Default: Events expire after 1 hour.

ops0 Retention: Events retained for 30 days for historical troubleshooting.

Benefits:

  • Investigate issues that occurred outside the 1-hour window
  • Compare event patterns across deployments
  • Track recurring issues over time

Using Events with Incidents

Events automatically feed into incident detection:

Event Occurs

Kubernetes emits a warning event (e.g., BackOff from a crash-looping container).

Incident Created

ops0 detects event pattern and creates incident.

Events Attached

All related events attached to incident for context.

Timeline Built

Event timeline shows sequence leading to incident.

Example Incident Timeline:

Incident #1247: CrashLoopBackOff in api-gateway
─────────────────────────────────────────────────
10:45:00  Normal   Started          Container started
10:45:15  Warning  Unhealthy        Readiness probe failed
10:45:15  Normal   Killing          Container exited (code 1)
10:45:30  Normal   Started          Container restarted (attempt 1)
10:45:45  Normal   Killing          Container exited (code 1)
10:46:00  Normal   Started          Container restarted (attempt 2)
10:46:15  Normal   Killing          Container exited (code 1)
10:46:30  Warning  BackOff          Back-off restarting failed container
10:47:00  -        IncidentCreated  Incident #1247 opened

Best Practices

Event Monitoring Tips
  • Check events first - Before diving into logs, check events for high-level context
  • Filter by Warning - Focus on warning events to find issues quickly
  • Watch for patterns - Recurring events indicate systemic issues vs one-off failures
  • Combine with logs - Events show what happened, logs show why
  • Set up alerts - Configure Slack/PagerDuty notifications for critical event patterns
  • Review after deployments - Check for unusual events after rolling out changes