# Connect a Kubernetes Cluster
Connect your first cluster, explore what's running, and use AI to investigate a real incident.
## Scenario
You have a Kubernetes cluster running on AWS (EKS), GCP (GKE), Azure (AKS), or self-managed. You want to:
- Connect it to ops0 for visibility
- See pods, deployments, events, and resource usage
- Use AI to investigate pod failures or incidents
## Prerequisites
## Step 1: Add the Cluster
## Step 2: Enter Connection Details
### Amazon EKS
| Field | What to enter |
|---|---|
| Cluster Name | EKS cluster name (from AWS console) |
| Region | AWS region where the cluster runs |
| IAM Role ARN | Role with eks:DescribeCluster and eks:ListClusters permissions |
| External ID | Optional, for cross-account role assumption |
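For cross-account access, the IAM role you provide must trust ops0 to assume it. A minimal trust-policy sketch; the account ID and external ID here are placeholders, not real values (substitute whatever the connect dialog shows you):

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": { "AWS": "arn:aws:iam::<OPS0_ACCOUNT_ID>:root" },
      "Action": "sts:AssumeRole",
      "Condition": {
        "StringEquals": { "sts:ExternalId": "<YOUR_EXTERNAL_ID>" }
      }
    }
  ]
}
```

The `sts:ExternalId` condition is the standard AWS pattern for guarding cross-account role assumption; omit it only if you leave the External ID field blank.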
### Google GKE
| Field | What to enter |
|---|---|
| Cluster Name | GKE cluster name |
| Project ID | GCP project ID |
| Location | Region or zone where the cluster runs |
| Service Account JSON | GCP service account key with the Kubernetes Engine Viewer role |
### Azure AKS
| Field | What to enter |
|---|---|
| Cluster Name | AKS cluster name |
| Resource Group | Resource group containing the cluster |
| Subscription ID | Azure subscription ID |
| Tenant ID | Azure AD tenant ID |
| Client ID / Secret | Service principal credentials with AKS read access |
### Manual / Self-Managed
Paste your kubeconfig directly. ops0 will sanitize it to remove local credential helper references that wouldn't work server-side.
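That sanitization step can be pictured roughly as follows. This is a hypothetical sketch of the idea, not ops0's actual implementation, and it assumes the kubeconfig YAML has already been parsed into a dict:

```python
def sanitize_kubeconfig(config: dict) -> dict:
    """Strip local credential-helper references from a parsed kubeconfig.

    exec plugins (e.g. ones that shell out to a cloud CLI for a token) and
    legacy auth-provider blocks invoke binaries on *your* machine, so they
    cannot work when the kubeconfig is used server-side.
    Hypothetical sketch -- not ops0's actual code.
    """
    for entry in config.get("users", []):
        user = entry.get("user", {})
        user.pop("exec", None)           # remove exec credential plugins
        user.pop("auth-provider", None)  # remove legacy auth-provider blocks
    return config
```

Static credentials such as tokens and client certificates are left untouched, since those work anywhere.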
## Step 3: Verify the Connection
Click Connect. ops0 will:
- Create a Kubernetes client from your credentials
- Fetch cluster info (API server, version)
- List nodes, namespaces, and a sample of pods
- Show a success confirmation with cluster summary
If the connection fails, the error message will tell you exactly what went wrong (auth failure, network unreachable, insufficient permissions).
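The categorization behind those error messages can be sketched like this; `explain_connect_error` and its parameters are assumptions for illustration, not ops0's actual code:

```python
from typing import Optional

def explain_connect_error(status: Optional[int], network_error: bool = False) -> str:
    """Map a failed connection attempt to a human-readable cause.

    Hypothetical sketch of the three failure categories the docs
    mention: auth failure, network unreachable, insufficient permissions.
    """
    if network_error:
        return "network unreachable: check the API server endpoint and firewall rules"
    if status == 401:
        return "auth failure: the API server rejected the credentials"
    if status == 403:
        return "insufficient permissions: the credentials lack the required RBAC verbs"
    return f"unexpected error (HTTP {status})"
```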
## Step 4: Explore the Cluster
After connecting, the cluster dashboard opens automatically.
From the sidebar, explore Pods, Nodes, Workloads, Networking, and Storage.
## Step 5: View Pod Logs
## Step 6: Investigate an Incident (When One Exists)
ops0 monitors your cluster every 60 seconds. When it detects a problem — a pod in CrashLoopBackOff, high restart counts, or FailedScheduling events — it automatically creates an incident.
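The detection pass can be sketched as a scan over pod statuses. `PodStatus` and the 5-restart threshold are assumptions for illustration (the threshold borrows from the P2 severity rule), not ops0's actual implementation:

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

@dataclass
class PodStatus:
    name: str
    waiting_reason: Optional[str]  # e.g. "CrashLoopBackOff", or None if running
    restarts: int

def detect_problems(pods: List[PodStatus],
                    restart_threshold: int = 5) -> List[Tuple[str, str]]:
    """Return (pod, reason) pairs that would open an incident.

    Hypothetical sketch of the per-cycle checks: a pod stuck in
    CrashLoopBackOff, or one whose restart count exceeds the threshold.
    """
    problems = []
    for p in pods:
        if p.waiting_reason == "CrashLoopBackOff":
            problems.append((p.name, "CrashLoopBackOff"))
        elif p.restarts > restart_threshold:
            problems.append((p.name, f"high restart count ({p.restarts})"))
    return problems
```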
To view incidents:
What the AI analysis returns:
| Section | Content |
|---|---|
| Root Cause | What ops0 believes is causing the issue based on events and logs |
| kubectl commands | Specific commands to investigate further |
| Recommendations | Kyverno policies to prevent this class of issue |
| Runbook | Step-by-step manual remediation instructions |
Incident severities:
| Severity | Trigger |
|---|---|
| P1 | Pod restart count > 10, cluster-wide failures |
| P2 | Pod restart count > 5, deployment degraded |
| P3 | Warning events, scheduling delays |
Incidents auto-resolve when the triggering condition clears for 10+ minutes.
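The severity table and the auto-resolve window reduce to a small amount of logic; a sketch under the rules stated above, not ops0's implementation:

```python
from datetime import datetime, timedelta
from typing import Optional

def classify_severity(restart_count: int, cluster_wide_failure: bool = False) -> str:
    """Map the triggers from the severity table to P1/P2/P3 (sketch)."""
    if cluster_wide_failure or restart_count > 10:
        return "P1"
    if restart_count > 5:
        return "P2"
    return "P3"  # warning events, scheduling delays

def should_auto_resolve(condition_cleared_at: Optional[datetime],
                        now: datetime,
                        grace: timedelta = timedelta(minutes=10)) -> bool:
    """An incident auto-resolves once its trigger has been clear for
    10+ minutes; None means the condition is still firing."""
    return condition_cleared_at is not None and now - condition_cleared_at >= grace
```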
## Verification
After connecting and exploring, verify:
- Cluster appears in Kubernetes → Clusters with status Connected
- Pods list populates with your namespaces and pods
- Events tab shows recent cluster events
- Monitoring is active — the cluster shows last-checked timestamp updating
## Next Steps
## Troubleshooting
- For EKS: verify the IAM role has the eks:DescribeCluster permission and that its trust policy allows ops0 to assume it.
- For GKE: the service account needs the Kubernetes Engine Viewer IAM role.
- For self-managed clusters: the kubeconfig credentials need the get, list, and watch verbs on pods across all namespaces.