ops0

Connect a Kubernetes Cluster

Connect your first cluster, explore what's running, and use AI to investigate a real incident.


Scenario

You have a Kubernetes cluster running on AWS (EKS), GCP (GKE), Azure (AKS), or self-managed. You want to:

  • Connect it to ops0 for visibility
  • See pods, deployments, events, and resource usage
  • Use AI to investigate pod failures or incidents

Prerequisites

  • A running Kubernetes cluster (EKS, GKE, AKS, or kubeconfig-accessible)
  • For EKS: AWS integration connected. For GKE: GCP integration connected. For AKS: Azure integration connected.
  • For manual/self-managed: a kubeconfig with cluster-admin or equivalent read access

Step 1: Add the Cluster

  1. Click Kubernetes in the left sidebar
  2. Click Add Cluster
  3. Select your cluster type

Step 2: Enter Connection Details

Amazon EKS

  • Cluster Name: EKS cluster name (from the AWS console)
  • Region: AWS region where the cluster runs
  • IAM Role ARN: role with eks:DescribeCluster and eks:ListClusters permissions
  • External ID: optional, for cross-account role assumption
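
The two EKS permissions above can be granted with a minimal IAM policy attached to the role. A sketch (any further resource scoping is up to you):

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "eks:DescribeCluster",
        "eks:ListClusters"
      ],
      "Resource": "*"
    }
  ]
}
```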

Google GKE

  • Cluster Name: GKE cluster name
  • Project ID: GCP project ID
  • Location: region or zone where the cluster runs
  • Service Account JSON: GCP service account key with the Kubernetes Engine Viewer role

Azure AKS

  • Cluster Name: AKS cluster name
  • Resource Group: resource group containing the cluster
  • Subscription ID: Azure subscription ID
  • Tenant ID: Azure AD tenant ID
  • Client ID / Secret: service principal credentials with AKS read access

Manual / Self-Managed

Paste your kubeconfig directly. ops0 will sanitize it to remove local credential helper references that wouldn't work server-side.
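
The sanitization can be pictured as stripping `exec`-based credential plugins (e.g. `aws eks get-token`) and legacy `auth-provider` entries from each user, since those depend on binaries that only exist on your laptop. A rough Python sketch of the idea, operating on the kubeconfig as an already-parsed dict (the function name is illustrative, not ops0's actual implementation; field names follow the standard kubeconfig schema):

```python
def sanitize_kubeconfig(config: dict) -> dict:
    """Remove client-side credential helpers that cannot run server-side.

    Kubeconfig 'users' entries may reference an external 'exec' plugin or a
    legacy 'auth-provider'; both shell out to local binaries, so a server
    drops them and relies on static credentials (token, client certs).
    """
    for entry in config.get("users", []):
        user = entry.get("user", {})
        user.pop("exec", None)           # e.g. aws eks get-token
        user.pop("auth-provider", None)  # e.g. legacy gcloud auth plugin
    return config


# Example: a user entry mixing a static token with an exec plugin.
cfg = {
    "users": [
        {
            "name": "dev",
            "user": {
                "token": "abc123",
                "exec": {"command": "aws", "args": ["eks", "get-token"]},
            },
        }
    ]
}
clean = sanitize_kubeconfig(cfg)
print(clean["users"][0]["user"])  # {'token': 'abc123'}
```

A real implementation would parse the pasted YAML first (e.g. with PyYAML); the dict manipulation is the same.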


Step 3: Verify the Connection

Click Connect. ops0 will:

  1. Create a Kubernetes client from your credentials
  2. Fetch cluster info (API server, version)
  3. List nodes, namespaces, and a sample of pods
  4. Show a success confirmation with cluster summary

If the connection fails, the error message will tell you exactly what went wrong (auth failure, network unreachable, insufficient permissions).


Step 4: Explore the Cluster

After connecting, the cluster dashboard opens automatically.

What you see immediately:

  • Node health: all nodes with Ready status, CPU and memory usage, Kubernetes version
  • Workload inventory: Deployments, StatefulSets, DaemonSets, Jobs — counts, replicas, health
  • Pod status: all pods across namespaces with restart counts, status, and age
  • Recent events: warning events that indicate scheduling failures, crashes, or resource pressure

Navigate through the sidebar to explore Pods, Nodes, Workloads, Networking, and Storage.


Step 5: View Pod Logs

  1. Go to Kubernetes → Pods
  2. Click any pod
  3. Click the Logs tab — logs stream in real time
  4. Click the Terminal tab to open a live shell in the container

Step 6: Investigate an Incident (When One Exists)

ops0 monitors your cluster every 60 seconds. When it detects a problem — a pod in CrashLoopBackOff, high restart counts, or FailedScheduling events — it automatically creates an incident.

To view incidents:

  1. Go to Kubernetes → Incidents
  2. Click an open incident
  3. Click Analyze with AI

What the AI analysis returns:

  • Root Cause: what ops0 believes is causing the issue, based on events and logs
  • kubectl commands: specific commands to investigate further
  • Recommendations: Kyverno policies to prevent this class of issue
  • Runbook: step-by-step manual remediation instructions
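
For a flavor of what a recommended Kyverno policy can look like, here is a minimal, hypothetical policy requiring CPU and memory limits on every container — a common guard against resource-pressure incidents. The policy name and Audit mode are illustrative:

```yaml
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: require-resource-limits   # illustrative name
spec:
  validationFailureAction: Audit  # report violations without blocking
  rules:
    - name: check-container-limits
      match:
        any:
          - resources:
              kinds:
                - Pod
      validate:
        message: "CPU and memory limits are required."
        pattern:
          spec:
            containers:
              - resources:
                  limits:
                    memory: "?*"
                    cpu: "?*"
```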

Incident severities:

  • P1: pod restart count > 10, cluster-wide failures
  • P2: pod restart count > 5, deployment degraded
  • P3: warning events, scheduling delays

Incidents auto-resolve when the triggering condition clears for 10+ minutes.


Verification

After connecting and exploring, verify:

  1. Cluster appears in Kubernetes → Clusters with status Connected
  2. Pods list populates with your namespaces and pods
  3. Events tab shows recent cluster events
  4. Monitoring is active — the cluster shows last-checked timestamp updating

Troubleshooting

Connection fails: "Unauthorized"
For EKS: verify the IAM role has eks:DescribeCluster permission and the role trust policy allows ops0 to assume it. For GKE: the service account needs the Kubernetes Engine Viewer IAM role.
Pods list is empty
The service account or credentials may lack permission to list pods. Ensure the credentials have get, list, and watch verbs on pods across all namespaces.
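
If you manage RBAC yourself, a ClusterRole granting exactly those verbs, bound to the service account your credentials map to, might look like this (the role and account names are illustrative):

```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: ops0-pod-reader        # illustrative name
rules:
  - apiGroups: [""]            # "" is the core API group (pods live here)
    resources: ["pods"]
    verbs: ["get", "list", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: ops0-pod-reader-binding
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: ops0-pod-reader
subjects:
  - kind: ServiceAccount
    name: ops0                 # illustrative; use your actual account
    namespace: default
```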
No incidents showing despite CrashLoopBackOff pods
Incident monitoring runs every 60 seconds. Wait up to 2 minutes for the first scan. If still no incident, verify the cluster has monitoring enabled in the cluster settings.