# Connect a Kubernetes Cluster
Connect your first cluster, explore what's running, and use AI to investigate a real incident.
## Scenario
You have a Kubernetes cluster running on AWS (EKS), GCP (GKE), Azure (AKS), or self-managed. You want to:
- Connect it to ops0 for visibility
- See pods, deployments, events, and resource usage
- Use AI to investigate pod failures or incidents
## Prerequisites
## Step 1: Add the Cluster
## Step 2: Enter Connection Details
### Amazon EKS
| Field | What to enter |
|---|---|
| Cluster Name | EKS cluster name (from AWS console) |
| Region | AWS region where the cluster runs |
| IAM Role ARN | Role with eks:DescribeCluster and eks:ListClusters permissions |
| External ID | Optional, for cross-account role assumption |
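For cross-account access, the IAM role you provide must trust ops0 to assume it. A minimal trust-policy sketch; the account ID and external ID here are placeholders, not real values (substitute whatever the connect dialog shows you):

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": { "AWS": "arn:aws:iam::<OPS0_ACCOUNT_ID>:root" },
      "Action": "sts:AssumeRole",
      "Condition": {
        "StringEquals": { "sts:ExternalId": "<YOUR_EXTERNAL_ID>" }
      }
    }
  ]
}
```

The `sts:ExternalId` condition is the standard AWS pattern for guarding cross-account role assumption; omit it only if you leave the External ID field blank.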
### Google GKE
| Field | What to enter |
|---|---|
| Cluster Name | GKE cluster name |
| Project ID | GCP project ID |
| Location | Region or zone where the cluster runs |
| Service Account JSON | GCP service account key with the Kubernetes Engine Viewer role |
### Azure AKS
| Field | What to enter |
|---|---|
| Cluster Name | AKS cluster name |
| Resource Group | Resource group containing the cluster |
| Subscription ID | Azure subscription ID |
| Tenant ID | Azure AD tenant ID |
| Client ID / Secret | Service principal credentials with AKS read access |
### Manual / Self-Managed
Paste your kubeconfig directly. ops0 will sanitize it to remove local credential helper references that wouldn't work server-side.
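That sanitization step can be pictured roughly as follows. This is a hypothetical sketch of the idea, not ops0's actual implementation, and it assumes the kubeconfig YAML has already been parsed into a dict:

```python
def sanitize_kubeconfig(config: dict) -> dict:
    """Strip local credential-helper references from a parsed kubeconfig.

    exec plugins (e.g. ones that shell out to a cloud CLI for a token) and
    legacy auth-provider blocks invoke binaries on *your* machine, so they
    cannot work when the kubeconfig is used server-side.
    Hypothetical sketch -- not ops0's actual code.
    """
    for entry in config.get("users", []):
        user = entry.get("user", {})
        user.pop("exec", None)           # remove exec credential plugins
        user.pop("auth-provider", None)  # remove legacy auth-provider blocks
    return config
```

Static credentials such as tokens and client certificates are left untouched, since those work anywhere.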
## Step 3: Verify the Connection
Click Connect. ops0 will:
- Create a Kubernetes client from your credentials
- Fetch cluster info (API server, version)
- List nodes, namespaces, and a sample of pods
- Show a success confirmation with cluster summary
If the connection fails, the error message will tell you exactly what went wrong (auth failure, network unreachable, insufficient permissions).
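The categorization behind those error messages can be sketched like this; `explain_connect_error` and its parameters are assumptions for illustration, not ops0's actual code:

```python
from typing import Optional

def explain_connect_error(status: Optional[int], network_error: bool = False) -> str:
    """Map a failed connection attempt to a human-readable cause.

    Hypothetical sketch of the three failure categories the docs
    mention: auth failure, network unreachable, insufficient permissions.
    """
    if network_error:
        return "network unreachable: check the API server endpoint and firewall rules"
    if status == 401:
        return "auth failure: the API server rejected the credentials"
    if status == 403:
        return "insufficient permissions: the credentials lack the required RBAC verbs"
    return f"unexpected error (HTTP {status})"
```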
## Step 4: Explore the Cluster
After connecting, the cluster dashboard opens automatically.
From the sidebar, explore Pods, Nodes, Workloads, Networking, and Storage.
## Step 5: View Pod Logs
## Step 6: Investigate an Incident (When One Exists)
ops0 monitors your cluster every 60 seconds. When it detects a problem — a pod in CrashLoopBackOff, high restart counts, or FailedScheduling events — it automatically creates an incident.
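The detection pass can be sketched as a scan over pod statuses. `PodStatus` and the 5-restart threshold are assumptions for illustration (the threshold borrows from the P2 severity rule), not ops0's actual implementation:

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

@dataclass
class PodStatus:
    name: str
    waiting_reason: Optional[str]  # e.g. "CrashLoopBackOff", or None if running
    restarts: int

def detect_problems(pods: List[PodStatus],
                    restart_threshold: int = 5) -> List[Tuple[str, str]]:
    """Return (pod, reason) pairs that would open an incident.

    Hypothetical sketch of the per-cycle checks: a pod stuck in
    CrashLoopBackOff, or one whose restart count exceeds the threshold.
    """
    problems = []
    for p in pods:
        if p.waiting_reason == "CrashLoopBackOff":
            problems.append((p.name, "CrashLoopBackOff"))
        elif p.restarts > restart_threshold:
            problems.append((p.name, f"high restart count ({p.restarts})"))
    return problems
```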
To view incidents:
What the AI analysis returns:
| Section | Content |
|---|---|
| Root Cause | What ops0 believes is causing the issue based on events and logs |
| kubectl commands | Specific commands to investigate further |
| Recommendations | Kyverno policies to prevent this class of issue |
| Runbook | Step-by-step manual remediation instructions |
Incident severities:
| Severity | Trigger |
|---|---|
| P1 | Pod restart count > 10, cluster-wide failures |
| P2 | Pod restart count > 5, deployment degraded |
| P3 | Warning events, scheduling delays |
Incidents auto-resolve when the triggering condition clears for 10+ minutes.
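The severity table and the auto-resolve window reduce to a small amount of logic; a sketch under the rules stated above, not ops0's implementation:

```python
from datetime import datetime, timedelta
from typing import Optional

def classify_severity(restart_count: int, cluster_wide_failure: bool = False) -> str:
    """Map the triggers from the severity table to P1/P2/P3 (sketch)."""
    if cluster_wide_failure or restart_count > 10:
        return "P1"
    if restart_count > 5:
        return "P2"
    return "P3"  # warning events, scheduling delays

def should_auto_resolve(condition_cleared_at: Optional[datetime],
                        now: datetime,
                        grace: timedelta = timedelta(minutes=10)) -> bool:
    """An incident auto-resolves once its trigger has been clear for
    10+ minutes; None means the condition is still firing."""
    return condition_cleared_at is not None and now - condition_cleared_at >= grace
```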
## Verification
After connecting and exploring, verify:
- Cluster appears in Kubernetes → Clusters with status Connected
- Pods list populates with your namespaces and pods
- Events tab shows recent cluster events
- Monitoring is active — the cluster shows last-checked timestamp updating
## Next Steps
## Troubleshooting
- For EKS: verify the IAM role has the eks:DescribeCluster permission and that its trust policy allows ops0 to assume it.
- For GKE: the service account needs the Kubernetes Engine Viewer IAM role.
- For self-managed clusters: the kubeconfig credentials need the get, list, and watch verbs on pods across all namespaces.