Connect your first cluster, explore what's running, and use AI to investigate a real incident.
You have a Kubernetes cluster running on AWS (EKS), GCP (GKE), Azure (AKS), or self-managed. You want to:

EKS:

| Field | What to enter |
|---|---|
| Cluster Name | EKS cluster name (from AWS console) |
| Region | AWS region where the cluster runs |
| IAM Role ARN | Role with eks:DescribeCluster and eks:ListClusters permissions |
| External ID | Optional, for cross-account role assumption |
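
The IAM role described in the table above needs two policy documents: a permissions policy with the two EKS read actions, and a trust policy that lets another AWS principal assume the role. The following is a minimal sketch of both, built as plain dictionaries. The principal ARN that ops0 assumes the role from is not stated here, so `principal_arn` is a placeholder you would need to replace with the value ops0 gives you; the function names are illustrative only.

```python
import json
from typing import Optional


def build_permissions_policy() -> dict:
    """Permissions policy granting the two EKS read actions from the table."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": ["eks:DescribeCluster", "eks:ListClusters"],
                # eks:ListClusters has no resource-level scoping, so "*" is used.
                "Resource": "*",
            }
        ],
    }


def build_trust_policy(principal_arn: str, external_id: Optional[str] = None) -> dict:
    """Trust policy allowing `principal_arn` to assume the role.

    `principal_arn` is a placeholder (assumption): use the principal that
    ops0 tells you it connects from. When `external_id` is given, the role
    can only be assumed by callers that present the same External ID.
    """
    statement = {
        "Effect": "Allow",
        "Principal": {"AWS": principal_arn},
        "Action": "sts:AssumeRole",
    }
    if external_id:
        statement["Condition"] = {"StringEquals": {"sts:ExternalId": external_id}}
    return {"Version": "2012-10-17", "Statement": [statement]}


if __name__ == "__main__":
    # 111122223333 is a documentation-style example account ID, not a real one.
    trust = build_trust_policy("arn:aws:iam::111122223333:root", "example-external-id")
    print(json.dumps(build_permissions_policy(), indent=2))
    print(json.dumps(trust, indent=2))
```

The printed JSON can be pasted into the IAM console or saved to files for the AWS CLI. Leaving `external_id` unset produces a trust policy without a condition, which matches the table's description of the External ID field as optional.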

GKE:

| Field | What to enter |
|---|---|
| Cluster Name | GKE cluster name |
| Project ID | GCP project ID |
| Location | Region or zone where the cluster runs |
| Service Account JSON | GCP service account key with the Kubernetes Engine Viewer role |
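
Before pasting a service account key, it can help to confirm that the file is the right kind of credential. The sketch below checks a key for the fields that GCP service account key files normally contain. It is a local sanity check only, under the assumption that these four fields are enough to spot a wrong or truncated file; it does not verify IAM roles and is not how ops0 itself validates the key. The function name is hypothetical.

```python
import json
from typing import List

# Fields present in a standard GCP service account key file.
REQUIRED_KEY_FIELDS = ("type", "project_id", "private_key", "client_email")


def check_service_account_key(raw_json: str) -> List[str]:
    """Return a list of problems found in the key; an empty list means it looks fine."""
    try:
        key = json.loads(raw_json)
    except json.JSONDecodeError as exc:
        return [f"not valid JSON: {exc.msg}"]
    if not isinstance(key, dict):
        return ["top-level JSON value is not an object"]

    problems = []
    for field in REQUIRED_KEY_FIELDS:
        if not key.get(field):
            problems.append(f"missing field: {field}")

    # Other credential types (for example user credentials) use a different "type".
    key_type = key.get("type")
    if key_type and key_type != "service_account":
        problems.append(f"unexpected type: {key_type!r}")
    return problems
```

An empty result only means the file has the expected shape. Whether the account actually holds the required role still has to be checked in GCP IAM.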

AKS:

| Field | What to enter |
|---|---|
| Cluster Name | AKS cluster name |
| Resource Group | Resource group containing the cluster |
| Subscription ID | Azure subscription ID |
| Tenant ID | Azure AD tenant ID |
| Client ID / Secret | Service principal credentials with AKS read access |

For a self-managed cluster, paste your kubeconfig directly. ops0 sanitizes it to remove references to local credential helpers, which would not work server-side.
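
To make the idea of sanitizing concrete, here is a minimal sketch of what stripping credential helper references from a kubeconfig could look like. The exact rules ops0 applies are not documented here, so this is an assumption: it treats the `exec` and `auth-provider` entries of each user as helper references, because both point at tooling on the machine that created the file. The function operates on an already parsed kubeconfig (a dictionary), and its name is hypothetical.

```python
import copy

# Assumption: these user-entry keys are the ones that depend on local tooling.
LOCAL_ONLY_USER_KEYS = ("exec", "auth-provider")


def strip_local_credential_helpers(kubeconfig: dict) -> dict:
    """Return a copy of the kubeconfig without local credential helper entries.

    Static credentials such as `token` or `client-certificate-data` are kept.
    The input dictionary is not modified.
    """
    cleaned = copy.deepcopy(kubeconfig)
    for entry in cleaned.get("users", []):
        user = entry.get("user") or {}
        for key in LOCAL_ONLY_USER_KEYS:
            user.pop(key, None)
    return cleaned
```

A user entry that relied only on a helper ends up with no credentials after this step, so a kubeconfig for server-side use generally needs a static credential such as a service account token.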
Click Connect. ops0 will:
If the connection fails, the error message identifies the cause: authentication failure, unreachable network, or insufficient permissions.
After connecting, the cluster dashboard opens automatically.
What you see immediately:
Navigate through the sidebar to explore Pods, Nodes, Workloads, Networking, and Storage.
ops0 checks your cluster every 60 seconds. When it detects a problem (a pod in CrashLoopBackOff, high restart counts, or FailedScheduling events), it automatically creates an incident.
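
The three signals named above can all be read from standard Kubernetes API objects. The sketch below shows one way to detect them, given a pod or event as returned by the API (for example from `kubectl get pods -o json`). It is an illustration, not ops0's detection code: the function names are hypothetical, and the restart threshold of 5 is an assumption chosen for the example.

```python
from typing import List

# Assumption: what counts as a "high" restart count for this example.
RESTART_THRESHOLD = 5


def find_pod_problems(pod: dict, restart_threshold: int = RESTART_THRESHOLD) -> List[str]:
    """Return human-readable problems found in one pod object."""
    problems = []
    statuses = (pod.get("status") or {}).get("containerStatuses") or []
    for cs in statuses:
        name = cs.get("name", "?")
        # A crash-looping container sits in state.waiting with this reason.
        waiting = (cs.get("state") or {}).get("waiting") or {}
        if waiting.get("reason") == "CrashLoopBackOff":
            problems.append(f"{name}: CrashLoopBackOff")
        restarts = cs.get("restartCount", 0)
        if restarts > restart_threshold:
            problems.append(f"{name}: {restarts} restarts")
    return problems


def is_failed_scheduling(event: dict) -> bool:
    """True if a Kubernetes event reports that a pod could not be scheduled."""
    return event.get("reason") == "FailedScheduling"
```

Running a check like this on a fixed interval, such as the 60 seconds mentioned above, gives a simple polling loop.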
To view incidents:
What the AI analysis returns:
| Section | Content |
|---|---|
| Root Cause | What ops0 believes is causing the issue based on events and logs |
| kubectl commands | Specific commands to investigate further |
| Recommendations | Kyverno policies to prevent this class of issue |
| Runbook | Step-by-step manual remediation instructions |
Incident severities:
| Severity | Trigger |
|---|---|
| P1 | Pod restart count > 10, cluster-wide failures |
| P2 | Pod restart count > 5, deployment degraded |
| P3 | Warning events, scheduling delays |
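
The severity table can be read as a set of rules checked from most to least severe. The sketch below encodes the thresholds from the table; the function name, the boolean parameter names, and the way the inputs are represented are assumptions made for illustration, not ops0's internal model.

```python
from typing import Optional


def classify_severity(
    restart_count: int = 0,
    cluster_wide_failure: bool = False,
    deployment_degraded: bool = False,
    has_warning_events: bool = False,
    scheduling_delayed: bool = False,
) -> Optional[str]:
    """Map observed signals to a severity, or None if nothing triggers."""
    # Check the most severe tier first so higher severities win.
    if restart_count > 10 or cluster_wide_failure:
        return "P1"
    if restart_count > 5 or deployment_degraded:
        return "P2"
    if has_warning_events or scheduling_delayed:
        return "P3"
    return None
```

Because the comparisons are strict, a pod with exactly 10 restarts is P2 and one with exactly 5 restarts triggers no restart-based severity.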
Incidents auto-resolve once the triggering condition has stayed clear for at least 10 minutes.
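
The auto-resolve rule amounts to tracking how long the condition has been clear. A minimal sketch, assuming the monitor records the time at which the condition was last seen to clear and resets that value to `None` whenever the condition reappears (both the bookkeeping and the names are hypothetical):

```python
from datetime import datetime, timedelta
from typing import Optional

# The 10-minute window comes from the rule above.
RESOLVE_AFTER = timedelta(minutes=10)


def should_auto_resolve(cleared_since: Optional[datetime], now: datetime) -> bool:
    """True once the condition has been continuously clear for the full window.

    `cleared_since` is None while the triggering condition is still present.
    """
    if cleared_since is None:
        return False
    return now - cleared_since >= RESOLVE_AFTER
```

Resetting `cleared_since` on every recurrence is what keeps a flapping pod from resolving its incident prematurely.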
After connecting and exploring, verify:
For EKS: the IAM role needs the eks:DescribeCluster permission, and the role trust policy must allow ops0 to assume it. For GKE: the service account needs the Kubernetes Engine Viewer IAM role.
The credentials ops0 connects with need the get, list, and watch verbs on pods across all namespaces.
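
For the pod permissions, a Kubernetes ClusterRole is the usual way to grant read access across all namespaces. The sketch below builds such a manifest as a dictionary and prints it as JSON, which `kubectl apply -f` accepts alongside YAML. The role name `ops0-pod-reader` is a made-up example, and the role still has to be bound to the identity ops0 connects as with a ClusterRoleBinding, which is not shown.

```python
import json


def build_pod_reader_cluster_role(name: str = "ops0-pod-reader") -> dict:
    """ClusterRole granting get, list, and watch on pods in every namespace.

    The default name is a hypothetical example, not one required by ops0.
    """
    return {
        "apiVersion": "rbac.authorization.k8s.io/v1",
        "kind": "ClusterRole",
        "metadata": {"name": name},
        "rules": [
            {
                # Pods live in the core API group, written as an empty string.
                "apiGroups": [""],
                "resources": ["pods"],
                "verbs": ["get", "list", "watch"],
            }
        ],
    }


if __name__ == "__main__":
    print(json.dumps(build_pod_reader_cluster_role(), indent=2))
```

This covers only the pod permissions named above; other resource types that ops0 displays may need additional rules.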