Kubernetes
Monitor, manage, and troubleshoot your Kubernetes clusters with AI-powered insights.

What is Kubernetes in ops0?
ops0 provides a unified view of all your Kubernetes clusters with real-time monitoring, incident detection, and AI-assisted troubleshooting. Connect clusters via the Hive agent and get instant visibility into workloads, resources, and issues.
Visibility
Keep all clusters, workloads, and health signals in one operator view.
Detection
Surface incidents, rollouts, and resource pressure before they turn into outages.
Resolution
Move from symptom to root cause with logs, events, dependencies, and AI guidance.
Key Features
Cluster Dashboard
Real-time view of cluster health, node status, resource usage, and running workloads.
Incident Detection
Automatic detection of pod crashes, resource pressure, failed deployments, and misconfigurations.
Resource Graph
Visual dependency graph with incident severity overlays (P1/P2/P3).
AI Troubleshooting
Root cause analysis and remediation suggestions with workload context attached.
Deploy to Cluster
Deploy Helm charts and manifests with per-file configuration and planning.
Cost Analysis
Per-namespace cost breakdown with CPU, memory, GPU, PV, and network costs.
Resource Graph
The Kubernetes resource graph provides a visual dependency map for a cluster:
- Navigate to a cluster and click Resource Graph
- See deployments, services, pods, configmaps, secrets, and their connections
- Incidents are overlaid on the graph with severity indicators (P1 Critical, P2 High, P3 Medium)
- Click any node to view resource details and drill into logs or events
Deploy to Cluster
Deploy Helm charts or Kubernetes manifests directly from ops0:
| Feature | Description |
|---|---|
| Per-file configuration | Configure each manifest or Helm values file individually |
| Helm support | Deploy and manage Helm releases |
| kubectl apply | Apply raw manifests to the cluster |
| Deployment planning | Preview changes before applying |
| Outputs | View deployment outputs and applied resource status |
How It Works
Connect
Install Hive agent in your cluster
Monitor
View real-time cluster status and metrics
Detect
Get alerted when incidents occur
Resolve
Use AI to diagnose and fix issues
Cluster Status
Clusters show real-time health status:
| Status | Meaning |
|---|---|
| Healthy | All major cluster checks are passing and workloads are behaving normally |
| Warning | Minor issues are present, such as resource pressure or degraded rollouts |
| Critical | Immediate operator action is likely required |
| Offline | The cluster is not connected or has stopped reporting |
Incident Severities
Incidents are categorized by severity:
| Severity | Typical meaning |
|---|---|
| Critical | Service down, data-loss risk, security exposure, or widespread workload failure |
| Warning | Degraded performance, rollout risk, or resource pressure that needs review |
| Info | Changes, scaling events, or other signals that are useful context but not urgent |
Supported Resources
ops0 monitors all standard Kubernetes resources:
| Category | Resources |
|---|---|
| Workloads | Deployments, StatefulSets, DaemonSets, Jobs, CronJobs, Pods |
| Networking | Services, Ingress, NetworkPolicies, Endpoints |
| Storage | PersistentVolumes, PersistentVolumeClaims, StorageClasses |
| Config | ConfigMaps, Secrets, ServiceAccounts |
| Scaling | HorizontalPodAutoscalers, VerticalPodAutoscalers |
| Custom | CRDs and custom resources |
Quick Start
Connect a Cluster
Install Hive agent and connect your first cluster.
View Incidents
Monitor and respond to cluster issues with more context.
Explore Resource Graph
Visualize resource relationships and dependencies.
Deploy Manifests
Deploy Kubernetes configurations with ops0.
Example: Investigating a Production Incident
Here's how ops0 helps you troubleshoot a production issue:
1. Incident Detected
2. View in Resource Graph
The Resource Graph highlights the affected pod and its dependencies:
- Pod
api-gateway-7d9f8c6b4d-2xkjpshows red border - Connected Service, ConfigMap, and Secret are visible
- Related Deployment shows warning status
3. AI Analysis
Root Cause Analysis:
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
The pod is failing to start due to a missing environment
variable DB_PASSWORD. The Secret 'api-gateway-secrets'
exists but is missing the 'db-password' key.
Last successful deployment: 3 hours ago
Recent change: Secret 'api-gateway-secrets' was updated
45 minutes ago (removed db-password key)
Recommended Actions:
1. Add 'db-password' key to Secret 'api-gateway-secrets'
2. Or update Deployment to reference correct Secret key
4. Resolution
After fixing the Secret:
Example: Cluster Dashboard View
┌─────────────────────────────────────────────────────────┐
│ production-eks ● Healthy │
│ AWS EKS 1.28 • us-east-1 • 12 nodes │
├─────────────────────────────────────────────────────────┤
│ │
│ Nodes Pods CPU Memory │
│ ━━━━━━━━━━━ ━━━━━━━━━━━ ━━━━━━━ ━━━━━━━ │
│ 12/12 Ready 156/200 42% 61% │
│ ● ● ● ● ● ████████░░ ████░░░ ██████░ │
│ ● ● ● ● ● │
│ ● ● │
│ │
│ Recent Incidents │
│ ───────────────────────────────────────────────────── │
│ ● Warning High memory usage on node-7 15m ago │
│ ● Info HPA scaled api-gateway 3→5 1h ago │
│ ● Resolved CrashLoop fixed 2h ago │
│ │
│ Top Namespaces by Pod Count │
│ ───────────────────────────────────────────────────── │
│ api-gateway ████████████████████ 45 │
│ web-frontend ████████████ 28 │
│ backend ████████ 19 │
│ monitoring ██████ 14 │
│ │
└─────────────────────────────────────────────────────────┘