
Kubernetes

Monitor, manage, and troubleshoot your Kubernetes clusters with AI-powered insights.

[Figure: ops0 Kubernetes Resource Graph showing cluster topology]

What is Kubernetes in ops0?

ops0 provides a unified view of all your Kubernetes clusters with real-time monitoring, incident detection, and AI-assisted troubleshooting. Connect clusters via the Hive agent and get instant visibility into workloads, resources, and issues.

Visibility

Keep all clusters, workloads, and health signals in one operator view.

Detection

Surface incidents, rollouts, and resource pressure before they turn into outages.

Resolution

Move from symptom to root cause with logs, events, dependencies, and AI guidance.

Key Features

Cluster Dashboard

Real-time view of cluster health, node status, resource usage, and running workloads.

Incident Detection

Automatic detection of pod crashes, resource pressure, failed deployments, and misconfigurations.

Resource Graph

Visual dependency graph with incident severity overlays (P1/P2/P3).

AI Troubleshooting

Root cause analysis and remediation suggestions with workload context attached.

Deploy to Cluster

Deploy Helm charts and manifests with per-file configuration and planning.

Cost Analysis

Per-namespace cost breakdown with CPU, memory, GPU, PV, and network costs.
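
To make the per-namespace breakdown concrete, here is a minimal sketch of how usage records could be rolled up into CPU, memory, GPU, PV, and network costs. It is an illustration only, not ops0's cost model: the function name `cost_by_namespace`, the record shape, and every rate passed in are hypothetical.

```python
from collections import defaultdict

# Cost components named in the feature description above.
COMPONENTS = ("cpu", "memory", "gpu", "pv", "network")


def cost_by_namespace(usage_records, rates):
    """Sum cost per namespace and component.

    usage_records: iterable of (namespace, component, quantity) tuples.
    rates: component -> price per unit (placeholder values, supplied by the caller).
    """
    totals = defaultdict(lambda: dict.fromkeys(COMPONENTS, 0.0))
    for namespace, component, quantity in usage_records:
        if component not in COMPONENTS:
            raise ValueError(f"unknown cost component: {component}")
        totals[namespace][component] += quantity * rates[component]
    # Attach a per-namespace total alongside the component breakdown.
    return {ns: {**parts, "total": sum(parts.values())} for ns, parts in totals.items()}
```

Calling it with a handful of records returns one dictionary per namespace, with one entry per component plus a `total` key.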


Resource Graph

The Kubernetes resource graph provides a visual dependency map for a cluster:

  • Navigate to a cluster and click Resource Graph
  • See deployments, services, pods, configmaps, secrets, and their connections
  • Incidents are overlaid on the graph with severity indicators (P1 Critical, P2 High, P3 Medium)
  • Click any node to view resource details and drill into logs or events
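
As a rough illustration of the idea behind the graph (not ops0's actual data model), a dependency map with severity overlays can be sketched as an adjacency map plus a per-resource incident lookup. The class `ResourceGraph` and its methods are hypothetical names; only the P1/P2/P3 labels come from the list above.

```python
from collections import defaultdict, deque

# Severity labels from the list above; a lower rank means more severe.
SEVERITY_RANK = {"P1": 1, "P2": 2, "P3": 3}


class ResourceGraph:
    """Hypothetical in-memory dependency graph with incident overlays."""

    def __init__(self):
        self.edges = defaultdict(set)  # resource -> resources it depends on
        self.incidents = {}            # resource -> most severe open incident

    def connect(self, source, target):
        self.edges[source].add(target)
        self.edges[target]  # register the target as a node even if it has no edges

    def overlay(self, resource, severity):
        if severity not in SEVERITY_RANK:
            raise ValueError(f"unknown severity: {severity}")
        current = self.incidents.get(resource)
        # Keep only the most severe incident per resource.
        if current is None or SEVERITY_RANK[severity] < SEVERITY_RANK[current]:
            self.incidents[resource] = severity

    def worst_severity_near(self, start):
        """Most severe incident on `start` or on anything it depends on."""
        seen, queue, worst = {start}, deque([start]), None
        while queue:
            node = queue.popleft()
            severity = self.incidents.get(node)
            if severity and (worst is None or SEVERITY_RANK[severity] < SEVERITY_RANK[worst]):
                worst = severity
            for neighbor in self.edges.get(node, ()):
                if neighbor not in seen:
                    seen.add(neighbor)
                    queue.append(neighbor)
        return worst
```

A breadth-first walk like this is one way a Deployment could show a warning when the incident actually sits on one of its pods or on a Secret further down the chain.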

Deploy to Cluster

Deploy Helm charts or Kubernetes manifests directly from ops0:

Feature                 Description
Per-file configuration  Configure each manifest or Helm values file individually
Helm support            Deploy and manage Helm releases
kubectl apply           Apply raw manifests to the cluster
Deployment planning     Preview changes before applying
Outputs                 View deployment outputs and applied resource status
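
How ops0 computes a deployment plan is not described here. As a generic sketch of what "preview changes before applying" can mean, the function below compares desired manifests against live objects and labels each one. The name `plan_changes` and the `(kind, namespace, name)` key shape are assumptions made for the example.

```python
def plan_changes(live, desired):
    """Classify each resource as create, update, unchanged, or untracked.

    Both arguments map a (kind, namespace, name) key to a manifest dict.
    Resources that exist only in `live` are reported as 'untracked';
    this sketch never plans a deletion.
    """
    plan = {}
    for key, manifest in desired.items():
        if key not in live:
            plan[key] = "create"
        elif live[key] != manifest:
            plan[key] = "update"
        else:
            plan[key] = "unchanged"
    for key in live.keys() - desired.keys():
        plan[key] = "untracked"
    return plan
```

A real planner would also need to ignore server-populated fields such as `status` and `metadata.resourceVersion` before comparing; the plain equality check here is deliberately simplistic.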

How It Works

Connect

Install the Hive agent in your cluster

Monitor

View real-time cluster status and metrics

Detect

Get alerted when incidents occur

Resolve

Use AI to diagnose and fix issues

Cluster Status

Clusters show real-time health status:

Status    Meaning
Healthy   All major cluster checks are passing and workloads are behaving normally
Warning   Minor issues are present, such as resource pressure or degraded rollouts
Critical  Immediate operator action is likely required
Offline   The cluster is not connected or has stopped reporting

Incident Severities

Incidents are categorized by severity:

Severity  Typical meaning
Critical  Service down, data-loss risk, security exposure, or widespread workload failure
Warning   Degraded performance, rollout risk, or resource pressure that needs review
Info      Changes, scaling events, or other signals that are useful context but not urgent
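
The relationship between incident severities and the overall cluster status is not spelled out here. One plausible reading, shown purely as an assumption, is that the cluster status follows the most severe open incident, with Offline taking precedence when the cluster has stopped reporting. `roll_up_status` is a hypothetical helper, not an ops0 API.

```python
from enum import Enum


class Severity(Enum):
    CRITICAL = "Critical"
    WARNING = "Warning"
    INFO = "Info"


class ClusterStatus(Enum):
    HEALTHY = "Healthy"
    WARNING = "Warning"
    CRITICAL = "Critical"
    OFFLINE = "Offline"


def roll_up_status(connected, open_incident_severities):
    """Derive a cluster status from connectivity and open incidents (assumed rule)."""
    if not connected:
        return ClusterStatus.OFFLINE
    severities = set(open_incident_severities)
    if Severity.CRITICAL in severities:
        return ClusterStatus.CRITICAL
    if Severity.WARNING in severities:
        return ClusterStatus.WARNING
    # Info-level incidents are context only and do not change the status.
    return ClusterStatus.HEALTHY
```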

Supported Resources

ops0 monitors all standard Kubernetes resources:

Category    Resources
Workloads   Deployments, StatefulSets, DaemonSets, Jobs, CronJobs, Pods
Networking  Services, Ingress, NetworkPolicies, Endpoints
Storage     PersistentVolumes, PersistentVolumeClaims, StorageClasses
Config      ConfigMaps, Secrets, ServiceAccounts
Scaling     HorizontalPodAutoscalers, VerticalPodAutoscalers
Custom      CRDs and custom resources


Example: Investigating a Production Incident

Here's how ops0 helps you troubleshoot a production issue:

1. Incident Detected

Critical Incident • 2 minutes ago
CrashLoopBackOff: api-gateway-7d9f8c6b4d-2xkjp
Cluster: production-eks • Namespace: api-gateway

2. View in Resource Graph

The Resource Graph highlights the affected pod and its dependencies:

  • Pod api-gateway-7d9f8c6b4d-2xkjp shows a red border
  • The connected Service, ConfigMap, and Secret are visible
  • The related Deployment shows a warning status

3. AI Analysis

Root Cause Analysis:
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

The pod is failing to start due to a missing environment
variable DB_PASSWORD. The Secret 'api-gateway-secrets'
exists but is missing the 'db-password' key.

Last successful deployment: 3 hours ago
Recent change: Secret 'api-gateway-secrets' was updated
              45 minutes ago (removed db-password key)

Recommended Actions:
1. Add 'db-password' key to Secret 'api-gateway-secrets'
2. Or update Deployment to reference correct Secret key
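
The kind of check behind a finding like this can be sketched in a few lines: walk the Deployment's containers, collect every `secretKeyRef`, and confirm the referenced key exists in the Secret. This is an illustrative sketch, not ops0's analysis engine. The function name `missing_secret_keys` is hypothetical; the manifest fields it reads (`spec.template.spec.containers[].env[].valueFrom.secretKeyRef`, and `data` / `stringData` on the Secret) are standard Kubernetes fields.

```python
def missing_secret_keys(deployment, secrets):
    """Find env vars whose secretKeyRef points at a missing Secret or key.

    deployment: Deployment manifest as a dict.
    secrets: Secret name -> Secret manifest dict.
    Returns a list of (container, env var, secret name, key) tuples.
    """
    problems = []
    pod_spec = deployment["spec"]["template"]["spec"]
    for container in pod_spec.get("containers", []):
        for env in container.get("env", []):
            ref = (env.get("valueFrom") or {}).get("secretKeyRef")
            if not ref or ref.get("optional"):
                continue  # not a Secret reference, or explicitly allowed to be absent
            secret = secrets.get(ref["name"], {})
            keys = set(secret.get("data") or {}) | set(secret.get("stringData") or {})
            if ref["key"] not in keys:
                problems.append((container["name"], env["name"], ref["name"], ref["key"]))
    return problems
```

Run against the manifests from this example, the sketch would report `DB_PASSWORD` in the `api-gateway` container as referencing the missing `db-password` key, and return an empty list once the key is restored.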

4. Resolution

After fixing the Secret:

Resolved • Just now
Pod successfully started
3/3 replicas running • All health checks passing

Example: Cluster Dashboard View

┌─────────────────────────────────────────────────────────┐
│  production-eks                          ● Healthy      │
│  AWS EKS 1.28 • us-east-1 • 12 nodes                   │
├─────────────────────────────────────────────────────────┤
│                                                         │
│  Nodes          Pods           CPU        Memory        │
│  ━━━━━━━━━━━    ━━━━━━━━━━━    ━━━━━━━    ━━━━━━━       │
│  12/12 Ready    156/200        42%        61%           │
│  ● ● ● ● ●      ████████░░     ████░░░    ██████░       │
│  ● ● ● ● ●                                              │
│  ● ●                                                    │
│                                                         │
│  Recent Incidents                                       │
│  ─────────────────────────────────────────────────────  │
│  ● Warning   High memory usage on node-7    15m ago     │
│  ● Info      HPA scaled api-gateway 3→5     1h ago      │
│  ● Resolved  CrashLoop fixed                2h ago      │
│                                                         │
│  Top Namespaces by Pod Count                            │
│  ─────────────────────────────────────────────────────  │
│  api-gateway      ████████████████████  45              │
│  web-frontend     ████████████          28              │
│  backend          ████████              19              │
│  monitoring       ██████                14              │
│                                                         │
└─────────────────────────────────────────────────────────┘