
Alerts & Thresholds

Configure thresholds for Hive agents to get notified when a server metric crosses a warning or critical level. Alerts fire when a violation persists for a configured duration.

How Thresholds Work

Each threshold monitors a single metric on one or more agents. When the metric violates the configured value (as defined by the threshold's comparison, greater_than in the rule below) for the required duration, Hive creates an alert and sends notifications to your configured channels.

Metric value > threshold  AND  violation duration ≥ durationSeconds
                              → fire alert (warning or critical)

When the metric returns to normal, the alert is automatically resolved.
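
The rule above can be sketched in Python. This is a minimal illustration, not Hive's implementation: the function names, the (timestamp, value) sample format, and the choice to measure the duration from the first violating sample of the current run are all assumptions.

```python
import operator

# Comparison names come from the Comparison field; this mapping is an assumption.
COMPARATORS = {
    "greater_than": operator.gt,
    "less_than": operator.lt,
    "equals": operator.eq,
}


def violates(value, threshold, comparison="greater_than"):
    """Return True when a single reading violates the threshold."""
    return COMPARATORS[comparison](value, threshold)


def should_fire(samples, threshold, duration_seconds, comparison="greater_than"):
    """Return True when the latest run of violations spans the duration.

    `samples` is a list of (timestamp_seconds, value) pairs, oldest first.
    """
    run_start = None
    for timestamp, value in samples:
        if violates(value, threshold, comparison):
            if run_start is None:
                run_start = timestamp  # first violating sample of this run
        else:
            run_start = None  # metric returned to normal, so reset the run
    if run_start is None:
        return False
    return samples[-1][0] - run_start >= duration_seconds
```

A single normal reading resets the run, which is why a brief dip below the threshold delays the alert in this sketch.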

Threshold Fields

Field                Description
Metric Type          Which metric to monitor (see list below)
Warning Threshold    Value that triggers a warning alert
Critical Threshold   Value that triggers a critical alert
Duration (seconds)   How long the violation must persist before alerting (avoids spikes)
Comparison           greater_than, less_than, or equals
Notify on Warning    Send notification for warning-level violations
Notify on Critical   Send notification for critical-level violations
Enabled              Toggle threshold on/off without deleting it

Supported Metric Types

Metric        Description
cpu           CPU utilization (%)
memory        Memory utilization (%)
disk          Disk utilization (%)
network_in    Inbound network throughput (bytes/s)
network_out   Outbound network throughput (bytes/s)
load_1m       1-minute load average
load_5m       5-minute load average
load_15m      15-minute load average
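
If you manage thresholds from a script, a small record type can catch typos in metric and comparison names before anything is saved. The sketch below is hypothetical: the class, its attribute names, and its default values are assumptions that mirror the fields and metric types listed above, not a Hive API.

```python
from dataclasses import dataclass

SUPPORTED_METRICS = {
    "cpu", "memory", "disk", "network_in", "network_out",
    "load_1m", "load_5m", "load_15m",
}
COMPARISONS = {"greater_than", "less_than", "equals"}


@dataclass
class Threshold:
    # Attribute names and defaults are hypothetical; they mirror the fields table.
    metric_type: str
    warning: float
    critical: float
    duration_seconds: int
    comparison: str = "greater_than"
    notify_on_warning: bool = True
    notify_on_critical: bool = True
    enabled: bool = True

    def problems(self):
        """Return a list of problems found; an empty list means none."""
        found = []
        if self.metric_type not in SUPPORTED_METRICS:
            found.append(f"unsupported metric type: {self.metric_type}")
        if self.comparison not in COMPARISONS:
            found.append(f"unsupported comparison: {self.comparison}")
        if self.duration_seconds < 0:
            found.append("duration must not be negative")
        return found
```

For example, `Threshold("cpu", 70, 90, 120).problems()` returns an empty list, while a misspelled metric such as "cpus" is reported.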

Organization Defaults vs Agent-Level Thresholds

Organization defaults apply to all agents that have no agent-level override. Set them once and every new agent inherits them automatically.

Agent-level thresholds override the organization defaults for a specific agent. Use these when a server legitimately runs hotter than your baseline (e.g. a build server vs. a web server).
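
The precedence described above can be pictured as a per-metric merge. This is a sketch under assumptions: it treats an agent-level entry as replacing the whole default for that metric, and the dictionary layout and the build server values are illustrative only.

```python
def effective_thresholds(org_defaults, agent_overrides):
    """Merge per-metric thresholds; an agent-level entry replaces the default.

    Both arguments are dicts keyed by metric type (for example "cpu").
    """
    merged = dict(org_defaults)      # start from the organization defaults
    merged.update(agent_overrides)   # agent-level entries win per metric
    return merged


org_defaults = {"cpu": {"warning": 70, "critical": 90, "duration_seconds": 120}}
build_server = {"cpu": {"warning": 90, "critical": 98, "duration_seconds": 300}}

# A build server with an override uses its own values.
print(effective_thresholds(org_defaults, build_server)["cpu"]["warning"])  # 90
# An agent without overrides inherits the organization default.
print(effective_thresholds(org_defaults, {})["cpu"]["warning"])  # 70
```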

Setting Organization Defaults

  1. Go to Hive → Thresholds
  2. Click Edit Defaults
  3. Configure the metric, values, and duration
  4. Save — all agents without agent-level overrides will use this threshold

Setting Agent-Level Thresholds

  1. Go to Hive → Agents → click the agent
  2. Click Thresholds
  3. Click Add Threshold or edit an existing one
  4. Configure values specific to this agent

Alert Severity Levels

Level      Color    Meaning
Info       Blue     Informational, no action required
Warning    Yellow   Approaching a limit, monitor closely
Error      Orange   Exceeding threshold, action recommended
Critical   Red      Severe violation, immediate action required

Alert Counts

Each agent card shows alert counts by severity. The counts update in real time as alerts fire and resolve.

Notification Channels

Alerts are delivered through your configured notification channels. Go to Settings → Integrations → Alerts to connect:

  • Slack
  • PagerDuty
  • Email
  • Webhook

Thresholds respect the notifyOnWarning and notifyOnCritical toggles — you can send warnings to Slack and critical alerts to PagerDuty simultaneously.
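
The effect of the two toggles can be sketched as a small decision function. The function and parameter names are hypothetical, and how Hive routes a notification to a particular channel is not modeled here.

```python
def should_notify(severity, notify_on_warning, notify_on_critical):
    """Decide whether an alert of the given severity sends a notification."""
    if severity == "warning":
        return notify_on_warning
    if severity == "critical":
        return notify_on_critical
    return False  # thresholds only produce warning or critical alerts


# A threshold that stays quiet on warnings but notifies on critical alerts.
print(should_notify("warning", notify_on_warning=False, notify_on_critical=True))   # False
print(should_notify("critical", notify_on_warning=False, notify_on_critical=True))  # True
```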

Example: CPU Threshold

Metric:     cpu
Comparison: greater_than
Warning:    70%
Critical:   90%
Duration:   120 seconds

With this threshold:

  • CPU > 70% for 2 minutes → warning alert
  • CPU > 90% for 2 minutes → critical alert
  • CPU drops below threshold → alert auto-resolves
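
To see how these settings play out over time, the sketch below replays a series of hypothetical CPU readings against the example values. The readings, the 60-second sampling interval, and the function names are assumptions for illustration.

```python
WARNING, CRITICAL, DURATION = 70, 90, 120  # values from the example above


def sustained_seconds(samples, level):
    """Seconds the trailing run of readings has stayed above `level`."""
    run_start = None
    for timestamp, value in samples:
        if value > level:
            if run_start is None:
                run_start = timestamp
        else:
            run_start = None  # a reading at or below the level ends the run
    return 0 if run_start is None else samples[-1][0] - run_start


def severity(samples):
    """Return "critical", "warning", or None for the latest reading."""
    if sustained_seconds(samples, CRITICAL) >= DURATION:
        return "critical"
    if sustained_seconds(samples, WARNING) >= DURATION:
        return "warning"
    return None


# Hypothetical CPU readings as (seconds, percent), one every 60 seconds.
readings = [(0, 65), (60, 75), (120, 82), (180, 93), (240, 95), (300, 96)]

for end in range(1, len(readings) + 1):
    timestamp = readings[end - 1][0]
    print(timestamp, severity(readings[:end]))
```

In this run the warning first appears at 180 seconds, once CPU has been above 70% for two minutes, and the critical alert appears at 300 seconds, two minutes after CPU first exceeded 90%.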

Troubleshooting

Alert fires too often (flapping)

Increase the Duration field. A duration of 60–300 seconds filters out brief spikes and only fires when the condition is sustained.

No notifications received

Check that a notification channel is connected in Settings → Integrations → Alerts and that notifyOnWarning / notifyOnCritical are enabled on the threshold.

Agent shows critical alert but metric looks normal

Alerts resolve when the metric drops below the threshold for the required duration. If the alert persists but the current metric looks fine, the resolution window may not have elapsed yet.