
Troubleshooting

Common issues and solutions for the ops0 platform.


Authentication & Access

"Access Denied" when connecting cloud provider

Symptoms: Integration test fails with access denied or insufficient permissions.

Solutions:

AWS: Verify IAM Role

Verify the IAM role trust policy includes ops0's AWS account ID.

AWS: Check External ID

Check the external ID matches what ops0 shows.
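
As a rough aid for the two AWS checks above, the following sketch inspects a trust policy document offline. It assumes you have already fetched the policy as JSON (for example with `aws iam get-role`, where it appears under `Role.AssumeRolePolicyDocument`). The function name `trust_policy_allows`, the account ID `123456789012`, and the external ID are placeholders for illustration, not values from ops0.

```python
import json


def _as_list(value):
    """IAM policy fields may hold a single value or a list; normalise to a list."""
    if value is None:
        return []
    return value if isinstance(value, list) else [value]


def trust_policy_allows(policy, account_id, external_id):
    """Return True if an Allow statement lets `account_id` assume the role
    when it presents `external_id`. Only the common StringEquals form is checked."""
    for stmt in _as_list(policy.get("Statement")):
        if stmt.get("Effect") != "Allow":
            continue
        if "sts:AssumeRole" not in _as_list(stmt.get("Action")):
            continue
        principal = stmt.get("Principal")
        if not isinstance(principal, dict):
            continue  # e.g. "*", which this sketch does not treat as a match
        # A principal may be a bare account ID or an ARN inside that account.
        principal_ok = any(
            p == account_id or p.startswith(f"arn:aws:iam::{account_id}:")
            for p in _as_list(principal.get("AWS"))
        )
        allowed_ids = _as_list(
            stmt.get("Condition", {}).get("StringEquals", {}).get("sts:ExternalId")
        )
        if principal_ok and external_id in allowed_ids:
            return True
    return False


# Placeholder values for illustration only.
example_policy = json.loads("""
{
  "Version": "2012-10-17",
  "Statement": [{
    "Effect": "Allow",
    "Principal": {"AWS": "arn:aws:iam::123456789012:root"},
    "Action": "sts:AssumeRole",
    "Condition": {"StringEquals": {"sts:ExternalId": "example-external-id"}}
  }]
}
""")

print(trust_policy_allows(example_policy, "123456789012", "example-external-id"))
```

Replace the placeholders with the account ID and external ID that ops0 displays for your integration. A `False` result suggests the trust policy or external ID needs correcting.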

GCP: Workload Identity

Ensure Workload Identity Federation is configured correctly.

Azure: Service Principal

Verify the service principal has the Contributor role.

"SSO login failed"

Symptoms: Unable to log in via SSO, redirect loop, or error page.

Solutions:

Check | Solution
SSO provider configuration | Verify ACS URL and Entity ID match ops0 settings
Certificate expiration | Update SAML certificate if expired
User provisioning | Ensure user exists in both SSO provider and ops0
Browser cookies | Clear cookies and try again

Terraform & IaC

"Terraform init failed"

Symptoms: Project fails to initialize, missing providers.

Solutions:

Check provider version constraints

Ensure your required_providers block is valid:

terraform {
  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 5.0"
    }
  }
}

"State lock failed"

Symptoms: Can't run plan or apply, state is locked.

Solutions:

  1. Wait for other operations to complete - Another deployment may be in progress
  2. Check for failed workflows - A crashed workflow may have left a lock
  3. Contact support - If lock is stuck, support can safely release it

"Terraform plan shows unexpected changes"

Symptoms: Plan shows changes even though you didn't modify anything.

Common causes:

Cause | Explanation
Drift | Someone changed resources manually in the console
Provider upgrade | New provider version changed default values
State refresh | Cloud API returned different values

Run Drift Detection to see what changed and decide whether to update code or revert the change.
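
If you also have local Terraform access to the same configuration and state, one hedged way to see which resources a plan would touch is to export it with `terraform plan -out=plan.out` followed by `terraform show -json plan.out`, then summarise the `resource_changes` array. The helper name `changed_resources` and the sample data below are illustrative. This is a sketch that complements ops0's Drift Detection rather than replacing it.

```python
def changed_resources(plan):
    """Map each resource address to its planned actions, skipping no-ops.

    `plan` is the parsed output of `terraform show -json <planfile>`.
    """
    changes = {}
    for rc in plan.get("resource_changes", []):
        actions = rc.get("change", {}).get("actions", [])
        if actions and actions != ["no-op"]:
            changes[rc["address"]] = actions
    return changes


# Hand-written sample in the shape of Terraform's JSON plan output.
sample_plan = {
    "resource_changes": [
        {"address": "aws_s3_bucket.logs", "change": {"actions": ["update"]}},
        {"address": "aws_iam_role.app", "change": {"actions": ["no-op"]}},
        {"address": "aws_instance.web", "change": {"actions": ["delete", "create"]}},
    ]
}

for address, actions in changed_resources(sample_plan).items():
    print(f"{address}: {'/'.join(actions)}")
```

With a real plan, load the JSON via `json.load` and pass the resulting dictionary to `changed_resources`.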


Kubernetes

"Cluster not connecting"

Symptoms: Cluster shows disconnected, no pod data.

Solutions:

Check Hive agent is running

kubectl get pods -n ops0

Check agent logs

kubectl logs -n ops0 -l app=hive-agent

Verify outbound connectivity

Ensure the agent can make outbound connections on port 443 to the ops0 control plane endpoint used by brew.ops0.ai.
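
A minimal sketch of such a connectivity test, meant to be run from a host or pod that shares the agent's network path. It only checks that a TCP connection can be opened; it does not validate TLS, proxies, or authentication. The function name `can_connect` is an assumption for this example.

```python
import socket


def can_connect(host, port=443, timeout=5.0):
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers DNS failures, refused connections, and timeouts.
        return False
```

For example, `can_connect("brew.ops0.ai")` uses the hostname mentioned above; substitute the endpoint your agent is actually configured to use if it differs. A `False` result points toward firewall rules, egress network policies, or DNS.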

Check network policies

Check for network policies blocking egress.

"Hive agent CrashLoopBackOff"

Symptoms: Hive agent pod keeps restarting.

Solutions:

  1. Check logs for specific error: kubectl logs -n ops0 -l app=hive-agent --previous
  2. Verify RBAC permissions are applied: kubectl auth can-i list pods --as=system:serviceaccount:ops0:hive-agent
  3. Ensure cluster token is valid (regenerate in ops0 if needed)

Workflows

"Workflow stuck in pending"

Symptoms: Workflow won't start, shows pending forever.

Solutions:

Check | Solution
Trigger condition | Verify trigger event actually occurred
Approval step | Check if workflow is waiting for approval
Resource limits | ops0 may be rate-limiting concurrent workflows

"Approval not sending notifications"

Symptoms: Approval step reached but no Slack/email notification.

Solutions:

  1. Verify Slack integration is connected in Settings > Integrations
  2. Check the notification channel exists and ops0 app is invited
  3. Confirm approvers are configured in the workflow step

Policies

"Policy always fails"

Symptoms: Policy fails on every deployment, even compliant ones.

Solutions:

Test your Rego policy

Use the policy editor's "Test" feature with sample input to debug.

Check for typos in resource types

Example: Use aws_s3_bucket, not aws_s3.

Verify input structure

Click "View Input" in a failed policy check to see the actual JSON being evaluated.
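
To cross-check resource type names against what is actually being evaluated, a small script can list which referenced types never occur in the input. This sketch assumes the JSON from "View Input" follows the layout of Terraform's JSON plan output (a top-level `resource_changes` array whose entries carry a `type` field); if the input is shaped differently, adjust the lookup. The function name and sample data are hypothetical.

```python
def missing_resource_types(policy_types, policy_input):
    """Return the types a policy references that never occur in the input, sorted."""
    present = {rc.get("type") for rc in policy_input.get("resource_changes", [])}
    return sorted(set(policy_types) - present)


# Hand-written sample; real input comes from the failed policy check.
sample_input = {
    "resource_changes": [
        {"address": "aws_s3_bucket.logs", "type": "aws_s3_bucket"},
        {"address": "aws_instance.web", "type": "aws_instance"},
    ]
}

# "aws_s3" is the typo from the example above; "aws_s3_bucket" is the real type.
print(missing_resource_types(["aws_s3", "aws_instance"], sample_input))
```

A type reported as missing is not necessarily a typo, since the plan may simply contain no resource of that type, but it is a good first place to look.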


GitHub Integration

"PR comments not appearing"

Symptoms: ops0 not posting plan results to pull requests.

Solutions:

  1. Verify GitHub app is installed on the repository
  2. Check GitHub app permissions include "Write" on pull requests
  3. Confirm the project is connected to the correct repository
  4. Check webhook delivery status in GitHub repo settings

"Sync conflicts"

Symptoms: Push/pull fails with merge conflicts.

Solutions:

  1. Pull latest changes from GitHub first
  2. Resolve conflicts in the ops0 editor
  3. Push the merged result

GitLab Integration

"GitLab sync not working"

Symptoms: Files not syncing, merge request creation fails.

Solutions:

Check | Solution
Token type | Use a Personal, Group, or Project Access Token with api scope
Token expiration | Regenerate if expired
Self-hosted URL | Verify the GitLab instance URL is correct in integration settings
Repository access | Ensure the token has access to the target repository
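
The token checks in the table can be sketched as follows. This assumes your GitLab version exposes the `GET /api/v4/personal_access_tokens/self` endpoint and that it accepts the token type you are using; the helper names are hypothetical, and the expiry date itself is treated as already expired. The first function only builds the request; sending it is left to you.

```python
from datetime import date
from urllib.request import Request


def token_check_request(gitlab_url, token):
    """Build (but do not send) a request for the token's own metadata."""
    url = gitlab_url.rstrip("/") + "/api/v4/personal_access_tokens/self"
    return Request(url, headers={"PRIVATE-TOKEN": token})


def token_problems(info, today=None):
    """Return a list of problems found in the token metadata dict `info`."""
    today = today or date.today()
    problems = []
    if "api" not in info.get("scopes", []):
        problems.append("missing api scope")
    if info.get("revoked") or info.get("active") is False:
        problems.append("token is revoked or inactive")
    expires_at = info.get("expires_at")  # "YYYY-MM-DD" or None
    if expires_at and date.fromisoformat(expires_at) <= today:
        problems.append(f"expired on {expires_at}")
    return problems
```

To use it, send the request with `urllib.request.urlopen`, parse the body with `json.load`, and pass the result to `token_problems`. An HTTP 401 response usually means the token itself is invalid or the instance URL is wrong.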

Oxid / Query Console

"Oxid init failed"

Symptoms: Enabling Oxid on a project fails during initialization.

Solutions:

  1. Verify the PostgreSQL connection URL format (postgres:// or postgresql://)
  2. Ensure the database is accessible from the ops0 platform
  3. Check that the database user has create table permissions
  4. Review the streaming log output for specific error messages
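
The URL format check in step 1 can be sketched with the standard library. The function name `postgres_url_problems` and the example hosts are placeholders; the sketch checks only the scheme, host, and port, and does not attempt a connection.

```python
from urllib.parse import urlparse


def postgres_url_problems(url):
    """Return a list of format problems in a PostgreSQL connection URL."""
    parsed = urlparse(url)
    problems = []
    if parsed.scheme not in ("postgres", "postgresql"):
        problems.append("scheme must be postgres:// or postgresql://")
    if not parsed.hostname:
        problems.append("missing host")
    try:
        parsed.port  # raises ValueError for a non-numeric or out-of-range port
    except ValueError:
        problems.append("invalid port")
    return problems


print(postgres_url_problems("postgresql://oxid:secret@db.example.com:5432/oxid"))
```

An empty list means the format looks plausible. Note that special characters in the password, such as `@` or `/`, must be percent-encoded for the URL to parse as intended.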

"Query Console returns no results"

Symptoms: SQL queries return empty results despite having deployed resources.

Solutions:

  1. Ensure Oxid is enabled and configured on the project
  2. Run a deployment or trigger an Oxid sync to populate the state database
  3. Check the schema browser to verify tables exist
  4. Confirm the query is a SELECT statement; write operations are blocked

Common Error Messages

Error: Invalid credentials

Your cloud integration credentials are invalid or expired. Go to Settings → Integrations and re-authenticate.

Error: State lock held

Another deployment is in progress for this project. Wait for it to complete or cancel it if stuck.

Error: Policy violation

Your changes violate a blocking policy. Review the violation details and fix your code before deploying.

Error: Cluster unreachable

ops0 can't connect to your Kubernetes cluster. Check that the agent is running and can reach the internet.

Error: Rate limit exceeded

You've hit API rate limits with your cloud provider. Wait a few minutes and retry, or request limit increases.


Getting Help

If you can't resolve an issue, contact support and include:

  • Organization ID (found in Settings)
  • Project/cluster name
  • Error messages and screenshots
  • Steps to reproduce
