Error Correlation
Error Correlation
Error Correlation automatically groups related failures across agents and sessions, then provides AI-powered root cause analysis.
How It Works
A background cron job runs every 5 minutes to:
1. Detect recent task.failed and task.blocked events
2. Normalize error patterns (strip numbers, hex values, file paths)
3. Fingerprint each normalized pattern with a hash
4. Group events with the same fingerprint into error groups
5. Track affected agents and sessions
Error Groups
Each error group shows:
- Severity: low, medium, high, or critical (auto-classified)
- Occurrence count: How many times this error has occurred
- Affected agents: Which agents experienced the error
- First/last seen: Time range of occurrences
- Status: new, investigating, or resolved
AI Root Cause Analysis
Click Analyze Root Cause on any error group. Hivemind sends the error pattern, affected agents, and recent related events to GPT-4o-mini for analysis. The AI returns:
- A concise root cause hypothesis
- A suggested fix
Managing Errors
- Investigate: Mark an error as "investigating" to signal the team
- Resolve: Mark as resolved when fixed — if the error recurs, it re-opens automatically
- Filter: Use the tab bar to filter by status (All / New / Investigating / Resolved)
REST API
GET /v1/errors?status=new&limit=20
GET /v1/errors/:groupId
POST /v1/errors/:groupId/resolve