lindy-incident-runbook
Incident response runbook for Lindy AI integrations. Use when responding to incidents, troubleshooting outages, or creating on-call procedures. Trigger with phrases like "lindy incident", "lindy outage", "lindy on-call", "lindy runbook". allowed-tools: Read, Write, Edit, Bash(curl:*) version: 1.0.0 license: MIT author: Jeremy Longshore <jeremy@intentsolutions.io>
Allowed Tools
No tools specified
Provided by Plugin
lindy-pack
Claude Code skill pack for Lindy AI (24 skills)
Installation
This skill is included in the lindy-pack plugin:
/plugin install lindy-pack@claude-code-plugins-plus
Click to copy
Instructions
# Lindy Incident Runbook
## Overview
Incident response procedures for Lindy AI integration issues.
## Prerequisites
- Access to Lindy dashboard
- Monitoring dashboards available
- Escalation contacts known
- Admin access to production
## Incident Severity Levels
| Severity | Description | Response Time | Examples |
|----------|-------------|---------------|----------|
| SEV1 | Complete outage | 15 minutes | All agents failing |
| SEV2 | Partial outage | 30 minutes | One critical agent down |
| SEV3 | Degraded | 2 hours | High latency, some errors |
| SEV4 | Minor | 24 hours | Cosmetic issues |
## Quick Diagnostics
### Step 1: Check Lindy Status
```bash
# Check Lindy status page
curl -s https://status.lindy.ai/api/v1/status | jq '.status'
# Check API health
curl -s -o /dev/null -w "%{http_code}" \
-H "Authorization: Bearer $LINDY_API_KEY" \
https://api.lindy.ai/v1/health
```
### Step 2: Verify Authentication
```bash
# Test API key
curl -s -H "Authorization: Bearer $LINDY_API_KEY" \
https://api.lindy.ai/v1/users/me | jq '.email'
```
### Step 3: Check Rate Limits
```bash
# Check rate limit headers
curl -sI -H "Authorization: Bearer $LINDY_API_KEY" \
https://api.lindy.ai/v1/users/me | grep -i "x-ratelimit"
```
## Common Incidents
### Incident: Complete API Outage
**Symptoms:**
- All API calls failing
- 5xx errors from Lindy
**Runbook:**
```markdown
1. [ ] Check https://status.lindy.ai
2. [ ] Verify it's not a local network issue
3. [ ] Check if other services on same network work
4. [ ] Enable fallback mode if available
5. [ ] Notify stakeholders
6. [ ] Open support ticket with Lindy
7. [ ] Monitor status page for updates
```
**Fallback Code:**
```typescript
async function runWithFallback(agentId: string, input: string) {
try {
return await lindy.agents.run(agentId, { input });
} catch (error: any) {
if (error.status >= 500) {
// Enable fallback mode
return {
output: 'Service temporarily unavailable. Please try again later.',
fallback: true,
};
}
throw error;
}
}
```
### Incident: Rate Limiting
**Symptoms:**
- 429 errors
- "Rate limit exceeded" messages
**Runbook:**
```markdown
1. [ ] Check current usage in dashboard
2. [ ] Identify spike source (which agent/automation)
3. [ ] Reduce request rate or implement throttling
4. [ ] Consider upgrading plan if legitimate traffic
5. [ ] Implement request queuing
```
**Throttling Code:**
```typescript
const queue = new PQueue({ concurrency: 5, interval: 1000, intervalCap: 10 });
async function throttledRun(agentId: string, input: string) {
return queue.add(() => lindy.agents.run(agentId, { input }));
}
```
### Incident: Agent Failures
**Symptoms:**
- Specific agent not responding
- Unexpected outputs
- Timeout errors
**Runbook:**
```markdown
1. [ ] Identify affected agent(s)
2. [ ] Check agent configuration hasn't changed
3. [ ] Review recent runs for patterns
4. [ ] Test with simple input
5. [ ] Check if tools are working
6. [ ] Rollback to previous version if needed
```
**Diagnostic Script:**
```typescript
async function diagnoseAgent(agentId: string) {
const lindy = new Lindy({ apiKey: process.env.LINDY_API_KEY });
// Get agent details
const agent = await lindy.agents.get(agentId);
console.log('Agent:', agent.name, agent.status);
// Check recent runs
const runs = await lindy.runs.list({ agentId, limit: 10 });
const failures = runs.filter((r: any) => r.status === 'failed');
console.log(`Failures: ${failures.length}/${runs.length}`);
// Test run
try {
const test = await lindy.agents.run(agentId, { input: 'Hello' });
console.log('Test run: SUCCESS');
} catch (e: any) {
console.log('Test run: FAILED -', e.message);
}
return { agent, runs, failures };
}
```
### Incident: High Latency
**Symptoms:**
- Response times > 10 seconds
- Timeouts increasing
**Runbook:**
```markdown
1. [ ] Check Lindy status page for degradation
2. [ ] Review latency metrics by agent
3. [ ] Check if issue is with specific agent
4. [ ] Verify instructions aren't causing long responses
5. [ ] Consider reducing max_tokens
6. [ ] Implement streaming if not already
```
## Escalation Matrix
| Level | Contact | When |
|-------|---------|------|
| L1 | On-call engineer | Initial response |
| L2 | Engineering lead | After 30 min SEV1/2 |
| L3 | VP Engineering | After 1 hour SEV1 |
| Lindy | support@lindy.ai | External issue confirmed |
## Post-Incident
### Incident Report Template
```markdown
## Incident Report: [Title]
**Date:** YYYY-MM-DD
**Duration:** X hours Y minutes
**Severity:** SEV1/2/3/4
**Impact:** [Description of user impact]
### Timeline
- HH:MM - Incident detected
- HH:MM - On-call paged
- HH:MM - Root cause identified
- HH:MM - Resolution applied
- HH:MM - All clear
### Root Cause
[What caused the incident]
### Resolution
[What fixed it]
### Action Items
- [ ] [Preventive action 1]
- [ ] [Preventive action 2]
```
## Output
- Quick diagnostic commands
- Common incident runbooks
- Fallback code patterns
- Escalation procedures
- Post-incident template
## Resources
- [Lindy Status](https://status.lindy.ai)
- [Lindy Support](https://support.lindy.ai)
- [API Reference](https://docs.lindy.ai/api)
## Next Steps
Proceed to `lindy-data-handling` for data management.