clerk-incident-runbook

Incident response procedures for Clerk authentication issues. Use when handling auth outages, security incidents, or production authentication problems. Trigger with phrases like "clerk incident", "clerk outage", "clerk down", "auth not working", "clerk emergency". allowed-tools: Read, Write, Edit, Bash(npm:*), Bash(curl:*), Grep version: 1.0.0 license: MIT author: Jeremy Longshore <jeremy@intentsolutions.io>

Allowed Tools

No tools specified

Provided by Plugin

clerk-pack

Claude Code skill pack for Clerk authentication (24 skills)

saas packs v1.0.0
View Plugin

Installation

This skill is included in the clerk-pack plugin:

/plugin install clerk-pack@claude-code-plugins-plus

Click to copy

Instructions

# Clerk Incident Runbook ## Overview Procedures for responding to Clerk-related incidents in production. ## Prerequisites - Access to Clerk dashboard - Access to application logs - Emergency contact list - Rollback procedures documented ## Incident Categories ### Category 1: Complete Auth Outage **Symptoms:** All users unable to sign in, middleware returning errors **Immediate Actions:** ```bash # 1. Check Clerk status curl -s https://status.clerk.com/api/v1/status | jq # 2. Check your endpoint curl -I https://yourapp.com/api/health/clerk # 3. Check environment variables vercel env ls | grep CLERK ``` **Mitigation Steps:** ```typescript // Emergency bypass mode (use with caution) // middleware.ts import { clerkMiddleware } from '@clerk/nextjs/server' import { NextResponse } from 'next/server' const EMERGENCY_BYPASS = process.env.CLERK_EMERGENCY_BYPASS === 'true' export default clerkMiddleware(async (auth, request) => { if (EMERGENCY_BYPASS) { // Log for audit console.warn('[EMERGENCY] Auth bypass active', { path: request.nextUrl.pathname, timestamp: new Date().toISOString() }) return NextResponse.next() } // Normal auth flow await auth.protect() }) ``` ### Category 2: Webhook Processing Failure **Symptoms:** User data out of sync, missing user records **Diagnosis:** ```bash # Check webhook endpoint curl -X POST https://yourapp.com/api/webhooks/clerk \ -H "Content-Type: application/json" \ -d '{"type":"ping"}' \ -w "\n%{http_code}" # Check Clerk dashboard for failed webhooks # Dashboard > Webhooks > Failed Deliveries ``` **Recovery:** ```typescript // scripts/resync-users.ts import { clerkClient } from '@clerk/nextjs/server' import { db } from '../lib/db' async function resyncAllUsers() { const client = await clerkClient() let offset = 0 const limit = 100 while (true) { const { data: users, totalCount } = await client.users.getUserList({ limit, offset }) for (const user of users) { await db.user.upsert({ where: { clerkId: user.id }, update: { email: user.emailAddresses[0]?.emailAddress, firstName: user.firstName, lastName: user.lastName, updatedAt: new Date() }, create: { clerkId: user.id, email: user.emailAddresses[0]?.emailAddress, firstName: user.firstName, lastName: user.lastName } }) } console.log(`Synced ${offset + users.length} of ${totalCount} users`) offset += limit if (offset >= totalCount) break } console.log('Resync complete') } resyncAllUsers() ``` ### Category 3: Security Incident **Symptoms:** Unauthorized access detected, suspicious sessions **Immediate Actions:** ```typescript // scripts/emergency-session-revoke.ts import { clerkClient } from '@clerk/nextjs/server' async function revokeUserSessions(userId: string) { const client = await clerkClient() // Get all active sessions const sessions = await client.sessions.getSessionList({ userId, status: 'active' }) // Revoke all sessions for (const session of sessions.data) { await client.sessions.revokeSession(session.id) console.log(`Revoked session: ${session.id}`) } console.log(`Revoked ${sessions.data.length} sessions for user ${userId}`) } // Revoke all sessions for compromised user revokeUserSessions('user_xxx') ``` ```typescript // scripts/emergency-lockout.ts import { clerkClient } from '@clerk/nextjs/server' async function lockoutUser(userId: string) { const client = await clerkClient() // Ban user (prevents new sign-ins) await client.users.banUser(userId) // Revoke all sessions const sessions = await client.sessions.getSessionList({ userId, status: 'active' }) for (const session of sessions.data) { await client.sessions.revokeSession(session.id) } console.log(`User ${userId} locked out and all sessions revoked`) } ``` ### Category 4: Performance Degradation **Symptoms:** Slow sign-in, high latency, timeouts **Diagnosis:** ```typescript // scripts/diagnose-performance.ts async function diagnosePerformance() { const results = { authCheck: 0, getUserList: 0, currentUser: 0 } // Measure auth check const authStart = performance.now() await auth() results.authCheck = performance.now() - authStart // Measure API call const apiStart = performance.now() const client = await clerkClient() await client.users.getUserList({ limit: 1 }) results.getUserList = performance.now() - apiStart // Measure currentUser const userStart = performance.now() await currentUser() results.currentUser = performance.now() - userStart console.log('Performance Diagnosis:', results) // Check for issues if (results.authCheck > 100) { console.warn('Auth check slow - check middleware configuration') } if (results.getUserList > 500) { console.warn('API slow - check Clerk status or network') } return results } ``` ## Runbook Procedures ### Procedure 1: Auth Outage Response ``` 1. [ ] Confirm outage (check status.clerk.com) 2. [ ] Check application logs for errors 3. [ ] Verify environment variables 4. [ ] If Clerk outage: a. [ ] Enable emergency bypass (if safe) b. [ ] Notify users via status page c. [ ] Monitor Clerk status 5. [ ] If application issue: a. [ ] Check recent deployments b. [ ] Rollback if necessary c. [ ] Check middleware configuration 6. [ ] Document timeline and actions 7. [ ] Conduct post-mortem ``` ### Procedure 2: Security Breach Response ``` 1. [ ] Identify affected accounts 2. [ ] Revoke all sessions for affected users 3. [ ] Lock compromised accounts 4. [ ] Reset API keys if exposed 5. [ ] Enable additional verification 6. [ ] Notify affected users 7. [ ] Review access logs 8. [ ] Document and report ``` ### Procedure 3: Data Sync Recovery ``` 1. [ ] Identify sync gap (check webhook logs) 2. [ ] Pause webhook processing 3. [ ] Export current database state 4. [ ] Run resync script 5. [ ] Verify data integrity 6. [ ] Resume webhook processing 7. [ ] Monitor for new issues ``` ## Emergency Contacts ```yaml # .github/INCIDENT_CONTACTS.yml contacts: on_call: - name: On-Call Engineer phone: "+1-xxx-xxx-xxxx" slack: "@oncall" clerk_support: - url: "https://clerk.com/support" - email: "support@clerk.com" - priority: "For enterprise: contact account manager" escalation: - level: 1 contact: "On-call engineer" time: "0-15 min" - level: 2 contact: "Engineering lead" time: "15-30 min" - level: 3 contact: "CTO" time: "30+ min" ``` ## Post-Incident ### Template ```markdown # Incident Report: [Title] ## Summary - **Date:** YYYY-MM-DD - **Duration:** X hours Y minutes - **Severity:** P1/P2/P3 - **Impact:** [Number of affected users] ## Timeline - HH:MM - Incident detected - HH:MM - Initial response - HH:MM - Mitigation applied - HH:MM - Resolution confirmed ## Root Cause [Description of root cause] ## Resolution [Steps taken to resolve] ## Prevention - [ ] Action item 1 - [ ] Action item 2 ## Lessons Learned [Key takeaways] ``` ## Output - Incident response procedures - Recovery scripts - Emergency bypass capability - Post-incident templates ## Resources - [Clerk Status](https://status.clerk.com) - [Clerk Support](https://clerk.com/support) - [Clerk Discord](https://clerk.com/discord) ## Next Steps Proceed to `clerk-data-handling` for user data management.

Skill file: plugins/saas-packs/clerk-pack/skills/clerk-incident-runbook/SKILL.md