Production Engineering

Production Playbooks

Comprehensive technical guides for building production-grade Claude Code plugin systems. Each playbook provides deep implementation details, production-ready code examples, and real-world patterns learned from operating large-scale AI agent deployments.

11 playbooks · 53k words · 5 categories

All Playbooks

Cost

Multi-Agent Rate Limits

Prevent API throttling in concurrent multi-agent systems. Token bucket algorithms, sliding windows, priority queues, and backpressure handling for Claude API rate limits.

~14 min read · 2,800 words
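
As a rough illustration of the token bucket idea named above (not taken from the playbook itself), the following minimal sketch refills tokens at a fixed rate and rejects requests when the bucket is empty. The class name `TokenBucket`, its parameters, and the injectable `clock` are assumptions made for this example.

```python
import time


class TokenBucket:
    """Hypothetical token bucket: up to `capacity` tokens, refilled at `rate` per second."""

    def __init__(self, rate, capacity, clock=time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self.clock = clock  # injectable so the behavior can be tested without sleeping
        self.tokens = float(capacity)
        self.updated = clock()

    def _refill(self):
        # Add tokens for the time elapsed since the last update, capped at capacity.
        now = self.clock()
        self.tokens = min(self.capacity, self.tokens + (now - self.updated) * self.rate)
        self.updated = now

    def try_acquire(self, cost=1.0):
        """Take `cost` tokens if available; return False so the caller can back off."""
        self._refill()
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False
```

Each agent would call `try_acquire()` before sending a request and queue or delay the request when it returns False; sharing one bucket across agents is what keeps the combined request rate under the limit.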
Cost

Cost Caps & Budget Management

Hard budget controls for AI spending. Real-time spend tracking, automatic shutoffs, team quotas, and financial safeguards to prevent runaway costs.

~16 min read · 3,200 words
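
A hard cap with an automatic shutoff can be sketched in a few lines. This is an assumption-laden illustration, not the playbook's implementation: `BudgetGuard`, `BudgetExceeded`, and the use of integer cents are all hypothetical choices made here.

```python
class BudgetExceeded(RuntimeError):
    """Raised when a charge would push spending past the hard cap."""


class BudgetGuard:
    """Hypothetical hard budget cap, tracked in integer cents to avoid float drift."""

    def __init__(self, limit_cents):
        self.limit_cents = limit_cents
        self.spent_cents = 0

    def charge(self, cost_cents):
        # Refuse the charge before any money is spent, so the cap is never exceeded.
        if self.spent_cents + cost_cents > self.limit_cents:
            raise BudgetExceeded(
                f"charge of {cost_cents} would exceed cap of {self.limit_cents}"
            )
        self.spent_cents += cost_cents
        return self.limit_cents - self.spent_cents  # remaining budget
```

Calling `charge()` before each API request, with an estimated cost, turns the cap into a gate instead of an after-the-fact alert.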
Infrastructure

MCP Server Reliability

Self-healing MCP servers with circuit breakers, exponential backoff, health checks, and automatic recovery. Production-grade Model Context Protocol implementations.

~18 min read · 3,500 words
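
To make the circuit breaker and exponential backoff pairing concrete, here is a minimal sketch under stated assumptions. It is not an MCP SDK API: `CircuitBreaker`, `CircuitOpenError`, and every parameter are hypothetical names for this example.

```python
import time


class CircuitOpenError(RuntimeError):
    """Raised when a call is rejected because the circuit is open."""


class CircuitBreaker:
    """Hypothetical breaker: opens after repeated failures, backs off exponentially."""

    def __init__(self, failure_threshold=3, base_delay=1.0, max_delay=60.0,
                 clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.base_delay = base_delay
        self.max_delay = max_delay
        self.clock = clock  # injectable for testing
        self.failures = 0
        self.retry_at = 0.0

    def call(self, fn, *args, **kwargs):
        # While open and still inside the backoff window, fail fast without calling fn.
        if self.failures >= self.failure_threshold and self.clock() < self.retry_at:
            raise CircuitOpenError("circuit open; retry later")
        try:
            result = fn(*args, **kwargs)
        except Exception:
            self.failures += 1
            if self.failures >= self.failure_threshold:
                # Double the wait for each failure past the threshold, up to max_delay.
                exponent = self.failures - self.failure_threshold
                delay = min(self.max_delay, self.base_delay * 2 ** exponent)
                self.retry_at = self.clock() + delay
            raise
        self.failures = 0  # a success closes the circuit again
        return result
```

Once the backoff window passes, the next call acts as a probe: success resets the breaker, while another failure reopens it with a longer wait.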
Infrastructure

Ollama Migration Guide

Switch from OpenAI/Anthropic to self-hosted LLMs. Complete migration path: local setup, prompt translation, performance benchmarks, and cost analysis.

~23 min read · 4,500 words
Operations

Incident Debugging Playbook

SEV-1/2/3/4 incident response protocols. Log analysis, root cause investigation (5 Whys, Fishbone), postmortem templates, and on-call procedures.

~25 min read · 5,000 words
Infrastructure

Self-Hosted Stack Setup

Full infrastructure deployment with Docker/Kubernetes: Ollama, PostgreSQL, Redis, Prometheus, Grafana, and Nginx as a complete production stack with monitoring and backups.

~28 min read · 5,500 words
Security

Compliance & Audit Guide

SOC 2, GDPR, HIPAA, PCI DSS implementation. Audit logging with immutable signatures, RBAC, data privacy (PII redaction), and regulatory compliance.

~30 min read · 6,000 words
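
One common way to make audit log signatures tamper-evident is an HMAC hash chain, where each entry's signature covers the previous entry's signature. The sketch below assumes that reading of "immutable signatures"; the function names, the entry layout, and the key handling are hypothetical and may differ from the guide.

```python
import hashlib
import hmac
import json


def _sign(key, event, prev_sig):
    # Sign the event together with the previous signature to link the entries.
    payload = json.dumps({"event": event, "prev": prev_sig}, sort_keys=True)
    return hmac.new(key, payload.encode("utf-8"), hashlib.sha256).hexdigest()


def append_entry(log, key, event):
    """Append `event` to `log`, chaining it to the last entry's signature."""
    prev_sig = log[-1]["sig"] if log else ""
    log.append({"event": event, "prev": prev_sig, "sig": _sign(key, event, prev_sig)})


def verify(log, key):
    """Return True only if no entry was altered, removed, or reordered."""
    prev_sig = ""
    for entry in log:
        expected = _sign(key, entry["event"], prev_sig)
        if entry["prev"] != prev_sig or not hmac.compare_digest(entry["sig"], expected):
            return False
        prev_sig = entry["sig"]
    return True
```

Because each signature depends on everything before it, editing an old entry invalidates that entry and every later one, which is what makes after-the-fact changes detectable.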
Operations

Team Presets & Workflows

Team standardization and collaboration. Plugin bundles, workflow templates, automated onboarding, and multi-layer configuration hierarchy (org/team/project/individual).

~25 min read · 5,000 words
Cost

Cost Attribution System

Multi-dimensional cost tracking (team/project/user/workflow). Automatic tagging, chargeback models, budget enforcement, and usage analytics for AI operations.

~28 min read · 5,500 words
Operations

Progressive Enhancement Patterns

Safe AI feature rollout strategies. Feature flags (0% → 100%), A/B testing, canary deployments, graceful degradation, and automated rollback on failures.

~28 min read · 5,500 words
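
A percentage rollout of the kind described above is often implemented by hashing each user into a stable bucket. The following is a minimal sketch of that approach, not the playbook's own code; `in_rollout` and its arguments are hypothetical names.

```python
import hashlib


def in_rollout(flag, user_id, percent):
    """Return True if `user_id` falls inside the first `percent` of 100 buckets.

    Hashing the flag name together with the user ID gives each user a stable
    bucket per flag, so raising `percent` only ever adds users.
    """
    digest = hashlib.sha256(f"{flag}:{user_id}".encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % 100
    return bucket < percent
```

Because the bucket is derived from the hash and not from a random draw, a user who sees the feature at 10% keeps seeing it at 50%, and setting `percent` to 0 acts as an immediate rollback.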
AI Architecture

Advanced Tool Use

Dynamic tool discovery, programmatic orchestration, and parameter guidance. Tool Search Tool (85% token reduction), Programmatic Tool Calling (37% efficiency gains), and Tool Use Examples (90% parameter accuracy). Enterprise-scale agent architecture.

~33 min read · 6,500 words