Infrastructure

Self-Hosted Stack Setup

Full infrastructure deployment with Docker/Kubernetes. Ollama, PostgreSQL, Redis, Prometheus, Grafana, Nginx - complete production stack with monitoring and backups.

~28 min read · 5,500 words · Production-Ready

Production Playbook for Infrastructure Engineers and DevOps

Building a complete self-hosted AI infrastructure for Claude Code plugins eliminates cloud dependencies, ensures data privacy, and provides full control. This playbook provides production-ready Docker Compose configurations, Kubernetes deployments, monitoring with Prometheus/Grafana, automated backups, and disaster recovery procedures.

Architecture Overview

Self-Hosted Stack Components

graph TB
    A[Claude Code CLI] --> B[Analytics Daemon]
    A --> C[Ollama LLM Server]
    A --> D[PostgreSQL Database]
    A --> E[Redis Cache]

B --> F[Prometheus Metrics]
C --> F

F --> G[Grafana Dashboard]

D --> H[Backup Service]
E --> H

I[Nginx Reverse Proxy] --> B
I --> C

J[Let's Encrypt] --> I

Infrastructure Tiers

| Component        | Purpose              | Port       | Storage         |
|------------------|----------------------|------------|-----------------|
| Ollama           | Local LLM inference  | 11434      | 100GB (models)  |
| Analytics Daemon | Real-time monitoring | 3333, 3456 | 10GB (logs)     |
| PostgreSQL       | Persistent data      | 5432       | 50GB (database) |
| Redis            | Caching, sessions    | 6379       | 5GB (cache)     |
| Prometheus       | Metrics collection   | 9090       | 20GB (metrics)  |
| Grafana          | Dashboards           | 3000       | 5GB (config)    |
| Nginx            | Reverse proxy, SSL   | 80, 443    | 1GB (logs)      |

Total Storage: ~191GB minimum
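Before deploying, it is worth verifying the host actually has the headroom the table above calls for. A minimal pre-flight sketch (assumptions: the 200GB figure rounds the ~191GB minimum up for headroom, and `/var/lib/docker` is where your Docker volumes live; adjust both for your host):

```shell
#!/bin/sh
# Pre-flight disk check: confirm enough free space for the full stack.
# check_space DIR REQUIRED_GB -> prints OK/FAIL and returns 0/1.
check_space() {
  dir=$1; need_gb=$2
  # df -Pk prints POSIX-format 1K blocks; column 4 is available space
  avail_kb=$(df -Pk "$dir" 2>/dev/null | awk 'NR==2 {print $4}')
  avail_gb=$(( ${avail_kb:-0} / 1024 / 1024 ))
  if [ "$avail_gb" -lt "$need_gb" ]; then
    echo "FAIL: ${avail_gb}GB free on $dir (need ${need_gb}GB)"
    return 1
  fi
  echo "OK: ${avail_gb}GB free on $dir"
}

# ~191GB minimum from the table, rounded up to 200 for headroom.
check_space /var/lib/docker 200 || echo "Free up space before deploying."
```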


Docker Compose Setup

Complete Stack (docker-compose.yml)

# docker-compose.yml
version: '3.8'

services:
  # Ollama LLM Server
  ollama:
    image: ollama/ollama:latest
    container_name: ollama
    ports:
      - "11434:11434"
    volumes:
      - ollama_models:/root/.ollama
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 1
              capabilities: [gpu]
    restart: unless-stopped
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:11434/api/tags"]
      interval: 30s
      timeout: 10s
      retries: 3

  # Analytics Daemon
  analytics:
    build: ./packages/analytics-daemon
    container_name: analytics-daemon
    ports:
      - "3333:3333"  # HTTP API
      - "3456:3456"  # WebSocket
    volumes:
      - analytics_data:/data
      - ${HOME}/.claude:/root/.claude:ro
    environment:
      - NODE_ENV=production
      - PORT=3333
      - WS_PORT=3456
    restart: unless-stopped
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:3333/health"]
      interval: 30s
      timeout: 5s
      retries: 3

  # PostgreSQL Database
  postgres:
    image: postgres:16-alpine
    container_name: postgres
    ports:
      - "5432:5432"
    volumes:
      - postgres_data:/var/lib/postgresql/data
      - ./backups/postgres:/backups
    environment:
      - POSTGRES_USER=claude
      - POSTGRES_PASSWORD=${POSTGRES_PASSWORD}
      - POSTGRES_DB=claude_prod
    restart: unless-stopped
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U claude"]
      interval: 10s
      timeout: 5s
      retries: 5

  # Redis Cache
  redis:
    image: redis:7-alpine
    container_name: redis
    ports:
      - "6379:6379"
    volumes:
      - redis_data:/data
    command: redis-server --appendonly yes --maxmemory 2gb --maxmemory-policy allkeys-lru
    restart: unless-stopped
    healthcheck:
      test: ["CMD", "redis-cli", "ping"]
      interval: 10s
      timeout: 3s
      retries: 3

  # Prometheus Metrics
  prometheus:
    image: prom/prometheus:latest
    container_name: prometheus
    ports:
      - "9090:9090"
    volumes:
      - ./prometheus.yml:/etc/prometheus/prometheus.yml
      - prometheus_data:/prometheus
    command:
      - '--config.file=/etc/prometheus/prometheus.yml'
      - '--storage.tsdb.path=/prometheus'
      - '--storage.tsdb.retention.time=30d'
    restart: unless-stopped

  # Grafana Dashboards
  grafana:
    image: grafana/grafana:latest
    container_name: grafana
    ports:
      - "3000:3000"
    volumes:
      - grafana_data:/var/lib/grafana
      - ./grafana/dashboards:/etc/grafana/provisioning/dashboards
      - ./grafana/datasources:/etc/grafana/provisioning/datasources
    environment:
      - GF_SECURITY_ADMIN_PASSWORD=${GRAFANA_PASSWORD}
      - GF_INSTALL_PLUGINS=redis-datasource
    restart: unless-stopped
    depends_on:
      - prometheus

  # Nginx Reverse Proxy
  nginx:
    image: nginx:alpine
    container_name: nginx
    ports:
      - "80:80"
      - "443:443"
    volumes:
      - ./nginx/nginx.conf:/etc/nginx/nginx.conf:ro
      - ./nginx/ssl:/etc/nginx/ssl:ro
      - nginx_logs:/var/log/nginx
    restart: unless-stopped
    depends_on:
      - ollama
      - analytics
      - grafana

  # Automated Backups
  backup:
    image: alpine:latest
    container_name: backup-service
    volumes:
      - postgres_data:/source/postgres:ro
      - redis_data:/source/redis:ro
      - analytics_data:/source/analytics:ro
      - ./backups:/backups
    command: |
      sh -c '
      apk add --no-cache postgresql-client redis
      while true; do
        DATE=$$(date +%Y-%m-%d_%H-%M-%S)

        # Backup PostgreSQL
        PGPASSWORD=$$POSTGRES_PASSWORD pg_dump -h postgres -U claude claude_prod > /backups/postgres/backup_$$DATE.sql

        # Backup Redis
        redis-cli -h redis --rdb /backups/redis/dump_$$DATE.rdb

        # Backup Analytics
        tar -czf /backups/analytics/backup_$$DATE.tar.gz /source/analytics

        # Delete old backups (keep 7 days)
        find /backups -name "backup_*.sql" -mtime +7 -delete
        find /backups -name "dump_*.rdb" -mtime +7 -delete
        find /backups -name "backup_*.tar.gz" -mtime +7 -delete

        echo "Backup completed: $$DATE"
        sleep 86400  # Daily backups
      done
      '
    environment:
      - POSTGRES_PASSWORD=${POSTGRES_PASSWORD}
    restart: unless-stopped

volumes:
  ollama_models:
  analytics_data:
  postgres_data:
  redis_data:
  prometheus_data:
  grafana_data:
  nginx_logs:

networks:
  default:
    name: claude_network

Environment Configuration (.env)

# .env
POSTGRES_PASSWORD=your-secure-password-here
GRAFANA_PASSWORD=your-grafana-password-here
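Rather than typing passwords by hand, generate them. A sketch that writes credentials to a reviewable file (the `.env.generated` filename and 32-character length are arbitrary choices, not project conventions):

```shell
#!/bin/sh
# Generate strong random credentials for the stack's .env file.
gen_pw() {
  # 32 alphanumeric chars from the kernel CSPRNG; avoids shell-unsafe symbols
  tr -dc 'A-Za-z0-9' < /dev/urandom | head -c 32
}

cat > .env.generated <<EOF
POSTGRES_PASSWORD=$(gen_pw)
GRAFANA_PASSWORD=$(gen_pw)
EOF
chmod 600 .env.generated  # readable only by the owner
echo "Review .env.generated, then: mv .env.generated .env"
```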

Nginx Configuration

# nginx/nginx.conf
events {
    worker_connections 1024;
}

http {
    upstream ollama {
        server ollama:11434;
    }

    upstream analytics {
        server analytics:3333;
    }

    upstream grafana {
        server grafana:3000;
    }

    # HTTP -> HTTPS redirect
    server {
        listen 80;
        server_name claude.example.com;
        return 301 https://$server_name$request_uri;
    }

    # HTTPS
    server {
        listen 443 ssl http2;
        server_name claude.example.com;

        ssl_certificate /etc/nginx/ssl/fullchain.pem;
        ssl_certificate_key /etc/nginx/ssl/privkey.pem;
        ssl_protocols TLSv1.2 TLSv1.3;
        ssl_ciphers HIGH:!aNULL:!MD5;

        # Ollama API
        location /api/ollama/ {
            proxy_pass http://ollama/;
            proxy_set_header Host $host;
            proxy_set_header X-Real-IP $remote_addr;
        }

        # Analytics API
        location /api/analytics/ {
            proxy_pass http://analytics/api/;
            proxy_set_header Host $host;
            proxy_set_header X-Real-IP $remote_addr;
        }

        # Analytics WebSocket
        location /ws/ {
            proxy_pass http://analytics/;
            proxy_http_version 1.1;
            proxy_set_header Upgrade $http_upgrade;
            proxy_set_header Connection "upgrade";
        }

        # Grafana
        location / {
            proxy_pass http://grafana/;
            proxy_set_header Host $host;
        }
    }
}

Deployment Commands

# Setup
mkdir -p backups/{postgres,redis,analytics}
mkdir -p grafana/{dashboards,datasources}
touch .env  # Add passwords

# Start stack
docker-compose up -d

# Download Ollama models
docker exec ollama ollama pull llama3.3:70b
docker exec ollama ollama pull qwen2.5-coder:32b

# Verify health
docker-compose ps
docker-compose logs -f

# Access services
# Ollama: http://localhost:11434
# Analytics: http://localhost:3333
# Grafana: http://localhost:3000
# Prometheus: http://localhost:9090
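After `docker-compose up -d`, a quick smoke test beats clicking through four browser tabs. A sketch using the health endpoints above (the Grafana `/api/health` and Prometheus `/-/healthy` paths are those tools' standard health URLs; adjust hosts if you changed port bindings):

```shell
#!/bin/sh
# Smoke-test each service's health endpoint after startup.
probe() {
  name=$1; url=$2
  # -f: treat HTTP errors as failure, -s: quiet, -m 5: 5-second timeout
  if curl -fsm 5 "$url" > /dev/null 2>&1; then
    echo "$name OK"
  else
    echo "$name UNREACHABLE ($url)"
  fi
}

probe ollama     http://localhost:11434/api/tags
probe analytics  http://localhost:3333/health
probe grafana    http://localhost:3000/api/health
probe prometheus http://localhost:9090/-/healthy
```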

Kubernetes Deployment

Complete Kubernetes Manifests

Namespace:

# namespace.yaml
apiVersion: v1
kind: Namespace
metadata:
  name: claude-stack

Ollama Deployment:

# ollama-deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: ollama
  namespace: claude-stack
spec:
  replicas: 3  # Scale with GPUs
  selector:
    matchLabels:
      app: ollama
  template:
    metadata:
      labels:
        app: ollama
    spec:
      containers:
      - name: ollama
        image: ollama/ollama:latest
        ports:
        - containerPort: 11434
        resources:
          limits:
            nvidia.com/gpu: 1
          requests:
            memory: "16Gi"
            cpu: "4"
        volumeMounts:
        - name: models
          mountPath: /root/.ollama
        livenessProbe:
          httpGet:
            path: /api/tags
            port: 11434
          initialDelaySeconds: 60
          periodSeconds: 30
      volumes:
      - name: models
        persistentVolumeClaim:
          claimName: ollama-models-pvc

---
apiVersion: v1
kind: Service
metadata:
  name: ollama-service
  namespace: claude-stack
spec:
  selector:
    app: ollama
  ports:
  - protocol: TCP
    port: 11434
    targetPort: 11434
  type: ClusterIP
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: ollama-models-pvc
  namespace: claude-stack
spec:
  accessModes:
  - ReadWriteOnce  # use ReadWriteMany if replicas span multiple nodes
  resources:
    requests:
      storage: 100Gi
  storageClassName: fast-ssd

PostgreSQL StatefulSet:

# postgres-statefulset.yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: postgres
  namespace: claude-stack
spec:
  serviceName: postgres
  replicas: 1
  selector:
    matchLabels:
      app: postgres
  template:
    metadata:
      labels:
        app: postgres
    spec:
      containers:
      - name: postgres
        image: postgres:16-alpine
        ports:
        - containerPort: 5432
        env:
        - name: POSTGRES_USER
          value: "claude"
        - name: POSTGRES_PASSWORD
          valueFrom:
            secretKeyRef:
              name: postgres-secret
              key: password
        - name: POSTGRES_DB
          value: "claude_prod"
        volumeMounts:
        - name: postgres-data
          mountPath: /var/lib/postgresql/data
        resources:
          requests:
            memory: "4Gi"
            cpu: "2"
          limits:
            memory: "8Gi"
            cpu: "4"
  volumeClaimTemplates:
  - metadata:
      name: postgres-data
    spec:
      accessModes: [ "ReadWriteOnce" ]
      resources:
        requests:
          storage: 50Gi
      storageClassName: fast-ssd

---
apiVersion: v1
kind: Service
metadata:
  name: postgres
  namespace: claude-stack
spec:
  selector:
    app: postgres
  ports:
  - protocol: TCP
    port: 5432
    targetPort: 5432
  clusterIP: None

Monitoring Stack:

# prometheus-deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: prometheus
  namespace: claude-stack
spec:
  replicas: 1
  selector:
    matchLabels:
      app: prometheus
  template:
    metadata:
      labels:
        app: prometheus
    spec:
      containers:
      - name: prometheus
        image: prom/prometheus:latest
        ports:
        - containerPort: 9090
        volumeMounts:
        - name: config
          mountPath: /etc/prometheus
        - name: data
          mountPath: /prometheus
        args:
          - '--config.file=/etc/prometheus/prometheus.yml'
          - '--storage.tsdb.retention.time=30d'
      volumes:
      - name: config
        configMap:
          name: prometheus-config
      - name: data
        persistentVolumeClaim:
          claimName: prometheus-data-pvc

---
apiVersion: v1
kind: ConfigMap
metadata:
  name: prometheus-config
  namespace: claude-stack
data:
  prometheus.yml: |
    global:
      scrape_interval: 15s
    scrape_configs:
    - job_name: 'ollama'
      static_configs:
      - targets: ['ollama-service:11434']
    - job_name: 'postgres'
      static_configs:
      - targets: ['postgres:5432']
    - job_name: 'analytics'
      static_configs:
      - targets: ['analytics-service:3333']

Deploy to Kubernetes

# Create namespace
kubectl apply -f namespace.yaml

# Create secrets
kubectl create secret generic postgres-secret --from-literal=password='your-secure-password' -n claude-stack

# Deploy services
kubectl apply -f ollama-deployment.yaml
kubectl apply -f postgres-statefulset.yaml
kubectl apply -f prometheus-deployment.yaml

# Verify
kubectl get pods -n claude-stack
kubectl logs -f deployment/ollama -n claude-stack
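A rollout is only done when pods reach Running. The helper below counts Running pods from `kubectl get pods` output; it reads stdin so the counting logic can be exercised without a live cluster, while the bottom invocation assumes kubectl is configured for your cluster:

```shell
#!/bin/sh
# count_running: count pods in Running state from `kubectl get pods` output.
count_running() {
  # Column 3 of the default output is STATUS; NR > 1 skips the header row.
  awk 'NR > 1 && $3 == "Running" {n++} END {print n+0}'
}

if command -v kubectl > /dev/null 2>&1; then
  n=$(kubectl get pods -n claude-stack 2>/dev/null | count_running)
  echo "Running pods in claude-stack: ${n:-0}"
fi
```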

Monitoring with Prometheus & Grafana

Prometheus Configuration

# prometheus.yml
global:
  scrape_interval: 15s
  evaluation_interval: 15s

scrape_configs:
  # Ollama metrics
  - job_name: 'ollama'
    static_configs:
      - targets: ['ollama:11434']
    metrics_path: '/metrics'

  # Analytics daemon metrics
  - job_name: 'analytics'
    static_configs:
      - targets: ['analytics:3333']
    metrics_path: '/metrics'

  # PostgreSQL metrics (using postgres_exporter)
  - job_name: 'postgres'
    static_configs:
      - targets: ['postgres-exporter:9187']

  # Redis metrics (using redis_exporter)
  - job_name: 'redis'
    static_configs:
      - targets: ['redis-exporter:9121']

  # Node metrics
  - job_name: 'node'
    static_configs:
      - targets: ['localhost:9100']

alerting:
  alertmanagers:
    - static_configs:
        - targets: ['alertmanager:9093']

rule_files:
  - '/etc/prometheus/rules/*.yml'

Alert Rules

# prometheus/rules/alerts.yml
groups:
  - name: claude_stack_alerts
    interval: 30s
    rules:
      # High error rate
      - alert: HighErrorRate
        expr: rate(llm_errors_total[5m]) > 0.05
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "High LLM error rate"
          description: "Error rate is {{ $value | humanizePercentage }}"

      # Ollama down
      - alert: OllamaDown
        expr: up{job="ollama"} == 0
        for: 1m
        labels:
          severity: critical
        annotations:
          summary: "Ollama is down"

      # High memory usage
      - alert: HighMemoryUsage
        expr: (node_memory_MemTotal_bytes - node_memory_MemAvailable_bytes) / node_memory_MemTotal_bytes > 0.9
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "High memory usage"
          description: "Memory usage is {{ $value | humanizePercentage }}"

      # Database connection issues
      - alert: PostgreSQLDown
        expr: up{job="postgres"} == 0
        for: 1m
        labels:
          severity: critical
        annotations:
          summary: "PostgreSQL is down"
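The HighMemoryUsage expression is plain arithmetic: used over total, from the node exporter's meminfo gauges. The same calculation can be sanity-checked directly from `/proc/meminfo` on a Linux host:

```shell
#!/bin/sh
# Mirror the HighMemoryUsage PromQL locally: (total - available) / total.
mem_used_pct() {
  awk '/^MemTotal:/ {t=$2} /^MemAvailable:/ {a=$2}
       END {printf "%d\n", (t - a) * 100 / t}' /proc/meminfo
}

used=$(mem_used_pct)
if [ "$used" -gt 90 ]; then
  echo "WARN: memory ${used}% used (alert threshold is 90%)"
else
  echo "OK: memory ${used}% used"
fi
```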

Grafana Dashboards

Claude Stack Dashboard (JSON):

{
  "dashboard": {
    "title": "Claude Code Self-Hosted Stack",
    "panels": [
      {
        "title": "LLM Requests/sec",
        "targets": [
          {
            "expr": "rate(llm_requests_total[5m])"
          }
        ],
        "type": "graph"
      },
      {
        "title": "Error Rate",
        "targets": [
          {
            "expr": "rate(llm_errors_total[5m]) / rate(llm_requests_total[5m])"
          }
        ],
        "type": "graph"
      },
      {
        "title": "Response Latency (p95)",
        "targets": [
          {
            "expr": "histogram_quantile(0.95, rate(llm_request_duration_seconds_bucket[5m]))"
          }
        ],
        "type": "graph"
      },
      {
        "title": "GPU Utilization",
        "targets": [
          {
            "expr": "nvidia_gpu_duty_cycle"
          }
        ],
        "type": "gauge"
      },
      {
        "title": "Database Connections",
        "targets": [
          {
            "expr": "pg_stat_database_numbackends"
          }
        ],
        "type": "graph"
      }
    ]
  }
}

Backup Strategies

Automated Daily Backups

#!/bin/bash
# backup.sh - Automated backup script

DATE=$(date +%Y-%m-%d_%H-%M-%S)
BACKUP_DIR="/backups"

# PostgreSQL backup
echo "Backing up PostgreSQL..."
PGPASSWORD=$POSTGRES_PASSWORD pg_dump -h localhost -U claude claude_prod | gzip > $BACKUP_DIR/postgres/backup_$DATE.sql.gz

# Redis backup
echo "Backing up Redis..."
redis-cli --rdb $BACKUP_DIR/redis/dump_$DATE.rdb

# Analytics data
echo "Backing up Analytics..."
tar -czf $BACKUP_DIR/analytics/backup_$DATE.tar.gz /var/lib/analytics

# Ollama models (weekly only)
if [ $(date +%u) -eq 1 ]; then
  echo "Backing up Ollama models (weekly)..."
  tar -czf $BACKUP_DIR/ollama/models_$DATE.tar.gz /root/.ollama
fi

# Upload to S3 (optional)
aws s3 sync $BACKUP_DIR s3://my-backups/claude-stack/

# Delete old local backups (keep 7 days)
find $BACKUP_DIR -name "backup_*.sql.gz" -mtime +7 -delete
find $BACKUP_DIR -name "dump_*.rdb" -mtime +7 -delete
find $BACKUP_DIR -name "backup_*.tar.gz" -mtime +7 -delete

echo "Backup completed: $DATE"
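A backup that won't decompress is not a backup. This sketch checks archive integrity for the files backup.sh produces; note `gzip -t` only validates the compression layer, so a real disaster-recovery test still means restoring into a scratch database:

```shell
#!/bin/sh
# verify_backup FILE: cheap integrity check for backup.sh's output files.
verify_backup() {
  f=$1
  case "$f" in
    *.sql.gz|*.tar.gz)
      if gzip -t "$f" 2>/dev/null; then echo "OK: $f"; else echo "CORRUPT: $f"; fi ;;
    *.rdb)
      if [ -s "$f" ]; then echo "OK: $f (non-empty)"; else echo "CORRUPT: $f (empty)"; fi ;;
    *)
      echo "SKIP: $f" ;;
  esac
}

# Check everything the daily job wrote (globs stay unexpanded if no files).
for f in /backups/postgres/*.sql.gz /backups/redis/*.rdb /backups/analytics/*.tar.gz; do
  if [ -e "$f" ]; then verify_backup "$f"; fi
done
```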

Restore Procedures

#!/bin/bash
# restore.sh - Restore from backup

BACKUP_FILE=$1

if [ -z "$BACKUP_FILE" ]; then
  echo "Usage: ./restore.sh <backup_file>"
  exit 1
fi

# Stop services
docker-compose down

# Restore PostgreSQL (postgres must be up to accept the psql connection)
if [[ $BACKUP_FILE == *postgres* ]]; then
  docker-compose up -d postgres
  sleep 10
  gunzip -c "$BACKUP_FILE" | PGPASSWORD=$POSTGRES_PASSWORD psql -h localhost -U claude claude_prod
fi

# Restore Redis (copy the RDB snapshot into place while Redis is stopped)
if [[ $BACKUP_FILE == *redis* ]]; then
  cp "$BACKUP_FILE" /var/lib/redis/dump.rdb
fi

# Restore Analytics
if [[ $BACKUP_FILE == *analytics* ]]; then
  tar -xzf "$BACKUP_FILE" -C /
fi

# Restart services
docker-compose up -d

echo "Restore completed from: $BACKUP_FILE"

Security Hardening

Firewall Rules (UFW)

# Allow SSH
sudo ufw allow 22/tcp

# Allow HTTP/HTTPS (nginx only)
sudo ufw allow 80/tcp
sudo ufw allow 443/tcp

# Block direct access to services
sudo ufw deny 11434/tcp  # Ollama
sudo ufw deny 3333/tcp   # Analytics
sudo ufw deny 5432/tcp   # PostgreSQL
sudo ufw deny 6379/tcp   # Redis

# Enable firewall
sudo ufw enable
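One caveat with the rules above: Docker publishes ports by writing iptables rules directly, so UFW deny rules may not actually block them. The robust fix is binding sensitive ports to localhost in docker-compose.yml (e.g. `"127.0.0.1:5432:5432"`). This sketch audits a compose file for mappings exposed on all interfaces (it assumes the short-form `"host:container"` port syntax used above):

```shell
#!/bin/sh
# list_open_binds: print short-form port mappings that bind to all interfaces.
list_open_binds() {
  # Matches lines like `- "5432:5432"` but not `- "127.0.0.1:5432:5432"`.
  grep -E '^[[:space:]]*-[[:space:]]*"[0-9]+:[0-9]+"' | sed 's/[^"]*"\([^"]*\)".*/\1/'
}

if [ -f docker-compose.yml ]; then
  echo "Port mappings exposed on all interfaces:"
  list_open_binds < docker-compose.yml
fi
```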

SSL/TLS Certificates (Let's Encrypt)

# Install certbot
sudo apt-get install certbot

# Generate certificate (standalone mode binds port 80, so stop nginx first)
sudo certbot certonly --standalone -d claude.example.com

# Auto-renewal cron (daily at midnight)
echo "0 0 * * * certbot renew --quiet" | sudo crontab -

Database Security

-- PostgreSQL hardening

-- Create read-only user
CREATE USER claude_readonly WITH PASSWORD 'readonly-password';
GRANT CONNECT ON DATABASE claude_prod TO claude_readonly;
GRANT SELECT ON ALL TABLES IN SCHEMA public TO claude_readonly;

-- Lock down the postgres superuser and default database
ALTER USER postgres PASSWORD 'strong-random-password';
REVOKE ALL ON DATABASE postgres FROM PUBLIC;

-- Enable SSL
ALTER SYSTEM SET ssl = on;
ALTER SYSTEM SET ssl_cert_file = '/etc/ssl/certs/server.crt';
ALTER SYSTEM SET ssl_key_file = '/etc/ssl/private/server.key';

Scaling & High Availability

Load Balancing Ollama

# ollama-ha.yaml
apiVersion: v1
kind: Service
metadata:
  name: ollama-lb
  namespace: claude-stack
spec:
  selector:
    app: ollama
  ports:
  - protocol: TCP
    port: 11434
    targetPort: 11434
  type: LoadBalancer
  sessionAffinity: ClientIP  # Sticky sessions

---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: ollama
  namespace: claude-stack
spec:
  replicas: 5  # 5 GPU nodes
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1
      maxUnavailable: 1
  selector:
    matchLabels:
      app: ollama
  template:
    metadata:
      labels:
        app: ollama
    spec:
      containers:
      - name: ollama
        image: ollama/ollama:latest
        resources:
          limits:
            nvidia.com/gpu: 1

PostgreSQL High Availability

# postgres-ha.yaml (using Patroni)
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: postgres-ha
  namespace: claude-stack
spec:
  serviceName: postgres-ha
  replicas: 3  # Primary + 2 replicas
  selector:
    matchLabels:
      app: postgres-ha
  template:
    metadata:
      labels:
        app: postgres-ha
    spec:
      containers:
      - name: postgres
        image: postgres:16-alpine
        env:
        - name: PATRONI_SCOPE
          value: "postgres-cluster"
        - name: PATRONI_REPLICATION_USERNAME
          value: "replicator"
        - name: PATRONI_REPLICATION_PASSWORD
          valueFrom:
            secretKeyRef:
              name: postgres-ha-secret
              key: replication-password

Best Practices

DO ✅

  • Use persistent volumes
volumes:
     - ollama_models:/root/.ollama  # Persistent
     - type: tmpfs                    # Temporary cache
       target: /tmp
  • Implement health checks
healthcheck:
     test: ["CMD", "curl", "-f", "http://localhost:11434/api/tags"]
     interval: 30s
     timeout: 10s
     retries: 3
     start_period: 60s
  • Use resource limits
resources:
     limits:
       memory: "16Gi"
       cpu: "4"
       nvidia.com/gpu: 1
     requests:
       memory: "8Gi"
       cpu: "2"
  • Automate backups
# Daily cron
   0 2 * * * /opt/backup.sh

DON'T ❌

  • Don't expose services publicly
# ❌ Direct internet exposure
   ports:
     - "5432:5432"  # PostgreSQL exposed!

# ✅ Use reverse proxy
# Access via Nginx only
  • Don't skip SSL/TLS
# ❌ HTTP only
   listen 80;

# ✅ HTTPS with redirect
listen 443 ssl http2;
  • Don't use default passwords
# ❌ Weak password
   POSTGRES_PASSWORD=password

# ✅ Strong random password
POSTGRES_PASSWORD=$(openssl rand -base64 32)

Tools & Resources

Infrastructure as Code

Terraform (provision cloud resources):

# main.tf
resource "aws_instance" "ollama_server" {
  ami           = "ami-0c55b159cbfafe1f0"
  instance_type = "g4dn.xlarge"  # GPU instance

  tags = {
    Name = "ollama-production"
  }
}

Ansible (configure servers):

# playbook.yml
- hosts: ollama_servers
  tasks:
    - name: Install Docker
      apt:
        name: docker.io
        state: present
    - name: Deploy stack
      community.docker.docker_compose:
        project_src: /opt/claude-stack
        state: present

Monitoring Tools

  • Prometheus: Metrics collection
  • Grafana: Dashboards
  • AlertManager: Alert routing
  • Loki: Log aggregation
  • Jaeger: Distributed tracing

Summary

Key Takeaways:

  • Docker Compose for dev/small deployments - Simple, fast setup
  • Kubernetes for production/scale - High availability, auto-scaling
  • Monitor everything - Prometheus + Grafana provide visibility
  • Automate backups - Daily PostgreSQL, weekly models
  • Harden security - Firewalls, SSL, strong passwords
  • Scale horizontally - Multiple Ollama instances with load balancing
  • Test disaster recovery - Practice restores monthly

Self-Hosted Stack Checklist:

  • [ ] Deploy Ollama with GPU support
  • [ ] Set up PostgreSQL with backups
  • [ ] Configure Redis for caching
  • [ ] Deploy Analytics Daemon
  • [ ] Install Prometheus + Grafana
  • [ ] Configure Nginx reverse proxy
  • [ ] Enable SSL/TLS with Let's Encrypt
  • [ ] Set up automated backups (daily)
  • [ ] Configure firewall rules
  • [ ] Test disaster recovery
  • [ ] Document runbooks

Last Updated: 2025-12-24

Author: Jeremy Longshore

Related Playbooks: Ollama Migration Guide, MCP Server Reliability