langchain-deploy-integration

Deploy LangChain integrations to production environments. Use when deploying to cloud platforms, configuring containers, or setting up production infrastructure for LangChain apps. Trigger with phrases like "deploy langchain", "langchain production deploy", "langchain cloud run", "langchain docker", "langchain kubernetes".

allowed-tools: Read, Write, Edit, Bash(docker:*), Bash(gcloud:*)
version: 1.0.0
license: MIT
author: Jeremy Longshore <jeremy@intentsolutions.io>

Allowed Tools

Read, Write, Edit, Bash(docker:*), Bash(gcloud:*)

Provided by Plugin

langchain-pack

Claude Code skill pack for LangChain (24 skills)

saas-packs v1.0.0

Installation

This skill is included in the langchain-pack plugin:

/plugin install langchain-pack@claude-code-plugins-plus


Instructions

# LangChain Deploy Integration

## Overview

Deploy LangChain applications to production using containers and cloud platforms, with best practices for scaling and reliability.

## Prerequisites

- LangChain application ready for production
- Docker installed
- Cloud provider account (GCP, AWS, or Azure)
- API keys stored in a secrets manager

## Instructions

### Step 1: Create Dockerfile

```dockerfile
# Dockerfile
FROM python:3.11-slim AS builder
WORKDIR /app

# Install build dependencies
RUN apt-get update && apt-get install -y --no-install-recommends \
    build-essential \
    && rm -rf /var/lib/apt/lists/*

# Install Python dependencies into the user site-packages
COPY requirements.txt .
RUN pip install --no-cache-dir --user -r requirements.txt

# Production stage
FROM python:3.11-slim
WORKDIR /app

# Create the non-root user first so copied files can be owned by it
RUN useradd --create-home appuser

# Copy installed packages into the non-root user's home;
# /root/.local would be unreadable after dropping privileges
COPY --from=builder --chown=appuser:appuser /root/.local /home/appuser/.local
ENV PATH=/home/appuser/.local/bin:$PATH

# Copy application code
COPY --chown=appuser:appuser src/ ./src/
COPY --chown=appuser:appuser main.py .

USER appuser

# Health check (requires `requests` in requirements.txt);
# raise_for_status() makes non-2xx responses fail the check
HEALTHCHECK --interval=30s --timeout=10s --start-period=5s --retries=3 \
    CMD python -c "import requests; requests.get('http://localhost:8080/health', timeout=5).raise_for_status()"

EXPOSE 8080
CMD ["uvicorn", "main:app", "--host", "0.0.0.0", "--port", "8080"]
```

### Step 2: Create FastAPI Application

```python
# main.py
import os
from contextlib import asynccontextmanager

from fastapi import FastAPI, HTTPException
from pydantic import BaseModel

from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate
from langchain_openai import ChatOpenAI

# Initialized once at startup via the lifespan handler
llm = None
chain = None


@asynccontextmanager
async def lifespan(app: FastAPI):
    global llm, chain
    # Startup: build the chain once so requests don't pay initialization cost
    llm = ChatOpenAI(
        model=os.environ.get("MODEL_NAME", "gpt-4o-mini"),
        max_retries=3,
    )
    prompt = ChatPromptTemplate.from_template("{input}")
    chain = prompt | llm | StrOutputParser()
    yield
    # Shutdown: nothing to clean up


app = FastAPI(lifespan=lifespan)


class ChatRequest(BaseModel):
    input: str
    max_tokens: int = 1000  # accepted from clients; not applied by this minimal chain


class ChatResponse(BaseModel):
    response: str


@app.get("/health")
async def health():
    return {"status": "healthy", "model": os.environ.get("MODEL_NAME", "gpt-4o-mini")}


@app.post("/chat", response_model=ChatResponse)
async def chat(request: ChatRequest):
    try:
        response = await chain.ainvoke({"input": request.input})
        return ChatResponse(response=response)
    except Exception as e:
        raise HTTPException(status_code=500, detail=str(e))
```

### Step 3: Deploy to Google Cloud Run

The `--set-secrets` flag assumes an `openai-api-key` secret already exists in Secret Manager; a sketch for creating it follows this step.

```bash
# Build and push the container image
gcloud builds submit --tag gcr.io/PROJECT_ID/langchain-api

# Deploy to Cloud Run
gcloud run deploy langchain-api \
  --image gcr.io/PROJECT_ID/langchain-api \
  --platform managed \
  --region us-central1 \
  --allow-unauthenticated \
  --set-secrets=OPENAI_API_KEY=openai-api-key:latest \
  --memory 1Gi \
  --cpu 2 \
  --min-instances 1 \
  --max-instances 10 \
  --concurrency 80
```
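A minimal sketch for creating that secret, assuming `OPENAI_API_KEY` is exported in your shell and Cloud Run runs as the default compute service account (`PROJECT_NUMBER` is a placeholder; substitute your service's actual identity):

```bash
# Create the secret and store the key as its first version
gcloud secrets create openai-api-key --replication-policy=automatic
echo -n "$OPENAI_API_KEY" | \
  gcloud secrets versions add openai-api-key --data-file=-

# Grant the Cloud Run runtime identity read access
# (replace the member with your service's actual service account)
gcloud secrets add-iam-policy-binding openai-api-key \
  --member="serviceAccount:PROJECT_NUMBER-compute@developer.gserviceaccount.com" \
  --role="roles/secretmanager.secretAccessor"
```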
### Step 4: Kubernetes Deployment

The manifest reads the API key from a `langchain-secrets` Secret, which must exist before the pods can start (a sketch for creating it appears at the end of this skill).

```yaml
# k8s/deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: langchain-api
spec:
  replicas: 3
  selector:
    matchLabels:
      app: langchain-api
  template:
    metadata:
      labels:
        app: langchain-api
    spec:
      containers:
        - name: langchain-api
          image: gcr.io/PROJECT_ID/langchain-api:latest
          ports:
            - containerPort: 8080
          env:
            - name: OPENAI_API_KEY
              valueFrom:
                secretKeyRef:
                  name: langchain-secrets
                  key: openai-api-key
            - name: MODEL_NAME
              value: "gpt-4o-mini"
          resources:
            requests:
              memory: "512Mi"
              cpu: "500m"
            limits:
              memory: "1Gi"
              cpu: "1000m"
          readinessProbe:
            httpGet:
              path: /health
              port: 8080
            initialDelaySeconds: 5
            periodSeconds: 10
          livenessProbe:
            httpGet:
              path: /health
              port: 8080
            initialDelaySeconds: 15
            periodSeconds: 20
---
apiVersion: v1
kind: Service
metadata:
  name: langchain-api
spec:
  selector:
    app: langchain-api
  ports:
    - port: 80
      targetPort: 8080
  type: LoadBalancer
```

### Step 5: Configure Autoscaling

```yaml
# k8s/hpa.yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: langchain-api-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: langchain-api
  minReplicas: 2
  maxReplicas: 20
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
    - type: Resource
      resource:
        name: memory
        target:
          type: Utilization
          averageUtilization: 80
```

## Output

- Production-ready Dockerfile with multi-stage build
- FastAPI application with health checks
- Cloud Run deployment configuration
- Kubernetes manifests with autoscaling

## Examples

### Local Testing

```bash
# Build locally
docker build -t langchain-api .

# Run with env file
docker run -p 8080:8080 --env-file .env langchain-api

# Test endpoint
curl -X POST http://localhost:8080/chat \
  -H "Content-Type: application/json" \
  -d '{"input": "Hello!"}'
```

### AWS Deployment (ECS)

```bash
# Create ECR repository
aws ecr create-repository --repository-name langchain-api

# Authenticate Docker to ECR before pushing
aws ecr get-login-password --region REGION | \
  docker login --username AWS --password-stdin ACCOUNT.dkr.ecr.REGION.amazonaws.com

# Push image
docker tag langchain-api:latest ACCOUNT.dkr.ecr.REGION.amazonaws.com/langchain-api:latest
docker push ACCOUNT.dkr.ecr.REGION.amazonaws.com/langchain-api:latest

# Deploy with Copilot (assumes `copilot init` has already set up the app and service)
copilot deploy
```

## Error Handling

| Error | Cause | Solution |
|-------|-------|----------|
| Container crash | Missing env vars | Check secrets injection |
| Cold start timeout | Slow LLM initialization | Use `--min-instances` > 0 |
| Memory OOM | Large context | Increase memory limits |
| Connection refused | Port mismatch | Verify `EXPOSE` and `--port` match |

## Resources

- [Cloud Run Documentation](https://cloud.google.com/run/docs)
- [Kubernetes Best Practices](https://kubernetes.io/docs/concepts/configuration/overview/)
- [Docker Multi-stage Builds](https://docs.docker.com/build/building/multi-stage/)

## Next Steps

Configure `langchain-observability` for production monitoring.
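For the Kubernetes path, a minimal sketch for creating the `langchain-secrets` Secret referenced in Step 4 and rolling out the manifests, assuming the default namespace and `OPENAI_API_KEY` exported in your shell:

```bash
# Create the Secret read by the Deployment's secretKeyRef
kubectl create secret generic langchain-secrets \
  --from-literal=openai-api-key="$OPENAI_API_KEY"

# Apply the manifests and wait for the rollout to complete
kubectl apply -f k8s/deployment.yaml
kubectl apply -f k8s/hpa.yaml
kubectl rollout status deployment/langchain-api
```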

Skill file: plugins/saas-packs/langchain-pack/skills/langchain-deploy-integration/SKILL.md