mistral-cost-tuning

Optimize Mistral AI costs through model selection, token management, and usage monitoring. Use when analyzing Mistral billing, reducing API costs, or implementing usage monitoring and budget alerts. Trigger with phrases like "mistral cost", "mistral billing", "reduce mistral costs", "mistral pricing", "mistral expensive", "mistral budget".

allowed-tools: Read, Grep
version: 1.0.0
license: MIT
author: Jeremy Longshore <jeremy@intentsolutions.io>

Allowed Tools

No tools specified

Provided by Plugin

mistral-pack

Claude Code skill pack for Mistral AI (24 skills)

saas packs v1.0.0

Installation

This skill is included in the mistral-pack plugin:

/plugin install mistral-pack@claude-code-plugins-plus

Instructions

# Mistral AI Cost Tuning

## Overview

Optimize Mistral AI costs through model selection, token management, and usage monitoring.

## Prerequisites

- Access to the Mistral AI console
- Understanding of current usage patterns
- Database for usage tracking (optional)
- Alerting system configured (optional)

## Pricing Overview (as of 2024)

| Model | Input (USD per 1M tokens) | Output (USD per 1M tokens) | Best For |
|-------|---------------------------|----------------------------|----------|
| mistral-small-latest | $0.20 | $0.60 | Fast, simple tasks |
| mistral-large-latest | $2.00 | $6.00 | Complex reasoning |
| mistral-embed | $0.10 | - | Embeddings |

**Note:** Prices are subject to change. Check console.mistral.ai for current pricing.

## Instructions

### Step 1: Cost Estimation Calculator

```typescript
interface TokenUsage {
  inputTokens: number;
  outputTokens: number;
}

interface CostEstimate {
  model: string;
  inputCost: number;
  outputCost: number;
  totalCost: number;
  currency: string;
}

// USD per 1M tokens, taken from the pricing table above
const PRICING = {
  'mistral-small-latest': { input: 0.20, output: 0.60 },
  'mistral-large-latest': { input: 2.00, output: 6.00 },
  'mistral-embed': { input: 0.10, output: 0 },
} as const;

function estimateCost(model: keyof typeof PRICING, usage: TokenUsage): CostEstimate {
  const prices = PRICING[model];
  const inputCost = (usage.inputTokens / 1_000_000) * prices.input;
  const outputCost = (usage.outputTokens / 1_000_000) * prices.output;
  return {
    model,
    inputCost,
    outputCost,
    totalCost: inputCost + outputCost,
    currency: 'USD',
  };
}

// Usage
const cost = estimateCost('mistral-small-latest', {
  inputTokens: 500_000,
  outputTokens: 200_000,
});
console.log(`Estimated cost: $${cost.totalCost.toFixed(4)}`);
// Estimated cost: $0.2200
```

### Step 2: Model Selection by Task

```typescript
type TaskType = 'simple' | 'moderate' | 'complex' | 'embedding';

interface ModelRecommendation {
  model: string;
  reason: string;
  estimatedCostPer1000Requests: number;
}

// Prices are USD per 1M tokens, so the cost of 1,000 requests is
// (tokens per request * price) * 1000 / 1_000_000 = (tokens * price) / 1000.
function recommendModel(
  taskType: TaskType,
  avgInputTokens: number,
  avgOutputTokens: number
): ModelRecommendation {
  switch (taskType) {
    case 'simple': // Classification, extraction, simple Q&A
      return {
        model: 'mistral-small-latest',
        reason: 'Fast and cost-effective for simple tasks',
        estimatedCostPer1000Requests: (avgInputTokens * 0.20 + avgOutputTokens * 0.60) / 1000,
      };
    case 'moderate': // Summarization, translation, basic coding
      return {
        model: 'mistral-small-latest',
        reason: 'Good balance of capability and cost',
        estimatedCostPer1000Requests: (avgInputTokens * 0.20 + avgOutputTokens * 0.60) / 1000,
      };
    case 'complex': // Complex reasoning, code generation, analysis
      return {
        model: 'mistral-large-latest',
        reason: 'Required for complex tasks',
        estimatedCostPer1000Requests: (avgInputTokens * 2.00 + avgOutputTokens * 6.00) / 1000,
      };
    case 'embedding':
      return {
        model: 'mistral-embed',
        reason: 'Specialized for embeddings',
        estimatedCostPer1000Requests: (avgInputTokens * 0.10) / 1000,
      };
  }
}

// Usage
const rec = recommendModel('simple', 500, 200);
console.log(`Recommended: ${rec.model} - $${rec.estimatedCostPer1000Requests.toFixed(4)}/1000 req`);
```

### Step 3: Token Budget Management

```typescript
// Budgets and usage are measured in tokens, not dollars.
class TokenBudgetManager {
  private dailyBudget: number;
  private monthlyBudget: number;
  // Keyed by "YYYY-MM-DD:model", so the daily budget applies per model
  private dailyUsage: Map<string, number> = new Map();
  private monthlyUsage = 0;

  constructor(dailyBudget: number, monthlyBudget: number) {
    this.dailyBudget = dailyBudget;
    this.monthlyBudget = monthlyBudget;
  }

  recordUsage(model: string, tokens: number): void {
    const today = new Date().toISOString().split('T')[0];
    const key = `${today}:${model}`;
    const current = this.dailyUsage.get(key) || 0;
    this.dailyUsage.set(key, current + tokens);
    this.monthlyUsage += tokens;
    this.checkBudgetAlerts();
  }

  canMakeRequest(model: string, estimatedTokens: number): boolean {
    const today = new Date().toISOString().split('T')[0];
    const key = `${today}:${model}`;
    const todayUsage = this.dailyUsage.get(key) || 0;
    return (
      todayUsage + estimatedTokens <= this.dailyBudget &&
      this.monthlyUsage + estimatedTokens <= this.monthlyBudget
    );
  }

  // Warn once usage passes 80% of the monthly budget
  private checkBudgetAlerts(): void {
    if (this.monthlyUsage > this.monthlyBudget * 0.8) {
      console.warn(`Budget alert: ${((this.monthlyUsage / this.monthlyBudget) * 100).toFixed(1)}% of monthly budget used`);
    }
  }

  getUsageReport(): { daily: Record<string, number>; monthly: number } {
    return {
      daily: Object.fromEntries(this.dailyUsage),
      monthly: this.monthlyUsage,
    };
  }
}
```

Note that `monthlyUsage` is never reset in this class; the caller is responsible for creating a new manager (or adding a reset) at the start of each month.

### Step 4: Prompt Optimization

```typescript
// Optimize prompts to reduce token usage
function optimizePrompt(prompt: string): string {
  return prompt
    .replace(/\n\s*\n/g, '\n') // Remove blank lines first
    .replace(/[ \t]+/g, ' ')   // Then collapse runs of spaces and tabs
    .trim();
}

// Use system prompts efficiently
const EFFICIENT_SYSTEM_PROMPT = `
You are a helpful assistant. Be concise.
Answer in 1-2 sentences when possible.
`.trim();

// Compare token counts
function countTokensEstimate(text: string): number {
  // Rough estimate: 1 token ≈ 4 characters
  return Math.ceil(text.length / 4);
}

// Example: Reduce prompt size
const originalPrompt = `
I would like you to help me with the following task.
Please provide a comprehensive and detailed explanation
of how to implement a REST API in Node.js.
`;

const optimizedPrompt = `Explain implementing a REST API in Node.js. Be concise.`;

console.log(`Original: ~${countTokensEstimate(originalPrompt)} tokens`);
console.log(`Optimized: ~${countTokensEstimate(optimizedPrompt)} tokens`);
// Exact counts depend on whitespace; the optimized prompt comes out
// at well under half the estimated size of the original.
```

### Step 5: Caching for Cost Reduction

```typescript
import crypto from 'crypto';
import { LRUCache } from 'lru-cache';
import { Mistral } from '@mistralai/mistralai';

// Reuses estimateCost and PRICING from Step 1

interface CacheEntry {
  response: string;
  cost: number;
}

const responseCache = new LRUCache<string, CacheEntry>({
  max: 10000,
  ttl: 24 * 60 * 60 * 1000, // 24 hours
});

interface CachedResult {
  response: string;
  cached: boolean;
  cost: number;
  savedCost: number;
}

async function costAwareChat(
  client: Mistral,
  messages: any[],
  model: string
): Promise<CachedResult> {
  // Identical (messages, model) pairs hash to the same key
  const cacheKey = crypto
    .createHash('sha256')
    .update(JSON.stringify({ messages, model }))
    .digest('hex');

  const cached = responseCache.get(cacheKey);
  if (cached) {
    return {
      response: cached.response,
      cached: true,
      cost: 0,
      savedCost: cached.cost,
    };
  }

  const response = await client.chat.complete({ model, messages });
  const raw = response.choices?.[0]?.message?.content;
  // Only plain-string content is cached; anything else becomes ''
  const content = typeof raw === 'string' ? raw : '';

  const cost = estimateCost(model as keyof typeof PRICING, {
    inputTokens: response.usage?.promptTokens || 0,
    outputTokens: response.usage?.completionTokens || 0,
  }).totalCost;

  responseCache.set(cacheKey, { response: content, cost });

  return {
    response: content,
    cached: false,
    cost,
    savedCost: 0,
  };
}
```

### Step 6: Usage Dashboard Query

```sql
-- Track usage in your database (PostgreSQL syntax)
CREATE TABLE mistral_usage (
  id SERIAL PRIMARY KEY,
  model VARCHAR(50),
  input_tokens INTEGER,
  output_tokens INTEGER,
  cost_usd DECIMAL(10, 6),
  user_id VARCHAR(50),
  created_at TIMESTAMP DEFAULT NOW()
);

-- Daily cost report
SELECT
  DATE(created_at) as date,
  model,
  SUM(input_tokens) as total_input,
  SUM(output_tokens) as total_output,
  SUM(cost_usd) as total_cost
FROM mistral_usage
WHERE created_at >= NOW() - INTERVAL '30 days'
GROUP BY 1, 2
ORDER BY 1 DESC, 3 DESC;

-- User cost breakdown
SELECT
  user_id,
  SUM(cost_usd) as total_cost,
  COUNT(*) as request_count
FROM mistral_usage
WHERE created_at >= DATE_TRUNC('month', NOW())
GROUP BY 1
ORDER BY 2 DESC
LIMIT 10;
```

## Output

- Optimized model selection
- Token budget management
- Usage monitoring implemented
- Cost reduction strategies applied

## Cost Reduction Strategies

| Strategy | Savings | Effort |
|----------|---------|--------|
| Model selection | 50-90% | Low |
| Prompt optimization | 20-50% | Low |
| Response caching | 30-80% | Medium |
| Batch processing | 10-30% | Medium |
| Max tokens limit | 10-40% | Low |

## Error Handling

| Issue | Cause | Solution |
|-------|-------|----------|
| Unexpected costs | Untracked usage | Implement monitoring |
| Budget exceeded | No alerts | Set up budget alerts |
| Inefficient model | Wrong selection | Use task-based selection |
| Long responses | No limit | Set maxTokens |

## Examples

### Quick Cost Check

```typescript
// Estimate monthly cost (uses estimateCost from Step 1)
const monthlyRequests = 100_000;
const avgInputTokens = 500;
const avgOutputTokens = 200;

const smallCost = estimateCost('mistral-small-latest', {
  inputTokens: avgInputTokens * monthlyRequests,
  outputTokens: avgOutputTokens * monthlyRequests,
});

const largeCost = estimateCost('mistral-large-latest', {
  inputTokens: avgInputTokens * monthlyRequests,
  outputTokens: avgOutputTokens * monthlyRequests,
});

console.log(`Small model: $${smallCost.totalCost.toFixed(2)}/month`);
console.log(`Large model: $${largeCost.totalCost.toFixed(2)}/month`);
// Small model: $22.00/month
// Large model: $220.00/month
```

## Resources

- [Mistral AI Pricing](https://mistral.ai/pricing/)
- [Mistral AI Console](https://console.mistral.ai/)

## Next Steps

For architecture patterns, see `mistral-reference-architecture`.
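The "Max tokens limit" strategy and the "Set maxTokens" fix are listed above without a worked example. The following is a minimal sketch of why a cap helps: it bounds the worst-case cost of a single request. The helper name `worstCaseCost` and the price constant are hypothetical (they are not part of the Mistral SDK), and the prices are copied from the 2024 pricing table above, so they may be out of date.

```typescript
// Hypothetical helper for illustration; not part of the Mistral SDK.
// Prices are USD per 1M tokens, copied from the pricing table above (may be stale).
const PRICE_PER_MILLION = {
  'mistral-small-latest': { input: 0.20, output: 0.60 },
  'mistral-large-latest': { input: 2.00, output: 6.00 },
} as const;

type ChatModel = keyof typeof PRICE_PER_MILLION;

// Upper bound on the cost of one request when the output is capped at maxTokens.
// The model may stop earlier, so the real cost can only be lower than this.
function worstCaseCost(model: ChatModel, inputTokens: number, maxTokens: number): number {
  const price = PRICE_PER_MILLION[model];
  return (inputTokens * price.input + maxTokens * price.output) / 1_000_000;
}

// Same 500-token prompt, with a tight cap and a loose cap on the output
const capped = worstCaseCost('mistral-large-latest', 500, 200);
const loose = worstCaseCost('mistral-large-latest', 500, 2000);

console.log(capped.toFixed(6), loose.toFixed(6));
// prints "0.002200 0.013000"
```

Because output tokens are priced at three times the input rate in the table above, the cap dominates the worst case for short prompts. The actual saving depends on how often responses would otherwise run past the cap.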