lokalise-incident-runbook
Execute Lokalise incident response procedures with triage, mitigation, and postmortem. Use when responding to Lokalise-related outages, investigating errors, or running post-incident reviews for Lokalise integration failures. Trigger with phrases like "lokalise incident", "lokalise outage", "lokalise down", "lokalise on-call", "lokalise emergency", "translations broken". allowed-tools: Read, Grep, Bash(curl:*), Bash(lokalise2:*) version: 1.0.0 license: MIT author: Jeremy Longshore <jeremy@intentsolutions.io>
Allowed Tools
No tools specified
Provided by Plugin
lokalise-pack
Claude Code skill pack for Lokalise (24 skills)
Installation
This skill is included in the lokalise-pack plugin:
/plugin install lokalise-pack@claude-code-plugins-plus
Click to copy
Instructions
# Lokalise Incident Runbook
## Overview
Rapid incident response procedures for Lokalise-related outages and issues.
## Prerequisites
- Access to Lokalise dashboard and status page
- Application logs and monitoring dashboards
- Communication channels (Slack, PagerDuty)
- Rollback procedures documented
## Severity Levels
| Level | Definition | Response Time | Examples |
|-------|------------|---------------|----------|
| P1 | Complete outage | < 15 min | Lokalise API unreachable, all translations missing |
| P2 | Degraded service | < 1 hour | High latency, partial failures, key languages affected |
| P3 | Minor impact | < 4 hours | Webhook delays, non-critical translations missing |
| P4 | No user impact | Next business day | Monitoring gaps, sync delays |
## Quick Triage Commands
```bash
# 1. Check Lokalise API status
curl -s https://api.lokalise.com/api2/system/health | jq
# 2. Check Lokalise status page
curl -s https://status.lokalise.com/api/v2/status.json | jq '.status.description'
# 3. Test API authentication
curl -s -o /dev/null -w "%{http_code}" \
-H "X-Api-Token: $LOKALISE_API_TOKEN" \
"https://api.lokalise.com/api2/projects?limit=1"
# 4. Check your app's translation health
curl -s https://your-app.com/health/lokalise | jq
# 5. Check recent error logs (last 5 min)
grep -i "lokalise" /var/log/app/*.log | tail -50
```
## Decision Tree
```
Translations missing/broken?
โโ YES: Is https://status.lokalise.com showing incident?
โ โโ YES โ Enable fallback translations. Monitor status page.
โ โโ NO โ Check our integration. Continue triage below.
โโ NO: Is sync failing?
โโ YES โ Check CI/CD pipeline, API token validity
โโ NO โ Likely localized issue. Check specific component.
API returning errors?
โโ 401 โ Token expired/invalid. Rotate token.
โโ 403 โ Permission denied. Check project access.
โโ 404 โ Project/key not found. Verify IDs.
โโ 429 โ Rate limited. Enable backoff/queuing.
โโ 5xx โ Lokalise issue. Enable fallback, monitor status.
```
## Immediate Actions by Error Type
### 401/403 - Authentication/Authorization
```bash
# Verify token is set
echo "Token set: ${LOKALISE_API_TOKEN:+YES}"
# Test token validity
curl -s -H "X-Api-Token: $LOKALISE_API_TOKEN" \
"https://api.lokalise.com/api2/projects?limit=1" | jq '.projects[0].name // .error'
# If token is invalid:
# 1. Generate new token in Lokalise Profile Settings
# 2. Update in secret manager
# 3. Restart affected services
```
### 429 - Rate Limited
```bash
# Check current rate limit status
curl -s -I -H "X-Api-Token: $LOKALISE_API_TOKEN" \
"https://api.lokalise.com/api2/projects" | grep -i "x-ratelimit"
# Immediate mitigation:
# 1. Reduce concurrent requests
# 2. Add delays between API calls
# 3. Enable request queuing
# Application config
export LOKALISE_RATE_LIMIT_MODE=queue
export LOKALISE_REQUEST_DELAY_MS=200
```
### 5xx - Lokalise Server Errors
```bash
# Check Lokalise status
curl -s https://status.lokalise.com/api/v2/status.json | jq
# Enable graceful degradation
export LOKALISE_FALLBACK_ENABLED=true
# Use bundled translations as fallback
# Your app should fall back to static locale files
# Monitor for resolution
watch -n 60 'curl -s https://status.lokalise.com/api/v2/status.json | jq ".status.description"'
```
### Missing Translations
```bash
# Check if translations exist in Lokalise
lokalise2 --token "$LOKALISE_API_TOKEN" \
--project-id "$LOKALISE_PROJECT_ID" \
key list --filter-untranslated-any 1 | head -20
# Check local translation files
ls -la src/locales/
cat src/locales/en.json | jq 'keys | length'
# If files missing, re-download
npm run i18n:pull
```
## Communication Templates
### Internal (Slack)
```
:red_circle: P1 INCIDENT: Lokalise Integration
Status: INVESTIGATING
Impact: [Translations not loading for all users]
Current action: [Checking API status and enabling fallback]
Next update: [Time + 15 min]
Incident commander: @[name]
```
### External (Status Page)
```
Translation Service Issue
We're experiencing issues with our translation service.
Some users may see English text instead of their preferred language.
We're actively investigating and will provide updates.
Last updated: [timestamp]
```
## Post-Incident
### Evidence Collection
```bash
# Generate debug bundle
bash scripts/lokalise-debug-bundle.sh
# Export relevant logs
grep -i "lokalise" /var/log/app/*.log > incident-logs.txt
# Export metrics snapshot
curl "http://prometheus:9090/api/v1/query_range?query=lokalise_errors_total&start=-2h" > metrics.json
# Screenshot dashboards
# Document timeline in incident channel
```
### Postmortem Template
```markdown
## Incident: Lokalise [Error Type]
**Date:** YYYY-MM-DD
**Duration:** X hours Y minutes
**Severity:** P[1-4]
**Incident Commander:** [Name]
### Summary
[1-2 sentence description of what happened]
### Timeline (all times UTC)
- HH:MM - [Alert triggered / Issue reported]
- HH:MM - [Investigation started]
- HH:MM - [Root cause identified]
- HH:MM - [Mitigation applied]
- HH:MM - [Service restored]
### Root Cause
[Technical explanation of what went wrong]
### Impact
- Users affected: [N]
- Duration of user impact: [X minutes]
- Translations affected: [Languages/keys]
### What Went Well
- [Positive aspects of response]
### What Could Be Improved
- [Areas for improvement]
### Action Items
- [ ] [Preventive measure] - Owner - Due date
- [ ] [Documentation update] - Owner - Due date
- [ ] [Monitoring improvement] - Owner - Due date
```
## Rollback Procedures
### Revert to Bundled Translations
```bash
# If translations are corrupted, revert to last known good
git checkout HEAD~1 -- src/locales/
# Rebuild and deploy
npm run build
npm run deploy
# Disable auto-sync temporarily
export LOKALISE_AUTO_SYNC=false
```
### Re-download from Lokalise
```bash
# Download reviewed translations only
lokalise2 --token "$LOKALISE_API_TOKEN" \
--project-id "$LOKALISE_PROJECT_ID" \
file download \
--format json \
--filter-data reviewed \
--unzip-to ./src/locales
```
## Output
- Issue identified and categorized
- Remediation applied
- Stakeholders notified
- Evidence collected for postmortem
## Error Handling
| Issue | Cause | Solution |
|-------|-------|----------|
| Can't reach status page | Network issue | Use mobile or VPN |
| Logs unavailable | Logging down | Check logging service |
| Fallback not working | Not configured | Enable static fallback |
| Token rotation fails | Permission denied | Escalate to admin |
## Examples
### One-Line Health Check
```bash
curl -sf https://your-app.com/health/lokalise | jq '.status' || echo "UNHEALTHY"
```
### Quick Fallback Enable
```typescript
// Emergency fallback - use in incident
process.env.LOKALISE_FALLBACK_ENABLED = "true";
// Your translation loader should check this
const useFallback = process.env.LOKALISE_FALLBACK_ENABLED === "true";
if (useFallback) {
return loadBundledTranslations(locale);
}
```
## Resources
- [Lokalise Status Page](https://status.lokalise.com)
- [Lokalise Support](mailto:support@lokalise.com)
- [Community Forum](https://community.lokalise.com)
## Next Steps
For data handling, see `lokalise-data-handling`.