speak-observability
Set up comprehensive observability for Speak integrations with metrics, traces, and alerts. Use when implementing monitoring for Speak operations, setting up dashboards, or configuring alerting for language learning feature health. Trigger with phrases like "speak monitoring", "speak metrics", "speak observability", "monitor speak", "speak alerts", "speak tracing". allowed-tools: Read, Write, Edit version: 1.0.0 license: MIT author: Jeremy Longshore <jeremy@intentsolutions.io>
Allowed Tools
No tools specified
Provided by Plugin
speak-pack
Claude Code skill pack for Speak AI Language Learning Platform (24 skills)
Installation
This skill is included in the speak-pack plugin:
/plugin install speak-pack@claude-code-plugins-plus
Click to copy
Instructions
# Speak Observability
## Overview
Set up comprehensive observability for Speak language learning integrations.
## Prerequisites
- Prometheus or compatible metrics backend
- OpenTelemetry SDK installed
- Grafana or similar dashboarding tool
- AlertManager configured
## Key Metrics for Language Learning
### Business Metrics
| Metric | Type | Description |
|--------|------|-------------|
| `speak_lessons_started_total` | Counter | Lessons initiated |
| `speak_lessons_completed_total` | Counter | Lessons completed |
| `speak_lessons_abandoned_total` | Counter | Lessons abandoned mid-session |
| `speak_pronunciation_score` | Histogram | Pronunciation scores distribution |
| `speak_active_sessions` | Gauge | Currently active lesson sessions |
| `speak_daily_active_learners` | Gauge | Unique learners today |
### Technical Metrics
| Metric | Type | Description |
|--------|------|-------------|
| `speak_api_requests_total` | Counter | Total API requests |
| `speak_api_duration_seconds` | Histogram | Request latency |
| `speak_api_errors_total` | Counter | Error count by type |
| `speak_speech_recognition_duration_seconds` | Histogram | Audio processing time |
| `speak_rate_limit_remaining` | Gauge | Rate limit headroom |
## Prometheus Metrics Implementation
```typescript
import { Registry, Counter, Histogram, Gauge } from 'prom-client';
const registry = new Registry();
// Business metrics
const lessonsStarted = new Counter({
name: 'speak_lessons_started_total',
help: 'Total lessons started',
labelNames: ['language', 'topic', 'difficulty'],
registers: [registry],
});
const lessonsCompleted = new Counter({
name: 'speak_lessons_completed_total',
help: 'Total lessons completed',
labelNames: ['language', 'topic', 'difficulty'],
registers: [registry],
});
const pronunciationScore = new Histogram({
name: 'speak_pronunciation_score',
help: 'Pronunciation scores distribution',
labelNames: ['language'],
buckets: [40, 50, 60, 70, 80, 85, 90, 95, 100],
registers: [registry],
});
const activeSessions = new Gauge({
name: 'speak_active_sessions',
help: 'Currently active lesson sessions',
labelNames: ['language'],
registers: [registry],
});
// Technical metrics
const apiRequests = new Counter({
name: 'speak_api_requests_total',
help: 'Total Speak API requests',
labelNames: ['method', 'endpoint', 'status'],
registers: [registry],
});
const apiDuration = new Histogram({
name: 'speak_api_duration_seconds',
help: 'Speak API request duration',
labelNames: ['method', 'endpoint'],
buckets: [0.05, 0.1, 0.25, 0.5, 1, 2.5, 5, 10],
registers: [registry],
});
const speechRecognitionDuration = new Histogram({
name: 'speak_speech_recognition_duration_seconds',
help: 'Speech recognition processing time',
labelNames: ['language'],
buckets: [0.5, 1, 2, 3, 5, 10],
registers: [registry],
});
```
## Instrumented Speak Client
```typescript
async function instrumentedSpeakRequest(
method: string,
endpoint: string,
operation: () => Promise
): Promise {
const timer = apiDuration.startTimer({ method, endpoint });
try {
const result = await operation();
apiRequests.inc({ method, endpoint, status: 'success' });
return result;
} catch (error: any) {
apiRequests.inc({ method, endpoint, status: 'error' });
throw error;
} finally {
timer();
}
}
// Lesson tracking
async function trackLessonStart(session: LessonSession): Promise {
lessonsStarted.inc({
language: session.language,
topic: session.topic,
difficulty: session.difficulty,
});
activeSessions.inc({ language: session.language });
}
async function trackLessonEnd(
session: LessonSession,
summary: SessionSummary
): Promise {
const labels = {
language: session.language,
topic: session.topic,
difficulty: session.difficulty,
};
if (summary.completed) {
lessonsCompleted.inc(labels);
} else {
lessonsAbandoned.inc({
...labels,
abandonReason: summary.abandonReason || 'unknown',
});
}
activeSessions.dec({ language: session.language });
// Record pronunciation score
if (summary.averagePronunciationScore) {
pronunciationScore.observe(
{ language: session.language },
summary.averagePronunciationScore
);
}
}
```
## Distributed Tracing
### OpenTelemetry Setup
```typescript
import { trace, SpanStatusCode, context, propagation } from '@opentelemetry/api';
const tracer = trace.getTracer('speak-service');
async function tracedSpeakCall(
operationName: string,
attributes: Record,
operation: () => Promise
): Promise {
return tracer.startActiveSpan(`speak.${operationName}`, async (span) => {
span.setAttributes(attributes);
try {
const result = await operation();
span.setStatus({ code: SpanStatusCode.OK });
return result;
} catch (error: any) {
span.setStatus({ code: SpanStatusCode.ERROR, message: error.message });
span.recordException(error);
throw error;
} finally {
span.end();
}
});
}
// Trace a complete lesson session
async function tracedLessonSession(
userId: string,
config: LessonConfig
): Promise {
return tracedSpeakCall(
'lesson.session',
{
'user.id': userId,
'lesson.language': config.language,
'lesson.topic': config.topic,
},
async () => {
const session = await speakService.startSession(config);
// Create child spans for each exchange
while (!session.isComplete) {
await tracedSpeakCall(
'lesson.exchange',
{ 'session.id': session.id },
async () => {
const prompt = await session.getPrompt();
const response = await getUserResponse();
return session.submitResponse(response);
}
);
}
return session.getSummary();
}
);
}
```
## Structured Logging
```typescript
import pino from 'pino';
const logger = pino({
name: 'speak-service',
level: process.env.LOG_LEVEL || 'info',
formatters: {
level: (label) => ({ level: label }),
},
});
interface SpeakLogContext {
service: 'speak';
operation: string;
sessionId?: string;
userId?: string;
language?: string;
duration_ms?: number;
pronunciationScore?: number;
error?: any;
}
function logSpeakOperation(
level: 'info' | 'warn' | 'error',
operation: string,
context: Partial
): void {
const logContext: SpeakLogContext = {
service: 'speak',
operation,
...context,
};
logger[level](logContext, `Speak ${operation}`);
}
// Usage
logSpeakOperation('info', 'lesson.completed', {
sessionId: session.id,
userId: session.userId,
language: session.language,
duration_ms: summary.duration,
pronunciationScore: summary.averagePronunciationScore,
});
```
## Alert Configuration
### Prometheus AlertManager Rules
```yaml
# speak_alerts.yaml
groups:
- name: speak_alerts
rules:
# High error rate
- alert: SpeakHighErrorRate
expr: |
rate(speak_api_errors_total[5m]) /
rate(speak_api_requests_total[5m]) > 0.05
for: 5m
labels:
severity: warning
service: speak
annotations:
summary: "Speak API error rate > 5%"
description: "Error rate is {{ $value | humanizePercentage }}"
# Speech recognition slow
- alert: SpeakSpeechRecognitionSlow
expr: |
histogram_quantile(0.95,
rate(speak_speech_recognition_duration_seconds_bucket[5m])
) > 5
for: 5m
labels:
severity: warning
service: speak
annotations:
summary: "Speech recognition P95 latency > 5s"
# Low lesson completion rate
- alert: SpeakLowCompletionRate
expr: |
rate(speak_lessons_completed_total[1h]) /
rate(speak_lessons_started_total[1h]) < 0.5
for: 30m
labels:
severity: warning
service: speak
annotations:
summary: "Lesson completion rate < 50%"
# Speak API down
- alert: SpeakAPIDown
expr: up{job="speak"} == 0
for: 1m
labels:
severity: critical
service: speak
annotations:
summary: "Speak API is unreachable"
# Pronunciation scores dropping
- alert: SpeakPronunciationScoresDrop
expr: |
avg(speak_pronunciation_score) < 60
for: 1h
labels:
severity: info
service: speak
annotations:
summary: "Average pronunciation scores below 60%"
```
## Grafana Dashboard
```json
{
"title": "Speak Language Learning",
"panels": [
{
"title": "Active Lesson Sessions",
"type": "stat",
"targets": [{
"expr": "sum(speak_active_sessions)"
}]
},
{
"title": "Lessons Started vs Completed",
"type": "graph",
"targets": [
{ "expr": "rate(speak_lessons_started_total[5m])", "legendFormat": "Started" },
{ "expr": "rate(speak_lessons_completed_total[5m])", "legendFormat": "Completed" }
]
},
{
"title": "Pronunciation Score Distribution",
"type": "heatmap",
"targets": [{
"expr": "sum by (le) (rate(speak_pronunciation_score_bucket[5m]))"
}]
},
{
"title": "API Latency P50/P95/P99",
"type": "graph",
"targets": [
{ "expr": "histogram_quantile(0.5, rate(speak_api_duration_seconds_bucket[5m]))", "legendFormat": "P50" },
{ "expr": "histogram_quantile(0.95, rate(speak_api_duration_seconds_bucket[5m]))", "legendFormat": "P95" },
{ "expr": "histogram_quantile(0.99, rate(speak_api_duration_seconds_bucket[5m]))", "legendFormat": "P99" }
]
},
{
"title": "Speech Recognition Duration",
"type": "graph",
"targets": [{
"expr": "histogram_quantile(0.95, rate(speak_speech_recognition_duration_seconds_bucket[5m]))"
}]
},
{
"title": "Error Rate by Type",
"type": "graph",
"targets": [{
"expr": "sum by (error_type) (rate(speak_api_errors_total[5m]))"
}]
}
]
}
```
## Output
- Business and technical metrics
- Distributed tracing configured
- Structured logging implemented
- Alert rules deployed
- Grafana dashboard ready
## Error Handling
| Issue | Cause | Solution |
|-------|-------|----------|
| Missing metrics | No instrumentation | Wrap client calls |
| Trace gaps | Missing propagation | Check context headers |
| Alert storms | Wrong thresholds | Tune alert rules |
| High cardinality | Too many labels | Reduce label values |
## Examples
### Quick Metrics Endpoint
```typescript
app.get('/metrics', async (req, res) => {
res.set('Content-Type', registry.contentType);
res.send(await registry.metrics());
});
```
## Resources
- [Prometheus Best Practices](https://prometheus.io/docs/practices/naming/)
- [OpenTelemetry Documentation](https://opentelemetry.io/docs/)
- [Speak Observability Guide](https://developer.speak.com/docs/observability)
## Next Steps
For incident response, see `speak-incident-runbook`.