Monitoring Guide
Learn how to monitor your AI Proxy Worker deployment, track performance, and troubleshoot issues using Cloudflare's built-in monitoring tools.
📊 Cloudflare Dashboard Monitoring
Worker Analytics
Access real-time metrics in your Cloudflare dashboard:
- Navigate to Workers & Pages
- Select your AI Proxy Worker
- View Analytics tab
Key Metrics to Monitor
Request Metrics
- Requests per second - Traffic volume
- Success rate - Percentage of successful requests
- Error rate - Failed requests requiring attention
- Response time - Average latency
Resource Usage
- CPU usage - Worker execution time
- Memory usage - Memory consumption per request
- Duration - Request processing time
Error Analysis
- 4xx errors - Client-side issues (authentication, validation)
- 5xx errors - Server-side problems (upstream API issues)
- Timeout errors - Requests exceeding time limits
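These error classes can be tagged directly in your logs so they are easy to filter. A minimal sketch of one possible status-code mapping (the category names, and treating 408/504 as a separate timeout class, are choices for illustration, not part of the guide):

```javascript
// Map an HTTP status code to one of the error classes above.
// Splitting out 408 (request timeout) and 504 (gateway timeout) as
// "timeout" is an assumption that makes timeout filtering easier.
function classifyStatus(status) {
  if (status === 408 || status === 504) return 'timeout';
  if (status >= 500) return 'server_error';
  if (status >= 400) return 'client_error';
  return 'ok';
}
```

Logging the class alongside the raw status (e.g. `{ status, class: classifyStatus(status) }`) lets you query each category separately in your log tooling.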
🔍 Log Analysis
Accessing Logs
```bash
# View real-time logs
wrangler tail

# Filter for error-level entries
wrangler tail --format json | jq 'select(.level == "error")'

# Save logs to a file
wrangler tail --format json > worker-logs.json
```
Log Levels and Meanings

INFO Level

```javascript
console.log('Request received:', {
  method: request.method,
  url: request.url,
  timestamp: new Date().toISOString()
});
```

WARN Level

```javascript
console.warn('Rate limit approaching:', {
  clientId: 'user123',
  requestCount: 95,
  limit: 100
});
```

ERROR Level

```javascript
console.error('API request failed:', {
  error: error.message,
  statusCode: response.status,
  timestamp: new Date().toISOString()
});
```
📈 Performance Monitoring
Response Time Tracking
Monitor these key performance indicators:
```javascript
// Custom timing logs
const start = Date.now();
const response = await fetch(upstreamAPI);
const duration = Date.now() - start;

console.log('Upstream API timing:', {
  duration,
  endpoint: 'deepseek-api',
  status: response.status
});
```
Recommended Response Time Targets
- Chat requests: < 2 seconds
- Streaming responses: First token < 1 second
- Health checks: < 500ms
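The targets above can be encoded as per-route latency budgets so slow requests are flagged at log time. A small sketch (the route keys and the 2-second fallback budget are assumptions for illustration):

```javascript
// Latency budgets in milliseconds, mirroring the targets above.
const LATENCY_TARGETS_MS = { chat: 2000, firstToken: 1000, health: 500 };

function isWithinTarget(route, durationMs) {
  const budget = LATENCY_TARGETS_MS[route] ?? 2000; // assumed default budget
  return durationMs <= budget;
}
```

Pair this with the timing log above, e.g. `console.warn('Slow request', { route, duration })` whenever `isWithinTarget(route, duration)` is false.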
Performance Optimization Monitoring
Track these metrics to identify optimization opportunities:
- Cold start frequency - Worker initialization time
- Memory usage patterns - Identify memory leaks
- CPU utilization - Optimize heavy computations
- Network latency - Upstream API response times
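Cold-start frequency can be approximated by the fact that a Worker's module scope runs once per isolate, so the first request an isolate serves is (roughly) a cold start. A minimal sketch:

```javascript
// Module scope executes once per isolate, so this flag is true only for
// the first request the isolate handles (an approximation of a cold start).
let isColdStart = true;

function markRequestStart() {
  const wasCold = isColdStart;
  isColdStart = false;
  if (wasCold) console.log('Cold start:', new Date().toISOString());
  return wasCold;
}
```

Counting the resulting "Cold start" log lines over time gives a rough cold-start rate without any external tooling.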
🚨 Alert Configuration
Cloudflare Alerts
Set up alerts for critical issues:
Error Rate Alert

```
Alert Type: Worker Error Rate
Threshold: > 5% error rate
Time Period: 5 minutes
Notification: Email, Webhook
```

Response Time Alert

```
Alert Type: Worker Response Time
Threshold: > 3 seconds average
Time Period: 5 minutes
Notification: Email, Slack
```

Request Volume Alert

```
Alert Type: Request Volume
Threshold: > 1000 requests/minute
Time Period: 1 minute
Notification: Email
```
Custom Alert Implementation
```javascript
// In your worker code
const ALERT_THRESHOLDS = {
  ERROR_RATE: 0.05,    // 5%
  RESPONSE_TIME: 3000, // 3 seconds
  REQUEST_RATE: 1000   // requests per minute
};

async function checkAlerts(metrics) {
  if (metrics.errorRate > ALERT_THRESHOLDS.ERROR_RATE) {
    // sendAlert is your notification hook (webhook, email, etc.)
    await sendAlert('High error rate detected', metrics);
  }
}
```
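The `sendAlert` helper is left to you. One possible sketch posts a JSON payload to a webhook; here the webhook URL is passed in explicitly (in practice you might capture it from `env`, e.g. a hypothetical `ALERT_WEBHOOK_URL` binding, which is an assumption, not part of the guide):

```javascript
// Build the alert body separately so it is easy to test and extend.
function buildAlertPayload(message, metrics) {
  return {
    service: 'ai-proxy-worker',
    message,
    metrics,
    timestamp: new Date().toISOString()
  };
}

// Hypothetical sendAlert: POST the payload to a configured webhook.
async function sendAlert(message, metrics, webhookUrl) {
  if (!webhookUrl) return; // alerting not configured
  await fetch(webhookUrl, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify(buildAlertPayload(message, metrics))
  });
}
```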
📋 Health Checks
Endpoint Monitoring
Create a health check endpoint:
```javascript
// Add to your worker's fetch handler
// (assumes: const url = new URL(request.url))
if (url.pathname === '/health') {
  const healthStatus = await checkSystemHealth();
  return new Response(JSON.stringify(healthStatus), {
    headers: { 'Content-Type': 'application/json' }
  });
}

async function checkSystemHealth() {
  return {
    status: 'healthy',
    timestamp: new Date().toISOString(),
    version: '1.0.0',
    upstreamAPIs: {
      deepseek: await checkDeepSeekAPI() // lightweight probe of the upstream API
    }
  };
}
```
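`checkDeepSeekAPI` is not defined above. A minimal sketch, assuming a probe URL and a 5-second timeout budget (both assumptions), plus a small helper that rolls individual upstream checks into a single overall status:

```javascript
// Hypothetical upstream probe; the endpoint URL and timeout are assumptions.
async function checkDeepSeekAPI() {
  const start = Date.now();
  try {
    const res = await fetch('https://api.deepseek.com/models', {
      signal: AbortSignal.timeout(5000) // fail fast if the upstream hangs
    });
    return { reachable: res.ok, status: res.status, latencyMs: Date.now() - start };
  } catch (err) {
    return { reachable: false, error: err.message, latencyMs: Date.now() - start };
  }
}

// Derive an overall status so /health can report "degraded" rather than
// "healthy" when some (but not all) dependencies are down.
function overallStatus(upstreamAPIs) {
  const states = Object.values(upstreamAPIs);
  if (states.length === 0 || states.every(s => s.reachable)) return 'healthy';
  return states.some(s => s.reachable) ? 'degraded' : 'unhealthy';
}
```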
External Monitoring Services
Integrate with external monitoring tools:
Uptime Robot

```
# Monitor endpoint
https://your-worker.workers.dev/health

# Check every 5 minutes
# Alert on 3 consecutive failures
```

Pingdom

```
# HTTP check configuration
URL: https://your-worker.workers.dev/health
Interval: 1 minute
Timeout: 30 seconds
```
🔧 Debugging Tools
Debug Mode
Enable detailed logging for troubleshooting:
```javascript
const DEBUG = env.DEBUG_MODE === 'true';

if (DEBUG) {
  console.log('Debug: Request details:', {
    headers: Object.fromEntries(request.headers),
    body: await request.clone().text(), // clone so the body can still be read downstream
    timestamp: new Date().toISOString()
  });
}
```
Request Tracing
Track requests through the system:
```javascript
function generateTraceId() {
  return Math.random().toString(36).substring(2, 15);
}

const traceId = generateTraceId();
const start = Date.now();
console.log('Request started:', { traceId, url: request.url });

// Pass traceId through all function calls
const result = await processRequest(request, { traceId });
console.log('Request completed:', { traceId, duration: Date.now() - start });
```
📊 Custom Metrics
Business Metrics
Track application-specific metrics:
```javascript
// Track model usage
const modelUsage = {
  'deepseek-chat': 0,
  'deepseek-reasoner': 0
};

// Track user activity
const userMetrics = {
  activeUsers: new Set(),
  totalRequests: 0,
  streamingRequests: 0
};

// Note: Worker isolates are ephemeral, so module-level counters reset
// whenever an isolate is recycled. For durable metrics, flush to an
// external store or log per request; this snapshot is best-effort.
setInterval(() => {
  console.log('Business metrics:', {
    modelUsage,
    userMetrics: {
      activeUsers: userMetrics.activeUsers.size,
      totalRequests: userMetrics.totalRequests,
      streamingRequests: userMetrics.streamingRequests
    }
  });
}, 60000); // every minute
```
Cost Tracking
Monitor usage costs:
```javascript
// Track request costs
const costTracking = {
  totalRequests: 0,
  cpuTime: 0,
  bandwidthUsed: 0
};

// Calculate estimated costs. The rates below are illustrative only;
// check Cloudflare's current Workers pricing before relying on them.
function calculateCosts(metrics) {
  const workerCost = metrics.totalRequests * 0.0000005; // $0.50 per million requests
  const cpuCost = metrics.cpuTime * 0.000002;
  return {
    workerCost,
    cpuCost,
    totalCost: workerCost + cpuCost
  };
}
```
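A quick sanity check of the estimator, restated here with the same illustrative rates (not current Cloudflare pricing):

```javascript
// Same illustrative rates as the calculateCosts example above.
function calculateCosts(metrics) {
  const workerCost = metrics.totalRequests * 0.0000005; // $0.50 per million requests
  const cpuCost = metrics.cpuTime * 0.000002;
  return { workerCost, cpuCost, totalCost: workerCost + cpuCost };
}

// 2M requests at $0.50/million is $1.00; 500k CPU units at the assumed
// rate is another $1.00, so the estimate totals $2.00.
const est = calculateCosts({ totalRequests: 2_000_000, cpuTime: 500_000 });
```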
🔍 Log Analysis Best Practices
Structured Logging
Use consistent log formats:
```javascript
function logEvent(level, event, data) {
  const logEntry = {
    level,                         // 'log', 'info', 'warn', or 'error'
    event,
    timestamp: new Date().toISOString(),
    workerId: env.WORKER_ID || 'unknown',
    ...data
  };
  console[level](JSON.stringify(logEntry));
}

// Usage
logEvent('info', 'request_received', { method, url });
logEvent('error', 'api_error', { error: err.message, statusCode });
```
Log Retention
Understand Cloudflare's log retention:
- Real-time logs: Available during development
- Analytics data: Retained for 30 days
- Custom logging: Use external services for long-term storage
External Log Aggregation
Send logs to external services:
```javascript
async function sendToLogService(logData) {
  if (env.LOG_SERVICE_URL) {
    await fetch(env.LOG_SERVICE_URL, {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify(logData)
    });
  }
}
```

In a fetch handler, invoke this via `ctx.waitUntil(sendToLogService(logData))` so log shipping does not delay the response.
📱 Monitoring Dashboard
Creating Custom Dashboards
Use tools like Grafana or Datadog:
```javascript
// Send metrics to an external service
async function sendMetrics(metrics) {
  if (env.METRICS_ENDPOINT) {
    await fetch(env.METRICS_ENDPOINT, {
      method: 'POST',
      headers: {
        'Content-Type': 'application/json',
        'Authorization': `Bearer ${env.METRICS_API_KEY}`
      },
      body: JSON.stringify({
        service: 'ai-proxy-worker',
        timestamp: Date.now(),
        metrics
      })
    });
  }
}
```
Key Dashboard Widgets
- Request Volume - Line chart showing requests over time
- Error Rate - Percentage gauge with threshold alerts
- Response Time - Histogram showing latency distribution
- Model Usage - Pie chart showing model usage breakdown
- Geographic Distribution - Map showing request origins
🚨 Incident Response
Incident Detection
Automated monitoring should detect:
- High error rates (>5%)
- Slow response times (>3s average)
- Service unavailability
- Unusual traffic patterns
Response Procedures
- Immediate: Check Cloudflare status page
- Investigate: Review recent deployments and logs
- Mitigate: Roll back if necessary
- Communicate: Update status page and notify users
- Resolve: Fix root cause
- Post-mortem: Document lessons learned
Emergency Contacts
Maintain an escalation list:
- Primary: On-call engineer
- Secondary: Team lead
- Escalation: Infrastructure team
Effective monitoring keeps your service reliable 📊. Reviewing these metrics regularly helps you maintain high availability and resolve issues quickly.