# Performance Overview

Mesrai is built for speed and scale. Our infrastructure processes more than 10,000 PRs daily while serving edge-cached responses with sub-5ms time to first byte.
## Performance Metrics

### Response Times
- TTFB (Time to First Byte): < 5ms (edge cached)
- PR Review Time: 5-30 seconds (depending on review depth)
- Webhook Processing: < 100ms
- API Response: < 50ms (p95)
### Scalability
- Concurrent Reviews: 1,000+ reviews in flight at once
- Daily PR Volume: 10,000+ PRs processed
- Uptime: 99.9% SLA
- Global Coverage: Edge nodes in 200+ cities
## Architecture

### Edge Computing

We use Cloudflare Workers to handle request routing at the edge:
```ts
// Edge worker handles routing
export default {
  async fetch(request: Request): Promise<Response> {
    // Route to nearest region
    const region = getOptimalRegion(request)
    return await handleRequest(request, region)
  }
}
```
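`getOptimalRegion` isn't defined in the snippet above; a minimal sketch, assuming we route on the `continent` code that the Workers runtime exposes via `request.cf` (the region names are hypothetical):

```ts
// Hypothetical identifiers for the backend regions
type Region = 'us-east' | 'eu-west' | 'ap-southeast'

// request.cf is populated by the Cloudflare Workers runtime;
// `continent` is a two-letter code such as 'NA' or 'EU'.
function getOptimalRegion(request: Request): Region {
  const continent = (request as { cf?: { continent?: string } }).cf?.continent
  switch (continent) {
    case 'EU': return 'eu-west'
    case 'AS':
    case 'OC': return 'ap-southeast'
    default:   return 'us-east' // NA, SA, AF, or unknown
  }
}
```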
## Caching Strategy

### Multi-Layer Caching
- Edge Cache: Static assets and API responses (CDN)
- Redis Cache: Review results and user data
- Database: Persistent storage with read replicas
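The edge layer is handled by the CDN; behind it, Redis and the database compose into a classic read-through lookup. A minimal sketch with ioredis, assuming a `reviews:<id>` key layout and a `fetchFromDatabase` stand-in for the replica query:

```ts
import Redis from 'ioredis'

const redis = new Redis(process.env.REDIS_URL!)

// Stand-in for a query against a read replica
declare function fetchFromDatabase(reviewId: string): Promise<string>

async function getReviewResult(reviewId: string): Promise<string> {
  // 1) Redis layer
  const cached = await redis.get(`reviews:${reviewId}`)
  if (cached !== null) return cached

  // 2) Fall through to persistent storage
  const fresh = await fetchFromDatabase(reviewId)

  // Populate the cache using the 1-hour review TTL configured below
  await redis.set(`reviews:${reviewId}`, fresh, 'EX', 3600)
  return fresh
}
```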
### Cache Invalidation
```yaml
# Smart cache invalidation
cache:
  ttl:
    reviews: 3600        # 1 hour
    api_responses: 300   # 5 minutes
    user_data: 1800      # 30 minutes
  invalidation:
    - on_new_commit
    - on_pr_close
    - on_manual_trigger
```
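On the invalidation side, each trigger boils down to deleting the affected keys. One way the `on_new_commit` hook could be wired up, assuming review results are also indexed by PR under a `reviews:pr:<prId>:*` key layout (again hypothetical):

```ts
import Redis from 'ioredis'

const redis = new Redis(process.env.REDIS_URL!)

// Called from the webhook handler when a PR gets a new commit,
// is closed, or a manual trigger fires.
async function invalidateReviewCache(prId: number): Promise<void> {
  const keys = await redis.keys(`reviews:pr:${prId}:*`)
  if (keys.length > 0) {
    await redis.del(...keys)
  }
}
```

On a large keyspace, `SCAN` is preferable to `KEYS`, which blocks Redis while it walks every key.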
## Database Optimization

### Query Performance
- Read Replicas: 5 regional read replicas
- Connection Pooling: PgBouncer for connection management
- Indexing Strategy: Optimized indexes on hot paths
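Application traffic goes through PgBouncer rather than straight to Postgres, so many short-lived client connections share a small set of server connections. A minimal node-postgres sketch, assuming a `pgbouncer.internal` endpoint on PgBouncer's conventional port 6432:

```ts
import { Pool } from 'pg'

const pool = new Pool({
  host: 'pgbouncer.internal', // hypothetical PgBouncer endpoint
  port: 6432,
  database: 'mesrai',
  max: 20, // per-process client-side cap
})

const { rows } = await pool.query(
  'SELECT count(*) FROM reviews WHERE repo_id = $1',
  [42],
)
```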
### Example: Optimized Query
```sql
-- Optimized query with proper indexing
SELECT r.*, p.title, u.name
FROM reviews r
INNER JOIN pull_requests p ON r.pr_id = p.id
INNER JOIN users u ON p.author_id = u.id
WHERE r.repo_id = $1
  AND r.created_at > NOW() - INTERVAL '24 hours'
ORDER BY r.created_at DESC
LIMIT 50;

-- Supporting index on the hot path:
CREATE INDEX idx_reviews_repo_created ON reviews (repo_id, created_at DESC);
```

## Token Optimization
### Context Selection
Mesrai uses intelligent context selection to minimize token usage:
- Relevance Scoring: Only include relevant code
- Deduplication: Remove duplicate context
- Compression: Smart compression of large files
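As a rough illustration of how scoring, deduplication, and a token budget can compose (the `ContextChunk` shape and the 4-characters-per-token estimate are assumptions, and compression is omitted for brevity):

```ts
interface ContextChunk {
  path: string
  text: string
  relevance: number // 0..1, higher = more related to the diff
}

function selectContext(
  chunks: ContextChunk[],
  maxTokens: number,
  relevanceThreshold: number
): ContextChunk[] {
  const seen = new Set<string>()
  const selected: ContextChunk[] = []
  let budget = maxTokens

  // Most relevant chunks first
  for (const chunk of [...chunks].sort((a, b) => b.relevance - a.relevance)) {
    if (chunk.relevance < relevanceThreshold) break // relevance scoring
    if (seen.has(chunk.text)) continue              // deduplication
    const cost = Math.ceil(chunk.text.length / 4)   // ~4 chars per token
    if (cost > budget) continue
    seen.add(chunk.text)
    selected.push(chunk)
    budget -= cost
  }
  return selected
}
```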
### Cost Efficiency
```ts
// Token usage optimization
const context = await optimizeContext({
  diff: prDiff,
  maxTokens: 8000,
  relevanceThreshold: 0.7,
  includeTests: false
})
// Result: 30-50% token reduction
```

## Worker Architecture
### Job Queue
- Bull Queue: Redis-based job queue
- Priority Levels: Critical, high, normal, low
- Retry Logic: Exponential backoff
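With Bull, priority and retry behavior are attached to each job as it is enqueued. A minimal sketch; the queue name and payload are illustrative, and in Bull a lower `priority` number runs first:

```ts
import Queue from 'bull'

const reviewQueue = new Queue('pr-reviews', process.env.REDIS_URL!)

await reviewQueue.add(
  { prId: 1234, depth: 'deep' },
  {
    priority: 2, // e.g. critical=1, high=2, normal=3, low=4
    attempts: 5, // retry up to 5 times on failure
    backoff: { type: 'exponential', delay: 1000 }, // 1s, 2s, 4s, ...
  }
)
```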
### Parallel Processing
```ts
// Process multiple files in parallel
const reviews = await Promise.all(
  changedFiles.map(file =>
    reviewFile(file, {
      parallel: true,
      timeout: 30000
    })
  )
)
```

## Monitoring
### Real-Time Metrics
We track key performance indicators:
- Request Duration: p50, p95, p99
- Error Rates: 5xx, 4xx by endpoint
- Queue Depth: Job queue size
- Token Usage: Per review, per user
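This page doesn't name a metrics library; as one option, request durations can be recorded in a prom-client histogram, from which the monitoring backend derives p50/p95/p99 (metric and label names here are assumptions):

```ts
import { Histogram } from 'prom-client'

// Percentiles are computed from these buckets by the backend
const requestDuration = new Histogram({
  name: 'http_request_duration_ms',
  help: 'Request duration in milliseconds',
  labelNames: ['endpoint', 'status'],
  buckets: [5, 10, 25, 50, 100, 250, 500, 1000],
})

// Record a single 42 ms request
requestDuration.labels('/api/reviews', '200').observe(42)
```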
### Alerting
```yaml
alerts:
  high_latency:
    threshold: 100ms
    duration: 5m
    action: page_oncall
  error_rate:
    threshold: 1%
    duration: 2m
    action: slack_alert
```

## Best Practices
### For Faster Reviews
- Use Shallow Reviews for quick iterations
- Enable Caching for repeated patterns
- Configure File Filters to skip irrelevant files
- Batch Similar PRs for efficiency
### For Cost Optimization
- Set Token Limits per review
- Use Cheaper Models for simple checks
- Enable Smart Caching for common patterns
- Configure Review Depth appropriately
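Mesrai's exact configuration surface isn't covered on this page; purely as an illustration of how the practices above combine, a per-repo settings object might look like this (every key below is hypothetical):

```ts
// Hypothetical per-repo review settings
const reviewSettings = {
  depth: 'shallow',                          // faster iteration on WIP branches
  cache: { enabled: true },                  // reuse results for repeated patterns
  fileFilters: ['!**/*.lock', '!dist/**'],   // skip irrelevant files
  maxTokensPerReview: 8000,                  // hard cost ceiling
  model: { simpleChecks: 'small', deepReview: 'large' }, // cheaper model for lint-level checks
}
```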
Next: Learn about scaling strategies →