Kindly API Rate Limiting Guide¶
Overview¶
The Kindly API implements intelligent rate limiting with quantum-cached quotas to ensure fair usage and system stability. Our rate limiting system combines several techniques:
- Sliding window counters with quantum superposition
- Token bucket algorithm with predictive refill
- Adaptive limits based on usage patterns
- Priority queuing for enterprise customers
Rate Limit Headers¶
Every API response includes rate limit information in the headers:
X-RateLimit-Limit: 1000
X-RateLimit-Remaining: 950
X-RateLimit-Reset: 1704723600
X-RateLimit-Burst-Remaining: 9500
X-RateLimit-Window: 60
Header Definitions¶
| Header | Description |
|---|---|
| X-RateLimit-Limit | Maximum requests allowed in the current window |
| X-RateLimit-Remaining | Requests remaining in the current window |
| X-RateLimit-Reset | Unix timestamp when the window resets |
| X-RateLimit-Burst-Remaining | Burst capacity remaining |
| X-RateLimit-Window | Window duration in seconds |
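For example, a client can combine X-RateLimit-Remaining and X-RateLimit-Reset to decide whether to pause before the next call. A minimal Python sketch (the /api/health path and the 10-request reserve are illustrative assumptions, not part of the API spec):

import time
import requests

def seconds_until_reset(resp: requests.Response) -> int:
    """Seconds left in the current window, from X-RateLimit-Reset."""
    reset = int(resp.headers.get("X-RateLimit-Reset", 0))
    return max(reset - int(time.time()), 0)

def should_pause(resp: requests.Response, reserve: int = 10) -> bool:
    """Pause when fewer than `reserve` requests remain in the window."""
    remaining = int(resp.headers.get("X-RateLimit-Remaining", reserve))
    return remaining < reserve

# Illustrative usage
resp = requests.get("https://api.kindly.com/api/health",
                    headers={"X-API-Key": "your_key"})
if should_pause(resp):
    time.sleep(seconds_until_reset(resp))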
Default Rate Limits¶
Standard Tiers¶
| Authentication Type | Requests/Minute | Requests/Hour | Burst Capacity |
|---|---|---|---|
| Unauthenticated | 100 | 3,000 | 500 |
| Free Tier | 1,000 | 30,000 | 10,000 |
| Pro Tier | 5,000 | 150,000 | 50,000 |
| Enterprise | Custom | Custom | Custom |
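If you know your tier in advance, you can derive a fixed client-side spacing between requests from the per-minute column above and keep some headroom. A small sketch (the 10% safety margin is a suggested starting point, not an API requirement):

# Requests/minute per tier, taken from the table above.
TIER_LIMITS = {"free": 1_000, "pro": 5_000}

def request_interval(tier: str, safety_margin: float = 0.1) -> float:
    """Seconds to wait between requests to stay under the tier limit."""
    effective_per_minute = TIER_LIMITS[tier] * (1 - safety_margin)
    return 60.0 / effective_per_minute

# Pro tier with a 10% margin -> roughly one request every 13 ms.
print(f"{request_interval('pro') * 1000:.1f} ms between requests")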
Endpoint-Specific Limits¶
Some endpoints have additional limits:
| Endpoint | Additional Limit | Reason |
|---|---|---|
| /api/compression/compress | 100 MB/minute | Resource intensive |
| /api/neural/evolve | 10/hour | Computational cost |
| /api/quantum/predict | 1000/minute | Quantum resource allocation |
| /api/agi/consciousness/* | 100/minute | Consciousness substrate protection |
Rate Limit Response¶
When rate limited, you'll receive a 429 status code:
{
"error": "RATE_LIMIT_EXCEEDED",
"message": "Rate limit exceeded. Please retry after 1704723600",
"details": {
"limit": 1000,
"window": "1m",
"retry_after": 1704723600,
"upgrade_url": "https://kindly.com/pricing"
}
}
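As the example shows, both the message and details.retry_after carry a Unix timestamp rather than a duration, so convert it to a wait in seconds before sleeping. A Python sketch of handling this body (field names as shown above; the one-second floor is a local choice):

import time
import requests

def wait_seconds_from_429(resp: requests.Response) -> float:
    """Turn the retry_after timestamp in a 429 body into a sleep duration."""
    body = resp.json()
    retry_at = body.get("details", {}).get("retry_after", 0)
    return max(retry_at - time.time(), 1.0)

# After receiving a 429 response:
# time.sleep(wait_seconds_from_429(resp))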
Handling Rate Limits¶
1. Exponential Backoff¶
Implement exponential backoff with jitter:
import time
import random
import requests
from typing import Optional, Dict, Any
class RateLimitHandler:
def __init__(self, max_retries: int = 5):
self.max_retries = max_retries
self.base_delay = 1 # seconds
self.max_delay = 60 # seconds
def make_request_with_retry(self,
method: str,
url: str,
**kwargs) -> requests.Response:
"""Make request with automatic retry on rate limit."""
for attempt in range(self.max_retries):
try:
response = requests.request(method, url, **kwargs)
# Check rate limit headers
self._check_rate_limit_warning(response.headers)
if response.status_code != 429:
return response
# Rate limited - calculate backoff
retry_after = self._get_retry_after(response)
if retry_after:
wait_time = retry_after
else:
wait_time = self._calculate_backoff(attempt)
print(f"Rate limited. Waiting {wait_time}s before retry...")
time.sleep(wait_time)
except requests.exceptions.RequestException as e:
if attempt == self.max_retries - 1:
raise
wait_time = self._calculate_backoff(attempt)
time.sleep(wait_time)
raise Exception(f"Max retries ({self.max_retries}) exceeded")
def _calculate_backoff(self, attempt: int) -> float:
"""Calculate exponential backoff with jitter."""
delay = min(self.base_delay * (2 ** attempt), self.max_delay)
jitter = random.uniform(0, delay * 0.1) # 10% jitter
return delay + jitter
def _get_retry_after(self, response: requests.Response) -> Optional[int]:
"""Extract retry-after from response."""
retry_after = response.headers.get('Retry-After')
if retry_after:
return int(retry_after)
# Calculate from reset time
reset_time = response.headers.get('X-RateLimit-Reset')
if reset_time:
wait_time = int(reset_time) - int(time.time())
return max(wait_time, 1)
return None
def _check_rate_limit_warning(self, headers: Dict[str, str]) -> None:
"""Warn if approaching rate limit."""
remaining = headers.get('X-RateLimit-Remaining')
limit = headers.get('X-RateLimit-Limit')
if remaining and limit:
usage_percent = (1 - int(remaining) / int(limit)) * 100
if usage_percent > 80:
print(f"Warning: {usage_percent:.1f}% of rate limit used")
# Usage
rate_limit_handler = RateLimitHandler()
response = rate_limit_handler.make_request_with_retry(
'POST',
'https://api.kindly.com/api/compression/compress',
headers={'X-API-Key': 'your_key'},
json={'data': 'base64_data', 'privacy_level': 'private'}
)
2. Request Queuing¶
Queue requests to stay within limits:
class RateLimitedQueue {
constructor(apiClient, requestsPerMinute = 1000) {
this.apiClient = apiClient;
this.requestsPerMinute = requestsPerMinute;
this.interval = 60000 / requestsPerMinute; // ms between requests
this.queue = [];
this.processing = false;
this.lastRequestTime = 0;
}
async enqueue(method, ...args) {
return new Promise((resolve, reject) => {
this.queue.push({ method, args, resolve, reject });
this.processQueue();
});
}
async processQueue() {
if (this.processing || this.queue.length === 0) return;
this.processing = true;
while (this.queue.length > 0) {
const now = Date.now();
const timeSinceLastRequest = now - this.lastRequestTime;
if (timeSinceLastRequest < this.interval) {
await this.sleep(this.interval - timeSinceLastRequest);
}
const { method, args, resolve, reject } = this.queue.shift();
try {
this.lastRequestTime = Date.now();
const result = await this.apiClient[method](...args);
// Check rate limit headers
this.adjustRate(result.headers);
resolve(result);
} catch (error) {
if (error.response?.status === 429) {
// Put back in queue and wait
this.queue.unshift({ method, args, resolve, reject });
await this.handleRateLimit(error.response);
} else {
reject(error);
}
}
}
this.processing = false;
}
adjustRate(headers) {
const remaining = parseInt(headers['x-ratelimit-remaining']);
const reset = parseInt(headers['x-ratelimit-reset']);
if (remaining < 100) {
// Slow down when approaching limit
const timeUntilReset = reset * 1000 - Date.now();
this.interval = timeUntilReset / Math.max(remaining, 1);
}
}
async handleRateLimit(response) {
const retryAfter = response.headers['retry-after'] || 60;
console.log(`Rate limited. Waiting ${retryAfter} seconds...`);
await this.sleep(retryAfter * 1000);
}
sleep(ms) {
return new Promise(resolve => setTimeout(resolve, ms));
}
}
// Usage
const queue = new RateLimitedQueue(apiClient, 900); // 900 req/min (safety margin)
// All requests go through the queue
const result1 = await queue.enqueue('compressData', data1);
const result2 = await queue.enqueue('extractContext', text);
const result3 = await queue.enqueue('quantumPredict', state);
3. Adaptive Rate Limiting¶
Dynamically adjust request rate based on server feedback:
package ratelimit
import (
    "context"
    "strconv"
    "sync"
    "time"
)
type AdaptiveRateLimiter struct {
mu sync.Mutex
currentRate float64
minRate float64
maxRate float64
tokens float64
lastRefill time.Time
backoffFactor float64
increaseFactor float64
}
func NewAdaptiveRateLimiter(initialRate, minRate, maxRate float64) *AdaptiveRateLimiter {
return &AdaptiveRateLimiter{
currentRate: initialRate,
minRate: minRate,
maxRate: maxRate,
tokens: initialRate,
lastRefill: time.Now(),
backoffFactor: 0.5,
increaseFactor: 1.1,
}
}
func (r *AdaptiveRateLimiter) Wait(ctx context.Context) error {
r.mu.Lock()
defer r.mu.Unlock()
// Refill tokens
now := time.Now()
elapsed := now.Sub(r.lastRefill).Seconds()
r.tokens += elapsed * r.currentRate
if r.tokens > r.currentRate {
r.tokens = r.currentRate
}
r.lastRefill = now
// Check if we have a token
if r.tokens >= 1 {
r.tokens--
return nil
}
// Calculate wait time
waitTime := time.Duration((1 - r.tokens) / r.currentRate * float64(time.Second))
// Wait with context
timer := time.NewTimer(waitTime)
defer timer.Stop()
select {
case <-timer.C:
r.tokens = 0
return nil
case <-ctx.Done():
return ctx.Err()
}
}
func (r *AdaptiveRateLimiter) OnSuccess(headers map[string]string) {
r.mu.Lock()
defer r.mu.Unlock()
// Check remaining quota
remaining, _ := strconv.ParseFloat(headers["X-RateLimit-Remaining"], 64)
limit, _ := strconv.ParseFloat(headers["X-RateLimit-Limit"], 64)
if limit > 0 && remaining/limit > 0.5 {
// Plenty of quota - increase rate
r.currentRate = min(r.currentRate*r.increaseFactor, r.maxRate)
}
}
func (r *AdaptiveRateLimiter) OnRateLimit() {
r.mu.Lock()
defer r.mu.Unlock()
// Back off
r.currentRate = max(r.currentRate*r.backoffFactor, r.minRate)
r.tokens = 0
}
// Usage
limiter := NewAdaptiveRateLimiter(16.6, 1.0, 83.3) // Start at 1000/min
for {
// Wait for permission
if err := limiter.Wait(ctx); err != nil {
return err
}
// Make request
resp, err := client.MakeRequest()
if err != nil {
if isRateLimitError(err) {
limiter.OnRateLimit()
continue
}
return err
}
// Adjust rate based on response
limiter.OnSuccess(resp.Headers)
}
4. Circuit Breaker Pattern¶
Implement a circuit breaker to avoid overwhelming the API:
class CircuitOpenError < StandardError; end

class CircuitBreaker
  STATES = [:closed, :open, :half_open].freeze
def initialize(failure_threshold: 5, recovery_timeout: 60, half_open_requests: 3)
@failure_threshold = failure_threshold
@recovery_timeout = recovery_timeout
@half_open_requests = half_open_requests
@state = :closed
@failure_count = 0
@last_failure_time = nil
@half_open_count = 0
@mutex = Mutex.new
end
def call(&block)
@mutex.synchronize do
case @state
when :open
if Time.now - @last_failure_time > @recovery_timeout
transition_to(:half_open)
else
raise CircuitOpenError, "Circuit breaker is open"
end
when :half_open
if @half_open_count >= @half_open_requests
# Completed test requests successfully
transition_to(:closed)
end
end
end
begin
result = block.call
on_success
result
rescue => e
on_failure(e)
raise
end
end
private
def on_success
@mutex.synchronize do
case @state
when :half_open
@half_open_count += 1
if @half_open_count >= @half_open_requests
transition_to(:closed)
end
when :closed
@failure_count = 0
end
end
end
def on_failure(error)
@mutex.synchronize do
@failure_count += 1
@last_failure_time = Time.now
if rate_limit_error?(error)
# Immediate open on rate limit
transition_to(:open)
elsif @failure_count >= @failure_threshold
transition_to(:open)
end
end
end
def transition_to(new_state)
puts "Circuit breaker: #{@state} -> #{new_state}"
@state = new_state
case new_state
when :closed
@failure_count = 0
@half_open_count = 0
when :half_open
@half_open_count = 0
end
end
def rate_limit_error?(error)
error.is_a?(HTTPError) && error.response.code == 429
end
end
# Usage with API client
circuit_breaker = CircuitBreaker.new(
failure_threshold: 3,
recovery_timeout: 120
)
def make_api_call_with_circuit_breaker(circuit_breaker, client, method, *args)
  circuit_breaker.call do
    client.send(method, *args)
  end
rescue CircuitOpenError => e
  # Fallback behavior (cached_response_for is an application-provided helper)
  puts "API unavailable: #{e.message}"
  cached_response_for(method, *args)
end
Burst Handling¶
The API supports burst capacity for handling traffic spikes:
import time

class BurstCapacityManager:
def __init__(self, regular_limit=1000, burst_limit=10000):
self.regular_limit = regular_limit
self.burst_limit = burst_limit
self.burst_tokens = burst_limit
self.last_refill = time.time()
def check_burst_available(self, headers):
"""Check if burst capacity is available."""
burst_remaining = int(headers.get('X-RateLimit-Burst-Remaining', 0))
return burst_remaining > 0
def use_burst_wisely(self, requests_needed):
"""Determine if burst should be used."""
# Use burst for:
# 1. Time-sensitive operations
# 2. Batch operations that benefit from speed
# 3. Recovery from previous failures
if requests_needed > self.regular_limit:
return True # Must use burst
# Save burst for emergencies if below 20%
burst_percent = self.burst_tokens / self.burst_limit
if burst_percent < 0.2:
return False
return True
def batch_with_burst(self, items, process_func):
"""Process items in batches using burst capacity."""
results = []
errors = []
# Calculate optimal batch size
batch_size = max(1, min(100, self.burst_tokens // 10))
for i in range(0, len(items), batch_size):
batch = items[i:i + batch_size]
try:
# Process batch in parallel
batch_results = parallel_process(batch, process_func)
results.extend(batch_results)
# Update burst tokens (approximate)
self.burst_tokens -= len(batch)
except RateLimitError as e:
# Fall back to regular processing
for item in batch:
try:
result = process_func(item)
results.append(result)
time.sleep(0.06) # 1000/min = 0.06s per request
except Exception as err:
errors.append((item, err))
return results, errors
Predictive Rate Limiting¶
The API uses quantum prediction to anticipate rate limit issues:
class PredictiveRateLimiter {
constructor(apiClient) {
this.apiClient = apiClient;
this.usageHistory = [];
this.predictions = null;
}
async predictUsage() {
// Get historical usage pattern
const pattern = this.getUsagePattern();
// Use quantum prediction for future usage
const quantumState = {
vector: this.normalizePattern(pattern),
metadata: {
time_of_day: new Date().getHours(),
day_of_week: new Date().getDay()
}
};
try {
const predictions = await this.apiClient.quantumPredict(
quantumState,
60 // Predict next 60 minutes
);
this.predictions = predictions;
return this.analyzePredictions(predictions);
} catch (error) {
// Fallback to classical prediction
return this.classicalPredict(pattern);
}
}
analyzePredictions(predictions) {
const peaks = predictions.predictions.filter(p =>
p.state.usage_rate > 0.8 && p.confidence > 0.7
);
return {
willHitLimit: peaks.length > 0,
peakTime: peaks[0]?.timestamp,
recommendedRate: this.calculateSafeRate(predictions),
confidence: predictions.quantum_fidelity
};
}
async adaptiveThrottle() {
const prediction = await this.predictUsage();
if (prediction.willHitLimit) {
// Preemptively slow down
const timeUntilPeak = new Date(prediction.peakTime) - new Date();
const currentRate = this.getCurrentRate();
// Gradually reduce rate
const reductionFactor = Math.max(0.5, 1 - (prediction.confidence * 0.3));
return currentRate * reductionFactor;
}
return this.getCurrentRate();
}
}
// Usage
const predictor = new PredictiveRateLimiter(apiClient);
// Before making requests
const safeRate = await predictor.adaptiveThrottle();
rateLimiter.setRate(safeRate);
Enterprise Rate Limiting¶
Enterprise customers get enhanced rate limiting features:
1. Dedicated Rate Limit Pools¶
class EnterpriseRateLimitPool:
def __init__(self, pool_config):
self.pools = {
'critical': RateLimitPool(10000, 'requests/minute'),
'normal': RateLimitPool(50000, 'requests/minute'),
'bulk': RateLimitPool(100000, 'requests/hour'),
'quantum': RateLimitPool(5000, 'quantum_ops/minute')
}
def get_pool(self, operation_type):
"""Route requests to appropriate pool."""
if operation_type in ['health_check', 'critical_predict']:
return self.pools['critical']
elif operation_type in ['bulk_compress', 'batch_analysis']:
return self.pools['bulk']
elif operation_type.startswith('quantum_'):
return self.pools['quantum']
else:
return self.pools['normal']
2. Rate Limit Reservation¶
// Reserve rate limit capacity in advance
type RateLimitReservation struct {
ReservationID string
Capacity int
ValidUntil time.Time
PoolType string
}
func (c *EnterpriseClient) ReserveCapacity(ctx context.Context,
operations int, duration time.Duration) (*RateLimitReservation, error) {
resp, err := c.makeRequest(ctx, "POST", "/api/enterprise/reserve-capacity", map[string]interface{}{
"operations": operations,
"duration": duration.String(),
"pool_type": "normal",
})
if err != nil {
return nil, err
}
return &RateLimitReservation{
ReservationID: resp.ReservationID,
Capacity: operations,
ValidUntil: time.Now().Add(duration),
PoolType: "normal",
}, nil
}
// Use reservation
func (c *EnterpriseClient) ExecuteWithReservation(ctx context.Context,
reservation *RateLimitReservation, operations []Operation) error {
for _, op := range operations {
// Include reservation ID in headers
headers := map[string]string{
"X-RateLimit-Reservation": reservation.ReservationID,
}
if err := c.executeOperation(ctx, op, headers); err != nil {
return err
}
}
return nil
}
3. Priority Queue Access¶
# Enterprise requests get priority during high load
class PriorityQueueClient
PRIORITY_LEVELS = {
critical: 0,
high: 1,
normal: 2,
low: 3,
bulk: 4
}.freeze
def make_priority_request(method, url, priority: :normal, **options)
headers = options[:headers] || {}
headers['X-Priority'] = PRIORITY_LEVELS[priority].to_s
# Critical requests bypass queue
if priority == :critical
headers['X-Bypass-Queue'] = 'true'
end
response = http_client.request(method, url, **options.merge(headers: headers))
# Check if request was queued
if response.headers['X-Queue-Position']
puts "Request queued at position: #{response.headers['X-Queue-Position']}"
end
response
end
end
Monitoring Rate Limit Usage¶
1. Real-time Monitoring¶
import asyncio
from datetime import datetime, timedelta
class RateLimitMonitor:
def __init__(self, alert_threshold=0.8):
self.alert_threshold = alert_threshold
self.usage_data = []
self.alerts_sent = set()
async def monitor_continuously(self, check_interval=60):
"""Monitor rate limit usage continuously."""
while True:
try:
usage = await self.get_current_usage()
self.usage_data.append({
'timestamp': datetime.utcnow(),
'usage': usage
})
# Check thresholds
await self.check_alerts(usage)
# Maintain sliding window
cutoff = datetime.utcnow() - timedelta(hours=1)
self.usage_data = [d for d in self.usage_data
if d['timestamp'] > cutoff]
except Exception as e:
print(f"Monitor error: {e}")
await asyncio.sleep(check_interval)
async def check_alerts(self, usage):
"""Send alerts for high usage."""
for pool_name, pool_usage in usage.items():
if pool_usage['percentage'] > self.alert_threshold:
alert_key = f"{pool_name}_{datetime.utcnow().hour}"
if alert_key not in self.alerts_sent:
await self.send_alert(pool_name, pool_usage)
self.alerts_sent.add(alert_key)
def get_usage_trends(self):
"""Analyze usage trends."""
if len(self.usage_data) < 2:
return None
# Calculate trend
recent = self.usage_data[-10:]
older = self.usage_data[-20:-10]
recent_avg = sum(d['usage']['total']['percentage']
for d in recent) / len(recent)
older_avg = sum(d['usage']['total']['percentage']
for d in older) / len(older) if older else recent_avg
trend = 'increasing' if recent_avg > older_avg else 'decreasing'
rate = abs(recent_avg - older_avg) / older_avg if older_avg else 0
return {
'trend': trend,
'rate': rate,
'current_usage': recent_avg,
'prediction': self.predict_limit_hit()
}
2. Usage Analytics¶
class RateLimitAnalytics {
constructor() {
this.metrics = {
requestCounts: new Map(),
rateLimitHits: new Map(),
responseTime: new Map(),
burstUsage: new Map()
};
}
recordRequest(endpoint, headers, responseTime) {
const hour = new Date().getHours();
const key = `${endpoint}_${hour}`;
// Record counts
this.incrementCounter(this.metrics.requestCounts, key);
// Record rate limit status
const remaining = parseInt(headers['x-ratelimit-remaining']);
const limit = parseInt(headers['x-ratelimit-limit']);
if (remaining / limit < 0.1) {
this.incrementCounter(this.metrics.rateLimitHits, key);
}
// Record response time
this.recordAverage(this.metrics.responseTime, key, responseTime);
// Record burst usage
const burstRemaining = parseInt(headers['x-ratelimit-burst-remaining']);
if (!Number.isNaN(burstRemaining)) {
this.recordValue(this.metrics.burstUsage, key, burstRemaining);
}
}
generateReport() {
const report = {
summary: this.generateSummary(),
hotspots: this.identifyHotspots(),
recommendations: this.generateRecommendations(),
visualizations: this.generateCharts()
};
return report;
}
identifyHotspots() {
const hotspots = [];
for (const [key, count] of this.metrics.requestCounts) {
const [endpoint, hour] = key.split('_');
const hitRate = (this.metrics.rateLimitHits.get(key) || 0) / count;
if (hitRate > 0.1) {
hotspots.push({
endpoint,
hour: parseInt(hour),
requests: count,
rateLimitHitRate: hitRate,
avgResponseTime: this.metrics.responseTime.get(key)
});
}
}
return hotspots.sort((a, b) => b.rateLimitHitRate - a.rateLimitHitRate);
}
generateRecommendations() {
const recommendations = [];
const hotspots = this.identifyHotspots();
if (hotspots.length > 0) {
recommendations.push({
type: 'OPTIMIZE_PEAK_HOURS',
message: `High rate limit pressure during hours: ${
hotspots.map(h => h.hour).join(', ')
}`,
action: 'Consider spreading requests or upgrading plan'
});
}
// Check burst usage
const avgBurstRemaining = this.calculateAverageBurst();
if (avgBurstRemaining < 1000) {
recommendations.push({
type: 'LOW_BURST_CAPACITY',
message: 'Burst capacity frequently depleted',
action: 'Implement request queuing or increase burst limit'
});
}
return recommendations;
}
}
Best Practices¶
1. Implement Graceful Degradation¶
from cachetools import TTLCache  # third-party cache (pip install cachetools)

class GracefulDegradation:
def __init__(self, api_client):
self.api_client = api_client
self.cache = TTLCache(maxsize=1000, ttl=300) # 5 min cache
self.degradation_level = 0
async def get_data_with_degradation(self, key, fetch_func, *args):
"""Get data with automatic degradation on rate limits."""
# Try cache first if degraded
if self.degradation_level > 0:
cached = self.cache.get(key)
if cached:
return cached
try:
# Attempt fresh fetch
result = await fetch_func(*args)
self.cache[key] = result
self.degradation_level = max(0, self.degradation_level - 1)
return result
except RateLimitError:
self.degradation_level = min(3, self.degradation_level + 1)
# Level 1: Use cache if available
if self.degradation_level == 1:
cached = self.cache.get(key)
if cached:
return cached
# Level 2: Use older cache or approximate
elif self.degradation_level == 2:
return self.get_approximate_result(key)
# Level 3: Return minimal data
else:
return self.get_minimal_result(key)
2. Optimize Request Patterns¶
// Batch operations to reduce request count
func BatchOptimizer(items []Item, batchSize int) [][]Item {
batches := make([][]Item, 0)
for i := 0; i < len(items); i += batchSize {
end := i + batchSize
if end > len(items) {
end = len(items)
}
batches = append(batches, items[i:end])
}
return batches
}
// Combine multiple operations
type CombinedRequest struct {
Compress []CompressRequest `json:"compress,omitempty"`
Context []ContextRequest `json:"context,omitempty"`
Predict []PredictRequest `json:"predict,omitempty"`
}
func (c *Client) ExecuteCombined(ctx context.Context, req CombinedRequest) (*CombinedResponse, error) {
// Single API call for multiple operations
return c.post(ctx, "/api/combined", req)
}
3. Rate Limit Observability¶
# Detailed rate limit tracking
require 'prometheus/client'

class RateLimitObserver
include Prometheus::Client
def initialize
@registry = Prometheus::Client.registry
# Define metrics
@request_counter = Counter.new(
:api_requests_total,
docstring: 'Total API requests',
labels: [:endpoint, :status]
)
@rate_limit_gauge = Gauge.new(
:api_rate_limit_remaining,
docstring: 'Remaining rate limit',
labels: [:endpoint]
)
@rate_limit_histogram = Histogram.new(
:api_rate_limit_usage_ratio,
docstring: 'Rate limit usage ratio',
labels: [:endpoint],
buckets: [0.1, 0.25, 0.5, 0.75, 0.9, 0.95, 0.99]
)
[@request_counter, @rate_limit_gauge, @rate_limit_histogram].each do |metric|
@registry.register(metric)
end
end
def observe_response(endpoint, response)
# Count request
@request_counter.increment(
labels: { endpoint: endpoint, status: response.code }
)
# Track rate limits
if response.headers['x-ratelimit-remaining']
remaining = response.headers['x-ratelimit-remaining'].to_i
limit = response.headers['x-ratelimit-limit'].to_i
@rate_limit_gauge.set(remaining, labels: { endpoint: endpoint })
usage_ratio = 1.0 - (remaining.to_f / limit)
@rate_limit_histogram.observe(usage_ratio, labels: { endpoint: endpoint })
end
end
def export_metrics
@registry.metrics
end
end
Troubleshooting¶
Common Issues¶
- Sudden rate limit errors
    - Check for burst usage depletion
    - Verify no parallel processes are running
    - Look for retry loops causing amplification
- Inconsistent rate limits
    - Ensure consistent authentication method
    - Check for multiple API keys being used
    - Verify time synchronization
- Lower than expected limits
    - Confirm account tier
    - Check for endpoint-specific limits
    - Verify no account restrictions
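For all three issues, a quick first step is to log the rate limit headers on every response and watch for unexpected drops in X-RateLimit-Remaining, which often reveal parallel processes or retry loops. A minimal sketch using a requests response hook (header names as documented above):

import requests

def log_rate_limits(resp, *args, **kwargs):
    """Response hook: print rate limit headers for every API call."""
    h = resp.headers
    print(f"{resp.request.method} {resp.url} -> {resp.status_code} | "
          f"remaining={h.get('X-RateLimit-Remaining')} "
          f"limit={h.get('X-RateLimit-Limit')} "
          f"burst={h.get('X-RateLimit-Burst-Remaining')}")

session = requests.Session()
session.hooks["response"].append(log_rate_limits)
# All requests made through `session` now log their rate limit state.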
Debug Mode¶
Enable debug mode to see detailed rate limit information. A debug-enabled response includes additional headers:
X-Debug-Pool: normal
X-Debug-Account-Tier: pro
X-Debug-Effective-Limit: 5000
X-Debug-Window-Start: 1704723540
X-Debug-Window-End: 1704723600
X-Debug-Request-Cost: 1
X-Debug-Burst-Available: true
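These headers can be collected alongside normal logging to confirm the effective tier and limit for a request. How debug mode is enabled depends on your account, so treat this as a sketch that assumes the headers are already present:

def debug_rate_limit_info(resp) -> dict:
    """Collect X-Debug-* headers from a response for logging."""
    return {k: v for k, v in resp.headers.items()
            if k.lower().startswith("x-debug-")}

# e.g. {'X-Debug-Account-Tier': 'pro', 'X-Debug-Effective-Limit': '5000', ...}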
Contact Support¶
For rate limit issues:

- Status page: https://status.kindly.com
- Support: support@kindly.com
- Enterprise: enterprise@kindly.com

Enterprise customers can request:

- Custom rate limits
- Dedicated infrastructure
- Priority queue access
- Real-time limit adjustments