Kindly API Rate Limiting Guide

Overview

The Kindly API implements intelligent rate limiting with quantum-cached quotas to ensure fair usage and system stability. Our rate limiting system combines several techniques (a client-side sketch of the token bucket idea follows this list):

  • Sliding window counters with quantum superposition
  • Token bucket algorithm with predictive refill
  • Adaptive limits based on usage patterns
  • Priority queuing for enterprise customers
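
The exact server-side implementation is internal, but the token bucket idea is easy to mirror on the client to smooth your own traffic. A minimal sketch in Python (the rate and capacity values are illustrative, not the server's actual parameters):

import time

class TokenBucket:
    """Client-side approximation of a token bucket.

    This only loosely mirrors the server's behaviour; always trust the
    X-RateLimit-* headers over any local estimate.
    """

    def __init__(self, rate_per_sec: float, capacity: float):
        self.rate = rate_per_sec
        self.capacity = capacity
        self.tokens = capacity
        self.last_refill = time.monotonic()

    def try_acquire(self, cost: float = 1.0) -> bool:
        now = time.monotonic()
        # Refill in proportion to elapsed time, capped at the bucket capacity
        self.tokens = min(self.capacity, self.tokens + (now - self.last_refill) * self.rate)
        self.last_refill = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False

# Free tier example: roughly 1,000 requests/minute
bucket = TokenBucket(rate_per_sec=1000 / 60, capacity=1000)
if bucket.try_acquire():
    pass  # safe to send the request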

Rate Limit Headers

Every API response includes rate limit information in the headers:

X-RateLimit-Limit: 1000
X-RateLimit-Remaining: 950
X-RateLimit-Reset: 1704723600
X-RateLimit-Burst-Remaining: 9500
X-RateLimit-Window: 60

Header Definitions

Header                         Description
X-RateLimit-Limit              Maximum requests allowed in the current window
X-RateLimit-Remaining          Requests remaining in the current window
X-RateLimit-Reset              Unix timestamp when the window resets
X-RateLimit-Burst-Remaining    Burst capacity remaining
X-RateLimit-Window             Window duration in seconds
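
For example, a client can read these headers after every call and slow down before the server starts rejecting requests. A minimal sketch using requests (the endpoint and API key are placeholders):

import time
import requests

response = requests.get(
    "https://api.kindly.com/api/health",
    headers={"X-API-Key": "your_key"},
)

limit = int(response.headers.get("X-RateLimit-Limit", 0))
remaining = int(response.headers.get("X-RateLimit-Remaining", 0))
reset_at = int(response.headers.get("X-RateLimit-Reset", 0))

if limit and remaining / limit < 0.1:
    # Fewer than 10% of the window left - wait for the reset before continuing
    time.sleep(max(reset_at - time.time(), 0))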

Default Rate Limits

Standard Tiers

Authentication Type    Requests/Minute    Requests/Hour    Burst Capacity
Unauthenticated        100                3,000            500
Free Tier              1,000              30,000           10,000
Pro Tier               5,000              150,000          50,000
Enterprise             Custom             Custom           Custom

Endpoint-Specific Limits

Some endpoints have additional limits:

Endpoint                     Additional Limit    Reason
/api/compression/compress    100 MB/minute       Resource intensive
/api/neural/evolve           10/hour             Computational cost
/api/quantum/predict         1000/minute         Quantum resource allocation
/api/agi/consciousness/*     100/minute          Consciousness substrate protection

Rate Limit Response

When you exceed a limit, the API returns a 429 (Too Many Requests) status code with a JSON error body:

{
  "error": "RATE_LIMIT_EXCEEDED",
  "message": "Rate limit exceeded. Please retry after 1704723600",
  "details": {
    "limit": 1000,
    "window": "1m",
    "retry_after": 1704723600,
    "upgrade_url": "https://kindly.com/pricing"
  }
}
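
Note that retry_after is a Unix timestamp rather than a number of seconds, so a handler needs to convert it into a wait duration. A minimal sketch (field names follow the example body above):

import time

def wait_for_retry(error_body: dict) -> None:
    """Sleep until the timestamp given in a RATE_LIMIT_EXCEEDED response."""
    retry_at = error_body.get("details", {}).get("retry_after")
    if retry_at:
        time.sleep(max(retry_at - time.time(), 1))

# e.g. if response.status_code == 429: wait_for_retry(response.json())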

Handling Rate Limits

1. Exponential Backoff

Implement exponential backoff with jitter:

import time
import random
import requests
from typing import Optional, Dict, Any

class RateLimitHandler:
    def __init__(self, max_retries: int = 5):
        self.max_retries = max_retries
        self.base_delay = 1  # seconds
        self.max_delay = 60  # seconds

    def make_request_with_retry(self, 
                               method: str, 
                               url: str, 
                               **kwargs) -> requests.Response:
        """Make request with automatic retry on rate limit."""

        for attempt in range(self.max_retries):
            try:
                response = requests.request(method, url, **kwargs)

                # Check rate limit headers
                self._check_rate_limit_warning(response.headers)

                if response.status_code != 429:
                    return response

                # Rate limited - calculate backoff
                retry_after = self._get_retry_after(response)
                if retry_after:
                    wait_time = retry_after
                else:
                    wait_time = self._calculate_backoff(attempt)

                print(f"Rate limited. Waiting {wait_time}s before retry...")
                time.sleep(wait_time)

            except requests.exceptions.RequestException as e:
                if attempt == self.max_retries - 1:
                    raise
                wait_time = self._calculate_backoff(attempt)
                time.sleep(wait_time)

        raise Exception(f"Max retries ({self.max_retries}) exceeded")

    def _calculate_backoff(self, attempt: int) -> float:
        """Calculate exponential backoff with jitter."""
        delay = min(self.base_delay * (2 ** attempt), self.max_delay)
        jitter = random.uniform(0, delay * 0.1)  # 10% jitter
        return delay + jitter

    def _get_retry_after(self, response: requests.Response) -> Optional[int]:
        """Extract retry-after from response."""
        retry_after = response.headers.get('Retry-After')
        if retry_after:
            return int(retry_after)

        # Calculate from reset time
        reset_time = response.headers.get('X-RateLimit-Reset')
        if reset_time:
            wait_time = int(reset_time) - int(time.time())
            return max(wait_time, 1)

        return None

    def _check_rate_limit_warning(self, headers: Dict[str, str]) -> None:
        """Warn if approaching rate limit."""
        remaining = headers.get('X-RateLimit-Remaining')
        limit = headers.get('X-RateLimit-Limit')

        if remaining and limit:
            usage_percent = (1 - int(remaining) / int(limit)) * 100
            if usage_percent > 80:
                print(f"Warning: {usage_percent:.1f}% of rate limit used")

# Usage
rate_limit_handler = RateLimitHandler()
response = rate_limit_handler.make_request_with_retry(
    'POST',
    'https://api.kindly.com/api/compression/compress',
    headers={'X-API-Key': 'your_key'},
    json={'data': 'base64_data', 'privacy_level': 'private'}
)

2. Request Queuing

Queue requests to stay within limits:

class RateLimitedQueue {
  constructor(apiClient, requestsPerMinute = 1000) {
    this.apiClient = apiClient;
    this.requestsPerMinute = requestsPerMinute;
    this.interval = 60000 / requestsPerMinute; // ms between requests
    this.queue = [];
    this.processing = false;
    this.lastRequestTime = 0;
  }

  async enqueue(method, ...args) {
    return new Promise((resolve, reject) => {
      this.queue.push({ method, args, resolve, reject });
      this.processQueue();
    });
  }

  async processQueue() {
    if (this.processing || this.queue.length === 0) return;

    this.processing = true;

    while (this.queue.length > 0) {
      const now = Date.now();
      const timeSinceLastRequest = now - this.lastRequestTime;

      if (timeSinceLastRequest < this.interval) {
        await this.sleep(this.interval - timeSinceLastRequest);
      }

      const { method, args, resolve, reject } = this.queue.shift();

      try {
        this.lastRequestTime = Date.now();
        const result = await this.apiClient[method](...args);

        // Check rate limit headers
        this.adjustRate(result.headers);

        resolve(result);
      } catch (error) {
        if (error.response?.status === 429) {
          // Put back in queue and wait
          this.queue.unshift({ method, args, resolve, reject });
          await this.handleRateLimit(error.response);
        } else {
          reject(error);
        }
      }
    }

    this.processing = false;
  }

  adjustRate(headers) {
    const remaining = parseInt(headers['x-ratelimit-remaining']);
    const reset = parseInt(headers['x-ratelimit-reset']);

    if (remaining < 100) {
      // Slow down when approaching limit
      const timeUntilReset = reset * 1000 - Date.now();
      this.interval = timeUntilReset / remaining;
    }
  }

  async handleRateLimit(response) {
    const retryAfter = response.headers['retry-after'] || 60;
    console.log(`Rate limited. Waiting ${retryAfter} seconds...`);
    await this.sleep(retryAfter * 1000);
  }

  sleep(ms) {
    return new Promise(resolve => setTimeout(resolve, ms));
  }
}

// Usage
const queue = new RateLimitedQueue(apiClient, 900); // 900 req/min (safety margin)

// All requests go through the queue
const result1 = await queue.enqueue('compressData', data1);
const result2 = await queue.enqueue('extractContext', text);
const result3 = await queue.enqueue('quantumPredict', state);

3. Adaptive Rate Limiting

Dynamically adjust request rate based on server feedback:

package ratelimit

import (
    "context"
    "sync"
    "time"
)

type AdaptiveRateLimiter struct {
    mu              sync.Mutex
    currentRate     float64
    minRate         float64
    maxRate         float64
    tokens          float64
    lastRefill      time.Time
    backoffFactor   float64
    increaseFactor  float64
}

func NewAdaptiveRateLimiter(initialRate, minRate, maxRate float64) *AdaptiveRateLimiter {
    return &AdaptiveRateLimiter{
        currentRate:     initialRate,
        minRate:         minRate,
        maxRate:         maxRate,
        tokens:          initialRate,
        lastRefill:      time.Now(),
        backoffFactor:   0.5,
        increaseFactor:  1.1,
    }
}

func (r *AdaptiveRateLimiter) Wait(ctx context.Context) error {
    r.mu.Lock()
    defer r.mu.Unlock()

    // Refill tokens
    now := time.Now()
    elapsed := now.Sub(r.lastRefill).Seconds()
    r.tokens += elapsed * r.currentRate
    if r.tokens > r.currentRate {
        r.tokens = r.currentRate
    }
    r.lastRefill = now

    // Check if we have a token
    if r.tokens >= 1 {
        r.tokens--
        return nil
    }

    // Calculate wait time
    waitTime := time.Duration((1 - r.tokens) / r.currentRate * float64(time.Second))

    // Wait with context
    timer := time.NewTimer(waitTime)
    defer timer.Stop()

    select {
    case <-timer.C:
        r.tokens = 0
        return nil
    case <-ctx.Done():
        return ctx.Err()
    }
}

func (r *AdaptiveRateLimiter) OnSuccess(headers map[string]string) {
    r.mu.Lock()
    defer r.mu.Unlock()

    // Check remaining quota (parseFloat is a small application-side helper
    // that converts the header string to float64)
    remaining := parseFloat(headers["X-RateLimit-Remaining"])
    limit := parseFloat(headers["X-RateLimit-Limit"])

    if remaining/limit > 0.5 {
        // Plenty of quota - increase rate (min/max are the Go 1.21+ builtins)
        r.currentRate = min(r.currentRate*r.increaseFactor, r.maxRate)
    }
}

func (r *AdaptiveRateLimiter) OnRateLimit() {
    r.mu.Lock()
    defer r.mu.Unlock()

    // Back off
    r.currentRate = max(r.currentRate*r.backoffFactor, r.minRate)
    r.tokens = 0
}

// Usage
limiter := NewAdaptiveRateLimiter(16.6, 1.0, 83.3) // Start at 1000/min

for {
    // Wait for permission
    if err := limiter.Wait(ctx); err != nil {
        return err
    }

    // Make request
    resp, err := client.MakeRequest()
    if err != nil {
        if isRateLimitError(err) {
            limiter.OnRateLimit()
            continue
        }
        return err
    }

    // Adjust rate based on response
    limiter.OnSuccess(resp.Headers)
}

4. Circuit Breaker Pattern

Implement a circuit breaker to avoid overwhelming the API while it is rejecting requests:

class CircuitOpenError < StandardError; end

class CircuitBreaker
  STATES = [:closed, :open, :half_open].freeze

  def initialize(failure_threshold: 5, recovery_timeout: 60, half_open_requests: 3)
    @failure_threshold = failure_threshold
    @recovery_timeout = recovery_timeout
    @half_open_requests = half_open_requests

    @state = :closed
    @failure_count = 0
    @last_failure_time = nil
    @half_open_count = 0
    @mutex = Mutex.new
  end

  def call(&block)
    @mutex.synchronize do
      case @state
      when :open
        if Time.now - @last_failure_time > @recovery_timeout
          transition_to(:half_open)
        else
          raise CircuitOpenError, "Circuit breaker is open"
        end
      when :half_open
        if @half_open_count >= @half_open_requests
          # Completed test requests successfully
          transition_to(:closed)
        end
      end
    end

    begin
      result = block.call
      on_success
      result
    rescue => e
      on_failure(e)
      raise
    end
  end

  private

  def on_success
    @mutex.synchronize do
      case @state
      when :half_open
        @half_open_count += 1
        if @half_open_count >= @half_open_requests
          transition_to(:closed)
        end
      when :closed
        @failure_count = 0
      end
    end
  end

  def on_failure(error)
    @mutex.synchronize do
      @failure_count += 1
      @last_failure_time = Time.now

      if rate_limit_error?(error)
        # Immediate open on rate limit
        transition_to(:open)
      elsif @failure_count >= @failure_threshold
        transition_to(:open)
      end
    end
  end

  def transition_to(new_state)
    puts "Circuit breaker: #{@state} -> #{new_state}"
    @state = new_state

    case new_state
    when :closed
      @failure_count = 0
      @half_open_count = 0
    when :half_open
      @half_open_count = 0
    end
  end

  def rate_limit_error?(error)
    error.is_a?(HTTPError) && error.response.code == 429
  end
end

# Usage with API client
circuit_breaker = CircuitBreaker.new(
  failure_threshold: 3,
  recovery_timeout: 120
)

def make_api_call_with_circuit_breaker(breaker, client, method, *args)
  breaker.call do
    client.send(method, *args)
  end
rescue CircuitOpenError => e
  # Fallback behavior (cached_response_for is an application-specific helper)
  puts "API unavailable: #{e.message}"
  cached_response_for(method, *args)
end

Burst Handling

The API supports burst capacity for handling traffic spikes:

import time

class BurstCapacityManager:
    def __init__(self, regular_limit=1000, burst_limit=10000):
        self.regular_limit = regular_limit
        self.burst_limit = burst_limit
        self.burst_tokens = burst_limit
        self.last_refill = time.time()

    def check_burst_available(self, headers):
        """Check if burst capacity is available."""
        burst_remaining = int(headers.get('X-RateLimit-Burst-Remaining', 0))
        return burst_remaining > 0

    def use_burst_wisely(self, requests_needed):
        """Determine if burst should be used."""
        # Use burst for:
        # 1. Time-sensitive operations
        # 2. Batch operations that benefit from speed
        # 3. Recovery from previous failures

        if requests_needed > self.regular_limit:
            return True  # Must use burst

        # Save burst for emergencies if below 20%
        burst_percent = self.burst_tokens / self.burst_limit
        if burst_percent < 0.2:
            return False

        return True

    def batch_with_burst(self, items, process_func):
        """Process items in batches using burst capacity."""
        results = []
        errors = []

        # Calculate optimal batch size
        batch_size = min(100, self.burst_tokens // 10)

        for i in range(0, len(items), batch_size):
            batch = items[i:i + batch_size]

            try:
                # Process batch in parallel
                batch_results = parallel_process(batch, process_func)
                results.extend(batch_results)

                # Update burst tokens (approximate)
                self.burst_tokens -= len(batch)

            except RateLimitError as e:
                # Fall back to regular processing
                for item in batch:
                    try:
                        result = process_func(item)
                        results.append(result)
                        time.sleep(0.06)  # 1000/min = 0.06s per request
                    except Exception as err:
                        errors.append((item, err))

        return results, errors

Predictive Rate Limiting

The API uses quantum prediction to anticipate rate limit issues:

class PredictiveRateLimiter {
  constructor(apiClient) {
    this.apiClient = apiClient;
    this.usageHistory = [];
    this.predictions = null;
  }

  async predictUsage() {
    // Get historical usage pattern
    const pattern = this.getUsagePattern();

    // Use quantum prediction for future usage
    const quantumState = {
      vector: this.normalizePattern(pattern),
      metadata: {
        time_of_day: new Date().getHours(),
        day_of_week: new Date().getDay()
      }
    };

    try {
      const predictions = await this.apiClient.quantumPredict(
        quantumState,
        60 // Predict next 60 minutes
      );

      this.predictions = predictions;
      return this.analyzePredictions(predictions);

    } catch (error) {
      // Fallback to classical prediction
      return this.classicalPredict(pattern);
    }
  }

  analyzePredictions(predictions) {
    const peaks = predictions.predictions.filter(p => 
      p.state.usage_rate > 0.8 && p.confidence > 0.7
    );

    return {
      willHitLimit: peaks.length > 0,
      peakTime: peaks[0]?.timestamp,
      recommendedRate: this.calculateSafeRate(predictions),
      confidence: predictions.quantum_fidelity
    };
  }

  async adaptiveThrottle() {
    const prediction = await this.predictUsage();

    if (prediction.willHitLimit) {
      // Preemptively slow down
      const timeUntilPeak = new Date(prediction.peakTime) - new Date();
      const currentRate = this.getCurrentRate();

      // Gradually reduce rate
      const reductionFactor = Math.max(0.5, 1 - (prediction.confidence * 0.3));
      return currentRate * reductionFactor;
    }

    return this.getCurrentRate();
  }
}

// Usage
const predictor = new PredictiveRateLimiter(apiClient);

// Before making requests
const safeRate = await predictor.adaptiveThrottle();
rateLimiter.setRate(safeRate);

Enterprise Rate Limiting

Enterprise customers get enhanced rate limiting features:

1. Dedicated Rate Limit Pools

class EnterpriseRateLimitPool:
    def __init__(self, pool_config):
        self.pools = {
            'critical': RateLimitPool(10000, 'requests/minute'),
            'normal': RateLimitPool(50000, 'requests/minute'),
            'bulk': RateLimitPool(100000, 'requests/hour'),
            'quantum': RateLimitPool(5000, 'quantum_ops/minute')
        }

    def get_pool(self, operation_type):
        """Route requests to appropriate pool."""
        if operation_type in ['health_check', 'critical_predict']:
            return self.pools['critical']
        elif operation_type in ['bulk_compress', 'batch_analysis']:
            return self.pools['bulk']
        elif operation_type.startswith('quantum_'):
            return self.pools['quantum']
        else:
            return self.pools['normal']

2. Rate Limit Reservation

// Reserve rate limit capacity in advance
type RateLimitReservation struct {
    ReservationID string
    Capacity      int
    ValidUntil    time.Time
    PoolType      string
}

func (c *EnterpriseClient) ReserveCapacity(ctx context.Context, 
    operations int, duration time.Duration) (*RateLimitReservation, error) {

    resp, err := c.makeRequest(ctx, "POST", "/api/enterprise/reserve-capacity", map[string]interface{}{
        "operations": operations,
        "duration":   duration.String(),
        "pool_type":  "normal",
    })

    if err != nil {
        return nil, err
    }

    return &RateLimitReservation{
        ReservationID: resp.ReservationID,
        Capacity:      operations,
        ValidUntil:    time.Now().Add(duration),
        PoolType:      "normal",
    }, nil
}

// Use reservation
func (c *EnterpriseClient) ExecuteWithReservation(ctx context.Context, 
    reservation *RateLimitReservation, operations []Operation) error {

    for _, op := range operations {
        // Include reservation ID in headers
        headers := map[string]string{
            "X-RateLimit-Reservation": reservation.ReservationID,
        }

        if err := c.executeOperation(ctx, op, headers); err != nil {
            return err
        }
    }

    return nil
}

3. Priority Queue Access

# Enterprise requests get priority during high load
class PriorityQueueClient
  PRIORITY_LEVELS = {
    critical: 0,
    high: 1,
    normal: 2,
    low: 3,
    bulk: 4
  }.freeze

  def make_priority_request(method, url, priority: :normal, **options)
    headers = options[:headers] || {}
    headers['X-Priority'] = PRIORITY_LEVELS[priority].to_s

    # Critical requests bypass queue
    if priority == :critical
      headers['X-Bypass-Queue'] = 'true'
    end

    response = http_client.request(method, url, **options.merge(headers: headers))

    # Check if request was queued
    if response.headers['X-Queue-Position']
      puts "Request queued at position: #{response.headers['X-Queue-Position']}"
    end

    response
  end
end

Monitoring Rate Limit Usage

1. Real-time Monitoring

import asyncio
from datetime import datetime, timedelta

class RateLimitMonitor:
    def __init__(self, alert_threshold=0.8):
        self.alert_threshold = alert_threshold
        self.usage_data = []
        self.alerts_sent = set()

    async def monitor_continuously(self, check_interval=60):
        """Monitor rate limit usage continuously."""
        while True:
            try:
                usage = await self.get_current_usage()
                self.usage_data.append({
                    'timestamp': datetime.utcnow(),
                    'usage': usage
                })

                # Check thresholds
                await self.check_alerts(usage)

                # Maintain sliding window
                cutoff = datetime.utcnow() - timedelta(hours=1)
                self.usage_data = [d for d in self.usage_data 
                                  if d['timestamp'] > cutoff]

            except Exception as e:
                print(f"Monitor error: {e}")

            await asyncio.sleep(check_interval)

    async def check_alerts(self, usage):
        """Send alerts for high usage."""
        for pool_name, pool_usage in usage.items():
            if pool_usage['percentage'] > self.alert_threshold:
                alert_key = f"{pool_name}_{datetime.utcnow().hour}"

                if alert_key not in self.alerts_sent:
                    await self.send_alert(pool_name, pool_usage)
                    self.alerts_sent.add(alert_key)

    def get_usage_trends(self):
        """Analyze usage trends."""
        if len(self.usage_data) < 2:
            return None

        # Calculate trend
        recent = self.usage_data[-10:]
        older = self.usage_data[-20:-10]

        recent_avg = sum(d['usage']['total']['percentage'] 
                        for d in recent) / len(recent)
        older_avg = sum(d['usage']['total']['percentage'] 
                       for d in older) / len(older) if older else recent_avg

        trend = 'increasing' if recent_avg > older_avg else 'decreasing'
        rate = abs(recent_avg - older_avg) / older_avg if older_avg else 0

        return {
            'trend': trend,
            'rate': rate,
            'current_usage': recent_avg,
            'prediction': self.predict_limit_hit()
        }

2. Usage Analytics

class RateLimitAnalytics {
  constructor() {
    this.metrics = {
      requestCounts: new Map(),
      rateLimitHits: new Map(),
      responseTime: new Map(),
      burstUsage: new Map()
    };
  }

  recordRequest(endpoint, headers, responseTime) {
    const hour = new Date().getHours();
    const key = `${endpoint}_${hour}`;

    // Record counts
    this.incrementCounter(this.metrics.requestCounts, key);

    // Record rate limit status
    const remaining = parseInt(headers['x-ratelimit-remaining']);
    const limit = parseInt(headers['x-ratelimit-limit']);

    if (remaining / limit < 0.1) {
      this.incrementCounter(this.metrics.rateLimitHits, key);
    }

    // Record response time
    this.recordAverage(this.metrics.responseTime, key, responseTime);

    // Record burst usage (parseInt yields NaN when the header is absent)
    const burstRemaining = parseInt(headers['x-ratelimit-burst-remaining']);
    if (!Number.isNaN(burstRemaining)) {
      this.recordValue(this.metrics.burstUsage, key, burstRemaining);
    }
  }

  generateReport() {
    const report = {
      summary: this.generateSummary(),
      hotspots: this.identifyHotspots(),
      recommendations: this.generateRecommendations(),
      visualizations: this.generateCharts()
    };

    return report;
  }

  identifyHotspots() {
    const hotspots = [];

    for (const [key, count] of this.metrics.requestCounts) {
      const [endpoint, hour] = key.split('_');
      const hitRate = (this.metrics.rateLimitHits.get(key) || 0) / count;

      if (hitRate > 0.1) {
        hotspots.push({
          endpoint,
          hour: parseInt(hour),
          requests: count,
          rateLimitHitRate: hitRate,
          avgResponseTime: this.metrics.responseTime.get(key)
        });
      }
    }

    return hotspots.sort((a, b) => b.rateLimitHitRate - a.rateLimitHitRate);
  }

  generateRecommendations() {
    const recommendations = [];
    const hotspots = this.identifyHotspots();

    if (hotspots.length > 0) {
      recommendations.push({
        type: 'OPTIMIZE_PEAK_HOURS',
        message: `High rate limit pressure during hours: ${
          hotspots.map(h => h.hour).join(', ')
        }`,
        action: 'Consider spreading requests or upgrading plan'
      });
    }

    // Check burst usage
    const avgBurstRemaining = this.calculateAverageBurst();
    if (avgBurstRemaining < 1000) {
      recommendations.push({
        type: 'LOW_BURST_CAPACITY',
        message: 'Burst capacity frequently depleted',
        action: 'Implement request queuing or increase burst limit'
      });
    }

    return recommendations;
  }
}

Best Practices

1. Implement Graceful Degradation

from cachetools import TTLCache  # any TTL cache with this interface works

class GracefulDegradation:
    def __init__(self, api_client):
        self.api_client = api_client
        self.cache = TTLCache(maxsize=1000, ttl=300)  # 5 min cache
        self.degradation_level = 0

    async def get_data_with_degradation(self, key, fetch_func, *args):
        """Get data with automatic degradation on rate limits."""

        # Try cache first if degraded
        if self.degradation_level > 0:
            cached = self.cache.get(key)
            if cached:
                return cached

        try:
            # Attempt fresh fetch
            result = await fetch_func(*args)
            self.cache[key] = result
            self.degradation_level = max(0, self.degradation_level - 1)
            return result

        except RateLimitError:
            self.degradation_level = min(3, self.degradation_level + 1)

            # Level 1: Use cache if available
            if self.degradation_level == 1:
                cached = self.cache.get(key)
                if cached:
                    return cached

            # Level 2: Use older cache or approximate
            elif self.degradation_level == 2:
                return self.get_approximate_result(key)

            # Level 3: Return minimal data
            else:
                return self.get_minimal_result(key)

2. Optimize Request Patterns

// Batch operations to reduce request count
func BatchOptimizer(items []Item, batchSize int) [][]Item {
    batches := make([][]Item, 0)

    for i := 0; i < len(items); i += batchSize {
        end := i + batchSize
        if end > len(items) {
            end = len(items)
        }
        batches = append(batches, items[i:end])
    }

    return batches
}

// Combine multiple operations
type CombinedRequest struct {
    Compress  []CompressRequest  `json:"compress,omitempty"`
    Context   []ContextRequest   `json:"context,omitempty"`
    Predict   []PredictRequest   `json:"predict,omitempty"`
}

func (c *Client) ExecuteCombined(ctx context.Context, req CombinedRequest) (*CombinedResponse, error) {
    // Single API call for multiple operations
    return c.post(ctx, "/api/combined", req)
}

3. Rate Limit Observability

# Detailed rate limit tracking
class RateLimitObserver
  include Prometheus::Client

  def initialize
    @registry = Prometheus::Client.registry

    # Define metrics
    @request_counter = Counter.new(
      :api_requests_total,
      docstring: 'Total API requests',
      labels: [:endpoint, :status]
    )

    @rate_limit_gauge = Gauge.new(
      :api_rate_limit_remaining,
      docstring: 'Remaining rate limit',
      labels: [:endpoint]
    )

    @rate_limit_histogram = Histogram.new(
      :api_rate_limit_usage_ratio,
      docstring: 'Rate limit usage ratio',
      labels: [:endpoint],
      buckets: [0.1, 0.25, 0.5, 0.75, 0.9, 0.95, 0.99]
    )

    [@request_counter, @rate_limit_gauge, @rate_limit_histogram].each do |metric|
      @registry.register(metric)
    end
  end

  def observe_response(endpoint, response)
    # Count request
    @request_counter.increment(
      labels: { endpoint: endpoint, status: response.code }
    )

    # Track rate limits
    if response.headers['x-ratelimit-remaining']
      remaining = response.headers['x-ratelimit-remaining'].to_i
      limit = response.headers['x-ratelimit-limit'].to_i

      @rate_limit_gauge.set(remaining, labels: { endpoint: endpoint })

      usage_ratio = 1.0 - (remaining.to_f / limit)
      @rate_limit_histogram.observe(usage_ratio, labels: { endpoint: endpoint })
    end
  end

  def export_metrics
    @registry.metrics
  end
end

Troubleshooting

Common Issues

  1. Sudden rate limit errors
     • Check for burst capacity depletion
     • Verify no parallel processes are sharing the same key
     • Look for retry loops causing request amplification

  2. Inconsistent rate limits
     • Ensure a consistent authentication method
     • Check whether multiple API keys are in use
     • Verify time synchronization (see the diagnostic sketch after this list)

  3. Lower than expected limits
     • Confirm your account tier
     • Check for endpoint-specific limits
     • Verify there are no account restrictions
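
A small diagnostic helper can make several of these checks easier by logging the rate-limit headers for each response and flagging likely problems. A sketch (the thresholds are illustrative):

import time
import requests

def diagnose(response: requests.Response) -> None:
    """Print rate-limit headers and flag likely causes of surprises."""
    h = response.headers
    remaining = h.get("X-RateLimit-Remaining")
    reset_at = int(h.get("X-RateLimit-Reset", 0))
    burst = h.get("X-RateLimit-Burst-Remaining")

    print(f"remaining={remaining} reset={reset_at} burst_remaining={burst}")

    if burst is not None and int(burst) == 0:
        print("Burst capacity depleted - queue or spread out requests")

    # If the reset timestamp is far in the past, the response is stale or the
    # local clock is skewed; either can make limits appear inconsistent
    if reset_at and time.time() - reset_at > 120:
        print("X-RateLimit-Reset is more than 2 minutes in the past - check clock sync")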

Debug Mode

Enable debug mode to see detailed rate limit information:

curl -H "X-API-Key: your_key" \
     -H "X-Debug-RateLimit: true" \
     https://api.kindly.com/api/health

Debug response:

X-Debug-Pool: normal
X-Debug-Account-Tier: pro
X-Debug-Effective-Limit: 5000
X-Debug-Window-Start: 1704723540
X-Debug-Window-End: 1704723600
X-Debug-Request-Cost: 1
X-Debug-Burst-Available: true
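
The same debug header works from any HTTP client. A short sketch that prints whichever X-Debug-* headers come back (header names follow the example above):

import requests

response = requests.get(
    "https://api.kindly.com/api/health",
    headers={"X-API-Key": "your_key", "X-Debug-RateLimit": "true"},
)

for name, value in response.headers.items():
    if name.lower().startswith("x-debug-"):
        print(f"{name}: {value}")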

Contact Support

For rate limit issues:

  • Status page: https://status.kindly.com
  • Support: support@kindly.com
  • Enterprise: enterprise@kindly.com

Enterprise customers can request:

  • Custom rate limits
  • Dedicated infrastructure
  • Priority queue access
  • Real-time limit adjustments