More accurate RPM limit enforcement on keys #10037

Closed
krrishdholakia wants to merge 6 commits into main from litellm_dev_04_15_2025_p1

Conversation

krrishdholakia
Contributor

Title

More accurate RPM limit enforcement on keys

Relevant issues

  • Fixes an issue where instances were overwriting each other's increment values in the Redis cache
  • Improves RPM checking by incrementing on check (reducing spillover from 66 to 2 in a multi-instance, high-traffic setup)

Pre-Submission checklist

Please complete all items before asking a LiteLLM maintainer to review your PR

  • I have added testing in the tests/litellm/ directory. Adding at least 1 test is a hard requirement - see details
  • I have added a screenshot of my new test passing locally
  • My PR passes all unit tests with make test-unit (https://docs.litellm.ai/docs/extras/contributing_code)
  • My PR's scope is as isolated as possible, it only solves 1 specific problem

Type

🆕 New Feature
🐛 Bug Fix

Changes

  • migrate parallel_request_limiter.py to inherit from base_routing_strategy.py (which already solves the RPM increment / Redis syncing problem; see the sketch below)
  • new check_key_in_limits_v2 function

…llel request handler to use base routing strategy

allows for better Redis / in-memory cache usage
uses the Redis increment cache logic

ensures TPM/RPM logic works well across instances
reduces spillover (from 66 to 2 at 10k+ requests in 1 min.)
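
For context, here is a minimal sketch of the increment-on-check pattern described above, where the counter is bumped atomically as part of the limit check so concurrent instances cannot slip requests in between a read and a later write. This is illustrative only, not the actual LiteLLM implementation; the function name and key format are made up:

import time

import redis.asyncio as redis


async def check_and_increment_rpm(client: redis.Redis, api_key: str, rpm_limit: int) -> bool:
    # Counter is scoped to the current 1-minute window.
    window = int(time.time() // 60)
    counter_key = f"rpm:{api_key}:{window}"

    # INCR is atomic in Redis, so every instance sees a consistent count
    # and the increment happens as part of the check itself.
    count = await client.incr(counter_key)
    if count == 1:
        # First request in this window: let the key expire once the window ends.
        await client.expire(counter_key, 60)

    # The request is allowed only if the post-increment count is within the limit.
    return count <= rpm_limit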

@krrishdholakia krrishdholakia marked this pull request as draft April 16, 2025 01:13
@harvardfly

When will this issue be fixed? It is reliably reproducible in version 1.65.4, and multi-instance rate limiting is failing. @krrishdholakia

@krrishdholakia
Contributor Author

Hey @harvardfly, acknowledging this. I hope to have this done over the next 2 weeks.

Need to do:

  • pass unit testing
  • run load testing to confirm this passes as expected

Follow-up PR:

  • migrate team, user, model-level rate limiting as well

@harvardfly

harvardfly commented Apr 27, 2025

Got it, thank you very much @krrishdholakia . I hope you can fix it in the stable version as soon as possible, as TPM and RPM limits are crucial features.

@ScGPS

ScGPS commented Apr 29, 2025

@krrishdholakia,
From your PR code, there are 2 issues:

    1. A 500 error occurs. To reproduce it, use the same key to request the service twice.
10:55:03 - LiteLLM Proxy:ERROR: proxy_server.py:3642 - litellm.proxy.proxy_server.completion(): Exception occured - unsupported operand type(s) for +: 'dict' and 'int'
Traceback (most recent call last):
  File "/mnt/d/projects/AIAPIChannel/litellm_oss/litellm/proxy/proxy_server.py", line 3524, in completion
    data = await proxy_logging_obj.pre_call_hook(  # type: ignore
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/mnt/d/projects/AIAPIChannel/litellm_oss/litellm/proxy/utils.py", line 565, in pre_call_hook
    raise e
  File "/mnt/d/projects/AIAPIChannel/litellm_oss/litellm/proxy/utils.py", line 552, in pre_call_hook
    response = await _callback.async_pre_call_hook(
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/mnt/d/projects/AIAPIChannel/litellm_oss/litellm/proxy/hooks/parallel_request_limiter.py", line 381, in async_pre_call_hook
    await self.check_key_in_limits_v2(
  File "/mnt/d/projects/AIAPIChannel/litellm_oss/litellm/proxy/hooks/parallel_request_limiter.py", line 128, in check_key_in_limits_v2
    results = await self._increment_value_list_in_current_window(
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/mnt/d/projects/AIAPIChannel/litellm_oss/litellm/router_strategy/base_routing_strategy.py", line 64, in _increment_value_list_in_current_window
    result = await self._increment_value_in_current_window(
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/mnt/d/projects/AIAPIChannel/litellm_oss/litellm/router_strategy/base_routing_strategy.py", line 81, in _increment_value_in_current_window
    result = await self.dual_cache.in_memory_cache.async_increment(
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/mnt/d/projects/AIAPIChannel/litellm_oss/litellm/caching/in_memory_cache.py", line 184, in async_increment
    value = init_value + value
            ~~~~~~~~~~~^~~~~~~
TypeError: unsupported operand type(s) for +: 'dict' and 'int'
INFO:     127.0.0.1:36024 - "POST /v1/completions HTTP/1.1" 500 Internal Server Error
    2. Suggestion: the atomic increment should only be used for the Redis cache, not the in-memory cache. When the Redis cache is updated by the atomic increment, the Redis value should then be used to update the in-memory cache in the multi-instance case.
      # in-memory increment (problematic when each instance holds its own copy):
      result = await self.dual_cache.in_memory_cache.async_increment(
          key=key,
          value=value,
          ttl=ttl,
      )
      # Redis pipeline increment operation (the atomic, shared counter):
      increment_op = RedisPipelineIncrementOperation(
          key=key,
          increment_value=value,
          ttl=ttl,
      )
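
A minimal sketch of that suggestion, assuming the dual cache exposes an async increment on its Redis cache and an async set on its in-memory cache (these method names are assumptions for illustration, not confirmed LiteLLM APIs):

from typing import Optional


async def increment_with_redis_authority(dual_cache, key: str, value: int, ttl: Optional[int] = None) -> int:
    # 1. Atomic increment in Redis: the shared, authoritative counter.
    new_value = await dual_cache.redis_cache.async_increment(key=key, value=value, ttl=ttl)

    # 2. Overwrite (rather than increment) the local in-memory copy with the Redis
    #    result, so stale or non-integer local values never corrupt the count.
    await dual_cache.in_memory_cache.async_set_cache(key, new_value, ttl=ttl)
    return new_value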

@krrishdholakia
Contributor Author

Closing as this is now on main -

Designed to work in a multi-instance setup, where multiple instances are writing to Redis simultaneously

@krrishdholakia krrishdholakia deleted the litellm_dev_04_15_2025_p1 branch May 2, 2025 05:10
@krrishdholakia
Contributor Author

@ScGPS we do sync the Redis value to the in-memory cache (handled by the BaseRoutingStrategy class)

We do this periodically (every 0.01s) to avoid calling Redis on each request
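
For illustration, a rough sketch of such a periodic sync loop (the method names on dual_cache are assumptions, not the exact BaseRoutingStrategy code):

import asyncio


async def periodic_redis_sync(dual_cache, keys_to_sync, interval: float = 0.01):
    # Every `interval` seconds, mirror the authoritative Redis counters into the
    # in-memory cache, so per-request checks can read local memory instead of Redis.
    while True:
        for key in list(keys_to_sync):
            redis_value = await dual_cache.redis_cache.async_get_cache(key)
            if redis_value is not None:
                await dual_cache.in_memory_cache.async_set_cache(key, redis_value)
        await asyncio.sleep(interval)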
