PERF: get_requirement() raise cache from 2048 to 8192 elements #12873


Merged
merged 3 commits into from
Apr 1, 2025

Conversation

morotti
Contributor

@morotti morotti commented Jul 23, 2024

Hello,

To summarize, the cache was too small :D
Should we have a NEWS item for this? It's very internal, with no relevance to end users. EDIT: I added the news item since the CI was red.
2% faster pip install on dry run, less in actual run.

You can debug the cache with print(f"parsing {req_string} {get_requirement.cache_info()}")

pip install --dry-run packageswith300dependencies
Run with the default size:
parsing python-jose[cryptography] CacheInfo(hits=51941, misses=13580, maxsize=2048, currsize=2048)

Run with a larger size:
parsing cryptography>=3.4.0; extra == "crypto" CacheInfo(hits=58758, misses=6947, maxsize=10000, currsize=6692)
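The effect of the cap is easy to reproduce outside pip: once the number of distinct strings exceeds `maxsize`, a sequential scan thrashes the LRU and every lookup misses. A minimal sketch with a stand-in parser (the figures here are illustrative, not pip's):

```python
from functools import lru_cache

def make_cached_parser(maxsize):
    # Stand-in for get_requirement: any pure function of the string works.
    @lru_cache(maxsize=maxsize)
    def parse(req_string):
        return req_string.lower()
    return parse

# Simulate a resolution that revisits 5000 distinct requirement strings twice.
reqs = [f"pkg{i}>=1.0" for i in range(5000)]

small = make_cached_parser(2048)
large = make_cached_parser(10000)
for parse in (small, large):
    for r in reqs * 2:
        parse(r)

# Sequential access over more keys than maxsize evicts every entry
# before it is reused, so the small cache never hits.
print(small.cache_info())  # hits=0, misses=10000
print(large.cache_info())  # hits=5000, misses=5000
```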

before: [Screenshot 2024-07-23 114336]

after: [Screenshot 2024-07-23 114343]

Regards.

@notatallshaw
Member

FWIW, I think this was clear when it was changed to 2048: #12663 (comment)

Originally, when packaging 22.0+ was going to be vendored, the plan was to drop the cache altogether. But as I showed in that PR, there are real world, highly esoteric backtracking scenarios where it makes a noticeable difference.

I don't have an opinion on whether this is a good change or not, but I do think it would be worth running a memory profiler, like memray, to see if it changes the peak memory usage (though due to #12834 it will be a small fraction of the total).

@ichard26 ichard26 added the type: performance Commands take too long to run label Dec 26, 2024
@morotti
Contributor Author

morotti commented Mar 20, 2025

Ping on this. Any interest in getting this merged? I'd like to get the few percent of performance :D

@notatallshaw notatallshaw added this to the 25.1 milestone Mar 20, 2025
@notatallshaw
Member

Do you have a reproducible example that shows this performance improvement?

@morotti
Contributor Author

morotti commented Mar 20, 2025

I'm testing on internal projects with hundreds of dependencies.
I don't have a venv I can share unfortunately.
You can try a pip install with as many dependencies as you can think of. It should hit the limit easily. I'd suggest jupyter as a starting point.

To debug the cache status, you can adjust this function in constructors.py, then run pip install.

    def _parse_req_string(req_as_string: str) -> Requirement:
        try:
            r = get_requirement(req_as_string)
            print(get_requirement.cache_info())  # debug: dump cache statistics
            return r
        # (the existing except clause and error handling are unchanged, elided here)

On a common project we use internally (263 dependencies), the installation concludes with:
CacheInfo(hits=9129, misses=5942, maxsize=8192, currsize=5756) with maxsize at 8192.
CacheInfo(hits=4311, misses=10760, maxsize=2048, currsize=2048) with maxsize at 2048. That is a lot of cache misses.
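For the record, the hit rates implied by those two CacheInfo lines:

```python
# Hit rates computed from the CacheInfo figures quoted above.
def hit_rate(hits, misses):
    return hits / (hits + misses)

print(f"maxsize=8192: {hit_rate(9129, 5942):.0%} hit rate")
print(f"maxsize=2048: {hit_rate(4311, 10760):.0%} hit rate")
```

Both runs perform the same 15071 lookups; only the eviction pressure differs.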

The profiler shows 5% of time spent in get_requirement, down to 2% when raising the cache size.

Testing on a larger project (435 dependencies), we hit 7500 entries. I think I need to raise the cache a bit more, to 10000 :D
CacheInfo(hits=43614, misses=7809, maxsize=8192, currsize=7532)

small performance improvements when installing hundreds of packages.
@morotti
Contributor Author

morotti commented Mar 20, 2025

Alright, ran on the largest env I could find, 697 dependencies.
CacheInfo(hits=126981, misses=9902, maxsize=99999, currsize=9539)

@morotti morotti force-pushed the perf-requirement-cache-size branch from a9be84a to 3634e87 on March 20, 2025 at 13:29
@morotti
Contributor Author

morotti commented Mar 20, 2025

I've re-pushed the PR: adjusted to a 10000 max cache size to fit large venvs, updated the news item, and rebased onto today's main branch.

@pfmoore
Member

pfmoore commented Mar 20, 2025

The question of how this affects memory usage has still not been addressed.

Also, how does this affect performance/memory use of small installs? For instance, does the large cache slow down installation of a single requirement (something like pip install --no-deps requests)? Smaller installs are way more common than larger ones, and I'd prefer it if we didn't penalise the common case for the sake of the uncommon one. In practice, I'd be surprised if it makes a difference, but then again, I don't see any overall runtime figures for big runs either.

I'm not against this change, and I know micro-optimisations like this can add up, but I'd caution against spending too much time on small improvements like this without having a better sense of where the real bottlenecks are.
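As an aside on the small-install case: `functools.lru_cache` only bounds the cache, it does not preallocate it, so a single-requirement install stores a single entry regardless of `maxsize`. A minimal sketch (a stand-in function, not pip's actual parser):

```python
from functools import lru_cache

@lru_cache(maxsize=10000)
def parse(req_string):
    return req_string  # stand-in for real requirement parsing

parse("requests")
info = parse.cache_info()
print(info)  # currsize=1: a large maxsize does not reserve 10000 slots up front
```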

@morotti
Contributor Author

morotti commented Mar 20, 2025

Hello @pfmoore, this has no effect on memory usage.

Measuring memory usage over a few runs, it varies around 80-90 MB irrespective of the cache size; the cache size makes no meaningful difference.
No difference when running a single install.
Measured with print(psutil.Process().memory_info()):

    # CacheInfo(hits=9129, misses=5942, maxsize=10000, currsize=5756)
    # pmem(rss=83542016, vms=112848896, shared=10485760, text=2306048, lib=0, data=86695936, dirty=0)

    # CacheInfo(hits=9129, misses=5942, maxsize=10000, currsize=5756)
    # pmem(rss=90177536, vms=113909760, shared=10485760, text=2306048, lib=0, data=87756800, dirty=0)

    # CacheInfo(hits=4311, misses=10760, maxsize=2048, currsize=2048)
    # pmem(rss=86573056, vms=107511808, shared=11534336, text=2306048, lib=0, data=81358848, dirty=0)

    # CacheInfo(hits=4311, misses=10760, maxsize=2048, currsize=2048)
    # pmem(rss=85520384, vms=107515904, shared=12582912, text=2306048, lib=0, data=81362944, dirty=0)
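A back-of-the-envelope bound is consistent with this (a sketch: the sample string is an assumption about a typical requirement, and real cache entries also hold the parsed Requirement objects, not just the keys):

```python
import sys

# Rough estimate of the extra key storage from growing the cache
# from 2048 to 10000 entries, using a representative requirement string.
sample = 'cryptography>=3.4.0; extra == "crypto"'
extra_entries = 10000 - 2048
approx_kib = extra_entries * sys.getsizeof(sample) / 1024
print(f"~{approx_kib:.0f} KiB extra for the keys alone")
```

Well under a megabyte for the keys, which is lost in the noise next to an 80-90 MB process.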

> I'm not against this change, and I know micro-optimisations like this can add up, but I'd caution against spending too much time on small improvements like this without having a better sense of where the real bottlenecks are.

The bottleneck is in parsing the requirement strings; it's pretty clear.
I've made a dump of all the requirement strings.
There is a tremendous number of strings with extras; nothing in particular stands out: there are "tests" and "docs" and a hundred common databases and tools that you would expect to appear as extras.

@notatallshaw
Member

notatallshaw commented Mar 27, 2025

I agree that the memory difference is small enough that it's not really measurable within the margin of error.

Here's a snapshot on main branch for a large resolution just before it completes:

[memray flamegraph: main]

And here's a snapshot on this branch:

[memray flamegraph: this branch]

This is sampled profiling, so don't read too deeply into some of the differences. But in broad strokes: when connecting to a JSON Simple API, the memory is dominated by raw_decode in json.decoder; the object relevant to this PR is __init__ in pip._internal.req.req_install, and there's only a small memory difference.

@ichard26
Member

If there is no/little appreciable increase in memory usage due to this change, I'm fine with it. It seems a bit silly to need such a big cache for simply parsing requirement strings, but I'd imagine that reworking pip/packaging to avoid the parsing would be hard/impossible.

@pfmoore
Member

pfmoore commented Apr 1, 2025

I agree with @ichard26 here. @notatallshaw, you added this to the 25.1 milestone; do you plan on merging it? If not, can you remove the milestone and it can go in when it's ready? There's no real need to tie it to a particular release.

@notatallshaw notatallshaw merged commit 84a052e into pypa:main Apr 1, 2025
28 checks passed
@notatallshaw
Member

Thanks for the ping, I had been planning to merge.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Apr 17, 2025