PERF: get_requirement() raise cache from 2048 to 8192 elements #12873
Conversation
Branch updated from b9eef58 to 04b2630.
FWIW, I think this was clear when it was changed to 2048: #12663 (comment). Originally, when packaging 22.0+ was going to be vendored, the plan was to drop the cache altogether. But as I showed in that PR, there are real-world, highly esoteric backtracking scenarios where it makes a noticeable difference. I don't have an opinion on whether this is a good change or not, but I do think it would be worth running a memory profiler, like memray, to see if it changes the peak memory usage (though due to #12834 it will be a small fraction of the total).
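A hedged sketch of how such a measurement could be taken, not what anyone in this thread actually ran: memray's Python API wrapped around pip's internal entry point. `pip._internal.cli.main.main` is an internal API, and `requirements.txt` is a placeholder for whatever resolution you want to profile.

```python
from memray import Tracker

from pip._internal.cli.main import main as pip_main

with Tracker("pip-resolve.bin"):
    # Resolve without installing, so the measurement is dominated by the
    # resolver and requirement parsing rather than wheel unpacking.
    pip_main(["install", "--dry-run", "-r", "requirements.txt"])

# Peak usage can then be inspected with the memray CLI, e.g.:
#   memray stats pip-resolve.bin
#   memray flamegraph pip-resolve.bin
```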
Ping on this: any interest in getting this merged? I'd like to get the few percent of performance :D
Do you have a reproducible example that shows this performance improvement?
> small performance improvements when installing hundreds of packages.

Alright, I ran it on the largest env I could find: 697 dependencies.
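A hedged sketch of the kind of measurement being discussed (the 697-dependency environment itself isn't reproduced here): time a resolve-only dry run against a large requirements file a few times and take the median. The file name `big-requirements.txt` is a placeholder.

```python
import statistics
import subprocess
import sys
import time

# Placeholder: point this at a large requirements file.
CMD = [
    sys.executable, "-m", "pip", "install",
    "--dry-run", "--quiet", "-r", "big-requirements.txt",
]

timings = []
for _ in range(3):
    start = time.perf_counter()
    subprocess.run(CMD, check=True)
    timings.append(time.perf_counter() - start)

print(f"median dry-run install time: {statistics.median(timings):.2f}s")
```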
Branch updated from a9be84a to 3634e87.
I've re-sent the PR: adjusted the cache to a 10000-entry maximum to fit large venvs, updated the news items, and rebased onto today's main branch.
The question of how this affects memory usage has still not been addressed. Also, how does this affect the performance and memory use of small installs? For instance, does the large cache slow down installation of a single requirement?

I'm not against this change, and I know micro-optimisations like this can add up, but I'd caution against spending too much time on small improvements like this without having a better sense of where the real bottlenecks are.
Hello @pfmoore, this has no effect on memory usage. Measuring memory usage over a few runs, it varies around 80-90 MB regardless of the cache size; the cache size makes no meaningful difference.

The bottleneck is parsing the requirement strings; it's pretty clear.
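A minimal sketch of what the change amounts to, assuming the cache under discussion is the `functools.lru_cache` wrapper around `get_requirement()` (in current pip the helper lives in `pip/_internal/utils/packaging.py`); the maxsize values are the ones debated in this thread.

```python
import functools

from pip._vendor.packaging.requirements import Requirement


@functools.lru_cache(maxsize=10000)  # previously 2048
def get_requirement(req_string: str) -> Requirement:
    # Parsing a requirement string (e.g. "requests>=2.31,<3") is comparatively
    # expensive, and the same strings recur many times during a large resolve,
    # so the parsed Requirement objects are memoized.
    return Requirement(req_string)
```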
I agree that the memory difference is small enough that it's not really measurable within the margin of error. Here's a snapshot on the main branch for a large resolution just before it completes: [memory snapshot] And here's a snapshot on this branch: [memory snapshot] This is sample profiling, so don't read too deeply into some of the differences. But in broad strokes, when connecting to a JSON Simple API, the memory is dominated by …
If there is little or no appreciable increase in memory usage due to this change, I'm fine with it. It seems a bit silly to need such a big cache simply for parsing requirement strings, but I'd imagine that reworking pip/packaging to avoid the parsing would be hard or impossible.
I agree with @ichard26 here. @notatallshaw, you added this to the 25.1 milestone; do you plan on merging it? If not, can you remove the milestone so it can go in when it's ready? There's no real need to tie it to a particular release.
Thanks for the ping, I had been planning to merge it.
Hello,
To summarize, the cache was too small :D

Should we have a NEWS item for this? It's very internal, with no relevance to end users. (EDIT: I added the news item since the CI was red.)

The result is roughly 2% faster pip install on a dry run, and less in an actual run.
You can debug the cache with:

```python
print(f"parsing {req_string} {get_requirement.cache_info()}")
```
before:

after:

Regards.