Skip to content

Extract logic from find_best_candidate and find_all_candidate to static methods #9681

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 16 commits into from

Conversation

Martinn1996
Copy link

Closes #9084

Extract the logic from package_finder.find_best_candidate and package_finder.find_all_candidate to static methods.

#9078 (comment)

Co-authored by ThijmenL98

@Martinn1996 Martinn1996 changed the title Extract logic from find_best_candidate and find_all_candidate to static methods [WIP] Extract logic from find_best_candidate and find_all_candidate to static methods Mar 3, 2021
@uranusjr
Copy link
Member

uranusjr commented Mar 4, 2021

This is a good start, but needs more work to achive the goal. My concern to find_all_candidates being cached is that LinkEvaluator is a mutable object, and changing some of its attributes will affect the behaviour of the function. This means that the cached return value may become incorrect if we don’t correctly review every LinkEvaluator changes, and even if we find an attribute change needs to invalidate the cache, functools.lru_cache() also does not provide a way to do it.

The proper way to resolve this would be to ensure find_all_candidates’s return value is purely determined purely by the inputs, and all of them must be immutable (or at least hashable). Instead of simply moving the function out of LinkEvaluator (but still pass in a mutable instance of it into the function), you must disassemble it into immutable components and pass them in separately. This would involve significant work—as you can see, find_all_candidates is composed of multiple methods (on both LinkEvaluator and CandidateEvaluator), and you must also disassemble all them.

@uranusjr uranusjr marked this pull request as draft March 4, 2021 12:32
@uranusjr
Copy link
Member

uranusjr commented Mar 4, 2021

(Converting to draft so this don’t accidentally get merged.)

@Martinn1996
Copy link
Author

Thank you for the feedback and explanation on how to tackle this issue. In the upcoming weeks, we'll try to make the pr ready.

@QuentinLee5
Copy link
Contributor

@uranusjr
I have been looking into this issue as well.

Would it be a good solution to make PackageFinder, LinkEvaluator and its dependencies hashable based on its fields?
This way, if one of the fields in these mutable classes change, it will not use the cached functions, since it will look for the same hash of the parameters which will now be different since one of the fields has changed.

@uranusjr
Copy link
Member

It should be, but I’d favour the approach less myself. The amount of work is likely similar, but hashability affects much more than just the cachability of the two functions. Conceptually, both PackageFinder and LinkEvaluator represent logic, not data, so making them hashable makes less sense to me IMO from the maintenance angle.

@QuentinLee5
Copy link
Contributor

How would your approach look like?

Because if we look at for example the find_all_candidates function, find_links_versions, page_versions and file_versions all have a dependency on LinkEvaluator. If we would disassemble find_all_candidates and pass these three components as parameters, then the only code we would cache, is sorting file_versions and the concatenation of find_links_versions, page_versions and file_versions.

@uranusjr
Copy link
Member

What I have in mind is to not pass hashable objects into those functions, but those objects shouldn’t be PackageFinder and LinkEvaluator themselves. For example, generate some namedtuples from the current state and pass them into the functions instead.

@QuentinLee5
Copy link
Contributor

@uranusjr we refactored find_all_candidates and find_best_candidate to static functions. We decided to put them as static functions of the PackageFinder class instead of pure function to make it better readable while still achieving the same goal. We also did the same for all functions called by these 2 functions.

For the static functions, instead of using the LinkEvaluator and PackageFinder classes, we created a namedTuple of their state and use this state in the static functions.

When disassembling find_all_candidates, it turned out that this does change the _logged_links field of PackageFinder. Do you have a suggestion on how we could update the _logged_links field of PackageFinder since this requires an update in a PackageFinder class

Copy link
Member

@uranusjr uranusjr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The general approach looks good to me! There are some coding style improvements I would like to se though (commented inline).

Regarding _logged_links, the easiest solution is probably to make get_install_candidate_static return Tuple[Optional[InstallationCandidate], Set[Link]] instead, where the second member is the newly logged links, and update it back into PackageFinder._logged_links after calling.

BTW, static methods do not need to be called by their class name, and can still be called with self. This will make the code (and the generated diff) cleaner.

Returns elements of links in order, non-egg links first, egg links
second, while eliminating duplicates
"""
return PackageFinder._sort_links_static(links)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we still need this function?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, because this function is called in evaluate_links_static.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What I meant is, can we just call _sort_links_static directly there and get rid of this function? It does not do anything from what I can tell.

"""
link_evaluator = self.make_link_evaluator(project_name)

package_finder_tuple = self.get_state_as_tuple(True)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It’d be much clearer to make True a keyword argument instead.

@functools.lru_cache(maxsize=None)
def find_best_candidate_static(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It’d be clearer to name this a private function (adding a leading _). There are other functions that can be similarly changed to signify they are only internal and not called elsewhere (which improves maintainability).

@QuentinLee5
Copy link
Contributor

We saw that _find_all_candidates_static updates the logged_links in the PackageFinder tuple, so we let _find_all_candidates_static return a tuple with the logged_links as second argument. We overwrite self._logged_links with this, since logged_links should contain all links from self._logged_links with additional links added by _find_all_candidates_static

@Martinn1996 Martinn1996 changed the title [WIP] Extract logic from find_best_candidate and find_all_candidate to static methods Extract logic from find_best_candidate and find_all_candidate to static methods Mar 22, 2021
@Martinn1996 Martinn1996 requested a review from uranusjr March 26, 2021 15:17
@Martinn1996 Martinn1996 marked this pull request as ready for review April 2, 2021 09:14
@BrownTruck
Copy link
Contributor

Hello!

I am an automated bot and I have noticed that this pull request is not currently able to be merged. If you are able to either merge the master branch into this pull request or rebase this pull request against master then it will be eligible for code review and hopefully merging!

@BrownTruck BrownTruck added the needs rebase or merge PR has conflicts with current master label Apr 15, 2021
@Martinn1996 Martinn1996 force-pushed the 9084-Refactor-PackageFinder branch from c596437 to 28b08a0 Compare April 24, 2021 14:02
@pypa-bot pypa-bot removed the needs rebase or merge PR has conflicts with current master label Apr 24, 2021
@github-actions github-actions bot added the needs rebase or merge PR has conflicts with current master label Aug 15, 2021
@pradyunsg pradyunsg added the S: awaiting response Waiting for a response/more information label Sep 18, 2021
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Nov 27, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
needs rebase or merge PR has conflicts with current master S: awaiting response Waiting for a response/more information
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Refactor PackageFinder caching mechanism
6 participants