Skip to content

Refine performance (2nd) #1492

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 6 commits into
base: master
Choose a base branch
from

Conversation

XenoAmess
Copy link

@XenoAmess XenoAmess commented Jun 14, 2025

refine performance for demo at apache/maven#2481

including 3 parts:

  1. fix the bug about using MMap when value be collection, but the old codes didn't copy the collection, thus may cause appending to a list value in a DoneMMap, thus makes infinity long value, thus makes the run slow in super multi module project.
  2. BfDependencyCollector can share executorService between resolvers, to reduce the create/close/create/close of the executorService, which is expensive.
  3. intern some strings to make use of the new hash cache in String class in jdk.

Following this checklist to help us incorporate your
contribution quickly and easily:

  • Your pull request should address just one issue, without pulling in other changes.
  • Write a pull request description that is detailed enough to understand what the pull request does, how, and why.
  • Each commit in the pull request should have a meaningful subject line and body.
    Note that commits might be squashed by a maintainer on merge.
  • Write unit tests that match behavioral changes, where the tests fail if the changes to the runtime are not applied.
    This may not always be possible but is a best-practice.
  • Run mvn verify to make sure basic checks pass.
    A more thorough check will be performed on your pull request automatically.
  • You have run the integration tests successfully (mvn -Prun-its verify).

If your pull request is about ~20 lines of code you don't need to sign an
Individual Contributor License Agreement if you are unsure
please ask on the developers list.

To make clear that you license your contribution under
the Apache License Version 2.0, January 2004
you have to acknowledge this by using the following check-box.

@XenoAmess XenoAmess changed the title Refine performance Refine performance (2nd) Jun 14, 2025
@XenoAmess XenoAmess force-pushed the refine-performance branch from 51bcd9b to 7328df7 Compare June 14, 2025 19:58
fix the bug about using MMap when value be collection, but the old codes didn't copy the collection, thus may cause appending to a list value in a DoneMMap, thus makes infinity long value, thus makes the run slow in super multi module project.
@XenoAmess XenoAmess force-pushed the refine-performance branch from 7328df7 to f295e15 Compare June 14, 2025 19:59
@XenoAmess
Copy link
Author

@cstamas please have a look.
the most severe part is the first part.(MMap part)

*
* @since 1.9.0
*/
public static final int DEFAULT_THREADS = 5;
public static final int DEFAULT_THREADS = Runtime.getRuntime().availableProcessors();
Copy link
Member

@cstamas cstamas Jun 14, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

-1: we can run on monsters as well. This is not an "improvement" on alg, just "throwing more" on it. In other words, doing this may hit "remote end" as well, think of https://en.wikipedia.org/wiki/Tragedy_of_the_commons like Central is.

Copy link
Author

@XenoAmess XenoAmess Jun 14, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@cstamas we increase the number because we try to merge the executers. see changes in several lines under that in the same file.
before, each instance have one seperated executor,and thus have 2 problems

  1. when running with like T2C, I afraid there might be very many of instance in parallel, thus many small executors in parallel
  2. the start and stop of the small executors be time costing.(especially adding worker threads)

Still,if you don't like mixing idea for merging the executors of every instance, we can remove this part. I just don't think it good to start many small thread executors, and I still think a bigger single executor is better in this case.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

think we should use forkjoin pool by default, it is alsmot certainly created by a build these days and will avoid the overhead of custom pool with almost no benefit

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

think we should use forkjoin pool by default, it is alsmot certainly created by a build these days and will avoid the overhead of custom pool with almost no benefit

yes it would be far better than using many small pools.
But as for forkjoin pool, I have no idea why we should use forkjoin pool but not executer here. IMO fork join pool is use for works who have relationship/dependency, and the tasks here seems not quite dependent on each other..?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

https://issues.apache.org/jira/browse/MRESOLVER-283

Oh, I see #213 and the related comments...
Interesting.
Good, now I kind of getting interested in the fork-join-pool solution now.

@cstamas Is there any news since the original executor pr closed? should I start another pr(and split this from this pr) for that?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, nothing new since. It was just a faint idea, and did not move anywhere yet.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, nothing new since. It was just a faint idea, and did not move anywhere yet.

Um. Seems it is good enough a thing that worth my time. Would have a try when next weekends.
I agree on split it out of this pr, it is a bigger thing than my original thought.
Though I do agree it seems a better way than I provided in this pr.
Please focus on reviewing other part of this pr then.

@gnodet
Copy link
Contributor

gnodet commented Jun 18, 2025

I think the intern logic (apart from the context in trace requests) would be better placed inside Maven when parsing the POM files.
It should be fairly simple, as it's just a matter of passing a ContentTransformer as an argument to this call.
We should intern all enums (scopes, packaging) and at least groupId, artifactId, version. The models are kept in memory throughout the whole build anyway, so it makes sense to intern quite a few values.

@gnodet
Copy link
Contributor

gnodet commented Jun 18, 2025

I think the intern logic (apart from the context in trace requests) would be better placed inside Maven when parsing the POM files. It should be fairly simple, as it's just a matter of passing a ContentTransformer as an argument to this call. We should intern all enums (scopes, packaging) and at least groupId, artifactId, version. The models are kept in memory throughout the whole build anyway, so it makes sense to intern quite a few values.

I've raised apache/maven#2495. I was quite sure the work was done already, but after investigation, it had not been plugged in.

@XenoAmess
Copy link
Author

XenoAmess commented Jun 18, 2025

I think the intern logic (apart from the context in trace requests) would be better placed inside Maven when parsing the POM files. It should be fairly simple, as it's just a matter of passing a ContentTransformer as an argument to this call. We should intern all enums (scopes, packaging) and at least groupId, artifactId, version. The models are kept in memory throughout the whole build anyway, so it makes sense to intern quite a few values.

yes, I agree.
interning is more important in mordern jdk versions because they add a cache for the hash of the string, thus a same string object calling hash methods for several time would be far faster than different objects.
(and we use hashmap based maps quite a lot)

I've raised apache/maven#2495. I was quite sure the work was done already, but after investigation, it had not been plugged in.

glad to see it. so I will remove this part from this pr too.

@XenoAmess XenoAmess force-pushed the refine-performance branch from 252a8c2 to 82f1b17 Compare June 19, 2025 15:08
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants