Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-51711][ML][CONNECT] Memory based MLCache eviction policy #50530

Closed
wants to merge 2 commits into from

Conversation

xi-db
Copy link
Contributor

@xi-db xi-db commented Apr 7, 2025

What changes were proposed in this pull request?

Currently the ML Cache is limited by the number of cache entries (100 entries at this time), but it is not ideal because model size varies.

In this PR, we are updating the MLCache model eviction policy to be memory based, i.e. to evict old models if the total size of models is greater than a limit.

Besides, two new internal Spark confs are introduced:

  • spark.connect.session.connectML.mlCache.maxSize: Maximum size of the MLCache per session. The cache will evict the least recently used models if the size exceeds this limit.
  • spark.connect.session.connectML.mlCache.timeout: Timeout of models in MLCache. Models will be evicted from the cache if they are not used for this amount of time.

Why are the changes needed?

This improve the memory management of MLCache.

Does this PR introduce any user-facing change?

No.

How was this patch tested?

New test and existing tests.

Was this patch authored or co-authored using generative AI tooling?

No.

@xi-db
Copy link
Contributor Author

xi-db commented Apr 7, 2025

Hi @WeichenXu123 , could you review this PR of changing entry count based MLCache eviction policy to memory based?

Copy link
Contributor

@WeichenXu123 WeichenXu123 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

Copy link
Contributor

@zhengruifeng zhengruifeng left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

let's make this jira a subtask of https://issues.apache.org/jira/browse/SPARK-51236

buildConf("spark.connect.session.connectML.mlCache.maxSize")
.doc("Maximum size of the MLCache per session. The cache will evict the least recently" +
"used models if the size exceeds this limit. The size is in bytes.")
.version("4.0.0")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
.version("4.0.0")
.version("4.1.0")

I think this PR doesn't need to be included in 4.0.0

buildConf("spark.connect.session.connectML.mlCache.timeout")
.doc("Timeout of models in MLCache. Models will be evicted from the cache if they are not " +
"used for this amount of time. The timeout is in minutes.")
.version("4.0.0")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
.version("4.0.0")
.version("4.1.0")

Copy link
Contributor

@WeichenXu123 WeichenXu123 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

@zhengruifeng
Copy link
Contributor

merged to master

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants