Skip to content

Simple LRU garbage collection for interned values #839

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 3 commits into
base: master
Choose a base branch
from

Conversation

ibraheemdev
Copy link
Contributor

@ibraheemdev ibraheemdev commented Apr 30, 2025

Basic implementation of #597.

The garbage collection strategy used here is very simple.

  • Slots that have not been accessed in the last 3 active revisions are always reused, and nothing else. Active revisions are those in which interned values were actually read to avoid counting empty revisions.
  • Slots are never swept proactively, though this would be simple to add as an option.
  • We also cannot collect low durability values, which would require &mut db. This could be supported with the above.

This is by no means a perfect strategy, but we can always tune it as needed once the infrastructure is in place. We may also want to expose the N active revisions constant as a configuration option in the #[interned] macro (it seems somewhat related to durability).

The LRU list is sharded by hash, very similar to dashmap. There seems to be no performance difference from the previous implementation in multi-threaded ty benchmarks (though there was a significant regression without sharding). There is also no significant performance difference in single-threaded benchmarks thanks to #864, which lets us avoid additional read dependencies.

My main concern is that this change adds 3 word-sized fields to interned values: an Id and two pointers for the intrusive LRU list. This seems unavoidable to implement garbage collection, but still something to note.

Copy link

netlify bot commented Apr 30, 2025

Deploy Preview for salsa-rs canceled.

Name Link
🔨 Latest commit 286c167
🔍 Latest deploy log https://app.netlify.com/projects/salsa-rs/deploys/683642844850650008c477c2

Copy link

codspeed-hq bot commented Apr 30, 2025

CodSpeed Performance Report

Merging #839 will not alter performance

Comparing ibraheemdev:interned-gc (286c167) with master (f9a9bb2)

Summary

✅ 12 untouched benchmarks

@MichaReiser
Copy link
Contributor

I'm seeing a lot of panics in red-knot, mostly Data was not interned in the latest revision for its durability. I'm not sure exactly why this is happening, but I suspect it's related to durabilities.

Interesting. I suspect this is new but I'm investigating a similar issue related to interned in cycles

@MichaReiser
Copy link
Contributor

MichaReiser commented May 1, 2025

That makes me wonder if your PR might help to come up with simpler repros for the fixpoint issues that we're seeing. I should give this a try tomorrow.

On the perf side. You could try using the knot_benchmark script in the ruff repository

@MichaReiser
Copy link
Contributor

MichaReiser commented May 1, 2025

I quickly tried the branch in Red Knot but I also got new panics that we didn't see before. That makes me suspect that some revision tracking might be off in this PR?

What I like to do to debug those issues is to compare the tracing output and see where things go different

@ibraheemdev
Copy link
Contributor Author

I think the issue is unrelated to durabilities but about interned query arguments. If a query takes an interned struct as an argument then an older memo may be incorrectly verified after the slot has been reused, but the IDs still match. I think we have to check that the memo and first_interned_at revisions match to avoid this, but I'm not exactly sure where that would be.

@MichaReiser
Copy link
Contributor

I think I understand. What you're saying is that query(interned) may be checked for changes in a later revision where interned has been collected.

I think will need something similar to what we have in tracked structs where salsa informs other ingredients that the memo was collected so that they can free resources as well if necessary:

https://github.com/MichaReiser/salsa/blob/953dc34ce5cfd7fca6ad040241c3cbebc11830e7/src/tracked_struct.rs#L597-L655

The second part in tracked struct that protects against this is that we separately track created_at and updated_at. maybe_changed_after can then compare created_at to see if this struct was recrated since rev_before.

@ibraheemdev ibraheemdev force-pushed the interned-gc branch 3 times, most recently from 073a988 to 7f89c28 Compare May 2, 2025 21:19
@ibraheemdev ibraheemdev force-pushed the interned-gc branch 2 times, most recently from 3ac564d to a232751 Compare May 9, 2025 22:06
@ibraheemdev
Copy link
Contributor Author

I updated the PR to add read dependencies and clear memos on reuse (I believe we will need to defer reclamation of the memos), but there are a still some ty tests failing that I'm unsure about.

@ibraheemdev ibraheemdev force-pushed the interned-gc branch 2 times, most recently from 39c219b to bcb023c Compare May 15, 2025 05:13
@ibraheemdev ibraheemdev force-pushed the interned-gc branch 2 times, most recently from e366c0e to 5e693d9 Compare May 22, 2025 14:18
@ibraheemdev
Copy link
Contributor Author

ibraheemdev commented May 22, 2025

Hmm I'm not sure we can remove the read dependency on creation with generational IDs, we still need last_interned_at to be updated for any dependent queries. Maybe it would be cheaper to update that field directly when an interned value is read?

@MichaReiser
Copy link
Contributor

Hmm I'm not sure we can remove the read dependency on creation with generational IDs, we still need last_interned_at to be updated for any dependent queries. Maybe it would be cheaper to update that field directly when an interned value is read?

No, I don't think we can remove it from the writing query. We need the dependency to ensure the interned value gets re-created when it was claimed previously

@ibraheemdev ibraheemdev force-pushed the interned-gc branch 3 times, most recently from 8fc9557 to 54b6bba Compare May 27, 2025 22:13
@ibraheemdev ibraheemdev marked this pull request as ready for review May 27, 2025 22:42
@ibraheemdev
Copy link
Contributor Author

I updated the PR description with the new changes, this should be ready for review.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants