Use generational identifiers for tracked structs #864
✅ Deploy Preview for salsa-rs canceled.
CodSpeed Performance Report: merging #864 will degrade performance by 15.98%.
This is huge! I'm leaning towards removing the |
57ec532 to 66bbb88
lgtm, merge whenever you'd like to?
I'd be interested to explore the memory overhead if we changed |
d980d6c to 5ce6735
@@ -152,6 +152,9 @@ fn revalidate_no_changes() {
"salsa_event(DidValidateMemoizedValue { database_key: read_value(Id(400)) })",
"salsa_event(DidReinternValue { key: query_d::interned_arguments(Id(800)), revision: R2 })",
"salsa_event(DidValidateMemoizedValue { database_key: query_d(Id(800)) })",
"salsa_event(DidValidateMemoizedValue { database_key: read_value(Id(401)) })",
Huh, this is interesting
I updated the PR to store Unfortunately it looks like the benchmarks don't like this change... |
src/tracked_struct.rs
Outdated
// the memos and return a new ID here as if we have allocated a new slot.

// SAFETY: We already set `updated_at` to `None` to acquire the lock.
self.delete_entity(zalsa, id, true);
I'm a little unsure about this but it seems correct?
5ce6735 to 4920a8a
Is there any perf difference on ty? (When running on codspeed)
4920a8a to 19d00eb
No performance difference on ty benchmarks (astral-sh/ruff#18226).
I plan to review this tomorrow. I'm okay with the regression given that it enables interned GC without a 50% incremental perf and memory regression.

If you haven't done so already, maybe take an hour or two to see if you can spot the source of the regression in a local recorded profile (take the benchmark that regresses the most).

This is a substantial change where I'd like to get at least a thumbs up from r-a too, given that the performance is now regressing on micro benchmarks. Cc @Veykril
I don't think there is a specific source, I think the regression is directly related to the size of |
Pack a generation into input IDs.

The generation of a tracked struct is incremented after it is reused, allowing us to avoid read dependencies. Generational IDs were originally meant for #839, as adding the necessary read dependency on interned structs that may be reused introduced a large (~50%) regression to ty's incremental performance.

This increases the size of `Id` from a `u32` to a `u64`. However, if the generation is restricted to `u16`, and ingredient indices are restricted to `u16`, this does not increase the size of `DatabaseKeyIndex`, so the memory usage effect is limited (~5% increase to ty's peak memory usage). However, this should allow us to implement garbage collection for interned values without significant performance concerns, so memory usage over time should benefit.

If the generation exceeds `u16::MAX`, we can fall back to adding read dependencies on tracked structs. An alternative would be to leak the slot, which would also allow us to remove the `created_at` field on tracked structs and may alleviate the memory usage concerns. This might be more feasible if the generation stole a few more bits from ingredient indices (as the number of ingredients is effectively static for a given salsa program).

This has a small (~4%) performance improvement on ty's benchmarks.
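
For readers unfamiliar with generational IDs, here is a minimal sketch of the packing idea described above. It is not salsa's actual implementation: `GenId`, `KeyIndex`, `next_generation`, and the exact bit layout are hypothetical names and choices made for illustration. It assumes a 32-bit slot index, a 16-bit generation, and a 16-bit ingredient index, which is one way the key could stay at 8 bytes.

```rust
/// Hypothetical generational ID (not salsa's real `Id`).
/// Assumed layout: bits 0..32 hold the slot index, bits 32..48 the generation.
#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
struct GenId(u64);

impl GenId {
    fn new(index: u32, generation: u16) -> Self {
        GenId((u64::from(generation) << 32) | u64::from(index))
    }

    /// Slot index: the low 32 bits.
    fn index(self) -> u32 {
        self.0 as u32
    }

    /// Generation: the 16 bits above the slot index.
    fn generation(self) -> u16 {
        (self.0 >> 32) as u16
    }

    /// ID for the same slot after one more reuse, or `None` once the
    /// generation would exceed `u16::MAX`. In that case the caller needs
    /// another strategy (for example read dependencies, or leaking the slot).
    fn next_generation(self) -> Option<Self> {
        self.generation()
            .checked_add(1)
            .map(|g| GenId::new(self.index(), g))
    }
}

/// Hypothetical key pairing an ID with an ingredient index.
/// 32 + 16 + 16 bits fit in 8 bytes.
#[allow(dead_code)]
#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
struct KeyIndex {
    index: u32,
    generation: u16,
    ingredient: u16,
}

fn main() {
    let id = GenId::new(400, 0);
    // Reusing the slot bumps the generation, so a stale ID no longer compares equal.
    let reused = id.next_generation().expect("generation has room");
    assert_eq!(reused.index(), id.index());
    assert_ne!(reused, id);

    let key = KeyIndex {
        index: reused.index(),
        generation: reused.generation(),
        ingredient: 3,
    };
    println!("{key:?} occupies {} bytes", std::mem::size_of::<KeyIndex>());
}
```

Because a reused slot yields a different ID, a query holding the old ID can be distinguished from one holding the new ID by equality alone, which is the property that makes the extra read dependency unnecessary in the common case.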