
Use generational identifiers for tracked structs #864


Open · ibraheemdev wants to merge 3 commits into master from ibraheemdev:ibraheem/gen-ids

Conversation

@ibraheemdev (Contributor) commented May 16, 2025

Pack a generation into input IDs.

The generation of a tracked struct is incremented after it is reused, allowing us to avoid read dependencies. Generational IDs were originally meant for #839, as adding the necessary read dependency on interned structs that may be reused introduced a large (~50%) regression to ty's incremental performance.

This increases the size of Id from a u32 to a u64. However, if the generation is restricted to a u16 and ingredient indices are restricted to a u16, this does not increase the size of DatabaseKeyIndex, so the memory usage impact is limited (~5% increase to ty's peak memory usage). In exchange, this should allow us to implement garbage collection for interned values without significant performance concerns, so memory usage over time should benefit.
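To make the layout concrete, here is a minimal sketch of the packing described above. All names (`Id`, `PackedDatabaseKeyIndex`, `with_next_generation`) are illustrative stand-ins, not salsa's actual definitions.

```rust
// Minimal sketch of the packing described above; names are illustrative only.
#[derive(Copy, Clone, PartialEq, Eq, Hash, Debug)]
struct Id(u64);

impl Id {
    fn new(index: u32, generation: u32) -> Id {
        // Low 32 bits: slot index. High 32 bits: generation.
        Id(u64::from(index) | (u64::from(generation) << 32))
    }

    fn index(self) -> u32 {
        self.0 as u32
    }

    fn generation(self) -> u32 {
        (self.0 >> 32) as u32
    }

    /// When a tracked-struct slot is reused, the index stays the same and the
    /// generation is bumped, so stale references to the old value can be
    /// detected without recording a read dependency.
    fn with_next_generation(self) -> Id {
        Id::new(self.index(), self.generation() + 1)
    }
}

// If the generation and the ingredient index are both restricted to u16, a
// database key can remain 8 bytes despite the wider `Id`.
#[derive(Copy, Clone, PartialEq, Eq, Hash, Debug)]
struct PackedDatabaseKeyIndex {
    ingredient_index: u16,
    generation: u16,
    key_index: u32,
}

fn main() {
    let id = Id::new(42, 0).with_next_generation();
    assert_eq!((id.index(), id.generation()), (42, 1));
    assert_eq!(std::mem::size_of::<PackedDatabaseKeyIndex>(), 8);
}
```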

If the generation exceeds u16::MAX, we can fall back to adding read dependencies on tracked structs. An alternative would be to leak the slot, which would also allow us to remove the created_at field on tracked structs and may alleviate the memory usage concerns. This might be more feasible if the generation stole a few more bits from ingredient indices (as the number of ingredients is effectively static for a given salsa program).
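A hedged sketch of that fallback path, reusing the `Id` type from the sketch above; the `u16::MAX` cap and the `report_read` callback are placeholders for whatever salsa actually does, not its real API.

```rust
// Hedged sketch of the saturation fallback described above.
const MAX_PACKED_GENERATION: u32 = u16::MAX as u32;

fn reuse_slot<F: FnMut(Id)>(id: Id, mut report_read: F) -> Id {
    if id.generation() < MAX_PACKED_GENERATION {
        // Common case: bump the packed generation; no read dependency needed.
        id.with_next_generation()
    } else {
        // Saturated: fall back to recording a read dependency on the tracked
        // struct (or, per the alternative above, leak the slot and never
        // reuse it).
        report_read(id);
        id
    }
}

fn main() {
    let mut reads = Vec::new();
    let fresh = reuse_slot(Id::new(7, 2), |id| reads.push(id));
    assert_eq!((fresh.index(), fresh.generation()), (7, 3));
    assert!(reads.is_empty());
}
```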

This has a small (~4%) performance improvement on ty's benchmarks.


netlify bot commented May 16, 2025

Deploy Preview for salsa-rs canceled.

Latest commit: 19d00eb
Latest deploy log: https://app.netlify.com/projects/salsa-rs/deploys/682cbf1f736e72000800ab23


codspeed-hq bot commented May 16, 2025

CodSpeed Performance Report

Merging #864 will degrade performance by 15.98%

Comparing ibraheemdev:ibraheem/gen-ids (19d00eb) with master (2d4321e)

Summary

❌ 5 regressions
✅ 7 untouched benchmarks

⚠️ Please fix the performance issues or acknowledge them on CodSpeed.

Benchmarks breakdown

| Benchmark                 | BASE    | HEAD    | Change  |
| ------------------------- | ------- | ------- | ------- |
| accumulator               | 3.9 ms  | 4.4 ms  | -12.01% |
| amortized[Input]          | 3 µs    | 3.2 µs  | -5.64%  |
| amortized[InternedInput]  | 2.9 µs  | 3.1 µs  | -5.8%   |
| amortized[SupertypeInput] | 3.6 µs  | 3.8 µs  | -4.93%  |
| many_tracked_structs      | 33.8 µs | 40.2 µs | -15.98% |

@MichaReiser (Contributor)

> This has a small (~4%) performance improvement on ty's benchmarks.

This is huge! I'm leaning towards removing the created_at field and leaking memos. It's still an improvement over what we have today, and we can explore the right solution once we hit that limit (which could just as well be increasing the id size further).

@davidbarsky (Contributor) left a comment

lgtm, merge whenever you'd like to?

@MichaReiser (Contributor)

I'd be interested to explore the memory overhead if we changed DatabaseKeyIndex to store three separate u32s (ingredient, generation, and id), as that would, IMO, eliminate the need for a report_read fallback.
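For comparison, a quick sketch (with illustrative field names) of the suggested three-u32 layout and what it costs in size relative to the packed 8-byte form sketched earlier.

```rust
use std::mem::size_of;

// Sketch of the suggested alternative: three full u32 fields, so the packed
// generation never saturates and no fallback path is needed.
struct WideDatabaseKeyIndex {
    ingredient_index: u32,
    generation: u32,
    key_index: u32,
}

fn main() {
    // 12 bytes, versus 8 bytes for the u16/u16/u32 packing.
    assert_eq!(size_of::<WideDatabaseKeyIndex>(), 12);
}
```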

@@ -152,6 +152,9 @@ fn revalidate_no_changes() {
"salsa_event(DidValidateMemoizedValue { database_key: read_value(Id(400)) })",
"salsa_event(DidReinternValue { key: query_d::interned_arguments(Id(800)), revision: R2 })",
"salsa_event(DidValidateMemoizedValue { database_key: query_d(Id(800)) })",
"salsa_event(DidValidateMemoizedValue { database_key: read_value(Id(401)) })",
A contributor commented on the diff above:

Huh, this is interesting

@ibraheemdev (Contributor, Author)

I updated the PR to store DatabaseKeyIndex as a u32 triple. This didn't seem to have a noticeable effect on memory usage, and it allows us to remove the created_at field comfortably, which ends up being a net decrease in memory usage compared to the previous version.

Unfortunately it looks like the benchmarks don't like this change...

// the memos and return a new ID here as if we have allocated a new slot.

// SAFETY: We already set `updated_at` to `None` to acquire the lock.
self.delete_entity(zalsa, id, true);
@ibraheemdev (Author) commented on the diff above:

I'm a little unsure about this but it seems correct?

@MichaReiser (Contributor)

Is there any perf difference on ty? (When running on codspeed)

@ibraheemdev (Contributor, Author) commented May 20, 2025

No performance difference on ty benchmarks (astral-sh/ruff#18226).

@MichaReiser (Contributor)

I plan to review this tomorrow.

I'm okay with the regression given that it enables interned GC without a 50% incremental perf and memory regression. If you haven't done so already, maybe take an hour or two to see if you can spot the source of the regression in a local recorded profile (take the benchmark that regresses the most).

This is a substantial change where I'd like to get at least a thumbs up from r-a too, given that the performance is now regressing on micro benchmarks. Cc @Veykril

@ibraheemdev (Contributor, Author)

> I'm okay with the regression given that it enables interned GC without a 50% incremental perf and memory regression. If you haven't done so already, maybe take an hour or two to see if you can spot the source of the regression in a local recorded profile (take the benchmark that regresses the most).

I don't think there is a specific source; I think the regression is directly related to the size of DatabaseKeyIndex increasing (e.g. worse cache utilization, and it no longer fits into a standard register). It makes sense that the microbenchmarks are regressing; I'm not sure there's much we can do about it.
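A small illustration of that point, under the assumption that the key is passed and compared by value: the 8-byte packed form round-trips through a single u64 (one register, one 64-bit comparison), while a 12-byte triple cannot. Names here are illustrative, not salsa's.

```rust
// An 8-byte key can be treated as a single u64, so it fits in one
// general-purpose register and compares in a single 64-bit comparison; a
// 12-byte key needs two loads/compares and packs fewer keys per cache line.
#[derive(Copy, Clone, PartialEq, Eq)]
struct PackedKey {
    ingredient_index: u16,
    generation: u16,
    key_index: u32,
}

impl PackedKey {
    fn to_u64(self) -> u64 {
        (u64::from(self.ingredient_index) << 48)
            | (u64::from(self.generation) << 32)
            | u64::from(self.key_index)
    }
}

fn main() {
    let key = PackedKey { ingredient_index: 1, generation: 3, key_index: 42 };
    assert_eq!(key.to_u64(), 0x0001_0003_0000_002A);
}
```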
