[persist] Fix a bug in the projection optimization refactoring #32178

Merged: bkirwi merged 3 commits into MaterializeInc:main on Apr 15, 2025

bkirwi commented Apr 11, 2025

Two related bugs:

  • The new optimization path was creating a column for K=Row, not K=SourceData. This would never work, so the optimization wasn't kicking in.
  • The "faked" data didn't have a schema id, which caused issues at part decoding.

Motivation

https://github.com/MaterializeInc/database-issues/issues/9174

Tips for reviewer

There's some additional dedup and cleanup to be done here - I'll take that in a followup PR.

Checklist

  • This PR has adequate test coverage / QA involvement has been duly considered. (trigger-ci for additional test/nightly runs)
  • This PR has an associated up-to-date design doc, is a design doc (template), or is sufficiently small to not require a design.
  • If this PR evolves an existing $T ⇔ Proto$T mapping (possibly in a backwards-incompatible way), then it is tagged with a T-proto label.
  • If this PR will require changes to cloud orchestration or tests, there is a companion cloud PR to account for those changes that is tagged with the release-blocker label (example).
  • If this PR includes major user-facing behavior changes, I have pinged the relevant PM to schedule a changelog post.

bkirwi added 2 commits April 11, 2025 15:55
We don't actually need a schema id here, since we have the data
itself... using the id to cache the migration is just an optimization.
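
The commit message above can be illustrated with a minimal sketch. It assumes a cache of migrations keyed by schema id, where a part without an id simply recomputes the migration from the data it carries. All names here (`SchemaId`, `Migration`, `MigrationCache`, `compute_migration`) are hypothetical stand-ins for illustration, not the types or functions touched by this PR.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for a schema identifier.
#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
struct SchemaId(u64);

/// Hypothetical stand-in for a migration: the columns to keep when decoding.
#[derive(Clone, Debug, PartialEq)]
struct Migration {
    keep: Vec<String>,
}

/// Derive the migration directly from the part's own columns.
/// This needs only the data, never a schema id.
fn compute_migration(part_columns: &[&str], wanted: &[&str]) -> Migration {
    Migration {
        keep: part_columns
            .iter()
            .filter(|c| wanted.contains(*c))
            .map(|c| c.to_string())
            .collect(),
    }
}

/// Cache of migrations keyed by schema id (assumes one fixed `wanted` set per cache).
struct MigrationCache {
    by_id: HashMap<SchemaId, Migration>,
    /// Number of times a migration was actually computed.
    computed: usize,
}

impl MigrationCache {
    fn new() -> Self {
        MigrationCache { by_id: HashMap::new(), computed: 0 }
    }

    /// With an id, reuse a cached migration when one exists.
    /// Without an id, compute it from the data; the result is the same.
    fn migration_for(
        &mut self,
        id: Option<SchemaId>,
        part_columns: &[&str],
        wanted: &[&str],
    ) -> Migration {
        if let Some(id) = id {
            if let Some(cached) = self.by_id.get(&id) {
                return cached.clone();
            }
            self.computed += 1;
            let migration = compute_migration(part_columns, wanted);
            self.by_id.insert(id, migration.clone());
            migration
        } else {
            self.computed += 1;
            compute_migration(part_columns, wanted)
        }
    }
}
```

In this sketch a missing id costs only the extra computation; the decoded result does not change.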
bkirwi commented Apr 11, 2025

bkirwi marked this pull request as ready for review April 11, 2025 22:00
bkirwi requested review from a team as code owners April 11, 2025 22:00
bkirwi requested a review from ParkMyCar April 11, 2025 22:00
ParkMyCar left a comment

Nice!! Maybe also worth a backport to reinforce #32173?

Avoid changing behaviour for older parts... probably fine but you never
know.
bkirwi commented Apr 15, 2025

Reran the benchmarks here: https://buildkite.com/materialize/nightly/builds/11829#01963706-d465-4af9-815e-1555557b59d5

Looks like we're seeing the optimization kick in again as expected.

I've also added a final commit that restricts this optimization to structured-only parts, to limit the blast radius. (Previously, dual-written parts would fall back to decoding codec data if there was no schema... I'm not aware of any cases where that would give different results than the new path, but this way we don't have to worry about it.)
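
As a rough illustration of that gating, here is a minimal sketch in which the optimized path is taken only for structured-only parts and everything else keeps its existing decode behaviour. The enums and the `choose_decode_path` function are hypothetical names for illustration and are not taken from this PR's diff.

```rust
/// Hypothetical description of how a part's data was written.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum PartEncoding {
    /// Only codec-encoded data is present.
    CodecOnly,
    /// Both codec-encoded and structured data are present.
    DualWritten,
    /// Only structured (columnar) data is present.
    StructuredOnly,
}

/// Hypothetical choice of decode path for a part.
#[derive(Debug, PartialEq, Eq)]
enum DecodePath {
    /// The projection-optimized path.
    Optimized,
    /// The existing, unoptimized path.
    Regular,
}

/// Take the optimized path only for structured-only parts, so codec-only
/// and dual-written parts keep decoding exactly as they did before.
fn choose_decode_path(encoding: PartEncoding, projection_requested: bool) -> DecodePath {
    match (encoding, projection_requested) {
        (PartEncoding::StructuredOnly, true) => DecodePath::Optimized,
        _ => DecodePath::Regular,
    }
}
```

Gating on the encoding this way means older parts never reach the new code, which is the "blast radius" limit described above.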

My default bias is to not backport things... are we worried that #32173 may not fully cover us?

bkirwi merged commit 13e0fde into MaterializeInc:main on Apr 15, 2025
81 checks passed