Refactor: Decouple Core Transformer Blocks #1852
base: main
Conversation
Force-pushed from c7e43e0 to 2e56b95
Thanks for refactoring @parambole. Can you explain the circular dependency you mentioned in the description?
Quant = quantizations.AqtQuantization
...
class DecoderLayer(nn.Module):
Is there a more descriptive file name that could be used here instead of 'blocks.py'? Maybe naming this file decoders.py? We would then need to create a separate encoders.py file for the VisionEncoder.
I agree with naming this decoders.py and moving class VisionEncoder to encoders.py.
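In case a sketch helps the discussion, the suggested split would look roughly like this (file names from the comments above; the contents shown are placeholders, not actual code):

```python
# Hypothetical layout under the suggested naming (placeholders only):
#
# MaxText/layers/decoders.py
#   class DecoderLayer(nn.Module): ...
#   class Decoder(nn.Module): ...
#
# MaxText/layers/encoders.py
#   class VisionEncoder(nn.Module): ...
```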
Could you select one model and run a quick perf test with the same configs, main branch vs. your branch? I see the functional tests are passing; just make sure the breakdown into separate files does not hurt performance.
TL;DR

What: This PR refactors the core DecoderLayer and its related components out of layers/models.py and into a new, foundational file: layers/blocks.py.

Why: To improve the overall code architecture and break potential circular dependencies. This is a necessary prerequisite for adding new, complex modules that also need access to these core building blocks.

How: By creating layers/blocks.py to house the fundamental Decoder and DecoderLayer classes. Higher-level files like models.py and other future modules now import these components from a single location.
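For illustration, the "single location" import in models.py (and future modules) would look roughly like this; the exact import path is an assumption based on the file layout described below:

```python
# layers/models.py -- hypothetical import after the refactor; DecoderLayer
# and Decoder are consumed here but no longer defined here.
from MaxText.layers.blocks import Decoder, DecoderLayer
```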
Detailed Description

This pull request introduces a structural refactoring to improve modularity and maintainability.
The primary change is the creation of MaxText/layers/blocks.py, which now serves as the source for the fundamental building blocks of the Transformer architecture, such as:

- DecoderLayer
- Decoder

Previously, these classes were located in layers/models.py, which created tight coupling; as we add more features, this coupling would lead to circular import dependencies. By decoupling these core components, we establish a clearer hierarchy in the codebase, where high-level modules can depend on these "building blocks" without depending on each other.
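For orientation, here is a minimal sketch of the kind of module blocks.py now centralizes. The class bodies, fields, and sublayer choices are illustrative assumptions, not the PR's actual implementation (the real MaxText modules take a config object, quantization settings, and so on):

```python
# MaxText/layers/blocks.py -- illustrative sketch only; everything below is
# an assumption about shape, not the PR's actual code.
from flax import linen as nn


class DecoderLayer(nn.Module):
  """One Transformer block: attention and MLP, each with a residual."""
  emb_dim: int
  num_heads: int

  @nn.compact
  def __call__(self, x):
    # Self-attention sublayer with pre-LayerNorm and a residual connection.
    h = nn.LayerNorm()(x)
    h = nn.SelfAttention(num_heads=self.num_heads)(h)
    x = x + h
    # MLP sublayer with pre-LayerNorm and a residual connection.
    h = nn.LayerNorm()(x)
    h = nn.Dense(4 * self.emb_dim)(h)
    h = nn.relu(h)
    h = nn.Dense(self.emb_dim)(h)
    return x + h


class Decoder(nn.Module):
  """A stack of DecoderLayer blocks; higher-level models import this."""
  num_layers: int
  emb_dim: int
  num_heads: int

  @nn.compact
  def __call__(self, x):
    for _ in range(self.num_layers):
      x = DecoderLayer(emb_dim=self.emb_dim, num_heads=self.num_heads)(x)
    return x
```

With this layout, models.py and any future module import Decoder and DecoderLayer from one place instead of from each other, which is what removes the potential import cycle.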
Checklist
Before submitting this PR, please make sure (put X in square brackets):