
Feat: Onboard PlamoForCausalLM Architecture #351


Draft
wants to merge 13 commits into base: main

Conversation

quic-shagun
Contributor

  1. Add support for models based on the PlamoForCausalLM architecture.
  2. Tested with batch size > 1.
  3. As the model is not yet available in HF Transformers, ModuleMethodMapperTransform is used to map the methods.
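
A minimal sketch of the method-mapping idea in item 3, not the actual ModuleMethodMapperTransform implementation (its interface is not shown in this thread). It assumes the model classes come from remote code and cannot be imported directly, so modules are matched by class name and the replacement methods are bound per instance. `PlamoAttention`, `qeff_attention_forward`, `METHOD_MAP`, and `apply_method_map` are hypothetical names used only for illustration.

```python
import types


class PlamoAttention:
    """Stand-in for a class that would normally come from the model's remote code."""

    def forward(self, x):
        return f"original:{x}"


def qeff_attention_forward(self, x):
    """Hypothetical replacement forward method."""
    return f"patched:{x}"


# Assumed layout: class name -> {method name -> replacement function}.
METHOD_MAP = {"PlamoAttention": {"forward": qeff_attention_forward}}


def apply_method_map(modules, method_map):
    """Rebind mapped methods on every module whose class name is in the map.

    Returns True if at least one method was replaced.
    """
    patched = False
    for module in modules:
        replacements = method_map.get(type(module).__name__)
        if replacements is None:
            continue
        for name, func in replacements.items():
            # Bind the plain function to this instance so `self` resolves correctly.
            setattr(module, name, types.MethodType(func, module))
            patched = True
    return patched
```

Matching on the class name avoids importing the remote-code class, at the cost of silently skipping a module if its class is ever renamed upstream.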

@quic-amitraj quic-amitraj marked this pull request as draft April 11, 2025 04:39
@quic-shagun quic-shagun marked this pull request as ready for review April 11, 2025 08:12
@quic-amitraj
Contributor

Please refer to PR #373 and make the corresponding changes in the modeling file.

@quic-amitraj quic-amitraj self-requested a review April 22, 2025 05:47
@quic-amitraj quic-amitraj marked this pull request as draft April 22, 2025 06:11
@quic-shagun
Contributor Author

Please refer to PR #373 and make the corresponding changes in the modeling file.

Updated

@quic-rishinr
Contributor

@quic-shagun Do we have legal approval to merge this architecture?

@quic-akuruvil
Contributor

Please run the perplexity script (torch vs. AIC) and paste the results here, as is done for text-to-text models.

@quic-shagun
Contributor Author

quic-shagun commented Jun 3, 2025

Please run the perplexity script (torch vs. AIC) and paste the results here, as is done for text-to-text models.

Sample No:0 	 AVG_LOSS: 1.3908
2025-06-03 03:16:55,462 [INFO] E2E Sample Time: 51.6119s	 E2E TOKENS/S : 39.66
2025-06-03 03:16:55,675 [INFO] TORCH Perplexity: 4.01810598
2025-06-03 03:16:55,676 [INFO] TORCH Loss: 1.39081061
2025-06-03 03:16:55,676 [INFO] Total time for evaluation: 0.0498 hrs

Loading Dataset: wikitext-2-raw-v1
Loading Model From: None
Model Type Mentioned: torch
Samples for Inference: 1
Batch Size: 1
Context Length: 2048
Dataset Prompt Length: 1
Dataset Stride: 1024
Overall Loss: 1.3908106088638306
Perplexity: 4.018105983734131
Total time for evaluation: 0.049849354558520846 hrs

*******************************************************
Torch Original Perplexity: 4.018105983734131
Target Perplexity for FP16 Precision: 4.02212381362915
Target Perplexity for MXFP6/MXINT8 Precision: 4.058287143707275
*******************************************************
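
As a quick sanity check on the log above: perplexity is the exponential of the average cross-entropy loss, so the reported loss should reproduce the reported perplexity. The two target values look like the torch perplexity scaled up by about 0.1% (FP16) and 1% (MXFP6/MXINT8). Those factors are inferred from the printed numbers, not taken from the perplexity script, so treat them as an assumption.

```python
import math

# Values copied from the log above.
torch_loss = 1.3908106088638306
logged_ppl = 4.018105983734131

# Perplexity is exp(average cross-entropy loss).
torch_ppl = math.exp(torch_loss)

# Assumed tolerances, inferred from the logged targets (not from the script).
FP16_FACTOR = 1.001  # about 0.1% above the torch perplexity
MX_FACTOR = 1.01     # about 1% above the torch perplexity

fp16_target = torch_ppl * FP16_FACTOR
mx_target = torch_ppl * MX_FACTOR

print(f"recomputed perplexity: {torch_ppl:.6f} (logged: {logged_ppl:.6f})")
print(f"FP16 target:           {fp16_target:.6f}")
print(f"MXFP6/MXINT8 target:   {mx_target:.6f}")
```

Small differences in the last digits relative to the log are expected if the script computes these values in single precision.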

@quic-rishinr quic-rishinr marked this pull request as ready for review June 9, 2025 04:20
Contributor

@quic-amitraj quic-amitraj left a comment


Remove all training code and any unneeded methods from the classes.

num_key_value_heads: Optional[int] = None,
max_position_embeddings: int = 2048,
initializer_range: float = 0.02,
rms_norm_eps: float = 1e-6,
Contributor


Are you doing anything new here? If not, there is no need for this.

Contributor Author


The config is being used.

)


def _rotate_half(x: torch.Tensor) -> torch.Tensor:
Contributor


Please remove

Contributor Author


This is being used in the attention block

@quic-amitraj quic-amitraj marked this pull request as draft June 10, 2025 03:54
@quic-shagun
Contributor Author

Remove all training code and any unneeded methods from the classes.

Done

4 participants