push_to_hub() for Llama 3.1 8B doesn't save lm_head.weight tensor #37303
Comments
If the weights are tied, we don't save lm_head. When loading the model, the lm_head is simply re-tied to the input embeddings, so nothing is lost.
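For readers who want to check whether this applies to their checkpoint, here is a minimal editorial sketch (not from the original thread) that reads the tie_word_embeddings flag and verifies whether the output head actually shares storage with the input embeddings; the model id is just an example.

```python
import torch
import transformers

model_id = "meta-llama/Llama-3.1-8B"  # example checkpoint; any causal-LM id works

# The config flag tells you whether save_pretrained() will skip the head.
config = transformers.AutoConfig.from_pretrained(model_id)
print("tie_word_embeddings:", config.tie_word_embeddings)  # False here, per the discussion below

# When weights are tied, the output head and the input embeddings share one tensor,
# so only a single copy is written and from_pretrained() re-ties them on load.
model = transformers.AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)
print(model.get_output_embeddings().weight is model.get_input_embeddings().weight)
```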
Hi @wizeng23! Just as @zucchini-nlp mentioned, tied weights are sometimes the reason that lm_head cannot be found in a saved checkpoint. However, for your issue, tied weights are not the cause. You are loading the Llama 3.1 8B model with AutoModel, which returns a LlamaModel that has no lm_head, rather than with AutoModelForCausalLM. The script below demonstrates the difference:

import os
import torch
import transformers
model_path = "meta-llama/Llama-3.1-8B"
llama_model = transformers.AutoModel.from_pretrained(model_path, torch_dtype=torch.bfloat16)
# check if lm_head is in the named_parameters of LlamaModel
print(any("lm_head" in name for name, _ in llama_model.named_parameters()))
print(type(llama_model))
# save LlamaModel and check the size of the safetensors
llama_model.save_pretrained('./ckpt/llama_model')
print('model-00004-of-00004.safetensors size: {:.2e} bytes'.format(
os.path.getsize('./ckpt/llama_model/model-00004-of-00004.safetensors')
))
llama_for_causal_lm = transformers.AutoModelForCausalLM.from_pretrained(model_path, torch_dtype=torch.bfloat16)
# check if lm_head is in the named_parameters of LlamaForCausalLM
print(any("lm_head" in name for name, _ in llama_for_causal_lm.named_parameters()))
print([name for name, _ in llama_for_causal_lm.named_parameters() if "lm_head" in name])
# check if lm_head is in the state_dict of LlamaForCausalLM
print(any("lm_head" in key for key in llama_for_causal_lm.state_dict().keys()))
print([key for key in llama_for_causal_lm.state_dict().keys() if "lm_head" in key])
# save LlamaForCausalLM and check the size of the safetensors
llama_for_causal_lm.save_pretrained('./ckpt/llama_for_causal_lm')
print('model-00004-of-00004.safetensors size: {:.2e} bytes'.format(
os.path.getsize('./ckpt/llama_for_causal_lm/model-00004-of-00004.safetensors')
))
# load LlamaModel and LlamaForCausalLM from the save path
llama_model = transformers.AutoModel.from_pretrained('./ckpt/llama_model', torch_dtype=torch.bfloat16)
llama_for_causal_lm = transformers.AutoModelForCausalLM.from_pretrained('./ckpt/llama_model', torch_dtype=torch.bfloat16)

Expected results: the AutoModel path gives a LlamaModel, which has no lm_head, so the final shard it saves is much smaller; the AutoModelForCausalLM path gives a LlamaForCausalLM, whose lm_head.weight shows up in both named_parameters and the state_dict and is written to the checkpoint.
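As a follow-up check (an editorial addition, not part of the original comment), one can inspect the model.safetensors.index.json that save_pretrained() writes for sharded checkpoints and see whether lm_head.weight made it into the weight map; the paths assume the save directories used in the script above.

```python
import json
import os

def has_lm_head(save_dir):
    # For sharded checkpoints, save_pretrained() writes model.safetensors.index.json,
    # whose "weight_map" maps every saved tensor name to the shard file holding it.
    with open(os.path.join(save_dir, "model.safetensors.index.json")) as f:
        index = json.load(f)
    return any(name.startswith("lm_head") for name in index["weight_map"])

print(has_lm_head("./ckpt/llama_model"))          # expected False: LlamaModel has no lm_head
print(has_lm_head("./ckpt/llama_for_causal_lm"))  # expected True: lm_head.weight is saved
```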
Thanks for your answer @Zephyr271828! To summarize, this is user error on my part, and I should be using AutoModelForCausalLM rather than AutoModel.
Glad it helps!
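For anyone who hits the same problem, a minimal editorial sketch of the corrected flow (not code from the thread; the target repo id is a placeholder) looks like this:

```python
import torch
import transformers

model_id = "meta-llama/Llama-3.1-8B"

# Load the full causal-LM wrapper so lm_head is part of the model ...
model = transformers.AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)

# ... then saving or pushing includes lm_head.weight in the checkpoint.
model.save_pretrained("./ckpt/llama_for_causal_lm")
model.push_to_hub("your-username/Llama-3.1-8B-copy")  # placeholder repo id
```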
System Info
transformers version: 4.49.0

Who can help?
@ArthurZucker
Information

Tasks
An officially supported task in the examples folder (such as GLUE/SQuAD, ...)

Reproduction
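The reproduction snippet from the original report is not preserved in this capture. Based on the discussion above, a rough editorial sketch of the reported flow (loading with AutoModel and pushing the result) might have looked something like this; treat it as an assumption, not the reporter's exact code.

```python
import torch
import transformers

# Loading with AutoModel yields a LlamaModel, which has no lm_head ...
model = transformers.AutoModel.from_pretrained("meta-llama/Llama-3.1-8B", torch_dtype=torch.bfloat16)

# ... so the pushed checkpoint ends up missing lm_head.weight entirely.
model.push_to_hub("wizeng23/Llama-test")  # repo name taken from the link in the report
```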
Expected behavior
I'd expect the model weights to be completely unchanged when saving the model. However, it seems lm_head.weight is not saved at all. model-00004-of-00004.safetensors for Llama 3.1 8B is 1.17 GB, while in the saved model it's 117 MB: https://huggingface.co/wizeng23/Llama-test/tree/main. I checked the saved tensor file, and the only difference is the missing lm_head tensor (shape [128256, 4096]); that's about 525M parameters, roughly 1.05 GB in bfloat16, which fully accounts for the missing size.
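To make the size accounting concrete, here is a small editorial sketch (not from the report) that lists the tensors stored in the final shard of each locally saved checkpoint and spells out the byte math; the paths assume the save directories from the script earlier in the thread.

```python
from safetensors import safe_open

# Final shard of the re-saved LlamaForCausalLM vs. the re-saved LlamaModel
# (paths assume the save directories used earlier in this thread).
causal_shard = "./ckpt/llama_for_causal_lm/model-00004-of-00004.safetensors"
base_shard = "./ckpt/llama_model/model-00004-of-00004.safetensors"

with safe_open(causal_shard, framework="pt") as f:
    print(sorted(f.keys()))  # should include "lm_head.weight"
with safe_open(base_shard, framework="pt") as f:
    print(sorted(f.keys()))  # no "lm_head.*" entries expected

# The missing tensor has shape [128256, 4096]; at 2 bytes per bfloat16 parameter,
# 128256 * 4096 * 2 ≈ 1.05e9 bytes, which matches the ~1.05 GB gap in shard size.
print("missing bytes ≈ {:.2e}".format(128256 * 4096 * 2))
```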