push_to_hub() for Llama 3.1 8B doesn't save lm_head.weight tensor #37303

Closed
wizeng23 opened this issue Apr 5, 2025 · 4 comments

wizeng23 commented Apr 5, 2025

System Info

  • transformers version: 4.49.0
  • Platform: Linux-6.8.0-1015-gcp-x86_64-with-glibc2.35
  • Python version: 3.10.13
  • Huggingface_hub version: 0.30.1
  • Safetensors version: 0.5.3
  • Accelerate version: 1.2.1
  • Accelerate config: not found
  • DeepSpeed version: not installed
  • PyTorch version (GPU?): 2.5.1+cu124 (True)
  • Tensorflow version (GPU?): not installed (NA)
  • Flax version (CPU?/GPU?/TPU?): not installed (NA)
  • Jax version: not installed
  • JaxLib version: not installed
  • Using distributed or parallel set-up in script?:
  • Using GPU in script?:
  • GPU type: NVIDIA A100-SXM4-40GB

Who can help?

@ArthurZucker

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • My own task or dataset (give details below)

Reproduction

import torch
import transformers
model = transformers.AutoModel.from_pretrained("meta-llama/Llama-3.1-8B", torch_dtype=torch.bfloat16)
model.push_to_hub('wizeng23/Llama-test')
tokenizer = transformers.AutoTokenizer.from_pretrained("meta-llama/Llama-3.1-8B")
tokenizer.push_to_hub('wizeng23/Llama-test')

Expected behavior

I'd expect the model weights to be completely unchanged when saving the model. However, it seems lm_head.weight is not saved at all. model-00004-of-00004.safetensors for Llama 3.1 8B is 1.17GB, while in the saved model it's 117MB: https://huggingface.co/wizeng23/Llama-test/tree/main. I checked the saved tensor file, and the only difference is the missing lm_head tensor (shape [128256, 4096]); that's roughly 525M params, which seems to fully account for the missing size.
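
(A quick back-of-the-envelope check of the missing tensor's size, added for reference; the numbers come from the shape quoted above, not from re-running the save.)

# Rough size of the missing lm_head tensor in bfloat16 (2 bytes per parameter).
vocab_size, hidden_size = 128256, 4096
num_params = vocab_size * hidden_size   # 525,336,576 parameters (~525M)
size_bytes = num_params * 2             # bfloat16
print(f"{num_params:,} params, ~{size_bytes / 1e9:.2f} GB")  # ~1.05 GB, i.e. roughly 1.17GB - 117MB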

@wizeng23 wizeng23 added the bug label Apr 5, 2025
@zucchini-nlp
Member

If the weights are tied, we don't save the lm_head. When loading the model, the embed_tokens weight is used for both the embedding and the head. Does loading the model back raise warnings like "Some weights are not initialized from ckpt"?
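
(A minimal sketch of how to check whether the weights are actually tied, assuming the standard transformers API; not part of the original comment.)

# Cheap check: does the checkpoint config declare tied word embeddings?
import transformers

config = transformers.AutoConfig.from_pretrained("meta-llama/Llama-3.1-8B")
print(config.tie_word_embeddings)  # expected to be False for Llama 3.1 8B, so tying shouldn't drop the head here

# With the causal-LM class loaded, tied weights would share the same storage:
model = transformers.AutoModelForCausalLM.from_pretrained("meta-llama/Llama-3.1-8B")
tied = model.get_input_embeddings().weight.data_ptr() == model.get_output_embeddings().weight.data_ptr()
print(tied)  # True only if embed_tokens and lm_head point at the same tensor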

@Zephyr271828
Contributor

Hi @wizeng23! As @zucchini-nlp mentioned, tied weights are sometimes the reason lm_head cannot be found in model.state_dict().keys(). You may check this comment.

However, for your issue, tied weights are not the cause. You are loading the Llama 3.1 8B model with AutoModel.from_pretrained. As a result, the type of your model is <class 'transformers.models.llama.modeling_llama.LlamaModel'>, which does not have an lm_head. You can verify this with the following code:

import os
import torch
import transformers

model_path = "meta-llama/Llama-3.1-8B"

llama_model = transformers.AutoModel.from_pretrained(model_path, torch_dtype=torch.bfloat16)

# check if lm_head is in the named_parameters of LlamaModel
print(any("lm_head" in name for name, _ in llama_model.named_parameters()))
print(type(llama_model))

# save LlamaModel and check the size of the safetensors
llama_model.save_pretrained('./ckpt/llama_model')
print('model-00004-of-00004.safetensors size: {:.2e} bytes'.format(
    os.path.getsize('./ckpt/llama_model/model-00004-of-00004.safetensors')
))

llama_for_causal_lm = transformers.AutoModelForCausalLM.from_pretrained(model_path, torch_dtype=torch.bfloat16)

# check if lm_head is in the named_parameters of LlamaForCausalLM
print(any("lm_head" in name for name, _ in llama_for_causal_lm.named_parameters()))
print([name for name, _ in llama_for_causal_lm.named_parameters() if "lm_head" in name])

# check if lm_head is in the state_dict of LlamaForCausalLM
print(any("lm_head" in key for key in llama_for_causal_lm.state_dict().keys()))
print([key for key in llama_for_causal_lm.state_dict().keys() if "lm_head" in key])

# save LlamaForCausalLM and check the size of the safetensors
llama_for_causal_lm.save_pretrained('./ckpt/llama_for_causal_lm')
print('model-00004-of-00004.safetensors size: {:.2e} bytes'.format(
    os.path.getsize('./ckpt/llama_for_causal_lm/model-00004-of-00004.safetensors')
))

# load both classes back from the LlamaModel checkpoint; only LlamaForCausalLM should warn about lm_head
llama_model = transformers.AutoModel.from_pretrained('./ckpt/llama_model', torch_dtype=torch.bfloat16)
llama_for_causal_lm = transformers.AutoModelForCausalLM.from_pretrained('./ckpt/llama_model', torch_dtype=torch.bfloat16)

Expected Results

LlamaModel

  • size of model-00004-of-00004.safetensors is 117MB
  • does not raise a warning when loading from './ckpt/llama_model'

LlamaForCausalLM

  • size of model-00004-of-00004.safetensors is 1.17GB
  • raises a warning when loading from './ckpt/llama_model': Some weights of LlamaForCausalLM were not initialized from the model checkpoint at ./ckpt/llama_model and are newly initialized: ['lm_head.weight']

@wizeng23
Author

wizeng23 commented Apr 7, 2025

Thanks for your answer, @Zephyr271828! To summarize, this is user error on my part, and I should be using AutoModelForCausalLM instead of AutoModel.
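
(For completeness, a corrected version of the original reproduction; a sketch assuming the same target repo name as above.)

import torch
import transformers

# AutoModelForCausalLM loads LlamaForCausalLM, which includes lm_head.weight,
# so the tensor is part of the checkpoint pushed to the Hub.
model = transformers.AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-3.1-8B", torch_dtype=torch.bfloat16
)
model.push_to_hub('wizeng23/Llama-test')
tokenizer = transformers.AutoTokenizer.from_pretrained("meta-llama/Llama-3.1-8B")
tokenizer.push_to_hub('wizeng23/Llama-test')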

@wizeng23 wizeng23 closed this as completed Apr 7, 2025
@Zephyr271828
Contributor

Thanks for your answer, @Zephyr271828! To summarize, this is user error on my part, and I should be using AutoModelForCausalLM instead of AutoModel.

Glad it helps!
