torchtune training fails to validate dataset #1849

booxter · 2025-03-31T21:31:10Z

🐛 Describe the bug

When I try to use torchtune for post-training, it no longer works and fails with:

AttributeError: 'DatasetWithACL' object has no attribute 'dataset_schema'

This happens during dataset validation.

Error executing endpoint route='/v1/post-training/supervised-fine-tune' method='post'
Traceback (most recent call last):
  File "/home/ec2-user/src/llama-stack/schedule/llama_stack/distribution/server/server.py", line 201, in endpoint
    return await maybe_await(value)
           ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ec2-user/src/llama-stack/schedule/llama_stack/distribution/server/server.py", line 161, in maybe_await
    return await value
           ^^^^^^^^^^^
  File "/home/ec2-user/src/llama-stack/schedule/llama_stack/providers/inline/post_training/torchtune/post_training.py", line 89, in supervised_fine_tune
    await recipe.setup()
  File "/home/ec2-user/src/llama-stack/schedule/llama_stack/providers/inline/post_training/torchtune/recipes/lora_finetuning_single_device.py", line 198, in setup
    self._training_sampler, self._training_dataloader = await self._setup_data(
                                                        ^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ec2-user/src/llama-stack/schedule/llama_stack/providers/inline/post_training/torchtune/recipes/lora_finetuning_single_device.py", line 342, in _setup_data
    await validate_input_dataset_schema(
  File "/home/ec2-user/src/llama-stack/schedule/llama_stack/providers/inline/post_training/common/validator.py", line 50, in validate_input_dataset_schema
    if not dataset_def.dataset_schema or len(dataset_def.dataset_schema) == 0:
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ec2-user/src/llama-stack/schedule/venv/lib64/python3.11/site-packages/pydantic/main.py", line 984, in __getattr__
    raise AttributeError(f'{type(self).__name__!r} object has no attribute {item!r}')
AttributeError: 'DatasetWithACL' object has no attribute 'dataset_schema'
severity=<LogSeverity.ERROR: 'error'>

This happens because the datasets API was changed and the dataset_schema field was removed in #1573, while the post-training validator still reads that field.
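To illustrate the failure mode, here is a minimal sketch of a defensive lookup. It is not the actual fix: DatasetStandIn and get_dataset_schema are hypothetical names, and a plain dataclass stands in for the real pydantic model. The proper fix depends on how the new datasets API exposes schema information, which this sketch does not assume.

```python
from dataclasses import dataclass
from typing import Any


@dataclass
class DatasetStandIn:
    """Hypothetical stand-in for the dataset object; it has no dataset_schema field."""

    identifier: str


def get_dataset_schema(dataset_def: Any) -> dict:
    """Return dataset_schema if present and non-empty, else raise a descriptive error.

    getattr() with a default swallows the AttributeError that a direct
    attribute access would raise on an object lacking the field.
    """
    schema = getattr(dataset_def, "dataset_schema", None)
    if not schema:
        raise ValueError(
            f"{type(dataset_def).__name__} has no usable 'dataset_schema'; "
            "the validator needs to be updated for the new datasets API"
        )
    return schema
```

With a guard like this, the request would fail with a message that points at the validator rather than at a missing attribute deep inside pydantic.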

booxter · 2025-03-31T21:31:37Z

@yanxi0830 FYI, post-training is broken after the datasets API change.

booxter added the bug (Something isn't working) label Mar 31, 2025

booxter mentioned this issue Apr 1, 2025

feat(api): define a more coherent jobs api across different flows #1772

Open
