Add project on Clad LLM Training #282

Merged 1 commit on Mar 11, 2025
41 changes: 39 additions & 2 deletions _data/openprojectlist.yml
@@ -80,8 +80,45 @@
* Extended: To be able to execute on GPU using CUDA or OpenMP
* Optional: Extend the magics for the wasm use case (xeus-cpp-lite)
* Present the work at the relevant meetings and conferences

- name: "Integrate Clad to PyTorch and compare the gradient execution times"

- name: "Enhancing LLM Training with Clad for efficient differentiation"
description: |
This project aims to leverage Clad, an automatic differentiation (AD)
plugin for Clang, to optimize large language model (LLM) training primarily
in C++. Automatic differentiation is a crucial component of deep learning
training, enabling efficient computation of gradients for optimization
algorithms such as stochastic gradient descent (SGD). Most modern LLM
frameworks are built on Python-based ecosystems, whose dependence on
interpreted code and dynamic computation graphs can introduce performance
bottlenecks. By integrating Clad into C++-based deep learning pipelines,
we can enable high-performance differentiation at the compiler level,
reducing computational overhead and improving memory efficiency. This will
allow developers to build more optimized training workflows without
sacrificing flexibility or precision.

Beyond performance improvements, integrating Clad with LLM training in C++
opens new possibilities for deploying AI models in resource-constrained
environments, such as embedded systems and HPC clusters, where minimizing
memory footprint and maximizing computational efficiency are critical.
Additionally, this work will bridge the gap between modern deep learning
research and traditional scientific computing by providing a more robust
and scalable AD solution for physics-informed machine learning models. By
optimizing the differentiation process at the compiler level, this project
has the potential to enhance both research and production-level AI
applications, aligning with compiler-research.org's broader goal of
advancing computational techniques for scientific discovery.

tasks: |
* Develop a simplified LLM setup in C++
* Apply Clad to compute gradients for selected layers and loss functions (a minimal usage sketch follows this list)
* Enhance Clad to support these operations if necessary, and prepare performance benchmarks
* Scale up the LLM complexity to cover larger projects such as llama
* Iterate on bug fixes and benchmarks
* Develop tests to ensure correctness, numerical stability, and efficiency
* Document the approach, implementation details, and performance gains
* Present progress and findings at relevant meetings and conferences
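
As a rough illustration of the workflow behind these tasks, here is a minimal sketch of applying Clad to a toy squared-error loss in C++. The loss function, parameter values, and expected gradients are illustrative assumptions, not part of the proposal; only `clad::gradient` and `CladFunction::execute` are Clad's documented entry points, and the build must load the Clad plugin into Clang.

```cpp
// Minimal sketch: differentiating a toy squared-error "loss" with Clad.
// Build with Clang and the Clad plugin enabled (e.g. -fplugin pointing at
// the Clad shared library); the toy function and values below are illustrative.
#include "clad/Differentiator/Differentiator.h"
#include <cstdio>

// Single-weight prediction with squared-error loss: (w*x + b - y)^2.
double loss(double w, double b, double x, double y) {
  double pred = w * x + b;
  double diff = pred - y;
  return diff * diff;
}

int main() {
  // Ask Clad to generate the reverse-mode gradient of `loss`
  // with respect to w and b at compile time.
  auto dloss = clad::gradient(loss, "w, b");

  double dw = 0.0, db = 0.0;
  // Evaluate the gradient at w=0.5, b=0.1 for the sample (x=2, y=1).
  dloss.execute(0.5, 0.1, 2.0, 1.0, &dw, &db);

  // Expected: dL/dw = 2*(w*x + b - y)*x = 0.4, dL/db = 2*(w*x + b - y) = 0.2.
  std::printf("dL/dw = %f, dL/db = %f\n", dw, db);
  return 0;
}
```

In the project itself, the same pattern would be applied to the gradients of individual layers and loss functions of the simplified C++ LLM, with hand-written or numerical derivatives used as a correctness baseline for the benchmarks.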

- name: "Integrate Clad in PyTorch and compare the gradient execution times"
description: |
PyTorch is a popular machine learning framework that includes its own
automatic differentiation engine, while Clad is a Clang plugin for