Commit c56f4c7 (2 parents: 4575799 + 6cad70d)

Merge pull request #2 from NVIDIA/release/10.4

pull 10.4

155 files changed: +11543 additions, -9888 deletions


.clang-format

Lines changed: 1 addition & 1 deletion

@@ -74,7 +74,7 @@ SpacesInContainerLiterals: true
 SpacesInParentheses: false
 SpacesInSquareBrackets: false
 Standard: Cpp11
-StatementMacros: [API_ENTRY_TRY]
+StatementMacros: [API_ENTRY_TRY,TRT_TRY]
 TabWidth: 4
 UseTab: Never
 ...

CHANGELOG.md

Lines changed: 49 additions & 1 deletion

@@ -1,6 +1,54 @@
 # TensorRT OSS Release Changelog
 
-## 10.2.0 GA - 2024-07-10
+## 10.4.0 GA - 2024-09-11
+Key Features and Updates:
+
+- Demo changes
+  - Added [Stable Cascade](demo/Diffusion) pipeline.
+  - Enabled INT8 and FP8 quantization for Stable Diffusion v1.5, v2.0 and v2.1 pipelines.
+  - Enabled FP8 quantization for Stable Diffusion XL pipeline.
+- Sample changes
+  - Added a new Python sample, `aliased_io_plugin`, which demonstrates how in-place updates to plugin inputs can be achieved through I/O aliasing.
+- Plugin changes
+  - Migrated IPluginV2-descendent versions (a) of the following plugins to newer versions (b) which implement IPluginV3 (a->b):
+    - scatterElementsPlugin (1->2)
+    - skipLayerNormPlugin (1->5, 2->6, 3->7, 4->8)
+    - embLayerNormPlugin (2->4, 3->5)
+    - bertQKVToContextPlugin (1->4, 2->5, 3->6)
+  - Note
+    - The newer versions preserve the attributes and I/O of the corresponding older plugin versions.
+    - The older plugin versions are deprecated and will be removed in a future release.
+
+- Quickstart guide
+  - Updated the deploy_to_triton guide and removed legacy APIs.
+  - Removed legacy TF-TRT code as the project is no longer supported.
+  - Removed quantization_tutorial as pytorch_quantization has been deprecated. Check out https://github.com/NVIDIA/TensorRT-Model-Optimizer for the latest quantization support. Check [Stable Diffusion XL (Base/Turbo) and Stable Diffusion 1.5 Quantization with Model Optimizer](https://github.com/NVIDIA/TensorRT-Model-Optimizer/tree/main/diffusers/quantization) for integration with TensorRT.
+- Parser changes
+  - Added support for tensor `axes` for `Pad` operations.
+  - Added support for `BlackmanWindow`, `HammingWindow`, and `HannWindow` operations.
+  - Improved error handling in `IParserRefitter`.
+  - Fixed kernel shape inference in multi-input convolutions.
+
+- Updated tooling
+  - polygraphy-extension-trtexec v0.0.9
+
+## 10.3.0 GA - 2024-08-02
+
+Key Features and Updates:
+
+- Demo changes
+  - Added [Stable Video Diffusion](demo/Diffusion) (`SVD`) pipeline.
+- Plugin changes
+  - Deprecated Version 1 of the [ScatterElements plugin](plugin/scatterElementsPlugin). It is superseded by Version 2, which implements the `IPluginV3` interface.
+- Quickstart guide
+  - Updated the [SemanticSegmentation](quickstart/SemanticSegmentation) guide with the latest APIs.
+- Parser changes
+  - Added support for tensor `axes` inputs for the `Slice` node.
+  - Updated the `ScatterElements` importer to use Version 2 of the [ScatterElements plugin](plugin/scatterElementsPlugin), which implements the `IPluginV3` interface.
+- Updated tooling
+  - Polygraphy v0.49.13
+
+## 10.2.0 GA - 2024-07-09
 
 Key Features and Updates:
 
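The 10.4 changelog entry above introduces the `aliased_io_plugin` sample, which shows in-place updates to plugin inputs through I/O aliasing. As a conceptual sketch in plain Python (deliberately not the TensorRT API; the buffer names are illustrative), aliasing means the output buffer is the same storage as the input buffer, so the "kernel" mutates data in place instead of writing a separate output copy:

```python
import array

# The plugin's input tensor (stand-in: a flat float buffer).
buf = array.array("f", [0.0, 0.0, 0.0, 0.0])

# Aliased output: the same storage as the input, so no copy is made.
out = buf

# "Kernel" body: an in-place increment of every element.
for i in range(len(out)):
    out[i] += 1.0

# The update is visible through the input handle, since input and
# output share one allocation.
print(list(buf))  # → [1.0, 1.0, 1.0, 1.0]
```

The point of the sample is exactly this sharing: without aliasing, a plugin would need distinct input and output allocations plus a copy to achieve the same stateful update.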
CMakeLists.txt

Lines changed: 1 addition & 1 deletion

@@ -80,7 +80,7 @@ option(BUILD_PARSERS "Build TensorRT parsers" ON)
 option(BUILD_SAMPLES "Build TensorRT samples" ON)
 
 # C++14
-set(CMAKE_CXX_STANDARD 14)
+set(CMAKE_CXX_STANDARD 17)
 set(CMAKE_CXX_STANDARD_REQUIRED ON)
 set(CMAKE_CXX_EXTENSIONS OFF)
 

LICENSE

Lines changed: 20 additions & 1 deletion

@@ -337,10 +337,11 @@
 limitations under the License.
 
 > demo/Diffusion/utilities.py
+> demo/Diffusion/stable_video_diffusion_pipeline.py
 
 HuggingFace diffusers library.
 
-Copyright 2022 The HuggingFace Team.
+Copyright 2024 The HuggingFace Team.
 
 Licensed under the Apache License, Version 2.0 (the "License");
 you may not use this file except in compliance with the License.
@@ -380,3 +381,21 @@
 LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
 OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
 SOFTWARE.
+
+> demo/Diffusion/utilities.py
+
+ModelScope library.
+
+Copyright (c) Alibaba, Inc. and its affiliates.
+
+Licensed under the Apache License, Version 2.0 (the "License");
+you may not use this file except in compliance with the License.
+You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.

README.md

Lines changed: 28 additions & 28 deletions

@@ -26,13 +26,13 @@ You can skip the **Build** section to enjoy TensorRT with Python.
 To build the TensorRT-OSS components, you will first need the following software packages.
 
 **TensorRT GA build**
-* TensorRT v10.2.0.19
+* TensorRT v10.4.0.26
 * Available from direct download links listed below
 
 **System Packages**
 * [CUDA](https://developer.nvidia.com/cuda-toolkit)
   * Recommended versions:
-  * cuda-12.5.0 + cuDNN-8.9
+  * cuda-12.6.0 + cuDNN-8.9
   * cuda-11.8.0 + cuDNN-8.9
 * [GNU make](https://ftp.gnu.org/gnu/make/) >= v4.1
 * [cmake](https://github.com/Kitware/CMake/releases) >= v3.13
@@ -73,25 +73,25 @@ To build the TensorRT-OSS components, you will first need the following software
 If using the TensorRT OSS build container, TensorRT libraries are preinstalled under `/usr/lib/x86_64-linux-gnu` and you may skip this step.
 
 Else download and extract the TensorRT GA build from [NVIDIA Developer Zone](https://developer.nvidia.com) with the direct links below:
-- [TensorRT 10.2.0.19 for CUDA 11.8, Linux x86_64](https://developer.nvidia.com/downloads/compute/machine-learning/tensorrt/10.2.0/tars/TensorRT-10.2.0.19.Linux.x86_64-gnu.cuda-11.8.tar.gz)
-- [TensorRT 10.2.0.19 for CUDA 12.5, Linux x86_64](https://developer.nvidia.com/downloads/compute/machine-learning/tensorrt/10.2.0/tars/TensorRT-10.2.0.19.Linux.x86_64-gnu.cuda-12.5.tar.gz)
-- [TensorRT 10.2.0.19 for CUDA 11.8, Windows x86_64](https://developer.nvidia.com/downloads/compute/machine-learning/tensorrt/10.2.0/zip/TensorRT-10.2.0.19.Windows.win10.cuda-11.8.zip)
-- [TensorRT 10.2.0.19 for CUDA 12.5, Windows x86_64](https://developer.nvidia.com/downloads/compute/machine-learning/tensorrt/10.2.0/zip/TensorRT-10.2.0.19.Windows.win10.cuda-12.5.zip)
+- [TensorRT 10.4.0.26 for CUDA 11.8, Linux x86_64](https://developer.nvidia.com/downloads/compute/machine-learning/tensorrt/10.4.0/tars/TensorRT-10.4.0.26.Linux.x86_64-gnu.cuda-11.8.tar.gz)
+- [TensorRT 10.4.0.26 for CUDA 12.6, Linux x86_64](https://developer.nvidia.com/downloads/compute/machine-learning/tensorrt/10.4.0/tars/TensorRT-10.4.0.26.Linux.x86_64-gnu.cuda-12.6.tar.gz)
+- [TensorRT 10.4.0.26 for CUDA 11.8, Windows x86_64](https://developer.nvidia.com/downloads/compute/machine-learning/tensorrt/10.4.0/zip/TensorRT-10.4.0.26.Windows.win10.cuda-11.8.zip)
+- [TensorRT 10.4.0.26 for CUDA 12.6, Windows x86_64](https://developer.nvidia.com/downloads/compute/machine-learning/tensorrt/10.4.0/zip/TensorRT-10.4.0.26.Windows.win10.cuda-12.6.zip)
 
 
-**Example: Ubuntu 20.04 on x86-64 with cuda-12.5**
+**Example: Ubuntu 20.04 on x86-64 with cuda-12.6**
 
 ```bash
 cd ~/Downloads
-tar -xvzf TensorRT-10.2.0.19.Linux.x86_64-gnu.cuda-12.5.tar.gz
-export TRT_LIBPATH=`pwd`/TensorRT-10.2.0.19
+tar -xvzf TensorRT-10.4.0.26.Linux.x86_64-gnu.cuda-12.6.tar.gz
+export TRT_LIBPATH=`pwd`/TensorRT-10.4.0.26
 ```
 
-**Example: Windows on x86-64 with cuda-12.5**
+**Example: Windows on x86-64 with cuda-12.6**
 
 ```powershell
-Expand-Archive -Path TensorRT-10.2.0.19.Windows.win10.cuda-12.5.zip
-$env:TRT_LIBPATH="$pwd\TensorRT-10.2.0.19\lib"
+Expand-Archive -Path TensorRT-10.4.0.26.Windows.win10.cuda-12.6.zip
+$env:TRT_LIBPATH="$pwd\TensorRT-10.4.0.26\lib"
 ```
 
 ## Setting Up The Build Environment
@@ -101,27 +101,27 @@ For Linux platforms, we recommend that you generate a docker container for build
 1. #### Generate the TensorRT-OSS build container.
 The TensorRT-OSS build container can be generated using the supplied Dockerfiles and build scripts. The build containers are configured for building TensorRT OSS out-of-the-box.
 
-**Example: Ubuntu 20.04 on x86-64 with cuda-12.5 (default)**
+**Example: Ubuntu 20.04 on x86-64 with cuda-12.6 (default)**
 ```bash
-./docker/build.sh --file docker/ubuntu-20.04.Dockerfile --tag tensorrt-ubuntu20.04-cuda12.5
+./docker/build.sh --file docker/ubuntu-20.04.Dockerfile --tag tensorrt-ubuntu20.04-cuda12.6
 ```
-**Example: Rockylinux8 on x86-64 with cuda-12.5**
+**Example: Rockylinux8 on x86-64 with cuda-12.6**
 ```bash
-./docker/build.sh --file docker/rockylinux8.Dockerfile --tag tensorrt-rockylinux8-cuda12.5
+./docker/build.sh --file docker/rockylinux8.Dockerfile --tag tensorrt-rockylinux8-cuda12.6
 ```
-**Example: Ubuntu 22.04 cross-compile for Jetson (aarch64) with cuda-12.5 (JetPack SDK)**
+**Example: Ubuntu 22.04 cross-compile for Jetson (aarch64) with cuda-12.6 (JetPack SDK)**
 ```bash
-./docker/build.sh --file docker/ubuntu-cross-aarch64.Dockerfile --tag tensorrt-jetpack-cuda12.5
+./docker/build.sh --file docker/ubuntu-cross-aarch64.Dockerfile --tag tensorrt-jetpack-cuda12.6
 ```
-**Example: Ubuntu 22.04 on aarch64 with cuda-12.5**
+**Example: Ubuntu 22.04 on aarch64 with cuda-12.6**
 ```bash
-./docker/build.sh --file docker/ubuntu-22.04-aarch64.Dockerfile --tag tensorrt-aarch64-ubuntu22.04-cuda12.5
+./docker/build.sh --file docker/ubuntu-22.04-aarch64.Dockerfile --tag tensorrt-aarch64-ubuntu22.04-cuda12.6
 ```
 
 2. #### Launch the TensorRT-OSS build container.
 **Example: Ubuntu 20.04 build container**
 ```bash
-./docker/launch.sh --tag tensorrt-ubuntu20.04-cuda12.5 --gpus all
+./docker/launch.sh --tag tensorrt-ubuntu20.04-cuda12.6 --gpus all
 ```
 > NOTE:
 <br> 1. Use the `--tag` corresponding to build container generated in Step 1.
@@ -132,38 +132,38 @@ For Linux platforms, we recommend that you generate a docker container for build
 ## Building TensorRT-OSS
 * Generate Makefiles and build.
 
-**Example: Linux (x86-64) build with default cuda-12.5**
+**Example: Linux (x86-64) build with default cuda-12.6**
 ```bash
 cd $TRT_OSSPATH
 mkdir -p build && cd build
 cmake .. -DTRT_LIB_DIR=$TRT_LIBPATH -DTRT_OUT_DIR=`pwd`/out
 make -j$(nproc)
 ```
-**Example: Linux (aarch64) build with default cuda-12.5**
+**Example: Linux (aarch64) build with default cuda-12.6**
 ```bash
 cd $TRT_OSSPATH
 mkdir -p build && cd build
 cmake .. -DTRT_LIB_DIR=$TRT_LIBPATH -DTRT_OUT_DIR=`pwd`/out -DCMAKE_TOOLCHAIN_FILE=$TRT_OSSPATH/cmake/toolchains/cmake_aarch64-native.toolchain
 make -j$(nproc)
 ```
-**Example: Native build on Jetson (aarch64) with cuda-12.5**
+**Example: Native build on Jetson (aarch64) with cuda-12.6**
 ```bash
 cd $TRT_OSSPATH
 mkdir -p build && cd build
-cmake .. -DTRT_LIB_DIR=$TRT_LIBPATH -DTRT_OUT_DIR=`pwd`/out -DTRT_PLATFORM_ID=aarch64 -DCUDA_VERSION=12.5
+cmake .. -DTRT_LIB_DIR=$TRT_LIBPATH -DTRT_OUT_DIR=`pwd`/out -DTRT_PLATFORM_ID=aarch64 -DCUDA_VERSION=12.6
 CC=/usr/bin/gcc make -j$(nproc)
 ```
 > NOTE: C compiler must be explicitly specified via CC= for native aarch64 builds of protobuf.
 
-**Example: Ubuntu 22.04 Cross-Compile for Jetson (aarch64) with cuda-12.5 (JetPack)**
+**Example: Ubuntu 22.04 Cross-Compile for Jetson (aarch64) with cuda-12.6 (JetPack)**
 ```bash
 cd $TRT_OSSPATH
 mkdir -p build && cd build
-cmake .. -DCMAKE_TOOLCHAIN_FILE=$TRT_OSSPATH/cmake/toolchains/cmake_aarch64.toolchain -DCUDA_VERSION=12.5 -DCUDNN_LIB=/pdk_files/cudnn/usr/lib/aarch64-linux-gnu/libcudnn.so -DCUBLAS_LIB=/usr/local/cuda-12.5/targets/aarch64-linux/lib/stubs/libcublas.so -DCUBLASLT_LIB=/usr/local/cuda-12.5/targets/aarch64-linux/lib/stubs/libcublasLt.so -DTRT_LIB_DIR=/pdk_files/tensorrt/lib
+cmake .. -DCMAKE_TOOLCHAIN_FILE=$TRT_OSSPATH/cmake/toolchains/cmake_aarch64.toolchain -DCUDA_VERSION=12.6 -DCUDNN_LIB=/pdk_files/cudnn/usr/lib/aarch64-linux-gnu/libcudnn.so -DCUBLAS_LIB=/usr/local/cuda-12.6/targets/aarch64-linux/lib/stubs/libcublas.so -DCUBLASLT_LIB=/usr/local/cuda-12.6/targets/aarch64-linux/lib/stubs/libcublasLt.so -DTRT_LIB_DIR=/pdk_files/tensorrt/lib
 make -j$(nproc)
 ```
 
-**Example: Native builds on Windows (x86) with cuda-12.5**
+**Example: Native builds on Windows (x86) with cuda-12.6**
 ```powershell
 cd $TRT_OSSPATH
 mkdir -p build
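The README hunks above consistently pair the downloaded tarball name with the directory it unpacks to (`TRT_LIBPATH`). A small sketch of that naming convention, assuming NVIDIA's `TensorRT-<version>.Linux.<arch>-gnu.cuda-<cuda>.tar.gz` pattern holds; the helper name and the Downloads prefix are illustrative, not part of the commit:

```python
def trt_libpath(tarball: str, downloads: str = "/root/Downloads") -> str:
    """Derive the TRT_LIBPATH directory from a GA tarball name.

    E.g. "TensorRT-10.4.0.26.Linux.x86_64-gnu.cuda-12.6.tar.gz"
    unpacks to a "TensorRT-10.4.0.26" directory.
    """
    # Strip the leading "TensorRT-" and everything from ".Linux" onward,
    # leaving just the four-part version string.
    version = tarball.removeprefix("TensorRT-").split(".Linux")[0]
    return f"{downloads}/TensorRT-{version}"

print(trt_libpath("TensorRT-10.4.0.26.Linux.x86_64-gnu.cuda-12.6.tar.gz"))
# → /root/Downloads/TensorRT-10.4.0.26
```

This mirrors why the README's `tar -xvzf ...` line is immediately followed by ``export TRT_LIBPATH=`pwd`/TensorRT-10.4.0.26``: the extracted directory name is fully determined by the version embedded in the tarball name.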
VERSION

Lines changed: 1 addition & 1 deletion

@@ -1 +1 @@
-10.2.0.19
+10.4.0.26
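The VERSION file bump above is the canonical version string for the release. A hedged sketch of a compatibility check between this repo version and an installed TensorRT Python package; the assumption that the wheel reports only the first three fields (e.g. "10.4.0" versus the repo's "10.4.0.26") reflects common TensorRT packaging and is not established by this commit:

```python
def wheel_matches_repo(repo_version: str, wheel_version: str) -> bool:
    """Compare the repo VERSION file against an installed wheel version.

    The repo carries a fourth build field ("10.4.0.26"); wheels typically
    report three ("10.4.0"), so only the first three fields are compared.
    """
    return repo_version.split(".")[:3] == wheel_version.split(".")[:3]

print(wheel_matches_repo("10.4.0.26", "10.4.0"))  # → True
print(wheel_matches_repo("10.4.0.26", "10.2.0"))  # → False
```

A check like this catches the most common OSS-build mistake: building against headers and libraries from a different GA release than the one listed in VERSION.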