Update on the development branch #2437
kaiyux announced in Announcements
Hi,
The TensorRT-LLM team is pleased to announce that we pushed an update to the development branch (and the Triton backend) on Nov 12, 2024.
This update includes:
- `examples/nemotron`.
- `examples/gpt/README.md`.
- `examples/multimodal/README.md`.
- `trtllm-serve` command to launch a FastAPI based server.
- `examples/prompt_lookup/README.md`.
- `examples/nemotron_nas/README.md`.
- `examples/llama/README.md`.
- `executor` API, see “executorExampleFastLogits” section in `examples/cpp/executor/README.md`.
- `auto` is used as the default value for `--dtype` option in quantize and checkpoints conversion scripts.
- `moeTopK()` cannot find the correct expert when the number of experts is not a power of two. Thanks @dongjiyingdjy for reporting this bug.
- `crossKvCacheFraction`. (Assertion failed: Must set crossKvCacheFraction for encoder-decoder model #2419)
- `docs/source/performance/perf-benchmarking.md`, thanks @MARD1NO for pointing it out in Small Typo #2425.
- `nvcr.io/nvidia/pytorch:24.10-py3`.
- `nvcr.io/nvidia/tritonserver:24.10-py3`.

Thanks,
The TensorRT-LLM Engineering Team