Commit 28836a6

committed
2024-10-22 14:30:52
1 parent: bc78925

File tree: 62 files changed, +36997 −83 lines


docs/cogvideo-finetune-merge-00.md  +4,177
docs/cogvideo-finetune-merge-01.md  +2,587
docs/cogvideo-finetune-merge-02.md  +2,841
docs/cogvideo-finetune-merge-03.md  +2,587
docs/cogvideo-finetune-merge-04.md  +3,256
docs/cogvideo-finetune-merge-05.md  +3,173
docs/cogvideo-finetune-merge-06.md  +2,574
docs/cogvideo-finetune-merge-07.md  +2,543
docs/cogvideo-finetune-merge-08.md  +2,635
docs/cogvideo-finetune-merge-09.md  +2,650
docs/cogvideo-finetune-merge-10.md  +2,571
docs/cogvideo-finetune-merge-11.md  +2,981
docs/cogvideo-finetune-merge-12.md  +2,339

(Large diffs are not rendered by default.)

docs/cogview3-finetune/README.md  +6 −6

@@ -21,8 +21,8 @@ Experience the CogView3-Plus-3B model online on <a href="https://huggingface.co/
 
 ## Project Updates
 
-- 🔥🔥 ```2024/10/13```: We have adapted and open-sourced the **CogView-3Plus-3B** model in the [diffusers](https://github.com/huggingface/diffusers) version. You can [experience it online](https://huggingface.co/spaces/THUDM-HF-SPACE/CogView3-Plus-3B-Space).
-- 🔥 ```2024/9/29```: We have open-sourced **CogView3** and **CogView-3Plus-3B**. **CogView3** is a text-to-image system based on cascaded diffusion, utilizing a relay diffusion framework. **CogView-3Plus** is a series of newly developed text-to-image models based on Diffusion Transformers.
+- 🔥🔥 ```py/10/13```: We have adapted and open-sourced the **CogView-3Plus-3B** model in the [diffusers](https://github.com/huggingface/diffusers) version. You can [experience it online](https://huggingface.co/spaces/THUDM-HF-SPACE/CogView3-Plus-3B-Space).
+- 🔥 ```py/9/29```: We have open-sourced **CogView3** and **CogView-3Plus-3B**. **CogView3** is a text-to-image system based on cascaded diffusion, utilizing a relay diffusion framework. **CogView-3Plus** is a series of newly developed text-to-image models based on Diffusion Transformers.
 
 ## Model Introduction
 

@@ -108,20 +108,20 @@ large language models (LLMs) before generating text-to-image, as this will signi
 
 We provide an [example script](prompt_optimize.py). We suggest running this script to refine the prompt:
 
-```shell
+```py
 python prompt_optimize.py --api_key "Zhipu AI API Key" --prompt {your prompt} --base_url "https://open.bigmodel.cn/api/paas/v4" --model "glm-4-plus"
 ```
 
 ### Inference Model (Diffusers)
 
 First, ensure the `diffusers` library is installed **from source**.
-```
+```py
 pip install git+https://github.com/huggingface/diffusers.git
 ```
 
 Then, run the following code:
 
-```python
+```py
 from diffusers import CogView3PlusPipeline
 import torch
 

@@ -182,7 +182,7 @@ Comparison results from human evaluations:
 
 🌟 If you find our work helpful, feel free to cite our paper and leave a star.
 
-```
+```py
 @article{zheng2024cogview3,
 title={Cogview3: Finer and faster text-to-image generation via relay diffusion},
 author={Zheng, Wendi and Teng, Jiayan and Yang, Zhuoyi and Wang, Weihan and Chen, Jidong and Gu, Xiaotao and Dong, Yuxiao and Ding, Ming and Tang, Jie},
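The README's inference snippet is cut off in the diff view right after its first two imports. As a minimal sketch of what a complete diffusers call for this model typically looks like — the model id, dtype, and generation parameter values below are assumptions, not taken from this commit:

```python
def generation_kwargs(prompt: str) -> dict:
    """Typical CogView3-Plus generation settings (the values are assumptions)."""
    return {
        "prompt": prompt,
        "guidance_scale": 7.0,
        "num_inference_steps": 50,
        "width": 1024,
        "height": 1024,
    }

if __name__ == "__main__":
    # Heavy dependencies stay under the guard so the helper above imports
    # without a GPU or the diffusers library installed.
    import torch
    from diffusers import CogView3PlusPipeline

    pipe = CogView3PlusPipeline.from_pretrained(
        "THUDM/CogView3-Plus-3B", torch_dtype=torch.bfloat16
    )
    pipe.to("cuda")
    image = pipe(**generation_kwargs("a serene lakeside at dawn")).images[0]
    image.save("cogview3_output.png")
```

This is a sketch only; consult the repository's own `inference/cli_demo.py` for the settings the authors actually ship.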

docs/cogview3-finetune/README_zh.md  +6 −6

@@ -20,9 +20,9 @@
 
 ## Project Updates
 
-- 🔥🔥 ```2024/10/13```: We have adapted and open-sourced the [diffusers](https://github.com/huggingface/diffusers) version of the **CogView-3Plus-3B**
+- 🔥🔥 ```py/10/13```: We have adapted and open-sourced the [diffusers](https://github.com/huggingface/diffusers) version of the **CogView-3Plus-3B**
 model. You can [try it online](https://huggingface.co/spaces/THUDM-HF-SPACE/CogView3-Plus-3B-Space)
-- 🔥 ```2024/9/29```: We have open-sourced **CogView3** and **CogView-3Plus-3B**. **CogView3** is a text-to-image system based on cascaded diffusion, using a relay diffusion framework.
+- 🔥 ```py/9/29```: We have open-sourced **CogView3** and **CogView-3Plus-3B**. **CogView3** is a text-to-image system based on cascaded diffusion, using a relay diffusion framework.
 **CogView-3Plus** is a series of newly developed text-to-image models based on Diffusion Transformers.
 
 ## Model Introduction

@@ -101,20 +101,20 @@ Zero-SNR
 
 We provide an [example script](prompt_optimize.py). We recommend running this script to polish your prompt:
 
-```shell
+```py
 python prompt_optimize.py --api_key "Zhipu AI API Key" --prompt {your prompt} --base_url "https://open.bigmodel.cn/api/paas/v4" --model "glm-4-plus"
 ```
 
 ### Inference Model (Diffusers)
 
 First, make sure the `diffusers` library is installed from source.
 
-```shell
+```py
 pip install git+https://github.com/huggingface/diffusers.git
 ```
 Then, run the following code:
 
-```python
+```py
 from diffusers import CogView3PlusPipeline
 import torch
 

@@ -171,7 +171,7 @@ CogView3 is a novel text-to-image system that adopts a relay diffusion
 
 🌟 If you find our work helpful, feel free to cite our paper and leave a star.
 
-```
+```py
 @article{zheng2024cogview3,
 title={Cogview3: Finer and faster text-to-image generation via relay diffusion},
 author={Zheng, Wendi and Teng, Jiayan and Yang, Zhuoyi and Wang, Weihan and Chen, Jidong and Gu, Xiaotao and Dong, Yuxiao and Ding, Ming and Tang, Jie},

docs/cogview3-finetune/inference----cli_demo.py.md  +1 −1

@@ -1,6 +1,6 @@
 # `.\cogview3-finetune\inference\cli_demo.py`
 
-```
+```py
 # This script demonstrates how to generate an image using the Hugging Face `diffusers` pipeline
 """
 This script demonstrates how to generate an image using the CogView3-Plus-3B model with the Hugging Face `diffusers` pipeline.

docs/cogview3-finetune/inference----gradio_web_demo.py.md  +1 −1

@@ -1,6 +1,6 @@
 # `.\cogview3-finetune\inference\gradio_web_demo.py`
 
-```
+```py
 # Main file for the Gradio web demo; it uses the CogView3-Plus-3B model to generate images
 """
 This is the main file for the Gradio web demo. It uses the CogView3-Plus-3B model to generate images.

docs/cogview3-finetune/prompt_optimize.py.md  +1 −1

@@ -1,6 +1,6 @@
 # `.\cogview3-finetune\prompt_optimize.py`
 
-```
+```py
 # Import the regular-expression module
 import re
 # Import the command-line argument-parsing module

docs/cogview3-finetune/resources----contribute.md  +3 −3

@@ -25,19 +25,19 @@ style. You can organize the code according to the following specifications:
 
 1. Install the `ruff` tool
 
-```shell
+```py
 pip install ruff
 ```
 
 Then, run the `ruff` tool
 
-```shell
+```py
 ruff check tools sat inference
 ```
 
 Check the code style. If there are issues, you can automatically fix them using the `ruff format` command.
 
-```shell
+```py
 ruff format tools sat inference
 ```
 
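The `ruff check` / `ruff format` workflow shown in this diff can be driven from a small Python helper when the style pass runs as part of a script; this is a sketch, and `ruff_commands` / `run_ruff` are hypothetical names, not part of the repository:

```python
import subprocess

# Targets taken from the commands shown in the contribute guide.
RUFF_TARGETS = ["tools", "sat", "inference"]

def ruff_commands(fix: bool = False) -> list:
    """Build the ruff invocations: lint check first, then format when fixing."""
    cmds = [["ruff", "check", *RUFF_TARGETS]]
    if fix:
        cmds.append(["ruff", "format", *RUFF_TARGETS])
    return cmds

def run_ruff(fix: bool = False) -> None:
    # Raises CalledProcessError if ruff reports style violations.
    for cmd in ruff_commands(fix):
        subprocess.run(cmd, check=True)
```

Calling `run_ruff(fix=True)` mirrors the two shell commands from the guide in order; `check=True` makes a CI job fail fast on lint errors.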

docs/cogview3-finetune/resources----contribute_zh.md  +3 −3

@@ -15,19 +15,19 @@
 
 1. Install the `ruff` tool
 
-```shell
+```py
 pip install ruff
 ```
 
 Then, run the `ruff` tool
 
-```shell
+```py
 ruff check tools sat inference
 ```
 
 Check the code style. If there are issues, you can fix them automatically with the `ruff format .` command.
 
-```shell
+```py
 ruff format tools sat inference
 ```
 

docs/cogview3-finetune/sat----README.md  +11 −11

@@ -12,7 +12,7 @@ The code is the framework used by the team during model training. There are few
 
 Ensure you have installed the dependencies required by this folder:
 
-```shell
+```py
 pip install -r requirements.txt
 ```
 

@@ -49,7 +49,7 @@
 
 Next, arrange the model files into the following format:
 
-```
+```py
 .cogview3-plus-3b
 ├── transformer
 │ ├── 1

@@ -63,7 +63,7 @@ Clone the T5 model. This model is not used for training or fine-tuning but is ne
 
 Since we have uploaded the T5 model in `safetensors` format in `CogVideoX`, a simple way is to clone the model from the `CogVideoX-2B` model and move it to the corresponding folder.
 
-```shell
+```py
 git clone https://huggingface.co/THUDM/CogVideoX-2b.git
 # git clone https://www.modelscope.cn/ZhipuAI/CogVideoX-2b.git
 mkdir t5-v1_1-xxl

@@ -72,7 +72,7 @@ mv CogVideoX-2b/text_encoder/* CogVideoX-2b/tokenizer/* t5-v1_1-xxl
 
 With this setup, you will have a safetensor format T5 file, ensuring no errors during Deepspeed fine-tuning.
 
-```
+```py
 ├── added_tokens.json
 ├── config.json
 ├── model-00001-of-00002.safetensors

@@ -89,7 +89,7 @@ With this setup, you will have a safetensor format T5 file, ensuring no errors d
 
 Here is an example using `CogView3-Base`, with explanations for some of the parameters:
 
-```yaml
+```py
 args:
   mode: inference
   relay_model: False # Set to True when using CogView-3-Relay

@@ -135,41 +135,41 @@ Different models require different code for inference. Here are the inference co
 
 ### CogView-3Plus
 
-```shell
+```py
 python sample_dit.py --base configs/cogview3_plus.yaml
 ```
 
 ### CogView-3-Base
 
 + Original model
 
-```shell
+```py
 python sample_unet.py --base configs/cogview3_base.yaml
 ```
 
 + Distilled model
 
-```bash
+```py
 python sample_unet.py --base configs/cogview3_base_distill_4step.yaml
 ```
 
 ### CogView-3-Relay
 
 + Original model
 
-```shell
+```py
 python sample_unet.py --base configs/cogview3_relay.yaml
 ```
 
 + Distilled model
 
-```shell
+```py
 python sample_unet.py --base configs/cogview3_relay_distill_1step.yaml
 ```
 
 The output image format will be a folder. The folder name will consist of the sequence number and the first 15 characters of the prompt, containing multiple images. The number of images is based on the `batch` parameter. The structure should look like this:
 
-```
+```py
 .
 ├── 000000000.png
 ├── 000000001.png
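The clone-and-move steps in this README (`mkdir t5-v1_1-xxl; mv CogVideoX-2b/text_encoder/* CogVideoX-2b/tokenizer/* t5-v1_1-xxl`) can be sketched in Python for scripted setups. `assemble_t5` is a hypothetical helper name, and the sketch assumes the CogVideoX-2b clone already exists locally:

```python
import shutil
from pathlib import Path

def assemble_t5(repo, dest):
    """Move text_encoder/* and tokenizer/* from a CogVideoX-2b clone into dest,
    mirroring the mkdir + mv commands shown in the README."""
    repo, dest = Path(repo), Path(dest)
    dest.mkdir(parents=True, exist_ok=True)
    moved = []
    for sub in ("text_encoder", "tokenizer"):
        for f in sorted((repo / sub).iterdir()):
            shutil.move(str(f), str(dest / f.name))
            moved.append(f.name)
    return moved
```

For example, `assemble_t5("CogVideoX-2b", "t5-v1_1-xxl")` leaves the safetensors T5 files laid out as in the tree shown above.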

docs/cogview3-finetune/sat----README_zh.md  +11 −11

@@ -10,7 +10,7 @@
 
 Make sure you have correctly installed the dependencies required by this folder:
 
-```shell
+```py
 pip install -r requirements.txt
 ```
 

@@ -47,7 +47,7 @@ pip install -r requirements.txt
 
 Next, you need to arrange the model files into the following format:
 
-```
+```py
 .cogview3-plus-3b
 ├── transformer
 │ ├── 1

@@ -62,7 +62,7 @@ pip install -r requirements.txt
 
 Since we uploaded a `safetensors`-format T5 model in `CogVideoX`, a simple approach is to clone the model from the `CogVideoX-2B` model and then move it into the corresponding folder.
 
-```shell
+```py
 git clone https://huggingface.co/THUDM/CogVideoX-2b.git # download the model from Hugging Face
 # git clone https://www.modelscope.cn/ZhipuAI/CogVideoX-2b.git # download the model from ModelScope
 mkdir t5-v1_1-xxl

@@ -71,7 +71,7 @@ mv CogVideoX-2b/text_encoder/* CogVideoX-2b/tokenizer/* t5-v1_1-xxl
 
 Through the steps above, you will get a safetensors-format T5 file that can be read without errors during Deepspeed fine-tuning.
 
-```
+```py
 ├── added_tokens.json
 ├── config.json
 ├── model-00001-of-00002.safetensors

@@ -88,7 +88,7 @@ mv CogVideoX-2b/text_encoder/* CogVideoX-2b/tokenizer/* t5-v1_1-xxl
 
 Here is an example using `CogView3-Base`, with explanations of some of the parameters:
 
-```yaml
+```py
 args:
   mode: inference
   relay_model: False # Set this to True when the model type is CogView-3-Relay

@@ -134,42 +134,42 @@ model:
 
 ### CogView-3Plus
 
-```shell
+```py
 python sample_dit.py --base configs/cogview3_plus.yaml
 ```
 
 ### CogView-3-Base
 
 + Original model
 
-```shell
+```py
 python sample_unet.py --base configs/cogview3_base.yaml
 ```
 
 + Distilled model
 
-```bash
+```py
 python sample_unet.py --base configs/cogview3_base_distill_4step.yaml
 ```
 
 ### CogView-3-Relay
 
 + Original model
 
-```shell
+```py
 python sample_unet.py --base configs/cogview3_relay.yaml
 ```
 
 + Distilled model
 
-```shell
+```py
 python sample_unet.py --base configs/cogview3_relay_distill_1step.yaml
 ```
 
 The output is a folder whose name is the generation index plus the first 15 characters of the prompt; it contains multiple images, with the exact count set by the `batch` parameter.
 Its structure should be as follows:
 
-```
+```py
 .
 ├── 000000000.png
 ├── 000000001.png

docs/cogview3-finetune/sat----arguments.py.md  +1 −1

@@ -1,6 +1,6 @@
 # `.\cogview3-finetune\sat\arguments.py`
 
-```
+```py
 # Import the required libraries
 import argparse # handle command-line arguments
 import os # functions for interacting with the operating system

docs/cogview3-finetune/sat----diffusion.py.md  +1 −1

@@ -1,6 +1,6 @@
 # `.\cogview3-finetune\sat\diffusion.py`
 
-```
+```py
 # Import the math library for mathematical operations
 import math
 # Import type-hint classes from the typing module

docs/cogview3-finetune/sat----sample_dit.py.md  +1 −1

@@ -1,6 +1,6 @@
 # `.\cogview3-finetune\sat\sample_dit.py`
 
-```
+```py
 # Import the os module for interacting with the operating system
 import os
 # Import the math module, which provides mathematical functions and constants

docs/cogview3-finetune/sat----sample_unet.py.md  +1 −1

@@ -1,6 +1,6 @@
 # `.\cogview3-finetune\sat\sample_unet.py`
 
-```
+```py
 # Import the os module
 import os
 # Import the math module

docs/cogview3-finetune/sat----sgm----__init__.py.md  +1 −1

@@ -1,6 +1,6 @@
 # `.\cogview3-finetune\sat\sgm\__init__.py`
 
-```
+```py
 # Import the AutoencodingEngine class from the current package
 from .models import AutoencodingEngine
 # Import utility functions for getting the config path and instantiating objects from a config

docs/cogview3-finetune/sat----sgm----models----__init__.py.md  +1 −1

@@ -1,6 +1,6 @@
 # `.\cogview3-finetune\sat\sgm\models\__init__.py`
 
-```
+```py
 # Import the AutoencodingEngine class from the same package, used for subsequent autoencoder operations
 from .autoencoder import AutoencodingEngine
