Commit 28836a6

committed
2024-10-22 14:30:52
1 parent: bc78925

File tree: 62 files changed, +36997 −83 lines


docs/cogvideo-finetune-merge-00.md  +4,177
docs/cogvideo-finetune-merge-01.md  +2,587
docs/cogvideo-finetune-merge-02.md  +2,841
docs/cogvideo-finetune-merge-03.md  +2,587
docs/cogvideo-finetune-merge-04.md  +3,256
docs/cogvideo-finetune-merge-05.md  +3,173
docs/cogvideo-finetune-merge-06.md  +2,574
docs/cogvideo-finetune-merge-07.md  +2,543
docs/cogvideo-finetune-merge-08.md  +2,635
docs/cogvideo-finetune-merge-09.md  +2,650
docs/cogvideo-finetune-merge-10.md  +2,571
docs/cogvideo-finetune-merge-11.md  +2,981
docs/cogvideo-finetune-merge-12.md  +2,339

(Large diffs are not rendered by default.)

docs/cogview3-finetune/README.md  +6 −6

@@ -21,8 +21,8 @@ Experience the CogView3-Plus-3B model online on <a href="https://huggingface.co/
 
 ## Project Updates
 
-- 🔥🔥 ```2024/10/13```: We have adapted and open-sourced the **CogView-3Plus-3B** model in the [diffusers](https://github.com/huggingface/diffusers) version. You can [experience it online](https://huggingface.co/spaces/THUDM-HF-SPACE/CogView3-Plus-3B-Space).
-- 🔥 ```2024/9/29```: We have open-sourced **CogView3** and **CogView-3Plus-3B**. **CogView3** is a text-to-image system based on cascaded diffusion, utilizing a relay diffusion framework. **CogView-3Plus** is a series of newly developed text-to-image models based on Diffusion Transformers.
+- 🔥🔥 ```py/10/13```: We have adapted and open-sourced the **CogView-3Plus-3B** model in the [diffusers](https://github.com/huggingface/diffusers) version. You can [experience it online](https://huggingface.co/spaces/THUDM-HF-SPACE/CogView3-Plus-3B-Space).
+- 🔥 ```py/9/29```: We have open-sourced **CogView3** and **CogView-3Plus-3B**. **CogView3** is a text-to-image system based on cascaded diffusion, utilizing a relay diffusion framework. **CogView-3Plus** is a series of newly developed text-to-image models based on Diffusion Transformers.
 
 ## Model Introduction
 

@@ -108,20 +108,20 @@ large language models (LLMs) before generating text-to-image, as this will signi
 
 We provide an [example script](prompt_optimize.py). We suggest running this script to refine the prompt:
 
-```shell
+```py
 python prompt_optimize.py --api_key "Zhipu AI API Key" --prompt {your prompt} --base_url "https://open.bigmodel.cn/api/paas/v4" --model "glm-4-plus"
 ```
 
 ### Inference Model (Diffusers)
 
 First, ensure the `diffusers` library is installed **from source**.
-```
+```py
 pip install git+https://github.com/huggingface/diffusers.git
 ```
 
 Then, run the following code:
 
-```python
+```py
 from diffusers import CogView3PlusPipeline
 import torch
 

@@ -182,7 +182,7 @@ Comparison results from human evaluations:
 
 🌟 If you find our work helpful, feel free to cite our paper and leave a star.
 
-```
+```py
 @article{zheng2024cogview3,
 title={Cogview3: Finer and faster text-to-image generation via relay diffusion},
 author={Zheng, Wendi and Teng, Jiayan and Yang, Zhuoyi and Wang, Weihan and Chen, Jidong and Gu, Xiaotao and Dong, Yuxiao and Ding, Ming and Tang, Jie},
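The README's inference snippet is cut off in the diff view right after its first two imports. As a minimal sketch of what a complete diffusers call for this model typically looks like — the model id, dtype, and generation parameter values below are assumptions, not taken from this commit:

```python
def generation_kwargs(prompt: str) -> dict:
    """Typical CogView3-Plus generation settings (the values are assumptions)."""
    return {
        "prompt": prompt,
        "guidance_scale": 7.0,
        "num_inference_steps": 50,
        "width": 1024,
        "height": 1024,
    }

if __name__ == "__main__":
    # Heavy dependencies stay under the guard so the helper above imports
    # without a GPU or the diffusers library installed.
    import torch
    from diffusers import CogView3PlusPipeline

    pipe = CogView3PlusPipeline.from_pretrained(
        "THUDM/CogView3-Plus-3B", torch_dtype=torch.bfloat16
    )
    pipe.to("cuda")
    image = pipe(**generation_kwargs("a serene lakeside at dawn")).images[0]
    image.save("cogview3_output.png")
```

This is a sketch only; consult the repository's own `inference/cli_demo.py` for the settings the authors actually ship.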

docs/cogview3-finetune/README_zh.md  +6 −6

@@ -20,9 +20,9 @@
 
 ## Project Updates
 
-- 🔥🔥 ```2024/10/13```: We have adapted and open-sourced the [diffusers](https://github.com/huggingface/diffusers) version of the **CogView-3Plus-3B**
+- 🔥🔥 ```py/10/13```: We have adapted and open-sourced the [diffusers](https://github.com/huggingface/diffusers) version of the **CogView-3Plus-3B**
 model. You can [try it online](https://huggingface.co/spaces/THUDM-HF-SPACE/CogView3-Plus-3B-Space)
-- 🔥 ```2024/9/29```: We have open-sourced **CogView3** and **CogView-3Plus-3B**. **CogView3** is a text-to-image system based on cascaded diffusion, using a relay diffusion framework.
+- 🔥 ```py/9/29```: We have open-sourced **CogView3** and **CogView-3Plus-3B**. **CogView3** is a text-to-image system based on cascaded diffusion, using a relay diffusion framework.
 **CogView-3Plus** is a series of newly developed text-to-image models based on Diffusion Transformers.
 
 ## Model Introduction

@@ -101,20 +101,20 @@ Zero-SNR
 
 We provide an [example script](prompt_optimize.py). We recommend running this script to polish your prompt:
 
-```shell
+```py
 python prompt_optimize.py --api_key "Zhipu AI API Key" --prompt {your prompt} --base_url "https://open.bigmodel.cn/api/paas/v4" --model "glm-4-plus"
 ```
 
 ### Inference Model (Diffusers)
 
 First, make sure the `diffusers` library is installed from source.
 
-```shell
+```py
 pip install git+https://github.com/huggingface/diffusers.git
 ```
 Then, run the following code:
 
-```python
+```py
 from diffusers import CogView3PlusPipeline
 import torch
 

@@ -171,7 +171,7 @@ CogView3 is a novel text-to-image system that adopts a relay diffusion
 
 🌟 If you find our work helpful, feel free to cite our paper and leave a star.
 
-```
+```py
 @article{zheng2024cogview3,
 title={Cogview3: Finer and faster text-to-image generation via relay diffusion},
 author={Zheng, Wendi and Teng, Jiayan and Yang, Zhuoyi and Wang, Weihan and Chen, Jidong and Gu, Xiaotao and Dong, Yuxiao and Ding, Ming and Tang, Jie},

docs/cogview3-finetune/inference----cli_demo.py.md  +1 −1

@@ -1,6 +1,6 @@
 # `.\cogview3-finetune\inference\cli_demo.py`
 
-```
+```py
 # This script demonstrates how to generate an image using the Hugging Face `diffusers` pipeline
 """
 This script demonstrates how to generate an image using the CogView3-Plus-3B model with the Hugging Face `diffusers` pipeline.

docs/cogview3-finetune/inference----gradio_web_demo.py.md  +1 −1

@@ -1,6 +1,6 @@
 # `.\cogview3-finetune\inference\gradio_web_demo.py`
 
-```
+```py
 # Main file for the Gradio web demo; it uses the CogView3-Plus-3B model to generate images
 """
 This is the main file for the Gradio web demo. It uses the CogView3-Plus-3B model to generate images.

docs/cogview3-finetune/prompt_optimize.py.md  +1 −1

@@ -1,6 +1,6 @@
 # `.\cogview3-finetune\prompt_optimize.py`
 
-```
+```py
 # Import the regular-expression module
 import re
 # Import the command-line argument-parsing module

docs/cogview3-finetune/resources----contribute.md  +3 −3

@@ -25,19 +25,19 @@ style. You can organize the code according to the following specifications:
 
 1. Install the `ruff` tool
 
-```shell
+```py
 pip install ruff
 ```
 
 Then, run the `ruff` tool
 
-```shell
+```py
 ruff check tools sat inference
 ```
 
 Check the code style. If there are issues, you can automatically fix them using the `ruff format` command.
 
-```shell
+```py
 ruff format tools sat inference
 ```
 
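The `ruff check` / `ruff format` workflow shown in this diff can be driven from a small Python helper when the style pass runs as part of a script; this is a sketch, and `ruff_commands` / `run_ruff` are hypothetical names, not part of the repository:

```python
import subprocess

# Targets taken from the commands shown in the contribute guide.
RUFF_TARGETS = ["tools", "sat", "inference"]

def ruff_commands(fix: bool = False) -> list:
    """Build the ruff invocations: lint check first, then format when fixing."""
    cmds = [["ruff", "check", *RUFF_TARGETS]]
    if fix:
        cmds.append(["ruff", "format", *RUFF_TARGETS])
    return cmds

def run_ruff(fix: bool = False) -> None:
    # Raises CalledProcessError if ruff reports style violations.
    for cmd in ruff_commands(fix):
        subprocess.run(cmd, check=True)
```

Calling `run_ruff(fix=True)` mirrors the two shell commands from the guide in order; `check=True` makes a CI job fail fast on lint errors.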

docs/cogview3-finetune/resources----contribute_zh.md  +3 −3

@@ -15,19 +15,19 @@
 
 1. Install the `ruff` tool
 
-```shell
+```py
 pip install ruff
 ```
 
 Then, run the `ruff` tool
 
-```shell
+```py
 ruff check tools sat inference
 ```
 
 Check the code style. If there are issues, you can fix them automatically with the `ruff format .` command.
 
-```shell
+```py
 ruff format tools sat inference
 ```
 

docs/cogview3-finetune/sat----README.md  +11 −11

@@ -12,7 +12,7 @@ The code is the framework used by the team during model training. There are few
 
 Ensure you have installed the dependencies required by this folder:
 
-```shell
+```py
 pip install -r requirements.txt
 ```
 

@@ -49,7 +49,7 @@
 
 Next, arrange the model files into the following format:
 
-```
+```py
 .cogview3-plus-3b
 ├── transformer
 │ ├── 1

@@ -63,7 +63,7 @@ Clone the T5 model. This model is not used for training or fine-tuning but is ne
 
 Since we have uploaded the T5 model in `safetensors` format in `CogVideoX`, a simple way is to clone the model from the `CogVideoX-2B` model and move it to the corresponding folder.
 
-```shell
+```py
 git clone https://huggingface.co/THUDM/CogVideoX-2b.git
 # git clone https://www.modelscope.cn/ZhipuAI/CogVideoX-2b.git
 mkdir t5-v1_1-xxl

@@ -72,7 +72,7 @@ mv CogVideoX-2b/text_encoder/* CogVideoX-2b/tokenizer/* t5-v1_1-xxl
 
 With this setup, you will have a safetensor format T5 file, ensuring no errors during Deepspeed fine-tuning.
 
-```
+```py
 ├── added_tokens.json
 ├── config.json
 ├── model-00001-of-00002.safetensors

@@ -89,7 +89,7 @@ With this setup, you will have a safetensor format T5 file, ensuring no errors d
 
 Here is an example using `CogView3-Base`, with explanations for some of the parameters:
 
-```yaml
+```py
 args:
   mode: inference
   relay_model: False # Set to True when using CogView-3-Relay

@@ -135,41 +135,41 @@ Different models require different code for inference. Here are the inference co
 
 ### CogView-3Plus
 
-```shell
+```py
 python sample_dit.py --base configs/cogview3_plus.yaml
 ```
 
 ### CogView-3-Base
 
 + Original model
 
-```shell
+```py
 python sample_unet.py --base configs/cogview3_base.yaml
 ```
 
 + Distilled model
 
-```bash
+```py
 python sample_unet.py --base configs/cogview3_base_distill_4step.yaml
 ```
 
 ### CogView-3-Relay
 
 + Original model
 
-```shell
+```py
 python sample_unet.py --base configs/cogview3_relay.yaml
 ```
 
 + Distilled model
 
-```shell
+```py
 python sample_unet.py --base configs/cogview3_relay_distill_1step.yaml
 ```
 
 The output image format will be a folder. The folder name will consist of the sequence number and the first 15 characters of the prompt, containing multiple images. The number of images is based on the `batch` parameter. The structure should look like this:
 
-```
+```py
 .
 ├── 000000000.png
 ├── 000000001.png
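The clone-and-move steps in this README (`mkdir t5-v1_1-xxl; mv CogVideoX-2b/text_encoder/* CogVideoX-2b/tokenizer/* t5-v1_1-xxl`) can be sketched in Python for scripted setups. `assemble_t5` is a hypothetical helper name, and the sketch assumes the CogVideoX-2b clone already exists locally:

```python
import shutil
from pathlib import Path

def assemble_t5(repo, dest):
    """Move text_encoder/* and tokenizer/* from a CogVideoX-2b clone into dest,
    mirroring the mkdir + mv commands shown in the README."""
    repo, dest = Path(repo), Path(dest)
    dest.mkdir(parents=True, exist_ok=True)
    moved = []
    for sub in ("text_encoder", "tokenizer"):
        for f in sorted((repo / sub).iterdir()):
            shutil.move(str(f), str(dest / f.name))
            moved.append(f.name)
    return moved
```

For example, `assemble_t5("CogVideoX-2b", "t5-v1_1-xxl")` leaves the safetensors T5 files laid out as in the tree shown above.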

docs/cogview3-finetune/sat----README_zh.md  +11 −11

@@ -10,7 +10,7 @@
 
 Make sure you have correctly installed the dependencies required by this folder:
 
-```shell
+```py
 pip install -r requirements.txt
 ```
 

@@ -47,7 +47,7 @@ pip install -r requirements.txt
 
 Next, you need to arrange the model files into the following format:
 
-```
+```py
 .cogview3-plus-3b
 ├── transformer
 │ ├── 1

@@ -62,7 +62,7 @@ pip install -r requirements.txt
 
 Since we uploaded a `safetensors`-format T5 model in `CogVideoX`, a simple approach is to clone the model from the `CogVideoX-2B` model and then move it into the corresponding folder.
 
-```shell
+```py
 git clone https://huggingface.co/THUDM/CogVideoX-2b.git # download the model from Hugging Face
 # git clone https://www.modelscope.cn/ZhipuAI/CogVideoX-2b.git # download the model from ModelScope
 mkdir t5-v1_1-xxl

@@ -71,7 +71,7 @@ mv CogVideoX-2b/text_encoder/* CogVideoX-2b/tokenizer/* t5-v1_1-xxl
 
 Through the steps above, you will get a safetensors-format T5 file that can be read without errors during Deepspeed fine-tuning.
 
-```
+```py
 ├── added_tokens.json
 ├── config.json
 ├── model-00001-of-00002.safetensors

@@ -88,7 +88,7 @@ mv CogVideoX-2b/text_encoder/* CogVideoX-2b/tokenizer/* t5-v1_1-xxl
 
 Here is an example using `CogView3-Base`, with explanations of some of the parameters:
 
-```yaml
+```py
 args:
   mode: inference
   relay_model: False # Set this to True when the model type is CogView-3-Relay

@@ -134,42 +134,42 @@ model:
 
 ### CogView-3Plus
 
-```shell
+```py
 python sample_dit.py --base configs/cogview3_plus.yaml
 ```
 
 ### CogView-3-Base
 
 + Original model
 
-```shell
+```py
 python sample_unet.py --base configs/cogview3_base.yaml
 ```
 
 + Distilled model
 
-```bash
+```py
 python sample_unet.py --base configs/cogview3_base_distill_4step.yaml
 ```
 
 ### CogView-3-Relay
 
 + Original model
 
-```shell
+```py
 python sample_unet.py --base configs/cogview3_relay.yaml
 ```
 
 + Distilled model
 
-```shell
+```py
 python sample_unet.py --base configs/cogview3_relay_distill_1step.yaml
 ```
 
 The output is a folder whose name is the generation index plus the first 15 characters of the prompt; it contains multiple images, with the exact count set by the `batch` parameter.
 Its structure should be as follows:
 
-```
+```py
 .
 ├── 000000000.png
 ├── 000000001.png

docs/cogview3-finetune/sat----arguments.py.md  +1 −1

@@ -1,6 +1,6 @@
 # `.\cogview3-finetune\sat\arguments.py`
 
-```
+```py
 # Import the required libraries
 import argparse # handle command-line arguments
 import os # functions for interacting with the operating system

docs/cogview3-finetune/sat----diffusion.py.md  +1 −1

@@ -1,6 +1,6 @@
 # `.\cogview3-finetune\sat\diffusion.py`
 
-```
+```py
 # Import the math library for mathematical operations
 import math
 # Import type-hint classes from the typing module

docs/cogview3-finetune/sat----sample_dit.py.md  +1 −1

@@ -1,6 +1,6 @@
 # `.\cogview3-finetune\sat\sample_dit.py`
 
-```
+```py
 # Import the os module for interacting with the operating system
 import os
 # Import the math module, which provides mathematical functions and constants

docs/cogview3-finetune/sat----sample_unet.py.md  +1 −1

@@ -1,6 +1,6 @@
 # `.\cogview3-finetune\sat\sample_unet.py`
 
-```
+```py
 # Import the os module
 import os
 # Import the math module

docs/cogview3-finetune/sat----sgm----__init__.py.md  +1 −1

@@ -1,6 +1,6 @@
 # `.\cogview3-finetune\sat\sgm\__init__.py`
 
-```
+```py
 # Import the AutoencodingEngine class from the current package
 from .models import AutoencodingEngine
 # Import utility functions for getting the config path and instantiating objects from a config

docs/cogview3-finetune/sat----sgm----models----__init__.py.md  +1 −1

@@ -1,6 +1,6 @@
 # `.\cogview3-finetune\sat\sgm\models\__init__.py`
 
-```
+```py
 # Import the AutoencodingEngine class from the same package, used for subsequent autoencoder operations
 from .autoencoder import AutoencodingEngine
