From 126a4838d9295ffcc79c87a6401bd0f230a2d878 Mon Sep 17 00:00:00 2001 From: Abosite <334481978@qq.com> Date: Wed, 5 Mar 2025 00:37:14 +0800 Subject: [PATCH 1/5] Remove all fluid API references Update the document to align with the latest API changes --- docs/api_guides/low_level/program.rst | 122 +++++++++++++++----------- 1 file changed, 69 insertions(+), 53 deletions(-) diff --git a/docs/api_guides/low_level/program.rst b/docs/api_guides/low_level/program.rst index ea2b53c8fa5..9261e2f3620 100644 --- a/docs/api_guides/low_level/program.rst +++ b/docs/api_guides/low_level/program.rst @@ -8,19 +8,13 @@ Program ================== -:code:`Fluid` 中使用类似于编程语言的抽象语法树的形式描述用户的神经网络配置,用户对计算的描述都将写入一段 Program。Fluid 中的 Program 替代了传统框架中模型的概念,通过对顺序执行、条件选择和循环执行三种执行结构的支持,做到对任意复杂模型的描述。书写 :code:`Program` 的过程非常接近于写一段通用程序,如果您已经具有一定的编程经验,会很自然地将自己的知识迁移过来。 - - -总得来说: - -* 一个模型是一个 Fluid :code:`Program` ,一个模型可以含有多于一个 :code:`Program` ; +在飞桨中,Program 是一种静态图模型,类似于其他编程语言中的程序。静态图编程采用先编译后执行的方式。需先在代码中预定义完整的神经网络结构,飞桨框架会将神经网络描述为 Program 的数据结构,并对 Program 进行编译优化,再调用执行器获得计算结果。 * :code:`Program` 由嵌套的 :code:`Block` 构成,:code:`Block` 的概念可以类比到 C++ 或是 Java 中的一对大括号,或是 Python 语言中的一个缩进块; * :code:`Block` 中的计算由顺序执行、条件选择或者循环执行三种方式组合,构成复杂的计算逻辑; -* :code:`Block` 中包含对计算和计算对象的描述。计算的描述称之为 Operator;计算作用的对象(或者说 Operator 的输入和输出)被统一为 Tensor,在 Fluid 中,Tensor 用层级为 0 的 :ref:`Lod_Tensor ` 表示。 - +* :code:`Block` 中包含对计算和计算对象的描述。计算的描述称之为 :code:`Operator`;计算作用的对象(或者说 :code:`Operator` 的输入和输出)被统一为 :code:`Tensor`。 .. _api_guide_Block: @@ -29,20 +23,19 @@ Program Block ========= -:code:`Block` 是高级语言中变量作用域的概念,在编程语言中,Block 是一对大括号,其中包含局部变量定义和一系列指令或操作符。编程语言中的控制流结构 :code:`if-else` 和 :code:`for` 在深度学习中可以被等效为: +:code:`Block` 是高级语言中变量作用域的概念,类似C语言或Java语言中的一对大括号,其中包含局部变量定义和一系列指令或操作符. 
+ +:code:`Block` 是计算图中用于表示计算逻辑的基本单元。它包含一系列操作(:code:`Operator`)和计算对象(:code:`Tensor`),支持顺序执行、条件选择和循环执行等控制结构,从而构建复杂的计算流程。 -+-----------------+--------------------+ -| 编程语言 | Fluid | -+=================+====================+ -| for, while loop | RNN,WhileOP | -+-----------------+--------------------+ -| if-else, switch | IfElseOp, SwitchOp | -+-----------------+--------------------+ -| 顺序执行 | 一系列 layers | -+-----------------+--------------------+ +:code:`Block` 的主要特点: -如上文所说,Fluid 中的 :code:`Block` 描述了一组以顺序、选择或是循环执行的 Operator 以及 Operator 操作的对象:Tensor。 +* 计算描述: :code:`Block` 内部包含多个 :code:`Operator`,每个 Operator 表示一个计算操作,如加法、卷积等。 +* 对象描述: :code:`Block` 中的计算对象统一为 :code:`Tensor`,表示多维数组或矩阵,是数据存储和传输的基本单元。 + +* 控制结构: :code:`Block` 支持顺序执行、条件选择和循环执行等控制结构,使得计算流程更加灵活和复杂。 + +在飞桨的计算图中,:code:`Block` 、:code:`Operator` 和 :code:`Tensor` 共同构成了计算流程的骨架。Block 提供了容器功能,组织和管理内部的 :code:`Operator` 和 :code:`Tensor`,从而实现高效的计算图构建和执行。 @@ -50,12 +43,9 @@ Block Operator ============= -在 Fluid 中,所有对数据的操作都由 :code:`Operator` 表示,为了便于用户使用,在 Python 端,Fluid 中的 :code:`Operator` 被一步封装入 :code:`paddle.fluid.layers` , :code:`paddle.fluid.nets` 等模块。 - -这是因为一些常见的对 Tensor 的操作可能是由更多基础操作构成,为了提高使用的便利性,框架内部对基础 Operator 进行了一些封装,包括创建 Operator 依赖可学习参数,可学习参数的初始化细节等,减少用户重复开发的成本。 +在 Paddle 中,所有对数据的操作都由 :code:`Operator` 表示 每个 :code:`Operator` 执行特定的功能,如矩阵乘法、卷积、激活函数等,通过组合这些 :code:`Operator`,可以构建复杂的计算图,实现模型的前向传播和反向传播。。 -更多内容可参考阅读 `Fluid 设计思想 <../../advanced_usage/design_idea/fluid_design_idea.html>`_ .. 
_api_guide_Variable:

@@ -63,9 +53,9 @@ Operator
 =========
 Variable
 =========

-Fluid 中的 :code:`Variable` 可以包含任何类型的值———在大多数情况下是一个 :ref:`Lod_Tensor ` 。
+Paddle 中的 :code:`Variable` 可以包含任何类型的值——在大多数情况下是一个 :ref:`Tensor ` 。

-模型中所有的可学习参数都以 :code:`Variable` 的形式保留在内存空间中,您在绝大多数情况下都不需要自己来创建网络中的可学习参数, Fluid 为几乎常见的神经网络基本计算模块都提供了封装。以最简单的全连接模型为例,调用 :code:`fluid.layers.fc` 会直接为全连接层创建连接权值( W )和偏置( bias )两个可学习参数,无需显示地调用 :code:`variable` 相关接口创建可学习参数。
+模型中所有的可学习参数都以 :code:`Variable` 的形式保留在内存空间中,您在绝大多数情况下都不需要自己来创建网络中的可学习参数, Paddle 为几乎所有常见的神经网络基本计算模块都提供了封装。以静态图中最简单的全连接模型为例,调用 :code:`paddle.static.nn.fc` 会直接为全连接层创建连接权值( W )和偏置( bias )两个可学习参数,无需显式地调用 :code:`variable` 相关接口创建可学习参数。

 .. _api_guide_Name:

@@ -73,52 +63,57 @@ Fluid 中的 :code:`Variable` 可以包含任何类型的值———在大多
 Name
 =========

-Fluid 中部分网络层里包含了 :code:`name` 参数,如 :ref:`cn_api_fluid_layers_fc` 。此 :code:`name` 一般用来作为网络层输出、权重的前缀标识,具体规则如下:
+Paddle 中部分网络层里包含了 :code:`name` 参数,如 :ref:`cn_api_static_nn_fc` 。此 :code:`name` 一般用来作为网络层输出、权重的前缀标识,具体规则如下:

-* 用于网络层输出的前缀标识。若网络层中指定了 :code:`name` 参数,Fluid 将以 ``name 值.tmp_数字`` 作为唯一标识对网络层输出进行命名;未指定 :code:`name` 参数时,则以 ``OP 名_数字.tmp_数字`` 的方式进行命名,其中的数字会自动递增,以区分同名 OP 下的不同网络层。
+* 用于网络层输出的前缀标识。若网络层中指定了 :code:`name` 参数,Paddle 将以 ``name 值.tmp_数字`` 作为唯一标识对网络层输出进行命名;未指定 :code:`name` 参数时,则以 ``OP 名_数字.tmp_数字`` 的方式进行命名,其中的数字会自动递增,以区分同名 OP 下的不同网络层。

-* 用于权重或偏置变量的前缀标识。若在网络层中通过 ``param_attr`` 和 ``bias_attr`` 创建了权重变量或偏置变量, 如 :ref:`cn_api_fluid_layers_embedding` 、 :ref:`cn_api_fluid_layers_fc` ,则 Fluid 会自动生成 ``前缀.w_数字`` 或 ``前缀.b_数字`` 的唯一标识对其进行命名,其中 ``前缀`` 为用户指定的 :code:`name` 或自动生成的 ``OP 名_数字`` 。若在 ``param_attr`` 和 ``bias_attr`` 中指定了 :code:`name` ,则用此 :code:`name` ,不再自动生成。细节请参考示例代码。
+* 用于权重或偏置变量的前缀标识。若在网络层中通过 ``param_attr`` 和 ``bias_attr`` 创建了权重变量或偏置变量, 如 :ref:`cn_api_nn_embedding` 、 :ref:`cn_api_static_nn_fc` ,则 Paddle 会自动生成 ``前缀.w_数字`` 或 ``前缀.b_数字`` 的唯一标识对其进行命名,其中 ``前缀`` 为用户指定的 :code:`name` 或自动生成的 ``OP 名_数字`` 。若在 ``param_attr`` 和 ``bias_attr`` 中指定了 :code:`name` ,则用此 :code:`name` ,不再自动生成。细节请参考示例代码。

-此外,在 :ref:`cn_api_fluid_ParamAttr` 
中,可通过指定 :code:`name` 参数实现多个网络层的权重共享。
+此外,在 :ref:`cn_api_ParamAttr` 中,可通过指定 :code:`name` 参数实现多个网络层的权重共享。

 示例代码如下:

 .. code-block:: python

-    import paddle.fluid as fluid
+    import paddle
     import numpy as np

-    x = fluid.layers.data(name='x', shape=[1], dtype='int64', lod_level=1)
-    emb = fluid.layers.embedding(input=x, size=(128, 100))  # embedding_0.w_0
-    emb = fluid.layers.Print(emb) # Tensor[embedding_0.tmp_0]
+    paddle.enable_static()
+
+    x = paddle.static.data(name='x', shape=[3, 1], dtype='int64')
+    embedding = paddle.nn.Embedding(num_embeddings=128, embedding_dim=100)
+    emb = embedding(x) # embedding_0.w_0
+    print(emb) # Tensor[embedding_0.tmp_0]

     # default name
-    fc_none = fluid.layers.fc(input=emb, size=1)  # fc_0.w_0, fc_0.b_0
-    fc_none = fluid.layers.Print(fc_none) # Tensor[fc_0.tmp_1]
+    fc = paddle.nn.Linear(in_features=100, out_features=1)
+    fc_out = fc(emb) # fc_0.w_0, fc_0.b_0
+    print(fc_out) # Tensor[fc_0.tmp_1]

-    fc_none1 = fluid.layers.fc(input=emb, size=1)  # fc_1.w_0, fc_1.b_0
-    fc_none1 = fluid.layers.Print(fc_none1) # Tensor[fc_1.tmp_1]
+    fc1 = paddle.nn.Linear(in_features=100, out_features=1)
+    fc1_out = fc1(emb) # fc_1.w_0, fc_1.b_0
+    print(fc1_out) # Tensor[fc_1.tmp_1]

     # name in ParamAttr
-    w_param_attrs = fluid.ParamAttr(name="fc_weight", learning_rate=0.5, trainable=True)
+    w_param_attrs = paddle.ParamAttr(name="fc_weight", learning_rate=0.5, trainable=True)
     print(w_param_attrs.name) # fc_weight

     # name == 'my_fc'
-    my_fc1 = fluid.layers.fc(input=emb, size=1, name='my_fc', param_attr=w_param_attrs) # fc_weight, my_fc.b_0
-    my_fc1 = fluid.layers.Print(my_fc1) # Tensor[my_fc.tmp_1]
+    my_fc = paddle.nn.Linear(in_features=100, out_features=1, name='my_fc', weight_attr=w_param_attrs)
+    my_fc_out = my_fc(emb) # fc_weight, my_fc.b_0
+    print(my_fc_out) # Tensor[my_fc.tmp_1]
+
+    my_fc2 = paddle.nn.Linear(in_features=100, out_features=1, name='my_fc', weight_attr=w_param_attrs)
+    my_fc2_out = my_fc2(emb) # fc_weight, my_fc.b_1
+    print(my_fc2_out) # Tensor[my_fc.tmp_3]
+
+    place = paddle.CPUPlace()

-    my_fc2 = fluid.layers.fc(input=emb, size=1, name='my_fc', param_attr=w_param_attrs) # fc_weight, my_fc.b_1
-    my_fc2 = fluid.layers.Print(my_fc2) # Tensor[my_fc.tmp_3]
+    exe = paddle.static.Executor(place)

-    place = fluid.CPUPlace()
-    x_data = np.array([[1],[2],[3]]).astype("int64")
-    x_lodTensor = fluid.create_lod_tensor(x_data, [[1, 2]], place)
-    exe = fluid.Executor(place)
-    exe.run(fluid.default_startup_program())
-    ret = exe.run(feed={'x': x_lodTensor}, fetch_list=[fc_none, fc_none1, my_fc1, my_fc2], return_numpy=False)
+    x_data = np.array([[1], [2], [3]]).astype("int64")
+    exe.run(paddle.static.default_startup_program())
+    ret = exe.run(feed={'x': x_data}, fetch_list=[fc_out, fc1_out, my_fc_out, my_fc2_out], return_numpy=False)
+

-上述示例中, ``fc_none`` 和 ``fc_none1`` 均未指定 :code:`name` 参数,则以 ``OP 名_数字.tmp_数字`` 分别对该 OP 输出进行命名:``fc_0.tmp_1`` 和 ``fc_1.tmp_1`` ,其中 ``fc_0`` 和 ``fc_1`` 中的数字自动递增以区分两个全连接层; ``my_fc1`` 和 ``my_fc2`` 均指定了 :code:`name` 参数,但取值相同,Fluid 以后缀 ``tmp_数字`` 进行区分,即 ``my_fc.tmp_1`` 和 ``my_fc.tmp_3`` 。
+上述示例中, ``fc`` 和 ``fc1`` 均未指定 :code:`name` 参数,则以 ``OP 名_数字.tmp_数字`` 分别对该 OP 输出进行命名:``fc_0.tmp_1`` 和 ``fc_1.tmp_1`` ,其中 ``fc_0`` 和 ``fc_1`` 中的数字自动递增以区分两个全连接层; ``my_fc`` 和 ``my_fc2`` 均指定了 :code:`name` 参数,但取值相同,Paddle 以后缀 ``tmp_数字`` 进行区分,即 ``my_fc.tmp_1`` 和 ``my_fc.tmp_3`` 。

 对于网络层中创建的变量, ``emb`` 层和 ``fc_none`` 、 ``fc_none1`` 层均默认以 ``OP 名_数字`` 为前缀对权重或偏置变量进行命名,如 ``embedding_0.w_0`` 、 ``fc_0.w_0`` 、 ``fc_0.b_0`` ,其前缀与 OP 输出的前缀一致。 ``my_fc1`` 层和 ``my_fc2`` 层则优先以 ``ParamAttr`` 中指定的 ``fc_weight`` 作为共享权重的名称。而偏置变量 ``my_fc.b_0`` 和 ``my_fc.b_1`` 则次优地以 :code:`name` 作为前缀标识。

@@ -130,16 +125,37 @@ Fluid 中部分网络层里包含了 :code:`name` 参数,如 :ref:`cn_api_flui
 ParamAttr
 =========

+ParamAttr 是用于设置模型参数(如权重和偏置)属性的配置类。通过 ParamAttr,用户可以灵活地定义参数的初始化方式、正则化策略、梯度裁剪以及模型平均等特性。
+
+示例代码如下:
+
+.. 
code-block:: python + import paddle + from paddle import ParamAttr + + # 创建一个全连接层,设置权重和偏置的属性 + fc = paddle.nn.Linear(in_features=128, out_features=64, + weight_attr=ParamAttr( + name='fc_weight', + initializer=paddle.nn.initializer.XavierUniform(), + regularizer=paddle.regularizer.L2Decay(0.0001) + ), + bias_attr=ParamAttr( + name='fc_bias', + initializer=paddle.nn.initializer.Constant(0.0) + )) + + +在上述示例中: :code:`weight_attr` 和 :code:`bias_attr` 分别设置了权重和偏置的属性。:code:`name` 指定参数的名称。:code:`initializer` 设置参数的初始化方式。:code:`regularizer` 设置参数的正则化策略。 + ========= 相关 API ========= -* 用户配置的单个神经网络叫做 :ref:`cn_api_fluid_Program` 。值得注意的是,训练神经网 +* 用户配置的单个神经网络叫做 :ref:`cn_api_Program` 。值得注意的是,训练神经网 络时,用户经常需要配置和操作多个 :code:`Program` 。比如参数初始化的 :code:`Program` , 训练用的 :code:`Program` ,测试用的 :code:`Program` 等等。 -* 用户还可以使用 :ref:`cn_api_fluid_program_guard` 配合 :code:`with` 语句,修改配置好的 :ref:`cn_api_fluid_default_startup_program` 和 :ref:`cn_api_fluid_default_main_program` 。 - -* 在 Fluid 中,Block 内部执行顺序由控制流决定,如 :ref:`cn_api_fluid_layers_IfElse` , :ref:`cn_api_fluid_layers_While`, :ref:`cn_api_fluid_layers_Switch` 等,更多内容可参考: :ref:`api_guide_control_flow` +* 用户还可以使用 :ref:`cn_api_program_guard` 配合 :code:`with` 语句,修改配置好的 :ref:`cn_api_default_startup_program` 和 :ref:`cn_api_default_main_program` 。 From bc54870fb1b25ee82d6d59c93e77b306a075dab2 Mon Sep 17 00:00:00 2001 From: Abosite <334481978@qq.com> Date: Wed, 5 Mar 2025 01:44:06 +0800 Subject: [PATCH 2/5] Remove fluid APIs and update APIs in program.rst and program_en.rst --- docs/api_guides/low_level/program.rst | 11 +- docs/api_guides/low_level/program_en.rst | 132 +++++++++++++---------- 2 files changed, 78 insertions(+), 65 deletions(-) diff --git a/docs/api_guides/low_level/program.rst b/docs/api_guides/low_level/program.rst index 9261e2f3620..759c075e15a 100644 --- a/docs/api_guides/low_level/program.rst +++ b/docs/api_guides/low_level/program.rst @@ -39,11 +39,13 @@ Block +.. 
_api_guide_Operator:
+
 =============
 Operator
 =============

-在 Paddle 中,所有对数据的操作都由 :code:`Operator` 表示 每个 :code:`Operator` 执行特定的功能,如矩阵乘法、卷积、激活函数等,通过组合这些 :code:`Operator`,可以构建复杂的计算图,实现模型的前向传播和反向传播。。
+在 Paddle 中,所有对数据的操作都由 :code:`Operator` 表示。每个 :code:`Operator` 执行特定的功能,如矩阵乘法、卷积、激活函数等,通过组合这些 :code:`Operator`,可以构建复杂的计算图,实现模型的前向传播和反向传播。



@@ -125,7 +127,7 @@ Paddle 中部分网络层里包含了 :code:`name` 参数,如 :ref:`cn_api_sta
 ParamAttr
 =========

-ParamAttr 是用于设置模型参数(如权重和偏置)属性的配置类。通过 ParamAttr,用户可以灵活地定义参数的初始化方式、正则化策略、梯度裁剪以及模型平均等特性。
+``ParamAttr`` 是用于设置模型参数(如权重和偏置)属性的配置类。通过 ``ParamAttr``,用户可以灵活地定义参数的初始化方式、正则化策略、梯度裁剪以及模型平均等特性。

 示例代码如下:

@@ -152,10 +154,7 @@ ParamAttr 是用于设置模型参数(如权重和偏置)属性的配置类
 相关 API
 =========

-* 用户配置的单个神经网络叫做 :ref:`cn_api_Program` 。值得注意的是,训练神经网
-  络时,用户经常需要配置和操作多个 :code:`Program` 。比如参数初始化的
-  :code:`Program` , 训练用的 :code:`Program` ,测试用的
-  :code:`Program` 等等。
+* 用户配置的单个神经网络叫做 :ref:`cn_api_Program` 。值得注意的是,训练神经网络时,用户经常需要配置和操作多个 :code:`Program` 。比如参数初始化的 :code:`Program` ,训练用的 :code:`Program` ,测试用的 :code:`Program` 等等。

 * 用户还可以使用 :ref:`cn_api_program_guard` 配合 :code:`with` 语句,修改配置好的 :ref:`cn_api_default_startup_program` 和 :ref:`cn_api_default_main_program` 。


diff --git a/docs/api_guides/low_level/program_en.rst b/docs/api_guides/low_level/program_en.rst
index 3a380229560..9777450274d 100644
--- a/docs/api_guides/low_level/program_en.rst
+++ b/docs/api_guides/low_level/program_en.rst
@@ -8,19 +8,21 @@ Basic Concept
 Program
 ==================

-:code:`Fluid` describes neural network configuration in the form of abstract grammar tree similar to that of a programming language, and the user's description of computation will be written into a Program. Program in Fluid replaces the concept of models in traditional frameworks. It can describe any complex model through three execution structures: sequential execution, conditional selection and loop execution. Writing :code:`Program` is very close to writing a common program. 
If you have tried programming before, you will naturally apply your expertise to it. +In PaddlePaddle, a Program is a static graph model, similar to programs in other programming languages. Static graph programming follows a "define-and-run" approach: -In brief: +* Define: The complete neural network architecture is predefined in the code. -* A model is a Fluid :code:`Program` and can contain more than one :code:`Program` ; +* Compile: PaddlePaddle represents the neural network as a Program data structure and performs compilation optimizations. -* :code:`Program` consists of nested :code:`Block` , and the concept of :code:`Block` can be analogized to a pair of braces in C++ or Java, or an indentation block in Python. +* Execute: An executor is invoked to obtain the computation results. +This approach allows for efficient execution but requires the entire network structure to be defined before running the program. -* Computing in :code:`Block` is composed of three ways: sequential execution, conditional selection or loop execution, which constitutes complex computational logic. +* A :code:`Program` consists of nested :code:`Blocks`. The concept of a :code:`Block` can be likened to a pair of curly braces ``{}`` in languages like C++ or Java, or to an indented block in Python. +* The computation in the :code:`Block` is composed of three types of execution: sequential execution, conditional selection, and loop execution, which together form a complex computational logic. -* :code:`Block` contains descriptions of computation and computational objects. The description of computation is called Operator; the object of computation (or the input and output of Operator) is unified as Tensor. In Fluid, Tensor is represented by 0-leveled `LoD-Tensor `_ . +* The :code:`Block` contains descriptions of the computation and the objects involved in the computation. 
The description of the computation is called the :code:`Operator`; the objects on which the computation acts (or the inputs and outputs of the :code:`Operator`) are unified as :code:`Tensors`. .. _api_guide_Block_en: @@ -28,43 +30,31 @@ In brief: Block ========= -:code:`Block` is the concept of variable scope in advanced languages. In programming languages, Block is a pair of braces, which contains local variable definitions and a series of instructions or operators. Control flow structures :code:`if-else` and :code:`for` in programming languages can be equivalent to the following counterparts in deep learning: +The :code:`Block` is the concept of variable scope in high-level languages, similar to a pair of curly braces in C or Java, which contain local variable definitions and a series of instructions or operators. -+----------------------+-------------------------+ -| programming languages| Fluid | -+======================+=========================+ -| for, while loop | RNN,WhileOP | -+----------------------+-------------------------+ -| if-else, switch | IfElseOp, SwitchOp | -+----------------------+-------------------------+ -| execute sequentially | a series of layers | -+----------------------+-------------------------+ +The :code:`Block` is the fundamental unit in a computation graph used to represent computational logic. It contains a series of operations (:code:`Operator`) and computational objects (:code:`Tensor`), supporting control structures such as sequential execution, conditional selection, and loop execution, thereby building complex computational flows. -As mentioned above, :code:`Block` in Fluid describes a set of Operators that include sequential execution, conditional selection or loop execution, and the operating object of Operator: Tensor. +* Computation description: The :code:`Block` contains multiple :code:`Operators` internally, with each :code:`Operator` representing a computational operation, such as addition, convolution, etc. 
+* Object description: The computational objects in the :code:`Block` are unified as :code:`Tensors`, representing multi-dimensional arrays or matrices, and are the basic units of data storage and transmission.
+* Control structures: The :code:`Block` supports control structures such as sequential execution, conditional selection, and loop execution, making the computational flow more flexible and complex.
+
+In the PaddlePaddle computation graph, :code:`Block`, :code:`Operator`, and :code:`Tensor` together form the backbone of the computational flow. The :code:`Block` provides a container function, organizing and managing the internal :code:`Operators` and :code:`Tensors`, thereby enabling efficient construction and execution of the computation graph.

 =============
 Operator
 =============

-In Fluid, all operations of data are represented by :code:`Operator` . In Python, :code:`Operator` in Fluid is encapsulated into modules like :code:`paddle.fluid.layers` , :code:`paddle.fluid.nets` .
-
-This is because some common operations on Tensor may consist of more basic operations. For simplicity, some encapsulation of the basic Operator is carried out inside the framework, including the creation of learnable parameters relied by an Operator, the initialization details of learnable parameters, and so on, so as to reduce the cost of further development.
-
-
-
-More information can be read for reference. `Fluid Design Idea <../../advanced_usage/design_idea/fluid_design_idea.html>`_
-
-.. _api_guide_Variable_en:
+In Paddle, all operations on data are represented by :code:`Operators`. Each :code:`Operator` performs a specific function, such as matrix multiplication, convolution, activation functions, etc. By combining these :code:`Operators`, complex computation graphs can be constructed to implement the forward and backward propagation of a model.
+
+.. _api_guide_Variable_en:

 =========
 Variable
 =========

-In Fluid, :code:`Variable` can contain any type of value -- in most cases a LoD-Tensor. 
+In Paddle, a :code:`Variable` can contain any type of value — most commonly a :code:`Tensor`. -All the learnable parameters in the model are kept in the memory space in form of :code:`Variable` . In most cases, you do not need to create the learnable parameters in the network by yourself. Fluid provides encapsulation for almost common basic computing modules of the neural network. Taking the simplest full connection model as an example, calling :code:`fluid.layers.fc` directly creates two learnable parameters for the full connection layer, namely, connection weight (W) and bias, without explicitly calling :code:`Variable` related interfaces to create learnable parameters. +All learnable parameters in the model are stored as :code:`Variable` objects in memory. In most cases, you don't need to manually create the learnable parameters in the network, as Paddle provides wrappers for almost all common neural network basic computation modules. For example, in the simplest fully connected model in a static graph, calling :code:`paddle.static.nn.fc` will automatically create the learnable parameters for the fully connected layer: connection weights (W) and biases (bias), without the need to explicitly call the :code:`variable` interface to create learnable parameters. .. _api_guide_Name: @@ -72,56 +62,61 @@ All the learnable parameters in the model are kept in the memory space in form o Name ========= -In Fluid, some layers contain the parameter :code:`name` , such as :ref:`api_fluid_layers_fc` . This :code:`name` is generally used as the prefix identification of output and weight in network layers. The specific rules are as follows: +In Paddle, some network layers include a :code:`name` parameter, such as in the :code:`paddle.static.nn.fc` API. This :code:`name` is generally used as a prefix identifier for the network layer's output and weights. The specific rules are as follows: -* Prefix identification for output of layers. 
If :code:`name` is specified in the layer, Fluid will name the output with ``nameValue.tmp_number`` . If the :code:`name` is not specified, ``OPName_number.tmp_number`` is automatically generated to name the layer. The numbers are automatically incremented to distinguish different network layers under the same operator. +* The prefix identifier used for the network layer output. If the :code:`name` parameter is specified in the network layer, Paddle will use the :code:`name` value followed by ``.tmp_number`` as a unique identifier for naming the network layer's output. If the :code:`name` parameter is not specified, it will use the format ``OP_name_number.tmp_number`` for naming, where the numbers will automatically increment to distinguish different network layers under the same OP name. -* Prefix identification for weight or bias variable. If the weight and bias variables are created by ``param_attr`` and ``bias_attr`` in operator, such as :ref:`api_fluid_layers_embedding` 、 :ref:`api_fluid_layers_fc` , Fluid will generate ``prefix.w_number`` or ``prefix.b_number`` as unique identifier to name them, where the ``prefix`` is :code:`name` specified by users or ``OPName_number`` generated by default. If :code:`name` is specified in ``param_attr`` and ``bias_attr`` , the :code:`name` is no longer generated automatically. Refer to the sample code for details. +* The prefix identifier used for weight or bias variables. If weight or bias variables are created in the network layer through ``param_attr`` and ``bias_attr``, such as in the :ref:`api_nn_embedding` or :ref:`api_static_nn_fc` APIs, Paddle will automatically generate a unique identifier in the format ``prefix.w_number`` or ``prefix.b_number`` for naming them, where ``prefix`` is either the user-specified :code:`name` or the automatically generated ``OP_name_number``. If a :code:`name` is specified in ``param_attr`` or ``bias_attr``, this :code:`name` will be used, and the automatic generation will not occur. 
For details, please refer to the example code.

-In addition, the weights of multiple network layers can be shared by specifying the :code:`name` parameter in :ref:`api_fluid_ParamAttr`.
+Additionally, in the :ref:`api_ParamAttr` API, you can achieve weight sharing across multiple network layers by specifying the :code:`name` parameter.

 Sample Code:

 .. code-block:: python

-    import paddle.fluid as fluid
+    import paddle
     import numpy as np

-    x = fluid.layers.data(name='x', shape=[1], dtype='int64', lod_level=1)
-    emb = fluid.layers.embedding(input=x, size=(128, 100))  # embedding_0.w_0
-    emb = fluid.layers.Print(emb) # Tensor[embedding_0.tmp_0]
+    paddle.enable_static()
+
+    x = paddle.static.data(name='x', shape=[3, 1], dtype='int64')
+    embedding = paddle.nn.Embedding(num_embeddings=128, embedding_dim=100)
+    emb = embedding(x) # embedding_0.w_0
+    print(emb) # Tensor[embedding_0.tmp_0]

     # default name
-    fc_none = fluid.layers.fc(input=emb, size=1)  # fc_0.w_0, fc_0.b_0
-    fc_none = fluid.layers.Print(fc_none) # Tensor[fc_0.tmp_1]
+    fc = paddle.nn.Linear(in_features=100, out_features=1)
+    fc_out = fc(emb) # fc_0.w_0, fc_0.b_0
+    print(fc_out) # Tensor[fc_0.tmp_1]

-    fc_none1 = fluid.layers.fc(input=emb, size=1)  # fc_1.w_0, fc_1.b_0
-    fc_none1 = fluid.layers.Print(fc_none1) # Tensor[fc_1.tmp_1]
+    fc1 = paddle.nn.Linear(in_features=100, out_features=1)
+    fc1_out = fc1(emb) # fc_1.w_0, fc_1.b_0
+    print(fc1_out) # Tensor[fc_1.tmp_1]

     # name in ParamAttr
-    w_param_attrs = fluid.ParamAttr(name="fc_weight", learning_rate=0.5, trainable=True)
+    w_param_attrs = paddle.ParamAttr(name="fc_weight", learning_rate=0.5, trainable=True)
     print(w_param_attrs.name) # fc_weight

     # name == 'my_fc'
-    my_fc1 = fluid.layers.fc(input=emb, size=1, name='my_fc', param_attr=w_param_attrs) # fc_weight, my_fc.b_0
-    my_fc1 = fluid.layers.Print(my_fc1) # Tensor[my_fc.tmp_1]
+    my_fc = paddle.nn.Linear(in_features=100, out_features=1, name='my_fc', weight_attr=w_param_attrs)
+    my_fc_out = my_fc(emb) # fc_weight, my_fc.b_0
+    print(my_fc_out) # Tensor[my_fc.tmp_1]
+
+    my_fc2 = paddle.nn.Linear(in_features=100, out_features=1, name='my_fc', weight_attr=w_param_attrs)
+    my_fc2_out = my_fc2(emb) # fc_weight, my_fc.b_1
+    print(my_fc2_out) # Tensor[my_fc.tmp_3]
+
+    place = paddle.CPUPlace()

-    my_fc2 = fluid.layers.fc(input=emb, size=1, name='my_fc', param_attr=w_param_attrs) # fc_weight, my_fc.b_1
-    my_fc2 = fluid.layers.Print(my_fc2) # Tensor[my_fc.tmp_3]
+    exe = paddle.static.Executor(place)

-    place = fluid.CPUPlace()
-    x_data = np.array([[1],[2],[3]]).astype("int64")
-    x_lodTensor = fluid.create_lod_tensor(x_data, [[1, 2]], place)
-    exe = fluid.Executor(place)
-    exe.run(fluid.default_startup_program())
-    ret = exe.run(feed={'x': x_lodTensor}, fetch_list=[fc_none, fc_none1, my_fc1, my_fc2], return_numpy=False)
+    x_data = np.array([[1], [2], [3]]).astype("int64")
+    exe.run(paddle.static.default_startup_program())
+    ret = exe.run(feed={'x': x_data}, fetch_list=[fc_out, fc1_out, my_fc_out, my_fc2_out], return_numpy=False)

-In the above example, ``fc_none`` and ``fc_none1`` are not specified :code:`name` parameter, so this two layers are named with ``fc_0.tmp_1`` and ``fc_1.tmp_1`` in the form ``OPName_number.tmp_number`` , where the numbers in ``fc_0`` and ``fc_1`` are automatically incremented to distinguish this two fully connected layers. The other two fully connected layers ``my_fc1`` and ``my_fc2`` both specify the :code:`name` parameter with same values. Fluid will distinguish the two layers by suffix ``tmp_number`` . That is ``my_fc.tmp_1`` and ``my_fc.tmp_3`` .
-Variables created in ``emb`` layer and ``fc_none`` , ``fc_none1`` are named by the ``OPName_number`` , such as ``embedding_0.w_0`` 、 ``fc_0.w_0`` 、 ``fc_0.b_0`` . And the prefix is consistent with the prefix of network layer. The ``my_fc1`` layer and ``my_fc2`` layer preferentially name the shared weight with ``fc_weight`` specified in ``ParamAttr`` . The bias variables ``my_fc.b_0`` and ``my_fc.b_1`` are identified suboptimally with :code:`name` int the operator as prefix. 
+In the above example, ``fc`` and ``fc1`` did not specify the :code:`name` parameter, so the outputs of these OPs are named using the format ``OP_name_number.tmp_number``: ``fc_0.tmp_1`` and ``fc_1.tmp_1``, where the numbers in ``fc_0`` and ``fc_1`` automatically increment to distinguish the two fully connected layers. ``my_fc`` and ``my_fc2`` both specified the :code:`name` parameter, but with the same value. Paddle differentiates them by appending ``tmp_number``, resulting in ``my_fc.tmp_1`` and ``my_fc.tmp_3``.
+
+For variables created in the network layers, the ``emb``, ``fc``, and ``fc1`` layers default to naming weight or bias variables with the prefix ``OP_name_number``, such as ``embedding_0.w_0``, ``fc_0.w_0``, and ``fc_0.b_0``, with the prefix matching the OP output. The ``my_fc`` and ``my_fc2`` layers prioritize the ``fc_weight`` specified in ``ParamAttr`` as the name for the shared weights. The bias variables ``my_fc.b_0`` and ``my_fc.b_1`` are next in priority, named with the :code:`name` prefix.

-In the above example, the ``my_fc1`` and ``my_fc2`` two fully connected layers implement the sharing of weight parameters by constructing ``ParamAttr`` and specifying the :code:`name` parameter.
+In the above example, the two fully connected layers, ``my_fc`` and ``my_fc2``, achieved weight variable sharing by constructing ``ParamAttr`` and specifying the :code:`name` parameter.

 .. _api_guide_ParamAttr:

 =========
 ParamAttr
 =========

+``ParamAttr`` is a configuration class used to set the attributes of model parameters, such as weights and biases. Through ``ParamAttr``, users can flexibly define characteristics such as parameter initialization methods, regularization strategies, gradient clipping, and model averaging.
+
+Sample Code:
+
+.. 
code-block:: python + import paddle + from paddle import ParamAttr + + # Create a fully connected layer and set the attributes for the weights and biases. + fc = paddle.nn.Linear(in_features=128, out_features=64, + weight_attr=ParamAttr( + name='fc_weight', + initializer=paddle.nn.initializer.XavierUniform(), + regularizer=paddle.regularizer.L2Decay(0.0001) + ), + bias_attr=ParamAttr( + name='fc_bias', + initializer=paddle.nn.initializer.Constant(0.0) + )) + +In the above example, ``weight_attr`` and ``bias_attr`` set the attributes for the weights and biases, respectively. The :code:`name` specifies the name of the parameter. The ``initializer`` sets the initialization method for the parameter, and the ``regularizer`` sets the regularization strategy for the parameter. + ================== Related API ================== -* A single neural network configured by the user is called :ref:`api_fluid_Program` . It is noteworthy that when training neural networks, users often need to configure and operate multiple :code:`Program` . For example, :code:`Program` for parameter initialization, :code:`Program` for training, :code:`Program` for testing, etc. - - -* Users can also use :ref:`api_fluid_program_guard` with :code:`with` to modify the configured :ref:`api_fluid_default_startup_program` and :ref:`api_fluid_default_main_program` . +* The user-configured individual neural network is called a :code:`Program`. It is important to note that during the training of a neural network, users often need to configure and operate multiple :code:`Programs`. For example, a :code:`Program` for parameter initialization, a :code:`Program` for training, and a :code:`Program` for testing, etc. -* In Fluid,the execution order in a Block is determined by control flow,such as :ref:`api_fluid_layers_IfElse` , :ref:`api_fluid_layers_While` and :ref:`api_fluid_layers_Switch` . 
For more information, please refer to: :ref:`api_guide_control_flow_en` +* Users can also use the :ref:`api_program_guard` in conjunction with the :code:`with` statement to modify the configured :ref:`api_default_startup_program` and :ref:`api_default_main_program`. From 929ef6707fc6f5e0662b3155f7510c9382f7383f Mon Sep 17 00:00:00 2001 From: Abosite <334481978@qq.com> Date: Wed, 5 Mar 2025 11:31:54 +0800 Subject: [PATCH 3/5] Modify code style. --- docs/api_guides/low_level/program.rst | 2 +- docs/api_guides/low_level/program_en.rst | 4 ++-- 2 files changed, 3 insertions(+), 3 deletions(-) diff --git a/docs/api_guides/low_level/program.rst b/docs/api_guides/low_level/program.rst index 759c075e15a..735dd2390f5 100644 --- a/docs/api_guides/low_level/program.rst +++ b/docs/api_guides/low_level/program.rst @@ -23,7 +23,7 @@ Program Block ========= -:code:`Block` 是高级语言中变量作用域的概念,类似C语言或Java语言中的一对大括号,其中包含局部变量定义和一系列指令或操作符. +:code:`Block` 是高级语言中变量作用域的概念,类似 C 语言或 Java 语言中的一对大括号,其中包含局部变量定义和一系列指令或操作符. :code:`Block` 是计算图中用于表示计算逻辑的基本单元。它包含一系列操作(:code:`Operator`)和计算对象(:code:`Tensor`),支持顺序执行、条件选择和循环执行等控制结构,从而构建复杂的计算流程。 diff --git a/docs/api_guides/low_level/program_en.rst b/docs/api_guides/low_level/program_en.rst index 9777450274d..7e7f7f9197f 100644 --- a/docs/api_guides/low_level/program_en.rst +++ b/docs/api_guides/low_level/program_en.rst @@ -16,9 +16,9 @@ In PaddlePaddle, a Program is a static graph model, similar to programs in other * Execute: An executor is invoked to obtain the computation results. -This approach allows for efficient execution but requires the entire network structure to be defined before running the program. +This approach allows for efficient execution but requires the entire network structure to be defined before running the program. -* A :code:`Program` consists of nested :code:`Blocks`. The concept of a :code:`Block` can be likened to a pair of curly braces ``{}`` in languages like C++ or Java, or to an indented block in Python. 
+* A :code:`Program` consists of nested :code:`Blocks`. The concept of a :code:`Block` can be likened to a pair of curly braces ``{}`` in languages like C++ or Java, or to an indented block in Python. * The computation in the :code:`Block` is composed of three types of execution: sequential execution, conditional selection, and loop execution, which together form a complex computational logic. From 39e39bed012819d4a93afb864e207fd87c3bd120 Mon Sep 17 00:00:00 2001 From: Abosite <334481978@qq.com> Date: Wed, 5 Mar 2025 23:31:45 +0800 Subject: [PATCH 4/5] Update related concepts --- docs/api_guides/low_level/program.rst | 95 +++++++++++++++++------- docs/api_guides/low_level/program_en.rst | 93 ++++++++++++++++------- 2 files changed, 135 insertions(+), 53 deletions(-) diff --git a/docs/api_guides/low_level/program.rst b/docs/api_guides/low_level/program.rst index 735dd2390f5..5a49195c828 100644 --- a/docs/api_guides/low_level/program.rst +++ b/docs/api_guides/low_level/program.rst @@ -4,17 +4,56 @@ 基础概念 ######### +.. 
_api_guide_IR:
+
+==================
+IR
+==================
+
+:code:`Paddle` 通过一种 IR(Intermediate Representation,中间表示形式)来表示计算图,并在此基础上借助编译器的理念、技术和工具对神经网络进行自动优化和代码生成。
+
+新 IR 通过 :code:`Operation` 、:code:`Region` 、:code:`Block` 三者的循环嵌套来表示结构化控制流。
+
+一个 Op 会包含 0 个或多个 :code:`Region` ,一个 :code:`Region` 会包含 0 个或多个 :code:`Block` ,一个 :code:`Block` 里面包含了 0 个或多个 :code:`Operation` 。三者循环嵌套,用来描述复杂的模型结构。
+
+
==================
Program
==================
-在飞桨中,Program 是一种静态图模型,类似于其他编程语言中的程序。静态图编程采用先编译后执行的方式。需先在代码中预定义完整的神经网络结构,飞桨框架会将神经网络描述为 Program 的数据结构,并对 Program 进行编译优化,再调用执行器获得计算结果。
+:code:`Program` 用来表示一个具体的模型,它包含两部分:计算图和权重。模型等价于一个有向无环图::code:`Operation` 为节点,:code:`Value` 为边。
+
+
+权重( :code:`Weight` )用来对模型的权重参数进行单独存储,:code:`Value` 、:code:`Operation` 用来对计算图进行抽象。
+
+:code:`Operation` 表示计算图中的节点。一个 :code:`Operation` 表示一个算子,它里面包含了零个或多个 :code:`Region` 。:code:`Region` 表示一个闭包,它里面包含了零个或多个 :code:`Block` 。:code:`Block` 表示一个符合 SSA 的基本块,里面包含了零个或多个 :code:`Operation` 。三者循环嵌套,可以实现任意复杂的语法结构。
+
+:code:`Value` 表示计算图中的有向边,它用来将两个 :code:`Operation` 关联起来,描述了程序中的 UD 链。
+
+:code:`Program` 中用 ``ModuleOp module_`` 存计算图,用 ParameterMap ``parameters_`` 存权重。``ModuleOp`` 类中,用 :code:`Block` 来存计算图中的内容。
+
+.. 
_api_guide_Region:
+
+=========
+Region
+=========
+
+:code:`Region` 里面包含了一个 :code:`Block` 列表,第一个 :code:`Block` (如果存在的话)称为该 :code:`Region` 的入口块。
+
+与基本块不同,:code:`Region` 存在一个最显著的约束::code:`Region` 内定义的 :code:`Value` 只能在该 :code:`Region` 内部使用,不允许在 :code:`Region` 外面使用。
+
+当控制流进入一个 :code:`Region` 时,相当于创建了一个新的子 scope;当控制流退出该 :code:`Region` 时,该子 scope 中定义的所有变量都可以回收。
+
+控制流进入 :code:`Region` 时,一定会首先进入该 :code:`Region` 的入口块。因此,:code:`Region` 的参数用入口块参数即可描述,不需要额外处理。
+
+当 :code:`Region` 的一次执行结束,控制流由子 :code:`Block` 返回到该 :code:`Region` 时,控制流会有两种去处:
-* :code:`Program` 由嵌套的 :code:`Block` 构成,:code:`Block` 的概念可以类比到 C++ 或是 Java 中的一对大括号,或是 Python 语言中的一个缩进块;
+* 进入同 Op 的某一个 :code:`Region` (可能是自己)。
+* 返回该 :code:`Region` 的父 Op,表示该 Op 的一次执行的结束。
-* :code:`Block` 中的计算由顺序执行、条件选择或者循环执行三种方式组合,构成复杂的计算逻辑;
+具体去处由该 :code:`Region` 的父 Op 的语意决定。
-* :code:`Block` 中包含对计算和计算对象的描述。计算的描述称之为 :code:`Operator`;计算作用的对象(或者说 :code:`Operator` 的输入和输出)被统一为 :code:`Tensor`。
+注:在引入控制流之前,一个 :code:`Operation` 由它的输入、输出、属性以及类型信息构成。加入控制流以后,一个 :code:`Operation` 由它的输入(OpOperand)、输出(OpResult)、属性(AttributeMap)、后继块(BlockOperand)、:code:`Region` 组成,其中后继块和 :code:`Region` 是新增的部分。

.. _api_guide_Block:
@@ -23,21 +62,32 @@ Program
Block
=========
-:code:`Block` 是高级语言中变量作用域的概念,类似 C 语言或 Java 语言中的一对大括号,其中包含局部变量定义和一系列指令或操作符.
+:code:`Block` 等价于基本块,里面包含了一个算子列表(``std::list``),用来表示该基本块的计算语意。
-:code:`Block` 是计算图中用于表示计算逻辑的基本单元。它包含一系列操作(:code:`Operator`)和计算对象(:code:`Tensor`),支持顺序执行、条件选择和循环执行等控制结构,从而构建复杂的计算流程。
+当 :code:`Block` 的最后一个算子执行结束时,根据块内最后一个算子(终止符算子)的语意,控制流会有两种去处:
-:code:`Block` 的主要特点:
+* 进入同 :code:`Region` 的另外一个 :code:`Block` ,该 :code:`Block` 一定是终止符算子的后继块。
+* 返回该 :code:`Block` 的父 :code:`Region` ,表示该 :code:`Region` 的一次执行的结束。
-* 计算描述: :code:`Block` 内部包含多个 :code:`Operator`,每个 Operator 表示一个计算操作,如加法、卷积等。
+.. 
_api_guide_Operation:

-* 对象描述: :code:`Block` 中的计算对象统一为 :code:`Tensor`,表示多维数组或矩阵,是数据存储和传输的基本单元。
+=============
+Operation
+=============

-* 控制结构: :code:`Block` 支持顺序执行、条件选择和循环执行等控制结构,使得计算流程更加灵活和复杂。
+算子( :code:`Operation` )是有向图的节点。算子信息分为四部分:输入(OpOperandImpl)、输出(OpResultImpl)、属性(Attribute)、类型信息(OpInfo)。其中,输入和输出的数量在构造的时候才能确定,而且构造完成以后,数量就不会再改变。

-在飞桨的计算图中,:code:`Block` 、:code:`Operator` 和 :code:`Tensor` 共同构成了计算流程的骨架。Block 提供了容器功能,组织和管理内部的 :code:`Operator` 和 :code:`Tensor`,从而实现高效的计算图构建和执行。
+属性(Attribute)用来描述算子的一个属性。用户可以在算子中临时存储一些运行时属性,但是运行时属性只能用来辅助计算,不允许改变计算语意。模型在导出时,默认会裁剪掉所有的运行时属性。
+算子类型信息(OpInfo)本质上是对相同类型的算子所具有的公共性质的抽象。
+.. _api_guide_Weight:
+
+=============
+Weight
+=============
+
+权重属性是一种特殊的属性,其数据量一般会非常大。:code:`Paddle` 将权重单独存储,在模型中通过权重名对权重值进行获取和保存。目前 :code:`Paddle` 所有模型的权重都是 ``Variable`` 类型。

.. _api_guide_Operator:
@@ -45,7 +95,7 @@ Block
Operator
=============
-在 Paddle 中,所有对数据的操作都由 :code:`Operator` 表示 每个 :code:`Operator` 执行特定的功能,如矩阵乘法、卷积、激活函数等,通过组合这些 :code:`Operator`,可以构建复杂的计算图,实现模型的前向传播和反向传播。
+在 :code:`Paddle` 中,所有对数据的操作都由 :code:`Operator` 表示。每个 :code:`Operator` 执行特定的功能,如矩阵乘法、卷积、激活函数等,通过组合这些 :code:`Operator`,可以构建复杂的计算图,实现模型的前向传播和反向传播。
@@ -55,9 +105,9 @@ Operator
Variable
=========
-Paddle 中的 :code:`Variable` 可以包含任何类型的值———在大多数情况下是一个 :ref:`Tensor ` 。
+:code:`Paddle` 中的 :code:`Variable` 可以包含任何类型的值——在大多数情况下是一个 :ref:`Tensor ` 。
-模型中所有的可学习参数都以 :code:`Variable` 的形式保留在内存空间中,您在绝大多数情况下都不需要自己来创建网络中的可学习参数, Paddle 为几乎常见的神经网络基本计算模块都提供了封装。以静态图中最简单的全连接模型为例,调用 :code:`paddle.static.nn.fc` 会直接为全连接层创建连接权值( W )和偏置( bias )两个可学习参数,无需显示地调用 :code:`variable` 相关接口创建可学习参数。
+模型中所有的可学习参数都以 :code:`Variable` 的形式保留在内存空间中,您在绝大多数情况下都不需要自己来创建网络中的可学习参数,:code:`Paddle` 为几乎所有常见的神经网络基本计算模块都提供了封装。以静态图中最简单的全连接模型为例,调用 :code:`paddle.static.nn.fc` 会直接为全连接层创建连接权值( W )和偏置( bias )两个可学习参数,无需显式地调用 :code:`variable` 相关接口创建可学习参数。
.. 
_api_guide_Name: @@ -65,11 +115,11 @@ Paddle 中的 :code:`Variable` 可以包含任何类型的值———在大多 Name ========= -Paddle 中部分网络层里包含了 :code:`name` 参数,如 :ref:`cn_api_static_nn_fc` 。此 :code:`name` 一般用来作为网络层输出、权重的前缀标识,具体规则如下: +:code:`Paddle` 中部分网络层里包含了 :code:`name` 参数,如 :ref:`cn_api_static_nn_fc` 。此 :code:`name` 一般用来作为网络层输出、权重的前缀标识,具体规则如下: -* 用于网络层输出的前缀标识。若网络层中指定了 :code:`name` 参数,Paddle 将以 ``name 值.tmp_数字`` 作为唯一标识对网络层输出进行命名;未指定 :code:`name` 参数时,则以 ``OP 名_数字.tmp_数字`` 的方式进行命名,其中的数字会自动递增,以区分同名 OP 下的不同网络层。 +* 用于网络层输出的前缀标识。若网络层中指定了 :code:`name` 参数,:code:`Paddle` 将以 ``name 值.tmp_数字`` 作为唯一标识对网络层输出进行命名;未指定 :code:`name` 参数时,则以 ``OP 名_数字.tmp_数字`` 的方式进行命名,其中的数字会自动递增,以区分同名 OP 下的不同网络层。 -* 用于权重或偏置变量的前缀标识。若在网络层中通过 ``param_attr`` 和 ``bias_attr`` 创建了权重变量或偏置变量, 如 :ref:`cn_api_nn_embedding` 、 :ref:`cn_api_static_nn_fc` ,则 Paddle 会自动生成 ``前缀.w_数字`` 或 ``前缀.b_数字`` 的唯一标识对其进行命名,其中 ``前缀`` 为用户指定的 :code:`name` 或自动生成的 ``OP 名_数字`` 。若在 ``param_attr`` 和 ``bias_attr`` 中指定了 :code:`name` ,则用此 :code:`name` ,不再自动生成。细节请参考示例代码。 +* 用于权重或偏置变量的前缀标识。若在网络层中通过 ``param_attr`` 和 ``bias_attr`` 创建了权重变量或偏置变量, 如 :ref:`cn_api_nn_embedding` 、 :ref:`cn_api_static_nn_fc` ,则 :code:`Paddle` 会自动生成 ``前缀.w_数字`` 或 ``前缀.b_数字`` 的唯一标识对其进行命名,其中 ``前缀`` 为用户指定的 :code:`name` 或自动生成的 ``OP 名_数字`` 。若在 ``param_attr`` 和 ``bias_attr`` 中指定了 :code:`name` ,则用此 :code:`name` ,不再自动生成。细节请参考示例代码。 此外,在 :ref:`cn_api_ParamAttr` 中,可通过指定 :code:`name` 参数实现多个网络层的权重共享。 @@ -115,7 +165,7 @@ Paddle 中部分网络层里包含了 :code:`name` 参数,如 :ref:`cn_api_sta ret = exe.run(feed={'x': x}, fetch_list=[fc_out, fc1_out, my_fc_out, my_fc2_out], return_numpy=False) -上述示例中, ``fc_none`` 和 ``fc_none1`` 均未指定 :code:`name` 参数,则以 ``OP 名_数字.tmp_数字`` 分别对该 OP 输出进行命名:``fc_0.tmp_1`` 和 ``fc_1.tmp_1`` ,其中 ``fc_0`` 和 ``fc_1`` 中的数字自动递增以区分两个全连接层; ``my_fc1`` 和 ``my_fc2`` 均指定了 :code:`name` 参数,但取值相同,Paddle 以后缀 ``tmp_数字`` 进行区分,即 ``my_fc.tmp_1`` 和 ``my_fc.tmp_3`` 。 +上述示例中, ``fc_none`` 和 ``fc_none1`` 均未指定 :code:`name` 参数,则以 ``OP 名_数字.tmp_数字`` 分别对该 OP 输出进行命名:``fc_0.tmp_1`` 和 ``fc_1.tmp_1`` ,其中 ``fc_0`` 和 ``fc_1`` 
中的数字自动递增以区分两个全连接层; ``my_fc1`` 和 ``my_fc2`` 均指定了 :code:`name` 参数,但取值相同,:code:`Paddle` 以后缀 ``tmp_数字`` 进行区分,即 ``my_fc.tmp_1`` 和 ``my_fc.tmp_3`` 。 对于网络层中创建的变量, ``emb`` 层和 ``fc_none`` 、 ``fc_none1`` 层均默认以 ``OP 名_数字`` 为前缀对权重或偏置变量进行命名,如 ``embedding_0.w_0`` 、 ``fc_0.w_0`` 、 ``fc_0.b_0`` ,其前缀与 OP 输出的前缀一致。 ``my_fc1`` 层和 ``my_fc2`` 层则优先以 ``ParamAttr`` 中指定的 ``fc_weight`` 作为共享权重的名称。而偏置变量 ``my_fc.b_0`` 和 ``my_fc.b_1`` 则次优地以 :code:`name` 作为前缀标识。 @@ -149,12 +199,3 @@ ParamAttr 在上述示例中: :code:`weight_attr` 和 :code:`bias_attr` 分别设置了权重和偏置的属性。:code:`name` 指定参数的名称。:code:`initializer` 设置参数的初始化方式。:code:`regularizer` 设置参数的正则化策略。 - -========= -相关 API -========= - -* 用户配置的单个神经网络叫做 :ref:`cn_api_Program` 。值得注意的是,训练神经网络时,用户经常需要配置和操作多个 :code:`Program` 。比如参数初始化的:code:`Program` , 训练用的 :code:`Program` ,测试用的:code:`Program` 等等。 - - -* 用户还可以使用 :ref:`cn_api_program_guard` 配合 :code:`with` 语句,修改配置好的 :ref:`cn_api_default_startup_program` 和 :ref:`cn_api_default_main_program` 。 diff --git a/docs/api_guides/low_level/program_en.rst b/docs/api_guides/low_level/program_en.rst index 7e7f7f9197f..f22e85282c3 100644 --- a/docs/api_guides/low_level/program_en.rst +++ b/docs/api_guides/low_level/program_en.rst @@ -4,25 +4,60 @@ Basic Concept ############### +================== +IR +================== + +:code:`Paddle` represents the computation graph using an IR (Intermediate Representation) and leverages compiler principles, techniques, and tools to perform automatic optimization and code generation for neural networks. + +The new IR represents structured control flow through a recursive nesting of :code:`Operation`, :code:`Region`, and :code:`Block`. + +* An :code:`Operation` contains zero or more :code:`Regions`. +* A :code:`Region` contains zero or more :code:`Blocks`. +* A :code:`Block` contains zero or more :code:`Operations`. + +These three components are recursively nested to describe complex model structures. 
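+The recursive Operation/Region/Block containment described above can be illustrated with a small self-contained sketch (plain Python; these toy classes are illustrative only and are not Paddle's actual IR API):

```python
from dataclasses import dataclass, field
from typing import List

# Toy model of the nesting: an Operation owns Regions, a Region owns
# Blocks, and a Block owns Operations (illustrative names, not Paddle's).
@dataclass
class Region:
    blocks: List["Block"] = field(default_factory=list)

@dataclass
class Block:
    operations: List["Operation"] = field(default_factory=list)

@dataclass
class Operation:
    name: str
    regions: List[Region] = field(default_factory=list)

def count_ops(op: Operation) -> int:
    # Count this op plus every op nested anywhere inside its regions.
    return 1 + sum(
        count_ops(inner)
        for region in op.regions
        for block in region.blocks
        for inner in block.operations
    )

# A while-like op: one region holding one block with two inner ops.
body = Block(operations=[Operation("pd_op.add"), Operation("pd_op.less_than")])
loop = Operation("pd_op.while", regions=[Region(blocks=[body])])
print(count_ops(loop))  # 3
```

+The same recursion is what lets a single top-level op describe an arbitrarily deep control-flow structure.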
+ ================== Program ================== -In PaddlePaddle, a Program is a static graph model, similar to programs in other programming languages. Static graph programming follows a "define-and-run" approach: +A :code:`Program` represents a specific model. It consists of two parts: the computation graph and the weights. The model is equivalent to a directed acyclic graph (DAG), where :code:`Operation` serves as the nodes and :code:`Value` represents the edges. + +:code:`Weight` is used to store the model's weight parameters separately, while :code:`Value` and :code:`Operation` abstract the computation graph. + +:code:`Operation` represents a node in the computation graph. Each :code:`Operation` corresponds to an operator and contains zero or more :code:`Regions`. + +:code:`Region` acts as a closure and contains zero or more :code:`Blocks`. + +:code:`Block` represents a basic block conforming to SSA (Static Single Assignment) form and contains zero or more :code:`Operations`. + +:code:`Value` represents a directed edge in the computation graph, linking two :code:`Operations` and describing the UD (Use-Define) chain in the program. + +In a :code:`Program`, ``ModuleOp module_`` stores the computation graph, while the ``ParameterMap parameters_`` stores the weights. In the ``ModuleOp`` class, a :code:`Block` is used to store the contents of the computation graph. + +.. _api_guide_Region_en: + +========= +Region +========= + +A :code:`Region` contains a list of :code:`Blocks`. The first :code:`Block` (if it exists) is referred to as the entry block of that :code:`Region`. -* Define: The complete neural network architecture is predefined in the code. +Unlike basic blocks, a key constraint of a :code:`Region` is that any :code:`Value` defined within the :code:`Region` can only be used inside that :code:`Region` and cannot be accessed externally. -* Compile: PaddlePaddle represents the neural network as a Program data structure and performs compilation optimizations. 
+When control flow enters a :code:`Region`, it effectively creates a new sub-scope. Upon exiting the :code:`Region`, all variables defined within this sub-scope can be reclaimed. -* Execute: An executor is invoked to obtain the computation results. +Control flow always enters a :code:`Region` through its entry block. Therefore, the parameters of a :code:`Region` can be described using the entry block's parameters without additional handling. -This approach allows for efficient execution but requires the entire network structure to be defined before running the program. +Once a :code:`Region` completes its execution and control flow returns from a child :code:`Block` to the :code:`Region`, there are two possible outcomes: -* A :code:`Program` consists of nested :code:`Blocks`. The concept of a :code:`Block` can be likened to a pair of curly braces ``{}`` in languages like C++ or Java, or to an indented block in Python. +* The control flow enters another :code:`Region` of the same Op (which may be itself). +* The control flow returns to the parent Op of the :code:`Region`, marking the completion of one execution cycle of that Op. -* The computation in the :code:`Block` is composed of three types of execution: sequential execution, conditional selection, and loop execution, which together form a complex computational logic. +The specific destination is determined by the semantics of the parent Op of the :code:`Region`. -* The :code:`Block` contains descriptions of the computation and the objects involved in the computation. The description of the computation is called the :code:`Operator`; the objects on which the computation acts (or the inputs and outputs of the :code:`Operator`) are unified as :code:`Tensors`. +Note: Before introducing control flow, an :code:`Operation` consists of its inputs, outputs, attributes, and type information. 
After incorporating control flow, an :code:`Operation` additionally includes its inputs (:code:`OpOperand`), outputs (:code:`OpResult`), attributes (:code:`AttributeMap`), successor blocks (:code:`BlockOperand`), and :code:`Region`. The successor blocks and :code:`Region` are newly added components. .. _api_guide_Block_en: @@ -30,17 +65,33 @@ This approach allows for efficient execution but requires the entire network str Block ========= -The :code:`Block` is the concept of variable scope in high-level languages, similar to a pair of curly braces in C or Java, which contain local variable definitions and a series of instructions or operators. +A :code:`Block` is equivalent to a basic block and contains a list of operators (``std::list``) that represent the computation semantics of the basic block. -The :code:`Block` is the fundamental unit in a computation graph used to represent computational logic. It contains a series of operations (:code:`Operator`) and computational objects (:code:`Tensor`), supporting control structures such as sequential execution, conditional selection, and loop execution, thereby building complex computational flows. +When the last operator in a :code:`Block` finishes execution, the control flow follows one of two paths based on the semantics of the last operator (terminator operator) in the block: -* Computation description: The :code:`Block` contains multiple :code:`Operators` internally, with each :code:`Operator` representing a computational operation, such as addition, convolution, etc. +* It transitions to another :code:`Block` within the same :code:`Region`. This :code:`Block` must be a successor block of the terminator operator. +* It returns to the parent :code:`Region` of the :code:`Block`, indicating the completion of one execution cycle of that :code:`Region`. 
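+The terminator-driven control transfer above can be sketched as a toy interpreter (plain Python; the class and op names are illustrative assumptions, not Paddle's executor):

```python
# Toy sketch of terminator semantics: each block either names a successor
# block in the same region or, with successor=None, returns to the parent
# Region, finishing one execution of that region.
class Block:
    def __init__(self, name, ops, successor=None):
        self.name = name
        self.ops = ops              # payload op names (terminator excluded)
        self.successor = successor  # a Block in the same region, or None

def run_region(entry_block):
    """Run blocks starting from the entry block until control returns
    to the parent region (modeled here as reaching successor=None)."""
    executed = []
    block = entry_block
    while block is not None:
        executed.extend(block.ops)
        block = block.successor
    return executed

exit_blk = Block("exit", ["pd_op.matmul"])
entry = Block("entry", ["pd_op.conv2d", "pd_op.relu"], successor=exit_blk)
print(run_region(entry))  # ['pd_op.conv2d', 'pd_op.relu', 'pd_op.matmul']
```

+In the real IR a terminator may choose among several successor blocks at runtime; the single `successor` field here is a deliberate simplification.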
-* Object description: The computational objects in the :code:`Block` are unified as :code:`Tensors`, representing multi-dimensional arrays or matrices, and are the basic units of data storage and transmission. +.. _api_guide_Operation_en: -* Control structures: The :code:`Block` supports control structures such as sequential execution, conditional selection, and loop execution, making the computational flow more flexible and complex. +============= +Operation +============= + +An :code:`Operation` is a node in a directed graph. The information of an :code:`Operation` is divided into four parts: inputs (:code:`OpOperandImpl`), outputs (:code:`OpResultImpl`), attributes (:code:`Attribute`), and type information (:code:`OpInfo`). The number of inputs and outputs is determined at the time of construction and remains unchanged afterward. + +:code:`Attribute` is used to describe an attribute. Users can temporarily store some runtime attributes within an operator, but these runtime attributes are only for assisting computation and are not allowed to alter the computation semantics. When exporting a model, all runtime attributes are removed by default. + +:code:`Operation` type information (:code:`OpInfo`) is essentially an abstraction of the common properties shared by operators of the same type. -In the PaddlePaddle computation graph, :code:`Block`, :code:`Operator`, and :code:`Tensor` together form the backbone of the computational flow. The :code:`Block` provides a container function, organizing and managing the internal :code:`Operators` and :code:`Tensors`, thereby enabling efficient construction and execution of the computation graph. + +.. _api_guide_Weight_en: + +============= +Weight +============= + +:code:`Weight` attributes are a special type of attribute, typically involving a large amount of data. :code:`Paddle` stores weights separately and retrieves or saves weight values in the model using weight names. 
Currently, all model weights in :code:`Paddle` are of type ``Variable``. ============= Operator @@ -56,7 +107,7 @@ In Paddle, a :code:`Variable` can contain any type of value — most commonly a All learnable parameters in the model are stored as :code:`Variable` objects in memory. In most cases, you don't need to manually create the learnable parameters in the network, as Paddle provides wrappers for almost all common neural network basic computation modules. For example, in the simplest fully connected model in a static graph, calling :code:`paddle.static.nn.fc` will automatically create the learnable parameters for the fully connected layer: connection weights (W) and biases (bias), without the need to explicitly call the :code:`variable` interface to create learnable parameters. -.. _api_guide_Name: +.. _api_guide_Name_en: ========= Name @@ -118,7 +169,7 @@ For variables created in the network layers, the ``emb`` layer, ``fc_none``, and In the above example, the two fully connected layers, ``my_fc1`` and ``my_fc2``, achieved weight variable sharing by constructing ``ParamAttr`` and specifying the :code:`name` parameter. -.. _api_guide_ParamAttr: +.. _api_guide_ParamAttr_en: ========= ParamAttr @@ -145,13 +196,3 @@ Sample Code: )) In the above example, ``weight_attr`` and ``bias_attr`` set the attributes for the weights and biases, respectively. The :code:`name` specifies the name of the parameter. The ``initializer`` sets the initialization method for the parameter, and the ``regularizer`` sets the regularization strategy for the parameter. - -================== -Related API -================== - - -* The user-configured individual neural network is called a :code:`Program`. It is important to note that during the training of a neural network, users often need to configure and operate multiple :code:`Programs`. For example, a :code:`Program` for parameter initialization, a :code:`Program` for training, and a :code:`Program` for testing, etc. 
- - -* Users can also use the :ref:`api_program_guard` in conjunction with the :code:`with` statement to modify the configured :ref:`api_default_startup_program` and :ref:`api_default_main_program`. From 5f650f586e87bf8d6e3e45b7fb7391b861efbda1 Mon Sep 17 00:00:00 2001 From: Abosite <334481978@qq.com> Date: Thu, 6 Mar 2025 00:01:00 +0800 Subject: [PATCH 5/5] delete all the files in the layers document --- .../low_level/layers/activations.rst | 28 --- .../low_level/layers/activations_en.rst | 49 ----- .../low_level/layers/control_flow.rst | 58 ------ .../low_level/layers/control_flow_en.rst | 59 ------ docs/api_guides/low_level/layers/conv.rst | 65 ------ docs/api_guides/low_level/layers/conv_en.rst | 58 ------ .../low_level/layers/data_feeder.rst | 44 ---- .../low_level/layers/data_feeder_en.rst | 41 ---- .../low_level/layers/data_in_out.rst | 32 --- .../low_level/layers/data_in_out_en.rst | 27 --- .../api_guides/low_level/layers/detection.rst | 101 --------- .../low_level/layers/detection_en.rst | 62 ------ docs/api_guides/low_level/layers/index.rst | 20 -- docs/api_guides/low_level/layers/index_en.rst | 20 -- .../layers/learning_rate_scheduler.rst | 69 ------- .../layers/learning_rate_scheduler_en.rst | 50 ----- .../low_level/layers/loss_function.rst | 60 ------ .../low_level/layers/loss_function_en.rst | 61 ------ docs/api_guides/low_level/layers/math.rst | 193 ------------------ docs/api_guides/low_level/layers/math_en.rst | 193 ------------------ docs/api_guides/low_level/layers/pooling.rst | 80 -------- .../low_level/layers/pooling_en.rst | 80 -------- docs/api_guides/low_level/layers/sequence.rst | 111 ---------- .../low_level/layers/sequence_en.rst | 110 ---------- .../low_level/layers/sparse_update.rst | 45 ---- .../low_level/layers/sparse_update_en.rst | 45 ---- docs/api_guides/low_level/layers/tensor.rst | 141 ------------- .../api_guides/low_level/layers/tensor_en.rst | 141 ------------- 28 files changed, 2043 deletions(-) delete mode 100644 
docs/api_guides/low_level/layers/activations.rst delete mode 100644 docs/api_guides/low_level/layers/activations_en.rst delete mode 100644 docs/api_guides/low_level/layers/control_flow.rst delete mode 100755 docs/api_guides/low_level/layers/control_flow_en.rst delete mode 100644 docs/api_guides/low_level/layers/conv.rst delete mode 100755 docs/api_guides/low_level/layers/conv_en.rst delete mode 100644 docs/api_guides/low_level/layers/data_feeder.rst delete mode 100755 docs/api_guides/low_level/layers/data_feeder_en.rst delete mode 100644 docs/api_guides/low_level/layers/data_in_out.rst delete mode 100755 docs/api_guides/low_level/layers/data_in_out_en.rst delete mode 100644 docs/api_guides/low_level/layers/detection.rst delete mode 100755 docs/api_guides/low_level/layers/detection_en.rst delete mode 100644 docs/api_guides/low_level/layers/index.rst delete mode 100644 docs/api_guides/low_level/layers/index_en.rst delete mode 100644 docs/api_guides/low_level/layers/learning_rate_scheduler.rst delete mode 100755 docs/api_guides/low_level/layers/learning_rate_scheduler_en.rst delete mode 100644 docs/api_guides/low_level/layers/loss_function.rst delete mode 100755 docs/api_guides/low_level/layers/loss_function_en.rst delete mode 100644 docs/api_guides/low_level/layers/math.rst delete mode 100644 docs/api_guides/low_level/layers/math_en.rst delete mode 100644 docs/api_guides/low_level/layers/pooling.rst delete mode 100755 docs/api_guides/low_level/layers/pooling_en.rst delete mode 100644 docs/api_guides/low_level/layers/sequence.rst delete mode 100644 docs/api_guides/low_level/layers/sequence_en.rst delete mode 100644 docs/api_guides/low_level/layers/sparse_update.rst delete mode 100755 docs/api_guides/low_level/layers/sparse_update_en.rst delete mode 100644 docs/api_guides/low_level/layers/tensor.rst delete mode 100755 docs/api_guides/low_level/layers/tensor_en.rst diff --git a/docs/api_guides/low_level/layers/activations.rst 
b/docs/api_guides/low_level/layers/activations.rst deleted file mode 100644 index 5fecf03707d..00000000000 --- a/docs/api_guides/low_level/layers/activations.rst +++ /dev/null @@ -1,28 +0,0 @@ -.. _api_guide_activations: - -#### -激活函数 -#### - -激活函数将非线性的特性引入到神经网络当中。 - -PaddlePaddle Fluid 对大部分的激活函数进行了支持,其中有: - -:ref:`cn_api_fluid_layers_relu`, :ref:`cn_api_fluid_layers_tanh`, :ref:`cn_api_fluid_layers_sigmoid`, :ref:`cn_api_fluid_layers_elu`, :ref:`cn_api_fluid_layers_relu6`, :ref:`cn_api_fluid_layers_pow`, :ref:`cn_api_fluid_layers_stanh`, :ref:`cn_api_fluid_layers_hard_sigmoid`, :ref:`cn_api_fluid_layers_swish`, :ref:`cn_api_fluid_layers_prelu`, :ref:`cn_api_fluid_layers_brelu`, :ref:`cn_api_fluid_layers_leaky_relu`, :ref:`cn_api_fluid_layers_soft_relu`, :ref:`cn_api_fluid_layers_thresholded_relu`, :ref:`cn_api_fluid_layers_maxout`, :ref:`cn_api_fluid_layers_logsigmoid`, :ref:`cn_api_fluid_layers_hard_shrink`, :ref:`cn_api_fluid_layers_softsign`, :ref:`cn_api_fluid_layers_softplus`, :ref:`cn_api_fluid_layers_tanh_shrink`, :ref:`cn_api_fluid_layers_softshrink`, :ref:`cn_api_fluid_layers_exp`。 - - -**Fluid 提供了两种使用激活函数的方式:** - -- 如果一个层的接口提供了 :code:`act` 变量(默认值为 None),我们可以通过该变量指定该层的激活函数类型。该方式支持常见的激活函数: :code:`relu`, :code:`tanh`, :code:`sigmoid`, :code:`identity`。 - -.. code-block:: python - - conv2d = fluid.layers.conv2d(input=data, num_filters=2, filter_size=3, act="relu") - - -- Fluid 为每个 Activation 提供了接口,我们可以显式的对它们进行调用。 - -.. code-block:: python - - conv2d = fluid.layers.conv2d(input=data, num_filters=2, filter_size=3) - relu1 = fluid.layers.relu(conv2d) diff --git a/docs/api_guides/low_level/layers/activations_en.rst b/docs/api_guides/low_level/layers/activations_en.rst deleted file mode 100644 index 53829ae5696..00000000000 --- a/docs/api_guides/low_level/layers/activations_en.rst +++ /dev/null @@ -1,49 +0,0 @@ -.. 
_api_guide_activations_en: - -################### -Activation Function -################### - -The activation function incorporates non-linearity properties into the neural network. - -PaddlePaddle Fluid supports most of the activation functions, including: - -:ref:`api_fluid_layers_relu`, -:ref:`api_fluid_layers_tanh`, -:ref:`api_fluid_layers_sigmoid`, -:ref:`api_fluid_layers_elu`, -:ref:`api_fluid_layers_relu6`, -:ref:`api_fluid_layers_pow`, -:ref:`api_fluid_layers_stanh`, -:ref:`api_fluid_layers_hard_sigmoid`, -:ref:`api_fluid_layers_swish`, -:ref:`api_fluid_layers_prelu`, -:ref:`api_fluid_layers_brelu`, -:ref:`api_fluid_layers_leaky_relu`, -:ref:`api_fluid_layers_soft_relu`, -:ref:`api_fluid_layers_thresholded_relu`, -:ref:`api_fluid_layers_maxout`, -:ref:`api_fluid_layers_logsigmoid`, -:ref:`api_fluid_layers_hard_shrink`, -:ref:`api_fluid_layers_softsign`, -:ref:`api_fluid_layers_softplus`, -:ref:`api_fluid_layers_tanh_shrink`, -:ref:`api_fluid_layers_softshrink`, -:ref:`api_fluid_layers_exp`. - - -**Fluid provides two ways to use the activation function:** - -- If a layer interface provides :code:`act` variables (default None), we can specify the type of layer activation function through this parameter. This mode supports common activation functions :code:`relu`, :code:`tanh`, :code:`sigmoid`, :code:`identity`. - -.. code-block:: python - - conv2d = fluid.layers.conv2d(input=data, num_filters=2, filter_size=3, act="relu") - - -- Fluid provides an interface for each Activation, and we can explicitly call it. - -.. code-block:: python - - conv2d = fluid.layers.conv2d(input=data, num_filters=2, filter_size=3) - relu1 = fluid.layers.relu(conv2d) diff --git a/docs/api_guides/low_level/layers/control_flow.rst b/docs/api_guides/low_level/layers/control_flow.rst deleted file mode 100644 index 9fb350b6088..00000000000 --- a/docs/api_guides/low_level/layers/control_flow.rst +++ /dev/null @@ -1,58 +0,0 @@ -.. 
_api_guide_control_flow: - -###### -控制流 -###### - -在程序语言中,控制流(control flow)决定了语句的执行顺序,常见的控制流包括顺序执行、分支和循环等。PaddlePaddle Fluid 继承了这一概念,提供了多种控制流 API, 以控制深度学习模型在训练或者预测过程中的执行逻辑。 - -IfElse -====== - -条件分支,允许对同一个 batch 的输入,根据给定的条件,分别选择 :code:`true_block` 或 :code:`false_block` 中的逻辑进行执行,执行完成之后再将两个分支的输出合并为同一个输出。通常,条件表达式可由 :ref:`cn_api_fluid_layers_less_than`, :ref:`cn_api_fluid_layers_equal` 等逻辑比较 API 产生。 - -请参考 :ref:`cn_api_fluid_layers_IfElse` - -**注意:** 强烈建议您使用新的 OP :ref:`cn_api_fluid_layers_cond` 而不是 ``IfElse``。:ref:`cn_api_fluid_layers_cond` 的使用方式更简单,并且调用该 OP 所用的代码更少且功能与 ``IfElse`` 一样。 - -Switch -====== - -多分支选择结构,如同程序语言中常见的 :code:`switch-case` 声明, 其根据输入表达式的取值不同,选择不同的分支执行。具体来说,Fluid 所定义的 :code:`Switch` 控制流有如下特性: - -* case 的条件是个 bool 类型的值,即在 Program 中是一个张量类型的 Variable; -* 依次检查逐个 case,选择第一个满足条件的 case 执行,完成执行后即退出所属的 block; -* 如果所有 case 均不满足条件,会选择默认的 case 进行执行。 - -请参考 :ref:`cn_api_fluid_layers_Switch` - -**注意:** 强烈建议您使用新的 OP :ref:`cn_api_fluid_layers_case` 而不是 ``Switch``。 :ref:`cn_api_fluid_layers_case` 的使用方式更简单,并且调用该 OP 所用的代码更少且功能与 ``Switch`` 一样。 - -While -===== - -While 循环,当条件判断为真时,循环执行 :code:`While` 控制流所属 :code:`block` 内的逻辑,条件判断为假时退出循环。与之相关的 API 有 - -* :ref:`cn_api_fluid_layers_increment` :累加 API,通常用于对循环次数进行计数; -* :ref:`cn_api_fluid_layers_array_read` :从 :code:`DENSE_TENSOR_ARRAY` 中指定的位置读入 Variable,进行计算; -* :ref:`cn_api_fluid_layers_array_write` :将 Variable 写回到 :code:`DENSE_TENSOR_ARRAY` 指定的位置,存储计算结果。 - -请参考 :ref:`cn_api_fluid_layers_While` - -**注意:** 强烈建议您使用新的 OP :ref:`cn_api_fluid_layers_while_loop` 而不是 ``While``。 :ref:`cn_api_fluid_layers_while_loop` 的使用方式更简单,并且调用该 OP 所用的代码更少且功能与 ``While`` 一样。 - -DynamicRNN -========== - -即动态 RNN,可处理一个 batch 不等长的序列数据,其接受 :code:`lod_level=1` 的 Variable 作为输入,在 :code:`DynamicRNN` 的 :code:`block` 内,用户需自定义 RNN 的单步计算逻辑。在每一个时间步,用户可将需记忆的状态写入到 :code:`DynamicRNN` 的 :code:`memory` 中,并将需要的输出写出到其 :code:`output` 中。 - -:ref:`cn_api_fluid_layers_sequence_last_step` 可获取 :code:`DynamicRNN` 最后一个时间步的输出。 - -请参考 :ref:`cn_api_fluid_layers_DynamicRNN` - 
-StaticRNN -========= - -即静态 RNN,只能处理固定长度的序列数据,接受 :code:`lod_level=0` 的 Variable 作为输入。与 :code:`DynamicRNN` 类似,在 RNN 的每单个时间步,用户需自定义计算逻辑,并可将状态和输出写出。 - -请参考 :ref:`cn_api_fluid_layers_StaticRNN` diff --git a/docs/api_guides/low_level/layers/control_flow_en.rst b/docs/api_guides/low_level/layers/control_flow_en.rst deleted file mode 100755 index 1eec6e11857..00000000000 --- a/docs/api_guides/low_level/layers/control_flow_en.rst +++ /dev/null @@ -1,59 +0,0 @@ -.. api_guide_control_flow_en: - -############# -Control Flow -############# - -In programming languages, the control flow determines the order in which statements are executed. Common control flows contain sequential execution, branching, and looping. PaddlePaddle Fluid inherits this concept and provides a variety of control flow APIs to control the execution logic of the deep learning model during training or prediction. - -IfElse -====== - -Conditional branch, for the input of a batch, according to the given conditions, select the process in :code:`true_block` or :code:`false_block` to execute respectively, and then merge the outputs of the two branches into one after the execution. In general, conditional expressions can be generated by a logical comparison API such as :ref:`api_fluid_layers_less_than`, :ref:`api_fluid_layers_equal`. - -Please refer to :ref:`api_fluid_layers_IfElse` - -**Note:** A new OP :ref:`api_fluid_layers_cond` is highly recommended instead of ``IfElse`` . OP :ref:`api_fluid_layers_cond` is easier to use and is called with less code but does the same thing as ``IfElse`` . - -Switch -====== - -Switch, like the :code:`switch-case` declaration commonly found in programming languages, selects different branch to execute depending on the value of the input expression. 
Specifically, the :code:`Switch` control flow defined by Fluid has the following characteristics: - -* The condition of the case is a bool type value, which is a tensor type Variable in the Program; -* It checks each case one by one, selects the first case that satisfies the condition, and exits the block after completion of the execution; -* If all cases do not meet the conditions, the default case will be selected for execution. - -Please refer to :ref:`api_fluid_layers_Switch` - -**Note:** A new OP :ref:`api_fluid_layers_case` is highly recommended instead of ``Switch`` . OP :ref:`api_fluid_layers_case` is easier to use and is called with less code but does the same thing as ``Switch`` . - -While -===== - -When the condition is true, repeatedly execute logic in the :code:`block` which :code:`While` flow belongs to until the condition is judged to be false and the loop will be ended. The related APIs are as follows: - -* :ref:`api_fluid_layers_increment` : It is usually used to count the number of loops; -* :ref:`api_fluid_layers_array_read` : Reads Variable from the specified location in :code:`DENSE_TENSOR_ARRAY` to perform calculations; -* :ref:`api_fluid_layers_array_write` : Writes the Variable back to the specified location in :code:`DENSE_TENSOR_ARRAY` and stores the result of the calculation. - -Please refer to :ref:`api_fluid_layers_While` - -**Note**: A new OP :ref:`api_fluid_layers_while_loop` is highly recommended instead of ``While`` . OP :ref:`api_fluid_layers_while_loop` is easier to use and is called with less code but does the same thing as ``While`` . - - -DynamicRNN -========== - -Dynamic RNN can process a batch of unequal(variable)-length sequence data, which accepts the variable with :code:`lod_level=1` as input. In the :code:`block` of :code:`DynamicRNN`, the user needs to customize RNN's single-step calculation logic. 
At each time step, the user can write the state to be remembered to the :code:`memory` of :code:`DynamicRNN` and write the required output to its :code:`output`. - -:ref:`api_fluid_layers_sequence_last_step` gets the output of the last time step of :code:`DynamicRNN`. - -Please refer to :ref:`api_fluid_layers_DynamicRNN` - -StaticRNN -========= - -Static RNN can only process fixed-length sequence data, and accept Variable with :code:`lod_level=0` as input. Similar to :code:`DynamicRNN`, at each single time step of the RNN, the user needs to customize the calculation logic and export the status and output. - -Please refer to :ref:`api_fluid_layers_StaticRNN` diff --git a/docs/api_guides/low_level/layers/conv.rst b/docs/api_guides/low_level/layers/conv.rst deleted file mode 100644 index 7a4a4b08b61..00000000000 --- a/docs/api_guides/low_level/layers/conv.rst +++ /dev/null @@ -1,65 +0,0 @@ -.. _api_guide_conv: - -##### -卷积 -##### - -卷积有两组输入:特征图和卷积核,依据输入特征和卷积核的形状、Layout 不同、计算方式的不同,在 Fluid 里,有针对变长序列特征的一维卷积,有针对定长图像特征的二维(2D Conv)、三维卷积(3D Conv),同时也有卷积计算的逆向过程,下面先介绍 Fluid 里的 2D/3D 卷积,再来介绍序列卷积。 - - -2D/3D 卷积 -============== - -1. 
卷积输入参数: ---------------------- - -卷积需要依据滑动步长(stride)、填充长度(padding)、卷积核窗口大小(filter size)、分组数(groups)、扩张系数(dilation rate)来决定如何计算。groups 最早在 `AlexNet `_ 中引入, 可以理解为将原始的卷积分为独立若干组卷积计算。 - - **注意**: 同 cuDNN 的方式,Fluid 目前只支持在特征图上下填充相同的长度,左右也是。 - -- 输入输出 Layout: - - 2D 卷积输入特征的 Layout 为[N, C, H, W]或[N, H, W, C], N 即 batch size,C 是通道数,H、W 是特征的高度和宽度,输出特征和输入特征的 Layout 一致。(相应的 3D 卷积输入特征的 Layout 为[N, C, D, H, W]或[N, D, H, W, C],但 **注意**,Fluid 的卷积当前只支持[N, C, H, W],[N, C, D, H, W]。) - -- 卷积核的 Layout: - - Fluid 中 2D 卷积的卷积核(也称权重)的 Layout 为[C_o, C_in / groups, f_h, f_w],C_o、C_in 表示输出、输入通道数,f_h、f_w 表示卷积核窗口的高度和宽度,按行序存储。(相应的 3D 卷积的卷积核 Layout 为[C_o, C_in / groups, f_d, f_h, d_w],同样按行序存储。) - -- 深度可分离卷积(depthwise separable convolution): - - 在深度可分离卷积中包括 depthwise convolution 和 pointwise convolution 两组,这两个卷积的接口和上述普通卷积接口相同。前者可以通过给普通卷积设置 groups 来做,后者通过设置卷积核 filters 的大小为 1x1,深度可分离卷积减少参数的同时减少了计算量。 - - 对于 depthwise convolution,可以设置 groups 等于输入通道数,此时,2D 卷积的卷积核形状为[C_o, 1, f_h, f_w]。 - 对于 pointwise convolution,卷积核的形状为[C_o, C_in, 1, 1]。 - - **注意**:Fluid 针对 depthwise convolution 的 GPU 计算做了高度优化,您可以通过在 - :code:`fluid.layers.conv2d` 接口设置 :code:`use_cudnn=False` 来使用 Fluid 自身优化的 CUDA 程序。 - -- 空洞卷积(dilated convolution): - - 空洞卷积相比普通卷积而言,卷积核在特征图上取值时不在连续,而是间隔的,这个间隔数称作 dilation,等于 1 时,即为普通卷积,空洞卷积相比普通卷积的感受野更大。 - -- API 汇总: - - :ref:`cn_api_fluid_layers_conv2d` - - :ref:`cn_api_fluid_layers_conv3d` - - :ref:`cn_api_fluid_layers_conv2d_transpose` - - :ref:`cn_api_fluid_layers_conv3d_transpose` - - -1D 序列卷积 -============== - -Fluid 可以表示变长的序列结构,这里的变长是指不同样本的时间步(step)数不一样,通常是一个 2D 的 Tensor 和一个能够区分的样本长度的辅助结构来表示。假定,2D 的 Tensor 的形状是 shape,shape[0]是所有样本的总时间步数,shape[1]是序列特征的大小。 - -基于此数据结构的卷积在 Fluid 里称作序列卷积,也表示一维卷积。同图像卷积,序列卷积的输入参数有卷积核大小、填充大小、滑动步长,但与 2D 卷积不同的是,这些参数个数都为 1。**注意**,目前仅支持 stride 为 1 的情况,输出序列的时间步数和输入序列相同。 - -假如:输入序列形状为(T, N), T 即该序列的时间步数,N 是序列特征大小;卷积核的上下文步长为 K,输出序列长度为 M,则卷积核权重形状为(K * N, M),输出序列形状为(T, M)。 - -另外,参考 DeepSpeech,Fluid 实现了行卷积 row convolution, 或称 -`look ahead convolution `_ , -该卷积相比上述普通序列卷积可以减少参数。 - - -- 
API 汇总: - - :ref:`cn_api_fluid_layers_sequence_conv` - - :ref:`cn_api_fluid_layers_row_conv` diff --git a/docs/api_guides/low_level/layers/conv_en.rst b/docs/api_guides/low_level/layers/conv_en.rst deleted file mode 100755 index 4dd4c8ea611..00000000000 --- a/docs/api_guides/low_level/layers/conv_en.rst +++ /dev/null @@ -1,58 +0,0 @@ -.. _api_guide_conv_en: - -############# -Convolution -############# - -Convolution has two sets of inputs: feature maps and convolution kernels. Depending on the input features, the shape of the convolution kernel, the layout and the calculation method, in Fluid, there is a one-dimensional convolution for variable-length sequence features, two-dimensional (2D Conv) and three-dimensional convolution (3D Conv) for fixed-length image features. At the same time, there is also a reverse(backward) process of convolution calculation. The subsequent content describes the 2D/3D convolution in Fluid, and then introduces the sequence convolution. - - -2D/3D Convolution -================== - -1. Input parameters of convolution: --------------------------------------- -The convolution needs to be determined according to stride, padding, filter size, groups, and dilation rate. Groups were first introduced in `AlexNet `_ . It can be considered that the original convolution is split into independent sets of convolution to be calculated. - -**Note**: In the same way as cuDNN, Fluid currently only supports padding upper and lower part of feature maps with equal length , as well as that for left and right part. - -- The layout(shape) of Input and Output : - - Layout of input feature of 2D convolution is [N, C, H, W] or [N, H, W, C], where N is the batch size, C is the number of channels, H,W is the height and width of feature. Layout of input feature is the same as that of output feature. (Layout of input feature of 3D convolution is [N, C, D, H, W] or [N, D, H, W, C]. 
But **note**, Fluid convolution currently only supports [N, C, H, W], [N, C, D, H, W].)
-
-- The layout of the convolution kernel:
-
-  The layout of the 2D convolution kernel (also called weight) in Fluid is [C_o, C_in / groups, f_h, f_w], where C_o, C_in represent the number of output and input channels, and f_h, f_w represent the height and width of the filter, stored in row order. (The corresponding 3D convolution kernel layout is [C_o, C_in / groups, f_d, f_h, f_w], which is also stored in row order.)
-
-- Depthwise Separable Convolution:
-
-  Depthwise separable convolution consists of depthwise convolution and pointwise convolution. The interfaces of these two convolutions are the same as the ordinary convolution interfaces above. The former can be performed by setting groups for ordinary convolutions; the latter can be realised by setting the size of the convolution kernel filters to 1x1. Depthwise separable convolution reduces the number of parameters as well as the volume of computation.
-
-  For depthwise convolution, you can set groups equal to the number of input channels. In this case, the convolution kernel shape of the 2D convolution is [C_o, 1, f_h, f_w]. For pointwise convolution, the shape of the convolution kernel is [C_o, C_in, 1, 1].
-
-  **Note**: Fluid has highly optimized GPU computation for depthwise convolution. You can use Fluid's own optimized CUDA program by setting :code:`use_cudnn=False` in the :code:`fluid.layers.conv2d` interface.
-
-- Dilated Convolution:
-
-  Compared with ordinary convolution, the kernel of a dilated convolution does not read values from the feature map continuously, but at intervals. This interval is called dilation. When it is equal to 1, it becomes ordinary convolution. The receptive field of a dilated convolution is larger than that of an ordinary convolution.
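The relationship between input size, padding, dilation, stride, and output size described above can be sketched in plain Python. This is a minimal illustration of standard convolution output-size arithmetic (with the symmetric padding convention noted above), not a Fluid API; the function name `conv_output_size` is purely illustrative:

```python
def conv_output_size(in_size, filter_size, stride=1, padding=0, dilation=1):
    """Spatial output size of a convolution along one dimension.

    The effective kernel extent grows with dilation; padding is applied
    symmetrically on both sides (the cuDNN-style convention).
    """
    effective_filter = dilation * (filter_size - 1) + 1
    return (in_size + 2 * padding - effective_filter) // stride + 1

# A 224x224 feature map with a 3x3 kernel, stride 1 and padding 1
# keeps its spatial size:
assert conv_output_size(224, 3, stride=1, padding=1) == 224

# With dilation=2, the same 3x3 kernel covers a 5x5 window,
# so the output shrinks slightly:
assert conv_output_size(224, 3, stride=1, padding=1, dilation=2) == 222
```

Note that groups do not appear in this formula: grouping partitions channels, but leaves the spatial output size unchanged.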
- - -- related API: - - :ref:`api_fluid_layers_conv2d` - - :ref:`api_fluid_layers_conv3d` - - :ref:`api_fluid_layers_conv2d_transpose` - - :ref:`api_fluid_layers_conv3d_transpose` - - -1D sequence convolution -========================= - -Fluid can represent a variable-length sequence structure. The variable length here means that the number of time steps of different samples is different. It is usually represented by a 2D Tensor and an auxiliary structure that can distinguish the sample length. Assume that the shape of the 2D Tensor is shape, shape[0] is the total number of time steps for all samples, and shape[1] is the size of the sequence feature. - -Convolution based on this data structure is called sequence convolution in Fluid and also represents one-dimensional convolution. Similar to image convolution, the input parameters of the sequence convolution contain the filter size, the padding size, and the size of sliding stride. But unlike the 2D convolution, the number of each parameter is 1. **Note**, it currently only supports stride = 1. The output sequence has the same number of time steps as the input sequence. - -Suppose the input sequence shape is (T, N), while T is the number of time steps of the sequence, and N is the sequence feature size; The convolution kernel has a context stride of K. The length of output sequence is M, the shape of convolution kernel weight is (K * N, M), and the shape of output sequence is (T, M). - -- related API: - - :ref:`api_fluid_layers_sequence_conv` - - :ref:`api_fluid_layers_row_conv` diff --git a/docs/api_guides/low_level/layers/data_feeder.rst b/docs/api_guides/low_level/layers/data_feeder.rst deleted file mode 100644 index fa1d4d3db19..00000000000 --- a/docs/api_guides/low_level/layers/data_feeder.rst +++ /dev/null @@ -1,44 +0,0 @@ -.. 
_api_guide_data_feeder: - -使用 DataFeeder 传入训练/预测数据 -################################### - -Fluid 提供 :code:`DataFeeder` 类,将 numpy array 等数据转换为 :code:`DenseTensor` 类型传入训练/预测网络。 - -用户创建 :code:`DataFeeder` 对象的方式为: - -.. code-block:: python - - import paddle.fluid as fluid - - image = fluid.layers.data(name='image', shape=[-1, 3, 224, 224], dtype='float32') - label = fluid.layers.data(name='label', shape=[-1, 1], dtype='int64') - place = fluid.CUDAPlace(0) if fluid.core.is_compiled_with_cuda() else fluid.CPUPlace() - feeder = fluid.DataFeeder(feed_list=[image, label], place=place) - -其中,:code:`feed_list` 参数为变量列表,这些变量由 :code:`fluid.layers.data()` 创建, -:code:`place` 参数表示应将 Python 端传入的 numpy array 等数据转换为 GPU 端或是 CPU 端的 :code:`DenseTensor` 。 -创建 :code:`DataFeeder` 对象后,用户可调用其 :code:`feed(iterable)` 方法将用户传入的 -:code:`iterable` 数据转换为 :code:`DenseTensor`。 - -:code:`iterable` 应为 Python List 或 Tuple 类型对象,且 :code:`iterable` 的每个元素均为长度为 N 的 -Python List 或 Tuple 类型对象,其中 N 为创建 :code:`DataFeeder` 对象时传入的 :code:`feed_list` 变量个数。 - -:code:`iterable` 的具体格式为: - -.. code-block:: python - - iterable = [ - (image_1, label_1), - (image_2, label_2), - ... - (image_n, label_n) - ] - -其中,:code:`image_i` 与 :code:`label_i` 均为 numpy array 类型数据。若传入数据的维度为[1],如 :code:`label_i`, -则可传入 Python int、float 等类型数据。 :code:`image_i` 与 :code:`label_i` 的数据类型和维度不必 -与 :code:`fluid.layers.data()` 创建时指定的 :code:`dtype` 和 :code:`shape` 完全一致,:code:`DataFeeder` 内部 -会完成数据类型和维度的转换。若 :code:`feed_list` 中的变量的 :code:`lod_level` 不为零,则 Fluid 会将经过维度转换后的 -:code:`iterable` 中每行数据的第 0 维作为返回结果的 :code:`LoD`。 - -具体使用方法请参见 :ref:`cn_api_fluid_DataFeeder` 。 diff --git a/docs/api_guides/low_level/layers/data_feeder_en.rst b/docs/api_guides/low_level/layers/data_feeder_en.rst deleted file mode 100755 index 243a803e7c3..00000000000 --- a/docs/api_guides/low_level/layers/data_feeder_en.rst +++ /dev/null @@ -1,41 +0,0 @@ -.. 
_api_guide_data_feeder_en:
-
-Feed training/inference data with DataFeeder
-########################################################
-
-Fluid provides the :code:`DataFeeder` class, which converts data types such as numpy array into a :code:`DenseTensor` type to feed the training/inference network.
-
-To create a :code:`DataFeeder` object:
-
-.. code-block:: python
-
-    import paddle.fluid as fluid
-
-    image = fluid.layers.data(name='image', shape=[-1, 3, 224, 224], dtype='float32')
-    label = fluid.layers.data(name='label', shape=[-1, 1], dtype='int64')
-    place = fluid.CUDAPlace(0) if fluid.core.is_compiled_with_cuda() else fluid.CPUPlace()
-    feeder = fluid.DataFeeder(feed_list=[image, label], place=place)
-
-The :code:`feed_list` parameter is a list of variables created by :code:`fluid.layers.data()` .
-The :code:`place` parameter indicates whether data such as numpy arrays passed in from the Python side should be converted to GPU-side or CPU-side :code:`DenseTensor` .
-After creating the :code:`DataFeeder` object, the user can call its :code:`feed(iterable)` method to convert the :code:`iterable` data given by the user into :code:`DenseTensor` .
-
-:code:`iterable` should be an object of Python List or Tuple type, and each element of :code:`iterable` should be a Python List or Tuple of length N, where N is the number of :code:`feed_list` variables passed in when the :code:`DataFeeder` object was created.
-
-The concrete format of :code:`iterable` is:
-
-.. code-block:: python
-
-    iterable = [
-        (image_1, label_1),
-        (image_2, label_2),
-        ...
-        (image_n, label_n)
-    ]
-
-:code:`image_i` and :code:`label_i` are both numpy array data. If the dimension of the input data is [1], such as :code:`label_i`,
-you can feed Python int, float, and other types of data. The data types and dimensions of :code:`image_i` and :code:`label_i` are not necessarily
-the same as the :code:`dtype` and :code:`shape` specified at :code:`fluid.layers.data()`.
:code:`DataFeeder` internally -performs the conversion of data types and dimensions. If the :code:`lod_level` of the variable in :code:`feed_list` is not zero, in Fluid, the 0th dimension of each row in the dimensionally converted :code:`iterable` will be returned as :code:`LoD` . - -Read :ref:`api_fluid_DataFeeder` for specific usage. diff --git a/docs/api_guides/low_level/layers/data_in_out.rst b/docs/api_guides/low_level/layers/data_in_out.rst deleted file mode 100644 index 3d6dbd5b849..00000000000 --- a/docs/api_guides/low_level/layers/data_in_out.rst +++ /dev/null @@ -1,32 +0,0 @@ -.. _api_guide_data_in_out: - -数据输入输出 -############### - - -数据输入 -------------- - -Fluid 支持两种数据输入方式,包括: - -1. Python Reader: 纯 Python 的 Reader。用户在 Python 端定义 :code:`fluid.layers.data` 层构建网络,并通过 -:code:`executor.run(feed=...)` 的方式读入数据。数据读取和模型训练/预测的过程是同步进行的。 - -2. PyReader: 高效灵活的 C++ Reader 接口。PyReader 内部维护容量为 :code:`capacity` 的队列(队列容量由 -:code:`fluid.layers.py_reader` 接口中的 :code:`capacity` 参数设置),Python 端调用队列的 :code:`push` -方法送入训练/预测数据,C++端的训练/预测程序调用队列的 :code:`pop` 方法取出 Python 端送入的数据。PyReader 可与 -:code:`double_buffer` 配合使用,实现数据读取和训练/预测的异步执行。 - -具体使用方法请参考 :ref:`cn_api_fluid_layers_py_reader`。 - - -数据输出 ------------- - -Fluid 支持在训练/预测阶段获取当前 batch 的数据。 - -用户可通过 :code:`executor.run(fetch_list=[...], return_numpy=...)` 的方式 -fetch 期望的输出变量,通过设置 :code:`return_numpy` 参数设置是否将输出数据转为 numpy array。 -若 :code:`return_numpy` 为 :code:`False` ,则返回 :code:`DenseTensor` 类型数据。 - -具体使用方式请参考相关 API 文档 :ref:`cn_api_paddle_static_Executor`。 diff --git a/docs/api_guides/low_level/layers/data_in_out_en.rst b/docs/api_guides/low_level/layers/data_in_out_en.rst deleted file mode 100755 index 7b5b0bb227f..00000000000 --- a/docs/api_guides/low_level/layers/data_in_out_en.rst +++ /dev/null @@ -1,27 +0,0 @@ -.. _api_guide_data_in_out_en: - -Data input and output -###################### - - -Data input -------------- - -Fluid supports two methods for data input, including: - -1. Python Reader: A pure Python Reader. 
The user defines the :code:`fluid.layers.data` layer on the Python side and builds the network. -Then, read the data by calling :code:`executor.run(feed=...)` . The process of data reading and model training/inference is performed simultaneously. - -2. PyReader: An Efficient and flexible C++ Reader interface. PyReader internally maintains a queue with size of :code:`capacity` (queue capacity is determined by -:code:`capacity` parameter in the :code:`fluid.layers.py_reader` interface ). Python side call queue :code:`push` to feed the training/inference data, and the C++ side training/inference program calls the :code:`pop` method to retrieve the data sent by the Python side. PyReader can work in conjunction with :code:`double_buffer` to realize asynchronous execution of data reading and model training/inference. - -For details, please refer to :ref:`api_fluid_layers_py_reader`. - - -Data output ------------- - -Fluid supports obtaining data for the current batch in the training/inference phase. - -The user can fetch expected variables from :code:`executor.run(fetch_list=[...], return_numpy=...)` . User can determine whether to convert the output data to numpy array by setting the :code:`return_numpy` parameter. -If :code:`return_numpy` is :code:`False` , data of type :code:`DenseTensor` will be returned. diff --git a/docs/api_guides/low_level/layers/detection.rst b/docs/api_guides/low_level/layers/detection.rst deleted file mode 100644 index 2f289edccdf..00000000000 --- a/docs/api_guides/low_level/layers/detection.rst +++ /dev/null @@ -1,101 +0,0 @@ -.. 
_api_guide_detection: - - -图像检测 -######### - -PaddlePaddle Fluid 在图像检测任务中实现了多个特有的操作。以下分模型介绍各个 api: - -通用操作 -------------- - -图像检测中的一些通用操作,是对检测框的一系列操作,其中包括: - -* 对检测框的编码,解码(box_coder):实现两种框之间编码和解码的转换。例如训练阶段对先验框和真实框进行编码得到训练目标值。API Reference 请参考 :ref:`cn_api_fluid_layers_box_coder` - -* 比较两个检测框并进行匹配: - - * iou_similarity:计算两组框的 IOU 值。API Reference 请参考 :ref:`cn_api_fluid_layers_iou_similarity` - - * bipartite_match:通过贪心二分匹配算法得到每一列中距离最大的一行。API Reference 请参考 :ref:`cn_api_fluid_layers_bipartite_match` - -* 根据检测框和标签得到分类和回归目标值(target_assign):通过匹配索引和非匹配索引得到目标值和对应权重。API Reference 请参考 :ref:`cn_api_fluid_layers_target_assign` - -* 对检测框进行后处理: - - * box_clip: 将检测框剪切到指定大小。API Reference 请参考 :ref:`cn_api_fluid_layers_box_clip` - - * multiclass_nms: 对边界框和评分进行多类非极大值抑制。API Reference 请参考 :ref:`cn_api_fluid_layers_multiclass_nms` - - -RCNN -------------- - -RCNN 系列模型是两阶段目标检测器,其中包含`Faster RCNN `_,`Mask RCNN `_,相较于传统提取区域的方法,RCNN 中 RPN 网络通过共享卷积层参数大幅提高提取区域的效率,并提出高质量的候选区域。RPN 网络需要对输入 anchor 和真实值进行比较生成初选候选框,并对初选候选框分配分类和回归值,需要如下五个特有 api: - -* rpn_target_assign:通过 anchor 和真实框为 anchor 分配 RPN 网络的分类和回归目标值。API Reference 请参考 :ref:`cn_api_fluid_layers_rpn_target_assign` - -* anchor_generator:为每个位置生成一系列 anchor。API Reference 请参考 :ref:`cn_api_fluid_layers_anchor_generator` - -* generate_proposal_labels: 通过 generate_proposals 得到的候选框和真实框得到 RCNN 部分的分类和回归的目标值。API Reference 请参考 :ref:`cn_api_fluid_layers_generate_proposal_labels` - -* generate_proposals: 对 RPN 网络输出 box 解码并筛选得到新的候选框。API Reference 请参考 :ref:`cn_api_fluid_layers_generate_proposals` - -* generate_mask_labels: 通过 generate_proposal_labels 得到的 RoI,和真实框对比后进一步筛选 RoI 并得到 Mask 分支的目标值。API Reference 请参考 :ref:`cn_api_fluid_layers_generate_mask_labels` - -FPN -------------- - -`FPN `_ 全称 Feature Pyramid Networks, 采用特征金字塔做目标检测。 顶层特征通过上采样和低层特征做融合,并将 FPN 放在 RPN 网络中用于生成候选框,有效的提高检测精度,需要如下两种特有 api: - -* collect_fpn_proposals: 拼接多层 RoI,同时选择分数较高的 RoI。API Reference 请参考 :ref:`cn_api_fluid_layers_collect_fpn_proposals` - -* distribute_fpn_proposals: 将多个 RoI 依据面积分配到 FPN 
的多个层级中。API Reference 请参考 :ref:`cn_api_fluid_layers_distribute_fpn_proposals` - -SSD ----------------- - -`SSD `_ 全称 Single Shot MultiBox Detector,是目标检测领域较新且效果较好的检测算法之一,具有检测速度快且检测精度高的特点。与两阶段的检测方法不同,单阶段目标检测并不进行区域推荐,而是直接从特征图回归出目标的边界框和分类概率。SSD 网络对六个尺度特>征图计算损失,进行预测,需要如下五种特有 api: - -* 根据不同参数为每个输入位置生成一系列候选框。 - - * prior box: API Reference 请参考 :ref:`cn_api_fluid_layers_prior_box` - - * density_prior box: API Reference 请参考 :ref:`cn_api_fluid_layers_density_prior_box` - -* multi_box_head :得到不同 prior box 的位置和置信度。API Reference 请参考 :ref:`cn_api_fluid_layers_multi_box_head` - -* detection_output:对 prior box 解码,通过多分类 NMS 得到检测结果。API Reference 请参考 :ref:`cn_api_fluid_layers_detection_output` - -* ssd_loss:通过位置偏移预测值,置信度,检测框位置和真实框位置和标签计算损失。API Reference 请参考 :ref:`cn_api_fluid_layers_ssd_loss` - -* detection_map: 利用 mAP 评估 SSD 网络模型。API Reference 请参考 :ref:`cn_api_fluid_layers_detection_map` - -YOLO V3 ---------------- - -`YOLO V3 `_ 是单阶段目标检测器,同时具备了精度高,速度快的特点。对特征图划分多个区块,每个区块得到坐标位置和置信度。采用了多尺度融合的方式预测以得到更高的训练精度,需要如下两种特有 api: - -* yolo_box: 从 YOLOv3 网络的输出生成 YOLO 检测框。API Reference 请参考 :ref:`cn_api_fluid_layers_yolo_box` - -* yolov3_loss:通过给定的预测结果和真实框生成 yolov3 损失。API Reference 请参考 :ref:`cn_api_fluid_layers_yolov3_loss` - -RetinaNet ---------------- - -`RetinaNet `_ 是单阶段目标检测器,引入 Focal Loss 和 FPN 后,能以更快的速率实现与双阶段目标检测网络近似或更优的效果,需要如下三种特有 api: - -* sigmoid_focal_loss: 用于处理单阶段检测器中类别不平均问题的损失。API Reference 请参考 :ref:`cn_api_fluid_layers_sigmoid_focal_loss` - -* retinanet_target_assign: 对给定 anchor 和真实框,为每个 anchor 分配分类和回归的目标值,用于训练 RetinaNet。API Reference 请参考 :ref:`cn_api_fluid_layers_retinanet_target_assign` - -* retinanet_detection_output: 对检测框进行解码,并做非极大值抑制后得到检测输出。API Reference 请参考 :ref:`cn_api_fluid_layers_retinanet_detection_output` - -OCR ---------- - -场景文字识别是在图像背景复杂、分辨率低下、字体多样、分布随意等情况下,将图像信息转化为文字序列的过程,可认为是一种特别的翻译过程:将图像输入翻译为自然语言输出。OCR 任务中需要对检测框进行不规则变换,其中需要如下两个 api: - -* roi_perspective_transform:对输入 roi 做透视变换。API Reference 请参考 :ref:`cn_api_fluid_layers_roi_perspective_transform` - -* 
polygon_box_transform:对不规则检测框进行坐标变换。API Reference 请参考 :ref:`cn_api_fluid_layers_polygon_box_transform` diff --git a/docs/api_guides/low_level/layers/detection_en.rst b/docs/api_guides/low_level/layers/detection_en.rst deleted file mode 100755 index d2ead09fcd7..00000000000 --- a/docs/api_guides/low_level/layers/detection_en.rst +++ /dev/null @@ -1,62 +0,0 @@ - -.. _api_guide_detection_en: - - -Image Detection -################# - -PaddlePaddle Fluid implements several unique operators for image detection tasks. This article introduces related APIs grouped by diverse model types. - -General operations --------------------- - -Some common operations in image detection are a series of operations on the bounding boxes, including: - -* Encoding and decoding of the bounding box : Conversion between encoding and decoding between the two kinds of boxes. For example, the training phase encodes the prior box and the ground-truth box to obtain the training target value. For API Reference, please refer to :ref:`api_fluid_layers_box_coder` - -* Compare the two bounding boxes and match them: - - * iou_similarity: Calculate the IOU value of the two sets of boxes. For API Reference, please refer to :ref:`api_fluid_layers_iou_similarity` - - * bipartite_match: Get the row with the largest distance in each column by the greedy binary matching algorithm. For API Reference, please refer to :ref:`api_fluid_layers_bipartite_match` - -* Get classification and regression target values ​​(target_assign) based on the bounding boxes and labels: Get the target values and corresponding weights by matched indices and negative indices. For API Reference, please refer to :ref:`api_fluid_layers_target_assign` - - -Faster RCNN -------------- - -`Faster RCNN `_ is a typical dual-stage target detector. 
Compared with the traditional extraction method, the RPN network in Faster RCNN greatly improves the extraction efficiency by sharing convolution layer parameters, and proposes high-quality region proposals. The RPN network needs to compare the input anchors with the ground-truth values to generate primary candidate regions, and assigns classification and regression values to the primary candidate boxes. The following four unique APIs are required:
-
-* rpn_target_assign: Assign the classification and regression target values of the RPN network to the anchor through the anchor and the ground-truth box. For API Reference, please refer to :ref:`api_fluid_layers_rpn_target_assign`
-
-* anchor_generator: Generate a series of anchors for each location. For API Reference, please refer to :ref:`api_fluid_layers_anchor_generator`
-
-* generate_proposal_labels: Get the classification and regression target values of the RCNN part through the candidate boxes and the ground-truth boxes obtained by generate_proposals. For API Reference, please refer to :ref:`api_fluid_layers_generate_proposal_labels`
-
-* generate_proposals: Decode the RPN network output boxes and select new region proposals. For API Reference, please refer to :ref:`api_fluid_layers_generate_proposals`
-
-
-SSD
-----------------
-
-`SSD `_ , the acronym for Single Shot MultiBox Detector, is one of the latest and better-performing detection algorithms in the field of target detection. It has the characteristics of fast detection speed and high detection accuracy. Unlike two-stage detection methods, single-stage target detection does not perform region proposal, but directly regresses the target's bounding boxes and classification probabilities from the feature map. The SSD network calculates the loss on feature maps of six scales and performs prediction. SSD requires the following five unique APIs:
-
-* Prior Box: Generate a series of candidate boxes for each input position based on different parameters.
For API Reference, please refer to :ref:`api_fluid_layers_prior_box` - -* multi_box_head : Get the position and confidence of different prior boxes. For API Reference, please refer to :ref:`api_fluid_layers_multi_box_head` - -* detection_output: Decode the prior box and obtains the detection result by multi-class NMS. For API Reference, please refer to :ref:`api_fluid_layers_detection_output` - -* ssd_loss: Calculate the loss by prediction value of position offset, confidence, bounding box position and ground-truth box position and label. For API Reference, please refer to :ref:`api_fluid_layers_ssd_loss` - -* detection map: Evaluate the SSD network model using mAP. For API Reference, please refer to :ref:`api_fluid_layers_detection_map` - -OCR ---------- - -Scene text recognition is a process of converting image information into a sequence of characters in the case of complex image background, low resolution, diverse fonts, random distribution and so on. It can be considered as a special translation process: translation of image input into natural language output. The OCR task needs to perform irregular transformation on the bounding box, which requires the following two APIs: - -* roi_perspective_transform: Make a perspective transformation on the input RoI. For API Reference, please refer to :ref:`api_fluid_layers_roi_perspective_transform` - -* polygon_box_transform: Coordinate transformation of the irregular bounding box. For API Reference, please refer to :ref:`api_fluid_layers_polygon_box_transform` diff --git a/docs/api_guides/low_level/layers/index.rst b/docs/api_guides/low_level/layers/index.rst deleted file mode 100644 index d0182ed8ae7..00000000000 --- a/docs/api_guides/low_level/layers/index.rst +++ /dev/null @@ -1,20 +0,0 @@ -============= -神经网络层 -============= - -.. 
toctree:: - :maxdepth: 1 - - conv.rst - pooling.rst - detection.rst - sequence.rst - math.rst - activations.rst - loss_function.rst - data_in_out.rst - control_flow.rst - sparse_update.rst - data_feeder.rst - learning_rate_scheduler.rst - tensor.rst diff --git a/docs/api_guides/low_level/layers/index_en.rst b/docs/api_guides/low_level/layers/index_en.rst deleted file mode 100644 index 06ce0de3809..00000000000 --- a/docs/api_guides/low_level/layers/index_en.rst +++ /dev/null @@ -1,20 +0,0 @@ -===================== -Neural Network Layer -===================== - -.. toctree:: - :maxdepth: 1 - - conv_en.rst - pooling_en.rst - detection_en.rst - sequence_en.rst - math_en.rst - activations_en.rst - loss_function_en.rst - data_in_out_en.rst - control_flow_en.rst - sparse_update_en.rst - data_feeder_en.rst - learning_rate_scheduler_en.rst - tensor_en.rst diff --git a/docs/api_guides/low_level/layers/learning_rate_scheduler.rst b/docs/api_guides/low_level/layers/learning_rate_scheduler.rst deleted file mode 100644 index 0a18dde0f9a..00000000000 --- a/docs/api_guides/low_level/layers/learning_rate_scheduler.rst +++ /dev/null @@ -1,69 +0,0 @@ -.. _api_guide_learning_rate_scheduler: - -############ -学习率调度器 -############ - -当我们使用诸如梯度下降法等方式来训练模型时,一般会兼顾训练速度和损失(loss)来选择相对合适的学习率。但若在训练过程中一直使用一个学习率,训练集的损失下降到一定程度后便不再继续下降,而是在一定范围内震荡。其震荡原理如下图所示,即当损失函数收敛到局部极小值附近时,会由于学习率过大导致更新步幅过大,每步参数更新会反复越过极小值而出现震荡。 - -.. 
image:: ../../../images/learning_rate_scheduler.png - :scale: 80 % - :align: center - - -学习率调度器定义了常用的学习率衰减策略来动态生成学习率,学习率衰减函数以 epoch 或 step 为参数,返回一个随训练逐渐减小的学习率,从而兼顾降低训练时间和在局部极小值能更好寻优两个方面。 - -下面介绍学习率调度器中相关的 Api: - -====== - -* :code:`NoamDecay`: 诺姆衰减,相关算法请参考 `《Attention Is All You Need》 `_ 。 - 相关 API Reference 请参考 :ref:`cn_api_paddle_optimizer_lr_NoamDecay` - -* :code:`ExponentialDecay`: 指数衰减,即每次将当前学习率乘以给定的衰减率得到下一个学习率。 - 相关 API Reference 请参考 :ref:`cn_api_paddle_optimizer_lr_ExponentialDecay` - -* :code:`NaturalExpDecay`: 自然指数衰减,即每次将当前学习率乘以给定的衰减率的自然指数得到下一个学习率。 - 相关 API Reference 请参考 :ref:`cn_api_paddle_optimizer_lr_NaturalExpDecay` - -* :code:`InverseTimeDecay`: 逆时间衰减,即得到的学习率与当前衰减次数成反比。 - 相关 API Reference 请参考 :ref:`cn_api_paddle_optimizer_lr_InverseTimeDecay` - -* :code:`PolynomialDecay`: 多项式衰减,即得到的学习率为初始学习率和给定最终学习之间由多项式计算权重定比分点的插值。 - 相关 API Reference 请参考 :ref:`cn_api_paddle_optimizer_lr_PolynomialDecay` - -* :code:`PiecewiseDecay`: 分段衰减,即由给定 step 数分段呈阶梯状衰减,每段内学习率相同。 - 相关 API Reference 请参考 :ref:`cn_api_paddle_optimizer_lr_PiecewiseDecay` - -* :code:`CosineAnnealingDecay`: 余弦式衰减,即学习率随 step 数变化呈余弦函数周期变化。 - 相关 API Reference 请参考 :ref:`cn_api_paddle_optimizer_lr_CosineAnnealingDecay` - -* :code:`LinearWarmup`: 学习率随 step 数线性增加到指定学习率。 - 相关 API Reference 请参考 :ref:`cn_api_paddle_optimizer_lr_LinearWarmup` - -* :code:`StepDecay`: 学习率每隔一定的 step 数进行衰减,需要指定 step_size。 - 相关 API Reference 请参考 :ref:`cn_api_paddle_optimizer_lr_StepDecay` - -* :code:`MultiStepDecay`: 学习率在指定的 step 数时进行衰减,需要指定衰减的节点位置。 - 相关 API Reference 请参考 :ref:`cn_api_paddle_optimizer_lr_MultiStepDecay` - -* :code:`LambdaDecay`: 学习率根据自定义的 lambda 函数进行衰减。 - 相关 API Reference 请参考 :ref:`cn_api_paddle_optimizer_lr_LambdaDecay` - -* :code:`ReduceOnPlateau`: 学习率根据当前监控指标(一般为 loss)来进行自适应调整,当 loss 趋于稳定时衰减学习率。 - 相关 API Reference 请参考 :ref:`cn_api_paddle_optimizer_lr_ReduceOnPlateau` - -* :code:`MultiplicativeDecay`: 每次将当前学习率乘以 lambda 函数得到下一个学习率。 - 相关 API Reference 请参考 :ref:`cn_api_paddle_optimizer_lr_MultiplicativeDecay` - -* 
:code:`OneCycleLR`: One Cycle 衰减,学习率上升至最大,再下降至最小。
- 相关 API Reference 请参考 :ref:`cn_api_paddle_optimizer_lr_OneCycleLR`
-
-* :code:`CyclicLR`: 学习率根据指定的缩放策略以固定频率在最小和最大学习率之间进行循环。
- 相关 API Reference 请参考 :ref:`cn_api_paddle_optimizer_lr_CyclicLR`
-
-* :code:`LinearLR`: 学习率随 step 数线性增加到指定学习率。
- 相关 API Reference 请参考 :ref:`cn_api_paddle_optimizer_lr_LinearLR`
-
-* :code:`CosineAnnealingWarmRestarts`: 余弦退火学习率,即学习率随 step 数变化呈余弦函数周期变化。
- 相关 API Reference 请参考 :ref:`cn_api_paddle_optimizer_lr_CosineAnnealingWarmRestarts`
diff --git a/docs/api_guides/low_level/layers/learning_rate_scheduler_en.rst b/docs/api_guides/low_level/layers/learning_rate_scheduler_en.rst
deleted file mode 100755
index 06e4a06870c..00000000000
--- a/docs/api_guides/low_level/layers/learning_rate_scheduler_en.rst
+++ /dev/null
@@ -1,50 +0,0 @@
-.. _api_guide_learning_rate_scheduler_en:
-
-########################
-Learning rate scheduler
-########################
-
-When we use a method such as the gradient descent method to train the model, the training speed and loss are generally taken into consideration to select a relatively appropriate learning rate. However, if a fixed learning rate is used throughout the training process, the loss of the training set will not continue to decline after falling to a certain extent, but will 'jump' within a certain range. The jumping principle is shown in the figure below. When the loss function converges to the local minimum value, the update step will be too large due to the excessive learning rate. The parameter update will repeatedly *jump over* the local minimum value and an oscillation-like phenomenon will occur.
-
-.. image:: ../../../images/learning_rate_scheduler.png
-   :scale: 80 %
-   :align: center
-
-
-The learning rate scheduler defines a commonly used learning rate decay strategy to dynamically generate the learning rate.
The learning rate decay function takes epoch or step as the parameter and returns a learning rate that gradually decreases with training. Thereby it reduces the training time and finds the local minimum value at the same time. - -The following content describes the APIs related to the learning rate scheduler: - -====== - -* :code:`NoamDecay`: Noam decay. Please refer to `Attention Is All You Need `_ for related algorithms. For related API Reference please refer to :ref:`api_paddle_optimizer_lr_NoamDecay` - -* :code:`ExponentialDecay`: Exponential decay. That is, each time the current learning rate is multiplied by the given decay rate to get the next learning rate. For related API Reference please refer to :ref:`api_paddle_optimizer_lr_ExponentialDecay` - -* :code:`NaturalExpDecay`: Natural exponential decay. That is, each time the current learning rate is multiplied by the natural exponent of the given decay rate to get the next learning rate. For related API Reference please refer to :ref:`api_paddle_optimizer_lr_NaturalExpDecay` - -* :code:`InverseTimeDecay`: Inverse time decay. The decayed learning rate is inversely proportional to the current number of decays. For related API Reference please refer to :ref:`api_paddle_optimizer_lr_InverseTimeDecay` - -* :code:`PolynomialDecay`: Polynomial decay, i.e. the decayed learning rate is calculated in a polynomial format with the initial learning rate and the end learning rate. For related API Reference please refer to :ref:`api_paddle_optimizer_lr_PolynomialDecay` - -* :code:`PiecewiseDecay`: Piecewise decay. That is, the stair-like decay for a given number of steps, the learning rate stays the same within each step. For related API Reference please refer to :ref:`api_paddle_optimizer_lr_PiecewiseDecay` - -* :code:`CosineAnnealingDecay`: Cosine attenuation. It means the learning rate changes with the number of steps in the form of a cosine function. 
For related API Reference please refer to :ref:`api_paddle_optimizer_lr_CosineAnnealingDecay` - -* :code:`LinearWarmup`: The learning rate increases linearly to an appointed rate with the number of steps. For related API Reference please refer to :ref:`api_paddle_optimizer_lr_LinearWarmup` - -* :code:`StepDecay`: Decay the learning rate every certain number of steps, and ``step_size`` needs to be specified. For related API Reference please refer to :ref:`api_paddle_optimizer_lr_StepDecay` - -* :code:`MultiStepDecay`: Decay the learning rate at the specified milestones, and ``milestones`` needs to be specified. For related API Reference please refer to :ref:`api_paddle_optimizer_lr_MultiStepDecay` - -* :code:`LambdaDecay`: Decay the learning rate by a lambda function. For related API Reference please refer to :ref:`api_paddle_optimizer_lr_LambdaDecay` - -* :code:`ReduceOnPlateau`: Adjust the learning rate according to a monitored metric (generally the loss), and decay the learning rate when the monitored metric stops improving. For related API Reference please refer to :ref:`api_paddle_optimizer_lr_ReduceOnPlateau` - -* :code:`OneCycleLR`: One cycle decay. That is, the initial learning rate first increases to the maximum learning rate, and then it decreases to a minimum learning rate which is much less than the initial learning rate. For related API Reference please refer to :ref:`api_paddle_optimizer_lr_OneCycleLR` - -* :code:`CyclicLR`: Cyclic decay. That is, the learning rate cycles between the minimum and maximum learning rate with a constant frequency according to a specified scaling method. For related API Reference please refer to :ref:`api_paddle_optimizer_lr_CyclicLR` - -* :code:`LinearLR`: Linear decay. That is, the learning rate is first multiplied by ``start_factor`` and then increases linearly to the end learning rate. For related API Reference please refer to :ref:`api_paddle_optimizer_lr_LinearLR` - -* :code:`CosineAnnealingWarmRestarts`: Cosine annealing with warm restarts.
It means the learning rate changes with the number of steps in the form of a cosine function. For related API Reference please refer to :ref:`api_paddle_optimizer_lr_CosineAnnealingWarmRestarts` diff --git a/docs/api_guides/low_level/layers/loss_function.rst b/docs/api_guides/low_level/layers/loss_function.rst deleted file mode 100644 index b8e7e44282d..00000000000 --- a/docs/api_guides/low_level/layers/loss_function.rst +++ /dev/null @@ -1,60 +0,0 @@ -.. _api_guide_loss_function: - -####### -损失函数 -####### - -损失函数定义了拟合结果和真实结果之间的差异,作为优化的目标直接关系模型训练的好坏,很多研究工作的内容也集中在损失函数的设计优化上。 -Paddle Fluid 中提供了面向多种任务的多种类型的损失函数,以下列出了一些 Paddle Fluid 中包含的较为常用的损失函数。 - -回归 -==== - -平方误差损失(squared error loss)使用预测值和真实值之间误差的平方作为样本损失,是回归问题中最为基本的损失函数。 -API Reference 请参考 :ref:`cn_api_fluid_layers_square_error_cost`。 - -平滑 L1 损失(smooth_l1 loss)是一种分段的损失函数,较平方误差损失其对异常点相对不敏感,因而更为鲁棒。 -API Reference 请参考 :ref:`cn_api_fluid_layers_smooth_l1`。 - - -分类 -==== - -`交叉熵(cross entropy) `_ 是分类问题中使用最为广泛的损失函数,Paddle Fluid 中提供了接受归一化概率值和非归一化分值输入的两种交叉熵损失函数的接口,并支持 soft label 和 hard label 两种样本类别标签。 -API Reference 请参考 :ref:`cn_api_fluid_layers_cross_entropy` 和 :ref:`cn_api_fluid_layers_softmax_with_cross_entropy`。 - -多标签分类 ---------- -对于多标签分类问题,如一篇文章同属于政治、科技等多个类别的情况,需要将各类别作为独立的二分类问题计算损失,Paddle Fluid 中为此提供了 sigmoid_cross_entropy_with_logits 损失函数, -API Reference 请参考 :ref:`cn_api_fluid_layers_sigmoid_cross_entropy_with_logits`。 - -大规模分类 ---------- -对于大规模分类问题,通常需要特殊的方法及相应的损失函数以加速训练,常用的方法有 `噪声对比估计(Noise-contrastive estimation,NCE) `_ 和 `层级 sigmoid `_ 。 - -* 噪声对比估计通过将多分类问题转化为学习分类器来判别数据来自真实分布和噪声分布的二分类问题,基于二分类来进行极大似然估计,避免在全类别空间计算归一化因子从而降低了计算复杂度。 -* 层级 sigmoid 通过二叉树进行层级的二分类来实现多分类,每个样本的损失对应了编码路径上各节点二分类交叉熵的和,避免了归一化因子的计算从而降低了计算复杂度。 -这两种方法对应的损失函数在 Paddle Fluid 中均有提供,API Reference 请参考 :ref:`cn_api_fluid_layers_nce` 和 :ref:`cn_api_fluid_layers_hsigmoid`。 - -序列分类 -------- -序列分类可以分为以下三种: - -* 序列分类(Sequence Classification)问题,整个序列对应一个预测标签,如文本分类。这种即是普通的分类问题,可以使用 cross entropy 作为损失函数。 -* 序列片段分类(Segment 
Classification)问题,序列中的各个片段对应有自己的类别标签,如命名实体识别。对于这种序列标注问题,`(线性链)条件随机场(Conditional Random Field,CRF) `_ 是一种常用的模型方法,其使用句子级别的似然概率,序列中不同位置的标签不再是条件独立,能够有效解决标记偏置问题。Paddle Fluid 中提供了 CRF 对应损失函数的支持,API Reference 请参考 :ref:`cn_api_fluid_layers_linear_chain_crf`。 -* 时序分类(Temporal Classification)问题,需要对未分割的序列进行标注,如语音识别。对于这种时序分类问题,`CTC(Connectionist Temporal Classification) `_ 损失函数不需要对齐输入数据及标签,可以进行端到端的训练,Paddle Fluid 提供了 warpctc 的接口来计算相应的损失,API Reference 请参考 :ref:`cn_api_fluid_layers_warpctc`。 - -排序 -==== - -`排序问题 `_ 可以使用 Pointwise、Pairwise 和 Listwise 的学习方法,不同的方法需要使用不同的损失函数: - -* Pointwise 的方法通过近似为回归问题解决排序问题,可以使用回归问题的损失函数。 -* Pairwise 的方法需要特殊设计的损失函数,其通过近似为分类问题解决排序问题,使用两篇文档与 query 的相关性得分以偏序作为二分类标签来计算损失。Paddle Fluid 中提供了两种常用的 Pairwise 方法的损失函数,API Reference 请参考 :ref:`cn_api_fluid_layers_rank_loss` 和 :ref:`cn_api_fluid_layers_margin_rank_loss`。 - -更多 -==== - -对于一些较为复杂的损失函数,可以尝试使用其他损失函数组合实现;Paddle Fluid 中提供的用于图像分割任务的 :ref:`cn_api_fluid_layers_dice_loss` 即是使用其他 OP 组合(计算各像素位置似然概率的均值)而成;多目标损失函数也可看作这样的情况,如 Faster RCNN 就使用 cross entropy 和 smooth_l1 loss 的加权和作为损失函数。 - -**注意**,在定义损失函数之后为能够使用 :ref:`api_guide_optimizer` 进行优化,通常需要使用 :ref:`cn_api_fluid_layers_mean` 或其他操作将损失函数返回的高维 Tensor 转换为 Scalar 值。 diff --git a/docs/api_guides/low_level/layers/loss_function_en.rst b/docs/api_guides/low_level/layers/loss_function_en.rst deleted file mode 100755 index 487a8515cb1..00000000000 --- a/docs/api_guides/low_level/layers/loss_function_en.rst +++ /dev/null @@ -1,61 +0,0 @@ -.. _api_guide_loss_function_en: - -############## -Loss function -############## - -The loss function defines the difference between the inference result and the ground-truth result. As the optimization target, it directly determines whether the model training is good or not, and much research also focuses on the design and optimization of loss functions. -Paddle Fluid offers diverse types of loss functions for a variety of tasks. Let's take a look at the commonly-used loss functions included in Paddle Fluid.
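As a framework-agnostic illustration of the two regression losses this guide describes (squared error and smooth L1), here is a minimal sketch in plain Python. The function names mirror the documented layers for readability only and are not the actual Paddle APIs, and the ``delta`` threshold is an assumption of the common smooth-L1 formulation:

```python
# A minimal, framework-agnostic sketch of the two regression losses.
# Scalar, pure-Python versions for illustration; not the Paddle operators
# themselves, and `delta` is an assumed threshold parameter.

def square_error_cost(pred, label):
    """Squared error: (pred - label) ** 2, the basic regression loss."""
    return (pred - label) ** 2

def smooth_l1(pred, label, delta=1.0):
    """Piecewise loss: quadratic for small errors, linear for large ones,
    so outliers contribute less than under squared error."""
    diff = abs(pred - label)
    if diff < delta:
        return 0.5 * diff ** 2 / delta
    return diff - 0.5 * delta

print(square_error_cost(2.0, 0.0))  # 4.0: grows quadratically with the error
print(smooth_l1(2.0, 0.0))          # 1.5: grows only linearly for large errors
```

With these definitions, an error of 2.0 costs 4.0 under squared error but only 1.5 under smooth L1, which is why the latter is described as less sensitive to outliers.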
- -Regression -=========== - -The squared error loss uses the square of the error between the predicted value and the ground-truth value as the sample loss, which is the most basic loss function in the regression problems. -For API Reference, please refer to :ref:`api_fluid_layers_square_error_cost`. - -Smooth L1 loss (smooth_l1 loss) is a piecewise loss function that is relatively insensitive to outliers and therefore more robust. -For API Reference, please refer to :ref:`api_fluid_layers_smooth_l1`. - - -Classification -================ - -`cross entropy `_ is the most widely used loss function in classification problems. The interfaces in Paddle Fluid for the cross entropy loss functions are divided into one accepting normalized probability values as input and another accepting non-normalized scores. Fluid supports two types of labels, namely soft labels and hard labels. -For API Reference, please refer to :ref:`api_fluid_layers_cross_entropy` and :ref:`api_fluid_layers_softmax_with_cross_entropy`. - -Multi-label classification ---------------------------- -For the multi-label classification, such as when an article belongs to multiple categories like politics and technology, it is necessary to calculate the loss by treating each category as an independent binary-classification problem. We provide the sigmoid_cross_entropy_with_logits loss function for this purpose. -For API Reference, please refer to :ref:`api_fluid_layers_sigmoid_cross_entropy_with_logits`. - -Large-scale classification ----------------------------- -For large-scale classification problems, special methods and corresponding loss functions are usually needed to speed up the training. The commonly used methods are -`Noise contrastive estimation (NCE) `_ and `Hierarchical sigmoid `_ . - -* NCE converts the multi-classification problem into a binary-classification problem of learning a classifier that discriminates between samples from the true distribution and from the noise distribution.
The maximum likelihood estimation is performed based on the binary classification, avoiding the computation of the normalization factor over the full class space and thus reducing computational complexity. -* Hierarchical sigmoid realizes multi-classification by hierarchical binary classification along a binary tree. The loss of each sample corresponds to the sum of the cross-entropy of the binary-classification for each node on the coding path, which avoids the calculation of the normalization factor and reduces the computational complexity. -The loss functions for both methods are available in Paddle Fluid. For API Reference please refer to :ref:`api_fluid_layers_nce` and :ref:`api_fluid_layers_hsigmoid`. - -Sequence classification ------------------------- -Sequence classification can be divided into the following three types: - -* In a Sequence Classification problem, the entire sequence corresponds to one prediction label, such as text classification. This is an ordinary classification problem, and cross entropy can be used as the loss function. -* In a Segment Classification problem, each segment in the sequence corresponds to its own category tag, such as named entity recognition. For this sequence labeling problem, `the (Linear Chain) Conditional Random Field (CRF) `_ is a commonly used model. The method uses the likelihood probability on the sentence level, and the labels for different positions in the sequence are no longer conditionally independent, which can effectively solve the label bias problem. Support for CRF loss functions is available in Paddle Fluid. For API Reference please refer to :ref:`api_fluid_layers_linear_chain_crf` . -* A Temporal Classification problem needs to label unsegmented sequences, such as speech recognition. For this time-based classification problem, the `CTC (Connectionist Temporal Classification) `_ loss function does not need to align input data and labels, and is able to perform end-to-end training.
Paddle Fluid provides a warpctc interface to calculate the corresponding loss. For API Reference, please refer to :ref:`api_fluid_layers_warpctc` . - -Rank -========= - -`Rank problems `_ can use learning methods of Pointwise, Pairwise, and Listwise. Different methods require different loss functions: - -* The Pointwise method solves the ranking problem by approximating the regression problem. Therefore the loss function of the regression problem can be used. -* The Pairwise method requires a specially designed loss function. It solves the ranking problem by approximating a classification problem: the relevance scores of two documents with respect to a query are compared, and their partial order serves as the binary-classification label for computing the loss. Paddle Fluid provides two commonly used loss functions for Pairwise methods. For API Reference please refer to :ref:`api_fluid_layers_rank_loss` and :ref:`api_fluid_layers_margin_rank_loss`. - -More -==== - -For more complex loss functions, you can try to implement them as combinations of other loss functions; the :ref:`api_fluid_layers_dice_loss` provided in Paddle Fluid for image segmentation tasks is an example of using combinations of other operators (it computes the average likelihood probability at each pixel position). The multi-objective loss function can also be considered similarly, such as Faster RCNN, which uses the weighted sum of cross entropy and smooth_l1 loss as its loss function. - -**Note**, after defining the loss function, in order to optimize with :ref:`api_guide_optimizer_en`, you usually need to use :ref:`api_fluid_layers_mean` or other operations to convert the high-dimensional Tensor returned by the loss function to a Scalar value. diff --git a/docs/api_guides/low_level/layers/math.rst b/docs/api_guides/low_level/layers/math.rst deleted file mode 100644 index 2044c91d64e..00000000000 --- a/docs/api_guides/low_level/layers/math.rst +++ /dev/null @@ -1,193 +0,0 @@ -..
_api_guide_math: - - -数学操作 -######### - -Paddle 提供了丰富的数学操作,以下列出的数学操作都是对目标张量进行逐元素的操作。其中,如果二元操作的两个输入有不同形状,会先进行 :code:`broadcast`。部分数学操作还支持数学操作符,比如: :code:`+`, :code:`-`, :code:`*`, :code:`/` 等。数学操作符不仅支持张量,还支持标量。 - - -一元操作 -================== - -exp ------------------- - -对输入 :code:`Tensor` 逐元素做 :code:`exp` 操作。 - -API Reference 请参考 :ref:`cn_api_fluid_layers_exp` - -tanh ------------------- - -对输入 :code:`Tensor` 逐元素取双曲正切。 - -API Reference 请参考 :ref:`cn_api_fluid_layers_tanh` - -sqrt ------------------- - -对输入 :code:`Tensor` 逐元素取平方根。 - -API Reference 请参考 :ref:`cn_api_fluid_layers_sqrt` - -abs ------------------- - -对输入 :code:`Tensor` 逐元素取绝对值。 - -API Reference 请参考 :ref:`cn_api_fluid_layers_abs` - -ceil ------------------- - -对输入 :code:`Tensor` 逐元素向上取整。 - -API Reference 请参考 :ref:`cn_api_fluid_layers_ceil` - -floor ------------------- - -对输入 :code:`Tensor` 逐元素向下取整。 - -API Reference 请参考 :ref:`cn_api_fluid_layers_floor` - -sin ------------------- - -对输入 :code:`Tensor` 逐元素取正弦。 - -API Reference 请参考 :ref:`cn_api_fluid_layers_sin` - -cos ------------------- - -对输入 :code:`Tensor` 逐元素取余弦。 - -API Reference 请参考 :ref:`cn_api_fluid_layers_cos` - -cosh ------------------- - -对输入 :code:`Tensor` 逐元素取双曲余弦。 - -API Reference 请参考 :ref:`cn_api_fluid_layers_cosh` - -round ------------------- - -对输入 :code:`Tensor` 逐元素四舍五入取整。 - -API Reference 请参考 :ref:`cn_api_fluid_layers_round` - -square ------------------- - -对输入 :code:`Tensor` 逐元素取平方。 - -API Reference 请参考 :ref:`cn_api_fluid_layers_square` - -reciprocal ------------------- - -对输入 :code:`Tensor` 逐元素取倒数。 - -API Reference 请参考 :ref:`cn_api_fluid_layers_reciprocal` - - -reduce ------------------- - -对输入 :code:`Tensor` 在指定的若干轴上做 reduce 操作,包括:min, max, sum, mean, product - -API Reference 请参考: -:ref:`cn_api_fluid_layers_reduce_min` -:ref:`cn_api_fluid_layers_reduce_max` -:ref:`cn_api_fluid_layers_reduce_sum` -:ref:`cn_api_fluid_layers_reduce_mean` -:ref:`cn_api_fluid_layers_reduce_prod` - - -二元操作 -================== - -elementwise_add
------------------- - -对两个 :code:`Tensor` 逐元素相加,对应的数学操作符为 :code:`+` - -API Reference 请参考 :ref:`cn_api_fluid_layers_elementwise_add` - -elementwise_sub ------------------- - -对两个 :code:`Tensor` 逐元素相减,对应数学操作符 :code:`-` - -API Reference 请参考 :ref:`cn_api_fluid_layers_elementwise_sub` - -elementwise_mul ------------------- - -对两个 :code:`Tensor` 逐元素相乘, 对应数学操作符 :code:`*` - -API Reference 请参考 :ref:`cn_api_fluid_layers_elementwise_mul` - -elementwise_div ------------------- - -对两个 :code:`Tensor` 逐元素相除, 对应数学操作符 :code:`/` 或 :code:`//` - -API Reference 请参考 :ref:`cn_api_fluid_layers_elementwise_div` - - -elementwise_pow ------------------- - -对两个 :code:`Tensor` 逐元素做次幂操作, 对应数学操作符 :code:`**` - -API Reference 请参考 :ref:`cn_api_fluid_layers_elementwise_pow` - -equal ------------------- - -对两个 :code:`Tensor` 逐元素判断是否相等, 对应数学操作符 :code:`==` - -API Reference 请参考 :ref:`cn_api_fluid_layers_equal` - - -less_than ------------------- - -对两个 :code:`Tensor` 逐元素判断是否满足小于关系, 对应数学操作符 :code:`<` - -API Reference 请参考 :ref:`cn_api_fluid_layers_less_than` - - - -sum ------------------- - -对两个 :code:`Tensor` 逐元素相加。 - -API Reference 请参考 :ref:`cn_api_fluid_layers_sum` - -elementwise_min ------------------- - -对两个 :code:`Tensor` 逐元素进行 :code:`min(x, y)` 操作。 - -API Reference 请参考 :ref:`cn_api_fluid_layers_elementwise_min` - -elementwise_max ------------------- - -对两个 :code:`Tensor` 逐元素进行 :code:`max(x, y)` 操作。 - -API Reference 请参考 :ref:`cn_api_fluid_layers_elementwise_max` - -matmul ------------------- - -对两个 :code:`Tensor` 进行矩阵乘操作。 - -API Reference 请参考 :ref:`cn_api_fluid_layers_matmul` diff --git a/docs/api_guides/low_level/layers/math_en.rst b/docs/api_guides/low_level/layers/math_en.rst deleted file mode 100644 index 5851d87e688..00000000000 --- a/docs/api_guides/low_level/layers/math_en.rst +++ /dev/null @@ -1,193 +0,0 @@ -_api_guide_math_en: - - -Mathematical operation -########################### - -Paddle provides a wealth of mathematical operations. 
The mathematical operations listed below are all elementwise operations on the target tensor. If the two inputs of the binary operations have different shapes, they will be processed first by :code:`broadcast`. Some mathematical operations also support mathematical operators, such as: :code:`+`, :code:`-`, :code:`*`, :code:`/`, etc. Math operators not only support tensors but also scalars. - - -Unary operation -================== - -exp ------------------- - -Perform an :code:`exp` operation on each input :code:`Tensor` element. - -API Reference: :ref:`api_fluid_layers_exp` - -tanh ------------------- - -For the input :code:`Tensor`, take the tanh value of each element. - -API Reference: :ref:`api_fluid_layers_tanh` - -sqrt ------------------- - -For the input :code:`Tensor`, take the square root of each element. - -API Reference: :ref:`api_fluid_layers_sqrt` - -abs ------------------- - -For the input :code:`Tensor`, take the elementwise absolute value. - -API Reference: :ref:`api_fluid_layers_abs` - -ceil ------------------- - -Round up each input :code:`Tensor` element to the nearest greater integer. - -API Reference: :ref:`api_fluid_layers_ceil` - -floor ------------------- - -Round down each input :code:`Tensor` element to the nearest less integer. - -API Reference: :ref:`api_fluid_layers_floor` - -sin ------------------- - -For the input :code:`Tensor`, take the elementwise sin value. - -API Reference: :ref:`api_fluid_layers_sin` - -cos ------------------- - -For input :code:`Tensor`, take the elementwise cosine value. - -API Reference: :ref:`api_fluid_layers_cos` - -cosh ------------------- - -For input :code:`Tensor`, take the elementwise hyperbolic cosine value. - -API Reference: :ref:`api_fluid_layers_cosh` - -round ------------------- - -Rounding the input :code:`Tensor` in elementwise order. - -API Reference: :ref:`api_fluid_layers_round` - -square ------------------- - -Square the input :code:`Tensor` in elementwise order. 
- -API Reference: :ref:`api_fluid_layers_square` - -reciprocal ------------------- - -For the input :code:`Tensor`, take the reciprocal in elementwise order. - -API Reference: :ref:`api_fluid_layers_reciprocal` - - -reduce ------------------- - -For the input :code:`Tensor`, it performs reduce operations on the specified axes, including: min, max, sum, mean, product - -API Reference: -:ref:`api_fluid_layers_reduce_min` -:ref:`api_fluid_layers_reduce_max` -:ref:`api_fluid_layers_reduce_sum` -:ref:`api_fluid_layers_reduce_mean` -:ref:`api_fluid_layers_reduce_prod` - - -Binary operation -================== - -elementwise_add ------------------- - -Add two :code:`Tensor` in elementwise order, and the corresponding math operator is :code:`+` . - -API Reference: :ref:`api_fluid_layers_elementwise_add` - -elementwise_sub ------------------- - -Subtract two :code:`Tensor` in elementwise order, the corresponding math operator is :code:`-` . - -API Reference: :ref:`api_fluid_layers_elementwise_sub` - -elementwise_mul ------------------- - -Multiply two :code:`Tensor` in elementwise order, and the corresponding math operator is :code:`*` . - -API Reference: :ref:`api_fluid_layers_elementwise_mul` - -elementwise_div ------------------- - -Divide two :code:`Tensor` in elementwise order, and the corresponding math operator is :code:`/` or :code:`//` . - -API Reference: :ref:`api_fluid_layers_elementwise_div` - - -elementwise_pow ------------------- - -Do power operations on two :code:`Tensor` in elementwise order, and the corresponding math operator is :code:`**` . - -API Reference: :ref:`api_fluid_layers_elementwise_pow` - -equal ------------------- - -Judge whether the two :code:`Tensor` elements are equal, and the corresponding math operator is :code:`==` . - -API Reference: :ref:`api_fluid_layers_equal` - - -less_than ------------------- - -Judge whether the two :code:`Tensor` elements satisfy the 'less than' relationship, and the corresponding math operator is :code:`<` .
- -API Reference: :ref:`api_fluid_layers_less_than` - - - -sum ------------------- - -Add two :code:`Tensor` in elementwise order. - -API Reference: :ref:`api_fluid_layers_sum` - -elementwise_min ------------------- - -Perform :code:`min(x, y)` operations on two :code:`Tensor` in elementwise order. - -API Reference: :ref:`api_fluid_layers_elementwise_min` - -elementwise_max ------------------- - -Perform :code:`max(x, y)` operations on two :code:`Tensor` in elementwise order. - -API Reference: :ref:`api_fluid_layers_elementwise_max` - -matmul ------------------- - -Perform matrix multiplication operations on two :code:`Tensor`. - -API Reference: :ref:`api_fluid_layers_matmul` diff --git a/docs/api_guides/low_level/layers/pooling.rst b/docs/api_guides/low_level/layers/pooling.rst deleted file mode 100644 index 6ae1cc10590..00000000000 --- a/docs/api_guides/low_level/layers/pooling.rst +++ /dev/null @@ -1,80 +0,0 @@ -.. _api_guide_pool: - -##### -池化 -##### - -池化的作用是对输入特征做下采样和降低过拟合。降低过拟合是减小输出大小的结果,它同样也减少了后续层中的参数的数量。 - -池化通常只需要将前一层的特征图作为输入,此外需要一些参数来确定池化具体的操作。在 PaddlePaddle 中我们同样通过设定池化的大小,方式,步长,是否是全局池化,是否使用 cudnn,是否使用 ceil 函数计算输出等参数来选择具体池化的方式。 -PaddlePaddle 中有针对定长图像特征的二维池化(pool2d)、三维池化(pool3d),RoI 池化(roi_pool),以及针对序列的序列池化(sequence_pool),同时也有池化计算的反向过程,下面先介绍 2D/3D 池化,以及 RoI 池化,再来介绍序列池化。 - --------------- - -1.
pool2d/pool3d ------------------------- - -- ``input`` : 池化操作接收任何符合 layout 是:\ ``N(batch size)* C(channel size) * H(height) * W(width)``\ 格式的\ ``Tensor``\ 类型作为输入。 - -- ``pool_size``\ : 用来确定池化\ ``filter``\ 的大小,即将多大范围内的数据池化为一个值。 - -- ``num_channels``\ : 用来确定输入的\ ``channel``\ 数量,如果未设置参数或设置为\ ``None``\ ,其实际值将自动设置为输入的\ ``channel``\ 数量。 - -- ``pool_type``\ : 接收\ ``avg``\ 和\ ``max``\ 2 种类型之一作为 pooling 的方式,默认值为\ ``max``\ 。其中\ ``max``\ 意为最大池化,即计算池化\ ``filter``\ 区域内的数据的最大值作为输出;而\ ``avg``\ 意为平均池化,即计算池化\ ``filter``\ 区域内的数据的平均值作为输出。 - -- ``pool_stride``\ : 意为池化的\ ``filter``\ 在输入特征图上移动的步长。 - -- ``pool_padding``\ : 用来确定池化中\ ``padding``\ 的大小,\ ``padding``\ 的使用是为了对于特征图边缘的特征进行池化,选择不同的\ ``pool_padding``\ 大小确定了在特征图边缘增加多大区域的补零。从而决定边缘特征被池化的程度。 - -- ``global_pooling``\ : 意为是否使用全局池化,全局池化是指使用和特征图大小相同的\ ``filter``\ 来进行池化,同样这个过程也可以使用平均池化或者最大池化来做为池化的方式,全局池化通常会用来替换全连接层以大量减少参数防止过拟合。 - -- ``use_cudnn``\ : 选项可以来选择是否使用 cudnn 来优化计算池化速度。 - -- ``ceil_mode``\ : 是否使用 ceil 函数计算输出高度和宽度。\ ``ceil mode``\ 意为天花板模式,是指会把特征图中不足\ ``filter size``\ 的边给保留下来,单独另算,或者也可以理解为在原来的数据上补充了值为-NAN 的边。而 floor 模式则是直接把不足\ ``filter size``\ 的边给舍弃了。具体计算公式如下: - - - 非\ ``ceil_mode``\ 下:\ ``输出大小 = (输入大小 - filter size + 2 * padding) / stride(步长) + 1`` - - - ``ceil_mode``\ 下:\ ``输出大小 = (输入大小 - filter size + 2 * padding + stride - 1) / stride + 1`` - - - -api 汇总: - -- :ref:`cn_api_fluid_layers_pool2d` -- :ref:`cn_api_fluid_layers_pool3d` - - -2. roi_pool ------------------- - -``roi_pool``\ 一般用于检测网络中,将输入特征图依据候选框池化到特定的大小。 - -- ``rois``\ : 接收\ ``DenseTensor``\ 类型来表示需要池化的 Regions of Interest,关于 RoI 的解释请参考\ `论文 `__ - -- ``pooled_height`` 和 ``pooled_width``\ : 这里可以接受非正方的池化窗口大小 - -- ``spatial_scale``\ : 用作设定缩放 RoI 和原图缩放的比例,注意,这里的设定需要用户自行计算 RoI 和原图的实际缩放比例。 - - -api 汇总: - -- :ref:`cn_api_fluid_layers_roi_pool` - - -3. 
sequence_pool --------------------- - -``sequence_pool``\ 是一个用作对于不等长序列进行池化的接口,它将每一个实例的全部时间步的特征进行池化,它同样支持 -``average``, ``sum``, ``sqrt`` 和\ ``max``\ 4 种类型之一作为 pooling 的方式。 其中: - -- ``average``\ 是对于每一个时间步内的数据求和后分别取平均值作为池化的结果。 - -- ``sum``\ 则是对每一个时间步内的数据分别求和作为池化的结果。 - -- ``sqrt``\ 则是对每一个时间步内的数据分别求和再分别取平方根作为池化的结果。 - -- ``max``\ 则是对每一个时间步内的数据分别求取最大值作为池化的结果。 - -api 汇总: - -- :ref:`cn_api_fluid_layers_sequence_pool` diff --git a/docs/api_guides/low_level/layers/pooling_en.rst b/docs/api_guides/low_level/layers/pooling_en.rst deleted file mode 100755 index 1debaa88a16..00000000000 --- a/docs/api_guides/low_level/layers/pooling_en.rst +++ /dev/null @@ -1,80 +0,0 @@ -.. _api_guide_pool_en: - -######## -Pooling -######## - -Pooling downsamples the input features and reduces overfitting. Reducing overfitting is the result of reducing the output size, which also reduces the number of parameters in subsequent layers. - -Pooling usually only takes the feature maps of the previous layer as input, and some parameters are needed to determine the specific operation of the pooling. In PaddlePaddle, we also choose the specific pooling by setting parameters like the size, method, step, whether to pool globally, whether to use cudnn, whether to use ceil function to calculate output. -PaddlePaddle has two-dimensional pooling (pool2d), three-dimensional pooling (pool3d), RoI pooling (roi_pool) for fixed-length image features, and sequence pooling (sequence_pool) for sequences, as well as the reverse (backward) process of pooling calculations. The following text describes the 2D/3D pooling, then the RoI pooling, and finally the sequence pooling. - --------------- - -1. pool2d/pool3d ------------------------- - -- ``input`` : The pooling operation receives any ``Tensor`` that conforms to the layout: ``N(batch size)* C(channel size) * H(height) * W(width)`` format as input.
- -- ``pool_size`` : It is used to determine the size of the pooling ``filter``, which determines the size of data to be pooled into a single value. - -- ``num_channels`` : It is used to determine the number of ``channel`` of input. If it is not set or is set to ``None``, its actual value will be automatically set to the ``channel`` quantity of input. - -- ``pool_type`` : It receives one of ``avg`` and ``max`` as the pooling method. The default value is ``max`` . ``max`` means maximum pooling, i.e. calculating the maximum value of the data in the pooled ``filter`` area as output; and ``avg`` means average pooling, i.e. calculating the average of the data in the pooled ``filter`` area as output. - -- ``pool_stride`` : It is the stride size in which the pooling ``filter`` moves on the input feature map. - -- ``pool_padding`` : It is used to determine the size of ``padding`` in the pooling, ``padding`` is used to pool the features of the edges of feature maps. The ``pool_padding`` size determines how much zero is padded to the edge of the feature maps. Thereby it determines the extent to which the edge features are pooled. - -- ``global_pooling`` : It means whether to use global pooling. Global pooling refers to pooling using ``filter`` of the same size as the feature map. This process can also use average pooling or the maximum pooling as the pooling method. Global pooling is usually used to replace the fully connected layer to greatly reduce the parameters to prevent overfitting. - -- ``use_cudnn`` : This option allows you to choose whether or not to use cudnn to accelerate pooling. - -- ``ceil_mode`` : Whether to use the ceil function to calculate the output height and width. ``ceil mode`` means ceiling mode, which means that, in the feature map, the edge parts that are smaller than ``filter size`` will be retained, and separately calculated. It can be understood as supplementing the original data with edges whose value is -NAN.
By contrast, the floor mode directly discards the edges smaller than the ``filter size``. The specific calculation formula is as follows: - - * Non ``ceil_mode`` : ``Output size = (input size - filter size + 2 * padding) / stride (stride size) + 1`` - - * ``ceil_mode`` : ``Output size = (input size - filter size + 2 * padding + stride - 1) / stride + 1`` - - - -related API: - -- :ref:`api_fluid_layers_pool2d` -- :ref:`api_fluid_layers_pool3d` - - -2. roi_pool ------------------- - -``roi_pool`` is generally used in detection networks, and the input feature map is pooled to a specific size by the bounding box. - -- ``rois`` : It receives ``DenseTensor`` type to indicate the Regions of Interest that need to be pooled. For an explanation of RoI, please refer to `Paper `__ - -- ``pooled_height`` and ``pooled_width`` : They accept non-square pooling window sizes. - -- ``spatial_scale`` : Used to set the scale of scaling the RoI and the original image. Note that the settings here require the user to manually calculate the actual scaling of the RoI and the original image. - - -related API: - -- :ref:`api_fluid_layers_roi_pool` - - -3. sequence_pool --------------------- - -``sequence_pool`` is an interface used to pool variable-length sequences. It pools the features of all time steps of each instance, and also supports -one of ``average``, ``sum``, ``sqrt`` and ``max`` to be used as the pooling method. Specifically: - -- ``average`` sums up the data in each time step and takes its average as the pooling result. - -- ``sum`` takes the sum of the data in each time step as the pooling result. - -- ``sqrt`` sums the data in each time step and takes its square root as the pooling result. - -- ``max`` takes the maximum value for each time step as the pooling result.
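The four ``sequence_pool`` modes listed above can be sketched in plain Python over a mini batch of variable-length sequences of scalar features. This is a hypothetical illustration, not the Paddle operator itself; in particular the ``sqrt`` branch follows the literal description in the text, and the framework's exact scaling may differ:

```python
import math

# Hypothetical sketch of the sequence_pool modes described above, applied to
# a mini batch of variable-length sequences of scalar features. Not the real
# Paddle operator; the 'sqrt' branch follows the literal description in the
# text and may differ from the framework's exact definition.

def sequence_pool(batch, pool_type="max"):
    results = []
    for seq in batch:  # each sequence in the mini batch is pooled on its own
        if pool_type == "average":
            results.append(sum(seq) / len(seq))
        elif pool_type == "sum":
            results.append(sum(seq))
        elif pool_type == "sqrt":
            results.append(math.sqrt(sum(seq)))  # sum, then square root
        elif pool_type == "max":
            results.append(max(seq))
        else:
            raise ValueError("unknown pool_type: %s" % pool_type)
    return results

batch = [[1.0, 2.0, 3.0], [4.0, 6.0]]  # two sequences, lengths 3 and 2
print(sequence_pool(batch, "max"))      # [3.0, 6.0]
print(sequence_pool(batch, "average"))  # [2.0, 5.0]
```

Note that each sequence is reduced independently, so the output has one value per sequence regardless of the sequence lengths.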
- -related API: - -- :ref:`api_fluid_layers_sequence_pool` diff --git a/docs/api_guides/low_level/layers/sequence.rst b/docs/api_guides/low_level/layers/sequence.rst deleted file mode 100644 index 8b27d7c0a40..00000000000 --- a/docs/api_guides/low_level/layers/sequence.rst +++ /dev/null @@ -1,111 +0,0 @@ -.. _api_guide_sequence: - -######## -序列 -######## - -在深度学习领域许多问题涉及到对 `序列(sequence) `_ 的处理。 -从 Wiki 上的释义可知,序列可以表征多种物理意义,但在深度学习中,最常见的仍然是"时间序列"——一个序列包含多个时间步的信息。 - -在 Paddle Fluid 中,我们将序列表示为 ``DenseTensor``。 -因为一般进行神经网络计算时都是一个 batch 一个 batch 地计算,所以我们用一个 DenseTensor 来存储一个 mini batch 的序列。 -一个 DenseTensor 的第 0 维包含该 mini batch 中所有序列的所有时间步,并且用 LoD 来记录各个序列的长度,区分不同序列。 -而在运算时,还需要根据 LoD 信息将 DenseTensor 中一个 mini batch 的第 0 维拆开成多个序列。(具体请参考上述 LoD 相关的文档。) -所以,对这类 DenseTensor 第 0 维的操作不能简单地使用一般的 layer 来进行,针对这一维的操作必须要结合 LoD 的信息。 -(例如,你不能用 :code:`layers.reshape` 来对一个序列的第 0 维进行 reshape)。 - -为了实行各类针对序列的操作,我们设计了一系列序列相关的 API,专门用于正确处理序列相关的操作。 -实践中,由于一个 DenseTensor 包括一个 mini batch 的序列,同一个 mini batch 中不同的序列通常属于多个 sample,它们彼此之间不会也不应该发生相互作用。 -因此,若一个 layer 以两个(或多个)DenseTensor 为输入(或者以一个 list 的 DenseTensor 为输入),每一个 DenseTensor 代表一个 mini batch 的序列,则第一个 DenseTensor 中的第一个序列只会和第二个 DenseTensor 中的第一个序列发生计算, -第一个 DenseTensor 中的第二个序列只会和第二个 DenseTensor 中的第二个序列发生计算,第一个 DenseTensor 中的第 i 个序列只会和第二个 DenseTensor 中第 i 个序列发生计算,依此类推。 - -**总而言之,一个 DenseTensor 存储一个 mini batch 的多个序列,其中的序列个数为 batch size;多个 DenseTensor 间发生计算时,每个 DenseTensor 中的第 i 个序列只会和其他 DenseTensor 中第 i 个序列发生计算。理解这一点对于理解接下来序列相关的操作会至关重要。** - -1. sequence_softmax -------------------- -这个 layer 以一个 mini batch 的序列为输入,在每个序列内做 softmax 操作。其输出为一个 mini batch 相同 shape 的序列,但在序列内是经 softmax 归一化过的。 -这个 layer 往往用于在每个 sequence 内做 softmax 归一化。 - -API Reference 请参考 :ref:`cn_api_fluid_layers_sequence_softmax` - - -2. 
sequence_concat ------------------- -这个 layer 以一个 list 为输入,该 list 中可以含有多个 DenseTensor,每个 DenseTensor 为一个 mini batch 的序列。 -该 layer 会将每个 batch 中第 i 个序列在时间维度上拼接成一个新序列,作为返回的 batch 中的第 i 个序列。 -理所当然地,list 中每个 DenseTensor 的序列必须有相同的 batch size。 - -API Reference 请参考 :ref:`cn_api_fluid_layers_sequence_concat` - - -3. sequence_first_step ----------------------- -这个 layer 以一个 DenseTensor 作为输入,会取出每个序列中的第一个元素(即第一个时间步的元素),并作为返回值。 - -API Reference 请参考 :ref:`cn_api_fluid_layers_sequence_first_step` - - -4. sequence_last_step ---------------------- -同 :code:`sequence_first_step` ,除了本 layer 是取每个序列中最后一个元素(即最后一个时间步)作为返回值。 - -API Reference 请参考 :ref:`cn_api_fluid_layers_sequence_last_step` - - -5. sequence_expand ------------------- -这个 layer 有两个 DenseTensor 的序列作为输入,并按照第二个 DenseTensor 中序列的 LoD 信息来扩展第一个 batch 中的序列。 -通常用来将只有一个时间步的序列(例如 :code:`sequence_first_step` 的返回结果)延展成有多个时间步的序列,以此方便与有多个时间步的序列进行运算。 - -API Reference 请参考 :ref:`cn_api_fluid_layers_sequence_expand` - - -6. sequence_expand_as ---------------------- -这个 layer 需要两个 DenseTensor 的序列作为输入,然后将第一个 Tensor 序列中的每一个序列延展成和第二个 Tensor 中对应序列等长的序列。 -不同于 :code:`sequence_expand` ,这个 layer 会将第一个 DenseTensor 中的序列严格延展为和第二个 DenseTensor 中的序列等长。 -如果无法延展成等长的(例如第二个 batch 中的序列长度不是第一个 batch 中序列长度的整数倍),则会报错。 - -API Reference 请参考 :ref:`cn_api_fluid_layers_sequence_expand_as` - - -7. sequence_enumerate ---------------------- -这个 layer 需要一个 DenseTensor 的序列作为输入,同时需要指定一个 :code:`win_size` 的长度。这个 layer 将依次取所有序列中长度为 :code:`win_size` 的子序列,并组合成新的序列。 - -API Reference 请参考 :ref:`cn_api_fluid_layers_sequence_enumerate` - - -8. sequence_reshape -------------------- -这个 layer 需要一个 DenseTensor 的序列作为输入,同时需要指定一个 :code:`new_dim` 作为新的序列的维度。 -该 layer 会将 mini batch 内每个序列 reshape 为 new_dim 给定的维度。注意,每个序列的长度会改变(因此 LoD 信息也会变),以适应新的形状。 - -API Reference 请参考 :ref:`cn_api_fluid_layers_sequence_reshape` - - -9. 
sequence_scatter -------------------- -这个 layer 可以将一个序列的数据 scatter 到另一个 tensor 上。这个 layer 有三个 input,一个要被 scatter 的目标 tensor :code:`input`; -一个是序列的数据 :code:`update` ,一个是目标 tensor 的上坐标 :code:`index` 。Output 为 scatter 后的 tensor,形状和 :code:`input` 相同。 - -API Reference 请参考 :ref:`cn_api_fluid_layers_sequence_scatter` - - -10. sequence_pad ----------------- -这个 layer 可以将不等长的序列补齐成等长序列。使用这个 layer 需要提供一个 :code:`PadValue` 和一个 :code:`padded_length`。 -前者是用来补齐序列的元素,可以是一个数也可以是一个 tensor;后者是序列补齐的目标长度。 -这个 layer 会返回补齐后的序列,以及一个记录补齐前各个序列长度的 tensor :code:`Length`。 - -API Reference 请参考 :ref:`cn_api_fluid_layers_sequence_pad` - - -11. sequence_mask ------------------ -这个 layer 会根据 :code:`input` 生成一个 mask,:code:`input` 是一个记录了每个序列长度的 tensor。 -此外这个 layer 还需要一个参数 :code:`maxlen` 用于指定序列中最长的序列长度。 -通常这个 layer 用于生成一个 mask,将被 pad 后的序列中 pad 的部分过滤掉。 -:code:`input` 的长度 tensor 通常可以直接用 :code:`sequence_pad` 返回的 :code:`Length`。 - -API Reference 请参考 :ref:`cn_api_fluid_layers_sequence_mask` diff --git a/docs/api_guides/low_level/layers/sequence_en.rst b/docs/api_guides/low_level/layers/sequence_en.rst deleted file mode 100644 index 6d6780ef772..00000000000 --- a/docs/api_guides/low_level/layers/sequence_en.rst +++ /dev/null @@ -1,110 +0,0 @@ -.. _api_guide_sequence_en: - -######## -Sequence -######## - -Many problems in the field of deep learning involve the processing of the `sequence `_. -From Wiki's definition, sequences can represent a variety of physical meanings, but in deep learning, the most common is still "time sequence" - a sequence containing information of multiple time steps. - -In Paddle Fluid, we represent the sequence as ``DenseTensor``. -Because the general neural network performs computing batch by batch, we use a DenseTensor to store a mini batch of sequences. -The 0th dimension of a DenseTensor contains all the time steps of all sequences in the mini batch, and LoD is used to record the length of each sequence to distinguish different sequences. 
-In the calculation, it is also necessary to split the 0th dimension of a mini batch in the DenseTensor into a number of sequences according to the LoD information. (Please refer to the LoD related documents for details.) -Therefore, the operation for the 0th dimension of a DenseTensor cannot be performed simply by a general layer; the operation on this dimension must be combined with the information of LoD. -(For example, you can't reshape the 0th dimension of a sequence with :code:`layers.reshape`). - -In order to correctly implement various sequence-oriented operations, we have designed a series of sequence-related APIs. -In practice, because a DenseTensor contains a mini batch of sequences, and different sequences in the same mini batch usually belong to different samples, they do not and should not interact with each other. -Therefore, if a layer takes two (or more) DenseTensors as input (or a list of DenseTensors), and each DenseTensor represents a mini batch of sequences, the first sequence in the first DenseTensor will only be calculated with the first sequence in the second DenseTensor, the second sequence in the first DenseTensor will only be calculated with the second sequence in the second DenseTensor, and in general the *i'th* sequence in the first DenseTensor will only be calculated with the *i'th* sequence in the second DenseTensor, and so on. - -**In summary, a DenseTensor stores multiple sequences in a mini batch, where the number of sequences is the batch size; when multiple DenseTensors are calculated together, the i'th sequence in each DenseTensor will only be calculated with the i'th sequence of the other DenseTensors. Understanding this is critical to understanding the following associated operations.** - -1. sequence_softmax ------------------- -This layer takes a mini batch of sequences as input and performs a softmax operation within each sequence. The output is a mini batch of sequences in the same shape, but it is normalized by softmax within each sequence. 
-This layer is often used to do softmax normalization within each sequence. - - Please refer to :ref:`api_fluid_layers_sequence_softmax` - - -2. sequence_concat ------------------ -The layer takes a list as input, which can contain multiple DenseTensors, and each DenseTensor is a mini batch of sequences. -The layer will concatenate the i'th sequences in each batch into a new sequence in the time dimension as the i'th sequence in the returned batch. -Of course, the sequences of each DenseTensor in the list must have the same batch size. - - Please refer to :ref:`api_fluid_layers_sequence_concat` - - -3. sequence_first_step ---------------------- -This layer takes a DenseTensor as input and takes the first element in each sequence (the element of the first time step) as the return value. - - Please refer to :ref:`api_fluid_layers_sequence_first_step` - - -4. sequence_last_step --------------------- -Same as :code:`sequence_first_step` except that this layer takes the last element in each sequence (i.e. the last time step) as the return value. - - Please refer to :ref:`api_fluid_layers_sequence_last_step` - - -5. sequence_expand ------------------ -This layer takes two DenseTensors of sequences as input and extends the sequences in the first batch according to the LoD information of the sequences in the second DenseTensor. -It is usually used to extend a sequence with only one time step (for example, the return result of :code:`sequence_first_step`) into a sequence with multiple time steps, which is convenient for calculations with sequences composed of multiple time steps. - - Please refer to :ref:`api_fluid_layers_sequence_expand` - - -6. sequence_expand_as --------------------- -This layer takes two DenseTensors of sequences as input and then extends each sequence in the first Tensor to a sequence with the same length as the corresponding one in the second Tensor. 
-Unlike :code:`sequence_expand` , this layer will strictly extend each sequence in the first DenseTensor to have the same length as the corresponding one in the second DenseTensor. -If it cannot be extended to the same length (for example, the sequence length in the second batch is not an integer multiple of the sequence length in the first batch), an error will be reported. - - Please refer to :ref:`api_fluid_layers_sequence_expand_as` - - -7. sequence_enumerate --------------------- -This layer takes a DenseTensor of sequences as input and also requires a window length :code:`win_size` to be specified. This layer will successively take all subsequences of length :code:`win_size` from each sequence and combine them into new sequences. - - Please refer to :ref:`api_fluid_layers_sequence_enumerate` - - -8. sequence_reshape ------------------- -This layer requires a DenseTensor of sequences as input, and you need to specify a :code:`new_dim` as the dimension of the new sequences. -The layer will reshape each sequence in the mini batch to the dimension given by :code:`new_dim`. Note that the length of each sequence will be changed (and so will the LoD information) to accommodate the new shape. - - Please refer to :ref:`api_fluid_layers_sequence_reshape` - - -9. sequence_scatter ------------------- -This layer can scatter a sequence of data onto another tensor. This layer has three inputs: a target tensor to be scattered :code:`input`; a sequence of data to scatter :code:`update`; and the coordinates on the target tensor :code:`index`. The output is the tensor after the scatter, whose shape is the same as :code:`input`. - - Please refer to :ref:`api_fluid_layers_sequence_scatter` - - -10. sequence_pad ---------------- -This layer can pad sequences of unequal length into equal-length sequences. To use this layer you need to provide a :code:`PadValue` and a :code:`padded_length`. 
-The former is the element used to pad the sequence, it can be a number or a tensor; the latter is the target length of the sequence. -This layer will return the padded sequence, and a tensor :code:`Length` of the length for each sequence before padding. - - Please refer to :ref:`api_fluid_layers_sequence_pad` - - -11. sequence_mask ------------------ -This layer will generate a mask based on :code:`input`, where the :code:`input` is a tensor that records the length of each sequence. -In addition, this layer requires a parameter :code:`maxlen` to specify the largest sequence length in the sequence. -Usually, this layer is used to generate a mask that will filter away the portion of the paddings in the sequence. -The :code:`input` tensor can usually directly use the returned :code:`Length` from :code:`sequence_pad` . - - Please refer to :ref:`api_fluid_layers_sequence_mask` diff --git a/docs/api_guides/low_level/layers/sparse_update.rst b/docs/api_guides/low_level/layers/sparse_update.rst deleted file mode 100644 index c77b9f90809..00000000000 --- a/docs/api_guides/low_level/layers/sparse_update.rst +++ /dev/null @@ -1,45 +0,0 @@ -.. _api_guide_sparse_update: - -##### -稀疏更新 -##### - -Fluid 的 :ref:`cn_api_fluid_layers_embedding` 层在单机训练和分布式训练时,均可以支持“稀疏更新”,即梯度以 sparse tensor 结构存储,只保存梯度不为 0 的行。 -在分布式训练中,对于较大的 embedding 层,开启稀疏更新有助于减少通信数据量,提升训练速度。 - -在 paddle 内部,我们用 lookup_table 来实现 embedding。下边这张图说明了 embedding 在正向和反向计算的过程: - -如图所示:一个 Tensor 中有两行不为 0,正向计算的过程中,我们使用 ids 存储不为 0 的行,并使用对应的两行数据来进行计算;反向更新的过程也只更新这两行。 - -.. image:: ../../../images/lookup_table_training.png - :scale: 50 % - -embedding 使用例子: ---------------------- - -API 详细使用方法参考 :ref:`cn_api_fluid_layers_embedding` ,以下是一个简单的例子: - -.. 
code-block:: python - - import math - - import paddle.fluid as fluid - - DICT_SIZE = 10000 * 10 - EMBED_SIZE = 64 - IS_SPARSE = False - def word_emb(word, dict_size=DICT_SIZE, embed_size=EMBED_SIZE): - embed = fluid.layers.embedding( - input=word, - size=[dict_size, embed_size], - dtype='float32', - param_attr=fluid.ParamAttr( - initializer=fluid.initializer.Normal(scale=1/math.sqrt(dict_size))), - is_sparse=IS_SPARSE, - is_distributed=False) - return embed - -以上参数中: - -- :code:`is_sparse` : 反向计算的时候梯度是否为 sparse tensor。如果不设置,梯度是一个 :ref:`Lod_Tensor ` 。默认为 False。 - -- :code:`is_distributed` : 标志是否是用在分布式的场景下。一般大规模稀疏更新(embedding 的第 0 维维度很大,比如几百万以上)才需要设置。具体可以参考大规模稀疏的 API guide :ref:`cn_api_guide_async_training` 。默认为 False。 - -- API 汇总: - - :ref:`cn_api_fluid_layers_embedding` diff --git a/docs/api_guides/low_level/layers/sparse_update_en.rst b/docs/api_guides/low_level/layers/sparse_update_en.rst deleted file mode 100755 index 8e0f8fc7885..00000000000 --- a/docs/api_guides/low_level/layers/sparse_update_en.rst +++ /dev/null @@ -1,45 +0,0 @@ -.. _api_guide_sparse_update_en: - -############### -Sparse update -############### - -Fluid's :ref:`api_fluid_layers_embedding` layer supports "sparse updates" in both single-node and distributed training, which means gradients are stored in a sparse tensor structure where only rows with non-zero gradients are saved. -In distributed training, for larger embedding layers, sparse updates reduce the amount of communication data and speed up training. - -In paddle, we use lookup_table to implement embedding. The figure below illustrates the process of embedding in the forward and backward calculations: - -As shown in the figure: two rows in a Tensor are not 0. In the process of forward calculation, we use ids to store the rows that are not 0 and use the corresponding rows of data for calculation; the backward update also only updates these two rows. - -.. 
image:: ../../../images/lookup_table_training.png - :scale: 50 % - -Example -------------------------- - -For detailed API usage, please refer to :ref:`api_fluid_layers_embedding` . Here is a simple example: - -.. code-block:: python - - import math - - import paddle.fluid as fluid - - DICT_SIZE = 10000 * 10 - EMBED_SIZE = 64 - IS_SPARSE = False - def word_emb(word, dict_size=DICT_SIZE, embed_size=EMBED_SIZE): - embed = fluid.layers.embedding( - input=word, - size=[dict_size, embed_size], - dtype='float32', - param_attr=fluid.ParamAttr( - initializer=fluid.initializer.Normal(scale=1/math.sqrt(dict_size))), - is_sparse=IS_SPARSE, - is_distributed=False) - return embed - -The parameters: - -- :code:`is_sparse` : Whether the gradient is a sparse tensor in the backward calculation. If not set, the gradient is a `LodTensor `_ . The default is False. - -- :code:`is_distributed` : Whether the current training is in a distributed scenario. Generally, this parameter only needs to be set for large-scale sparse updates (the 0th dimension of the embedding is very large, such as several million or more). For details, please refer to the large-scale sparse API guide :ref:`api_guide_async_training`. The default is False. - -- API : - - :ref:`api_fluid_layers_embedding` diff --git a/docs/api_guides/low_level/layers/tensor.rst b/docs/api_guides/low_level/layers/tensor.rst deleted file mode 100644 index b8f209ae635..00000000000 --- a/docs/api_guides/low_level/layers/tensor.rst +++ /dev/null @@ -1,141 +0,0 @@ -.. _api_guide_tensor: - -######## -张量 -######## - -Fluid 中使用两种数据结构来承载数据,分别是 `Tensor 和 LoD_Tensor <../../../user_guides/howto/basic_concept/lod_tensor.html>`_ 。 其中 LoD-Tensor 是 Fluid 的特有概念,它在 Tensor 基础上附加了序列信息。框架中可传输的数据包括:输入、输出、网络中的可学习参数,全部统一使用 LoD-Tensor 表示,Tensor 可以看作是一种特殊的 LoD-Tensor。 - -下面介绍这两种数据的相关操作。 - -Tensor -====== - -1. create_tensor ---------------------- -Tensor 用于在框架中承载数据,使用 :code:`create_tensor` 可以创建一个指定数据类型的 Lod-Tensor 变量。 - -API reference 请参考: :ref:`cn_api_fluid_layers_create_tensor` - - -2. 
create_parameter --------------------- -神经网络的训练过程是一个对参数的学习过程,Fluid 使用 :code:`create_parameter` 创建一个可学习的参数。该参数的值可以被 operator 改变。 - -API reference 请参考::ref:`cn_api_fluid_layers_create_parameter` - - - -3. create_global_var --------------------- -Fluid 使用 :code:`create_global_var` 创建一个全局 tensor,通过此 API 可以指定被创建 Tensor 变量的数据类型、形状和值。 - -API reference 请参考::ref:`cn_api_fluid_layers_create_global_var` - - -4. cast --------------- - -Fluid 使用 :code:`cast` 将数据转换为指定类型。 - -API reference 请参考::ref:`cn_api_fluid_layers_cast` - - -5. concat ---------------- - -Fluid 使用 :code:`concat` 将输入数据沿指定维度连接。 - -API reference 请参考::ref:`cn_api_fluid_layers_concat` - - -6. sums ---------------- - -Fluid 使用 :code:`sums` 执行对输入数据的加和。 - -API reference 请参考::ref:`cn_api_fluid_layers_sums` - -7. fill_constant ----------------- - -Fluid 使用 :code:`fill_constant` 创建一个具有特定形状和类型的 Tensor。可以通过 :code:`value` 设置该变量的初始值。 - -API reference 请参考: :ref:`cn_api_fluid_layers_fill_constant` - -8. assign --------------- - -Fluid 使用 :code:`assign` 复制一个变量。 - -API reference 请参考::ref:`cn_api_fluid_layers_assign` - -9. argmin -------------- - -Fluid 使用 :code:`argmin` 计算输入 Tensor 指定轴上最小元素的索引。 - -API reference 请参考::ref:`cn_api_fluid_layers_argmin` - -10. argmax ----------- - -Fluid 使用 :code:`argmax` 计算输入 Tensor 指定轴上最大元素的索引。 - -API reference 请参考::ref:`cn_api_fluid_layers_argmax` - -11. argsort ------------- - -Fluid 使用 :code:`argsort` 对输入 Tensor 在指定轴上进行排序,并返回排序后的数据变量及其对应的索引值。 - -API reference 请参考: :ref:`cn_api_fluid_layers_argsort` - -12. ones ------------- - -Fluid 使用 :code:`ones` 创建一个指定大小和数据类型的 Tensor,且初始值为 1。 - -API reference 请参考: :ref:`cn_api_fluid_layers_ones` - -13. zeros --------------- - -Fluid 使用 :code:`zeros` 创建一个指定大小和数据类型的 Tensor,且初始值为 0。 - -API reference 请参考: :ref:`cn_api_fluid_layers_zeros` - -14. 
reverse ------------------- - -Fluid 使用 :code:`reverse` 沿指定轴反转 Tensor。 - -API reference 请参考: :ref:`cn_api_fluid_layers_reverse` - - - -LoD-Tensor -============ - -LoD-Tensor 非常适用于序列数据,相关知识可以参考阅读 `LoD_Tensor <../../../user_guides/howto/basic_concept/lod_tensor.html>`_ 。 - -1. create_lod_tensor ----------------------- - -Fluid 使用 :code:`create_lod_tensor` 基于 numpy 数组、列表或现有 LoD_Tensor 创建拥有新的层级信息的 LoD_Tensor。 - -API reference 请参考: :ref:`cn_api_fluid_create_lod_tensor` - -2. create_random_int_lodtensor ----------------------------------- - -Fluid 使用 :code:`create_random_int_lodtensor` 创建一个由随机整数组成的 LoD_Tensor。 - -API reference 请参考: :ref:`cn_api_fluid_create_random_int_lodtensor` - -3. reorder_lod_tensor_by_rank ---------------------------------- - -Fluid 使用 :code:`reorder_lod_tensor_by_rank` 对输入 LoD_Tensor 的序列信息按指定顺序重排。 - -API reference 请参考::ref:`cn_api_fluid_layers_reorder_lod_tensor_by_rank` diff --git a/docs/api_guides/low_level/layers/tensor_en.rst b/docs/api_guides/low_level/layers/tensor_en.rst deleted file mode 100755 index 9f62acb452e..00000000000 --- a/docs/api_guides/low_level/layers/tensor_en.rst +++ /dev/null @@ -1,141 +0,0 @@ -.. _api_guide_tensor_en: - -######## -Tensor -######## - -There are two data structures used in Fluid to host the data, namely `Tensor and LoD_Tensor <../../../user_guides/howto/basic_concept/lod_tensor_en.html>`_ . LoD-Tensor is a unique concept of Fluid, which appends sequence information to Tensor. The data that can be transferred in the framework includes: input, output, and learnable parameters in the network. All of them are uniformly represented by LoD-Tensor. In addition, tensor can be regarded as a special LoD-Tensor. - -Now let's take a closer look at the operations related to these two types of data. - -Tensor -====== - -1. create_tensor ---------------------- -Tensor is used to carry data in the framework. Use :code:`create_tensor` to create a Lod-Tensor variable of the specified data type. 
- -API reference : :ref:`api_fluid_layers_create_tensor` - - -2. create_parameter ---------------------- -The neural network training process is a learning process for parameters. Fluid uses :code:`create_parameter` to create a learnable parameter. The value of this parameter can be changed by the operator. - -API reference : :ref:`api_fluid_layers_create_parameter` - - - -3. create_global_var ---------------------- -Fluid uses :code:`create_global_var` to create a global tensor and this API allows you to specify the data type, shape, and value of the Tensor variable being created. - -API reference : :ref:`api_fluid_layers_create_global_var` - - -4. cast ---------------- - -Fluid uses :code:`cast` to convert the data to the specified type. - -API reference : :ref:`api_fluid_layers_cast` - - -5.concat ----------------- - -Fluid uses :code:`concat` to concatenate input data along a specified dimension. - -API reference : :ref:`api_fluid_layers_concat` - - -6. sums ----------------- - -Fluid uses :code:`sums` to sum up the input data. - -API reference : :ref:`api_fluid_layers_sums` - -7. fill_constant ------------------ - -Fluid uses :code:`fill_constant` to create a Tensor with a specific shape and type. The initial value of this variable can be set via :code:`value`. - -API reference : :ref:`api_fluid_layers_fill_constant` - -8. assign ---------------- - -Fluid uses :code:`assign` to duplicate a variable. - -API reference : :ref:`api_fluid_layers_assign` - -9. argmin --------------- - -Fluid uses :code:`argmin` to calculate the index of the smallest element on the specified axis of Tensor. - -API reference : :ref:`api_fluid_layers_argmin` - -10. argmax ------------ - -Fluid uses :code:`argmax` to calculate the index of the largest element on the specified axis of Tensor. - -API reference : :ref:`api_fluid_layers_argmax` - -11. 
argsort ------------- - -Fluid uses :code:`argsort` to sort the input Tensor on the specified axis and it will return the sorted data variables and their corresponding index values. - -API reference : :ref:`api_fluid_layers_argsort` - -12. ones -------------- - -Fluid uses :code:`ones` to create a Tensor of the specified size and data type with an initial value of 1. - -API reference : :ref:`api_fluid_layers_ones` - -13. zeros ---------------- - -Fluid uses :code:`zeros` to create a Tensor of the specified size and data type with an initial value of zero. - -API reference : :ref:`api_fluid_layers_zeros` - -14. reverse -------------------- - -Fluid uses :code:`reverse` to invert Tensor along the specified axis. - -API reference : :ref:`api_fluid_layers_reverse` - - - -LoD-Tensor -============ - -LoD-Tensor is very suitable for sequence data. For related knowledge, please read `Tensor and LoD_Tensor <../../../user_guides/howto/basic_concept/lod_tensor_en.html>`_ . - -1.create_lod_tensor ------------------------ - -Fluid uses :code:`create_lod_tensor` to create a LoD_Tensor with new hierarchical information based on a numpy array, a list, or an existing LoD_Tensor. - -API reference : :ref:`api_fluid_create_lod_tensor` - -2. create_random_int_lodtensor ----------------------------------- - -Fluid uses :code:`create_random_int_lodtensor` to create a LoD_Tensor composed of random integers. - -API reference : :ref:`api_fluid_create_random_int_lodtensor` - -3. reorder_lod_tensor_by_rank ---------------------------------- - -Fluid uses :code:`reorder_lod_tensor_by_rank` to reorder the sequence information of the input LoD_Tensor in the specified order. - -API reference : :ref:`api_fluid_layers_reorder_lod_tensor_by_rank`
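To make the LoD-Tensor construction described above concrete, here is a minimal pure-Python sketch of how a nested list can be flattened into one flat array plus LoD offsets, which is conceptually what :code:`create_lod_tensor` does. This is an illustration only, not Paddle's implementation, and `build_lod` is a made-up helper name:

```python
# Illustrative sketch only: flatten a nested Python list into flat data
# plus LoD offsets, mirroring what create_lod_tensor conceptually does.
# NOTE: `build_lod` is a hypothetical helper name, not a Paddle API.

def build_lod(nested):
    flat, lod = [], [0]
    for seq in nested:
        flat.extend(seq)                # data of all sequences, laid out flat
        lod.append(lod[-1] + len(seq))  # cumulative offsets mark boundaries
    return flat, lod

flat, lod = build_lod([[1, 2, 3], [4, 5]])
print(flat)  # [1, 2, 3, 4, 5]
print(lod)   # [0, 3, 5]
```

The offset list makes sequence boundaries recoverable from the flat storage, which is what allows the sequence-aware operations in this guide to treat the 0th dimension as a batch of variable-length sequences.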