
Commit 0ee74ff

Minor fixes on Intro How It Works (#9939)
- Replace SDK with Developer Tools
- Added a few links.

Co-authored-by: Scott Roy <[email protected]>
1 parent a67b6b8 commit 0ee74ff

File tree

1 file changed: +3 -3 lines changed


docs/source/intro-how-it-works.md

+3 -3
@@ -17,10 +17,10 @@ ExecuTorch provides the following benefits to engineers who need to deploy machi
 
 * **Export that is robust and powerful.** Export uses [`torch.export()`](https://pytorch.org/docs/main/export.html), which relies on the same technology used in PyTorch 2.x to capture PyTorch programs for fast execution. While eager mode is flexible and allows experimentation in Python, it may not work well if Python isn't available or cannot deliver efficient execution. The _Export Intermediate Representation (Export IR)_ that the export flow generates can describe a wide range of dynamism in PyTorch models, including control flow and dynamic shapes, which makes it a powerful tool for fully capturing existing PyTorch models with little effort.
 * **Operator standardization.** During the graph export process, the nodes in the graph represent operators such as addition, multiplication, or convolution. These operators are part of a small standardized list called the [Core ATen Op set](https://pytorch.org/docs/main/torch.compiler_ir.html#core-aten-ir). Most PyTorch programs can be decomposed into a graph using this small set of operators during export. This small list of standardized operators reduces the surface that third-party operator libraries and accelerator backends need to cover in order to run models exported for ExecuTorch. The ExecuTorch runtime ships with one such library, called the portable operator library, which implements the core ATen opset.
-* **Standardization for compiler interfaces (aka delegates) and the OSS ecosystem.** In addition to the _Operator standardization_ above, ExecuTorch has a standardized interface for delegation to compilers. This allows third-party vendors and compilers to implement interfaces and API entry points for compilation and execution of (either partial or full) graphs targeting their specialized hardware. This provides greater flexibility in terms of hardware support and performance optimization, as well as easier integration with the PyTorch open source ecosystem for on-device AI.
-* **First-party SDK and toolchain.** Due to the above standardization efforts, it was possible to build a unified first-party SDK for ExecuTorch, where developers can export, compile, and deploy to a wide range of target devices--such as iOS, Android, and microcontrollers--using the same SDK, streamlining the process and gaining productivity. Additionally, the SDK provides profiling and debugging functionality to easily inspect intermediate states, which are core parts of most developer workflows.
+* **Standardization for compiler interfaces (aka delegates) and the OSS ecosystem.** In addition to the _Operator standardization_ above, ExecuTorch has a [standardized interface](./compiler-delegate-and-partitioner.md) for delegation to compilers. This allows third-party vendors and compilers to implement interfaces and API entry points for compilation and execution of (either partial or full) graphs targeting their specialized hardware. This provides greater flexibility in terms of hardware support and performance optimization, as well as easier integration with the PyTorch open source ecosystem for on-device AI.
+* **First-party Developer Tools.** Due to the above standardization efforts, it was possible to build unified first-party [developer tools](./devtools-overview.md) for ExecuTorch, where developers can export, compile, and deploy to a wide range of target devices, such as iOS, Android, and microcontrollers, using the same APIs, streamlining the process and increasing productivity. Additionally, ExecuTorch provides profiling and debugging functionality to easily inspect intermediate states, which are core parts of most developer workflows.
 * **No intermediate conversions necessary.** ExecuTorch's main design principle is to allow developers to run their models on target devices without the need for converting to third-party intermediate representations. This eliminates a number of problems that on-device developers typically face when working with these conversion steps, such as lack of debuggability and profiling, the need to familiarize themselves with hardware-specific tools, and models not being able to run due to conversion steps failing.
-* **Ease of customization.** Developers can optimize their deployment for even better performance gains on the target architecture by applying custom techniques, such as linking with high-performance operator implementations or customizing memory planning based on storage and latency trade-offs. This level of customization is made possible through the standardization of the compiler pass interface and registration APIs on exported graphs.
+* **Ease of customization.** Developers can optimize their deployment for even better performance gains on the target architecture by applying custom techniques, such as [linking with high-performance operator implementations](./kernel-library-custom-aten-kernel.md) or [customizing memory planning](./compiler-memory-planning.md) based on storage and latency trade-offs. This level of customization is made possible through the standardization of the [compiler pass interface](./compiler-custom-compiler-passes.md) and registration APIs on exported graphs.
 * **Low overhead runtime and execution.** The ExecuTorch runtime, written in C++, is highly efficient and can run on a wide range of architectures, including Linux, iOS, Android, embedded systems, and bare metal hardware, with little additional setup or configuration. It is capable of linking in only those operators needed for the model, resulting in a minimal runtime binary size. It is also able to run at low latency because of ahead-of-time compilation and memory planning stages, with the runtime responsible only for execution (e.g., call operator `conv` and save the result in memory location X).
 
 The above highlights the key advantages of ExecuTorch across three main categories: portability, productivity, and performance. We consider it to be an ideal choice for enabling on-device AI across mobile and edge computing platforms.
