
Commit 78f58c8

jc4x4 and alamb authored
Add new user doc to translate logical plan to physical plan (#12026)
* Add new user doc to translate logical plan to physical plan #7306
* prettier
* Run doc examples as part of cargo --doc
* Update first example to run
* Fix next example
* fix last example
* prettier
* clarify table source
* prettier
* Revert changes

---------

Co-authored-by: Andrew Lamb
1 parent 1c7209b commit 78f58c8

3 files changed, +139 -56 lines changed


datafusion/core/src/lib.rs

Lines changed: 6 additions & 0 deletions
@@ -678,6 +678,12 @@ doc_comment::doctest!(
     library_user_guide_sql_api
 );
 
+#[cfg(doctest)]
+doc_comment::doctest!(
+    "../../../docs/source/library-user-guide/building-logical-plans.md",
+    library_user_guide_logical_plans
+);
+
 #[cfg(doctest)]
 doc_comment::doctest!(
     "../../../docs/source/library-user-guide/using-the-dataframe-api.md",

datafusion/expr/src/logical_plan/mod.rs

Lines changed: 1 addition & 1 deletion
@@ -26,7 +26,7 @@ pub mod tree_node;
 
 pub use builder::{
     build_join_schema, table_scan, union, wrap_projection_for_join_if_necessary,
-    LogicalPlanBuilder, UNNAMED_TABLE,
+    LogicalPlanBuilder, LogicalTableSource, UNNAMED_TABLE,
 };
 pub use ddl::{
     CreateCatalog, CreateCatalogSchema, CreateExternalTable, CreateFunction,
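The `mod.rs` change above adds `LogicalTableSource` to the `pub use builder::{...}` list. This appears to be what lets the new doc examples write `use datafusion::logical_expr::LogicalTableSource;`.

The std-only sketch below illustrates the mechanism with a hypothetical module tree. It makes two assumptions that this diff does not show. First, that the `datafusion_expr` crate root glob re-exports `logical_plan`, modeled here as `pub use logical_plan::*;`. Second, that `datafusion::logical_expr` in turn re-exports `datafusion_expr`. None of the module bodies are the real crate's code.

```rust
// Hypothetical module tree loosely mirroring datafusion_expr; not the real crate.
mod datafusion_expr {
    pub mod logical_plan {
        pub mod builder {
            pub struct LogicalPlanBuilder;
            pub struct LogicalTableSource {
                pub table_name: String,
            }
        }
        // The change in this commit: list the type in the re-export from `builder`.
        pub use builder::{LogicalPlanBuilder, LogicalTableSource};
    }
    // Assumed glob re-export at the crate root, which surfaces the items above.
    pub use logical_plan::*;
}

// This short path resolves only because of the two re-exports above.
use datafusion_expr::{LogicalPlanBuilder, LogicalTableSource};

fn main() {
    // The long and short paths name the same type, so this assignment type-checks.
    let source: LogicalTableSource = datafusion_expr::logical_plan::builder::LogicalTableSource {
        table_name: "person".to_string(),
    };
    let _builder = LogicalPlanBuilder;
    println!("{}", source.table_name);
}
```

If `LogicalTableSource` is dropped from the inner `pub use` line, the top-level `use` stops resolving. The type stays reachable, but only through the longer `logical_plan::builder` path.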

docs/source/library-user-guide/building-logical-plans.md

Lines changed: 132 additions & 55 deletions
@@ -31,44 +31,52 @@ explained in more detail in the [Query Planning and Execution Overview] section
 DataFusion's [LogicalPlan] is an enum containing variants representing all the supported operators, and also
 contains an `Extension` variant that allows projects building on DataFusion to add custom logical operators.
 
-It is possible to create logical plans by directly creating instances of the [LogicalPlan] enum as follows, but is is
+It is possible to create logical plans by directly creating instances of the [LogicalPlan] enum as shown, but it is
 much easier to use the [LogicalPlanBuilder], which is described in the next section.
 
 Here is an example of building a logical plan directly:
 
-<!-- source for this example is in datafusion_docs::library_logical_plan::plan_1 -->
-
 ```rust
-// create a logical table source
-let schema = Schema::new(vec![
-    Field::new("id", DataType::Int32, true),
-    Field::new("name", DataType::Utf8, true),
-]);
-let table_source = LogicalTableSource::new(SchemaRef::new(schema));
-
-// create a TableScan plan
-let projection = None; // optional projection
-let filters = vec![]; // optional filters to push down
-let fetch = None; // optional LIMIT
-let table_scan = LogicalPlan::TableScan(TableScan::try_new(
-    "person",
-    Arc::new(table_source),
-    projection,
-    filters,
-    fetch,
-)?);
-
-// create a Filter plan that evaluates `id > 500` that wraps the TableScan
-let filter_expr = col("id").gt(lit(500));
-let plan = LogicalPlan::Filter(Filter::try_new(filter_expr, Arc::new(table_scan))?);
-
-// print the plan
-println!("{}", plan.display_indent_schema());
+use datafusion::common::DataFusionError;
+use datafusion::arrow::datatypes::{DataType, Field, Schema, SchemaRef};
+use datafusion::logical_expr::{Filter, LogicalPlan, TableScan, LogicalTableSource};
+use datafusion::prelude::*;
+use std::sync::Arc;
+
+fn main() -> Result<(), DataFusionError> {
+    // create a logical table source
+    let schema = Schema::new(vec![
+        Field::new("id", DataType::Int32, true),
+        Field::new("name", DataType::Utf8, true),
+    ]);
+    let table_source = LogicalTableSource::new(SchemaRef::new(schema));
+
+    // create a TableScan plan
+    let projection = None; // optional projection
+    let filters = vec![]; // optional filters to push down
+    let fetch = None; // optional LIMIT
+    let table_scan = LogicalPlan::TableScan(TableScan::try_new(
+        "person",
+        Arc::new(table_source),
+        projection,
+        filters,
+        fetch,
+    )?
+    );
+
+    // create a Filter plan that evaluates `id > 500` that wraps the TableScan
+    let filter_expr = col("id").gt(lit(500));
+    let plan = LogicalPlan::Filter(Filter::try_new(filter_expr, Arc::new(table_scan))?);
+
+    // print the plan
+    println!("{}", plan.display_indent_schema());
+    Ok(())
+}
 ```
 
 This example produces the following plan:
 
-```
+```text
 Filter: person.id > Int32(500) [id:Int32;N, name:Utf8;N]
   TableScan: person [id:Int32;N, name:Utf8;N]
 ```
@@ -78,7 +86,7 @@ Filter: person.id > Int32(500) [id:Int32;N, name:Utf8;N]
 DataFusion logical plans can be created using the [LogicalPlanBuilder] struct. There is also a [DataFrame] API which is
 a higher-level API that delegates to [LogicalPlanBuilder].
 
-The following associated functions can be used to create a new builder:
+There are several functions that can be used to create a new builder, such as:
 
 - `empty` - create an empty plan with no fields
 - `values` - create a plan from a set of literal values
@@ -102,41 +110,107 @@ The following example demonstrates building the same simple query plan as the pr
 <!-- source for this example is in datafusion_docs::library_logical_plan::plan_builder_1 -->
 
 ```rust
-// create a logical table source
-let schema = Schema::new(vec![
-    Field::new("id", DataType::Int32, true),
-    Field::new("name", DataType::Utf8, true),
-]);
-let table_source = LogicalTableSource::new(SchemaRef::new(schema));
-
-// optional projection
-let projection = None;
-
-// create a LogicalPlanBuilder for a table scan
-let builder = LogicalPlanBuilder::scan("person", Arc::new(table_source), projection)?;
-
-// perform a filter operation and build the plan
-let plan = builder
-    .filter(col("id").gt(lit(500)))? // WHERE id > 500
-    .build()?;
-
-// print the plan
-println!("{}", plan.display_indent_schema());
+use datafusion::common::DataFusionError;
+use datafusion::arrow::datatypes::{DataType, Field, Schema, SchemaRef};
+use datafusion::logical_expr::{LogicalPlanBuilder, LogicalTableSource};
+use datafusion::prelude::*;
+use std::sync::Arc;
+
+fn main() -> Result<(), DataFusionError> {
+    // create a logical table source
+    let schema = Schema::new(vec![
+        Field::new("id", DataType::Int32, true),
+        Field::new("name", DataType::Utf8, true),
+    ]);
+    let table_source = LogicalTableSource::new(SchemaRef::new(schema));
+
+    // optional projection
+    let projection = None;
+
+    // create a LogicalPlanBuilder for a table scan
+    let builder = LogicalPlanBuilder::scan("person", Arc::new(table_source), projection)?;
+
+    // perform a filter operation and build the plan
+    let plan = builder
+        .filter(col("id").gt(lit(500)))? // WHERE id > 500
+        .build()?;
+
+    // print the plan
+    println!("{}", plan.display_indent_schema());
+    Ok(())
+}
 ```
 
 This example produces the following plan:
 
-```
+```text
 Filter: person.id > Int32(500) [id:Int32;N, name:Utf8;N]
   TableScan: person [id:Int32;N, name:Utf8;N]
 ```
 
+## Translating Logical Plan to Physical Plan
+
+Logical plans cannot be directly executed. They must be "compiled" into an
+[`ExecutionPlan`], which is often referred to as a "physical plan".
+
+Compared to `LogicalPlan`s, `ExecutionPlan`s have many more details, such as
+specific algorithms and detailed optimizations. Given a
+`LogicalPlan`, the easiest way to create an `ExecutionPlan` is using
+[`SessionState::create_physical_plan`] as shown below:
+
+```rust
+use datafusion::datasource::{provider_as_source, MemTable};
+use datafusion::common::DataFusionError;
+use datafusion::physical_plan::display::DisplayableExecutionPlan;
+use datafusion::arrow::datatypes::{DataType, Field, Schema, SchemaRef};
+use datafusion::logical_expr::{LogicalPlanBuilder, LogicalTableSource};
+use datafusion::prelude::*;
+use std::sync::Arc;
+
+// Creating physical plans may access remote catalogs and data sources,
+// thus it must be run with an async runtime.
+#[tokio::main]
+async fn main() -> Result<(), DataFusionError> {
+
+    // create a default table source
+    let schema = Schema::new(vec![
+        Field::new("id", DataType::Int32, true),
+        Field::new("name", DataType::Utf8, true),
+    ]);
+    // To create an ExecutionPlan we must provide an actual
+    // TableProvider. For this example, we don't provide any data
+    // but in production code, this would have `RecordBatch`es with
+    // in-memory data
+    let table_provider = Arc::new(MemTable::try_new(Arc::new(schema), vec![])?);
+    // Use the provider_as_source function to convert the TableProvider to a table source
+    let table_source = provider_as_source(table_provider);
+
+    // create a LogicalPlanBuilder for a table scan without projection or filters
+    let logical_plan = LogicalPlanBuilder::scan("person", table_source, None)?.build()?;
+
+    // Now create the physical plan by calling `create_physical_plan`
+    let ctx = SessionContext::new();
+    let physical_plan = ctx.state().create_physical_plan(&logical_plan).await?;
+
+    // print the plan
+    println!("{}", DisplayableExecutionPlan::new(physical_plan.as_ref()).indent(true));
+    Ok(())
+}
+```
+
+This example produces the following physical plan:
+
+```text
+MemoryExec: partitions=0, partition_sizes=[]
+```
+
 ## Table Sources
 
-The previous example used a [LogicalTableSource], which is used for tests and documentation in DataFusion, and is also
-suitable if you are using DataFusion to build logical plans but do not use DataFusion's physical planner. However, if you
-want to use a [TableSource] that can be executed in DataFusion then you will need to use [DefaultTableSource], which is a
-wrapper for a [TableProvider].
+The previous examples use a [LogicalTableSource], which is used for tests and documentation in DataFusion, and is also
+suitable if you are using DataFusion to build logical plans but do not use DataFusion's physical planner.
+
+However, it is more common to use a [TableProvider]. To get a [TableSource] from a
+[TableProvider], use [provider_as_source] or [DefaultTableSource].
 
 [query planning and execution overview]: https://docs.rs/datafusion/latest/datafusion/index.html#query-planning-and-execution-overview
 [architecture guide]: https://docs.rs/datafusion/latest/datafusion/index.html#architecture
@@ -145,5 +219,8 @@ wrapper for a [TableProvider].
 [dataframe]: using-the-dataframe-api.md
 [logicaltablesource]: https://docs.rs/datafusion-expr/latest/datafusion_expr/logical_plan/builder/struct.LogicalTableSource.html
 [defaulttablesource]: https://docs.rs/datafusion/latest/datafusion/datasource/default_table_source/struct.DefaultTableSource.html
+[provider_as_source]: https://docs.rs/datafusion/latest/datafusion/datasource/default_table_source/fn.provider_as_source.html
 [tableprovider]: https://docs.rs/datafusion/latest/datafusion/datasource/provider/trait.TableProvider.html
 [tablesource]: https://docs.rs/datafusion-expr/latest/datafusion_expr/trait.TableSource.html
+[`executionplan`]: https://docs.rs/datafusion/latest/datafusion/physical_plan/trait.ExecutionPlan.html
+[`sessionstate::create_physical_plan`]: https://docs.rs/datafusion/latest/datafusion/execution/session_state/struct.SessionState.html#method.create_physical_plan
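The new guide section makes two points. A `LogicalPlan` must be compiled into an `ExecutionPlan` before it can run. Execution also needs a real `TableProvider`, wrapped into a `TableSource` with `provider_as_source` or `DefaultTableSource`, rather than a planning-only `LogicalTableSource`.

The std-only sketch below models that flow so it can run without the `datafusion` crate. Every type and function in it is a hypothetical, heavily simplified stand-in that reuses DataFusion's names for readability:

- Rows are `Vec<i64>`, filters reference columns by index, and schemas are omitted.
- Planning is synchronous.
- Recovering the provider by downcasting through `as_any` is an assumption of this sketch, not a description of DataFusion's planner.

```rust
use std::any::Any;
use std::sync::Arc;

// Hypothetical stand-ins that borrow DataFusion's names; this is NOT DataFusion's API.
// Rows are plain `Vec<i64>` values instead of Arrow `RecordBatch`es; schemas are omitted.
trait ExecutionPlan {
    fn execute(&self) -> Vec<Vec<i64>>;
}

trait TableProvider {
    // Returns an executable operator that produces the table's rows.
    fn scan(&self) -> Box<dyn ExecutionPlan>;
}

trait TableSource {
    fn as_any(&self) -> &dyn Any;
}

// Planning-only source: usable in a logical plan, but it cannot read any data.
struct LogicalTableSource;

impl TableSource for LogicalTableSource {
    fn as_any(&self) -> &dyn Any { self }
}

// Adapter that lets an executable provider appear in a logical plan.
struct DefaultTableSource {
    provider: Arc<dyn TableProvider>,
}

impl TableSource for DefaultTableSource {
    fn as_any(&self) -> &dyn Any { self }
}

fn provider_as_source(provider: Arc<dyn TableProvider>) -> Arc<dyn TableSource> {
    Arc::new(DefaultTableSource { provider })
}

// An in-memory provider plus the two physical operators this toy planner emits.
struct MemTable { rows: Vec<Vec<i64>> }
struct MemoryExec { rows: Vec<Vec<i64>> }
struct FilterExec { input: Box<dyn ExecutionPlan>, column: usize, greater_than: i64 }

impl TableProvider for MemTable {
    fn scan(&self) -> Box<dyn ExecutionPlan> {
        Box::new(MemoryExec { rows: self.rows.clone() })
    }
}

impl ExecutionPlan for MemoryExec {
    fn execute(&self) -> Vec<Vec<i64>> { self.rows.clone() }
}

impl ExecutionPlan for FilterExec {
    fn execute(&self) -> Vec<Vec<i64>> {
        let rows = self.input.execute();
        rows.into_iter().filter(|row| row[self.column] > self.greater_than).collect()
    }
}

// Logical plan: describes *what* to compute, not how.
enum LogicalPlan {
    TableScan { name: String, source: Arc<dyn TableSource> },
    // Keep rows where `row[column] > greater_than`.
    Filter { column: usize, greater_than: i64, input: Box<LogicalPlan> },
}

// Physical planning: turn each logical node into a concrete operator.
fn create_physical_plan(plan: &LogicalPlan) -> Result<Box<dyn ExecutionPlan>, String> {
    match plan {
        LogicalPlan::TableScan { name, source } => {
            // Only a source that wraps a TableProvider can be executed.
            let default_source = source
                .as_any()
                .downcast_ref::<DefaultTableSource>()
                .ok_or_else(|| format!("table '{name}' has no TableProvider"))?;
            Ok(default_source.provider.scan())
        }
        LogicalPlan::Filter { column, greater_than, input } => Ok(Box::new(FilterExec {
            input: create_physical_plan(input)?,
            column: *column,
            greater_than: *greater_than,
        })),
    }
}

fn main() {
    // `id > 500` over a planning-only source: the plan builds, but cannot be executed.
    let scan = LogicalPlan::TableScan { name: "person".into(), source: Arc::new(LogicalTableSource) };
    let plan = LogicalPlan::Filter { column: 0, greater_than: 500, input: Box::new(scan) };
    assert!(create_physical_plan(&plan).is_err());

    // The same plan over a provider wrapped with provider_as_source can be executed.
    let provider: Arc<dyn TableProvider> = Arc::new(MemTable { rows: vec![vec![100, 1], vec![600, 2]] });
    let scan = LogicalPlan::TableScan { name: "person".into(), source: provider_as_source(provider) };
    let plan = LogicalPlan::Filter { column: 0, greater_than: 500, input: Box::new(scan) };
    let physical = create_physical_plan(&plan).expect("planning should succeed");
    println!("{:?}", physical.execute()); // prints [[600, 2]]
}
```

The first plan cannot be planned because its source carries no provider. This mirrors the guide's point that a `LogicalTableSource` suits logical planning but not execution.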
