Minor: Improve documentation of MemoryPool #6388

Merged 1 commit on May 22, 2023
25 changes: 24 additions & 1 deletion datafusion/execution/src/memory_pool/mod.rs
@@ -25,7 +25,30 @@ pub mod proxy;

pub use pool::*;

/// The pool of memory on which [`MemoryReservation`] record their memory reservations
/// The pool of memory on which [`MemoryReservation`]s record their
/// memory reservations.
///
/// DataFusion is a streaming query engine, processing most queries
/// without buffering the entire input. However, certain operations,
/// such as sorting and grouping/joining with a large number of
/// distinct groups/keys, can require buffering intermediate results,
/// and for large datasets this can require large amounts of memory.
///
/// In order to avoid allocating memory until the OS or the container
/// system kills the process, DataFusion operators only allocate
/// memory they are able to reserve from the configured
/// [`MemoryPool`]. Once the memory tracked by the pool is exhausted,
/// operators must either free memory by spilling to local disk or
/// error.
///
/// A `MemoryPool` can be shared by concurrently executing plans in
/// the same process to control memory usage in a multi-tenant system.
///
/// The following memory pool implementations are available:
///
/// * [`UnboundedMemoryPool`](pool::UnboundedMemoryPool)
/// * [`GreedyMemoryPool`](pool::GreedyMemoryPool)
/// * [`FairSpillPool`](pool::FairSpillPool)
pub trait MemoryPool: Send + Sync + std::fmt::Debug {
    /// Registers a new [`MemoryConsumer`]
    ///
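
The reserve-or-spill behaviour described in the new `MemoryPool` docs can be sketched without any DataFusion dependency. This is only an illustration under stated assumptions: `SketchPool`, `Outcome`, and `buffer_or_spill` are hypothetical names made up for this example, and the real trait works through `MemoryConsumer` and `MemoryReservation` rather than a bare counter.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

/// Hypothetical bounded pool used only for this sketch (not a DataFusion type).
#[derive(Debug)]
pub struct SketchPool {
    pool_size: usize,
    used: AtomicUsize,
}

impl SketchPool {
    pub fn new(pool_size: usize) -> Self {
        Self { pool_size, used: AtomicUsize::new(0) }
    }

    /// Try to reserve `additional` bytes. On failure the counter is unchanged.
    pub fn try_grow(&self, additional: usize) -> Result<(), String> {
        self.used
            .fetch_update(Ordering::Relaxed, Ordering::Relaxed, |used| {
                let new_used = used.checked_add(additional)?;
                // Only commit the new total if it stays within the limit.
                (new_used <= self.pool_size).then_some(new_used)
            })
            .map(|_| ())
            .map_err(|used| {
                format!(
                    "cannot reserve {additional} bytes: {used} of {} bytes already reserved",
                    self.pool_size
                )
            })
    }

    /// Return `amount` bytes to the pool. Callers must only return what they hold.
    pub fn shrink(&self, amount: usize) {
        self.used.fetch_sub(amount, Ordering::Relaxed);
    }

    pub fn reserved(&self) -> usize {
        self.used.load(Ordering::Relaxed)
    }
}

/// What the hypothetical operator did with an incoming batch.
#[derive(Debug, PartialEq)]
pub enum Outcome {
    Buffered,
    Spilled,
}

/// Buffer `batch` bytes if the pool grants the reservation. Otherwise pretend
/// to write everything held so far (plus the batch) to disk and release the
/// operator's reservation.
pub fn buffer_or_spill(pool: &SketchPool, buffered: &mut usize, batch: usize) -> Outcome {
    match pool.try_grow(batch) {
        Ok(()) => {
            *buffered += batch;
            Outcome::Buffered
        }
        Err(_) => {
            pool.shrink(*buffered);
            *buffered = 0;
            Outcome::Spilled
        }
    }
}
```

With a 100-byte pool, a first 60-byte batch is buffered, and a second 60-byte batch forces the operator to spill and hand its reservation back. An operator that cannot spill would return the error instead.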
16 changes: 15 additions & 1 deletion datafusion/execution/src/memory_pool/pool.rs
@@ -17,6 +17,7 @@

use crate::memory_pool::{MemoryConsumer, MemoryPool, MemoryReservation};
use datafusion_common::{DataFusionError, Result};
use log::debug;
use parking_lot::Mutex;
use std::sync::atomic::{AtomicUsize, Ordering};

@@ -45,7 +46,11 @@ impl MemoryPool for UnboundedMemoryPool {
    }
}

/// A [`MemoryPool`] that implements a greedy first-come first-serve limit
/// A [`MemoryPool`] that implements a greedy first-come first-serve limit.
///
/// This pool works well for queries that do not need to spill or have
/// a single spillable operator. See [`FairSpillPool`] if there are
/// multiple spillable operators that all will spill.
#[derive(Debug)]
pub struct GreedyMemoryPool {
    pool_size: usize,
@@ -55,6 +60,7 @@ pub struct GreedyMemoryPool {
impl GreedyMemoryPool {
    /// Allocate up to `pool_size` bytes
    pub fn new(pool_size: usize) -> Self {
        debug!("Created new GreedyMemoryPool(pool_size={pool_size})");
        Self {
            pool_size,
            used: AtomicUsize::new(0),
@@ -92,6 +98,13 @@ impl MemoryPool for GreedyMemoryPool {
/// an even fraction of the available memory sans any unspillable reservations
/// (i.e. `(pool_size - unspillable_memory) / num_spillable_reservations`)
///
/// This pool works best when you know beforehand that the query has
/// multiple spillable operators that will likely all need to
/// spill. It can sometimes cause spills even when there is
/// sufficient memory (reserved for other operators) to avoid
/// them.
///
/// ```text
/// ┌───────────────────────z──────────────────────z───────────────┐
/// │                       z                      z               │
/// │                       z                      z               │
@@ -100,6 +113,7 @@ impl MemoryPool for GreedyMemoryPool {
/// │                       z                      z               │
/// │                       z                      z               │
/// └───────────────────────z──────────────────────z───────────────┘
/// ```
///
/// Unspillable memory is allocated in a first-come, first-serve fashion
#[derive(Debug)]
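
The per-reservation limit quoted in the `FairSpillPool` docs, `(pool_size - unspillable_memory) / num_spillable_reservations`, can be worked through with a small helper. This is a sketch of the arithmetic only, not the pool's actual code: the function name `spillable_share` and the choice to return `None` when there are no spillable reservations are assumptions made for this example.

```rust
/// Hypothetical helper: the byte limit each spillable reservation would get
/// under the formula in the doc comment. Returns `None` when there are no
/// spillable reservations, since the division is undefined in that case.
pub fn spillable_share(
    pool_size: usize,
    unspillable_memory: usize,
    num_spillable_reservations: usize,
) -> Option<usize> {
    if num_spillable_reservations == 0 {
        return None;
    }
    // Saturate at zero if unspillable memory already exceeds the pool size.
    let available = pool_size.saturating_sub(unspillable_memory);
    Some(available / num_spillable_reservations)
}
```

For a 100-byte pool with 20 bytes of unspillable memory and two spillable operators, each operator is limited to 40 bytes. If one operator only ever uses 10 bytes, the other still has to spill once it exceeds 40 bytes, even though 70 bytes are not held by anyone else. That is the trade-off the doc comment describes.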