Search before asking
I have searched in the issues and found no similar issues.
Describe the feature
This feature introduces dynamic resizing (scaling up and down) to Kyuubi's Engine Pool mechanism. It adds a pluggable PoolScalingStrategy, a coordinating manager EnginePoolManager, an accessor interface EnginePoolAccessor for interacting with specific pool implementations, and corresponding monitoring metrics PoolMetrics. Together, these allow the Engine Pool size to adjust automatically according to predefined policies, such as time-based rules and, potentially in the future, load-based strategies.
Motivation
Current Situation:
Kyuubi’s Engine Pool pre-starts and manages a set of engine instances for specific users or groups to reduce session creation latency. Currently, the size of each pool is statically configured via kyuubi.engine.pool.size, requiring administrators to manually set and adjust this size based on expected loads.
Problems:
Resource Waste: A statically large pool during low-load periods leaves many engines idle, consuming CPU, memory, and possibly licensing resources unnecessarily.
Performance Bottlenecks and User Experience Degradation: During peak traffic, a statically small pool may be insufficient to handle concurrency, causing session requests to queue or even time out, hurting throughput and user experience.
Operational Complexity: Manual monitoring and resizing based on experience is inefficient, reactive, error-prone, and ill-suited for complex or rapidly changing load patterns.
Inadequate Cloud-Native Adaptability: Static sizing cannot take advantage of cloud elasticity, so resources cannot be allocated according to actual demand, contrary to cloud-native principles.
Real-World Scenario:
We run a big data cluster that submits jobs via Kyuubi and has day-night resource elasticity (e.g., 8–10 TB of memory resident during the day, expanding roughly 1.5× at night). Engine pools are used for certain heavy users:
Option 1 (2 engines, 48 GB RAM each): Under the nighttime peak, the few engines become bottlenecks and are prone to GC pauses. An engine often hangs on GC, delaying jobs by 1–2 hours and severely increasing the nighttime operational burden.
Option 2 (4 engines, 32 GB RAM each): Barely handles the nighttime load, but during the day the engines stay occupied because of ongoing sessions and executor caching under Spark Dynamic Allocation, wasting resources and affecting other non-Kyuubi jobs.
This illustrates the limitations of static pools facing dynamic resources and varying loads, underscoring the urgent need for automated elastic scaling.
Goals and Benefits:
Improve Resource Efficiency: Automatically shrink pool size during low load to free idle resources and cut costs.
Enhance System Resilience: Expand the pool proactively under high load to respond promptly to user demand, ensuring service performance and availability.
Increase Adaptability: Enable Kyuubi Engine Pools to automatically adapt to periodic or bursty workload fluctuations.
Simplify Operations: Reduce manual intervention and management complexity with automated scaling.
Describe the solution
We introduce a set of new components and configurations to enable dynamic resizing of Engine Pools. The core architecture instantiates an EnginePoolManager for each pool (or sub-pool) that requires dynamic scaling. The manager runs periodically (once every scaling.interval), computes the target size through a configurable PoolScalingStrategy, and interacts with the concrete pool implementation via an EnginePoolAccessor to carry out scale-up or graceful scale-down operations. A cooldown period is enforced to stabilize scaling, and detailed metrics are exposed through PoolMetrics.
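As a rough illustration of the configuration side, the sketch below groups the settings this design relies on (check interval, cooldown, and min/max bounds) and validates them up front, so that clamping later can assume minSize <= maxSize. All key names, the millisecond units, and the default values are hypothetical placeholders; none of them are existing Kyuubi options.

```java
import java.time.Duration;
import java.util.Properties;

// Hypothetical settings holder: none of these keys exist in Kyuubi today.
final class PoolScalingConfig {
    static final String INTERVAL_MS = "kyuubi.engine.pool.scaling.interval.ms";
    static final String COOLDOWN_MS = "kyuubi.engine.pool.scaling.cooldown.ms";
    static final String MIN_SIZE = "kyuubi.engine.pool.scaling.min.size";
    static final String MAX_SIZE = "kyuubi.engine.pool.scaling.max.size";

    final Duration interval;
    final Duration cooldown;
    final int minSize;
    final int maxSize;

    PoolScalingConfig(Duration interval, Duration cooldown, int minSize, int maxSize) {
        if (interval.isZero() || interval.isNegative()) {
            throw new IllegalArgumentException("interval must be positive");
        }
        if (cooldown.isNegative()) {
            throw new IllegalArgumentException("cooldown must not be negative");
        }
        if (minSize < 0 || maxSize < minSize) {
            throw new IllegalArgumentException("require 0 <= minSize <= maxSize");
        }
        this.interval = interval;
        this.cooldown = cooldown;
        this.minSize = minSize;
        this.maxSize = maxSize;
    }

    // Reads the hypothetical keys, falling back to illustrative defaults.
    static PoolScalingConfig fromProperties(Properties props) {
        return new PoolScalingConfig(
            Duration.ofMillis(Long.parseLong(props.getProperty(INTERVAL_MS, "60000"))),
            Duration.ofMillis(Long.parseLong(props.getProperty(COOLDOWN_MS, "300000"))),
            Integer.parseInt(props.getProperty(MIN_SIZE, "1")),
            Integer.parseInt(props.getProperty(MAX_SIZE, "1")));
    }
}
```

Rejecting an inverted range at load time keeps a misconfigured pool from silently scaling to an unexpected size.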
Core Components
EnginePoolManager
Responsibilities: Manages the dynamic scaling lifecycle for a single Engine Pool identified by poolIdentifier (e.g., a user or group key). It runs periodic scaling checks on a scheduled executor. Each check is skipped while the cooldown period is active; otherwise the manager retrieves the current pool size and optional metrics via EnginePoolAccessor, calculates the desired size via PoolScalingStrategy, and triggers a resize operation if needed. Scaling events, target and actual sizes, latencies, and errors are reported to PoolMetrics and written to the logs.
Lifecycle: Tied to the Engine Pool instance in the Kyuubi server, created and started on server/pool startup, and gracefully stopped on shutdown or pool destruction.
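A minimal sketch of one manager's check loop, assuming the accessor and strategy can be reduced to plain functional interfaces (the real interfaces would pass a PoolContext and richer metrics). It follows the order described above: skip during cooldown, read the current size, ask the strategy, clamp to [minSize, maxSize], and resize only when the target differs. The constructor shape, the injected time source, and the OptionalInt return value are illustrative choices, not part of the proposal.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.OptionalInt;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.IntConsumer;
import java.util.function.IntSupplier;
import java.util.function.IntUnaryOperator;
import java.util.function.Supplier;

// Sketch only: accessor and strategy are reduced to functional interfaces.
final class EnginePoolManager {
    private final IntSupplier currentSize;   // stands in for EnginePoolAccessor.getCurrentSize()
    private final IntUnaryOperator strategy; // stands in for PoolScalingStrategy (current size -> raw target)
    private final IntConsumer resize;        // stands in for EnginePoolAccessor.resize(target)
    private final int minSize;
    private final int maxSize;
    private final Duration interval;
    private final Duration cooldown;
    private final Supplier<Instant> clock;   // injectable time source for testing
    private final ScheduledExecutorService scheduler;
    private Instant lastScaling;             // null until the first resize

    EnginePoolManager(String poolIdentifier, IntSupplier currentSize, IntUnaryOperator strategy,
                      IntConsumer resize, int minSize, int maxSize,
                      Duration interval, Duration cooldown, Supplier<Instant> clock) {
        this.currentSize = currentSize;
        this.strategy = strategy;
        this.resize = resize;
        this.minSize = minSize;
        this.maxSize = maxSize;
        this.interval = interval;
        this.cooldown = cooldown;
        this.clock = clock;
        this.scheduler = Executors.newSingleThreadScheduledExecutor(r -> {
            Thread t = new Thread(r, "engine-pool-scaler-" + poolIdentifier);
            t.setDaemon(true);
            return t;
        });
    }

    /** Runs one check; returns the requested size, or empty if skipped or unchanged. */
    synchronized OptionalInt runCheck() {
        Instant now = clock.get();
        if (lastScaling != null && now.isBefore(lastScaling.plus(cooldown))) {
            return OptionalInt.empty(); // still in cooldown
        }
        int current = currentSize.getAsInt();
        int target = Math.max(minSize, Math.min(maxSize, strategy.applyAsInt(current)));
        if (target == current) {
            return OptionalInt.empty(); // no scaling needed
        }
        resize.accept(target);
        lastScaling = now;
        return OptionalInt.of(target);
    }

    void start() {
        long ms = interval.toMillis();
        scheduler.scheduleWithFixedDelay(() -> {
            try {
                runCheck();
            } catch (RuntimeException e) {
                // An uncaught exception would cancel all future runs; a real manager
                // would log it and report it to PoolMetrics instead.
            }
        }, ms, ms, TimeUnit.MILLISECONDS);
    }

    void stop() {
        scheduler.shutdown();
    }
}
```

scheduleWithFixedDelay leaves a full interval between the end of one check and the start of the next, and the try/catch matters because a periodic task that throws is never run again by the executor.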
PoolScalingStrategy (Pluggable Interface)
Responsibilities: Defines the core logic for computing the target pool size. Must be stateless or serializable if needed for configuration distribution. Receives a PoolContext with pool ID, current time, current size, min/max bounds, and optional load/performance metrics collected from the pool. Returns a desired target size, ideally within min/max bounds (final bounds enforcement is done by EnginePoolManager).
Allows users to implement and plug in custom scaling algorithms.
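One way the strategy contract and a time-based rule could look, assuming PoolContext is a simple value holder with the fields listed above. TimeWindowScalingStrategy is hypothetical: it returns a larger size inside a daily window (which may wrap past midnight, matching the nighttime peak in the scenario above) and a smaller size outside it, leaving bounds enforcement to the manager. It keeps no mutable state, in line with the stateless requirement.

```java
import java.time.Instant;
import java.time.LocalTime;
import java.time.ZoneId;
import java.util.Map;

// Assumed value holder for the inputs the proposal lists.
final class PoolContext {
    final String poolId;
    final Instant now;
    final int currentSize;
    final int minSize;
    final int maxSize;
    final Map<String, Long> metrics; // optional load metrics; may be empty

    PoolContext(String poolId, Instant now, int currentSize,
                int minSize, int maxSize, Map<String, Long> metrics) {
        this.poolId = poolId;
        this.now = now;
        this.currentSize = currentSize;
        this.minSize = minSize;
        this.maxSize = maxSize;
        this.metrics = metrics;
    }
}

interface PoolScalingStrategy {
    /** Desired size; EnginePoolManager clamps it to [minSize, maxSize] afterwards. */
    int calculateTargetSize(PoolContext context);
}

// Hypothetical time-based rule: one size inside a daily window, another outside it.
final class TimeWindowScalingStrategy implements PoolScalingStrategy {
    private final LocalTime windowStart; // inclusive
    private final LocalTime windowEnd;   // exclusive
    private final ZoneId zone;
    private final int sizeInWindow;
    private final int sizeOutsideWindow;

    TimeWindowScalingStrategy(LocalTime windowStart, LocalTime windowEnd, ZoneId zone,
                              int sizeInWindow, int sizeOutsideWindow) {
        this.windowStart = windowStart;
        this.windowEnd = windowEnd;
        this.zone = zone;
        this.sizeInWindow = sizeInWindow;
        this.sizeOutsideWindow = sizeOutsideWindow;
    }

    @Override
    public int calculateTargetSize(PoolContext context) {
        LocalTime t = context.now.atZone(zone).toLocalTime();
        boolean inWindow = windowStart.isBefore(windowEnd)
            ? !t.isBefore(windowStart) && t.isBefore(windowEnd)  // same-day window
            : !t.isBefore(windowStart) || t.isBefore(windowEnd); // window wraps past midnight
        return inWindow ? sizeInWindow : sizeOutsideWindow;
    }
}
```

For the scenario above, new TimeWindowScalingStrategy(LocalTime.of(20, 0), LocalTime.of(8, 0), ZoneId.of("UTC"), 4, 2) would request four engines overnight and two during the day; the window boundaries here are made up for illustration.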
EnginePoolAccessor (Interface to the Pool Implementation)
Responsibilities: Abstracts interaction with the concrete Engine Pool implementations (EnginePool, UserGroupAwareEnginePool, etc.). Provides:
Precise retrieval of the current effective pool size (excluding starting or pending-removal engines).
Execution of resize commands (for example, asynchronously creating new engines on scale-up, or gracefully retiring engines on scale-down).
Collection of internal metrics useful to scaling decisions (active sessions, pending sessions, idle engines, pending removal counts, etc.).
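A sketch of the accessor contract with an in-memory stand-in, assuming three engine states (STARTING, RUNNING, PENDING_REMOVAL) that are invented for this illustration. It shows why the effective size excludes transitional engines, while resize still counts starting engines so that a pending scale-up is not requested twice. InMemoryPoolAccessor and its completeTransitions() hook are hypothetical; the hook merely simulates the asynchronous work a real pool would do.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical accessor contract mirroring the responsibilities listed above.
interface EnginePoolAccessor {
    int getCurrentSize();               // effective engines only
    void resize(int targetSize);        // async scale-up or graceful scale-down
    Map<String, Long> collectMetrics(); // inputs for scaling strategies
}

// Invented engine lifecycle states for this illustration.
enum EngineState { STARTING, RUNNING, PENDING_REMOVAL }

final class InMemoryPoolAccessor implements EnginePoolAccessor {
    private final List<EngineState> engines = new ArrayList<>();

    InMemoryPoolAccessor(int runningEngines) {
        for (int i = 0; i < runningEngines; i++) {
            engines.add(EngineState.RUNNING);
        }
    }

    private long count(EngineState state) {
        return engines.stream().filter(e -> e == state).count();
    }

    @Override
    public synchronized int getCurrentSize() {
        // Starting and draining engines are not counted as usable capacity.
        return (int) count(EngineState.RUNNING);
    }

    @Override
    public synchronized void resize(int targetSize) {
        // Count starting engines as capacity already on its way.
        long live = count(EngineState.RUNNING) + count(EngineState.STARTING);
        for (long i = live; i < targetSize; i++) {
            engines.add(EngineState.STARTING); // a real pool would launch the engine asynchronously
        }
        long toDrain = live - targetSize;
        for (int i = 0; i < engines.size() && toDrain > 0; i++) {
            if (engines.get(i) == EngineState.RUNNING) {
                engines.set(i, EngineState.PENDING_REMOVAL); // stop routing new sessions, let existing ones finish
                toDrain--;
            }
        }
    }

    @Override
    public synchronized Map<String, Long> collectMetrics() {
        Map<String, Long> metrics = new HashMap<>();
        for (EngineState s : EngineState.values()) {
            metrics.put(s.name(), count(s));
        }
        return metrics;
    }

    /** Simulates the async side: started engines go live, drained engines disappear. */
    synchronized void completeTransitions() {
        engines.removeIf(e -> e == EngineState.PENDING_REMOVAL);
        engines.replaceAll(e -> e == EngineState.STARTING ? EngineState.RUNNING : e);
    }
}
```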
PoolMetrics Interface
Responsibilities: Defines APIs to report and monitor dynamic scaling activities such as current and target pool sizes, scaling events (scale-ups and downs), scaling latencies, and errors.
A default implementation will integrate with Kyuubi’s existing MetricsSystem to register gauges, counters, timers, etc., with appropriate labels to distinguish pools.
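A possible shape for PoolMetrics, with method names taken from the sequence diagram that follows plus an assumed recordScalingError. The InMemoryPoolMetrics class is hypothetical and only keeps atomic gauges and counters for one pool; wiring it into Kyuubi's MetricsSystem is deliberately left out, since the exact registration calls are not specified in this proposal.

```java
import java.time.Duration;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicLong;

// Method names follow the sequence diagram; recordScalingError is an assumption.
interface PoolMetrics {
    void recordPoolSize(int currentSize);
    void recordTargetPoolSize(int targetSize);
    void recordScalingEvent(int fromSize, int toSize);
    void recordScalingLatency(Duration latency);
    void recordScalingError(Throwable error);
}

// In-memory sketch; a real implementation would register gauges, counters, and timers.
final class InMemoryPoolMetrics implements PoolMetrics {
    final String poolId; // label distinguishing pools
    final AtomicInteger currentSize = new AtomicInteger();
    final AtomicInteger targetSize = new AtomicInteger();
    final AtomicLong scaleUps = new AtomicLong();
    final AtomicLong scaleDowns = new AtomicLong();
    final AtomicLong errors = new AtomicLong();
    final AtomicLong totalLatencyNanos = new AtomicLong();
    final AtomicLong latencySamples = new AtomicLong();

    InMemoryPoolMetrics(String poolId) {
        this.poolId = poolId;
    }

    @Override public void recordPoolSize(int size) { currentSize.set(size); }

    @Override public void recordTargetPoolSize(int size) { targetSize.set(size); }

    @Override public void recordScalingEvent(int fromSize, int toSize) {
        // Classify the event by direction; equal sizes are not counted.
        if (toSize > fromSize) {
            scaleUps.incrementAndGet();
        } else if (toSize < fromSize) {
            scaleDowns.incrementAndGet();
        }
    }

    @Override public void recordScalingLatency(Duration latency) {
        totalLatencyNanos.addAndGet(latency.toNanos());
        latencySamples.incrementAndGet();
    }

    @Override public void recordScalingError(Throwable error) { errors.incrementAndGet(); }

    /** Mean check latency in milliseconds, or 0 if nothing was recorded yet. */
    double meanLatencyMillis() {
        long n = latencySamples.get();
        return n == 0 ? 0.0 : totalLatencyNanos.get() / 1e6 / n;
    }
}
```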
```mermaid
sequenceDiagram
title Dynamic Scaling Check Sequence
participant S as Scheduler (in Manager)
participant EPM as EnginePoolManager
participant EPA as EnginePoolAccessor
participant PSS as PoolScalingStrategy
participant PM as PoolMetrics
S ->>+ EPM: Trigger Scaling Check (Every Interval)
EPM ->> EPM: Check Cooldown Period
opt Cooldown Active
EPM -->> S: Skip Check (In Cooldown)
end
EPM ->>+ EPA: getCurrentSize()
EPA -->>- EPM: currentSize
EPM ->> PM: recordPoolSize(currentSize)
EPM ->>+ EPA: collectMetrics()
EPA -->>- EPM: poolMetricsMap
EPM ->> EPM: Create PoolContext(currentSize, poolMetricsMap, ...)
EPM ->>+ PSS: calculateTargetSize(context)
PSS -->>- EPM: targetSizeRaw
EPM ->> EPM: Clamp targetSize = max(minSize, min(maxSize, targetSizeRaw))
EPM ->> PM: recordTargetPoolSize(targetSize)
alt targetSize != currentSize
EPM ->>+ EPA: resize(targetSize)
Note right of EPA: Initiates async scale-up or<br/>graceful scale-down
EPA -->>- EPM: Resize Requested (returns)
EPM ->> EPM: Update lastScalingTimestamp
EPM ->> PM: recordScalingEvent(currentSize, targetSize)
else targetSize == currentSize
EPM ->> EPM: Log "No scaling needed"
end
EPM ->> PM: recordScalingLatency(...)
EPM -->>- S: Check Complete
```
Additional context
This is an initial proposal aiming to address the dynamic scaling capabilities of the Engine Pool in Kyuubi. The design and implementation details are still open for discussion. I sincerely welcome feedback, suggestions, and any improvements from the community to help refine and make this feature more robust and aligned with real-world needs. Looking forward to collaborating with everyone!
Are you willing to submit PR?
Yes. I would be willing to submit a PR with guidance from the Kyuubi community to improve.
No. I cannot submit a PR at this time.