✨ add cost budget, runtime cost estimator and metrics #964

@haoqing0110 (Member) commented Apr 23, 2025

Summary

Related issue(s)

#843
#955

Per expression cost limit exceeded

I0424 09:55:58.810235       1 cel.go:175] "Expression evaluation failed" rule="managedCluster.metadata.labels[\"test\"] == \"true\"" cluster="cluster1" err="operation cancelled: actual cost limit exceeded"

Total expression cost budget exceeded

I0424 09:21:39.903577       1 cel.go:206] "Cost budget exceeded" rule="managedCluster.metadata.labels[\"test\"] == \"true\"" cost=5 budget=0
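
The two log lines above come from two different guards: a limit on a single expression and a budget shared across expressions. As a rough sketch (not the code from this PR), the distinction can be modeled in plain Go. The names `charge`, `perExprLimit`, `errPerExprLimit`, and `errBudget` are hypothetical, and the rule that the budget counts as exceeded once the remaining amount drops below zero is an assumption chosen to be consistent with the `cost=5 budget=0` log line.

```go
package main

import (
	"errors"
	"fmt"
)

// Hypothetical error values for the two failure modes shown in the logs.
var (
	errPerExprLimit = errors.New("per-expression cost limit exceeded")
	errBudget       = errors.New("total cost budget exceeded")
)

// charge checks one expression's cost against a per-expression limit and
// then deducts it from the remaining total budget.
// Assumption: the budget counts as exceeded once it drops below zero.
func charge(cost, perExprLimit, remaining int64) (int64, error) {
	if cost > perExprLimit {
		return remaining, errPerExprLimit
	}
	remaining -= cost
	if remaining < 0 {
		return remaining, errBudget
	}
	return remaining, nil
}

func main() {
	// Mirrors the second log line: cost=5 against budget=0.
	_, err := charge(5, 100, 0)
	fmt.Println(err)
}
```

Keeping the two checks separate means one runaway expression is stopped early, while many cheap expressions are still bounded as a group.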

Metrics

# HELP scheduling_cel_runtime_duration_seconds [ALPHA] How long in seconds CEL expressions validation runs for a placement.
# TYPE scheduling_cel_runtime_duration_seconds histogram
scheduling_cel_runtime_duration_seconds_bucket{name="placement_scheduling",le="1e-06"} 0
scheduling_cel_runtime_duration_seconds_bucket{name="placement_scheduling",le="9.999999999999999e-06"} 0
scheduling_cel_runtime_duration_seconds_bucket{name="placement_scheduling",le="9.999999999999999e-05"} 3
scheduling_cel_runtime_duration_seconds_bucket{name="placement_scheduling",le="0.001"} 6
scheduling_cel_runtime_duration_seconds_bucket{name="placement_scheduling",le="0.01"} 8
scheduling_cel_runtime_duration_seconds_bucket{name="placement_scheduling",le="0.1"} 8
scheduling_cel_runtime_duration_seconds_bucket{name="placement_scheduling",le="1"} 8
scheduling_cel_runtime_duration_seconds_bucket{name="placement_scheduling",le="10"} 8
scheduling_cel_runtime_duration_seconds_bucket{name="placement_scheduling",le="100"} 8
scheduling_cel_runtime_duration_seconds_bucket{name="placement_scheduling",le="1000"} 8
scheduling_cel_runtime_duration_seconds_bucket{name="placement_scheduling",le="+Inf"} 8
scheduling_cel_runtime_duration_seconds_sum{name="placement_scheduling"} 0.0037694679999999998
scheduling_cel_runtime_duration_seconds_count{name="placement_scheduling"} 8
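
Prometheus histogram buckets are cumulative, so the sample above is easier to read once it is converted to per-bucket counts and a mean. The following is a small sketch of that arithmetic using only the numbers shown above; `perBucket` and `mean` are hypothetical helper names and are not part of the PR.

```go
package main

import "fmt"

// perBucket converts cumulative bucket counts (as exposed by Prometheus)
// into the number of observations that fell into each individual bucket.
func perBucket(cumulative []uint64) []uint64 {
	out := make([]uint64, len(cumulative))
	var prev uint64
	for i, c := range cumulative {
		out[i] = c - prev
		prev = c
	}
	return out
}

// mean returns sum/count, or 0 when there are no observations.
func mean(sum float64, count uint64) float64 {
	if count == 0 {
		return 0
	}
	return sum / float64(count)
}

func main() {
	// Cumulative counts copied from the sample output, le=1e-06 ... +Inf.
	cumulative := []uint64{0, 0, 3, 6, 8, 8, 8, 8, 8, 8, 8}
	fmt.Println(perBucket(cumulative))
	// _sum and _count copied from the sample output.
	fmt.Printf("mean seconds: %.6f\n", mean(0.0037694679999999998, 8))
}
```

For this sample, all 8 evaluations finished in under 10 ms, and the mean is a little under half a millisecond.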

codecov bot commented Apr 23, 2025

Codecov Report

Attention: Patch coverage is 84.26966% with 14 lines in your changes missing coverage. Please review.

Project coverage is 64.34%. Comparing base (ad8de01) to head (f8d82c0).
Report is 2 commits behind head on main.

Files with missing lines | Patch % | Lines
pkg/placement/helpers/cel.go | 83.33% | 11 Missing and 3 partials ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #964      +/-   ##
==========================================
+ Coverage   64.27%   64.34%   +0.06%     
==========================================
  Files         194      194              
  Lines       19015    19080      +65     
==========================================
+ Hits        12222    12277      +55     
- Misses       5784     5793       +9     
- Partials     1009     1010       +1     
Flag | Coverage Δ
unit | 64.34% <84.26%> (+0.06%) ⬆️

@haoqing0110
Member Author

/assign @qiujian16 @zhujian7

@haoqing0110 changed the title from "WIP ✨ add cost budget, runtime cost estimator and metrics" to "✨ add cost budget, runtime cost estimator and metrics" on Apr 28, 2025
@haoqing0110 force-pushed the br_cel-estimator branch 2 times, most recently from 25c4439 to 7d9f072, on April 29, 2025 03:02
@@ -63,7 +66,7 @@ func (c *ClusterSelector) Matches(ctx context.Context, cluster *clusterapiv1.Man
 
 	// match with cel selector if exists
 	if c.celSelector != nil {
-		if ok := c.celSelector.Validate(ctx, cluster); !ok {
+		if ok, _ := c.celSelector.Validate(ctx, cluster, celconfig.RuntimeCELCostBudget); !ok {
@qiujian16 (Member), Apr 29, 2025

Why do we need this as an input?

Member Author

This lets a unit test pass in a small budget and exercise the "running out of cost budget" case.

Member

But it looks like a global variable, which we could also set during unit tests?

Member Author

Yes! Modified this part to set CostBudget as a global variable.
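
The approach agreed on in this thread, a package-level budget that a unit test overrides, might look roughly like the sketch below. This is an illustration only: `costBudget`, `withinBudget`, and `withBudget` are hypothetical names, the default value is made up, and the real variable and its value in the PR may differ.

```go
package main

import "fmt"

// costBudget is a hypothetical package-level default; the real variable
// name and value in the PR may differ.
var costBudget int64 = 1000

// withinBudget reports whether an expression of the given cost fits
// into the current global budget.
func withinBudget(cost int64) bool {
	return cost <= costBudget
}

// withBudget temporarily overrides the global budget, runs fn, and then
// restores the previous value, roughly what a unit test would do with
// t.Cleanup.
func withBudget(b int64, fn func()) {
	old := costBudget
	costBudget = b
	defer func() { costBudget = old }()
	fn()
}

func main() {
	fmt.Println(withinBudget(5)) // evaluated against the default budget
	withBudget(0, func() {
		fmt.Println(withinBudget(5)) // evaluated with the budget forced to zero
	})
	fmt.Println(costBudget) // the default is restored afterwards
}
```

Compared with threading the budget through every `Validate` call, this keeps production call sites unchanged, at the cost of shared mutable state that tests must restore.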

@@ -29,7 +32,7 @@ func NewClusterSelector(selector clusterapiv1beta1.ClusterSelector, env *cel.Env
 		return nil, err
 	}
 	// build cel selector
-	celSelector := NewCELSelector(env, selector.CelSelector.CelExpressions)
+	celSelector := NewCELSelector(env, selector.CelSelector.CelExpressions, metricsRecorder)
Member

Do we have a general metric for predicates? We should add a TODO if we do not have one.

Member Author

@haoqing0110 force-pushed the br_cel-estimator branch 2 times, most recently from 3caa280 to f8d82c0, on April 30, 2025 03:40
@qiujian16
Member

/approve
/lgtm

@openshift-ci openshift-ci bot added the lgtm label Apr 30, 2025
Contributor

openshift-ci bot commented Apr 30, 2025

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: haoqing0110, qiujian16

@openshift-merge-bot openshift-merge-bot bot merged commit df87f52 into open-cluster-management-io:main Apr 30, 2025
15 checks passed
@haoqing0110 haoqing0110 deleted the br_cel-estimator branch April 30, 2025 08:25