-
Notifications
You must be signed in to change notification settings - Fork 823
Add proposal for tenant limits API #6818
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Add proposal for tenant limits API #6818
Conversation
Hello, this has been something that has been bothering me and my team for a while. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for this. it's a great idea
Signed-off-by: Bogdan Stancu <[email protected]>
Signed-off-by: Bogdan Stancu <[email protected]>
Signed-off-by: Bogdan Stancu <[email protected]>
2ca6d4b
to
e2eca89
Compare
} | ||
``` | ||
|
||
#### 2. PUT /api/v1/user-limits |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The limits are managed by the runtime-config which is either stored on a volume backed by a config map or in from an S3/gcs/azure bucket.
- How would this API work in the former case?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The end goal is to remove admin intervention for user limits. My initial idea was writing to either the config map or the s3/gcs/azure bucket but I'm not 100% sure of all the implications, other than requiring more access.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good question. @bogdan-at-adobe Configmap are normally readonly. Put it in the spec that the API will not support configmaps, only block storage backends.
### Endpoints | ||
|
||
#### 1. GET /api/v1/user-limits | ||
Returns the current limits configuration for a specific tenant. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Currently the limits are loaded periodically in an interval. Would this API read the config directly from storage?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think reading from the currently loaded config is fine enough for this. Using the api to make changes will also trigger a reload of the loaded limits so the only issue I see would be changing the config manually and waiting for it to get reloaded which will lead to a wrong answer from the api for 10 seconds max (assuming the default), change that is probably made by an admin and is aware of this implication. I might be wrong on this. GET /runtime_config
endpoint makes the same assumptions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the answer is yes, s3 is our only source of truth. Similar to how Alertmanager cortex API works.
|
||
### Endpoints | ||
|
||
#### 1. GET /api/v1/user-limits |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Which component in Cortex will serve this API? Maybe a new admin service in Cortex for this purpose?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
as @friedrichg said
We already have a component that reads limits, so it's perfect for this use case.
So my guess is that the cortex-overrides is a good place. Looking at the fact that the GET /runtime_config
is on all components I don't see a reason why the limits api wouldn't be the same though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We only want this in the cortex overrides, no need to put In the other components.
Signed-off-by: Bogdan Stancu <[email protected]>
7990645
to
3544ba3
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good so far.
In terms of design, I have a question on soft vs hard limits. So far all our limits are hard and this API modifies those limits.
But what if some malicious tenant tries to set unreasonable limits for their tenant? do we allow it? or do we also think of a way to prevent tenants from doing this in the API?
You can answer the question or leave it up as an open question. I think it would need to be tackled eventually
I think we need to have some validations on this API for checking whether a limit update is within a safe ranges. Also, not all limits should be allowed to be modified through this API. Limits like shard_size should only be modified by the admin |
I agree that many limits should only be modified by an admin, we use the tiers defined in the cortex-jsonnet repo and I wrote this proposal with those in mind, there have been few occasions when users needed limit increases other than those. Related to setting unreasonable limits, I taught a bit about this and, at least in our use case, defining "reasonable" would be pretty hard. It might be perfectly reasonable for a huge user to double their data overnight and not very reasonable for a small one to do the same thing. |
@@ -0,0 +1,71 @@ | |||
--- | |||
title: "Limits API" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
title: "Limits API" | |
title: "Overrides API" |
@@ -0,0 +1,71 @@ | |||
--- | |||
title: "Limits API" | |||
linkTitle: "Limits API" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
linkTitle: "Limits API" | |
linkTitle: "Overrides API" |
|
||
### Endpoints | ||
|
||
#### 1. GET /api/v1/user-limits |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
#### 1. GET /api/v1/user-limits | |
#### 1. GET /api/v1/user-overrides |
I just noticed you keep saying limits, but this is more about overrides. That is the name in cortex https://cortexmetrics.io/docs/guides/overrides-exporter/
call it hard overrides
configurable-overrides or hard-overrides. I don't know which one communicates better the situation . |
What about defining a quota unit (the default values) and keeping the "hard limit" as an integer for how many quota units a user can reach? |
I believe you mean increasing quota would increase a couple of limits. But I think there is a misunderstanding. Overrides is more than just limits, it's configuration like DisabledRuleGroups and OutOfOrderTimeWindow. |
What this PR does:
This PR adds a proposal for a tenant limits API.
Which issue(s) this PR fixes:
Fixes #
Checklist
CHANGELOG.md
updated - the order of entries should be[CHANGE]
,[FEATURE]
,[ENHANCEMENT]
,[BUGFIX]