-
Notifications
You must be signed in to change notification settings - Fork 20
Allow PCUI to support multiple Pcluster versions #418
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
… stack for each version and handle the mapping of version to invoke url
… load version when creating cluster from an existing cluster
since this PR is changing the frontend, could you please attach screenshot(s) that represent the most relervant changes? |
@@ -32,13 +32,14 @@ | |||
USER_POOL_ID = os.getenv("USER_POOL_ID") | |||
AUTH_PATH = os.getenv("AUTH_PATH") | |||
API_BASE_URL = os.getenv("API_BASE_URL") | |||
API_VERSION = os.getenv("API_VERSION", "3.1.0") | |||
API_VERSION = sorted(os.getenv("API_VERSION", "3.1.0").split(","), key=lambda x: [-int(n) for n in x.split('.')]) | |||
DEFAULT_API_VERSION = API_VERSION[0] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here we are setting the latest PC version as the default one.
The user should be able to control the default version, but with this solution is not.
Example: a user wants to try a new version of PC, but keeping the default version to the one that they consider the most stable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Because for the home page where all of the clusters are displayed, it uses the list clusters API of the default version.
Currently, if the supported API version is larger than an unsupport version of one of the clusters, then you will be able to see some of its information, there will just be a warning saying that you can not edit the cluster. But is the supported API version it less than the unsupported version, you won't be able to see any cluster information and will getting an error message saying that the version is not supported.
The reason for the sorting/default version being the largest version is so that for example if the supported versions for 3.13.0 and 3.11.0, you will be able to see the information for a 3.12.0 cluster no matter the order you put in the supported versions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Got it, thanks for the explanation. Please put a comment explaining why the default version must be the most recent one.
api/PclusterApiHandler.py
Outdated
@@ -32,13 +32,14 @@ | |||
USER_POOL_ID = os.getenv("USER_POOL_ID") | |||
AUTH_PATH = os.getenv("AUTH_PATH") | |||
API_BASE_URL = os.getenv("API_BASE_URL") | |||
API_VERSION = os.getenv("API_VERSION", "3.1.0") | |||
API_VERSION = sorted(os.getenv("API_VERSION", "3.1.0").split(","), key=lambda x: [-int(n) for n in x.split('.')]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what are the implications of this sorting?
Is it used only to determine the default version in the line below?
Also, I suggest to trim blank spaces from the string
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I explained the sorting in the comment above
api/PclusterApiHandler.py
Outdated
API_USER_ROLE = os.getenv("API_USER_ROLE") | ||
OIDC_PROVIDER = os.getenv("OIDC_PROVIDER") | ||
CLIENT_ID = os.getenv("CLIENT_ID") | ||
CLIENT_SECRET = os.getenv("CLIENT_SECRET") | ||
SECRET_ID = os.getenv("SECRET_ID") | ||
SITE_URL = os.getenv("SITE_URL", API_BASE_URL) | ||
SITE_URL = os.getenv("SITE_URL") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should't this be the API URL of the default version?
api/PclusterApiHandler.py
Outdated
else: | ||
info_resp = sigv4_request("GET", API_BASE_URL, url) | ||
info_resp = sigv4_request("GET", get_base_url(request.args.get("version")), url) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: avoid duplication. read the target version ones from the request
api/PclusterApiHandler.py
Outdated
@@ -735,14 +744,19 @@ def _get_params(_request): | |||
params.pop("path") | |||
return params | |||
|
|||
def get_base_url(v): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: v
-> version
api/PclusterApiHandler.py
Outdated
@@ -62,6 +63,14 @@ | |||
if not JWKS_URL: | |||
JWKS_URL = os.getenv("JWKS_URL", | |||
f"https://cognito-idp.{REGION}.amazonaws.com/{USER_POOL_ID}/" ".well-known/jwks.json") | |||
API_BASE_URL_MAPPING = {} | |||
|
|||
if API_BASE_URL: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
May you please wrap this logic into a dedicated function and unit test it?
frontend/locales/en/strings.json
Outdated
"version": { | ||
"label": "Cluster Version", | ||
"title": "Version", | ||
"placeholder": "Select your cluster version", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: unnecessayr to repeat "Select your" in a placeholder. Could simply be "Cluster version"
api/PclusterApiHandler.py
Outdated
@@ -484,7 +493,7 @@ def get_dcv_session(): | |||
|
|||
|
|||
def get_custom_image_config(): | |||
image_info = sigv4_request("GET", API_BASE_URL, f"/v3/images/custom/{request.args.get('image_id')}").json() | |||
image_info = sigv4_request("GET", get_base_url(request.args.get("version")), f"/v3/images/custom/{request.args.get('image_id')}").json() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's a good practice to specify the request param name version
in a constant.
const ARG_VERSION="version"
request.args.get(ARG_VERSION)
We did not respect this best practice elsewhere (eg: region), but we can start applying on new code.
api/PclusterApiHandler.py
Outdated
@@ -233,9 +242,9 @@ def ec2_action(): | |||
def get_cluster_config_text(cluster_name, region=None): | |||
url = f"/v3/clusters/{cluster_name}" | |||
if region: | |||
info_resp = sigv4_request("GET", API_BASE_URL, url, params={"region": region}) | |||
info_resp = sigv4_request("GET", get_base_url(request.args.get("version")), url, params={"region": region}) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What if the request is missing the version patameter?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the get_base_url function checks if the version has been passed in
@@ -32,13 +32,14 @@ | |||
USER_POOL_ID = os.getenv("USER_POOL_ID") | |||
AUTH_PATH = os.getenv("AUTH_PATH") | |||
API_BASE_URL = os.getenv("API_BASE_URL") | |||
API_VERSION = os.getenv("API_VERSION", "3.1.0") | |||
API_VERSION = sorted(os.getenv("API_VERSION", "3.1.0").split(","), key=lambda x: [-int(n) for n in x.split('.')]) | |||
DEFAULT_API_VERSION = API_VERSION[0] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Got it, thanks for the explanation. Please put a comment explaining why the default version must be the most recent one.
@@ -34,7 +35,7 @@ Parameters: | |||
Version: | |||
Description: Version of AWS ParallelCluster to deploy. | |||
Type: String | |||
AllowedPattern: "^([0-9]+)\\.([0-9]+)\\.([0-9]+)$" | |||
AllowedPattern: "^([0-9]+)\\.([0-9]+)\\.([0-9]+)(,([0-9]+)\\.([0-9]+)\\.([0-9]+))*$" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a limit to the maximum number of versions that a user can specify?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, what if a user specifies the same version twice?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if the user specifies the sam number twice than it will just launch the api stack for that version once, because the api handler makes the list of versions a set
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
there is not maximum
@@ -161,17 +162,6 @@ Conditions: | |||
- !Not [!Equals [!Ref SNSRole, ""]] | |||
UseNewCognito: | |||
!Not [ Condition: UseExistingCognito] | |||
UseNonDockerizedPCAPI: | |||
!Not [ Condition: UseDockerizedPCAPI] | |||
UseDockerizedPCAPI: !And |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What if a user specifies version <=3.5.0?
This is a breaking change unless you manage <= 3.5.0 in another way (does not seems so).
Can we avoid this change? Why are we forced to introduce it?
Fn::ForEach::ParallelClusterApi: | ||
- ApiVersion | ||
- !Split [",", !Ref Version] | ||
- ParallelClusterApi&{ApiVersion}: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does CFN implicitly transform the dotted version with something else?
I'm asking because I assiume you cannot use dots within a resource name.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When you use the & symbol instead of the # symbol when referencing the variable, it automatically removes the dots
- ParallelClusterApi&{ApiVersion}: | ||
Type: AWS::CloudFormation::Stack | ||
Properties: | ||
Parameters: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why did you remove ImageBuilderSubnetId
and ImageBuilderVpcId
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Because if we only support PC version >3.5.0, than all will use the NonDockerizedPCAPI, which has those two variables set to !Ref AWS::NoValue
reason = "Failed {}: {}".format(event["RequestType"], e) | ||
|
||
Timeout: 300 | ||
MemorySize: 128 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems a very low mem size. What's the maximum mem usage observed across 10 executions? Whatever it is, I suggest to double it. The impact on costs would be very low because this function is executed only on stack create/update/delete, but the potential impact in case of failures is bad.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Mem usage is was 86. I will increase it.
Action: | ||
- cloudformation:ListStacks | ||
- cloudformation:DescribeStacks | ||
Resource: '*' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we scope these permissions down? I think we can set the resource to the current stack name, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we cannot scope this down using stack name, what about scoping this down using tags as per https://docs.aws.amazon.com/IAM/latest/UserGuide/reference_policies_condition-keys.html#condition-keys-requesttag?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why leaving * as resource? Isn't it possible to specify the stack arn with a wildcard only on the stackname?
Resource: !Sub | ||
- arn:${AWS::Partition}:execute-api:${AWS::Region}:${AWS::AccountId}:${PCApiGateway}/*/* | ||
- { PCApiGateway: !Select [2, !Split ['/', !Select [0, !Split ['.', !GetAtt [ ParallelClusterApi, Outputs.ParallelClusterApiInvokeUrl ]]]]] } | ||
Resource: !Sub "arn:${AWS::Partition}:execute-api:${AWS::Region}:${AWS::AccountId}:*/*/*" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a security vulnerabikity. We must scope this policy down.
If injecting APIG ARNs does not work b/c of CFN limitations, then we can do it setting conditions on tags https://docs.aws.amazon.com/IAM/latest/UserGuide/reference_policies_condition-keys.html#condition-keys-resourcetag
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a working solution to restrict the permissions using tags.
- Add tag
aws-parallelcluster-ui:stack-id
equal to the PCUI stack id to the PCAPI nested stacks. - Add condition
Example:
ParallelClusterApi:
Type: AWS::CloudFormation::Stack
Properties:
...
Tags:
- Key: 'aws-parallelcluster-ui:stack-id'
Value: !Ref AWS::StackId
ParallelClusterApiGatewayInvoke:
Type: AWS::IAM::ManagedPolicy
Properties:
....
PolicyDocument:
Version: '2012-10-17'
Statement:
- Action:
- execute-api:Invoke
Effect: Allow
Resource: !Sub "arn:${AWS::Partition}:execute-api:${AWS::Region}:${AWS::AccountId}:*/*/*"
Condition:
StringEquals:
"aws:ResourceTag/aws-parallelcluster-ui:stack-id": !Ref 'AWS::StackId'
result = f"{result}{version}={output['OutputValue']}," | ||
|
||
parsed_url = urlparse(output['OutputValue']).hostname.split('.')[0] | ||
api_gateway_arns.append(f"arn:{get_partition(os.environ['AWS_REGION'])}:execute-api:{os.environ['AWS_REGION']}:{account_id}:{parsed_url}/*/*") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To address path traversal vulnerability, we recently made a change that must be reflected here (#422)
return ( | ||
<Select | ||
expandToViewport={true} | ||
selectedOption={{label: selectedVersion, value: selectedVersion}} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for adding version selection for images as well. For clusters we added tests to cover this change. What about images?
gateway = urlparse(output['OutputValue']).hostname.split('.')[0] | ||
stage = output['OutputValue'].split('/')[3] | ||
api_gateway_arns.append(f"arn:{get_partition(os.environ['AWS_REGION'])}:execute-api:{os.environ['AWS_REGION']}:{account_id}:{gateway}/{stage}/*") | ||
print(f"API arn: {parsed_url}") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what is parsed_url? What about adding the ARN determined on line 325 to the next print and get rid of this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, that was leftover from testing.
The arns are listed in the log as the output so I will remove the print.
for output in stack_response['Stacks'][0]['Outputs']: | ||
if output['OutputKey'] == 'ParallelClusterApiInvokeUrl': | ||
# Construct the result string | ||
result = f"{result}{version}={output['OutputValue']}," |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
with this concantenation you will end up with a trailing comma at the end of the result. Have you verified this does not have any side effect? Wha about getting rid of that?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It has not side effect because but I will remove the trailing comma
for output in stack_response['Stacks'][0]['Outputs']: | ||
if output['OutputKey'] == 'ParallelClusterApiInvokeUrl': | ||
# Construct the result string | ||
result = f"{result}{version}={output['OutputValue']}," |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
codestyle: output['OutoputValue'] is repeated 4 times. what about improving readability assigning it to a more self explanatory variable api_url
instead?
reason = "Failed {}: {}".format(event["RequestType"], e) | ||
|
||
Timeout: 300 | ||
MemorySize: 256 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don;'t remember if we already discussed this topic. However, what is the maximum amount of memory consumed that you observed. Is 256 way enough?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Mem usage is was 86. so i think 256 should be more than enough.
Action: | ||
- cloudformation:ListStacks | ||
- cloudformation:DescribeStacks | ||
Resource: '*' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we cannot scope this down using stack name, what about scoping this down using tags as per https://docs.aws.amazon.com/IAM/latest/UserGuide/reference_policies_condition-keys.html#condition-keys-requesttag?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This Pr contains an approved breaking change: it breaks the support for PC <= 3.5.0. Let's highlight this in the PR description.
- !Sub "arn:${AWS::Partition}:cloudformation:${AWS::Region}:${AWS::AccountId}:stack/*" | ||
Condition: | ||
StringEquals: | ||
"aws:ResourceTag/parallelcluster:api-id": !Ref ApiGatewayRestApi |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Where is this tag set?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
line 217 or parallelcluster-ui.yaml
TemplateURL: !Sub https://${AWS::Region}-aws-parallelcluster.s3.${AWS::Region}.amazonaws.com/parallelcluster/${ApiVersion}/api/parallelcluster-api.yaml | ||
TimeoutInMinutes: 30 | ||
Tags: | ||
- Key: 'parallelcluster:api-id' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The tag name is misleading as it seems to refer to ParallelCluster API id, wheres the id we are injecting here is the ID of ParallelCluster UI API. Can you please rename the tag to parallelcluster-ui:api-id
?
Description
Allow PCUI to support managing clusters of different PCUI versions
Changes
How Has This Been Tested?
Shell
DCV
andStop Fleet
work for only the supported versionsIn order to increase the likelihood of your contribution being accepted, please make sure you have read both the Contributing Guidelines and the Project Guidelines
By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.