CORS-3959, CORS-3864: CAPI-based AzureStack Installs #9645

Merged: 19 commits, Apr 15, 2025

patrickdillon
Contributor

Adds support for installing to Azure Stack using CAPZ.

Utilizes an openshift fork: openshift/cluster-api-provider-azurestack, based on kubernetes-sigs/cluster-api-provider-azure#5532. Building the controller from the fork is simple enough, but to pick up the API changes I introduced the forked API (which adds a single ARMEnvironment field to the ClusterClass) as a subpackage. See 7352698 and 3ab4beb.

The other commits handle the Azure Stack specifics and are described in the commit messages. In particular, API versions need to be updated to compatible versions, older SDKs need to be used to upload blobs, and we need a process for creating managed images, as Azure Stack does not support image galleries.

@openshift-ci-robot
Contributor

openshift-ci-robot commented Apr 8, 2025

@patrickdillon: This pull request references CORS-3959 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.19.0" version, but no target version was set.

This pull request references CORS-3864 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.19.0" version, but no target version was set.


@openshift-ci-robot openshift-ci-robot added the jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. label Apr 8, 2025
@openshift-ci openshift-ci bot requested review from rna-afk and rwsu April 8, 2025 20:51
@patrickdillon
Contributor Author

/uncc @rwsu
/cc @jhixson74 @tthvo @sadasu @barbacbd

@openshift-ci openshift-ci bot requested review from barbacbd, jhixson74, sadasu and tthvo and removed request for rwsu April 8, 2025 20:52
@patrickdillon
Contributor Author

An upstream bug in cloud-provider-azure prevents CCCMO from starting. This fix in our fork resolves it: openshift/cloud-provider-azure#141

@patrickdillon
Contributor Author

Oops, I screwed up the DNS API version in public Azure. Can fix tomorrow.

@patrickdillon patrickdillon force-pushed the azurestack-mark-iii branch 2 times, most recently from 9ff3814 to 53a47f5 Compare April 10, 2025 18:30
@patrickdillon
Contributor Author

Good news is CI is up.

Bad news is that it's turning up a totally unexpected panic. Weird.

/test e2e-azurestack

@patrickdillon
Contributor Author

patrickdillon commented Apr 11, 2025

The panic seems ok. I completely broke the controller with my latest changes. So just need to figure that out.

@patrickdillon
Contributor Author

The problem was that I didn't enable the feature gate to produce CAPI manifests.

@patrickdillon
Contributor Author

Ok, the latest CI run looks better, but we're hitting quota issues in the CI subscription. I don't think I have access to that subscription:

      virtualmachine failed to create or update. err: failed to create or update resource ci-op-fmgzhtdg-4055a/ci-op-fmgzhtdg-4055a-9l9vn-master-0 (service: virtualmachine): PUT https://management.mtcazs.wwtatc.com/subscriptions/de7e09c3-b59a-4c7d-9c77-439c11b92879/resourceGroups/ci-op-fmgzhtdg-4055a/providers/Microsoft.Compute/virtualMachines/ci-op-fmgzhtdg-4055a-9l9vn-master-0
      --------------------------------------------------------------------------------
      RESPONSE 409: 409 Conflict
      ERROR CODE: OperationNotAllowed
      --------------------------------------------------------------------------------
      {
        "error": {
          "code": "OperationNotAllowed",
          "message": "Operation could not be completed as it results in exceeding approved Total Regional Cores quota. Additional details - Deployment Model: Resource Manager, Location: mtcazs, Current Limit: 224, Current Usage: 224, Additional Required: 8, (Minimum) New Limit Required: 232. Please read more about quota increase at https://docs.microsoft.com/en-us/azure/azure-supportability/regional-quota-requests"
        }
      }

@patrickdillon
Contributor Author

...and again my access to the Azure Stack portal has been cut off. I reached out to the vendor.

In the meantime I'll just blindly try another run and hope the quota has been cleaned up.

/test e2e-azurestack

@tthvo
Member

tthvo commented Apr 11, 2025

Hmm, the test proceeded well but bootstrap timed out for whatever reason :D

/test e2e-azurestack

AzureStack requires a preloaded image, so skip the image upload
on ASH environments.

Azure V2 block blob storage SDKs are not compatible with Azure Stack.
Creates a separate code path for Azure Stack to utilize the original
SDK.

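A minimal sketch of what a split upload path could look like. Every name here (`blobUploader`, `modernUploader`, `legacyUploader`, `uploaderFor`) is hypothetical and only stands in for the two SDK generations; the real clients and their calls are not shown.

```go
package main

import "fmt"

// blobUploader abstracts over the two SDK generations.
// All identifiers in this sketch are hypothetical.
type blobUploader interface {
	Upload(container, blob string, data []byte) error
}

// modernUploader stands in for a client built on the current block blob SDK.
type modernUploader struct{}

func (modernUploader) Upload(container, blob string, data []byte) error {
	fmt.Printf("current SDK path: %d bytes to %s/%s\n", len(data), container, blob)
	return nil
}

// legacyUploader stands in for a client built on the original SDK,
// which is the path Azure Stack would take.
type legacyUploader struct{}

func (legacyUploader) Upload(container, blob string, data []byte) error {
	fmt.Printf("original SDK path: %d bytes to %s/%s\n", len(data), container, blob)
	return nil
}

// uploaderFor picks the code path based on the target cloud.
func uploaderFor(isAzureStack bool) blobUploader {
	if isAzureStack {
		return legacyUploader{}
	}
	return modernUploader{}
}

func main() {
	// Container and blob names are placeholders.
	if err := uploaderFor(true).Upload("ignition", "bootstrap.ign", []byte("example")); err != nil {
		fmt.Println("upload failed:", err)
	}
}
```

Keeping the choice behind one interface means callers do not need to know which SDK is in use.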
Adds compatibility for Azure Stack when generating CAPZ machine
manifests. The API for Azure Stack machines is consistent with CAPZ
(no fork, as is the case for the cluster). The changes here are due
to unsupported features and bugs, such as:

- no managed boot diagnostics, so user managed only
- FailureDomain must be nil to trigger availability sets, not empty
- Setting the flag correctly to distinguish between managed images
  and image galleries.

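The nil-versus-empty distinction for FailureDomain can be sketched as follows. This assumes the field is a `*string`, and `failureDomainFor` is a hypothetical helper, not the installer's code.

```go
package main

import "fmt"

// failureDomainFor returns nil when no zone is set, rather than a pointer
// to an empty string. Hypothetical helper for illustration only.
func failureDomainFor(zone string) *string {
	if zone == "" {
		return nil
	}
	return &zone
}

func main() {
	for _, zone := range []string{"", "1"} {
		fd := failureDomainFor(zone)
		if fd == nil {
			fmt.Println("failure domain: nil (no zone set)")
			continue
		}
		fmt.Printf("failure domain: %q\n", *fd)
	}
}
```

A pointer to `""` is a non-nil value, so code that checks `fd != nil` would treat it as a zone having been requested.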
Azure Stack does not support compute galleries. Create managed images
utilizing the SDK instead.

Selects and runs the CAPZASH controller when using Azure Stack.

Locally copies the types from the azure stack fork into the repo,
so that the additional ARMEndpoint field in the cluster spec can
be used in Azure Stack installs.

The type is copied locally so that we can continue to update
cluster-api-provider-azure without needing to continually rebase
the azure stack fork.

Vendors the openshift azurestack capi provider fork, so that we
can run this controller when performing an azurestack install.

The API fork is committed directly in the repo, so we do not need
to vendor the module into the installer directly.

Some of the calls we depend on for Azure cluster generation are not
available in AzureStack. Adds a dumber way of selecting IP for
the load balancer.

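One simple way to pick a load balancer IP without extra API calls is to take a fixed offset inside the subnet CIDR. This is only an illustration of that general idea, not necessarily what the commit does; `ipAtOffset`, the CIDR, and the offset of 100 are all assumptions.

```go
package main

import (
	"fmt"
	"net/netip"
)

// ipAtOffset returns the address `offset` steps past the network address
// of cidr, or an error if that address falls outside the prefix.
// Hypothetical helper for illustration only.
func ipAtOffset(cidr string, offset int) (netip.Addr, error) {
	prefix, err := netip.ParsePrefix(cidr)
	if err != nil {
		return netip.Addr{}, err
	}
	// Masked() normalizes the prefix so we start from the network address.
	addr := prefix.Masked().Addr()
	for i := 0; i < offset; i++ {
		addr = addr.Next()
	}
	if !addr.IsValid() || !prefix.Contains(addr) {
		return netip.Addr{}, fmt.Errorf("offset %d is outside %s", offset, cidr)
	}
	return addr, nil
}

func main() {
	addr, err := ipAtOffset("10.0.0.0/24", 100)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(addr) // prints 10.0.0.100
}
```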
go mod tidy && go mod vendor
@jhixson74
Member

/lgtm

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Apr 15, 2025
@patrickdillon
Contributor Author

/test e2e-azurestack

VMs failed to create on last run, but not clear why.

I also was looking at one small change to the commit structure, so I may need one more push

@patrickdillon
Contributor Author

Machines are failing due to quota:

          "code": "OperationNotAllowed",
          "message": "Operation could not be completed as it results in exceeding approved Total Regional Cores quota. Additional details - Deployment Model: Resource Manager, Location: mtcazs, Current Limit: 224, Current Usage: 220, Additional Required: 8, (Minimum) New Limit Required: 228. Please read more about quota increase at https://docs.microsoft.com/en-us/azure/azure-supportability/regional-quota-requests"
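The arithmetic behind both quota errors in this thread can be checked with a short sketch. The function names are hypothetical; the numbers come straight from the error messages.

```go
package main

import "fmt"

// requiredLimit mirrors the "(Minimum) New Limit Required" figure in the
// error message: current usage plus the cores the new VM needs.
func requiredLimit(currentUsage, additional int) int {
	return currentUsage + additional
}

// fits reports whether the request stays within the current limit.
func fits(limit, currentUsage, additional int) bool {
	return currentUsage+additional <= limit
}

func main() {
	// First failure: limit 224, usage 224, 8 more cores needed.
	fmt.Println(requiredLimit(224, 8), fits(224, 224, 8)) // prints 232 false
	// Second failure: limit 224, usage 220, 8 more cores needed.
	fmt.Println(requiredLimit(220, 8), fits(224, 220, 8)) // prints 228 false
}
```

In the second case only 4 cores were free, so an 8-core machine still could not be created.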

@patrickdillon
Contributor Author

Let's move forward with this version. I'm just as liable to screw something up trying to rebase all these commits as to make any minor improvements. If anything really needs fixing, we can fix it post-merge.

/approve

Contributor

openshift-ci bot commented Apr 15, 2025

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: patrickdillon


@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Apr 15, 2025
@sadasu
Contributor

sadasu commented Apr 15, 2025

APIVersion: "2019-06-01" is sprinkled throughout, can we define a constant in pkg/types/azure?

Hm, I only see two usages: one is the constant definition in pkg/infrastructure and the other is in a comment.

It might be me looking at the same code appearing in multiple commits. Never mind.
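For reference, a single-constant approach along the lines discussed could look like the sketch below. The constant name, the helper, and the placeholder default are hypothetical; only the value "2019-06-01" comes from the comment above.

```go
package main

import "fmt"

// stackAPIVersion pins the API version mentioned in the review.
// The identifier name is hypothetical.
const stackAPIVersion = "2019-06-01"

// apiVersionFor returns the pinned version on Azure Stack and the caller's
// default everywhere else. Hypothetical helper for illustration only.
func apiVersionFor(isAzureStack bool, defaultVersion string) string {
	if isAzureStack {
		return stackAPIVersion
	}
	return defaultVersion
}

func main() {
	// The default below is a placeholder, not a real API version.
	fmt.Println(apiVersionFor(true, "<public-cloud-default>"))
	fmt.Println(apiVersionFor(false, "<public-cloud-default>"))
}
```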

Contributor

openshift-ci bot commented Apr 15, 2025

@patrickdillon: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

| Test name | Commit | Details | Required | Rerun command |
| --- | --- | --- | --- | --- |
| ci/prow/e2e-azurestack | 46beec0 | link | false | /test e2e-azurestack |
| ci/prow/e2e-vsphere-ovn-multi-network | 46beec0 | link | false | /test e2e-vsphere-ovn-multi-network |
| ci/prow/okd-scos-e2e-aws-ovn | 46beec0 | link | false | /test okd-scos-e2e-aws-ovn |
| ci/prow/e2e-azure-ovn-shared-vpc | 46beec0 | link | false | /test e2e-azure-ovn-shared-vpc |


@openshift-merge-bot openshift-merge-bot bot merged commit 63e0c35 into openshift:main Apr 15, 2025
27 of 31 checks passed
@openshift-bot
Contributor

[ART PR BUILD NOTIFIER]

Distgit: ose-installer-terraform-providers
This PR has been included in build ose-installer-terraform-providers-container-v4.19.0-202504152221.p0.g63e0c35.assembly.stream.el9.
All builds following this will include this PR.

@openshift-bot
Contributor

[ART PR BUILD NOTIFIER]

Distgit: ose-installer-altinfra
This PR has been included in build ose-installer-altinfra-container-v4.19.0-202504152221.p0.g63e0c35.assembly.stream.el9.
All builds following this will include this PR.

@openshift-bot
Contributor

[ART PR BUILD NOTIFIER]

Distgit: ose-baremetal-installer
This PR has been included in build ose-baremetal-installer-container-v4.19.0-202504152221.p0.g63e0c35.assembly.stream.el9.
All builds following this will include this PR.

@openshift-bot
Contributor

[ART PR BUILD NOTIFIER]

Distgit: ose-installer-artifacts
This PR has been included in build ose-installer-artifacts-container-v4.19.0-202504152221.p0.g63e0c35.assembly.stream.el9.
All builds following this will include this PR.

8 participants