Port Kuttl E2E tests to Ginkgo #877

Draft
wants to merge 5 commits into
base: master

jgwest
Collaborator

@jgwest jgwest commented Apr 9, 2025

I have ported the gitops-operator kuttl sequential/parallel E2E tests to Ginkgo.

It was a very careful translation, maintaining all existing behaviour and test names. This should make it easy to verify that the translation was completed successfully.

I have also created test fixtures (utility functions) which make it easy to write flexible, maintainable, reliable tests.

This work includes:

  • I've added detailed descriptions of what each test is testing, plus individual test step comments in the tests themselves
  • Fixed intermittent race conditions when I found them, including removing sleep statements from tests (and replacing them with Eventually/Consistently)
  • Audited what each test does to make sure it makes sense
  • Tests are still fast:
    • Sequential: (JGW: need final measurement)
    • Parallel: (JGW: need final measurement)
  • Environment variables can be used to customize which tests run:
    • NON_OLM: will skip tests that require OLM (this is similar to the non-olm script file, but is built into the test itself)
    • LOCAL_RUN: run the tests against gitops-operator running via 'make run'
    • SKIP_HA_TESTS: will skip tests that require a cluster with >= 3 nodes
    • E2E_DEBUG_SKIP_CLEANUP: will prevent test cleanup at the end of a test, making it easier to debug failures
    • You can specify multiple variables at the same time.
  • I've fixed a problem we were seeing in E2E tests: sometimes, when certain tests fail on GitHub, there is no log output because the logs have been flagged as containing sensitive data (and thus removed).
  • Kuttl tests were still using the v1alpha1 API; I've switched them to v1beta1.
  • Switched to quay.io instead of Docker Hub for some test images, because Docker Hub causes intermittent failures due to rate limiting.

Try it out for yourself:

A) Run tests against OpenShift GitOps installed via OLM

# 1) Get a cluster from cluster bot, for example, `rosa create 4.17` (for ROSA hypershift cluster with 2 nodes)
# 2) Login and install OpenShift GitOps v1.16 on it via OLM

cd /tmp
git clone git@github.com:jgwest/gitops-operator
cd gitops-operator
git checkout port-e2e-to-ginkgo-feb-2025

# 3a) Run Sequential tests
SKIP_HA_TESTS=true  make e2e-tests-sequential-ginkgo
# If you are using a non-hypershift cluster you can remove the SKIP_HA_TESTS var (some of our tests enable redis HA, which requires >=3 worker nodes)

# 3b) Run Parallel tests
SKIP_HA_TESTS=true  make e2e-tests-parallel-ginkgo
# As above, can remove SKIP_HA_TESTS if you are on non-hypershift

B) Run E2E tests against local operator (operator running via make run)

# 1) Start operator locally
CLUSTER_SCOPED_ARGO_ROLLOUTS_NAMESPACES=argo-rollouts,test-rom-ns-1,rom-ns-1,openshift-gitops  make run 
# Add 'ARGOCD_REDIS_IMAGE=quay.io/fedora/redis-7@sha256:0217e8a80a03c3e43c716a1c137e5e615c6bc583605a754000f6efa382349a5b' if you hit docker hub rate-limiting

# 2) Start tests in LOCAL_RUN mode (this skips tests that require Subscription or CSVs)
LOCAL_RUN=true  make e2e-tests-sequential-ginkgo
LOCAL_RUN=true  make e2e-tests-parallel-ginkgo
# Not all tests are supported when running locally. See 'Skip' messages for details.
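
As a rough illustration of how the environment variables described above could gate individual tests, here is a minimal sketch. It is not the PR's fixture code: `testNeeds` and `skipReason` are hypothetical names, and it assumes a variable counts as enabled when set to `true` (as in the commands above; the expected value for NON_OLM is an assumption).

```go
package main

import (
	"fmt"
	"os"
)

// testNeeds describes what a test requires (hypothetical type, for illustration only).
type testNeeds struct {
	OLM bool // needs Subscription/CSV resources, i.e. an OLM install
	HA  bool // needs a cluster with >= 3 nodes
}

// skipReason returns a non-empty reason when the test should be skipped.
// getenv is injected so the logic can be exercised without the real environment.
// Assumption: a variable is "enabled" only when its value is exactly "true".
func skipReason(needs testNeeds, getenv func(string) string) string {
	enabled := func(name string) bool { return getenv(name) == "true" }
	switch {
	case needs.OLM && enabled("NON_OLM"):
		return "skipped: test requires OLM and NON_OLM is set"
	case needs.OLM && enabled("LOCAL_RUN"):
		return "skipped: test requires Subscription or CSVs and LOCAL_RUN is set"
	case needs.HA && enabled("SKIP_HA_TESTS"):
		return "skipped: test requires >= 3 nodes and SKIP_HA_TESTS is set"
	}
	return ""
}

func main() {
	reason := skipReason(testNeeds{HA: true}, os.Getenv)
	if reason == "" {
		reason = "test would run"
	}
	fmt.Println(reason)
}
```

Because each variable is checked independently, several can be set at the same time, matching the behaviour described in the PR.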

C) Run a specific test:

# 'make ginkgo' to download ginkgo, if needed
# Examples:
./bin/ginkgo -v -focus "1-106_validate_argocd_metrics_controller"  -r ./test/openshift/e2e/ginkgo/sequential
./bin/ginkgo -v -focus "1-099_validate_server_autoscale"  -r ./test/openshift/e2e/ginkgo/parallel
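
The focus string is treated as a pattern matched against test names, so a shorter pattern can select more than one test. The sketch below imitates that selection with Go's standard `regexp` package; it assumes plain regular-expression search over the name and is not Ginkgo's own implementation. The `focused` helper is a hypothetical name.

```go
package main

import (
	"fmt"
	"regexp"
)

// focused returns the names matched by the focus pattern.
// Assumption: matching is an unanchored regular-expression search.
func focused(pattern string, names []string) ([]string, error) {
	re, err := regexp.Compile(pattern)
	if err != nil {
		return nil, err
	}
	var out []string
	for _, n := range names {
		if re.MatchString(n) {
			out = append(out, n)
		}
	}
	return out, nil
}

func main() {
	// Test names taken from this PR description.
	names := []string{
		"1-106_validate_argocd_metrics_controller",
		"1-099_validate_server_autoscale",
		"1-107_validate_redis_ssc",
	}

	for _, pattern := range []string{"1-106_validate_argocd_metrics_controller", "1-10"} {
		got, err := focused(pattern, names)
		if err != nil {
			fmt.Println("bad pattern:", err)
			continue
		}
		fmt.Printf("%q selects %d test(s)\n", pattern, len(got))
	}
}
```

Using the full test name, as in the commands above, keeps the selection to a single test.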

Notes

Additional Notes:

  • In parallel, there are 2 '1-063' tests
  • I did not port 'sequential 1-077_validate_workload_status_monitoring_alert' because it is commented out, and has been for the last 2 years
  • I did not port 'sequential 1-084_validate_prune_templates' because it is disabled, and the functionality is not used anymore?
  • I did not port 'parallel 1-098_validate_dex_clientsecret' because it has the exact same content as 'parallel 1-095_validate_dex_clientsecret'
  • 'parallel 1-058_validate_prometheus_rule' has the name of 'validate prometheus rule', but it doesn't actually do anything with prometheus?
  • These tests are up to date with master branch kuttl tests as of April 4th, 2025.

Blocked:

  • I could not port 'sequential 1-041_validate_argocd_sync_alert' because we're using an extremely old version of prometheus operator PR. We need to fix this first.

I've not moved any tests to/from parallel/sequential, but...

Recommendations:

I recommend moving these tests from parallel to sequential:

  • 'parallel 1-052_validate_rolebinding_number': Since this test indirectly touches openshift-gitops, arguably it should not be parallel. It might break other parallel tests that run at the same time.
  • 'parallel 1-074_validate_terminating_namespace_block': Since this test is specifically designed to see if the entire operator can be blocked, arguably it should not be in parallel.
  • 'parallel 1-071_validate_SCC_HA': This creates a cluster-scoped resource, and thus should probably be sequential
  • 'parallel 1-102_validate_handle_terminating_namespaces': Since this test is specifically designed to see if the entire operator can be blocked, arguably it should not be in parallel.
  • 'parallel 1-105_validate_default_argocd_route': This references and modifies 'openshift-gitops' namespace resources, and thus should not be in parallel.

I recommend moving these tests from sequential to parallel:

  • 'sequential 1-050_validate_sso': AFAICT this test doesn't touch any cluster-scoped resources, and thus is a good candidate to run in parallel.
  • 'sequential 1-107_validate_redis_ssc': AFAICT this test doesn't touch any cluster-scoped resources, and thus is a good candidate to run in parallel.

(I've not acted on these recommendations as part of this PR, in order to reduce friction for merging the PR)
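
The reasoning behind these recommendations can be summarised as a simple rule: a test that touches anything shared across the cluster should not run alongside other tests. The sketch below encodes that rule; `testTraits` and `suggestedSuite` are hypothetical names used only for illustration and do not exist in the PR.

```go
package main

import "fmt"

// testTraits captures the properties used in the recommendations above
// (hypothetical type, for illustration only).
type testTraits struct {
	ClusterScoped   bool // creates or modifies cluster-scoped resources
	SharedNamespace bool // references or modifies 'openshift-gitops' namespace resources
	BlocksOperator  bool // designed to check whether the entire operator can be blocked
}

// suggestedSuite returns "sequential" when a test could interfere with
// other tests running at the same time, and "parallel" otherwise.
func suggestedSuite(t testTraits) string {
	if t.ClusterScoped || t.SharedNamespace || t.BlocksOperator {
		return "sequential"
	}
	return "parallel"
}

func main() {
	// e.g. a test that creates a cluster-scoped resource
	fmt.Println(suggestedSuite(testTraits{ClusterScoped: true}))
	// e.g. a test that touches no shared resources
	fmt.Println(suggestedSuite(testTraits{}))
}
```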


openshift-ci bot commented Apr 9, 2025

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all


openshift-ci bot commented Apr 9, 2025

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign jannfis for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@jgwest jgwest force-pushed the port-e2e-to-ginkgo-feb-2025 branch from c3bf477 to d8377e8 Compare April 11, 2025 04:22
Collaborator Author

jgwest commented Apr 11, 2025

/test all

Collaborator Author

jgwest commented Apr 11, 2025

/test all

Collaborator Author

jgwest commented Apr 14, 2025

/test all


openshift-ci bot commented Apr 14, 2025

@jgwest: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

| Test name | Commit | Details | Required | Rerun command |
| --- | --- | --- | --- | --- |
| ci/prow/v4.14-kuttl-parallel | 5406f71 | link | true | /test v4.14-kuttl-parallel |
| ci/prow/v4.17-e2e | 5406f71 | link | true | /test v4.17-e2e |
| ci/prow/v4.14-e2e | 5406f71 | link | true | /test v4.14-e2e |

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.
