Commit f8302c8

jlewi authored and k8s-ci-robot committed
Revert "Revert "Support subdirectories for junit files and grouping in test grid by name (kubeflow#490)" (kubeflow#493)" (kubeflow#494)
This reverts commit 81326be.

* Relates to kubeflow#489
* Roll forward the original change now that kfctl_create_e2e_workflow.py has been updated to handle the extra argument.

add a leading /
1 parent 81649a1 commit f8302c8

9 files changed: +735 -61 lines changed

README.md

Lines changed: 114 additions & 0 deletions
@@ -11,6 +11,8 @@
 - [Argo UI](#argo-ui)
 - [Stackdriver logs](#stackdriver-logs)
 - [Debugging Failed Tests](#debugging-failed-tests)
+- [Logs and Cluster Access for Kubeflow CI](#logs-and-cluster-access-for-kubeflow-ci)
+- [Access Control](#access-control)
 - [No results show up in Gubernator](#no-results-show-up-in-gubernator)
 - [No Logs in Argo UI For Step or Pod Id missing in Argo Logs](#no-logs-in-argo-ui-for-step-or-pod-id-missing-in-argo-logs)
 - [Debugging Failed Deployments](#debugging-failed-deployments)
@@ -30,6 +32,9 @@
 - [Setting up a Kubeflow Repository to Use Prow <a id="prow-setup"></a>](#setting-up-a-kubeflow-repository-to-use-prow-a-idprow-setupa)
 - [Writing An Argo Workflow For An E2E Test](#writing-an-argo-workflow-for-an-e2e-test)
 - [Adding an E2E test to a repository](#adding-an-e2e-test-to-a-repository)
+- [Python function](#python-function)
+- [ksonnet](#ksonnet)
+- [Using pytest to write tests](#using-pytest-to-write-tests)
 - [Prow Variables](#prow-variables)
 - [Argo Spec](#argo-spec)
 - [Creating K8s resources in tests.](#creating-k8s-resources-in-tests)
@@ -176,6 +181,65 @@ gcloud one liners for fetching logs.
 
 ## Debugging Failed Tests
 
+### Logs and Cluster Access for Kubeflow CI
+
+Our tests are split across three projects:
+
+* **k8s-prow-builds**
+
+  * This is owned by the prow team
+  * This is where the prow jobs run
+  * We are working on changing this; see [kubeflow/testing#475](https://github.com/kubeflow/testing/issues/475)
+
+* **kubeflow-ci**
+
+  * This is where the Argo E2E workflows kicked off by the prow jobs run
+  * This is where other Kubeflow test infra (e.g. various cron jobs) runs
+
+* **kubeflow-ci-deployment**
+
+  * This is the project where E2E tests actually create Kubeflow clusters
+
+
+#### Access Control
+
+We currently have the following levels of access:
+
+* **ci-viewer-only**
+
+  * This is controlled by the group [ci-viewer](https://github.com/kubeflow/internal-acls/blob/master/ci-viewer.members.txt)
+
+    * This group grants viewer-only access to projects **kubeflow-ci** and **kubeflow-ci-deployment**
+    * This provides access to Stackdriver for both projects
+
+  * Folks making regular and continual contributions to Kubeflow and in need of access to debug
+    tests can generally have access
+
+* **ci-edit/admin**
+
+  * This is controlled by the group [ci-team](https://github.com/kubeflow/internal-acls/blob/master/ci-team.members.txt)
+
+    * This group grants permissions necessary to administer the infrastructure running in **kubeflow-ci** and **kubeflow-ci-deployment**
+
+  * Access to this group is highly restricted since this is critical infrastructure for the project
+
+    * Following standard operating procedures we want to limit the number of folks with direct access to infrastructure
+
+    * Rather than granting more people access we want to develop scalable practices that eliminate the need for
+      granting large numbers of people access (e.g. developing git ops processes)
+
+* **example-maintainers**
+
+  * This is controlled by the group [example-maintainers](https://github.com/kubeflow/internal-acls/blob/master/example-maintainers.members.txt)
+
+    * This group provides more direct access to the Kubeflow clusters running in **kubeflow-ci-deployment**
+
+  * This group is intended for the folks actively developing and maintaining tests for Kubeflow examples
+
+    * Continuous testing for Kubeflow examples should run against regularly updated, auto-deployed clusters in project **kubeflow-ci-deployment**
+
+    * Example maintainers are granted elevated access to these clusters in order to facilitate development of these tests
+
 ### No results show up in Gubernator
 
 If no results show up in Gubernator this means the prow job didn't get far enough to upload any results/logs to GCS.
@@ -210,6 +274,10 @@ To access the stackdriver logs
 
 ### No Logs in Argo UI For Step or Pod Id missing in Argo Logs
 
+The Argo UI surfaces logs for a pod only if Kubernetes hasn't deleted the pod yet.
+
+Using Stackdriver to fetch pod logs is more reliable and durable, but requires viewer permissions for Kubeflow's CI infrastructure.
+
 An Argo workflow fails and you click on the failed step in the Argo UI to get the logs
 and you see the error
 
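As a rough illustration of the Stackdriver route described in this hunk, here is a minimal sketch that builds a `gcloud logging read` command for a single pod. The helper names, the example pod name, and the choice of output format are hypothetical; it also assumes the cluster writes logs under the `k8s_container` resource type and that the workflow pods run in the **kubeflow-ci** project, so adjust the filter if your logs are labeled differently.

```python
import shlex


def pod_log_filter(pod_name, namespace=None):
    """Build a Stackdriver filter matching one pod (assumes k8s_container logs)."""
    clauses = [
        'resource.type="k8s_container"',
        f'resource.labels.pod_name="{pod_name}"',
    ]
    if namespace:
        clauses.append(f'resource.labels.namespace_name="{namespace}"')
    return " AND ".join(clauses)


def gcloud_read_command(pod_name, project="kubeflow-ci", namespace=None, limit=1000):
    """Return the argv list for `gcloud logging read`; nothing is executed here."""
    return [
        "gcloud", "logging", "read", pod_log_filter(pod_name, namespace),
        f"--project={project}",
        f"--limit={limit}",
        "--format=value(textPayload)",
    ]


if __name__ == "__main__":
    # Hypothetical pod name, used only to show the shape of the command.
    print(shlex.join(gcloud_read_command("my-workflow-pod-123")))
```

Running the printed command requires the viewer permissions mentioned above; the sketch itself only assembles the arguments.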
@@ -795,6 +863,52 @@ Follow these steps to add a new test to a repository.
 * **params**: A dictionary of parameters to set on the ksonnet component e.g. by running `ks param set ${COMPONENT} ${PARAM_NAME} ${PARAM_VALUE}`
 
 
+### Using pytest to write tests
+
+* [pytest](https://docs.pytest.org/en/latest/) is really useful for writing tests
+
+  * Results can be emitted as junit files, which is what prow needs to report test results
+  * It provides [annotations](http://doc.pytest.org/en/latest/skipping.html) to skip tests or mark flaky tests as expected to fail
+
+* Use pytest to easily script various checks
+
+  * For example [kf_is_ready_test.py](https://github.com/kubeflow/kubeflow/blob/master/testing/kfctl/kf_is_ready_test.py)
+    uses some simple scripting to test that various K8s objects are deployed and healthy
+
+* Pytest provides fixtures for setting additional attributes in the junit files ([docs](http://doc.pytest.org/en/latest/usage.html))
+
+  * In particular [record_xml_attribute](http://doc.pytest.org/en/latest/usage.html#record-xml-attribute) allows us to set attributes
+    that control how the results are grouped in TestGrid
+
+    * **name** - This is the name shown in TestGrid
+
+      * TestGrid supports [grouping](https://github.com/kubernetes/test-infra/tree/master/testgrid#grouping-tests) by splitting the tests into a hierarchy based on the name
+
+      * **recommendation** - Leverage this feature by naming tests to support grouping; e.g. use the pattern
+
+        ```
+        {WORKFLOW_NAME}/{PY_FUNC_NAME}
+        ```
+
+        * **WORKFLOW_NAME** - Workflow name as set in prow_config.yaml
+        * **PY_FUNC_NAME** - The name of the Python test function
+
+        * util.py provides the helper method `set_pytest_junit` to set the required attributes
+        * run_e2e_workflow.py will pass the argument `test_target_name` to the Python function that creates the Argo workflow
+
+          * Use this argument to set the environment variable **TEST_TARGET_NAME** on all Argo pods.
+
+    * **classname** - TestGrid uses **classname** as the test target and allows results to be grouped by name
+
+      * **recommendation** - Set the classname to the workflow name as defined in **prow_config.yaml**
+
+        * This allows easy grouping of tests by the entries defined in **prow_config.yaml**
+
+        * Each entry in **prow_config.yaml** usually corresponds to a different configuration, e.g. "GCP with IAP" vs. "GCP with basic auth"
+
+        * So the workflow name is a natural grouping
+
+
 ### Prow Variables
 
 * For each test run PROW defines several variables that pass useful information to your job.
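The pytest naming recommendation added in this hunk can be sketched roughly as follows. This is not the `set_pytest_junit` helper from util.py; `junit_attributes` and `apply_junit_attributes` are hypothetical names used for illustration, and the sketch assumes **TEST_TARGET_NAME** has been set on the pod as described above. It relies only on pytest's `record_xml_attribute` fixture, which is called as `record_xml_attribute(name, value)`.

```python
import os


def junit_attributes(test_func_name, environ=None):
    """Compute the junit attributes suggested above for one test function."""
    environ = os.environ if environ is None else environ
    # TEST_TARGET_NAME carries the workflow name from prow_config.yaml.
    target = environ.get("TEST_TARGET_NAME", "")
    if not target:
        # Without a target name there is nothing to group by.
        return {"name": test_func_name}
    return {
        # {WORKFLOW_NAME}/{PY_FUNC_NAME}
        "name": f"{target}/{test_func_name}",
        "classname": target,
    }


def apply_junit_attributes(record_xml_attribute, test_func_name, environ=None):
    """Record each attribute through the fixture (or any two-argument callable)."""
    for key, value in junit_attributes(test_func_name, environ).items():
        record_xml_attribute(key, value)


def test_kf_is_ready(record_xml_attribute):
    # pytest injects the record_xml_attribute fixture by argument name.
    apply_junit_attributes(record_xml_attribute, "test_kf_is_ready")
    # ... the actual readiness checks would follow here ...
```

Taking the fixture as a plain callable keeps the helper easy to exercise without running pytest, for example by passing `some_dict.__setitem__` in its place.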
