|
11 | 11 | - [Argo UI](#argo-ui)
|
12 | 12 | - [Stackdriver logs](#stackdriver-logs)
|
13 | 13 | - [Debugging Failed Tests](#debugging-failed-tests)
|
| 14 | + - [Logs and Cluster Access for Kubeflow CI](#logs-and-cluster-access-for-kubeflow-ci) |
| 15 | + - [Access Control](#access-control) |
14 | 16 | - [No results show up in Gubernator](#no-results-show-up-in-gubernator)
|
15 | 17 | - [No Logs in Argo UI For Step or Pod Id missing in Argo Logs](#no-logs-in-argo-ui-for-step-or-pod-id-missing-in-argo-logs)
|
16 | 18 | - [Debugging Failed Deployments](#debugging-failed-deployments)
|
|
30 | 32 | - [Setting up a Kubeflow Repository to Use Prow <a id="prow-setup"></a>](#setting-up-a-kubeflow-repository-to-use-prow-a-idprow-setupa)
|
31 | 33 | - [Writing An Argo Workflow For An E2E Test](#writing-an-argo-workflow-for-an-e2e-test)
|
32 | 34 | - [Adding an E2E test to a repository](#adding-an-e2e-test-to-a-repository)
|
| 35 | + - [Python function](#python-function) |
| 36 | + - [ksonnet](#ksonnet) |
| 37 | + - [Using pytest to write tests](#using-pytest-to-write-tests) |
33 | 38 | - [Prow Variables](#prow-variables)
|
34 | 39 | - [Argo Spec](#argo-spec)
|
35 | 40 | - [Creating K8s resources in tests.](#creating-k8s-resources-in-tests)
|
@@ -176,6 +181,65 @@ gcloud one liners for fetching logs.
|
176 | 181 |
|
177 | 182 | ## Debugging Failed Tests
|
178 | 183 |
|
| 184 | +### Logs and Cluster Access for Kubeflow CI |
| 185 | + |
| 186 | +Our tests are split across three projects: |
| 187 | + |
| 188 | +* **k8s-prow-builds** |
| 189 | + |
| 190 | + * This is owned by the prow team |
| 191 | + * This is where the prow jobs run |
| 192 | + * We are working on changing this; see [kubeflow/testing#475](https://github.com/kubeflow/testing/issues/475) |
| 193 | + |
| 194 | +* **kubeflow-ci** |
| 195 | + |
| 196 | + * This is where the Argo E2E workflows kicked off by the prow jobs run |
| 197 | + * This is where other Kubeflow test infra (e.g. various cron jobs) runs |
| 198 | + |
| 199 | +* **kubeflow-ci-deployment** |
| 200 | + |
| 201 | + * This is the project where E2E tests actually create Kubeflow clusters |
| 202 | + |
| 203 | + |
| 204 | +#### Access Control |
| 205 | + |
| 206 | +We currently have the following levels of access: |
| 207 | + |
| 208 | +* **ci-viewer-only** |
| 209 | + |
| 210 | + * This is controlled by the group [ci-viewer](https://github.com/kubeflow/internal-acls/blob/master/ci-viewer.members.txt) |
| 211 | + |
| 212 | + * This group grants viewer-only access to the projects **kubeflow-ci** and **kubeflow-ci-deployment** |
| 213 | + * This provides access to stackdriver for both projects |
| 214 | + |
| 215 | + * Contributors who make regular, ongoing contributions to Kubeflow and need access to debug |
| 216 | + tests can generally be granted access |
| 217 | + |
| 218 | +* **ci-edit/admin** |
| 219 | + |
| 220 | + * This is controlled by the group [ci-team](https://github.com/kubeflow/internal-acls/blob/master/ci-team.members.txt) |
| 221 | + |
| 222 | + * This group grants permissions necessary to administer the infrastructure running in **kubeflow-ci** and **kubeflow-ci-deployment** |
| 223 | + |
| 224 | + * Access to this group is highly restricted since this is critical infrastructure for the project |
| 225 | + |
| 226 | + * Following standard operating procedures, we want to limit the number of folks with direct access to infrastructure |
| 227 | + |
| 228 | + * Rather than granting more people access, we want to develop scalable practices that eliminate the need for |
| 229 | + granting large numbers of people access (e.g. developing GitOps processes) |
| 230 | + |
| 231 | + * **example-maintainers** |
| 232 | + |
| 233 | + * This is controlled by the group [example-maintainers](https://github.com/kubeflow/internal-acls/blob/master/example-maintainers.members.txt) |
| 234 | + |
| 235 | + * This group provides more direct access to the Kubeflow clusters running in **kubeflow-ci-deployment** |
| 236 | + |
| 237 | + * This group is intended for the folks actively developing and maintaining tests for Kubeflow examples |
| 238 | + |
| 239 | + * Continuous testing for kubeflow examples should run against regularly updated, auto-deployed clusters in project **kubeflow-ci-deployment** |
| 240 | + |
| 241 | + * Example maintainers are granted elevated access to these clusters in order to facilitate development of these tests |
| 242 | + |
179 | 243 | ### No results show up in Gubernator
|
180 | 244 |
|
181 | 245 | If no results show up in Gubernator this means the prow job didn't get far enough to upload any results/logs to GCS.
|
@@ -210,6 +274,10 @@ To access the stackdriver logs
|
210 | 274 |
|
211 | 275 | ### No Logs in Argo UI For Step or Pod Id missing in Argo Logs
|
212 | 276 |
|
| 277 | +The Argo UI will surface logs for the pod but only if the pod hasn't been deleted yet by Kubernetes. |
| 278 | + |
| 279 | +Using stackdriver to fetch pod logs is more reliable and durable, but requires viewer permissions for Kubeflow's CI infrastructure. |
| 280 | + |
213 | 281 | An Argo workflow fails and you click on the failed step in the Argo UI to get the logs
|
214 | 282 | and you see the error
|
215 | 283 |
|
@@ -795,6 +863,52 @@ Follow these steps to add a new test to a repository.
|
795 | 863 | * **params**: A dictionary of parameters to set on the ksonnet component e.g. by running `ks param set ${COMPONENT} ${PARAM_NAME} ${PARAM_VALUE}`
|
796 | 864 |
|
797 | 865 |
|
| 866 | +### Using pytest to write tests |
| 867 | +
|
| 868 | +* [pytest](https://docs.pytest.org/en/latest/) is really useful for writing tests |
| 869 | +
|
| 870 | + * Results can be emitted as junit files which is what prow needs to report test results |
| 871 | + * It provides [annotations](http://doc.pytest.org/en/latest/skipping.html) to skip tests or mark flaky tests as expected to fail |
| 872 | +
|
| 873 | +* Use pytest to easily script various checks |
| 874 | +
|
| 875 | + * For example [kf_is_ready_test.py](https://github.com/kubeflow/kubeflow/blob/master/testing/kfctl/kf_is_ready_test.py) |
| 876 | + uses some simple scripting to test that various K8s objects are deployed and healthy |
| 877 | +
|
| 878 | +* Pytest provides fixtures for setting additional attributes in the junit files ([docs](http://doc.pytest.org/en/latest/usage.html)) |
| 879 | +
|
| 880 | + * In particular [record_xml_attribute](http://doc.pytest.org/en/latest/usage.html#record-xml-attribute) allows us to set attributes |
| 881 | + that control how the results are grouped in testgrid |
| 882 | +
|
| 883 | + * **name** - This is the name shown in test grid |
| 884 | +
|
| 885 | + * Testgrid supports [grouping](https://github.com/kubernetes/test-infra/tree/master/testgrid#grouping-tests) by splitting the tests into a hierarchy based on the name
| 886 | +
|
| 887 | + * **recommendation** Leverage this feature to name tests to support grouping; e.g. use the pattern |
| 888 | +
|
| 889 | + ``` |
| 890 | + {WORKFLOW_NAME}/{PY_FUNC_NAME} |
| 891 | + ``` |
| 892 | +
|
| 893 | + * **WORKFLOW_NAME** the workflow name as set in prow_config.yaml
| 894 | + * **PY_FUNC_NAME** the name of the python test function |
| 895 | +
|
| 896 | + * util.py provides the helper method `set_pytest_junit` to set the required attributes |
| 897 | + * run_e2e_workflow.py will pass the argument `test_target_name` to your py function to create the Argo workflow |
| 898 | +
|
| 899 | + * Use this argument to set the environment variable **TEST_TARGET_NAME** on all Argo pods. |
| 900 | +
|
| 901 | + * **classname** - testgrid uses **classname** as the test target and allows results to be grouped by name |
| 902 | +
|
| 903 | + * **recommendation** - Set the classname to the workflow name as defined in **prow_config.yaml** |
| 904 | +
|
| 905 | + * This allows easy grouping of tests by the entries defined in **prow_config.yaml** |
| 906 | +
|
| 907 | + * Each entry in **prow_config.yaml** usually corresponds to a different configuration e.g. "GCP with IAP" vs. "GCP with basic auth" |
| 908 | +
|
| 909 | + * So workflow name is a natural grouping
| 910 | + |
| 911 | +
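
The naming recommendations above can be illustrated with a minimal sketch. It assumes the **TEST_TARGET_NAME** environment variable holds the workflow name from **prow_config.yaml**. The helpers `junit_test_name` and `set_junit_attributes` are hypothetical names used only for illustration; they are not the `set_pytest_junit` helper in util.py, whose signature is not shown here. The only pytest feature relied on is the `record_xml_attribute` fixture.

```python
import os


def junit_test_name(workflow_name, func_name):
    """Build a name of the form {WORKFLOW_NAME}/{PY_FUNC_NAME} for testgrid grouping."""
    return "{0}/{1}".format(workflow_name, func_name)


def set_junit_attributes(record_xml_attribute, func_name, default_target="unknown"):
    """Hypothetical helper: set the junit name and classname attributes.

    Assumes TEST_TARGET_NAME was set on the Argo pod and holds the workflow name.
    """
    target = os.getenv("TEST_TARGET_NAME", default_target)
    # name: shown in testgrid; the "/" lets testgrid split it into a hierarchy.
    record_xml_attribute("name", junit_test_name(target, func_name))
    # classname: groups results by the prow_config.yaml workflow entry.
    record_xml_attribute("classname", target)
    return target


def test_kf_is_ready(record_xml_attribute):
    # pytest injects the record_xml_attribute fixture by argument name.
    set_junit_attributes(record_xml_attribute, "test_kf_is_ready")
    # Real checks (e.g. that K8s objects are deployed and healthy) would go here.
```

Running pytest with `--junitxml=<path>` writes the junit file; the attributes set above then appear on the corresponding `testcase` element.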
|
798 | 912 | ### Prow Variables
|
799 | 913 |
|
800 | 914 | * For each test run PROW defines several variables that pass useful information to your job.
|
|