
Commit 9d9a55a

Update docs for steps to take if CI fails (dotnet#32548)
* Update docs for steps to take if CI fails
* update
* more
* more
* more
* include dumps
* more
* more
* typo
1 parent e4d7893 commit 9d9a55a

1 file changed: +53 -12 lines

docs/pr-guide.md

Lines changed: 53 additions & 12 deletions
@@ -25,21 +25,62 @@ Anyone with write access can merge a pull request manually or by setting the [au
 * The PR has been approved by at least one reviewer and any other objections are addressed.
   * You can request another review from the original reviewer.
 * The PR successfully builds and passes all tests in the Continuous Integration (CI) system.
-  * You can trigger a rebuild by adding a comment like `/azp run <pipeline name>` or manually re-run only the failing lanes in Azure DevOps menu or on GitHub Checks tab clicking on "re-run failed checks" or "re-run all checks" if you want to re-run all.
-  * You can list the available pipelines by adding a comment like `/azp list` or get the available commands by adding a comment like `azp help`.
-  * Reach out to the infrastructure team for assistance on [Teams channel](https://teams.microsoft.com/l/channel/19%3ab27b36ecd10a46398da76b02f0411de7%40thread.skype/Infrastructure?groupId=014ca51d-be57-47fa-9628-a15efcc3c376&tenantId=72f988bf-86f1-41af-91ab-2d7cd011db47) (for corpnet users) or on [Gitter](https://gitter.im/dotnet/community) in other cases.
+  * Depending on your change, you may need to re-run validation. See [rerunning validation](#rerunning-validation) below.

 Please always **squash** the pull request unless there are special circumstances. Do so, even if the PR contains only one commit. It creates a simpler history than a Merge Commit. "Special circumstances" are rare, and typically mean that there are a series of cleanly separated changes that will be too hard to understand if squashed together, or for some reason we want to preserve the ability to bisect them.

-## Unrelated failure
-
-In case CI indicates failures which are **highly unlikely** to be caused by changes in the PR, the following actions should be taken:
-
-* An existing issue in the repository should be searched for. Usually the test method's or the test assembly's name (in case of a crash) are good parameters.
-* If there's an existing issue, a comment should be placed that includes a) the link to the build, b) the affected configuration (ie `netcoreapp-Windows_NT-Release-x64-Windows.81.Amd64.Open`) and c) the Error message and Stack trace. This is necessary as retention policies are in place that recycle _old_ builds. In case the issue is already closed, it should be reopened and labels should be updated to reflect the current failure state.
-* If there's no existing issue, an issue should be created with the same information outlined above.
-* In a follow-up Pull Request, the failing test(s) should be disabled with the corresponding issue link, e.g. `[ActiveIssue(x)]`, and the tracking issue should be labeled as `disabled-test`.
-* A comment should be placed in the original Pull Request that links to the created or updated issues.
+## Rerunning Validation
+
+Validation may fail for several reasons:
+
+### Option 1: You have a defect in your PR
+
+* Simply push the fix to your PR branch, and validation will start over.
+
+### Option 2: There is a flaky test that is not related to your PR
+
+* Your assumption should be that a failed test indicates a problem in your PR. (If we don't operate this way, chaos ensues.) If the test fails when run again, it is almost surely a failure caused by your PR. However, there are occasions where unrelated failures occur. Here are some ways to know:
+  * Perhaps you see the same failure in CI results for unrelated active PRs.
+  * It's a known issue listed in our [big tracking issue](https://github.com/dotnet/runtime/issues/702) or tagged `blocking-clean-ci` [(query here)](https://github.com/dotnet/runtime/issues?utf8=%E2%9C%93&q=is%3Aissue+is%3Aopen+label%3Ablocking-clean-ci+).
+  * It's otherwise beyond any reasonable doubt that your code changes could not have caused this.
+  * If the tests pass on rerun, that may suggest it's not related.
+* In this situation, you want to re-run but not necessarily rebase on master.
+* To rerun just the failed leg(s):
+  * Click on any leg. Navigate through the Azure DevOps UI, find the "..." button and choose "Retry failed legs".
+  * Or, on the GitHub Checks tab choose "re-run failed checks". This will not rebase your change.
+* To rerun all validation:
+  * Add a comment `/azp run runtime`.
+  * Or, click on "re-run all checks" in the GitHub Checks tab.
+  * Or, simply close and reopen the PR.
+* If you have established that it is an unrelated failure, please ensure we have an active issue for it. See the [unrelated failure](#what-to-do-if-you-determine-the-failure-is-unrelated) section below.
+* Whoever merges the PR should be satisfied that the failure is unrelated, is not introduced by the change, and that we are appropriately tracking it.
+
+### Option 3: The state of the master branch HEAD is bad
+
+* This is the very rare case where there was a build break in master, and you got unlucky. If the break has since been fixed, you want CI to rebase your change and rerun validation.
+* To rebase and rerun all validation:
+  * Add a comment `/azp run runtime`.
+  * Or, click on "re-run all checks" in the GitHub Checks tab.
+  * Or, simply close and reopen the PR.
+
+### Additional information
+* You can list the available pipelines by adding a comment like `/azp list` or get the available commands by adding a comment like `/azp help`.
+* Reach out to the infrastructure team for assistance on the [Teams channel](https://teams.microsoft.com/l/channel/19%3ab27b36ecd10a46398da76b02f0411de7%40thread.skype/Infrastructure?groupId=014ca51d-be57-47fa-9628-a15efcc3c376&tenantId=72f988bf-86f1-41af-91ab-2d7cd011db47) (for corpnet users) or on [Gitter](https://gitter.im/dotnet/community) in other cases.
+
+## What to do if you determine the failure is unrelated
+
+If you have determined the failure is definitely not caused by changes in your PR, please do the following:
+
+* Search for an [existing issue](https://github.com/dotnet/runtime/issues). Usually the test method name or, for a crash or hang, the test assembly name is a good search term.
+* If there's an existing issue, add a comment with:
+  * a) the link to the build
+  * b) the affected configuration (e.g. `netcoreapp-Windows_NT-Release-x64-Windows.81.Amd64.Open`)
+  * c) all console output, including the error message and stack trace, from the Azure DevOps tab (this is necessary because retention policies are in place that recycle old builds)
+  * d) the dump file, if there is one (see the Attachments tab in Azure DevOps)
+* If the issue is already closed, reopen it and update the labels to reflect the current failure state.
+* If there's no existing issue, create an issue with the same information listed above.
+* Update the original pull request with a comment linking to the new or existing issue.
+* In a follow-up Pull Request, disable the failing test(s) with the corresponding issue link, e.g. `[ActiveIssue(x)]`, and update the tracking issue with the label `disabled-test`.

 There are plenty of possible bugs, e.g. race conditions, where a failure might highlight a real problem and it won't manifest again on a retry. Therefore these steps should be followed for every iteration of the PR build, e.g. before retrying/rebuilding.

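The updated guide has contributors trigger a full rerun by commenting `/azp run runtime` on the PR, and discover pipelines or commands with `/azp list` and `/azp help`. For anyone scripting this, here is a minimal Python sketch. It assumes GitHub's public REST endpoint for issue comments (pull requests share it) and a personal access token you supply; the helper name `build_azp_comment_request` and the PR number are hypothetical, and the request is only built, not sent.

```python
import json
import urllib.request

GITHUB_API = "https://api.github.com"


def build_azp_comment_request(owner: str, repo: str, pr_number: int, token: str,
                              command: str = "run runtime") -> urllib.request.Request:
    """Hypothetical helper: build (but do not send) a POST that comments `/azp <command>`.

    Pull requests share GitHub's issue-comment endpoint, so the PR number is
    used as the issue number.
    """
    url = f"{GITHUB_API}/repos/{owner}/{repo}/issues/{pr_number}/comments"
    payload = json.dumps({"body": f"/azp {command}"}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=payload,
        method="POST",
        headers={
            "Accept": "application/vnd.github+json",
            "Authorization": f"Bearer {token}",  # assumption: a token you supply
            "Content-Type": "application/json",
        },
    )


if __name__ == "__main__":
    # Hypothetical PR number and placeholder token; nothing goes over the network here.
    req = build_azp_comment_request("dotnet", "runtime", 12345, token="<token>")
    print(req.get_method(), req.full_url)
    print(req.data.decode("utf-8"))
    # Actually posting the comment would be: urllib.request.urlopen(req)
```

Passing `command="list"` or `command="help"` produces the `/azp list` and `/azp help` comments instead. Building the request separately from sending it keeps the sketch easy to inspect before it touches a real PR.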
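The first step for an unrelated failure is to search existing issues by test method name or, for a crash or hang, by test assembly name. A small sketch of building that search link, assuming a browser-ready URL is all you need; the helper `issue_search_url` and the test name in the usage line are hypothetical. The query deliberately omits `is:open`, since the guide says a closed issue should be reopened rather than duplicated.

```python
import urllib.parse


def issue_search_url(test_name: str, repo: str = "dotnet/runtime") -> str:
    """Hypothetical helper: link to a search of `repo`'s issues for `test_name`.

    `is:open` is intentionally left out so closed issues show up too.
    """
    query = f'is:issue "{test_name}"'  # quoted to search for the exact name
    return f"https://github.com/{repo}/issues?" + urllib.parse.urlencode({"q": query})


if __name__ == "__main__":
    print(issue_search_url("MyNamespace.MyTests.MyFailingTest"))  # hypothetical test name
```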
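When commenting on an existing issue or filing a new one, the guide asks for a) the build link, b) the affected configuration, c) the full console output, and d) any dump file. A hypothetical helper that assembles those items into one Markdown comment might look like the sketch below. The layout (bold labels and a collapsible `<details>` block for the console output) is an assumption, not a format the guide prescribes.

```python
from typing import Optional

FENCE = "`" * 3  # a Markdown code fence (three backticks)


def format_failure_comment(build_url: str, configuration: str,
                           console_output: str, dump_url: Optional[str] = None) -> str:
    """Hypothetical helper: assemble items a)-d) into one Markdown comment."""
    lines = [
        f"**Build:** {build_url}",                     # a) link to the build
        f"**Configuration:** `{configuration}`",       # b) affected configuration
        "",
        "<details><summary>Console output</summary>",  # c) full console output
        "",
        FENCE,
        console_output.rstrip("\n"),
        FENCE,
        "",
        "</details>",
    ]
    if dump_url:                                       # d) dump file, only if one exists
        lines += ["", f"**Dump file:** {dump_url}"]
    return "\n".join(lines)


if __name__ == "__main__":
    print(format_failure_comment(
        build_url="https://example.invalid/build/1",   # placeholder, not a real build
        configuration="netcoreapp-Windows_NT-Release-x64-Windows.81.Amd64.Open",
        console_output="Assert.Equal() Failure\n   at MyTests.MyFailingTest()\n",
    ))
```

Keeping the whole console output (not just the last error line) matters here because, as the guide notes, retention policies eventually recycle old builds.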