[nexus] the support bundle task should execute diag commands concurrently #7461

papertigers
Contributor

@papertigers papertigers commented Jan 31, 2025

This PR adds some of the sled-diagnostics crate's commands that were not yet being collected. Additionally, we now have an array of commands that will be run concurrently (currently limited to 10 at a time), which we can add to as more support commands become available.
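
For readers unfamiliar with the pattern, here is a minimal sketch of running a list of commands with a concurrency cap. It is not the code from this PR: the PR drives an async stream (see the `diag_cmds.next().await` loop in the review below), whereas this sketch swaps in plain `std` threads so it stays self-contained. The names `DiagCmd` and `run_bounded` are hypothetical.

```rust
use std::collections::VecDeque;
use std::sync::{Arc, Mutex};
use std::thread;

/// Hypothetical stand-in for one diagnostic command:
/// a label plus a closure that produces the command's output.
type DiagCmd = (&'static str, Box<dyn FnOnce() -> String + Send>);

/// Run `cmds` with at most `limit` of them executing at once.
/// Returns (label, output) pairs in completion order.
fn run_bounded(cmds: Vec<DiagCmd>, limit: usize) -> Vec<(&'static str, String)> {
    let queue = Arc::new(Mutex::new(VecDeque::from(cmds)));
    let results = Arc::new(Mutex::new(Vec::new()));

    // Spawn `limit` workers; each pulls the next command until the queue is empty.
    let workers: Vec<_> = (0..limit.max(1))
        .map(|_| {
            let queue = Arc::clone(&queue);
            let results = Arc::clone(&results);
            thread::spawn(move || loop {
                // Hold the queue lock only long enough to pop one command.
                let next = queue.lock().unwrap().pop_front();
                let Some((label, cmd)) = next else { break };
                let output = cmd();
                results.lock().unwrap().push((label, output));
            })
        })
        .collect();

    for worker in workers {
        worker.join().unwrap();
    }

    // All workers have exited, so this is the only remaining reference.
    Arc::try_unwrap(results).unwrap().into_inner().unwrap()
}
```

Calling `run_bounded(cmds, 10)` would mirror the limit of 10 mentioned above. Adding a new command is then just a matter of pushing another entry onto the list.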

Created using spr 1.3.6-beta.1
Comment on lines 651 to 653
while let Some(result) = diag_cmds.next().await {
result?;
}
Collaborator

Just to confirm, this will exit only if we fail to write the output of the command, not if the commands themselves fail, right? Just want to make sure that one failing command on a sled would not short-circuit everything else.

Contributor Author

Right, it's only going to fail if the tokio::write call fails on the support bundle file itself.

But thinking about this, should we instead log that we failed to write to the file, rather than potentially not getting through the entire array?

Contributor

Agree it's better to get partial results than nothing at all.

Contributor Author

Fixed in 6b02fb3
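
To illustrate the trade-off discussed in this thread, here is a minimal sketch contrasting the two ways of draining the results. It is an assumption-laden illustration, not the contents of the fix commit: it uses a plain `Vec` of `io::Result<()>` instead of the async stream, and the function names `drain_fail_fast` and `drain_log_and_continue` are hypothetical.

```rust
use std::io;

/// Short-circuiting variant, like `result?;` in the reviewed loop:
/// the first write error ends the loop and later results are never inspected.
fn drain_fail_fast(results: Vec<io::Result<()>>) -> io::Result<usize> {
    let mut written = 0;
    for result in results {
        result?;
        written += 1;
    }
    Ok(written)
}

/// Log-and-continue variant: every result is inspected, failures are
/// logged and collected, and the remaining outputs are still counted.
fn drain_log_and_continue(results: Vec<io::Result<()>>) -> (usize, Vec<String>) {
    let mut written = 0;
    let mut failures = Vec::new();
    for result in results {
        match result {
            Ok(()) => written += 1,
            Err(e) => {
                // A real service would use its structured logger here.
                eprintln!("failed to write diag command output: {e}");
                failures.push(e.to_string());
            }
        }
    }
    (written, failures)
}
```

With one failing write in the middle of three, the first function returns an error after a single success, while the second reports two successful writes and one logged failure, which is the "partial results" behavior the reviewers asked for.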

@papertigers
Contributor Author

The test failure seems related to oxidecomputer/buildomat#62 so I am going to re-run it.

@papertigers papertigers enabled auto-merge (squash) February 28, 2025 21:41
@papertigers
Contributor Author

Re-running failed CI job due to oxidecomputer/helios#191

@papertigers papertigers merged commit f6f85a7 into main Mar 3, 2025
16 checks passed
@papertigers papertigers deleted the spr/papertigers/nexus-the-support-bundle-task-should-execute-diag-commands-concurrently branch March 3, 2025 23:45
papertigers added a commit that referenced this pull request Mar 4, 2025
Diagnostic commands now output the JSON-serialized value rather than the
debug output for the inner type.

This is on top of:
- #7461
3 participants