Skip to content

cosmo-hf: add limited busy-polls in all wait paths #2055

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Draft
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

hawkw
Copy link
Member

@hawkw hawkw commented May 2, 2025

Presently, the cosmo-hf driver has three functions that wait for some status to change before continuing: wait_fpga_busy, wait_fpga_rx, and wait_flash_busy. All of these functions check the desired condition in a loop and sleep for 1ms when it is not satisfied. Currently, wait_fpga_busy will first perform a limited number of busy-polls before starting to sleep between polls, but the other functions do not. While investigating HF write performance on Grapefruit,
@cbiffle and I found that commenting out the sleeps in these polling functions improved HF write performance significantly: writing 2MiB of random data using humility qspi -D went from taking about a minute to about 36 seconds --- that's a 40% performance improvement.

This branch is like a slightly more principled version of commenting out the sleep_for calls. I've generalized the pervious wait_fpga_busy of doing 32 busy-polls before starting to sleep to something that can be used by all three wait-for-condition type functions. I did change the code for implementing this a bit from the previous thing, which implemented the counter using a for i in 0.. loop that checked i > 32 This was a bit more concise and kind of a cute trick, but had a somewhat surprising behavior: the loop wasn't actually infinite, and would terminate after u32::MAX polls. Of course, polling for 4,294,967,295ms means we would have waited about 50 days for a write to complete before terminating the loop unexpectedly, and if that happened, something would have to be REALLY broken. But, it's still a pretty wrong behavior, so the implementation I've done --- using a while loop with an explicitly incremented counter --- will never terminate unless the condition has actually changed, which seems a little less anxiety-inducing.

@hawkw hawkw requested review from cbiffle and mkeeter May 2, 2025 18:22
@hawkw
Copy link
Member Author

hawkw commented May 2, 2025

I don't have a Cosmo or Grapefruit readily available for testing this, so I've opened it as a draft for now. If @mkeeter or @cbiffle wants to give this a quick smoke test, I can un-draft it, otherwise I'll go into the office some time next week and try it out on a Grapefruit from the executive bathroom.

// desired status transition occurs in less than 1ms, avoiding a 1-2ms
// sleep and round-trip through the scheduler. status transitions
// quickly.
const MAX_BUSY_POLLS: u32 = 32;
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

32 may be more busy-polls than are really necessary, at least for wait_fpga_rx; @cbiffle and I saw it take about one poll before becoming ready pretty consistently, on Grapefruit. But, I figured we'd keep the current number for now.

Presently, the `cosmo-hf` driver has three functions that wait for some
status to change before continuing: `wait_fpga_busy`, `wait_fpga_rx`,
and `wait_flash_busy`. All of these functions check the desired
condition in a loop and sleep for 1ms when it is not satisfied.
Currently, `wait_fpga_busy` will first perform a limited number of
busy-polls before starting to sleep between polls, but the other
functions do not. While investigating HF write performance on
Grapefruit,
@cbiffle and I found that commenting out the sleeps in these polling
functions improved HF write performance significantly: writing 2MiB of
random data using `humility qspi -D` went from taking about a minute to
about 36 seconds --- that's a 40% performance improvement.

This branch is like a slightly more principled version of commenting out
the `sleep_for` calls. I've generalized the pervious `wait_fpga_busy` of
doing 32 busy-polls before starting to sleep to something that can be
used by all three wait-for-condition type functions. I did change the
code for implementing this a bit from the previous thing, which
implemented the counter using a `for i in 0..` loop that checked `i >
32` This was a bit more concise and kind  of a cute trick, but had a
somewhat surprising behavior: the loop wasn't actually infinite, and
would terminate after `u32::MAX` polls. Of course, polling for
4,294,967,295ms means we would have waited about 50 days for a write to
complete before terminating the loop unexpectedly, and if that happened,
something would have to be REALLY broken. But, it's still a pretty wrong
behavior, so the implementation I've done --- using a `while` loop with an explicitly incremented counter --- will never terminate unless the condition has actually changed, which seems a little less anxiety-inducing.
@hawkw hawkw force-pushed the eliza/hostflash-turbo-mode branch from 5206d4d to 406d9b1 Compare May 2, 2025 18:45
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant