# Contributing to `bevy_audio`

This document lays out some general explanations and guidelines for
contributing code to this crate. It assumes knowledge of programming, but not
necessarily of audio programming specifically. It adds rules, on top of the
general programming and contribution guidelines of Bevy, that are of
particular interest for performance reasons.

This section applies to code at the same level of abstraction as working with
nodes in the render graph, not to manipulating entities with meshes and
materials.

Note that these guidelines apply to any audio programming application, not
just Bevy.

## Fundamentals of working with audio

### A brief introduction to digital audio signals

Inside a computer, audio signals are digital streams of audio samples
(historically of various types, but nowadays 32-bit floats), taken at regular
intervals.

How often this sampling happens is determined by the **sample rate** parameter.
This parameter is exposed to users in OS settings, as well as in some
applications.

The sample rate directly determines the spectrum of audio frequencies that the
system can represent. That limit sits at half the sample rate, meaning that any
sound containing frequencies higher than that will introduce artifacts.

If you want to learn more, read about the **Nyquist sampling theorem** and
**frequency aliasing**.

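To make aliasing concrete, here is a small sketch of the frequency a pure tone
folds back to after sampling (the helper name `aliased_frequency` is
illustrative, not part of `bevy_audio`):

```rust
// Sketch: where a pure tone lands after being sampled at `sample_rate`.
// Frequencies above half the sample rate "fold" back into the audible band.
fn aliased_frequency(f: f64, sample_rate: f64) -> f64 {
    // Fold `f` into the representable band [0, sample_rate / 2].
    let f = f.rem_euclid(sample_rate);
    if f > sample_rate / 2.0 { sample_rate - f } else { f }
}

fn main() {
    // A 30 kHz tone sampled at 48 kHz is indistinguishable from an 18 kHz tone.
    println!("{}", aliased_frequency(30_000.0, 48_000.0)); // 18000
    // Anything at or below the 24 kHz limit is represented unchanged.
    println!("{}", aliased_frequency(10_000.0, 48_000.0)); // 10000
}
```
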
### How the computer interfaces with the sound card

When audio input or output is requested, the OS creates a special
high-priority thread whose task is to consume the input audio stream and/or
produce the output stream. The audio driver passes an audio buffer that you read
from (for input) or write to (for output). The size of that buffer is also a
parameter configured when opening an audio stream with the sound card, and is
sometimes reflected in application settings.

Typical values are a buffer size of 512 samples at a sample rate of 48 kHz.
This means that for every 512 samples of audio the driver sends to the sound
card, the output callback function runs once in this high-priority audio
thread. Every second, as dictated by the sample rate, the sound card needs
48 000 samples of audio data, so we can expect the callback function to run
every `512 / 48000 Hz`, or about 10.67 ms.

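The arithmetic above can be sketched as a small program (the function name is
illustrative):

```rust
// Sketch: how often the audio callback runs, given the stream configuration.
fn callback_interval_ms(buffer_size: u32, sample_rate: u32) -> f64 {
    buffer_size as f64 / sample_rate as f64 * 1000.0
}

fn main() {
    // 512 samples at 48 kHz: the callback fires roughly every 10.67 ms.
    println!("{:.3} ms", callback_interval_ms(512, 48_000));
}
```
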
This figure is also the latency of the audio engine, that is, how much time
passes between a user interaction and hearing its effect through the speakers.
There is therefore a "tug of war" between decreasing the buffer size for
latency reasons and increasing it for performance reasons. The threshold for
perceived instantaneity in audio is around 15 ms, which is why 512 is a good
value for interactive applications.

### Real-time programming

The parts of the code running in the audio thread have exactly
`buffer_size / sample_rate` seconds to complete, beyond which the audio driver
outputs silence (or worse, the previous buffer again, or garbage data), which
the user perceives as a glitch and which severely degrades the quality of the
engine's audio output. It is therefore critical that this code is guaranteed to
finish in that time.

One step toward achieving this is making sure that all machines across the
spectrum of supported CPUs can reliably perform the computations needed for the
game in that amount of time, and playing with the buffer size to find the best
compromise between latency and performance. Another is to conditionally enable
certain effects on more powerful CPUs, when that is possible.

But the main step is to write the code running in the audio thread following
real-time programming guidelines. Real-time programming is a set of constraints
on code and data structures that guarantees the code completes within a bounded
amount of time, i.e. it cannot get stuck in an infinite loop, nor can it
trigger a deadlock.

Practically, real-time programming is mainly about using wait-free and
lock-free structures. Examples of things that are *not* allowed in real-time
code are:

- Allocating anything on the heap (that is, no direct or indirect creation of a
  `Vec`, `Box`, or any standard collection, as they are not designed with
  real-time programming in mind)

- Locking a mutex: generally, any kind of system call gives the OS the
  opportunity to pause the thread, which is an unbounded operation, as we don't
  know how long the thread will stay paused

- Waiting by looping until some condition is met (also called a spinloop or
  spinlock)

Writing wait-free and lock-free structures is a hard task, and difficult to get
right; however, many such structures already exist and can be used directly.
There are crates providing replacements for most standard collections.

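As a minimal sketch of the most common mitigation for the heap-allocation rule
(all names here are illustrative, not `bevy_audio` API): allocate every buffer
up front on the control thread, and only index into it from the callback:

```rust
// Sketch: pre-allocate scratch storage before real-time processing starts,
// so that the audio callback itself never touches the allocator.
struct DspState {
    scratch: Vec<f32>, // allocated once, at setup time
}

impl DspState {
    fn new(max_buffer_size: usize) -> Self {
        Self { scratch: vec![0.0; max_buffer_size] }
    }

    // Real-time safe: writes in place, never grows `scratch`.
    fn process(&mut self, output: &mut [f32], gain: f32) {
        let scratch = &mut self.scratch[..output.len()];
        for (s, out) in scratch.iter_mut().zip(output.iter_mut()) {
            *s = 0.25; // stand-in for a generated sample
            *out = *s * gain;
        }
    }
}

fn main() {
    let mut state = DspState::new(512);      // control thread, setup time
    let mut output = vec![0.0f32; 512];      // in practice the driver owns this
    state.process(&mut output, 0.5);         // audio thread: no allocation
    println!("{}", output[0]); // 0.125
}
```
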
### Where in the code should real-time programming principles be applied?

Any code that is directly or indirectly called from the audio thread needs to
be real-time safe.

For the Bevy engine, that is:

- In the callbacks of `cpal::Device::build_input_stream` and
  `cpal::Device::build_output_stream`, and all functions called from them

- In implementations of the [`Source`] trait, and all functions called from
  them

Code run in Bevy systems does not need to be real-time safe, as it is not run
in the audio thread, but in the main game loop thread.

## Communication with the audio thread

To do anything useful with audio, the audio thread has to be able to
communicate with the rest of the system, i.e. update parameters, send/receive
audio data, etc., and all of that needs to happen within the constraints of
real-time programming.

### Audio parameters

In most cases, an audio parameter can be represented by an atomic
floating-point value, where the game loop updates the parameter and the change
gets picked up when the next buffer is processed. The downside of this approach
is that the audio only changes once per audio callback, which results in a
noticeable "stair-step" motion of the parameter. This can be mitigated by
"smoothing" the change over time, using a tween or linear/exponential
smoothing.

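A minimal sketch of this pattern, assuming a hypothetical `SharedParam` wrapper
(Rust has no `AtomicF32`, so the float's bits are stored in an `AtomicU32`):

```rust
use std::sync::atomic::{AtomicU32, Ordering};

// Sketch: an atomic f32 parameter shared between the game loop and the
// audio thread, plus per-sample exponential smoothing on the audio side.
struct SharedParam(AtomicU32);

impl SharedParam {
    fn new(v: f32) -> Self { Self(AtomicU32::new(v.to_bits())) }
    // Called from the game loop.
    fn set(&self, v: f32) { self.0.store(v.to_bits(), Ordering::Relaxed); }
    // Called from the audio thread: wait-free, never blocks.
    fn get(&self) -> f32 { f32::from_bits(self.0.load(Ordering::Relaxed)) }
}

fn main() {
    let volume = SharedParam::new(0.0);
    volume.set(1.0); // the game loop moves the target

    // In the callback, ease toward the target instead of jumping to it.
    let target = volume.get();
    let mut smoothed = 0.0f32;
    for _ in 0..4 {
        smoothed += 0.5 * (target - smoothed); // exponential smoothing step
    }
    println!("{}", smoothed); // 0.9375 after 4 steps, approaching 1.0
}
```
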
Precise timing for non-interactive events (e.g. on the beat) needs to be set up
using a clock backed by the audio driver; that is, counting the number of
samples processed, and dividing by the sample rate to get the number of seconds
elapsed. The precise sample at which the parameter needs to change can then be
computed.

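A minimal sketch of such a sample-driven clock, with hypothetical names:

```rust
// Sketch: a sample-accurate clock derived from the number of samples the
// audio callback has processed so far.
struct SampleClock {
    samples_processed: u64,
    sample_rate: u32,
}

impl SampleClock {
    // Elapsed time = samples processed divided by the sample rate.
    fn seconds_elapsed(&self) -> f64 {
        self.samples_processed as f64 / self.sample_rate as f64
    }
    // The exact sample index at which an event scheduled at `t` seconds lands.
    fn sample_for_time(&self, t: f64) -> u64 {
        (t * self.sample_rate as f64).round() as u64
    }
}

fn main() {
    let clock = SampleClock { samples_processed: 24_000, sample_rate: 48_000 };
    println!("{}", clock.seconds_elapsed());     // 0.5
    println!("{}", clock.sample_for_time(0.75)); // 36000
}
```
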
Both interactive and precise events are hard to do, and need very low latency
(e.g. 64 or 128 samples for ~2 ms of latency); it is fundamentally impossible
to react to a user event the very moment it is registered.

### Audio data

Audio data is generally transferred between threads with circular buffers, as
they are simple to implement, fast enough for 99% of use cases, and both
wait-free and lock-free. The only difficulty in using circular buffers is
deciding how big they should be; however, even 1 s of mono audio at 48 kHz only
costs about 192 kB of memory (48 000 samples of 4 bytes each), which is small
enough to go unnoticed even with potentially hundreds of such buffers.

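A toy single-producer single-consumer ring buffer illustrating the idea
(single-threaded here for brevity; a real implementation, such as the `rtrb` or
`ringbuf` crates, splits producer and consumer into separate handles):

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

// Sketch of an SPSC circular buffer for audio samples: fixed capacity,
// atomic indices, and no blocking on either side.
struct Ring {
    buf: Vec<f32>,      // capacity fixed up front: no allocation afterwards
    write: AtomicUsize, // advanced only by the producer
    read: AtomicUsize,  // advanced only by the consumer
}

impl Ring {
    fn new(capacity: usize) -> Self {
        Self {
            buf: vec![0.0; capacity],
            write: AtomicUsize::new(0),
            read: AtomicUsize::new(0),
        }
    }

    // Producer side: returns false (instead of blocking) when full.
    fn push(&mut self, v: f32) -> bool {
        let w = self.write.load(Ordering::Relaxed);
        let next = (w + 1) % self.buf.len();
        if next == self.read.load(Ordering::Acquire) {
            return false; // full: drop or retry later, but never wait
        }
        self.buf[w] = v;
        self.write.store(next, Ordering::Release);
        true
    }

    // Consumer side: returns None (instead of blocking) when empty.
    fn pop(&mut self) -> Option<f32> {
        let r = self.read.load(Ordering::Relaxed);
        if r == self.write.load(Ordering::Acquire) {
            return None; // empty
        }
        let v = self.buf[r];
        self.read.store((r + 1) % self.buf.len(), Ordering::Release);
        Some(v)
    }
}

fn main() {
    let mut ring = Ring::new(4); // usable capacity is 3: one slot stays empty
    assert!(ring.push(1.0) && ring.push(2.0) && ring.push(3.0));
    assert!(!ring.push(4.0)); // full: reports failure instead of blocking
    assert_eq!(ring.pop(), Some(1.0));
    println!("ok");
}
```
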
## Additional resources for audio programming

A more in-depth article about audio programming:
<http://www.rossbencina.com/code/real-time-audio-programming-101-time-waits-for-nothing>

Awesome Audio DSP: <https://github.com/BillyDM/awesome-audio-dsp>