# Contributing to `bevy_audio`

This document lays out some general explanations and guidelines for
contributing code to this crate. It assumes knowledge of programming, but not
necessarily of audio programming specifically. It adds rules, on top of the
general programming and contribution guidelines of Bevy, that are of
particular interest for performance reasons.

This section applies to code at the same level of abstraction as working with
nodes in the render graph, not to manipulating entities with meshes and
materials.

Note that these guidelines apply to any audio programming application, not
just Bevy.

## Fundamentals of working with audio

### A brief introduction to digital audio signals

Inside a computer, audio signals are digital streams of audio samples
(historically of various types, but nowadays 32-bit floats), taken at regular
intervals.

How often this sampling happens is determined by the **sample rate** parameter.
This parameter is exposed to users in OS settings, as well as in some
applications.

The sample rate directly determines the spectrum of audio frequencies that the
system can represent. That limit sits at half the sample rate, meaning that any
sound containing frequencies higher than that will introduce artifacts.

If you want to learn more, read about the **Nyquist sampling theorem** and
**frequency aliasing**.

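To make aliasing concrete, here is a small sketch of the frequency a pure tone
folds back to after sampling (the helper name `aliased_frequency` is
illustrative, not part of `bevy_audio`):

```rust
// Sketch: where a pure tone lands after being sampled at `sample_rate`.
// Frequencies above half the sample rate "fold" back into the audible band.
fn aliased_frequency(f: f64, sample_rate: f64) -> f64 {
    // Fold `f` into the representable band [0, sample_rate / 2].
    let f = f.rem_euclid(sample_rate);
    if f > sample_rate / 2.0 { sample_rate - f } else { f }
}

fn main() {
    // A 30 kHz tone sampled at 48 kHz is indistinguishable from an 18 kHz tone.
    println!("{}", aliased_frequency(30_000.0, 48_000.0)); // 18000
    // Anything at or below the 24 kHz limit is represented unchanged.
    println!("{}", aliased_frequency(10_000.0, 48_000.0)); // 10000
}
```
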
### How the computer interfaces with the sound card

When audio input or output is requested, the OS creates a special
high-priority thread whose task is to consume the input audio stream and/or
produce the output stream. The audio driver passes an audio buffer that you read
from (for input) or write to (for output). The size of that buffer is also a
parameter configured when opening an audio stream with the sound card, and is
sometimes reflected in application settings.

Typical values are a buffer size of 512 samples at a sample rate of 48 kHz.
This means that for every 512 samples of audio the driver sends to the sound
card, the output callback function runs once in this high-priority audio
thread. Every second, as dictated by the sample rate, the sound card needs
48 000 samples of audio data, so we can expect the callback function to run
every `512 / 48000 Hz`, or about 10.67 ms.

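The arithmetic above can be sketched as a small program (the function name is
illustrative):

```rust
// Sketch: how often the audio callback runs, given the stream configuration.
fn callback_interval_ms(buffer_size: u32, sample_rate: u32) -> f64 {
    buffer_size as f64 / sample_rate as f64 * 1000.0
}

fn main() {
    // 512 samples at 48 kHz: the callback fires roughly every 10.67 ms.
    println!("{:.3} ms", callback_interval_ms(512, 48_000));
}
```
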
This figure is also the latency of the audio engine, that is, how much time
passes between a user interaction and hearing its effect through the speakers.
There is therefore a "tug of war" between decreasing the buffer size for
latency reasons and increasing it for performance reasons. The threshold for
perceived instantaneity in audio is around 15 ms, which is why 512 is a good
value for interactive applications.

### Real-time programming

The parts of the code running in the audio thread have exactly
`buffer_size / sample_rate` seconds to complete, beyond which the audio driver
outputs silence (or worse, the previous buffer again, or garbage data), which
the user perceives as a glitch and which severely degrades the quality of the
engine's audio output. It is therefore critical that this code is guaranteed to
finish in that time.

One step toward achieving this is making sure that all machines across the
spectrum of supported CPUs can reliably perform the computations needed for the
game in that amount of time, and playing with the buffer size to find the best
compromise between latency and performance. Another is to conditionally enable
certain effects on more powerful CPUs, when that is possible.

But the main step is to write the code running in the audio thread following
real-time programming guidelines. Real-time programming is a set of constraints
on code and data structures that guarantees the code completes within a bounded
amount of time, i.e. it cannot get stuck in an infinite loop, nor can it
trigger a deadlock.

Practically, real-time programming is mainly about using wait-free and
lock-free structures. Examples of things that are *not* allowed in real-time
code are:

- Allocating anything on the heap (that is, no direct or indirect creation of a
  `Vec`, `Box`, or any standard collection, as they are not designed with
  real-time programming in mind)

- Locking a mutex: generally, any kind of system call gives the OS the
  opportunity to pause the thread, which is an unbounded operation, as we don't
  know how long the thread will stay paused

- Waiting by looping until some condition is met (also called a spinloop or
  spinlock)

Writing wait-free and lock-free structures is a hard task, and difficult to get
right; however, many such structures already exist and can be used directly.
There are crates providing replacements for most standard collections.

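As a minimal sketch of the most common mitigation for the heap-allocation rule
(all names here are illustrative, not `bevy_audio` API): allocate every buffer
up front on the control thread, and only index into it from the callback:

```rust
// Sketch: pre-allocate scratch storage before real-time processing starts,
// so that the audio callback itself never touches the allocator.
struct DspState {
    scratch: Vec<f32>, // allocated once, at setup time
}

impl DspState {
    fn new(max_buffer_size: usize) -> Self {
        Self { scratch: vec![0.0; max_buffer_size] }
    }

    // Real-time safe: writes in place, never grows `scratch`.
    fn process(&mut self, output: &mut [f32], gain: f32) {
        let scratch = &mut self.scratch[..output.len()];
        for (s, out) in scratch.iter_mut().zip(output.iter_mut()) {
            *s = 0.25; // stand-in for a generated sample
            *out = *s * gain;
        }
    }
}

fn main() {
    let mut state = DspState::new(512);      // control thread, setup time
    let mut output = vec![0.0f32; 512];      // in practice the driver owns this
    state.process(&mut output, 0.5);         // audio thread: no allocation
    println!("{}", output[0]); // 0.125
}
```
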
### Where in the code should real-time programming principles be applied?

Any code that is directly or indirectly called from the audio thread needs to
be real-time safe.

For the Bevy engine, that is:

- In the callbacks of `cpal::Device::build_input_stream` and
  `cpal::Device::build_output_stream`, and all functions called from them

- In implementations of the [`Source`] trait, and all functions called from
  them

Code run in Bevy systems does not need to be real-time safe, as it is not run
in the audio thread, but in the main game loop thread.

## Communication with the audio thread

To do anything useful with audio, the audio thread has to be able to
communicate with the rest of the system, i.e. update parameters, send/receive
audio data, etc., and all of that needs to happen within the constraints of
real-time programming.

### Audio parameters

In most cases, an audio parameter can be represented by an atomic
floating-point value, where the game loop updates the parameter and the change
gets picked up when the next buffer is processed. The downside of this approach
is that the audio only changes once per audio callback, which results in a
noticeable "stair-step" motion of the parameter. This can be mitigated by
"smoothing" the change over time, using a tween or linear/exponential
smoothing.

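A minimal sketch of this pattern, assuming a hypothetical `SharedParam` wrapper
(Rust has no `AtomicF32`, so the float's bits are stored in an `AtomicU32`):

```rust
use std::sync::atomic::{AtomicU32, Ordering};

// Sketch: an atomic f32 parameter shared between the game loop and the
// audio thread, plus per-sample exponential smoothing on the audio side.
struct SharedParam(AtomicU32);

impl SharedParam {
    fn new(v: f32) -> Self { Self(AtomicU32::new(v.to_bits())) }
    // Called from the game loop.
    fn set(&self, v: f32) { self.0.store(v.to_bits(), Ordering::Relaxed); }
    // Called from the audio thread: wait-free, never blocks.
    fn get(&self) -> f32 { f32::from_bits(self.0.load(Ordering::Relaxed)) }
}

fn main() {
    let volume = SharedParam::new(0.0);
    volume.set(1.0); // the game loop moves the target

    // In the callback, ease toward the target instead of jumping to it.
    let target = volume.get();
    let mut smoothed = 0.0f32;
    for _ in 0..4 {
        smoothed += 0.5 * (target - smoothed); // exponential smoothing step
    }
    println!("{}", smoothed); // 0.9375 after 4 steps, approaching 1.0
}
```
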
Precise timing for non-interactive events (e.g. on the beat) needs to be set up
using a clock backed by the audio driver; that is, counting the number of
samples processed, and dividing by the sample rate to get the number of seconds
elapsed. The precise sample at which the parameter needs to change can then be
computed.

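A minimal sketch of such a sample-driven clock, with hypothetical names:

```rust
// Sketch: a sample-accurate clock derived from the number of samples the
// audio callback has processed so far.
struct SampleClock {
    samples_processed: u64,
    sample_rate: u32,
}

impl SampleClock {
    // Elapsed time = samples processed divided by the sample rate.
    fn seconds_elapsed(&self) -> f64 {
        self.samples_processed as f64 / self.sample_rate as f64
    }
    // The exact sample index at which an event scheduled at `t` seconds lands.
    fn sample_for_time(&self, t: f64) -> u64 {
        (t * self.sample_rate as f64).round() as u64
    }
}

fn main() {
    let clock = SampleClock { samples_processed: 24_000, sample_rate: 48_000 };
    println!("{}", clock.seconds_elapsed());     // 0.5
    println!("{}", clock.sample_for_time(0.75)); // 36000
}
```
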
Both interactive and precise events are hard to do, and need very low latency
(e.g. 64 or 128 samples for ~2 ms of latency); it is fundamentally impossible
to react to a user event the very moment it is registered.

### Audio data

Audio data is generally transferred between threads with circular buffers, as
they are simple to implement, fast enough for 99% of use cases, and both
wait-free and lock-free. The only difficulty in using circular buffers is
deciding how big they should be; however, even 1 s of mono audio at 48 kHz only
costs about 192 kB of memory (48 000 samples of 4 bytes each), which is small
enough to go unnoticed even with potentially hundreds of such buffers.

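A toy single-producer single-consumer ring buffer illustrating the idea
(single-threaded here for brevity; a real implementation, such as the `rtrb` or
`ringbuf` crates, splits producer and consumer into separate handles):

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

// Sketch of an SPSC circular buffer for audio samples: fixed capacity,
// atomic indices, and no blocking on either side.
struct Ring {
    buf: Vec<f32>,      // capacity fixed up front: no allocation afterwards
    write: AtomicUsize, // advanced only by the producer
    read: AtomicUsize,  // advanced only by the consumer
}

impl Ring {
    fn new(capacity: usize) -> Self {
        Self {
            buf: vec![0.0; capacity],
            write: AtomicUsize::new(0),
            read: AtomicUsize::new(0),
        }
    }

    // Producer side: returns false (instead of blocking) when full.
    fn push(&mut self, v: f32) -> bool {
        let w = self.write.load(Ordering::Relaxed);
        let next = (w + 1) % self.buf.len();
        if next == self.read.load(Ordering::Acquire) {
            return false; // full: drop or retry later, but never wait
        }
        self.buf[w] = v;
        self.write.store(next, Ordering::Release);
        true
    }

    // Consumer side: returns None (instead of blocking) when empty.
    fn pop(&mut self) -> Option<f32> {
        let r = self.read.load(Ordering::Relaxed);
        if r == self.write.load(Ordering::Acquire) {
            return None; // empty
        }
        let v = self.buf[r];
        self.read.store((r + 1) % self.buf.len(), Ordering::Release);
        Some(v)
    }
}

fn main() {
    let mut ring = Ring::new(4); // usable capacity is 3: one slot stays empty
    assert!(ring.push(1.0) && ring.push(2.0) && ring.push(3.0));
    assert!(!ring.push(4.0)); // full: reports failure instead of blocking
    assert_eq!(ring.pop(), Some(1.0));
    println!("ok");
}
```
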
## Additional resources for audio programming

A more in-depth article about audio programming:
<http://www.rossbencina.com/code/real-time-audio-programming-101-time-waits-for-nothing>

Awesome Audio DSP: <https://github.com/BillyDM/awesome-audio-dsp>