Commit 729d0a8

SolarLiner authored and spectria-limina committed
Contribution guidelines for bevy_audio (#12338)
# Objective

Provide guidelines for contributing code to `bevy_audio`, with a focus on the critical sections of the audio engine.

## Changelog

Added to the crate-level documentation comment with a section introducing audio programming, real-time safety and why it is important to audio programming, as well as recommendations for some programming use-cases. The section concludes with links to more resources about audio programming.

I might have gone overboard with the writeup, but I didn't want to assume a lot out of potential `bevy_audio` contributors, and so I spent a bit of time defining terms as simply as I could. I didn't want to pressure people to do so, but the first link on the additional resources should really be "required reading" as it goes more in depth about the why and how of audio programming.

---------

Co-authored-by: Nathan Graule <[email protected]>
1 parent d1b08e4 commit 729d0a8

File tree

2 files changed: +152 −1


crates/bevy_audio/CONTRIBUTING.md

Lines changed: 152 additions & 0 deletions
@@ -0,0 +1,152 @@
# Contributing to `bevy_audio`

This document provides some general explanations and guidelines for
contributing code to this crate. It assumes knowledge of programming, but not
necessarily of audio programming specifically. It lays out rules to follow, on
top of Bevy's general programming and contribution guidelines, that are of
particular interest for performance reasons.

These guidelines apply at the abstraction level of working with
nodes in the render graph, not at that of manipulating entities with meshes and
materials.

Note that these guidelines are general to any audio programming application, and
not just Bevy.

## Fundamentals of working with audio

### A brief introduction to digital audio signals

Audio signals, inside a computer, are digital streams of audio
samples (historically of various types, but nowadays the values are 32-bit
floats), taken at regular intervals.

How often this sampling is done is determined by the **sample rate** parameter.
This parameter is exposed to users in OS settings, as well as in some
applications.

The sample rate directly determines the spectrum of audio frequencies
representable by the system. That limit sits at half the sample rate, meaning
that any sound containing frequencies higher than that will introduce artifacts.

If you want to learn more, read about the **Nyquist sampling theorem** and
**frequency aliasing**.
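
As a small, self-contained sketch of these definitions (the constants and function names below are illustrative, not Bevy APIs), here is how a sampled sine tone and the Nyquist limit relate:

```rust
/// Highest frequency representable at a given sample rate (the Nyquist limit).
fn nyquist(sample_rate: u32) -> u32 {
    sample_rate / 2
}

/// Generate `seconds` of a sine tone at `freq` Hz, sampled at `sample_rate`.
fn sine_buffer(freq: f32, seconds: f32, sample_rate: u32) -> Vec<f32> {
    let n = (sample_rate as f32 * seconds) as usize;
    (0..n)
        .map(|i| {
            // Time of the i-th sample, in seconds.
            let t = i as f32 / sample_rate as f32;
            (2.0 * std::f32::consts::PI * freq * t).sin()
        })
        .collect()
}

fn main() {
    let buf = sine_buffer(440.0, 1.0, 48_000);
    // One second of audio at 48 kHz is 48 000 samples.
    assert_eq!(buf.len(), 48_000);
    // 440 Hz is well below the 24 kHz Nyquist limit, so it is
    // representable without aliasing.
    assert!(440 < nyquist(48_000));
}
```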

### How the computer interfaces with the sound card

When requesting audio input or output, the OS creates a special
high-priority thread whose task is to take in the input audio stream and/or
produce the output stream. The audio driver passes an audio buffer that you read
from (for input) or write to (for output). The size of that buffer is another
parameter configured when opening an audio stream with the sound card,
and is sometimes reflected in application settings.

Typical values for buffer size and sample rate are 512 samples at a sample rate
of 48 kHz. This means that for every 512 samples of audio the driver is going to
send to the sound card, the output callback function is run in this high-priority
audio thread. Every second, as dictated by the sample rate, the sound card
needs 48 000 samples of audio data. We can therefore expect the callback
function to run every `512 / 48000 Hz`, or about 10.67 ms.
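
The arithmetic above can be checked with a short sketch (function name is illustrative):

```rust
/// Time between audio callbacks, in milliseconds, for a given
/// buffer size and sample rate.
fn callback_interval_ms(buffer_size: u32, sample_rate: u32) -> f64 {
    buffer_size as f64 / sample_rate as f64 * 1000.0
}

fn main() {
    let ms = callback_interval_ms(512, 48_000);
    // 512 samples at 48 kHz -> roughly 10.67 ms between callbacks.
    assert!((ms - 10.6666).abs() < 0.01);
    println!("callback every {ms:.3} ms");
}
```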

This figure is also the latency of the audio engine, that is, how much time
passes between a user interaction and hearing its effect through the speakers.
There is therefore a "tug of war" between decreasing the buffer size for
latency reasons and increasing it for performance reasons. The threshold for
perceived instantaneity in audio is around 15 ms, which is why 512 samples is a
good value for interactive applications.

### Real-time programming

The parts of the code running in the audio thread have exactly
`buffer_size / sample_rate` seconds to complete, beyond which the audio driver
outputs silence (or worse, the previous buffer again, or garbage data), which
the user perceives as a glitch and which severely degrades the quality of the
engine's audio output. It is therefore critical that this code is
guaranteed to finish in that time.

One step toward achieving this is making sure that all machines across the
spectrum of supported CPUs can reliably perform the computations the game needs
in that amount of time, and tuning the buffer size to find the best
compromise between latency and performance. Another is to conditionally enable
certain effects on more powerful CPUs, when that is possible.

But the main step is to write the code that runs in the audio thread following
real-time programming guidelines. Real-time programming is a set of constraints
on code and data structures that guarantees the code completes in bounded time,
i.e. it cannot get stuck in an infinite loop nor trigger a deadlock.

In practice, real-time programming mainly means using
wait-free and lock-free structures. Examples of things that are *not* allowed in
real-time programming are:

- Allocating anything on the heap (that is, no direct or indirect creation of a
  `Vec`, `Box`, or any standard collection, as they are not designed with
  real-time programming in mind)

- Locking a mutex. More generally, any kind of system call gives the OS the
  opportunity to pause the thread, which is an unbounded operation, as we don't
  know how long the thread is going to be paused for

- Waiting by looping until some condition is met (also called a spinloop or
  spinlock)

Writing wait-free and lock-free structures is hard, and difficult to get
correct; however, many such structures already exist and can be used directly.
There are crates providing real-time-safe replacements for most standard
collections.

### Where in the code should real-time programming principles be applied?

Any code that is directly or indirectly called by audio threads needs to be
real-time safe.

For the Bevy engine, that is:

- In the callbacks of `cpal::Stream::build_input_stream` and
  `cpal::Stream::build_output_stream`, and all functions called from them

- In implementations of the [`Source`] trait, and all functions called from them

Code that runs in Bevy systems does not need to be real-time safe, as it is
not run in the audio thread, but in the main game loop thread.

## Communication with the audio thread

To be able to do anything useful with audio, the audio thread has to
communicate with the rest of the system, i.e. update parameters, send/receive
audio data, etc., and all of that needs to be done within the constraints of
real-time programming, of course.

### Audio parameters

In most cases, audio parameters can be represented by an atomic floating-point
value, where the game loop updates the parameter, and it gets picked up when
processing the next buffer. The downside of this approach is that the parameter
only changes once per audio callback, which results in a noticeable "stair-step"
motion. This can be mitigated by "smoothing" the change
over time, using a tween or linear/exponential smoothing.
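
A minimal sketch of this approach (not Bevy's actual implementation; `SharedParam` and `process` are made-up names) stores the `f32` bits in an `AtomicU32` so both threads can access it without locking, and ramps the value linearly across the buffer:

```rust
use std::sync::atomic::{AtomicU32, Ordering};
use std::sync::Arc;

/// An f32 parameter shared between the game loop and the audio thread,
/// stored as its raw bits in an atomic u32.
struct SharedParam(AtomicU32);

impl SharedParam {
    fn new(v: f32) -> Self {
        SharedParam(AtomicU32::new(v.to_bits()))
    }
    /// Called from the game loop; never blocks the audio thread.
    fn set(&self, v: f32) {
        self.0.store(v.to_bits(), Ordering::Relaxed);
    }
    /// Called from the audio thread; a wait-free read.
    fn get(&self) -> f32 {
        f32::from_bits(self.0.load(Ordering::Relaxed))
    }
}

/// Apply the target gain with per-sample linear smoothing, avoiding
/// the "stair-step" jump at buffer boundaries.
fn process(buffer: &mut [f32], current: &mut f32, target: f32) {
    let step = (target - *current) / buffer.len() as f32;
    for s in buffer {
        *current += step;
        *s *= *current;
    }
}

fn main() {
    let param = Arc::new(SharedParam::new(1.0));
    param.set(0.5); // the game loop updates the target
    let mut gain = 1.0;
    let mut buf = vec![1.0f32; 4];
    process(&mut buf, &mut gain, param.get());
    // The gain ramps toward 0.5 instead of jumping.
    assert!((gain - 0.5).abs() < 1e-6);
}
```

Note that `Ordering::Relaxed` suffices here because each buffer only needs *some* recent value, not a synchronized one.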

Precise timing for non-interactive events (e.g. on the beat) needs to be set up
using a clock backed by the audio driver -- that is, counting the number of
samples processed, and deriving the time elapsed by dividing by the sample rate
to get the number of seconds elapsed. The precise sample at which the parameter
needs to change can then be computed.
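
Such a sample-counting clock can be sketched as follows (`SampleClock` is an illustrative name, not an existing API):

```rust
/// A clock driven by the audio callback: it counts samples processed
/// and converts between sample counts and seconds.
struct SampleClock {
    sample_rate: u32,
    samples_processed: u64,
}

impl SampleClock {
    /// Called once per audio callback with the buffer length.
    fn advance(&mut self, buffer_len: u64) {
        self.samples_processed += buffer_len;
    }
    /// Elapsed time in seconds, derived from the sample count.
    fn seconds(&self) -> f64 {
        self.samples_processed as f64 / self.sample_rate as f64
    }
    /// The sample index at which an event scheduled at `t` seconds falls.
    fn sample_for(&self, t: f64) -> u64 {
        (t * self.sample_rate as f64).round() as u64
    }
}

fn main() {
    let mut clock = SampleClock { sample_rate: 48_000, samples_processed: 0 };
    // Simulate 100 callbacks of 512 samples each.
    for _ in 0..100 {
        clock.advance(512);
    }
    assert_eq!(clock.samples_processed, 51_200);
    // A beat scheduled at 0.5 s lands exactly on sample 24 000.
    assert_eq!(clock.sample_for(0.5), 24_000);
    println!("elapsed: {:.4} s", clock.seconds());
}
```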

Events that are both interactive and precise are hard to do, and need very low
latency (i.e. 64 or 128 samples, for ~2 ms of latency). It is fundamentally
impossible to react to a user event the very moment it is registered.

### Audio data

Audio data is generally transferred between threads with circular buffers, as
they are simple to implement, fast enough for 99% of use cases, and both
wait-free and lock-free. The only difficulty in using circular buffers is
deciding how big they should be; however, even buffering 1 s of audio at 48 kHz
costs only ~200 kB of memory per channel (48 000 32-bit samples), which is small
enough not to be noticeable even with potentially hundreds of those buffers.
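
The index arithmetic behind a fixed-capacity circular buffer can be sketched as below. This single-threaded version only illustrates the wrap-around logic; a real cross-thread single-producer/single-consumer queue needs atomic indices, as provided by existing crates such as `ringbuf`:

```rust
/// A fixed-capacity circular buffer. The backing storage is allocated
/// once, up front, so the audio thread never allocates.
struct Ring {
    buf: Vec<f32>,
    head: usize, // next write position
    tail: usize, // next read position
    len: usize,
}

impl Ring {
    fn with_capacity(cap: usize) -> Self {
        Ring { buf: vec![0.0; cap], head: 0, tail: 0, len: 0 }
    }
    /// Returns false when full: dropping data beats blocking the audio thread.
    fn push(&mut self, v: f32) -> bool {
        if self.len == self.buf.len() {
            return false;
        }
        self.buf[self.head] = v;
        self.head = (self.head + 1) % self.buf.len();
        self.len += 1;
        true
    }
    fn pop(&mut self) -> Option<f32> {
        if self.len == 0 {
            return None;
        }
        let v = self.buf[self.tail];
        self.tail = (self.tail + 1) % self.buf.len();
        self.len -= 1;
        Some(v)
    }
}

fn main() {
    let mut ring = Ring::with_capacity(4);
    for i in 0..4 {
        assert!(ring.push(i as f32));
    }
    assert!(!ring.push(4.0)); // full: the push is rejected, not blocked
    assert_eq!(ring.pop(), Some(0.0));
    assert!(ring.push(4.0)); // the freed slot is reused (wrap-around)
}
```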

## Additional resources for audio programming

More in-depth article about audio programming:
<http://www.rossbencina.com/code/real-time-audio-programming-101-time-waits-for-nothing>

Awesome Audio DSP: <https://github.com/BillyDM/awesome-audio-dsp>

crates/bevy_audio/src/lib.rs

Lines changed: 0 additions & 1 deletion
@@ -19,7 +19,6 @@
 //! });
 //! }
 //! ```
-
 #![forbid(unsafe_code)]
 
 mod audio;