# Contributing to `bevy_audio`

This document highlights some general explanations and guidelines for
contributing code to this crate. It assumes knowledge of programming, but not
necessarily of audio programming specifically. It lays out rules to follow, on
top of the general programming and contribution guidelines of Bevy, that are of
particular interest for performance reasons.

These guidelines apply to work at an abstraction level equivalent to working
with nodes in the render graph, not to manipulating entities with meshes and
materials.

Note that these guidelines apply to audio programming in general, and not just
to Bevy.

## Fundamentals of working with audio

### A brief introduction to digital audio signals

Inside a computer, audio signals are digital streams of audio samples
(historically stored with different types, but nowadays the values are 32-bit
floats), taken at regular intervals.

How often this sampling is done is determined by the **sample rate** parameter.
This parameter is exposed to users in OS settings, as well as in some
applications.

The sample rate directly determines the range of audio frequencies that the
system can represent. That limit sits at half the sample rate (for example, a
48 kHz stream can only represent frequencies up to 24 kHz), meaning that any
sound containing higher frequencies will introduce artifacts.

If you want to learn more, read about the **Nyquist sampling theorem** and
**frequency aliasing**.

### How the computer interfaces with the sound card

When requesting audio input or output, the OS creates a special high-priority
thread whose task is to take in the input audio stream and/or produce the
output stream. The audio driver passes an audio buffer that you read from (for
input) or write to (for output). The size of that buffer is also a parameter
that is configured when opening an audio stream with the sound card, and is
sometimes reflected in application settings.

Typical values are a buffer size of 512 samples at a sample rate of 48 kHz.
This means that for every 512 samples of audio the driver sends to the sound
card, the output callback function is run once in this high-priority audio
thread. Every second, as dictated by the sample rate, the sound card needs
48 000 samples of audio data, so we can expect the callback function to be run
every `512/(48000 Hz)`, or about 10.7 ms.

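As a quick sanity check, this interval can be computed directly from those two
numbers; the snippet below is plain arithmetic, not tied to any particular
audio API:

```rust
/// Time between two audio callbacks, in milliseconds.
fn callback_period_ms(buffer_size: u32, sample_rate: u32) -> f64 {
    buffer_size as f64 / sample_rate as f64 * 1000.0
}

fn main() {
    // 512 samples at 48 kHz -> ~10.67 ms between callbacks.
    println!("{:.2} ms", callback_period_ms(512, 48_000));
}
```
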
This figure is also the latency of the audio engine, that is, how much time
passes between a user interaction and hearing its effect out of the speakers.
There is therefore a "tug of war" between decreasing the buffer size for
latency reasons and increasing it for performance reasons. The threshold for
perceived instantaneity in audio is around 15 ms, which is why 512 samples is a
good value for interactive applications.

### Real-time programming

The code running in the audio thread has exactly `buffer_size / sample_rate`
seconds to complete, beyond which the audio driver outputs silence (or worse,
repeats the previous buffer, or outputs garbage data), which the user perceives
as a glitch and which severely deteriorates the quality of the engine's audio
output. It is therefore critical to only run code that is guaranteed to finish
in that time.

One step towards achieving this is making sure that all machines across the
spectrum of supported CPUs can reliably perform the computations the game needs
in that amount of time, and playing with the buffer size to find the best
compromise between latency and performance. Another is to conditionally enable
certain effects on more powerful CPUs, when that is possible.

But the main step is to write the code that runs in the audio thread following
real-time programming guidelines. Real-time programming is a set of constraints
on code and data structures that guarantees the code completes in bounded time,
i.e. it can neither get stuck in an infinite loop nor trigger a deadlock.

Practically, the main components of real-time programming are about using
wait-free and lock-free structures. Examples of things that are *not* correct
in real-time programming are:

- Allocating anything on the heap (that is, no direct or indirect creation of a
`Vec`, `Box`, or any standard collection, as they are not designed with
real-time programming in mind; see the sketch after this list)

- Locking a mutex. More generally, any kind of system call gives the OS the
opportunity to pause the thread, which is an unbounded operation, as we don't
know how long the thread is going to stay paused

- Waiting by looping until some condition is met (also called a spinloop or a
spinlock)

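As an illustration of the first point, here is a minimal sketch (not actual
`bevy_audio` code) contrasting a gain function that allocates inside the
callback with a real-time-safe version that reuses a buffer allocated up front:

```rust
// NOT real-time safe: allocates a new `Vec` on the heap on every call.
fn apply_gain_bad(input: &[f32], gain: f32) -> Vec<f32> {
    input.iter().map(|s| s * gain).collect()
}

// Real-time safe: writes into an output buffer that was allocated once,
// outside the audio thread, and is merely reused here.
fn apply_gain_good(input: &[f32], output: &mut [f32], gain: f32) {
    for (out, sample) in output.iter_mut().zip(input) {
        *out = sample * gain;
    }
}
```
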
Writing wait-free and lock-free structures is a hard task, and difficult to get
correct; however, many such structures already exist and can be used directly.
There are crates providing replacements for most standard collections.

### Where in the code should real-time programming principles be applied?

Any code that is directly or indirectly called from an audio thread needs to be
real-time safe.

For the Bevy engine, that is:

- In the callbacks passed to cpal's `build_input_stream` and
`build_output_stream`, and all functions called from them (a minimal sketch of
such a callback follows this list)

- In implementations of the [`Source`] trait, and all functions called from it

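For orientation, here is a minimal sketch of opening an output stream with
cpal (assuming cpal 0.15's API and a device whose default sample format is
`f32`; error handling is shortened with `expect`). Everything inside the data
callback runs on the audio thread and therefore has to be real-time safe:

```rust
use cpal::traits::{DeviceTrait, HostTrait, StreamTrait};

fn main() {
    let host = cpal::default_host();
    let device = host
        .default_output_device()
        .expect("no output device available");
    let config = device.default_output_config().expect("no default config");

    let stream = device
        .build_output_stream(
            &config.into(),
            // This closure runs on the high-priority audio thread:
            // no allocations, no locks, no blocking calls in here.
            move |data: &mut [f32], _info: &cpal::OutputCallbackInfo| {
                for sample in data.iter_mut() {
                    *sample = 0.0; // silence; a real engine mixes sources here
                }
            },
            |err| eprintln!("output stream error: {err}"),
            None, // optional timeout, cpal 0.15+
        )
        .expect("failed to build the output stream");

    stream.play().expect("failed to start the stream");
    std::thread::sleep(std::time::Duration::from_secs(1));
}
```
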
Code that is run in Bevy systems does not need to be real-time safe, as it is
not run in the audio thread, but in the main game loop thread.

## Communication with the audio thread

To be able to do anything useful with audio, the audio thread has to be able to
communicate with the rest of the system, i.e. update parameters, send and
receive audio data, etc., and all of that needs to be done within the
constraints of real-time programming.

### Audio parameters

In most cases, an audio parameter can be represented by an atomic floating
point value, where the game loop updates the parameter and the audio thread
picks it up when processing the next buffer. The downside of this approach is
that the value only changes once per audio callback, which results in a
noticeable "stair-step" motion of the parameter. This can be mitigated by
"smoothing" the change over time, using a tween or linear/exponential
smoothing.

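A minimal sketch of that idea (illustrative only, not `bevy_audio`'s actual
API): the standard library has no `AtomicF32`, so the value is stored as its
bit pattern in an `AtomicU32`, and the audio thread smooths towards the latest
target on every sample:

```rust
use std::sync::{
    atomic::{AtomicU32, Ordering},
    Arc,
};

/// A parameter shared between the game loop and the audio thread.
struct SharedParam(AtomicU32);

impl SharedParam {
    fn new(value: f32) -> Arc<Self> {
        Arc::new(Self(AtomicU32::new(value.to_bits())))
    }

    /// Called from the game loop (e.g. from a Bevy system).
    fn set(&self, value: f32) {
        self.0.store(value.to_bits(), Ordering::Relaxed);
    }

    /// Called from the audio thread; wait-free, no locks involved.
    fn get(&self) -> f32 {
        f32::from_bits(self.0.load(Ordering::Relaxed))
    }
}

/// Applies the volume to a buffer, smoothing towards the latest target on
/// every sample to avoid audible "stair-step" jumps.
fn apply_volume(volume: &SharedParam, current: &mut f32, buffer: &mut [f32]) {
    let target = volume.get();
    for sample in buffer.iter_mut() {
        // Simple one-pole (exponential) smoothing.
        *current += (target - *current) * 0.001;
        *sample *= *current;
    }
}
```
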
Precise timing for non-interactive events (e.g. on the beat) needs to be set up
using a clock backed by the audio driver -- that is, counting the number of
samples processed, and deriving the time elapsed by dividing that count by the
sample rate to get the number of seconds elapsed. The precise sample at which
the parameter needs to change can then be computed.

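Such a sample clock could look like the following sketch (a hypothetical
helper, for illustration): the audio callback advances the counter by the
buffer size, and events are scheduled by converting their target time into a
sample index:

```rust
/// A clock driven by the audio callback: `samples_processed` is advanced by
/// the buffer size every time a buffer is rendered.
struct SampleClock {
    samples_processed: u64,
    sample_rate: u32,
}

impl SampleClock {
    /// Seconds elapsed since the stream started.
    fn seconds(&self) -> f64 {
        self.samples_processed as f64 / self.sample_rate as f64
    }

    /// The absolute sample index at which an event scheduled at
    /// `event_time_seconds` should take effect.
    fn sample_index_at(&self, event_time_seconds: f64) -> u64 {
        (event_time_seconds * self.sample_rate as f64).round() as u64
    }
}
```
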
Both interactive and precisely timed events are hard to do, and need very low
latency (e.g. a buffer size of 64 or 128 samples, roughly 1.3 to 2.7 ms of
latency at 48 kHz). It is fundamentally impossible to react to a user event the
very moment it is registered.

### Audio data

Audio data is generally transferred between threads with circular buffers, as
they are simple to implement, fast enough for 99% of use cases, and both
wait-free and lock-free. The only difficulty in using circular buffers is
deciding how big they should be; however, even 1 s of mono audio at 48 kHz only
costs about 192 kB of memory (48 000 samples of 4 bytes each), which is small
enough to go unnoticed even with potentially hundreds of those buffers.

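In practice you would reach for an existing SPSC ring buffer crate rather than
rolling your own, but a minimal sketch of the mechanism (illustrative only;
samples are stored as bit patterns in `AtomicU32` slots so that no locks or
unsafe code are needed) looks like this:

```rust
use std::sync::{
    atomic::{AtomicU32, AtomicUsize, Ordering},
    Arc,
};

/// Minimal wait-free single-producer single-consumer ring buffer.
/// All allocation happens up front in `new`; `push` and `pop` never allocate,
/// lock, or block, so `pop` is safe to call from the audio thread.
struct SpscRing {
    slots: Vec<AtomicU32>, // samples stored as f32 bit patterns
    write: AtomicUsize,    // total samples written (producer side)
    read: AtomicUsize,     // total samples read (consumer side)
}

impl SpscRing {
    fn new(capacity: usize) -> Arc<Self> {
        Arc::new(Self {
            slots: (0..capacity).map(|_| AtomicU32::new(0)).collect(),
            write: AtomicUsize::new(0),
            read: AtomicUsize::new(0),
        })
    }

    /// Producer side (game thread). Returns `false` if the buffer is full.
    fn push(&self, sample: f32) -> bool {
        let write = self.write.load(Ordering::Relaxed);
        let read = self.read.load(Ordering::Acquire);
        if write - read == self.slots.len() {
            return false; // full, drop the sample or retry later
        }
        let slot = &self.slots[write % self.slots.len()];
        slot.store(sample.to_bits(), Ordering::Relaxed);
        self.write.store(write + 1, Ordering::Release);
        true
    }

    /// Consumer side (audio thread). Returns `None` if the buffer is empty.
    fn pop(&self) -> Option<f32> {
        let read = self.read.load(Ordering::Relaxed);
        let write = self.write.load(Ordering::Acquire);
        if read == write {
            return None; // empty, output silence instead
        }
        let bits = self.slots[read % self.slots.len()].load(Ordering::Relaxed);
        self.read.store(read + 1, Ordering::Release);
        Some(f32::from_bits(bits))
    }
}
```
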
## Additional resources for audio programming

A more in-depth article about real-time audio programming:
<http://www.rossbencina.com/code/real-time-audio-programming-101-time-waits-for-nothing>

Awesome Audio DSP: <https://github.com/BillyDM/awesome-audio-dsp>