Basic support for reading and writing wave PCM files.

This early version of the library seeks to support the “canonical” wave PCM file format, as well as the basic features of the extended file format. Non-PCM files are not currently supported, nor is metadata. Future versions of the library may add support for some of these features.

The wave file format was originally defined by Microsoft, and it stores audio wave data in a container using RIFF chunks to encode the header and data. The RIFF format also supports file metadata via a LIST INFO chunk and an associated data chunk.

The wave file format

The wave file format starts with the RIFF file header:

| Offset | Size | Data       | Description |
|--------|------|------------|-------------|
| 0      | 4    | “RIFF”     | Identifies the main chunk. |
| 4      | 4    | chunk size | The size of the rest of the file. This should be equal to the size of the file minus 8 bytes. |
| 8      | 4    | “WAVE”     | Indicates that this is a wave file. |
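
As a rough illustration of the header layout above, the following sketch validates the first 12 bytes of a file and extracts the chunk size. The function name `parse_riff_header` is hypothetical and is not part of this library's API; the sketch relies on RIFF storing integers in little-endian byte order.

```rust
/// Hypothetical helper (not part of this library): checks the 12-byte
/// RIFF/WAVE header and returns the chunk size field, which should equal
/// the size of the file minus 8 bytes.
fn parse_riff_header(bytes: &[u8]) -> Result<u32, &'static str> {
    if bytes.len() < 12 {
        return Err("header is shorter than 12 bytes");
    }
    // Offset 0: the "RIFF" tag identifies the main chunk.
    if &bytes[0..4] != b"RIFF" {
        return Err("missing RIFF tag");
    }
    // Offset 8: the "WAVE" tag indicates that this is a wave file.
    if &bytes[8..12] != b"WAVE" {
        return Err("missing WAVE tag");
    }
    // Offset 4: the chunk size, stored as a little-endian 32-bit value.
    Ok(u32::from_le_bytes([bytes[4], bytes[5], bytes[6], bytes[7]]))
}
```

A caller could compare the returned value against the actual file length minus 8 to detect truncated files.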

The wave file format requires at least two subchunks which follow the main chunk:

  • The “fmt ” subchunk. This contains additional header information.
  • The “data” subchunk. This contains the actual audio data.

The “fmt ” subchunk

The “fmt ” subchunk starts with the following fields:

| Offset | Size | Data            | Description |
|--------|------|-----------------|-------------|
| 12     | 4    | “fmt ”          | Identifies this subchunk. |
| 16     | 4    | subchunk size   | The size of the rest of this subchunk. |
| 20     | 2    | format (1)      | The format of the wave data, which will be 1 for uncompressed PCM data. |
| 22     | 2    | num channels    | Indicates if the data is mono, stereo, or something else. |
| 24     | 4    | sample rate     | The number of samples per second. |
| 28     | 4    | byte rate       | The total byte rate per second. For 16-bit stereo at 44,100 samples per second, this would be equal to 176,400 bytes per second. |
| 32     | 2    | block align     | How many bytes are needed for each “frame”, where a frame is one sample for each channel. |
| 34     | 2    | bits per sample | The bits per sample; i.e. 16 for 16-bit audio. |
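
The byte rate and block align fields are derived from the other fields. A minimal sketch of that arithmetic follows; the function names are hypothetical, and the code assumes that bits per sample is a multiple of 8.

```rust
/// Bytes needed for one frame, i.e. one sample for each channel.
/// Assumes `bits_per_sample` is a multiple of 8.
fn block_align(num_channels: u16, bits_per_sample: u16) -> u16 {
    num_channels * (bits_per_sample / 8)
}

/// Total bytes of audio data per second.
fn byte_rate(sample_rate: u32, num_channels: u16, bits_per_sample: u16) -> u32 {
    sample_rate * u32::from(block_align(num_channels, bits_per_sample))
}
```

For 16-bit stereo at 44,100 samples per second, `block_align(2, 16)` is 4 and `byte_rate(44_100, 2, 16)` is 176,400.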

The format can take on various values, including the following codes:

| Value | Description |
|-------|-------------|
| 1     | Uncompressed PCM |
| 3     | IEEE floating-point |
| 6     | 8-bit ITU-T G.711 A-law |
| 7     | 8-bit ITU-T G.711 µ-law |
| 65534 | A special marker value, indicating that this is an “extended” wave file. |

This library currently only supports uncompressed PCM in standard and extended wave formats. These files will usually be either 8-bit unsigned or 16-bit signed, mono or stereo.
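
One way to model the format codes listed above is an enum with a fallback variant for unrecognized values. This is only a sketch; the `FormatCode` type and its variant names are hypothetical and are not taken from this library.

```rust
/// Hypothetical representation of the format field of the "fmt " subchunk.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum FormatCode {
    Pcm,
    IeeeFloat,
    ALaw,
    MuLaw,
    /// Marker value 65534 (0xFFFE): the real format is in the sub format field.
    Extensible,
    /// Any code not listed in the table above.
    Other(u16),
}

impl FormatCode {
    fn from_u16(code: u16) -> Self {
        match code {
            1 => FormatCode::Pcm,
            3 => FormatCode::IeeeFloat,
            6 => FormatCode::ALaw,
            7 => FormatCode::MuLaw,
            0xFFFE => FormatCode::Extensible,
            other => FormatCode::Other(other),
        }
    }
}
```

A reader that only handles uncompressed PCM could accept `Pcm` and `Extensible` here and report every other variant as unsupported.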

Wave files may include an additional field, usually reserved for non-PCM formats:

| Offset | Size | Data            | Description |
|--------|------|-----------------|-------------|
| 36     | 2    | extra info size | For non-PCM formats, this stores the size of the additional info that follows the end of the standard header. Otherwise, it is set to 0. |

Extended wave files

Some wave files may follow the extended format. In this case, the extra info size field will be at least 22 instead of 0.

| Offset | Size | Data         | Description |
|--------|------|--------------|-------------|
| 38     | 2    | sample info  | For PCM files, this contains the valid bits per sample. For example, if this is set to 20 bits and bits per sample is set to 24 bits, then 24 bits are being used to store the sample data, but the actual sample data should not exceed 20 bits of precision. |
| 40     | 4    | channel mask | This specifies the assignment of channels to speaker positions. |
| 44     | 16   | sub format   | For extended wave files, format will be set to 0xFFFE to indicate that it’s an extended wave file, with the actual format specified here as a GUID. The first two bytes are the same as specified in format code, and the remainder should match 0x00, 0x00, 0x00, 0x00, 0x10, 0x00, 0x80, 0x00, 0x00, 0xaa, 0x00, 0x38, 0x9b, and 0x71. |

The MSDN docs recommend this format for files with more than two channels or more than 16 bits per sample, but it’s also possible to encounter such wave files that don’t include these extra fields. In my testing, Android Marshmallow was able to play back 24-bit PCM wave files using both the standard format and the extensible format, generated using Audacity.
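
The sub format check can be sketched as follows: compare the last 14 bytes of the GUID against the fixed sequence listed in the table, then read the first two bytes as the format code. The names `SUB_FORMAT_TAIL` and `sub_format_code` are hypothetical and do not come from this library.

```rust
/// The 14 bytes that should follow the two-byte format code in the
/// 16-byte sub format GUID.
const SUB_FORMAT_TAIL: [u8; 14] = [
    0x00, 0x00, 0x00, 0x00, 0x10, 0x00, 0x80, 0x00, 0x00, 0xaa, 0x00, 0x38, 0x9b, 0x71,
];

/// Returns the format code stored in the sub format field, or `None` if
/// the remainder of the GUID does not match the expected bytes.
fn sub_format_code(sub_format: &[u8; 16]) -> Option<u16> {
    if sub_format[2..] == SUB_FORMAT_TAIL {
        // The first two bytes hold the format code in little-endian order.
        Some(u16::from_le_bytes([sub_format[0], sub_format[1]]))
    } else {
        None
    }
}
```

For an extended PCM file, this would be expected to return `Some(1)`.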

The “data” subchunk

The “data” subchunk contains the actual audio data:

| Offset | Size | Data          | Description |
|--------|------|---------------|-------------|
| 36+    | 4    | “data”        | Identifies this subchunk. |
| 40+    | 4    | subchunk size | The size of this chunk. For the simple “canonical” wave file format, this will generally be the size of the file minus 44 bytes for the header data, up to and including this field. |
| 44+    |      | audio data    | This stores the actual audio data. |
|        |      | padding byte  | If the length of audio data is an odd number, then an additional padding byte should be inserted. |

As the subchunk size is a 32-bit value, the length of audio data cannot exceed 4 GiB, and indeed the entire file can’t really exceed 4 GiB as the master RIFF chunk size field is also a 32-bit value.
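
To make the size limits concrete, here is a sketch that computes both size fields for a canonical file with a 44-byte header. The function name `canonical_sizes` is hypothetical. The sketch assumes the usual RIFF convention that the padding byte is not counted in the “data” subchunk size but is counted in the main chunk size.

```rust
/// Returns `(data subchunk size, RIFF chunk size)` for a canonical wave
/// file with a 44-byte header, or `None` if either value would not fit
/// in a 32-bit field.
fn canonical_sizes(audio_len: u64) -> Option<(u32, u32)> {
    if audio_len > u64::from(u32::MAX) {
        return None;
    }
    let data_size = audio_len as u32;
    // Odd-length audio data is followed by one padding byte.
    let padding = data_size % 2;
    // 36 = the 44-byte header minus the 8 bytes of the "RIFF" tag and
    // the chunk size field itself.
    let riff_size = data_size.checked_add(padding)?.checked_add(36)?;
    Some((data_size, riff_size))
}
```

Because the 36 header bytes also count toward the main chunk size, the largest usable audio length is slightly below 4 GiB.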

Additional metadata

Wave files may also contain metadata, such as the LIST INFO chunks defined by RIFF. The LIST INFO chunk is analogous to the ID3 tag in an MP3 file, and if it’s present, it can often be found between the “fmt ” and “data” subchunks or after the end of the “data” subchunk.
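
Because extra chunks can appear between the required subchunks, a reader cannot rely on fixed offsets. The sketch below walks the subchunks that follow the 12-byte RIFF header and skips any it is not looking for. The function name `find_subchunk` is hypothetical, and the sketch assumes every subchunk uses the standard layout of a 4-byte identifier, a 4-byte little-endian size, the payload, and a padding byte after odd-length payloads.

```rust
/// Searches `body` (the bytes after the 12-byte RIFF header) for the
/// subchunk with the given identifier and returns its payload.
fn find_subchunk<'a>(mut body: &'a [u8], id: &[u8; 4]) -> Option<&'a [u8]> {
    while body.len() >= 8 {
        let size = u32::from_le_bytes([body[4], body[5], body[6], body[7]]) as usize;
        let end = 8usize.checked_add(size)?;
        if body.len() < end {
            // The subchunk claims more data than is available.
            return None;
        }
        if &body[0..4] == id {
            return Some(&body[8..end]);
        }
        // Skip the payload, plus one padding byte if the size is odd.
        let next = end + (size % 2);
        body = body.get(next..)?;
    }
    None
}
```

With this approach, `find_subchunk(body, b"data")` locates the audio data whether or not a LIST chunk precedes it.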

Structs

Helper struct that takes ownership of a reader and can be used to read data from a PCM wave file.

Helper struct that takes ownership of a writer and can be used to write data to a PCM wave file.

Enums

Represents an error that occurred while reading a wave file.

Represents a file format error, when the wave file is incorrect or unsupported.

Constants

Type Definitions

Represents a result when reading a wave file.

Represents a result when reading a wave file.