If you’ve managed to hack around the various issues that AudioTrack has, then you are probably enjoying its benefits, such as low latency (in the STATIC mode), the ability to generate audio on the fly (in the STREAM mode) and the wonderful ability to access and modify raw sound data before you play it.
However, the problem now is getting that data from a source. Many applications that need AudioTrack do not generate PCM audio from scratch (Ethereal Dialpad and similar apps are examples of ones that do). More likely, you will need to load and use existing audio samples from file sources such as WAV or MP3 files.
Don’t expect to be able to use MediaPlayer facilities to decode raw audio from WAVs and MP3s. Although MediaPlayer can play those files pretty well, its logic is almost entirely contained in the native layer and there is no option for us to plug in and use the decoders for our own purposes. Thus, we have to decode PCM from audio files manually.
In this article I will cover WAV files and in the next one we will get advanced and read audio from MP3s.
Introduction: Some digital audio terms
If you don’t normally work with digital audio, there are a few terms and acronyms you should know before we continue our discussion. Don’t worry, it’s all very simple and we don’t need to dig very deep.
- PCM (pulse-code modulation) – the simplest way to turn a physical audio signal into numbers. Basically the signal becomes an array of numbers where each number represents the level of energy (amplitude) of the sound at a specific moment of time. (I’m sorry if this explanation is scientifically inaccurate.) Believe it or not, you can represent a sound of any complexity with this approach, and then play it back very nicely. We will only be talking about linear PCM, that is where each number in the array is a linear representation of the original amplitude. In some cases logarithmic mapping is used to better represent the original amplitude scale – but we won’t discuss those cases.
- Sampling rate – how many samples (amplitude-representing numbers) per second your digital sound has. The more it has, the better sound quality you generally get. Numbers used in consumer audio systems today are usually 22050, 44100 and 48000 Hz (samples per second).
- Resolution/sample size/bits per sample – defines the size and format of each number used to represent the amplitude. For example, if you use an 8 bit integer number, you can only represent 256 levels of amplitude, so the original physical waveform will be simplified into 256 discrete levels (and you will lose audio precision/quality). If you use 16 bits, the quality becomes much better. In fact, you will probably use 16 bit audio most of the time. Other options are 24 bits, 32 bits (these are not supported on Android for now) and using float numbers.
- Channels – can be mono (1 channel) or stereo (2 channels), or any greater number (but not on Android). If you want stereo audio, you need a separate PCM array for each channel, and thus the amount of information doubles.
The definitions above also help you understand the amount of data you need for an audio buffer of a specific format and length. Say you need a buffer to keep 5 seconds of 44100 Hz stereo 16-bit linear PCM data. The calculation is:
5 sec * 44100 samples per sec * 2 bytes per sample * 2 channels = 882,000 bytes
This amount of required memory might be surprising for beginners because when you store audio on disk in an MP3, an 880 KB file might contain a 1-minute track of the same sampling rate and resolution. That is because advanced formats such as MP3 have very sophisticated ways of compressing audio based on dropping parts of the sound that our brain doesn’t pay much attention to. However, most low-level audio APIs, including Android’s AudioTrack, can only accept linear PCM. That’s why, if we can’t keep the entire sample in memory, we have to deal with streaming, circular buffers and other clever ways to feed audio to the API.
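If you prefer to see that buffer-size arithmetic as code, here is the same calculation as a tiny Java helper (just a sketch; the method name and parameters are mine):

    // Returns the number of bytes needed to hold the given duration of linear PCM audio.
    // bytesPerSample is 2 for 16-bit audio; channels is 1 for mono or 2 for stereo.
    public static int pcmBufferSize(int seconds, int sampleRate, int bytesPerSample, int channels) {
        return seconds * sampleRate * bytesPerSample * channels;
    }

    // pcmBufferSize(5, 44100, 2, 2) == 882000, matching the calculation above.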
Hope this explanation did not confuse you too much. Now let’s continue to actually doing some work with digital audio on Android!
WAV file format
Our goal is to take an InputStream that provides raw bytes from a WAV file and load PCM data from it in one way or another. Then we can push raw PCM data to a correctly configured AudioTrack using AudioTrack.write().
A WAV file has a header chunk and a data chunk. We need to read the header chunk to know the format of the data, such as the sampling rate, resolution etc. In addition, we use the header information to make sure the format is supported. WAV can encapsulate a multitude of formats and we won’t support all of them – probably only linear PCM with a reasonable sampling rate, resolution and channel count.
The details of WAV format are widely available on the internet – just do a Google search. However, no matter how long I searched I did not find a good Java library to read WAVs that would be portable to Android. Thus, I wrote some simple code myself.
Below is the method to read a WAV header:
    private static final String RIFF_HEADER = "RIFF";
    private static final String WAVE_HEADER = "WAVE";
    private static final String FMT_HEADER = "fmt ";
    private static final String DATA_HEADER = "data";
    private static final int HEADER_SIZE = 44;
    private static final String CHARSET = "ASCII";

    /* ... */

    public static WavInfo readHeader(InputStream wavStream) throws IOException, DecoderException {
        ByteBuffer buffer = ByteBuffer.allocate(HEADER_SIZE);
        buffer.order(ByteOrder.LITTLE_ENDIAN);

        wavStream.read(buffer.array(), buffer.arrayOffset(), buffer.capacity());
        buffer.rewind();

        // Skip "RIFF", the RIFF chunk size, "WAVE", "fmt " and the fmt chunk size (20 bytes)
        buffer.position(buffer.position() + 20);

        int format = buffer.getShort();
        checkFormat(format == 1, "Unsupported encoding: " + format); // 1 means Linear PCM
        int channels = buffer.getShort();
        checkFormat(channels == 1 || channels == 2, "Unsupported channels: " + channels);
        int rate = buffer.getInt();
        checkFormat(rate <= 48000 && rate >= 11025, "Unsupported rate: " + rate);

        // Skip the byte rate (4 bytes) and block align (2 bytes)
        buffer.position(buffer.position() + 6);

        int bits = buffer.getShort();
        checkFormat(bits == 16, "Unsupported bits: " + bits);

        int dataSize = 0;
        while (buffer.getInt() != 0x61746164) { // "data" marker, read as a little-endian int
            Log.d(TAG, "Skipping non-data chunk");
            int size = buffer.getInt();
            wavStream.skip(size);

            buffer.rewind();
            wavStream.read(buffer.array(), buffer.arrayOffset(), 8);
            buffer.rewind();
        }
        dataSize = buffer.getInt();
        checkFormat(dataSize > 0, "wrong datasize: " + dataSize);

        return new WavInfo(new FormatSpec(rate, channels == 2), dataSize);
    }
(Sorry for the messy code guys! Please clean it up after copy-pasting before use!)
The missing parts should be obvious. As you can see, I only support 16 bits here but you can modify the code to support 8 bits as well (AudioTrack does not support any other resolution).
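The checkFormat() helper used above can be as simple as the following sketch (DecoderException is simply whatever exception type you prefer for flagging an unsupported file):

    // Throws if an assertion about the WAV format does not hold.
    private static void checkFormat(boolean assertion, String message) throws DecoderException {
        if (!assertion) {
            throw new DecoderException(message);
        }
    }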
This method returns enough information for us to know how to read the rest of the file – the audio data.
    public static byte[] readWavPcm(WavInfo info, InputStream stream) throws IOException {
        byte[] data = new byte[info.getDataSize()];
        // A single InputStream.read() call may return fewer bytes than requested, so use
        // java.io.DataInputStream.readFully() to keep reading until the buffer is full.
        new DataInputStream(stream).readFully(data);
        return data;
    }
The byte[] array that we read, together with the WavInfo structure holding the sampling rate, resolution and channel info, gives us everything we need to play back the audio.
If we don’t want to load the entire data array into memory at once, we can keep it behind an InputStream and read it little by little instead.
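Jumping slightly ahead to the feeding step described in the next section, that chunked approach might look roughly like this (a sketch; it assumes track is an AudioTrack already configured for this format and playing in STREAM mode):

    // Feeds the PCM data to an already-playing STREAM-mode AudioTrack in small chunks,
    // so the whole clip never has to sit in memory at once.
    public static void streamPcm(InputStream stream, AudioTrack track) throws IOException {
        byte[] chunk = new byte[4096];
        int read;
        while ((read = stream.read(chunk)) > 0) {
            track.write(chunk, 0, read);
        }
    }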
Feeding PCM to AudioTrack
At this point there are two cases – either we create a new AudioTrack and are able to choose its format, or we have an existing AudioTrack that might have a different format from the one that our WAV file audio has.
In the first case, things are easy – we just use the AudioTrack constructor with the settings that we read from the WAV header.
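For instance, a minimal sketch of that first case could look like this (assuming WavInfo exposes the values we parsed through hypothetical getRate(), isStereo() and getDataSize() accessors):

    import android.media.AudioFormat;
    import android.media.AudioManager;
    import android.media.AudioTrack;

    // Creates a STATIC-mode AudioTrack whose format matches the parsed WAV header.
    public static AudioTrack createTrackFor(WavInfo info) {
        int channelConfig = info.isStereo()
                ? AudioFormat.CHANNEL_OUT_STEREO
                : AudioFormat.CHANNEL_OUT_MONO;
        return new AudioTrack(
                AudioManager.STREAM_MUSIC,       // stream type
                info.getRate(),                  // sampling rate from the header
                channelConfig,                   // mono or stereo
                AudioFormat.ENCODING_PCM_16BIT,  // we only load 16-bit PCM
                info.getDataSize(),              // STATIC mode: buffer holds the whole clip
                AudioTrack.MODE_STATIC);
    }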
In the second case we might need to convert our audio to the target format – the format of the AudioTrack that we are going to feed the audio to. We might need one or several of the following conversions:
- If the sampling rate is different, we need to either drop or duplicate samples to match the target rate
- If the resolution is different, we need to map from the source resolution to the target one – that is map integers from 16 bit to 8 bit or vice versa
- If the channels are different, we either mix stereo channels into one mono channel or duplicate the mono channel data to turn it into quasi-stereo (see the sketch below)
(Do consider moving those algorithms to the native layer, which gives a big performance advantage for this kind of processing.)
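To make the channel case concrete, here is a sketch of the simplest of these conversions, turning 16-bit mono data into quasi-stereo; the rate and resolution conversions follow the same per-sample loop pattern:

    // Turns 16-bit little-endian mono PCM into quasi-stereo by writing each
    // two-byte sample into both the left and the right output channel.
    public static byte[] monoToStereo(byte[] mono) {
        byte[] stereo = new byte[mono.length * 2];
        for (int i = 0; i < mono.length; i += 2) {
            stereo[i * 2]     = mono[i];      // left channel, low byte
            stereo[i * 2 + 1] = mono[i + 1];  // left channel, high byte
            stereo[i * 2 + 2] = mono[i];      // right channel, same sample
            stereo[i * 2 + 3] = mono[i + 1];
        }
        return stereo;
    }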
In either case, after we’re sure the formats match, we can use AudioTrack.write() to write the entire buffer or its portions to the AudioTrack for playback.
Remember, if you use the STATIC mode, you have to create an AudioTrack with a buffer large enough to fit the audio and write() the data completely before calling play(), while in STREAM mode you can keep write()-ing the data part by part after calling play() on the AudioTrack.
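Putting the STATIC case together, a minimal sketch (reusing the hypothetical createTrackFor() helper from the previous section) could look like this:

    // Reads a whole WAV from the stream and plays it on a STATIC-mode AudioTrack.
    public static AudioTrack playWav(InputStream wavStream) throws IOException, DecoderException {
        WavInfo info = readHeader(wavStream);
        byte[] pcm = readWavPcm(info, wavStream);
        AudioTrack track = createTrackFor(info);  // hypothetical helper from earlier
        track.write(pcm, 0, pcm.length);          // STATIC mode: write everything first...
        track.play();                             // ...then start playback
        return track;
    }

In STREAM mode you would instead call play() first and then keep feeding chunks with write(), as in the earlier streaming sketch.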
Conclusion
There may be various reasons why you want to play WAV audio on an AudioTrack. Sometimes the size limitations of SoundPool or the latency and high resource usage of MediaPlayer will make you consider going that way. Sometimes you need to modify the audio or mix it on the fly. In any case, in this article I tried to show you how to do it.
In the next post we will do the same with MP3 audio. Stay tuned!
Tags: android, architecture, audio, beginner, development, tutorial