You are required to write a C++ application that can process audio sound clips. Using
your application, it should be possible to perform simple editing operations on sound clips – such as cut and paste – as well as transformations of the sound clips. Examples of the latter include fade in/out and normalisation. The sound clips will be 1-channel (mono) or 2-channel (stereo) and will be provided as simple raw byte data which you need to interpret correctly.
Programmatically, a raw sound file/clip is a sequence of samples (usually 8, 16 or 24 bits each) of an audio signal that can be sent to a speaker to produce sound. The sound clip also has an associated sample rate – for example, 44.1 kHz (i.e. 44100 samples per second). The higher the sample rate, the better, usually, the quality of the sound produced. The number of bits per sample also has a profound effect on audio quality: generally, 8 bits per sample produces poor sound. Of course, high sampling rates result in very large sound files, which is why compression (such as MP3) is usually used – we will not expect you to manipulate compressed formats. Simple raw (byte stream) audio will be used throughout.
Playing sound files:
You can play sound files on Ubuntu as follows (assuming the sox package is installed – you’ll
need headphones or speakers of course):
play -r 44100 -e signed -b 16 -c 2 Run_44100_16bit_stereo.raw
Here -c specifies the number of channels (1 for mono, 2 for stereo). The sampling rate (-r) is 44.1 kHz, and the sample data is signed (-e) 16-bit (-b) data. Finally, the raw file has been labelled to make it clear what data it contains.
A worked example loading and manipulating these PCM RAW audio files is given as an
ipython notebook. Several audio files will be made available to you (you can use Audacity to convert your favorite songs into RAW files if necessary).
Requirements:
Arguments and program invocation:
samp -r sampleRateInHz -b bitCount -c noChannels [-o outFileName] [<ops>] soundFile1 [soundFile2]
Description
• -r Specifies the number of samples per second of the audio file(s) (usually 44100)
• -b Specifies the size (in bits) of each sample. Only 8-bit and 16-bit samples need to be
supported in your program. More on this later on.
• -c Number of channels in the audio file(s). Your program will only support 1 (mono) or
2 (stereo).
• “outFileName” is the name of the newly created sound clip (should default to “out”).
• <ops> is ONE of the following:
• -add: add soundFile1 and soundFile2
• -cut r1 r2: remove samples over range [r1,r2] (inclusive) (assumes one sound
file)
• -radd r1 r2 s1 s2 : add soundFile1 and soundFile2 over sub-ranges indicated
(in seconds). The ranges must be equal in length.
• -cat: concatenate soundFile1 and soundFile2
• -v r1 r2: volume factor for left/right audio (default 1.0/1.0) (assumes one sound
file)
• -rev: reverse sound file (assumes one sound file only)
• -rms: Prints out the RMS of the sound file (assumes one sound file only).
More details will be given later on.
• -norm r1 r2: normalize file for left/right audio (assumes one sound file only
and that r1 and r2 are floating point RMS values)
• [extra credit] -fadein n: n is the number of seconds (floating point number) to
slowly increase the volume (from 0) at the start of soundFile1 (assumes one
sound file).
• [extra credit] -fadeout n: n is the number of seconds (floating point number) to
slowly decrease the volume (from 1.0 to 0) at the end of soundFile1 (assumes
one sound file).
• “soundFile1” is the name of the input .raw file. A second sound file is required for
some operations as indicated above.
• The sample rate, bit count and number of channels should be used for both the
input files and the resulting output file.
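The invocation described above could be parsed along the following lines. This is only a sketch: the Options struct and the parseOptions function are hypothetical names that do not come from this brief, and 0 is used here to mean "not supplied" for the numeric options. The operation token, its parameters and the input file names are left unparsed in rest.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical container for the leading options; not part of the brief.
struct Options {
    int sampleRate = 0;               // -r, samples per second (0 = not given)
    int bitCount = 0;                 // -b, 8 or 16 (0 = not given)
    int channels = 0;                 // -c, 1 (mono) or 2 (stereo) (0 = not given)
    std::string outFileName = "out";  // -o, defaults to "out"
    std::vector<std::string> rest;    // operation, its parameters, input files
};

// Consumes "-r", "-b", "-c" and "-o" pairs from the front of args (argv
// without the program name) and leaves everything else in Options::rest.
Options parseOptions(const std::vector<std::string>& args) {
    Options opt;
    std::size_t i = 0;
    while (i + 1 < args.size()) {
        const std::string& flag = args[i];
        const std::string& value = args[i + 1];
        if (flag == "-r") opt.sampleRate = std::stoi(value);
        else if (flag == "-b") opt.bitCount = std::stoi(value);
        else if (flag == "-c") opt.channels = std::stoi(value);
        else if (flag == "-o") opt.outFileName = value;
        else break;  // first non-option token, e.g. "-add" or "-rev"
        i += 2;
    }
    opt.rest.assign(args.begin() + static_cast<std::ptrdiff_t>(i), args.end());
    return opt;
}
```

Flags are compared as whole strings, so operations such as -rev, -rms and -radd are not mistaken for -r.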
Input:
The format of the input .raw audio files is simply a stream of samples (a binary
file). If you know the size of each element (8/16 bit and number of channels) and
the size of the file (using seekg and tellg, as done here:
http://www.cplusplus.com/reference/istream/istream/tellg/), you can tell how
many samples are contained in the file.
We will only use 8-bit int (signed int) and 16-bit int (signed int) sound clips,
which can be represented as the types int8_t and int16_t (include <cstdint>).
Clips will be 1-channel (mono) or 2-channel (stereo). Stereo files contain pairs of
integers per sample where the first intN_t sample corresponds to the left ear (L)
and the second intN_t sample corresponds to the right ear (R). You can package
your LR data into an std::pair<intN_t,intN_t>, where N is the number of bits.
You can allocate a buffer that is large enough to store the entire audio clip using
the resize method of std::vector. Your vector should contain either intN_t samples or
std::pair<intN_t,intN_t> samples depending on whether mono or stereo samples
are being read. The address of the start of the buffer is given by
&(data_vector[0]). You may find the following formulae useful when reading in
audio files:
NumberOfSamples = fileSizeInBytes / (sizeof(intN_t) * channels)
Length of the audio clip in seconds = NumberOfSamples / (float) samplingRate.
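The two formulae above, together with the seekg/tellg approach, could be put together roughly as follows for the mono case. This is a sketch under stated assumptions: sampleCount, readMonoRaw and durationSeconds are hypothetical helper names, the file's byte order is assumed to match the machine reading it, and stereo reading (into pairs) is not shown.

```cpp
#include <cstddef>
#include <cstdint>
#include <fstream>
#include <string>
#include <vector>

// NumberOfSamples = fileSizeInBytes / (sizeof(intN_t) * channels)
std::size_t sampleCount(std::size_t fileSizeInBytes, std::size_t bytesPerSample,
                        std::size_t channels) {
    return fileSizeInBytes / (bytesPerSample * channels);
}

// Hypothetical helper: reads a whole mono .raw file of intN_t samples (T).
// Returns an empty vector if the file cannot be opened or holds no samples.
template <typename T>
std::vector<T> readMonoRaw(const std::string& path) {
    std::ifstream in(path, std::ios::binary);
    if (!in) return std::vector<T>();

    // Find the file size by seeking to the end and asking for the position.
    in.seekg(0, std::ios::end);
    const std::streamoff fileSizeInBytes = in.tellg();
    in.seekg(0, std::ios::beg);
    if (fileSizeInBytes <= 0) return std::vector<T>();

    std::vector<T> data;
    data.resize(sampleCount(static_cast<std::size_t>(fileSizeInBytes), sizeof(T), 1));
    if (data.empty()) return data;

    // Read the raw bytes straight into the buffer behind the vector.
    in.read(reinterpret_cast<char*>(&(data[0])),
            static_cast<std::streamsize>(data.size() * sizeof(T)));
    return data;
}

// Length of the audio clip in seconds = NumberOfSamples / (float) samplingRate
float durationSeconds(std::size_t numberOfSamples, int samplingRate) {
    return numberOfSamples / static_cast<float>(samplingRate);
}
```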
Output:
When you modify or create sound files, you must save the output as a raw (byte)
audio file (.raw extension). To help interpret the files, you should write
information into the file name:
Filename_samplingrate_samplesize_monoORstereo.raw
For example, a mono output file saved with the name “spooky”, with 16-bit
samples and a sampling rate of 8000 Hz, would have the final name:
“spooky_8000_16_mono.raw”
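The naming scheme above could be generated with a small helper such as the one below. The function name makeOutputName and its parameter names are assumptions made for illustration only.

```cpp
#include <string>

// Builds Filename_samplingrate_samplesize_monoORstereo.raw
// (hypothetical helper; channels is 1 for mono, 2 for stereo).
std::string makeOutputName(const std::string& base, int sampleRate,
                           int bitCount, int channels) {
    return base + "_" + std::to_string(sampleRate) + "_" +
           std::to_string(bitCount) + "_" +
           (channels == 1 ? "mono" : "stereo") + ".raw";
}
```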
Templating:
The Audio class should be templated to handle audio signals which use different
bit sizes for samples, depending on the provided audio clips. To handle stereo,
you need to specialize your core Audio template to manipulate the data which
consists of 1 pair of samples per time step, L and R, with L being the left ear
data and R the right ear data. Thus, rather than having an array of ints for your
sound clip, you will have an array of pairs of ints. Each sequence (all Ls or all
Rs) can be handled differently.
Functionality required:
You will be required to overload some operators to achieve basic editing. These
operations will all produce new sound clips.
A | B: concatenate audio file A and B (A and B will have the same sampling rate,
sample size and mono/stereo settings)
A * F: volume factor A with F; F will be a std::pair<float,float> with each float
value in the range [0.0,1.0]. The pair allows us to package a separate volume
scale for left and right channels. To apply the operation, simply multiply each
sound sample by the volume factor. For mono channels only the first number will
be used. This allows you to make one channel louder/softer in relation to the
other.
A+B: add sound file amplitudes together (per sample). A and B will have the
same sampling rate, sample size and mono/stereo settings. Each resulting amplitude
must be clamped to the maximum value of the sample type. These maximums
are available in <cstdint>. Adding two very loud files together may result in
saturation.
A^F: F will be a std::pair<int,int> which specifies the start and end sample of the range
of samples to be cut from sound file A. This implements a “cut” operation which
produces a shorter clip (A with a portion removed).
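As a rough illustration of the four operators above, here is a minimal mono-only sketch. MonoClip is a hypothetical stand-in, not the required Audio class: it omits the stereo specialization, file I/O and the move operations. It also assumes that the sum is clamped at both ends of the sample range and that the cut range is inclusive, as for -cut.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <initializer_list>
#include <limits>
#include <utility>
#include <vector>

// Hypothetical mono clip; T is int8_t or int16_t.
template <typename T>
class MonoClip {
public:
    MonoClip() = default;
    MonoClip(std::initializer_list<T> samples) : data_(samples) {}

    const std::vector<T>& samples() const { return data_; }

    // A | B: concatenate B onto the end of A.
    MonoClip operator|(const MonoClip& rhs) const {
        MonoClip out(*this);
        out.data_.insert(out.data_.end(), rhs.data_.begin(), rhs.data_.end());
        return out;
    }

    // A * F: scale every sample by F.first (mono ignores F.second).
    MonoClip operator*(const std::pair<float, float>& f) const {
        MonoClip out(*this);
        for (T& s : out.data_) s = static_cast<T>(s * f.first);
        return out;
    }

    // A + B: per-sample add, clamped to the range of T (assumes equal lengths;
    // if B is shorter, the tail of A is left unchanged).
    MonoClip operator+(const MonoClip& rhs) const {
        MonoClip out(*this);
        const std::size_t n = std::min(data_.size(), rhs.data_.size());
        const long lo = std::numeric_limits<T>::min();
        const long hi = std::numeric_limits<T>::max();
        for (std::size_t i = 0; i < n; ++i) {
            const long sum = static_cast<long>(data_[i]) + rhs.data_[i];
            out.data_[i] = static_cast<T>(std::max(lo, std::min(hi, sum)));
        }
        return out;
    }

    // A ^ F: remove samples F.first..F.second (inclusive), clipped to the clip.
    MonoClip operator^(const std::pair<int, int>& range) const {
        MonoClip out(*this);
        if (out.data_.empty()) return out;
        const int last = static_cast<int>(out.data_.size()) - 1;
        const int first = std::max(0, range.first);
        const int end = std::min(last, range.second);
        if (first > end) return out;
        out.data_.erase(out.data_.begin() + first, out.data_.begin() + end + 1);
        return out;
    }

private:
    std::vector<T> data_;
};
```

The initializer list constructor makes it easy to build small clips by hand, for example MonoClip<std::int16_t>{30000, -30000, 5}, which is convenient when testing each operator.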
Regular overloads and construction:
You must also overload the assignment and move assignment operators and
provide the usual constructors (including copy and move) and appropriate
destructor.
You should demonstrate that these operators work through simple unit tests
using catch.hpp. It may be helpful to add an initializer list constructor to your
Audio class to read in a small custom buffer in order to test these operators. Your
unit tests should be compiled as separate executables.
Audio transformation:
You must use STL algorithms with custom Functors or Lambdas, as specified
below. When ranges are required, you should use an iterator. This will be a
simple pointer into your internal audio data buffer.
• Reverse: reverse all samples (this can be done very quickly with the
STL)
• Ranged add: select two (same length) sample ranges from two signals
and add them together. This differs from the overloaded + which adds
entire audio clips together. You should make use of std::copy and your
previously defined operator+ to achieve this.
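• Compute RMS: use std::accumulate in <numeric> along with a custom
lambda to compute the RMS (per channel), according to the following
formula:
$$\mathrm{RMS} = \sqrt{\frac{1}{M} \sum_{i=0}^{M-1} x_i^2}$$
where M is the number of samples in the channel and x_i is the amplitude of sample i.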
This can be seen as an “average” volume of the sound clip.
• Sound normalization: use std::transform with a custom functor to
normalize the sound files to the specified desired RMS value (per channel).
You will first need to compute the current RMS of the audio clip before performing the normalization step. You may have to partially specialize the
functor to work with both mono and stereo sound files.
Normalization can be done according to the following formula:
$$\mathrm{outputAmp} = \mathrm{inputAmp} \times \frac{\mathrm{RMS}_{\mathrm{desired}}}{\mathrm{RMS}_{\mathrm{current}}}$$
This effectively adjusts the overall volume of a sound clip to the desired
level and can be used to normalize between audio clips. You must clamp
the output amplitudes to the minimum and maximum values specified in
<cstdint>.
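The RMS and normalization steps might look roughly like this for a mono buffer. The names computeRms, Normalize and normalized are hypothetical, a plain std::vector stands in for the Audio class, and a silent clip (RMS of 0) is assumed to be left unchanged. The stereo case would need the same logic applied per channel.

```cpp
#include <algorithm>
#include <cmath>
#include <cstdint>
#include <limits>
#include <numeric>
#include <vector>

// RMS = sqrt((1/M) * sum of x_i^2), accumulated in double to avoid overflow.
template <typename T>
double computeRms(const std::vector<T>& samples) {
    if (samples.empty()) return 0.0;
    const double sumSquares = std::accumulate(
        samples.begin(), samples.end(), 0.0,
        [](double acc, T x) { return acc + static_cast<double>(x) * x; });
    return std::sqrt(sumSquares / samples.size());
}

// Functor: outputAmp = inputAmp * (desiredRms / currentRms), clamped to T.
template <typename T>
class Normalize {
public:
    Normalize(double desiredRms, double currentRms)
        : gain_(currentRms > 0.0 ? desiredRms / currentRms : 1.0) {}

    T operator()(T inputAmp) const {
        const double lo = std::numeric_limits<T>::min();
        const double hi = std::numeric_limits<T>::max();
        const double scaled = inputAmp * gain_;
        return static_cast<T>(std::max(lo, std::min(hi, scaled)));
    }

private:
    double gain_;  // ratio of desired RMS to current RMS
};

// Returns a normalized copy of a mono buffer.
template <typename T>
std::vector<T> normalized(const std::vector<T>& samples, double desiredRms) {
    std::vector<T> out(samples.size());
    std::transform(samples.begin(), samples.end(), out.begin(),
                   Normalize<T>(desiredRms, computeRms(samples)));
    return out;
}
```

For the buffer {100, -100, 100, -100} the current RMS is 100, so normalizing to a desired RMS of 200 doubles every sample.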
You should demonstrate that these operators work through simple unit tests.
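The reverse and ranged add transformations listed above could be sketched as follows. Everything named here is hypothetical: Clip is a plain vector standing in for a mono Audio object, addClamped stands in for the overloaded operator+, and the sub-ranges are assumed to be half-open intervals [start, end) given in seconds.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <iterator>
#include <vector>

using Clip = std::vector<std::int16_t>;  // hypothetical mono clip stand-in

// Stand-in for operator+: per-sample add with clamping to the int16_t range.
Clip addClamped(const Clip& a, const Clip& b) {
    Clip out(std::min(a.size(), b.size()));
    std::transform(a.begin(), a.begin() + out.size(), b.begin(), out.begin(),
                   [](std::int16_t x, std::int16_t y) -> std::int16_t {
                       const int sum = static_cast<int>(x) + y;
                       return static_cast<std::int16_t>(
                           std::max<int>(INT16_MIN, std::min<int>(INT16_MAX, sum)));
                   });
    return out;
}

// Reverse: std::reverse on a copy of the samples.
Clip reversed(Clip clip) {
    std::reverse(clip.begin(), clip.end());
    return clip;
}

// Converts a time in seconds to a sample index, clipped to [0, size].
std::size_t secondsToIndex(double seconds, int sampleRate, std::size_t size) {
    const double idx = std::max(0.0, seconds * sampleRate);
    return std::min(size, static_cast<std::size_t>(idx));
}

// Ranged add: copy [r1, r2) of a and [s1, s2) of b, then add the two copies.
Clip rangedAdd(const Clip& a, double r1, double r2,
               const Clip& b, double s1, double s2, int sampleRate) {
    const std::size_t a1 = secondsToIndex(r1, sampleRate, a.size());
    const std::size_t a2 = std::max(a1, secondsToIndex(r2, sampleRate, a.size()));
    const std::size_t b1 = secondsToIndex(s1, sampleRate, b.size());
    const std::size_t b2 = std::max(b1, secondsToIndex(s2, sampleRate, b.size()));

    Clip partA, partB;
    std::copy(a.begin() + a1, a.begin() + a2, std::back_inserter(partA));
    std::copy(b.begin() + b1, b.begin() + b2, std::back_inserter(partB));
    return addClamped(partA, partB);
}
```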
Fading in and out:
Fade-in/Fade-out: use a custom lambda with a simple linear function (ramp)
applied to a single audio clip, over a specified range of samples. You can
implement this using std::for_each().
Fade-in:
OutputAmp = (FadeSampleNo / (float) rampLength) * inputAmp
Fade-out:
OutputAmp = (1.0 - FadeSampleNo / (float) rampLength) * inputAmp
Where FadeSampleNo is the index of the current sample within the ramp, and rampLength is
the number of samples over which the ramp is applied (rampLength = numSeconds * sampleRate).
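A possible shape for the two ramps, using std::for_each with a lambda over a mono buffer, is sketched below. The function names fadeIn and fadeOut are hypothetical, a plain std::vector stands in for the Audio class, and the ramp is assumed to be shortened if it would be longer than the clip.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// Fade-in over the first numSeconds of the clip:
// OutputAmp = (FadeSampleNo / (float) rampLength) * inputAmp
template <typename T>
void fadeIn(std::vector<T>& samples, float numSeconds, int sampleRate) {
    std::size_t rampLength = static_cast<std::size_t>(numSeconds * sampleRate);
    rampLength = std::min(rampLength, samples.size());
    if (rampLength == 0) return;

    std::size_t fadeSampleNo = 0;  // position within the ramp
    std::for_each(samples.begin(),
                  samples.begin() + static_cast<std::ptrdiff_t>(rampLength),
                  [&fadeSampleNo, rampLength](T& s) {
                      s = static_cast<T>(
                          (fadeSampleNo / static_cast<float>(rampLength)) * s);
                      ++fadeSampleNo;
                  });
}

// Fade-out over the last numSeconds of the clip:
// OutputAmp = (1.0 - FadeSampleNo / (float) rampLength) * inputAmp
template <typename T>
void fadeOut(std::vector<T>& samples, float numSeconds, int sampleRate) {
    std::size_t rampLength = static_cast<std::size_t>(numSeconds * sampleRate);
    rampLength = std::min(rampLength, samples.size());
    if (rampLength == 0) return;

    std::size_t fadeSampleNo = 0;  // position within the ramp
    std::for_each(samples.end() - static_cast<std::ptrdiff_t>(rampLength),
                  samples.end(),
                  [&fadeSampleNo, rampLength](T& s) {
                      s = static_cast<T>(
                          (1.0f - fadeSampleNo / static_cast<float>(rampLength)) * s);
                      ++fadeSampleNo;
                  });
}
```

The lambda relies on std::for_each visiting the samples in order, so the captured counter matches FadeSampleNo in the formulae.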
Grading:
Implementing everything except Fading in and Fading out will earn you 95%.
Implementing the fading in/out operations (with appropriate unit tests) will earn
you the last 5% of the mark.
PLEASE NOTE:
• A working Makefile must be submitted. If the tutor cannot compile your program on
senior lab machines by typing make, you will receive 50% of your final mark.
• Your submission must contain a git repo. You must use version control from the get-go.
Failure to comply will result in a 10% penalty.
• Do not submit binary files - submit your source code and any other necessary files to
build and test your project.
• You must provide a README file explaining what each file submitted does and how it
fits into the program as a whole. The README file should not
explain any theory that you have used. These will be used by the tutors if they
encounter any problems.
• Please ensure that your tarball works and is not corrupt (you can check this by
trying to extract the contents of your tarball - make this a habit!). Corrupt or non-working
tarballs will not be marked - no exceptions!
• A 10% penalty per day will be incurred for all late submissions. No hand-ins will be
accepted if later than 5 days. Do not hand in any binary files.
• DO NOT COPY. All code submitted must be your own. Copying is punishable by 0
and can cause a blotch on your academic record. Scripts will be used to check that
code submitted is unique.