Spectrogram
In this application note we will create an application which will
- Read an audio signal from disk
- Perform spectral analysis on the waveform, thereby creating a spectrogram
- Convert this spectrogram to standard bitmap format and save the picture to disk
- Read the picture from disk and display it as a DataBitmap
- We will also demonstrate how to measure level values in a spectrogram using DynaPlot's cursor functions
The user interface of the
application will look like this. The upper chart shows the recorded signal of the sentence "To return to the main menu, press the star key". The chart below that displays the spectrogram.
At the bottom of the form you see a narrow strip with the pseudo color palette used for encoding levels.
What is a spectrogram ?
When we analyze an audio signal we are interested in the signal waveform which gives us the sound pressure or
amplitude of the signal vs. time. The waveform in "Recorded Signal" is an example of such a signal. From the
waveform we can compute parameters like average level, beginning and end of speech segments and pauses etc.
Other questions can be answered more easily if we transform the signal to the frequency domain i.e. compute the
spectrum of the signal. The spectrum of a signal yields information about the frequencies and levels of the tones a signal is composed of.
Because speech, like many other signals, is not stationary but changes over time, it is not sufficient to compute one
spectrum for the complete signal i.e. the long-term spectrum. We can obtain more information if we subdivide the
signal into short intervals and compute a short-term spectrum for each of these intervals.
To display the spectra we can use a three dimensional diagram with one axis for time a second for frequency and the
third for the signal level. An alternative, commonly used display method is the spectrogram, in which the level of the
signal at a given time and frequency is displayed as a (pseudo-) color or gray value in a 2D-diagram in which the
horizontal axis represents time and the vertical axis frequency. "Spectrogram" in the form above is an example for this type of display.
Loading the Wav-file
This has already been demonstrated in application note LoadWav. In this project the code responsible for loading a
wave file has been moved to module LoadWav. To load an audio signal we call function LoadWaveFile with the file
name and a dynamically allocated array of doubles Y as parameters.LoadWaveFile will reallocate this array with the
required numbers of rows and columns and will place the audio data in that array. The function will also return the
sampling rate, which was used when the signal was recorded, in reference argument SampleRate.
|
ReDim Y(0, 0) As Double
LoadWaveFile (WaveFileName, Y, SampleRate)
|
We can find out about the number of channels and the number of samples in the audio signal by evaluating the upper
bounds of the signal matrix Y.
|
NumChannels = UBound(Y, 2) + 1
NumSamples = UBound(Y, 1) + 1
|
We will ignore all channels but the first one for our demonstration.
Performing spectral analysis
Before we can start the spectral analysis we have to decide on the length of the time intervals for the short-term
analysis. Because we are using a standard Fast Fourier Transform (FFT) for this purpose, the length of this interval in
samples NFFT must be a power of two. The user can set the length by selecting NFFT from a combo box.
There is a tradeoff between frequency resolution and the resolution of details in the time domain. The higher NFFT,
the better the frequency resolution and the poorer the temporal resolution - and vice versa. To improve the
resolution in the time domain we can let the time intervals overlap each other. With no overlap and NFFT=256 points
we would compute spectra for intervals [0..255] , [256...511], [512...767] etc. With 50% overlap, for example, we
compute spectra for [0..255] , [128...383], [256...511] etc. It is obvious, that a large percentage of overlap will keep
the CPU busy for quite some time. You can select the amount of overlap from a given set of values in a combo box on the form.
Now that we know NFFT and the desired amount of overlap we can compute the number of rows and columns in our
spectrogram. The number of rows is NFFT/2 (this is a result of the sampling theorem), and the number of columns
can be computed by dividing the number of samples in the signal by the number of samples we advance with each short-term spectrum.
We will put the spectrogram values in array Pixelmatrix.Instead of storing color values in this array directly we use
the pseudo color palette which has already been computed in Form_Load. The palette is 256 entries long and will be
stored together with the pixel matrix. Each palette entry is a 32 bit color value which represents a signal level.
Entries in array Pixelmatrix are indices into this palette. Because we have 256 different colors in the palette, one byte is enough to encode the color of a pixel.
|
ReDim Pixelmatrix(NFFT / 2 - 1, Numcols - 1) As Byte
|
Our next step is to go over each of the intervals, compute its magnitude spectrum. We then compute the level of
each spectral component and the index into the color palette which corresponds to this level. The index values are the filled into one column of the pixel matrix.
|
For col = 0 To Numcols - 1
' read a segment of the recorded signal For c = 0 To NFFT - 1 imag(c) = 0
real(c) = Y(col * ColIncrement + c, 0) * Hanning(NFFT, c) Next
' transform to the frequency domain FourierTransform real, imag, NFFT, True
' and compute the magnitude spectrum
MagnitudeSpectrum real, imag, NFFT, magnitude
' set up one column of the spectrogram For c = 0 To NFFT / 2 - 1
Pixelmatrix(c, col) = MapToPixelindex(magnitude(c), RangedB, RangePaletteIndex) Next Next
|
The first For-loop copies waveform sample into array real and zeros out array imag. Function Hanning(NFFT, c) is a
weighting function which reduces the "leak-effect" and improves spectral resolution. The Fourier transform is
accomplished by a call to function FourierTransform. The transform is performed in-place, which means the results of the transform are returned in arrays real and imag.
From the complex spectrum we then compute the real-valued magnitude spectrum. Function MagnitudeSpectrum
takes arrays real and imag as parameters and fills array magnitudewith the results. The functions related to spectral analysis can be found in module Fourier.bas.
In the next For-loop we convert magnitude values to color indices which we fill into array pixelmatrix. The conversion
is done by function MapToPixelindex which maps level interval [-RangedB ...0] to color index interval [0..
.RangePaletteIndex] and returns the color index which belongs to the magnitude which is passed in as the first argument.
Saving the picture as a bmp-file
This is easy. Function SaveBitmap does all the work for us.
|
SaveBitmap BitmapFilename, Pixelmatrix, 8, LevelPalette
|
The signature of this function is.
|
Public Sub
SaveBitmap(Filename As String, Pixelmatrix As Variant, BitsPerPixel As Integer, Palette As Variant)
|
SaveBitmnap takes the pixel matrix array Pixelmatrix, the number of bits per pixel BitsPerPixel and a palette array
Palette as parameters and creates a bitmap file with name Filename. BitsPerPixel can be 1,4,8 or 24 which
corresponds to 2, 16, 256 or 16 million colors. Functions for saving bitmaps and creating palettes can be found in module MakeBitmap.bas
Reading the picture from disk and displaying it in a chart
This deserves some thought. The last four arguments of function Add of DynaPlot's DataBitmap collection are the
position of the lower left corner in physical coordinates and the resolution in pixels / physical unit.
The x-coordinate of the lower left corner of spectrogram should be set to the center of the first interval. If we let the
spectrogram begin at 0 there will be a remarkable shift for large values of NFFT and the waveform signal and the spectrogram will no longer align.
The distance between two adjacent columns is given by TimeslotIncrement = ColIncrement / SampleRate.DynaPlot
wants the inverse of this value as its horizontal resolution argument. We therefore pass it 1# / TimeslotIncrement.
The spectral resolution of a Fourier transform is SampleRate / NFFT , which is the inverse of TimeslotWidth. So we simply pass DynaPlot this value for its vertical resolution argument.
|
TimeslotWidth = NFFT / SampleRate
TimeslotIncrement = ColIncrement / SampleRate
DynaPlot2.DataBitmaps.Add "Channel 1", BitmapFilename, TimeslotWidth / 2, 0, 1# / TimeslotIncrement, TimeslotWidth
|
Measuring levels with the cursor
Looking at the NeedCursorText event handler function
|
Private Sub
DynaPlot2_NeedCursorText(ByVal Cidx As Long, ByVal Curve As Long, ByVal Idx As Long, ByVal x As Double, ByVal Y As Double, ByVal Color As Long, Text As String)
|
we notice that DynaPlot passes us the color value of the pixel the cursor is currently on. We can use this information
to determine the level of this pixel. In the body of the event handler function we perform a search over all palette entries to find a color which matches
the one that was passed to us. The index under which the color was stored in the palette can then be used to
compute the corresponding level by calling function PixelindexToLevel which is the inverse of function MapToPixelindex which has been described above.
|
For i = 0 To RangePaletteIndex - 1
If LevelPalette(i) = Color Then Level = PixelindexToLevel(i, RangedB, RangePaletteIndex)
Text = Text & " z: " & Format(Level, "00.0 dB") Exit For
End If Next
|
Using the Format function we convert the level value to a string and append it to the default cursor text.
The project can also be downloaded:
Download Spectrogram.zip
|