wilhelm kurz software

home
application notes

dynaplot application notes

spectrogram

Spectrogram

In this application note we will create an application which will 

  1. Read an audio signal from disk
  2. Perform spectral analysis on the waveform, thereby creating a spectrogram
  3. Convert this spectrogram to standard bitmap format and save the picture to disk
  4. Read the picture from disk and display it as a DataBitmap
  5. We will also demonstrate how to measure level values in a spectrogram using DynaPlot's cursor functions

The user interface of the application will look like this. The upper chart shows the recorded signal of the sentence "To return to the main menu, press the star key". The chart below that displays the spectrogram. At the bottom of the form you see a narrow strip with the pseudo color palette used for encoding levels.

What is a spectrogram ?

When we analyze an audio signal we are interested in the signal waveform which gives us the sound pressure or amplitude of the signal vs. time. The waveform in "Recorded Signal" is an example of such a signal. From the waveform we can compute parameters like average level, beginning and end of speech segments and pauses etc. Other questions can be answered more easily if we transform the signal to the frequency domain i.e. compute the spectrum of the signal. The spectrum of a signal yields information about the frequencies and levels of the tones a signal is composed of.  

Because speech, like many other signals, is not stationary but changes over time, it is not sufficient to compute one spectrum for the complete signal i.e. the long-term spectrum. We can obtain more information if we subdivide the signal into short intervals and compute a short-term spectrum for each of these intervals. 

To display the spectra we can use a three dimensional diagram with one axis for time a second for frequency and the third for the signal level. An alternative, commonly used display method is the spectrogram, in which the level of the signal at a given time and frequency is displayed as a (pseudo-) color or gray value in a 2D-diagram in which the horizontal axis represents time and the vertical axis frequency. "Spectrogram" in the form above is an example for this type of display.

Loading the Wav-file

This has already been demonstrated in application note LoadWav. In this project the code responsible for loading a wave file has been moved to module LoadWav. To load an audio signal we call function LoadWaveFile with the file name and a dynamically allocated array of doubles Y as parameters.LoadWaveFile will reallocate this array with the required numbers of rows and columns and will place the audio data in that array. The function will also return the sampling rate, which was used when the signal was recorded, in reference argument SampleRate. 

ReDim Y(0, 0) As Double
LoadWaveFile (WaveFileName, Y, SampleRate)

We can find out about the number of channels and the number of samples in the audio signal by evaluating the upper bounds of the signal matrix Y.

 NumChannels = UBound(Y, 2) + 1
NumSamples = UBound(Y, 1) + 1

We will ignore all channels but the first one for our demonstration.

Performing spectral analysis

Before we can start the spectral analysis we have to decide on the length of the time intervals for the short-term analysis. Because we are using a standard Fast Fourier Transform (FFT) for this purpose, the length of this interval in samples NFFT must be a power of two. The user can set the length by selecting NFFT from a combo box.

There is a tradeoff between frequency resolution and the resolution of details in the time domain. The higher NFFT, the better the frequency resolution and the poorer the temporal resolution - and vice versa. To improve the resolution in the time domain we can let the time intervals overlap each other. With no overlap and NFFT=256 points we would compute spectra for intervals [0..255] , [256...511], [512...767] etc. With 50% overlap, for example, we compute spectra for [0..255] , [128...383], [256...511] etc. It is obvious, that a large percentage of overlap will keep the CPU busy for quite some time. You can select the amount of overlap from  a given set of values in a combo box on the form.

Now that we know NFFT and the desired amount of overlap we can compute the number of rows and columns in our spectrogram. The number of rows is NFFT/2 (this is a result of the sampling theorem), and the number of columns can be computed by dividing the number of samples in the signal by the number of samples we advance with each short-term spectrum.

We will put the spectrogram values in array Pixelmatrix.Instead of storing color values in this array directly we use the pseudo color palette which has already been computed in Form_Load. The palette is 256 entries long and will be stored together with the pixel matrix. Each palette entry is a 32 bit color value which represents a signal level. Entries in array Pixelmatrix are indices into this palette. Because we have 256 different colors in the palette, one byte is enough to encode the color of a pixel.

 ReDim Pixelmatrix(NFFT / 2 - 1, Numcols - 1) As Byte

Our next step is to go over each of the intervals, compute its magnitude spectrum. We then compute the level of each spectral component and the index into the color palette which corresponds to this level. The index values are the filled into one column of the pixel matrix.

 For col = 0 To Numcols - 1
    ' read a segment of the recorded signal
    For c = 0 To NFFT - 1
        imag(c) = 0
        real(c) = Y(col * ColIncrement + c, 0) * Hanning(NFFT, c)
    Next

    ' transform to the frequency domain
    FourierTransform real, imag, NFFT, True

    ' and compute the magnitude spectrum
    MagnitudeSpectrum real, imag, NFFT, magnitude

    ' set up one column of the spectrogram
    For c = 0 To NFFT / 2 - 1
        Pixelmatrix(c, col) = MapToPixelindex(magnitude(c), RangedB, RangePaletteIndex)
    Next
Next

The first For-loop copies waveform sample into array real and zeros out array imag. Function  Hanning(NFFT, c) is a weighting function which reduces the "leak-effect" and improves spectral resolution. The Fourier transform is accomplished by a call to function FourierTransform. The transform is performed in-place, which means the results of the transform are returned in arrays real and imag.

From the complex spectrum we then compute the real-valued magnitude spectrum. Function MagnitudeSpectrum takes arrays real and imag as parameters and fills array magnitudewith the results. The functions related to spectral analysis can be found in module Fourier.bas.

In the next For-loop we convert magnitude values to color indices which we fill into array pixelmatrix. The conversion is done by function MapToPixelindex which maps level interval [-RangedB ...0] to color index interval [0.. .RangePaletteIndex] and returns the color index which belongs to the magnitude which is passed in as the first argument.

Saving the picture as a bmp-file

This is easy. Function SaveBitmap does all the work for us.

 SaveBitmap BitmapFilename, Pixelmatrix, 8, LevelPalette

The signature of this function is.

Public Sub SaveBitmap(Filename As String, Pixelmatrix As Variant, BitsPerPixel As Integer, Palette As Variant)

SaveBitmnap takes the pixel matrix array Pixelmatrix, the number of bits per pixel  BitsPerPixel and a palette array  Palette as parameters and creates a bitmap file with name Filename.  BitsPerPixel can be 1,4,8 or 24 which corresponds to 2, 16, 256 or 16 million colors. Functions for saving bitmaps and creating palettes can be found in module MakeBitmap.bas

 

Reading the picture from disk and displaying it in a chart

This deserves some thought. The last four arguments of function Add of DynaPlot's DataBitmap collection are the position of the lower left corner in physical coordinates and the resolution in pixels / physical unit. 

The x-coordinate of the lower left corner of spectrogram should be set to the center of the first interval. If we let the spectrogram begin at 0 there will be a remarkable shift for large values of NFFT and the waveform signal and the spectrogram will no longer align.  

The distance between two adjacent columns is given  by TimeslotIncrement = ColIncrement / SampleRate.DynaPlot wants the inverse of this value as its horizontal resolution argument. We therefore pass it  1# / TimeslotIncrement. The spectral resolution of a Fourier transform is SampleRate / NFFT , which is the inverse of TimeslotWidth. So we simply pass DynaPlot this value for its vertical resolution argument.

 TimeslotWidth = NFFT / SampleRate
TimeslotIncrement = ColIncrement / SampleRate

DynaPlot2.DataBitmaps.Add "Channel 1", BitmapFilename, TimeslotWidth / 2, 0, 1# / TimeslotIncrement, TimeslotWidth

 

Measuring levels with the cursor

Looking at the NeedCursorText event handler function

Private Sub DynaPlot2_NeedCursorText(ByVal Cidx As Long, ByVal Curve As Long, ByVal Idx As Long, ByVal x As Double, ByVal Y As Double, ByVal Color As Long, Text As String)

we notice that DynaPlot passes us the color value of the pixel the cursor is currently on. We can use this information to determine the level of this pixel.
In the body of the event handler function we perform a search over all palette entries to find a color which matches the one that was passed to us. The index under which the color was stored in the palette can then be used to compute the corresponding level by calling function PixelindexToLevel which is the inverse of function MapToPixelindex which has been described above.
 

 For i = 0 To RangePaletteIndex - 1
    If LevelPalette(i) = Color Then
        Level = PixelindexToLevel(i, RangedB, RangePaletteIndex)
        Text = Text & " z: " & Format(Level, "00.0 dB")
        Exit For
    End If
Next

Using the Format function we convert the level value to a string and append it to the default cursor text.

 The project can also be downloaded: 

Download Spectrogram.zip

 

[home] [products] [news] [contact] [links]