30 Aug The relevance of flatness for streaming services in the media industry: why does it matter?
So far in the blog, we have focused on some of the issues affecting images in streaming services. But if these companies want to deliver a great user experience, they must not neglect the sound of their productions.
Spectral flatness or the ability to measure a sound
Audio coding is a common task nowadays. The objective is to preserve high audio quality while compressing the digital audio data as efficiently as possible, which reduces the storage and transmission requirements of the audio signal. The goal is for the decoded audio to sound exactly like (or as close as possible to) the original audio before compression.
Audio compression techniques must meet other requirements as well. For example, low computational complexity is necessary to enable software decoders or inexpensive hardware decoders with low power consumption, and flexibility is needed to cope with different application scenarios. The technique that addresses these requirements is called perceptual encoding; it uses knowledge from psychoacoustics to achieve compression that is efficient but inaudible. Perceptual encoding is a lossy compression technique.
Predictive coding and entropy coding make it possible to code audio without producing large files. Both techniques reduce redundancy in the signal. However, redundancy reduction alone does not lead to low bit rate audio coding. Hence, in transform-based perceptual audio coding, psychoacoustic models (PM) are used to control the quantizers of the spectral components and thereby remove irrelevant (inaudible) parts of the audio signal.
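As a rough illustration of redundancy reduction only (not of the psychoacoustic part), the sketch below applies a first-order predictor to a slowly varying signal and compares the empirical entropy before and after. The test signal, the predictor, and all names are assumptions chosen for the example; real audio codecs use far more elaborate predictors and entropy coders.

```python
import math
from collections import Counter


def entropy_bits(symbols):
    """Empirical (zeroth-order) Shannon entropy in bits per symbol."""
    counts = Counter(symbols)
    total = len(symbols)
    return -sum(c / total * math.log2(c / total) for c in counts.values())


# Hypothetical test signal: a slowly varying, integer-quantized sine wave.
signal = [round(100 * math.sin(2 * math.pi * n / 200)) for n in range(2000)]

# First-order prediction: guess that each sample equals the previous one
# and keep only the prediction error (the first sample is stored as-is).
residual = [signal[0]] + [signal[n] - signal[n - 1] for n in range(1, len(signal))]

raw_bits = entropy_bits(signal)
residual_bits = entropy_bits(residual)

# The decoder undoes the prediction exactly, so this step is lossless.
decoded = []
for r in residual:
    decoded.append(r if not decoded else decoded[-1] + r)

print(f"raw: {raw_bits:.2f} bits/sample, residual: {residual_bits:.2f} bits/sample")
```

The residual takes only a handful of distinct values, so an entropy coder needs fewer bits per sample for it than for the raw signal, while the decoder still recovers the signal exactly.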
What an ear can hear
The hearing capacity of humans is limited. It is well known that some animals can hear sounds at frequencies that go completely unnoticed by humans. This limitation, which may seem minor at first glance, is taken into account by the algorithms in charge of audio coding.
At this point, we must distinguish the different characteristics of a sound. First, we must consider tonality, the property of sound that distinguishes noise-like sounds from tonal sounds. Noise-like sounds have a continuous spectrum, while tonal sounds typically have line spectra. Tonality is related to pitch strength, which describes the strength of the perceived pitch of a sound. Sounds with distinct (sinusoidal) components tend to produce greater pitch strength than sounds with continuous spectra.
Second, we can distinguish between two classes of features that (partially) measure tonality: flatness measures and bandwidth measures. Spectral flatness estimates the degree to which the frequencies in a spectrum are uniformly distributed (noise-like). More precisely, spectral flatness is the ratio of the geometric mean to the arithmetic mean of a subband of the power spectrum.
We can find the same definition in the MPEG-7 standard regarding the audio spectrum flatness descriptor. Spectral flatness may be further computed on a decibel scale.
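Written out, the ratio described above looks as follows. The symbols are assumptions introduced here for illustration: P(k) is the power in frequency bin k and K is the number of bins in the subband.

```latex
\mathrm{Flatness}
  = \frac{\left(\prod_{k=0}^{K-1} P(k)\right)^{1/K}}
         {\frac{1}{K}\sum_{k=0}^{K-1} P(k)},
\qquad
\mathrm{Flatness}_{\mathrm{dB}} = 10 \log_{10}\left(\mathrm{Flatness}\right)
```

Because the geometric mean of non-negative values never exceeds their arithmetic mean, the ratio lies between 0 and 1, so the decibel value is never positive.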
It is important to notice that noise-like sounds have a higher flatness value (flat spectrum) while tonal sounds have lower flatness values. Spectral flatness is often used (together with spectral crest factor) for audio fingerprinting.
Spectral flatness is sometimes called the tonality coefficient. It is also known as Wiener entropy, a measure used in digital signal processing to characterize an audio spectrum. It is typically measured in decibels and provides a way to quantify how noise-like a sound is, as opposed to being tone-like.
In this context, tonal refers to the number of peaks or resonant structures in a power spectrum, as opposed to the flat spectrum of white noise. A high spectral flatness (approaching 1.0 for white noise) indicates that the spectrum has a similar amount of power in all spectral bands; this would sound like white noise, and the graph of the spectrum would appear relatively flat and smooth. A low spectral flatness (approaching 0.0 for a pure tone) indicates that the spectral power is concentrated in a relatively small number of bands; this would typically sound like a mixture of sine waves, and the spectrum would appear “spiky”.
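To make the two extremes concrete, here is a minimal sketch that computes spectral flatness for a frame of white noise and for a pure sine tone. It uses only the standard library and a naive DFT to stay self-contained; the function names, the frame length, and the small `eps` guard are assumptions for this example, and production code would normally use an FFT library.

```python
import cmath
import math
import random


def power_spectrum(samples):
    """Naive DFT power spectrum for bins 1..N/2-1 (DC and Nyquist excluded)."""
    n = len(samples)
    spectrum = []
    for k in range(1, n // 2):
        acc = 0j
        for t, x in enumerate(samples):
            acc += x * cmath.exp(-2j * math.pi * k * t / n)
        spectrum.append(abs(acc) ** 2)
    return spectrum


def spectral_flatness(power, eps=1e-12):
    """Geometric mean divided by arithmetic mean of a power spectrum."""
    # eps (an assumption of this sketch) keeps log() finite for empty bins.
    log_mean = sum(math.log(p + eps) for p in power) / len(power)
    arithmetic_mean = sum(power) / len(power) + eps
    return math.exp(log_mean) / arithmetic_mean


def flatness_db(flatness):
    """Flatness on a decibel scale (0 dB for a perfectly flat spectrum)."""
    return 10.0 * math.log10(flatness)


N = 256
rng = random.Random(0)
noise = [rng.gauss(0.0, 1.0) for _ in range(N)]
tone = [math.sin(2 * math.pi * 16 * t / N) for t in range(N)]

noise_flatness = spectral_flatness(power_spectrum(noise))
tone_flatness = spectral_flatness(power_spectrum(tone))

print(f"noise: {noise_flatness:.3f} ({flatness_db(noise_flatness):.1f} dB)")
print(f"tone:  {tone_flatness:.3e} ({flatness_db(tone_flatness):.1f} dB)")
```

One practical caveat: for a single short frame of white noise, the raw estimate typically lands around 0.5 to 0.6 rather than at 1.0, because the individual bins of a noise periodogram scatter widely. Averaging the power spectrum over many frames brings the value closer to 1.0. The tone, in contrast, comes out many orders of magnitude lower.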
Spectral flatness is also applied outside audio, for example to oscillation detection in measurement data. Because existing oscillation detection methods are ineffective in the presence of intermittent high-energy broadband noise, it is necessary to determine when a measurement’s frequency spectrum displays clearly defined dynamics and when it is flat or noise-like. This can be characterized by the measurement’s spectral flatness, which quantifies the relative magnitude of any peaks present in the power spectrum of a signal, as opposed to how closely the spectrum resembles white noise. The spectral flatness measurement (SFM) of a signal is defined as the ratio of the geometric mean to the arithmetic mean of its power spectrum.
The spectral flatness of historical phasor measurement unit (PMU) data, combined with k-means clustering, can be exploited to detect and label oscillations irrespective of the presence of intermittent, high-energy broadband noise. This can be a useful complement to coherence spectrum-based detection algorithms when analyzing data from PMUs located near high energy density devices such as electric arc furnaces. As oscillations become damped and decrease in energy, however, the method can become less reliable, which is only of concern when attempting to automate the analysis of large-scale historical data archives.
Further automation of the algorithm improves its detection capabilities and reduces the amount of user input required to select data streams and frequencies of interest. Additionally, a generic framework can be established to apply the detection algorithm to any form of database and analysis platform.
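The text gives no implementation details for this approach, so the following is only a toy sketch of the general idea: compute the spectral flatness of each data window, then split the windows into two groups with k-means. The synthetic windows stand in for real PMU data, the hand-rolled two-cluster `kmeans_1d` is a simplification, and every name and parameter here is hypothetical.

```python
import cmath
import math
import random


def spectral_flatness(samples, eps=1e-12):
    """Flatness of one window: geometric / arithmetic mean of its power spectrum."""
    n = len(samples)
    power = []
    for k in range(1, n // 2):  # skip DC and Nyquist
        acc = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                  for t, x in enumerate(samples))
        power.append(abs(acc) ** 2)
    log_mean = sum(math.log(p + eps) for p in power) / len(power)
    return math.exp(log_mean) / (sum(power) / len(power) + eps)


def kmeans_1d(values, iterations=20):
    """Two-cluster k-means on scalars, initialised at the min and max value."""
    centers = [min(values), max(values)]
    labels = [0] * len(values)
    for _ in range(iterations):
        # Assignment step: each value joins its nearest center.
        labels = [0 if abs(v - centers[0]) <= abs(v - centers[1]) else 1
                  for v in values]
        # Update step: each center moves to the mean of its members.
        for c in (0, 1):
            members = [v for v, lab in zip(values, labels) if lab == c]
            if members:
                centers[c] = sum(members) / len(members)
    return labels, centers


# Synthetic stand-in for measurement windows: broadband noise everywhere,
# plus a strong sinusoidal oscillation in every third window.
rng = random.Random(1)
N = 128
windows, truth = [], []
for i in range(12):
    has_oscillation = i % 3 == 0
    window = [rng.gauss(0.0, 1.0) for _ in range(N)]
    if has_oscillation:
        window = [w + 5.0 * math.sin(2 * math.pi * 8 * t / N)
                  for t, w in enumerate(window)]
    windows.append(window)
    truth.append(has_oscillation)

flatness = [spectral_flatness(w) for w in windows]
labels, centers = kmeans_1d(flatness)

# Cluster 0 starts at the minimum, so it is the low-flatness ("spiky") group.
detected = [lab == 0 for lab in labels]
print(list(zip([round(f, 3) for f in flatness], detected)))
```

In this sketch the oscillation is deliberately strong, so the two groups separate cleanly. As the text notes, weaker, damped oscillations push the flatness values closer together, which is where such a clustering step would be expected to become less reliable.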