AdBlock for radio

The author of this article is the Polish programmer Tomek Rekavek, who works on the Jackrabbit Oak project at the Apache Software Foundation for Adobe. The article was published on the author's personal blog on February 2? 2016.
Polish Radio-3 (the so-called "Troika") is famous for its good music and intelligent hosts. On the other hand, it suffers from loud and annoying ad blocks, which usually advertise electronics or medicines. I listen to the "Troika" almost constantly at work and at home, so I asked myself: how can I remove the ads? It seems that I managed to find a solution.

Digital signal processing

My goal is to create an application that mutes the ads. Each commercial block begins and ends with a jingle, so the program must recognize these specific sounds and turn off the sound between them.
I know that this area of mathematics and computer science is called digital signal processing, but to me DSP always seemed like magic. Well, this was a great opportunity to learn something new. I spent a day or two trying to figure out which mechanism to use to analyze the audio stream, and in the end I found what I needed: cross-correlation.
It turns out that running a cross-correlation on two audio files is easy in Octave. You just need to run the following commands:
    pkg load signal
    jingle = wavread('jingle.wav')(:, 1);
    audio = wavread('audio.wav')(:, 1);
    [R, lag] = xcorr(jingle, audio);
    plot(R);

The result is the following:
[Figure: plot of the cross-correlation result R]
The peak marking the position of jingle.wav in audio.wav is clearly visible. What surprised me is the simplicity of the method: xcorr() does all the work, and the rest of the code just reads the files and displays the result.
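To make concrete what such a peak means, here is a minimal brute-force sketch in Java. It is an illustration only, not the code from the project: the class NaiveXcorr and the method bestOffset are hypothetical names. It slides the jingle over the audio, sums the products of the overlapping samples at every offset, and returns the offset with the largest sum.

```java
/** Brute-force cross-correlation; hypothetical helper, not the project's code. */
final class NaiveXcorr {

    /**
     * Slides the jingle over the audio and returns the offset (in samples)
     * at which the sum of products of overlapping samples is largest.
     * Returns -1 if the audio is shorter than the jingle.
     */
    static int bestOffset(short[] jingle, short[] audio) {
        int bestOffset = -1;
        double bestScore = Double.NEGATIVE_INFINITY;
        for (int offset = 0; offset + jingle.length <= audio.length; offset++) {
            double score = 0;
            for (int i = 0; i < jingle.length; i++) {
                score += (double) jingle[i] * audio[offset + i];
            }
            if (score > bestScore) {
                bestScore = score;
                bestOffset = offset;
            }
        }
        return bestOffset;
    }
}
```

This direct version needs roughly jingle length times audio length multiplications, which is far too slow for long recordings. Note also that the score is not normalized, so a very loud passage can produce a high value without being the jingle.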
I wanted to implement the same algorithm in Java, so that I would have a tool that:
  1. reads the audio stream from the standard input (for example, from ffmpeg),
  2. analyzes it, looking for the jingles,
  3. writes the same stream to stdout and/or mutes it.

Using stdin and stdout makes it possible to connect the new analyzer to other applications responsible for receiving the broadcast and playing back the result.
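The stdin-to-stdout idea can be sketched as a small pass-through filter. This is an assumption about how such a tool might be structured, not the project's actual code: MutingFilter, copy and the muted flag are hypothetical names. Each chunk is forwarded unchanged, or replaced with zeros (silence in signed PCM) while the flag reports that an ad block is playing.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.util.Arrays;
import java.util.function.BooleanSupplier;

/** Pass-through filter for raw PCM; hypothetical sketch. */
final class MutingFilter {

    /**
     * Copies the stream chunk by chunk. While muted reports true,
     * the chunk is overwritten with zeros before it is written out.
     */
    static void copy(InputStream in, OutputStream out, BooleanSupplier muted, int chunkSize)
            throws IOException {
        byte[] chunk = new byte[chunkSize];
        int read;
        while ((read = in.read(chunk)) != -1) {
            if (muted.getAsBoolean()) {
                Arrays.fill(chunk, 0, read, (byte) 0); // silence for signed PCM
            }
            out.write(chunk, 0, read);
        }
        out.flush();
    }
}
```

In a real tool the call could look like MutingFilter.copy(System.in, System.out, flag, 4096), where flag is whatever the jingle detector exposes; keeping the stream length unchanged means the player downstream does not lose synchronization.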

Reading sound files

First of all, the Java program should read the jingle (saved as a WAV file) into an array. The file contains additional information such as headers and metadata, but we only need the sound. A suitable format is raw PCM, which is just a list of numbers representing the sound samples. ffmpeg can convert WAV to PCM:
    ffmpeg -i input.wav -f s16le -acodec pcm_s16le output.raw    

Here, each sample is stored as a 16-bit number in little-endian byte order. In Java such a number is a short, and the ByteBuffer class can automatically convert the input stream into a list of short values:
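    import java.nio.ByteBuffer;
    import java.nio.ByteOrder;

    ByteBuffer buf = ByteBuffer.allocate(4);
    buf.order(ByteOrder.LITTLE_ENDIAN);
    buf.put(bytes);                       // bytes holds one 4-byte stereo frame
    buf.flip();                           // switch the buffer from writing to reading
    short leftChannel = buf.getShort();   // stereo stream
    short rightChannel = buf.getShort();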
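The same idea extends from a single frame to a whole file. The sketch below is a hypothetical helper (PcmDecoder and leftChannel are illustrative names), and it assumes interleaved stereo s16le data. It keeps only the left channel, mirroring the (:, 1) selection in the Octave snippet.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

/** Decodes raw s16le PCM; hypothetical sketch assuming interleaved stereo. */
final class PcmDecoder {

    /** Returns the left channel; a trailing incomplete frame is ignored. */
    static short[] leftChannel(byte[] pcm) {
        ByteBuffer buf = ByteBuffer.wrap(pcm).order(ByteOrder.LITTLE_ENDIAN);
        short[] left = new short[pcm.length / 4]; // 4 bytes per stereo frame
        for (int i = 0; i < left.length; i++) {
            left[i] = buf.getShort(); // left sample
            buf.getShort();           // skip the right sample
        }
        return left;
    }
}
```

ByteBuffer.wrap starts in read mode at position 0, so no flip() is needed here.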


Reverse engineering xcorr

To implement xcorr() in Java, I studied the Octave source code. Without changing the end result, I was able to replace the call to xcorr() with the following lines, which then need to be rewritten in Java:
    N = length(audio);
    M = 2 ^ nextpow2(2 * N - 1);
    pre = fft(postpad(prepad(jingle(:), length(jingle) + N - 1), M));
    post = fft(postpad(audio(:), M));
    cor = ifft(pre .* conj(post));
    R = real(cor(1:2 * N));

It looks scary, but most of these functions are trivial array operations. At its core, the cross-correlation relies on applying the fast Fourier transform to the sound samples.

Fast Fourier transform

As someone with no DSP experience, I simply treat the FFT as a function that takes an array describing the sound samples and returns an array of complex numbers representing frequencies. This minimalistic approach worked well: I ran the FFT implementation from the JTransforms package and got the same results as in Octave. I suppose this is a kind of cargo cult programming, but damn, it works!
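For readers who want to see the whole pipeline in one place, here is a self-contained sketch of FFT-based cross-correlation in Java. It is not the project's code: instead of JTransforms it uses a small textbook radix-2 FFT so that it runs without dependencies, and it is a simplified variant of the Octave lines that returns only the non-negative offsets at which the jingle fits entirely inside the audio. The names FftXcorr, fft, xcorr and peakIndex are hypothetical.

```java
import java.util.Arrays;

/** FFT-based cross-correlation; hypothetical sketch with a textbook radix-2 FFT. */
final class FftXcorr {

    /** In-place iterative FFT; the array length must be a power of two. */
    static void fft(double[] re, double[] im, boolean inverse) {
        int n = re.length;
        // Reorder the input by bit-reversed index.
        for (int i = 1, j = 0; i < n; i++) {
            int bit = n >> 1;
            for (; (j & bit) != 0; bit >>= 1) {
                j ^= bit;
            }
            j ^= bit;
            if (i < j) {
                double t = re[i]; re[i] = re[j]; re[j] = t;
                t = im[i]; im[i] = im[j]; im[j] = t;
            }
        }
        // Butterfly passes over blocks of growing length.
        for (int len = 2; len <= n; len <<= 1) {
            double angle = 2 * Math.PI / len * (inverse ? 1 : -1);
            for (int start = 0; start < n; start += len) {
                for (int k = 0; k < len / 2; k++) {
                    double wr = Math.cos(angle * k), wi = Math.sin(angle * k);
                    int a = start + k, b = a + len / 2;
                    double xr = re[b] * wr - im[b] * wi;
                    double xi = re[b] * wi + im[b] * wr;
                    re[b] = re[a] - xr; im[b] = im[a] - xi;
                    re[a] += xr;        im[a] += xi;
                }
            }
        }
        if (inverse) {
            for (int i = 0; i < n; i++) { re[i] /= n; im[i] /= n; }
        }
    }

    /**
     * Returns r[k] = sum over n of audio[n + k] * jingle[n] for every offset k
     * at which the jingle fits inside the audio. Requires audio.length >= jingle.length.
     */
    static double[] xcorr(double[] audio, double[] jingle) {
        int needed = audio.length + jingle.length - 1;
        int m = Integer.highestOneBit(needed);
        if (m < needed) m <<= 1; // next power of two, so nothing wraps around
        double[] ar = Arrays.copyOf(audio, m), ai = new double[m];
        double[] jr = Arrays.copyOf(jingle, m), ji = new double[m];
        fft(ar, ai, false);
        fft(jr, ji, false);
        for (int i = 0; i < m; i++) {
            // Multiply the audio spectrum by the conjugate of the jingle spectrum.
            double r = ar[i] * jr[i] + ai[i] * ji[i];
            double s = ai[i] * jr[i] - ar[i] * ji[i];
            ar[i] = r; ai[i] = s;
        }
        fft(ar, ai, true);
        return Arrays.copyOf(ar, audio.length - jingle.length + 1);
    }

    /** Index of the largest value, i.e. the most likely jingle position. */
    static int peakIndex(double[] r) {
        int best = 0;
        for (int i = 1; i < r.length; i++) {
            if (r[i] > r[best]) best = i;
        }
        return best;
    }
}
```

Recomputing the sine and cosine inside the inner loop keeps the sketch short but is slow; a library such as JTransforms precomputes these factors.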

Run xcorr on the stream

The algorithm above assumes that audio is an array in which we are looking for jingle. This does not quite fit a radio broadcast, where we have a continuous stream of sound. To run the analysis, I created a circular buffer slightly longer than the jingle that needs to be recognized. The incoming stream fills the buffer, and once it is full, the cross-correlation test is run. If nothing is found, the oldest part of the buffer is discarded and we wait for it to fill up again.
I experimented a bit with the buffer length and got the best results with a buffer 1.5 times the size of the jingle.
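The buffering scheme can be sketched as follows. This is a guess at one possible structure, not the project's implementation: SlidingWindow is a hypothetical name, a plain array shifted with System.arraycopy stands in for a true circular buffer, and the amount discarded after each test (the window length minus the jingle length) is an assumption chosen so that every jingle falls completely inside at least one window.

```java
import java.util.function.Predicate;

/** Fixed-size analysis window over a sample stream; hypothetical sketch. */
final class SlidingWindow {
    private final short[] window;
    private final int step; // number of oldest samples dropped after each test
    private final Predicate<short[]> detector;
    private int filled;

    SlidingWindow(int jingleLength, Predicate<short[]> detector) {
        if (jingleLength < 2) {
            throw new IllegalArgumentException("jingleLength must be at least 2");
        }
        this.window = new short[jingleLength * 3 / 2]; // 1.5 times the jingle
        this.step = window.length - jingleLength;
        this.detector = detector;
    }

    /**
     * Adds one sample. When the window becomes full, the detector is run on it
     * and the oldest samples are dropped. Returns true if the detector fired.
     * The detector receives the live array and must not keep a reference to it.
     */
    boolean add(short sample) {
        window[filled++] = sample;
        if (filled < window.length) {
            return false;
        }
        boolean found = detector.test(window);
        System.arraycopy(window, step, window, 0, window.length - step);
        filled = window.length - step;
        return found;
    }
}
```

With a 1.5x window and this step, consecutive windows start half a jingle apart, so a jingle beginning anywhere in the stream is fully contained in one of them.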

Putting it all together

It's not hard to get the stream in PCM format. This can be done with ffmpeg, as shown above. The command below pipes the stream to the standard input of the java process, which then prints Got jingle 0 or Got jingle 1 when the corresponding sample is found in the stream.
    ffmpeg -loglevel -8 \
           -f s16le -acodec pcm_s16le - \
        | java -jar target/analyzer-???-SNAPSHOT-jar-with-dependencies.jar \
               src/test/resources/commercial-start-44.1k.raw 500 \
               src/test/resources/commercial-end-44.1k.raw 700


The standalone version

I also prepared a simple standalone version of the analyzer, which connects to the "Troika" stream by itself (without an external ffmpeg) and plays the result using javax.sound. Everything fits into one JAR file and includes a basic user interface with Start and Stop buttons. It can be downloaded here. If you do not like running other people's JARs on your machine (which is perfectly reasonable), all the sources are on GitHub.
It looks like everything works as it should :)

Further work

The ultimate goal is to mute the ads at the level of a hardware amplifier, working with a "real" FM signal rather than some Internet stream. This is covered in the next article.

Update (June 2018)

Discussion on Hacker News
Discussion on Wykop
Discussion at Reddit