The author of this article is the Polish programmer Tomek Rekavek, who works on the Jackrabbit Oak project at the Apache Software Foundation for Adobe. The article was published on the author's personal blog in February 2016.

Polish Radio 3 (the so-called "Troika") is famous for its good music and intelligent hosts. On the other hand, it suffers from loud and annoying ad blocks, usually advertising electronics or medicine. I listen to the "Troika" almost constantly at work and at home, so I asked myself: how can I remove the advertising? It seems that I managed to find a solution.

## Digital signal processing

My goal is to create an application that mutes advertising. A commercial block begins and ends with jingles, so the program must recognize these specific sounds and turn off the sound between them.

I know that this area of mathematics and computer science is called digital signal processing, but to me DSP always seemed like magic. Well, this was a great opportunity to learn something new. I spent a day or two trying to figure out what mechanism to use to analyze the audio stream, and in the end I found what I needed: cross-correlation.

It turns out that in Octave it's easy to run a cross-correlation on two audio files. You just need to run the following commands:

```octave
pkg load signal
jingle = wavread('jingle.wav')(:, 1);
audio = wavread('audio.wav')(:, 1);
[R, lag] = xcorr(jingle, audio);
plot(R);
```

The resulting plot shows a clearly visible peak marking the position of `jingle.wav` in `audio.wav`. What surprised me is the simplicity of the method: `xcorr()` does all the work, and the rest of the code just reads the files and displays the result.

I wanted to implement the same algorithm in Java, so that I would have a tool that:

1. reads the audio stream from standard input (for example, from ffmpeg),
2. analyzes it in search of jingles,
3. writes the same stream to stdout and/or mutes it.

Using stdin and stdout makes it possible to connect the new analyzer to other applications responsible for receiving the broadcast and playing back the result.
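The stdin-to-stdout part of such a tool can be sketched as follows. This is only an illustration, not the author's code: the class name `PcmPassThrough`, the `muted` supplier and the chunk size are assumptions. Muting is done by writing zero bytes, which is silence in signed PCM.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.util.Arrays;
import java.util.function.BooleanSupplier;

// Hypothetical helper: copies raw PCM from one stream to another,
// replacing the audio with silence while "muted" reports true.
class PcmPassThrough {
    static void copy(InputStream in, OutputStream out, BooleanSupplier muted, int chunkSize) {
        byte[] chunk = new byte[chunkSize];
        try {
            int read;
            while ((read = in.read(chunk)) != -1) {
                if (muted.getAsBoolean()) {
                    // Zero bytes are silence for signed PCM samples.
                    Arrays.fill(chunk, 0, read, (byte) 0);
                }
                out.write(chunk, 0, read);
            }
            out.flush();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

A call such as `PcmPassThrough.copy(System.in, System.out, () -> false, 4096)` would forward the stream unchanged; a real analyzer would supply a `muted` condition driven by jingle detection.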

First of all, the Java program should read the jingle (saved as a `wav` file) into an array. The file contains additional information such as headers and metadata, but we only need the sound. A suitable format is called PCM: it is just a list of numbers representing the sound samples. ffmpeg can convert WAV to PCM:

```shell
ffmpeg -i input.wav -f s16le -acodec pcm_s16le output.raw
```

Here, each sample is stored as a 16-bit number in little-endian byte order. In Java, this type is called `short`, and to convert the input byte stream into `short` values you can use the `ByteBuffer` class:

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// bytes holds one 4-byte stereo frame (two 16-bit samples)
ByteBuffer buf = ByteBuffer.allocate(4);
buf.order(ByteOrder.LITTLE_ENDIAN);
buf.put(bytes);
buf.flip(); // switch the buffer from writing to reading
short leftChannel = buf.getShort(); // stereo stream
short rightChannel = buf.getShort();
```
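For a whole file rather than a single frame, the same idea can be wrapped in a small helper. The sketch below assumes interleaved stereo `s16le` data and keeps only the left channel, similar to the `(:, 1)` selection in the Octave commands; the class and method names are made up for illustration.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical helper: decodes interleaved stereo s16le bytes
// and returns only the left-channel samples.
class PcmDecoder {
    static short[] leftChannel(byte[] pcm) {
        ByteBuffer buf = ByteBuffer.wrap(pcm).order(ByteOrder.LITTLE_ENDIAN);
        short[] left = new short[pcm.length / 4]; // 4 bytes per stereo frame
        for (int i = 0; i < left.length; i++) {
            left[i] = buf.getShort(); // left sample
            buf.getShort();           // skip the right sample
        }
        return left;
    }
}
```

Any trailing bytes that do not form a complete frame are ignored.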

## Reverse engineering xcorr

To implement the `xcorr()` function in Java, I studied the Octave source code. Without changing the end result, I was able to replace the call to `xcorr()` with the following lines, which then need to be rewritten in Java:

```octave
N = length(audio);
M = 2 ^ nextpow2(2 * N - 1);
pre = fft(postpad(prepad(jingle(:), length(jingle) + N - 1), M));
post = fft(postpad(audio(:), M));
cor = ifft(pre .* conj(post));
R = real(cor(1:2 * N));
```

It looks scary, but most of the functions are trivial array operations. The core of the cross-correlation is the fast Fourier transform applied to the sound samples.
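To get a feeling for what these lines compute, it may help to look at a direct, time-domain version. The sketch below is not the FFT-based method from Octave: it slides the jingle over the audio and sums the products at each offset, which is much slower but easy to read. It also only covers offsets where the jingle fits entirely inside the audio, whereas `xcorr()` returns all lags. The names are illustrative.

```java
// Hypothetical direct cross-correlation, O(jingle.length * audio.length).
class DirectXcorr {
    /** r[lag] = sum over i of jingle[i] * audio[lag + i]. */
    static double[] correlate(double[] jingle, double[] audio) {
        int offsets = audio.length - jingle.length + 1;
        if (jingle.length == 0 || offsets <= 0) {
            return new double[0];
        }
        double[] r = new double[offsets];
        for (int lag = 0; lag < offsets; lag++) {
            double sum = 0;
            for (int i = 0; i < jingle.length; i++) {
                sum += jingle[i] * audio[lag + i];
            }
            r[lag] = sum;
        }
        return r;
    }

    /** Index of the largest value, i.e. the offset of the best match; -1 if r is empty. */
    static int argMax(double[] r) {
        int best = -1;
        for (int i = 0; i < r.length; i++) {
            if (best < 0 || r[i] > r[best]) {
                best = i;
            }
        }
        return best;
    }
}
```

The peak of the returned array plays the same role as the peak in the Octave plot: its index is the position where the jingle matches best.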

## Fast Fourier transform

As someone with no DSP experience, I simply treat the FFT as a function that takes an array describing the sound samples and returns an array of complex numbers representing frequencies. This minimalistic approach worked well: I ran the FFT implementation from the JTransforms package and got the same results as in Octave. I think it's a kind of cargo cult, but damn, it works!
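For readers who want to see the black box opened, here is a self-contained sketch of FFT-based cross-correlation. It does not use JTransforms: a small hand-written radix-2 FFT is swapped in so the example has no dependencies. It is also a simplified variant of the Octave lines, conjugating the jingle spectrum instead of pre-padding the jingle, and returning only the offsets where the jingle fits inside the audio. All names are assumptions.

```java
import java.util.Arrays;

// Hypothetical FFT-based cross-correlation with a hand-written radix-2 FFT.
class FftXcorr {
    /** In-place iterative FFT; the array length must be a power of two. */
    static void fft(double[] re, double[] im, boolean inverse) {
        int n = re.length;
        // Bit-reversal permutation.
        for (int i = 1, j = 0; i < n; i++) {
            int bit = n >> 1;
            for (; (j & bit) != 0; bit >>= 1) {
                j ^= bit;
            }
            j ^= bit;
            if (i < j) {
                double t = re[i]; re[i] = re[j]; re[j] = t;
                t = im[i]; im[i] = im[j]; im[j] = t;
            }
        }
        // Butterfly passes.
        for (int len = 2; len <= n; len <<= 1) {
            double ang = 2 * Math.PI / len * (inverse ? 1 : -1);
            for (int i = 0; i < n; i += len) {
                for (int k = 0; k < len / 2; k++) {
                    double wr = Math.cos(ang * k), wi = Math.sin(ang * k);
                    int a = i + k, b = i + k + len / 2;
                    double xr = re[b] * wr - im[b] * wi;
                    double xi = re[b] * wi + im[b] * wr;
                    re[b] = re[a] - xr; im[b] = im[a] - xi;
                    re[a] += xr; im[a] += xi;
                }
            }
        }
        if (inverse) {
            for (int i = 0; i < n; i++) { re[i] /= n; im[i] /= n; }
        }
    }

    /** r[lag] = sum over i of jingle[i] * audio[lag + i], computed via the FFT. */
    static double[] correlate(double[] jingle, double[] audio) {
        int offsets = audio.length - jingle.length + 1;
        if (jingle.length == 0 || offsets <= 0) {
            return new double[0];
        }
        // Zero-pad to a power of two large enough to avoid wrap-around.
        int needed = audio.length + jingle.length - 1;
        int m = Integer.highestOneBit(needed);
        if (m < needed) {
            m <<= 1;
        }
        double[] jr = Arrays.copyOf(jingle, m), ji = new double[m];
        double[] ar = Arrays.copyOf(audio, m), ai = new double[m];
        fft(jr, ji, false);
        fft(ar, ai, false);
        // Multiply conj(FFT(jingle)) by FFT(audio), element by element.
        for (int f = 0; f < m; f++) {
            double re = jr[f] * ar[f] + ji[f] * ai[f];
            double im = jr[f] * ai[f] - ji[f] * ar[f];
            ar[f] = re;
            ai[f] = im;
        }
        fft(ar, ai, true);
        return Arrays.copyOf(ar, offsets);
    }
}
```

Up to floating-point rounding, the result should match a direct sum of products at each offset, while the cost grows roughly as M log M instead of the product of the two lengths.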

## Running xcorr on a stream

The algorithm above assumes that `audio` is an array in which we are looking for `jingle`. This does not quite fit a radio broadcast, where we have a continuous stream of sound. To run the analysis, I created a circular buffer slightly longer than the jingle that needs to be recognized. The incoming stream fills the buffer, and once it is full, a cross-correlation test is run. If nothing is found, the oldest part of the buffer is discarded and we wait for it to fill up again.

I experimented a bit with the length of the buffer and got the best results with a buffer size 1.5 times the size of the jingle.
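The buffering scheme can be sketched roughly like this. Several details are assumptions rather than the author's implementation: the sketch uses a plain array that is shifted instead of a true circular buffer, a direct sum-of-products correlation instead of the FFT, a hypothetical `threshold` for deciding that the peak is a match, and it keeps the last `jingle.length - 1` samples between tests so that a jingle split across two fills is not missed.

```java
// Hypothetical streaming detector with a buffer 1.5x the jingle length.
class StreamingDetector {
    private final double[] jingle;
    private final double[] buffer;
    private final double threshold; // assumed: minimum peak counted as a match
    private int filled = 0;

    StreamingDetector(double[] jingle, double threshold) {
        if (jingle.length == 0) {
            throw new IllegalArgumentException("jingle must not be empty");
        }
        this.jingle = jingle.clone();
        this.buffer = new double[jingle.length + jingle.length / 2];
        this.threshold = threshold;
    }

    /** Feeds one sample; returns true if the buffer just filled and the jingle was found. */
    boolean accept(double sample) {
        buffer[filled++] = sample;
        if (filled < buffer.length) {
            return false;
        }
        boolean found = peak() >= threshold;
        // Discard the oldest samples, keeping the tail for the next test.
        int keep = jingle.length - 1;
        System.arraycopy(buffer, buffer.length - keep, buffer, 0, keep);
        filled = keep;
        return found;
    }

    /** Largest sum of products of the jingle with any window of the buffer. */
    private double peak() {
        double best = Double.NEGATIVE_INFINITY;
        for (int lag = 0; lag + jingle.length <= buffer.length; lag++) {
            double sum = 0;
            for (int i = 0; i < jingle.length; i++) {
                sum += jingle[i] * buffer[lag + i];
            }
            best = Math.max(best, sum);
        }
        return best;
    }
}
```

The caller feeds decoded samples one by one and reacts whenever `accept` returns true. A larger buffer means fewer correlation runs but a longer delay before a jingle is noticed, which is presumably the trade-off behind the 1.5 factor.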

## Putting it all together

It's not hard to get a stream in PCM format. This can be done using the `ffmpeg` mentioned above. The command below pipes the stream to the standard input of `java`, which then prints `Got jingle 0` or `Got jingle 1` when the corresponding sample is found in the stream.

```shell
ffmpeg -loglevel -8 \
    -i 'http://stream3.polskieradio.pl:8904/;stream' \
    -f s16le -acodec pcm_s16le - \
  | java -jar target/analyzer-???-SNAPSHOT-jar-with-dependencies.jar \
    2 \
    src/test/resources/commercial-start-44.1k.raw 500 \
    src/test/resources/commercial-end-44.1k.raw 700
```
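The analyzer's output lines could drive a simple on/off switch for the sound. The sketch below assumes that jingle 0 is the start of a commercial block and jingle 1 is its end, following the order of the files on the command line; the class name and this mapping are assumptions, not part of the original tool.

```java
// Hypothetical state holder: mutes between the start and end jingles.
class MuteSwitch {
    private boolean muted = false;

    /** Updates the state from one analyzer output line; returns whether audio should be muted. */
    boolean onLine(String line) {
        String trimmed = line.trim();
        if (trimmed.equals("Got jingle 0")) {
            muted = true;  // assumed: jingle 0 starts the commercial block
        } else if (trimmed.equals("Got jingle 1")) {
            muted = false; // assumed: jingle 1 ends the commercial block
        }
        return muted;
    }
}
```

Unrecognized lines leave the state unchanged, so extra log output would not toggle the sound.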

## Standalone version

I also prepared a simple standalone version of the analyzer, which connects to the "Troika" stream by itself (without the external `ffmpeg`) and plays the result using `javax.sound`. Everything fits into one JAR file and includes a basic user interface with Start and Stop buttons. It can be downloaded here. If you do not like running other people's JARs on your machine (which is perfectly reasonable), all the sources are on GitHub.

It looks like everything works as it should :)

## Further work

The ultimate goal is to mute advertising at the level of a hardware amplifier that receives a "real" FM signal, rather than some Internet stream. This is described in the next article.

## Update (June 2018)

Discussion on Hacker News

Discussion on Wykop

Discussion at Reddit