Is it possible to blur with Laplace instead of Gauss, how many times faster is it, and is it worth the loss of 1/32 of the accuracy?
(Laplace Blur is the original name proposed for the algorithm)
Today my inner demoscener kicked me and made me write an article that should have been written six months ago. As an amateur who develops original effect algorithms in his spare time, I would like to offer the public an "almost Gaussian blur" algorithm. It relies on extremely cheap processor instructions (shifts and masks) and is therefore implementable all the way down to microcontrollers (where it is extremely fast in a constrained environment).
Following my tradition of writing articles for Habr, I will give the examples in JS, as the most popular language and, believe it or not, a very convenient one for rapid prototyping of algorithms. Moreover, the ability to implement this efficiently in JS arrived together with typed arrays. On my not very powerful laptop a fullscreen image is processed at about 30fps (without involving worker multithreading).
Disclaimer for cool mathematicians:
Let me say at once that I take my hat off to you, since I consider myself not sufficiently savvy in fundamental mathematics. Still, I always try to follow the general spirit of the fundamental approach. So before you start hating on my somewhat "observational" approach to approximation, take the trouble to count the bit complexity of the algorithm you would obtain by the methods of classical polynomial approximation. Did I guess right, you wanted to approximate with polynomials? Given that they require floating-point arithmetic, they will be far slower than a single bit shift, as I will explain at the end. In short, do not rush into theoretical fundamentalism, and do not forget the context in which I am solving the problem.
This description is here mostly to explain the train of thought and the guesses that led me to the result. For those who are interested:
The original Gauss function:
g(x) = a * e ** (-((x - b) ** 2) / c), where
a is the amplitude (with eight bits per color channel, a = 256)
e is Euler's number, ≈ 2.718
b is the shift of the graph along x (we do not need it, so b = 0)
c is the parameter controlling the width of the graph
Our specialized function (the minus in the exponent is removed by replacing multiplication with division):
g(x) = 256 / e ** (x * x / c)
Let the act of dirty approximation begin:
Note that the parameter c is very close to the half-width, and set it to 8 (this comes from how many halving steps can be taken within one 8-bit channel).
We will also roughly replace e with 2, noting, however, that this affects the curvature of the "bell" more than its borders. Strictly speaking, it changes things by a factor of 2/e, but the surprise is that this error is compensated by the way the parameter c was chosen, so the boundary conditions are still in order, and the error shows up only as a slightly incorrect "normal distribution". For graphics algorithms this affects the dynamics of gradient color transitions, but it is almost impossible to notice by eye.
So our function now is:
gg(x) = 256 / 2 ** (x * x / 8), or gg(x) = 2 ** (8 - x * x / 8)
Note that the exponent (x * x / 8) has the same range of values, [0-8], as the lower-order function abs(x), so the latter is a candidate replacement. Let's quickly check this guess by looking at how the graph changes with gg(x) = 256 / (2 ** abs(x)):
Gauss Blur vs Laplace Blur:
It seems the deviations are too great; besides, the function, having lost its smoothness, now has a peak. But wait.
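The size of the deviation can also be checked numerically. A minimal sketch, assuming the narrowed Gauss curve g(x) = 256 / e ** (x * x / 8) from above (the helper names are mine, not the article's):

```javascript
// Compare the narrowed Gauss curve with the cheap power-of-two replacement.
function g(x)  { return 256 / Math.exp(x * x / 8); }      // 256 / e^(x^2/8)
function gg(x) { return 256 / Math.pow(2, Math.abs(x)); } // 256 / 2^|x|

let maxDev = 0;
for (let x = -8; x <= 8; x += 0.125) {
  maxDev = Math.max(maxDev, Math.abs(g(x) - gg(x)));
}
// The pointwise deviation of the densities is indeed large (on the order
// of 100 out of 256 near |x| = 1), which is what the graph shows.
console.log(maxDev);
```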
First, let's not forget that the smoothness of the gradients produced by blurring depends not on the probability density function (which is what the Gauss function is), but on its integral, the distribution function. At the time I did not know this fact, but it turned out that even after this "destructive" approximation of the probability density function (the Gaussian), the distribution function remained quite similar.
Proof, taken from the finished algorithm:
(looking ahead, I will say that the blur error of my algorithm relative to Gaussian x5 was only 3%)
So we have moved much closer to the Laplace distribution function. Who would have thought, but it can blur images 97% as well.
Proof, the difference between Gaussian Blur x5 and "Laplace Blur" x7:
(this is not a black image! You can examine it in an editor)
Accepting this transformation allowed me to move on to the idea of obtaining the value by iterative filtering, which is what I was driving at from the very beginning.
Before describing the specific algorithm, it is only fair to run ahead and immediately describe its single drawback (although the implementation can be fixed, at a cost in speed): the algorithm is built on shift arithmetic, so powers of two are its limitation. The original version performs an x7 blur (the closest to Gaussian x5 in my tests). This restriction comes from the fact that with eight-bit color, shifting the accumulator value by one bit per step, the influence of any point dies out after at most 8 steps. I also implemented a slightly slower version which, at the cost of extra additions, performs a quick division by 1.5 (yielding a radius of x15). With further application of this approach the error grows and the speed drops, which makes it impractical to push further. On the other hand, x15 is already enough that you hardly notice whether the result was obtained from the original or from a downsampled image. So the method is quite suitable when you need extreme speed in a constrained environment.
So, the core of the algorithm is simple; four passes of the same kind are performed:
1. Half of the value of the accumulator t (initially zero) is added to half of the value of the next pixel, and the result is written both to the pixel and back into t. Continue this way to the end of the image row, for all rows.
Upon completion of the first pass, the image is blurred in one direction.
2. In the second pass, do the same in the opposite direction for all rows.
We get an image fully blurred horizontally.
3-4. Now do the same vertically.
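These four passes can be sketched on a single 8-bit channel as follows; the function name and flat-array layout are my own illustration (the article's real implementation works on packed 32-bit pixels):

```javascript
// Single-channel sketch of the four passes: the accumulator t is halved and
// added to half of the current pixel; the sum becomes both the new pixel
// value and the new t. Rows are swept forward then backward, then columns
// down then up. `px` is a flat w*h array of 0..255 values.
function laplaceBlurGray(px, w, h) {
  for (let y = 0; y < h; y++) {
    let t = 0;
    for (let x = 0; x < w; x++) {          // pass 1: left to right
      const i = y * w + x;
      px[i] = t = (t >> 1) + (px[i] >> 1);
    }
    t = 0;
    for (let x = w - 1; x >= 0; x--) {     // pass 2: right to left
      const i = y * w + x;
      px[i] = t = (t >> 1) + (px[i] >> 1);
    }
  }
  for (let x = 0; x < w; x++) {
    let t = 0;
    for (let y = 0; y < h; y++) {          // pass 3: top to bottom
      const i = y * w + x;
      px[i] = t = (t >> 1) + (px[i] >> 1);
    }
    t = 0;
    for (let y = h - 1; y >= 0; y--) {     // pass 4: bottom to top
      const i = y * w + x;
      px[i] = t = (t >> 1) + (px[i] >> 1);
    }
  }
  return px;
}
```

A single bright pixel spreads out by roughly a halving per step in each direction, which is exactly the geometric (Laplace-like) decay discussed above.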
Initially I used a two-pass algorithm that implemented the reverse blur through a stack, but it is hard to understand, inelegant, and on current architectures it also turned out to be slower. Perhaps the one-pass algorithm would be faster on microcontrollers, and it would also allow outputting the result progressively.
I took the current four-pass implementation scheme from a previous guru of blur algorithms: habr.com/post/151157
Taking this opportunity, I express to him my solidarity and deep gratitude.
But the hacks didn't end there. Now, how to compute all three color channels in one processor instruction! The point is that the bit shift used for division by two keeps the positions of the result bits well under control. The only problem is that the low bit of each channel slides into the high bit of the neighboring channel; but instead of correcting this, those bits can simply be masked to zero, at some cost in accuracy. And by the filter formula described above, adding half of the accumulator to half of the next cell (given that the shifted-out bits are zeroed) never leads to overflow, so there is nothing to worry about. The filter formula for computing all channels at once becomes:
buf32[i] = t = ((t >> 1) & 0x7F7F7F) + ((buf32[i] >> 1) & 0x7F7F7F);
However, one more correction is needed: it was found empirically that the loss of accuracy in this formula is too significant, and the brightness of the picture visibly jumps. It became clear that the lost bits should be rounded rather than discarded. A simple way to do this in integer arithmetic is to add half of the divisor before dividing. Our divisor is two, so we need to add one in every channel, i.e. the constant 0x010101. But with any addition we must watch out for overflow, so we cannot apply this correction to the half of the next cell's value (for a white pixel it would overflow, so we leave that term uncorrected). It turned out, though, that the main error comes from the repeated halving of the accumulator, and that one we can correct: even with the correction, the value in the accumulator never rises above 254, so adding 0x010101 is guaranteed not to overflow. The filter formula with the correction takes the form:
buf32[i] = t = (((0x010101 + t) >> 1) & 0x7F7F7F) + ((buf32[i] >> 1) & 0x7F7F7F);
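For illustration, here is how this corrected formula might sit inside one forward pass over a row of packed RGBA pixels; the function name and the forced opaque alpha are my own additions (the article's code simply assigns t):

```javascript
// One forward blur pass over `len` packed 32-bit pixels starting at `start`.
// 0x7F7F7F masks off the bits that would otherwise leak between channels
// after the shift; 0x010101 adds the rounding half before the division by 2.
function blurLineForward(buf32, start, len) {
  let t = 0;
  for (let i = start; i < start + len; i++) {
    t = (((0x010101 + t) >> 1) & 0x7F7F7F) + ((buf32[i] >> 1) & 0x7F7F7F);
    buf32[i] = t | 0xFF000000; // keep alpha opaque; its bits are masked away anyway
  }
  return buf32;
}

// A single white pixel decays by roughly half per step: 0x7F, 0x40, 0x20, 0x10.
const line = new Uint32Array([0xFFFFFFFF, 0xFF000000, 0xFF000000, 0xFF000000]);
blurLineForward(line, 0, 4);
```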
In fact, the formula performs the correction quite well: when the algorithm is applied to an image repeatedly, artifacts only begin to show in the second dozen passes (and it is not a fact that repeated Gaussian Blur would not produce such artifacts).
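The rounding trick behind the correction is easy to see in isolation on a single channel, including why it cannot be applied to the pixel term (these helper names are mine, for illustration only):

```javascript
// Truncating vs. rounding integer halving of an 8-bit channel value.
const truncHalf = v => v >> 1;        // drops the low bit: 5 -> 2
const roundHalf = v => (v + 1) >> 1;  // adds half the divisor first: 5 -> 3

// Safe for the accumulator, whose value never exceeds 254:
console.log(roundHalf(254)); // 127
// But for a white pixel the intermediate 255 + 1 = 0x100 no longer fits in
// 8 bits, so in packed form it would spill into the neighboring channel.
```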
Additionally, there is a remarkable property under multiple passes (it is not a merit of my algorithm, but of the "normality" of the normal distribution itself): already on the second pass, the probability density function of Laplace Blur (if I have figured it all out correctly) looks something like this:
Which, you will agree, is already very close to the Gaussian.
Experimentally, I found that it is acceptable to use modifications with a larger radius in a pair, since the property described above compensates for errors when the last pass is the more accurate one (the most accurate being the x7 blur algorithm described here).
A Call for Cool Mathematicians:
It would be interesting to know how valid it is to treat such a filter as separable; I am not sure the resulting two-dimensional distribution is symmetric. Although no inhomogeneity is visible to the eye.
Suggestions, comments, constructive criticism - welcome!