# The fastest floating-point numbers in the wild west are

In the process of implementing one "counters" a problem arose with increased accuracy of calculations. The computational algorithm worked quickly on standard floating-point numbers, but when libraries were connected for accurate calculations, everything began to slow wildly. In this article, algorithms for expanding floating-point numbers using a multicomponent approach will be considered, thanks to which it was possible to achieve acceleration, since float arithmetic is implemented on a chip. This approach will be useful for a more accurate calculation of the numerical derivative, inversion of matrices, trimming polygons, or other geometric problems. So it is possible to emulate a 64bit float on video cards that do not support them.

is being taken. attempts creating libraries for processors based on the decimal system and IEEE even standardized formats.

But as long as we take into account storage binarity everywhere and we perform all monetary calculations with libraries for accurate calculation, such as bignumber, which leads to loss of performance. Asiki consider the crypt, and there is not enough space in the processors for this your decimal arithmetic, say marketers. Therefore, the multicomponent approach, when the number is stored as an unreformed sum of numbers, is a convenient trick and an actively developing sphere in the field of theoretical informatics. Although Dekker still learned how to multiply correctly without losing accuracy, the ready-to-use libraries appeared much later (MPFR, QD) and not in all languages, apparently because not all supported the IEEE standards, and strong evidence of computational error, and then later, for example in 2017 for double-word arithmetic.

What is the essence? In bearded times, when there were no standards for floating numbers, to avoid problems with the implementation of their rounding, Møller came up with, and Knuth later proved that there was an error-free summation. The

thus performed.

In this algorithm it was assumed that if

| b | $ "data-tex =" inline "> , then their exact sum can be represented as a sum of two numbers

and can be stored as a pair for subsequent calculations, and subtraction is reduced to addition with a negative number.

Subsequently, Dekker showed that if floating-point numbers are used that use round-to-nearest ties to even, which is generally the correct procedure, which does not lead to large errors in the long computation process and the IEEE standard) then there is an error-free multiplication algorithm.

where split () is Mr. Veltkamp's algorithm for dividing the number

using the constant

, which equates to half the length of the mantissa, which does not lead to an overflow of numbers in the process of multiplication and divides the mantissa into two halves. For example, with a word length of 64-bit, the length of the mantissa is 53 and then s = 27.

Thus, Dekker cited the almost complete set required for computation in double-word arithmetic. Since there it was also indicated how to multiply, divide and square two double-word numbers.

It had a quickTwoSum algorithm everywhere for summing two double-word, and

was used. On modern processors, as described in[4], it is cheaper to use additional operations with numbers than branching the algorithm.Therefore, the following algorithm is now more appropriate for adding two single-word numbers

And here is the summation and multiplication of double-word numbers.

Generally speaking, the most complete and exact list of algorithms for double-word arithmetic, theoretical error boundaries and practical implementation is described in reference[3]from 2017. Therefore, if interested, I strongly recommend going straight there. And in general, in[6]The algorithm for quadruple-word is given, and in[5]for a multicomponent extension of an arbitrary length. Only here, after each operation, the renormalization process is used, which is not always optimal for small dimensions, and the accuracy of calculations in QD is not strictly defined. In general, it is worthwhile to think about the limits of applicability of these approaches, of course.

So it turned out that almost all libraries for exact calculations in js are written by one person. An illusion of choice is created, although they are almost all the same. In addition, the documentation does not explicitly state that if you do not round numbers after each multiply /divide operation, then the size of your number will always be doubled, and the complexity of the algorithm can grow into an easy x3500. For example, this might look like a comparison of their calculation time, if you did not round the numbers.

That is, you put accuracy in the 32 sign after the comma and Oops you already have 64 characters, 128. We believe very accurately! 25? 512 But I exhibited 32! 102? 2048 As that's how the overhead appears 3500 times. The documentation states that if you have scientific calculations, then you probably prefer decimal.js. Although in fact, if you just round it off periodically, for scientific calculations Bignumber.js works a little faster (see Figure 1). Who then needs to count the hundredths of a penny, if they can not be turned over? Is there any kind of case, when you need to store more decimal numbers and not get another few extra signs? How does it take a sine from such a monster number, when no one knows the strict accuracy of the convergence of the Taylor series for arbitrary numbers. In general, there are not unreasonable suspicions that it is possible to increase the calculation speed by using the Schönhage-Strassen multiplication algorithms and finding the sine with Cordic calculations, for example.

I would like to say, of course, that Double.js believes quickly and accurately. But this is not entirely true, that is, it is 10 times faster than it thinks, but that's not always accurate. For example, 0.3-0.1 it can process, moving to doubled storage and back. But here the number Pi is parsed with double precision in almost 32 characters and back it does not come out. An error occurs on the 16th, as if overflow occurs. In general, I encourage the js community to try to solve the parsing problem by joint efforts, since I am stuck. Tried to parse digitally and divide in double precision, as in QD, divide in batches of 16 digits and divide in double precision, split the mantissa using Big.js as in one of Julia's libs. Now I sin on a bug in .parseFloat (), because IEEE standards with rounding to the nearest integer are supported even with ECMAScript 1. Although of course you can try to zabindit binary buffer and watch every 0 and 1. In general, if it is possible to solve this problem, then it will then be possible to make calculations with arbitrary accuracy with an acceleration of x10-x20 from bignumber.js. However, the Mandelbrot set is already rendered qualitatively, and you can use it for geometric problems. Here is an interactive benchmark , reference to or and sandpox where you can play with it.

O. Møller.

Theodorus Dekker.

Mioara Joldes, Jean-Michel Muller, Valentina Popescu.

Muller, J.-M. Brisebarre, N. de Dinechin, etc. Handbook of Floating-Point Arithmetic, Chapter 1? 2010.

Jonathan Shewchuk.

Yozo Hida, Xiaoye Li, David Bailey.

is being taken. attempts creating libraries for processors based on the decimal system and IEEE even standardized formats.

But as long as we take into account storage binarity everywhere and we perform all monetary calculations with libraries for accurate calculation, such as bignumber, which leads to loss of performance. Asiki consider the crypt, and there is not enough space in the processors for this your decimal arithmetic, say marketers. Therefore, the multicomponent approach, when the number is stored as an unreformed sum of numbers, is a convenient trick and an actively developing sphere in the field of theoretical informatics. Although Dekker still learned how to multiply correctly without losing accuracy, the ready-to-use libraries appeared much later (MPFR, QD) and not in all languages, apparently because not all supported the IEEE standards, and strong evidence of computational error, and then later, for example in 2017 for double-word arithmetic.

### Double-word arithmetic

What is the essence? In bearded times, when there were no standards for floating numbers, to avoid problems with the implementation of their rounding, Møller came up with, and Knuth later proved that there was an error-free summation. The

thus performed.

` function quickTwoSum (a, b) {`

let s = a + b;

let z = s - a;

let e = b - z;

return[s, e];

}

In this algorithm it was assumed that if

| b | $ "data-tex =" inline "> , then their exact sum can be represented as a sum of two numbers

and can be stored as a pair for subsequent calculations, and subtraction is reduced to addition with a negative number.

Subsequently, Dekker showed that if floating-point numbers are used that use round-to-nearest ties to even, which is generally the correct procedure, which does not lead to large errors in the long computation process and the IEEE standard) then there is an error-free multiplication algorithm.

` function twoMult (a, b) {`

let A = split (a);

let B = split (b);

let r1 = a * b;

let t1 = -r1 + A[0]* B[0];

let t2 = t1 + A[0]* B[1];

let t3 = t2 + A[1]* B[0];

return[r1, t3 + A[1]* B[1]];

}

where split () is Mr. Veltkamp's algorithm for dividing the number

` let splitter = Math.pow (? 27) + 1;`

function split (a) {

let t = splitter * a;

let d = a - t;

let xh = t + d;

let xl = a - xh;

return[xh, xl];

}

using the constant

, which equates to half the length of the mantissa, which does not lead to an overflow of numbers in the process of multiplication and divides the mantissa into two halves. For example, with a word length of 64-bit, the length of the mantissa is 53 and then s = 27.

Thus, Dekker cited the almost complete set required for computation in double-word arithmetic. Since there it was also indicated how to multiply, divide and square two double-word numbers.

It had a quickTwoSum algorithm everywhere for summing two double-word, and

was used. On modern processors, as described in[4], it is cheaper to use additional operations with numbers than branching the algorithm.Therefore, the following algorithm is now more appropriate for adding two single-word numbers

` function twoSum (a, b) {`

let s = a + b;

let a1 = s - b;

let b1 = s - a1;

let da = a - a1;

let db = b - b1;

return[s, da + db];

}

And here is the summation and multiplication of double-word numbers.

` function add22 (X, Y) {`

let S = twoSum (X[0], Y[0]);

let E = twoSum (X[1], Y[1]);

let c = S[1]+ E[0];

let V = quickTwoSum (S[0], c);

let w = V[1]+ E[1];

return quickTwoSum (V[0], w);

}

function mul22 (X, Y) {

let S = twoMult (X[0], Y[0]);

S[1]+ = X[0]* Y[1]+ X[1]* Y[0];

return quickTwoSum (S[0], S[1]);

}

Generally speaking, the most complete and exact list of algorithms for double-word arithmetic, theoretical error boundaries and practical implementation is described in reference[3]from 2017. Therefore, if interested, I strongly recommend going straight there. And in general, in[6]The algorithm for quadruple-word is given, and in[5]for a multicomponent extension of an arbitrary length. Only here, after each operation, the renormalization process is used, which is not always optimal for small dimensions, and the accuracy of calculations in QD is not strictly defined. In general, it is worthwhile to think about the limits of applicability of these approaches, of course.

### Horror stories jаvascript-a. Compare decimal.js vs bignumber.js vs big.js.

So it turned out that almost all libraries for exact calculations in js are written by one person. An illusion of choice is created, although they are almost all the same. In addition, the documentation does not explicitly state that if you do not round numbers after each multiply /divide operation, then the size of your number will always be doubled, and the complexity of the algorithm can grow into an easy x3500. For example, this might look like a comparison of their calculation time, if you did not round the numbers.

That is, you put accuracy in the 32 sign after the comma and Oops you already have 64 characters, 128. We believe very accurately! 25? 512 But I exhibited 32! 102? 2048 As that's how the overhead appears 3500 times. The documentation states that if you have scientific calculations, then you probably prefer decimal.js. Although in fact, if you just round it off periodically, for scientific calculations Bignumber.js works a little faster (see Figure 1). Who then needs to count the hundredths of a penny, if they can not be turned over? Is there any kind of case, when you need to store more decimal numbers and not get another few extra signs? How does it take a sine from such a monster number, when no one knows the strict accuracy of the convergence of the Taylor series for arbitrary numbers. In general, there are not unreasonable suspicions that it is possible to increase the calculation speed by using the Schönhage-Strassen multiplication algorithms and finding the sine with Cordic calculations, for example.

### Double.js

I would like to say, of course, that Double.js believes quickly and accurately. But this is not entirely true, that is, it is 10 times faster than it thinks, but that's not always accurate. For example, 0.3-0.1 it can process, moving to doubled storage and back. But here the number Pi is parsed with double precision in almost 32 characters and back it does not come out. An error occurs on the 16th, as if overflow occurs. In general, I encourage the js community to try to solve the parsing problem by joint efforts, since I am stuck. Tried to parse digitally and divide in double precision, as in QD, divide in batches of 16 digits and divide in double precision, split the mantissa using Big.js as in one of Julia's libs. Now I sin on a bug in .parseFloat (), because IEEE standards with rounding to the nearest integer are supported even with ECMAScript 1. Although of course you can try to zabindit binary buffer and watch every 0 and 1. In general, if it is possible to solve this problem, then it will then be possible to make calculations with arbitrary accuracy with an acceleration of x10-x20 from bignumber.js. However, the Mandelbrot set is already rendered qualitatively, and you can use it for geometric problems. Here is an interactive benchmark , reference to or and sandpox where you can play with it.

### The sources used are

O. Møller.

*Quasi double precision in floating-point arithmetic.*, 1965.Theodorus Dekker.

*A floating-point technique for extending the available precision*, 1971.[Viewer]Mioara Joldes, Jean-Michel Muller, Valentina Popescu.

*Tight and rigourous error bounds for basic building blocks of double-word arithmetic*, 2017.[PDF]Muller, J.-M. Brisebarre, N. de Dinechin, etc. Handbook of Floating-Point Arithmetic, Chapter 1? 2010.

Jonathan Shewchuk.

*Robust Adaptive Floating-Point Geometric Predicates*, 1964.[PDF]Yozo Hida, Xiaoye Li, David Bailey.

*Library for Double-Double and Quad-Double Arithmetic*, 2000.[PDF]
## Comments 1