C vs Go cycles and simple math

 3r3152. 3r3-31. When I was tired of C programming, like many, I was interested in the Go language. It is strongly typed, compiled, therefore sufficiently productive. And then I wanted to find out how confused the Go creators were on optimizing the work with cycles and numbers.
 3r3152.
 3r3152. To begin with, we look at how things are with C.
 3r3152.
 3r3152. We write such a simple code:
 3r3152.
 3r3152.
#include
3r3152. #include
3r3152. 3r3152. int main ()
{
uint64_t i; 3r3152. uint64_t j = 0; 3r3152. for (i = 1?00?000; i> 0; i--)
{
j ^ = i; 3r3152.}
printf ("% lun", j); 3r3152. return 0; 3r3152.}
3r3133.
 3r3152. Compile with O? disassemble:
 3r3152.
 3r3152.
564: 31 d2 xor% edx,% edx
566: b??? ??? mov $ 0x98968?% eax
56b: 0f 1f ??? nopl 0x0 (% rax,% rax, 1)
570: ??? c2 xor% rax,% rdx
573: ??? e??? sub $ 0x?% rax
577: 75 f7 jne 570
3r3152.
3r3133.
 3r3152. We get the execution time:
 3r3152.
 3r3152. real 0m?023s
 3r3152. user 0m???s
 3r3152. sys 0m???s
 3r3152.
 3r3152. It would seem that there is no where to accelerate, but we have the same modern processor, for such operations we have fast sse registers. We try the options gcc -mfpmath = sse -msse4.2 the same result.
 3r3152. Add -O3 and hurray:
 3r3152.
 3r3152.
57a: 66 0f 1f ??? nopw 0x0 (% rax,% rax, 1)
580: 83 c??? add $ 0x?% eax
583: 66 0f ef c8 pxor% xmm?% xmm1
587: 66 0f d4 c2 paddq% xmm?% xmm0
58b: 3d 40 4b 4c 00 cmp $ 0x4c4b4?% eax
590: 75 ee jne 580
3r3152.
3r3133.
 3r3152. It can be seen that SSE2 commands and SSE registers are used, and we get a triple performance boost: 3r3139.  3r3152.
 3r3152. real 0m?006s
 3r3152. user 0m?006s
 3r3152. sys 0m?000s
 3r3152.
 3r3152. Also on Go:
 3r3152.
 3r3152.
package main
3r3152. import "fmt"
3r3152. func main () {3r3r2152. i: = 0
j: = 0
for i = 1?00?000; i> 0; i-- {
j ^ = i
}
fmt.Println (j)
}
3r3133.
 3r3152.
0x000000000048211a <+42> : lea -0x1 (% rax),% rdx
0x000000000048211e <+46> : xor% rax,% rcx
0x0000000000482121 <+49> : mov% rdx,% rax
0x0000000000482124 <+52> : test% rax,% rax
0x0000000000482127 3-333130. : ja 0x48211a
3r3152.
3r3133.
 3r3152. Performance as in the case of C and O? also put gccgo the result is the same, but it works longer than the regular Go (???) compiler.
 3r3152.
 3r3152. Conclusion: Go developers still have something to work on, or maybe I just did something wrong. 3r3148. 3r3152. 3r3152. 3r3152. 3r3145. ! function (e) {function t (t, n) {if (! (n in e)) {for (var r, a = e.document, i = a.scripts, o = i.length; o-- ;) if (-1! == i[o].src.indexOf (t)) {r = i[o]; break} if (! r) {r = a.createElement ("script"), r.type = "text /jаvascript", r.async =! ? r.defer =! ? r.src = t, r.charset = "UTF-8"; var d = function () {var e = a.getElementsByTagName ("script")[0]; e.parentNode.insertBefore (r, e)}; "[object Opera]" == e.opera? a.addEventListener? a.addEventListener ("DOMContentLoaded", d,! 1): e.attachEvent ("onload", d ): d ()}}} t ("//mediator.mail.ru/script/2820404/"""_mediator") () (); 3r3146. 3r3152. 3r3148. 3r3152. 3r3152. 3r3152. 3r3152.
+ 0 -

Add comment