Time measurement with nanosecond precision

A couple of months ago I reached a turning point: the standard operating system tools for measuring time were no longer enough for me. I needed to measure time with nanosecond accuracy and with nanosecond overhead.

I decided to write a library that would solve this problem. At first glance there seemed to be little to do. But on closer examination, as always, there turned out to be many interesting problems to deal with. In this article I will describe those problems and how they were solved.

Since many different kinds of time can be measured on a computer, let me clarify that this article is about "stopwatch time", also known as wall-clock time, real time or elapsed time. That is, ordinary "human" time that we note at the start of a task and again at its end.

The library is available here: https://github.com/AndreyNevolin/wtmlib

It builds and runs only on Linux.

The code contains the implementation details of all the methods discussed below.

Evaluating TSC reliability

The library provides an interface that returns two estimates:

1. The maximum shift between counters belonging to different CPUs. Only CPUs available to the process are considered. For example, if three CPUs are available to the process and at the same moment their TSCs read 50, 150 and 20, then the maximum shift is 150 - 20 = 130. Naturally, the library cannot determine the true maximum shift experimentally, but it gives a bound within which this shift fits. What to do with the estimate is up to the client code, but the meaning is roughly as follows. The maximum shift is the largest amount by which a measurement made by the client code can be distorted. In our example with three CPUs, suppose the client code started measuring time on CPU3 (where the TSC was 20) and finished on CPU2 (where the TSC was 150). Then an extra 130 ticks creep into the measured interval, and never more than that. Between CPU1 and CPU2 the difference would be only 100 ticks. Given an estimate of 130 ticks (in practice it will be much more conservative), the client can decide whether this amount of distortion is acceptable.
2. Whether TSC values measured one after another on the same or on different CPUs increase. The idea is as follows. Suppose we have several CPUs whose clocks are synchronized and tick at the same frequency. Then, if we measure time on one CPU and then measure it again on any of the available CPUs, the second value must be greater than the first.

   Below I will call this estimate the TSC monotonicity estimate.

Let us now see how the first estimate can be obtained:

1. One of the CPUs available to the process is declared the "base" CPU.
2. Then all the other CPUs are iterated over, and for each of them the shift `TSC_on_current_CPU - TSC_on_base_CPU` is calculated. This is done as follows:
   a) Take three values measured successively (one right after another!): `TSC_base_1`, `TSC_current`, `TSC_base_2`. Here `current` means that the value was measured on the current CPU, and `base` that it was measured on the base CPU.
   b) The shift `TSC_on_current_CPU - TSC_on_base_CPU` must lie in the range `[TSC_current - TSC_base_2, TSC_current - TSC_base_1]`. This assumes that the TSC ticks at the same frequency on both CPUs.
   c) Steps a) and b) are repeated several times, and the intersection of all the intervals obtained in step b) is calculated. The resulting interval is taken as the estimate of the shift `TSC_on_current_CPU - TSC_on_base_CPU`.
3. Once the shift of every CPU relative to the base one has been estimated, it is easy to estimate the maximum shift between all available CPUs:
   a) The smallest interval that contains all the intervals obtained in step 2 is calculated.
   b) The width of this interval is taken as the estimate of the maximum shift between TSCs ticking on different CPUs.

To evaluate monotonicity, the library implements the following algorithm:

1. Suppose N CPUs are available to the process.
2. Measure the TSC on CPU1.
3. Measure the TSC on CPU2.
4. ...
5. Measure the TSC on CPUN.
6. Measure the TSC on CPU1 again.
7. Check that the measured values increase monotonically from first to last.

It is important here that the first and last values are measured on the same CPU. Here is why. Suppose we have 3 CPUs, the TSC on CPU2 is shifted by +100 ticks relative to the TSC on CPU1, and the TSC on CPU3 is shifted by +100 ticks relative to the TSC on CPU2. Consider the following chain of events:

- Read the TSC on CPU1. Let the value be 10.
- 2 ticks pass.
- Read the TSC on CPU2. It must be 112.
- 2 ticks pass.
- Read the TSC on CPU3. It must be 214.

So far the clocks look synchronized. But let's measure the TSC on CPU1 again:

- 2 ticks pass.
- Read the TSC on CPU1. It must be 16.

Oops! Monotonicity is broken. So measuring the first and last values on the same CPU makes it possible to detect more or less large shifts between the clocks. The next question is, of course, how large. The size of the shift that can be detected depends on the time that passes between successive TSC measurements. In the example above it is just 2 ticks, so shifts between clocks greater than 2 ticks will be detected. Generally speaking, shifts smaller than the time elapsed between successive measurements will not be detected. So the more densely in time the measurements are made, the better. The accuracy of both estimates depends on this. The denser the measurements:

- the lower the estimate of the maximum shift
- the greater the confidence in the monotonicity estimate

In the next section we will talk about how to make dense measurements. Here I will add that while computing the TSC reliability estimates, the library performs many other simple sanity checks, for example:

- a limited check that the TSCs on different CPUs tick at the same rate
- a check that the counters actually change over time rather than always showing the same value

Two methods of collecting counter values

The library implements two methods of collecting TSC values:

1. **Switching between CPUs.** In this method, all the data needed to evaluate TSC reliability is collected by a single thread that "jumps" from one CPU to another. Both algorithms described in the previous section are suitable for this method and are not suitable for the other one.

   Switching between CPUs is of no practical use. The method was implemented just for fun. Its problem is that moving a thread from one CPU to another takes a very long time. Accordingly, a lot of time passes between successive TSC measurements, and the accuracy of the estimates is very low. For example, a typical estimate of the maximum shift between TSCs is in the low tens of thousands of ticks.

   Nevertheless, the method has a couple of advantages:
   - It is completely deterministic. If we need to measure the TSC on CPU1, CPU2 and CPU3 in that order, we simply do it: switch to CPU1, read the TSC, switch to CPU2, read the TSC, and finally switch to CPU3 and read the TSC.
   - Presumably, as the number of CPUs in a system grows, the time needed to switch between them grows much more slowly. So in theory there may be a system, a very large one, on which the method is justified. But that is still unlikely.
2. **Measurements ordered by CAS.** In this method, data is collected in parallel by multiple threads, one thread running on each available CPU. The measurements made by different threads are arranged into a single sequence using the compare-and-swap operation. A piece of code showing how this is done is given below.

   The idea of the method is borrowed from fio, a popular tool for generating I/O loads.

   The reliability estimates obtained with this method already look quite good. For example, the estimate of the maximum shift comes out at a few hundred ticks, and the monotonicity check can catch clock desynchronization of hundreds of ticks.

   However, the algorithms given in the previous section are not suitable for this method. They require the TSC values to be measured in a predetermined order, which "measurements ordered by CAS" does not allow. Instead, a long sequence of randomly ordered measurements is collected first, and then (different) algorithms try to find in this sequence values read on "suitable" CPUs.

   I will not present these algorithms here, so as not to abuse your attention. They can be found in the code, which is heavily commented. Conceptually, these algorithms are the same. What is fundamentally new is a check of how statistically "good" the randomly collected TSC sequences are. It is also possible to set a minimum acceptable level of statistical significance for the TSC reliability estimates.

   In theory, on VERY large systems, "measurements ordered by CAS" may give poor results. The method requires the processors to compete for access to a shared memory location. If there are very many processors, the contention can be severe, and it will be hard to build a measurement sequence with good statistical properties. At the moment, however, such a situation seems unlikely.

I promised some code. Here is how the measurements are arranged into a single chain with the help of CAS:

```c
for (uint64_t i = 0; i < arg->probes_count; i++)
{
    uint64_t seq_num = 0;
    uint64_t tsc_val = 0;

    do
    {
        /* Read the current sequence number */
        __atomic_load(seq_counter, &seq_num, __ATOMIC_ACQUIRE);
        /* Full barrier: keep the TSC read below the load above */
        __sync_synchronize();
        tsc_val = WTMLIB_GET_TSC();
        /* Claim seq_num only if nobody else has claimed it in the meantime */
    } while (!__atomic_compare_exchange_n(seq_counter, &seq_num, seq_num + 1, false,
                                          __ATOMIC_ACQ_REL, __ATOMIC_RELAXED));

    arg->tsc_probes[i].seq_num = seq_num;
    arg->tsc_probes[i].tsc_val = tsc_val;
}
```

This code is executed on every available CPU. All threads have access to the shared variable `seq_counter`. Before reading the TSC, a thread reads the value of this variable and stores it in the variable `seq_num`. Then it reads the TSC. Then it tries to atomically increment `seq_counter` by one, but only if the value of the variable has not changed since it was read. If the operation succeeds, the thread has managed to "stake out" the sequence number stored in `seq_num` for the measured TSC value. The next sequence number that can be staked out (possibly by another thread) will be greater by one, because that number is taken from `seq_counter`, and every successful call of `__atomic_compare_exchange_n()` increases this variable by one.
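The library's own algorithms for CAS-ordered measurements are not shown in this article. Purely as an illustration, here is a minimal sketch of how the interval method from the previous section might be applied to such a sequence: the probes from all threads are merged by sequence number, and every run of three neighbouring probes taken on the base, current and base CPU contributes one interval. The `cpu` field in `probe_t` and the names `sort_probes` and `estimate_shift` are assumptions made for this example, not names from wtmlib.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical probe record: the sequence number claimed via CAS, the CPU the
 * probe was taken on, and the TSC value that was read */
typedef struct {
    uint64_t seq_num;
    int      cpu;
    uint64_t tsc_val;
} probe_t;

/* Estimated range for (TSC on current CPU - TSC on base CPU) */
typedef struct {
    int64_t lo;
    int64_t hi;
    bool    found; /* false if the sequence contained no suitable triple */
} shift_range_t;

static int cmp_by_seq(const void *a, const void *b)
{
    uint64_t x = ((const probe_t *)a)->seq_num;
    uint64_t y = ((const probe_t *)b)->seq_num;
    return (x > y) - (x < y);
}

/* Merge step: put the probes gathered by all threads into one global order */
static void sort_probes(probe_t *probes, size_t n)
{
    qsort(probes, n, sizeof(probes[0]), cmp_by_seq);
}

/* Scan a sorted sequence for neighbouring (base, current, base) triples and
 * intersect the intervals [current - base_2, current - base_1] they yield */
static shift_range_t estimate_shift(const probe_t *p, size_t n,
                                    int base_cpu, int cur_cpu)
{
    assert(p != NULL || n == 0);
    shift_range_t r = { INT64_MIN, INT64_MAX, false };

    for (size_t i = 0; i + 2 < n; i++) {
        if (p[i].cpu != base_cpu || p[i + 1].cpu != cur_cpu ||
            p[i + 2].cpu != base_cpu)
            continue;
        int64_t lo = (int64_t)(p[i + 1].tsc_val - p[i + 2].tsc_val);
        int64_t hi = (int64_t)(p[i + 1].tsc_val - p[i].tsc_val);
        if (lo > r.lo) r.lo = lo;
        if (hi < r.hi) r.hi = hi;
        r.found = true;
    }
    return r;
}
```

If `lo` ends up greater than `hi`, the collected intervals contradict each other, which might indicate that the counters do not tick at the same rate.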

**`__atomic` together with `__sync`?** To be pedantic, it should be noted that using built-in functions of the `__atomic` family together with a function from the obsolete `__sync` family looks ugly. `__sync_synchronize()` is used in the code to prevent the TSC read from being reordered with the operations above it. This requires a full memory barrier. Formally, the `__atomic` family has no function with these properties. In fact there is one: `__atomic_signal_fence()`. This function orders a thread's computations with respect to signal handlers running in the same thread. In essence, within a single thread this works as a full barrier. However, that is not stated explicitly, and I prefer code with no hidden semantics. Hence `__sync_synchronize()`, which is definitely a full memory barrier.

Another point worth mentioning is making sure that all measuring threads start at more or less the same time. We want the TSC values read on different CPUs to be interleaved as thoroughly as possible. We are not happy with a situation where, for example, one thread starts first, finishes its work, and only then all the others start. The resulting TSC sequence would have no useful properties, and no estimates could be extracted from it. A simultaneous start of all threads is important, and the library takes measures to ensure it.

Converting ticks to nanoseconds on the fly

After checking TSC reliability, the library's second major purpose is converting ticks to nanoseconds on the fly. I borrowed the idea of this conversion from the already mentioned fio. However, I had to make several significant improvements because, as my analysis showed, the conversion procedure in fio itself does not work well enough: its accuracy is low.

Let's start right away with an example. Ideally, I would like to convert ticks to nanoseconds like this:

`ns_time = tsc_ticks / tsc_per_ns`

We want the time spent on the conversion to be minimal, so we aim to use only integer arithmetic. Let's see what that may cost us.

If `tsc_per_ns = 3`, then simple integer division works great in terms of accuracy: `ns_time = tsc_ticks / 3`.

But what if `tsc_per_ns = 3.333`? If this number is rounded to 3, the conversion accuracy will be very low. The problem can be overcome as follows:

`ns_time = (tsc_ticks * factor) / (3.333 * factor)`

If the multiplier `factor` is large enough, the accuracy will be good. But something bad remains, namely the overhead of the conversion. Integer division is a very expensive operation. For example, on x86 it takes 10+ cycles. In addition, integer division operations are not always pipelined.

Let's rewrite our formula in an equivalent form:

`ns_time = (tsc_ticks * factor / 3.333) / factor`

The first division is not a problem: we can compute `(factor / 3.333)` in advance. But the second division is still painful. To get rid of it, let's choose `factor` equal to a power of two. Then the second division can be replaced with a bit shift, a simple and fast operation.

How large can `factor` be? Unfortunately, it cannot be arbitrarily large. It is limited by the condition that the multiplication in the numerator must not overflow a 64-bit type. Yes, we want to use only native types, again to keep the conversion overhead to a minimum.

Now let's see how large `factor` can be in our particular example. Suppose we want to work with time intervals of up to one year. In a year the TSC ticks the following number of times: `3.333 * 1000000000 * 60 * 60 * 24 * 365 = 105109488000000000`. Divide the maximum value of a 64-bit type by this number: `18446744073709551615 / 105109488000000000 ~ 175.5`. Thus, the expression `(factor / 3.333)` must not exceed this value. Then we have: `factor <= 175.5 * 3.333 ~ 584.9`. The largest power of two that does not exceed this number is 512. Consequently, our conversion formula takes the form:

`ns_time = (tsc_ticks * 512 / 3.333) / 512`

Or:

`ns_time = tsc_ticks * 153 / 512`
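The search for the largest admissible power of two can be automated. Below is a minimal sketch, assuming the TSC rate is given as an integer number of ticks per second (3333000000 for the 3.333 ticks per nanosecond of the example) rather than as a fraction. It checks the precomputed multiplier directly against the overflow limit instead of bounding `factor`. The names `conv_params_t` and `pick_params` are made up for this illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Parameters of the conversion ns = (ticks * mult) >> shift */
typedef struct {
    unsigned shift; /* factor = 2^shift */
    uint64_t mult;  /* factor * 10^9 / tsc_per_sec, rounded down */
} conv_params_t;

/* Pick the largest power-of-two factor such that max_ticks * mult still fits
 * into 64 bits. Sketch only: it assumes that at least shift = 0 is admissible */
static conv_params_t pick_params(uint64_t tsc_per_sec, uint64_t max_ticks)
{
    assert(tsc_per_sec > 0 && max_ticks > 0);

    const uint64_t ns_per_sec = 1000000000ULL;
    const uint64_t mult_limit = UINT64_MAX / max_ticks;
    conv_params_t best = { 0, ns_per_sec / tsc_per_sec };

    /* 2^33 * 10^9 still fits into 64 bits, so the loop stops at shift 33 */
    for (unsigned shift = 1; shift <= 33; shift++) {
        uint64_t mult = ((uint64_t)1 << shift) * ns_per_sec / tsc_per_sec;
        if (mult > mult_limit)
            break;
        best.shift = shift;
        best.mult = mult;
    }
    return best;
}
```

For the one-year example, `pick_params(3333000000ULL, 105109488000000000ULL)` yields a shift of 9 (a factor of 512) and a multiplier of 153, the same parameters as in the formula above.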

Great. Now let's see how accurate this formula is. One year contains `1000000000 * 60 * 60 * 24 * 365 = 31536000000000000` nanoseconds. Our formula gives `105109488000000000 * 153 / 512 = 31409671218750000`. The difference from the true value is 126328781250000 nanoseconds, or `126328781250000 / 1000000000 / 60 / 60 ~ 35` hours.

That is a big error. We want better accuracy. What if we measure time intervals of no more than an hour? I will omit the calculations, as they are completely identical to the ones just done. The final formula is:

`ns_time = tsc_ticks * 1258417 / 4194304` (1)

The conversion error is only 119305 nanoseconds per hour (less than 0.2 milliseconds). Very, very good. If the maximum convertible value is even less than an hour, the accuracy is better still. But how do we use this? Surely we do not want to limit time measurements to one hour?

Note the following:

`tsc_ticks = (tsc_ticks_per_1_hour * number_of_hours) + tsc_ticks_remainder`

If we compute `tsc_ticks_per_1_hour` in advance, we can extract `number_of_hours` from `tsc_ticks`. Furthermore, we know how many nanoseconds one hour contains. So it is easy to convert to nanoseconds the part of `tsc_ticks` that corresponds to a whole number of hours. To complete the conversion, we need to convert `tsc_ticks_remainder` to nanoseconds. But we know that this number of ticks elapsed in less than an hour, so we can convert it with formula (1).

Done. This conversion mechanism suits us. Let's now generalize and optimize it.

First of all, we want flexible control over the conversion error. We do not want to tie the conversion parameters to a time interval of 1 hour. Let it be an arbitrary time interval:

`tsc_ticks = modulus * number_of_moduli_periods + tsc_ticks_remainder`
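Before generalizing, the one-hour scheme can be written down directly. This is only a sketch for the example rate of 3.333 ticks per nanosecond. The macro and function names are invented for the illustration, and the split into hours still uses division and modulo, which the generalized scheme gets rid of.

```c
#include <assert.h>
#include <stdint.h>

#define TSC_PER_SEC    3333000000ULL           /* 3.333 ticks per nanosecond */
#define TICKS_PER_HOUR (TSC_PER_SEC * 3600ULL)
#define NS_PER_HOUR    3600000000000ULL
#define MULT           1258417ULL              /* numerator of formula (1) */
#define SHIFT          22                      /* 4194304 = 2^22 */

/* Convert whole hours exactly and the sub-hour remainder with formula (1) */
static uint64_t ticks_to_ns_hourly(uint64_t ticks)
{
    uint64_t hours = ticks / TICKS_PER_HOUR;
    uint64_t remainder = ticks % TICKS_PER_HOUR;

    /* The remainder is below one hour, so this product cannot overflow */
    assert(remainder <= UINT64_MAX / MULT);

    return hours * NS_PER_HOUR + ((remainder * MULT) >> SHIFT);
}
```

With this split, a full year of ticks (105109488000000000) converts to exactly 31536000000000000 nanoseconds, because only the part below one hour is subject to the error of formula (1).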

Recall how the remainder is converted to nanoseconds:

`ns_per_remainder = (tsc_ticks_remainder * factor / tsc_per_nsec) / factor`

Let's calculate the conversion parameters (we know that `tsc_ticks_remainder < modulus`):

`modulus * (factor / tsc_per_nsec) <= UINT64_MAX`

`factor <= (UINT64_MAX / modulus) * tsc_per_nsec`

`2 ^ shift <= (UINT64_MAX / modulus) * tsc_per_nsec`

To be pedantic, the last inequality is not equivalent to the first one in integer arithmetic. I will not dwell on this. I will only say that the last inequality is stricter than the first and therefore safe to use.

Having obtained `shift` from the last inequality, we calculate:

`factor = 2 ^ shift`

`mult = factor / tsc_per_nsec`

These parameters are then used to convert the remainder to nanoseconds:

`ns_per_remainder = (tsc_ticks_remainder * mult) >> shift`

So, the conversion of the remainder is sorted out. The next problem is extracting `tsc_ticks_remainder` and `number_of_moduli_periods` from `tsc_ticks`. As always, we want to do it quickly, and as always, we do not want to use division. So we simply choose `modulus` equal to a power of two:

`modulus = 2 ^ remainder_bit_length`

Then:

`number_of_moduli_periods = tsc_ticks >> remainder_bit_length`

`tsc_ticks_remainder = tsc_ticks & (modulus - 1)`
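The two formulas translate into C almost literally. A small sketch follows. The function name `split_ticks` is an assumption for this example.

```c
#include <assert.h>
#include <stdint.h>

/* Split a tick count into whole modulus periods and a remainder using only
 * a shift and a mask, where modulus = 2^remainder_bit_length */
static void split_ticks(uint64_t tsc_ticks, unsigned remainder_bit_length,
                        uint64_t *number_of_moduli_periods,
                        uint64_t *tsc_ticks_remainder)
{
    assert(remainder_bit_length > 0 && remainder_bit_length < 64);

    uint64_t modulus = (uint64_t)1 << remainder_bit_length;

    *number_of_moduli_periods = tsc_ticks >> remainder_bit_length;
    *tsc_ticks_remainder = tsc_ticks & (modulus - 1);
}
```

For instance, with `remainder_bit_length = 8` the modulus is 256, and 1000 ticks split into 3 whole periods and a remainder of 232, since 3 * 256 + 232 = 1000.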

Excellent. We now know how to extract `number_of_moduli_periods` and `tsc_ticks_remainder` from `tsc_ticks`, and we know how to convert `tsc_ticks_remainder` to nanoseconds. It remains to figure out how to convert to nanoseconds the part of the ticks that is a multiple of `modulus`. But this is simple:

`ns_per_moduli = ns_per_modulus * number_of_moduli_periods`

`ns_per_modulus` can be calculated in advance, using the same formula with which we convert the remainder. That formula can be used for time intervals no longer than `modulus`, and `modulus` itself is, naturally, no longer than `modulus`.

`ns_per_modulus = (modulus * mult) >> shift`

That's it! We have managed to precompute all the parameters needed to convert ticks to nanoseconds on the fly. Let's briefly summarize the conversion procedure:

1. We have `tsc_ticks`.
2. `number_of_moduli_periods = tsc_ticks >> remainder_bit_length`
3. `tsc_ticks_remainder = tsc_ticks & (modulus - 1)`
4. `ns = ns_per_modulus * number_of_moduli_periods + ((tsc_ticks_remainder * mult) >> shift)`

In this procedure, the parameters `remainder_bit_length`, `modulus`, `ns_per_modulus`, `mult` and `shift` are computed in advance.

If you are still reading this post, well done. It is even possible that you are a performance analyst or a developer of high-performance software.

So, it turns out that we are not finished yet :)

Remember how we calculated the parameter `mult`? It was like this:

`mult = factor / tsc_per_nsec`

Question: where does `tsc_per_nsec` come from?

The number of ticks per nanosecond is a very small quantity. In fact, my library uses `(tsc_per_sec / 1000000000)` instead of `tsc_per_nsec`. That is:

`mult = factor * 1000000000 / tsc_per_sec`
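Putting the pieces together, the whole procedure might look roughly like the sketch below. It is an illustration of the formulas above, not the library's actual code. The structure `tsc_conv_t` and the functions `tsc_conv_init` and `tsc_to_ns` are hypothetical names, and the search for `shift` is capped at 33 simply so that `2^shift * 10^9` stays within 64 bits.

```c
#include <assert.h>
#include <stdint.h>

#define NS_PER_SEC 1000000000ULL

/* Precomputed conversion parameters (hypothetical layout) */
typedef struct {
    unsigned remainder_bit_length;
    uint64_t modulus;        /* 2^remainder_bit_length */
    unsigned shift;          /* factor = 2^shift */
    uint64_t mult;           /* factor * 10^9 / tsc_per_sec */
    uint64_t ns_per_modulus; /* (modulus * mult) >> shift */
} tsc_conv_t;

/* Precompute everything once, outside the hot path */
static tsc_conv_t tsc_conv_init(uint64_t tsc_per_sec, unsigned remainder_bit_length)
{
    assert(tsc_per_sec > 0);
    assert(remainder_bit_length > 0 && remainder_bit_length < 64);

    tsc_conv_t c;
    c.remainder_bit_length = remainder_bit_length;
    c.modulus = (uint64_t)1 << remainder_bit_length;

    /* modulus * mult must not overflow 64 bits */
    const uint64_t mult_limit = UINT64_MAX / c.modulus;
    c.shift = 0;
    c.mult = NS_PER_SEC / tsc_per_sec;
    for (unsigned s = 1; s <= 33; s++) {
        uint64_t m = ((uint64_t)1 << s) * NS_PER_SEC / tsc_per_sec;
        if (m > mult_limit)
            break;
        c.shift = s;
        c.mult = m;
    }
    c.ns_per_modulus = (c.modulus * c.mult) >> c.shift;
    return c;
}

/* Hot path: two shifts, a mask, two multiplications and an addition */
static inline uint64_t tsc_to_ns(const tsc_conv_t *c, uint64_t tsc_ticks)
{
    uint64_t periods = tsc_ticks >> c->remainder_bit_length;
    uint64_t remainder = tsc_ticks & (c->modulus - 1);

    return c->ns_per_modulus * periods + ((remainder * c->mult) >> c->shift);
}
```

A caller would run `tsc_conv_init()` once with a measured `tsc_per_sec` and a chosen `remainder_bit_length`, and then call `tsc_to_ns()` on differences between TSC readings. Because every step rounds down, the result never exceeds the true value.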

Two interesting questions arise here:

1. Why `tsc_per_sec` and not, say, `tsc_per_msec`?
2. Where do we get `tsc_per_sec` from?

Let's start with the first. fio currently uses the number of ticks per millisecond, and this causes problems. On the machine I used, `tsc_per_msec = 2599998`, while `tsc_per_sec = 2599998971`. Brought to the same scale, these numbers have a ratio very close to one. But if we use the first rather than the second, we get an error of 374 nanoseconds for every second. Hence `tsc_per_sec`.

Next: how is `tsc_per_sec` calculated? It is done by direct measurement:

`start_system_time = clock_gettime()`

`start_tsc = WTMLIB_GET_TSC()`

`wait for some time`

`end_system_time = clock_gettime()`

`end_tsc = WTMLIB_GET_TSC()`
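In C, one such measurement could be sketched as follows. This is an assumption-laden illustration rather than the library's code: `read_ticks()` stands in for `WTMLIB_GET_TSC()` and uses the `__rdtsc()` intrinsic on x86 (falling back to a nanosecond clock elsewhere, just so the sketch compiles), `CLOCK_MONOTONIC` is my choice of system clock, and `measure_tsc_per_sec` is a made-up name.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <assert.h>
#include <stdint.h>
#include <time.h>
#if defined(__x86_64__) || defined(__i386__)
#include <x86intrin.h>
#endif

/* Stand-in for WTMLIB_GET_TSC() */
static inline uint64_t read_ticks(void)
{
#if defined(__x86_64__) || defined(__i386__)
    return __rdtsc();
#else
    /* Not a TSC: a nanosecond clock used only to keep the sketch portable */
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * 1000000000ULL + (uint64_t)ts.tv_nsec;
#endif
}

/* One measurement of ticks per second over roughly wait_ms milliseconds */
static uint64_t measure_tsc_per_sec(long wait_ms)
{
    assert(wait_ms > 0);

    struct timespec start, end;
    struct timespec pause = { .tv_sec = wait_ms / 1000,
                              .tv_nsec = (wait_ms % 1000) * 1000000L };

    /* Same call order at both ends: system time first, then the TSC */
    clock_gettime(CLOCK_MONOTONIC, &start);
    uint64_t start_tsc = read_ticks();
    nanosleep(&pause, NULL);
    clock_gettime(CLOCK_MONOTONIC, &end);
    uint64_t end_tsc = read_ticks();

    int64_t elapsed_ns = (int64_t)(end.tv_sec - start.tv_sec) * 1000000000LL +
                         (end.tv_nsec - start.tv_nsec);
    assert(elapsed_ns > 0);

    /* Scale by the time that actually elapsed, not by the requested wait */
    double per_sec = (double)(end_tsc - start_tsc) * 1e9 / (double)elapsed_ns;
    return (uint64_t)(per_sec + 0.5);
}
```

A single call gives one noisy sample. Several samples would have to be collected and filtered before the value is used for conversion.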

"Some time" is a configurable parameter. It can be more than, less than or equal to one second. Say it is half a second, and suppose that the actual difference between `end_system_time` and `start_system_time` turned out to be 0.6 seconds. Then `tsc_per_sec = (end_tsc - start_tsc) / 0.6`.

The library collects several values of `tsc_per_sec` in this way. Then, using standard methods, it "cleans" them of statistical noise and obtains a single `tsc_per_sec` value that can be trusted.

In the measurement scheme above, the order of the `clock_gettime()` and `WTMLIB_GET_TSC()` calls matters. It is important that the same amount of time passes between the two `WTMLIB_GET_TSC()` calls as between the two `clock_gettime()` calls. Then system time can easily be correlated with TSC ticks, and the spread of `tsc_per_sec` values can really be considered random. With this measurement scheme, `tsc_per_sec` values deviate from the mean in either direction with equal probability, so standard filtering methods can be applied to them.

Conclusion

That is probably all.

But the topic of efficient time measurement does not end here. There are many nuances. For those interested, I suggest working through the following questions on your own:

- storing the conversion parameters in the cache or, even better, in registers
- how far can `modulus` be reduced (thereby increasing the conversion accuracy)?
- as we have seen, the conversion accuracy is affected not only by `modulus` but also by the length of the time interval to which the tick count refers (`tsc_per_msec` or `tsc_per_sec`). How can the influence of both factors be balanced?
- TSC on a virtual machine. Can it be used?
- using standard operating system structures to store time. For example, fio saves its nanoseconds in the standard Linux `timespec` format. Here is how it happens:

  `tp->tv_sec = nsecs / 1000000000ULL;`

  So TSC ticks are first converted to nanoseconds with a fast and efficient procedure, and then the entire gain is wiped out by the integer division needed to separate seconds from nanoseconds.

The methods discussed in this article make it possible to measure time on the scale of a second with an accuracy on the order of several tens of nanoseconds. This is the accuracy I actually observe when using my library.

Interestingly, fio, from which I borrowed some of the methods, loses about 700-900 nanoseconds on the scale of a second (and there are three reasons for this). On top of that, it loses conversion speed because it stores time in the standard Linux format. However, I hasten to reassure fio fans. I sent the developers a description of all the conversion problems I found: github.com/axboe/fio/issues/695. People are already working on it, and the problems should be fixed soon.

I wish you all many pleasant nanoseconds!


Author: weber
Publication date: 5-10-2018, 00:23
Category: Programming / High performance / Algorithms