Code profiling with LLVM

The Curse of Non-Determinism


 
Code profiling with LLVM  
My first attempt to write an LLVM pass - I love these segfolts
 
 
Recently, I ran into an interesting task - I needed a deterministic and cross-platform method for determining the execution time of C ++ code. By the word "deterministic" I mean that the same code will be executed for the same number of units of time. By cross-platform, I understand that the same code under Windows and under Ubuntu will be executed in the same amount of time units.
 
 
Naturally, the time measurement on the CPU does not satisfy these conditions. The machine code varies depending on the architecture and operating system, and the same code will take a different amount of time to execute. Even on the same machine, factors such as cache misses will play a large role - enough to distort the results of measuring the execution time of the same code. I needed something smarter
 
tool. PIN not suitable for our purposes. Instead, we decided to directly instrument the participant code in order to count the number of instructions to be executed. Instead of tooling the binaries (machine code), we decided to use Clang to compile the code and tool the LLVM bytecode.
 
 
If you are not familiar with LLVM, it is a collection of modular and reusable compiler and toolchain technologies. One of the main projects is LLVM IR and backend. In simple terms, an intermediate representation (Intermediate Representation) has been developed, into which the compile front-end compiles the code. This code, LLVM IR, is then compiled into the machine code by the LLVM backend. Thus, if you are making a new language, you can decide to allow LLVM to support the generation of machine code and its optimization, and write a front-end to convert your language to LLVM IR.
 
 
 
Clang converts C ++ code to LLVM IR, which is then converted to native code by the LLVM backend.
 
 
Clang is an LLVM frontend. Since we need a cross-platform code measurement method, we cannot instruct a binary code. LLVM IR, however, is platform-independent, as it is only an intermediate presentation of code. Instrumenting IR code using LLVM libraries is a simple cross-platform solution.
 
 

Solution


 
A simple calculation of LLVM IR instructions is obviously not appropriate, since we need the number of instructions to be actually executed, and not just the number of instructions in the code. In the end, we developed a simple algorithm based on the concept of basic code blocks.
 
 
The base block is a set of instructions in which the input point can be only the first instruction, and the output point is only the last instruction. (3r378. Any transitions inside the base unit are also prohibited - approx. Transl. 3r379.) To understand this, try splitting the code into pieces in which the branch instructions (transitions, cycles and returns) can only be the last in the set, and the input to the block (the first instruction in a function, loop, or if /else block) is possible only on the first instruction. The result is a set of basic blocks - blocks of sequential code, which is simply executed sequentially, without making decisions about which instruction is executed next.
 
 
Why don't we try it right now? This is a code snippet provided by Code Character: 3r38282.  
 
//Make the soldiers who weren’t patrolling the enemy
for (int i = NUM_SOLDIERS /2; i 3r3-33120. auto & soldier = state.soldiers[i];
if (soldier.hp == 0) //If this soldier is dead, skip it
continue;
for (auto enemy_soldier: state.enemy_soldiers) {
if (enemy_soldier.hp! = 0) {//Ensure your prospective target has
//not already been slain
soldier.soldier_target = enemy_soldier.id;
break ; 3r33232.} 3r33292.} 3r33292.} 3r33256. 3r33257. 3r33232.  
Link to Github: https://gist.github.com/JHurricane96/8c9c3a45ec5e969de4d9fecb47ebef69#file-player_code-cpp
 
 
Using the fact that the base unit has only one entry point (the first instruction), we can divide this fragment into the following basic blocks: 3r-3282.  
 
//-------------------------------- BB 1 -------------- --------------------
for (int i = NUM_SOLDIERS /2; i 3r3-33120. //-------------------------------- BB 2 - --------------------------------
Auto & soldier = state.soldiers[i];
If (soldier.hp == 0)
//-------------------------------- BB 3 -------- --------------------------
Continue;
//-------------------- ---------------- BB 4 -------------------------------- -
For (auto enemy_soldier: state.enemy_soldiers) {
//---------------------------------------- BB 5 ----------------------------------
If (by the enemy_soldier.hp! = 0) {
//-------------------------------- BB 6 -------------- --------------------
Soldier.soldier_target = enemy_soldier.id;
Break;
//------------ -------------------- BB 7 ---------------------------- ------
} 3r33232.} 3r33292.} 3r33256. 3r33257.
 
Link to Github: https://gist.github.com/JHurricane96/92452a27acb49ebe84a55a3bee46fe3e#file-player_code-cpp
 
This has helped us understand how the basic blocks work, now consider this algorithm in LLVM IR: 3r-3282.  

;
: 140:; preds =% 18?% 139
% 141 = load i3? i32 *% 1? align 4
% 142 = sext i32% 141 to i64
% 143 = icmp slt i64% 14? 20
br i1% 14? label% 14? label% 184
;
: 144:; preds =% 140
% 145 = getelementptr inbounds% "struct.player_state :: State",
% "struct.player_state :: State" *% ? i32 ? i???r3323292. % 146 = load i3? i32 *% 1? align 4
% 147 = sext i32% 146 to i64
% 148 = call dereferenceable (72)% "struct.player_state :: Soldier" *
@ _ZNSt5arrayIN12player_state7SoldierELm20EEixEm (
% "Struct.std :: array.10" *% 14? i64% 147) # 3
store% "struct.player_state :: Soldier" *% 14? 3r-3232. % "struct.player_state :: Soldier" **% 2? align 8
% 149 = load% "struct.player_state :: Soldier" *, 3r-3292. % "struct.player_state :: Soldier" **% 2? align 8
% 150 = getelementptr inbounds% "struct.player_state :: Soldier",
% "struct.player_state :: Soldier" *% 14? i32 ? i???r3r3292. % 151 = load i6? i64 *% 15? align 8
% 152 = icmp eq i64% 15? 0
br i1% 15? label% 15? label% 154
;
: 153:; preds =% 144
br label% 181

 
Link to Github: https://gist.github.com/JHurricane96/743e0611319d63516014ad3c7a2dcf42#file-player_code-s
 
 
If you look carefully, you will notice that the code snippet above is the first three blocks of a C ++ snippet compiled into LLVM IR (each line is the beginning of the base block).
 
 
LLVM has libraries that allow us to write “passages” —a code that can transform LLVM IR. The LLVM API allows us to easily read and analyze the LLVM IR by iterating over modules, functions, and base blocks, and modifying the LLVM IR before it is compiled into native code.
 
 
Now we have the basic blocks and the LLVM API, it becomes simple to count the number of instructions to be executed using such a simple algorithm: 3r-3282.  
 
1. Write the function IncrementCount, which accepts an integer and increments a static integer by this value, with each call. It must be linked to the participant code.
 
 
2. We make iterations on all basic blocks. For each base unit, perform steps 3 and 4 3r328282.  
 
3. Find n - the number of instructions in this base unit.
 
 
4. Insert the call to the IncrementCount function before the last instruction of the base unit, with the argument n.
 
 
5. The static integer with which IncrementCount works, and will be the counter of instructions after the code is executed. It can be saved somewhere and then checked.
 
 
Our instrumented IR works like this:
 
 
;
: 140:; preds =% 18?% 139
% 141 = load i3? i32 *% 1? align 4
% 142 = sext i32% 141 to i64
% 143 = icmp slt i64% 14? 20
call void @ _Z14IncrementCountm (i32 4)
br i1% 14? label% 14? label% 184
;
: 144:; preds =% 140
% 145 = getelementptr inbounds% "struct.player_state :: State",
% "struct.player_state :: State" *% ? i32 ? i???r3323292. % 146 = load i3? i32 *% 1? align 4
% 147 = sext i32% 146 to i64
% 148 = call dereferenceable (72)% "struct.player_state :: Soldier" *
@ _ZNSt5arrayIN12player_state7SoldierELm20EEixEm (
% "Struct.std :: array.10" *% 14? i64% 147) # 3
store% "struct.player_state :: Soldier" *% 14? 3r-3232. % "struct.player_state :: Soldier" **% 2? align 8
% 149 = load% "struct.player_state :: Soldier" *, 3r-3292. % "struct.player_state :: Soldier" **% 2? align 8
% 150 = getelementptr inbounds% "struct.player_state :: Soldier",
% "struct.player_state :: Soldier" *% 14? i32 ? i???r3r3292. % 151 = load i6? i64 *% 15? align 8
% 152 = icmp eq i64% 15? 0
call void @ _Z14IncrementCountm (i???)
br i1% 15? label% 15? label% 154
;
: 153:; preds =% 144
call void @ _Z14IncrementCountm (i32 1)
br label% 181

 
 
Link to Github: https://gist.github.com/JHurricane96/368c7ed79a113be860a73aeaadad371d#file-player_code-s
 
 
As we see, the call to IncrementCount is made at the end of each base block right before the last instruction. Using the static int with which IncrementCount works, we can get the number of instructions at the end of each participant’s turn. This method is deterministic and cross-platform, because the same source code is guaranteed to generate the same LLVM IR if we use the same version of the compiler and the same flags.
 
 
Conclusion 3r33273.
 
Code profiling is not as simple as I once thought. In the process of working on such a relatively simple task, I learned how the compiler works and how to write LLVM passages. If you are interested in generating code and you want to write your own passages, LLVM has 3r-3276. Beginner's Guide
. There is also a great article in the blog I used when writing my own pass. Since the LLVM API has no reversecompatibility between major versions, pay attention to which version of LLVM you are using.
 
 
The source code of the passage you can take here is .
+ 0 -

Add comment