How Clang Compiles a Function

 3r3308. 3r3-31. I planned to write an article about how LLVM optimizes a function, but first you need to write how Clang translates C or C ++ to LLVM.
 3r3308.
 3r3308. How Clang Compiles a Function
 3r3308. lectures on cyclic optimization :
 3r3308.
 3r3308.
bool is_sorted (int * a, int n) {
for (int i = 0; i < n - 1; i++)
if (a[i]> a[i + 1])
return false;
return true;
} 3rr989.  3r3308. Since Clang does not make any optimizations, and since LLVM IR was originally designed to work with C and C ++, the conversion is relatively easy. I will use Clang ??? (or a close version, since this one has not yet been released) on x86-64.
 3r3308.
 3r3308. The command line is as follows:
 3r3308.
 3r3308.
clang ++ is_sorted.cpp -O0 -S -emit-llvm
 3r3308. In other words: compile the is_sorted.cpp file as C ++ and then we say the following LLVM toolchain: do not optimize, display the assembler, as a textual representation of LLVM IR. LLVM IR is volumetric, and cannot be quickly output or parsed, the binary bitcode format is always preferable if a person does not need to look at this code. Here is is full LLVM IR, we will look at it in parts.
 3r3308.
 3r3308. Let's start with the top of the file:
 3r3308.
; ModuleID = 'is_sorted.cpp'
source_filename = "is_sorted.cpp"
target datalayout = "e-m: e-i64: 64-f80: 128-n8: 16: 32: 64-S128"
target triple = "x86_64-unknown-linux-gnu"

 3r3308. The whole text between the semicolon and the end of the line is a comment, which means that the first line does nothing, but if you are interested, in LLVM a “module” is a compilation unit, a container for code and data. The second line should not bother us either. The third line describes some assumptions made by the compiler; they do not play a role in this article, but you can read more than 3r363. here is
. 3r3365. Target triple
listed in gcc-ism format and we will not need it later.
 3r3308.
 3r3308. The LLVM function has optional attributes: 3r-3295.  3r3308.
 3r3308.
; Function Attrs: noinline nounwind optnone uwtable
 3r3308. Some of them (like these) are supported by the front end, others are added later by optimization passes. These attributes have nothing to do with the meaning of the code, I will not discuss them here, but you can read about them here is if you're interested.
 3r3308.
 3r3308. And finally, our function:
 3r3308.
 3r3308.
define zeroext i1 @ _Z9is_sortedPii (i32 *% a, i32% n) # 0 {
 3r3308. “Zeroext” means that the return value of the function (i? one-bit integer) must be expanded with zeros in the backend, to the width required by ABI. Then comes the “augmented” (mangled) function name, then the parameter list is basically the same as in C ++ code, except that i32 defines a 32-bit variable. # 0 connects the function to attribute group at the end of the file.
 3r3308.
 3r3308. Here is the first base unit:
 3r3308.
 3r3308.
entry: 3r3308. % retval = alloca i? align 1
% a.addr = alloca i32 *, align 8
% n.addr = alloca i3? align 4
% i = alloca i3? align 4
store i32 *% a, i32 **% a.addr, align 8
Store i32% n, i32 *% n.addr, align 4
store i32 ? i32 *% i, align 4
br label% for.cond

 3r3308. Each LLVM instruction must be located inside the base unit: a set of instructions with one input at the beginning and one output at the end. The last instruction of the base unit should be termination instruction : “Dumping” into the next base unit is not allowed. Each function must have an input block that does not have predecessors (predecessors) that perform the transition to this block. This is other properties are checked during IR parsing, these checks can also be called multiple times during the compilation process by the “module verifier”. The verifier is useful for debugging when the pass generates the wrong IR.
 3r3308.
 3r3308. The first four instructions in this base block are alloca: allocating stack memory. The first three create variables that are implicitly created during compilation, the fourth - a loop variable. Variables allocated in this way can only be accessed via the load and store instructions. The following three instructions initialize three stack slots, a.addr and n.addr are initialized using the values ​​passed to the function as parameters, and i is initialized to zero. The return value does not need to be initialized, any code that is not undefined in C and C ++ will have to take care of this. The last instruction is an unconditional transition to the next base unit (we don’t worry about it yet, most of the unnecessary transitions will be removed by the LLVM backend).
 3r3308.
 3r3308. You might ask: Why does Clang allocate stack slots for a and n? Why doesn't he just use these values ​​directly? In this function, since a and n do not change, such a strategy will work, but this case will be taken into account by the optimizer, and is beyond the competence of Calng. If a and n can be modified, they should be in memory, and should not be SSA-values, which, by definition, can take on value only at one point in the program. Memory cells are outside the SSA world and can be modified at any time. This may seem strange, but this solution allows you to organize the work of all parts of the compiler in a natural and effective way.
 3r3308.
 3r3308. I think of Clang as a degenerate SSA code generator that satisfies all SSA requirements, but only because information is exchanged between the base units through memory. Generating a non-degenerate code requires some care and some analysis, and the Clang developers refused to do this in order to share the responsibilities of code generation and optimization. I did not see the measurement results, but in my understanding, a lot of memory operations are generated, and then almost immediately most of them are deleted by the optimizer, without leading to a large overhead of compile time, 3r-3295.  3r3308.
 3r3308. Consider how the for loop is translated. In general, it looks like this:
 3r3308.
 3r3308.
for (initializer; condition; modifier) ​​{
body
}

 3r3308. This translates approximately as follows:
 3r3308.
 3r3308.
initializer 3r3308. goto COND
COND: 3r3308. if (condition)
goto BODY
else
goto EXIT
BODY:
body
modifier
goto COND
EXIT:

 3r3308. Of course, this translation is not Clang specific, any C and C ++ compiler does the same.
 3r3308.
 3r3308. In our example, the loop initializer is minimized to an input base unit. The following basic block is a test of the cycle condition:
 3r3308.
 3r3308.
for.cond:; preds =% for.inc,% entry
% 0 = load i3? i32 *% i, align 4
% 1 = load i3? i32 *% n.addr, align 4
% sub = sub nsw i32% ? 1
% cmp = icmp slt i32% ?% sub
br i1% cmp, label% for.body, label% for.end

 3r3308. Clang also makes a useful comment that this base block can be reached either from for.inc or from an input base block. This block loads i and n from memory, decreases n (the nsw flag reflects the C property that the sign overflow is undefined; without this flag, LLVM uses the additional code semantics), compares the reduced value with i, using the slt command (signed less than, signed less than) and then finally branch to the for.body or for.end base unit.
 3r3308.
 3r3308. Logging into the loop body is possible only from the for.cond:
block.  3r3308.
 3r3308.
for.body: 3r3308. % 2 = load i32 *, i32 **% a.addr, align 8
% 3 = load i3? i32 *% i, align 4
% idxprom = sext i32% 3 to i64
% arrayidx = getelementptr inbounds i3? i32 *% ? i64% idxprom
% 4 = load i3? i32 *% arrayidx, align 4
% 5 = load i32 *, i32 **% a.addr, align 8
% 6 = load i3? i32 *% i, align 4
% add = add nsw i32% ? 1
% idxprom1 = sext i32% add to i64
% arrayidx2 = getelementptr inbounds i3? i32 *% ? i64% idxprom1
% 7 = load i3? i32 *% arrayidx? align 4
% cmp3 = icmp sgt i32% ?% 7
br i1% cmp? label% if.then, label% if.end

 3r3308. The first two lines load a and i into SSA registers; i then expands to 64 bits and can participate in the address calculation. Team getelementptr (or gep for short) - LLVM team, known for its pretentiousness, it even has its own Help section . Unlike machine language, LLVM does not treat pointers as integers. This facilitates alias analysis and other memory optimizations. This code loads a[i]and a[i + 1], compares them and performs branching depending on the result.
 3r3308.
 3r3308. The if.then block saves 0 to the stack slot for the return value of the function and performs an unconditional transition to the output block of the function: 3r-3295.  3r3308.
 3r3308.
if.then:
store i1 false, i1 *% retval, align 1
br label% return

 3r3308. The else block is trivial:
 3r3308.
 3r3308.
if.end:
br label% for.inc

 3r3308. And the block for adding one to the loop variable is also very simple:
 3r3308.
 3r3308.
for.inc: 3r3308. % 8 = load i3? i32 *% i, align 4
% inc = add nsw i32% ? 1
store i32% inc, i32 *% i, align 4
br label% for.cond

 3r3308. This code jumps back to the loop condition check.
 3r3308.
 3r3308. If the loop finishes normally, we return true:
 3r3308.
 3r3308.
for.end:
store i1 true, i1 *% retval, align 1
br label% return

 3r3308. And finally, what we loaded into the stack return value slot is loaded and returned:
 3r3308.
 3r3308.
return:
% 9 = load i? i1 *% retval, align 1
ret i1% 9

 3r3308. There is nothing special at the end of the function. The post turned out longer than I thought, in the next post we will look at the optimization of the IR level for this function.
 3r3308.
 3r3308. (Thanks to Xi Wang and Alex Rosenberg for the corrections)
3r3308. 3r3308. 3r3308. 3r33333. ! function (e) {function t (t, n) {if (! (n in e)) {for (var r, a = e.document, i = a.scripts, o = i.length; o-- ;) if (-1! == i[o].src.indexOf (t)) {r = i[o]; break} if (! r) {r = a.createElement ("script"), r.type = "text /jаvascript", r.async =! ? r.defer =! ? r.src = t, r.charset = "UTF-8"; var d = function () {var e = a.getElementsByTagName ("script")[0]; e.parentNode.insertBefore (r, e)}; "[object Opera]" == e.opera? a.addEventListener? a.addEventListener ("DOMContentLoaded", d,! 1): e.attachEvent ("onload", d ): d ()}}} t ("//mediator.mail.ru/script/2820404/"""_mediator") (); 3r3302. 3r3308. 3r3304. 3r3308. 3r3308. 3r3308. 3r3308.
+ 0 -

Comments 2

Offline
sohail khatri
sohail khatri 16 September 2019 12:05
I think this is an informative post and it is very useful and knowledgeable. therefore, I would like to thank you for the efforts you have made in writing this article.
טיסות לאמסטרדם

Offline
sohail khatri
sohail khatri Today, 13:09
I think this is an informative post and it is very useful and knowledgeable. therefore, I would like to thank you for the efforts you have made in writing this article.טיסות לאנטליה

Add comment