A flexible system for testing programs and collecting their metrics, using the LLVM test suite as an example

Introduction
Most developers have heard of significant open-source projects such as LLVM and the clang compiler. LLVM, however, is no longer just a framework for building compilers; it is a large ecosystem of projects that address problems arising at every stage of compiler development (each such project usually has its own repository). That infrastructure naturally includes tools for testing and benchmarking, because efficiency is a very important indicator when developing a compiler. One of these separate projects in the LLVM test infrastructure is the test suite (official documentation).

LLVM test suite


At first glance the test-suite repository looks like just a set of benchmarks in C/C++, but that is not quite the case. In addition to the source code of the programs whose performance is measured, the test suite includes a flexible infrastructure for building and running them and for collecting metrics. By default it collects the following metrics: compile time, execution time, link time, and code size (by section).

The test suite is naturally useful for testing and benchmarking compilers, but it can also be used for other research tasks that need a C/C++ code base. Anyone who has tried to do something in data analysis has, I think, faced the problem of scarce and fragmented source data. The test suite does not contain a huge number of applications, but it does have a unified data collection mechanism. It is very easy to add your own applications to the set and to collect the metrics your task requires. So, in my opinion, the test suite (besides its main purpose of testing and benchmarking) is a good base project on which to build your own data collection for tasks that analyze features of program code or characteristics of programs.

LLVM test-suite structure
test-suite
|---- CMakeLists.txt     // Main CMake file: performs the initial configuration, adds
|                        // modules, etc.
|
|---- cmake
|     |---- modules      // Files describing general-purpose macros and functions;
|                        // effectively the API for integrating tests.
|
|---- litsupport         // A Python module that describes the test format for the test suite,
|                        // recognized by the lit testing tool (included in LLVM).
|
|---- tools              // Additional tools: for comparing program output with the
|                        // expected output (with configurable comparison accuracy),
|                        // for measuring time, etc.
|
|                        // The remaining directories contain benchmarks.
|
|---- SingleSource       // Test programs consisting of a single source file.
|                        // One directory can hold many different tests.
|
|---- MultiSource        // Test programs consisting of several source files.
|                        // One directory usually holds the files of a
|                        // single application.
|
|---- MicroBenchmarks    // Programs that use the google-benchmark library. They
|                        // define functions that are executed repeatedly until
|                        // the measurement results are statistically significant.
|
|---- External           // Descriptions of programs that are not included in the test suite;
|                        // that is, the source code of these programs is (or may be)
|                        // located somewhere else.

The structure is simple and straightforward.

The principle of operation
As you can see, CMake and a special lit test format are responsible for describing the build, the run, and the collection of metrics.

Viewed at a very abstract level, the benchmarking process with this system looks simple and quite predictable:

[Figure: overview of the benchmarking process with the LLVM test suite]
What does this look like in more detail? In this article I would like to focus on the role CMake plays in the whole system, and on the single file you need to write if you want to add something to it.

1. Building test applications

The build system is CMake, which is already a standard for C/C++ programs. CMake configures the project and generates files for make, ninja, etc., depending on the user's preference, which then perform the actual build.
In the test suite, however, CMake not only generates the rules for building the applications but also configures the tests themselves.

After CMake runs, additional files (with the .test extension) are written to the build directory; they describe how each application should be executed and checked for correctness.

An example of the most standard .test file:

RUN: cd /MultiSource/Benchmarks/Prolangs-C/football ; /MultiSource/Benchmarks/Prolangs-C/football/football
VERIFY: cd /MultiSource/Benchmarks/Prolangs-C/football ; /tools/fpcmp %o football.reference_output

A file with the .test extension may contain the following sections:

- PREPARE: describes any actions that must be performed before the application is launched; very similar to the Before method found in various unit-testing frameworks;
- RUN: describes how to start the application;
- VERIFY: describes how to check that the application worked correctly;
- METRIC: describes the metrics to collect in addition to the standard ones.

Any of these sections may be omitted.

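To make the section layout concrete, here is a minimal Python sketch that groups the lines of a .test description by keyword. It is not the parser used by litsupport: the function name parse_test_file and the sample commands are hypothetical, and the sketch assumes one "KEYWORD: command" entry per line.

```python
from collections import defaultdict

# Section keywords named in the text above.
SECTION_KEYWORDS = ("PREPARE", "RUN", "VERIFY", "METRIC")


def parse_test_file(text):
    """Group the commands of a .test description by section keyword."""
    sections = defaultdict(list)
    for line in text.splitlines():
        # Split on the first colon only, so commands may contain colons.
        keyword, separator, command = line.partition(":")
        if separator and keyword.strip() in SECTION_KEYWORDS:
            sections[keyword.strip()].append(command.strip())
    return dict(sections)


# Hypothetical sample description; the commands are placeholders.
sample = "RUN: ./app --n 1\nVERIFY: diff out.txt app.reference_output\n"
parsed = parse_test_file(sample)
print(sorted(parsed))
```

Sections that do not occur in the input are simply absent from the result, which mirrors the fact that any section may be omitted.
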
Because this file is generated automatically, it is the benchmark's CMake file that describes how to obtain the object files, how to link them into an application, and what to do with that application afterwards.

For a better understanding of the default behavior and how it is described, consider an example CMakeLists.txt.

# Compilation flags required by the application. More general flags, such as the
# optimization level, are better specified when invoking CMake so that you can
# experiment with them later.
list(APPEND CFLAGS -DBREAK_HANDLER -DUNICODE -pthread)
# Linker flags required by the application.
list(APPEND LDFLAGS -lstdc++ -pthread)

Flags can be set depending on the platform. The test-suite CMake modules include the DetectArchitecture file, which determines the target platform on which the benchmarks run, so you can simply use the data it has already collected. Other data is also available: operating system, byte order, etc.

if(TARGET_OS STREQUAL "Linux")
  list(APPEND CPPFLAGS -DC_LINUX)
endif()
if(NOT ARCH STREQUAL "ARM")
  if(ENDIAN STREQUAL "little")
    list(APPEND CPPFLAGS -DFPU_WORDS_BIGENDIAN=0)
  endif()
  if(ENDIAN STREQUAL "big")
    list(APPEND CPPFLAGS -DFPU_WORDS_BIGENDIAN=1)
  endif()
endif()

In principle, nothing in this part should be new to anyone who has ever seen or written a simple CMake file. Naturally, you can use libraries, build them yourself, and generally use any means CMake provides to describe how your application is built.

Next you need to ensure that the .test file is generated. What does the test-suite interface provide for this?

There are two basic macros, llvm_multisource and llvm_singlesource, which are enough for most simple cases.

- llvm_multisource is used if the application consists of several files. If you do not pass source files as parameters when calling this macro in your CMake file, all source files in the current directory are used for the build. The interface of this macro is currently changing in the test suite; passing the source files as macro parameters is the current approach in the master branch. Previously, the source files had to be listed in the Source variable (this was still the case in release 7.0), and the macro took no parameters. The basic logic of the implementation has remained the same.
- llvm_singlesource treats each .c/.cpp file as a separate benchmark and builds a separate executable for each one.

By default, to launch the built application, both macros generate a command that simply invokes it. Correctness is checked by comparing the output with the expected output stored in a file with the .reference_output extension (optionally with a suffix: .reference_output.little-endian or .reference_output.big-endian).

If this suits you, great: one extra line (a call to llvm_multisource or llvm_singlesource) is enough to run the application and obtain the following metrics: code size (by section), compile time, link time, and execution time.

But, naturally, things are rarely that smooth. You may need to change one or more stages, and this too takes only simple steps. The one thing to remember is that if you redefine one stage, you have to describe all the others as well (even if their default behavior suits you, which is, of course, a little frustrating).

The API has macros for describing the actions at each stage.

There is nothing special to say about the llvm_test_prepare macro for the preparatory stage: it simply takes the commands to execute as its parameters.

What might be needed at the run stage? The most predictable case is that the application takes arguments or input files. For this there is the llvm_test_run macro, which takes only the application's launch arguments (without the executable name) as parameters.

llvm_test_run(--fixed 400 --cpu 1 --num 200000 --seed 1158818515 run.hmm)

To change the actions at the verification stage, use the llvm_test_verify macro, which accepts arbitrary commands as parameters. Of course, to check correctness it is better to use the tools in the tools directory. They offer good facilities for comparing the generated output with the expected one (for example, real numbers can be compared within a given tolerance). But in some cases you can simply check that the application completed successfully, and so on.

llvm_test_verify("cat %o | grep -q 'exit 0'")
# %o is a special placeholder for the output file that lit understands. Because these
# commands are later executed by lit, you can use everything it is able to
# recognize. lit (the testing tool included in LLVM) is not covered in detail
# in this article; see its official documentation.

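
The idea of comparing real numbers within a tolerance can be illustrated with a short Python sketch. This is not how the tools in the tools directory are implemented; the function outputs_match and its rel_tol and abs_tol parameters are names chosen for this example only. The sketch compares two outputs token by token and falls back to a numeric comparison when the tokens differ as text.

```python
import math


def _as_float(token):
    """Return the token as a float, or None if it is not a number."""
    try:
        return float(token)
    except ValueError:
        return None


def outputs_match(actual, expected, rel_tol=0.0, abs_tol=0.0):
    """Compare two outputs token by token, allowing numeric tolerance."""
    actual_tokens, expected_tokens = actual.split(), expected.split()
    if len(actual_tokens) != len(expected_tokens):
        return False
    for got, want in zip(actual_tokens, expected_tokens):
        if got == want:
            continue  # identical text, nothing more to check
        got_value, want_value = _as_float(got), _as_float(want)
        if got_value is None or want_value is None:
            return False  # differing non-numeric tokens
        if not math.isclose(got_value, want_value, rel_tol=rel_tol, abs_tol=abs_tol):
            return False
    return True


print(outputs_match("pi = 3.14159", "pi = 3.14160", abs_tol=1e-4))
```

With both tolerances left at zero the check degenerates into an exact comparison, so the caller decides how strict the verification should be.
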
And what if you need to collect some additional metrics? For this there is the llvm_test_metric macro.

llvm_test_metric(METRIC <metric_name> <command>)

For example, you can obtain a metric specific to dhrystone.

llvm_test_metric(METRIC dhry_score grep 'Dhrystones per Second' %o | awk '{print $4}')
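
For readers less familiar with grep and awk, the same extraction can be written out in Python. This is only an illustration of what the shell pipeline does, not part of the test suite; the function name extract_metric and the sample output line are hypothetical. Unlike the pipeline, which prints the field for every matching line, the sketch returns the value from the first matching line only.

```python
def extract_metric(output_text, pattern, field):
    """Return the given whitespace-separated field (1-based, as in awk)
    of the first line containing the pattern, or None if nothing matches."""
    for line in output_text.splitlines():
        if pattern in line:
            fields = line.split()
            if len(fields) >= field:
                return fields[field - 1]
    return None


# Hypothetical program output; real dhrystone output may look different.
sample_output = "Run finished\nDhrystones per Second: 1234567.0\n"
print(extract_metric(sample_output, "Dhrystones per Second", 4))
```
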

Of course, if you need to collect additional metrics for all tests, this method is somewhat inconvenient. You must either add the llvm_test_metric call to the high-level macros provided by the interface, or use TEST_SUITE_RUN_UNDER (a CMake variable) together with a script of your own that collects the metrics. The TEST_SUITE_RUN_UNDER variable is quite useful in general; it can be used, for example, to run on simulators. In essence, it holds a command that receives the application and its arguments as input.

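A wrapper of the kind that could be placed in TEST_SUITE_RUN_UNDER might look like the Python sketch below. It is an assumption-laden illustration: the function name run_under and the wall_time_seconds label are invented for this example, and how the printed value would then be picked up as a metric is left open. The script receives the application and its arguments on its own command line, runs them, reports the wall-clock time on stderr, and passes the exit code through.

```python
import subprocess
import sys
import time


def run_under(command):
    """Run the given command (program plus arguments) and report its wall time."""
    start = time.perf_counter()
    completed = subprocess.run(command)
    elapsed = time.perf_counter() - start
    # Hypothetical label; written to stderr so the program's stdout stays untouched.
    print(f"wall_time_seconds: {elapsed:.6f}", file=sys.stderr)
    return completed.returncode


if __name__ == "__main__" and len(sys.argv) > 1:
    # Everything after the script name is the application and its arguments.
    sys.exit(run_under(sys.argv[1:]))
```

Passing the exit code through matters here, because later stages may rely on it to decide whether the run succeeded.
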
As a result, we get a CMakeLists.txt of the following form:

# No application-specific compilation flags are needed here.
llvm_test_run(--fixed 400 --cpu 1 --num 200000 --seed 1158818515 run.hmm)
llvm_test_verify("cat %o | grep -q 'exit 0'")
llvm_test_metric(METRIC score grep 'Score' %o | awk '{print $4}')
llvm_multisource()  # llvm_multisource(my_application) in the new version

Integration takes little extra effort: if the application is already built with CMake, then in the CMakeLists.txt in the test suite you can include the existing CMake file for the build and add a few simple macro calls.

2. Running the tests

As a result of its work, CMake has generated a special test file from the given description. But how is this file executed?

lit always uses a configuration file, lit.cfg, and the test suite accordingly has one. This configuration file contains various settings for running tests, including the format of the tests to execute. The test suite uses its own format, which lives in the litsupport directory.

config.test_format = litsupport.test.TestSuiteTest()

This format is described as a test class that inherits from the standard lit test format and overrides the main interface method, execute. Another important component of litsupport is the TestPlan class, which describes the test execution plan: it stores all the commands that must be executed at the different stages and knows the order of the stages. To provide the necessary flexibility, the architecture also includes modules, each of which must provide a mutatePlan method. Inside it, a module can modify the test plan: add a description of how to collect the required metrics, add extra commands that measure the application's run time, and so on. Thanks to this design, the architecture is easy to extend.

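The module mechanism can be sketched in a few lines of Python. This is a simplified stand-in, not the litsupport code: the class SimplePlan plays the role of TestPlan, its prepare, run and verify fields and the timing-wrapper command prefix are hypothetical, and only the names mutatePlan and metric_collectors come from the description above.

```python
from dataclasses import dataclass, field


@dataclass
class SimplePlan:
    """Stand-in for a test plan: commands per stage plus metric collectors."""
    prepare: list = field(default_factory=list)
    run: list = field(default_factory=list)
    verify: list = field(default_factory=list)
    metric_collectors: list = field(default_factory=list)


class TimingModule:
    """Example module: wraps run commands and registers a metric collector."""

    def mutatePlan(self, plan):
        # Prefix every run command with a hypothetical timing wrapper.
        plan.run = [f"timing-wrapper {command}" for command in plan.run]
        # Register a collector; a real one would read the measured time.
        plan.metric_collectors.append(lambda: {"exec_time": None})


plan = SimplePlan(run=["./app --n 1"])
for module in [TimingModule()]:
    module.mutatePlan(plan)
print(plan.run)
```

Because every module only sees and edits the plan, adding a new kind of measurement means adding one more module rather than changing the code that executes the stages.
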
An approximate scheme of how a test-suite test runs (leaving out details such as the TestContext classes, the various lit configurations, the tests themselves, etc.) is presented below.

[Figure: approximate scheme of a test-suite test run]

lit launches the execution of the test type specified in the configuration file. TestSuiteTest parses the test file generated by CMake and obtains a description of the main stages. Then all the modules that were found are called to modify the current test plan, and the run is instrumented. Next, the resulting test plan is executed stage by stage: preparation, run, verification. If necessary, profiling can also be performed (it is added by one of the modules if a variable requesting profiling was set during configuration). The next step is metric collection: first the standard metrics, whose collection functions were added by the standard modules to the metric_collectors field of TestPlan, and then the additional metrics that the user described in CMake.

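The order of execution described above can be summarized in a small Python sketch. It is a simplification under stated assumptions, not the litsupport implementation: the function execute_plan, the shape of its arguments, and the returned dictionary are invented for illustration, and profiling is left out.

```python
import subprocess


def execute_plan(stages, metric_collectors):
    """Run the stages in order, stop at the first failing command,
    and collect metrics only if every stage succeeded.

    stages: list of (stage_name, [shell commands]) in execution order.
    metric_collectors: callables that each return a dict of metrics.
    """
    for stage_name, commands in stages:
        for command in commands:
            result = subprocess.run(command, shell=True)
            if result.returncode != 0:
                return {"status": "FAIL", "failed_stage": stage_name, "metrics": {}}
    metrics = {}
    for collect in metric_collectors:
        metrics.update(collect())
    return {"status": "PASS", "metrics": metrics}


# Hypothetical plan: trivial shell commands stand in for the real stages.
outcome = execute_plan(
    [("prepare", ["exit 0"]), ("run", ["exit 0"]), ("verify", ["exit 0"])],
    [lambda: {"size": 1}],
)
print(outcome["status"])
```
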
3. Running the test suite

The test suite can be run in two ways:

- Manually, i.e. by invoking the commands one after another:

    cmake -DCMAKE_CXX_COMPILER:FILEPATH=clang++ -DCMAKE_C_COMPILER:FILEPATH=clang test-suite  # configure
    make  # build
    llvm-lit . -o  # run the tests

- With LNT (another system from the LLVM ecosystem that lets you run benchmarks, save the results in a database, and analyze them in a web interface). As part of its test-run command, LNT performs the same steps as in the previous item:

    lnt runtest test-suite --sandbox SANDBOX --cc clang --cxx clang++ --test-suite test-suite

The result for each test is displayed as follows:

PASS: test-suite :: MultiSource/Benchmarks/Prolangs-C/football/football.test (m of n)
********** TEST 'test-suite :: MultiSource/Benchmarks/Prolangs-C/football/football.test' RESULTS **********
compile_time: ???
exec_time: ???
hash: "38254c7947642d1adb9d2f1200dbddf7"
link_time: ???
size: 59784
sizebss: 99800
sizetext: 37778
**********

The results of different runs can be compared even without LNT (that framework offers extensive facilities for analyzing the results with various tools, but it deserves a separate review), using the script included in the test suite:

test-suite/utils/compare.py results_a.json results_b.json

An example of comparing the code size of one and the same benchmark from two runs, one with the -O3 flag and one with -Os:

test-suite/utils/compare.py -m size SANDBOX1/build/O3.json SANDBOX/build/Os.json
Tests: 1
Metric: size

Program O3 Os diff
test-suite langs-C/football/football.test ??? -20.6%

Conclusion
The infrastructure for describing and running benchmarks implemented in the test suite is easy to use and maintain and scales well. In my opinion it also relies on rather elegant architectural solutions, which makes the test suite a very useful tool for compiler developers. The system can also be adapted for use in some data analysis tasks.