Traffic generation in userspace
Traffic generation with MoonGen + DPDK + Lua, as imagined by an artist
Neutralizing DDoS attacks in real conditions requires various techniques to be tried out and tested in advance. Network equipment and software should be tested in artificial conditions close to real ones, with intensive traffic flows that simulate attacks. Without such experiments, it is extremely difficult to obtain reliable information about the specific features and limitations of any complex tool.
In this article, we describe some of the traffic generation methods used at Qrator Labs.
We strongly advise readers not to use the tools mentioned here for attacks on real infrastructure. Organizing DoS attacks is punishable by law and can lead to severe penalties. Qrator Labs conducts all tests in an isolated laboratory environment.
A similar solution is the pktgen kernel module. This method significantly improves performance, but it is not flexible: the slightest change to the kernel source code leads to a long cycle of rebuilding, reloading kernel modules or even rebooting the entire system, and then testing, which reduces overall productivity (that is, it costs the programmer more time and effort).
Another possible approach is to give userspace direct access to the memory buffers of the network controller. This path is more complicated, but the higher performance is worth the effort. Its disadvantages are high complexity and low flexibility. Examples of this approach are netmap, PF_RING, and DPDK.
Another effective, though very costly, way to achieve high performance is to use specialized rather than general-purpose hardware. Example: Ixia.
There are also DPDK-based solutions that use scripts, which makes the generator's parameters more flexible to manage and also allows the type of generated packets to be varied at run time. Below we describe our own experience with one of these tools, MoonGen.
The distinctive features of MoonGen are:
Packet processing by DPDK in userspace, which is the main reason for the performance gain;
A Lua stack with simple scripts at the top level and bindings to the DPDK library, written in C, at the bottom;
Thanks to JIT (just-in-time) compilation, Lua scripts run quite fast, contrary to the common perception of scripting languages as slow.
MoonGen can be thought of as a Lua wrapper around the DPDK library. At least the following DPDK operations are exposed at the level of the Lua user interface:
Configuring network controllers;
Allocating and directly accessing memory pools and buffers, which, for optimization purposes, should be allocated as contiguous, aligned regions;
Direct access to the RSS queues of network controllers;
An API for managing threads that takes non-uniform memory access into account (NUMA and CPU affinity).
MoonGen architecture (diagram from the source material).
MoonGen is a scriptable high-speed packet generator based on the DPDK library. Lua scripts control the entire process: a user-written script creates, modifies, and sends packets. Thanks to the very fast LuaJIT and the DPDK packet processing library, this architecture can saturate a 10-gigabit Ethernet interface with 64-byte packets using only one CPU core. MoonGen reaches this speed even when the Lua script modifies every packet. It does not rely on tricks such as re-sending the same buffer from the network controller.
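To put the claim about saturating a 10-gigabit interface with 64-byte packets into numbers, here is a small sketch in plain Lua. It assumes standard Ethernet framing, where every frame costs an extra 20 bytes on the wire (7-byte preamble, 1-byte start-of-frame delimiter, 12-byte inter-frame gap). The function name lineRateMpps is ours and is not part of MoonGen.

```lua
-- Per-frame overhead on the wire that is not part of the frame itself:
-- 7-byte preamble + 1-byte start-of-frame delimiter + 12-byte inter-frame gap.
local WIRE_OVERHEAD_BYTES = 20

-- Maximum packet rate, in millions of packets per second, for a link of
-- linkGbps gigabits per second carrying frames of frameBytes bytes.
-- frameBytes includes the 4-byte FCS, e.g. 64 for minimum-size frames.
local function lineRateMpps(linkGbps, frameBytes)
    local bitsPerFrame = (frameBytes + WIRE_OVERHEAD_BYTES) * 8
    return linkGbps * 1e9 / bitsPerFrame / 1e6
end

print(string.format("10G, 64-byte frames:   %.2f Mpps", lineRateMpps(10, 64)))
print(string.format("10G, 1518-byte frames: %.2f Mpps", lineRateMpps(10, 1518)))
```

For minimum-size frames this works out to roughly 14.88 Mpps, which is the rate a single core has to sustain in order to fill a 10G port.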
MoonGen can also receive packets, for example to check which packets were dropped by the system under test. Since packet reception is controlled entirely by the user's Lua script, it can be used to build more complex test scenarios. For example, two MoonGen instances can establish a connection with each other. Such a configuration can be used, in particular, to test so-called middleboxes (equipment located between the sending and receiving points of the traffic), such as firewalls.
MoonGen focuses on four main areas:
High performance and multi-core scaling: more than 20 million packets per second on one CPU core;
Flexibility: each packet is generated in real time by the user-written Lua script;
Precise timestamps: on commodity hardware, timestamping is done with millisecond accuracy;
Accurate control of the intervals between sent packets: reliable generation of the required traffic patterns and types on commodity hardware.
DPDK stands for Data Plane Development Kit. It consists of libraries whose main purpose is to increase the performance of network packet processing on a wide variety of CPU architectures.
In a world where computer networks are becoming the foundation of human communication, performance, bandwidth, and latency are increasingly critical for systems such as wireless networks and cable infrastructure, including all of their individual components (routers, load balancers, firewalls), as well as for applications such as media streaming and VoIP.
DPDK is a lightweight and convenient way of building tests and scenarios. Moving data within userspace is something we do not see very often, mainly because most applications communicate with network equipment through the operating system and the kernel stack, which is the opposite of the DPDK model.
Lua's main design goal is to provide simple, flexible means of expression that can be extended to fit the task at hand, rather than a set of primitives tied to a single programming paradigm. As a result, the base language is very light: the entire interpreter takes only 180 kB in compiled form and adapts easily to a wide range of uses.
Lua is a dynamic language. It is so compact that it fits on almost any device. Lua supports a small set of types: Boolean values, numbers (double-precision floating point), and strings. Conventional data structures such as arrays, sets, and lists can all be represented by Lua's only built-in data structure, the table, which is a heterogeneous associative array.
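As a brief illustration of the point about tables, the following plain-Lua sketch uses the same table type as an array, a set, and a record. All names and values are made up for the example.

```lua
-- Array: consecutive integer keys starting at 1.
local ports = { 80, 443, 8080 }
ports[#ports + 1] = 8443           -- append an element

-- Set: the element is the key, the value is just a marker.
local blocked = {}
blocked["10.0.0.1"] = true
blocked["10.0.0.2"] = true

-- Record: string keys with heterogeneous values (strings, numbers, tables).
local flow = { src = "10.0.0.1", dstPort = ports[2], flags = { syn = true } }

print(#ports)                      -- number of array elements
print(blocked["10.0.0.1"] ~= nil)  -- membership test
print(flow.flags.syn, flow.dstPort)
```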
With LuaJIT, Lua code is compiled just in time, so although Lua is a scripting language, its performance is comparable to that of compiled languages such as C.
As a company specializing in neutralizing DDoS attacks, Qrator Labs needs a reliable way to create, upgrade, and test its own security solutions. It is for the last of these, testing, that we need different ways of generating traffic that simulates real attacks. However, it is not so easy to simulate a dangerous yet straightforward flood attack at layers 2-3 of the OSI model, primarily because of the difficulty of achieving high packet generation performance.
In other words, for a company engaged in ensuring continuous availability and neutralizing DDoS attacks, simulating various DoS attacks in an isolated laboratory environment is a way to understand how the different pieces of equipment that make up the company's hardware systems will behave in reality.
MoonGen is a good way to generate traffic close to the network controller's maximum on a minimal number of CPU cores. Moving data within userspace significantly increases the performance of the stack under consideration (MoonGen + DPDK) compared with many other ways of generating high traffic volumes. Using DPDK directly requires much more effort, so our desire to optimize the work should come as no surprise. We also maintain a clone of the original MoonGen repository in order to extend its functionality and implement our own tests.
To achieve maximum flexibility, the packet generation logic is specified by the user in a Lua script, which is one of MoonGen's main features. For relatively simple packet processing, this solution is fast enough to saturate a 10G interface on one CPU core. A typical way to modify incoming packets and create new ones is to work with packets of the same type in which only some of the fields change.
An example is the l3-tcp-syn-ack-flood test described below. Note that any modification of a packet can be performed in the same buffer in which the packet generated or received at the previous stage was placed. This kind of packet transformation is very fast because it does not involve expensive operations such as system calls or access to memory locations that may not be cached.
Tests on Qrator Labs equipment
Qrator Labs conducts all tests in the laboratory on a variety of equipment. In this case, we used the following network interface controllers:
Intel 82599ES 10G
Note that with network controllers operating at speeds above 10G, the performance problem becomes more acute. To date, it is not possible to saturate a 40G interface with a single core, although a small number of cores is already enough.
For Mellanox network controllers, some device parameters and settings can be changed by following the tuning guide provided by the manufacturer. This makes it possible to increase performance and, in some special cases, to change the behavior of the NIC more deeply. Other manufacturers may have similar documents for their own high-performance devices intended for professional use. Even if you cannot find such a document publicly, it always makes sense to contact the manufacturer directly. In our case, the Mellanox representatives were very kind: besides providing documentation, they quickly answered the questions we had, and as a result we managed to reach 100% bandwidth utilization, which was very important for us.
TCP SYN flood test
l3-tcp-syn-ack-flood is an example of simulating a SYN flood attack. It is Qrator Labs' extended version of the l3-tcp-syn-flood test from the main MoonGen repository and is stored in our repository clone. Our test can run three types of processes:
Generate a stream of TCP SYN packets from scratch, varying the required fields, such as the source IP address, the source port number, etc.;
Create a valid ACK response for each received SYN-ACK packet according to the TCP protocol;
Create a valid SYN-ACK response for each received SYN packet according to the TCP protocol.
For example, the inner (and therefore the hottest) loop for creating ACK responses looks like this:
    local tx = 0
    -- receive a batch of packets into rxBufs; rx is the number received
    local rx = rxQ:recv(rxBufs)
    for i = 1, rx do
        local buf = rxBufs[i]
        local pkt = buf:getTcpPacket(ipv4)
        -- react to TCP packets with SYN and ACK set (or to any SYN in synack mode)
        if pkt.ip4:getProtocol() == ip4.PROTO_TCP and
           pkt.tcp:getSyn() and
           (pkt.tcp:getAck() or synack) then
            local seq = pkt.tcp:getSeqNumber()
            local ack = pkt.tcp:getAckNumber()
            pkt.tcp:unsetSyn()
            pkt.tcp:setAckNumber(seq + 1)
            pkt.tcp:setSeqNumber(ack)
            -- swap the source and destination IP addresses
            local tmp = pkt.ip4.src:get()
            pkt.ip4.src:set(pkt.ip4.dst:get())
            pkt.ip4.dst:set(tmp)
            -- some more manipulations with packet fields
            tx = tx + 1
            -- assumed step, missing from the listing: queue the reused buffer
            txBufs[tx] = buf
        end
    end
    if tx > 0 then
        txBufs:resize(tx)
        txBufs:offloadTcpChecksums(ipv4) -- offload checksums to NIC
        txQ:send(txBufs)
    end
The general idea behind creating a response packet is as follows. First, take a packet from the RX queue and check whether its type matches the expected one. If it does, prepare a response by modifying some fields of the original packet. Finally, put the resulting packet into the TX queue, reusing the same buffer. To improve performance, instead of handling packets one by one we process them in batches: we extract all available packets from the RX queue, create the corresponding responses, and put them all into the TX queue. Despite the large number of manipulations per packet, performance remains high, primarily because LuaJIT compiles all these operations into a small number of processor instructions. Many other tests, not just TCP SYN/ACK, work on the same principle.
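The batch principle can also be tried out without MoonGen or a NIC. In the plain-Lua model below, packets are ordinary tables and the RX and TX "queues" are arrays. The field names and the function replyToSynAck are hypothetical stand-ins for illustration, not the MoonGen API, and the model covers only the case of answering a SYN-ACK with an ACK.

```lua
local SEQ_MODULO = 4294967296  -- TCP sequence numbers wrap at 2^32

-- Turn every SYN-ACK in the received batch into an ACK reply, reusing the
-- same table (our stand-in for a packet buffer), and return the replies.
local function replyToSynAck(rxBatch)
    local txBatch = {}
    for i = 1, #rxBatch do
        local pkt = rxBatch[i]
        if pkt.proto == "tcp" and pkt.syn and pkt.ack then
            local seq, ackNum = pkt.seqNumber, pkt.ackNumber
            pkt.syn = false                          -- the reply carries ACK only
            pkt.ackNumber = (seq + 1) % SEQ_MODULO   -- acknowledge the peer's SYN
            pkt.seqNumber = ackNum                   -- continue our own sequence
            pkt.src, pkt.dst = pkt.dst, pkt.src      -- send it back to the peer
            pkt.srcPort, pkt.dstPort = pkt.dstPort, pkt.srcPort
            txBatch[#txBatch + 1] = pkt              -- same object, no copy
        end
    end
    return txBatch
end

local rx = {
    { proto = "tcp", syn = true, ack = true, seqNumber = 1000, ackNumber = 501,
      src = "10.0.0.2", dst = "10.0.0.1", srcPort = 80, dstPort = 40000 },
    { proto = "udp", src = "10.0.0.3", dst = "10.0.0.1" },  -- not TCP, skipped
}
local tx = replyToSynAck(rx)
print(#tx, tx[1].src, tx[1].seqNumber, tx[1].ackNumber)
```

The reply is the very table that was received, which mirrors the buffer reuse described above: nothing is allocated or copied per packet.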
The table below shows the results of the SYN flood test (SYN generation without attempts to respond) on a Mellanox ConnectX-4. This NIC has two 40G ports, with a theoretical performance ceiling of ??? Mpps on one port and 2 * 50 Mpps on two ports. The particular way the NIC is connected to PCIe limits the bandwidth somewhat (giving 2 * 50 instead of the expected 2 * ???).
cores per port | 1 port, Mpps | 2 ports, Mpps per port
SYN flood test; NIC: Mellanox Technologies MT27700 Family (ConnectX-4), dual 40G port; CPU: Intel® Xeon® Silver 4114 CPU @ ???GHz
The following table shows the results of the same SYN flood test run on the Mellanox ConnectX-5 with a single 100G port.
SYN flood test; NIC: Mellanox Technologies MT27800 Family (ConnectX-5), single 100G port; CPU: Intel® Xeon® Silver 4114 CPU @ ???GHz
Note that in all cases, we achieve more than 96% of the theoretical performance ceiling on a small number of processor cores.
Capture incoming traffic and save to PCAP files
Another test case is rx-to-pcap, which attempts to capture all incoming traffic and store it in a certain number of PCAP files. Although this test is not about packet generation as such, it demonstrates that the weakest link in moving data through userspace is the file system. Even the virtual file system tmpfs slows the flow down significantly. In this case, 8 CPU cores are needed to handle ??? Mpps, while a single core is enough to receive (and drop or forward) the same amount of traffic.
The following table shows the amount of traffic (in Mpps) that was received and stored in PCAP files located in the ext2 file system on the SSD (second column) or on the tmpfs file system (third column).
on SSD, Mpps | on tmpfs, Mpps
Rx-to-pcap test; NIC: Intel 82599ES 10-Gigabit; CPU: Intel® Xeon® CPU E5-2683 v4 @ ???GHz
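For readers unfamiliar with the format, the sketch below builds a minimal classic pcap capture in memory using plain Lua. It is not how the rx-to-pcap test is implemented. It only shows the on-disk layout: a 24-byte global header followed by a 16-byte record header in front of each packet. The helper names, the timestamp, and the dummy frame are ours.

```lua
-- Encode an unsigned integer as `bytes` little-endian bytes.
local function le(value, bytes)
    local out = {}
    for i = 1, bytes do
        out[i] = string.char(value % 256)
        value = math.floor(value / 256)
    end
    return table.concat(out)
end

-- 24-byte pcap global header.
local function pcapGlobalHeader()
    return le(0xa1b2c3d4, 4)      -- magic number (microsecond timestamps)
        .. le(2, 2) .. le(4, 2)   -- file format version 2.4
        .. le(0, 4)               -- time zone offset, normally 0
        .. le(0, 4)               -- timestamp accuracy, normally 0
        .. le(65535, 4)           -- snapshot length
        .. le(1, 4)               -- link type 1 = Ethernet
end

-- 16-byte record header followed by the captured bytes.
local function pcapRecord(tsSec, tsUsec, data)
    return le(tsSec, 4) .. le(tsUsec, 4)  -- capture timestamp
        .. le(#data, 4)                   -- number of bytes stored
        .. le(#data, 4)                   -- original length on the wire
        .. data
end

-- One dummy 60-byte frame of zero bytes with an arbitrary timestamp.
local frame = string.rep("\0", 60)
local capture = pcapGlobalHeader() .. pcapRecord(1527379200, 0, frame)
print(#capture)
```

Writing the resulting string to a file opened with io.open(path, "wb") produces a capture that standard tools can read. The format adds only 16 bytes of header per packet.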
MoonGen modification: the tman task manager
We would also like to introduce our own extension of MoonGen, which provides another way to run a group of test tasks. The main idea is to separate the general configuration from the specific settings of each task, which makes it possible to run an arbitrary number of different tasks (that is, Lua scripts) simultaneously. The MoonGen implementation with the task manager is available in our clone of the MoonGen repository; here we only briefly list its main features.
A new command-line interface allows you to run several tasks of different types at the same time. The basic usage is as follows:
./build/tman [tman options] [-- [task1 options]] [-- [task2 options]] [-- ]
In addition, ./build/tman -h provides detailed help.
There is one limitation, however: ordinary MoonGen Lua script files are incompatible with the tman interface. A task file for tman must explicitly define the following objects:
The configure(parser) function, which describes the parameters of the task;
The task(taskNum, txInfo, rxInfo, args) function, which describes the task itself. Here, txInfo and rxInfo are arrays of TX and RX queues, respectively; args contains the parameters of the task manager and of the task itself.
Examples can be found in examples/tman.
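The shape of such a task file can be sketched as follows. Only the two function names and their argument lists come from the description above. The "--rate" option, the assumption that the parser offers an argparse-style option method, and the stub parser that lets the sketch run without MoonGen are all hypothetical; the files in examples/tman are the authoritative reference.

```lua
-- Describe the parameters of the task (the option shown is hypothetical).
function configure(parser)
    parser:option("--rate", "Packet rate in Mpps")
end

-- The task body; txInfo and rxInfo are arrays of TX and RX queues,
-- args holds the parsed parameters.
function task(taskNum, txInfo, rxInfo, args)
    return string.format("task %d: %d TX queue(s), %d RX queue(s), rate=%s",
        taskNum, #txInfo, #rxInfo, tostring(args.rate))
end

-- Stand-ins for the real parser and queues, present only so that the
-- sketch can be executed with a plain Lua interpreter.
local stubParser = { options = {} }
function stubParser:option(name, description)
    self.options[#self.options + 1] = { name = name, description = description }
    return self
end

configure(stubParser)
print(task(1, { "txq0" }, {}, { rate = 10 }))
```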
Using the task manager gives you great flexibility in running heterogeneous tests.
The approach that MoonGen offers suited our goals well, and our engineers were satisfied with the results. We got a high-performance tool while keeping both the test environment and the language fairly simple. The high performance of this setup comes from two main features: direct access to the buffers of the network interface controller and just-in-time compilation of Lua.
As a rule, reaching the theoretical performance ceiling of a network interface controller is quite achievable. As we have demonstrated, a single core can be enough to fill a 10G port, and with a larger number of cores filling a 100G port is not a problem either.
We are especially grateful to the Mellanox team for their help with their equipment and to the MoonGen team for their responsiveness in fixing bugs.
MoonGen: A Scriptable High-Speed Packet Generator - Paul Emmerich et al., Internet Measurement Conference 2015 (IMC'15), 2015: www.net.in.tum.de/fileadmin/bibtex/publications/papers/MoonGen_IMC2015.pdf
Mellanox tuning guide: www.mellanox.com/related-docs/prod_software/Performance_Tuning_Guide_for_Mellanox_Network_Adapters.pdf
Data Plane Development Kit: www.dpdk.org
SYN flood: en.wikipedia.org/wiki/SYN_flood
Qrator Labs' clone of MoonGen repository: github.com/QratorLabs/MoonGen
PCAP file format: en.wikipedia.org/wiki/Pcap
Task manager: github.com/QratorLabs/MoonGen#using-tman-task-manager
Lua performance: nullprogram.com/blog/2018/05/27
Network Functions Virtualization Whitepaper: portal.etsi.org/NFV/NFV_White_Paper.pdf
NUMA, non-uniform memory access: en.wikipedia.org/wiki/Non-uniform_memory_access