Variants of application, or what load balancers lack
A couple of years ago I completed a migration project in the network of one of our clients; the task was to replace the platform that distributes load between servers. This client's service delivery scheme had evolved over almost ten years alongside new developments in the data center industry, so the customer expected a solution that would satisfy not only the fault-tolerance requirements for network equipment, load balancers, and servers, but would also be scalable, flexible, mobile, and simple. In this article I will try, step by step from simple to complex, to outline the main scenarios for using load balancers without reference to any particular manufacturer, their features, and the ways they interface with the data network.
Load balancers are now increasingly called Application Delivery Controllers (ADCs). But if applications run on servers, why do they need to be delivered anywhere? For fault-tolerance or scaling reasons an application can run on more than one server, and in that case you need a kind of reverse proxy that hides the internal complexity from users, selects the appropriate server, delivers the request to it, and verifies that the server returns a result that is correct from the protocol's point of view; otherwise it selects another server and sends the request there. To perform these functions the ADC must understand the semantics of the application-layer protocol it works with; this makes it possible to configure application-specific rules for traffic delivery, result analysis, and server health checking. For example, understanding HTTP semantics makes possible a configuration in which requests like
GET /docs/index.html HTTP/1.1
Accept-Encoding: gzip, deflate
are sent to one group of servers, with subsequent compression and caching of the results, while requests like
POST /api/object-put HTTP/1.1
Content-Type: application/json
are processed according to completely different rules.
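The rule logic described above can be sketched in a few lines. This is a minimal illustration, not any vendor's API; the pool names and matching conditions are hypothetical:

```python
def select_pool(method: str, path: str) -> str:
    """Pick a server pool from the HTTP request line, the way an ADC
    applies application-specific delivery rules."""
    if method == "GET" and path.startswith("/docs/"):
        return "static-pool"   # responses compressed and cached
    if method == "POST" and path.startswith("/api/"):
        return "api-pool"      # different rules: no caching
    return "default-pool"
```

A real ADC expresses the same idea in its policy language; the point is only that routing decisions key off protocol semantics, not IP addresses.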
Understanding protocol semantics also allows you to organize session persistence based on application-protocol objects, for example HTTP headers or the RDP Cookie, or to multiplex requests, packing many user requests into one transport session when the application-layer protocol permits it.
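One common way to implement such persistence is to hash a session identifier taken from an application-layer object onto the server list. A minimal sketch, assuming a stable server list (real ADCs use consistent hashing or explicit persistence tables to survive farm changes):

```python
import hashlib

def sticky_server(session_key: str, servers: list[str]) -> str:
    """Map a session identifier (e.g. an HTTP cookie value) to a server,
    so that the same session always lands on the same server."""
    digest = hashlib.md5(session_key.encode()).digest()
    return servers[int.from_bytes(digest[:4], "big") % len(servers)]
```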
It is sometimes unreasonable to imagine the scope of ADCs as limited to serving HTTP traffic; in fact, the list of supported protocols is much broader for most manufacturers. Even without understanding application-layer semantics, an ADC can be useful for certain tasks. For example, I was involved in building a self-scaling virtual farm of SMTP servers: during spam attacks the number of instances was increased under feedback control driven by message-queue length, to keep message checking with resource-intensive algorithms within a satisfactory time. When a server was activated, it registered with the ADC and received its share of new TCP sessions. In the case of SMTP such a scheme was justified by the high entropy of connections at the network and transport layers; to distribute the load evenly during spam attacks, the ADC only needed TCP support. A similar scheme can be used to build farms of database servers, high-density DNS clusters, DHCP, AAA, or remote-access servers, where the servers can be considered equivalent within the domain and their performance characteristics do not differ too much from each other. I will not go deeper into protocol features here; the topic is too broad for an introduction. If something seems interesting, write to me; it is probably a reason for an article with a deeper treatment of a particular application. But now let's get to the point.
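The feedback loop on queue length from the SMTP example reduces to simple arithmetic. This is a sketch of the idea only; the rates and targets are invented for illustration:

```python
def desired_instances(queue_len: int, per_instance_rate: int,
                      target_seconds: int, minimum: int = 2) -> int:
    """How many SMTP-checking instances are needed so the current backlog
    drains within target_seconds, given each instance checks
    per_instance_rate messages per second."""
    needed = -(-queue_len // (per_instance_rate * target_seconds))  # ceil div
    return max(minimum, needed)
```

The controller compares this against the number of running instances; newly started servers register with the ADC and immediately start receiving their share of new TCP sessions.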
Most often the ADC terminates the transport layer, so the end-to-end TCP session between consumer and server becomes composite: the consumer establishes a session with the ADC, and the ADC establishes one with one of the servers.
The network configuration and addressing must ensure traffic forwarding such that both parts of the TCP session pass through the ADC. The simplest way to make the traffic of the first part arrive at the ADC is to assign the service address to one of the ADC's interfaces; for the second part, the following options are possible:
ADC as the default gateway for the server network;
Translation of client addresses to one of the ADC's interface addresses (source NAT).
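The second option works because after source NAT the server sees an ADC-owned address and replies to it regardless of its own routing. A minimal sketch of the mapping the ADC keeps (the addresses and port range are illustrative):

```python
import itertools

ADC_IP = "192.0.2.10"                 # hypothetical ADC interface address
_ports = itertools.count(20000)       # pool of translated source ports
nat_table: dict[tuple, tuple] = {}

def snat(client_ip: str, client_port: int) -> tuple:
    """Rewrite the client's source to an ADC-owned address:port, and
    remember the mapping so reply traffic can be translated back."""
    key = (client_ip, client_port)
    if key not in nat_table:
        nat_table[key] = (ADC_IP, next(_ports))
    return nat_table[key]
```

The price of this option is that servers no longer see real client addresses, which matters later in the article.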
In fact, a slightly more realistic view of the first deployment scheme looks like this; this is the basis from which we will begin:
Fig. 2
The second group of servers can be databases, application back ends, networked storage, or front ends for another set of services when a classic application is decomposed into microservices. This group of servers can be a separate routing domain with its own policies, located in another data center, or completely isolated for security reasons. Servers are rarely in the same segment; more often they are placed in segments by functional purpose, with well-defined access policies, represented in the figure by a firewall.
Studies show that modern multi-tier applications generate more East-West traffic, and you probably do not want all intra-segment and inter-segment traffic to pass through the ADC. The switches in Figure 2 are not necessarily physical: the routing domains can be implemented with virtual entities that different vendors call virtual-router, vrf, vr, vpn-instance, or virtual routing table.
By the way, there is also a way of interfacing with the network that does not require symmetric traffic flow from consumer to ADC and from ADC to servers; it is useful for long-lived sessions that carry a very large amount of traffic in one direction, for example streaming or broadcast video. In this case the ADC sees only the stream from the client to the servers. This stream is delivered to the ADC's interface address and, after simple processing that consists of replacing the destination MAC address with the interface MAC of one of the servers, the request is sent to a server where the service address is assigned to one of its logical interfaces. Return traffic from the server to the consumer bypasses the ADC according to the server's routing table; this mode is often called Direct Server Return. Maintaining a single broadcast domain for all front ends can be quite complex, and the ADC's ability to analyze responses and maintain session state is very limited here: in essence it is just switching. So this option is not considered further, although it can be used for some narrow tasks.
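The "simple processing" in this mode amounts to one field rewrite. A toy sketch, using an invented frame structure to make the point that only the Layer 2 destination changes:

```python
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class Frame:
    dst_mac: str
    src_mac: str
    dst_ip: str        # service address, also configured on a server loopback
    payload: bytes

ADC_MAC = "00:00:5e:00:01:01"   # hypothetical ADC interface MAC

def dsr_forward(frame: Frame, server_mac: str) -> Frame:
    """Direct-Server-Return style forwarding: rewrite only the destination
    MAC. The IP header is untouched, so the server accepts the packet on
    its local copy of the service address and replies straight to the
    client, bypassing the ADC."""
    return replace(frame, dst_mac=server_mac, src_mac=ADC_MAC)
```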
So, we have one data center, shown in Figure ?. Let's think about what problems can push this data center to evolve; I see two topics for analysis:
Suppose the switching subsystem is fully redundant; let's not dwell on how, the topic is too broad. Applications run on multiple servers and are backed up by the ADC, but how do you make the ADC itself redundant?
If analysis shows that the next seasonal load peak may exceed the ADC's capacity, you will naturally think about scalability.
These tasks are similar in that solving either of them means increasing the number of ADC instances. Fault tolerance can be organized as Active/Backup or Active/Active, while scaling only as Active/Active. Let's try to solve them separately and see what properties the different solutions have.
ADCs from many manufacturers can act as elements of the network infrastructure: RIP, OSPF, BGP are all there, so you can build a trivial Active/Backup scheme. The active ADC announces the service prefixes to an upstream router and receives a default route from it, both to populate its own table and to pass it on toward the data center into the corresponding virtual routing table. The backup ADC does the same but, using the semantics of the chosen routing protocol, forms less attractive announcements. With this approach the servers can see the consumer's real IP address, since there is no reason to use address translation. The scheme also works elegantly with more than one upstream router, but to avoid the situation where the active ADC loses its default route and connectivity to the routers yet keeps receiving a default from the backup ADC and continues announcing it to the data center, avoid routing adjacency between the ADCs and the use of static routes.
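"Less attractive announcements" means the backup manipulates whatever attribute the chosen protocol ranks routes by. A heavily simplified BGP-flavored sketch (real best-path selection has many more steps; the attribute values here are invented):

```python
def best_route(routes: list[dict]) -> dict:
    """Pick the most attractive announcement, BGP-style (simplified):
    highest local-pref wins, then shortest AS path."""
    return max(routes, key=lambda r: (r["local_pref"], -r["as_path_len"]))
```

While the active ADC is up, its announcement wins everywhere; when it withdraws its prefixes, the backup's announcement becomes the only candidate and traffic shifts without any coordination between the ADCs themselves.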
If the servers do not strictly need the consumers' real IP addresses, or the application-layer protocol allows embedding them into headers, as HTTP does, the scheme turns into Active/Active with an almost linear dependence of performance on the number of ADCs. With more than one upstream router, care should be taken that incoming traffic arrives more or less evenly. This is easy to arrange if ECMP forwarding starts in the routing domain before these routers; if that is difficult, or the routing domain is not operated by you, you can use full-mesh connections between the ADCs and the routers so that ECMP forwarding starts directly on them.
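ECMP spreads flows by hashing packet header fields, so each flow sticks to one ADC while flows in aggregate spread evenly. A sketch of the idea (actual routers use hardware hash functions, not SHA-256):

```python
import hashlib

def ecmp_next_hop(five_tuple: tuple, next_hops: list[str]) -> str:
    """Deterministically map a flow's 5-tuple (src ip, src port, dst ip,
    dst port, protocol) to one of several equal-cost next hops."""
    h = hashlib.sha256(repr(five_tuple).encode()).digest()
    return next_hops[int.from_bytes(h[:4], "big") % len(next_hops)]
```

Determinism matters here: every packet of a session must reach the same ADC, because that ADC alone holds the session state.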
At the beginning of this part I wrote that fault tolerance and scaling are two quite different problems. Their solutions utilize resources differently: if you design an Active/Standby scheme, you must accept that half of the resources will sit idle. And if it turns out that you need to take another capacity step, be prepared to multiply the required resources by two yet again.
The advantages of Active/Active appear once you operate more than two devices. Suppose you need a performance of 8 conventional units (say, 8 thousand connections per second, or 8 million concurrent sessions) and must survive the failure of one device. In the Active/Active variant you need three ADCs with a performance of 4 each; in Active/Standby, two with 8 each. Translated into idle resources, that is a third against a half. The same principle can be used to estimate the share of dropped connections during a partial failure. As the number of Active/Active instances grows, the math becomes even more favorable, and the system gains the ability to grow performance smoothly instead of in the steps of Active/Standby.
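The arithmetic behind "a third against a half" generalizes to any instance count. A worked sketch of the N+1 provisioning rule the paragraph uses:

```python
def idle_share(required: float, instances: int) -> float:
    """Fraction of deployed capacity sitting idle when the farm must
    still deliver `required` units after one instance fails: each
    instance is sized to required / (instances - 1)."""
    per_instance = required / (instances - 1)
    total = per_instance * instances
    return (total - required) / total

# 8 units over three Active/Active ADCs of 4 each -> 1/3 idle;
# two Active/Standby ADCs of 8 each               -> 1/2 idle;
# five Active/Active ADCs of 2 each               -> 1/5 idle.
```

As the comment shows, the idle fraction is simply 1/N, which is why the math keeps improving as Active/Active farms grow.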
It would be right to mention another way of organizing Active/Active or Active/Standby operation: clustering. But it would not be right to give it much space, as I set out to write about approaches, not about manufacturers' features. When choosing such a solution you need to clearly understand the following:
Cluster architecture sometimes restricts this or that functionality; in some projects this matters now, in others it may become fundamental later. Everything depends on the manufacturer, and every solution needs to be evaluated individually;
A cluster is often a single fault domain, and software will have bugs;
A cluster is easy to assemble but very difficult to disassemble; the technology is less mobile, since you cannot operate parts of the system independently;
You fall into the tight embrace of your vendor.
Nevertheless, there are also positive things:
A cluster is simple to install and simple to operate;
Sometimes you can expect close to optimal resource utilization.
So, our data center from Figure 5 continues to grow, and a task you may face is increasing the number of servers. This is not always possible in the existing data center, so suppose there is a new, spacious site with additional servers.
The new site may be not very far away; then you can solve the problem by simply extending the routing domains. The more general case, which allows a site in another city or even another country, confronts the data center with new problems:
Utilization of channels between sites;
The difference in processing time between requests the ADC sends to nearby servers and those it sends to remote ones.
Maintaining a wide channel between sites can be a very costly exercise; in addition, the choice of placement stops being trivial: an overloaded site with a short response time, or a free one with a long response time? Thinking about this pushes you toward a geographically distributed configuration. On the one hand it is user-friendly, since it allows service at a point close to the user; on the other, it can significantly reduce the bandwidth requirements of the channel between sites.
For the case where real client IP addresses need not be visible to the servers, or where the application-layer protocol allows passing them in headers, a territorially distributed setup differs little from what I have already described for a single data center. An ADC at any site can send requests to local servers or forward them to a neighboring site; translating the client's address makes this possible. Some attention should be paid to monitoring incoming traffic volume, to keep the number of ADCs at each site adequate to the share of traffic that site receives. Client address translation lets you increase or decrease the number of ADCs, or even move instances between sites, following changes in the incoming traffic matrix or during migration and launches. Despite its simplicity the scheme is quite flexible, has pleasant operational characteristics, and is easily replicated to more than two sites.
If you are working with a protocol that allows forwarding requests, as HTTP does with redirects, this capability can serve as an additional lever for controlling load on the inter-site channel, as a mechanism for maintenance work on servers, or as a way to build Active/Backup server farms across sites. At the required moment, automatically or on some trigger event, an ADC can take traffic away from the local servers and move the users to the neighboring site. It is worth designing this algorithm carefully, so that the coordinated work of the ADCs excludes the possibility of mutual request forwarding or resonance.
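The redirect lever can be sketched as a handler that answers with 307 while its site is draining. This is an illustration only; the peer hostname is invented, and a real implementation must guarantee both sites never drain at once, or the mutual-forwarding loop mentioned above appears:

```python
def handle_request(path: str, draining: bool, peer_site: str):
    """While this site is being drained for maintenance or load shedding,
    redirect clients to the peer site; otherwise serve locally.
    Returns (status_code, extra_headers)."""
    if draining:
        # 307 preserves the HTTP method, unlike 302 in some clients.
        return 307, {"Location": f"https://{peer_site}{path}"}
    return 200, {}
```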
Of special interest is the case where servers need the consumers' real IP addresses and the application-layer protocol cannot carry them in additional headers, or where the ADCs work without understanding application-layer semantics. In this case you cannot stitch the TCP session segments together simply by announcing a default route from the ADC. If you do, the servers of the first site will start using the local ADC as the default gateway even for sessions that arrived via the second site; such a TCP session will never be established, because the ADC of the first site sees only one leg of the session.
There is a small trick that still allows Active/Active ADCs in combination with Active/Active server farms across sites (I do not consider Active/Backup on two sites; a careful reading of the above lets you solve that without further explanation). The trick is for the ADC of the second site to address not the servers' interface addresses but a logical ADC address corresponding to the server farm at the first site. The servers then accept the traffic as if it came from the local ADC and use their local default gateway. To support this mode, the ADC must remember the interface on which the first packet of a TCP session arrived. Different manufacturers name this function differently, but the essence is the same: record the interface in the session state table and use it for the return traffic, ignoring the routing table. The scheme is quite workable and allows flexible distribution of load across all available servers wherever they are. With two or more sites, the failure of one ADC does not affect the availability of the service as a whole, but it completely rules out processing traffic on the servers of the site with the failed ADC; keep this in mind when predicting behavior and load during partial failures.
Our client's services operated roughly according to this scheme when I took up the migration project to the new ADC platform. Simply recreating the behavior of the old platform's devices on the new one, within a proven scheme the customer was comfortable with, was not a big challenge; that was what was expected of us.
But look again at Figure ?: do you see anything that can be optimized?
The main disadvantage of the chained-ADC scheme is that some share of the sessions consumes the resources of two ADCs. In this client's case the choice was entirely conscious, dictated by the specifics of the applications and the need to redistribute load between the servers of different sites very quickly (within 20 to 50 seconds). At different times, double processing consumed on average 15 to 30 percent of ADC resources, which is enough to think about optimization. Having discussed this with the customer's engineers, we proposed replacing the ADC's session-to-interface memory with source-based routing (PBR) on the Linux IP stack of the servers. As the key, we considered variants such as:
additional IP address on the servers on the common interface for each ADC;
an interface IP address on the servers on a separate 802.1q subinterface for each ADC;
separate overlay tunnel network on servers for each ADC.
The first and second options would have had some impact on the network as a whole. Among the side effects of option one, we found it unacceptable that the ARP tables on the switches would grow by a factor of the number of ADCs; option two would have required extending end-to-end broadcast domains between the sites, or additional instances of virtual routing tables. The purely local character of the third option looked very attractive to us, so we took up the work, which resulted in a simple controller that automates the configuration of the tunnels on the servers and ADCs, as well as the PBR configuration on the servers' Linux IP stack.
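The per-server PBR piece can be sketched as the commands such a controller would push. The idea: each ADC gets its own tunnel to the server, and replies sourced from that tunnel's address are routed back into the same tunnel via a dedicated table. Interface names, addresses, and table numbers are illustrative:

```python
def pbr_commands(server_tun_ip: str, adc_tun_ip: str,
                 tun_if: str, table: int) -> list[str]:
    """Generate Linux policy-routing commands for one ADC tunnel:
    a per-tunnel routing table with the ADC as default gateway, and a
    source-based rule steering replies from that tunnel's address into it."""
    return [
        f"ip route add default via {adc_tun_ip} dev {tun_if} table {table}",
        f"ip rule add from {server_tun_ip} lookup {table}",
    ]
```

With one such rule-and-table pair per ADC, the server answers each ADC over the tunnel the request came in on, with no session-interface memory needed on the ADC side.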
As I wrote, the migration is complete, and the client got what it wanted: a new platform, simplicity, flexibility, scalability and, thanks to the switch to the overlay, simpler configuration of the network equipment serving these services. Instead of several instances of virtual routing tables and large broadcast domains, it now looks like a simple IP fabric.
Colleagues who work for ADC manufacturers, this paragraph is mostly for you. Some of your products are good, but do pay attention to closer integration with the applications on the servers, to automating their configuration, and to orchestrating the whole development and operation process. I picture it as a classic controller-agent interaction: a user's change on the ADC triggers the controller's requests to registered agents. This is what we built for the client, but it should come out of the box.
In addition, some customers will find it convenient to switch from a PULL model of interaction with the servers to a PUSH model. The capabilities of applications on the servers are very broad, so it is sometimes easier to run a serious application-specific health check on the agent itself. If the check succeeds, the agent pushes the information, for example in the form of a BGP-like community, for use in weighted balancing algorithms.
Servers and ADCs are often maintained by different organizational units, and the PUSH model may be attractive because it eliminates the need for coordination between units over the human-to-human interface. The services in which a server participates can be communicated directly from the agent to the ADC in a form resembling an extended BGP Flow-Spec.
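A PUSH-model agent message might look like the following sketch. The schema is entirely hypothetical, intended only to show the shape of the idea: the server-side check decides health itself and announces a weight, community-style, to the ADC:

```python
import json

def agent_report(service: str, healthy: bool, weight: int) -> str:
    """Build the message an agent would push to the ADC after running its
    own application-specific health check."""
    return json.dumps({
        "service": service,
        "state": "up" if healthy else "down",
        "weight": weight if healthy else 0,   # drained when unhealthy
    })
```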
There are many other things one could write about. Why am I saying all this? Given a free choice, we decide in favor of the more convenient or more suitable option, or the one that widens the window of opportunity while minimizing risk. The large players of the Internet industry invent something completely new to solve their problems and meet what tomorrow demands; smaller players and companies with software development experience increasingly use technologies and products that allow deep customization. Many load balancer manufacturers note declining demand for their products. In other words, the servers and the applications running on them, the switches and the routers changed qualitatively some time ago and entered the SDN era. The balancers are still on the threshold; take this step while the door is open, or you risk losing your competitive advantages and moving to the periphery.