Powering IT equipment: safety or security? Part 2
We continue the article whose purpose is to share our experience and to show the key features and common mistakes that arise when designing and organizing power supply subsystems for IT infrastructure and for data centers in general. This time we would like to broaden the audience a little and devote several sections to the basics of electrical safety and the protection of equipment and people.
If you missed the first part or want to refresh your memory, you can read it here.
Those who already know what a circuit breaker and an RCD are, why they are needed and what they protect against can skip ahead to the section "Do you need an RCD for IT equipment, a server room, a data center?".
Let's look at the relationship between the power supply system and the end IT equipment, and work out in which cases of power interruption the operating system is guaranteed to keep running without failures.
As quoted above, for IT equipment "a break in the power supply is unacceptable". But what lies behind this phrase? What counts as a "break" in the power supply of information equipment? Let's look at a real-life example.
The customer is setting up a local server room, together with the IT infrastructure for two floors, in the company's office. When the power supply system was being discussed, he wanted to power all the information equipment from a single power supply unit (PSU) per device, leaving the second PSU slot in the servers empty, and to install a single rack-mount ATS for the entire rack (Fig. ? diagram).
Rear view of a server with redundant power supplies
The Customer's arguments:
Cost savings ($500-800 per device in the rack)
Two basic PDUs could be installed downstream of the ATS to distribute power
Exactly the same level of system reliability as with the classic distribution scheme
We took a time-out and examined the Customer's request in detail from several angles: the overall reliability of services during the warranty and post-warranty periods, as well as:
capital expenditure (and savings) during implementation (CAPEX)
depreciation costs, maintenance of the spare parts stock, labor costs of the customer's staff
a comparison of the operating logic and the transfer time to the backup line in both options, and a check for single points of failure
the level of risk that the operating systems of the information equipment will hang and/or restart, bringing down the information services running on them
And here is what we found:
According to GOST 32144-2013 ("Electric energy. Electromagnetic compatibility of technical equipment. Power quality limits in public power supply systems", effective July ? 2014), the main cause of malfunctions in information equipment can be voltage dips, which:
usually occur because of faults in electrical networks or in consumers' electrical installations, or when a powerful load is connected
can last up to 1 minute
This tells us that information equipment must be protected by a UPS and/or fast-acting ATS units, since voltage dips of this duration are permissible and normal from the power utility's standpoint, but fatal for IT equipment and services.
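To make the comparison concrete, here is a minimal sketch that sets a dip's duration against a server PSU's own hold-up time. It assumes, as a worst case, that the dip is deep enough for the PSU to lose its input entirely, and it uses the 20 ms hold-up figure discussed later in this article; the function names are illustrative only.

```python
# Sketch: can a server PSU ride through a voltage dip on its own hold-up time?
# Worst-case assumption: the dip is deep enough that the PSU effectively loses its input.

MAINS_FREQUENCY_HZ = 50   # 50 Hz mains
PSU_HOLDUP_MS = 20.0      # the 20 ms ITIC / SSI hold-up figure used in this article

def mains_periods(duration_ms: float, freq_hz: float = MAINS_FREQUENCY_HZ) -> float:
    """Express a duration in mains periods (one period is 20 ms at 50 Hz)."""
    return duration_ms / (1000.0 / freq_hz)

def rides_through(dip_ms: float, holdup_ms: float = PSU_HOLDUP_MS) -> bool:
    """True if the PSU's hold-up alone covers the whole dip."""
    return dip_ms <= holdup_ms

for dip_ms in (10, 20, 100, 60_000):  # 60 000 ms is the 1-minute upper bound from GOST 32144-2013
    verdict = "PSU hold-up is enough" if rides_through(dip_ms) else "needs a UPS or a fast ATS"
    print(f"{dip_ms:>6} ms = {mains_periods(dip_ms):g} mains period(s): {verdict}")
```

Anything longer than one mains period has to be bridged by something other than the PSU itself, which is exactly the role of the UPS or the fast ATS.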
Incidentally, the current regulatory framework of the Russian Federation contains contradictions in how power quality parameters are measured; you can read more about this in the article by Viktor Cherdak, technical director of one of our company's divisions (source).
Some excerpts from the article:
In recent years, state standards for measuring electrical energy parameters related to power quality have been actively developed and repeatedly revised.
An important change was the replacement of GOST 13109-97 ("Electric energy. Electromagnetic compatibility of technical equipment. Power quality limits in public power supply systems") with GOST 32144-2013. These standards define different sets of power quality indicators.
But how fast is fast enough? How do we determine the time, in milliseconds, during which the customer's service (and server) will not go down and the operating system will not hit a "critical error"?
There is the CBEMA curve (Computer and Business Equipment Manufacturers Association), which, after some revisions, is now known as the ITIC curve (Information Technology Industry Council); its variants are included in the ANSI/IEEE 446 standard. According to these documents, the electronic circuits of power supplies must remain operational for 20 ms, that is, one full period of the 50 Hz mains voltage.
The ITIC curves
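For readers who want to check a logged event against the curve, below is a simplified sketch of only the undervoltage (lower) boundary of the ITIC curve as we understand it: a complete dropout is tolerated for up to 20 ms, then the voltage must stay at or above 70 % up to 0.5 s, 80 % up to 10 s, and 90 % in steady state. The overvoltage side is not modelled, and the thresholds should be checked against the published ITIC curve before relying on them.

```python
# Simplified undervoltage (lower) boundary of the ITIC curve, as we understand it.
# residual_pct: remaining voltage as a percentage of nominal; duration_s: event length in seconds.
# The overvoltage side of the curve is deliberately not modelled.

def within_itic_undervoltage(residual_pct: float, duration_s: float) -> bool:
    """True if the event falls in the ITIC 'no interruption in function' region."""
    if duration_s <= 0.02:     # up to 20 ms even a complete dropout (0 %) must be tolerated
        return True
    if duration_s <= 0.5:      # 20 ms to 0.5 s: voltage must stay at or above 70 %
        return residual_pct >= 70
    if duration_s <= 10:       # 0.5 s to 10 s: at or above 80 %
        return residual_pct >= 80
    return residual_pct >= 90  # steady state: at or above 90 %

print(within_itic_undervoltage(0, 0.018))  # full dropout lasting 18 ms
print(within_itic_undervoltage(0, 0.040))  # full dropout lasting 40 ms
```

The first call corresponds to a complete loss of voltage for 18 ms and stays inside the tolerated region; the second, 40 ms, falls outside it.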
According to the Server System Infrastructure (SSI) requirements for server and computer power supplies, the parameter Tvout_holdup means that, when the mains voltage fails, the information equipment keeps operating for at least 21 ms. In other words, one full mains period is the guaranteed time of normal operation of a server or switch. The parameter Tpwok_holdup is specified as at least 20 ms.
Some details on the SSI parameters can be found here.
Reference: hold-up time is the period during which a power supply unit can keep its output voltages within specified limits after the voltage at its input is lost. For most computer power supplies, hold-up time also characterizes how long the power good signal (PWR_OK) stays asserted before it tells the system that the output voltages can no longer be relied on (for computer power supplies this parameter is usually more than 16 ms).
Here is one of the tables from the document:
And this is a timing diagram showing the power supply's specified operating sequence:
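The two SSI minimums quoted above define a short timeline after the AC input is lost. The sketch below encodes only those worst-case minimums (Tpwok_holdup = 20 ms, Tvout_holdup = 21 ms); real power supplies usually exceed them, and the `HoldupSpec` and `guaranteed_state` names are our own, not part of the SSI document.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class HoldupSpec:
    pwok_holdup_ms: float = 20.0  # Tpwok_holdup: PWR_OK stays asserted at least this long
    vout_holdup_ms: float = 21.0  # Tvout_holdup: outputs stay in regulation at least this long

def guaranteed_state(t_ms: float, spec: HoldupSpec = HoldupSpec()) -> str:
    """What the spec minimums guarantee t_ms after the AC input is lost."""
    if t_ms < spec.pwok_holdup_ms:
        return "PWR_OK asserted, outputs in regulation"
    if t_ms < spec.vout_holdup_ms:
        return "PWR_OK may be deasserted (early warning), outputs still in regulation"
    return "no guarantee: outputs may leave regulation"

for t in (15, 20.5, 25):
    print(f"{t} ms after AC loss: {guaranteed_state(t)}")
```

The 1 ms gap between the two parameters is what lets the system see PWR_OK drop before the rails actually sag.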
Now let's see what transfer time APC specifies for, say, its AP7721 rack-mount transfer switch. Typically it is 8-12 ms, with a maximum of 18 ms.
We can conclude that the transfer time of the rack-mount ATS to the backup input is consistent with the hold-up specification of server power supplies. This means the information equipment will keep working without failures.
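The arithmetic behind this conclusion can be written out as a timing budget. The sketch uses only figures from the text (AP7721 transfer time of 8-12 ms typical and 18 ms maximum; SSI minimum hold-up of 20 and 21 ms) and assumes the vendor's transfer time already includes detecting the failed input.

```python
# Figures from the text: APC AP7721 transfer time and the SSI minimum hold-up values.
# Assumption: the vendor's transfer time already includes detection of the failed input.
ATS_TRANSFER_MS = {"typical, best": 8.0, "typical, worst": 12.0, "maximum": 18.0}
HOLDUP_MS = {"Tpwok_holdup": 20.0, "Tvout_holdup": 21.0}

def margin_ms(transfer_ms: float, holdup_ms: float) -> float:
    """Time left between the ATS finishing its transfer and the PSU's hold-up running out."""
    return holdup_ms - transfer_ms

for case, transfer in ATS_TRANSFER_MS.items():
    for param, holdup in HOLDUP_MS.items():
        m = margin_ms(transfer, holdup)
        print(f"ATS {case:>15} ({transfer:g} ms) vs {param} ({holdup:g} ms): margin {m:g} ms")
```

Every combination leaves a positive margin, but in the worst case (18 ms against 20 ms) it is only 2 ms, so the conclusion holds as long as both the ATS and the PSU actually meet their datasheet figures.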
Summary table of system element timings
And what about the economics? Which of the options is more cost-effective and more fault-tolerant?
Suppose the rack contains three small servers that can take two power supplies each, plus three devices with non-redundant power supplies. All of them are critical, and the failure of any one device will bring down the customer's entire system. We need a rack-mount ATS in either case; it costs about 18 thousand rubles.
The Customer states that PDUs are not needed, so the budget will include only the cost of the ATS, the same 18 thousand rubles. Instead of PDUs, the Customer proposes distributing power through the outlets "on board" the rack-mount ATS. The Customer also plans to buy servers with two power supply slots but only one PSU installed, to save money (Figure 4).
The classic option (Figure 3) assumes a set of 2 PDUs at about ??? rubles, 3 additional power supplies for the servers at $500 each, 84 thousand rubles in total, and the same ATS for 18 thousand rubles. Adding everything up, the classic solution will cost the Customer about 134 thousand rubles.
It would seem the Customer is right: the difference in cost is substantial. But let's compare both options in terms of fault tolerance and ease of maintenance:
The Customer's option: the single point of failure is the rack-mount ATS. If anything happens to it, we lose the entire rack. So a spare ATS must be kept right on site, which adds 1?000 rubles to the estimate. Each server has only one power supply, so these are points of failure too. Hence it is advisable to keep at least one, and preferably all three, power supplies in reserve on site. Let's assume we need three PSUs in the spare parts kit: that is another 36 thousand rubles. We also need to check how much power the rack-mount ATS can switch. Here we assume that 3 kW, or 16 A, is enough for all the equipment in the rack. If we need a 32 A (7 kW) ATS, it will be much more expensive (more than 100 thousand rubles). So, once reliability is considered in detail, the budget of the Customer's option grows to 160 thousand rubles. And even then, in an emergency there will still be downtime on site while the failed device is replaced, even with the spare parts at hand.
A single point of failure (SPOF) is a node, link or object of a data availability system whose failure can disable the entire system or make data inaccessible.
Open Technologies' option: as shown in Figure 3, with an ATS added, if necessary, for small network equipment that has a single power supply.
The point of failure is again the ATS, and we agree that a spare should be kept on site. But in our case, an ATS failure can affect only the switches and auxiliary equipment; the servers themselves will keep running undisturbed. Spare power supplies are not needed: if one of the redundant power supplies fails, the server will continue to run on the remaining one and can most likely wait for a replacement from the vendor, however remote the site is.
Interpretation of the term SPOF for IT systems
Single point of failure (SPOF): a node, device or point of a circuit whose failure can disable the entire system and make data and services inaccessible. It is taken into account when developing and designing any critical system. Eliminating single points of failure completely significantly increases capital costs, so the required criticality of a particular system or service is determined at the design stage, based on the project budget and on the Customer's wishes and requirements. We look for the ideal solution for each customer by working out several implementation options and offering them to the customer. As a result, at project handover the customer receives exactly the solution they wanted in terms of price/quality/reliability.
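The SPOF comparison of the two options can be checked mechanically: remove one component at a time and see which critical loads lose power. The sketch below is a simplified model of Figures 3 and 4 under our own assumptions: all component names are hypothetical, a node is energized if it has not failed and any of its upstream nodes is energized, the two feeds are independent, and the PDUs of the Customer's option are omitted because the Customer plans to use the ATS outlets.

```python
from typing import Dict, List, Set

# node -> upstream nodes; a node is energized if it has not failed and ANY upstream is energized.
# Nodes with no upstream are the two independent feeds (inputs A and B).
Topology = Dict[str, List[str]]

def energized(node: str, topo: Topology, failed: Set[str]) -> bool:
    if node in failed:
        return False
    upstream = topo[node]
    return not upstream or any(energized(u, topo, failed) for u in upstream)

def failure_impact(topo: Topology, loads: List[str]) -> Dict[str, List[str]]:
    """For every non-load component, list the critical loads its single failure takes down."""
    impact = {}
    for component in topo:
        if component in loads:
            continue
        down = [load for load in loads if not energized(load, topo, {component})]
        if down:
            impact[component] = down
    return impact

SERVERS = ["srv1", "srv2", "srv3"]
LOADS = SERVERS + ["sw1"]  # sw1: a hypothetical device with a single, non-redundant PSU

# Customer's option (Figure 4): one ATS feeds everything, one PSU per server.
customer: Topology = {"feed_a": [], "feed_b": [], "ats": ["feed_a", "feed_b"], "sw1": ["ats"]}
for s in SERVERS:
    customer[f"{s}_psu"] = ["ats"]
    customer[s] = [f"{s}_psu"]

# Classic option (Figure 3): two PDUs, two PSUs per server, ATS only for single-PSU equipment.
classic: Topology = {"feed_a": [], "feed_b": [], "pdu_a": ["feed_a"], "pdu_b": ["feed_b"],
                     "ats": ["feed_a", "feed_b"], "sw1": ["ats"]}
for s in SERVERS:
    classic[f"{s}_psu1"] = ["pdu_a"]
    classic[f"{s}_psu2"] = ["pdu_b"]
    classic[s] = [f"{s}_psu1", f"{s}_psu2"]

print("customer:", failure_impact(customer, LOADS))
print("classic: ", failure_impact(classic, LOADS))
```

In the Customer's option the ATS takes down every load and each server PSU takes down its server; in the classic option the only single point of failure left is the ATS, and it affects only the single-PSU device. Only single failures are modelled, which matches the definition of a SPOF.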
Thus, it is possible to connect all the rack equipment to a single ATS, but it is not sensible, since this creates a single point of failure in the power supply. Buying servers with redundant power supplies is preferable in any case, since it greatly increases fault tolerance at the level of the information equipment.
The rack-mount ATS switches to the reserve input correctly and almost instantly; the information equipment will not even notice, and software and operating systems will keep running correctly. Rack PDUs are necessary in any case, and there is no point in economizing on them. Apparent savings in the capital cost of power distribution can lead to intractable operational problems, for example, having to power down the entire rack just to move the ATS to another rack unit or to service it. And in any case, non-redundant power supplies require a spare parts kit, which is not always possible or affordable.
A removable server power supply:
Rack-mount ATS units have their own specifics
For example, the power of such ATS units is limited: they can switch only a set of relatively low-power loads. The number of output connectors can also be a constraint. The aforementioned AP7721, for instance, has C14 input connectors, which means a maximum switching power of 2.5 kW. For higher loads there is the 2U model AP7724, which is connected to the supply through a 32 A connector, so the equipment load can be up to 7 kW. This means a typical rack of equipment can be connected to this ATS in its entirety. However, such a solution costs more than 100 thousand rubles.
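The kW figures above follow from P = U × I for a single-phase circuit. The sketch assumes a 230 V nominal supply; an IEC C14 inlet is rated 10 A at 250 V, which gives the round 2.5 kW, while at 230 V the same 10 A gives about 2.3 kW, and 32 A gives about 7.4 kW. The helper names are illustrative, and no extra design margin is applied.

```python
from typing import Iterable

NOMINAL_VOLTAGE_V = 230.0  # assumption: single-phase 230 V nominal mains

def max_power_w(current_a: float, voltage_v: float = NOMINAL_VOLTAGE_V,
                power_factor: float = 1.0) -> float:
    """Real power a single-phase circuit with this current rating can carry: P = U * I * PF."""
    return voltage_v * current_a * power_factor

def fits(loads_w: Iterable[float], current_a: float,
         voltage_v: float = NOMINAL_VOLTAGE_V) -> bool:
    """True if the summed load stays within the circuit's rating (no extra design margin)."""
    return sum(loads_w) <= max_power_w(current_a, voltage_v)

for label, amps in (("C14 inlet, 10 A", 10), ("16 A", 16), ("32 A", 32)):
    print(f"{label}: {max_power_w(amps) / 1000:.2f} kW at 230 V")
print(f"C14 inlet at its 250 V rating: {max_power_w(10, 250) / 1000:.2f} kW")
```

The `power_factor` parameter is there because the current limit really applies to apparent power; with PFC-equipped server PSUs the difference is small, but for other loads the usable real power is lower.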
The operation of information equipment with two power supplies was well described in an article by Vadim Sinitsky (@dimskiy). As you can see, there are advantages and disadvantages. Redundant power supplies in information equipment are necessary in any case, especially if the site is outside the vendor's fast delivery zone for replacement power supplies. We would also note that vendors' online calculators for estimating the power draw of new servers should serve only as a rough guide for system administrators and the Customer's staff.
Whether a powerful new server can really be connected to an existing rack should be assessed taking into account the original power supply design and the current status and load of the rack's power network, the server, the UPS and the generator. As for the connection within the rack, it is also worth considering:
the current capacity of the PDUs and the types of free connectors on them
the ratings of the circuit breakers in the distribution boards, and the cross-section and phase of the cable line to the rack
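The checklist above boils down to a headroom check on every element in the chain. In this sketch all ratings and loads are hypothetical, everything is expressed in amps on the rack's phase (a UPS rated in kVA would have to be converted first), and the optional 80 % utilization cap is only an illustrative design margin, not a figure from this article. Connector types are not modelled.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Link:
    name: str
    rated_a: float  # rating of this element, in amps on the rack's phase (hypothetical)
    load_a: float   # present measured load, in amps (hypothetical)

def bottlenecks(chain: List[Link], new_server_a: float, max_utilization: float = 1.0) -> List[str]:
    """Names of elements that would exceed max_utilization of their rating after adding the server."""
    return [link.name for link in chain
            if link.load_a + new_server_a > link.rated_a * max_utilization]

chain = [
    Link("PDU A", rated_a=16, load_a=9.5),
    Link("breaker for rack A", rated_a=16, load_a=9.5),
    Link("cable line to rack", rated_a=25, load_a=9.5),
    Link("UPS output share", rated_a=40, load_a=31),
]
print(bottlenecks(chain, new_server_a=4.0))                       # every element within its rating
print(bottlenecks(chain, new_server_a=4.0, max_utilization=0.8))  # with a hypothetical 80 % cap
```

With these made-up numbers the new 4 A server fits within every rating, but with an 80 % margin the PDU, the breaker and the UPS all become bottlenecks, which is why the actual design and measured loads matter more than a vendor calculator.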
Special attention should be paid to the reliability of the server's power supply system. If it is built according to the scheme shown in Fig. 2 (with two bus systems), adding a powerful new server can, during repair work, overload the entire power supply system, reduce the UPS battery runtime, force the UPS to transfer to bypass because of overload, and so on.
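The overload risk comes from the fact that each bus looks half-loaded in normal operation, while during repair work on one bus the other must carry everything. The sketch assumes dual-corded equipment splits its load evenly between the two buses and that each bus has its own UPS; the ratings and loads are hypothetical.

```python
from typing import Tuple

def bus_utilization(dual_corded_kw: float, ups_rating_kw: float) -> Tuple[float, float]:
    """Return (utilization of each UPS with both buses up, utilization of the surviving UPS
    when the other bus is down). Assumes dual-corded load splits evenly between the buses."""
    normal = (dual_corded_kw / 2) / ups_rating_kw
    degraded = dual_corded_kw / ups_rating_kw
    return normal, degraded

UPS_RATING_KW = 10.0  # hypothetical rating of each of the two UPS units

for total_kw in (8.0, 11.0):  # hypothetical: today's load, then after adding a 3 kW server
    normal, degraded = bus_utilization(total_kw, UPS_RATING_KW)
    status = "OK" if degraded <= 1.0 else "overload: the UPS may transfer to bypass"
    print(f"{total_kw:g} kW: {normal:.0%} per UPS normally, {degraded:.0%} with one bus down ({status})")
```

After the new server is added, 55 % per UPS in normal operation looks comfortable, yet with one bus taken out for repair the surviving UPS would be at 110 %.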
And how do you build power distribution in your racks?
What is the service life of PSUs in your IT equipment, and how do you handle their redundancy in software?
Which PDUs do you prefer: basic or with monitoring? How useful is the "managed PDU" function in practice, and has it ever helped you?
Author: Kulikov Oleg
Lead design engineer
Integration Solutions Department
Registered in the National Register of Specialists (NOPRIZ), No. P-045870
What type of power distribution unit (PDU) has the optimal set of functions?
Basic (just a set of outlets)
Metered (with measurement functions, including remote monitoring)
Switched (remote switching of individual outlets)