# Overview of the main mathematical optimization methods for problems with constraints

I have been preparing and collecting material for a long time, I hope this time turned out better. This article is dedicated to the main methods for solving mathematical optimization problems with constraints, so if you have heard that the simplex method is some kind of very important 3r3483. method, but still do not know what it does, then maybe this article will help you.
3r3699.
3r3699. P. S. The article contains mathematical formulas added by a macro editor. They say that they are sometimes not displayed. There are also many gif animations.
3r3699.

3r3699. 3r33478. Preamble
3r3699. The task of mathematical optimization is a task of the form “To find in the set
3r33535. 3r3675. element
3r3675. such that for all
3r33545. 3r3675. from 3r3673. 3r33535. 3r3675. performed
3r3675. ”, That in scientific literature it will rather be written down somehow like this
3r3699. 3r3650. 3r3654. 3r3650.
3r3675. 3r3654.
3r3699. Historically, popular methods such as gradient descent or Newton's method work only in linear spaces (and preferably simple ones, for example, 3r3673. 3r3888. 3r36755.). In practice, there are often problems where you need to find a minimum in a non-linear space. For example, you need to find the minimum of some function among such vectors
3r3675. for which
3r3675. This may be due to the fact that
3r3675. denote the length of any objects. Or for example, if
3r33545. 3r3675. represent the coordinates of a point that should be at a distance of no more than
3r3365. 3r3675. from 3r3673. 3r3675. i.e.
3r371. 3r3675. . For such tasks, gradient descent or Newton's method is not directly applicable. It turned out that a very large class of optimization problems is conveniently covered by “constraints”, similar to those I described above. In other words, it is convenient to represent the set
3r33535. 3r3675. in the form of a system of equalities and inequalities
3r3699. 3r3650. 3r3654. 3r3650.
3r3675. 3r3654.
3r3699. Minimization tasks over the space of the form
3r388. 3r3675. thus, they became conditionally called “problems without constraints” (the unconstrained problem), and tasks over sets defined by sets of equalities and inequalities - “problems with constraints” (constrained problem).
3r3699.
3r3699. Technically, absolutely any set of
3r33535. 3r3675. can be represented as a single equality or inequality using 3r3602. indicator 3r3603. -function, which is defined as
3r3699. 3r3650. 3r3654. 3r3650.
3r3105. 3r3675. 3r3654.
3r3699. however, such a function does not have different useful properties (convexity, differentiability, etc.). However, it is often possible to imagine
3r33535. 3r3675. in the form of several equalities and inequalities, each of which has such properties. The basic theory is summarized under case 3r3r9689.  3r3699. 3r3650. 3r3654. 3r3650.
3r3119. 3r3675. 3r3654.
3r3699. where
3r3125. 3r3675. - convex (but not necessarily differentiable) functions,
3r3128. 3r3675. - matrix. To demonstrate how the methods work, I will use two examples:
3r3699. 3r31717.  3r3699.
3r3699. 3r3650.
\$\$ display \$\$ begin {array} {rl} mbox {minimize} & -2 & x ~~~ - & y mbox {provided} & -1.0 & ~ x -0.1 & ~ leq -1.0 & -1.0 & ~ x + ~ 0.6 & ~ y leq -1.0 & -0.2 & ~ x + ~ 1.5 & ~ y leq -0.2 & ~ 0.7 & ~ x + ~ 0.7 & ~ y leq 0.7 & ~ 2.0 & ~ x -0.2 & ~ y leq 2.0 & ~ 0.5 & ~ x -1.0 & ~ y leq 0.5 & -1.0 & ~ x -1.5 & ~ y leq -1.0 end {array} \$\$ display \$\$
3r3654.
3r3699. In essence, this task is to find the farthest point of the polygon in the direction (? 1), the solution to the problem is the point (4.? 3.5) - the most “right” in the polygon). But the actual polygon itself
3r3699. 3r3145.
3r3699.
3r3699.
3r3699. 3r3650. 3r3654. 3r3650.
3r3675. 3r3654.
3r3699.
3r3699. 3r3659.
3r3699.
3r3699. 3r33478. Simplex method
3r3699. Of all the methods that I cover with this review, the simplex method is probably the most famous. The method was developed specifically for linear programming and the only one presented achieves an exact solution in a finite number of steps (provided that exact arithmetic is used for calculations, in practice this is usually not the case, but in theory it is possible). The idea of ​​the simplex method consists of two parts:
3r3699. 3r31717.  3r3699.
Systems of linear inequalities and equalities define multidimensional convex polyhedra (polytopes). In the one-dimensional case, it is a point, a ray, or a segment, in a two-dimensional one, a convex polygon, and in a three-dimensional case, a convex polyhedron. Minimizing a linear function is essentially finding the furthest point in a particular direction. I think intuition should suggest that there should be some peak at this furthest point, and this is indeed so. In general, for a system of
3r3179. 3r3675. inequalities in
3r3191. 3r3675. -dimensional space, a vertex is a point satisfying a system for which exactly
3r3191. 3r3675. of these inequalities turn into equalities (provided that among the inequalities there are no equivalent). There are always a finite number of such points, although there may be a lot of them.
3r3699.
Now we have a finite set of points, generally speaking, you can simply pick them up, that is, to do something like this: for each subset of 3r3673. 3r3191. 3r3675. inequalities to solve the system of linear equations compiled on the selected inequalities, verify that the solution fits into the original system of inequalities and compare with other such points. This is a fairly simple inefficient, but working method. The simplex method instead of iteration moves from vertex to vertex along edges so that the values ​​of the objective function are improved. It turns out that if a vertex has no “neighbors” in which the function values ​​are better, then it is optimal.
3r3699.
3r3699. 3r3659.
3r3699. The simplex method is iterative, that is, it consistently improves the solution slightly. For such methods, you need to start somewhere, in general, this is done by solving the auxiliary problem 3r3689.  3r3699. 3r3650. 3r3654. 3r3650.
3r3675. 3r3654.
3r3699. If to solve this problem
3r3675. such that
3r3675. then runs
3r3675. otherwise, the original problem is generally given on the empty set. To solve an auxiliary problem, you can also use the simplex method, but with the starting point you can take
3r3675. with an arbitrary
3r33545. 3r3675. . Finding the starting point can be called the first phase of the method, finding the solution to the original problem can be called the second phase of the method.
3r3699.
3r3699. 3r38282.
The trajectory of the two-phase simplex method [/b]
The trajectory was generated using scipy.optimize.linprog.
3r3699.
3r3699. 3r33700. 3r33700.
3r3699.
3r3699. About gradient descent, I recently wrote a separate 3r33251. Article 3r3483. , in which including briefly described this method. Now this method is quite alive, but is being studied as part of a more general 3r3602. 3x3r3603 proximal gradient descent. . The very idea of ​​the method is quite trivial: if we apply a gradient descent to a convex function
3r? 3551. 3r3675. , then, with the right choice of parameters, we get a global minimum of 3r3673. 3r? 3551. 3r3675. . If, after each step of the gradient descent, to correct the resulting point, taking instead its projection onto the closed convex set
3r33535. 3r3675. , as a result, we get the minimum of the function
3r? 3551. 3r3675. on 3r3673. 3r33535. 3r3675. . Well or more formally, a projective gradient descent is an algorithm that sequentially calculates
3r3699. 3r3650. 3r3654. 3r3650.
3r3675. 3r3654.
3r3699. where
3r3699. 3r3650. 3r3654. 3r3650.
3r3675. 3r3654.
3r3699. The last equality defines the standard projection operator on the set, in essence, this is a function that is at the point
3r33545. 3r3675. computes the point of the set
closest to it. 3r33535. 3r3675. . The role of distance is played here
3r33232. 3r3675. It is worth noting that you can use any norm Nevertheless, projections with different norms may differ!
3r3699.
3r3699. In practice, projective gradient descent is used only in special cases. Its main problem is that the calculation of the projection can be even more challenging than the original, and it needs to be calculated many times. The most common case for which it is convenient to apply a projective gradient descent is “boxed restrictions”, which look like 3r3689.  3r3699. 3r3650. 3r3654. 3r3650.
3r3675. 3r3654.
3r3699. In this case, the projection is calculated very simply, for each coordinate we get
3r3699. 3r3650. 3r3654. 3r3650. r_i e2i20 & x_i
The trajectory of the projective gradient descent for the linear programming problem 3r3684.
3r33337. 3r33700.
3r3699. 3r33700. 3r33700.
3r3699. And here is what the projective gradient descent trajectory looks like for the second task, if
3r3699. 3r38282.
Choose a large step size [/b]
3r33352. 3r33700.
3r3699. 3r33700. 3r33700.
3r3699. and if 3r3689.  3r3699. 3r38282.
choose a small step size [/b]
3r33333. 3r33700.
3r3699. 3r33700. 3r33700.
3r3699.
3r3699. 3r33478. The ellipsoid method
3r3699. This method is notable for the fact that it is the first polynomial algorithm for linear programming problems; it can be considered a multidimensional generalization of 3-33381. bisection method 3r3483. . I will start with the more general separating hyperplane method :
3r3699.
3r3699.
At each step of the method there is a set that contains the solution to the problem.
3r3699.
At each step, a hyperplane is built, after which all points lying on one side of the selected hyperplane are removed from the set, and perhaps some new points will be added to this set.  3r3699. 3r33395.
3r3699. For optimization problems, the construction of a “separating hyperplane” is based on the following inequality for convex functions 3r3r6868.  3r3699. 3r3650. 3r3654. 3r3650.
3r3404. 3r3675. 3r3654.
3r3699. If you fix
3r33545. 3r3675. , then for the convex function
3r? 3551. 3r3675. half space
3r31616. 3r3675. contains only points with a value not less than at the point
3r33545. 3r3675. , which means they can be cut off, since these points are no better than the one we have already found. For problems with constraints, you can likewise get rid of points that are guaranteed to violate any of the constraints.
3r3699.
3r3699. The simplest version of the separating hyperplane method is to simply cut off half-spaces without adding any points. As a result, at each step we will have a certain polyhedron. The problem with this method is that the number of faces of a polyhedron is likely to increase from step to step. Moreover, it can grow exponentially.
3r3699.
3r3699. The ellipsoid method actually stores an ellipsoid at every step. More precisely, after the hyperplane is constructed, an ellipsoid of minimum volume is constructed, which contains one of the parts of the original one. This is achieved by adding new points. An ellipsoid can always be defined by a positive definite matrix and vector (center of the ellipsoid) as follows
3r3699. 3r3650. 3r3654. 3r3650.
3r33434. 3r3675. 3r3654.
3r3699. The construction of a minimal volume ellipsoid containing the intersection of a half-space and another ellipsoid can be done with the help of 3r33440. moderately bulky formulas
. Unfortunately, in practice, this method was still not as good as the simplex method or the interior point method.
3r3699.
3r3699. But actually how it works for 3r3689.  3r3699. 3r38282.
linear programming 3r3684.
3r3699. 3r33700. 3r33700.
3r3699. and for 3r3689.  3r3699. 3r38282.
3r33434. 3r33700.
3r3699. 3r33700. 3r33700.
3r3699.
3r3699. 3r33478. Interior point method
3r3699. This method has a long history of development, one of the first prerequisites appeared around the same time as the simplex method was developed. But at that time it was still not effective enough to used in practice. Later in 198? was developed. option method specifically for linear programming, which was good both in theory and in practice. Moreover, the internal point method is not limited only to linear programming, in contrast to the simplex method, and now it is the main algorithm for convex optimization problems with constraints.
3r3699.
3r3699. The basic idea of ​​the method is the replacement of restrictions on the penalty in the form of the so-called barrier function . Function
3r3675. called barrier function for the set
3r33535. 3r3675. if 3r3689.  3r3699. 3r3650. 3r3654. 3r3650.
3r3675. 3r3654.
3r3699. Here
3r? 3510. 3r3675. - the inside of
3r33535. 3r3675. , 3r3673. 3r? 3516. 3r3675. - border 3r3673. 3r33535. 3r3675. . Instead, the original problem is proposed to solve the problem 3r3689.  3r3699. 3r3650. 3r3654. 3r3650.
3r33535. 3r3675. 3r3654.
3r3699.
3r? 3533. 3r3675. and 3r3673. 3r33557. 3r3675. are given only on the insides
3r33535. 3r3675. (essentially hence the name), the barrier property ensures that
3r33557. 3r3675. at least
3r33545. 3r3675. exists. Moreover, the larger
3r3675. the greater the impact of
3r? 3551. 3r3675. . Under sufficiently reasonable conditions, one can achieve that if one rushes to
3r3675. to infinity, then at least
3r33557. 3r3675. will converge to the solution of the original problem.
3r3699.
3r3699. If set
3r33535. 3r3675. given as a set of inequalities
3r3-3567. 3r3675. then the standard choice of barrier function is logarithmic barrier
3r3699. 3r3650. 3r3654. 3r3650.
3r33577. 3r3675. 3r3654.
3r3699. Minimum points
3r?383. 3r3675. functions
3r33586. 3r3675. for different 3r3673. 3r3675. forms a curve, which is usually called central path , the internal point method tries to follow this path. This is how it looks for
3r3699. 3r38282.
Examples with linear programming [/b]
3r3599.
3r3699. 3r3602. Analytical center [/i] - it's just
3r3675.
3r3699. 3r33700. 3r33700.
3r3699.
3r3699. Finally, the internal point method itself has the following form
3r3699. 3r31717.  3r3699.
Select the initial approximation
3r3675. , 3r3673. 3r32424. \$ 0 "data-tex =" inline "> 3r3363675. 3r33657.  3r3699.
Select a new approximation using the Newton method
3r3699. 3r3650. 3r3654. 3r3650. 3r3654.
3r3699.
3r3699.
Click to enlarge
3r3675.
3r3699. 3r3650. 3r3654. 3r3650.
1 \$ "data-tex =" display "> 3r3363675.
3r3699.
3r3699. 3r3659.
3r3699. The use of Newton's method is very important here: the fact is that with the right choice of the barrier function, the step of Newton's method generates a point that remains inside our set. Finally, the trajectory of the internal point method
looks like this.  3r3699. 3r38282.
The task of linear programming [/b]
3r3669. 3r33700.
3r3699. The jumping black dot is
3r3675. i.e. point to which we are trying to approach the step of the Newton method in the current step.
3r3699. 3r33700. 3r33700.
3r3699. 3r38282.
The problem of quadratic programming [/b]
3r3699. 3r33700. 3r33700. 3r33700. 3r3699. 3r3699. 3r3699.
3r3699. 3r33700.
+ 0 -