Convex vs. Non-convex Optimization

Author: Aster Santana

Optimization problems have been part of the scientific world for thousands of years. Especially over the past few decades, optimization algorithms have profoundly changed the way we make practical decisions.

  • In revenue management, they tell us which ads are most likely to yield the highest profit;

  • In civil engineering, they help us understand the circumstances under which a given structure, such as a bridge, might collapse;

  • In public health, they help identify strains causing disease;

  • In chemical engineering, they determine the flow of raw material, such as crude oil, from field to refinery;

  • In computer vision, they are used for image recognition.

And many other examples could be listed.

While some optimization problems are easy to solve, others are extremely difficult. Finding an efficient algorithm for certain classes of problems poses a major scientific challenge. Even though computers and mathematical methods have evolved tremendously, many complex optimization problems arising in our modern society remain unsolved or only partially solved.

Classes of optimization problems

Optimization problems can be categorized in different ways. In particular, we can classify problems as deterministic or non-deterministic. In inventory optimization, for instance, we typically need to account for demand and supply variabilities, in which case a non-deterministic model, i.e., a stochastic model, is the best fit. However, deterministic models are generally much more tractable.

Deterministic models can be further classified by the type of decision variables present in the model, such as continuous or discrete, and by the type of expressions defining the objective function and constraints, with respect to linearity and convexity, for instance. The diagram in Figure 1 shows this classification, where the shade of blue suggests the tractability of each class of problems (darker means harder and lighter means easier to solve).

 

Figure 1: Classes of optimization problems.

 

In this context, the term “discrete” indicates the presence of integer or binary decision variables in the model.

As we will see next, non-convex optimization problems are more challenging to solve than their convex counterparts. For similar reasons, discrete problems are usually much harder to solve than continuous ones. Nevertheless, non-convexity and discrete decisions are present almost everywhere in applications and, in many situations, we can’t avoid dealing with them.

Convex Optimization

Perhaps the most classical (and easy to solve) example of a convex problem is the linear program (LP), which can be formulated as follows:

\[ \begin{eqnarray} \begin{array}{rcl} & \min & c^{\top} x\\ & \text{s.t.}& Ax \geq b,\\ && x \in \mathbb{R}_+^n. \end{array} \end{eqnarray} \] Here, \(A\) is an \(m \times n\) matrix, \(c\) and \(b\) are vectors of appropriate dimensions, and \(\mathbb{R}_+^n\) is the set of non-negative vectors in \(n\)-dimensional space.

LPs are extremely useful for modeling a variety of problems and can be efficiently solved in practice using the simplex algorithm and interior-point methods.
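
For illustration (this snippet is not part of the original article and the data are made up), a tiny LP in exactly this form can be solved with SciPy's HiGHS-based linprog:

```python
# A minimal sketch of solving  min c^T x  s.t.  A x >= b,  x >= 0  with SciPy.
import numpy as np
from scipy.optimize import linprog

c = np.array([3.0, 2.0])            # objective coefficients (made-up data)
A = np.array([[1.0, 1.0],
              [2.0, 1.0]])          # constraint matrix
b = np.array([4.0, 5.0])            # right-hand side

# linprog expects inequalities in the form A_ub @ x <= b_ub,
# so we negate both sides to express A x >= b.
res = linprog(c, A_ub=-A, b_ub=-b, bounds=[(0, None)] * len(c), method="highs")

print(res.x, res.fun)               # optimal solution and objective value
```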

However, not every problem can be formulated using only linear expressions. Quadratic terms, for instance, are often needed for modeling basic phenomena emerging from economics, biology, and engineering, just to mention a few. This necessity led to a generalization of LP known as conic programming (CP), which can be represented as follows:

$$ \begin{eqnarray} \begin{array}{rcl} & \min & c^{\top} x\\ & \text{s.t.}& Ax - b \in K,\\ && x \in \mathbb{R}_+^n. \end{array} \end{eqnarray} $$ Here, \(A\) is an \(m \times n\) matrix, \(c\) and \(b\) are vectors of appropriate dimensions, and \(K\) is a regular cone (i.e., closed, convex, pointed, and full-dimensional) in \(m\)-dimensional space. In particular, if \(K = \mathbb{R}_+^m\), which is a regular cone, we recover the LP above. Among the most classical examples of conic programs are the semidefinite program (SDP) and the second-order cone program (SOCP). In the SDP, \(K\) is replaced with the cone of symmetric positive semidefinite matrices. In the case of the SOCP (which is a special case of the SDP), \(K\) is replaced with the second-order cone, also known as the Lorentz cone or, in three-dimensional space, the ice cream cone:
 

Figure 2: Ice Cream Cone.

 

Thanks to fundamental theoretical progress and recent software developments, conic programs can be efficiently solved for relatively large instances using interior-point methods (which are often used for solving large-scale LPs as well), especially in the case of second-order cone programming.
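
As a rough sketch (not from the original article; the data are made up and CVXPY is just one convenient modeling layer), a small SOCP of this kind can be stated in a few lines and handed to an interior-point conic solver:

```python
# A minimal SOCP sketch:  min c^T x  s.t.  ||A x - b||_2 <= t,  x >= 0.
# The norm constraint places (t, A x - b) in the second-order cone.
import numpy as np
import cvxpy as cp

np.random.seed(0)
m, n = 5, 3
A = np.random.randn(m, n)            # made-up data
x_feas = np.abs(np.random.randn(n))  # a known non-negative point
b = A @ x_feas                       # choosing b this way guarantees feasibility
c = np.ones(n)
t = 2.0                              # hypothetical bound on the residual norm

x = cp.Variable(n, nonneg=True)
prob = cp.Problem(cp.Minimize(c @ x), [cp.norm(A @ x - b, 2) <= t])
prob.solve()                         # dispatched to an interior-point conic solver

print(prob.status, prob.value, x.value)
```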

One feature that all convex problems have in common is that every locally optimal solution is also a globally optimal solution (see points A and B in Figure 3 for an example of a global and a local solution, respectively).

Non-convex Optimization

Non-convexity in optimization problems is generally due to the presence of discrete variables (binary or integer in most cases) or non-convex expressions (such as the product of two variables), and sometimes both. Unlike convex programs, non-convex problems may have multiple local optima (or other stationary points) that are not necessarily global optima. This is one of the most troubling aspects of solving non-convex programs.
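
To see this concretely, here is a small sketch (not from the article; the function and starting points are made up) in which a local solver returns different stationary points depending on where it starts:

```python
# A local solver applied to a non-convex function from two starting points.
from scipy.optimize import minimize

def f(x):
    # A simple non-convex function: a double well tilted by a linear term.
    return (x[0]**2 - 1.0)**2 + 0.3 * x[0]

# Run BFGS (a local method) starting near each basin.
for x0 in (-1.1, 1.1):
    res = minimize(f, x0=[x0])
    print(f"start {x0:+.1f} -> x* = {res.x[0]:+.3f}, f(x*) = {res.fun:+.3f}")

# The two runs typically return different local minima; only one of them is
# the global minimum. For a convex problem, any local minimum would suffice.
```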

Figure 3 illustrates this fact in a continuous setting. Specifically, the shaded region on the left represents the feasible region of a non-convex problem, and the dashed lines represent a linear objective function that is to be minimized over this region. Therefore, points A and B are both local optimal solutions but only A is a global optimal solution to this problem.

 

Figure 3: Local optima and duality gap.

 
The shaded region on the right of Figure 3 represents a convex relaxation (not the best one) of the original problem, where \(C\) is the minimizer. Observe that \(C\) yields a lower bound, zero in this case, on the optimal objective value of the original problem. Solutions \(A\) and \(B\) yield upper bounds. The difference between the smallest known upper bound and the largest known lower bound (in the case of a minimization problem) defines the duality gap, which provides a measure of how far we may be from global optimality. For instance, if we only knew solution \(B\), but not \(A\), then \(2-0=2\) would be the absolute duality gap in the example of Figure 3. The goal is to close the duality gap. Once it is closed, a provably globally optimal solution has been found.
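
As a toy illustration of these bounds (made-up data, unrelated to Figure 3), consider a small integer program: the LP relaxation supplies a lower bound, any feasible integer solution supplies an upper bound, and their difference is the duality gap:

```python
# Lower and upper bounds for a small integer program (minimization form).
import numpy as np
from scipy.optimize import linprog

c = np.array([-5.0, -4.0])                  # maximize 5x1 + 4x2  <=>  minimize -5x1 - 4x2
A_ub = np.array([[6.0, 4.0], [1.0, 2.0]])   # made-up constraints
b_ub = np.array([24.0, 6.0])

# Lower bound: the LP relaxation (integrality dropped).
relax = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=[(0, None)] * 2, method="highs")
lower = relax.fun

# Upper bound: any feasible integer solution; rounding down happens to be
# feasible here (it is not feasible in general).
x_int = np.floor(relax.x)
upper = float(c @ x_int)

print(f"lower bound = {lower:.2f}, upper bound = {upper:.2f}, gap = {upper - lower:.2f}")
```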

At this point, it should be clear that the ideal convex relaxation would be the smallest convex set that still contains all the feasible solutions to the problem, i.e., the convex envelope (also called the convex hull) of the entire set of feasible solutions. However, this is out of reach, except in some special cases. Thus, in reality, convex relaxations by themselves are not enough to solve a non-convex program to global optimality. A very successful approach then, in both discrete and continuous settings, is to use branch-and-bound algorithms. The idea is to partition the feasible region in such a way that:

  1. the optimal solution to the previous relaxation is cut off;

  2. no feasible solution to the original problem is removed;

  3. the quality of the convex relaxation is improved over each partition.

These steps are performed recursively on each partition until the duality gap is closed. It is clear, then, that the mechanism used to derive the convex relaxations plays a major role in the performance of branch-and-bound algorithms. This motivates the study of the convexification of sets.
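
To make the recursion concrete, below is a deliberately simplified branch-and-bound sketch (illustrative only, not the full algorithm used by real solvers, which add presolve, cutting planes, and far smarter branching). It bounds each partition with an LP relaxation solved by SciPy, and the data are made up:

```python
# Minimal branch-and-bound for  min c^T x  s.t.  A_ub x <= b_ub,  x >= 0,  x integer.
import math
import numpy as np
from scipy.optimize import linprog

def branch_and_bound(c, A_ub, b_ub, bounds, incumbent=(math.inf, None)):
    relax = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=bounds, method="highs")
    # Prune: partition is infeasible, or its LP bound cannot beat the incumbent.
    if not relax.success or relax.fun >= incumbent[0] - 1e-9:
        return incumbent
    frac = [i for i, v in enumerate(relax.x) if abs(v - round(v)) > 1e-6]
    if not frac:                                  # integral solution: new upper bound
        return (relax.fun, relax.x.round())
    i, v = frac[0], relax.x[frac[0]]              # branch on a fractional variable
    lo, hi = bounds[i]
    # Partition: x_i <= floor(v) on one side, x_i >= ceil(v) on the other.
    # This cuts off the fractional point v but keeps every integer solution.
    for child_bound in [(lo, math.floor(v)), (math.ceil(v), hi)]:
        child = list(bounds)
        child[i] = child_bound
        incumbent = branch_and_bound(c, A_ub, b_ub, child, incumbent)
    return incumbent

c = np.array([-5.0, -4.0])                        # maximize 5x1 + 4x2 (made-up data)
A_ub = np.array([[6.0, 4.0], [1.0, 2.0]])
b_ub = np.array([24.0, 6.0])
value, x = branch_and_bound(c, A_ub, b_ub, [(0, None), (0, None)])
print(value, x)                                   # a provably optimal integer solution
```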

 

If you have any questions, comments, or would like to chat more about this topic, just reach out to us.