Nonlinear Minimization in Octave: Sequential Quadratic Programming

Sequential quadratic programming (SQP) is a gradient-based optimization routine based upon the method of Lagrange Multiplers. Lagrange multipliers are used to incorporate (nonlinear and linear) equality and inequality constraints. However, the goal of this writeup is not to discuss what SQP is, but rather how to use it in Octave.

SQP is implemented as sqp in Octave. Entering help sqp at the terminal yields:

— Function File: [X, OBJ, INFO, ITER, NF, LAMBDA] = sqp (X, PHI, G, H)
Solve the nonlinear program

min phi (x)
x

subject to

g(x) = 0
h(x) >= 0

using a successive quadratic programming method.

The first argument is the initial guess for the vector X.

The second argument is a function handle pointing to the objective
function. The objective function must be of the form

y = phi (x)

in which X is a vector and Y is a scalar.

The second argument may also be a 2- or 3-element cell array of
function handles. The first element should point to the objective
function, the second should point to a function that computes the
gradient of the objective function, and the third should point to a
function to compute the hessian of the objective function. If the
gradient function is not supplied, the gradient is computed by
finite differences. If the hessian function is not supplied, a
BFGS update formula is used to approximate the hessian.

If supplied, the gradient function must be of the form

g = gradient (x)

in which X is a vector and G is a vector.

If supplied, the hessian function must be of the form

h = hessian (x)

in which X is a vector and H is a matrix.
The third and fourth arguments are function handles pointing to
functions that compute the equality constraints and the inequality
constraints, respectively.

If your problem does not have equality (or inequality) constraints,
you may pass an empty matrix for CEF (or CIF).

If supplied, the equality and inequality constraint functions must
be of the form

r = f (x)

in which X is a vector and R is a vector.

The third and fourth arguments may also be 2-element cell arrays of
function handles. The first element should point to the constraint
function and the second should point to a function that computes
the gradient of the constraint function:

[ d f(x) d f(x) d f(x) ]
transpose ( [ —— —– … —— ] )
[ dx_1 dx_2 dx_N ]

Here is an example of calling `sqp’:

function r = g (x)
r = [ sumsq(x)-10;
x(2)*x(3)-5*x(4)*x(5);
x(1)^3+x(2)^3+1 ];
endfunction

function obj = phi (x)
obj = exp(prod(x)) – 0.5*(x(1)^3+x(2)^3+1)^2;
endfunction

x0 = [-1.8; 1.7; 1.9; -0.8; -0.8];

[x, obj, info, iter, nf, lambda] = sqp (x0, @phi, @g, [])

x =

-1.71714
1.59571
1.82725
-0.76364
-0.76364

obj = 0.053950
info = 101
iter = 8
nf = 10
lambda =

-0.0401627
0.0379578
-0.0052227

The value returned in INFO may be one of the following:
101
The algorithm terminated because the norm of the last step
was less than `tol * norm (x))’ (the value of tol is
currently fixed at `sqrt (eps)’–edit `sqp.m’ to modify this
value.

102
The BFGS update failed.

103
The maximum number of iterations was reached (the maximum
number of allowed iterations is currently fixed at 100–edit
`sqp.m’ to increase this value).

See also: qp.

The implementation is fairly detailed. Let’s use sqp in an example. Consider

f(x,y)=x^2+y^2.

We know that the minimum of the parabaloid is at (x,y)=(0,0). In fact, f(x,y) is plotted as

sqp requires an initial estimate to the minimizer x_0, a function somewhere (obj_fcn1.m, we call it) that takes as input value x in R^2 and maps to the non-negative reals. Further, SQP requires some sort of constraint. We only need know (which I’m using loosely) that x_f, the final solution, need be positive. The fourth argument sqp takes is a function (we call it bndfcn.m) that describes the bounds in terms of an output vector with each component positive. For us, literally, the bound function is

function out=bndfcn(x)

out=x(:);

since we want each value of the 2×1 vector x to be positive. We leave the third entry for sqp blank since there are no equality constraints. Just to clarify, out objective function, obj_fcn1.m is given as

function out=obj_fcn1(x)
x=x(:);

out=x’*x;

Our call to sqp is

[x_f,j_f]=sqp(x_0,@(x)obj_fcn1(x),[],@(x)bndfcn(x));

where our initial iterate to the solution is x_0=[10;10]. Running this program, yields the final results x_f=[0;0]; which is exactly the minimizer to f(x,y)=x^2+y^2.

(This post is double-posted at DistroStop.)

Published by

Jon Ernstberger

Husband and father; student and teacher; AVPAA and Prof. of Mathematics at @LaGrangeCollege. Blessed beyond measure.

4 thoughts on “Nonlinear Minimization in Octave: Sequential Quadratic Programming”

  1. Can you please explain the meaning of
    ” If supplied, the equality and inequality constraint functions must be of the form
    r = f (x)
    in which X is a vector and R is a vector.”
    if vector what is the size of the vector?
    Does it mean that CIF & CEF represent several simultaneous constrains?
    Thanks in advance,

    1. Sorry for the long delay in response.

      First, this is the help in Octave when you type “help sqp”. Consequently, I may not have really great responses. Second, you don’t have to supply those. But, if you do, they must follow that certain form.

      The vector X is the current iterate of the vector of parameters you are estimating/minimizing. R is related to that vector by some f (f:x->R). Therefore, R may not necessarily be the same size as your X. However, f represents a mapping of your parameters to your equality constraint space. So the dimension of the range of f is where I’d look.

      CEF (‘g’ in the call) means “constraint equality function” and CIF (‘h’ in the call) means “constraint inequality functions”. While I don’t know your exact problem, those two relations could easily describe multiple simultaneous constraints. I know that I’ve used both inequality and equality constraints to describe hundreds of relationships.

      Hope this helps!

Leave a comment