A Bisection Method

The bisection method is perhaps the first root-finding method one learns because it works the way our intuition does: if we know a root exists in an interval [a,b], all we have to do is keep cutting the interval in half and look in the appropriate half. How do we do that?

First, note that the bisection method relies upon functions

f:R->R.

This function must be continuous on [a,b]. Further, almost all bisection codes rely on the fact that f(a)*f(b) < 0, i.e., that f changes sign between x=a and x=b. Without a sign change we could not verify the existence of a root: functions such as g(x)=sin(x)+3 have no roots on any interval! Consequently, without further automation, we must check that f(a)*f(b) < 0.

The bisection method approximates the root at p_i = (a+b)/2. If f(a)*f(p_i) < 0, the root lies in [a,p_i], so we set b = p_i; otherwise we set a = p_i. We continually shrink the interval until abs(b-a) < eps. Since the interval halves at each step, roughly log2((b-a)/eps) iterations suffice.

A version of the bisection method written for MATLAB or Octave is posted at bisect.m. The code requires a function given by a handle and an interval [a,b]. Additional options include the maximum number of iterations, a tolerance on the root, and a tolerance on the function value.
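For reference, here is a minimal sketch of that loop (my own illustration under the sign-change assumption, not the posted bisect.m):

function p = bisect_sketch (f, a, b, tol)
  % Assumes f is continuous on [a,b] with f(a)*f(b) < 0.
  if f(a)*f(b) >= 0
    error ('bisect_sketch: f(a)*f(b) must be negative');
  end
  while abs (b - a) > tol
    p = (a + b)/2;
    if f(a)*f(p) < 0
      b = p;        % sign change in [a,p]: the root lies there
    else
      a = p;        % otherwise the root lies in [p,b]
    end
  end
  p = (a + b)/2;    % midpoint of the final interval
endfunction

For example, bisect_sketch (@sin, 3, 4, 1e-10) approximates pi.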

I hope this is useful for coursework in introductory numerical analysis classes or for basic root finding.

Simulated Annealing in Octave

The simulated annealing algorithm (built on the Metropolis algorithm of 1953; message me for a reference) draws an analogy to the structure of a metal settling into a crystalline state as it cools. As the metal cools, it settles into one of its lower-energy configurations. However, on the way toward those lower-energy configurations, the energy can temporarily increase. By mimicking this natural process, accepting occasional uphill moves while the temperature falls, a minimum is eventually reached. The same analogy carries over to numerical optimization.
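To make the acceptance rule concrete, here is a minimal Octave sketch of a single annealing step (my own illustration of the Metropolis criterion; the function name and arguments are mine, and samin's internals differ):

function x = sa_step (f, x, T, width)
  % Propose a random neighbor of x within +/- width.
  x_new = x + width .* (2*rand (size (x)) - 1);
  dE = f (x_new) - f (x);          % change in "energy"
  % Accept downhill moves always; accept uphill moves with
  % probability exp(-dE/T), which shrinks as T cools.
  if dE < 0 || rand () < exp (-dE/T)
    x = x_new;
  end
endfunction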

In Octave, simulated annealing is implemented as samin. Typing "help samin" at the Octave terminal yields the following:

help samin
samin: simulated annealing minimization of a function. See samin_example.m

usage: [x, obj, convergence, details] = samin("f", {args}, {control})

Arguments:
* "f": function name (string)
* {args}: a cell array that holds all arguments of the function,
* {control}: a cell array with 11 elements
* LB - vector of lower bounds
* UB - vector of upper bounds
* nt - integer: # of iterations between temperature reductions
* ns - integer: # of iterations between bounds adjustments
* rt - (0 < rt < 1): temperature reduction factor
* maxevals - integer: limit on function evaluations
* neps - integer: number of values final result is compared to
* functol - (> 0): the required tolerance level for function value
comparisons
* paramtol - (> 0): the required tolerance level for parameters
* verbosity - scalar: 0, 1, or 2.
* 0 = no screen output
* 1 = only final results to screen
* 2 = summary every temperature change
* minarg - integer: which of function args is minimization over?

Returns:
* x: the minimizer
* obj: the value of f() at x
* convergence:
0 if no convergence within maxevals function evaluations
1 if normal convergence to a point interior to the parameter space
2 if convergence to point very near bounds of parameter space
(suggest re-running with looser bounds)
* details: a px3 matrix. p is the number of times improvements were found.
The columns record information at the time an improvement was found
* first: cumulative number of function evaluations
* second: temperature
* third: function value

Example: see samin_example
/usr/libexec/octave/packages/optim-1.0.0/i386-redhat-linux-gnu-api-v32/samin.oct

Additional help for built-in functions and operators is
available in the on-line version of the manual. Use the command
`doc <topic>' to search the manual index.

Help and information about Octave is also available on the WWW
at http://www.octave.org and via the help@octave.org
mailing list.

So let's try this out on the same example f(x,y)=x^2+y^2. The minimum is at (x,y)=(0,0). This was encoded in the function obj_fcn1.m, which looks like

function out=obj_fcn1(x)
x=x(:);
out=x'*x;

Let's say that our initial estimate is x_0=[10,10]'. Recall that samin takes as its first argument the string "obj_fcn1". The second argument is a cell array, written {...}, which contains all the arguments of the function "obj_fcn1"; in this case, x_0 is the only input. The third argument is a cell array with 11 entries: the lower and upper bounds, the number of iterations between temperature reductions and between bounds adjustments, the factor by which the temperature is reduced, the maximum number of function evaluations, the number of values the final result is compared against, the function tolerance, the parameter tolerance, the level of screen output, and which of the function's arguments is minimized over. Here is my setup:

x_0=[10,10]';
LB=[-100,-100]';
UB=[1e10,1e10]';
nt=100;
ns=100;
rt=1e-3;
maxevals=1e8;
neps=100;
functol=1e-7;
paramtol=1e-7;
verbosity=2;
minarg=1;
[x_f,j_f]=samin("obj_fcn1",{x_0},{LB,UB,nt,ns,rt,maxevals,neps,functol,paramtol,verbosity,minarg});

The argument minarg is set to 1 since there is only one input to obj_fcn1.
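If the objective had a second, fixed argument, the {args} cell is where it would go. Here is a hypothetical variant (obj_fcn2 and c are my own names, not from the original post); minarg stays 1 because we still minimize over the first argument:

function out=obj_fcn2(x,c)
  x=x(:);
  out=x'*x+c;   % c is a fixed shift, not optimized over

c=5;
[x_f,j_f]=samin("obj_fcn2",{x_0,c},{LB,UB,nt,ns,rt,maxevals,neps,functol,paramtol,verbosity,minarg});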

Here is the final output from the run using the commands above:

samin: intermediate results before next temperature change

temperature 1.000000e-306
current best function value 0.000000
total evaluations so far 2280000
total moves since last temperature reduction 20000
downhill 4824
accepted uphill 4808
rejected uphill 10368
out of bounds trials 0
new minima this temperature 4

parameter search width
-0.000000 0.000000
-0.000000 0.000000

================================================
SAMIN results

==> Normal convergence <==

Convergence tolerances:
Function: 1.000000e-07
Parameters: 1.000000e-07

Objective function value at minimum: 0.000000

parameter search width
-0.000000 0.000000
-0.000000 0.000000
================================================

Simulated annealing, as a stochastic search, performs quite well. In cases where no knowledge of derivatives is available, or where there are many local minima, SA is a good choice. SA has known (asymptotic) global convergence properties and might be the answer for your problem.

(Also posted at www.distrostop.org.)

Nonlinear Minimization in Octave: Sequential Quadratic Programming

Sequential quadratic programming (SQP) is a gradient-based optimization routine built upon the method of Lagrange multipliers. Lagrange multipliers are used to incorporate (nonlinear and linear) equality and inequality constraints. However, the goal of this writeup is not to discuss what SQP is, but rather how to use it in Octave.

SQP is implemented as sqp in Octave. Entering help sqp at the terminal yields:

-- Function File: [X, OBJ, INFO, ITER, NF, LAMBDA] = sqp (X, PHI, G, H)
Solve the nonlinear program

min phi (x)
x

subject to

g(x) = 0
h(x) >= 0

using a successive quadratic programming method.

The first argument is the initial guess for the vector X.

The second argument is a function handle pointing to the objective
function. The objective function must be of the form

y = phi (x)

in which X is a vector and Y is a scalar.

The second argument may also be a 2- or 3-element cell array of
function handles. The first element should point to the objective
function, the second should point to a function that computes the
gradient of the objective function, and the third should point to a
function to compute the hessian of the objective function. If the
gradient function is not supplied, the gradient is computed by
finite differences. If the hessian function is not supplied, a
BFGS update formula is used to approximate the hessian.

If supplied, the gradient function must be of the form

g = gradient (x)

in which X is a vector and G is a vector.

If supplied, the hessian function must be of the form

h = hessian (x)

in which X is a vector and H is a matrix.
The third and fourth arguments are function handles pointing to
functions that compute the equality constraints and the inequality
constraints, respectively.

If your problem does not have equality (or inequality) constraints,
you may pass an empty matrix for CEF (or CIF).

If supplied, the equality and inequality constraint functions must
be of the form

r = f (x)

in which X is a vector and R is a vector.

The third and fourth arguments may also be 2-element cell arrays of
function handles. The first element should point to the constraint
function and the second should point to a function that computes
the gradient of the constraint function:

              [ d f(x)   d f(x)        d f(x) ]
  transpose ( [ ------   ------   ...  ------ ] )
              [  dx_1     dx_2          dx_N  ]

Here is an example of calling `sqp':

function r = g (x)
  r = [ sumsq(x)-10;
        x(2)*x(3)-5*x(4)*x(5);
        x(1)^3+x(2)^3+1 ];
endfunction

function obj = phi (x)
  obj = exp(prod(x)) - 0.5*(x(1)^3+x(2)^3+1)^2;
endfunction

x0 = [-1.8; 1.7; 1.9; -0.8; -0.8];

[x, obj, info, iter, nf, lambda] = sqp (x0, @phi, @g, [])

x =

-1.71714
1.59571
1.82725
-0.76364
-0.76364

obj = 0.053950
info = 101
iter = 8
nf = 10
lambda =

-0.0401627
0.0379578
-0.0052227

The value returned in INFO may be one of the following:
101
The algorithm terminated because the norm of the last step
was less than `tol * norm (x)' (the value of tol is
currently fixed at `sqrt (eps)'--edit `sqp.m' to modify this
value).

102
The BFGS update failed.

103
The maximum number of iterations was reached (the maximum
number of allowed iterations is currently fixed at 100--edit
`sqp.m' to increase this value).

See also: qp.

The implementation is fairly detailed. Let’s use sqp in an example. Consider

f(x,y)=x^2+y^2.

We know that the minimum of the paraboloid is at (x,y)=(0,0).

sqp requires an initial estimate of the minimizer, x_0, and an objective function (we call it obj_fcn1.m) that takes as input a value x in R^2 and maps to the non-negative reals. Further, sqp accepts constraints. For illustration, we require (speaking loosely) that x_f, the final solution, be non-negative. The fourth argument sqp takes is a function (we call it bndfcn.m) that describes the bounds in terms of an output vector whose components must all be non-negative. For us, the bound function is literally

function out=bndfcn(x)

out=x(:);

since we want each entry of the 2×1 vector x to be non-negative. We leave the third argument to sqp empty since there are no equality constraints. Just to clarify, our objective function, obj_fcn1.m, is given as

function out=obj_fcn1(x)
x=x(:);

out=x’*x;

Our call to sqp is

[x_f,j_f]=sqp(x_0,@(x)obj_fcn1(x),[],@(x)bndfcn(x));

where our initial iterate is x_0=[10;10]. Running this program yields x_f=[0;0], which is exactly the minimizer of f(x,y)=x^2+y^2.
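As the help text notes, the second argument may instead be a cell array whose second element computes the gradient. Since the gradient of x'*x is 2x, a sketch of that variant (my own, not part of the original post) is:

[x_f,j_f]=sqp(x_0,{@(x)obj_fcn1(x),@(x)2*x(:)},[],@(x)bndfcn(x));

Supplying the exact gradient lets sqp skip the finite-difference approximation.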

(This post is double-posted at DistroStop.)