Downhill Simplex Method for Many (~20) Dimensions

Grzegorz Kaczmarczyk
Institute of Theoretical Physics and Astrophysics, University of Gdansk,
Wita Stwosza 57, Pl-80-952 Gdansk, Poland
E-mail: grzegorz@iftia.univ.gda.pl

The downhill simplex method is due to Nelder & Mead (1965). The method requires only function evaluations, not derivatives. It is not very efficient in terms of the number of function evaluations that it requires. However the downhill simplex method may frequently be the best method to use. In case of may dimensions (~20) the function sometimes does not converge to the minimum but the simplex is constantly shrinking. The improvement solves this problem. The classical simplex method will be described bellow and the improvement will be discussed.

The method has a geometrical naturalness about it which makes it delightful to describe or work through. A simplex is the geometrical figure consisting, in N dimensions, of N+1 points (or vertices) and all their interconnecting line segments, polygonal faces, etc. In two dimensions, a simplex is a triangle. In three dimensions it is a tetrahedron, not necessarily regular tetrahedron. In general we are only interested in simplexes that are non-degenerate, i.e. which enclose a finite N-dimensional volume. If any point of a non-degenerate simplex is taken as the origin, then the N other points define vector directions that span the N-dimensional vector space.

To start the method we need to choose the first point to start. The algorithm is then supposed to make its own way downhill through the unimaginable complexity of an N-dimensional topography, until it encounters an (at least local) minimum. The downhill simplex method must be started not just with a single point, but with N+1 points, defining an initial simplex. If there is one initial point P0 then the other N points can be expressed by

Pi = P0 + ai ei
where ei are N unit vectors, and ai are constants that characterize the length scale for each vector direction.

The downhill simplex method takes a series of steps, most steps just moving the highest point (the point of simplex where the function is largest) through the opposite face of the simplex to a supposed lower point (Pr). These steps are called reflections, and they are constructed to conserve the volume of the simplex (hence maintain its non-degeneracy).

Step 1

In case when the value of the function in point Pr is between the second highest and lowest value of in the vertices of the simplex then the highest point is changed to Pr (the new simplex appears - drawn on above picture by dashed lines). When the value of function in point Pr is lower than or equal to in the lowest point of the simplex (we have better estimation of the minimum) the value is checked in point Prr to see if the function drops further in direction of Pr.
Step no 2.

The lower of points Pr and Prr replaces then the highest point of the simplex. If point Pr is greater then the simplex expands in direction to Prr, its volume grows. When the value of function in point Pr is higher than or equal to the value in the highest point of the simplex the value is checked in point Prr' to see if the function is lower between the highest point and the point Pb. The point Pb is the average of simplex points except the highest point. When the value of function in point Pr is higher than or equal to the value in the second highest point of the simplex and lower than the value in the highest point the value is checked in point Prr'' to see if the function is lower between the point Pr and the point Pb.
Step no 3.

When the value of function in point Prr'/Prr'' is lower than in the highest point of the simplex / in the point Pr then Prr'/Prr'' replaces the highest point of the simplex.

What to do when the value of function in point Prr'/Prr'' is higher than in the highest point of the simplex / in the point Pr? In the classical simplex method the simplex contracts itself in all directions, pulling itself in around its lowest point. This procedure works well when the number of dimensions is lower than 10. But when the number of dimensions ~20 this is not effective, this step requires calculation of function in N points. Moreover in problems solved by the author is some situations this method have not converged. The contractions were constantly repeated with no effect (no better estimations of the function). The author decided to modify the method to improve the convergence.

The latter steps are presented on bellow picture. The function is checked in point P1/P1' and if the function is lower there than in the highest point / the point Pr the highest point is changed to P1/P1'. Else the function is checked in point P2/P2' and if the function is lower there than in the highest point / the point Pr the highest point is changed to P2/P2'. Else other tricks are used to find the minimum on the blue line.

Step no 4,5,...

The other tricks are interpolation of the function along blue line by third degree polynomial, and finding the minimum of it. The highest point is changed to the minimum of the polynomial. If after the operation the changed point have the same coordinates as the lowest point that means that the simplex was degenerated.

To prevent degeneration of the simplex it is useful to restart the method at the point that is the lowest point that have been found. It is also useful to change the directions of unit vectors (rotate them) to prevent lack convergence. The procedure creates convergence report and it is possible according to it decide when to stop the calculations.

The procedure writen in C language bellow is an implementation of the improved simplex method. The procedure is available here with an example of usage. It can be compiled using command cc -lm simplex.c.

/* Global variables -------------------------------------------------------- */
int NDIM;                   /* Dimension of approximation space        */
double* MX[25];             /* NDIM-dimensional vector of values       */
double MdX[25];             /* NDIM-dimensional vector of differences  */

/* Improved simplex approximation method ----------------------------------- */
void aps()
{
double MP[26][25];          /* Main matrix of simplex vertices         */
double Pb[25];              /* The point Pb.                           */
double Pr[25];              /* The point Pr.                           */
double Prr[25];             /* The point Prr or Prr'.                  */
double Y[26];               /* Results Y[i]=FUNK(MP[i])                */
double RTol;                /* Real toleration                         */
double Al,Bt,Gm,Ypr,Yprr,Yavr,Rmp;
double xa,xb,xc,xd,lMin;
int i,j,MPTS;
int ITR0,ITR1,NITR,MITR;
int iLo,iHi,iNHi;
FILE *fe;

MPTS=NDIM+1;Al=1.0;Bt=0.5;Gm=2.0;Rmp=MPTS;ITR0=0;
printf("Downhill simplex approximation method for %d dimensions.\n",NDIM);
if((fe=fopen("zparam00.dat","rt"))==0) {ITR1=0;MITR=100;NITR=40;} else {
  fscanf(fe,"%d %d\n",&ITR1,&NITR);MITR=100;
  for(i=0;i<MPTS;i++) fscanf(fe,"%le %le",MX[i],MdX+i);
  fclose(fe);}
while(ITR1<NITR) {ITR0=0;
 for(i=0;i<NDIM;i++) {for(j=0;j<=NDIM;j++) MP[j][i]=*(MX[i]);MP[i][i]+=MdX[i]*(0.9+0.2*rand()/RAND_MAX);}
 /* Rotarion of the matrix MP - not implemented */
 for(j=0;j<=NDIM;j++) {for(i=0;i<NDIM;i++) *(MX[i])=MP[j][i];Y[j]=func();}
 while(ITR0<MITR) {
  Yavr=0;iLo=0;if(Y[0]>Y[1]) {iHi=0;iNHi=1;} else {iHi=1;iNHi=0;}
  for(i=0;i<MPTS;i++) {Yavr+=Y[i];if(Y[i]<Y[iLo]) iLo=i;
   if(Y[i]>Y[iHi]) {iNHi=iHi;iHi=i;} else if(Y[i]>Y[iNHi]) if(i!=iHi) iNHi=i;}
  ITR0++;RTol=2.0*fabs(Y[iHi]-Y[iLo])/(fabs(Y[iHi])+fabs(Y[iLo]));Yavr/=Rmp;
  printf(">%3d%4d RTol=%e Ymin=%f Ymax=%f Yavr=%f\n",ITR1,ITR0,RTol,Y[iLo],Y[iHi],Yavr);
  for(j=0;j<NDIM;j++) Pb[j]=0.0;
  for(i=0;i<MPTS;i++) if(i!=iHi) for(j=0;j<NDIM;j++) Pb[j]+=MP[i][j];
  for(j=0;j<NDIM;j++) {Pb[j]/=NDIM;Pr[j]=(1.0+Al)*Pb[j]-Al*MP[iHi][j];}
  for(j=0;j<NDIM;j++) *(MX[j])=Pr[j];Ypr=func();
  if(Ypr<=Y[iLo]) {for(j=0;j<NDIM;j++) Prr[j]=Gm*Pr[j]+(1.0-Gm)*Pb[j];
    for(j=0;j<NDIM;j++) *(MX[j])=Prr[j];Yprr=func();
    if(Ypr>Yprr) {for(j=0;j<NDIM;j++) MP[iHi][j]=Prr[j];Y[iHi]=Yprr;}
    else {for(j=0;j<NDIM;j++) MP[iHi][j]=Pr[j];Y[iHi]=Ypr;}}
  else {
    if(Ypr>=Y[iNHi]) {
      if(Ypr<Y[iHi]) {for(j=0;j<NDIM;j++) MP[iHi][j]=Pr[j];Y[iHi]=Ypr;}
      for(j=0;j<NDIM;j++) Prr[j]=Bt*MP[iHi][j]+(1.0-Bt)*Pb[j];
      for(j=0;j<NDIM;j++) *(MX[j])=Prr[j];Yprr=func();
      if(Yprr<Y[iHi]) {for(j=0;j<NDIM;j++) MP[iHi][j]=Prr[j];Y[iHi]=Yprr;}
      else {
/*      for(i=0;i<MPTS;i++) if(i!=iLo) {
          for(j=0;j<NDIM;j++) {Pr[j]=0.5*(MP[i][j]+MP[iLo][j]);MP[i][j]=Pr[j];}
          for(j=0;j<NDIM;j++) *(MX[j])=Pr[j];Y[i]=func();}}}                     */
        for(j=0;j<NDIM;j++) Pr[j]=0.5*(MP[iHi][j]+MP[iLo][j]);
        for(j=0;j<NDIM;j++) *(MX[j])=Pr[j];Ypr=func();
        if(Ypr<Y[iHi]) {for(j=0;j<NDIM;j++) MP[iHi][j]=Pr[j];Y[iHi]=Ypr;}
        else {
          for(j=0;j<NDIM;j++) Prr[j]=-MP[iHi][j]+2.0*MP[iLo][j];
          for(j=0;j<NDIM;j++) *(MX[j])=Prr[j];Yprr=func();
          if(Yprr<Y[iHi]) {for(j=0;j<NDIM;j++) MP[iHi][j]=Prr[j];Y[iHi]=Yprr;}
          else {
            xa=3*Y[iHi]-8*Ypr+6*Y[iLo]-Yprr;
            xb=Y[iHi]-2*Y[iLo]+Yprr;
            xc=-0.5*Y[iHi]+8*Ypr/3-2*Y[iLo]+Yprr/6;
            xd=xb*xb-4*xa*xc;
            if(xd>0) {
              lMin=0.5*(-xb-sqrt(xd))/xa;
              for(j=0;j<NDIM;j++) Pr[j]=lMin*MP[iHi][j]+(1-lMin)*MP[iLo][j];
              for(j=0;j<NDIM;j++) *(MX[j])=Pr[j];Ypr=func();}
            if(Ypr<Y[iHi]) {for(j=0;j<NDIM;j++) MP[iHi][j]=Pr[j];Y[iHi]=Ypr;}
            else {for(j=0;j<NDIM;j++) MP[iHi][j]=MP[iLo][j];Y[iHi]=Y[iLo];}
          }}}}
    else {for(j=0;j<NDIM;j++) MP[iHi][j]=Pr[j];Y[iHi]=Ypr;}}}
 iLo=0;for(i=0;i<MPTS;i++) if(Y[i]<Y[iLo]) iLo=i;
 for(i=0;i<NDIM;i++) *(MX[i])=MP[iLo][i];ITR1++;
 fe=fopen("zraport0.dat","at");fprintf(fe,"%5d %f  ",ITR1,Y[iLo]);
 for(i=0;i<NDIM;i++) {fprintf(fe,"%16.8e ",*MX[i]);}
 fprintf(fe,"\n");fclose(fe);
 printf("=======> Ymin=%f  (%d)\n",Y[iLo],iLo);
 /* for(i=0;i<NDIM;i++) {MdX[i]*=0.95;} */
 fe=fopen("zparam00.dat","wt");fprintf(fe,"%d %d\n",ITR1,NITR);
 for(i=0;i<NDIM;i++) {fprintf(fe,"%16.8e %16.8e\n",*MX[i],MdX[i]);}fclose(fe);}
printf("Approximation with simplex method finished.\n");
}

References:
Nelder J.A., Mead R., 1965, Computer Journal, 7, 308


Last modification: 7 Dec 1999.