Temperature Scheduling in Simulated Annealing

 THIS PAGE IS A WORK IN PROGRESS

The purpose of this page is to describe the notion of temperature scheduling in simulated annealing. The type of temperature schedule used in an algorithm, together with the size and terrain of the search space, affects the convergence properties of that algorithm. These topics are also considered briefly below, with an emphasis on the diamond problem. Finally, prospective temperature scheduling strategies are proposed, which will serve as a backbone for simulated annealing runs in the near future.

Introduction to Temperature Scheduling
''...bringing a fluid into a low-energy state such as growing a crystal, has been considered...to be similar to the process of finding an optimum solution of a combinatorial optimization problem. Annealing is a well-known process for growing crystals. It consists of melting the fluid and then lowering the temperature slowly until the crystal is formed. The rate of the decrease of temperature has to be very low around the freezing temperature. The Metropolis Monte Carlo method...can be used to simulate the annealing process. It has been proposed as an effective method for finding global minima of combinatorial optimization problems.''[3]

Temperature scheduling in simulated annealing refers to the process of controlling the temperature of a run through either geometric or adaptive means. The temperature determines the probability that a solution with a higher cost function value will be accepted over one with a lower value, a move known as hill climbing. There are two main stages of temperature scheduling (warming up and cooling down), which are punctuated by two stopping conditions (the equilibrium and frozen criteria, respectively).

The warming-up stage is the period during which the temperature is raised to a high enough point that the probability of accepting a neighboring solution is nearly one. Such a high acceptance probability ensures that the final solution does not depend on the initial starting point. The warming period usually lasts for a predefined number of steps (known as a chain length). When these steps have been completed, the annealing run has reached the equilibrium point, the term used for the point at which warming ends and cooling begins. Continuing the analogy with classical annealing, the cooling stage corresponds to the period during which the temperature is lowered to reduce the acceptance ratio of lesser-quality solutions, allowing the algorithm to settle on a final solution. Depending on the parameters set, the cooling phase ends either after a set number of steps or when the acceptance ratio drops below some threshold. This point in the process is known as the freezing point (or frozen criterion).
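As a concrete illustration, below is a minimal sketch of such a two-stage schedule in Python. The cost function, neighbour move, and all parameter values (initial temperature, warming factor, target acceptance ratio, geometric cooling factor, chain length, frozen threshold) are hypothetical placeholders, not the settings of any actual run.

```python
import math
import random

def anneal(cost, neighbour, x0, chain_length=100,
           warm_factor=1.5, target_accept=0.99,
           cool_factor=0.95, frozen_accept=0.01):
    """Sketch of a simulated-annealing run with explicit warming-up
    and cooling-down stages (all parameters are hypothetical)."""

    def run_chain(x, T):
        # One subchain of Metropolis steps at fixed temperature T;
        # returns the final state and the acceptance ratio.
        accepted = 0
        for _ in range(chain_length):
            y = neighbour(x)
            dE = cost(y) - cost(x)
            if dE <= 0 or random.random() < math.exp(-dE / T):
                x, accepted = y, accepted + 1
        return x, accepted / chain_length

    x, T = x0, 1.0

    # Warming up: raise T until nearly every proposed move is accepted,
    # so the final answer does not depend on the starting point.
    while True:
        x, ratio = run_chain(x, T)
        if ratio >= target_accept:
            break              # equilibrium reached: switch to cooling
        T *= warm_factor

    # Cooling down: geometric schedule T <- cool_factor * T, ending at
    # the frozen criterion (acceptance ratio below frozen_accept).
    while True:
        x, ratio = run_chain(x, T)
        if ratio < frozen_accept:
            return x           # frozen: return the final solution
        T *= cool_factor
```

For example, `anneal(lambda x: x * x, lambda x: x + random.uniform(-1, 1), 5.0)` would anneal a one-dimensional quadratic toward its minimum at zero.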

Difficulty of the problem
The size and terrain of a particular search space are two concepts that one should include when speaking of the difficulty of a problem. Computing time increases drastically with the size (or dimensionality) of the search space [1]. However, it is still possible for higher-dimensional problems to be unimodal or to have very few local minima. Thus, one must also take the shape of the search space into account when contemplating difficulty.

In the diamond problem, a solution consists of values given to two amplitudes and three phases. The amplitudes represent the intensity of the light from the reference mirror and the diamond (the front and the back only differ by a constant). The phases represent each of the three surfaces: the reference mirror and the front and back of the diamond. Each of these quantities is represented by a Legendre polynomial of a certain order, stored as a matrix whose numbers of rows and columns match the order with an offset of one. Since the order of the Legendre polynomial for each of the amplitudes and phases is set by the user, the dimensionality of the problem varies with the desired order. Thus, the lowest dimensionality the diamond problem could take on would be 11, and the highest (cutting the Legendre polynomial order off at five) would be 125. The practical dimensionality most likely lies somewhere in between, probably slightly greater than 50, since one wants to fully exploit the range of the Legendre polynomials to map the diamond surface as accurately as possible.
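Under this parameterisation, the dimensionality is simply the total number of Legendre coefficients across the five quantities. A small sketch of the counting (the orders chosen below are illustrative, not the ones actually used):

```python
def diamond_dimensionality(amp_orders, phase_orders):
    """Total number of free parameters when each amplitude/phase is a
    Legendre-coefficient matrix of size (order + 1) x (order + 1)."""
    return sum((n + 1) ** 2 for n in amp_orders + phase_orders)

# Two amplitudes and three phases, all of order 4 (5x5 matrices):
print(diamond_dimensionality([4, 4], [4, 4, 4]))  # -> 125
```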

Convergence and Prospective strategies
Convergence, in the context of simulated annealing, refers to how (if at all) a particular algorithm approaches the correct solution (or, for very difficult problems, a nearly correct solution). There are many convergence proofs for certain types of simulated annealing algorithms, each with its own twist on cooling and other aspects of the algorithm's implementation. To understand the subject of convergence fully (in a mathematical sense), one must look into the properties of Markov chains and their connections to Monte Carlo-like algorithms; that topic is reserved for another wiki page. However, citing the article by [2], the ParSA library suggests that convergence speed is governed by the following equation:

$$P(X_n \not\in Cost_{min})=\left(\frac{K}{n}\right)^{\alpha}$$

where $P$ is the convergence speed (the probability that the solution after a subchain of length $n$ is not of minimal cost), $n$ is the subchain length, and "$K$ and $\alpha$ are constants specific to the problem." [7]

The primary objective for potential cooling strategies lies in determining the $\alpha$ and $K$ factors given in the ParSA section on improving solution quality in less time. By tracking both the chain length $n$ and the convergence speed $P(X_n \not\in Cost_{min})$, one obtains a linear plot relating the four quantities through the following relationship: $$\ln{P(X_n \not\in Cost_{min})} = \alpha \ln{K} - \alpha \ln{n}$$
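In practice this amounts to a least-squares fit of $\ln P$ against $\ln n$. A minimal sketch, where the measured $(n, P)$ pairs are made-up placeholders standing in for statistics gathered from repeated runs:

```python
import numpy as np

# Hypothetical measurements: subchain lengths and observed miss
# probabilities P(X_n not in Cost_min) estimated from repeated runs.
n = np.array([100, 200, 400, 800, 1600])
P = np.array([0.50, 0.27, 0.14, 0.075, 0.040])

# ln P = alpha*ln K - alpha*ln n is linear in ln n, with slope -alpha
# and intercept alpha*ln K.
slope, intercept = np.polyfit(np.log(n), np.log(P), 1)
alpha = -slope
K = np.exp(intercept / alpha)
print(f"alpha ~ {alpha:.3f}, K ~ {K:.1f}")
```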

Once a sufficient number of runs have been completed, the $\alpha$ and $K$ factors will be known and can be exploited to find the most effective chain length for running multiple independent Markov chains. Given the potential size of the search space, one can speculate that the $\alpha$ factor will most likely be closer to one than to zero, because with the multiple-run strategy faster cooling will cause a particular chain to settle very quickly to a minimum (which may be a local minimum). After settling, the cluster can then move on to a new chain, which settles to another minimum. If this is repeated, the chance of finding the global minimum among one of the solutions is much greater than if only one chain were used.
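A sketch of the multiple-independent-chains idea, reusing the hypothetical anneal() routine from the sketch above; the set of starting points (and hence the number of chains) is a placeholder to be tuned from the fitted $\alpha$ and $K$:

```python
def multi_chain_anneal(cost, neighbour, starts, **kwargs):
    """Run one independent annealing chain per starting point and keep
    the best final solution; more chains raise the chance that at
    least one of them reaches the global minimum."""
    solutions = [anneal(cost, neighbour, x0, **kwargs) for x0 in starts]
    return min(solutions, key=cost)
```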