
Predicting time to convergence for GAs

 
The best theory is inspired by practice,
The best practice is guided by theory.

- Donald Knuth, IFIP 1992 address.

It is difficult to predict a genetic algorithm's behavior on an arbitrary problem. Combining genetic algorithm theory with practice, this chapter uses the average hamming distance as a syntactic metric to derive bounds on the time to convergence of genetic algorithms. Analysis of a flat function provides a worst-case time complexity for static functions. Further, employing linearly computable runtime information, this thesis provides bounds on the time beyond which progress is unlikely on arbitrary static functions. As a byproduct, the analysis also provides qualitative bounds by predicting average fitness.

A Measure of Variation

Natural selection uses diversity in a population to produce adaptation. Ignoring the effects of mutation for the present, if there is no diversity there is nothing for natural selection to work on. Since GAs mirror natural selection, this chapter applies the same principle and uses a measure of diversity to estimate the time to stagnation or convergence. A GA converges when most of the population is identical, or in other words, when diversity is minimal. Using the average hamming distance (hamming average) as a measure of population diversity, this chapter derives bounds on the time to minimal diversity, beyond which the genetic algorithm may be expected to make no further progress (hamming convergence).

Previous work on GA convergence by Ankenbrandt and by Goldberg and Deb focuses on a lower limit on the time to convergence to a particular allele (Ankenbrandt, 1991; Goldberg and Deb, 1991), using fitness ratios to obtain bounds on time complexity. Since GAs use syntactic information to guide their search, it seems natural to use syntactic metrics in their analysis. The analysis here, using hamming averages, can predict the time beyond which qualitative improvements in the solution are unlikely. Since the schema theorem intimately links schema proportions with fitness, bounds on average fitness follow.

The next section identifies the effect of genetic operators on the diversity metric, the hamming average. Deriving an upper bound on the expected time to convergence suggests syntactic remedies to the problem of premature convergence and, as a side effect, a way to mitigate deception in GAs. Since crossover does not affect the hamming average, the change in hamming average sampled during the first few generations can be extrapolated to predict the hamming average in later generations. Results on a test suite of functions indicate that accurate predictions are possible.

Genetic Algorithms and Hamming Distance

The average hamming distance of a population is the average distance between all pairs of strings in a population of size N. As each member of the population is involved in N-1 pairs, the sample size from which to calculate the average is:

\[ \frac{N(N-1)}{2} \]

Let the length of the member strings in the population be l. The hamming average of the initial population of binary strings is well approximated by the normal distribution with mean

\[ \mu = \frac{l}{2} \]

and standard deviation $\sigma$ given by

\[ \sigma = \sqrt{\frac{l}{2N(N-1)}} \]

Ignoring the effect of mutation, the hamming average of a converged population is zero (mutation increases the hamming average of a converged population by a small amount that depends on the probability of mutation). Given that the initial hamming average is l/2 and the final hamming average is zero, the effects of selection and crossover determine the behavior of a genetic algorithm in the intervening time.
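To make the metric concrete, the following sketch (illustrative code, not from the thesis; the function name and the numpy dependency are my own) computes the hamming average of a population directly from its definition and checks that a randomly initialized population starts near l/2.

import itertools
import numpy as np

def hamming_average(pop):
    """Average hamming distance over all N(N-1)/2 pairs of rows in a 0/1 matrix."""
    n = len(pop)
    dists = [np.sum(a != b) for a, b in itertools.combinations(pop, 2)]
    return sum(dists) / (n * (n - 1) / 2)

rng = np.random.default_rng(0)
N, l = 30, 50                       # population size and chromosome length
pop = rng.integers(0, 2, size=(N, l))
print(hamming_average(pop), l / 2)  # the two values should be close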

Crossover and Average Hamming Distance

Assuming that offspring replace their parents during crossover, all crossover operators can be partitioned into two groups based on whether or not they change the hamming average. If one parent contributes the same alleles to both offspring (as in masked crossover, described in an earlier chapter), the hamming distance between the children is less than the hamming distance between their parents. This leads to a loss of genetic material, reducing the population hamming average and resulting in faster hamming convergence. The vast majority of traditional operators, like one-point, two-point, ..., l-point, uniform, and punctuated crossover [Spears and DeJong, 1991, Schaffer and Morishima, 1987, Schaffer and Morishima, 1988, Syswerda, 1989], do not affect the hamming average of a population. For these crossover operators the following lemma is proved:

Lemma: In the absence of selection and mutation, crossover operators in which the two offspring together carry, at every locus, the same pair of alleles as their two parents leave the population hamming average unchanged.

Proof: The hamming average in generation t+1 is shown to be the same as the hamming average at generation t under the action of crossover alone. Assuming a binary alphabet $\{a, b\}$, express the population hamming average at generation t as the sum of the hamming averages of the l loci, where l is the length of the chromosome. Letting $h_i(t)$ stand for the hamming average of the $i^{th}$ locus results in:

\[ h(t) = \sum_{i=1}^{l} h_i(t) \]

where $h_i(t)$ is

\[ h_i(t) = \frac{2}{N(N-1)} \sum_{j=1}^{N} \sum_{k=j+1}^{N} d\!\left(a_{j,i},\, a_{k,i}\right), \]

with $a_{j,i}$ denoting the allele of individual j at locus i, and $d(\cdot,\cdot)$ equal to 1 when its arguments differ and 0 otherwise. In the absence of selection and mutation, crossover only changes the order in which the contributions at each locus are summed. That is:

\[ h_i(t+1) = h_i(t), \qquad i = 1, \ldots, l. \]

The hamming average in the next generation is therefore

\[ h(t+1) = \sum_{i=1}^{l} h_i(t+1) = \sum_{i=1}^{l} h_i(t) = h(t). \]

Q.E.D.

Having eliminated crossover from consideration since it causes no change in the hamming average, it remains for selection to provide the force responsible for hamming convergence.
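The lemma can also be checked empirically. The sketch below (my own illustration, assuming one-point crossover with offspring replacing parents) verifies that crossover merely redistributes alleles within each locus, leaving the per-locus allele counts, and hence the hamming average, unchanged.

import numpy as np

rng = np.random.default_rng(1)
N, l = 30, 50
pop = rng.integers(0, 2, size=(N, l))
before = pop.sum(axis=0)                  # number of 1-alleles at each locus

children = pop.copy()
pairs = rng.permutation(N).reshape(-1, 2) # mate random pairs; offspring replace parents
for p1, p2 in pairs:
    cut = rng.integers(1, l)              # one-point crossover site
    children[p1, cut:], children[p2, cut:] = pop[p2, cut:].copy(), pop[p1, cut:].copy()

after = children.sum(axis=0)
print(np.array_equal(before, after))      # True: per-locus allele counts unchanged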

Selection and Average Hamming Distance

Unlike recombination, selection intensity depends on the properties of the search space. It is the domain-dependent or problem-dependent part of a genetic algorithm. But, independent of the domain, it is possible to obtain an upper bound on the time to convergence. Obtaining an upper bound is equivalent to assuming the worst possible search space and estimating the time required for finding a point in this space. For a static function, a space on which an algorithm can do no better than random search satisfies this criterion. The flat function defined by

\[ f(x) = c \quad \text{(a constant) for all } x \]

contains no useful information for an algorithm searching for a particular point in the space. Thus no algorithm can do better than random search on this function. In terms of selection in genetic algorithms, the flat function assigns an equal fitness to all members of a population. There is no selection pressure to cause a decrease in the hamming average, since the probability of selection is the same for every individual. However, a simple GA without mutation loses diversity and eventually converges. An expression for the time to convergence on such a function gives an upper bound on the time complexity of a genetic algorithm on any static function. The GA's convergence on the flat function is caused by random genetic drift, where small random variations in allele distribution cause a GA to drift and eventually converge. The time for a particular allele to become fixed due to random genetic drift is used to derive an upper bound.

If a population of size N contains a proportion $p_i$ of a binary allele i, then the probability that k copies of allele i are produced in the following generation is given by the binomial probability distribution

\[ P(k) = \binom{N}{k}\, p_i^{\,k}\, (1 - p_i)^{\,N-k}. \]

This distribution is useful in calculating the probability of a particular frequency of occurrence of allele i in subsequent generations -- a classical problem in population genetics. Although the exact solution is complex, approximating the probability that the allele frequency takes value p in generation t is practical. Wright's approximation for intermediate allele frequencies and population sizes, as given in [Gale, 1990], is sufficient. Let $P_t(p)$ stand for the probability that the allele frequency takes value p in generation t, where $0 < p < 1$; then

displaymath3119

The probability that an allele is fixed (frequency 0 or 1) at generation t, denoted $P_{\text{fixed}}(t)$, is

displaymath3120

Applying this to a genetic algorithm, assuming a randomly instantiated population at t = 0,

displaymath3121

Assuming that alleles consort independently, which is true for a flat function, the probability that all alleles are fixed at generation t, for a chromosome of l alleles, is given by

\[ P_{\text{converge}}(t) = \left[\,P_{\text{fixed}}(t)\,\right]^{l}. \]

The equation above gives the time to convergence for a genetic algorithm on a flat function and is therefore an upper bound for any static function. For example, on a 50-bit chromosome with a population size of 30, it bounds the chance of convergence within 50 generations, and experimental runs starting from an initial hamming average of 50/2 = 25 converge at rates consistent with this bound. Previous work by Goldberg and Segrest on genetic drift gives a more exact, albeit more computationally taxing, expression for the time to convergence due to genetic drift [Goldberg and Segrest, 1987]. They also include the effect of mutation in their analysis, which again is directly applicable to this problem. In fact, since the flat function costs one assignment statement and genetic operators are computationally inexpensive, running a genetic algorithm on a flat function with mutation and crossover probabilities set to practical values can cheaply produce an upper bound.
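In that spirit, the following sketch (code mine; parameter values are illustrative) runs selection alone on the flat function, where proportional selection degenerates into uniform random resampling, and estimates how often a population of 30 strings of length 50 has every locus fixed by random drift within 50 generations.

import numpy as np

def drift_convergence_rate(N=30, l=50, generations=50, runs=50, seed=2):
    rng = np.random.default_rng(seed)
    converged = 0
    for _ in range(runs):
        pop = rng.integers(0, 2, size=(N, l))
        for _ in range(generations):
            # flat fitness: proportional selection is uniform random resampling
            pop = pop[rng.integers(0, N, size=N)]
        if np.all(pop == pop[0]):          # every allele fixed at every locus
            converged += 1
    return converged / runs

print(drift_convergence_rate())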

Finally, that GAs converge due to random drift gives a theoretical basis to the often observed problem of premature convergence, when the genetic algorithm converges before high fitness schemas have sufficient time to recombine into the global optimum [Thierens and Goldberg, 1993]. The next section suggests one method of alleviating this problem.

Handling Premature Convergence

Nature uses large population sizes to ``solve'' the premature convergence problem. This is expensive; furthermore, engineers need not be restricted to nature's methods but can do some genetic engineering. Mutation seems a likely candidate and, in practice, is the usual way of maintaining diversity. However, although high mutation rates may increase diversity, mutation's random nature raises problems. Mutation is as likely to destroy good schemas as bad ones, and therefore elitist selection is used to preserve the best individuals in a population. This works quite well in practice, but is unstructured and cannot ensure that all alleles are always present in the population.

Instead of increasing mutation rates, picking an individual and adding its bit complement to the population ensures that every allele is present and that the population spans the entire encoded space of the problem. The individual to be complemented can be picked in a variety of ways depending on assumptions made about the search space. Randomly selecting the individual to be complemented makes the fewest assumptions and may be the best strategy in the absence of other information. The best or worst individual could also be selected, or probabilistic methods could be used to choose whom to complement. Instead of complementing one individual, choosing a set of individuals allows spanning the encoded space in many directions. The most general approach is to maintain the complement of every individual in the population, doubling the population size.
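A minimal sketch of this complement-injection idea follows (function and parameter names are mine, not from the thesis): each generation, the worst individual is replaced by the bit complement of either the current best individual or a randomly chosen population member.

import numpy as np

def inject_complement(pop, fitnesses, mode="max", rng=None):
    """Replace the worst individual with the complement of the best ('max')
    or of a randomly chosen individual ('random')."""
    rng = rng or np.random.default_rng()
    worst = int(np.argmin(fitnesses))
    source = int(np.argmax(fitnesses)) if mode == "max" else int(rng.integers(0, len(pop)))
    pop[worst] = 1 - pop[source]           # the bit complement spans the encoded space
    return pop

# usage after evaluating a generation
rng = np.random.default_rng(3)
pop = rng.integers(0, 2, size=(30, 10))
fitnesses = pop.sum(axis=1)                # stand-in fitness (One Max) for illustration
pop = inject_complement(pop, fitnesses, mode="random", rng=rng)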

The presence of complementary individuals also makes a GA more resistant to deception. Intuitively, since the optimal schema is the complement of the deceptively optimal schema, it will be repeatedly created with high probability as the GA converges to the deceptive optimum [Goldberg et al., 1989]. In experiments, the minimum-fitness individual was replaced with either the complement of a randomly picked individual in the current population or the complement of the current best individual. Let $n_i$ represent the number of bits needed to represent variable i in a deceptive problem; then the deceptive functions used can be described as follows

displaymath3167

Dec1 and Dec2, which are 10-bit and 20-bit deceptive problems respectively, were used. Letting superscripts denote the number of bits needed for each variable, Dec1 can be described by

displaymath3168

and Dec2 by

displaymath3169

Figure gif compares the average fitness after 100 generations of a classical GA (CGA) and a GA with complements (GAC) on Dec1.

 
Figure: Average fitness over 100 generations of a classical GA and a GA with complements. GAC (max) replaces the worst individual with the complement of the current best individual. GAC (random) replaces the worst individual with the complement of a random individual in the population.

The GAC easily outperforms the classical GA, both in average fitness and in the number of times the optimum was found. In most experiments the classical GA fails to find the optimum in any of its 11 runs with a population size of 30 and 100 generations. Since the converged hamming average for the CGA is very small, it cannot be expected ever to find the global optimum. The GAC, on the other hand, easily finds the optimum for Dec1 within 100 generations and usually finds the optimum for Dec2 within a few hundred generations. Figures gif and gif show the number of times the optimum was found for Dec1 and Dec2 within a total of 100 and 1000 generations respectively. In the figures, GAC (max) replaces the worst individual with the complement of the current best individual and GAC (random) replaces the worst individual with the complement of a random individual in the population.

   figure1083
Figure: Number of times the optimum was found on Dec1 for a classical GA compared with the same statistic for GAs with complements (GACs).

   figure1088
Figure: Number of times the optimum was found on Dec2 for a classical GA compared with GAs with complements (GACs).

Although this genetically engineered algorithm can mitigate certain kinds of deception, it is not a panacea and cannot guarantee an optimal solution on all problems. Messy Genetic Algorithms [Goldberg et al., 1989] can make much stronger guarantees, but their computational complexity is daunting. Not surprisingly, GAC also does better than a classical GA on Ackley's One Max and no worse on the De Jong test suite [Ackley, 1987, De Jong, 1975]. In problems consisting of mixed deceptive and easy subproblems, the GAC still does much better than the classical GA.

Predicting Time To Convergence

Predicting performance on an arbitrary function is more difficult than obtaining an upper bound because of the non-linearities introduced by the selection intensity. However, tracking the decrease in hamming average while a GA is working on a particular problem allows predicting the time to hamming convergence.

Convergence Analysis

Without mutation a GA reduces the hamming average from l/2 to zero. Therefore predicting the time to convergence reduces to the problem of finding the rate of decrease in hamming average. A general model for the change in hamming average per generation,

\[ h(t+1) = f_h\!\left(h(t)\right), \]

relates the hamming average $h(t)$ in generation t to the hamming average in the next generation. Finding $f_h$ and solving the recurrence is enough to predict the time to convergence.

The population hamming average at generation t can be expressed as the sum of the hamming averages of l loci, where l is the length of the chromosome. Letting $h_i(t)$ stand for the hamming average of the $i^{th}$ locus gives:

\[ h(t) = \sum_{i=1}^{l} h_i(t). \]

Assuming a binary alphabet $\{a, b\}$ and a large population, the hamming average of a locus in the current population is well approximated by the product of the number of a's and the number of b's divided by the sample size N(N-1)/2, where N is the population size. More formally, letting $N_a^i(t)$ and $N_b^i(t)$ be the number of a's and b's at a locus i,

\[ h_i(t) = \frac{N_a^i(t)\, N_b^i(t)}{N(N-1)/2}. \]

Since a and b are first-order schemas competing at locus i, their growth rates are given by the schema theorem. For a schema s present in $m(s,t)$ copies with fitness f(s), the expected growth rate due to selection alone is

\[ m(s, t+1) = m(s, t)\,\frac{f(s)}{\bar{f}(t)}. \]

Here $\bar{f}(t)$ is the population's average fitness in generation t. As only first-order schemas are being considered, no modification is needed to incorporate crossover's effects. Ignoring mutation for the present, the hamming average in generation t+1 at locus i can be calculated as follows. First:

\[ h_i(t+1) = \frac{N_a^i(t+1)\, N_b^i(t+1)}{N(N-1)/2}. \]

Then, using the schema theorem, $h_i(t+1)$ can be expressed in terms of $h_i(t)$, where $f_a$ and $f_b$ denote the fitnesses of the competing first-order schemas a and b at locus i:

\[ h_i(t+1) = \frac{N_a^i(t)\,\dfrac{f_a}{\bar{f}(t)}\; N_b^i(t)\,\dfrac{f_b}{\bar{f}(t)}}{N(N-1)/2}. \]

Substituting the expression for $h_i(t)$ above:

\[ h_i(t+1) = h_i(t)\,\frac{f_a f_b}{\bar{f}(t)^2}. \]

The solution to this recurrence is

\[ h_i(t) = h_i(0)\,\frac{(f_a f_b)^{\,t}}{\left[\prod_{j=0}^{t-1} \bar{f}(j)\right]^{2}}. \]

The denominator needs to be computed. This is an interesting problem, not only because it helps in predicting the hamming average, but because it implies the possibility of fitness prediction. Once again concentrating on first-order schemas, consider the average fitness in generation t at locus i:

\[ \bar{f}_i(t) = \frac{N_a^i(t)\, f_a + N_b^i(t)\, f_b}{N}. \]

But $\bar{f}_i(t)$ also approximates the population average fitness in low-epistasis problems. The approximation gets better as epistasis decreases and the estimates of first-order schema fitnesses get better. Keeping this assumption in mind and dropping the subscript i,

\[ \bar{f}(t) = \frac{N_a(t)\, f_a + N_b(t)\, f_b}{N}. \]

Using the schema theorem to replace the generation-t terms on the right-hand side,

\[ \bar{f}(t) = \frac{N_a(t-1)\,\dfrac{f_a}{\bar{f}(t-1)}\, f_a + N_b(t-1)\,\dfrac{f_b}{\bar{f}(t-1)}\, f_b}{N}. \]

Multiplying both sides by $\bar{f}(t-1)$,

\[ \bar{f}(t)\,\bar{f}(t-1) = \frac{N_a(t-1)\, f_a^{2} + N_b(t-1)\, f_b^{2}}{N}. \]

But by the schema theorem,

\[ N_a(t-1)\, f_a^{2} = N_a(t-2)\,\frac{f_a}{\bar{f}(t-2)}\, f_a^{2}, \qquad N_b(t-1)\, f_b^{2} = N_b(t-2)\,\frac{f_b}{\bar{f}(t-2)}\, f_b^{2}. \]

Therefore

\[ \bar{f}(t)\,\bar{f}(t-1) = \frac{N_a(t-2)\, f_a^{3} + N_b(t-2)\, f_b^{3}}{N\,\bar{f}(t-2)}. \]

Multiplying by $\bar{f}(t-2)$,

\[ \bar{f}(t)\,\bar{f}(t-1)\,\bar{f}(t-2) = \frac{N_a(t-2)\, f_a^{3} + N_b(t-2)\, f_b^{3}}{N}. \]

In general,

\[ \prod_{j=0}^{t-1} \bar{f}(j) = \frac{N_a(0)\, f_a^{\,t} + N_b(0)\, f_b^{\,t}}{N}, \]

which is the needed expression. This equation's significance in fitness prediction is discussed in the next section. For the present, substituting it into the solution for $h_i(t)$ above gives $h_i(t)$ in terms of the initial hamming average:

\[ h_i(t) = h_i(0)\,\frac{(f_a f_b)^{\,t}\, N^{2}}{\left[N_a(0)\, f_a^{\,t} + N_b(0)\, f_b^{\,t}\right]^{2}}. \]

For a randomly initialized population

\[ N_a(0) = N_b(0) = \frac{N}{2}. \]

Therefore

\[ h_i(t) = h_i(0)\,\frac{(f_a f_b)^{\,t}\, N^{2}}{\left(\frac{N}{2}\right)^{2}\left(f_a^{\,t} + f_b^{\,t}\right)^{2}}. \]

Rearranging terms,

\[ h_i(t) = \frac{4\, h_i(0)\,(f_a f_b)^{\,t}}{\left(f_a^{\,t} + f_b^{\,t}\right)^{2}}. \]

Here $h_i(0)$ is the initial hamming average at locus i. For a randomly initialized population, the hamming average at each locus is 1/2. Replacing $h_i(0)$ by 1/2,

\[ h_i(t) = \frac{2\,(f_a f_b)^{\,t}}{\left(f_a^{\,t} + f_b^{\,t}\right)^{2}}. \]

Summing over the l loci gives the population hamming average in generation t,

\[ h(t) = \sum_{i=1}^{l} \frac{2\,\left(f_{a_i} f_{b_i}\right)^{t}}{\left(f_{a_i}^{\,t} + f_{b_i}^{\,t}\right)^{2}}, \]

where $f_{a_i}$ and $f_{b_i}$ are the fitnesses of the two competing alleles at locus i. This expression for the hamming average in generation t depends only on the fitnesses of the alleles in the chromosome.
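Essentially the same prediction can be obtained by iterating the recurrences numerically instead of using the closed form. The sketch below (code and example fitness values are mine) grows the first-order schema counts at each locus by f divided by the locus average fitness each generation, assuming the allele fitnesses are known, and sums the per-locus hamming averages.

import numpy as np

def predicted_hamming_average(f_a, f_b, N=30, generations=50):
    """f_a, f_b: arrays of first-order schema fitnesses, one pair per locus."""
    f_a, f_b = np.asarray(f_a, float), np.asarray(f_b, float)
    n_a = np.full_like(f_a, N / 2.0)       # randomly initialized population
    n_b = np.full_like(f_b, N / 2.0)
    for _ in range(generations):
        f_bar = (n_a * f_a + n_b * f_b) / N          # per-locus average fitness
        n_a, n_b = n_a * f_a / f_bar, n_b * f_b / f_bar
    return np.sum(n_a * n_b / (N * (N - 1) / 2.0))   # sum of per-locus hamming averages

# e.g. a hypothetical 50-locus problem where the a-allele is twice as fit as the b-allele
print(predicted_hamming_average(f_a=np.full(50, 2.0), f_b=np.full(50, 1.0)))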

Handling Mutation

With mutation the hamming average does not converge to zero; rather, a population converges when the rate of change of the hamming average is very low or zero. Consider mutation's effect on the number of a's and b's at a locus. Mutation flips a bit with probability $p_m$, changing the number of copies of a and b in subsequent generations. Without selection,

\[ N_a(t+1) = N_a(t)\,(1 - p_m) + N_b(t)\, p_m, \qquad N_b(t+1) = N_b(t)\,(1 - p_m) + N_a(t)\, p_m, \]

since a's become b's and vice versa. Incorporating selection,

\[ N_a(t+1) = N_a(t)\,\frac{f_a}{\bar{f}(t)}\,(1 - p_m) + N_b(t)\,\frac{f_b}{\bar{f}(t)}\, p_m. \]

The equation for the hamming average at locus i in generation t is, as before,

\[ h_i(t) = \frac{N_a(t)\, N_b(t)}{N(N-1)/2} = \frac{N_a(t)\,\left(N - N_a(t)\right)}{N(N-1)/2}. \]

To solve this, $N_a(t)$ is needed. Substituting $N_b(t) = N - N_a(t)$ in the selection-plus-mutation recurrence,

\[ N_a(t+1) = N_a(t)\,\frac{f_a}{\bar{f}(t)}\,(1 - p_m) + \left(N - N_a(t)\right)\frac{f_b}{\bar{f}(t)}\, p_m. \]

Collecting terms,

\[ N_a(t+1) = N_a(t)\,\frac{f_a (1 - p_m) - f_b\, p_m}{\bar{f}(t)} + \frac{N f_b\, p_m}{\bar{f}(t)}. \]

This is of the form

\[ N_a(t+1) = k_1(t)\, N_a(t) + k_2(t), \]

where

\[ k_1(t) = \frac{f_a (1 - p_m) - f_b\, p_m}{\bar{f}(t)}, \qquad k_2(t) = \frac{N f_b\, p_m}{\bar{f}(t)}. \]

The solution to this recurrence is

\[ N_a(t) = N_a(0)\prod_{j=0}^{t-1} k_1(j) + \sum_{j=0}^{t-1} k_2(j) \prod_{m=j+1}^{t-1} k_1(m). \]

Before calculating the product of fitnesses, rewrite the second term as

  equation1323

Similarly letting

displaymath3329

results in

  equation1344

Substituting these values into equation gif directly

  equation1362

Unlike the case without mutation, it is difficult to find a simple expression for the product of fitness averages. In addition, the calculations implied by equation gif are cumbersome. For practical prediction this chapter resorts to an approximate model.

Practical Prediction

 

The intuition needed to address the problem is that the schema theorem works both ways: if schema fitnesses are known, proportions can be predicted; if proportions are tracked, fitnesses can be predicted. Rewriting the schema theorem as

\[ \frac{f_a}{\bar{f}(t)} = \frac{N_a(t+1)}{N_a(t)}, \]

keeping track of $N_a$ over a few generations gives an estimate of the adjusted fitness $f_a/\bar{f}(t)$. Mutation can be handled in the same way, starting with the selection-plus-mutation recurrence:

\[ N_a(t+1) = N_a(t)\, F_a\,(1 - p_m) + N_b(t)\, F_b\, p_m, \qquad N_b(t+1) = N_b(t)\, F_b\,(1 - p_m) + N_a(t)\, F_a\, p_m, \]

where $F_a = f_a/\bar{f}(t)$ and $F_b = f_b/\bar{f}(t)$ are the adjusted fitnesses to be estimated.

Solving for $F_b$ using the second expression,

\[ F_b = \frac{N_b(t+1) - N_a(t)\, F_a\, p_m}{N_b(t)\,(1 - p_m)}. \]

Substituting back into the first expression in the pair above,

\[ N_a(t+1) = N_a(t)\, F_a\,(1 - p_m) + \frac{p_m\left[\,N_b(t+1) - N_a(t)\, F_a\, p_m\,\right]}{1 - p_m}, \]

which leads to

\[ N_a(t+1) - \frac{p_m\, N_b(t+1)}{1 - p_m} = N_a(t)\, F_a\,(1 - p_m) - \frac{N_a(t)\, F_a\, p_m^{2}}{1 - p_m}. \]

Multiplying by $(1 - p_m)$,

\[ N_a(t+1)\,(1 - p_m) - p_m\, N_b(t+1) = N_a(t)\, F_a\left[(1 - p_m)^{2} - p_m^{2}\right] = N_a(t)\, F_a\,(1 - 2 p_m), \]

from which the expression for $F_a$ can be computed,

\[ F_a = \frac{N_a(t+1)\,(1 - p_m) - N_b(t+1)\, p_m}{N_a(t)\,(1 - 2 p_m)}, \]

and subsequently an expression for $F_b$,

\[ F_b = \frac{N_b(t+1)\,(1 - p_m) - N_a(t+1)\, p_m}{N_b(t)\,(1 - 2 p_m)}. \]
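For instance, with hypothetical counts (not from the thesis) $N_a(t) = N_b(t) = 15$, $N_a(t+1) = 18$, $N_b(t+1) = 12$ and $p_m = 0.01$, the estimators give

\[ F_a = \frac{18(0.99) - 12(0.01)}{15(0.98)} \approx 1.20, \qquad F_b = \frac{12(0.99) - 18(0.01)}{15(0.98)} \approx 0.80, \]

and substituting these values back into the update equations reproduces the observed counts of 18 and 12.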

The two expressions above can be used to estimate first-order schema fitnesses by tracking their proportions over time. Using these estimates in the expression for the average fitness $\bar{f}(t)$ in generation t gives

\[ \bar{f}(t) = \frac{N_a(t)\, f_a + N_b(t)\, f_b}{N}, \]

where N is the population size. This makes it possible to predict fitness averages. In other words, tracking first-order schema proportions allows estimation of their fitnesses. Next, using these first-order schema fitnesses allows predicting their future proportions from the schema theorem. Finally, knowing first-order schema fitnesses and proportions allows predicting the subsequent population average fitness, which is in turn needed for estimating future proportions, and so on. Although a closed-form solution is not available for the case with mutation, the recurrences can be programmed and run on a computer for a large number of generations, as the sketch below illustrates.
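A sketch of such a program follows (all code and names are mine; the thesis specifies only the recurrences). It recovers the adjusted fitnesses at a locus by solving the pair of selection-plus-mutation update equations for counts observed in two successive generations, then runs the recurrence forward to predict the hamming average and a relative average fitness.

import numpy as np

def estimate_allele_fitness(na_t, nb_t, na_t1, nb_t1, pm):
    """Solve na' = na*Fa*(1-pm) + nb*Fb*pm and nb' = nb*Fb*(1-pm) + na*Fa*pm."""
    A = np.array([[na_t * (1 - pm), nb_t * pm],
                  [na_t * pm,       nb_t * (1 - pm)]])
    Fa, Fb = np.linalg.solve(A, np.array([na_t1, nb_t1], dtype=float))
    return Fa, Fb

def predict(fa, fb, na0, nb0, pm, N, generations):
    """Iterate the selection-plus-mutation recurrence for every locus; return the
    predicted hamming average and the per-locus mean fitness in the relative
    units of fa, fb (the low-epistasis assumption of the text)."""
    na, nb = np.asarray(na0, float), np.asarray(nb0, float)
    fa, fb = np.asarray(fa, float), np.asarray(fb, float)
    for _ in range(generations):
        f_bar = (na * fa + nb * fb) / N
        na, nb = (na * fa / f_bar * (1 - pm) + nb * fb / f_bar * pm,
                  nb * fb / f_bar * (1 - pm) + na * fa / f_bar * pm)
    hamming = np.sum(na * nb / (N * (N - 1) / 2.0))
    fitness = np.mean((na * fa + nb * fb) / N)
    return hamming, fitness

# usage with hypothetical counts for a single locus observed in two successive generations
Fa, Fb = estimate_allele_fitness(15, 15, 18, 12, pm=0.01)
print(predict([Fa], [Fb], [18], [12], pm=0.01, N=30, generations=44))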

Assuming that string similarity implies fitness similarity, the variance of fitness is a measure of the nonlinearities in the genome and hence a measure of epistasis. Fortunately, competing schemas with a large fitness variance feel high selection pressure and quickly lose diversity, diminishing epistatic effects. Since a GA exponentially reduces high variances, O(logN) generations should suffice to reduce most nonlinear effects.

The methodology used for practical prediction is therefore to run the GA for the first few generations while tracking first-order schema proportions, compute adjusted allele fitnesses from these proportions, and iterate the recurrences forward.

Using an adjusted fitness computed this way, it is possible to predict the growth rate of first-order schemas and therefore the population average fitness and population hamming average in subsequent generations.

Upper and Lower Bounds

Consider the effect of newly discovered schemas on the predictions. Fresh schemas that decrease adjusted fitnesses have little effect, since they are quickly eliminated by selection. When higher-fitness schemas are discovered by crossover, the fitness of alleles participating in these schemas increases. This means that the predicted hamming average will be greater than or equal to the observed value, giving an upper limit on the time to convergence. The predictions therefore provide upper bounds on the hamming average and on the time to convergence to that hamming average. Such an estimate is more desirable than an overly optimistic one: it is better to let a GA run too long than to stop it early.

The variance in the adjusted fitness of an allele governs the number of generations that must be sampled to get an accurate estimate of the allele's fitness; this variance bounds the current fitness of the schemas in which the allele participates. At a particular locus, using the allele with the smaller variance to predict growth gives a conservative estimate of the hamming average: low variance implies a good approximation to the allele's current fitness and provides an upper bound on the predicted hamming average. Similarly, using the allele with the greater variance gives an optimistic estimate, or lower bound. These bounds should bracket the hamming average actually observed. Results given in the following sections support this preliminary analysis.

Noisy functions and noise induced by mutation may cause less desirable, more unpredictable behavior. This problem can be solved by having a larger sample space, that is, using more generations to calculate adjusted fitnesses.

For much the same reasons as above, calculating adjusted fitnesses allows predicting a lower limit on the fitness average, because these fitnesses incorporate only the effects of current schemas. Any new schemas (created by crossover) that increase the adjusted fitness of an allele can lead to a higher-than-predicted fitness average. In multimodal functions, the opposite may happen: the GA could initially concentrate around one optimum and then move toward another. Since the calculated adjusted fitnesses may no longer be correct, and it takes the GA time to overcome the inertia created by the first optimum, the predicted fitness average may be higher than the actual fitness average.

Comparing Predictions

Since hamming average prediction is quite robust and a first-order approximation may be interesting enough, a linear approximation of $f_h$ in the general model above can be tried for prediction. Assuming a quick and simple model for the case without mutation,

\[ h(t+1) = a\, h(t), \]

whose general solution is given by

\[ h(t) = a^{\,t}\, h(0) = a^{\,t}\,\frac{l}{2}. \]

This chapter compares this simple equation's predictions with the predictions generated using the analysis in section gif. The task is to predict the hamming average at generation 50 for the following functions:

  1. Flat: the flat function defined above.
  2. De Jong's five functions: F1 through F5 (De Jong, 1975).
  3. One Max: f(X) = |X|, where |X| stands for the norm (number of ones) of the bit string X, for both a 50-bit (OneMaxI) and a 200-bit (OneMaxII) chromosome.

All problems were reduced to maximization problems. The GA population size in all experiments was 30, using proportional selection and two-point crossover. The coefficient a in the simple model above was computed from successive hamming averages in generations 0 through 10. Table gif summarizes the results. The last two columns give the lower and upper bounds predicted by the analysis in the preceding sections, using proportions sampled in generations 5 through 9 only. Since the encoding for the De Jong functions used a sign bit, 5 samples were optimistically chosen as sufficient. The experimental values are averages over 11 runs with no mutation; the standard deviations over the 11 runs are given in parentheses.

  table1465
Table: Comparing actual and predicted hamming averages (no mutation)

Surprisingly, the values predicted using the rough approximation are quite good and are within a standard deviation of the observed values. That good results come from even such a simple model clearly indicates the validity of this approach. The predicted upper and lower bounds include the observed values, except for OneMaxI, whose lower bound is greater than the observed hamming average. This implies a large variance in the sampled allelic fitnesses. Intuitively, the observed fitness of an allele depends on the number of ones in the entire chromosome, leading to the observed conservative estimate in the absence of mutational noise. One Max should be expected to create greater problems when mutation is included.

Next, since mutation makes the final hamming average greater than zero, a related simplified model is used to predict the hamming average at convergence for the same set of problems, but with the mutation rate set to 0.01. The model is given below.

\[ h(t+1) = a\, h(t) + b. \]

The general solution is

\[ h(t) = a^{\,t}\, h(0) + b\,\frac{1 - a^{\,t}}{1 - a}. \]

In these experiments the GA ran until hamming convergence, when the hamming average did not change significantly over time. Experimentally, 500 generations was sufficient for the problem set. Calculating the values of a and b in the model by the method of least squares, the hamming average at convergence (generation 500) is predicted by setting $h(t+1) = h(t) = h_{\infty}$ and solving for $h_{\infty}$, resulting in

\[ h_{\infty} = \frac{b}{1 - a}. \]
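A least-squares version of this curve fit is sketched below (code mine; the sample values in the usage line are hypothetical). It estimates a and b from hamming averages observed in consecutive early generations and returns the predicted hamming average at convergence, b/(1-a).

import numpy as np

def fit_and_predict(hamming_averages):
    """hamming_averages: observed h(t) for consecutive early generations."""
    h = np.asarray(hamming_averages, float)
    x, y = h[:-1], h[1:]                         # pairs (h(t), h(t+1))
    A = np.vstack([x, np.ones_like(x)]).T
    a, b = np.linalg.lstsq(A, y, rcond=None)[0]  # least-squares estimates of a and b
    return b / (1.0 - a)                         # fixed point of h = a*h + b

# usage with hypothetical samples from generations 5 through 20
print(fit_and_predict([20.1, 18.9, 17.8, 16.9, 16.1, 15.5, 14.9, 14.4,
                       14.0, 13.6, 13.3, 13.0, 12.8, 12.6, 12.4, 12.3]))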

Table gif compares the observed hamming average at generation 500 with the predicted values. Column three tabulates the hamming average at convergence and its standard deviation over the 11 runs. Columns four and five are the hamming averages predicted by the curve-fitting model using data from generations 5 through 20 and 5 through 45 respectively. The simpler model (curve fitting) does much worse when mutation is incorporated: it requires more samples and even predicts negative hamming averages at convergence. The more refined model developed in this chapter, as expected, needs no extra information and therefore still uses samples from generations 5 through 9 to produce the last two columns.

  table1497
Table: Comparing actual and predicted hamming averages with the probability of mutation set to 0.01; the *'s indicate that negative values predicted in some runs were discarded.

The predictions of the model developed in this chapter, displayed in table gif, are very close to the observed values. Although the hamming average after 500 generations is large for F4 and OneMaxII, there is no significant change even after 2000 generations. This indicates that once a GA reaches the equilibrium hamming average, some other search method should be tried, especially if the converged hamming average is large. The predicted upper and lower bounds are still good except for OneMaxII. Here, noise induced by mutation is the major culprit. The high variance in allele fitnesses suggests using a larger sample to increase the signal-to-noise ratio. Sure enough, using an additional 15 generations to sample fitnesses suffices to correct the problem and gives good bounds on OneMaxII.

Predicting Average Fitness

Since the simplified curve-fitting model gave good results on hamming average prediction, the same linear model was used for fitness prediction as well. Table gif predicts the average fitness (F) at convergence without mutation. Column two gives the initial average fitness and column three the converged average fitness at generation 50. The standard deviations of the observed and predicted average fitnesses over the 11 runs are in parentheses. The next two columns are the fitness averages predicted by the linear model from fitnesses sampled in generations 5 through 15 and 5 through 25 respectively. The last two columns use the analysis of this chapter to provide conservative and optimistic bounds on the average fitness. As stated earlier, both bounds are expected to underestimate the observed fitness unless the function is multimodal or has high fitness variance. Since the average fitness is a byproduct of hamming average prediction, the hamming average data from generations 5 through 10 is used for fitness prediction. The flat function is not considered.

  table1520
Table: Comparing actual and predicted fitness averages (no mutation)

The simple linear model's predictions of average fitness are also within about one standard deviation of the observed values. It is more interesting to look at the upper and lower bounds. As expected, F2 and F5, which are multimodal, have higher predicted fitnesses. The other predictions serve as conservative and optimistic lower bounds.

Table gif summarizes results on the same set of problems with the mutation probability set to 0.01. Mutation's effect forces the simpler linear model to use even larger data samples to obtain good predictions. The upper and lower bounds again fall out of the hamming average data for generations 5 through 9 with mutation.

  table1594
Table: Comparing actual and predicted fitness averages with the probability of mutation set to 0.01

These preliminary results support this chapter's model of genetic algorithm behavior. Although curve fitting has its uses in providing a very rough approximation, it cannot predict the effects of changes in GA parameters and does badly in the presence of mutation. Using the analysis in this chapter it is easy to predict the time to convergence on a problem: this is just the time at which the change in hamming average becomes insignificant. Knowing the available computing resources fixes the hamming average below which exhaustive search of the remaining space can be undertaken, and therefore the time needed to deliver on an application. In practice, instrumenting a GA to keep estimates of allele fitnesses and proportions should reduce the time spent experimenting with various parameter settings and let the developer know the computing resources needed. If the problem encoding is highly epistatic or otherwise undesirable from a prediction point of view, simply keeping track of the hamming average and its rate of change indicates the progress of the genetic algorithm on the problem. In addition, incorrect predictions from the theory indicate that the problem encoding exhibits these undesirable properties. This is useful information about the problem space and can be used in changing the GA's parameters or the problem's encoding.

Summary

Analyzing a GA is complicated because of its dependence on the application domain and our lack of understanding of the mapping between our view of the application domain and the GA's view of the same domain. The GA's view of the domain is implicit in the chosen encoding. However, the overall behavior is predictable and can be deduced from sampling suitable metrics.

Analysis of the syntactic information in the encoding gives a surprising amount of knowledge. This chapter derived an upper bound on the time complexity from considering the effects of drift on allele frequency. Then, from a model of the effect of selection and mutation on the behavior of genetic algorithms, the analysis was able to approximate two solution characteristics at convergence for static search spaces.

  1. Bounds on an upper limit to the average hamming distance of a population at convergence. In addition to providing time limits, the average hamming distance of the population also denotes the remaining amount of work.
  2. The amount of progress possible for the GA, indicated by the average fitness at convergence.
Combining fitness prediction with hamming average prediction indicates how much progress is possible with a GA along with bounds on how much remaining work can be done. The latter is especially important when including mutation. The predicted average fitness indicates how much progress is possible, but it is the predicted hamming average that denotes the remaining work.

This dissertation has helped develop the foundations for the use of genetic algorithms as a computational tool for design. The last chapter summarizes these developments, discusses their limitations and makes explicit the questions arising from the research.

