``Everyone designs who devises courses of action aimed at changing existing situations into preferred ones.'' -- H. A. Simon, The Sciences of the Artificial, MIT Press, 1969.
Design is a fundamental, purposeful, pervasive, and ubiquitous activity. As Simon pointed out, anyone who formulates programs of action to change an existing state to a preferred one is solving design problems [Simon, 1969]. Planning, scheduling, circuit design, VLSI layout, architecture, and most branches of engineering are, or can be cast as, design problems. Any useful tool that helps the design process would therefore be widely applicable. Current tools are based on a knowledge-intensive view of the design process. Techniques like expert systems and case-based reasoning, apart from problems with knowledge acquisition, suffer from a basic flaw: they fail on new or poorly understood application domains, where there is insufficient knowledge to proceed.
Genetic algorithms (GAs) are randomized parallel search algorithms that model natural selection, the process of evolution. Over time, natural selection has produced a wide range of robust structures (life forms) that efficiently perform a broad range of functions. The success of natural selection on earth provides an existence proof of the viability of an evolutionary process as a model for design. Like natural selection, GAs are a robust search method requiring little information to search effectively in large, poorly-understood spaces.
In this context, the central claim of the thesis is
Genetic algorithms provide a computational tool for the design problem: ``Given a function and a target technology to work within, design an artifact that performs the function subject to constraints.''
Although genetic algorithms have been used in engineering problems, there are reasons why designers are reluctant to embrace GAs. Representation problems, familiar to artificial intelligence researchers, are one reason; another is that, until recently, there was no bound on the time to convergence of a GA [Louis and Rawlins, 1993b]. This thesis identifies and overcomes four basic problems in establishing genetic algorithms as a computational tool for design.
The current chapter expands on the definition of design and the nature of design problems, and casts design as search through a state space. After identifying the questions addressed in this thesis, the chapter ends by providing an overview of the structure of this dissertation.
Design's pervasive nature makes it difficult to find a concise, comprehensive, and well-accepted definition. Attempts at defining design usually reflect the bias of the author and range from the mundane to the theoretical. Dasgupta's book provides some historical as well as modern attempts at such definitions [Dasgupta, 1991]. More recently, Chris Tong and Duvvuru Sriram advance a fairly comprehensive definition and describe design as ``the process of constructing a description of an artifact that satisfies a (possibly informal) functional specification, meets certain performance criteria and resource limitations, is realizable in a given target technology, and satisfies criteria such as simplicity, testability, manufacturability, reusability, etc.'' [Tong and Sriram, 1992b].
The various definitions agree, however, that design is concerned with the
mapping of a specified function onto a structure or description of a
structure. Usually, the
designed structure also satisfies performance, resource, and other pragmatic
constraints. In addition, most design theorists agree that the goal of design
is to purposefully initiate change in some aspect of the
world [Simon, 1969, Tong and Sriram, 1992b]. Unlike theories of the natural sciences,
design is concerned with construction of artifacts and the purposeful effecting
of change.
The main reason for the difficulty of design tasks lies in the complexity of the mapping between function and structure. The specified function may be very complex and/or it may have to be realized using complex arrangements of a large number of interacting parts. Typically, the behavior of each component is well known, and the difficulty lies in predicting how an assemblage of such components will behave. Additional non-functional criteria and constraints complicate the design problem even further.
The type of structure synthesized can be used to distinguish design tasks [Tong and Sriram, 1992b]. In structure synthesis, the design is constructed from the composition of primitive parts into a structure that performs a specified function. Constructing a parity-checker circuit from logic gates is an example of this type of task. This may not be optimization, but rather, as noted by Simon, it may be a satisficing task, in that a designer searches for a circuit that checks parity, not an optimal circuit [Simon, 1969].
In structure-configuration tasks, the design is a configuration of parts of pre-determined type and connectors of pre-determined type. For example, in floorplanning, the problem is to design a configuration of rooms and doors to fit in a given area, along with the values of room and door parameters.
Lastly, in parameter-instantiation tasks, the problem is to find the values of parameters such that a particular design can be instantiated.
In all the cases above, optimization plays a role, since there is usually a cost associated with each design. Minimizing this cost can be part of the original design task or it may be a separate task.
Design can be cast as a search problem, a formulation with a well-established heritage in artificial intelligence research: a design process searches through a state space of possible structures for one or more structures that perform a specified function. Points in the state space represent candidate designs, and one or more of these points defines a goal state representing a design that meets specifications.
In search there is a tradeoff between flexibility (applicability in different domains) and speed (the number of candidate designs examined before finding the correct one). Current design systems tend to depend on domain-specific knowledge and favor speed over flexibility. Most are so brittle that they fail completely when domain knowledge is scarce. Genetic algorithms, on the other hand, can make do with little domain knowledge, make few assumptions about the search space, and use domain-independent operators for generating candidate points in a state space.
If there is sufficient knowledge about a problem domain, there is no need to search for a solution. If the available knowledge is insufficient to arrive directly at a solution but is at least enough to delineate an area within which a solution is expected to be found, a search method can be used to hunt for a solution within that area.
This thesis is concerned with design problems that do not lend themselves to a direct solution but that can be cast as search problems within a space of possible solutions.
Search algorithms typically have two components corresponding to the tradeoff between speed and flexibility: they balance exploration of the search space with exploitation of particular areas of the space. Exploration points out new areas to search in, while exploitation concentrates search in a particular area. A balance is needed because unchecked exploration wastes time in unpromising areas, while severe exploitation may miss the correct solution by concentrating on too small an area; conversely, too little exploration may never generate the correct solution, and too little exploitation may take too long. Striking a good balance between exploration and exploitation is therefore critical for a search algorithm.
Wasting time in unpromising areas of the search space may seem a small price to pay for finding the correct solution. Unfortunately, the size of the search spaces that arise in design domains is often so large that a human lifetime is short compared to the time needed by the fastest supercomputers to look at a significant fraction of the space. This is why random and/or exhaustive search (RES) is infeasible: it explores too much. On the other hand, RES works equally well across all search spaces, since it makes the fewest assumptions about exploitable properties of the search space. It assumes that a minimal amount of knowledge is available -- enough to know when to stop. Put another way, RES needs and uses minimal information about the search space to work equally well across a variety of spaces.
At the other extreme are algorithms that do no exploration and for which enough knowledge is available to solve the problem quickly and directly. These algorithms make strong assumptions about the search space, assuming that sufficient exploitable information is available to avoid searching altogether. However, because these strong assumptions do not usually hold across domains (search spaces), such algorithms work only on the particular space they were designed for and do miserably on others.
In between these extremes of random and/or exhaustive search and no search, every other search algorithm makes assumptions of varying kinds about search spaces. These assumptions correspond to knowledge about the space and may be correct, incorrect, or misleading, implicit or explicit, and known or unknown, when used for searching a particular space. This knowledge is exploited to guide exploration, speeding up search by generating a search bias manifested in the set of generated solutions.
Genetic algorithms belong to a class of algorithms called blind search algorithms, which assume that there is enough knowledge to compare two solutions and tell which is better [Rawlins, 1991]. Genetic algorithms are the search method of choice for this dissertation.
A Genetic Algorithm (GA) is a randomized parallel search method modeled on evolution. GAs are being applied to a variety of problems and becoming an important tool in machine learning and function optimization. They have already been used in the preliminary design of aircraft engine turbines and, more recently, in protein structure prediction [Powell et al., 1989, Le Grand and Merz Jr., 1993]. Goldberg's book gives a list of application areas [Goldberg, 1989b].
Intuitively, genetic algorithms can be understood in terms of differences with
other search methods that can be applied in the same situation. Consider the
problem of finding the correct combination for opening a thirty-digit
combination lock (designing a key). One possible algorithm is random and/or
exhaustive search. In this case the only information used by the algorithm is
whether or not the currently generated combination is the correct one. With
this algorithm the chances of success -- finding the correct combination --
are better than 1/2 only after the generation of $10^{30}/2$ combinations.
This is a very large number. In fact, if RES generates a combination every
nanosecond, it would take more than the age of the earth before success is more
than 1/2 probable.
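As a rough check of this claim: treating each of the thirty digits as ranging over 0--9 gives $10^{30}$ possible combinations, so at one combination per nanosecond,
\[
\tfrac{1}{2} \times 10^{30} \times 10^{-9}\,\mathrm{s} = 5 \times 10^{20}\,\mathrm{s} \approx 1.6 \times 10^{13}\,\mathrm{years},
\]
which dwarfs the earth's age of roughly $4.5 \times 10^{9}$ years.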
If, in addition to whether the current combination is correct, more information is available, a search algorithm could use this information to increase efficiency. Hill-climbing or gradient-descent algorithms, when given information about which of two generated combinations is ``closer'' to the correct combination, use this additional, directional information to speed up the search. The hill-climbing analogy treats the search space as a landscape through which a search algorithm moves towards the highest point, where height corresponds to ``closeness'' to the optimum. However, a hill-climber can be trapped on a hill that is a local rather than a global optimum: if the landscape is rugged, with many hills (local optima), the algorithm may climb the nearest hill, find that any further movement decreases height, and remain trapped there, even though the highest point (the global optimum) lies on another, taller hill.
A population of hill-climbers is more robust. Trapping a large number of hill-climbers, working independently on the same problem, is less probable than trapping just one hill-climber.
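A minimal Python sketch of such a population of independent hill-climbers follows; the bit-string representation, the max-ones fitness function, and the parameter values are placeholder assumptions chosen only for illustration.

import random

def hill_climb(fitness, length, steps, rng):
    # Start from a random bit string and repeatedly try single-bit flips,
    # keeping a flip only when it does not decrease fitness.
    current = [rng.randint(0, 1) for _ in range(length)]
    for _ in range(steps):
        candidate = current[:]
        candidate[rng.randrange(length)] ^= 1  # flip one randomly chosen bit
        if fitness(candidate) >= fitness(current):
            current = candidate
    return current

def max_ones(bits):
    # Placeholder fitness: the number of 1s in the string.
    return sum(bits)

rng = random.Random(0)
# Several climbers working independently on the same problem; the best of
# their results is less likely to be a poor local optimum than a single run.
climbers = [hill_climb(max_ones, 20, 200, rng) for _ in range(10)]
print(max_ones(max(climbers, key=max_ones)), "ones out of 20")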
Genetic algorithms are more than a population of hill-climbers. In addition to having a population of individuals that hill-climb, genetic algorithms allow the exchange of information among individual hill-climbers in generating combinations that are ever closer to the correct one. Genetic algorithms exemplify the maxim: ``Two heads are better than one'' and can be considered a population of information-exchanging hill-climbers.
When used in design, a GA encodes a candidate design in a binary
string.
A randomly generated set of such strings forms the initial population from
which the GA starts its search. Three basic genetic operators -- selection,
crossover, and mutation -- guide this search. The genetic search process is
iterative: evaluating, selecting, and recombining strings in the population
during each iteration (generation) until reaching some termination condition.
The basic algorithm, where P(t) is the population of strings at generation
t, is given below.
t = 0
initialize P(t)
evaluate P(t)
while (termination condition not satisfied) do
begin
    select P(t+1) from P(t)
    recombine P(t+1)
    evaluate P(t+1)
    t = t + 1
end
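One minimal Python rendering of this loop for bit-string individuals is sketched below. The max-ones fitness function, population size, and operator probabilities are placeholder assumptions made only for the example; the selection, crossover, and mutation operators it uses are explained in the paragraphs that follow.

import random

POP_SIZE, STRING_LEN, GENERATIONS = 30, 20, 50
P_CROSSOVER, P_MUTATION = 0.7, 0.01
rng = random.Random(1)

def fitness(bits):
    # Placeholder problem-dependent evaluation: the number of 1s in the string.
    return sum(bits)

def select(population, scores):
    # Fitness-proportionate selection: strings are chosen, with replacement,
    # with probability proportional to their relative fitness.
    chosen = rng.choices(population, weights=scores, k=len(population))
    return [ind[:] for ind in chosen]

def recombine(population):
    # Pair up strings, apply one-point crossover with high probability,
    # then flip each bit with a small mutation probability.
    offspring = []
    for i in range(0, len(population) - 1, 2):
        a, b = population[i][:], population[i + 1][:]
        if rng.random() < P_CROSSOVER:
            point = rng.randrange(1, len(a))
            a[point:], b[point:] = b[point:], a[point:]
        offspring.extend([a, b])
    for ind in offspring:
        for j in range(len(ind)):
            if rng.random() < P_MUTATION:
                ind[j] ^= 1
    return offspring

# t = 0: initialize and evaluate P(t)
population = [[rng.randint(0, 1) for _ in range(STRING_LEN)] for _ in range(POP_SIZE)]
scores = [fitness(ind) for ind in population]
for t in range(GENERATIONS):                       # termination: fixed generation budget
    population = select(population, scores)        # select P(t+1) from P(t)
    population = recombine(population)             # recombine P(t+1)
    scores = [fitness(ind) for ind in population]  # evaluate P(t+1)
print("best fitness at convergence:", max(scores))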
Evaluation of each string is based on a fitness function that is
problem-dependent. It determines which of two candidate solutions is better
(see the following figure).
Figure: Evaluating individuals determines which is better.
This corresponds to the environmental determination of survivability in natural
selection. Selection of a string, which represents a point in the search space,
depends on the string's fitness relative to that of other strings in the
population. It probabilistically culls from the population those points that
have relatively low fitness (see the following figure).
Figure: Selection acts as a sieve, selecting better individuals to continue.
Mutation and crossover imitate the genetic mechanisms of natural reproduction. Mutation, as in natural systems, is a very low probability operator that simply flips a single bit. Crossover, in contrast, is applied with high probability. It is a randomized yet
structured operator that allows information exchange between points. Simple
crossover is implemented by choosing a random point in the selected pair of
strings and exchanging the substrings defined by that point. The following
figure shows how crossover mixes information from two parent strings, producing
offspring made up of parts from both parents. Note that this operator, which
does no table lookups or backtracking, is very efficient because of its
simplicity.
Figure: Crossover of the two parents P1 and P2 produces the two children C1
and C2. Each child consists of parts from both parents which leads to
information exchange.
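For instance, a sketch of simple one-point crossover on two illustrative parent strings (the parent strings and the random seed are arbitrary choices for the example):

import random

def one_point_crossover(p1, p2, rng):
    # Pick a random cut point and exchange the substrings beyond it.
    point = rng.randrange(1, len(p1))
    return p1[:point] + p2[point:], p2[:point] + p1[point:]

rng = random.Random(2)
c1, c2 = one_point_crossover("111111", "000000", rng)
print(c1, c2)  # each child is a prefix of one parent followed by a suffix of the other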
Selection probabilistically filters out designs that perform poorly, choosing high performance designs to concentrate on or exploit. Crossover and mutation, through string operations, generate new designs for exploration. Thus genetic algorithms only require two elements: 1) an encoding of candidate structures (solutions), and 2) a method of evaluating the relative performance of candidate structures. Given an initial population of elements, GAs use the feedback from the evaluation process to select fitter designs, generating new designs through recombination of parts of selected designs, eventually converging to a population of high performance designs. Since GAs only require feedback on the relative performance of candidate designs to guide search, they are applicable in any design task that can provide this information. In other words, given a function to perform, a genetic algorithm only needs to know which of a pair of candidate designs is ``closer'' to performing the function. And even this information need not be perfect but can be subject to noise and misdirection to some extent. This thesis is only concerned with such design problems.
However, there are unresolved problems in using genetic algorithms for design. First, although genetic algorithms allow information sharing among hill-climbers, are there search spaces where this sharing leads to better performance? Second, since genetic algorithms work with an encoding of the problem, what kinds of biases can the encoding be expected to introduce, and can the genetic algorithm surmount unfavorable encoding biases? Third, genetic algorithms should be applied in knowledge-lean domains, where the solutions found tend to be difficult to understand; the question is what kind of analysis of genetic algorithm designs is possible. Finally, given a problem, how long should a designer run a genetic algorithm on it?
The dissertation addresses each of these problems in turn, using examples from circuit design, floorplanning, and function optimization.
Although design encompasses many fields, this thesis is most concerned with work at the nexus of engineering, architecture, and artificial intelligence. Perhaps the research most closely connected to this dissertation is the use of genetic algorithms to design computer programs.
John Koza uses genetic algorithms to evolve programs in the LISP programming language, illustrating the use of genetic algorithms in the task of program design [Koza, 1993]. The work presented in this thesis is, however, qualitatively different. This dissertation develops tools and methods, and addresses questions and problems, that can arise in using genetic algorithms in any applicable design task, and thus attempts to address more foundational issues.
Knowledge-based systems for design depend on domain knowledge to accomplish design tasks. A description of the state of the art in engineering design can be found in [Tong and Sriram, 1992a]. Systems that use a combination of techniques also exist. Such integrated design environments use a variety of tools to structure and attack design problems. ``Engineous'' is a system that includes genetic algorithms in its toolbox to uncover unconventional designs when needed [Tong et al., 1992, Powell et al., 1989]. The work in this thesis should help improve the construction, use and applicability of such genetic algorithm tools.
Innovation and creativity are often associated with design processes.
Chapter 2 explains in what sense genetic algorithms can be
considered innovative. Other approaches to innovative engineering design can be
found in [Mostow et al., 1992, Huhns and Acosta, 1992, Sycara and Navinchandra, 1992, Goel and Chandresekaran, 1992, Ulrich and Seering, 1992, Tong, 1992]. These approaches include
case-based reasoning, structural mutation, and innovation through the
combination of multiple knowledge sources. Lenat's work on heuristics and
meta-heuristics can also be considered as an innovative approach to the design
of effective strategies [Lenat, 1983].
Modeling the creative aspects of a design process is addressed in [Gero, 1992, McGraw, 1992]. The work in [Gero, 1992] seeks to define and model the creative aspects of a design task. Letter Spirit, the project described in [McGraw, 1992], attempts to model central aspects of human high-level perception and creativity on a computer, in the domain of artistic letter-design. This dissertation does not address creativity.
Chapter 2 introduces natural selection and genetic algorithm
theory, fleshing out the problems identified above. Although search is the
model used in this thesis, the chapter also illustrates the close mapping
between genetic algorithms and the well-established Analysis, Synthesis,
Evaluation model of the design process. Then, the innovativeness of genetic
algorithms and issues in evaluating and encoding problems are discussed. The
chapter ends by explaining how to represent a problem for a genetic algorithm
using an example from floorplanning in architectural design.
Chapter 3 compares genetic and other search algorithms. It
defines a class of search spaces which are easy for genetic algorithms and hard
for stochastic hill-climbers. Since recombination is the crucial difference
between genetic and other search algorithms, the results provide insight into
the kind of spaces where recombination is necessary. This chapter helps answer
the question of when to expect genetic algorithms to outperform other
algorithms and thus when to use them to greatest effect. An earlier version of
the work reported in this chapter has appeared in [Louis and Rawlins, 1993a].
In chapter 4 the dissertation shows how problem encoding can
bias genetic search. This leads into the important issue of problem encoding
for genetic algorithms in the design domain. A bad encoding can cause a GA to
flounder, to be misled, or both. Modifying the classical genetic algorithm to
mitigate these problems sheds light on the nature of good encodings from the
perspective of genetic algorithms and from the differing perspective of design.
This chapter uses examples from combinational circuit design to illustrate the
results. Parts of this chapter have appeared earlier in [Louis and Rawlins, 1991].
Chapter 5 is concerned with analyzing GA-generated designs. The
importance of design analysis lies in being able to explain and subsequently
modify designs in principled ways. Since genetic algorithms are designed for
working in poorly understood spaces, knowledge about the space needs to be
acquired, stored and processed for useful analysis. This chapter shows how to
apply case-based reasoning tools to acquire knowledge about a design space to
explain solutions generated by a GA. Much of this information is implicit in
the processing done by the genetic algorithm. The case-based reasoning tools
extract and process information supplied by the trajectory of a genetic
algorithm through the search space. This organized information in the
case-base can be used to explain how a solution evolved and to help identify
its building blocks. The resulting knowledge base can also be used to tune a
genetic algorithm for that specific domain. Examples from function optimization
and circuit design clarify the results. This work has appeared in
[Louis et al., 1993].
Chapter 6 considers computational complexity, a crucial
element in any computational tool and in any design model where a designer
must specify time limits for a design task. Since evolution is driven by
diversity or variation in a gene pool, the average Hamming distance is used
as a diversity metric to derive bounds on the time to convergence of genetic
algorithms. Analysis of a flat domain provides worst case time complexity
for static functions. Further, employing linearly computable runtime
information provides tighter bounds on the time beyond which progress is
unlikely on arbitrary static functions. As a by-product, this analysis also
gives qualitative bounds by predicting average fitness, the quality of an
average solution at convergence. Examples from the DeJong test suite of
problems are used to provide supporting empirical evidence. The part of this
chapter on worst case time complexity has appeared in [Louis and Rawlins, 1993b].
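As an illustrative sketch of this diversity metric (not the dissertation's implementation), the average pairwise Hamming distance of a population of equal-length bit strings can be computed as follows:

from itertools import combinations

def average_hamming_distance(population):
    # Mean number of differing positions over all pairs of strings; the value
    # falls toward zero as a genetic algorithm's population converges.
    pairs = list(combinations(population, 2))
    total = sum(sum(a != b for a, b in zip(x, y)) for x, y in pairs)
    return total / len(pairs)

print(average_hamming_distance(["1100", "1010", "0110"]))  # prints 2.0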
The last chapter draws together the threads of representation, analysis, and complexity, paving the way toward establishing genetic algorithms as a computational tool for design. Questions and possible avenues of research brought to light by this dissertation are addressed at the end.