Next: Methodology and Results Up: No Title Previous: Introduction

Combining GAs and CBR

Genetic algorithms provide an efficient tool for searching large, poorly understood spaces often encountered in function optimization and machine learning [Holland, 1975]. The probabilistic nature of GA search allows GAs to be successfully applied to a variety of NP-hard problems
[Goldberg, 1989, Powell et al., 1989, Caldwell and Johnston, 1991]. The population-based, emergent nature of computation in GAs lends them a degree of robustness and flexibility that is often unobtainable using other approaches.

A genetic algorithm explores a subset of the search space during a problem solving attempt and its population serves as an implicit memory guiding the search. Every individual generated during a search defines a point in the search space and the individual's evaluation provides a fitness. This information when stored, organized, and analyzed can be used to explain the solution, that is, tell which parts of the genotype are important, and allow a sensitivity analysis [Louis et al., 1993]. However, this information is usually discarded at the end of a GA's run and the resources spent in gaining this information are wasted. If we store this information (explicit memory) and use it in a subsequent problem solving attempt on a related problem we can tune the GA to a particular space and thus increase performance in this space.

One early attempt at reuse can be found in Ackley's work with SIGH [Ackley, 1987]. Ackley periodically restarts a search in an attempt to avoid local optima and increase the quality of solutions. Eshelman's CHC algorithm, a genetic algorithm with elitist selection and cataclysmic mutation, also restarts search when the population diversity drops below a threshold [Eshelman, 1991]. Both approaches only attack a single problem not a related set of problems. Ramsey and Grefenstette come closest to our approach and use previously stored solutions to initialize the genetic algorithm's initial population and thus increase a genetic algorithm's performance in an environment that periodically changes with time [Ramsey and Grefensttete, 1993]. They report encouraging results in such environments (better performance). Our work however, tackles sets of similar problems and addresses the issues of which and how many cases to inject into the population. To our knowledge, the earliest work in combining genetic algorithms and case-based reasoning was done by Louis, McGraw, and Wyckoff who used case-based reasoning principles to explain solutions found by genetic algorithm search [Louis et al., 1993].

Figure 1 shows a conceptual view of a simple version of our system. There are two basic modules, the search module and the CBR module. The search module in this paper consists of a genetic algorithm and the CBR module for this work only contains a few cases relevant to the problem we are going to solve. When confronted with a problem, the CBR module looks in its case base for similar problems and their associated solutions. If any similar problems are found their solutions are injected into the initial population (a small percentage of the population is initialized this way) of the genetic (or other search) algorithm and the GA searches from this population.

Figure 1: Conceptual view of our system

The case-base does what it is best at -- memory organization; the genetic algorithm handles what it is best at -- adaptation. The resulting combination takes advantage of both paradigms; the genetic algorithm component delivers robustness and adaptive learning while the case-based component speeds up the system.

The genetic algorithm also provides a ready-made case generating mechanism as each individual generated during a GA search can be thought of as part of a case. However, even a population size of a 100 run for a 100 generations can generate up to cases leading to problems with storage and indexing. CBR systems usually have difficult in finding enough cases; our problem is the opposite. We need to sift through a large number of cases to find potential seeds for the initial population. CBR systems often include principled mechanisms for generalization and abstraction of cases because it may be impractical to store all prior experience in detail. For our system, generalized individuals correspond to schemas (subsets of the search space) and can be stored as cases. This is described in [Louis et al., 1993]. Since genetic algorithms should be used in poorly-understood domains, one side-effect of this approach is that the generalizations and abstractions may in fact induce a useful domain model.

There are at least three points of view to consider when combining genetic algorithms and case-based reasoning.

From the case-based reasoning point of view, genetic algorithms provide a robust mechanism for adapting cases in poorly understood domains conducive to a case-based representation of domain knowledge. In addition, the individuals produced by the GA furnish a basis for generating new cases to be stored in the case-base.
From the point of view of genetic algorithms, case-based reasoning provides a long term memory store in its case-base. A combined GA-CBR system does not require a case-base to start with and can bootstrap itself by learning new cases from the genetic algorithm's attempts at solving a problem. This paper stresses the genetic algorithm point of view and addresses the following questions.
1. What cases are relevant, or, which previously found solutions do we use to seed a genetic algorithm's population? Selecting appropriate cases is important since it directly relates to speed and efficiency of a search. If the ``right'' cases are injected, a GA does not have to waste time exploring unpromising subspaces because these cases provide partial, near-optimal, or optimal solutions. In other words, they provide good building blocks for solutions to the current problem. This is crucial if the size of a search space is extremely large since injecting appropriate cases will provide the genetic algorithm with better starting points and thus speed up search and improve performance.
2. How many cases should we inject into the initial population? Search algorithms must balance exploitation with exploration for two reasons: 1) too much exploitation in less promising areas may cause the loss of crucial information on correct solutions and convergence to a local optimum, and 2) speed and efficiency will suffer if the algorithm wastes time in exploring unpromising areas. A randomly initialized population has maximum diversity and thus maximum capacity for exploration [Louis and Rawlins, 1993]. We expect injected individuals (cases) to have a higher fitness than randomly generated individuals. Since a GA focuses search in the areas defined by high fitness individuals, we expect large numbers of high fitness individuals in an initial population to increase exploitation. This increased concentration or exploitation of a particular area can cause the GA to get stuck on a local optimum. Balancing exploitation with exploration now means balancing the number of randomly generated individuals with the number of injected individuals in the initial population.
From the machine learning pointing of view, using cases instead of rules to store information provides an alternative to Holland classifier systems [Holland, 1975, Goldberg, 1989] which use simple string rules for long term storage and genetic algorithms as their learning or adaptive mechanism. In our system, the case-base of problems and their solutions supplies the genetic problem solver with a long term memory. The combination takes advantage of both paradigms but we have to be careful and ensure that the combined system does not inherit the deficiencies of both.

Our results on the open shop scheduling (OSSP) and re-scheduling problems (OSRP) establish the feasibility of this approach and shed light on some of the issues above and raise others to explore. In the next section, we describe a simple system to test the feasibility of combining genetic algorithms and case-based reasoning on OSSP and OSRP problems.

Next: Methodology and Results Up: No Title Previous: Introduction

Sushil J. Louis
Wed Jun 25 14:31:51 PDT 1997