\documentstyle[art11,fullpage,psfig]{article}
%\documentstyle[rep10,twocolumn,art11]{article}
%\pagestyle{empty}
\input{comm}
\begin{document}
\title{Combining Robot Control Strategies using Genetic Algorithms with
Memory}
%
\author{
Sushil J. Louis \\
Department of Computer Science \\
University of Nevada\\
Reno - 89557 \\
sushil@cs.unr.edu \\
\and
Gan Li \\
Department of Computer Science \\
University of Nevada\\
Reno - 89557 \\
}
\maketitle
\begin{abstract}
We use a genetic algorithm augmented with a long term memory to design
control strategies for a simulated robot, a mobile vehicle operating in a
two-dimensional environment. The simulated robot has five touch sensors,
two sound sensors, and two motors that drive locomotive tank tracks. A
genetic algorithm trains the robot in several specially-designed
simulation environments to evolve basic behaviors such as food approach,
obstacle avoidance, and wall following. Control strategies for a more
complex environment are then designed by selecting solutions from the
stored strategies evolved for the basic behaviors, ranking them according
to their performance in the new, more complex environment, and
introducing them into a genetic algorithm's initial population. This
memory-augmented genetic algorithm quickly combines the basic behaviors
and finds control strategies that perform well in the more complex
environment.
\end{abstract}
%\noindent
%{\bf Keywords:} Genetic Algorithms, Simulated Robotics.
\section{Introduction}
One of the main concerns in robotics is to plan a path for a robot system
moving purposefully and safely in an environment filled with known or
unknown obstacles. Under the sensor motion planning approach, information
about obstacles is assumed to be unknown or only partially known, and
local on-line information comes from sensory feedback. Since no detailed
model of the environment is assumed, planning is performed continuously,
based on whatever partial information is available at the moment. The
advantages of sensor motion planning are twofold: (1) it can deal with
unstructured environments and the uncertainty typical of such
environments, and (2) it requires much less memory and computation
because relatively little information has to be processed during each
step. On the negative side, generality is an elusive goal and optimality
is usually ruled out.

A control strategy, mapping sensory feedback into commands for a robot's
actuators, is an essential component of a mobile robot under the sensor
motion planning model. It can be designed by a human according to both
the physical structure and the behavior requirements of the robot.
However, human design of control strategies does not always work well:
desired behaviors are sometimes fuzzy and difficult to define explicitly,
and not all useful behaviors of an autonomous robot can be determined a
priori or recognized by humans.

During the last decade, much work has been done to explore the evolution
of robot control strategies. A series of technical reports has been
published by Cliff, Husbands, and Harvey on using genetic algorithms
(GAs) to design neural-network control architectures for a simulated,
visually guided robot~\cite{Cliff}. Koza has used genetic programming to
evolve LISP programs that control and guide a variety of simulated robots
performing navigation and other tasks~\cite{Koza}.
Murray and Louis used genetic algorithms to first design combinational
circuits for basic (low-level) behaviors, and then used the genetic
algorithm to design a switch that chooses between these low-level
behaviors when performing more complex tasks~\cite{Murray}.

We cast low-level robot control strategy design as a search problem in a
space of possible strategies and use a non-traditional genetic algorithm
to computationally design control strategies, in the form of a
combinational circuit connecting sensor inputs to actuators, for a
simulated robot (simbot) that can navigate and eat food in a
two-dimensional environment with rectangular obstacles. First, the simbot
learns (and memorizes) basic behaviors such as food approach, obstacle
avoidance, and wall following in specially-designed, separate simulation
environments. The best performing simbot from each environment can be
considered an expert at one specific behavior, and its control strategy
must contain useful building blocks corresponding to this behavior. Next,
seed solutions (cases) are selected from these experts by calculating and
ranking their performance in a new and more complex target environment.
Finally, we inject these cases as seeds into the initial population of
another GA running in the more complex target environment. Selecting the
``right'' number of ``appropriate'' cases results in the speedy design of
promising control strategies for the simbot. Our strategy therefore seeks
to provide a genetic algorithm with a long term memory in the form of a
case-base, borrowing ideas from the field of case-based
reasoning~\cite{Riesbeck}.

In the next section, we introduce the traditional genetic algorithm and
describe our modifications. In addition, we provide a brief description
of case-based reasoning and the combined system.
Section~\ref{simulation} describes the simulated robot and its
environment. We present the experimental parameters used by our system in
section~\ref{params}. Experimental results are displayed and analyzed in
section~\ref{results}, followed by conclusions and future work.

\section{A Genetic Algorithm}
Genetic algorithms (GAs) are stochastic, parallel search algorithms based
on the mechanics of natural selection, the process of
evolution~\cite{Holland,Goldberg}. GAs were designed to efficiently
search large, non-linear, poorly-understood search spaces where expert
knowledge is scarce or difficult to encode and where traditional
optimization techniques fail. They are flexible and robust, exhibiting
the adaptiveness of biological systems. As such, GAs appear well-suited
for searching the large, poorly-understood spaces that arise in design
problems, specifically the design of control strategies for mobile
robots.

\subsection{The CHC Genetic Algorithm}
CHC, the non-traditional genetic algorithm used in this paper, differs
from traditional GAs in a number of ways~\cite{Eshelman}:
\be
\item For a population of size $N$, it guarantees that the best
individuals found so far always survive: children and parents are pooled,
and the best $N$ individuals are selected for further processing. In a
traditional GA, the parent population does not survive to the next
generation.
%
\item To avoid premature convergence, two similar individuals separated
by a small Hamming distance (this threshold is set by the user) are not
allowed to mate.
%
\item During crossover, two parents exchange exactly one-half of their
randomly selected {\em non-matching} bits.
%
\item Mutation is not used during normal processing.
%
\item Instead, an external mutation operator re-initializes the
population when the population has converged or search has stagnated.
\ee
The CHC genetic algorithm generally does well with small
populations~\cite{Eshelman}. Limited resources and the computational cost
of the simulations led to our use of small populations and to the
selection of the CHC genetic algorithm for this work.

\subsection{Case-Based Reasoning}
Case-based reasoning (CBR) is based on the idea that reasoning and
explanation can best be done with reference to prior experience, stored
in memory as {\it cases}~\cite{Riesbeck}. When confronted with a problem
to solve, a case-based reasoner retrieves the most similar case in memory
and uses information from the retrieved case, together with any available
domain information, to tackle the current problem. This paper uses the
basic tenet of CBR --- the idea of organizing information based on
``similarity'' --- to augment genetic algorithm search.

\section{Combining GAs and CBR}
Combining genetic algorithms with a long term memory model, like
case-based reasoning, combines the strengths of both approaches. The
case-base does what it is best at --- memory organization; the genetic
algorithm handles what it is best at --- adaptation. The resulting
combination takes advantage of both paradigms: the genetic algorithm
component delivers robustness and adaptive learning, while the case-based
component speeds up the system.

Furthermore, in many application areas we confront sets of similar
problems. It makes little sense to start a problem solving attempt from
scratch with a random initial population when previous search attempts
may have yielded useful information about the search space. Instead,
seeding a genetic algorithm's initial population with solutions to
similar, previously solved problems can provide information (a search
bias) that, hopefully, increases the efficiency of the search. If no
useful information was obtained or obtainable, a randomly initialized
population may be our only choice. Our approach borrows ideas from
case-based reasoning (CBR), in which old problem and solution
information, stored as cases in a case-base, helps solve a new
problem~\cite{Riesbeck}. Although we restrict ourselves to genetic
algorithms in this paper, we should be able to substitute, with minor
modifications, any population-based search algorithm for the genetic
algorithm. We believe that evolutionary programming, genetic programming,
and evolution strategies are especially suitable. Figure~\ref{sys} shows
a conceptual view of a first version of our system.
\begin{figure}[htp]
\centerline{
\psfig{figure=simplesys.ps,height=1.75in,width=2.0in}
}
\caption{Conceptual view of our system}
{\label{sys}}
\end{figure}

Previous work in this area includes Louis, McGraw, and Wyckoff's paper
applying case-based reasoning to GAs as an analysis tool for the parity
combinational circuit design problem~\cite{Sushil}. Ramsey and
Grefenstette seeded a genetic algorithm's initial population with cases
and reported improved performance on the non-stationary functions that
they used~\cite{Ramsey}. More recent work by Louis, Liu, and Xu addresses
the questions of which cases are ``appropriate'' and how many cases to
inject~\cite{xliu,isca_sf}, and establishes the feasibility of the method
on the open-shop scheduling and rescheduling problem and the
combinational circuit design problem.
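To make the seeding step concrete, the sketch below shows one way to
implement it. This is a minimal illustration in Python, not the code used
in our experiments; {\tt case\_base} and {\tt target\_fitness} are
hypothetical stand-ins for the set of stored strategies and for a
simulator-based evaluation of a chromosome in the target environment.
\begin{verbatim}
import random

CHROMOSOME_LENGTH = 144  # length of the circuit encoding (see below)
POPULATION_SIZE = 100    # population size used in our experiments

def random_chromosome():
    return [random.randint(0, 1) for _ in range(CHROMOSOME_LENGTH)]

def seed_population(case_base, target_fitness, n_inject):
    # Rank the stored cases by their fitness in the *target*
    # environment and inject the best n_inject of them.
    ranked = sorted(case_base, key=target_fitness, reverse=True)
    population = ranked[:n_inject]
    # Fill the remaining slots with random individuals.
    while len(population) < POPULATION_SIZE:
        population.append(random_chromosome())
    return population
\end{verbatim}
The genetic algorithm then proceeds unchanged on this population; only
the initialization differs from a standard run.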
\section{Simulation}
{\label{simulation}}
For our problem, the environment is a bounded square area, 300 units on a
side, as shown in Figure~\ref{Fig1}. There are several obstacles and food
sources in the simulation environment; obstacles are modeled by
rectangular boxes. The robot can enter neither the boundary nor the
obstacles, and the locations and amount of food are fixed within an
environment. Food sources produce a sound signal that can be detected by
the simbot. A food signal penetrates obstacles and can be heard by the
simbot only if the simbot is within the signal's range. If the distance
between the robot's center and a piece of food is less than or equal to
five units, the food is assumed to be ``eaten'' and disappears from the
environment.
\begin{figure}[htp]
\centerline{
\psfig{figure=figs/fig1.ps,height=2.0in,width=2.5in}
}
\caption{Simulation environment and simbot}
{\label{Fig1}}
\end{figure}

\subsection{Simbot}
Figure~\ref{Fig1} also shows that the simbot has four touch sensors, one
central touch sensor, two hearing sensors, and two locomotive tank
tracks. Each touch sensor and hearing sensor is fixed on the end of a
one-dimensional flexible whisker. The touch sensors and hearing sensors
simulate hands and ears, letting the robot feel obstacles and hear food
in the environment. Each sensor has two states: 0 or 1. A combinational
logic circuit maps the sensor states into binary control commands for
each locomotive tank track. The tracks have two forward speeds, one stop,
and one reverse speed; the four possible speeds need two bits to encode.
The simbot moves by coordinating the two tank tracks.
%Table~\ref{Tab1} shows the mapping between the control circuit's binary
%outputs and motor commands.
%\begin{table}
%\center
%\begin{tabular}{|c|c|c|} \hline
%DCC & BCC & \\
%(Decimal Control & (Binary Control & Speed \\
%Command) & Command) & \\ \hline
%0 & 00 & reverse \\
%1 & 01 & stopped \\
%2 & 10 & forward \\
%3 & 11 & fast forward \\ \hline
%\end{tabular}
%\caption{The motor commands for the two locomotive tank tracks}
%{\label{Tab1}}
%\end{table}

\subsection{Encoding}
In this paper, the control circuit is a $7 \times 6$ gate array that must
be encoded into a binary chromosome. There are $(7 \times 6) - 6 = 36$
useful logic gates in the array because only four of the control
circuit's seven outputs are used to express the binary control commands
for the two robot tracks. For each gate, four bits are needed to express
the 16 possible logic gates with two inputs and one output. The
chromosome length is therefore $36 \times 4 = 144$. We map the
two-dimensional logic circuit to a one-dimensional chromosome by
concatenating adjacent rows~\cite{Louis}.
%This is shown schematically in Figure~\ref{Fig2}.
%\begin{figure}
%\centerline{
% \psfig{figure=figs/fig2.ps,height=3.0in,width=4.5in}
%}
%\caption{Encoding from a $7 \times 6$ control circuit to a chromosome}
%{\label{Fig2}}
%\end{figure}

The simulation process provides the evaluation of a candidate
combinational circuit for controlling the robot. The encoded chromosome
of the combinational circuit is obtained from the GA and evaluated in the
simulation environment using a fitness function that measures how well
the simbot performed its assigned task. The fitness value is returned to
the GA.
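The per-gate encoding is easy to state in code. The following minimal
sketch (Python; not the simulator code, and both the bit order within a
gate's four bits and the wiring between gates are assumptions made for
illustration) decodes a 144-bit chromosome into 36 four-bit gate codes
and evaluates a single gate.
\begin{verbatim}
BITS_PER_GATE = 4  # four bits select one of the 16 two-input gates

def decode_gate_codes(chromosome):
    # Split the 144-bit chromosome (rows concatenated) into 36
    # four-bit gate codes, most significant bit first (an assumption).
    assert len(chromosome) == 144
    codes = []
    for i in range(0, len(chromosome), BITS_PER_GATE):
        b3, b2, b1, b0 = chromosome[i:i + BITS_PER_GATE]
        codes.append((b3 << 3) | (b2 << 2) | (b1 << 1) | b0)
    return codes

def gate_output(code, a, b):
    # A gate code in [0, 15] is the truth table of a two-input,
    # one-output logic gate: bit (2a + b) of the code is the
    # gate's output for inputs (a, b).
    return (code >> ((a << 1) | b)) & 1
\end{verbatim}
For example, code 8 (binary 1000) outputs 1 only for inputs $(1,1)$ and
so realizes AND, while code 14 (binary 1110) realizes OR.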
%The simulation pseudocode is shown below:
%\newline
%\begin{tabbing}
%{\bf Be}\={\bf gin} \\
%\> Re\=ad in the chromosome to be tested from\\
%\>\> the GA; \\
%\> Decode the chromosome into a control circuit; \\
%\> Initialize the simulation environment; \\
%\> Initialize the sensor statuses of the simbot; \\
%\> $For$ $each$ $simulation$ $step:$ \\
%\> {\bf Be}\={\bf gin} \\
%\>\> Calculate outputs of the control circuit; \\
%\>\> Calculate turning degree and step length; \\
%\>\> Calculate simbot's new position for next step; \\
%\>\> Ca\=lculate contribution to fitness according to the simbot's \\
%\>\>\> Performance in this step; \\
%\>\> Re\=set the sensor statuses according to their \\
%\>\>\> Modified position; \\
%\> {\bf End} \\
%\> Record trace into a data file if required;\\
%\> Return fitness value to the GA; \\
%{\bf End} \\
%\end{tabbing}

\section{Experimental Setup}
{\label{params}}
In the first three experiments, the robot was trained to develop the
three basic behaviors of food approach, obstacle avoidance, and wall
following, separately, in three different simulation environments. The
environments are shown in Fig.~\ref{Fig345}. The solutions found here
serve as seeds for developing control strategies for a navigation task
(find all the food) in a complex environment that looks like an office
area, with rooms (open space) separated by walls (large obstacles). There
are four food sources distributed among four of the nine rooms.

Each simulation consists of $1,000$ time steps. Both the starting
position of the robot and the initial sensor values are fixed for each
experiment. The performance of the robot was evaluated using a fitness
function comprising seven components, calculated at each time step and
summed over all time steps for a final fitness value. The seven
components are listed in Table~\ref{Tab60}.
\begin{table}
\caption{The components of the fitness function}
\begin{center}
\begin{tabular}{|c|l|} \hline
Parameter & Description \\ \hline
f1 & a bonus for hearing food \\
f2 & a penalty for collision \\
f3 & a bonus for long, straight, and forward motion \\
f4 & a bonus for moving along an obstacle or boundary \\
f5 & a penalty for a head-on touch against an obstacle or boundary \\
f6 & a bonus for eating all food in less than 1,000 steps \\
f7 & a huge bonus for eating food \\ \hline
\end{tabular}
{\label{Tab60}}
\end{center}
\end{table}
The fitness of a candidate circuit is a weighted sum of these seven
components, and we can emphasize one or more behaviors for the simulated
robot by adjusting the weights. We ran the GA ten times for each
experiment with different random seeds. In all experiments, the genetic
algorithm's population size was $100$, run for $100$ generations. Each
chromosome was $144$ bits long. The crossover probability was $1.0$ and,
as mentioned earlier, no mutation is used during normal CHC processing.

\section{Results and Analysis}
{\label{results}}
\subsection{Evolving Three Basic Behaviors}
In the first three experiments, we used CHC without scaling, and the
entire initial population was randomly generated. The threshold for the
Hamming distance was $(\mbox{length-of-chromosome} / 4) = 36$, as is the
norm for CHC. Scaled CHC was used in the complex environment to overcome
the possible monopoly of solutions with extremely high fitness (caused by
injection).
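In our notation (a summary of the description above rather than a formula
taken from the simulator), each experiment's fitness thus has the general
form
\[
F = \sum_{t=1}^{1000} \sum_{i=1}^{7} w_i \, f_i(t) ,
\]
where $f_i(t)$ is the value of fitness component $i$ at time step $t$ and
the weight vector $w$ is all that changes from one behavior to the next.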
The fitness functions that were used are shown below:
\be
\item {\bf Food Approach (FA):}
\( f1 + f2 + f3 + f4 + f5 + f6 + f7 \)
\item {\bf Obstacle Avoidance (OA):}
\( f1 + (5 \times f2) + f3 + f4 + (5 \times f5) + f6 + f7 \)
\item {\bf Wall Following (WF):}
\( f1 + f2 + f3 + (5 \times f4) + f5 + (2 \times f6) + f7 \)
\ee
\begin{figure}%[htp]
\center{
\begin{minipage}[t]{2.0in}
%\centerline{
\psfig{figure=figs/fig3.ps,height=1.5in,width=1.5in}
%}
\end{minipage}
%\caption{The simbot's path when learning food approach behavior}
%{\label{Fig3}}
%\end{figure}
%\begin{figure}[htp]
%\centerline{
\begin{minipage}[t]{2.0in}
\psfig{figure=figs/fig4.ps,height=1.5in,width=1.5in}
\end{minipage}
%}
%\caption{The simbot's path when learning obstacle avoidance behavior}
%{\label{Fig4}}
%\end{figure}
%\begin{figure}[hbp]
%\centerline{
\begin{minipage}[t]{2.0in}
\psfig{figure=figs/fig5.ps,height=1.5in,width=1.5in}
\end{minipage}
%}
}
\caption{The simbot's path when learning food approach (left), obstacle
avoidance (middle), and wall following (right) behavior}
{\label{Fig345}}
\end{figure}
CHC proved to be a reasonable and effective method for designing basic
control strategies for a simbot. As shown in Fig.~\ref{Fig345}, the
simbots successfully evolved the expected basic behaviors of food
approach, obstacle avoidance, and wall following. The figure depicts the
path taken by the best individual from each of the first three
experiments. Note that the control circuits may not be optimal.

\subsection{Designing Control Strategies in an Office Environment}
In the next set of experiments, we designed the control strategies of a
simbot in a complex office environment by injecting a suitable number of
appropriate cases into the GA's initial population. First, we copied one
case corresponding to the best individual for a basic behavior from each
of the first three experiments, for a total of three cases. Second, five
cases were selected from the best $30$ candidates of the first three
experiments' results according to the candidates' fitness values {\em in
the office environment}. We found that injecting five cases produced
better performance than injecting a larger or smaller number of cases.
% as shown in Figure~\ref{tr5}.
We believe that injecting a larger number of cases leads to insufficient
exploration, while injecting a smaller number of cases leads to
insufficient exploitation; five percent is a happy medium. We call the GA
injected with these cases the {\bf T}arget {\bf R}anked {\bf G}enetic
{\bf A}lgorithm or TR-GA, and the GA injected with the best cases in the
basic behavior environments the {\bf S}ource {\bf R}anked {\bf G}enetic
{\bf A}lgorithm or SR-GA. We also ran the GA with a randomly initialized
population (RI-GA) for comparison purposes.
%\begin{figure}[hbp]
%\centerline{
% \psfig{figure=fig73.ps,height=1.75in,width=2.5in}
%}
%\caption{Comparing different injection percentages}
%{\label{tr5}}
%\end{figure}
%The maximum and average performance curves over $10$ runs of the
%genetic
The maximum performance curves over $10$ runs of the genetic algorithm
are shown in Fig.~\ref{Fig6}.% and~\ref{Fig7} respectively.
\begin{figure}[htp]
%
\centerline{
\psfig{figure=figs/fig6.ps,height=2.0in,width=2.5in}
}
\caption{The genetic algorithm maximum performance curves}
{\label{Fig6}}
\end{figure}
%\begin{figure}[htp]
%\centerline{
% \psfig{figure=figs/fig7.ps,height=2.0in,width=2.5in}
%}
%\caption{The genetic algorithm average performance curves}
%{\label{Fig7}}
%\end{figure}
As we can see, the TR-GA significantly outperformed its competitors.
Although Fig.~\ref{Fig6}
% and~\ref{Fig7}
compares a TR-GA with five injected individuals to an SR-GA that used
three injected individuals, the TR-GA with three injected individuals
also did better than the SR-GA, while not doing as well as the TR-GA with
five. Somewhat surprisingly, the randomly initialized GA did better than
the SR-GA, indicating that the best control strategies for the basic
behaviors may not contain building blocks that help navigation in the
office environment and/or may be of low enough fitness to be eliminated
during GA processing. More evidence is presented in Table~\ref{Tab2},
which compares the fitness, in the office environment, of the cases
injected into the SR-GA with those injected into the TR-GA. We also note
that solutions with high initial fitness in the target office environment
may be ranked low in their source environment.
\begin{table}
\center
\caption{Cases used for SR-GA and TR-GA and their fitness in the office
environment. FA = food approach, WF = wall following, OA = obstacle
avoidance.}
\begin{tabular}{|c|l|c|} \hline
Case Source & Case & Fitness in Office \\
 & & Environment \\ \hline
 & Best of FA & 2,807 \\ % & 69 \\
SR-GA & Best of OA & 2,000 \\ % & 76 \\
 & Best of WF & 1,220 \\ % & 79 \\
\hline \hline
 & FA-1 & 15,524 \\ % & 73 \\
 & WF-1 & 13,503 \\ % & 55 \\
TR-GA & FA-2 & 10,414 \\ % & 49 \\
 & WF-2 & 6,834 \\ % & 73 \\
 & OA-1 & 6,754 \\ % & 71 \\
\hline \hline
\end{tabular}
{\label{Tab2}}
\end{table}
The results indicate that injecting appropriate cases into the initial
population of a genetic algorithm can not only speed up convergence but
also provide better quality solutions. However, this only works if the
injected solutions contain useful building blocks for solving the new
problem, that is, if the injected solutions are similar enough to
solutions for the new problem. The assumption that problem similarity
implies solution similarity is a prerequisite for our system to perform
well~\cite{xliu,isca_sf}; however, when trying to combine several
solutions, we had to re-validate this assumption by evaluating and
ranking candidates for injection in the new target environment. Previous
results had not indicated the need for ranking cases in the new
environment before injection~\cite{xliu}.

However, we obtained good agreement in our estimate of the number of
individuals to inject. Earlier work had shown that injecting only a small
percentage of the population led to good performance, while injecting
larger percentages led to quick convergence to a local
optimum~\cite{xliu}. This agrees with the experimental results reported
in this paper, where we found that injecting five individuals ($5\%$ of
the population) provided better performance than injecting a smaller or
larger number of individuals. In addition, we need to make sure that the
injected individuals contain at least one representative of each basic
behavior; otherwise, the missing basic behavior may have to be evolved
from scratch -- from the randomly initialized component of the
population.
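A minimal Python sketch of this selection procedure follows; as before,
the names are hypothetical stand-ins, with {\tt case\_base} mapping each
basic behavior to its pool of stored strategies and {\tt target\_fitness}
evaluating a chromosome in the office environment.
\begin{verbatim}
def select_cases(case_base, target_fitness, n_inject=5):
    # Assumes n_inject >= the number of basic behaviors.
    assert n_inject >= len(case_base)
    selected = []
    for pool in case_base.values():
        # Guarantee one representative per basic behavior: the
        # case from this pool that scores best in the target.
        selected.append(max(pool, key=target_fitness))
    # Leftover candidates compete for the remaining slots on the
    # basis of their target-environment fitness.
    leftovers = [c for pool in case_base.values()
                 for c in pool if c not in selected]
    leftovers.sort(key=target_fitness, reverse=True)
    selected.extend(leftovers[:n_inject - len(selected)])
    return selected
\end{verbatim}
The selected cases then occupy the injected slots of the initial
population, with the remainder generated at random as in the seeding
sketch given earlier.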
Once we have individuals representing each of the basic behaviors, the
rest of the candidates for injection compete for the remaining slots on
the basis of their performance in the target environment. This ensures
that the population is initialized with the needed variety of
high-performance building blocks.

Figure~\ref{Fig8} presents the path of a simbot, controlled by a circuit
designed by the TR-GA, in the office environment. Note that although the
environment contains many traps in the form of rooms with small doorways,
the simbot does not get trapped and manages to avoid two unpromising
rooms altogether. The TR-GA-designed simbot also eats $70\%$ of the food
over ten runs, compared to only $40\%$ for the randomly initialized GA.
%The control circuit for this simbot is shown in Figure~\ref{Fig9}.
\begin{figure}[htp]
\centerline{
\psfig{figure=figs/fig8.ps,height=1.75in,width=2.5in}
}
\caption{Simbot path in an office environment for a circuit designed by
the TR-GA}
{\label{Fig8}}
\end{figure}
%\begin{figure}[htp]
%\centerline{
% \psfig{figure=figs/fig9.ps,height=3.0in,width=4.5in}
%}
%\caption{The control circuit corresponding to the simbot}
%{\label{Fig9}}
%\end{figure}

\section{Conclusions and Future Work}
This paper demonstrates that we can evolve basic behaviors and adapt to
the environment using CHC, a non-traditional genetic algorithm. Injecting
selected solutions, stored in a long term memory and corresponding to
these basic behaviors, into the GA's initial population allows us to
quickly and successfully design control strategies for a robot navigating
a complex office environment. The experimental results are promising: the
simulated robot is faster and accomplishes more of the task than the
robot designed by a randomly initialized GA.

We are currently investigating parallelization of the code to handle a
larger population size in a reasonable amount of time. This will allow us
to handle more complex environments. We are also planning to transfer the
evolved circuits to a real mobile robot, thus testing our work on
physical hardware with all its concomitant problems. We will be
investigating the effect of noise on performance -- circuits evolved in
the presence of noise may be more robust and better able to handle the
noise inherent in a real mobile robot operating in a complex environment.

We have only reported on non-randomly initialized genetic algorithms in
this paper. However, the concept extends to other population-based
searches like evolutionary programming, evolution strategies, and genetic
programming. In addition, there is no reason why injection of individuals
should only take place at initialization -- we can inject individuals
during the course of the GA's run. We believe that investigating the
combination of population-based search algorithms with a long term memory
promises to be a fruitful area of future research. The hope is that as
the number of problems solved by the combined system grows, the time
taken to solve a new problem shrinks.

\section{Acknowledgements}
This material is based upon work supported by the National Science
Foundation under Grant No. 9624130.
\bibliographystyle{plain}
\bibliography{biblio}
\end{document}