\documentstyle[leqno,twoside,11pt,ltexproc]{article}

\input{psfig}

\begin{document}
\bibliographystyle{siamproc}
\cleardoublepage
\pagestyle{myheadings}

\title{Chapter 1 \\
Parallel Computation of Steiner Minimal Trees}
\author{Frederick C. Harris, Jr.\thanks{Department of Computer Science,
University of Nevada, Reno, NV 89557-0148, fredh@cs.unr.edu}}
\date{}
\maketitle
\markboth{Harris}{Parallel Computation of Steiner Minimal Trees}

\pagenumbering{arabic}

\begin{abstract}
	Given a set of  $N$ cities, construct a connected network which
	has minimum length.  The problem is simple enough, but the
	catch is that you are allowed to add junctions in your
	network.  Therefore the problem becomes how many extra
	junctions should be added, and where should they be placed so
	as to minimize the overall network length.

	This intriguing optimization problem is known as the Steiner
	Minimal Tree Problem (SMT), where the junctions that are added
	to the network are called Steiner Points.

	The focus of this paper is the parallel computation for the
	generation of what Pawel Winter termed T\_list and its
	implementation.  This generation of T\_list is followed by the
	extraction of the proper answer.  When Winter developed his
	algorithm, the time for extraction dominated the overall
	computation time.  After Cockayne and Hewgill's work, the time
	to generate T\_list dominated the overall computation time.
	The ideas we present were implemented in a program called
	PARSTEINER94, and the results show that the time to generate
	T\_list has now been cut by an order of magnitude.   So now the
	extraction time once again dominates the overall computation
	time.
\end{abstract}

\section{Introduction}

	Minimizing a network's length is one of the oldest
	optimization problems in mathematics and, consequently, it has
	been worked on by many of the leading mathematicians in
	history.  In the mid-seventeenth century a simple problem was
	posed:  Find the point $P$ that minimizes the sum of the
	distances from $P$ to each of three given points in the plane.
	Solutions to this problem were derived independently by Fermat,
	Torricelli and Cavalieri.  They all deduced that either $P$ is
	inside the triangle formed by the given points and that the
	angles at $P$ formed by the lines joining $P$ to the three
	points are all $120^{\circ}$, or $P$ is one of the three
	vertices and the angle at $P$ formed by the lines joining $P$
	to the other two points is greater than or equal to
	$120^{\circ}$.

	In the nineteenth century a mathematician at the University of
	Berlin, named Jakob Steiner, studied this problem and
	generalized it to include an arbitrarily large set of points in
	the plane.  This generalization created a star when $P$ was
	connected to all the given points in the plane, and is a
	geometric approach to the 2-dimensional center of mass
	problem.

	In 1934 K\"{o}ssler and Jarn\'{i}k generalized the network
	minimization problem even further~\cite{jk34:omgodb}:  Given
	$n$ points in the plane find the shortest possible connected
	network containing these points.  This generalized problem,
	however, did not become popular until the book, {\it What is
	Mathematics}, by Courant and Robbins~\cite{cr41:wim}, appeared
	in 1941. Courant and Robbins linked the name Steiner with this
	form of the problem proposed by K\"{o}ssler and Jarn\'{i}k, and
	it became known as the Steiner Minimal Tree problem.  The
	general solution to this problem allows multiple points to be
	added, each of which is called a Steiner Point.

	Much is known about the exact solution to the Steiner Minimal
	Tree problem.  Those who wish to learn about some of the
	spin-off problems are invited to read the introductory article
	by Bern and Graham~\cite{bg89:tsnp}, the excellent survey paper
	on this problem by Hwang and Richards~\cite{hr92:survey}, or
	the recent volume in The Annals of Discrete Mathematics devoted
	completely to Steiner Tree problems~\cite{hrw:annals}.  Some of
	the basic pieces of information about the Steiner Minimal Tree
	problem that can be gleaned from these articles are: (i) Melzak
	developed the first algorithm for Steiner Minimal Trees in
	1961 \cite{melzak61:otpos}, (ii) all of the original $n$ points will be
	of degree 1, 2, or 3, (iii) the Steiner Points are all of
	degree 3, (iv) any two edges meet at an angle of at least
	$120^{\circ}$ in the Steiner Minimal Tree, and (v) at most
	$n-2$ Steiner Points will be added to the network.

	This paper concentrates on the Steiner Minimal Tree problem,
	henceforth referred to as the SMT problem.  In Section 2 we
	review the known algorithms for calculating Steiner Minimal
	Trees. In Section 3 we present the first parallel algorithm for
	calculating SMT's.  Some new results are presented in Section~4,
	with conclusions and future work in Section~5.

\section{Current Implementations}

	The first computational results were presented by Cockayne and
	Schiller~\cite{cs72:cosmt}.  Their program, called STEINER, was
	written in FORTRAN on an IBM 360/50 at the University of
	Victoria.  STEINER could calculate the SMT for any 7 point
	problem in less than 5 minutes of cpu time.  When the problem
	size was increased to 8 the SMT could be found if 7 of the
	vertices were on the Steiner Hull.  When this condition held it
	could calculate the SMT in under 10 minutes, but if this
	condition did not hold it would take an unreasonable amount of
	time.  

	Cockayne called STEINER a prototype for calculating SMT's   and
	allowed Boyce and Seery of Bell Labs to obtain a copy of his
	code to improve the work.  They improved the code, renamed it
	STEINER72, and were able to calculate the SMT for all 9 point
	problems and most 10 point problems in a reasonable amount of
	time~\cite{bs75:aivocspsftmnp}.  Boyce and Seery continued
	their work and developed another version of the code that they
	thought could solve problems of size up to 12 points, but 
	they reported no computation times.

	Melzak had identified invalid tree structures in his
	work~\cite{melzak61:otpos} but these were not characterized
	until Pawel Winter's work in 1981.  Winter's characterizations
	were based upon several geometric constructions which enable
	many of the possible combinations previously generated to be
	eliminated.  He implemented these  improvements in a program
	called GEOSTEINER~\cite{winter85:aaftspitep} which he wrote in
	SIMULA 67 on a UNIVAC-1100.

	In his work, Winter split the computation process into two
	phases.  The first phase, called T\_list Generation, generates
	a set of Full Steiner Trees (FST's) which have $n$ vertices and
	$n-2$ Steiner Points. The second phase is called Extraction
	and is a branch and bound process, primed with the length of
	the MST, looking for the shortest subset that covers all the
	original vertices.

	GEOSTEINER could calculate in under 20 seconds SMT's for all
	randomly generated sets with 15 points.  This improvement was
	put into focus when he mentioned that all previous
	implementations took more than an hour for non-degenerate
	problems of size 10 or more.  Winter tried randomly-generated
	20-point problems but did not give results since some of them
	did not finish in his cpu time limit of 30 seconds.  He made
	only two comments for problems bigger than size 15: (i) the
	extraction time was dominating the overall computation time and
	(ii) ``with further improvements, it is reasonable to assert
	that point sets up to 30 V-points could be solved in less than
	an hour~\cite{winter85:aaftspitep}.''

\newpage
%This was put here to eliminate orphans and widows on this and the next
% few pages.

	The next major program, EDSTEINER86, was developed in FORTRAN
	on an IBM 4381 by Cockayne and Hewgill~\cite{ch86:ecosmtitp}.
	This implementation was based upon Winter's results but had
	enhancements in the extraction process.  EDSTEINER86 was able
	to calculate the FST for 79 out of 100 randomly-generated
	32-point problems.  For these 79 problems the cpu time for
	T\_list varied from 1 to 5 minutes, while the extraction time
	never exceeded 70 seconds.

	Cockayne and Hewgill subsequently improved their SMT program
	and renamed it EDSTEINER89~\cite{ch92:icopsmt}.  This
	improvement was completely focused on the extraction process.
	EDSTEINER89 was still written in FORTRAN but was run on a SUN
	3/60 workstation.  They randomly generated 200 32-point
	problems to solve and found that the generation of T\_list
	dominated the computation time for problems of this size.  The
	average time for T\_list generation was 438 seconds while the
	average time for forest management and extraction averaged only
	43 seconds.  They then focused on 100 point problems and set a
	cpu limit of 12 hours.  The average cpu time to generate
	T\_list was 209 minutes for these problems, but only 77
	finished the extraction in the cpu time limit.  These programs
	and their results are summarized in
	Table~\ref{table:code.results}.

                \begin{table}[h]
                \begin{center}
                \begin{tabular}{|l|l|l|r|}\hline
                Program & Author(s) & Institution &  Points \\ \hline
                STEINER & Cockayne \& Schiller & Univ of Victoria  &  7 \\
                STEINER72 & Boyce \& Seery & ATT Bell Labs & 10 \\
                STEINER73 & Boyce \& Seery & ATT Bell Labs & 12 \\
                GEOSTEINER & Winter & Univ of Copenhagen  & 15 \\
                EDSTEINER86 & Cockayne \& Hewgill & Univ of Victoria & 30 \\
                EDSTEINER89 & Cockayne \& Hewgill & Univ of Victoria & 100 \\
                \hline
                \end{tabular}
                \end{center}
                \caption{\label{table:code.results}SMT Programs, authors,
                        and results.}
                \end{table}


\section{Parallel Algorithm}

	Since the sequential version of the code does not lend itself
	to easy parallelization, one needs to back up and develop an
	understanding for how the algorithm works.  The first thing
	that is obvious from the code is that you select a left subtree
	and then try to mate it with possible right subtrees.  These
	subtrees, and the resulting children,  are kept on Q\_list
	which was primed with all the original vertices.  Upon further
	examination we come to the conclusion that a left tree will
	mate with all trees that are shorter than it, and all trees of
	the same height that appear after it on Q\_list, but it will
	never mate with any tree that is taller.

	This analysis yields the need for the first function in our
	parallel algorithm which we will call \verb+a_before_b()+.
	This piece of code will determine if node $a$ would have come
	before node $b$ in Winter's Q\_list.  This is necessary since
	in a parallel implementation the calculations and additions to
	Q\_list will not happen in the same order as they would have in
	the sequential version.  With this function  we are able to
	finish the function \verb+mate()+.  This is a boolean function
	which determines if node $a$ (left) can mate with node $b$
	(right).

	Now that the foundational building blocks are completed, design
	attention can be turned to the main algorithm.  The description
	of this is in a master--slave perspective.  This perspective
	was taken due to the structure of most parallel architectures,
	as well as the fact that all nodes on the Q\_list need a
	sequencing number assigned to them. The master will therefore
	be responsible for numbering the nodes and maintaining the main
	Q\_list and T\_list.

	The idea for the master--slave interaction came from a paper
	by Quinn and Deo~\cite{qd85:ubspbbbba}, on parallel algorithms
	for Branch-and-Bound problems.  Their idea was to let the
	master have a list of work that needs to be done.  Each slave
	is assigned to a processor.  Each slave requests work, is given
	some, and during its processing creates more work to be done.
	This new work is placed in the master's work list, which is
	sorted in some fashion.  When a slave runs out of work to do,
	it requests more from the master.  They noted that this leaves
	some processors idle at times (particularly when the problem
	was starting and stopping), but this approach provides the best
	utilization if all branches are independent.

	This description almost perfectly matches the problem at hand.
	First, we will probably have a fixed number of processors which
	can be determined at runtime. Second we have a list of work
	that needs to be done.  The hard part is implementing a sorted
	work list in order to obtain a better utilization.  This is
	implemented in what we term the Proc\_list, which is a list of
	the processes that either are currently running or have not yet
	started.  This list is primed with the information about the
	initial members of Q\_list, and for every new node put on the
	Q\_list, a node which contains information about the Q\_list
	node, is placed on the Proc\_list in the a\_before\_b() sorted
	order.


\section{Results}

	From the literature it is obvious that the current standard for
	calculating SMT's has been established by Cockayne and
	Hewgill.  Their work on SMT's has pushed the boundary of
	computation out from the 15-point problems of Winter to a large
	percentage of 100-point problems.

	Cockayne and Hewgill, in their investigation of the
	effectiveness of EDSTEINER89, randomly generated 100 problems
	with 100 points inside the unit square.  They set up a CPU
	limit of 12 hours, and 77 of 100 problems finished within that
	limit.  They described the average execution times as follows:
	T\_list construction averaged 209 minutes, Forest Management
	averaged 27 minutes, and Extraction averaged 10.8 minutes.

	The parallel algorithms we have outlined here have been
	implemented in a program called PARSTEINER94.  This
	implementation is only the second SMT program since Winter's
	GEOSTEINER in 1981 and is the first parallel code.  The major
	reason that the number of SMT programs is so small is that any
	implementation is necessarily complex.

	While preparing the code for this work, Cockayne and Hewgill
	were kind enough to supply us with 40 of the problems generated
	for \cite{ch92:icopsmt} along with their execution times.
	These data sets were given as input to PARSTEINER94, and the
	calculation timed.  For all 40 cases, the average time to
	generate T\_list was less than 20 minutes.  This is exciting
	because we have been able to generate T\_list properly, while
	cutting an order of magnitude off the time.  A sample of what
	T\_list for a 100 point problem looks like is presented in
	Figure~\ref{fig:t_list}, while the extracted answer is in
	Figure~\ref{fig:SMT}.  Table~\ref{table:tlist.times} contains a
	comparison of the times of the first 10 cases just described.

        \begin{figure}
        \centerline{\psfig{figure=tlist.ps,height=3.0in}}
       \caption{\label{fig:t_list}T\_list for a random set of points.}
        \end{figure}

        \begin{figure}
        \centerline{\psfig{figure=smt.ps,height=3.0in}}
        \caption{\label{fig:SMT}SMT extracted from T\_list for a random set of points.}
        \end{figure}

                \begin{table}
                \begin{center}
                \begin{tabular}{|r|r|r|}\hline
                Test Case & PARSTEINER94 & EDSTEINER89 \\ \hline \hline
                1 & 650      & 8597 \\ \hline
                2 & 1031     &13466  \\ \hline
                3 & 1047     &15872  \\ \hline
                4 & 1687     &17061  \\ \hline
                5 & 874      &13258  \\ \hline
                6 & 1033     &15226  \\ \hline
                7 & 1164     &12976  \\ \hline
                8 & 1109     &16697  \\ \hline
                9 & 975      &15354  \\ \hline
                10 & 554     &8650   \\ \hline
                \end{tabular}
                \end{center}
                \caption{\label{table:tlist.times}
                Comparison of T\_list computation times (in seconds).}
                \end{table}

\section{Conclusions and Future Work}

	These results are quite promising for various reasons.  First,
	the parallel implementation presented in this work is scalable,
	and therefore could be run with many more processors, thereby
	enhancing the speedup provided.  Second, with the PVM platform
	used, we can in the future port this work to a real parallel
	MIMD machine,  which will have much less communication
	overhead, or to a shared memory machine, where the
	communication could all but be eliminated.  In either case we
	would expect the speedup to improve much more.

	Enhancements are already under way to parallelize the
	Extraction process and thereby speed up the calculation even
	further.  A new method of parallelizing the T\_list generation
	is also currently being looked into which at first glance would
	appear to decrease the communication and would thereby allow
	the calculation to proceed with less delays.

\begin{thebibliography}{10}

\bibitem{bg89:tsnp}
M.~Bern and R.~Graham, {\em The shortest-network problem}, Sci. Am., 260
  (1989), pp.~84--89.

\bibitem{bs75:aivocspsftmnp}
W.~Boyce and J.~Seery, {\em {STEINER} 72 -- an improved version of {C}ockayne
  and {S}chiller's program {STEINER} for the minimal network problem}, Tech.
  Rep.~35, Bell Labs., Dept. of Computer Science, 1975.

\bibitem{ch86:ecosmtitp}
E.~Cockayne and D.~Hewgill, {\em Exact computation of {S}teiner minimal trees
  in the plane}, Info. Process. Lett., 22 (1986), pp.~151--156.

\bibitem{ch92:icopsmt}
\leavevmode\vrule height 2pt depth -1.6pt width 23pt, {\em Improved computation
  of plane {S}teiner minimal trees}, Algorithmica, 7 (1992), pp.~219--229.

\bibitem{cs72:cosmt}
E.~Cockayne and D.~Schiller, {\em Computation of {S}teiner minimal trees}, in
  Combinatorics, D.~Welsh and D.~Woodall, eds., Maitland House, Warrior Square,
  Southend-on-Sea, Essex SS1 2J4, 1972, Mathematical Institute, Oxford, Inst.
  Math. Appl., pp.~52--71.

\bibitem{cr41:wim}
R.~Courant and H.~Robbins, {\em What is Mathematics? an elementary approach to
  ideas and methods}, Oxford University Press, London, 1941.

\bibitem{hr92:survey}
F.~Hwang and D.~Richards, {\em Steiner tree problems}, Networks, 22 (1992),
  pp.~55--89.

\bibitem{hrw:annals}
F.~Hwang, D.~Richards, and P.~Winter, {\em The Steiner Tree Problem}, vol.~53
  of Ann. Discrete Math., North-Holland, Amsterdam, 1992.

\bibitem{jk34:omgodb}
V.~{Jarn\'{i}k} and O.~{K\"{o}ssler}, {\em O minim\'{a}lnich gratech
  obsahujicich n dan\'{y}ch bodu [in {C}zech]}, Casopis Pesk. Mat. Fyr., 63
  (1934), pp.~223--235.

\bibitem{melzak61:otpos}
Z.~Melzak, {\em On the problem of {S}teiner}, Canad. Math. Bull., 4 (1961),
  pp.~143--150.

\bibitem{qd85:ubspbbbba}
M.~Quinn and N.~Deo, {\em An upper bound for the speedup of parallel best-bound
  branch-and-bound algorithms}, BIT, 26 (1986), pp.~35--43.

\bibitem{winter85:aaftspitep}
P.~Winter, {\em An algorithm for the {S}teiner problem in the {E}uclidian
  plane}, Networks, 15 (1985), pp.~323--345.

\end{thebibliography}
\end{document}
