\chapter{Linear Octree-based Parallel Collision Detection Algorithm}

\section{Introduction}
        As presented in chapter 2, collision detection is a
        multi-faceted problem requiring individual solutions to 
        subsets of the collision detection problem. Current collision
        detection algorithms use a collection of algorithms to solve
        subset problems. The relationships between collision detection
        algorithms and how they can be used in conjunction with one
        another is best described as a pipeline. By optimizing each
        algorithm used in a collision detection pipeline, fast solutions
        to the general collision detection problem can be devised.

	This chapter describes the data structures and algorithms explored
	in the development of a parallel collision
	detection algorithm designed to solve the all-pairs problem. 

        This chapter is organized into four sections. In the first two
        sections, two octree based data-structures considered for use in
        a parallel collision detection algorithm are described. In the
        third section a new parallel collision detection algorithm for
        solving the all-pairs problem is presented. Finally, 
        an algorithmic analysis of the new algorithm is presented.

\section{Distributed Octree Data Structure}

	In the process of designing a parallel collision detection
	algorithm, two strategies were explored, the first of which
	involved pursuing a distributed octree implementation that would
	facilitate parallel processing for collision detection.

	A distributed octree is an octree data-structure in which
	the octants, and the objects they contain, are assigned to
	separate processors (ref fig). In this configuration, objects
	can be operated on concurrently by the processors that manage
	the octants within which the data are contained.

	A distributed octree data structure has several qualities that
	make it appropriate for implementing a parallel collision
	detection algorithm.

	\begin{enumerate}
	\item Decomposition of an octree hierarchy for distribution is
	easy to understand and implement as a modified
	pointer-based tree. In the description of pointer-based octrees
	from Chapter 2, an octree node stores pointers to its child
	nodes. To facilitate the distribution of octant nodes to multiple
	processors, pointers used for referencing child octants are
	generalized, enabling them to reference a processor that manages
	octants. From the implementor's perspective a distributed
	octree, the generalized pointer can be viewed as addresses used
	for communicating with processors that manage octants.

	\item Distributing an octree across a cluster of networked computers
	is conceptually the same as distributing an octree across the
	processors of a shared-memory, multi-processor computer. As a
	function of the structure of an octree, clearly defined boundaries
	between regions of space are created. As a result of the
	logical structure of an octree, octree data structures have
	clearly defined regions by which data are segregated. In an
	octree, these boundaries are defined by the volume of an octant,
	while in an octree data structure, the boundaries are defined
	by the tree nodes in which data are stored. This feature allows
	tree nodes to be physically separated without compromising the
	logical structure of the octree. To distribute an octree data
	structure over a cluster of networked computers, pointers to
	octants are allowed to be a computer network address where child
	octants are located. 

	\item When an octree is distributed across a cluster of computers,
	the aggregate size of the octree data-structure can exceed the
	largest octrees that can be created on a single computer. This
	allows for larger octree to be constructed. The maximum size of
	an octree which can be created on a single computer is limited by
	the amount of random access memory needed for storing the octree
	data-structure. By distributing octants to several computers,
	the memory from each computer contributes to the total space
	available for storing the octree data-structure.

	\item Distribution of data processing is a function of storing data
	in octants. As objects are placed in octants that bind their
	position and volume, they are accordingly distributed among the
	processors that manage the octants.

	\item The complexity of distributing an octree across multiple
	processors does not grow as a function of the number of
	processors over which it is distributed. As a result of the
	structure of an octree, once a program has been written to handle
	a single distributed octant, the same program should scale to
	any number of distributed octants.
	\end{enumerate}

	The layout of a distributed octree can be visualized as follows:

		(insert picture of octree sub-hierarchies on processors)

	The concept behind the first attempted parallel collision
	detection algorithm  was to use a distributed octree to partition
	and store objects on multiple processors. Each processor would
	then perform collision detection between objects they stored
	locally.

	However, after studying distributed octree implementations,
	several undesirable qualities were revealed that make
	distributed octrees unsuitable for use in a parallel collision
	detection algorithm. The two undesirable qualities
	are that distributed octrees require load balancing to remain
	efficient and, in some cases, distributed octrees must store
	the same object on multiple processors to build a tight fitting
	bounding volume for the object.

	\subsubsection{Load Balancing}
        One of the undesirable qualities of distributed octrees which makes
        them unsuitable for parallel collision detection is that they
        need to be load-balanced in order to remain efficient. Load
        balancing is necessary when objects become concentrated to
        small regions of space. When this occurs, the objects also become
        concentrated to a small number of octants in the octree. In the
        case of distributed octrees, processors managing octants where
        concentrations of objects exist have a disproportionately high
        workload. As the level of concentration increases, performance
        gained through parallelism decreases.

	For a distributed octree to remain efficient, each processor
	must be assigned an equal number of objects.  This can be
	accomplished by load-balancing the distributed octree over the
	available processors. Load-balancing is performed using three
	operations: dynamic octant splitting, octant migration, and octant
	consolidation.

	Dynamic octant splitting is the process of subdividing octants
	at runtime to create additional levels in an octree. Dynamic
	octant splitting is performed by subdividing the volume of an
	octant into eight new octants and adding them to the octree. After
	which objects contained in recently split octants are inserted
	into the newly created octants (ref figure). The goal of dynamic
	octant splitting is to create additional subdivisions of a
	volume to further segregate objects. If dynamically splitting an
	octant fails to adequately partition a concentration of objects,
	the procedure is repeated until the objects are sufficiently
	segregated. Octants containing high concentrations of objects
	are candidates for dynamic octant splitting.

	(insert figure of dynamic node splitting)

	Octant migration is the process of moving octants from one
	processor to another (ref figure). To move an octant between
	processors, the definition of the octant and the data the octant
	contains are reassigned to a processor. After 
	an octant is reassigned, all pointers that reference the
	octant are updated to reflect the change of management. Depending
	on the architecture of the parallel computer a
	distributed octree is implemented for, octant migration can involve
	as little as reassigning octant pointers, as is the case on a
	multi-processor, shared-memory computer. When a distributed octree
	is implemented for a networked cluster of computers, octant migrations
	involve transporting octant definitions and data across a
	network connection.

	Combined, dynamic octant splitting and octant migration can
	be used to evenly distribute data throughout a distributed octree.

	As a result of repeated dynamic octant splitting and octant
	migration, the structure of a distributed octree becomes
	fragmented over processors and memory. Fragmentation reduces
	the performance of octree operations and consumes excessive
	amounts of memory. To repair a fragmented octree, octant
	consolidation is used to remove unnecessary octants from
	the octree.  Octants are consolidated based on the number of
	objects that exist in a branch of the octree.  If the child
	of an octant contains a small number of objects, the objects are
	moved into the parent and the child is removed. This
	process reduces the memory consumed by the data structure,
	decreases the number of levels in the octree, and decreases the
	separation distance between octant nodes in the parallel computer.

        An example of a distributed octree is described in [Adaptive,
        Multi-resolution Visualization of Large Data Sets Using a
        Distributed Memory Octree]. Although not designed specifically
        for collision detection, the distributed octree implementation
        discussed in this article can also be used for parallel
        collision detection. A distributed octree is used for storing
        and manipulating scientific data sets for visualization. By
        distributing a data set across multiple processors, operations
        on data can be performed in parallel. Due to the large amounts of
        data, a distributed octree is well suited for this type of application.

	\subsubsection{Object-Octant Membership} 
	Object-octant membership is a term that will be used to describe
	which octants an object intersects. The other undesirable
	quality of distributed octrees that makes them unsuitable for
	parallel collision detection is how they manage object-octant
	membership. The fundamental purpose of using an octree is to
	segregate objects based on their position. Octrees facilitate
	this purpose by creating a spacial relationship between objects
	based on their position.  In octrees, a spacial relationship
	exists between two objects that occupy a common octant.

	Several methods have been developed to determine the object-octant
	membership of an object. In this discussion, a search method
	will be used to illustrate the problems with distributed octrees.

	As discussed in chapter 2, a typical strategy for inserting data
	into an octree is performed by using a recursive traversal of
	an octree beginning at the root. The search recursively descends
	the branches of the octree seeking octants that intersect the
	object. Depending on how the algorithm terminates recursive
	descents into the octree, upon completion of the insertion routine,
	a reference to the object will have been added to nodes in the
	octree data structure that correspond to the location and volume
	of the object. The octants that correspond to the nodes in which
	a reference to the object was added define the object-octant
	membership of the object.

	As seen in (ref fig), when the volume of an object
	straddles octants managed by different processors, the object
	must be stored by each processor that manages octants the object
	intersects. 

	Storing objects on multiple processors is problematic for distributed
	octrees because, maintaining the state of an object requires  
	coordinating the processors on which the object is stored.

        Although the difficulties surrounding distributed octrees can
        be overcome, the cost of the solutions in terms of performance
        outweigh the benefit gained from parallelism of collision
        detection. For this reason, distributed octrees are unsuitable
        for use in a parallel collision detection algorithm.

\section{Linear Octree}

	The second data-structure considered for use in a parallel
	collision detection algorithm was a linear octree.

	A linear octree is not a tree-like data-structure, but rather an
	encoding scheme derived from the structure of an octree. Codes
	generated using a linear octree encoding scheme uniquely identify
	voxels positioned within an octree. These codes are used for
	describing three dimensional objects volumetrically based on the
	voxels that intersect the object. The set of codes that describes
	the volume of an object is called a linear octree. A linear octree
	is stored in list data-structure.

	The basic unit of a linear octree is an octal code, also called
	octnode code. An octal code is a sequence of numbers that, when
	read left to right, describe a path from the root of an octree
	to an octant. The numbers used in the sequence of an octal code
	correspond to the indices of octants, ranging from 0 to 7. The
	position of a number in sequence of an octal code corresponds to
	a level in the octree. For example, the octal code \{ 3, 1, 4 \}
	refers to an octant located on the third level of an octree (ref
	fig) and can be found by recursively descending into octants 3,
	followed by 1, and finally by 4.

        The sequence of numbers in an octal code can be conveniently
        stored in an unsigned integer by using the digit
        positions to store numbers from the octal code sequence.
        For example, the sequence \{ 3, 1, 4 \} can be represented by the
        integer 314.

        This encoding method was first introduced by Gargantini in [Gargantini
        - Linear Octrees for Fast Processing of three-dimensional
        objects: Computer Graphics and Image Processing, v. 20, no.
        4, p 365-374] and is also described in [Set operations
        between linear octrees] [Samet] and [Finding the Voxels
        Shared by a Ray and a Linear Octree]. 

	Linear octrees have several qualities that make them useful for
	parallel computing environments, the most useful of which is
	that a linear octree preserves the hierarchical nature of the
	data represented while avoiding the need for a pointer-based
	tree. Furthermore, the data representation of a linear octree
	is convenient for transporting by means of interprocess
	communications.

	Another useful quality of linear octrees is a unique property
	that octal codes exhibit. By sorting a list of octal codes
	represented as integers in ascending order, the resulting
	sequence is the post order traversal of an octree [samet]
	(ref fig). This property of octal codes is useful because it
	provides a method for merging disparate linear octrees into a
	single linear octree and is the cornerstone of the new algorithm
	presented in the next section.

	(insert fig showing post order traversal of octree)

\section{Parallel Linear Octree Collision Detection Algorithm}

	In this section a new collision detection algorithm is proposed
	that uses a linear octree for solving the all-pairs collision
	detection problem.  The algorithm has O(n lg n)/k
	performance, with n being the number of objects and k being the
	number of processors.

        The strategy behind the proposed algorithm is to construct the
        linear octrees of n objects in parallel using k processors and
        then merge the resulting octrees into a single linear octree on a
        single processor. In the description of this algorithm, 
        the following assumptions are made:

        - all objects are cubes 
	- the parallel program consists of one master process
		and k slave processes
	- each slave process either shares a single octree or all
		processes have an exact copy of the same octree
	- the number of objects is an even multiple of the number of
		processes

	\subsubsection{Algorithm outline}

	\begin{enumerate}
	\item The master process divides a list of n objects by k processes and
		distributes a list of n/k objects to each process. The master
		process then waits for all slaves to complete processing
		before execution is resumed.

	\item Each slave waits to receive a list of objects from the
		master. When the list of objects is received, each
		slave builds a linear octree for each object.
		The resulting octal-codes are paired with an index of
		corresponding objects from the main list of objects. All
		resulting octal-code/object-index pairs are stored in
		a single list.

	\item Each slave process sorts their list of octal-code/object-
		index pairs by the integer representation of octal-codes
		in ascending order.

	\item Each slave sends its list of octal-code/object-index pairs
		back to the master process. The slave process is complete.

	\item The master process receives k list of octal-code/object-
		index pairs from the slave processes.

	\item The master process merges k lists of octal-code/object-
		index pairs into a single list.

	\item The master process iterates through the list of octal-
		code/object-index pairs in ascending order and finds
		all pairs of octal-codes that share octants. The indices
		of objcts that correspond to octant-codes that share
		octants are stored in a list
	\end{enumerate}

\subsection{Algorithm Pseudo code}

	The algorithm can be summarized in two pseudo code programs:
	one for the master process and one for the slave processes.

\begin{verbatim}
Master Process()
Begin
  integer k := number of slaves
  address slave[ 0 to k ] := locations of slave processes
  cube object_list[ 0 to n ] := list of n objects 
  Pair<octalcode,integer> slave_lists[0 to k][] := NIL
  Pair<octalcode,integer> master_list[] := NIL
  Pair<integer,integer> colliding_pair_list[] := NIL
  octalcode smallest_octalcode := NIL

// Step 1
  sublist_size := sizeof(object_list[]) / k
  index := 0
  For each x := from 0 to k
  {
    Send( slave[x], object_list[ index to (index + sublist_size) ] )
    index := index + sublist_size
  }

// Step 5
  num_octal_codes := NIL
  For each x := from 0 to k 
  {
    Wait(slave[x])
    Receive(slave[x], slave_list[x])
  }


// Step 6
  master_list[] := MergeSortedLists(slave_lists[])

// Step 7
  index := 0
  For( object1 := 0 ... sizeof(master_list[]) - 1 )
  {
    For( object2 := object1 ... sizeof(master_list) ) 
    {
       if( CommonPath( master_list[object1], master_list[object2] ) == true ) 
       {
         colliding_pair[index] := object1,object2	
         index := index + 1
       }
       else 
       {
         break from inner for-loop
       }
  }
End

Slave Process
Begin
  cube object_list[] := NIL
  octalcode linear_octree[] := NIL
  Pair<octalcode,integer> slave_list[] := NIL
  address master := address of master process

// Step 2
  Wait( master )
  Receive( master, object_list ) 

  For each x := object_list[]
  {
    linear_octree[] := SearchOctree(object_list[x]) 
    For each y := linear_octree[]
    {
      slave_list[] append pair(linear_octree[y],x)
    }
  }

  Sort( slave_list[] )
  Send( master, slave_list )
End

Subroutine Send( address, data ) 
{
  Send "data" to a process at "address"
}

Subroutine Wait( address )
{
  Wait for communications from the process at "address"
}

Subroutine Receive( address, data )
{
  Receive "data" from the process located at "address" 
}

array Subroutine MergeSortedLists( "list of sorted lists" )
{
  Merges a "list of sorted lists" into a single "list"
  return "list"
}

Boolean CommonPath( octalcode1, octalcode2 )
{
  Determines if the octal codes share an octant.
  Two octal codes share an octant if both codes define
  octants on the same path through the octree.

  Return true if two octants are shared, otherwise return
  false.
}

array SearchOctree( cube )
{
  Determine the object-octant membership of an object return
  the octal codes of the octants this object intersects.
}

	Sort( octant[] )
	{
		 Sorts a list of octants in ascending order
	}
	\end{verbatim}

	\subsubsection{Bounding Cube Representations} 

	As this algorithm is used to solve the all-pairs collision detection
	problem, objects are approximated with a bounding volume. In 
	this algorithm, bounding cubes are used. Bounding cubes are
	used for two reasons. First, testing for intersection between two cubes
	is efficient. In the worst case, intersection between two cubes can
	be found with a maximum of 48 comparisons. Efficient intersection
	testing is important because the octree search algorithm performs
	intersection tests between cubes extensively. Second, cubes
	can be stored in memory efficiently. Storage efficiency is
	important because it reduces the overall memory requirements of
	the collision detection algorithm.

	\subsubsection{Octree Search}

	To build a linear octree for an object, the slave processes
	search an octree for a set of octants that, combined, form a 
	the bounding cube around the bounding cube representation of
	the object.

	\subsubsection{Octal code comparisons}

	The CommonPath subroutine is used to determine if two objects
	are colliding by determining if they share octants. The CommonPath
	subroutine is called "CommonPath" because octal codes that define
	similar paths through an octree have overlapping octants. The
	general algorithm for comparing two octal codes is a piecewise
	equivalency test between the sequences of numbers stored as
	octal codes. If any of the numbers between the sequences
	differ, then the octal codes identify disparate regions of space.
	Otherwise, the octal codes identify overlapping regions of space and
	the corresponding objects are considered for further collision
	testing.

	It is possible for two octal codes to define paths of different
	lengths. When this occurs, then octant indices are compared up
	to the maximum length of the the shorter octal code.

	In practice, the implementation of the general algorithm exploits
	the integer representation of octal codes. When two octal codes
	represented as integers differ in length, the longer of the two
	is truncated using a modulus and a subtraction operation. The
	octal codes can the be tested for equivalence using an integer
	comparison.

	\subsubsection{Post-order Traversal of a Linear Octree}

	The feature of linear octrees that makes the proposed algorithm
	efficient is the property of octal codes that, when sorted
	in ascending order, result is a list that defines a post
	order traversal of an octree. This feature is used to merge
	the linear octree representations of multiple objects into a
	single space. This is done by concatinating the linear octree
	representations of multiple objects and sorting the resulting
	list of octal codes. The sorted list of octal codes is used to
	determine all colliding pairs of objects using the CommonPath
	function. To search for all colliding pairs of objects, each
	octal code is tested for a common path with all subsequent octal
	codes in the list until an octal code identifies a non-overlapping
	region of space. At this point, no other octal codes need to be
	tested because no additional overlapping octants are represented
	in the list of octal codes. This is a result of the post-order
	traversal ordering of the list.

\subsection{Analysis of Algorithm}

	On the master processor, steps 1 and 5 of the algorithm run in
	linear time as a function of the data distributed to and collected
	from the slave processes. On each slave, because every object
	is passed to the search subroutine, the search time becomes a
	constant factor. Therefore, searching for n objects in step 2
	runs in linear time as a function of the number of objects. Also,
	on each slave, sorting n resultant octal codes takes $O(n log n)$
	time in the worst case.  In step 6, merging k lists of sorted
	octal codes is done in linear time.  Finally, step 7 runs in $O(n^2)$
	time in the worst case but, if objects are not allowed to remain
	in an intersecting state, then the running time is $O(n)$ on average
	as a result of the post-order traversal ordering of the list.

	Based on the individual run times of each step, the runtime
	performance of this algorithm will be bound by the sort performed
	by each slave processor. Therefore, the expected runtime of this
	algorithm is $O( n log n ) / k$.