Our research addresses the problem of visual motion analysis by formulating it as the inference of motion layers from a
noisy and possibly sparse point set in a 4-D space. Our approach is based on a layered 4-D representation of the
data and a voting scheme for token communication, within the tensor voting computational framework. From a
possibly sparse input consisting of identical point tokens in two frames, the image position and potential velocity
of each token are encoded into a 4-D tensor. Within this 4-D space, moving regions are conceptually represented
as smooth surface layers, and are extracted through a voting process that enforces the smoothness constraint
while preserving motion discontinuities.
The key features of this approach are: 1) inference of a dense representation in terms of accurate velocities, motion boundaries and regions, without any a priori knowledge of the motion model, based only on the smoothness of motion; 2) consistent handling of both smooth moving regions and motion discontinuities; 3) integration of motion and monocular (intensity) cues for accurate segmentation; 4) a 4-D layered representation that spatially separates the points according to both velocities and image coordinates, so that tokens from the same layer strongly support each other while the influence of other layers and of isolated tokens is inhibited; 5) a non-iterative voting scheme, which requires no initialization, is not subject to local optima or poor convergence, and whose only free parameter is the scale of analysis, an inherent characteristic of human vision.
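
As an illustration of the 4-D encoding described above, the following minimal sketch builds candidate tokens (x, y, vx, vy) from point tokens in two frames. It is a simplified illustration under stated assumptions, not our implementation: the function name `candidate_tokens` and the search radius `max_disp` are hypothetical, every frame-2 point within that radius is taken as a potential match, and the tensor initialization and the voting step are omitted.

```python
from typing import List, Tuple

Point = Tuple[float, float]                   # (x, y) image position
Token4D = Tuple[float, float, float, float]   # (x, y, vx, vy)


def candidate_tokens(frame1: List[Point], frame2: List[Point],
                     max_disp: float) -> List[Token4D]:
    """Build 4-D tokens from all potential matches between two frames.

    Hypothetical helper: each frame-1 point is paired with every frame-2
    point lying within `max_disp` pixels, so a point may yield several
    candidate velocities (wrong ones are expected to appear as noise).
    """
    tokens: List[Token4D] = []
    limit_sq = max_disp * max_disp
    for x1, y1 in frame1:
        for x2, y2 in frame2:
            vx, vy = x2 - x1, y2 - y1
            # Keep only displacements inside the assumed search radius.
            if vx * vx + vy * vy <= limit_sq:
                tokens.append((x1, y1, vx, vy))
    return tokens


if __name__ == "__main__":
    first = [(0.0, 0.0), (10.0, 0.0)]
    second = [(1.0, 0.0), (11.0, 0.0)]
    for token in candidate_tokens(first, second, max_disp=2.0):
        print(token)
```

In this representation, tokens that share a smooth motion lie close together in (x, y, vx, vy), while wrong matches are scattered, which is what lets a smoothness-based voting process separate the layers.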
|
|
This project targets a real-time, modular system for vision-based intelligent environments. We designed
and developed GlobeAll, a modular prototype based on an electronic pan-tilt-zoom camera array. The visual
input is acquired by a multiple-camera system, which generates a composite view of the scene with
a wide field of view (as a planar mosaic) and a view of the desired region of interest (as an
electronically-controlled virtual camera). By maintaining an adaptive background model in mosaic space, the
system segments the foreground objects as planar layers. Among them, targets are selected
and tracked by redirecting the virtual camera. An interpretation module analyzes the generated models
(segmented objects, trajectories), allowing for the detection of simple events.
Compared to other solutions, the key features of our system are: 1) acquisition of a large field of view while retaining enough resolution to focus on a given region of interest; 2) ability to perform pan-tilt-zoom operations electronically rather than mechanically; 3) better precision and response time in redirecting the region of interest; 4) low cost and high robustness, since it relies on a digital solution rather than on expensive and fragile mechanical or optical components.
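
To make the idea of an adaptive background model more concrete, here is a minimal sketch of one common choice, an exponential running average with a fixed difference threshold. The description above does not specify which model GlobeAll uses, so this is an assumption for illustration only; the function name `update_background` and the parameters `alpha` and `threshold` are hypothetical, and images are plain nested lists of grayscale values.

```python
from typing import List, Tuple

Image = List[List[float]]   # grayscale image as rows of pixel values
Mask = List[List[bool]]     # True where a pixel is labeled foreground


def update_background(background: Image, frame: Image,
                      alpha: float = 0.05,
                      threshold: float = 25.0) -> Tuple[Image, Mask]:
    """Segment foreground and adapt the background (assumed model).

    A pixel is foreground when it differs from the background by more
    than `threshold`. Background pixels are blended toward the new frame
    with weight `alpha`; foreground pixels leave the model unchanged.
    """
    new_background: Image = []
    mask: Mask = []
    for bg_row, frame_row in zip(background, frame):
        bg_out: List[float] = []
        mask_out: List[bool] = []
        for b, f in zip(bg_row, frame_row):
            is_foreground = abs(f - b) > threshold
            mask_out.append(is_foreground)
            # Adapt only where the scene still looks like background.
            bg_out.append(b if is_foreground else (1 - alpha) * b + alpha * f)
        new_background.append(bg_out)
        mask.append(mask_out)
    return new_background, mask


if __name__ == "__main__":
    bg = [[100.0, 100.0]]
    new_bg, fg = update_background(bg, [[110.0, 200.0]], alpha=0.5)
    print(new_bg, fg)
```

Freezing the model under foreground pixels keeps a stationary target from being absorbed into the background too quickly; other update policies are equally plausible.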
|
Created by: Mircea NICOLESCU (e-mail: mircea@cs.unr.edu)