RESEARCH


Smart Monitoring of Complex Public Scenes


Recent interest in surveillance for public, military, and commercial applications is increasing the need to create and deploy intelligent, semi-automated visual surveillance systems. The overall objective of this project is to develop a system that allows for robust and efficient coordination among robots, vision sensors, and human guards in order to enhance surveillance in sensitive environments such as airports, federal buildings, railway stations, and other public places. The system is structured hierarchically, with a central control node as the root and the monitored space subdivided into regions, each with its own local processing node; at the bottom of the hierarchy are conventional surveillance processing nodes (intelligent sensors) and mobile processing nodes represented by human personnel and robotic platforms. The technical goals of this project relate to developing: (1) algorithms for activity recognition using multiple cameras with potentially overlapping views; (2) techniques for object recognition and tracking; (3) an image and video retrieval system based on event queries; (4) a multi-robot system equipped with intelligent sensors (cameras and image understanding techniques), able to obtain high-resolution, high-level information regarding events occurring in the environment; and (5) an effective human-robot interaction system that allows guards to coordinate their actions with the multi-robot system through portable devices. The problem of detecting and responding to threats through surveillance techniques is particularly well suited to a solution consisting of a team of multiple robots and human guards. For large environments, the distributed nature of such a team provides robustness and increased performance of the surveillance system. Including human interaction in all components of the system can significantly enhance the accuracy of the coordination and vision-based monitoring, while dramatically decreasing the workload of the human operators involved in surveillance applications.
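
To make the hierarchical organization concrete, below is a minimal sketch of how such a node tree might be represented, with events detected at the leaves propagating up to the central control node. The class names and message flow are illustrative assumptions, not the project's actual software design.

```python
# Hypothetical sketch of the hierarchical node organization: a central control
# node at the root, regional processing nodes below it, and fixed sensors or
# mobile nodes (robots, guards) at the leaves.
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class Node:
    name: str
    parent: Optional["Node"] = None
    children: list = field(default_factory=list)

    def add(self, child: "Node") -> "Node":
        child.parent = self
        self.children.append(child)
        return child

    def report(self, event: str) -> None:
        # An event detected at a leaf propagates upward through its regional
        # node to the central control node, which coordinates the response.
        print(f"{self.name}: {event}")
        if self.parent:
            self.parent.report(event)

central = Node("central-control")
region_a = central.add(Node("region-A"))
region_a.add(Node("patrol-robot-1"))             # mobile processing node
camera_1 = region_a.add(Node("fixed-camera-1"))  # intelligent sensor
camera_1.report("unattended object detected")
```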

Relevant Publications
  • Christopher King, Maria Valera, Raphael Grech, Robert Mullen, Paolo Remagnino, Luca Iocchi, Luca Marchetti, Daniele Nardi, Dorothy Monekosso, Mircea Nicolescu, "Multi-Robot and Multi-Camera Patrolling", Handbook on Soft Computing for Video Surveillance, Sankar Pal, Alfredo Petrosino, Lucia Maddalena (editors), Taylor & Francis, pages 255–286, January 2012.
  • Luca Iocchi, Dorothy Monekosso, Daniele Nardi, Mircea Nicolescu, Paolo Remagnino, Maria Valera Espina, "Smart Monitoring of Complex Public Scenes - collaboration between human guards, security network and robotic platforms", Proceedings of the AAAI Fall Symposium "Robot-Human Teamwork in Dynamic Adverse Environment", pages 14–19, Arlington, Virginia, November 2011.
  • Maria Valera Espina, Raphael Grech, Deon De Jager, Paolo Remagnino, Luca Iocchi, Luca Marchetti, Daniele Nardi, Dorothy Monekosso, Mircea Nicolescu, Christopher King, "Multi-Robot Teams for Environmental Monitoring", Innovations in Defence Support Systems – Intelligent Paradigms in Security, Springer-Verlag, pages 183–209, March 2011.
Videos
[Video1] [Video2]
Support
This work was supported by the Department of Homeland Security awards 2009-ST-108-000012 and 2010-ST-108-000021.


A Visual Traffic Surveillance Framework: Vehicle Classification to Event Detection


Visual traffic surveillance using computer vision techniques can be noninvasive, automated, and cost-effective. Traffic surveillance systems with the ability to detect, count, and classify vehicles can be employed to gather traffic statistics and achieve better traffic control in intelligent transportation systems. However, vehicle classification poses a difficult problem, as vehicles have high intraclass variation and relatively low interclass variation. Five object recognition techniques are investigated for vehicle classification: principal component analysis (PCA) + difference from vehicle space, PCA + difference in vehicle space, PCA + support vector machine, linear discriminant analysis, and constellation-based modeling. Three of the techniques that performed well were incorporated into a unified traffic surveillance system for online classification of vehicles, which uses tracking results to improve the classification accuracy. To evaluate the accuracy of the system, 31 minutes of traffic video containing a multilane traffic intersection was processed. The system achieved classification accuracy as high as 90.49% when classifying correctly tracked vehicles into four classes: cars, SUVs/vans, pickup trucks, and buses/semis. While processing a video, the system also records important traffic parameters such as each vehicle's appearance, speed, and trajectory. This information is subsequently used in an attribute-based search assistant tool to find relevant traffic events.
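
As a rough illustration of one such pipeline, the sketch below combines PCA for dimensionality reduction with an SVM classifier over the four vehicle classes; the feature representation, training data, and parameters are illustrative assumptions, not the configuration used in the actual system.

```python
# Minimal PCA + SVM vehicle classification sketch (illustrative only).
import numpy as np
from sklearn.decomposition import PCA
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC

CLASSES = ["car", "suv_van", "pickup", "bus_semi"]  # the four classes above

def flatten(chips):
    """Flatten normalized grayscale vehicle chips (N x H x W) to row vectors."""
    return chips.reshape(len(chips), -1)

# Stand-ins for real training data: resized vehicle image chips and labels.
train_chips = np.random.rand(200, 32, 32)
train_labels = np.random.randint(0, 4, size=200)

clf = make_pipeline(PCA(n_components=30), SVC(kernel="rbf", C=10.0))
clf.fit(flatten(train_chips), train_labels)

# Classify new detections; in the full system, fusing per-frame decisions
# along each vehicle's track (e.g., by majority vote) is how tracking
# improves the final classification accuracy.
test_chips = np.random.rand(5, 32, 32)
print([CLASSES[i] for i in clf.predict(flatten(test_chips))])
```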

Relevant Publications
Videos
[Video1] [Video2]
Support
This work was supported by the Department of Homeland Security award 2010-ST-108-000021.

Understanding Intent Using a Novel Hidden Markov Model Representation


Understanding intent is an important aspect of communication among people and is an essential component of the human cognitive system. This capability is particularly relevant in situations that involve collaboration among multiple agents or with robotic systems, or the detection of situations that can pose a particular threat. For surveillance or military applications, it is critical to understand the intent of relevant agents in the environment from their current actions, before any attack strategy is finalized. Our approach relies on a novel formulation of Hidden Markov Models (HMMs), which allows a robot to understand the intent of other agents by virtually assuming their place and detecting their potential intentions based on the current situation. This allows the system to recognize the intent of observed actions before they have been completed, thus enabling preemptive actions for defense. The system's capability to observe and analyze the current scene relies on novel vision-based techniques for target detection and tracking, using a non-parametric recursive modeling approach.
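
As a generic illustration of scoring intent from partial observations, the sketch below runs the standard HMM forward algorithm against one model per candidate intent and picks the most likely one; the intent labels, state spaces, and probabilities are invented for the example, and this is the textbook formulation rather than the novel one developed in this project.

```python
# Score a partial observation sequence against one HMM per candidate intent.
import numpy as np

def forward_loglik(pi, A, B, obs):
    """Log-likelihood of a discrete observation sequence under an HMM.
    pi: (S,) initial probs; A: (S,S) transitions; B: (S,O) emissions."""
    alpha = pi * B[:, obs[0]]
    loglik = np.log(alpha.sum())
    alpha /= alpha.sum()
    for o in obs[1:]:
        alpha = (alpha @ A) * B[:, o]
        s = alpha.sum()
        loglik += np.log(s)
        alpha /= s  # rescale to avoid numerical underflow
    return loglik

# Two hypothetical intent models over coarse observations of an agent's motion
# relative to a guarded asset: 0 = approaching, 1 = moving away, 2 = stationary.
intents = {
    "approach": (np.array([0.6, 0.4]),
                 np.array([[0.8, 0.2], [0.3, 0.7]]),
                 np.array([[0.7, 0.1, 0.2], [0.2, 0.2, 0.6]])),
    "pass_by":  (np.array([0.5, 0.5]),
                 np.array([[0.7, 0.3], [0.3, 0.7]]),
                 np.array([[0.3, 0.5, 0.2], [0.2, 0.6, 0.2]])),
}
obs = [0, 0, 2, 0, 0]  # observations so far; scored before the action completes
best = max(intents, key=lambda k: forward_loglik(*intents[k], obs))
print("most likely intent:", best)
```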

Relevant Publications
Videos
[Video1] [Video2] [Video3]
Support
This work was supported by the Office of Naval Research awards N00014-06-1-0611 and N00014-09-1-1121, and by the National Science Foundation EPSCoR Ring True III award EPS0447416.


Segmentation for Videos with Quasi-Stationary Backgrounds - A Non-Parametric Approach



Video segmentation is one of the most important tasks in high-level video processing applications. Background modeling is the key to detecting foreground regions (such as moving objects - e.g., people, cars) in videos where the camera is assumed to be stationary. However, changes in the background of the video, such as waving flags, fluctuating monitors, or water surfaces, make it difficult to detect objects of interest in the scene. Due to the diverse nature of video applications, designing a general, scene-independent system has been a central concern for researchers. In this project we first propose a novel adaptive statistical method as a baseline system that addresses this issue. After investigating its performance, we introduce a universal statistical technique that aims to overcome the weaknesses of its predecessor in modeling slow changes in the background. Finally, we propose a new analytical technique that approaches the problem of background modeling from a different direction; it is introduced to overcome the limitations of statistical techniques, which are bounded by the accuracy of the probability density estimation. The performance of each of the proposed methods is studied, and the scenarios in which each leads to better performance are identified.
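
A minimal per-pixel kernel density estimate (KDE) background model, in the spirit of the non-parametric statistical techniques discussed above, is sketched below; the buffer size, kernel bandwidth, and threshold are illustrative assumptions rather than the values used in this work.

```python
# Per-pixel KDE background model over a ring buffer of recent frames.
import numpy as np

class KDEBackground:
    def __init__(self, n_samples=20, bandwidth=10.0, threshold=1e-4):
        self.buf = None        # ring buffer of recent frames (N x H x W)
        self.n, self.h, self.t = n_samples, bandwidth, threshold
        self.i = 0

    def update(self, frame):
        if self.buf is None:
            self.buf = np.repeat(frame[None], self.n, axis=0).astype(float)
        self.buf[self.i % self.n] = frame
        self.i += 1

    def foreground(self, frame):
        # Gaussian kernel density of each pixel value over the sample buffer;
        # low background likelihood marks the pixel as foreground.
        d = (self.buf - frame[None]) / self.h
        p = np.exp(-0.5 * d**2).mean(axis=0) / (self.h * np.sqrt(2 * np.pi))
        return p < self.t

# Toy usage: a static background with a bright moving blob.
bg = KDEBackground()
mask = None
for t in range(30):
    frame = np.full((48, 64), 100.0) + np.random.randn(48, 64)
    x = (2 * t) % 60
    frame[10:14, x:x + 4] = 200.0  # moving object
    if bg.buf is not None:
        mask = bg.foreground(frame)
    bg.update(frame)
print("foreground pixels in last frame:", int(mask.sum()))
```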

Relevant Publications
Videos
[Video1] [Video2] [Video3]
Support
This work was supported in part by a grant from the University of Nevada Junior Faculty Research Grant Fund and by NASA under grant NCC5-583.


Visual Awareness and Long-Term Autonomy for Robotic Assistants


A major challenge in deploying robots into the real world is the design of an architectural framework that can provide long-term, natural, and effective interactions with people. Within this framework, key issues that need to be solved relate to the robots' ability to engage in interactions in a natural way, to deal with multiple users, and to be constantly aware of their surroundings. We propose a control architecture that addresses these issues. First, we endow our robot with a visual awareness mechanism, which allows it to detect when people are requesting its attention and trying to engage it in interaction. Second, we provide the robot with flexibility in dealing with multiple users, so that it can accommodate multiple user requests and task interruptions over extended periods of time. In support of our robot awareness mechanism, we develop visual capabilities that allow the robot to identify multiple users, with multiple postures, in real time, in dynamic environments in which both the robot and the human users are moving. To enable long-term interaction, we design a control architecture that enables the representation of complex, sequential, and hierarchical robot tasks.
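
As a rough sketch of what an interruptible, hierarchical task representation might look like, the example below runs sequential subtasks depth-first and lets a new user request preempt the current task, which resumes afterward; the class names and scheduling policy are illustrative assumptions, not the control architecture itself.

```python
# Hierarchical, interruptible tasks: a stack of tasks executed depth-first.
from dataclasses import dataclass, field

@dataclass
class Task:
    name: str
    subtasks: list = field(default_factory=list)  # executed sequentially
    done: bool = False

class TaskStack:
    def __init__(self):
        self.stack = []

    def interrupt(self, task):
        """A new request preempts whatever is currently on top of the stack."""
        self.stack.append(task)

    def step(self):
        while self.stack and self.stack[-1].done:
            self.stack.pop()          # resume the interrupted parent task
        if not self.stack:
            return None
        task = self.stack[-1]
        pending = [t for t in task.subtasks if not t.done]
        if pending:
            self.stack.append(pending[0])
            return self.step()
        task.done = True              # a primitive task "executes" by completing
        return task.name

robot = TaskStack()
robot.interrupt(Task("serve_user_A", [Task("goto_A"), Task("deliver_A")]))
print(robot.step())                     # goto_A
robot.interrupt(Task("answer_user_B"))  # user B requests attention
print(robot.step())                     # answer_user_B (interruption)
print(robot.step())                     # deliver_A (task A resumes)
```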

Relevant Publications
Support
This work was supported by the Office of Naval Research award N00014-06-1-0611.


An Automatic Framework for Figure-Ground Segmentation in Cluttered Backgrounds



Grouping processes, which "organize" given data by eliminating irrelevant items and sorting the rest into groups, each corresponding to a particular object, can provide reliable pre-processed information to higher-level vision functions, such as object detection and recognition. Here we consider the problem of grouping oriented segments in highly cluttered images. We developed a general scheme based on an iterative tensor voting approach, which has been shown to improve segmentation results considerably. Segments are represented as second-order tensors and communicate with each other through a voting scheme that incorporates the Gestalt principles of visual perception. The key idea of the approach is to conservatively remove background segments using multi-scale analysis, and then re-vote on the retained segments. This process yields better-quality segmentations, especially under severe background clutter. Notably, our experiments show that using this approach as a post-processing step to the boundary detection methods evaluated on the Berkeley segmentation dataset improves the results on 84% of the grayscale test images from this benchmark.
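
To give a flavor of the voting mechanism, the sketch below implements a heavily simplified 2-D stick tensor voting pass: each oriented token casts its second-order tensor to its neighbors with Gaussian distance decay, and a token's saliency is the difference of the eigenvalues of its accumulated tensor. The curved Gestalt-based voting fields and the multi-scale remove-and-re-vote iterations of the actual approach are omitted here.

```python
# Simplified stick tensor voting in 2-D.
import numpy as np

def tensor_voting(points, thetas, sigma=10.0):
    """points: (N, 2) token positions; thetas: (N,) orientations in radians.
    Returns per-token saliency lambda1 - lambda2 (curve support)."""
    n = len(points)
    tensors = np.zeros((n, 2, 2))
    for i in range(n):
        t = np.array([np.cos(thetas[i]), np.sin(thetas[i])])
        stick = np.outer(t, t)                   # voter's second-order tensor
        for j in range(n):
            if i == j:
                continue
            d2 = np.sum((points[j] - points[i]) ** 2)
            tensors[j] += np.exp(-d2 / sigma**2) * stick
    eigvals = np.linalg.eigvalsh(tensors)        # ascending eigenvalues
    return eigvals[:, 1] - eigvals[:, 0]

# Collinear, consistently oriented tokens reinforce each other, while an
# isolated outlier receives almost no support and can be removed.
pts = np.array([[0, 0], [5, 0], [10, 0], [15, 0], [40, 30]], float)
ths = np.array([0.0, 0.0, 0.0, 0.0, 1.2])
print(tensor_voting(pts, ths).round(3))
```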

Relevant Publications

Voting-Based Computational Framework for Motion Analysis





Our research addresses the problem of visual motion analysis by formulating it as the inference of motion layers from a noisy and possibly sparse point set in a 4-D space. Our approach is based on a layered 4-D representation of data and a voting scheme for token communication, within a tensor voting computational framework. From a possibly sparse input consisting of identical point tokens in two frames, the image position and potential velocity of each token are encoded into a 4-D tensor. Within this 4-D space, moving regions are conceptually represented as smooth surface layers, and are extracted through a voting process that enforces the smoothness constraint while preserving motion discontinuities. The key features of this approach are: 1) inference of a dense representation in terms of accurate velocities, motion boundaries and regions, without any a priori knowledge of the motion model, based only on the smoothness of motion; 2) consistent handling of both smooth moving regions and motion discontinuities; 3) integration of motion and monocular (intensity) cues for accurate segmentation; 4) a 4-D layered representation that allows for spatial separation of the points according to both velocities and image coordinates, thus allowing tokens from the same layer to strongly support each other while inhibiting influence from other layers or from isolated tokens; and 5) a non-iterative voting scheme, which does not require initialization, does not suffer from local optima or poor convergence, and whose only free parameter is the scale of analysis, an inherent characteristic of human vision. [Details]
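
The sketch below illustrates only the 4-D encoding step: every plausible correspondence between two frames becomes a point (x, y, vx, vy), so correct matches from a coherently moving region fall on the same smooth layer in this space, while wrong matches scatter. The voting and layer-extraction stages are omitted, and the simple distance prune used for candidate matching is an assumption made for illustration.

```python
# Encode candidate point correspondences as 4-D (x, y, vx, vy) tokens.
import numpy as np

def build_4d_tokens(pts_t0, pts_t1, max_disp=15.0):
    """All plausible matches between two point sets become 4-D tokens."""
    tokens = []
    for x0, y0 in pts_t0:
        for x1, y1 in pts_t1:
            vx, vy = x1 - x0, y1 - y0
            if vx**2 + vy**2 <= max_disp**2:  # prune implausible velocities
                tokens.append((x0, y0, vx, vy))
    return np.array(tokens)

# Two points translating by (3, 0): the correct matches share the same
# velocity layer (vx = 3, vy = 0), while spurious matches scatter in 4-D.
t0 = [(10, 10), (20, 10)]
t1 = [(13, 10), (23, 10)]
print(build_4d_tokens(t0, t1))
```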

Relevant Publications
Support
This research has been funded in part by the Integrated Media Systems Center (IMSC), a National Science Foundation Engineering Research Center, Cooperative Agreement No. EEC-9529152, and by National Science Foundation Grant 9811883.


GlobeAll: Panoramic Video for an Intelligent Room


This project targets a real-time, modular system for vision-based intelligent environments. We designed and developed GlobeAll, a modular prototype based on an electronic pan-tilt-zoom camera array. The visual input is acquired by a multiple-camera system, which generates both a composite view of the scene with a wide field of view (as a planar mosaic) and a view of the desired region of interest (as an electronically controlled virtual camera). By maintaining an adaptive background model in mosaic space, the system segments the foreground objects as planar layers. Among them, targets are selected and tracked by redirecting the virtual camera. An interpretation module analyzes the generated models (segmented objects, trajectories), allowing for the detection of simple events. Compared to other solutions, the key features of our system are: 1) acquisition of a large field of view, while also capturing enough resolution to focus on a certain region of interest; 2) the ability to perform pan-tilt-zoom operations electronically rather than mechanically; 3) better precision and response time in redirecting the region of interest; and 4) low cost and high robustness, since it involves a digital solution instead of expensive and fragile mechanical or optical components. [Details]
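
A minimal sketch of the electronic pan-tilt-zoom idea follows: the virtual camera is simply a software-selected window of the planar mosaic, recentered (pan/tilt) and resized (zoom) before being resampled to a fixed output resolution. Nearest-neighbor resampling keeps the example dependency-free; GlobeAll's actual mosaicking and rendering are not reproduced here.

```python
# Virtual pan-tilt-zoom as a resampled crop of a wide planar mosaic.
import numpy as np

def virtual_camera(mosaic, cx, cy, zoom, out_h=240, out_w=320):
    """Extract an out_h x out_w view centered at (cx, cy); larger zoom
    selects a smaller source window, i.e., a magnified view."""
    h, w = mosaic.shape[:2]
    win_h, win_w = int(out_h / zoom), int(out_w / zoom)
    y0 = int(np.clip(cy - win_h // 2, 0, h - win_h))
    x0 = int(np.clip(cx - win_w // 2, 0, w - win_w))
    ys = y0 + (np.arange(out_h) * win_h // out_h)  # nearest-neighbor rows
    xs = x0 + (np.arange(out_w) * win_w // out_w)  # nearest-neighbor cols
    return mosaic[np.ix_(ys, xs)]

mosaic = np.random.rand(600, 2000)  # stand-in for a wide composite mosaic
view = virtual_camera(mosaic, cx=1500, cy=300, zoom=2.0)  # pan right, 2x zoom
print(view.shape)  # (240, 320)
```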

Relevant Publications
Support
This research has been funded in part by the Integrated Media Systems Center (IMSC), a National Science Foundation Engineering Research Center, Cooperative Agreement No. EEC-9529152, with additional support from the Annenberg Center for Communication at the University of Southern California and the California Trade and Commerce Agency. The support of the Philips Multimedia Center is also gratefully acknowledged.


Created by: Mircea NICOLESCU (e-mail: mircea@cse.unr.edu)