ROVIS Machine Vision System

The ROVIS Machine Vision System is an open-source software application, under development by the ROVIS research group, that embeds several machine vision algorithms and techniques for scene and environment understanding. It provides a complete implementation of an object recognition and 3D reconstruction chain: camera calibration, image acquisition, filtering and segmentation, object recognition, and 3D scene reconstruction.

ROVIS is built on well-known libraries such as Open Computer Vision (OpenCV) and Open Graphics Library (OpenGL), among others.

The source code of ROVIS can be downloaded from the Subversion (SVN) server at http://rovis.unitbv.ro/rovis

The software is organized in the following four main sections:

Camera configuration and calibration

Figure 1. The ROVIS configuration menu.

This section provides a series of camera hardware configuration mechanisms, as well as image and video loading/saving operations. Its main subsections are:

  • Options - enables easy loading of static images stored on the hard disk. The user can load into the program memory stack anything from a single monocular image up to three pairs of stereo images. The user can also define an image Region of Interest (ROI) for subsequent visual processing.
  • Camera link – currently, the ROVIS software supports two types of cameras: Point Grey Bumblebee and Sony PTZ. The available interface allows real-time processing of video streams, as well as image/video acquisition.
  • Camera calibration – provides an easy and fast calibration interface for stereo cameras, as well as image rectification operations. The resulting intrinsic and extrinsic parameters, as well as the projection and reprojection matrices, can be stored in XML files for later use.
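
As an illustration of storing calibration results in XML, the following is a minimal sketch using only the Python standard library. The element names (`calibration`, `intrinsic_left`), the `rows`/`cols` attributes, and the numeric values are assumptions made for this example; they are not the file layout that ROVIS itself writes.

```python
import xml.etree.ElementTree as ET


def matrix_to_element(name, matrix):
    """Encode a row-major matrix as <name rows=".." cols="..">v v v ...</name>."""
    elem = ET.Element(name, rows=str(len(matrix)), cols=str(len(matrix[0])))
    # repr(float) round-trips exactly, so no precision is lost on reload.
    elem.text = " ".join(repr(float(v)) for row in matrix for v in row)
    return elem


def element_to_matrix(elem):
    """Decode an element written by matrix_to_element back into nested lists."""
    rows, cols = int(elem.get("rows")), int(elem.get("cols"))
    values = [float(v) for v in elem.text.split()]
    if len(values) != rows * cols:
        raise ValueError("matrix size does not match its rows/cols attributes")
    return [values[r * cols:(r + 1) * cols] for r in range(rows)]


def save_calibration(path, matrices):
    """Write a dict of name -> matrix to an XML file (hypothetical layout)."""
    root = ET.Element("calibration")
    for name, matrix in matrices.items():
        root.append(matrix_to_element(name, matrix))
    ET.ElementTree(root).write(path, encoding="utf-8", xml_declaration=True)


def load_calibration(path):
    """Read back every matrix stored under the root element."""
    root = ET.parse(path).getroot()
    return {child.tag: element_to_matrix(child) for child in root}
```

A typical use would be to call `save_calibration("calib.xml", {"intrinsic_left": K})` once after calibration, where `K` is a 3x3 intrinsic matrix, and `load_calibration("calib.xml")` in later sessions.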


2D image processing chain

Figure 2. Snapshot from the robust region segmentation algorithm.

This software section contains various 2D image processing algorithms which can be used to segment, recognize, classify, or track different objects of interest present in the image scene. The 2D image processing chain is structured as follows:

  • Pre-Processing - contains several image filters, such as median, Gaussian, Laplacian of Gaussian (LoG) or zero-crossing, used to enhance image features.
  • Segmentation - segmentation is one of the most important steps in object detection. In ROVIS we provide two segmentation approaches:
    • Robust Region-based Segmentation – which segments image regions based on a feedback control approach of the thresholding operation.
    • Robust Boundary-based Segmentation – which uses the same feedback control approach to adjust the parameters of the Canny and Hough transform methods.
  • Feature Extraction – this section provides several methods for extracting different object characteristics (e.g. texture, color, ROI, area, perimeter, spatial and invariant moments) from a segmented image. The algorithms extract region and edge features, as well as corners.
  • Classification - the section provides a simple interface to statistical classifiers. Currently only the Minimum Distance Classifier is used within the ROVIS system.
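
The Minimum Distance Classifier mentioned in the last item assigns a feature vector to the class whose mean feature vector is nearest. The sketch below shows the general technique in plain Python with Euclidean distance; the function names and the two-feature example (area, perimeter) are assumptions for illustration and do not reproduce the ROVIS interface.

```python
import math


def class_means(training):
    """Compute the mean feature vector of each class.

    training: dict mapping a class label to a list of feature vectors.
    """
    means = {}
    for label, vectors in training.items():
        n = len(vectors)
        # zip(*vectors) groups the values feature by feature.
        means[label] = [sum(column) / n for column in zip(*vectors)]
    return means


def classify(features, means):
    """Return the label whose class mean is closest (Euclidean) to `features`."""
    return min(means, key=lambda label: math.dist(features, means[label]))


# Hypothetical training data: (area, perimeter) of segmented regions.
training = {
    "bottle": [[120.0, 50.0], [130.0, 54.0]],
    "cup": [[60.0, 30.0], [64.0, 34.0]],
}
means = class_means(training)
label = classify([118.0, 49.0], means)
```

Because the distance is computed on raw feature values, features with very different ranges would normally be normalized first so that no single feature dominates the decision.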

3D scene reconstruction system

The main component of ROVIS is the 3D reconstruction system, which enables depth information computation and the reconstruction of the camera’s pose. The 3D algorithms are split into:

  • Disparity map - calculated using the well-known Block Matching (BM) approach, enhanced with a feedback loop that controls depth segmentation.
  • Multi-view Reconstruction - which exploits the epipolar geometry of a stereo camera to find its pose.
  • Active 3D Tracking – for on-line 3D tracking of objects of interest. The component uses the 2D image processing chain described above.
  • ARToolKit Based Pose Estimation – uses the robust ARToolKit library to detect marked objects in a static image or an image stream. Based on the real dimensions of the marker, the algorithm returns both the camera and marker poses.

An ARToolKit marker.
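
To make the disparity map item concrete, here is a minimal sketch of plain block matching on a rectified stereo pair, using the sum of absolute differences (SAD) as the matching cost, followed by the standard depth relation Z = f · B / d. It deliberately omits the feedback loop that ROVIS adds on top of BM, works on small grayscale images stored as nested lists, and all function and parameter names are assumptions for this example.

```python
def sad(left, right, row, col_left, col_right, half):
    """Sum of absolute differences between two (2*half+1)^2 windows."""
    total = 0
    for dy in range(-half, half + 1):
        for dx in range(-half, half + 1):
            total += abs(left[row + dy][col_left + dx]
                         - right[row + dy][col_right + dx])
    return total


def disparity_map(left, right, max_disparity, half=1):
    """Per-pixel disparity for a rectified pair; border pixels stay at 0."""
    height, width = len(left), len(left[0])
    disp = [[0] * width for _ in range(height)]
    for row in range(half, height - half):
        for col in range(half, width - half):
            best_cost, best_d = None, 0
            # In a rectified pair, a left pixel at column c is searched
            # for at column c - d on the same row of the right image.
            for d in range(max_disparity + 1):
                if col - d < half:
                    break  # the window would leave the right image
                cost = sad(left, right, row, col, col - d, half)
                if best_cost is None or cost < best_cost:
                    best_cost, best_d = cost, d
            disp[row][col] = best_d
    return disp


def depth_from_disparity(d, focal_px, baseline_m):
    """Depth Z = f * B / d in metres; None when d is 0 (no depth available)."""
    return focal_px * baseline_m / d if d > 0 else None
```

The exhaustive search is slow in pure Python and is meant only to show the principle; in practice an optimized implementation such as the block matcher in OpenCV would be used.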


Virtual 3D rendering engine

3D rendering engine.

The rendering engine is based on the OpenGL library and is used to render the imaged environment in real time. It provides a series of object primitives (e.g. points, spheres, cuboids) which can be used to model the recognized objects in the environment, as well as the observed obstacles in the form of depth maps.
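
Before obstacles given as a depth map can be drawn as point primitives, each pixel has to be converted into a 3D point. The sketch below shows one common way to do this, assuming a pinhole camera model with intrinsics `fx`, `fy`, `cx`, `cy`; these names, and the convention that missing depth is `None` or non-positive, are assumptions for this example rather than details of the ROVIS engine.

```python
def depth_map_to_points(depth, fx, fy, cx, cy):
    """Back-project a depth map into camera-frame 3D points.

    depth: nested lists of depths in metres, indexed as depth[v][u].
    Pixels with no valid depth (None or <= 0) are skipped.
    """
    points = []
    for v, row in enumerate(depth):
        for u, z in enumerate(row):
            if z is None or z <= 0:
                continue
            # Pinhole model: u = fx * x / z + cx, v = fy * y / z + cy.
            x = (u - cx) * z / fx
            y = (v - cy) * z / fy
            points.append((x, y, z))
    return points
```

The resulting list of `(x, y, z)` tuples could then be handed to a renderer as a point cloud, for instance as OpenGL point primitives.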
