Visual Perception

1-Different Types of Perception Sensing Methods

There are two types of sensing that perception sensors use to probe the environment:

  • active sensing

  • passive sensing

1.1-Active Sensing

Active sensing uses a source of energy to probe the environment: the sensor emits energy into the scene and detects what is reflected back. Some perception sensors based on active sensing are listed below, followed by a short worked example of the time-of-flight distance arithmetic:

  • Laser Range Finder (Lidar)

  • Time of Flight Camera:

    • Pulse Runtime Sensor

    • Phase Shift Continuous Wave Sensor

  • Ultrasonic Sensor
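
To make the two time-of-flight variants concrete, here is a minimal sketch of the distance arithmetic behind them, using the standard formulas d = c·Δt/2 for a pulse-runtime sensor and d = c·Δφ/(4π·f_mod) for a phase-shift continuous-wave sensor. The function names and sample values are illustrative, not taken from any particular sensor's API.

```python
# Minimal sketch of the distance arithmetic behind the two
# time-of-flight variants listed above. Function names and the
# sample numbers are illustrative, not from any specific sensor.
import math

C = 299_792_458.0  # speed of light in m/s

def pulse_runtime_distance(round_trip_time_s: float) -> float:
    """Pulse runtime: the emitted pulse travels to the target and
    back, so the one-way distance is half the round trip."""
    return C * round_trip_time_s / 2.0

def phase_shift_distance(phase_shift_rad: float, mod_freq_hz: float) -> float:
    """Phase-shift continuous wave: the phase offset between the
    emitted and received signal encodes the round-trip distance,
    unambiguous up to half the modulation wavelength."""
    return C * phase_shift_rad / (4.0 * math.pi * mod_freq_hz)

# A 66.7 ns round trip corresponds to roughly 10 m.
print(pulse_runtime_distance(66.7e-9))          # ~10.0
# A pi/2 phase shift at 10 MHz modulation is roughly 3.75 m.
print(phase_shift_distance(math.pi / 2, 10e6))  # ~3.75
```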

1.2-Passive Sensing

Passive sensing measures energy that is already present in the environment. For instance, a camera captures the ambient light reflected from objects in the scene. Some passive sensors are:

  • Monocular Camera

  • Stereo Camera

  • RGB Camera

1.3-RGB-D Camera: Active and Passive Combined

An RGB-D (D for Depth) camera combines the best of the active and passive sensing worlds: it consists of a passive RGB camera along with an active depth sensor. Unlike a conventional camera, an RGB-D camera provides per-pixel depth information in addition to an RGB image. Traditionally, the active depth sensor is an infrared (IR) projector and receiver. Much like a continuous-wave time-of-flight sensor, an RGB-D camera calculates depth by emitting a light signal onto the scene and analyzing the reflected light, but the incident wave is modulated spatially instead of temporally.

Figure: an example of a standard RGB-D camera.

This is done by projecting light out of the IR transmitter in a predefined pattern and calculating the depth by interpreting the deformation in that pattern caused by the surface of target objects. These patterns range from simple stripes to unique and convoluted speckle patterns.

The advantage of using RGB-D cameras for 3D perception is that, unlike stereo cameras, they save a lot of computational resources by providing per-pixel depth values directly instead of inferring the depth information from raw image frames. In addition, these sensors are inexpensive and have a simple USB plug-and-play interface. RGB-D cameras can be used for various applications, ranging from mapping to complex object recognition.

Figure: classification of different types of perception sensors.

1.4-Capturing Depth in an RGB-D Camera

Most RGB-D cameras use a technique called Structured Light to obtain depth information from a scene. In a stereo setup, depth is calculated by comparing the disparity, or difference, between the images captured by the left and right cameras. A structured light setup, in contrast, contains one projector and one camera.

Figure: how a depth map is generated by an RGB-D camera.

The projector projects a known pattern onto the scene. This pattern can be as simple as a series of stripes or as complex as a speckle pattern. The camera then captures the light pattern reflected off the objects in the scene; the perceived pattern is distorted by the shapes of those objects. By comparing the known projected pattern (stored in the sensor hardware) with the reflected pattern, a depth map is generated, much as in a stereo camera.
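
The triangulation step is the same one used in stereo vision, with the projector playing the role of the second camera. Below is a minimal sketch of that arithmetic, assuming a rectified geometry where the pattern shift (disparity) relates to depth via Z = f·b/d; the names and numbers are illustrative.

```python
# Minimal sketch of the triangulation step shared by stereo and
# structured-light setups. Assumes a rectified geometry; all names
# and numbers are illustrative.
import numpy as np

def depth_from_disparity(disparity_px: np.ndarray,
                         focal_length_px: float,
                         baseline_m: float) -> np.ndarray:
    """Z = f * b / d: larger pattern shifts (disparities) mean
    closer surfaces. Zero disparity (no shift) maps to infinity."""
    disparity = np.asarray(disparity_px, dtype=np.float64)
    depth = np.full(disparity.shape, np.inf)
    valid = disparity > 0
    depth[valid] = focal_length_px * baseline_m / disparity[valid]
    return depth

# Example: a 600 px focal length and a 7.5 cm projector-camera baseline.
disparities = np.array([30.0, 15.0, 5.0, 0.0])
print(depth_from_disparity(disparities, 600.0, 0.075))
# -> [1.5, 3.0, 9.0, inf] metres
```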

A depth image or a depth map is an image where each pixel provides its distance value relative to the sensor’s coordinate system.

Figure: an example projection pattern placed over a scene in order to map out the correct pixel depths.

2-Visualizing Image Data with Point Clouds

Point Clouds are a method of representing depth information captured by 3D sensors: an abstract data type that stores the output of sensors such as an RGB-D camera as a digital representation of three-dimensional objects. In practical implementations, they can also contain additional metadata for each point, as well as a number of useful methods for operating on the point cloud (a minimal sketch of such a data type follows the two lists below). Examples of additional metadata might include:

  • RGB values for each point

  • Intensity values

  • Local curvature information

Additional methods provided might include:

  • Iterators for traversing points spatially

  • Methods for filtering the cloud based on particular properties

  • Operators for performing statistical analyses on the cloud
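
As a rough illustration, here is a minimal sketch of what such a point-cloud data type might look like, with per-point RGB metadata, an iterator, and a property-based filter. It is not modeled on any particular library; all class and method names are invented for illustration.

```python
# Minimal sketch of a point-cloud data type with per-point metadata
# and a property-based filter, as described above. Not modeled on
# any particular library; all names are illustrative.
from dataclasses import dataclass
import numpy as np

@dataclass
class PointCloud:
    points: np.ndarray  # (N, 3) float32 xyz coordinates
    colors: np.ndarray  # (N, 3) uint8 RGB values, one per point

    def __iter__(self):
        """Iterate over (xyz, rgb) pairs in storage order."""
        return zip(self.points, self.colors)

    def filter(self, mask: np.ndarray) -> "PointCloud":
        """Return a new cloud keeping only points where mask is True."""
        return PointCloud(self.points[mask], self.colors[mask])

    def passthrough_z(self, z_min: float, z_max: float) -> "PointCloud":
        """Keep points whose z (depth) lies inside [z_min, z_max]."""
        z = self.points[:, 2]
        return self.filter((z >= z_min) & (z <= z_max))

# Example: keep only points between 0.5 m and 2.0 m from the sensor.
cloud = PointCloud(
    points=np.array([[0.1, 0.0, 0.4], [0.2, 0.1, 1.0], [0.0, 0.3, 3.0]],
                    dtype=np.float32),
    colors=np.array([[255, 0, 0], [0, 255, 0], [0, 0, 255]], dtype=np.uint8),
)
print(cloud.passthrough_z(0.5, 2.0).points)  # only the 1.0 m point remains
```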

Nearly all 3D scanners and Lidars output data as high-accuracy Point Clouds, and data from stereo cameras and RGB-D cameras can also be easily converted into Point Clouds (a sketch of this conversion follows the list below). Point Clouds are used in numerous applications where 3D spatial information is a key component of the data. For example, they can represent:

  • Depth sensor measurements

  • Models of real-world objects

  • The extent of a robot’s workspace

  • Environment maps
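
As a sketch of the RGB-D-to-point-cloud conversion mentioned above, the snippet below back-projects each pixel of a depth image through a pinhole camera model. The intrinsics (fx, fy, cx, cy) and the function name are illustrative assumptions, not values from any real camera.

```python
# Minimal sketch of converting an RGB-D depth image into a point
# cloud by back-projecting each pixel through a pinhole camera model.
# The intrinsics (fx, fy, cx, cy) are illustrative values.
import numpy as np

def depth_to_point_cloud(depth_m: np.ndarray,
                         fx: float, fy: float,
                         cx: float, cy: float) -> np.ndarray:
    """Back-project an (H, W) depth image to an (H*W, 3) array of
    xyz points in the camera frame: x = (u - cx) * z / fx, etc."""
    h, w = depth_m.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    z = depth_m
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    return np.stack([x, y, z], axis=-1).reshape(-1, 3)

# Example: a tiny 2x2 depth image with every pixel 1 m away.
depth = np.ones((2, 2))
print(depth_to_point_cloud(depth, fx=500.0, fy=500.0, cx=1.0, cy=1.0))
```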

Consequently, Point Clouds are a convenient and useful way to represent spatial data. As for how point clouds are represented computationally, here is an example of the data types associated with each point of data from an RGB-D camera:

Attribute            Commonly used data type
x-coordinate         float
y-coordinate         float
z-coordinate         float
Red color value      unsigned 8-bit int
Green color value    unsigned 8-bit int
Blue color value     unsigned 8-bit int
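
One common way to realize this per-point layout in memory is a structured array. The sketch below expresses the table with NumPy; the field names are illustrative.

```python
# Minimal sketch of the per-point memory layout from the table above,
# expressed as a NumPy structured dtype. Field names are illustrative.
import numpy as np

point_dtype = np.dtype([
    ("x", np.float32),  # x-coordinate
    ("y", np.float32),  # y-coordinate
    ("z", np.float32),  # z-coordinate
    ("r", np.uint8),    # red color value
    ("g", np.uint8),    # green color value
    ("b", np.uint8),    # blue color value
])

# A two-point cloud: a red point at 1 m and a blue point at 2 m depth.
points = np.array([
    (0.0, 0.0, 1.0, 255, 0, 0),
    (0.1, -0.2, 2.0, 0, 0, 255),
], dtype=point_dtype)
print(points["z"])      # -> [1. 2.]
print(points.itemsize)  # 15 bytes per point (3 * 4 + 3 * 1)
```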
