Glossary and definitions
Camera type
A 3D camera is a device that captures three-dimensional images. It uses various techniques (such as stereo vision, time-of-flight, and structured light) to acquire information about the shape and depth of objects. 3D cameras can generate point cloud data with 3D spatial coordinate information, allowing computers to understand and process objects in three-dimensional space.
An active stereo camera combines binocular stereo vision with an active illumination source (such as structured light). It projects known light patterns onto the scene and uses a binocular camera to capture the deformation of these patterns, thereby improving the accuracy and robustness of depth measurement. Active stereo cameras perform well in low-light conditions and are suitable for 3D imaging in a variety of complex environments.
A Time-of-Flight (ToF) camera is a type of 3D camera that calculates distance by measuring the time between emitting a light signal and receiving its reflection. ToF cameras are widely used in fields such as robot navigation, gesture recognition, and 3D scanning. Based on their working principles and application scenarios, ToF cameras can be divided into the following types (the distance formulas behind these principles are sketched after the list):
Direct ToF Cameras
Direct ToF cameras measure distance by calculating the time difference between light pulses emitted and received. These cameras typically use lasers or LEDs as light sources and have high precision and fast response times. Direct ToF cameras are suitable for depth measurement applications requiring high precision, such as industrial automation, robot navigation, and 3D scanning.
Indirect ToF Cameras
Indirect ToF cameras measure distance by detecting the phase shift of modulated light signals. These cameras usually use modulated infrared light sources and have good anti-interference capabilities and low power consumption. Indirect ToF cameras are widely used in consumer electronics products, such as smartphones, tablets, and game consoles, for applications like gesture recognition and facial recognition.
Pulse ToF Cameras
Pulse ToF cameras emit short light pulses and measure their return time to calculate distance. These cameras typically use laser pulses and have high precision and long measurement ranges. Pulse ToF cameras are suitable for long-distance measurement applications, such as drone mapping, autonomous driving, and security monitoring.
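As a rough numerical illustration of the two measurement principles above, the sketch below computes distance from a measured round-trip time (direct/pulse ToF) and from a measured phase shift of a modulated signal (indirect ToF). The function and variable names are illustrative only and are not part of any camera API.

```python
import math

C = 299_792_458.0  # speed of light in m/s

def direct_tof_distance(round_trip_time_s: float) -> float:
    """Direct / pulse ToF: light travels to the target and back, so halve the path."""
    return C * round_trip_time_s / 2.0

def indirect_tof_distance(phase_shift_rad: float, modulation_freq_hz: float) -> float:
    """Indirect ToF: distance is proportional to the phase shift of the modulated signal."""
    return C * phase_shift_rad / (4.0 * math.pi * modulation_freq_hz)

# A round trip of ~6.67 ns corresponds to roughly 1 m.
print(direct_tof_distance(6.67e-9))          # ~1.0 m
# A pi/2 phase shift at 30 MHz modulation corresponds to ~1.25 m.
print(indirect_tof_distance(math.pi / 2, 30e6))  # ~1.25 m
```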
Stereo matching is a computer vision technique used to find corresponding pixel points between the two images captured by a binocular stereo camera system. By comparing pixels in the left and right images and computing their disparity, depth information about objects in the scene can be inferred. Stereo matching is a critical step in binocular stereo vision systems and directly affects the accuracy and quality of depth maps.
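A minimal sketch of the disparity-to-depth relationship used after stereo matching, Z = f · B / d, where f is the focal length in pixels, B the baseline, and d the disparity. The matcher parameters, file names, and numeric values below are illustrative assumptions, not values required by any particular camera.

```python
import cv2
import numpy as np

# Compute a disparity map from a rectified left/right grayscale pair.
left = cv2.imread("left.png", cv2.IMREAD_GRAYSCALE)
right = cv2.imread("right.png", cv2.IMREAD_GRAYSCALE)
matcher = cv2.StereoSGBM_create(minDisparity=0, numDisparities=128, blockSize=5)
disparity = matcher.compute(left, right).astype(np.float32) / 16.0  # SGBM returns fixed-point values

# Convert disparity to depth: Z = f * B / d (f in pixels, B in the desired depth unit).
focal_px = 700.0      # example focal length in pixels
baseline_mm = 50.0    # example baseline in millimeters
depth_mm = np.where(disparity > 0, focal_px * baseline_mm / disparity, 0)
```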
Structured light is a technique used to calculate depth information by projecting known light patterns (such as stripes, grids, or dot matrices) onto a scene and analyzing the distortions of these patterns on object surfaces. A typical structured light system consists of a projector and a camera. By capturing and processing the deformed patterns, it generates high-precision three-dimensional images. Structured light technology is widely applied in 3D scanning, industrial inspection, gesture recognition, and other fields.
Camera widgets
Color sensor is a type of sensor capable of capturing and recording color images. It converts light signals into electrical signals through photodiodes and filter arrays, and then processes these signals to generate color images.
Lens assembly typically consists of one or more optical glass components arranged in a specific configuration, usually concave lenses, convex lenses, or combinations thereof. The lens assembly converges and refracts light rays, projecting them onto the image sensor.
Image Signal Processor (ISP) is used for post-processing color images.
Mono sensor is a type of sensor capable of detecting and capturing light. It converts light signals into electrical signals through photodiodes and then processes these signals to generate grayscale images.
Laser projector is a component used to emit structured light patterns.
Field-Programmable Gate Array (FPGA) is a type of signal processing chip that offers highly parallel processing capabilities. It can simultaneously execute numerous logical operations, making it extremely efficient for tasks requiring extensive calculations. In image processing applications such as image preprocessing, feature extraction, and image recognition, FPGAs can significantly improve efficiency through parallel processing techniques.
System on Chip (SoC) is an integrated circuit that integrates the major components of an electronic system into a single chip. An SoC can handle digital signals, analog signals, mixed-signal processing, and even higher-frequency signals. SoCs are commonly used in embedded systems.
IR floodlight is used to illuminate the environment with infrared light for infrared imaging purposes.
Camera Basics
Baseline is the distance between the two cameras in a stereo camera system. A longer baseline results in higher depth measurement accuracy but increases system complexity.
Disparity is the difference in position of the same object in left and right images in a binocular stereo camera system. Larger disparity indicates closer objects; smaller disparity indicates farther objects.
Field of View is the angular range of the scene that the camera can capture, typically expressed as horizontal and vertical angles. A larger field of view covers a wider area.
Measurement Range refers to the distance range within which the depth camera can accurately measure. Depth measurements beyond this range may be inaccurate or unmeasurable.
Clearance distance refers to the shortest perpendicular distance between the front surface of the camera and the near boundary of the field of view.
Image resolution refers to the number of pixels a camera captures in an image, typically expressed as width × height. Higher resolution leads to clearer images.
Shutter is a device controlling exposure time in cameras. Faster shutter speeds result in shorter exposure times, suitable for capturing fast-moving objects.
Depth of field refers to the range of distances within which objects appear clearly focused in an image. A larger depth of field means more objects at different distances remain in focus.
The camera coordinate system is a three-dimensional coordinate system referenced to the camera, used to describe the position and direction of objects within the camera’s field of view.
Intrinsic parameters describe the camera’s internal optical characteristics, including focal length, principal point location, and distortion coefficients, used to project three-dimensional points onto a two-dimensional image plane.
Extrinsic parameters describe the camera’s position and orientation in the world coordinate system, including rotation matrices and translation vectors. They are used to convert world coordinate system points to camera coordinate system points.
Distortion coefficients describe the degree of lens distortion in the camera, including radial distortion and tangential distortion. They are used to correct geometric distortions in images.
Calibration is the process of determining a camera’s intrinsic and extrinsic parameters using specific algorithms and calibration boards, aiming to improve the accuracy of measurement and image quality.
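The sketch below shows how intrinsic parameters, extrinsic parameters, and distortion coefficients are used together to project a 3D point onto the image plane. The numeric values are placeholders; in practice they come from calibration.

```python
import cv2
import numpy as np

# Placeholder intrinsics: focal lengths fx, fy and principal point cx, cy (in pixels).
K = np.array([[700.0,   0.0, 640.0],
              [  0.0, 700.0, 360.0],
              [  0.0,   0.0,   1.0]])
dist = np.array([0.1, -0.05, 0.0, 0.0, 0.0])  # OpenCV order: k1, k2, p1, p2, k3

# Placeholder extrinsics: rotation (Rodrigues vector) and translation from world to camera.
rvec = np.zeros(3)
tvec = np.zeros(3)

# Project a world point 1 m in front of the camera onto the image plane.
world_points = np.array([[0.1, 0.0, 1.0]])
image_points, _ = cv2.projectPoints(world_points, rvec, tvec, K, dist)
print(image_points)  # pixel coordinates of the projected point
```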
Focal length refers to the distance from the camera lens’s optical center to the image sensor. It affects the field of view and magnification of the image.
Aperture is a device controlling the amount of light entering the camera lens. A larger aperture (smaller f value) allows more light in, suitable for low-light environments.
Camera Performance Metrics
Z accuracy is the average deviation between measured distance values and true distance values in the Z-direction.
Spatial noise is the degree of deviation of all pixel points in the central region of the field of view from an ideal (fitted) plane.
Temporal noise is the degree of fluctuation of depth values for all pixel points in the central region of the field of view over time.
Pixel size is the actual physical distance corresponding to the spacing between adjacent pixels in the depth image (unit: mm).
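A hedged numpy sketch of how metrics of this kind are commonly computed from a stack of depth frames of a flat target; the exact procedures used for a given camera's datasheet may differ, and the placeholder data below is synthetic.

```python
import numpy as np

# depth_frames: N depth maps (millimeters) of a flat target, cropped to the central region.
depth_frames = np.random.normal(1000.0, 1.0, size=(50, 100, 100))  # placeholder data
true_distance_mm = 1000.0

# Z accuracy: average deviation of the measured distance from the true distance.
z_accuracy = np.mean(depth_frames) - true_distance_mm

# Spatial noise: residual of the time-averaged surface against a least-squares fitted plane.
mean_surface = depth_frames.mean(axis=0)
h, w = mean_surface.shape
ys, xs = np.mgrid[0:h, 0:w]
A = np.column_stack([xs.ravel(), ys.ravel(), np.ones(h * w)])
coeffs, *_ = np.linalg.lstsq(A, mean_surface.ravel(), rcond=None)
spatial_noise = np.std(mean_surface.ravel() - A @ coeffs)

# Temporal noise: per-pixel fluctuation of depth values over time, averaged over the region.
temporal_noise = np.mean(np.std(depth_frames, axis=0))
```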
Image Basics
Depth map is a 16-bit single-channel matrix consisting of depth data for all points within the depth camera’s field of view. For intuitive visualization of different distance values, the depth data is typically mapped to the RGB color space in SDK sample programs, resulting in an 8-bit RGB bitmap output.
Each pixel value in the depth map represents the vertical distance from a point on an object to the plane perpendicular to the axis of the left monochrome lens and passing through the lens optical center (the depth camera's optical zero point). Depth data is measured in millimeters. A value of 0 indicates no depth data. Depth data does not include extrinsic parameters, but intrinsic parameters are provided for converting the depth map to point cloud data. Active stereo cameras output depth data without distortion; ToF (Time-of-Flight) cameras output depth data with distortion.
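A minimal sketch of the visualization described above: scaling the 16-bit depth values to 8 bits and mapping them to an RGB colormap with OpenCV. The file name and scaling range are assumptions chosen for illustration.

```python
import cv2
import numpy as np

depth_u16 = cv2.imread("depth.png", cv2.IMREAD_UNCHANGED)  # 16-bit single-channel depth map in mm

# Assume the range of interest is 0-4000 mm and scale it to 0-255 for display.
depth_8bit = np.clip(depth_u16 / 4000.0 * 255.0, 0, 255).astype(np.uint8)
depth_color = cv2.applyColorMap(depth_8bit, cv2.COLORMAP_JET)  # 8-bit BGR visualization
cv2.imwrite("depth_color.png", depth_color)
```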
Depth Map Data Format
Point cloud is a data matrix composed of point cloud information for all points within the depth camera’s field of view. Each point’s point cloud information consists of three-dimensional coordinates (x, y, z). Points without three-dimensional spatial information are represented as (x, y, 0).
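A minimal sketch, assuming a pinhole model with intrinsics fx, fy, cx, cy, of how a depth map can be back-projected to point cloud coordinates; SDKs typically provide a dedicated conversion function, so this is only an illustration of the geometry.

```python
import numpy as np

def depth_to_point_cloud(depth_mm, fx, fy, cx, cy):
    """Back-project a depth map (millimeters) into camera-space (x, y, z) points."""
    h, w = depth_mm.shape
    vs, us = np.mgrid[0:h, 0:w]          # pixel row (v) and column (u) indices
    z = depth_mm.astype(np.float32)
    x = (us - cx) * z / fx
    y = (vs - cy) * z / fy
    # Pixels without valid depth have z == 0, so their coordinates collapse to (0, 0, 0).
    return np.dstack([x, y, z])
```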
Point cloud data format
Grayscale images (mono images) are output by monochrome image sensors. When depth map output is enabled, some models of Percipio 3D cameras output processed grayscale images; the original grayscale image can be viewed by disabling the depth map output. Grayscale images are divided into left and right grayscale images, both of which carry intrinsic and distortion parameters. Because the left grayscale image and the depth image share the same coordinate system, the left grayscale image does not have extrinsic parameters. Active stereo cameras output left and right grayscale images; ToF (Time-of-Flight) cameras output left grayscale images.
Color images (RGB images) are outputs from color image sensors. Color image sensor components provide intrinsic, extrinsic, and distortion parameters. Different models of Percipio 3D cameras output different types of color images.
Sensors with hardware ISP modules output standard YUYV422/JPEG images, which can be displayed as proper color images after OpenCV processing.
Sensors without hardware ISP modules output RAW Bayer images, which may exhibit color bias and need software ISP processing (e.g., white balance) before they can be displayed as normal color images. Sensors without hardware ISP modules ensure that their output image data is synchronized with the grayscale image data.
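As a rough illustration, OpenCV can convert both formats to a displayable BGR image. The exact YUYV and Bayer pattern codes depend on the sensor, so the codes and placeholder frames below are assumptions; the white balance shown is a simple gray-world correction, not a full software ISP.

```python
import cv2
import numpy as np

# Placeholder frames standing in for raw sensor output.
yuyv_frame = np.zeros((480, 640, 2), dtype=np.uint8)                      # packed YUYV422
bayer_frame = np.random.randint(0, 256, (480, 640), dtype=np.uint8)       # RAW Bayer mosaic

# Sensor with a hardware ISP: YUYV422 frame -> BGR (pattern code is an assumption).
bgr_from_yuyv = cv2.cvtColor(yuyv_frame, cv2.COLOR_YUV2BGR_YUY2)

# Sensor without a hardware ISP: RAW Bayer frame -> BGR, then a simple software white balance.
bgr_from_bayer = cv2.cvtColor(bayer_frame, cv2.COLOR_BayerBG2BGR)
b, g, r = cv2.split(bgr_from_bayer.astype("float32"))
b *= g.mean() / b.mean()   # gray-world white balance: scale channels to a common mean
r *= g.mean() / r.mean()
balanced = cv2.merge([b, g, r]).clip(0, 255).astype("uint8")
```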
Outliers are abnormal points that appear suddenly in depth images or on point cloud surfaces due to noise, reflections, and similar effects.
In ToF (Time-of-Flight) cameras, outliers are typically caused by multipath interference: multiple emitted light rays reach the same pixel of the ToF sensor via different reflection paths, or a single ray is reflected multiple times into the same pixel, producing anomalies in the ToF depth image.
Noise refers to rough parts or random changes in image information (grayscale, brightness, etc.) in output images.
Signal-to-noise ratio (SNR) is the ratio between the signal (useful information in an image) and the noise (random interference or unwanted information in the image). SNR is typically expressed in decibels (dB), with higher values indicating better image quality and less noise.
Time synchronization means that the camera's color image and depth image data are output temporally synchronized.
Contrast refers to the difference in brightness or color between objects or areas in an image or display. It helps an object stand out against a background with different brightness or color.
High contrast: A greater difference between light and dark areas, resulting in sharper, more vivid images with stronger visual impact, though details may be lost, particularly in the highlights and shadows.
Low contrast: Smaller differences between light and dark areas, resulting in softer and less striking images but preserving more details.
Brightness refers to the intensity of light and the overall lightness or darkness in an image. It is an important factor affecting the visual effect of an image.
In digital imaging, brightness is typically represented by grayscale values. For example, in an 8-bit image, grayscale values range from 0 to 255, where:
0 represents complete black (no light).
255 represents complete white (maximum light intensity).
For color images, brightness can be calculated by weighting RGB (red, green, blue) values. A common formula is: Brightness = 0.299 * R + 0.587 * G + 0.114 * B. This reflects human eye sensitivity, with green contributing most to brightness, followed by red, and blue contributing the least.
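A small sketch applying the weighting above to compute per-pixel brightness from an RGB image; the placeholder image is only for demonstration.

```python
import numpy as np

def brightness(rgb):
    """Per-pixel brightness of an RGB image using the luma weights 0.299, 0.587, 0.114."""
    r = rgb[..., 0].astype(np.float32)
    g = rgb[..., 1].astype(np.float32)
    b = rgb[..., 2].astype(np.float32)
    return 0.299 * r + 0.587 * g + 0.114 * b

rgb = np.full((2, 2, 3), 255, dtype=np.uint8)  # a pure-white placeholder image
print(brightness(rgb))  # 255 everywhere: maximum brightness
```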
Grayscale is a way to represent brightness information in image processing. A grayscale image has no color information; each pixel only has one intensity value representing brightness levels from black to white. Grayscale values usually range from 0 to 255, where 0 indicates black and 255 indicates white.
Image Processing
RGBD Registration refers to the process of aligning depth information with color information in an RGBD (Red-Green-Blue-Depth) image. There are two main approaches:
D2C (Depth To Color): each pixel in the depth image is mapped to its corresponding position in the color image based on the intrinsic and extrinsic parameters of both the depth and color cameras, producing an aligned RGBD image (a minimal sketch follows this list).
C2D (Color To Depth): each pixel in the color image is mapped to its corresponding position in the depth image based on the intrinsic and extrinsic parameters of both the depth and color cameras, also producing an aligned RGBD image.
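A minimal D2C sketch under simplifying assumptions (pinhole models, no lens distortion): each valid depth pixel is back-projected with the depth camera's intrinsics, transformed into the color camera's coordinate system with the extrinsics, and re-projected with the color camera's intrinsics. The function and variable names are illustrative.

```python
import numpy as np

def depth_pixel_to_color_pixel(u, v, z_mm, K_depth, K_color, R, t):
    """Map one depth pixel (u, v) with depth z_mm to a pixel in the color image (D2C)."""
    # Back-project into the depth camera's coordinate system.
    x = (u - K_depth[0, 2]) * z_mm / K_depth[0, 0]
    y = (v - K_depth[1, 2]) * z_mm / K_depth[1, 1]
    p_depth = np.array([x, y, z_mm])
    # Transform into the color camera's coordinate system using the extrinsics.
    p_color = R @ p_depth + t
    # Project onto the color image plane.
    u_c = K_color[0, 0] * p_color[0] / p_color[2] + K_color[0, 2]
    v_c = K_color[1, 1] * p_color[1] / p_color[2] + K_color[1, 2]
    return u_c, v_c
```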
Digital gain amplifies digital signals from the image sensor. Increasing digital gain will increase overall brightness in the image, but it also introduces more noise, leading to decreased image quality and increased granulation.
Analog gain amplifies analog signals from the image sensor. Increasing analog gain will also increase overall brightness in the image. Adjusting analog gain helps optimize the sensor’s adaptability to low-light environments while maintaining certain image quality standards.
Exposure time is the length of time during which the image sensor is exposed to light. Longer exposure times result in brighter images.
Auto Exposure (AE): automatically adjusts camera settings (such as shutter, aperture, and ISO sensitivity) to ensure proper brightness and contrast in captured images.
Auto White Balance (AWB): the color temperature of ambient light affects the appearance of color images. AWB compensates for color deviations caused by the ambient color temperature and the camera's inherent color deviations by automatically adjusting the ratios of the blue, green, and red channels so that the resulting images correctly reflect the true colors of the objects.
Auto Exposure Region of Interest (ROI): Dynamically adjusts exposure time based on brightness values within a specified region of the image.
Distortion correction: corrects image distortion caused by the optical properties of the lens, bringing the image closer to the actual scene.
High Dynamic Range (HDR): combines multiple images taken at different exposure times to preserve details in both the bright and dark areas of the scene.
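Some cameras perform this combination internally; as a rough software illustration only, OpenCV's Mertens exposure fusion merges a bracketed set of exposures into one image that retains highlight and shadow detail. The file names are placeholders.

```python
import cv2

# Load a bracketed exposure sequence (placeholder file names).
exposures = [cv2.imread(name) for name in ("short.jpg", "medium.jpg", "long.jpg")]

# Mertens exposure fusion blends the frames without needing the exact exposure times.
fused = cv2.createMergeMertens().process(exposures)          # float image in [0, 1]
cv2.imwrite("hdr_fused.jpg", (fused * 255).clip(0, 255).astype("uint8"))
```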
Software Components
Software Development Kit (SDK) is a collection of tools, libraries, documentation, and sample code designed to assist developers in creating applications or software systems. It provides the necessary resources for integrating third-party software into existing projects.
An Application Programming Interface (API) is a set of definitions and protocols used to build and integrate software applications. It enables communication and data exchange between different software systems.
Reliability
Laser safety class is a system used to categorize the potential hazards of laser equipment based on international standards (such as IEC 60825-1). Different classes indicate the potential risks to the eyes and skin, ranging from Class 1 (safe) to Class 4 (high risk). In 3D cameras, the laser safety class determines the required safety standards for users during operation.
Temperature drift refers to the phenomenon where changes in environmental temperature cause changes in the performance parameters of 3D cameras (such as focal length and depth measurement accuracy). Temperature drift can affect the accuracy and stability of the camera, so temperature compensation and calibration need to be considered in the design and use of 3D cameras.