Glossary and definitions
Camera type
A 3D camera is a device that captures three-dimensional images. It uses various techniques (such as stereo vision, time-of-flight, and structured light) to acquire information about the shape and depth of objects. 3D cameras can generate point cloud data with 3D spatial coordinate information, allowing computers to understand and process objects in three-dimensional space.
An active stereo camera combines binocular stereo vision with an active illumination source (such as structured light). It projects known light patterns onto the scene and uses a binocular camera to capture the deformation of these patterns, thereby improving the accuracy and robustness of depth measurement. Active stereo cameras perform well in low-light conditions and are suitable for 3D imaging in a variety of complex environments.
A Time-of-Flight (ToF) camera is a type of 3D camera that calculates distance by measuring the time between emitting a light signal and receiving its reflection. ToF cameras are widely used in fields such as robot navigation, gesture recognition, and 3D scanning. Based on their working principles and application scenarios, ToF cameras can be divided into the following types (the distance formulas behind these principles are sketched after the list):
Direct ToF Cameras
Direct ToF cameras measure distance by calculating the time difference between light pulses emitted and received. These cameras typically use lasers or LEDs as light sources and have high precision and fast response times. Direct ToF cameras are suitable for depth measurement applications requiring high precision, such as industrial automation, robot navigation, and 3D scanning.
Indirect ToF Cameras
Indirect ToF cameras measure distance by detecting the phase shift of modulated light signals. These cameras usually use modulated infrared light sources and have good anti-interference capabilities and low power consumption. Indirect ToF cameras are widely used in consumer electronics products, such as smartphones, tablets, and game consoles, for applications like gesture recognition and facial recognition.
Pulse ToF Cameras
Pulse ToF cameras emit short light pulses and measure their return time to calculate distance. These cameras typically use laser pulses and have high precision and long measurement ranges. Pulse ToF cameras are suitable for long-distance measurement applications, such as drone mapping, autonomous driving, and security monitoring.
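As a rough numerical illustration of the two measurement principles above, the sketch below computes distance from a measured round-trip time (direct/pulse ToF) and from a measured phase shift of a modulated signal (indirect ToF). The function and variable names are illustrative only and are not part of any camera API.

```python
import math

C = 299_792_458.0  # speed of light in m/s

def direct_tof_distance(round_trip_time_s: float) -> float:
    """Direct / pulse ToF: light travels to the target and back, so halve the path."""
    return C * round_trip_time_s / 2.0

def indirect_tof_distance(phase_shift_rad: float, modulation_freq_hz: float) -> float:
    """Indirect ToF: distance is proportional to the phase shift of the modulated signal."""
    return C * phase_shift_rad / (4.0 * math.pi * modulation_freq_hz)

# A round trip of ~6.67 ns corresponds to roughly 1 m.
print(direct_tof_distance(6.67e-9))          # ~1.0 m
# A pi/2 phase shift at 30 MHz modulation corresponds to ~1.25 m.
print(indirect_tof_distance(math.pi / 2, 30e6))  # ~1.25 m
```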
Stereo matching is a computer vision technique used to find corresponding pixel points between the two images captured by a binocular stereo camera system. By comparing pixels in the left and right images and computing their disparity, depth information about objects in the scene can be inferred. Stereo matching is a critical step in binocular stereo vision systems and directly affects the accuracy and quality of depth maps.
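A minimal sketch of the disparity-to-depth relationship used after stereo matching, Z = f · B / d, where f is the focal length in pixels, B the baseline, and d the disparity. The matcher parameters, file names, and numeric values below are illustrative assumptions, not values required by any particular camera.

```python
import cv2
import numpy as np

# Compute a disparity map from a rectified left/right grayscale pair.
left = cv2.imread("left.png", cv2.IMREAD_GRAYSCALE)
right = cv2.imread("right.png", cv2.IMREAD_GRAYSCALE)
matcher = cv2.StereoSGBM_create(minDisparity=0, numDisparities=128, blockSize=5)
disparity = matcher.compute(left, right).astype(np.float32) / 16.0  # SGBM returns fixed-point values

# Convert disparity to depth: Z = f * B / d (f in pixels, B in the desired depth unit).
focal_px = 700.0      # example focal length in pixels
baseline_mm = 50.0    # example baseline in millimeters
depth_mm = np.where(disparity > 0, focal_px * baseline_mm / disparity, 0)
```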
Structured light is a technique used to calculate depth information by projecting known light patterns (such as stripes, grids, or dot matrices) onto a scene and analyzing the distortions of these patterns on object surfaces. A typical structured light system consists of a projector and a camera. By capturing and processing the deformed patterns, it generates high-precision three-dimensional images. Structured light technology is widely applied in 3D scanning, industrial inspection, gesture recognition, and other fields.
Camera widgets
Color sensor is a type of sensor capable of capturing and recording color images. It converts light signals into electrical signals through photodiodes and filter arrays, and then processes these signals to generate color images.
Lens assembly typically consists of one or more optical glass components arranged in a specific configuration, usually concave lenses, convex lenses, or combinations thereof. The lens assembly converges and refracts light rays, projecting them onto the image sensor.
Image Signal Processor (ISP) is used for post-processing color images.
Mono sensor is a type of sensor capable of detecting and capturing light. It converts light signals into electrical signals through photodiodes and then processes these signals to generate grayscale images.
Laser projector is a component used to emit structured light patterns.
Field-Programmable Gate Array (FPGA) is a type of signal processing chip that offers highly parallel processing capabilities. It can simultaneously execute numerous logical operations, making it extremely efficient for tasks requiring extensive calculations. In image processing applications such as image preprocessing, feature extraction, and image recognition, FPGAs can significantly improve efficiency through parallel processing techniques.
System on Chip (SoC) is an integrated circuit that integrates the major components of an electronic system into a single chip. An SoC can handle digital signals, analog signals, mixed-signal processing, and even higher-frequency signals. SoCs are commonly used in embedded systems.
IR floodlight is used to illuminate the environment with infrared light for infrared imaging purposes.
Camera Basics
Baseline is the distance between the two cameras in a stereo camera system. A longer baseline results in higher depth measurement accuracy but increases system complexity.
Disparity is the difference in position of the same object in left and right images in a binocular stereo camera system. Larger disparity indicates closer objects; smaller disparity indicates farther objects.
Field of View is the angular range of the scene that the camera can capture, typically expressed as horizontal and vertical angles. A larger field of view covers a wider area.
Measurement Range refers to the distance range within which the depth camera can accurately measure. Depth measurements beyond this range may be inaccurate or unmeasurable.
Clearance distance refers to the shortest perpendicular distance between the front surface of the camera and the near boundary of the field of view.
Image resolution refers to the number of pixels a camera captures in an image, typically expressed as width × height. Higher resolution leads to clearer images.
Shutter is a device controlling exposure time in cameras. Faster shutter speeds result in shorter exposure times, suitable for capturing fast-moving objects.
Depth of field refers to the range of distances within which objects appear clearly focused in an image. A larger depth of field means more objects at different distances remain in focus.
The camera coordinate system is a three-dimensional coordinate system referenced to the camera, used to describe the position and direction of objects within the camera’s field of view.
Intrinsic parameters describe the camera’s internal optical characteristics, including focal length, principal point location, and distortion coefficients, used to project three-dimensional points onto a two-dimensional image plane.
Extrinsic parameters describe the camera’s position and orientation in the world coordinate system, including rotation matrices and translation vectors. They are used to convert world coordinate system points to camera coordinate system points.
Distortion coefficients describe the degree of lens distortion in the camera, including radial distortion and tangential distortion. They are used to correct geometric distortions in images.
Calibration is the process of determining a camera’s intrinsic and extrinsic parameters using specific algorithms and calibration boards, aiming to improve the accuracy of measurement and image quality.
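The sketch below shows how intrinsic parameters, extrinsic parameters, and distortion coefficients are used together to project a 3D point onto the image plane. The numeric values are placeholders; in practice they come from calibration.

```python
import cv2
import numpy as np

# Placeholder intrinsics: focal lengths fx, fy and principal point cx, cy (in pixels).
K = np.array([[700.0,   0.0, 640.0],
              [  0.0, 700.0, 360.0],
              [  0.0,   0.0,   1.0]])
dist = np.array([0.1, -0.05, 0.0, 0.0, 0.0])  # OpenCV order: k1, k2, p1, p2, k3

# Placeholder extrinsics: rotation (Rodrigues vector) and translation from world to camera.
rvec = np.zeros(3)
tvec = np.zeros(3)

# Project a world point 1 m in front of the camera onto the image plane.
world_points = np.array([[0.1, 0.0, 1.0]])
image_points, _ = cv2.projectPoints(world_points, rvec, tvec, K, dist)
print(image_points)  # pixel coordinates of the projected point
```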
Focal length refers to the distance from the camera lens’s optical center to the image sensor. It affects the field of view and magnification of the image.
Aperture is a device controlling the amount of light entering the camera lens. A larger aperture (smaller f value) allows more light in, suitable for low-light environments.
Camera Performance Metrics
Z accuracy is the average deviation between measured distance values and true distance values in the Z-direction.
Spatial noise is the degree of deviation of all pixel points in the central region of the field of view from an ideal (fitted) plane.
Temporal noise is the degree of fluctuation of depth values for all pixel points in the central region of the field of view over time.
Pixel size is the actual physical distance corresponding to the spacing between adjacent pixels in the depth image (unit: mm).
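A hedged numpy sketch of how metrics of this kind are commonly computed from a stack of depth frames of a flat target; the exact procedures used for a given camera's datasheet may differ, and the placeholder data below is synthetic.

```python
import numpy as np

# depth_frames: N depth maps (millimeters) of a flat target, cropped to the central region.
depth_frames = np.random.normal(1000.0, 1.0, size=(50, 100, 100))  # placeholder data
true_distance_mm = 1000.0

# Z accuracy: average deviation of the measured distance from the true distance.
z_accuracy = np.mean(depth_frames) - true_distance_mm

# Spatial noise: residual of the time-averaged surface against a least-squares fitted plane.
mean_surface = depth_frames.mean(axis=0)
h, w = mean_surface.shape
ys, xs = np.mgrid[0:h, 0:w]
A = np.column_stack([xs.ravel(), ys.ravel(), np.ones(h * w)])
coeffs, *_ = np.linalg.lstsq(A, mean_surface.ravel(), rcond=None)
spatial_noise = np.std(mean_surface.ravel() - A @ coeffs)

# Temporal noise: per-pixel fluctuation of depth values over time, averaged over the region.
temporal_noise = np.mean(np.std(depth_frames, axis=0))
```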
Image Basics
Depth map is a 16-bit single-channel matrix consisting of depth data for all points within the depth camera’s field of view. For intuitive visualization of different distance values, the depth data is typically mapped to the RGB color space in SDK sample programs, resulting in an 8-bit RGB bitmap output.
Each pixel value in the depth map represents the vertical distance from a point on an object to the plane perpendicular to the axis of the left monochrome lens and passing through the lens optical center (the depth camera's optical zero point). Depth data is measured in millimeters. A value of 0 indicates no depth data. Depth data does not include extrinsic parameters, but intrinsic parameters are provided for converting the depth map to point cloud data. Active stereo cameras output depth data without distortion; ToF (Time-of-Flight) cameras output depth data with distortion.
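A minimal sketch of the visualization described above: scaling the 16-bit depth values to 8 bits and mapping them to an RGB colormap with OpenCV. The file name and scaling range are assumptions chosen for illustration.

```python
import cv2
import numpy as np

depth_u16 = cv2.imread("depth.png", cv2.IMREAD_UNCHANGED)  # 16-bit single-channel depth map in mm

# Assume the range of interest is 0-4000 mm and scale it to 0-255 for display.
depth_8bit = np.clip(depth_u16 / 4000.0 * 255.0, 0, 255).astype(np.uint8)
depth_color = cv2.applyColorMap(depth_8bit, cv2.COLORMAP_JET)  # 8-bit BGR visualization
cv2.imwrite("depth_color.png", depth_color)
```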
Depth Map Data Format
Point cloud is a data matrix composed of point cloud information for all points within the depth camera’s field of view. Each point’s point cloud information consists of three-dimensional coordinates (x, y, z). Points without three-dimensional spatial information are represented as (x, y, 0).
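A minimal sketch, assuming a pinhole model with intrinsics fx, fy, cx, cy, of how a depth map can be back-projected to point cloud coordinates; SDKs typically provide a dedicated conversion function, so this is only an illustration of the geometry.

```python
import numpy as np

def depth_to_point_cloud(depth_mm, fx, fy, cx, cy):
    """Back-project a depth map (millimeters) into camera-space (x, y, z) points."""
    h, w = depth_mm.shape
    vs, us = np.mgrid[0:h, 0:w]          # pixel row (v) and column (u) indices
    z = depth_mm.astype(np.float32)
    x = (us - cx) * z / fx
    y = (vs - cy) * z / fy
    # Pixels without valid depth have z == 0, so their coordinates collapse to (0, 0, 0).
    return np.dstack([x, y, z])
```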
Point cloud data format
Grayscale images (mono images) are output by monochrome image sensors. When depth map output is enabled, some models of Percipio 3D cameras output processed grayscale images; the original grayscale image can be viewed by disabling the depth map output. Grayscale images are divided into left and right grayscale images, both of which carry intrinsic and distortion parameters. Because the left grayscale image and the depth image share the same coordinate system, the left grayscale image does not have extrinsic parameters. Active stereo cameras output left and right grayscale images; ToF (Time-of-Flight) cameras output left grayscale images.
Color images (RGB images) are outputs from color image sensors. Color image sensor components provide intrinsic, extrinsic, and distortion parameters. Different models of Percipio 3D cameras output different types of color images.
Sensors with hardware ISP modules output standard YUYV422/JPEG images, which can be displayed as proper color images after OpenCV processing.
Sensors without hardware ISP modules output RAW Bayer images, which may exhibit color bias and need software ISP processing (e.g., white balance) before they can be displayed as normal color images. Sensors without hardware ISP modules ensure that their output image data is synchronized with the grayscale image data.
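As a rough illustration, OpenCV can convert both formats to a displayable BGR image. The exact YUYV and Bayer pattern codes depend on the sensor, so the codes and placeholder frames below are assumptions; the white balance shown is a simple gray-world correction, not a full software ISP.

```python
import cv2
import numpy as np

# Placeholder frames standing in for raw sensor output.
yuyv_frame = np.zeros((480, 640, 2), dtype=np.uint8)                      # packed YUYV422
bayer_frame = np.random.randint(0, 256, (480, 640), dtype=np.uint8)       # RAW Bayer mosaic

# Sensor with a hardware ISP: YUYV422 frame -> BGR (pattern code is an assumption).
bgr_from_yuyv = cv2.cvtColor(yuyv_frame, cv2.COLOR_YUV2BGR_YUY2)

# Sensor without a hardware ISP: RAW Bayer frame -> BGR, then a simple software white balance.
bgr_from_bayer = cv2.cvtColor(bayer_frame, cv2.COLOR_BayerBG2BGR)
b, g, r = cv2.split(bgr_from_bayer.astype("float32"))
b *= g.mean() / b.mean()   # gray-world white balance: scale channels to a common mean
r *= g.mean() / r.mean()
balanced = cv2.merge([b, g, r]).clip(0, 255).astype("uint8")
```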
Outliers are abnormal points that appear suddenly in depth images or on point cloud surfaces due to noise, reflections, and similar effects.
In ToF (Time-of-Flight) cameras, outliers are typically caused by multipath interference: multiple emitted light rays reach the same pixel of the ToF sensor via different reflection paths, or a single ray is reflected multiple times into the same pixel, producing anomalies in the ToF depth image.
Noise refers to rough parts or random changes in image information (grayscale, brightness, etc.) in output images.
Signal-to-noise ratio (SNR) is the ratio between the signal (useful information in an image) and the noise (random interference or unwanted information in the image). SNR is typically expressed in decibels (dB), with higher values indicating better image quality and less noise.
Time synchronization means that the camera's color image and depth image data are output temporally synchronized.
Contrast refers to the difference in brightness or color between objects or areas in an image or display. It helps an object stand out against a background with different brightness or color.
High contrast: A greater difference between light and dark areas, resulting in sharper, more vivid images with stronger visual impact, though details may be lost, particularly in the highlights and shadows.
Low contrast: Smaller differences between light and dark areas, resulting in softer and less striking images but preserving more details.
Brightness refers to the intensity of light and the overall lightness or darkness in an image. It is an important factor affecting the visual effect of an image.
In digital imaging, brightness is typically represented by grayscale values. For example, in an 8-bit image, grayscale values range from 0 to 255, where:
0 represents complete black (no light).
255 represents complete white (maximum light intensity).
For color images, brightness can be calculated by weighting RGB (red, green, blue) values. A common formula is: Brightness = 0.299 * R + 0.587 * G + 0.114 * B. This reflects human eye sensitivity, with green contributing most to brightness, followed by red, and blue contributing the least.
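A small sketch applying the weighting above to compute per-pixel brightness from an RGB image; the placeholder image is only for demonstration.

```python
import numpy as np

def brightness(rgb):
    """Per-pixel brightness of an RGB image using the luma weights 0.299, 0.587, 0.114."""
    r = rgb[..., 0].astype(np.float32)
    g = rgb[..., 1].astype(np.float32)
    b = rgb[..., 2].astype(np.float32)
    return 0.299 * r + 0.587 * g + 0.114 * b

rgb = np.full((2, 2, 3), 255, dtype=np.uint8)  # a pure-white placeholder image
print(brightness(rgb))  # 255 everywhere: maximum brightness
```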
Grayscale is a way to represent brightness information in image processing. A grayscale image has no color information; each pixel only has one intensity value representing brightness levels from black to white. Grayscale values usually range from 0 to 255, where 0 indicates black and 255 indicates white.
Image Processing
RGBD Registration refers to the process of aligning depth information with color information in an RGBD (Red-Green-Blue-Depth) image. There are two main approaches:
D2C (Depth To Color): each pixel in the depth image is mapped to its corresponding position in the color image based on the intrinsic and extrinsic parameters of both the depth and color cameras, producing an aligned RGBD image (a minimal sketch follows this list).
C2D (Color To Depth): each pixel in the color image is mapped to its corresponding position in the depth image based on the intrinsic and extrinsic parameters of both the depth and color cameras, also producing an aligned RGBD image.
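A minimal D2C sketch under simplifying assumptions (pinhole models, no lens distortion): each valid depth pixel is back-projected with the depth camera's intrinsics, transformed into the color camera's coordinate system with the extrinsics, and re-projected with the color camera's intrinsics. The function and variable names are illustrative.

```python
import numpy as np

def depth_pixel_to_color_pixel(u, v, z_mm, K_depth, K_color, R, t):
    """Map one depth pixel (u, v) with depth z_mm to a pixel in the color image (D2C)."""
    # Back-project into the depth camera's coordinate system.
    x = (u - K_depth[0, 2]) * z_mm / K_depth[0, 0]
    y = (v - K_depth[1, 2]) * z_mm / K_depth[1, 1]
    p_depth = np.array([x, y, z_mm])
    # Transform into the color camera's coordinate system using the extrinsics.
    p_color = R @ p_depth + t
    # Project onto the color image plane.
    u_c = K_color[0, 0] * p_color[0] / p_color[2] + K_color[0, 2]
    v_c = K_color[1, 1] * p_color[1] / p_color[2] + K_color[1, 2]
    return u_c, v_c
```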
Digital gain amplifies digital signals from the image sensor. Increasing digital gain will increase overall brightness in the image, but it also introduces more noise, leading to decreased image quality and increased granulation.
Analog gain amplifies analog signals from the image sensor. Increasing analog gain will also increase overall brightness in the image. Adjusting analog gain helps optimize the sensor’s adaptability to low-light environments while maintaining certain image quality standards.
Exposure time is the length of time during which the image sensor is exposed to light. Longer exposure times result in brighter images.
Auto Exposure (AE): automatically adjusts camera settings (such as shutter, aperture, and ISO sensitivity) to ensure proper brightness and contrast in captured images.
Auto White Balance (AWB): the color temperature of ambient light affects the appearance of color images. AWB compensates for color deviations caused by the ambient color temperature and the camera's inherent color deviations by automatically adjusting the ratios of the blue, green, and red channels so that the resulting images correctly reflect the true colors of the objects.
Auto Exposure Region of Interest (ROI): Dynamically adjusts exposure time based on brightness values within a specified region of the image.
Distortion correction: corrects image distortion caused by the optical properties of the lens, bringing the image closer to the actual scene.
High Dynamic Range (HDR): combines multiple images taken at different exposure times to preserve details in both the bright and dark areas of the scene.
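Some cameras perform this combination internally; as a rough software illustration only, OpenCV's Mertens exposure fusion merges a bracketed set of exposures into one image that retains highlight and shadow detail. The file names are placeholders.

```python
import cv2

# Load a bracketed exposure sequence (placeholder file names).
exposures = [cv2.imread(name) for name in ("short.jpg", "medium.jpg", "long.jpg")]

# Mertens exposure fusion blends the frames without needing the exact exposure times.
fused = cv2.createMergeMertens().process(exposures)          # float image in [0, 1]
cv2.imwrite("hdr_fused.jpg", (fused * 255).clip(0, 255).astype("uint8"))
```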
Software Components
Software Development Kit (SDK) is a collection of tools, libraries, documentation, and sample code designed to assist developers in creating applications or software systems. It provides the necessary resources for integrating third-party software into existing projects.
An Application Programming Interface (API) is a set of definitions and protocols used to build and integrate software applications. It enables communication and data exchange between different software systems.
Reliability
Laser safety class is a system used to categorize the potential hazards of laser equipment based on international standards (such as IEC 60825-1). Different classes indicate the potential risks to the eyes and skin, ranging from Class 1 (safe) to Class 4 (high risk). In 3D cameras, the laser safety class determines the required safety standards for users during operation.
Temperature drift refers to the phenomenon where changes in environmental temperature cause changes in the performance parameters of 3D cameras (such as focal length and depth measurement accuracy). Temperature drift can affect the accuracy and stability of the camera, so temperature compensation and calibration need to be considered in the design and use of 3D cameras.