Visual Descriptors

The most common perceptual categories of visual descriptors for still images are color, texture, and shape. Image sequences add one more dimension of perceptual saliency to these: motion.

  • Color descriptors. The dominant color descriptor specifies a small set of dominant colors for an image (typically 4–6), recording for each the percentage of image pixels it accounts for, together with the color variance and the spatial coherence of the colors. The color structure descriptor captures local color structure by sliding a structuring element over every location in the image and counting, for each color, the number of structuring-element positions in which it occurs. The color layout descriptor represents the spatial distribution of colors. The scalable color descriptor is a compact color histogram computed in the HSV color space and encoded with a Haar transform.
  • Texture descriptors. The homogeneous texture descriptor characterizes regional texture using local spatial-frequency statistics extracted by a bank of Gabor filters. The texture browsing descriptor gives a perceptual characterization of texture as a compact vector describing its regularity, coarseness, and directionality. The edge histogram descriptor represents the local edge distribution of an image as a histogram of the frequency and directionality of brightness changes in the image.
  • Shape descriptors. The region-based shape descriptor represents the distribution of all interior and boundary pixels that make up a shape by decomposing the shape into a set of basis functions with various angular and radial frequencies using the angular radial transformation (ART), a two-dimensional complex transform defined on a unit disk in polar coordinates. The contour-based shape descriptor represents the closed two-dimensional contour of an object or region in an image or video. The 3D shape descriptor is a representation-invariant description of three-dimensional mesh models; it expresses local geometric attributes of 3D surfaces as shape indices computed over the mesh from the two principal curvatures.
  • Motion descriptors. The camera motion descriptor represents global motion parameters that characterize a video scene at a particular time in terms of professional video camera movements: moving along the optical axis (dolly forward/backward), horizontal and vertical rotation (panning, tilting), horizontal and vertical transverse movement (tracking, booming), change of focal length (zooming), and rotation around the optical axis (rolling). The motion activity descriptor indicates the intensity and direction of motion and the spatial and temporal distribution of activity. The motion trajectory descriptor represents the displacement of objects over time as a spatiotemporal localization, given as a list of vectors holding positions relative to a reference point. The parametric motion descriptor describes the global motion of video objects using a classic parametric model (translational, scaling, affine, perspective, or quadratic).
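The dominant color descriptor's main ingredients, representative colors together with the share of pixels each covers and a per-color variance, can be illustrated with a minimal sketch. It rests on simplifying assumptions: `dominant_colors` is a hypothetical helper, colors are grouped by uniform quantization rather than by a clustering step, and spatial coherence is not computed.

```python
from collections import defaultdict


def dominant_colors(pixels, levels=4, k=4):
    """Return up to k (mean_color, fraction_of_pixels, variance) tuples.

    pixels: non-empty list of (r, g, b) tuples with channels in 0..255.
    Hypothetical helper: pixels are grouped by uniform quantization
    (`levels` steps per channel) instead of a proper clustering step.
    """
    step = 256 // levels
    groups = defaultdict(list)
    for p in pixels:
        # Pixels falling into the same quantization cell form one color group.
        groups[tuple(c // step for c in p)].append(p)
    result = []
    # Largest groups first: these are treated as the "dominant" colors.
    for members in sorted(groups.values(), key=len, reverse=True)[:k]:
        n = len(members)
        channels = list(zip(*members))
        mean = tuple(sum(ch) / n for ch in channels)
        # Per-channel variance of the group's pixels around its mean color.
        var = tuple(sum((c - m) ** 2 for c in ch) / n for ch, m in zip(channels, mean))
        result.append((mean, n / len(pixels), var))
    return result
```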
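The color structure counting can be sketched as follows, assuming the image has already been quantized to integer color indices. The helper name and the small default structuring element (`se=3`) are illustrative assumptions. The key point is that a color is counted once per window position in which it appears, not once per pixel.

```python
def color_structure_histogram(img, num_colors, se=3):
    """img: 2D list of color indices in range(num_colors); se: window side.

    Hypothetical helper. For every placement of an se x se structuring
    element, each color present in the window adds 1 to its bin; bins are
    normalized by the number of window placements.
    """
    h, w = len(img), len(img[0])
    hist = [0] * num_colors
    for y in range(h - se + 1):
        for x in range(w - se + 1):
            # Set of colors occurring anywhere inside this window placement.
            present = {img[y + dy][x + dx] for dy in range(se) for dx in range(se)}
            for c in present:
                hist[c] += 1
    windows = (h - se + 1) * (w - se + 1)
    return [v / windows for v in hist]
```

For example, a 4x4 checkerboard and a 4x4 image split into two solid halves have identical plain histograms, but with `se=2` the checkerboard scores 1.0 for both colors while the split image scores 6/9 for each, reflecting how scattered the colors are.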
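The text only says that the color layout descriptor represents the spatial distribution of colors. One common way to capture that, sketched here under stated assumptions, is to average a color channel over a coarse 8x8 grid and take a 2D DCT of the grid, so that a few low-frequency coefficients summarize the layout. The helper names are hypothetical, only one channel is processed, and coefficient selection and quantization are omitted.

```python
import math


def grid_average(channel, n=8):
    """Average a 2D list (height and width >= n) over an n x n grid of cells."""
    h, w = len(channel), len(channel[0])
    cells = []
    for i in range(n):
        row = []
        for j in range(n):
            vals = [channel[y][x]
                    for y in range(i * h // n, (i + 1) * h // n)
                    for x in range(j * w // n, (j + 1) * w // n)]
            row.append(sum(vals) / len(vals))
        cells.append(row)
    return cells


def dct2(block):
    """Orthonormal 2D DCT-II; out[u][v] has u = vertical, v = horizontal frequency."""
    n = len(block)
    scale = [math.sqrt(1 / n)] + [math.sqrt(2 / n)] * (n - 1)
    out = [[0.0] * n for _ in range(n)]
    for u in range(n):
        for v in range(n):
            s = 0.0
            for r in range(n):
                for c in range(n):
                    s += (block[r][c]
                          * math.cos((2 * r + 1) * u * math.pi / (2 * n))
                          * math.cos((2 * c + 1) * v * math.pi / (2 * n)))
            out[u][v] = scale[u] * scale[v] * s
    return out


def color_layout(channel):
    # Hypothetical helper: coarse spatial averages, then frequency transform.
    return dct2(grid_average(channel))
```

A flat image puts all its energy in the DC coefficient `[0][0]`, while a left-to-right ramp shows up in the first horizontal coefficient `[0][1]` and leaves the vertical ones at zero.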
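The two steps named for the scalable color descriptor, an HSV histogram followed by a Haar transform, can be sketched with the standard library. The 16 x 4 x 4 hue/saturation/value binning is an assumption for illustration, and the unnormalized sum/difference Haar variant shown here stands in for whatever exact scaling and coefficient quantization a real encoder applies.

```python
import colorsys


def hsv_histogram(pixels, hb=16, sb=4, vb=4):
    """Count (r, g, b) pixels (0..255) into hb x sb x vb HSV bins (assumed layout)."""
    hist = [0] * (hb * sb * vb)
    for r, g, b in pixels:
        h, s, v = colorsys.rgb_to_hsv(r / 255, g / 255, b / 255)
        # Clamp so that values of exactly 1.0 fall into the last bin.
        hi = min(int(h * hb), hb - 1)
        si = min(int(s * sb), sb - 1)
        vi = min(int(v * vb), vb - 1)
        hist[(hi * sb + si) * vb + vi] += 1
    return hist


def haar(values):
    """Unnormalized 1D Haar transform; len(values) must be a power of two.

    Output order: overall sum, coarsest difference, ..., finest differences.
    """
    out = list(values)
    n = len(out)
    while n > 1:
        half = n // 2
        sums = [out[2 * i] + out[2 * i + 1] for i in range(half)]
        diffs = [out[2 * i] - out[2 * i + 1] for i in range(half)]
        out[:n] = sums + diffs  # recurse on the sums in the next pass
        n = half
    return out
```

Because the first Haar output is the sum of all bins and later outputs add progressively finer detail, dropping the finest difference coefficients still leaves a coarser histogram summary, which is what makes this kind of encoding scalable.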
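The homogeneous texture idea, statistics of Gabor filter responses at several frequencies and orientations, can be sketched as a small spatial-domain filter bank. The frequencies, four orientations, σ, and kernel size are illustrative assumptions rather than a standardized channel layout; each filter contributes the mean and standard deviation of its response magnitude.

```python
import cmath
import math


def gabor_kernel(freq, theta, sigma, half):
    """Complex Gabor kernel: isotropic Gaussian times a wave along direction theta."""
    kernel = []
    for y in range(-half, half + 1):
        row = []
        for x in range(-half, half + 1):
            xr = x * math.cos(theta) + y * math.sin(theta)
            envelope = math.exp(-(x * x + y * y) / (2 * sigma * sigma))
            row.append(envelope * cmath.exp(2j * math.pi * freq * xr))
        kernel.append(row)
    return kernel


def filter_stats(img, kernel):
    """Mean and std of |response| over positions where the kernel fits fully."""
    half = len(kernel) // 2
    h, w = len(img), len(img[0])
    mags = []
    for y in range(half, h - half):
        for x in range(half, w - half):
            acc = 0j
            for dy in range(-half, half + 1):
                for dx in range(-half, half + 1):
                    acc += img[y + dy][x + dx] * kernel[dy + half][dx + half]
            mags.append(abs(acc))
    mean = sum(mags) / len(mags)
    std = math.sqrt(sum((m - mean) ** 2 for m in mags) / len(mags))
    return mean, std


def texture_features(img, freqs=(0.125, 0.25), n_orient=4, sigma=3.0, half=4):
    """Feature vector ordered by frequency, then orientation: [mean, std, ...]."""
    feats = []
    for f in freqs:
        for k in range(n_orient):
            theta = k * math.pi / n_orient
            feats.extend(filter_stats(img, gabor_kernel(f, theta, sigma, half)))
    return feats
```

On a 24x24 image of vertical stripes with a period of 4 pixels, the filter tuned to 0.25 cycles/pixel at orientation 0 responds strongly, while the same frequency at 90 degrees barely responds.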
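The edge histogram can be sketched by splitting the image into a 4x4 grid of sub-images, classifying small blocks with five 2x2 edge filters (vertical, horizontal, 45 degree, 135 degree, non-directional), and recording how often each edge type wins in each sub-image, for 16 x 5 = 80 bins. This is a simplified sketch: blocks are literal 2x2 pixel groups, and the filter coefficients and the threshold of 11 are assumptions.

```python
import math

R2 = math.sqrt(2)
# Assumed 2x2 edge filters, applied as a weighted sum over the block's pixels.
FILTERS = {
    "vertical": ((1, -1), (1, -1)),
    "horizontal": ((1, 1), (-1, -1)),
    "diag45": ((R2, 0), (0, -R2)),
    "diag135": ((0, R2), (-R2, 0)),
    "nondirectional": ((2, -2), (-2, 2)),
}


def edge_histogram(img, grid=4, threshold=11.0):
    """img: 2D list of intensities, at least 2*grid pixels on each side.

    Returns grid*grid*5 values: per sub-image, the fraction of 2x2 blocks
    whose strongest filter response (if >= threshold) is each edge type.
    """
    h, w = len(img), len(img[0])
    sub_h, sub_w = h // grid, w // grid
    hist = []
    for gy in range(grid):
        for gx in range(grid):
            counts = dict.fromkeys(FILTERS, 0)
            blocks = 0
            for by in range(gy * sub_h, (gy + 1) * sub_h - 1, 2):
                for bx in range(gx * sub_w, (gx + 1) * sub_w - 1, 2):
                    blocks += 1
                    block = ((img[by][bx], img[by][bx + 1]),
                             (img[by + 1][bx], img[by + 1][bx + 1]))
                    strengths = {
                        name: abs(sum(f[i][j] * block[i][j] for i in range(2) for j in range(2)))
                        for name, f in FILTERS.items()
                    }
                    best = max(strengths, key=strengths.get)
                    # Weak responses count as "no edge" and are not recorded.
                    if strengths[best] >= threshold:
                        counts[best] += 1
            hist.extend(counts[name] / blocks for name in FILTERS)
    return hist
```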
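The angular radial transformation can be sketched directly from its usual basis definition, V_nm(ρ, θ) = (1/2π) e^{jmθ} R_n(ρ) on the unit disk, with R_0(ρ) = 1 and R_n(ρ) = 2cos(πnρ) for n > 0. Taking coefficient magnitudes and normalizing by the (0, 0) coefficient makes the result insensitive to rotation. The 3 radial and 12 angular orders, the mapping of the image square onto the disk, and the helper name are assumptions of this sketch.

```python
import cmath
import math


def art_magnitudes(mask, n_max=3, m_max=12):
    """Normalized ART magnitudes of a square 0/1 mask (hypothetical helper).

    The unit disk is centered on the image with radius size/2; mask pixels
    outside the disk are ignored. Returns {(n, m): |F_nm| / |F_00|}.
    """
    size = len(mask)
    center = (size - 1) / 2
    radius = size / 2
    coeffs = {}
    for n in range(n_max):
        for m in range(m_max):
            acc = 0j
            for y in range(size):
                for x in range(size):
                    if not mask[y][x]:
                        continue
                    px, py = (x - center) / radius, (y - center) / radius
                    rho = math.hypot(px, py)
                    if rho > 1:
                        continue
                    theta = math.atan2(py, px)
                    radial = 1.0 if n == 0 else 2 * math.cos(math.pi * n * rho)
                    # Project onto the conjugate basis (1/2π) e^{-jmθ} R_n(ρ).
                    acc += radial * cmath.exp(-1j * m * theta) / (2 * math.pi)
            coeffs[(n, m)] = abs(acc)
    f00 = coeffs.pop((0, 0))
    return {key: value / f00 for key, value in coeffs.items()}
```

Rotating a 16x16 mask by 90 degrees only shifts every angle θ by a constant, which multiplies each coefficient by a unit-magnitude phase, so the normalized magnitudes are unchanged up to floating-point error.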
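The shape index used by the 3D shape descriptor maps a pair of principal curvatures k1 >= k2 to a single number in [0, 1]. The sketch below uses the form S = 1/2 - (1/π) arctan((k1 + k2)/(k1 - k2)), written with `atan2` so that umbilic points (k1 = k2) are handled, and then histograms the values collected over a mesh. The sign convention, the bin count, and the choice to skip planar points are assumptions.

```python
import math


def shape_index(k1, k2):
    """Shape index in [0, 1] from two principal curvatures; None for planar points."""
    if k1 < k2:
        k1, k2 = k2, k1  # enforce k1 >= k2
    if k1 == 0 and k2 == 0:
        return None  # planar point: the index is undefined
    # atan2 with a non-negative second argument lies in [-pi/2, pi/2].
    return 0.5 - math.atan2(k1 + k2, k1 - k2) / math.pi


def shape_spectrum(curvature_pairs, bins=10):
    """Normalized histogram of shape indices over (k1, k2) pairs (hypothetical helper)."""
    values = [s for s in (shape_index(a, b) for a, b in curvature_pairs) if s is not None]
    hist = [0] * bins
    for s in values:
        hist[min(int(s * bins), bins - 1)] += 1
    total = len(values) or 1
    return [count / total for count in hist]
```

With this convention, umbilic points with both curvatures positive map to 0, those with both negative map to 1, and symmetric saddles (k1 = -k2) map to 0.5.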
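A rough sketch of motion activity for a single frame, assuming block motion vectors are already available: intensity is taken as the standard deviation of the vector magnitudes, and direction as the most populated of eight angle bins. The helper name, the bin count, and the absence of any quantization into activity levels are assumptions.

```python
import math


def motion_activity(vectors, direction_bins=8):
    """vectors: non-empty list of (dx, dy) block motion vectors for one frame.

    Returns (intensity, dominant_direction_bin); the bin is None if nothing moves.
    """
    mags = [math.hypot(dx, dy) for dx, dy in vectors]
    mean = sum(mags) / len(mags)
    # Intensity: standard deviation of the motion-vector magnitudes.
    intensity = math.sqrt(sum((m - mean) ** 2 for m in mags) / len(mags))
    hist = [0] * direction_bins
    for (dx, dy), m in zip(vectors, mags):
        if m > 0:
            angle = math.atan2(dy, dx) % (2 * math.pi)
            hist[int(angle / (2 * math.pi) * direction_bins) % direction_bins] += 1
    if not any(hist):
        return intensity, None
    return intensity, max(range(direction_bins), key=hist.__getitem__)
```

Under this measure a uniform pan has zero intensity, since every vector has the same magnitude, while a frame in which only some blocks move scores higher.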
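The motion trajectory representation, positions expressed relative to a reference point plus the vectors between them, can be sketched as follows. The helper names are hypothetical, and linear interpolation between keypoints (with linear extrapolation outside their time range) is an assumption about how intermediate positions are recovered.

```python
from bisect import bisect_right


def relative_keypoints(track, reference):
    """Express (t, x, y) keypoints relative to a reference point (rx, ry)."""
    rx, ry = reference
    return [(t, x - rx, y - ry) for t, x, y in track]


def displacements(keypoints):
    """List of (dt, dx, dy) vectors between consecutive keypoints."""
    return [(t1 - t0, x1 - x0, y1 - y0)
            for (t0, x0, y0), (t1, x1, y1) in zip(keypoints, keypoints[1:])]


def position_at(keypoints, t):
    """Linearly interpolate (x, y) at time t; needs >= 2 keypoints sorted by time.

    Outside the keypoint time range the nearest segment is extrapolated.
    """
    times = [k[0] for k in keypoints]
    i = min(max(bisect_right(times, t) - 1, 0), len(keypoints) - 2)
    (t0, x0, y0), (t1, x1, y1) = keypoints[i], keypoints[i + 1]
    a = (t - t0) / (t1 - t0)
    return (x0 + a * (x1 - x0), y0 + a * (y1 - y0))
```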
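For the parametric motion descriptor, the affine case is easy to sketch: given sample points and their measured velocities, fit vx = a0 + a1*x + a2*y and vy = a3 + a4*x + a5*y by least squares. The helper names are hypothetical; the translational, scaling, perspective, and quadratic models are not shown, and the 3x3 normal equations are solved with Cramer's rule for brevity rather than numerical robustness.

```python
def det3(M):
    """Determinant of a 3x3 matrix given as nested lists."""
    return (M[0][0] * (M[1][1] * M[2][2] - M[1][2] * M[2][1])
            - M[0][1] * (M[1][0] * M[2][2] - M[1][2] * M[2][0])
            + M[0][2] * (M[1][0] * M[2][1] - M[1][1] * M[2][0]))


def solve3(A, b):
    """Solve A x = b for a non-singular 3x3 system with Cramer's rule."""
    d = det3(A)
    solution = []
    for col in range(3):
        M = [row[:] for row in A]
        for r in range(3):
            M[r][col] = b[r]
        solution.append(det3(M) / d)
    return solution


def fit_affine(points, velocities):
    """Least-squares affine motion parameters [a0, a1, a2, a3, a4, a5].

    points: (x, y) positions; velocities: matching (vx, vy) measurements.
    Each velocity component is fitted separately via the normal equations.
    """
    rows = [(1.0, x, y) for x, y in points]
    AtA = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    params = []
    for comp in range(2):
        Atb = [sum(r[i] * v[comp] for r, v in zip(rows, velocities)) for i in range(3)]
        params.extend(solve3(AtA, Atb))
    return params
```

Velocities generated from a known affine field over a 5x5 grid of points are recovered to within floating-point error.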