Vision systems have become a very important part of "intelligent" robotic systems. Vision provides a robot with a sophisticated sensing mechanism that allows the machine to respond to its environment in an intelligent and flexible manner. The use of vision and other sensing schemes is motivated by the continuing need to increase the flexibility and scope of applications of robotic systems. Although proximity and touch sensors improve robot performance, vision is recognized as the most powerful robot sensory capability.
Robot vision is defined as the process of extracting, characterizing, and interpreting information from images of a three-dimensional world. This process can be subdivided into six principal areas:
1) Sensing
2) Preprocessing
3) Segmentation
4) Description
5) Recognition
6) Interpretation
It is convenient to group these areas according to the sophistication involved in their implementation. On this basis, vision is divided into three levels of processing:
i. Low-level vision
ii. Medium-level vision
iii. High-level vision
A vision system carries out three fundamental tasks: image transformation, image analysis, and image understanding. Each is described in turn below.
Image transformation is the process of electronically digitizing light images using image devices. An image device is the front end of a vision system; it acts as an image transducer, converting light energy into electrical energy. In humans, the image device is the eye. In a vision system, it is a camera, a photodiode array, a charge-coupled device (CCD) array, or a charge-injection device (CID) array.
The output of an image device is a continuous analog signal that is proportional to the amount of light reflected from an image. In order to analyze the image with a computer, the analog signals must be converted and stored in digital form. To this end, a rectangular image array is divided into small regions called picture elements, or pixels. With photodiode or CCD arrays, the number of pixels equals the number of photodiodes or CCD devices. The pixel arrangement provides a sampling grid for an analog-to-digital (A/D) converter. At each pixel, the analog signal is sampled and converted to a digital value. With an 8-bit A/D converter, the converted pixel value will range from 0 for white to 255 for black. Different shades of gray are represented by values between these two extremes, which is why the term gray level is often used in conjunction with the converted values. As the pixels are converted, the respective gray-level values are stored in a memory matrix, which is called a picture matrix.
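To make the sampling and quantization step concrete, the following minimal Python sketch simulates an 8-bit A/D conversion of an analog image into a picture matrix. It is an illustration only; the 480 x 640 grid size is an arbitrary choice, and the 0-for-white convention simply follows the description above.

```python
import numpy as np

# Simulated analog image: one reflected-light sample per pixel on a
# rectangular sampling grid, normalized to the range 0.0 (full light)
# through 1.0 (no light), matching the 0-for-white convention above.
analog_image = np.random.rand(480, 640)   # illustrative 480 x 640 grid

# 8-bit A/D conversion: each sampled value becomes an integer gray
# level, from 0 for white through 255 for black.
picture_matrix = np.round(analog_image * 255).astype(np.uint8)

print(picture_matrix.shape)   # (480, 640): one gray level per pixel
print(picture_matrix.dtype)   # uint8: gray levels 0..255
```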
A computer needs to locate the edges of an object in order to construct line drawings of the objects within a scene. These line drawings provide a basis for image understanding, as they define the shapes of the objects that make up the scene. Thus, the basic rationale for edge detection is that edges lead to line drawings, line drawings lead to shapes, and shapes lead to image understanding.
The edges are usually represented by the points that exhibit the greatest difference in gray-level values within a smoothed picture matrix. Recall from calculus that the slope of a step edge approaches infinity. Using this idea, all we have to do is compute the first derivative between adjacent gray-level values, which is usually called the gradient. This technique is called pixel differentiation.
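A minimal sketch of pixel differentiation is shown below, assuming the picture matrix from the previous example. Summing the absolute row and column differences is one common way to combine the two directions, not the only one.

```python
import numpy as np

def pixel_differentiation(picture_matrix):
    """Approximate the gradient by taking first differences between
    adjacent gray-level values along the rows and columns."""
    img = picture_matrix.astype(np.int32)   # avoid unsigned wraparound
    gx = np.abs(np.diff(img, axis=1))       # differences along each row
    gy = np.abs(np.diff(img, axis=0))       # differences along each column
    # Combine the two directional differences over their common region;
    # large values mark the sharp gray-level changes that indicate edges.
    return gx[:-1, :] + gy[:, :-1]
```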
Thresholding the gradient values produces a binary matrix of edge points, as sketched below. Lines can then be identified from this edge-point matrix; popular techniques include model matching, tracking, and template matching.
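The thresholding step itself takes only a few lines; in this sketch, the cutoff value of 30 is an arbitrary illustrative choice that would be tuned to the lighting and noise of a real scene.

```python
import numpy as np

def threshold_edges(gradient, threshold=30):
    """Produce a binary edge-point matrix: 1 where the gradient
    exceeds the threshold, 0 elsewhere."""
    return (gradient > threshold).astype(np.uint8)

# Usage with the earlier sketches:
# edge_points = threshold_edges(pixel_differentiation(picture_matrix))
```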
The final task of a vision system is to interpret the information obtained during the image-analysis process. This is called image understanding, or machine perception.
Most image-understanding research has centered around the "blocks world." The blocks-world assumption holds that real-world images can be broken down and described in terms of rectangular and triangular solids. Several AI-based image-understanding programs that can interpret real-world images have been written successfully under this assumption.