PART 1. IMAGE PROCESSING FUNDAMENTALS

Chapter 1. Introduction to Image Processing

1.1 Images

A black and white picture consists of a rectangular (or other) array of rows of dots, or grains, known as picture elements, or pixels for short. Each pixel has a gray level between black and white. Such a picture is called a grayscale image. In the usual case each dot has the same size. When the pixels each have a color, the picture is called a color image. While there are several models for describing color, the most common is a mixture of red, green and blue, known as RGB, in which the proportions r, g and b of red, green and blue, respectively, form a particular color.

In this work we take images to be rectangular arrays of pixels, where the number of rows M of pixels and the number of columns N of pixels describe the size, or resolution, of the image. For example, an MxN = 256x256 image has 256 rows and 256 columns for a total of (256)(256) = 65,536 pixels, while an MxN = 1024x1280 image has 1,310,720 pixels. A grayscale image typically uses gray levels that range from 0 (zero intensity of light) to 255 (full intensity), which is the minimal photographic quality for black and white. Each such value can be stored in one byte in a computer. Thus the 256x256 image requires 65,536 bytes to store, while the 1024x1280 image requires 1,310,720 bytes, assuming the values are stored as binary numbers. Figure 1.1 shows the coordinate system used in image processing, where the discrete pixel locations correspond to (x,y): x is in the downward vertical direction and y is in the horizontal rightward direction. The pixel value is designated by f(x,y), or by f(m,n) with 0 <= m < M and 0 <= n < N.

Figure 1.1 The Image Coordinate System

Color images require more storage than grayscale images. Representing each of R (red), G (green) and B (blue) by a single byte yields a requirement of three bytes for each pixel and provides for (256)(256)(256) = 16,777,216 discrete colors, which is close to the number of different colors that humans can distinguish. An MxN = 256x256 color image of three bytes per pixel would then require 3(65,536) = 196,608 bytes. For an MxN = 1024x1280 color image the requirement is 3(1,310,720) = 3,932,160 bytes. It is clear that more colors and more pixels are more costly in computer storage and in transmission time on the Internet.
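The storage figures above follow from a single multiplication. A small C helper reproduces them; the function name image_bytes is ours, for illustration only.

```c
/* Bytes needed to store an M-row by N-column image with
   bytes_per_pixel bytes per pixel (1 for grayscale, 3 for RGB). */
unsigned long image_bytes(unsigned long m, unsigned long n,
                          unsigned long bytes_per_pixel)
{
    return m * n * bytes_per_pixel;
}

/* image_bytes(256, 256, 1)   = 65,536 bytes   (256x256 grayscale)
   image_bytes(1024, 1280, 1) = 1,310,720 bytes
   image_bytes(256, 256, 3)   = 196,608 bytes  (256x256 true color)
   image_bytes(1024, 1280, 3) = 3,932,160 bytes */
```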







1.2 The Field of Image Processing

Imaging began in the 19th Century with photography and continued with x-rays, television and electronic scanning in the 20th Century. Image processing as a field of study began in the 1950s with pictures of the earth from high flying "spy" airplanes and then with pictures taken from orbiting satellites of areas of the earth's surface. Electronic pictures from space probes of surfaces of the planets and their moons followed in the 1970s and 1980s. Nowadays, image processing is used in medical diagnostics, forensics, biological microscopy, inspection of parts and materials, crop estimates, types of foliage, defense intelligence, and many other areas.

Generally, we may classify image processing according to three major purposes. These and a few subtopics with very brief descriptions are

i) image enhancement

- smoothing (making contiguous groups of pixel values more alike)

- sharpening (differentiating groups of pixel values more)

- contrasting (stretching the range of levels of gray)

- zooming and interpolation (dilating or contracting images)

ii) image restoration

- removing noise (taking away additive noise or short line noise)

- removing distortion (undoing distorting effects in the image)

- improving focus, removing blur (removing blurring)

- undoing other degradations (undoing other degrading processes present in image formation)

- recovering objects (reducing shimmer, occlusion, etc.)

iii) analysis and interpretation

- segmenting (delineating objects or regions of an image)

- delineating boundaries (detecting/strengthening edges)

- detecting and connecting lines and pieces (forming boundaries)

- thinning and trimming lines (thinning lines and objects)

- skeletonizing objects (reducing objects to skeletonic lines)

- recognizing objects (assigning objects to classes of objects)

Some methods of image processing may belong to multiple types. For example, an image that has been degraded by noise can have the degradation removed by an enhancement process such as smoothing or median filtering, although not all kinds of noise can be removed this way.

Another way of classifying image processing is

a) point transformations (transforming gray levels as a function of each pixel)

b) area processes (changing each pixel brightness as a function of the brightness of its neighboring pixels)

c) frequency filtering (changing the power gain of two-dimensional frequency bands, considering the image as a 2-dimensional intensity signal over x and y)

d) frame combination (adding, subtracting or otherwise merging frames, i.e., images, of the same scene)

e) geometrical processing (inserting or removing distortion)

Frequency filtering is equivalent to processing with convolution masks, as is done in Chapter 3. In Chapter 4 we therefore use mask convolution as an equivalent of frequency filtering, in a way that can provide more information and better results in some cases than the type of mask convolution done in Chapter 3. There are also some newer approaches, such as the employment of fuzzy rules and neural networks (see Chapter 9).

Figure 1.2 shows an input image with too few gray levels in the middle of the grayscale range. Figure 1.3 is an example of enhancement by increasing contrast (stretching the grayscale range).
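Contrast stretching of this kind can be sketched as a linear mapping of an input gray-level range onto the full range [0, 255]. The sketch below, including the names stretch, lo and hi, is our own illustration; the method itself is treated later in the book.

```c
/* Linear contrast stretch: map gray levels in [lo, hi] onto the full
   range [0, 255]; values outside [lo, hi] clip to the end points.
   lo and hi are the input range to be stretched (our naming). */
unsigned char stretch(unsigned char g, unsigned char lo, unsigned char hi)
{
    if (g <= lo) return 0;
    if (g >= hi) return 255;
    return (unsigned char)(255L * (g - lo) / (hi - lo));
}
```

For instance, with lo = 50 and hi = 200 the middle gray levels of a low contrast image are spread over the whole grayscale range, which is the effect seen in Figure 1.3.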

Figure 1.2. Low contrast "Lena."          Figure 1.3. High contrast "Lena."















Some applications of image processing involve focusing out-of-focus pictures, removing noise, smoothing, sharpening, exposing details (for example, in video images of robbers), target detection, combining frames from infrared and one or more bands of wavelength of the visible light spectrum, removing distortion from tomographic images where the patient's organs expanded and contracted during the scanning due to breathing and heart pulsation, creating advertising effects, inspection of equipment and so forth.

1.3 Monochrome Display of Images

Cathode Ray Tubes. The inside surface of the screen of a monochrome monitor is coated with a phosphor material that converts the energy of colliding electrons into light emission on the outside. This material is uniformly coated so that the light emitted is of a single color such as white, amber, or green. Any spot where there has been an absence of colliding electrons for a short time appears black to the viewer. The screen is the flattened end of a large vacuum tube that contains an electron gun at the other end. The gun's cathode is heated so that it emits a stream, or beam, of electrons toward the inside of the screen. Such a device is called a cathode ray tube (CRT).

Figure 1.4 shows the scheme. Electrons pass through two different pairs of parallel metallic plates, of which one pair is horizontal and the other is vertical. A voltage difference across each pair pulls the electrons up or down, left or right. The control signal voltages on the plate pairs are designed to force the electron beam to move across the screen in rows that are imperceptibly slanted downward and then return to the left at a position one row down. This continues until all rows on the screen have been traced, at which time the beam tracing system is reinitialized to repeat the process. Such reinitialization causes a small delay. This fashion of painting the screen is called raster scanning.

Figure 1.4. The Picture Tube Scheme.

The denser the electron beam on any dot of phosphor, the brighter is the light emitted from that dot on the screen. If the beam density is sufficiently low, the dot appears black, while if it is at maximum level, the dot emits the maximum intensity of light and is white. The intensity signal f(t) is determined by the image data and f(t) controls the electron beam density at each instant that corresponds to a position in the raster scan. The timing of the raster scan control signal is such that a small dot, or area, say, of 0.28 mm diameter, is excited by electrons during a very small time interval Δt. Such a dot is called a pixel (for "picture element"). When the entire screen has been painted, that is, all pixels have been excited in order during the raster scan, we say one frame has been executed.

A graphics interface card connects into the computer bus (data bus, control bus, and power bus, where bus denotes a set of lines that carry signals). It converts binary values that represent the intensity levels of the pixels into a voltage signal that controls the intensity of the electron gun at the specific times that particular pixels are being painted via the raster scan. In this manner, a temporary image is painted as a frame. Persistence is the property of the phosphor by which it keeps emitting light for a short time after excitation stops. The first pixel in a frame is still emitting light when the last pixel in the frame is being excited. Such light emission must decrease and become significantly less perceptible in a fraction of a second so that it does not garble symbols in the next frame.

Image Data and Data Rates. Rates greater than 44 frames per second are necessary to avoid the human perception of flickering. To achieve this rate, earlier display systems traced alternate rows to the screen and then on the next scan wrote the rows in between. Thus 44 half-frames were traced each second to avoid the flickering, although the actual rate was 22 full frames per second (a trick borrowed from the cinema industry). Such a method of scanning is called interlaced. Nowadays graphics systems are mostly noninterlaced and can display more than 44 frames per second, usually 60 Hz (Hertz, or cycles per second, which here means frames per second), 75 Hz, 90 Hz, 120 Hz or higher.

A 600x800 screen with 600 rows and 800 columns has 480,000 pixels. Let each grayscale pixel have an intensity value from 0 to 255 (one byte, or 8 bits). Then a file of 480,000 bytes is needed to store an image in binary. A stream of bytes is read from a file and written to the graphics memory on the graphics interface card, and the values are used on the next scan, so the image may not appear on the screen all at once. Some UNIX systems wait until all data is in graphics memory and then put it all on the screen in a single frame scan, so that the image appears instantaneously. At 44 frames per second of 480,000 values of 8 bits each, this requires a graphics system with a rate of 168,960,000 bits per second (168.96 Megabits per second, or Mbps).
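The data-rate arithmetic above is rows times columns times bits per pixel times frames per second. A small helper (the name bits_per_second is ours) makes the computation explicit.

```c
/* Required video data rate in bits per second for a screen of the
   given size, bit depth and frame rate. */
unsigned long bits_per_second(unsigned long rows, unsigned long cols,
                              unsigned long bits_per_pixel,
                              unsigned long frames_per_second)
{
    return rows * cols * bits_per_pixel * frames_per_second;
}

/* bits_per_second(600, 800, 8, 44) = 168,960,000 bps = 168.96 Mbps */
```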

The process of writing an image file to the screen varies with the computer graphics system. Given an array of pixel values, say of bytes, of N columns and M rows, we consider the logical model of the image. The image is, logically, {f(m,n): 0 <= m < M, 0 <= n < N}, where each f(m,n) is a digital value of 8 bits that is to be displayed at its particular location (m,n) on the screen. The 8-bit values allow 256 different intensities, or shades of gray, from 0 (black) to 255 (white). Figure 1.5 shows a pixel value f(m,n) at the pixel location (m,n).
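Because the rows of the logical image are stored one after another in a raw file, pixel (m,n) of an MxN image lies at byte offset m*N + n from the start of the pixel data. A one-line helper (the name pixel_offset is our own) makes this row-major rule concrete.

```c
/* Byte offset of pixel (m,n) within the pixel data of an image with
   ncols columns, stored row by row (row-major order). */
long pixel_offset(long m, long n, long ncols)
{
    return m * ncols + n;
}

/* For a 256-column image, pixel (1,0) begins row two, at offset 256. */
```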

In some C compilers, the function putpixel(m,n) is provided for writing pixelwise to the screen, and getpixel(m,n) gets the pixel value at screen location (m,n) for capturing the screen to a file or data structure. But this is a slow and difficult way to write an image to the screen. The operating system should have built-in functions that do this through buffers in memory that store sections of the image data. We will use software that calls these built-in functions, which are hidden from us.

Figure 1.5. Display of a pixel value

1.4 Display of Color Images

The display of a color image requires three different phosphors that respectively emit three independent colors of light when excited. It is known that any color can be obtained by adding the correct proportions of red (R), green (G) and blue (B). Color video display systems use this property. A color video monitor contains closely packed dots of three types of phosphor on the inside of the screen. Each pixel has an area composed of three dots: one each of R, G and B light emitting phosphor. Three electron guns are used of which one excites each of the red, green and blue phosphor dots. The R gun, for instance, is set to scan along the R phosphor dots, while the others each trace along their particular dots. The pixel color is determined by the combination of intensities of each of R, G and B. Equal intensities of R, G and B make a gray shade, from black to white.

A color image file consists of binary values for the pixels, as does a monochrome image file. The difference is in the interpretation of these values. In a newer method, a binary value for a color image is broken into three parts to produce a binary value for each of the R, G and B electron guns. For example, in "true color" a pixel value is 3 bytes long. The first byte represents R, the second is for G and the third is for B. These files are quite large. As an example, an 800x600 image would take 480,000x3 = 1,440,000 bytes (1.44 MB, or Megabytes). However, each pixel can take one of 2^24 = 16,777,216 colors, and each of R, G and B can take 2^8 = 256 intensities. For example, 11111111 00000000 00000000 is a pure red of the highest intensity, while 00001111 00001111 00001111 has equal intensities of 15 for each of R, G and B and so appears dark gray (equal values for R, G and B are always a shade of gray). A value of 24 bits of all 1's gives the highest intensity of gray, which is bright white.
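The 3-byte packing just described can be sketched in C. The function names pack_rgb, red_of, green_of and blue_of are illustrative, not part of any particular graphics library; the byte order (R high, G middle, B low) matches the description above.

```c
/* Pack a 24-bit "true color" pixel: high byte R, middle byte G,
   low byte B, as described in the text. */
unsigned long pack_rgb(unsigned char r, unsigned char g, unsigned char b)
{
    return ((unsigned long)r << 16) | ((unsigned long)g << 8) | b;
}

/* Recover the individual components from a packed pixel. */
unsigned char red_of(unsigned long rgb)   { return (rgb >> 16) & 0xFF; }
unsigned char green_of(unsigned long rgb) { return (rgb >> 8) & 0xFF; }
unsigned char blue_of(unsigned long rgb)  { return rgb & 0xFF; }

/* pack_rgb(255, 0, 0) = 0xFF0000, the pure red of highest intensity;
   pack_rgb(15, 15, 15) = 0x0F0F0F, the dark gray of the text. */
```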

Another common method is to use a single byte for each pixel value so that there are 256 colors. In this scheme each pixel value is actually the index, or address, of one of 256 registers of 18 bits (6 for each of R, G and B). Functions can be used to put a set of 256 RGB colors in the 256 registers, so that 256 different colors can be displayed in a single image. While a single image can use only one set of 256 colors, the color set can be changed for another image (but not partway through a raster scan that puts a single image on the screen). Because each set of 256 colors can be selected from 2^18 = 262,144 different colors, the number of such color sets is (262,144)!/[256!(262,144 - 256)!], which is the number of ways 256 things can be selected from 262,144 unique things (a very large number). Thus one image can show 256 shades of various yellows for flowers and some greens for the stems and leaves, while another image may show various shades of blues and purples.

Figure 1.6 shows the registers. The colors fixed in a set of color registers are sometimes called a palette. The number of palettes possible is the number of ways 256 colors at a time can be selected from 262,144. If one image is displayed in 256 colors and another is to be displayed after that in a different set of 256 colors, then the color registers must be given new color values (rewritten by a computer program function). Each color register contains three parts of 6 bits each, for R, G and B, that control the three respective color gun intensities for exciting the respective sets of R, G and B phosphor dots in the pixels. While the color values of 6 bits each are digital values, they are transformed by a digital-to-analog (D/A) converter into analog (continuous) signals to control the intensity of the color guns (see Figure 1.6).

As a result of different color standards, we must know whether a file is to be interpreted as grayscale or color, as well as the width and height of the image. If it is color, then the graphics software must know what the bits represent and how many bits are used per pixel. Different formats exist for image files to provide this information. Unfortunately, the number of color file formats (and their grayscale versions) and their complexity have grown beyond reasonable requirements. Before we process an image we will convert it into a raw data file. After processing, it can be translated into one of the more useful file formats if compression is needed to reduce the size.

Figure 1.6. VGA 256 Color Scheme

When we process a color image, we often convert the RGB (red-green-blue) values into hue (H), saturation (S) and intensity (I), to be discussed later. The I part is devoid of color and can be processed by grayscale processing methods. The HSI data are then converted back into RGB for color display. Thus most of our processing will be done on grayscale images. Processing the RGB data of a color image directly can yield shocking (eerie) results. For example, an image of a ripe reddish fruit could assume an eerie puce (purplish brown) color.

1.5 Capturing Images

Figure 1.7. A photovoltaic diode          Figure 1.8. An Image Scanner



















An image is often captured by passing a photograph through a scanner, which may or may not have color capability. It may also be captured directly from a video camera that digitizes the light signal into groups of bits. In the first case, a set of small optical sensors, or photovoltaic diodes, one of which is shown in Figure 1.7, is arranged to form a row as shown in Figure 1.8. Light rays from the tubular light source reflect from a horizontal line along the picture into the small optical sensors. Each single device receives the light reflected from a small dot area on the picture. Its average dot intensity is converted into current by the device.

The timing signals of the scanner cause the currents associated with the dot intensities along a horizontal row to be captured into digital values by shifting and latching, or writing to registers, which are sequences of bit-memory devices (flip-flops that can be set to 0 or 1). The picture then moves a slight amount in a direction perpendicular to the row of sensors, and another row of dots is captured similarly, and so forth until the entire page has been captured as digital data.

The size of the detectors and their spacing determine the sampling interval along the horizontal (row) spatial dimension. The resolution is measured in dots per inch (dpi). Many inexpensive scanners capture 600 dpi horizontally, but because they can move the picture a slight amount, they can obtain 1200 dpi vertically. The horizontal dots are often interpolated to a resolution of 1200 dpi by inserting a dot between each pair in a row. More expensive scanners interpolate to thousands of dpi. However, high resolution color images are slow to print, display and transmit on the Internet, and they take up large amounts of storage space.
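The horizontal interpolation just mentioned can be sketched as inserting, between each pair of neighboring dots, their average (linear interpolation). This is our own illustration of the idea, not scanner firmware; the function name interpolate_row is hypothetical.

```c
/* Double the horizontal resolution of one scan row by inserting the
   average of each pair of neighboring pixels between them.
   For n input pixels the output has 2n - 1 pixels. */
void interpolate_row(const unsigned char *in, int n, unsigned char *out)
{
    int i;
    for (i = 0; i < n - 1; i++) {
        out[2 * i] = in[i];                                   /* original dot */
        out[2 * i + 1] = (unsigned char)(((int)in[i] + in[i + 1]) / 2);
    }
    out[2 * (n - 1)] = in[n - 1];                             /* last dot */
}
```

For example, the row {0, 100, 200} becomes {0, 50, 100, 150, 200}.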

A video signal from a video camera is fed into an interface capture card (a printed circuit card with logic chips installed) that is connected to the computer busses (data bus, control bus, and power bus), which then captures the sampled values as a stream of binary values by storing them in memory on the card. As the memory on the interface card fills with data, an associated computer program transfers the data into a buffer in computer memory for writing to a file on disk. A program can read the data from the hard disk and send it to the graphics card for display.

Figure 1.9 gives a higher level overview of image capture and display. Figure 1.10 shows a head writing electromagnetic pulses on a moving medium that is coated with ferrous material (disk or tape).

Figure 1.9. Image capture/display Figure 1.10. Reading/writing pulses



















1.6 Displaying Original and Processed Image Files with Showimage.tcl

A script program on the compact disk read-only-memory (CD-ROM) inside the back cover displays images (before and after processing) and runs under any of the following operating systems:

Windows 95 (Microsoft)

Windows 98 (Microsoft)

Windows 2000 (Microsoft)

Windows NT 4.0 (Microsoft)

Linux (Red Hat, Caldera, Slackware, and others)

Solaris (Sun Microsystems, various versions)

other UNIX type operating systems



The program is called showimage.tcl. It is written in Tcl/Tk (Tool Command Language/Tool Kit), a scripting language written by Prof. John Ousterhout of the University of California, Berkeley, to be portable and to allow quick development of graphical user interfaces (GUIs). It is now supported by Sun Microsystems and Scriptics. Like Java (Sun Microsystems), it is compiled into byte code that is interpreted at run time by the different versions of interpreters that run under Microsoft Windows (95, 98, 2000 and NT) and under Linux and other UNIX-type operating systems (such as Solaris).

Installing Tcl/Tk on Computers with Microsoft Windows. Tcl/Tk for Microsoft Windows can be downloaded from any of World Wide Web sites

ftp://ftp.sunlabs.com/pub/tcl/

http://www.scriptics.com

http://www.pconline.com/~erc/tclwin.htm

A recent version is Tcl8.2. Clicking on the appropriate link downloads a file such as tcl82.exe. Once downloaded, it self-extracts and sets up the Tcl/Tk libraries, the compiler (for converting script programs to byte code) and the interpreter called wish (for windowing shell). Usually it sets up under the Program Files directory on drive C: in a subdirectory called tcl.

A small window comes up after the installation that contains a wish icon. To make this icon a shortcut to the windowing shell wish, click on the wish icon with the right mouse button, drag it outside of the small window onto the screen and release. A message box will appear; one of the choices in it is Create Shortcut. Click on Create Shortcut and the shortcut icon will appear on the screen.

To run wish to interpret a Tcl/Tk script program, just click the left mouse button twice on the wish icon. A window will come up for wish with the cursor "%" waiting for input. Enter the commands

% source myprogram.tcl <ENTER>

where the boldface source is the command that invokes the byte code compiler and interpreter and myprogram.tcl is the name of the program.

Installing Tcl/Tk on Computers with Linux or other UNIX type Operating Systems. All UNIX type operating systems have a version of Tcl/Tk already installed, but if the operating system is old, it may be preferable to download a current version. The downloading process is the same as that given above for Windows, but look for a choice of Tcl/Tk for Linux/UNIX rather than for Microsoft Windows. The help of a system administrator may be needed to make it accessible from the command line, which must be in X-windows.

Installing Tcl/Tk from CD-ROM. To install Tcl/Tk from CD-ROM for either Windows or Linux (or other UNIX type operating systems), do the following.

i) put the CD-ROM in the CD-Drive and close the drive

ii) click on Start at the left bottom of the Windows screen

iii) click on Run

iv) click on Browse

v) select the CD-ROM drive and select Tcl80.exe under the Win directory (for Microsoft Windows) or else select Tcl82.exe under X-windows.

vi) click on Open

vii) either let Tcl/Tk be installed where it chooses, or select a different drive and directory



UNIX type operating systems will already have wish and Tcl/Tk installed in the directory /usr/local/tcl.

Running Tcl/Tk Programs. After installing Tcl/Tk, there should be a shortcut icon on the main window of the screen in Microsoft Windows. Click the shortcut icon and a wish window (the byte code compiler and interpreter) will come up with a smaller window in it. Move the smaller window to the right side so that the wish window will not cover it up when the wish window is in focus.

Under Linux, go into the X-windows shell and then type wish. A wish window will come up on the screen (it looks like a regular Xterm window with "%" as the cursor). A smaller window also comes up in which the program will display its widgets (it will resize to hold everything). Left click on the top bar and move the smaller window to the right side so it does not overlap the wish window.

In either case above, the cursor will be "%" and you must type in

% source program_name.tcl <ENTER>

The command source loads the byte code compiler and interpreter. The boldface characters are to be typed in exactly as they appear. When the program program_name.tcl is loaded it will be immediately compiled to byte code (like Sun's Java). Then the byte code will be interpreted and will run or provide errors in the Tcl/Tk script.

In Windows we may run a Tcl/Tk program program_name.tcl by clicking on the Start button at the bottom left of the Windows screen, then click on Run and then on Browse to select the program program_name.tcl. Finally, click on Open and Windows will run the program with the Tcl/Tk byte code interpreter.

The program (script) that we want to run here is showimage.tcl, which will display our images from a directory. We will use this to display images before and after processing them. The processing will be done by a small program written in C that we will modify and recompile. It will be made available by the instructor. Given an image called myimage.pgm, we will process it with a program called mdip.c (which will be compiled into a runtime module mdip.exe). The output file could be, for example, outimage.pgm.

Thus we type

% source showimage.tcl <ENTER>

A menu bar comes up and we click on Input_Image, then on Open_Image in the submenu that drops down. A Windows directory/file box comes up and we can choose the directory and image file that we want to display. Click on the OK button to return to the main menu bar. Next, click on Input_Image again and then on Display_Input. The image will be displayed on the screen.

To display one or two output images, click on Output_Images first and then on Open_Output1 and select the output image from the image processing program. After clicking OK, click on Output_Images and then Display_Output1. To display a second output image, click on Output_Images, Open_Output2 and then on Display_Output2.

It is possible to start again by clicking on Input_Image and repeating the process. This makes it efficient to process an image multiple times, or to process different images, and have the files ready for viewing.

1.7 XV for Linux/UNIX Operating Systems

The XV Program. We use the xview (xv) program that runs under UNIX type operating systems (Linux, Ultrix, Solaris, AIX, etc.). It is a shareware image processing program that was developed by John Bradley (bradley@cis.upenn.edu) and copyrighted in 1993 (and at later dates for more recent versions). It provides the means to perform certain basic image processing. For our purposes, its main forte is the capability to display both color and grayscale images on the video monitor and to translate images between formats. It also has extensive capability for remapping colors in color images and for histogram transformation.

While it has some algorithmic capability, our main use of it will be to convert image files in *.tif, *.gif and *.jpg to *.pgm (raw) and *.ppm (raw).

Examples of some names of raw data files in the public domain that are used for comparisons in image processing are lena256.pgm, peppers512.pgm, shuttle.pgm and pollens.pgm. These can be found in the literature and on the World Wide Web, for example, but they are usually in the format of *.tif or *.gif.

Raw Image Data Files. The raw files that we will process, with names of the form filename.pgm, have a 4-line header. Below is an example of a header, followed by the binary bytes of the pixel values on the fifth line.

P5

# Picture by Johnny Brightlight, May, 1998, in the Andes Mountains

640 480

255

*)&&&$*!!!BCDBD......... (pixel data begin on this line)

The first line tells the type of file. "P5" indicates a raw file of bytes of binary values that are packed together with no spaces in between. "P2" indicates an ASCII file, where, e.g., the binary value 129 would not be given as 10000001, but would instead be denoted by the concatenated 3 bytes of ASCII code for "1", "2" and "9". A space would also appear between the consecutive ASCII coded numbers. The P5 (raw, packed) format requires only one byte per pixel, which is the reason we use it. The XV program that is resident on UNIX based computers will convert many file formats to these P5 (raw) data files.

The second line begins with "#" to indicate that what follows on the line is a comment. The third line provides N = number of columns in the image and M = number of rows in the image. The fourth line gives the maximum grayscale value, which is 255 here (binary 11111111). The fifth line is where the binary data begin. When a text editor is used to display these binary data, they are converted into their ASCII character equivalents for displaying, and thus have no meaning to us, unless we convert each character to its binary value.
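The 4-line header described above can be read with a short C routine. This is a sketch rather than the book's own code: the function name read_pgm_header and its error handling are ours, and writers of real PGM files may separate the width, height and maximum value with arbitrary whitespace rather than one field per line.

```c
#include <stdio.h>
#include <string.h>

/* Read a PGM header of the form described in the text: a "P5" magic
   number, any "#" comment lines, the column and row counts, and the
   maximum gray value.  Returns 0 on success, -1 on failure; on
   success the file position is at the start of the pixel bytes. */
int read_pgm_header(FILE *fp, int *cols, int *rows, int *maxval)
{
    char line[256];

    if (!fgets(line, sizeof line, fp) || strncmp(line, "P5", 2) != 0)
        return -1;                        /* not a raw (P5) PGM file */
    do {                                  /* skip "#" comment lines */
        if (!fgets(line, sizeof line, fp))
            return -1;
    } while (line[0] == '#');
    if (sscanf(line, "%d %d", cols, rows) != 2)
        return -1;                        /* columns N, then rows M */
    if (!fgets(line, sizeof line, fp) || sscanf(line, "%d", maxval) != 1)
        return -1;                        /* maximum gray value, e.g. 255 */
    return 0;
}
```

Applied to the example header above, it would report 640 columns, 480 rows and a maximum gray value of 255.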

Example 1. The following is a hexadecimal dump of the first several bytes of a PGM file. By comparing with the table of ASCII code (see Appendix 1 at the end of Chapter 1 for ASCII codes), we can decode the header bytes into ASCII characters and determine what the header says. We have put a space between consecutive hexadecimal bytes and have also broken the data into short lines for ease of reading.

50 35 0D 0A 23 20 43 52 45 41 54 4F 52 3A 20 58

56 20 56 65 72 73 69 6F 6E 20 33 2E 31 30 20 20

52 65 76 3A 20 31 32 2F 31 36 2F 39 34 0D 0A 32

35 36 20 32 35 36 0D 0A 32 35 35 0D 0A 9E A5 9E

... (remaining bytes omitted)

The hexadecimal codes 0A and 0D represent respectively a line feed and a carriage return. They occur together in several places. The first hexadecimal byte in the file consists of two 4-bit hexadecimal digits (nibbles): 50 (hexadecimal) = 0101 0000 (binary) = 80 (decimal). The byte 01010000 represents the letter "P" in ASCII (we have "P5" followed by a carriage return and line feed). The first byte 56 (hexadecimal) in the second line is the decimal number 86, which is the letter "V" in ASCII (see Appendix 1a at the end of this chapter). What does the header say (see Exercise 1.5)?
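The correspondence between dump bytes and ASCII characters can be checked directly in C, since a C character constant equals its ASCII code. The function name check_dump_bytes is ours, and only the bytes already decoded above are checked; decoding the rest of the header is left to Exercise 1.5.

```c
/* Verify the byte-to-character decodings given in the text.
   Returns 1 if all of them hold, 0 otherwise. */
int check_dump_bytes(void)
{
    if (0x50 != 'P' || 0x35 != '5')    /* the magic number "P5" */
        return 0;
    if (0x0D != '\r' || 0x0A != '\n')  /* carriage return, line feed */
        return 0;
    if (0x56 != 'V')                   /* first byte of the dump's second row */
        return 0;
    return 1;
}
```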



1.8 Obtaining Images

For Linux and UNIX users, the image files can be copied from the instructor's directory of images. Suppose that the instructor has a set of files in a subdirectory called images2 of his directory. To copy the files to your directory, create a subdirectory, say dip, and change to it.

> mkdir dip <ENTER>

> cd dip <ENTER>

> cp ~instructor_name/images2/*.pgm . <ENTER>

Repeat the copy command cp to copy all *.ppm (color) files as well. Images may also be retrieved from the Internet at various Web pages. We will refer to a few in the text when appropriate. These images have the *.gif or *.jpg (*.jpeg) formats and must be converted to raw data before we can process them.

1.9 Viewing Images in Linux with XV

We are now ready to use xview (XV), copyrighted by John Bradley in 1993 and at later dates (bradley@cis.upenn.edu). Type

dip> xv lena256.pgm<ENTER>

The 256x256 image appears on the screen. To get the control box, put the mouse cursor inside the image and right click. The xv control box appears below the image. On the right hand side, in a column, are many commands that can be executed by left clicking the mouse on them. For example, to double the size of the image, put the mouse pointer on the Image Size bar and left-click the mouse, hold down the button and drag the pointer down to the Double Size bar on the new pop-up menu and release the mouse button. The image will double in each dimension (x and y) and so will actually have an area 4 times as large as the original.

To move the image, put the mouse pointer in the top border of the image, hold down the left button and move (drag) the mouse pointer. The image moves with the mouse pointer. Upon releasing the mouse button, the image position becomes fixed. To exit xv, left-click anywhere in the xv controls window (if it is covered by images or other boxes) so that it comes to the front of the screen (comes in-focus). Then left-click on the Quit bar at the bottom right.



1.10 Exercises

1.1. Suppose a color image file has 1024 rows and 1024 columns and uses true color of 3 bytes per pixel (one byte for each of R, G and B). The display is to use 44 frames per second in noninterlaced mode. How many bits per second (bps) are required for the video signal to the monitor?

1.2. Consider the two Figures 1.2 and 1.3. Do their respective average pixel values necessarily differ? Do their variances of pixel values differ? Explain.

1.3. Log onto a computer system with a UNIX-type operating system and make a directory for image processing ( $ mkdir images<ENTER> ). Change to that directory ( $ cd images<ENTER> ). Download the file lena256.tif using ftp on the Internet. To do this, first do: /images$ ftp pinon.cs.unr.edu<ENTER>. Then log in with the login name "anonymous" and give your complete email address as password. Next, change directory via: ftp> cd pub/users/looney/images<ENTER>. Next, download any files via: ftp> get filename<ENTER>. Repeat for each file you want to download. After the transfer message, use: ftp> quit<ENTER> to exit ftp and return to your own directory. Check to see if you have the files in your images directory ( /images$ ls<ENTER> ). When you have them, display each with the xview (XV) utility (e.g., /images$ xv lena256.tif<ENTER> ). To end the display, left-click on the upper left-hand corner of the image and select Delete (or Kill on certain versions) from the pop-up menu.

1.4. After Exercise 1.3 above has been done and the image lena256.tif is displayed on the screen, convert it into the file lena256.pgm as follows. Put the mouse pointer inside the image display of lena256.tif and click the rightmost mouse button. The xv controls window will open up on the bottom of the screen. Put the mouse pointer on the screen bar that says Save and click the leftmost mouse button. Another window will appear on the screen. Click on the Format bar at the upper right (using the leftmost mouse button) and drag (hold the mouse button down while moving the pointer) downward to pbm/pgm (raw data). You can see the filename displayed near the bottom of this window as lena256.pgm before you click on the OK screen bar. After the file has been converted and saved, click the Quit bar at the right bottom of the xv controls window to exit XV. Now list the directory ( /images$ ls -l<ENTER> ) and you should see the two files lena256.tif and lena256.pgm and their sizes.

1.5. Use the editor vi to view the raw data file lena256.pgm (from Exercise 1.4 above) as ASCII bytes on the screen of your computer ( /images$ vi lena256.pgm<ENTER> ). Write out the header on a piece of paper. What are the dimensions of the image? What is the maximum gray level?

1.6. Display lena256.pgm on the screen using XV. Right-click in the displayed image to get the xv controls window. Now save the file as the postscript file lena256.ps. Send lena256.ps to the system postscript printer by left-clicking on the screen button Print and selecting appropriate items when requested. Compare the results with Figures 1.2 and 1.3. Describe the differences.



Appendix 1 - ASCII Code

Decimal  Hexadecimal  Symbol

  0  00  NUL (Null)
  1  01  SOH (Start of Heading)
  2  02  STX (Start of Text)
  3  03  ETX (End of Text)
  4  04  EOT (End of Transmission)
  5  05  ENQ (Enquiry)
  6  06  ACK (Acknowledge)
  7  07  BEL (Bell)
  8  08  BS (Backspace)
  9  09  HT (Horizontal Tab)
 10  0A  LF (Linefeed)
 11  0B  VT (Vertical Tab)
 12  0C  FF (Formfeed)
 13  0D  CR (Carriage Return)
 14  0E  SO (Shift Out)
 15  0F  SI (Shift In)
 16  10  DLE (Data Link Escape)
 17  11  DC1 (Device Control 1)
 18  12  DC2 (Device Control 2)
 19  13  DC3 (Device Control 3)
 20  14  DC4 (Device Control 4)
 21  15  NAK (Negative Acknowledge)
 22  16  SYN (Synchronous Idle)
 23  17  ETB (End of Transmission Block)
 24  18  CAN (Cancel)
 25  19  EM (End of Medium)
 26  1A  SUB (Substitute)
 27  1B  ESC (Escape)
 28  1C  FS (File Separator)
 29  1D  GS (Group Separator)
 30  1E  RS (Record Separator)
 31  1F  US (Unit Separator)
 32  20  (Space)
 33  21  !
 34  22  "
 35  23  #
 36  24  $
 37  25  %
 38  26  &
 39  27  '
 40  28  (
 41  29  )
 42  2A  *
 43  2B  +
 44  2C  ,
 45  2D  - (Dash)
 46  2E  . (Period)
 47  2F  /
 48  30  0
 49  31  1
 50  32  2
 51  33  3
 52  34  4
 53  35  5
 54  36  6
 55  37  7
 56  38  8
 57  39  9
 58  3A  :
 59  3B  ;
 60  3C  <
 61  3D  =
 62  3E  >
 63  3F  ?
 64  40  @
 65  41  A
 66  42  B
 67  43  C
 68  44  D
 69  45  E
 70  46  F
 71  47  G
 72  48  H
 73  49  I
 74  4A  J
 75  4B  K
 76  4C  L
 77  4D  M
 78  4E  N
 79  4F  O
 80  50  P
 81  51  Q
 82  52  R
 83  53  S
 84  54  T
 85  55  U
 86  56  V
 87  57  W
 88  58  X
 89  59  Y
 90  5A  Z
 91  5B  [
 92  5C  \
 93  5D  ]
 94  5E  ^ (Caret)
 95  5F  _ (Underline)
 96  60  `
 97  61  a
 98  62  b
 99  63  c
100  64  d
101  65  e
102  66  f
103  67  g
104  68  h
105  69  i
106  6A  j
107  6B  k
108  6C  l
109  6D  m
110  6E  n
111  6F  o
112  70  p
113  71  q
114  72  r
115  73  s
116  74  t
117  75  u
118  76  v
119  77  w
120  78  x
121  79  y
122  7A  z
123  7B  {
124  7C  |
125  7D  }
126  7E  ~
127  7F  DEL (Delete)
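The decimal/hexadecimal/symbol correspondence tabulated above is the same mapping that most programming languages build in; in Python, for instance, chr() and ord() implement it, so the table can be checked or even regenerated programmatically. The snippet below is only a quick demonstration of this.

```python
# The ASCII mapping in the appendix table via Python's chr() and ord().

print(ord("P"), hex(ord("P")))  # the "P" of "P5": decimal 80, hex 0x50
print(ord("V"), hex(ord("V")))  # decimal 86, hex 0x56
print(chr(0x0A) == "\n")        # 0A is the linefeed (LF)
print(chr(0x0D) == "\r")        # 0D is the carriage return (CR)

# Regenerate the printable portion of the table (codes 32 through 126):
rows = [f"{code:3d} {code:02X} {chr(code)}" for code in range(32, 127)]
print(rows[33])  # the row for decimal 65: " 65 41 A"
```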