Digital Image Processing (Gonzalez), English Third Edition: Instructor's Manual


NOTICE: This manual is intended for your personal use only. Copying, printing, posting, or any form of printed or electronic distribution of any part of this manual constitutes a violation of copyright law. As a security measure, this manual was encrypted during download with the serial number of your book, and with your personal information. Any printed or electronic copies of this file will bear that encryption, which will tie the copy to you. Please help us defeat piracy of intellectual property, one of the principal reasons for the increase in the cost of books.

Digital Image Processing, Third Edition
Instructor's Manual, Version 3.0
Rafael C. Gonzalez and Richard E. Woods
Prentice Hall, Upper Saddle River, NJ 07458
www.imageprocessingplace.com
Copyright © 1992-2008 R. C. Gonzalez and R. E. Woods

Chapter 1 Introduction

The purpose of this chapter is to present suggested guidelines for teaching material from Digital Image Processing at the senior and first-year graduate levels. We also discuss use of the book web site. Although the book is totally self-contained, the web site offers, among other things, complementary review material and computer projects that can be assigned in conjunction with classroom work. Detailed solutions to all problems in the book also are included in the remaining chapters of this manual.

1.1 Teaching Features of the Book

Undergraduate programs that offer digital image processing typically limit coverage to one semester. Graduate programs vary, and can include one or two semesters of the material. In the following discussion we give general guidelines for a one-semester senior course, a one-semester graduate course, and a full-year course of study covering two semesters. We assume a 15-week program per semester with three lectures per week. In order to provide flexibility for exams and review sessions, the guidelines discussed in the following sections are based on forty 50-minute lectures per semester. The background assumed on the part of the student is senior-level preparation in mathematical analysis, matrix theory, probability, and computer programming. The Tutorials section in the book web site contains review materials on matrix theory and probability, and has a brief introduction to linear systems. PowerPoint classroom presentation material on the review topics is available in the Faculty section of the web site.

The suggested teaching guidelines are presented in terms of general objectives, and not as time schedules. There is so much variety in the way image processing material is taught that it makes little sense to attempt a breakdown of the material by class period. In particular, the organization of the present edition of
the book is such that it makes it much easier than before to adopt significantly different teaching strategies, depending on course objectives and student background. For example, it is possible with the new organization to offer a course that emphasizes spatial techniques and covers little or no transform material. This is not something we recommend, but it is an option that often is attractive in programs that place little emphasis on the signal processing aspects of the field and prefer to focus more on the implementation of spatial techniques.

1.2 One Semester Senior Course

A basic strategy in teaching a senior course is to focus on aspects of image processing in which both the inputs and outputs of those processes are images. In the scope of a senior course, this usually means the material contained in Chapters 1 through 6. Depending on instructor preferences, wavelets (Chapter 7) usually are beyond the scope of coverage in a typical senior curriculum. However, we recommend covering at least some material on image compression (Chapter 8) as outlined below.

We have found in more than three decades of teaching this material to seniors in electrical engineering, computer science, and other technical disciplines, that one of the keys to success is to spend at least one lecture on motivation and the equivalent of one lecture on review of background material, as the need arises. The motivational material is provided in the numerous application areas discussed in Chapter 1. This chapter was prepared with this objective in mind. Some of this material can be covered in class in the first period and the rest assigned as independent reading. Background review should cover probability theory (of one random variable) before histogram processing (Section 3.3). A brief review of vectors and matrices may be required later, depending on the material covered. The review material in the book web site was designed for just this purpose.

Chapter 2 should be covered in its entirety. Some of the material (Sections 2.1 through 2.3.3) can be assigned as independent reading, but more detailed explanation (combined with some additional independent reading) of Sections 2.3.4 and 2.4 through 2.6 is time well spent. The material in Section 2.6 covers concepts that are used throughout the book and provides a number of image processing applications that are useful as motivational background for the rest of the book.

Chapter 3 covers spatial intensity transformations and spatial correlation and convolution as the foundation of spatial filtering. The chapter also covers a number of different uses of spatial transformations and spatial filtering for image enhancement. These techniques are illustrated in the context of enhancement (as motivational aids), but it is pointed out several times in the chapter that the methods developed have a much broader range of application. For a senior course, we recommend covering Sections 3.1 through 3.3.1, and Sections 3.4 through 3.6. Section 3.7 can be assigned as independent reading, depending on time.

The key objectives of Chapter 4 are (1) to start from basic principles of signal sampling and from these derive the discrete Fourier transform; and (2) to illustrate the use of filtering in the frequency domain. As in Chapter 3, we use mostly examples from image enhancement, but make it clear that the Fourier transform has a much broader scope of application.
The early part of the chapter through Section 4.2.2 can be assigned as independent reading. We recommend careful coverage of Sections 4.2.3 through 4.3.4. Section 4.3.5 can be assigned as independent reading. Section 4.4 should be covered in detail. The early part of Section 4.5 deals with extending to 2-D the material derived in the earlier sections of this chapter. Thus, Sections 4.5.1 through 4.5.3 can be assigned as independent reading, with part of the period following the assignment devoted to summarizing that material. We recommend class coverage of the rest of the section. In Section 4.6, we recommend that Sections 4.6.1-4.6.6 be covered in class. Section 4.6.7 can be assigned as independent reading. Sections 4.7.1-4.7.3 should be covered, and Section 4.7.4 can be assigned as independent reading. In Sections 4.8 through 4.9 we recommend covering one filter (like the ideal lowpass and highpass filters) and assigning the rest of those two sections as independent reading. In a senior course, we recommend covering Section 4.9 through Section 4.9.3 only. In Section 4.10, we also recommend covering one filter and assigning the rest as independent reading. In Section 4.11, we recommend covering Sections 4.11.1 and 4.11.2 and mentioning the existence of FFT algorithms. The log2 computational advantage of the FFT discussed in the early part of Section 4.11.3 should be mentioned, but in a senior course there typically is no time to cover development of the FFT in detail.

Chapter 5 can be covered as a continuation of Chapter 4. Section 5.1 makes this an easy approach. Then, it is possible to give the student a "flavor" of what restoration is (and still keep the discussion brief) by covering only Gaussian and impulse noise in Section 5.2.1, and two of the spatial filters in Section 5.3. This latter section is a frequent source of confusion to the student who, based on discussions earlier in the chapter, is expecting to see a more objective approach. It is worthwhile to emphasize at this point that spatial enhancement and restoration are the same thing when it comes to noise reduction by spatial filtering. A good way to keep it brief and conclude coverage of restoration is to jump at this point to inverse filtering (which follows directly from the model in Section 5.1) and show the problems with this approach. Then, with a brief explanation regarding the fact that much of restoration centers around the instabilities inherent in inverse filtering, it is possible to introduce the "interactive" form of the Wiener filter in Eq. (5.8-3) and discuss Examples 5.12 and 5.13. At a minimum, we recommend a brief discussion on image reconstruction by covering Sections 5.11.1 and 5.11.2 and mentioning that the rest of Section 5.11 deals with ways to generate projections in which blur is minimized.

Coverage of Chapter 6 also can be brief at the senior level by focusing on enough material to give the student a foundation on the physics of color (Section 6.1), two basic color models (RGB and CMY/CMYK), and then concluding with a brief coverage of pseudocolor processing (Section 6.3). We typically conclude a senior course by covering some of the basic aspects of image compression (Chapter 8). Interest in this topic has increased significantly as a result of the heavy use of images and graphics over the Internet, and students usually are easily motivated by the topic. The amount of material covered depends on the time left in the semester.
1.3 One Semester Graduate Course (No Background in DIP)

The main difference between a senior and a first-year graduate course in which neither group has formal background in image processing is mostly in the scope of the material covered, in the sense that we simply go faster in a graduate course and feel much freer in assigning independent reading. In a graduate course we add the following material to the material suggested in the previous section.

Sections 3.3.2-3.3.4 are added, as is Section 3.8 on fuzzy image processing. We cover Chapter 4 in its entirety (with appropriate sections assigned as independent reading, depending on the level of the class). To Chapter 5 we add Sections 5.6-5.8 and cover Section 5.11 in detail. In Chapter 6 we add the HSI model (Section 6.3.2), Section 6.4, and Section 6.6. A nice introduction to wavelets (Chapter 7) can be achieved by a combination of classroom discussions and independent reading. At a minimum, Sections 7.1, 7.2, 7.3, and 7.5 should be covered, with appropriate (but brief) mention of the existence of fast wavelet transforms. Sections 8.1 and 8.2 through Section 8.2.8 provide a nice introduction to image compression.

If additional time is available, a natural topic to cover next is morphological image processing (Chapter 9). The material in this chapter begins a transition from methods whose inputs and outputs are images to methods in which the inputs are images, but the outputs are attributes about those images, in the sense defined in Section 1.1. We recommend coverage of Sections 9.1 through 9.4, and some of the algorithms in Section 9.5.

1.4 One Semester Graduate Course (with Student Having Background in DIP)

Some programs have an undergraduate course in image processing as a prerequisite to a graduate course on the subject, in which case the course can be biased toward the latter chapters. In this case, a good deal of Chapters 2 and 3 is review, with the exception of Section 3.8, which deals with fuzzy image processing. Depending on what is covered in the undergraduate course, many of the sections in Chapter 4 will be review as well. For Chapter 5 we recommend the same level of coverage as outlined in the previous section. In Chapter 6 we add full-color image processing (Sections 6.4 through 6.7). Chapters 7 and 8 are covered as outlined in the previous section. As noted in the previous section, Chapter 9 begins a transition from methods whose inputs and outputs are images to methods in which the inputs are images, but the outputs are attributes about those images. As a minimum, we recommend coverage of binary morphology: Sections 9.1 through 9.4, and some of the algorithms in Section 9.5. Mention should be made about possible extensions to gray-scale images, but coverage of this material may not be possible, depending on the schedule. In Chapter 10, we recommend Sections 10.1 through 10.4. In Chapter 11 we typically cover Sections 11.1 through 11.4.

1.5 Two Semester Graduate Course (No Background in DIP)

In a two-semester course it is possible to cover material in all twelve chapters of the book. The key in organizing the syllabus is the background the students bring to the class.
For example, in an electrical and computer engineering curriculum graduate students have a strong background in frequency domain processing, so Chapter 4 can be covered much more quickly than would be the case in which the students are from, say, a computer science program. The important aspect of a full-year course is exposure to the material in all chapters, even when some topics in each chapter are not covered.

1.6 Projects

One of the most interesting aspects of a course in digital image processing is the pictorial nature of the subject. It has been our experience that students truly enjoy and benefit from judicious use of computer projects to complement the material covered in class. Because computer projects are in addition to course work and homework assignments, we try to keep the formal project reporting as brief as possible. In order to facilitate grading, we try to achieve uniformity in the way project reports are prepared. A useful report format is as follows:

Page 1: Cover page.
• Project title
• Project number
• Course number
• Student's name
• Date due
• Date handed in
• Abstract (not to exceed 1/2 page)

Page 2: One to two pages (max) of technical discussion.

Page 3 (or 4): Discussion of results. One to two pages (max).

Results: Image results (printed typically on a laser or inkjet printer). All images must contain a number and title referred to in the discussion of results.

Appendix: Program listings, focused on any original code prepared by the student. For brevity, functions and routines provided to the student are referred to by name, but the code is not included.

Layout: The entire report must be on a standard sheet size (e.g., letter size in the U.S. or A4 in Europe), stapled with three or more staples on the left margin to form a booklet, or bound using clear plastic standard binding products.

Project resources available in the book web site include a sample project, a list of suggested projects from which the instructor can select, book and other images, and MATLAB functions. Instructors who do not wish to use MATLAB will find additional software suggestions in the Support/Software section of the web site.

1.7 The Book Web Site

The companion web site www.prenhall.com/gonzalezwoods (or its mirror site) www.imageprocessingplace.com is a valuable teaching aid, in the sense that it includes material that previously was covered in class. In particular, the review material on probability, matrices, vectors, and linear systems was prepared using the same notation as in the book, and is focused on areas that are directly relevant to discussions in the text. This allows the instructor to assign the material as independent reading, and spend no more than one total lecture period reviewing those subjects. Another major feature is the set of solutions to problems marked with a star in the book. These solutions are quite detailed, and were prepared with the idea of using them as teaching support. The on-line availability of projects and digital images frees the instructor from having to prepare experiments, data, and handouts for students. The fact that most of the images in the book are available for downloading further enhances the value of the web site as a teaching resource.
Chapter 2 Problem Solutions

Problem 2.1
The diameter, x, of the retinal image corresponding to the dot is obtained from similar triangles, as shown in Fig. P2.1. That is,

$\frac{d/2}{0.2} = \frac{x/2}{0.017}$

which gives x = 0.085d. From the discussion in Section 2.1.1, and taking some liberties of interpretation, we can think of the fovea as a square sensor array having on the order of 337,000 elements, which translates into an array of size 580 × 580 elements. Assuming equal spacing between elements, this gives 580 elements and 579 spaces on a line 1.5 mm long. The size of each element and each space is then s = (1.5 mm)/1159 = 1.3 × 10^-6 m. If the size (on the fovea) of the imaged dot is less than the size of a single resolution element, we assume that the dot will be invisible to the eye. In other words, the eye will not detect a dot if its diameter, d, is such that 0.085d < 1.3 × 10^-6 m, or d < 15.3 × 10^-6 m.

[Figure P2.1: edge view of the dot (diameter d, at 0.2 m) and its image (diameter x) on the fovea (at 0.017 m), related by similar triangles.]

Problem 2.2
Brightness adaptation.

Problem 2.3
The solution is λ = c/v = 2.998 × 10^8 (m/s) / 60 (1/s) = 4.997 × 10^6 m = 4997 km.

Problem 2.4
(a) From the discussion on the electromagnetic spectrum in Section 2.2, the source of the illumination required to see an object must have a wavelength the same size as or smaller than the object. Because interest lies only on the boundary shape and not on other spectral characteristics of the specimens, a single illumination source in the far ultraviolet (wavelength of 0.001 microns or less) will be able to detect all objects. A far-ultraviolet camera sensor would be needed to image the specimens.
(b) No answer is required because the answer to (a) is affirmative.

Problem 2.5
From the geometry of Fig. 2.3, (7 mm)/(35 mm) = z/(500 mm), or z = 100 mm. So the target size is 100 mm on the side. We have a total of 1024 elements per line, so the resolution of one line is 1024/100 = 10 elements/mm. For line pairs we divide by 2, giving an answer of 5 lp/mm.

Problem 2.6
One possible solution is to equip a monochrome camera with a mechanical device that sequentially places a red, a green, and a blue pass filter in front of the lens. The strongest camera response determines the color. If all three responses are approximately equal, the object is white. A faster system would utilize three different cameras, each equipped with an individual filter. The analysis then would be based on polling the response of each camera. This system would be a little more expensive, but it would be faster and more reliable. Note that both solutions assume that the field of view of the camera(s) is such that it is completely filled by a uniform color [i.e., the camera(s) is (are) focused on a part of the vehicle where only its color is seen. Otherwise further analysis would be required to isolate the region of uniform color, which is all that is of interest in solving this problem].
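The numerical answers in Problems 2.1, 2.3, and 2.5 above are easy to sanity-check. The following short Python sketch is an addition for readers of this cleaned-up copy, not part of the original manual; it uses only the numbers given in the solutions themselves.

```python
# Back-of-envelope checks for Problems 2.1, 2.3, and 2.5.

# Problem 2.1: similar triangles, (d/2)/0.2 = (x/2)/0.017.
scale = 0.017 / 0.2                 # retinal image: x = 0.085 d
element = 1.5e-3 / 1159             # element + space size on the fovea, meters
d_min = element / scale             # smallest visible dot diameter
print(f"x = {scale:.3f} d, element = {element:.2e} m, d_min = {d_min:.2e} m")
# ~1.52e-05 m; the text rounds the element size to 1.3e-6 m and gets 15.3e-6 m.

# Problem 2.3: wavelength of a 60 Hz sinusoid.
c = 2.998e8                         # speed of light, m/s
print(f"lambda = {c / 60:.3e} m")   # about 4.997e6 m = 4997 km

# Problem 2.5: line pairs per mm for a 500 mm target and a 35 mm lens.
target = 500 * 7 / 35               # target side seen by the 7 mm chip: 100 mm
print(f"{1024 / target / 2:.1f} lp/mm")   # ~5 lp/mm after rounding
```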
Problem 2.7
The image in question is given by

$f(x,y) = i(x,y)\, r(x,y) = 255\, e^{-[(x-x_0)^2 + (y-y_0)^2]} \times 1.0 = 255\, e^{-[(x-x_0)^2 + (y-y_0)^2]}$

A cross section of the image is shown in Fig. P2.7(a). If the intensity is quantized using m bits, then we have the situation shown in Fig. P2.7(b), where ΔG = (255 + 1)/2^m. Since an abrupt change of 8 intensity levels is assumed to be detectable by the eye, it follows that ΔG = 8 = 256/2^m, or m = 5. In other words, 32, or fewer, intensity levels will produce visible false contouring.

[Figure P2.7: (a) a cross section of the image intensity; (b) the intensity scale quantized into equally spaced subdivisions ΔG.]

Problem 2.8
The use of two bits (m = 2) of intensity resolution produces four intensity levels in the range 0 to 255. One way to subdivide this range is to let all levels between 0 and 63 be coded as 63, all levels between 64 and 127 be coded as 127, and so on. The image resulting from this type of subdivision is shown in Fig. P2.8. Of course, there are other ways to subdivide the range [0, 255] into four bands.

[Figure P2.8: the image quantized into the four levels 63, 127, 191, and 255.]

Problem 2.9
(a) The total amount of data (including the start and stop bits) in an 8-bit, 1024 × 1024 image is (1024)^2 × [8 + 2] bits. The total time required to transmit this image over a 56K baud link is (1024)^2 × [8 + 2]/56000 = 187.25 sec, or about 3.1 min.
(b) At 3000K baud this time goes down to about 3.5 sec.

Problem 2.10
The width-to-height ratio is 16/9 and the resolution in the vertical direction is 1125 lines (or, what is the same thing, 1125 pixels in the vertical direction). It is given that the resolution in the horizontal direction is in the 16/9 proportion, so the resolution in the horizontal direction is (1125) × (16/9) = 2000 pixels per line. The system "paints" a full 1125 × 2000, 8-bit image every 1/30 sec for each of the red, green, and blue component images. There are 7200 sec in two hours, so the total digital data generated in this time interval is (1125)(2000)(8)(30)(3)(7200) = 1.166 × 10^13 bits, or 1.458 × 10^12 bytes (i.e., about 1.5 terabytes). These figures show why image data compression (Chapter 8) is so important.

Problem 2.11
Let p and q be as shown in Fig. P2.11. Then:
(a) S1 and S2 are not 4-connected because q is not in the set N4(p).
(b) S1 and S2 are 8-connected because q is in the set N8(p).
(c) S1 and S2 are m-connected because (i) q is in ND(p), and (ii) the set N4(p) ∩ N4(q) is empty.

[Figure P2.11: the points p and q at the boundary between S1 and S2.]

Problem 2.12
The solution of this problem consists of defining all possible neighborhood shapes to go from a diagonal segment to a corresponding 4-connected segment, as Fig. P2.12 illustrates. The algorithm then simply looks for the appropriate match every time a diagonal segment is encountered in the boundary.

[Figure P2.12: the diagonal neighborhood shapes and the 4-connected segments each one maps to.]

Problem 2.13
The solution to this problem is the same as for Problem 2.12 because converting from an m-connected path to a 4-connected path simply involves detecting diagonal segments and converting them to the appropriate 4-connected segment.

Problem 2.14
The difference between the pixels in the background that are holes and pixels that are not holes is that no paths exist between hole pixels and the boundary of the image. So, the definition could be restated as follows: The subset of pixels of (R_U)^c that are connected to the border of the image is called the background. All other pixels of (R_U)^c are called hole pixels.
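Returning to Problems 2.7 and 2.8 above, the quantization they describe can be made concrete in a few lines of code. This is a minimal sketch, assuming NumPy is available; the function name quantize is ours, not the book's.

```python
import numpy as np

def quantize(img, m):
    """Requantize an 8-bit image to 2**m levels, keeping the [0, 255] range.
    Each band [k*step, (k+1)*step - 1] maps to its top value, as in Fig. P2.8."""
    step = 256 // 2**m          # Delta-G = 256 / 2**m
    return (img // step) * step + (step - 1)

ramp = np.arange(256)           # an 8-bit intensity ramp
for m in (5, 2):
    q = quantize(ramp, m)
    print(f"m = {m}: {len(np.unique(q))} levels, step = {256 // 2**m}")
# m = 5 gives 32 levels with a step of exactly 8, the visibility threshold
# assumed in Problem 2.7; m = 2 gives the four levels 63, 127, 191, 255.
```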
Problem 2.15
(a) When V = {0,1}, a 4-path does not exist between p and q because it is impossible to get from p to q by traveling along points that are both 4-adjacent and also have values from V. Figure P2.15(a) shows this condition; it is not possible to get to q. The shortest 8-path is shown in Fig. P2.15(b); its length is 4. The length of the shortest m-path (shown dashed) is 5. Both of these shortest paths are unique in this case.
(b) One possibility for the shortest 4-path when V = {1,2} is shown in Fig. P2.15(c); its length is 6. It is easily verified that another 4-path of the same length exists between p and q. One possibility for the shortest 8-path (it is not unique) is shown in Fig. P2.15(d); its length is 4. The length of a shortest m-path (shown dashed) is 6. This path is not unique.

[Figure P2.15: the paths discussed in (a) and (b).]

Problem 2.16
(a) A shortest 4-path between a point p with coordinates (x,y) and a point q with coordinates (s,t) is shown in Fig. P2.16, where the assumption is that all points along the path are from V. The lengths of the segments of the path are |x - s| and |y - t|, respectively. The total path length is |x - s| + |y - t|, which we recognize as the definition of the D4 distance, as given in Eq. (2.5-2). (Recall that this distance is independent of any paths that may exist between the points.) The D4 distance obviously is equal to the length of the shortest 4-path when the length of the path is |x - s| + |y - t|. This occurs whenever we can get from p to q by following a path whose elements (1) are from V, and (2) are arranged in such a way that we can traverse the path from p to q by making turns in at most two directions (e.g., right and up).
(b) The path may or may not be unique, depending on V and the values of the points along the way.

[Figure P2.16: a shortest 4-path between p and q, made up of a horizontal segment of length |x - s| and a vertical segment of length |y - t|.]

Problem 2.17
(a) The D8 distance between p and q [see Eq. (2.5-3) and Fig. P2.16] is

$D_8(p,q) = \max(|x - s|,\, |y - t|).$

Recall that the D8 distance (unlike the Euclidean distance) counts diagonal segments the same as horizontal and vertical segments, and, as in the case of the D4 distance, is independent of whether or not a path exists between p and q. As in the previous problem, the shortest 8-path is equal to the D8 distance when the path length is max(|x - s|, |y - t|). This occurs when we can get from p to q by following a path whose elements (1) are from V, and (2) are arranged in such a way that we can traverse the path from p to q by traveling diagonally in only one direction and, whenever diagonal travel is not possible, by making turns in the horizontal or vertical (but not both) direction.
(b) The path may or may not be unique, depending on V and the values of the points along the way.

Problem 2.18
With reference to Eq. (2.6-1), let H denote the sum operator, let S1 and S2 denote two different small subimage areas of the same size, and let S1 + S2 denote the corresponding pixel-by-pixel sum of the elements in S1 and S2, as explained in Section 2.6.1. Note that the size of the neighborhood (i.e., number of pixels) is not changed by this pixel-by-pixel sum. The operator H computes the sum of pixel values in a given neighborhood. Then, H(aS1 + bS2) means: (1) multiply the pixels in each of the subimage areas by the constants shown, (2) add the pixel-by-pixel values from aS1 and bS2 (which produces a single subimage area), and (3) compute the sum of the values of all the pixels in that single subimage area.
Let ap1 and bp2 denote two arbitrary (but corresponding) pixels from aS1 + bS2. Then we can write

$H(aS_1 + bS_2) = \sum_{p_1 \in S_1,\, p_2 \in S_2} (a p_1 + b p_2) = \sum_{p_1 \in S_1} a p_1 + \sum_{p_2 \in S_2} b p_2 = a \sum_{p_1 \in S_1} p_1 + b \sum_{p_2 \in S_2} p_2 = a H(S_1) + b H(S_2)$

which, according to Eq. (2.6-1), indicates that H is a linear operator.

Problem 2.19
The median, ζ, of a set of numbers is such that half the values in the set are below ζ and the other half are above it. A simple example will suffice to show that Eq. (2.6-1) is violated by the median operator. Let S1 = {1, -2, 3}, S2 = {4, 5, 6}, and a = b = 1. In this case H is the median operator. We then have H(S1 + S2) = median{5, 3, 9} = 5, where it is understood that S1 + S2 is the array sum of S1 and S2. Next, we compute H(S1) = median{1, -2, 3} = 1 and H(S2) = median{4, 5, 6} = 5. Then, because H(aS1 + bS2) ≠ aH(S1) + bH(S2), it follows that Eq. (2.6-1) is violated and the median is a nonlinear operator.

Problem 2.20
From Eq. (2.6-5), at any point (x,y),

$\bar{g} = \frac{1}{K} \sum_{i=1}^{K} g_i = \frac{1}{K} \sum_{i=1}^{K} f_i + \frac{1}{K} \sum_{i=1}^{K} \eta_i.$

Then

$E\{\bar{g}\} = \frac{1}{K} \sum_{i=1}^{K} E\{f_i\} + \frac{1}{K} \sum_{i=1}^{K} E\{\eta_i\}.$

But all the f_i are the same image, so E{f_i} = f. Also, it is given that the noise has zero mean, so E{η_i} = 0. Thus, it follows that E{ḡ} = f, which proves the validity of Eq. (2.6-6).

To prove the validity of Eq. (2.6-7), consider the preceding expression for ḡ again. It is known from random-variable theory that the variance of the sum of uncorrelated random variables is the sum of the variances of those variables (Papoulis [1991]). Because it is given that the elements of f are constant and the η_i are uncorrelated,

$\sigma_{\bar{g}}^2 = \sigma_{f}^2 + \frac{1}{K^2}\left[\sigma_{\eta_1}^2 + \sigma_{\eta_2}^2 + \cdots + \sigma_{\eta_K}^2\right].$

The first term on the right side is 0 because the elements of f are constants. The various σ²_{η_i} are simply samples of the noise, which has variance σ²_η. Thus, σ²_{η_i} = σ²_η and we have

$\sigma_{\bar{g}}^2 = \frac{K}{K^2} \sigma_\eta^2 = \frac{1}{K} \sigma_\eta^2$

which proves the validity of Eq. (2.6-7).
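Both results above lend themselves to a quick numerical check. Below is a minimal NumPy sketch (not part of the original manual): it reproduces the median counterexample of Problem 2.19 with the same arrays, and it estimates the variance of an average of K noisy images to compare against Eq. (2.6-7). Taking the underlying image f to be zero is our simplifying assumption, made only to keep the check short.

```python
import numpy as np

# Problem 2.19: the median violates linearity, Eq. (2.6-1).
S1, S2 = np.array([1, -2, 3]), np.array([4, 5, 6])
lhs = np.median(S1 + S2)                      # median{5, 3, 9} = 5
rhs = np.median(S1) + np.median(S2)           # 1 + 5 = 6
print(f"median(S1+S2) = {lhs}, median(S1)+median(S2) = {rhs}")   # 5 != 6

# Problem 2.20: averaging K noisy images reduces the noise variance by 1/K.
rng = np.random.default_rng(0)
K, shape, sigma = 100, (64, 64), 10.0
noisy = [rng.normal(0.0, sigma, shape) for _ in range(K)]    # f = 0 here
g_bar = np.mean(noisy, axis=0)
print(f"var of average = {g_bar.var():.2f}, sigma^2/K = {sigma**2 / K:.2f}")
```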
Problem 2.21
(a) Pixels are integer values, and 8 bits allow representation of 256 contiguous integer values. In our work, the range of intensity values for 8-bit images is [0, 255]. The subtraction of values in this range covers the range [-255, 255]. This range of values cannot be covered by 8 bits, but it is given in the problem statement that the result of subtraction has to be represented in 8 bits also, and, consistent with the range of values used for 8-bit images throughout the book, we assume that values of the 8-bit difference images are in the range [0, 255]. What this means is that any subtraction of 2 pixels that yields a negative quantity will be clipped at 0. The process of repeated subtractions of an image b(x,y) from an image a(x,y) can be expressed as

$d_K(x,y) = a(x,y) - \sum_{k=1}^{K} b(x,y) = a(x,y) - K \times b(x,y)$

where d_K(x,y) is the difference image resulting after K subtractions. Because image subtraction is an array operation (see Section 2.6.1), we can focus attention on the subtraction of any corresponding pair of pixels in the images. We have already stated that negative results are clipped at 0. Once a 0 result is obtained, it will remain so, because subtraction of any nonnegative value from 0 is a negative quantity which, again, is clipped at 0. Similarly, any location (x0, y0) for which b(x0, y0) = 0 will produce the result d_K(x0, y0) = a(x0, y0). That is, repeatedly subtracting 0 from any value results in that value. The locations in b(x,y) that are not 0 will eventually decrease the corresponding values in d_K(x,y) until they are 0. The maximum number of subtractions in which this takes place in the context of the present problem is 255, which corresponds to the condition at a location in which a(x,y) is 255 and b(x,y) is 1. Thus, we conclude from the preceding discussion that repeatedly subtracting an image from another will result in a difference image whose components are 0 in the locations in b(x,y) that are not zero, and equal to the original values of a(x,y) at the locations in b(x,y) that are 0. This result will be achieved in, at most, 255 subtractions.
(b) The order does matter. For example, suppose that at a pair of arbitrary coordinates (x0, y0), a(x0, y0) = 128 and b(x0, y0) = 0. Subtracting b(x0, y0) from a(x0, y0) will result in d_K(x0, y0) = 128 in the limit. Reversing the operation will result in a value of 0 in that same location.
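A minimal NumPy sketch of the clipped, repeated subtraction just described; the array values are arbitrary illustrations, not taken from the book.

```python
import numpy as np

a = np.array([[255, 128], [200, 37]], dtype=np.int16)
b = np.array([[  1,   0], [  5,  2]], dtype=np.int16)

d = a.copy()
for k in range(255):                 # 255 passes suffice in the worst case
    d = np.clip(d - b, 0, 255)       # negative results are clipped at 0
print(d)                             # 0 wherever b > 0; a wherever b == 0
```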
Problem 2.22
Let g(x,y) denote the golden image, and let f(x,y) denote any input image acquired during routine operation of the system. Change detection via subtraction is based on computing the simple difference d(x,y) = g(x,y) - f(x,y). The resulting image, d(x,y), can be used in two fundamental ways for change detection. One way is to use pixel-by-pixel analysis. In this case we say that f(x,y) is "close enough" to the golden image if all the pixels in d(x,y) fall within a specified threshold band [Tmin, Tmax], where Tmin is negative and Tmax is positive. Usually, the same value of threshold is used for both negative and positive differences, so that we have a band [-T, T] in which all pixels of d(x,y) must fall in order for f(x,y) to be declared acceptable. The second major approach is simply to sum all the pixels in d(x,y) and compare the sum against a threshold Q. Note that the absolute value needs to be used to avoid errors canceling out. This is a much cruder test, so we will concentrate on the first approach.

There are three fundamental factors that need tight control for difference-based inspection to work: (1) proper registration, (2) controlled illumination, and (3) noise levels that are low enough so that difference values are not affected appreciably by variations due to noise. The first condition basically addresses the requirement that comparisons be made between corresponding pixels. Two images can be identical, but if they are displaced with respect to each other, comparing the differences between them makes no sense. Often, special markings are manufactured into the product for mechanical or image-based alignment.

Controlled illumination (note that "illumination" is not limited to visible light) obviously is important because changes in illumination can affect dramatically the values in a difference image. One approach used often in conjunction with illumination control is intensity scaling based on actual conditions. For example, the products could have one or more small patches of a tightly controlled color, and the intensity (and perhaps even color) of each pixel in the entire image would be modified based on the actual versus expected intensity and/or color of the patches in the image being processed.

Finally, the noise content of a difference image needs to be low enough so that it does not materially affect comparisons between the golden and input images. Good signal strength goes a long way toward reducing the effects of noise. Another (sometimes complementary) approach is to implement image processing techniques (e.g., image averaging) to reduce noise.

Obviously there are a number of variations of the basic theme just described. For example, additional intelligence in the form of tests that are more sophisticated than pixel-by-pixel threshold comparisons can be implemented. A technique used often in this regard is to subdivide the golden image into different regions and perform different (usually more than one) tests in each of the regions, based on expected region content.

Problem 2.23
(a) The answer is shown in Fig. P2.23.

[Figure P2.23: the sets asked for in (a).]

(b) With reference to the sets in the problem statement, the answers are, from left to right,

(A ∩ B ∩ C) - (B ∩ C);
(A ∩ B ∩ C) ∪ (A ∩ C) ∪ (A ∩ B);
[B ∩ (A ∪ C)^c] ∪ {(A ∩ C) - [(A ∩ C) ∩ (B ∩ C)]}.

Problem 2.24
Using triangular regions means three tiepoints, so we can solve the following set of linear equations for the six coefficients:

x' = c1 x + c2 y + c3
y' = c4 x + c5 y + c6

to implement spatial transformations. Intensity interpolation is implemented using any of the methods in Section 2.4.4.

Problem 2.25
The Fourier transformation kernel is separable because

$r(x,y,u,v) = e^{-j 2\pi(ux/M + vy/N)} = e^{-j 2\pi ux/M}\, e^{-j 2\pi vy/N} = r_1(x,u)\, r_2(y,v).$

It is symmetric because

$e^{-j 2\pi(ux/M + vy/N)} = e^{-j 2\pi ux/M}\, e^{-j 2\pi vy/N} = r_1(x,u)\, r_1(y,v).$

Problem 2.26
From Eq. (2.6-27) and the definition of separable kernels,

$T(u,v) = \sum_{x=0}^{M-1} \sum_{y=0}^{N-1} f(x,y)\, r(x,y,u,v) = \sum_{x=0}^{M-1} r_1(x,u) \sum_{y=0}^{N-1} f(x,y)\, r_2(y,v) = \sum_{x=0}^{M-1} T(x,v)\, r_1(x,u)$

where

$T(x,v) = \sum_{y=0}^{N-1} f(x,y)\, r_2(y,v).$

For a fixed value of x, this equation is recognized as the 1-D transform along one row of f(x,y). By letting x vary from 0 to M-1 we compute the entire array T(x,v). Then, by substituting this array into the last line of the previous equation, we have the 1-D transform along the columns of T(x,v). In other words, when a kernel is separable, we can compute the 1-D transform along the rows of the image. Then we compute the 1-D transform along the columns of this intermediate result to obtain the final 2-D transform, T(u,v). We obtain the same result by computing the 1-D transform along the columns of f(x,y) followed by the 1-D transform along the rows of the intermediate result.

This result plays an important role in Chapter 4 when we discuss the 2-D Fourier transform. From Eq. (2.6-33), the 2-D Fourier transform is given by

$T(u,v) = \sum_{x=0}^{M-1} \sum_{y=0}^{N-1} f(x,y)\, e^{-j 2\pi(ux/M + vy/N)}.$

It is easily verified that the Fourier transform kernel is separable (Problem 2.25), so we can write this equation as

$T(u,v) = \sum_{x=0}^{M-1} e^{-j 2\pi ux/M} \sum_{y=0}^{N-1} f(x,y)\, e^{-j 2\pi vy/N} = \sum_{x=0}^{M-1} T(x,v)\, e^{-j 2\pi ux/M}$

where

$T(x,v) = \sum_{y=0}^{N-1} f(x,y)\, e^{-j 2\pi vy/N}$

is the 1-D Fourier transform along the rows of f(x,y), as we let x = 0, 1, ..., M-1.
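The row-column decomposition of Problem 2.26 is straightforward to verify numerically, since 2-D FFT routines compute exactly this factored form. A minimal sketch, assuming NumPy is available:

```python
import numpy as np

rng = np.random.default_rng(1)
f = rng.random((4, 6))

# 1-D transforms along the rows, then along the columns (and the reverse):
rows_then_cols = np.fft.fft(np.fft.fft(f, axis=1), axis=0)
cols_then_rows = np.fft.fft(np.fft.fft(f, axis=0), axis=1)

print(np.allclose(rows_then_cols, np.fft.fft2(f)))   # True
print(np.allclose(cols_then_rows, np.fft.fft2(f)))   # True
```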
Problem 2.27
The geometry of the chips is shown in Fig. P2.27(a). From Fig. P2.27(b) and the geometry in Fig. 2.3, we know that

$\Delta x = \frac{80\,\lambda}{\lambda - z}$

where Δx is the side dimension of the image (assumed square because the viewing screen is square) impinging on the image plane, and the 80 mm refers to the size of the viewing screen, as described in the problem statement. The most inexpensive solution will result from using a camera of resolution 512 × 512. Based on the information in Fig. P2.27(a), a CCD chip with this resolution will be of size (16 μm) × (512) = 8 mm on each side. Substituting Δx = 8 mm in the above equation gives z = 9λ as the relationship between the distance z and the focal length of the lens, where a minus sign was ignored because it is just a coordinate inversion. If a 25 mm lens is used, the front of the lens will have to be located at approximately 225 mm from the viewing screen so that the size of the image of the screen projected onto the CCD image plane does not exceed the 8 mm size of the CCD chip for the 512 × 512 camera. This value of z is reasonable, but any other given lens sizes would be also; the camera would just have to be positioned further away.

[Figure P2.27: (a) geometry of the CCD chips; (b) imaging geometry relating the viewing screen to the image plane.]

Assuming a 25 mm lens, the next issue is to determine if the smallest defect will be imaged on, at least, a 2 × 2 pixel area, as required by the specification. It is given that the defects are circular, with the smallest defect having a diameter of 0.8 mm. So, all that needs to be done is to determine if the image of a circle of diameter 0.8 mm or greater will, at least, be of size 2 × 2 pixels on the CCD imaging plane. This can be determined by using the same model as in Fig. P2.27(b) with the 80 mm replaced by 0.8 mm. Using λ = 25 mm and z = 225 mm in the above equation yields Δx = 100 μm. In other words, a circular defect of diameter 0.8 mm will be imaged as a circle with a diameter of 100 μm on the CCD chip of a 512 × 512 camera equipped with a 25 mm lens and which views the defect at a distance of 225 mm.

If, in order for a CCD receptor to be activated, its area has to be excited in its entirety, then it can be seen from Fig. P2.27(a) that to guarantee that a 2 × 2 array of such receptors will be activated, a circular area of diameter no less than (6)(8) = 48 μm has to be imaged onto the CCD chip. The smallest defect is imaged as a circle with diameter of 100 μm, which is well above the 48 μm minimum requirement. Therefore, we conclude that a CCD camera of resolution 512 × 512 pixels, using a 25 mm lens and imaging the viewing screen at a distance of 225 mm, is sufficient to solve the problem posed by the plant manager.

Chapter 3 Problem Solutions

Problem 3.1
Let f denote the original image. First subtract the minimum value of f, denoted f_min, from f to yield a function whose minimum value is 0:

g_1 = f - f_min

Next divide g_1 by its maximum value to yield a function in the range [0, 1] and multiply the result by L - 1 to yield a function with values in the range [0, L - 1]:

$g = \frac{L-1}{\max g_1}\, g_1 = \frac{L-1}{\max(f - f_{min})}\, (f - f_{min})$

Keep in mind that f_min is a scalar and f is an image.

Problem 3.2
(a) General form: s = T(r) = A e^{-Kr^2}. For the condition shown in the problem figure, A e^{-K L_0^2} = A/2. Solving for K yields

-K L_0^2 = ln(0.5)
K = 0.693 / L_0^2.

Then,

$s = T(r) = A\, e^{-\frac{0.693}{L_0^2} r^2}.$

(b) General form: s = T(r) = B(1 - e^{-Kr^2}). For the condition shown in the problem figure, B(1 - e^{-K L_0^2}) = B/2. The solution for K is the same as in (a), so

$s = T(r) = B\left(1 - e^{-\frac{0.693}{L_0^2} r^2}\right).$

(c) General form: s = T(r) = (D - C)(1 - e^{-Kr^2}) + C.
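Before moving on, the full-scale stretch of Problem 3.1 above translates directly into code. A minimal NumPy sketch (the function name stretch is ours; it assumes f is not constant, so that max(f - fmin) > 0):

```python
import numpy as np

def stretch(f, L=256):
    """Stretch an image so its minimum maps to 0 and its maximum to L-1."""
    g1 = f - f.min()                 # g1 = f - fmin, minimum is now 0
    return (L - 1) * g1 / g1.max()   # scale the range [0, 1] up to [0, L-1]

f = np.array([[50, 60], [70, 120]], dtype=float)
print(stretch(f))                    # 50 maps to 0, 120 maps to 255
```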
Problem 3.3
(a) $s = T(r) = \frac{1}{1 + (m/r)^E}.$
(b) See Fig. P3.3.

[Figure P3.3: a plot of the transformation in (a).]

(c) We want s to be 0 for r < m, and s to be 1 for values of r > m. When r = m, s = 1/2. But, because the values of r are integers, the behavior we want is

s = T(r) = 0.0 when r ≤ m - 1; 0.5 when r = m; 1.0 when r ≥ m + 1.

The question in the problem statement is to find the smallest value of E that will make the threshold behave as in the equation above. When r = m, we see from (a) that s = 0.5, regardless of the value of E. If C is the smallest positive number representable in the computer, and keeping in mind that s is positive, then any value of s less than C/2 will be called 0 by the computer. To find the smallest value of E for which this happens, simply solve the following equation for E, using the given value m = 128:

$\frac{1}{1 + [m/(m-1)]^E} < C/2.$

Because the function is symmetric about m, the resulting value of E will yield s = 1 for r ≥ m + 1.

Problem 3.4
The transformations required to produce the individual bit planes are nothing more than mappings of the truth table for eight binary variables. In this truth table, the values of the 8th bit are 0 for byte values 0 to 127, and 1 for byte values 128 to 255, thus giving the transformation mentioned in the problem statement. Note that the given transformed values of either 0 or 255 simply indicate a binary image for the 8th bit plane. Any other two values would have been equally valid, though less conventional.

Continuing with the truth table concept, the transformation required to produce an image of the 7th bit plane outputs a 0 for byte values in the range [0, 63], a 1 for byte values in the range [64, 127], a 0 for byte values in the range [128, 191], and a 1 for byte values in the range [192, 255]. Similarly, the transformation for the 6th bit plane alternates between eight ranges of byte values, the transformation for the 5th bit plane alternates between 16 ranges, and so on. Finally, the output of the transformation for the lowest-order bit plane alternates between 0 and 255 depending on whether the byte values are even or odd. Thus, this transformation alternates between 128 byte value ranges, which explains why an image of that bit plane is usually the "busiest" looking of all the bit plane images.
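The truth-table transformations of Problem 3.4 reduce to shifting and masking bits. A minimal NumPy sketch (the helper name bit_plane is ours) that counts the constant output ranges each plane produces over the byte values 0-255:

```python
import numpy as np

def bit_plane(img, b):
    """Extract bit plane b (0 = lowest) of a byte image, displayed as 0/255."""
    return ((img >> b) & 1) * 255

v = np.arange(256)                   # all possible byte values
for b in (7, 6, 0):
    transitions = np.count_nonzero(np.diff(bit_plane(v, b)))
    print(f"plane {b}: {transitions + 1} constant ranges over [0, 255]")
# The top plane (the "8th bit" above) splits [0, 255] into 2 ranges, the next
# one into 4, and plane 0 flips on every value (the "busiest" plane).
```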
Problem 3.5
(a) The number of pixels having different intensity level values would decrease, thus causing the number of components in the histogram to decrease. Because the number of pixels would not change, this would cause the height of some of the remaining histogram peaks to increase in general. Typically, less variability in intensity level values will reduce contrast.
(b) The most visible effect would be significant darkening of the image. For example, dropping the highest bit would limit the brightest level in an 8-bit image to be 127. Because the number of pixels would remain constant, the height of some of the histogram peaks would increase. The general shape of the histogram would now be taller and narrower, with no histogram components being located past 127.

Problem 3.6
All that histogram equalization does is remap histogram components on the intensity scale. To obtain a uniform (flat) histogram would require in general that pixel intensities actually be redistributed so that there are L groups of n/L pixels with the same intensity, where L is the number of allowed discrete intensity levels and n = MN is the total number of pixels in the input image. The histogram equalization method has no provisions for this type of (artificial) intensity redistribution process.

Problem 3.7
Let n = MN be the total number of pixels and let n_{r_j} be the number of pixels in the input image with intensity value r_j. Then, the histogram equalization transformation is

$s_k = T(r_k) = \sum_{j=0}^{k} n_{r_j}/n = \frac{1}{n} \sum_{j=0}^{k} n_{r_j}.$

Because every pixel (and no others) with value r_k is mapped to value s_k, it follows that n_{s_k} = n_{r_k}. A second pass of histogram equalization would produce values v_k according to the transformation

$v_k = T(s_k) = \frac{1}{n} \sum_{j=0}^{k} n_{s_j}.$

But n_{s_j} = n_{r_j}, so

$v_k = T(s_k) = \frac{1}{n} \sum_{j=0}^{k} n_{r_j} = s_k$

which shows that a second pass of histogram equalization would yield the same result as the first pass. We have assumed negligible round-off errors.
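This conclusion can be verified empirically. A minimal NumPy sketch, with the rounding step playing the role of the round-off noted above; the helper name equalize is ours, and this discrete implementation is one common reading of Eq. (3.3-8), not the book's own code.

```python
import numpy as np

def equalize(img, L=256):
    """Discrete histogram equalization: s_k = round((L-1) * CDF(r_k))."""
    hist = np.bincount(img.ravel(), minlength=L)
    cdf = np.cumsum(hist) / img.size          # running sum of p_r(r_j)
    return np.round((L - 1) * cdf[img]).astype(np.uint8)

rng = np.random.default_rng(2)
img = rng.integers(0, 200, (128, 128), dtype=np.uint8)   # skewed-range image
once, twice = equalize(img), equalize(equalize(img))
print(np.array_equal(once, twice))            # second pass changes nothing
```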
Problem 3.8
The general histogram equalization transformation function is

$s = T(r) = \int_0^r p_r(w)\, dw.$

There are two important points about which the student must show awareness in answering this problem. First, this equation assumes only positive values for r. However, the Gaussian density extends in general from -∞ to ∞. Recognition of this fact is important. Once recognized, the student can approach this difficulty in several ways. One good answer is to make some assumption, such as the standard deviation being small enough so that the area of the curve under p_r(r) for negative values of r is negligible. Another is to scale up the values until the area under the negative part of the curve is negligible. The second major point is to recognize that the transformation function itself,

$s = T(r) = \frac{1}{\sqrt{2\pi}\,\sigma} \int_0^r e^{-\frac{(w-m)^2}{2\sigma^2}}\, dw,$

has no closed-form solution. This is the cumulative distribution function of the Gaussian density, which is either integrated numerically, or its values are looked up in a table.

A third, less important point that the student should address is the high-end values of r. Again, the Gaussian PDF extends to +∞. One possibility here is to make the same assumption as above regarding the standard deviation. Another is to divide by a large enough value so that the area under the positive part of the PDF past that point is negligible (this scaling reduces the standard deviation).

Another approach the student can take is to work with histograms, in which case the transformation function would be in the form of a summation. The issue of negative and high positive values must still be addressed, and the possible answers suggested above regarding these issues still apply. The student needs to indicate that the histogram is obtained by sampling the continuous function, so some mention should be made regarding the number of samples (bits) used. The most likely answer is 8 bits, in which case the student needs to address the scaling of the function so that the range is [0, 255].

Problem 3.9
We are interested in just one example in order to satisfy the statement of the problem. Consider the probability density function in Fig. P3.9(a). A plot of the transformation T(r) in Eq. (3.3-4) using this particular density function is shown in Fig. P3.9(b). Because p_r(r) is a probability density function, we know from the discussion in Section 3.3.1 that the transformation T(r) satisfies conditions (a) and (b) stated in that section. However, we see from Fig. P3.9(b) that the inverse transformation from s back to r is not single valued, as there are an infinite number of possible mappings from s = (L-1)/2 back to r. It is important to note that the reason the inverse transformation function turned out not to be single valued is the gap in p_r(r) in the interval [L/4, 3L/4].

[Figure P3.9: (a) a density p_r(r) that is zero over the interval [L/4, 3L/4]; (b) the resulting transformation T(r), which is constant at (L-1)/2 over that interval.]

Problem 3.10
(a) We need to show that the transformation function in Eq. (3.3-8) is monotonic in the range [0, L-1], and that its values are in the range [0, L-1]. From Eq. (3.3-8),

$s_k = T(r_k) = (L-1) \sum_{j=0}^{k} p_r(r_j).$

Because all the p_r(r_j) are positive, it follows that T(r_k) is monotonic in the range k ∈ [0, L-1]. Because the sum of all the values of p_r(r_j) is 1, it follows that 0 ≤ s_k ≤ L-1.
(b) If none of the p_r(r_k), k = 1, 2, ..., L-1, are 0, then T(r_k) will be strictly monotonic. This implies a one-to-one mapping both ways, meaning that both forward and inverse transformations will be single-valued.

Problem 3.11
First, we obtain the histogram equalization transformation:

$s = T(r) = \int_0^r p_r(w)\, dw = \int_0^r (-2w + 2)\, dw = -r^2 + 2r.$

Next we find

$v = G(z) = \int_0^z p_z(w)\, dw = \int_0^z 2w\, dw = z^2.$

Finally,

$z = G^{-1}(v) = \pm\sqrt{v}.$

But only positive intensity levels are allowed, so z = √v. Then, we replace v with s, which in turn is -r^2 + 2r, and we have

$z = \sqrt{-r^2 + 2r}.$

Problem 3.12
The value of the histogram component corresponding to the kth intensity level in a neighborhood is

$p_r(r_k) = \frac{n_k}{n}$

for k = 0, 1, ..., K-1, where n_k is the number of pixels having intensity level r_k, n is the total number of pixels in the neighborhood, and K is the total number of possible intensity levels. Suppose that the neighborhood is moved one pixel to the right (we are assuming rectangular neighborhoods). This deletes the leftmost column and introduces a new column on the right. The updated histogram then becomes

$p'_r(r_k) = \frac{1}{n}\left[n_k - n_{L_k} + n_{R_k}\right]$

for k = 0, 1, ..., K-1, where n_{L_k} is the number of occurrences of level r_k on the left column and n_{R_k} is the similar quantity on the right column. The preceding equation can be written also as

$p'_r(r_k) = p_r(r_k) + \frac{1}{n}\left[n_{R_k} - n_{L_k}\right]$

for k = 0, 1, ..., K-1. The same concept applies to other modes of neighborhood motion:

$p'_r(r_k) = p_r(r_k) + \frac{1}{n}\left[b_k - a_k\right]$

for k = 0, 1, ..., K-1, where a_k is the number of pixels with value r_k in the neighborhood area deleted by the move, and b_k is the corresponding number introduced by the move.
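The running update of Problem 3.12 is easy to check against a full recomputation. A minimal NumPy sketch using raw counts rather than normalized histograms:

```python
import numpy as np

rng = np.random.default_rng(3)
img = rng.integers(0, 8, (9, 9))        # small intensity range for clarity
n = 3                                   # n x n neighborhood
r0, c0 = 3, 3                           # top-left corner of the neighborhood

hist = np.bincount(img[r0:r0+n, c0:c0+n].ravel(), minlength=8)
left = np.bincount(img[r0:r0+n, c0], minlength=8)        # column that leaves
right = np.bincount(img[r0:r0+n, c0+n], minlength=8)     # column that enters
hist_moved = hist - left + right        # the update of Problem 3.12

full = np.bincount(img[r0:r0+n, c0+1:c0+n+1].ravel(), minlength=8)
print(np.array_equal(hist_moved, full))   # True: update matches recomputation
```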
Problem 3.13
The purpose of this simple problem is to make the student think of the meaning of histograms and arrive at the conclusion that histograms carry no information about spatial properties of images. Thus, the only time that the histogram of the images formed by the operations shown in the problem statement can be determined in terms of the original histograms is when one (or both) of the images is (are) constant. In (d) we have the additional requirement that none of the pixels of g(x, y) can be 0. Assume for convenience that the histograms are not normalized, so that, for example, h_f(r_k) is the number of pixels in f(x, y) having intensity level r_k. Assume also that all the pixels in g(x, y) have constant value c. The pixels of both images are assumed to be positive. Finally, let u_k denote the intensity levels of the pixels of the images formed by any of the arithmetic operations given in the problem statement. Under the preceding set of conditions, the histograms are determined as follows:
(a) We obtain the histogram h_sum(u_k) of the sum by letting u_k = r_k + c, and also h_sum(u_k) = h_f(r_k) for all k. In other words, the values (heights) of the components of h_sum are the same as the components of h_f, but their locations on the intensity axis are shifted right by an amount c.
(b) Similarly, the histogram h_diff(u_k) of the difference has the same components as h_f, but their locations are moved left by an amount c as a result of the subtraction operation.
(c) Following the same reasoning, the values (heights) of the components of histogram h_prod(u_k) of the product are the same as h_f, but their locations are at u_k = c × r_k. Note that while the spacing between components of the resulting histograms in (a) and (b) was not affected, the spacing between components of h_prod(u_k) will be spread out by an amount c.
(d) Finally, assuming that c ≠ 0, the components of h_div(u_k) are the same as those of h_f, but their locations will be at u_k = r_k / c. Thus, the spacing between components of h_div(u_k) will be compressed by an amount equal to 1/c.
The preceding solutions are applicable if image f(x, y) is constant also. In this case the four histograms just discussed would each have only one component. Their locations would be affected as described in (a) through (d).

Problem 3.14
(a) The number of boundary points between the black and white regions is much larger in the image on the right. When the images are blurred, the boundary points will give rise to a larger number of different values for the image on the right, so the histograms of the two blurred images will be different.
(b) To handle the border effects, we surround the image with a border of 0s. We assume that the image is of size N × N (the fact that the image is square is evident from the right image in the problem statement). Blurring is implemented by a 3 × 3 mask whose coefficients are 1/9. Figure P3.14 shows the different types of values that the blurred left image will have (see the image in the problem statement). The values are summarized in Table P3.14-1. It is easily verified that the sum of the numbers on the left column of the table is N^2. A histogram is easily constructed from the entries in this table. A similar (tedious) procedure yields the results in Table P3.14-2.

[Figure P3.14: the regions of the blurred left image that give rise to each value in Table P3.14-1.]

Table P3.14-1
No. of Points        Value
N(N/2 - 1)           0
2                    2/9
N - 2                3/9
4                    4/9
3N - 8               6/9
(N - 2)(N/2 - 2)     1

Table P3.14-2
No. of Points        Value
N^2/2 - 14N + 98     0
28                   2/9
14N - 224            3/9
128                  4/9
98                   5/9
16N - 256            6/9
N^2/2 - 16N + 128    1
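The entries of Table P3.14-1 can be double-checked by brute force: build an N × N image whose top half is black (0) and bottom half white (1), blur it with the 3 × 3 mask of 1/9s using a border of 0s, and tally the values. A minimal sketch, assuming NumPy and SciPy are available; the half-black layout is our reading of the left image in the problem statement.

```python
import numpy as np
from scipy.signal import convolve2d

N = 16
img = np.zeros((N, N))
img[N//2:, :] = 1.0                       # bottom half white

blur = convolve2d(img, np.ones((3, 3)) / 9, mode='same', boundary='fill')
ninths = np.round(blur * 9).astype(int)   # express each value as k/9
values, counts = np.unique(ninths, return_counts=True)
print(dict(zip(values.tolist(), counts.tolist())))
# Expected: {0: N*(N/2-1), 2: 2, 3: N-2, 4: 4, 6: 3*N-8, 9: (N-2)*(N/2-2)},
# and the counts sum to N**2, as the table requires.
```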
Problem 3.15
(a) Consider a 3 × 3 mask first. Because all the coefficients are 1 (we are ignoring the 1/9 scale factor), the net effect of the lowpass filter operation is to add all the intensity values of pixels under the mask. Initially, it takes 8 additions to produce the response of the mask. However, when the mask moves one pixel location to the right, it picks up only one new column. The new response can be computed as

R_new = R_old - C_1 + C_3

where C_1 is the sum of pixels under the first column of the mask before it was moved, and C_3 is the similar sum in the column it picked up after it moved. This is the basic box-filter or moving-average equation. For a 3 × 3 mask it takes 2 additions to get C_3 (C_1 was already computed). To this we add one subtraction and one addition to get R_new. Thus, a total of 4 arithmetic operations are needed to update the response after one move. This is a recursive procedure for moving from left to right along one row of the image. When we get to the end of a row, we move down one pixel (the nature of the computation is the same) and continue the scan in the opposite direction.

For a mask of size n × n, (n - 1) additions are needed to obtain C_3, plus the single subtraction and addition needed to obtain R_new, which gives a total of (n + 1) arithmetic operations after each move. A brute-force implementation would require n^2 - 1 additions after each move.
(b) The computational advantage is

$A = \frac{n^2 - 1}{n + 1} = \frac{(n+1)(n-1)}{n+1} = n - 1.$

The plot of A as a function of n is a simple linear function starting at A = 1 for n = 2.
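The recursion in (a) is easy to demonstrate along one image row. A minimal NumPy sketch comparing the incremental responses against brute-force sums (coefficients of 1, i.e., ignoring the 1/9 factor, as in the solution above):

```python
import numpy as np

rng = np.random.default_rng(4)
img = rng.integers(0, 256, (5, 12))
n, row = 3, 2                              # 3 x 3 mask centered on row 2

responses = [img[row-1:row+2, 0:n].sum()]  # initial response: 8 additions
for c in range(1, img.shape[1] - n + 1):
    C1 = img[row-1:row+2, c-1].sum()       # column the mask leaves behind
    C3 = img[row-1:row+2, c+n-1].sum()     # column the mask picks up
    responses.append(responses[-1] - C1 + C3)   # 4 operations per move

brute = [img[row-1:row+2, c:c+n].sum() for c in range(img.shape[1] - n + 1)]
print(np.array_equal(responses, brute))    # True
```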
A simple example of this is given in Fig. P3.17(e), which shows an array of size $1 \times 7$ that is blurred by successive applications of the $1 \times 3$ mask $h(y) = \frac{1}{3}[1, 1, 1]$. We see that, as long as the values of the blurred 1 can diffuse out, the sum, $S$, of the resulting pixels is 1. However, when the boundary is met, an assumption must be made regarding how mask operations on the border are treated. Here, we used the commonly made assumption that pixel values immediately past the boundary are 0. The mask operation does not go beyond the boundary, however. In this example, we see that the sum of the pixel values begins to decrease with successive applications of the mask. In the limit, the term $1/3^n$ would overpower the sum of the pixel values, yielding an array of 0s.

Problem 3.18

(a) There are $n^2$ points in an $n \times n$ median filter mask. Because $n$ is odd, the median value, $\zeta$, is such that there are $(n^2 - 1)/2$ points with values less than or equal to $\zeta$ and the same number with values greater than or equal to $\zeta$. However, because the area $A$ (number of points) in the cluster is less than one half $n^2$, and $A$ and $n$ are integers, it follows that $A$ is always less than or equal to $(n^2 - 1)/2$. Thus, even in the extreme case when all cluster points are encompassed by the filter mask, there are not enough points in the cluster for any of them to be equal to the value of the median (remember, we are assuming that all cluster points are lighter or darker than the background points). Therefore, if the center point in the mask is a cluster point, it will be set to the median value, which is a background shade, and thus it will be "eliminated" from the cluster. This conclusion obviously applies to the less extreme case when the number of cluster points encompassed by the mask is less than the maximum size of the cluster.

(b) For the conclusion reached in (a) to hold, the number of points that we consider cluster (object) points can never exceed $(n^2 - 1)/2$. Thus, two or more different clusters cannot be in close enough proximity for the filter mask to encompass points from more than one cluster at any mask position. It then follows that no two points from different clusters can be closer than the diagonal dimension of the mask minus one cell (which can be occupied by a point from one of the clusters). Assuming a grid spacing of 1 unit, the minimum distance between any two points of different clusters must then be greater than $\sqrt{2}(n - 1)$. In other words, these points must be separated by at least the distance spanned by $n - 1$ cells along the mask diagonal.

Problem 3.19

(a) Numerically sort the $n^2$ values. The median is the $\zeta = [(n^2 + 1)/2]$-th largest value.

(b) Once the values have been sorted one time, we simply delete the values in the trailing edge of the neighborhood and insert the values in the leading edge in the appropriate locations in the sorted array.

Problem 3.20

(a) The most extreme case is when the mask is positioned on the center pixel of a 3-pixel gap, along a thin segment, in which case a $3 \times 3$ mask would encompass a completely blank field. Since this is known to be the largest gap, the next (odd) mask size up is guaranteed to encompass some of the pixels in the segment. Thus, the smallest mask that will do the job is a $5 \times 5$ averaging mask.

(b) The smallest average value produced by the mask is when it encompasses only two pixels of the segment. This average value is a gray-scale value, not binary, like the rest of the segment pixels.
Denote the smallest average value by $A_{\min}$, and the binary values of pixels in the thin segment by $B$. Clearly, $A_{\min}$ is less than $B$. Then, setting the binarizing threshold slightly smaller than $A_{\min}$ will create one binary pixel of value $B$ in the center of the mask.

Problem 3.21

From Fig. 3.33 we know that the vertical bars are 5 pixels wide, 100 pixels high, and their separation is 20 pixels. The phenomenon in question is related to the horizontal separation between bars, so we can simplify the problem by considering a single scan line through the bars in the image. The key to answering this question lies in the fact that the distance (in pixels) between the onset of one bar and the onset of the next one (say, to its right) is 25 pixels. Consider the scan line shown in Fig. P3.21. Also shown is a cross section of a $25 \times 25$ mask. The response of the mask is the average of the pixels that it encompasses. We note that when the mask moves one pixel to the right, it loses one value of the vertical bar on the left, but it picks up an identical one on the right, so the response doesn't change. In fact, the number of pixels belonging to the vertical bars and contained within the mask does not change, regardless of where the mask is located (as long as it is contained within the bars, and not near the edges of the set of bars). The fact that the number of bar pixels under the mask does not change is due to the peculiar separation between bars and the width of the lines in relation to the 25-pixel width of the mask. This constant response is the reason why no white gaps are seen in the image shown in the problem statement. Note that this constant response does not happen with the $23 \times 23$ or the $45 \times 45$ masks because they are not "synchronized" with the width of the bars and their separation.

Problem 3.22

There are at most $q^2$ points in the area in which we want to reduce the intensity level of each pixel to one-tenth its original value. Consider an averaging mask of size $n \times n$ encompassing the $q \times q$ neighborhood. The averaging mask has $n^2$ points, of which we are assuming that $q^2$ points are from the object and the rest from the background. Note that this assumption implies separation between objects that, at a minimum, is equal to the area of the mask all around each object. The problem becomes intractable unless this assumption is made. This condition was not given in the problem statement on purpose in order to force the student to arrive at that conclusion. If the instructor wishes to simplify the problem, this should then be mentioned when the problem is assigned. A further simplification is to tell the students that the intensity level of the background is 0.

Let $B$ represent the intensity level of background pixels, let $a_i$ denote the intensity levels of points inside the mask, and let $o_i$ denote the levels of the objects. In addition, let $S_a$ denote the set of points in the averaging mask, $S_o$ the set of points in the object, and $S_b$ the set of points in the mask that are not object points. Then, the response of the averaging mask at any point on the image can be written as
$$R = \frac{1}{n^2}\sum_{a_i \in S_a} a_i = \frac{1}{n^2}\left[ \sum_{o_j \in S_o} o_j + \sum_{a_k \in S_b} a_k \right] = \frac{1}{n^2}\left[ \frac{q^2}{q^2}\sum_{o_j \in S_o} o_j \right] + \frac{1}{n^2}\left[ \sum_{a_k \in S_b} a_k \right] = \frac{q^2}{n^2}Q + \frac{1}{n^2}(n^2 - q^2)B$$
where $Q$ denotes the average value of object points. Let the maximum expected average value of object points be denoted by $Q_{\max}$.
Then we want the response of the mask at any point on the object under this maximum condition to be less than one-tenth of $Q_{\max}$, or
$$\frac{q^2}{n^2}Q_{\max} + \frac{1}{n^2}(n^2 - q^2)B < \frac{1}{10}Q_{\max}$$
from which we get the requirement
$$n > q\left[ \frac{10(Q_{\max} - B)}{Q_{\max} - 10B} \right]^{1/2}$$
for the minimum size of the averaging mask. Note that if the background intensity is 0, the minimum mask size is $n > \sqrt{10}\,q$. If this was a fact specified by the instructor, or the student made this assumption from the beginning, then this answer follows almost by inspection.

Problem 3.23

The student should realize that both the Laplacian and the averaging process are linear operations, so it makes no difference which one is applied first.

Problem 3.24

The Laplacian operator is defined as
$$\nabla^2 f = \frac{\partial^2 f}{\partial x^2} + \frac{\partial^2 f}{\partial y^2}$$
for the unrotated coordinates, and as
$$\nabla^2 f = \frac{\partial^2 f}{\partial x'^2} + \frac{\partial^2 f}{\partial y'^2}$$
for rotated coordinates. It is given that
$$x = x'\cos\theta - y'\sin\theta \quad\text{and}\quad y = x'\sin\theta + y'\cos\theta$$
where $\theta$ is the angle of rotation. We want to show that the right sides of the first two equations are equal. We start with
$$\frac{\partial f}{\partial x'} = \frac{\partial f}{\partial x}\frac{\partial x}{\partial x'} + \frac{\partial f}{\partial y}\frac{\partial y}{\partial x'} = \frac{\partial f}{\partial x}\cos\theta + \frac{\partial f}{\partial y}\sin\theta.$$
Taking the partial derivative of this expression again with respect to $x'$ yields
$$\frac{\partial^2 f}{\partial x'^2} = \frac{\partial^2 f}{\partial x^2}\cos^2\theta + \frac{\partial}{\partial x}\left(\frac{\partial f}{\partial y}\right)\sin\theta\cos\theta + \frac{\partial}{\partial y}\left(\frac{\partial f}{\partial x}\right)\cos\theta\sin\theta + \frac{\partial^2 f}{\partial y^2}\sin^2\theta.$$
Next, we compute
$$\frac{\partial f}{\partial y'} = \frac{\partial f}{\partial x}\frac{\partial x}{\partial y'} + \frac{\partial f}{\partial y}\frac{\partial y}{\partial y'} = -\frac{\partial f}{\partial x}\sin\theta + \frac{\partial f}{\partial y}\cos\theta.$$
Taking the derivative of this expression again with respect to $y'$ gives
$$\frac{\partial^2 f}{\partial y'^2} = \frac{\partial^2 f}{\partial x^2}\sin^2\theta - \frac{\partial}{\partial x}\left(\frac{\partial f}{\partial y}\right)\cos\theta\sin\theta - \frac{\partial}{\partial y}\left(\frac{\partial f}{\partial x}\right)\sin\theta\cos\theta + \frac{\partial^2 f}{\partial y^2}\cos^2\theta.$$
Adding the two expressions for the second derivatives yields
$$\frac{\partial^2 f}{\partial x'^2} + \frac{\partial^2 f}{\partial y'^2} = \frac{\partial^2 f}{\partial x^2} + \frac{\partial^2 f}{\partial y^2}$$
which proves that the Laplacian operator is independent of rotation.

Problem 3.25

The Laplacian mask with a $-4$ in the center performs an operation proportional to differentiation in the horizontal and vertical directions. Consider for a moment a $3 \times 3$ "Laplacian" mask with a $-2$ in the center and 1s above and below the center. All other elements are 0. This mask will perform differentiation in only one direction, and will ignore intensity transitions in the orthogonal direction. An image processed with such a mask will exhibit sharpening in only one direction. A Laplacian mask with a $-4$ in the center and 1s in the vertical and horizontal directions will obviously produce an image with sharpening in both directions and in general will appear sharper than with the previous mask. Similarly, a mask with a $-8$ in the center and 1s in the horizontal, vertical, and diagonal directions will detect the same intensity changes as the mask with the $-4$ in the center but, in addition, it will also be able to detect changes along the diagonals, thus generally producing sharper-looking results.

Problem 3.26

(a) The size and coefficients of the $3 \times 3$ Laplacian mask come directly from the definition of the Laplacian given in Eq. (3.6-6). In other words, the number of coefficients (and thus the size of the mask) is a direct result of the definition of the second derivative. A larger "Laplacian-like" mask would no longer be implementing a second derivative, so we cannot expect that in general it will give sharper results. In fact, as explained in part (b), just the opposite occurs.

(b) In general, increasing the size of the "Laplacian-like" mask produces blurring.
To see why this is so, consider an image consisting of two vertical bands, a black band on the left and a white band on the right, with the transition between the bands occurring through the center of the image. That is, the image has a sharp vertical edge through its center. From Fig. 3.36 we know that a second derivative should produce a double edge in the region of the vertical edge when a $3 \times 3$ Laplacian mask is centered on the edge. As the center of the mask moves more than two pixels on either side of the edge, the entire mask will encompass a constant area and its response will be zero, as it should be. However, suppose that the mask is much larger. As its center moves through, say, the black (0) area, one half of the mask will be totally contained in that area but, depending on its size, part of the mask will be contained in the white (255) area. The sum of products will therefore be different from 0. This means that there will be a response in an area where the response should have been 0 because the mask is centered on a constant area. Figure P3.26 shows these effects. The figure on the top left is the band just mentioned, and the figure next to it is the result of using a $3 \times 3$ Laplacian mask with an 8 in the center. The other figures show the results of using "Laplacian-like" masks of sizes $15 \times 15$, $35 \times 35$, $75 \times 75$, and $125 \times 125$, respectively. The progressively increasing blurring as a result of mask size is evident in these results.

[Figure P3.27: (a) a $3 \times 3$ mask whose center coefficient is 1, with 0s elsewhere; (b) the composite unsharp-masking mask, with $17/9$ in the center and $-1/9$ elsewhere.]

Problem 3.27

With reference to Eqs. (3.6-8) and (3.6-9), and using $k = 1$, we can write the following equation for unsharp masking:
$$g(x,y) = f(x,y) + \left[ f(x,y) - \bar f(x,y) \right] = 2f(x,y) - \bar f(x,y).$$
Convolving $f(x,y)$ with the mask in Fig. P3.27(a) produces $f(x,y)$. Convolving $f(x,y)$ with the mask in Fig. 3.32(a) produces $\bar f(x,y)$. Then, because these operations are linear, we can use superposition, and we see from the preceding equation that combining two masks of the form in Fig. P3.27(a) with the (subtracted) mask in Fig. 3.32(a) produces the composite mask in Fig. P3.27(b). Convolving this mask with $f(x,y)$ produces $g(x,y)$, the unsharp result.

Problem 3.28

Consider the following equation:
$$\begin{aligned} f(x,y) - \nabla^2 f(x,y) &= f(x,y) - \left[ f(x+1,y) + f(x-1,y) + f(x,y+1) + f(x,y-1) - 4f(x,y) \right] \\ &= 6f(x,y) - \left[ f(x+1,y) + f(x-1,y) + f(x,y+1) + f(x,y-1) + f(x,y) \right] \\ &= 5\left\{ 1.2f(x,y) - \tfrac{1}{5}\left[ f(x+1,y) + f(x-1,y) + f(x,y+1) + f(x,y-1) + f(x,y) \right] \right\} \\ &= 5\left[ 1.2f(x,y) - \bar f(x,y) \right] \end{aligned}$$
where $\bar f(x,y)$ denotes the average of $f(x,y)$ in a predefined neighborhood centered at $(x,y)$ and including the center pixel and its four immediate neighbors. Treating the constants in the last line of the above equation as proportionality factors, we may write
$$f(x,y) - \nabla^2 f(x,y) \propto f(x,y) - \bar f(x,y).$$
The right side of this equation is recognized, within the just-mentioned proportionality factors, to be of the same form as the definition of unsharp masking given in Eqs. (3.6-8) and (3.6-9). Thus, it has been demonstrated that subtracting the Laplacian from an image is proportional to unsharp masking.

Problem 3.29

(a) From Problem 3.24,
$$\frac{\partial f}{\partial x'} = \frac{\partial f}{\partial x}\cos\theta + \frac{\partial f}{\partial y}\sin\theta \quad\text{and}\quad \frac{\partial f}{\partial y'} = -\frac{\partial f}{\partial x}\sin\theta + \frac{\partial f}{\partial y}\cos\theta$$
from which it follows that
$$\left(\frac{\partial f}{\partial x'}\right)^2 + \left(\frac{\partial f}{\partial y'}\right)^2 = \left(\frac{\partial f}{\partial x}\right)^2 + \left(\frac{\partial f}{\partial y}\right)^2$$
or
$$\left[ \left(\frac{\partial f}{\partial x'}\right)^2 + \left(\frac{\partial f}{\partial y'}\right)^2 \right]^{1/2} = \left[ \left(\frac{\partial f}{\partial x}\right)^2 + \left(\frac{\partial f}{\partial y}\right)^2 \right]^{1/2}.$$
Thus, we see that the magnitude of the gradient is an isotropic operator.
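A quick symbolic check of part (a) can be shown in class. This sketch is not part of the original solution and assumes SymPy is available; the cross terms cancel and $\cos^2\theta + \sin^2\theta = 1$, leaving the sum of squared partials in the unrotated coordinates:

```python
import sympy as sp

# Rotation: x = x'cos(t) - y'sin(t), y = x'sin(t) + y'cos(t)
xp, yp, t = sp.symbols("xp yp t", real=True)
f = sp.Function("f")
g = f(xp * sp.cos(t) - yp * sp.sin(t), xp * sp.sin(t) + yp * sp.cos(t))
# Sum of squared first partials in the rotated coordinates:
expr = sp.diff(g, xp)**2 + sp.diff(g, yp)**2
# Simplification collapses the trig terms, leaving fx^2 + fy^2:
print(sp.simplify(sp.expand(expr)))
```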
(b) From Eqs. (3.6-10), (3.6-12), and the preceding results,
$$|G_x| = \left|\frac{\partial f}{\partial x}\right|, \qquad |G_y| = \left|\frac{\partial f}{\partial y}\right|,$$
$$|G_{x'}| = \left|\frac{\partial f}{\partial x'}\right| = \left|\frac{\partial f}{\partial x}\cos\theta + \frac{\partial f}{\partial y}\sin\theta\right|,$$
and
$$|G_{y'}| = \left|\frac{\partial f}{\partial y'}\right| = \left|-\frac{\partial f}{\partial x}\sin\theta + \frac{\partial f}{\partial y}\cos\theta\right|.$$
Clearly, $|G_{x'}| + |G_{y'}| \neq |G_x| + |G_y|$, so this form of the gradient is not isotropic.

Problem 3.30

It is given that the range of illumination stays in the linear portion of the camera response range, but no values for the range are given. The fact that images stay in the linear range implies that images will not be saturated at the high end or be driven in the low end to such an extent that the camera will not be able to respond, thus losing image information irretrievably. The only way to establish a benchmark value for illumination is when the variable (daylight) illumination is not present. Let $f_0(x,y)$ denote an image taken under artificial illumination only, with no moving objects (e.g., people or vehicles) in the scene. This becomes the standard by which all other images will be normalized. There are numerous ways to solve this problem, but the student must show awareness that areas in the image likely to change due to moving objects should be excluded from the illumination-correction approach.

One way is to select various representative subareas of $f_0(x,y)$ not likely to be obscured by moving objects and compute their average intensities. We then select the minimum and maximum of all the individual average values, denoted by $f_{\min}$ and $f_{\max}$. The objective then is to process any input image, $f(x,y)$, so that its minimum and maximum will be equal to $f_{\min}$ and $f_{\max}$, respectively. The easiest way to do this is with a linear transformation function of the form
$$f_{\text{out}}(x,y) = a f(x,y) + b$$
where $f_{\text{out}}$ is the scaled output image. It is easily verified that the output image will have the required minimum and maximum values if we choose
$$a = \frac{f_{\max} - f_{\min}}{f'_{\max} - f'_{\min}} \quad\text{and}\quad b = \frac{f_{\min}f'_{\max} - f_{\max}f'_{\min}}{f'_{\max} - f'_{\min}}$$
where $f'_{\max}$ and $f'_{\min}$ are the maximum and minimum values of the input image. Note that the key assumption behind this method is that all images stay within the linear operating range of the camera, so saturation and other nonlinearities are not an issue. Another implicit assumption is that moving objects comprise a relatively small area in the field of view of the camera; otherwise these objects would overpower the scene and the values obtained from $f_0(x,y)$ would not make sense. If the student selects another automated approach (e.g., histogram equalization), he/she must discuss the same or similar types of assumptions.

Problem 3.31

Substituting $z = b$ directly into Eq. (3.8-9) and setting the result equal to 0.5,
$$2\left(\frac{b-a}{c-a}\right)^2 = \frac{1}{2} \;\Rightarrow\; \frac{b-a}{c-a} = \frac{1}{2} \;\Rightarrow\; b = \frac{a+c}{2}.$$

Problem 3.32

All figures can be constructed using various membership functions from Fig. 3.46 and the definition of fuzzy union (the pointwise maxima between the curves), as follows.

(a) The figure in part (a) of the problem statement can be constructed using two trapezoidal functions [Fig. 3.46(b)] and one triangular function [Fig. 3.46(a)], as Fig. P3.32(a) shows.

(b) The figure in part (b) of the problem statement can be constructed using two trapezoidal functions in which the one on the right terminates vertically, as Fig. P3.32(b) shows.

(c) The figure in part (c) of the problem statement can be constructed using two triangular functions, as Fig. P3.32(c) shows.

[Figure P3.32: the three constructions, (a) through (c).]
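These unions are easy to visualize numerically. The following sketch is not part of the original solution; it assumes NumPy and generic parameter values, and shows that the fuzzy union is just the pointwise maximum of the membership curves:

```python
import numpy as np

def triangle(z, a, b, c):
    """Triangular membership: 0 at a and c, 1 at b."""
    return np.clip(np.minimum((z - a) / (b - a), (c - z) / (c - b)), 0, 1)

def trapezoid(z, a, b, c, d):
    """Trapezoidal membership: rises a->b, flat b->c, falls c->d."""
    return np.clip(np.minimum((z - a) / (b - a), (d - z) / (d - c)), 0, 1)

z = np.linspace(0, 1, 11)
union = np.maximum(trapezoid(z, 0.0, 0.1, 0.3, 0.5),
                   triangle(z, 0.3, 0.5, 0.7))
print(union.round(2))
```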
Problem 3.33

The thickness of the boundaries detected increases as the size of the filtering neighborhood increases. We support this conclusion with an example. Consider a one-pixel-thick straight black line running vertically through a white image. If a $3 \times 3$ neighborhood is used, any neighborhood whose center is more than two pixels away from the line will produce differences with values of zero, and the center pixel will be designated a region pixel. Leaving the center pixel at the same location, if we increase the size of the neighborhood to, say, $5 \times 5$, the line will be encompassed and not all differences will be zero, so the center pixel will now be designated a boundary point, thus increasing the thickness of the boundary. As the size of the neighborhood increases, we would have to be further and further from the line before the center point ceases to be called a boundary point. That is, the thickness of the boundary detected increases as the size of the neighborhood increases.

Problem 3.34

(a) If the intensity of the center pixel of a $3 \times 3$ region is larger than the intensity of all its neighbors, then decrease it. If the intensity is smaller than the intensity of all its neighbors, then increase it. Else, do nothing.

(b) Rules:
IF $d_2$ is PO AND $d_4$ is PO AND $d_6$ is PO AND $d_8$ is PO THEN $v$ is PO
IF $d_2$ is NE AND $d_4$ is NE AND $d_6$ is NE AND $d_8$ is NE THEN $v$ is NE
ELSE $v$ is ZR.

Note: In rule 1, all positive differences mean that the intensity of the noise pulse ($z_5$) is less than that of all its 4-neighbors. Then we will want to make the output $z'_5$ more positive so that, when it is added to $z_5$, it will bring the value of the center pixel closer to the values of its neighbors. The converse is true when all the differences are negative. A mixture of positive and negative differences calls for no action because the center pixel is not a clear spike. In this case the correction should be zero (keep in mind that zero is a fuzzy set too).

(c) The membership functions NE and PO are triangles over the negative and positive halves of the difference range, respectively. Membership function ZR is also a triangle; it is centered on 0 and overlaps the other two slightly. Figure P3.34(a) shows these three membership functions.

(d) Figure P3.34(b) shows in graphic form the three rules from part (b). The off-center elements in the $3 \times 3$ neighborhood are the differences $d_i = z_i - z_5$, for $i = 2, 4, 6, 8$. The value $v$ in the center is the correction that will be added to $z_5$ to obtain the new intensity, $z'_5$, in that location.

(e) Figure P3.34(c) describes the fuzzy system responsible for generating the correction factor, $v$. This diagram is similar to Fig. 3.52, with two main differences: (1) the rules are composed of ANDs, meaning that we have to use the min operation when applying the logical operations (step 2 in Fig. 3.52); and (2) we show how to implement an ELSE rule. This rule is nothing more than computing 1 minus the minimum value of the outputs of step 2, and using the result to clip the ZR membership function. It is important to understand that the output of the fuzzy system is the center of gravity of the result of aggregation (step 4 in Fig. 3.52). Thus, for example, if both the first and second rules were to produce a minimum of 0, the "strength" of the ELSE rule would be 1. This would produce the complete ZR membership function in the implication step (step 3 in Fig. 3.52). The other two results would be zero, so the result of aggregation would be the ZR function.
The center of gravity of this function is 0, so the output of the system would be $v = 0$, indicating no correction, as it should be. For the particular values of the $d_i$'s shown in Fig. P3.34(c), the aggregated function is biased to the right, so the value of the correction (center of gravity) will be positive. This is as it should be because the differences are all positive, indicating that the value of $z_5$ is less than the value of its 4-neighbors.

[Figure P3.34: (a) the membership functions NE, ZR, and PO; (b) the two IF-THEN rules and the ELSE rule in graphic form; (c) the complete fuzzy system: (1) fuzzify the inputs $d_2, d_4, d_6, d_8$; (2) apply the fuzzy logical operations (AND = min); (3) apply the implication method (min), with the strength of the ELSE rule computed from the rule strengths $\lambda_1$ and $\lambda_2$; (4) apply the aggregation method (max); (5) defuzzify (center of gravity) to obtain the correction $v$, which is added to $z_5$ to produce the output.]

Chapter 4

Problem Solutions

Problem 4.1

$$F(\mu) = \int_{-\infty}^{\infty} f(t)e^{-j2\pi\mu t}\,dt = \int_{0}^{T} Ae^{-j2\pi\mu t}\,dt = \frac{-A}{j2\pi\mu}\left[ e^{-j2\pi\mu t} \right]_{0}^{T} = \frac{-A}{j2\pi\mu}\left[ e^{-j2\pi\mu T} - 1 \right] = \frac{A}{j2\pi\mu}\left[ e^{j\pi\mu T} - e^{-j\pi\mu T} \right]e^{-j\pi\mu T} = \frac{A}{\pi\mu}\sin(\pi\mu T)\,e^{-j\pi\mu T} = AT\,\frac{\sin(\pi\mu T)}{\pi\mu T}\,e^{-j\pi\mu T}.$$
If we let $T = W$, the only difference between this result and the result in Example 4.1 is the exponential term. It is a phase term that accounts for the shift in the function. The magnitude of the Fourier transform is the same in both cases, as expected.

Problem 4.2

(a) To prove infinite periodicity in both directions with period $1/\Delta T$, we have to show that $\tilde F(\mu + k[1/\Delta T]) = \tilde F(\mu)$ for $k = 0, \pm 1, \pm 2, \ldots$. From Eq. (4.3-5),
$$\tilde F\left(\mu + \frac{k}{\Delta T}\right) = \frac{1}{\Delta T}\sum_{n=-\infty}^{\infty} F\left(\mu + \frac{k}{\Delta T} - \frac{n}{\Delta T}\right) = \frac{1}{\Delta T}\sum_{n=-\infty}^{\infty} F\left(\mu - \frac{n-k}{\Delta T}\right) = \frac{1}{\Delta T}\sum_{m=-\infty}^{\infty} F\left(\mu - \frac{m}{\Delta T}\right) = \tilde F(\mu)$$
where the third step (with $m = n - k$) follows from the fact that $k$ and $n$ are integers and the limits of summation are symmetric about the origin. The last step follows from Eq. (4.3-5).

(b) Again, we have to show that $\tilde F(\mu + k/\Delta T) = \tilde F(\mu)$ for $k = 0, \pm 1, \pm 2, \ldots$. From Eq. (4.4-2),
$$\tilde F(\mu + k/\Delta T) = \sum_{n=-\infty}^{\infty} f_n e^{-j2\pi(\mu + k/\Delta T)n\Delta T} = \sum_{n=-\infty}^{\infty} f_n e^{-j2\pi\mu n\Delta T}e^{-j2\pi kn} = \tilde F(\mu)$$
where the second step follows from the fact that $e^{-j2\pi kn} = 1$ because both $k$ and $n$ are integers (see Euler's formula), and the last step follows from Eq. (4.4-2).

Problem 4.3

From the definition of the 1-D Fourier transform in Eq. (4.2-16),
$$F(\mu) = \int_{-\infty}^{\infty} f(t)e^{-j2\pi\mu t}\,dt = \int_{-\infty}^{\infty} \sin(2\pi nt)e^{-j2\pi\mu t}\,dt = \frac{-j}{2}\int_{-\infty}^{\infty} e^{j2\pi nt}e^{-j2\pi\mu t}\,dt + \frac{j}{2}\int_{-\infty}^{\infty} e^{-j2\pi nt}e^{-j2\pi\mu t}\,dt.$$
From the translation property in Table 4.3 we know that $f(t)e^{j2\pi\mu_0 t} \Leftrightarrow F(\mu - \mu_0)$, and we know from the statement of the problem that the Fourier transform of a constant $[f(t) = 1]$ is an impulse. Thus, $(1)e^{j2\pi\mu_0 t} \Leftrightarrow \delta(\mu - \mu_0)$. We see then that the first integral in the last line above is the Fourier transform of $(1)e^{j2\pi nt}$, which is $\delta(\mu - n)$, and similarly, the second integral is the transform of $(1)e^{-j2\pi nt}$, or $\delta(\mu + n)$. Combining all results yields
$$F(\mu) = \frac{j}{2}\left[ \delta(\mu + n) - \delta(\mu - n) \right]$$
as desired.

Problem 4.4

(a) The period is such that $2\pi nt = 2\pi$, or $t = 1/n$.

(b) The frequency is 1 divided by the period, or $n$. The continuous Fourier transform of the given sine wave looks as in Fig. P4.4(a) (see Problem 4.3), and the transform of the sampled data (showing a few periods) has the general form illustrated in Fig. P4.4(b) (the dashed box is an ideal filter that would allow reconstruction if the sine function were sampled with the sampling theorem satisfied).
(c) The Nyquist sampling rate is exactly twice the highest frequency, or $2n$. That is, $(1/\Delta T) = 2n$, or $\Delta T = 1/2n$. Taking samples at $t = \pm\Delta T, \pm 2\Delta T, \ldots$ would yield the sampled values $\sin(2\pi nk\Delta T) = \sin(\pi k)$, all of which are 0 because $\Delta T = 1/2n$ and $k$ is an integer. In terms of Fig. P4.4(b), we see that when $\Delta T = 1/2n$ all the positive and negative impulses would coincide, thus canceling each other and giving a result of 0 for the sampled data.

(d) When the sampling rate is less than the Nyquist rate, we can have a situation such as the one illustrated in Fig. P4.4(c), which is the sum of two sine waves in this case. For some values of sampling, the two sines combine to form a single sine wave and a plot of the samples would appear as in Fig. 4.8 of the book. Other values would result in functions whose samples can describe any shape obtainable by sampling the sum of two sines.

[Figure P4.4: (a) the transform of the continuous sine; (b) the transform of the sampled sine, with the ideal reconstruction filter shown dashed; (c) the undersampled case.]

Problem 4.5

Starting from Eq. (4.2-20),
$$f(t) \star g(t) = \int_{-\infty}^{\infty} f(\tau)g(t - \tau)\,d\tau.$$
The Fourier transform of this expression is
$$\Im\left\{ f(t) \star g(t) \right\} = \int_{-\infty}^{\infty}\left[ \int_{-\infty}^{\infty} f(\tau)g(t-\tau)\,d\tau \right]e^{-j2\pi\mu t}\,dt = \int_{-\infty}^{\infty} f(\tau)\left[ \int_{-\infty}^{\infty} g(t-\tau)e^{-j2\pi\mu t}\,dt \right]d\tau.$$
The term inside the inner brackets is the Fourier transform of $g(t - \tau)$. But we know from the translation property (Table 4.3) that $\Im\{g(t-\tau)\} = G(\mu)e^{-j2\pi\mu\tau}$, so
$$\Im\left\{ f(t) \star g(t) \right\} = \int_{-\infty}^{\infty} f(\tau)G(\mu)e^{-j2\pi\mu\tau}\,d\tau = G(\mu)\int_{-\infty}^{\infty} f(\tau)e^{-j2\pi\mu\tau}\,d\tau = G(\mu)F(\mu).$$
This proves that convolution in the spatial domain is equal to multiplication in the frequency domain. The proof that multiplication in the spatial domain is equal to convolution in the frequency domain is done in a similar way.

Problem 4.6

$$f(t) = h(t) \star \tilde f(t) = \int_{-\infty}^{\infty} h(z)\tilde f(t-z)\,dz = \int_{-\infty}^{\infty} \frac{\sin(\pi z/\Delta T)}{\pi z/\Delta T}\sum_{n=-\infty}^{\infty} f(t-z)\delta(t - n\Delta T - z)\,dz = \sum_{n=-\infty}^{\infty}\int_{-\infty}^{\infty} \frac{\sin(\pi z/\Delta T)}{\pi z/\Delta T}f(t-z)\delta(t - n\Delta T - z)\,dz = \sum_{n=-\infty}^{\infty} f(n\Delta T)\frac{\sin[\pi(t - n\Delta T)/\Delta T]}{\pi(t - n\Delta T)/\Delta T} = \sum_{n=-\infty}^{\infty} f(n\Delta T)\,\mathrm{sinc}[(t - n\Delta T)/\Delta T].$$

Problem 4.7

The tent function is obtained by convolving two boxes, and recall that the transform of a box is a sinc function. Because, by the convolution theorem, the Fourier transform of the spatial convolution of two functions is the product of their transforms, it follows that the Fourier transform of a tent function is a sinc function squared.

Problem 4.8

(a) We solve this problem by direct substitution using orthogonality. Substituting Eq. (4.4-5) into Eq. (4.4-4) yields
$$F_m = \sum_{n=0}^{M-1}\left[ \frac{1}{M}\sum_{r=0}^{M-1} F_r e^{j2\pi rn/M} \right]e^{-j2\pi mn/M} = \frac{1}{M}\sum_{r=0}^{M-1} F_r\left[ \sum_{n=0}^{M-1} e^{j2\pi rn/M}e^{-j2\pi mn/M} \right] = F_m$$
where the last step follows from the orthogonality condition given in the problem statement (the inner sum is $M$ when $r = m$ and 0 otherwise). Substituting Eq. (4.4-4) into Eq. (4.4-5) and using the same basic procedure yields a similar identity for $f_n$.

(b) We solve this problem as above, by direct substitution and using orthogonality. Substituting Eq. (4.4-7) into Eq. (4.4-6) yields
$$F(u) = \sum_{x=0}^{M-1}\left[ \frac{1}{M}\sum_{r=0}^{M-1} F(r)e^{j2\pi rx/M} \right]e^{-j2\pi ux/M} = \frac{1}{M}\sum_{r=0}^{M-1} F(r)\left[ \sum_{x=0}^{M-1} e^{j2\pi rx/M}e^{-j2\pi ux/M} \right] = F(u)$$
where the last step follows from the orthogonality condition given in the problem statement. Substituting Eq. (4.4-6) into Eq. (4.4-7) and using the same basic procedure yields a similar identity for $f(x)$.
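The orthogonality condition at the heart of this problem is easy to verify numerically. The following short sketch is not part of the original solution and assumes NumPy:

```python
import numpy as np

# sum_n exp(j*2*pi*(r-m)*n/M) is M when r == m and 0 otherwise,
# which is why substituting the IDFT into the DFT returns F.
M = 8
n = np.arange(M)
for r in range(M):
    for m in range(M):
        s = np.exp(1j * 2 * np.pi * (r - m) * n / M).sum()
        expected = M if r == m else 0
        assert abs(s - expected) < 1e-9
print("orthogonality verified for M =", M)
```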
Problem 4.9

To prove infinite periodicity we have to show that $F(u + kM) = F(u)$ for $k = 0, \pm 1, \pm 2, \ldots$. We do this by direct substitution into Eq. (4.4-6):
$$F(u + kM) = \sum_{x=0}^{M-1} f(x)e^{-j2\pi(u + kM)x/M} = \sum_{x=0}^{M-1} f(x)e^{-j2\pi ux/M}e^{-j2\pi kx} = F(u)$$
where the last step follows from the fact that $e^{-j2\pi kx} = 1$ because $k$ and $x$ are integers. Note that this holds for positive and negative values of $k$. We prove the validity of Eq. (4.4-9) in a similar way.

Problem 4.10

With reference to the statement of the convolution theorem given in Eqs. (4.2-21) and (4.2-22), we need to show that $f(x) \star h(x) \Leftrightarrow F(u)H(u)$ and that $f(x)h(x) \Leftrightarrow F(u) \star H(u)$. From Eq. (4.4-10) and the definition of the DFT in Eq. (4.4-6),
$$\Im\left\{ f(x) \star h(x) \right\} = \sum_{x=0}^{M-1}\left[ \sum_{m=0}^{M-1} f(m)h(x-m) \right]e^{-j2\pi ux/M} = \sum_{m=0}^{M-1} f(m)\left[ \sum_{x=0}^{M-1} h(x-m)e^{-j2\pi ux/M} \right] = \sum_{m=0}^{M-1} f(m)H(u)e^{-j2\pi um/M} = H(u)\sum_{m=0}^{M-1} f(m)e^{-j2\pi um/M} = H(u)F(u).$$
The other half of the discrete convolution theorem is proved in a similar manner.

Problem 4.11

With reference to Eq. (4.2-20),
$$f(t,z) \star h(t,z) = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} f(\alpha,\beta)h(t-\alpha, z-\beta)\,d\alpha\,d\beta.$$

Problem 4.12

The limit of the imaging system is 1 pixel per square, so each row and column in the resulting image would be of the form $\ldots 101010101 \ldots$. The period of this waveform is $P = 2$ mm, so the maximum frequency is $\mu = 1/P = 0.5$ cycles/mm. To avoid aliasing we have to sample at a rate that exceeds twice this frequency, or $2(0.5) = 1$ sample/mm. That is, the sampling rate would have to exceed 1 sample/mm. So, each square has to correspond to slightly more than one pixel in the imaging system.

Problem 4.13

Shrinking can cause aliasing because the effective sampling rate is reduced. This is not the case in zooming, which introduces additional samples. Although no new detail is introduced by zooming, it certainly does not reduce the sampling rate, so zooming cannot result in aliasing.

Problem 4.14

Recall that in this chapter we use $(t,z)$ and $(\mu,\nu)$ for continuous variables, and $(x,y)$ and $(u,v)$ for discrete variables. From Eq. (4.5-7),
$$F(\mu,\nu) = \Im\left\{ f(t,z) \right\} = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} f(t,z)e^{-j2\pi(\mu t + \nu z)}\,dt\,dz.$$
From Eq. (2.6-2), the Fourier transform operation is linear if
$$\Im\left\{ a_1 f_1(t,z) + a_2 f_2(t,z) \right\} = a_1\Im\left\{ f_1(t,z) \right\} + a_2\Im\left\{ f_2(t,z) \right\}.$$
Substituting into the definition of the Fourier transform yields
$$\Im\left\{ a_1 f_1(t,z) + a_2 f_2(t,z) \right\} = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty}\left[ a_1 f_1(t,z) + a_2 f_2(t,z) \right]e^{-j2\pi(\mu t + \nu z)}\,dt\,dz = a_1\int_{-\infty}^{\infty}\int_{-\infty}^{\infty} f_1(t,z)e^{-j2\pi(\mu t + \nu z)}\,dt\,dz + a_2\int_{-\infty}^{\infty}\int_{-\infty}^{\infty} f_2(t,z)e^{-j2\pi(\mu t + \nu z)}\,dt\,dz = a_1\Im\left\{ f_1(t,z) \right\} + a_2\Im\left\{ f_2(t,z) \right\}$$
where the second step follows from the distributive property of the integral. Similarly, for the discrete case,
$$\Im\left\{ a_1 f_1(x,y) + a_2 f_2(x,y) \right\} = \sum_{x=0}^{M-1}\sum_{y=0}^{N-1}\left[ a_1 f_1(x,y) + a_2 f_2(x,y) \right]e^{-j2\pi(ux/M + vy/N)} = a_1\sum_{x=0}^{M-1}\sum_{y=0}^{N-1} f_1(x,y)e^{-j2\pi(ux/M + vy/N)} + a_2\sum_{x=0}^{M-1}\sum_{y=0}^{N-1} f_2(x,y)e^{-j2\pi(ux/M + vy/N)} = a_1\Im\left\{ f_1(x,y) \right\} + a_2\Im\left\{ f_2(x,y) \right\}.$$
The linearity of the inverse transforms is proved in exactly the same way.

Problem 4.15

Select an image of your choice and compute its average value:
$$\bar f(x,y) = \frac{1}{MN}\sum_{x=0}^{M-1}\sum_{y=0}^{N-1} f(x,y).$$
Compute the DFT of $f(x,y)$ and obtain $F(0,0)$. If $F(0,0) = MN\bar f(x,y)$, then $1/MN$ was included in front of the IDFT [see Eqs. (4.5-15), (4.5-16) and (4.6-21)]. Similarly, if $F(0,0) = \bar f(x,y)$, the $1/MN$ term was included in front of the DFT. Finally, if $F(0,0) = \sqrt{MN}\,\bar f(x,y)$, the term $1/\sqrt{MN}$ was included in the formulation of both the DFT and IDFT.
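As a concrete instance of this test (not part of the original solution; it assumes NumPy, whose FFT places the $1/MN$ term in front of the inverse):

```python
import numpy as np

rng = np.random.default_rng(0)
f = rng.random((4, 5))                     # any small "image"
M, N = f.shape
F00 = np.fft.fft2(f)[0, 0]
print(np.isclose(F00, M * N * f.mean()))   # True: 1/MN is in the IDFT
```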
Problem 4.16

(a) From Eq. (4.5-15),
$$\Im\left\{ f(x,y)e^{j2\pi(u_0x/M + v_0y/N)} \right\} = \sum_{x=0}^{M-1}\sum_{y=0}^{N-1} f(x,y)e^{j2\pi(u_0x/M + v_0y/N)}e^{-j2\pi(ux/M + vy/N)} = \sum_{x=0}^{M-1}\sum_{y=0}^{N-1} f(x,y)e^{-j2\pi[(u - u_0)x/M + (v - v_0)y/N]} = F(u - u_0, v - v_0).$$

(b) From Eq. (4.5-16),
$$\Im^{-1}\left\{ F(u,v)e^{-j2\pi(ux_0/M + vy_0/N)} \right\} = \frac{1}{MN}\sum_{u=0}^{M-1}\sum_{v=0}^{N-1} F(u,v)e^{-j2\pi(ux_0/M + vy_0/N)}e^{j2\pi(ux/M + vy/N)} = \frac{1}{MN}\sum_{u=0}^{M-1}\sum_{v=0}^{N-1} F(u,v)e^{j2\pi[u(x - x_0)/M + v(y - y_0)/N]} = f(x - x_0, y - y_0).$$

Problem 4.17

From Eq. (4.5-7),
$$F(\mu,\nu) = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} f(t,z)e^{-j2\pi(\mu t + \nu z)}\,dt\,dz = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} \sin(2\pi\mu_0 t + 2\pi\nu_0 z)e^{-j2\pi(\mu t + \nu z)}\,dt\,dz.$$
Expressing the sine function in terms of exponentials yields
$$F(\mu,\nu) = \frac{-j}{2}\left[ \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} e^{j2\pi(\mu_0 t + \nu_0 z)}e^{-j2\pi(\mu t + \nu z)}\,dt\,dz \right] + \frac{j}{2}\left[ \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} e^{-j2\pi(\mu_0 t + \nu_0 z)}e^{-j2\pi(\mu t + \nu z)}\,dt\,dz \right].$$
We recognize the terms inside the brackets as the Fourier transforms of $(1)e^{j2\pi(\mu_0 t + \nu_0 z)}$ and $(1)e^{-j2\pi(\mu_0 t + \nu_0 z)}$. From the problem statement we know that $\Im[1] = \delta(\mu,\nu)$, and from Table 4.3 we know that the exponential just introduces a shift in the origin of the transform. Therefore, $\Im\{(1)e^{j2\pi(\mu_0 t + \nu_0 z)}\} = \delta(\mu - \mu_0, \nu - \nu_0)$, and similarly for the other transform. Plugging these results into the preceding equation yields
$$\Im\left\{ \sin(2\pi\mu_0 t + 2\pi\nu_0 z) \right\} = \frac{j}{2}\left[ \delta(\mu + \mu_0, \nu + \nu_0) - \delta(\mu - \mu_0, \nu - \nu_0) \right].$$

Problem 4.18

We consider the 1-D case first. From Eq. (4.4-6),
$$F(u) = \sum_{x=0}^{M-1} f(x)e^{-j2\pi ux/M}.$$
When $f(x) = 1$ and $u = 0$, the sum gives $F(0) = M$. When $f(x) = 1$ and $u \neq 0$,
$$F(u) = \sum_{x=0}^{M-1} e^{-j2\pi ux/M}.$$
This expression is 0 for any integer value of $u$ in the range $[1, M-1]$. There are various ways of proving this. One of the most intuitive is based on using Euler's formula to express the exponential term as
$$e^{-j2\pi ux/M} = \cos(2\pi ux/M) - j\sin(2\pi ux/M).$$
This expression describes a unit vector in the complex plane. The vector is centered at the origin and its direction depends on the value of the argument. For any integer value of $u$ in the range $[1, M-1]$, the argument ranges over integer values of $x$ in the range $[0, M-1]$. This means that the vector makes an integer number of revolutions about the origin in equal increments, so for any positive value of $\cos(2\pi ux/M)$ there will be a corresponding negative value of that term. This produces a zero sum for the real part of the exponential, and similar comments apply to the imaginary part. Therefore, when $f(x) = 1$ and $u \neq 0$, it follows that $F(u) = 0$. Thus, we have shown that for discrete quantities the transform of a constant is an impulse: an impulse of strength $M$ at $u = 0$ in the 1-D case. A similar procedure applies in the case of two variables, giving an impulse of strength $MN$ at $(u,v) = (0,0)$ and 0 elsewhere.

Problem 4.19

From Eq. (4.5-15), and using the exponential representation of the sine function, we have
$$F(u,v) = \sum_{x=0}^{M-1}\sum_{y=0}^{N-1} \sin(2\pi u_0x + 2\pi v_0y)e^{-j2\pi(ux/M + vy/N)} = \frac{-j}{2}\sum_{x=0}^{M-1}\sum_{y=0}^{N-1}\left[ e^{j2\pi(u_0x + v_0y)} - e^{-j2\pi(u_0x + v_0y)} \right]e^{-j2\pi(ux/M + vy/N)} = \frac{-j}{2}\Im\left\{ (1)e^{j2\pi(Mu_0x/M + Nv_0y/N)} \right\} + \frac{j}{2}\Im\left\{ (1)e^{-j2\pi(Mu_0x/M + Nv_0y/N)} \right\} = \frac{j}{2}\left[ \delta(u + Mu_0, v + Nv_0) - \delta(u - Mu_0, v - Nv_0) \right]$$
where the third step writes $u_0x = (Mu_0)x/M$ and $v_0y = (Nv_0)y/N$ so that the translation property applies (see the discussion in Problem 4.17), and the last line follows from Table 4.3.
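Before moving on, the translation pair of Problem 4.16(a) is easy to confirm numerically. The following sketch is not part of the original solution; it assumes NumPy, and np.roll implements the circular shift implied by DFT periodicity:

```python
import numpy as np

M, N, u0, v0 = 8, 8, 2, 3
rng = np.random.default_rng(1)
f = rng.random((M, N))
x = np.arange(M).reshape(-1, 1)
y = np.arange(N).reshape(1, -1)
g = f * np.exp(1j * 2 * np.pi * (u0 * x / M + v0 * y / N))
G = np.fft.fft2(g)
F = np.fft.fft2(f)
# G(u, v) should equal F(u - u0, v - v0), circularly:
print(np.allclose(G, np.roll(F, (u0, v0), axis=(0, 1))))  # True
```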
Problem 4.20

The following are proofs of some of the properties in Table 4.1. Proofs of the other properties are given in Chapter 4. Recall that when we refer to a function as imaginary, its real part is zero. We use the term complex to denote a function whose real and imaginary parts are not zero. We prove only the forward part of each Fourier transform pair; similar techniques are used to prove the inverse part.

(a) Property 2: If $f(x,y)$ is imaginary, $f(x,y) \Leftrightarrow F^*(-u,-v) = -F(u,v)$. Proof: Because $f(x,y)$ is imaginary, we can express it as $jg(x,y)$, where $g(x,y)$ is a real function. Then the proof is as follows:
$$F^*(-u,-v) = \left[ \sum_{x=0}^{M-1}\sum_{y=0}^{N-1} jg(x,y)e^{j2\pi(ux/M + vy/N)} \right]^* = \sum_{x=0}^{M-1}\sum_{y=0}^{N-1}\left[ -jg(x,y) \right]e^{-j2\pi(ux/M + vy/N)} = -\sum_{x=0}^{M-1}\sum_{y=0}^{N-1} f(x,y)e^{-j2\pi(ux/M + vy/N)} = -F(u,v).$$

(b) Property 4: If $f(x,y)$ is imaginary, then $R(u,v)$ is odd and $I(u,v)$ is even. Proof: $F$ is complex, so it can be expressed as
$$F(u,v) = \operatorname{real}\left[ F(u,v) \right] + j\operatorname{imag}\left[ F(u,v) \right] = R(u,v) + jI(u,v).$$
Then, $-F(u,v) = -R(u,v) - jI(u,v)$ and $F^*(-u,-v) = R(-u,-v) - jI(-u,-v)$. But, because $f(x,y)$ is imaginary, $F^*(-u,-v) = -F(u,v)$ (see Property 2). It then follows from the previous two equations that $R(u,v) = -R(-u,-v)$ (i.e., $R$ is odd) and $I(u,v) = I(-u,-v)$ ($I$ is even).

(c) Property 5: $f(-x,-y) \Leftrightarrow F^*(u,v)$. That is, if $f(x,y)$ is real and its transform is $F(u,v)$, then the transform of $f(-x,-y)$ is $F^*(u,v)$, and conversely. Proof: From Example 4.12,
$$\Im\left\{ f(-x,-y) \right\} = \sum_{m=0}^{M-1}\sum_{n=0}^{N-1} f(m,n)e^{j2\pi(um/M + vn/N)}.$$
To see what happens when $f(x,y)$ is real, we write the right side of this equation as
$$\sum_{m=0}^{M-1}\sum_{n=0}^{N-1} f(m,n)e^{j2\pi(um/M + vn/N)} = \left[ \sum_{m=0}^{M-1}\sum_{n=0}^{N-1} f^*(m,n)e^{-j2\pi(um/M + vn/N)} \right]^* = \left[ \sum_{m=0}^{M-1}\sum_{n=0}^{N-1} f(m,n)e^{-j2\pi(um/M + vn/N)} \right]^* = F^*(u,v)$$
where the second step follows from the fact that $f(x,y)$ is real. Thus, we have shown that $\Im\{f(-x,-y)\} = F^*(u,v)$.

(d) Property 7: When $f(x,y)$ is complex, $f^*(x,y) \Leftrightarrow F^*(-u,-v)$. Proof:
$$\Im\left\{ f^*(x,y) \right\} = \sum_{x=0}^{M-1}\sum_{y=0}^{N-1} f^*(x,y)e^{-j2\pi(ux/M + vy/N)} = \left[ \sum_{x=0}^{M-1}\sum_{y=0}^{N-1} f(x,y)e^{j2\pi(ux/M + vy/N)} \right]^* = F^*(-u,-v).$$

(e) Property 9: If $f(x,y)$ is real and odd, then $F(u,v)$ is imaginary and odd, and conversely. Proof: Because $f(x,y)$ is real, we know that the real part of $F(u,v)$ is even and its imaginary part is odd. If we can show that $F$ is purely imaginary, then we will have completed the proof. Using Euler's formula, and denoting even and odd sequences simply by (even) and (odd),
$$F(u,v) = \sum_{x=0}^{M-1}\sum_{y=0}^{N-1} f(x,y)e^{-j2\pi(ux/M)}e^{-j2\pi(vy/N)} = \sum\sum (\text{odd})\left[ (\text{even}) - j(\text{odd}) \right]\left[ (\text{even}) - j(\text{odd}) \right] = \sum\sum (\text{odd})(\text{even})(\text{even}) - 2j\sum\sum (\text{odd})(\text{even})(\text{odd}) - \sum\sum (\text{odd})(\text{odd})(\text{odd}).$$
The first and third sums have odd integrands and therefore are zero by Eq. (4.6-13); only the middle, imaginary term remains. Hence $F(u,v)$ is purely imaginary.

(f) Property 10: If $f(x,y)$ is imaginary and even, then $F(u,v)$ is imaginary and even, and conversely. Proof: We know that when $f(x,y)$ is imaginary, the real part of $F(u,v)$ is odd and its imaginary part is even. If we can show that the real part is 0, then we will have proved this property. Because $f(x,y)$ is imaginary and even, we can express it as $jg(x,y)$, where $g$ is a real, even function. Then,
$$F(u,v) = \sum\sum \left[ j(\text{even}) \right]\left[ (\text{even}) - j(\text{odd}) \right]\left[ (\text{even}) - j(\text{odd}) \right] = j\sum\sum (\text{even})(\text{even})(\text{even}) + 2\sum\sum (\text{even})(\text{even})(\text{odd}) - j\sum\sum (\text{even})(\text{odd})(\text{odd}).$$
The middle (real) sum has an odd integrand and is zero by Eq. (4.6-13), so only the two imaginary terms remain. Hence $F(u,v)$ is imaginary.

(g) Property 11: If $f(x,y)$ is imaginary and odd, then $F(u,v)$ is real and odd, and conversely. Proof: If $f(x,y)$ is imaginary, we know that the real part of $F(u,v)$ is odd and its imaginary part is even. If we can show that the imaginary part is zero, then we will have the proof for this property.
As above,
$$F(u,v) = \sum\sum \left[ j(\text{odd}) \right]\left[ (\text{even}) - j(\text{odd}) \right]\left[ (\text{even}) - j(\text{odd}) \right] = j\sum\sum (\text{odd})(\text{even})(\text{even}) + 2\sum\sum (\text{odd})(\text{even})(\text{odd}) - j\sum\sum (\text{odd})(\text{odd})(\text{odd}).$$
The first and third sums have odd integrands and are zero by Eq. (4.6-13), so only the middle, real term remains. Hence $F(u,v)$ is real.

(h) Property 12: If $f(x,y)$ is complex and even, then $F(u,v)$ is complex and even, and conversely. Proof: Here, we have to prove that both the real and imaginary parts of $F(u,v)$ are even. Recall that if $f(x,y)$ is an even function, both its real and imaginary parts are even. Thus, we can write this function as $f(x,y) = f_{re}(x,y) + jf_{ie}(x,y)$. Then,
$$F(u,v) = \sum_{x=0}^{M-1}\sum_{y=0}^{N-1}\left[ f_{re}(x,y) + jf_{ie}(x,y) \right]e^{-j2\pi(ux/M + vy/N)} = \sum_{x=0}^{M-1}\sum_{y=0}^{N-1} f_{re}(x,y)e^{-j2\pi(ux/M + vy/N)} + \sum_{x=0}^{M-1}\sum_{y=0}^{N-1} jf_{ie}(x,y)e^{-j2\pi(ux/M + vy/N)}.$$
The first term of this result is recognized as the DFT of a real, even function, which we know is a real, even function. The second term is the DFT of a purely imaginary even function, which we know is imaginary and even. Thus, we see that the transform of a complex, even function has an even real part and an even imaginary part, and is thus a complex even function. This concludes the proof.

(i) Property 13: If $f(x,y)$ is complex and odd, then $F(u,v)$ is complex and odd, and conversely. Proof: The proof parallels the proof in (h).
$$F(u,v) = \sum_{x=0}^{M-1}\sum_{y=0}^{N-1}\left[ f_{ro}(x,y) + jf_{io}(x,y) \right]e^{-j2\pi(ux/M + vy/N)} = \sum_{x=0}^{M-1}\sum_{y=0}^{N-1} f_{ro}(x,y)e^{-j2\pi(ux/M + vy/N)} + \sum_{x=0}^{M-1}\sum_{y=0}^{N-1} jf_{io}(x,y)e^{-j2\pi(ux/M + vy/N)}.$$
The first term is the DFT of an odd, real function, which we know is imaginary and odd. The second term is the DFT of a purely imaginary odd function, which we know is real and odd. Thus, the sum of the two is a complex, odd function, as we wanted to prove.

Problem 4.21

Recall that the reason for padding is to establish a "buffer" between the periods that are implicit in the DFT. Imagine the image on the left being duplicated infinitely many times to cover the $xy$-plane. The result would be a checkerboard, with each square in the checkerboard being the image (and the black extensions). Now imagine doing the same thing to the image on the right. The results would be identical. Thus, either form of padding accomplishes the same separation between images, as desired.

Problem 4.22

Unless all borders of an image are black, padding the image with 0s introduces significant discontinuities (edges) at one or more borders of the image. These can be strong horizontal and vertical edges. These sharp transitions in the spatial domain introduce high-frequency components along the vertical and horizontal axes of the spectrum.

Problem 4.23

(a) The averages of the two images are computed as follows:
$$\bar f(x,y) = \frac{1}{MN}\sum_{x=0}^{M-1}\sum_{y=0}^{N-1} f(x,y)$$
and
$$\bar f_p(x,y) = \frac{1}{PQ}\sum_{x=0}^{P-1}\sum_{y=0}^{Q-1} f_p(x,y) = \frac{1}{PQ}\sum_{x=0}^{M-1}\sum_{y=0}^{N-1} f(x,y) = \frac{MN}{PQ}\bar f(x,y)$$
where the second step is a result of the fact that the image is padded with 0s. Thus, the ratio of the average values is
$$r = \frac{\bar f(x,y)}{\bar f_p(x,y)} = \frac{PQ}{MN}.$$
Thus, we see that the ratio increases as a function of $PQ$, indicating that the average value of the padded image decreases as a function of $PQ$. This is as expected; padding an image with zeros decreases its average value.

(b) Yes, they are equal. We know that $F(0,0) = MN\bar f(x,y)$ and $F_p(0,0) = PQ\bar f_p(x,y)$. And, from part (a), $\bar f_p(x,y) = \frac{MN}{PQ}\bar f(x,y)$. Then,
$$F_p(0,0) = PQ\cdot\frac{MN}{PQ}\bar f(x,y) = MN\bar f(x,y) = F(0,0).$$
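A two-line experiment confirms part (b). It is not part of the original solution and assumes NumPy, whose unnormalized forward DFT matches the convention used here:

```python
import numpy as np

rng = np.random.default_rng(2)
f = rng.random((4, 4))                       # M x N image
fp = np.pad(f, ((0, 4), (0, 4)))             # zero-padded to P x Q
# Padding changes the mean but not the DC term F(0, 0):
print(np.isclose(np.fft.fft2(f)[0, 0], np.fft.fft2(fp)[0, 0]))  # True
print(f.mean(), fp.mean())                   # mean drops by MN/PQ
```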
Problem 4.24

We solve the problem by direct substitution into the definition of the 2-D DFT:
$$F(u + k_1M, v + k_2N) = \sum_{x=0}^{M-1}\sum_{y=0}^{N-1} f(x,y)e^{-j2\pi([u + k_1M]x/M + [v + k_2N]y/N)} = \sum_{x=0}^{M-1}\sum_{y=0}^{N-1} f(x,y)e^{-j2\pi(ux/M + vy/N)}e^{-j2\pi k_1x}e^{-j2\pi k_2y} = F(u,v)$$
with $k_1$ and $k_2$ having values $0, \pm 1, \pm 2, \ldots$. The last step follows from the fact that $k_1x$ and $k_2y$ are integers, which makes the two rightmost exponentials equal to 1.

Problem 4.25

(a) From Eq. (4.4-10) and the definition of the 1-D DFT,
$$\Im\left\{ f(x) \star h(x) \right\} = \sum_{x=0}^{M-1}\left[ f(x) \star h(x) \right]e^{-j2\pi ux/M} = \sum_{x=0}^{M-1}\sum_{m=0}^{M-1} f(m)h(x-m)e^{-j2\pi ux/M} = \sum_{m=0}^{M-1} f(m)\sum_{x=0}^{M-1} h(x-m)e^{-j2\pi ux/M}$$
but
$$\sum_{x=0}^{M-1} h(x-m)e^{-j2\pi ux/M} = \Im\left\{ h(x-m) \right\} = H(u)e^{-j2\pi mu/M}$$
where the last step follows from Eq. (4.6-4). Substituting this result into the previous equation yields
$$\Im\left\{ f(x) \star h(x) \right\} = \left[ \sum_{m=0}^{M-1} f(m)e^{-j2\pi mu/M} \right]H(u) = F(u)H(u).$$
The other part of the convolution theorem is done in a similar manner.

(b) As in (a),
$$\Im\left\{ f(x,y) \star h(x,y) \right\} = \sum_{x=0}^{M-1}\sum_{y=0}^{N-1}\left[ \sum_{m=0}^{M-1}\sum_{n=0}^{N-1} f(m,n)h(x-m, y-n) \right]e^{-j2\pi(ux/M + vy/N)} = \sum_{m=0}^{M-1}\sum_{n=0}^{N-1} f(m,n)\sum_{x=0}^{M-1}\sum_{y=0}^{N-1} h(x-m, y-n)e^{-j2\pi(ux/M + vy/N)} = \sum_{m=0}^{M-1}\sum_{n=0}^{N-1} f(m,n)e^{-j2\pi(um/M + vn/N)}H(u,v) = F(u,v)H(u,v).$$

(c) Correlation is done in the same way but, because of the difference in sign in the argument of $h$, the result will be a conjugate:
$$\Im\left\{ f(x,y) ☆ h(x,y) \right\} = \sum_{x=0}^{M-1}\sum_{y=0}^{N-1}\left[ \sum_{m=0}^{M-1}\sum_{n=0}^{N-1} f(m,n)h(x+m, y+n) \right]e^{-j2\pi(ux/M + vy/N)} = \sum_{m=0}^{M-1}\sum_{n=0}^{N-1} f(m,n)\sum_{x=0}^{M-1}\sum_{y=0}^{N-1} h(x+m, y+n)e^{-j2\pi(ux/M + vy/N)} = \sum_{m=0}^{M-1}\sum_{n=0}^{N-1} f(m,n)e^{j2\pi(um/M + vn/N)}H(u,v) = F^*(u,v)H(u,v).$$

(d) We begin with one variable:
$$\Im\left\{ \frac{df(z)}{dz} \right\} = \int_{-\infty}^{\infty} \frac{df(z)}{dz}e^{-j2\pi\nu z}\,dz.$$
Integration by parts has the following general form:
$$\int s\,dw = sw - \int w\,ds.$$
Let $s = e^{-j2\pi\nu z}$ and $w = f(z)$. Then $dw = \frac{df(z)}{dz}dz$ and $ds = (-j2\pi\nu)e^{-j2\pi\nu z}dz$, so it follows that
$$\Im\left\{ \frac{df(z)}{dz} \right\} = \left[ f(z)e^{-j2\pi\nu z} \right]_{-\infty}^{\infty} - \int_{-\infty}^{\infty} f(z)(-j2\pi\nu)e^{-j2\pi\nu z}\,dz = (j2\pi\nu)\int_{-\infty}^{\infty} f(z)e^{-j2\pi\nu z}\,dz = (j2\pi\nu)F(\nu)$$
because $f(\pm\infty) = 0$ by assumption (see Table 4.3). Consider next the second derivative. Define $g(z) = df(z)/dz$. Then
$$\Im\left\{ \frac{dg(z)}{dz} \right\} = (j2\pi\nu)G(\nu)$$
where $G(\nu)$ is the Fourier transform of $g(z)$. But $G(\nu) = (j2\pi\nu)F(\nu)$, so
$$\Im\left\{ \frac{d^2f(z)}{dz^2} \right\} = (j2\pi\nu)^2F(\nu).$$
Continuing in this manner would result in the expression
$$\Im\left\{ \frac{d^nf(z)}{dz^n} \right\} = (j2\pi\nu)^nF(\nu).$$
If we now go to 2-D and take the derivative with respect to only one variable, we get the same result as in the preceding expression, but we have to use partial derivatives to indicate the variable to which differentiation applies and, instead of $F(\nu)$, we would have $F(\mu,\nu)$. Thus,
$$\Im\left\{ \frac{\partial^n f(t,z)}{\partial z^n} \right\} = (j2\pi\nu)^nF(\mu,\nu).$$
Define $g(t,z) = \partial^n f(t,z)/\partial z^n$. Then
$$\Im\left\{ \frac{\partial^m g(t,z)}{\partial t^m} \right\} = (j2\pi\mu)^mG(\mu,\nu).$$
But $G(\mu,\nu)$ is the transform of $\partial^n f(t,z)/\partial z^n$, which we know is equal to $(j2\pi\nu)^nF(\mu,\nu)$. Therefore, we have established that
$$\Im\left\{ \frac{\partial^m}{\partial t^m}\frac{\partial^n}{\partial z^n}f(t,z) \right\} = (j2\pi\mu)^m(j2\pi\nu)^nF(\mu,\nu).$$
Because the Fourier transform is unique, we know that the inverse transform of the right side of this equation would give the left side, so the equation constitutes a Fourier transform pair (keep in mind that we are dealing with continuous variables).

Problem 4.26

(a) From Chapter 3, the Laplacian of a function $f(t,z)$ of two continuous variables is defined as
$$\nabla^2 f(t,z) = \frac{\partial^2 f(t,z)}{\partial t^2} + \frac{\partial^2 f(t,z)}{\partial z^2}.$$
We obtain the Fourier transform of the Laplacian using the result from Problem 4.25 (entry 12 in Table 4.3):
$$\Im\left\{ \nabla^2 f(t,z) \right\} = \Im\left\{ \frac{\partial^2 f(t,z)}{\partial t^2} \right\} + \Im\left\{ \frac{\partial^2 f(t,z)}{\partial z^2} \right\} = (j2\pi\mu)^2F(\mu,\nu) + (j2\pi\nu)^2F(\mu,\nu) = -4\pi^2(\mu^2 + \nu^2)F(\mu,\nu).$$
We recognize this as the familiar filtering expression $G(\mu,\nu) = H(\mu,\nu)F(\mu,\nu)$, in which $H(\mu,\nu) = -4\pi^2(\mu^2 + \nu^2)$.

(b) As the preceding derivation shows, the Laplacian filter applies to continuous variables. We can generate a filter for use with the DFT simply by sampling this function:
$$H(u,v) = -4\pi^2(u^2 + v^2)$$
for $u = 0, 1, 2, \ldots, M-1$ and $v = 0, 1, 2, \ldots, N-1$. When working with centered transforms, the Laplacian filter function in the frequency domain is expressed as
$$H(u,v) = -4\pi^2([u - M/2]^2 + [v - N/2]^2).$$
In summary, we have the following Fourier transform pair relating the Laplacian in the spatial and frequency domains:
$$\nabla^2 f(x,y) \;\Leftrightarrow\; -4\pi^2([u - M/2]^2 + [v - N/2]^2)F(u,v)$$
where it is understood that the filter is a sampled version of a continuous function.

(c) The Laplacian filter is isotropic, so its symmetry is approximated much closer by a Laplacian mask having the additional diagonal terms, which requires a $-8$ in the center so that its response is 0 in areas of constant intensity.

Problem 4.27

(a) The spatial average (excluding the center term) is
$$g(x,y) = \frac{1}{4}\left[ f(x,y+1) + f(x+1,y) + f(x-1,y) + f(x,y-1) \right].$$
From property 3 in Table 4.3,
$$G(u,v) = \frac{1}{4}\left[ e^{j2\pi v/N} + e^{j2\pi u/M} + e^{-j2\pi u/M} + e^{-j2\pi v/N} \right]F(u,v) = H(u,v)F(u,v)$$
where
$$H(u,v) = \frac{1}{2}\left[ \cos(2\pi u/M) + \cos(2\pi v/N) \right]$$
is the filter transfer function in the frequency domain.

(b) To see that this is a lowpass filter, it helps to express the preceding equation in the form of our familiar centered functions:
$$H(u,v) = \frac{1}{2}\left[ \cos(2\pi[u - M/2]/M) + \cos(2\pi[v - N/2]/N) \right].$$
Consider one variable for convenience. As $u$ ranges from 0 to $M-1$, the value of $\cos(2\pi[u - M/2]/M)$ starts at $-1$, peaks at 1 when $u = M/2$ (the center of the filter), and then decreases to $-1$ again when $u = M$. Thus, we see that the amplitude of the filter decreases as a function of distance from the origin of the centered filter, which is the characteristic of a lowpass filter. A similar argument is easily carried out when considering both variables simultaneously.

Problem 4.28

(a) As in Problem 4.27, the filtered image is given by
$$g(x,y) = f(x+1,y) - f(x,y) + f(x,y+1) - f(x,y).$$
From property 3 in Table 4.3,
$$G(u,v) = F(u,v)e^{j2\pi u/M} - F(u,v) + F(u,v)e^{j2\pi v/N} - F(u,v) = \left[ e^{j2\pi u/M} - 1 \right]F(u,v) + \left[ e^{j2\pi v/N} - 1 \right]F(u,v) = H(u,v)F(u,v)$$
where $H(u,v)$ is the filter function
$$H(u,v) = \left( e^{j2\pi u/M} - 1 \right) + \left( e^{j2\pi v/N} - 1 \right) = 2j\left[ \sin(\pi u/M)e^{j\pi u/M} + \sin(\pi v/N)e^{j\pi v/N} \right].$$

(b) To see that this is a highpass filter, it helps to express the filter function in the form of our familiar centered functions:
$$H(u,v) = 2j\left[ \sin(\pi[u - M/2]/M)e^{j\pi u/M} + \sin(\pi[v - N/2]/N)e^{j\pi v/N} \right].$$
The function is 0 at the center of the filter, $(u,v) = (M/2, N/2)$. As $u$ and $v$ increase, the value of the filter decreases, reaching its limiting value of close to $-4j$ when $u = M-1$ and $v = N-1$. The negative limiting value is due to the order in which the derivatives are taken. If, instead, we had taken differences of the form $f(x,y) - f(x+1,y)$ and $f(x,y) - f(x,y+1)$, the filter would have tended toward a positive limiting value. The important point here is that the dc term is eliminated and higher frequencies are passed, which is the characteristic of a highpass filter.
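For instructors who want to demonstrate part (b) of Problem 4.26 numerically, here is a minimal sketch. It is not part of the original solution and assumes NumPy; the sampled, centered transfer function is applied to a shifted DFT:

```python
import numpy as np

def laplacian_freq(f):
    """Frequency-domain Laplacian: H(u, v) = -4*pi^2 *
    ((u - M/2)^2 + (v - N/2)^2), applied to the centered DFT."""
    M, N = f.shape
    u = np.arange(M).reshape(-1, 1) - M / 2
    v = np.arange(N).reshape(1, -1) - N / 2
    H = -4 * np.pi**2 * (u**2 + v**2)
    F = np.fft.fftshift(np.fft.fft2(f))      # center the transform
    g = np.fft.ifft2(np.fft.ifftshift(H * F))
    return np.real(g)
```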
Problem 4.29

The filtered function is given by
$$g(x,y) = \left[ f(x+1,y) + f(x-1,y) + f(x,y+1) + f(x,y-1) \right] - 4f(x,y).$$
As in Problem 4.28, $G(u,v) = H(u,v)F(u,v)$, where
$$H(u,v) = e^{j2\pi u/M} + e^{-j2\pi u/M} + e^{j2\pi v/N} + e^{-j2\pi v/N} - 4 = 2\left[ \cos(2\pi u/M) + \cos(2\pi v/N) - 2 \right].$$
Shifting the filter to the center of the frequency rectangle gives
$$H(u,v) = 2\left[ \cos(2\pi[u - M/2]/M) + \cos(2\pi[v - N/2]/N) - 2 \right].$$
When $(u,v) = (M/2, N/2)$ (the center of the shifted filter), $H(u,v) = 0$. For values away from the center, $H(u,v)$ decreases (as in Problem 4.28) because of the order in which the derivatives are taken. The important point is that the dc term is eliminated and the higher frequencies are passed, which is the characteristic of a highpass filter.

Problem 4.30

The answer is no. The Fourier transform is a linear process, while the squares and square roots involved in computing the gradient are nonlinear operations. The Fourier transform could be used to compute the derivatives as differences (as in Problem 4.28), but the squares, square root, or absolute values must be computed directly in the spatial domain.

Problem 4.31

We want to show that
$$\Im^{-1}\left\{ Ae^{-(\mu^2 + \nu^2)/2\sigma^2} \right\} = A2\pi\sigma^2e^{-2\pi^2\sigma^2(t^2 + z^2)}.$$
The explanation will be clearer if we start with one variable. We want to show that if $H(\mu) = e^{-\mu^2/2\sigma^2}$, then
$$h(t) = \Im^{-1}\left\{ H(\mu) \right\} = \int_{-\infty}^{\infty} e^{-\mu^2/2\sigma^2}e^{j2\pi\mu t}\,d\mu = \sqrt{2\pi}\,\sigma e^{-2\pi^2\sigma^2t^2}.$$
We can express the integral in the preceding equation as
$$h(t) = \int_{-\infty}^{\infty} e^{-\frac{1}{2\sigma^2}\left[ \mu^2 - j4\pi\sigma^2\mu t \right]}\,d\mu.$$
Making use of the identity
$$e^{-\frac{(2\pi)^2\sigma^2t^2}{2}}e^{\frac{(2\pi)^2\sigma^2t^2}{2}} = 1$$
in the preceding integral yields
$$h(t) = e^{-\frac{(2\pi)^2\sigma^2t^2}{2}}\int_{-\infty}^{\infty} e^{-\frac{1}{2\sigma^2}\left[ \mu^2 - j4\pi\sigma^2\mu t - (2\pi)^2\sigma^4t^2 \right]}\,d\mu = e^{-\frac{(2\pi)^2\sigma^2t^2}{2}}\int_{-\infty}^{\infty} e^{-\frac{1}{2\sigma^2}\left[ \mu - j2\pi\sigma^2t \right]^2}\,d\mu.$$
Next, we make the change of variables $r = \mu - j2\pi\sigma^2t$. Then $dr = d\mu$ and the preceding integral becomes
$$h(t) = e^{-\frac{(2\pi)^2\sigma^2t^2}{2}}\int_{-\infty}^{\infty} e^{-\frac{r^2}{2\sigma^2}}\,dr.$$
Finally, we multiply and divide the right side of this equation by $\sqrt{2\pi}\,\sigma$ and obtain
$$h(t) = \sqrt{2\pi}\,\sigma e^{-\frac{(2\pi)^2\sigma^2t^2}{2}}\left[ \frac{1}{\sqrt{2\pi}\,\sigma}\int_{-\infty}^{\infty} e^{-\frac{r^2}{2\sigma^2}}\,dr \right].$$
The expression inside the brackets is recognized as the Gaussian probability density function, whose integral from $-\infty$ to $\infty$ is 1. Therefore,
$$h(t) = \sqrt{2\pi}\,\sigma e^{-2\pi^2\sigma^2t^2}.$$
With the preceding results as background, we are now ready to show that
$$h(t,z) = \Im^{-1}\left\{ Ae^{-(\mu^2 + \nu^2)/2\sigma^2} \right\} = A2\pi\sigma^2e^{-2\pi^2\sigma^2(t^2 + z^2)}.$$
By substituting directly into the definition of the inverse Fourier transform we have
$$h(t,z) = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} Ae^{-(\mu^2 + \nu^2)/2\sigma^2}e^{j2\pi(\mu t + \nu z)}\,d\mu\,d\nu = \int_{-\infty}^{\infty}\left[ \int_{-\infty}^{\infty} Ae^{-\frac{\mu^2}{2\sigma^2} + j2\pi\mu t}\,d\mu \right]e^{-\frac{\nu^2}{2\sigma^2} + j2\pi\nu z}\,d\nu.$$
The integral inside the brackets is recognized from the previous discussion to be equal to $A\sqrt{2\pi}\,\sigma e^{-2\pi^2\sigma^2t^2}$. Then, the preceding integral becomes
$$h(t,z) = A\sqrt{2\pi}\,\sigma e^{-2\pi^2\sigma^2t^2}\int_{-\infty}^{\infty} e^{-\frac{\nu^2}{2\sigma^2} + j2\pi\nu z}\,d\nu.$$
We now recognize the remaining integral to be equal to $\sqrt{2\pi}\,\sigma e^{-2\pi^2\sigma^2z^2}$, from which we have the final result:
$$h(t,z) = \left[ A\sqrt{2\pi}\,\sigma e^{-2\pi^2\sigma^2t^2} \right]\left[ \sqrt{2\pi}\,\sigma e^{-2\pi^2\sigma^2z^2} \right] = A2\pi\sigma^2e^{-2\pi^2\sigma^2(t^2 + z^2)}.$$

Problem 4.32

The spatial filter is obtained by taking the inverse Fourier transform of the frequency-domain filter:
$$h_{HP}(t,z) = \Im^{-1}\left\{ 1 - H_{LP}(\mu,\nu) \right\} = \Im^{-1}\left[ 1 \right] - \Im^{-1}\left\{ H_{LP}(\mu,\nu) \right\} = \delta(t,z) - A2\pi\sigma^2e^{-2\pi^2\sigma^2(t^2 + z^2)}.$$
This result is for continuous functions. To use it with discrete variables we simply sample the function into its desired dimensions.

Problem 4.33

The complex conjugate simply changes $j$ to $-j$ in the inverse transform, so the image on the right is given by
$$\Im^{-1}\left\{ F^*(u,v) \right\} = \frac{1}{MN}\sum_{u=0}^{M-1}\sum_{v=0}^{N-1} F(u,v)e^{-j2\pi(ux/M + vy/N)} = \frac{1}{MN}\sum_{u=0}^{M-1}\sum_{v=0}^{N-1} F(u,v)e^{j2\pi(u(-x)/M + v(-y)/N)} = f(-x,-y)$$
which simply mirrors $f(x,y)$ about the origin, thus producing the image on the right.

Problem 4.34

The equally spaced, vertical bars on the left, lower third of the image.
Problem 4.35

With reference to Eq. (4.9-1), all the highpass filters discussed in Section 4.9 can be expressed as 1 minus the transfer function of a lowpass filter (which we know does not have an impulse at the origin). The inverse Fourier transform of the 1 is what gives rise to an impulse at the origin in the highpass spatial filters.

Problem 4.36

(a) The ring in fact has a dark center area as a result of the highpass operation only (Fig. P4.36 shows the result of highpass filtering alone). However, the dark center area is averaged out by the lowpass filter. The reason the final result looks so bright is that the discontinuity (edge) on the boundaries of the ring is much higher than anywhere else in the image, thus dominating the display of the result.

(b) Filtering with the Fourier transform is a linear process. The order does not matter.

Problem 4.37

(a) One application of the filter gives
$$G(u,v) = H(u,v)F(u,v) = e^{-D^2(u,v)/2D_0^2}F(u,v).$$
Similarly, $K$ applications of the filter would give
$$G_K(u,v) = e^{-KD^2(u,v)/2D_0^2}F(u,v).$$
The inverse DFT of $G_K(u,v)$ would give the image resulting from $K$ passes of the Gaussian filter. If $K$ is "large enough," the Gaussian lowpass filter will become a notch pass filter, passing only $F(0,0)$. We know that this term is equal to the average value of the image. So, there is a value of $K$ after which the result of repeated lowpass filtering will simply produce a constant image. The value of all pixels in this image will be equal to the average value of the original image. Note that the answer applies even as $K$ approaches infinity. In this case the filter will approach an impulse at the origin, and this would still give us $F(0,0)$ as the result of filtering.

(b) To guarantee the result in (a), $K$ has to be chosen large enough so that the filter becomes a notch pass filter (at the origin) for all values of $D(u,v)$. Keeping in mind that increments of frequencies are in unit values, this means
$$H_K(u,v) = e^{-KD^2(u,v)/2D_0^2} = \begin{cases} 1 & \text{if } (u,v) = (0,0) \\ 0 & \text{otherwise.} \end{cases}$$
Because $u$ and $v$ are integers, the condition on the second line must be satisfied for all $u \geq 1$ and/or $v \geq 1$. When $u = v = 0$, $D(u,v) = 0$ and $H_K(u,v) = 1$, as desired. We want all values of the filter to be zero for all values of the distance from the origin that are greater than 0 (i.e., for values of $u$ and/or $v$ greater than 0). However, the filter is a Gaussian function, so its value is always greater than 0 for all finite values of $D(u,v)$. But we are dealing with digital numbers, which will be designated as zero whenever the value of the filter is less than one-half the smallest positive number representable in the computer being used. As given in the problem statement, the value of this number is $c_{\min}$. So, values of $K$ for which the filter function is less than $0.5\,c_{\min}$ will suffice. That is, we want the minimum value of $K$ for which
$$e^{-KD^2(u,v)/2D_0^2} < 0.5\,c_{\min}$$
or
$$K > \frac{-\ln(0.5\,c_{\min})}{D^2(u,v)/2D_0^2} = \frac{-2D_0^2\ln(0.5\,c_{\min})}{D^2(u,v)}.$$
As noted above, we want this equation to hold for all values of $D^2(u,v) > 0$. Because the exponential decreases as a function of increasing distance from the origin, we choose the smallest possible value of $D^2(u,v)$, which is 1. This gives the result
$$K > -2D_0^2\ln(0.5\,c_{\min})$$
which yields a positive number because $c_{\min} \ll 1$. This result guarantees that the lowpass filter will act as a notch pass filter, leaving only the value of the transform at the origin. The image will not change past this value of $K$.
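A numerical companion to part (b) can be shown in class. This sketch is not part of the original solution; it assumes NumPy and IEEE double precision, and adds a small margin over the bound to stay clear of the boundary case:

```python
import numpy as np

D0 = 10.0
c_min = np.finfo(float).tiny            # smallest positive normal double
# Any K above the derived bound works; 1% margin avoids the boundary:
K = 1.01 * (-2 * D0**2 * np.log(0.5 * c_min))
H_K = np.exp(-K * 1.0 / (2 * D0**2))    # filter value at D^2(u, v) = 1
print(K, H_K < 0.5 * c_min)             # True: the value is treated as zero
```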
Problem 4.38

(a) The key for the student to be able to solve this problem is to treat $K$ applications of the highpass filter as 1 minus $K$ applications of the corresponding lowpass filter:
$$G_K(u,v) = H_K(u,v)F(u,v) = \left[ 1 - e^{-KD^2(u,v)/2D_0^2} \right]F(u,v)$$
where the Gaussian lowpass filter is from Problem 4.37. Students who start directly with the expression of the Gaussian highpass filter, $1 - e^{-D^2(u,v)/2D_0^2}$, and attempt to raise it to the $K$th power will run into a dead end. The solution to the problem parallels the solution of Problem 4.37. Here, however, the filter will approach a notch filter that will take out $F(0,0)$ and thus will produce an image with zero average value (this implies negative pixels). So, there is a value of $K$ after which the result of repeated highpass filtering will no longer change.

(b) The problem here is to determine the value of $K$ for which
$$H_K(u,v) = 1 - e^{-KD^2(u,v)/2D_0^2} = \begin{cases} 0 & \text{if } (u,v) = (0,0) \\ 1 & \text{otherwise.} \end{cases}$$
Because $u$ and $v$ are integers, the condition on the second line has to be satisfied for all $u \geq 1$ and/or $v \geq 1$. When $u = v = 0$, $D(u,v) = 0$ and $H_K(u,v) = 0$, as desired. We want all values of the filter to be 1 for all values of the distance from the origin that are greater than 0 (i.e., for values of $u$ and/or $v$ greater than 0). For $H_K(u,v)$ to become 1, the exponential term has to become 0 for values of $u$ and/or $v$ greater than 0. This is the same requirement as in Problem 4.37, so the solution of that problem applies here as well.

Problem 4.39

(a) Express filtering as convolution to reduce all processes to the spatial domain. Then, the filtered image is given by
$$g(x,y) = h(x,y) \star f(x,y)$$
where $h$ is the spatial filter (inverse Fourier transform of the frequency-domain filter) and $f$ is the input image. Histogram processing this result yields
$$g'(x,y) = T\left[ g(x,y) \right] = T\left[ h(x,y) \star f(x,y) \right]$$
where $T$ denotes the histogram equalization transformation. If we histogram-equalize first, then
$$g(x,y) = T\left[ f(x,y) \right] \quad\text{and}\quad g'(x,y) = h(x,y) \star T\left[ f(x,y) \right].$$
In general, $T$ is a nonlinear function determined by the nature of the pixels in the image from which it is computed. Thus, in general,
$$T\left[ h(x,y) \star f(x,y) \right] \neq h(x,y) \star T\left[ f(x,y) \right]$$
and the order does matter.

(b) As indicated in Section 4.9, highpass filtering severely diminishes the contrast of an image. Although high-frequency emphasis helps some, the improvement is usually not dramatic (see Fig. 4.59). Thus, if an image is histogram equalized first, the gain in contrast improvement will essentially be lost in the filtering process. Therefore, the procedure in general is to filter first and histogram-equalize the image after that.

Problem 4.40

From Eq. (4.9-3), the transfer function of a Butterworth highpass filter is
$$H(u,v) = \frac{1}{1 + \left[ \dfrac{D_0}{D(u,v)} \right]^{2n}}.$$
We want the filter to have a value of $\gamma_L$ when $D(u,v) = 0$, and to approach $\gamma_H$ for high values of $D(u,v)$. The preceding equation is easily modified to accomplish this:
$$H(u,v) = \gamma_L + \frac{\gamma_H - \gamma_L}{1 + \left[ \dfrac{D_0}{D(u,v)} \right]^{2n}}.$$
The value of $n$ controls the sharpness of the transition between $\gamma_L$ and $\gamma_H$.
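The modified transfer function is straightforward to generate. The following sketch is not part of the original solution; it assumes NumPy, and the tiny value substituted at the center is only an implementation guard against division by zero:

```python
import numpy as np

def emphasis_bhpf(M, N, D0, n, gamma_L, gamma_H):
    """Modified Butterworth highpass: gamma_L at D = 0, rising to
    gamma_H at high frequencies; n sets the transition sharpness."""
    u = np.arange(M).reshape(-1, 1) - M / 2
    v = np.arange(N).reshape(1, -1) - N / 2
    D = np.hypot(u, v)
    D[D == 0] = 1e-12                 # avoid division by zero at center
    return gamma_L + (gamma_H - gamma_L) / (1 + (D0 / D) ** (2 * n))

H = emphasis_bhpf(256, 256, D0=40, n=2, gamma_L=0.5, gamma_H=2.0)
print(H.min().round(3), H.max().round(3))   # ~0.5 near center, toward 2.0
```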
Problem 4.41
Because $M = 2^n$, we can write Eqs. (4.11-16) and (4.11-17) as

$m(n) = \frac{1}{2} M n$ and $a(n) = M n$.

Proof by induction begins by showing that both equations hold for $n = 1$:

$m(1) = \frac{1}{2}(2)(1) = 1$ and $a(1) = (2)(1) = 2$.

We know these results to be correct from the discussion in Section 4.11.3. Next, we assume that the equations hold for $n$. Then, we are required to prove that they also are true for $n + 1$. From Eq. (4.11-14),

$m(n+1) = 2m(n) + 2^n$.

Substituting $m(n)$ from above,

$m(n+1) = 2\left[\frac{1}{2} M n\right] + 2^n = 2\left[\frac{1}{2}\, 2^n n\right] + 2^n = 2^n(n+1) = \frac{1}{2}\, 2^{n+1}(n+1)$.

Therefore, Eq. (4.11-16) is valid for all $n$. From Eq. (4.11-17),

$a(n+1) = 2a(n) + 2^{n+1}$.

Substituting the above expression for $a(n)$ yields

$a(n+1) = 2Mn + 2^{n+1} = 2(2^n n) + 2^{n+1} = 2^{n+1}(n+1)$

which completes the proof.

Problem 4.42
Consider a single star, modeled as an impulse $\delta(x - x_0, y - y_0)$. Then

$f(x,y) = K\,\delta(x - x_0, y - y_0)$

from which

$z(x,y) = \ln f(x,y) = \ln K + \ln\delta(x - x_0, y - y_0) = K' + \delta(x - x_0, y - y_0)$.

Taking the Fourier transform of both sides yields

$\Im\left[z(x,y)\right] = \Im\left[K'\right] + \Im\left[\delta(x - x_0, y - y_0)\right] = K''\,\delta(u,v) + e^{-j2\pi(ux_0 + vy_0)}$.

From this result, it is evident that the contribution of illumination is an impulse at the origin of the frequency plane. A notch filter that attenuates only this component will take care of the problem. Extension of this development to multiple impulses (stars) is implemented by considering one star at a time. The form of the filter will be the same. At the end of the procedure, all individual images are combined by addition, followed by intensity scaling so that the relative brightness between the stars is preserved.

Problem 4.43
The problem can be solved by carrying out the following steps:
1. Perform a median filtering operation.
2. Follow (1) by high-frequency emphasis.
3. Histogram-equalize this result.
4. Compute the average gray level, $K_0$. Add the quantity $(K - K_0)$ to all pixels.
5. Perform the transformations shown in Fig. P4.43, where $r$ is the input gray level and $R$, $G$, and $B$ are fed into an RGB color monitor.

Figure P4.43

Chapter 5
Problem Solutions

Problem 5.1
The solutions are shown in Fig. P5.1, from left to right.

Problem 5.2
The solutions are shown in Fig. P5.2, from left to right.

Problem 5.3
The solutions are shown in Fig. P5.3, from left to right.

Problem 5.4
The solutions are shown in Fig. P5.4, from left to right.

Problem 5.5
The solutions are shown in Fig. P5.5, from left to right.

Problem 5.6
The solutions are shown in Fig. P5.6, from left to right.

Problem 5.7
The solutions are shown in Fig. P5.7, from left to right.

Problem 5.8
The solutions are shown in Fig. P5.8, from left to right.

Problem 5.9
The solutions are shown in Fig. P5.9, from left to right.

Problem 5.10
(a) The key to this problem is that the geometric mean is zero whenever any pixel is zero. Draw a profile of an ideal edge with a few points valued 0 and a few points valued 1. The geometric mean will give only values of 0 and 1, whereas the arithmetic mean will give intermediate values (blur).
(b) Black is 0, so the geometric mean will return values of 0 as long as at least one pixel in the window is black. Because the center of the mask can be outside the original black area when this happens, the figure will be thickened.
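The 1-D profile argument of Problem 5.10(a) can be checked directly. The following Python sketch is not part of the original solution; the edge profile and window size are illustrative.

```python
import numpy as np

# A minimal sketch for Problem 5.10(a): on an ideal 0/1 edge, the arithmetic
# mean introduces intermediate values (blur), while the geometric mean is 0
# whenever any pixel in the window is 0, so it returns only 0s and 1s.
edge = np.array([0, 0, 0, 0, 1, 1, 1, 1], dtype=float)
m = 3                                          # window size
arith, geom = [], []
for i in range(len(edge) - m + 1):
    w = edge[i:i + m]
    arith.append(w.mean())                     # intermediate values appear
    geom.append(w.prod() ** (1.0 / m))         # stays 0 until window is all 1s
print("arithmetic:", np.round(arith, 2))       # [0. 0.33 0.67 1. 1. 1.]
print("geometric: ", np.round(geom, 2))        # [0. 0.   0.   1. 1. 1.]
```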
Problem 5.11
The key to understanding the behavior of the contraharmonic filter is to think of the pixels in the neighborhood surrounding a noise impulse as being constant, with the impulse noise point being in the center of the neighborhood. For the noise spike to be visible, its value must be considerably larger than the value of its neighbors. Also keep in mind that the power in the numerator is 1 plus the power in the denominator.

(a) By definition, pepper noise is a low value (really 0). It is most visible when surrounded by light values. The center pixel (the pepper noise) will have little influence in the sums. If the area spanned by the filter is approximately constant, the ratio will approach the value of the pixels in the neighborhood—thus reducing the effect of the low-value pixel. For example, here are some values of the filter for a dark point of value 1 in a $3\times 3$ region with pixels of value 100: for $Q = 0.5$, filter $= 98.78$; for $Q = 1$, filter $= 99.88$; for $Q = 2$, filter $= 99.99$; and for $Q = 5$, filter $= 100.00$.

(b) The reverse happens when the center point is large and its neighbors are small. The center pixel will now be the largest. However, the exponent is now negative, so the small numbers will dominate the result. The numerator can then be thought of as a constant raised to the power $Q + 1$ and the denominator as the same constant raised to the power $Q$. That constant is the value of the pixels in the neighborhood. So the ratio is just that value.

(c) When the wrong polarity is used, the large numbers in the case of the salt noise will be raised to a positive power, and the noise will overpower the result. For salt noise the image will become very light. The opposite is true for pepper noise—the image will become dark.

(d) When $Q = -1$, the value of the numerator at any location is equal to the number of pixels in the neighborhood ($mn$). The terms of the sum in the denominator are 1 divided by the individual pixel values in the neighborhood. For example, for a $3\times 3$ neighborhood, the response of the filter when $Q = -1$ is

$\dfrac{9}{1/p_1 + 1/p_2 + \cdots + 1/p_9}$

where the $p$'s are the pixel values in the neighborhood. Thus, low pixel values will tend to produce low filter responses, and vice versa. If, for example, the filter is centered on a large spike surrounded by zeros, the response will be a low output, thus reducing the effect of the spike.

(e) In a constant area, the filter returns the value of the pixels in the area, independently of the value of $Q$.
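The numbers quoted in Problem 5.11(a) follow directly from the contraharmonic definition. The following Python sketch is not part of the original solution; note that the text truncates rather than rounds the $Q = 2$ case.

```python
import numpy as np

# A minimal sketch of the contraharmonic response for the example in
# Problem 5.11(a): a pepper point of value 1 surrounded by eight pixels of
# value 100 in a 3x3 neighborhood.
def contraharmonic(window, Q):
    w = window.astype(float)
    return (w**(Q + 1)).sum() / (w**Q).sum()

window = np.array([100.0] * 8 + [1.0])
for Q in [0.5, 1, 2, 5]:
    print(Q, contraharmonic(window, Q))
# 98.7778..., 99.8764..., 99.9988..., 99.99999... -- i.e., the quoted
# 98.78, 99.88, 99.99, and 100.00
```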
Problem 5.12
A bandpass filter is obtained by subtracting the corresponding bandreject filter from 1:

$H_{BP}(u,v) = 1 - H_{BR}(u,v)$.

Then:
(a) Ideal bandpass filter:

$H_{IBP}(u,v) = \begin{cases} 0 & \text{if } D(u,v) < D_0 - \frac{W}{2} \\ 1 & \text{if } D_0 - \frac{W}{2} \leq D(u,v) \leq D_0 + \frac{W}{2} \\ 0 & \text{if } D(u,v) > D_0 + \frac{W}{2}. \end{cases}$

(b) Butterworth bandpass filter:

$H_{BBP}(u,v) = 1 - \dfrac{1}{1 + \left[\dfrac{D(u,v)W}{D^2(u,v) - D_0^2}\right]^{2n}} = \dfrac{\left[\dfrac{D(u,v)W}{D^2(u,v) - D_0^2}\right]^{2n}}{1 + \left[\dfrac{D(u,v)W}{D^2(u,v) - D_0^2}\right]^{2n}}$.

(c) Gaussian bandpass filter:

$H_{GBP}(u,v) = 1 - \left[1 - e^{-\frac{1}{2}\left[\frac{D^2(u,v) - D_0^2}{D(u,v)W}\right]^2}\right] = e^{-\frac{1}{2}\left[\frac{D^2(u,v) - D_0^2}{D(u,v)W}\right]^2}$.

Problem 5.13
We use highpass filters to construct notch filters, as indicated in Eq. (4.10-2). The ideal notch reject filter is given by

$H_{INR}(u,v) = \prod_{k=1}^{3} H_k(u,v)\, H_{-k}(u,v)$

where

$H_k(u,v) = \begin{cases} 0 & \text{if } D_k(u,v) \leq D_0 \\ 1 & \text{if } D_k(u,v) > D_0 \end{cases}$

with

$D_k(u,v) = \sqrt{(u - M/2 - u_k)^2 + (v - N/2 - v_k)^2}$

in which $(u_k, v_k)$ are the centers of the notches. For the Gaussian filter,

$H_k(u,v) = 1 - e^{-D_k^2(u,v)/2D_0^2}$

and the notch reject filter is given by

$H_{GNR}(u,v) = \prod_{k=1}^{3}\left[1 - e^{-D_k^2(u,v)/2D_0^2}\right]\left[1 - e^{-D_{-k}^2(u,v)/2D_0^2}\right]$.
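The product-of-highpass construction in Problem 5.13 translates directly into code. The following Python sketch is not part of the original solution; the grid size, notch centers, and $D_0$ are illustrative.

```python
import numpy as np

# A minimal sketch of the Gaussian notch reject filter of Problem 5.13:
# one highpass factor per notch center (uk, vk) and per mirrored center.
def gaussian_notch_reject(M, N, centers, D0=10.0):
    u = np.arange(M).reshape(-1, 1)        # broadcast over rows
    v = np.arange(N).reshape(1, -1)        # and over columns
    H = np.ones((M, N))
    for uk, vk in centers:                 # each notch and its mirror
        for s in (+1, -1):
            Dk2 = (u - M/2 - s*uk)**2 + (v - N/2 - s*vk)**2
            H *= 1.0 - np.exp(-Dk2 / (2 * D0**2))
    return H

H = gaussian_notch_reject(256, 256, [(30, 40), (-20, 50), (60, 0)])
print(H.min(), H.max())                    # dips to ~0 at the six notch centers
```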
Problem 5.14
We proceed as follows:

$F(u,v) = \int_{-\infty}^{\infty}\!\int_{-\infty}^{\infty} f(x,y)\, e^{-j2\pi(ux + vy)}\, dx\, dy = \int_{-\infty}^{\infty}\!\int_{-\infty}^{\infty} A\sin(u_0x + v_0y)\, e^{-j2\pi(ux + vy)}\, dx\, dy$.

Using the exponential definition of the sine function,

$\sin\theta = \frac{1}{2j}\left(e^{j\theta} - e^{-j\theta}\right)$

gives us

$F(u,v) = \frac{-jA}{2}\int_{-\infty}^{\infty}\!\int_{-\infty}^{\infty}\left[e^{j2\pi(u_0x/2\pi + v_0y/2\pi)} - e^{-j2\pi(u_0x/2\pi + v_0y/2\pi)}\right] e^{-j2\pi(ux + vy)}\, dx\, dy$.

These are the Fourier transforms of the functions $1\times e^{j2\pi(u_0x/2\pi + v_0y/2\pi)}$ and $1\times e^{-j2\pi(u_0x/2\pi + v_0y/2\pi)}$, respectively. The Fourier transform of the 1 gives an impulse at the origin, and the exponentials shift the origin of the impulse, as discussed in Section 4.6.3 and Table 4.3. Thus,

$F(u,v) = \frac{-jA}{2}\left[\delta\!\left(u - \frac{u_0}{2\pi},\, v - \frac{v_0}{2\pi}\right) - \delta\!\left(u + \frac{u_0}{2\pi},\, v + \frac{v_0}{2\pi}\right)\right]$.

Problem 5.15
From Eq. (5.4-19),

$\sigma^2 = \frac{1}{(2a+1)(2b+1)}\sum\left\{\left[g(\gamma) - w\eta(\gamma)\right] - \left[\bar g - w\bar\eta\right]\right\}^2$

where "$\gamma$" indicates terms affected by the summations. Letting $K = 1/(2a+1)(2b+1)$, taking the partial derivative of $\sigma^2$ with respect to $w$, and setting the result equal to zero gives

$\frac{\partial\sigma^2}{\partial w} = K\sum 2\left\{\left[g(\gamma) - w\eta(\gamma)\right] - \bar g + w\bar\eta\right\}\left[-\eta(\gamma) + \bar\eta\right] = 0$.

Expanding, and using relations such as

$\frac{1}{(2a+1)(2b+1)}\sum g(\gamma)\eta(\gamma) = \overline{g\eta}$,

the terms collapse to

$-\overline{g\eta} + \bar g\bar\eta + w\left(\overline{\eta^2} - \bar\eta^2\right) = 0$.

Solving for $w$ gives us

$w = \frac{\overline{g\eta} - \bar g\bar\eta}{\overline{\eta^2} - \bar\eta^2}$.

Finally, inserting the variables $x$ and $y$,

$w(x,y) = \frac{\overline{g(x,y)\eta(x,y)} - \bar g(x,y)\bar\eta(x,y)}{\overline{\eta^2}(x,y) - \bar\eta^2(x,y)}$

which agrees with Eq. (5.4-21).

Problem 5.16
From Eq. (5.5-13),

$g(x,y) = \int_{-\infty}^{\infty}\!\int_{-\infty}^{\infty} f(\alpha,\beta)\, h(x - \alpha, y - \beta)\, d\alpha\, d\beta$.

It is given that $f(x,y) = \delta(x - a)$, so $f(\alpha,\beta) = \delta(\alpha - a)$. Then, using the impulse response given in the problem statement,

$g(x,y) = \int\!\int \delta(\alpha - a)\, e^{-\left[(x-\alpha)^2 + (y-\beta)^2\right]}\, d\alpha\, d\beta = \int \delta(\alpha - a)\, e^{-(x-\alpha)^2} d\alpha \int e^{-(y-\beta)^2} d\beta = e^{-(x-a)^2}\int_{-\infty}^{\infty} e^{-(y-\beta)^2}\, d\beta$

where we used the fact that the integral of the impulse is nonzero only when $\alpha = a$. Next, we note that

$\int_{-\infty}^{\infty} e^{-(y-\beta)^2}\, d\beta = \int_{-\infty}^{\infty} e^{-(\beta - y)^2}\, d\beta$

which is in the form of a constant times a Gaussian density with variance $\sigma^2 = 1/2$, or standard deviation $\sigma = 1/\sqrt 2$. In other words,

$e^{-(\beta - y)^2} = \sqrt{2\pi(1/2)}\left[\frac{1}{\sqrt{2\pi(1/2)}}\, e^{-\frac{1}{2}\frac{(\beta - y)^2}{(1/2)}}\right]$.

The integral from minus to plus infinity of the quantity inside the brackets is 1, so

$g(x,y) = \sqrt{\pi}\, e^{-(x-a)^2}$

which is a blurred version of the original image.

Problem 5.17
Following the image coordinate convention in the book, vertical motion is in the $x$-direction and horizontal motion is in the $y$-direction. Then the components of motion are as follows:

$x_0(t) = \begin{cases} at/T_1 & 0 \le t \le T_1 \\ a & T_1 < t \le T_1 + T_2 \end{cases}$

and

$y_0(t) = \begin{cases} 0 & 0 \le t \le T_1 \\ b(t - T_1)/T_2 & T_1 < t \le T_1 + T_2. \end{cases}$

Then, substituting these components of motion into Eq. (5.6-8) yields

$H(u,v) = \int_0^{T_1} e^{-j2\pi uat/T_1}\, dt + \int_{T_1}^{T_1+T_2} e^{-j2\pi\left[ua + vb(t - T_1)/T_2\right]}\, dt$

$= \frac{T_1}{\pi ua}\sin(\pi ua)\, e^{-j\pi ua} + e^{-j2\pi ua}\int_0^{T_2} e^{-j2\pi vb\tau/T_2}\, d\tau$

$= \frac{T_1}{\pi ua}\sin(\pi ua)\, e^{-j\pi ua} + e^{-j2\pi ua}\,\frac{T_2}{\pi vb}\sin(\pi vb)\, e^{-j\pi vb}$

where in the third line we made the change of variables $\tau = t - T_1$. The blurred image is then

$g(x,y) = \Im^{-1}\left[H(u,v)F(u,v)\right]$

where $F(u,v)$ is the Fourier transform of the input image.

Problem 5.18
Following the procedure in Section 5.6.3,

$H(u,v) = \int_0^T e^{-j2\pi ux_0(t)}\, dt = \int_0^T e^{-j2\pi u\left[(1/2)at^2\right]}\, dt = \int_0^T \left[\cos(\pi uat^2) - j\sin(\pi uat^2)\right] dt = \sqrt{\frac{\pi}{2}}\frac{1}{\sqrt{\pi ua}}\left[C\!\left(\sqrt{\pi ua}\, T\right) - j\,S\!\left(\sqrt{\pi ua}\, T\right)\right]$

where

$C(z) = \sqrt{\frac{2}{\pi}}\int_0^z \cos t^2\, dt$ and $S(z) = \sqrt{\frac{2}{\pi}}\int_0^z \sin t^2\, dt$.

These are Fresnel cosine and sine integrals. They can be found, for example, in the Handbook of Mathematical Functions by Abramowitz and Stegun, or other similar references.

Problem 5.19
A basic approach for restoring a rotationally blurred image is to convert the image from rectangular to polar coordinates. The blur will then appear as one-dimensional uniform motion blur along the $\theta$-axis. Any of the techniques discussed in this chapter for handling uniform blur along one dimension can then be applied to the problem. The image is then converted back to rectangular coordinates after restoration. The mathematical solution is simple. For any pixel with rectangular coordinates $(x,y)$ we generate a corresponding pixel with polar coordinates $(r,\theta)$, where

$r = \sqrt{x^2 + y^2}$ and $\theta = \tan^{-1}\left(\frac{y}{x}\right)$.

A display of the resulting image would show an image that is blurred along the $\theta$-axis and would, in addition, appear distorted due to the coordinate conversion. Because the extent of the rotational blur is known (it is given as $\pi/8$ radians), we can use the same solution we used for uniform linear motion (Section 5.6.3), with $x = \theta$ and $y = r$, to obtain the transfer function. Any of the methods in Sections 5.7 through 5.9 then become applicable.
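The coordinate conversion in Problem 5.19 can be sketched with a simple inverse mapping. The following Python code is not part of the original solution; the sampling densities and the use of bilinear interpolation are illustrative choices.

```python
import numpy as np
from scipy.ndimage import map_coordinates

# A minimal sketch of the rectangular-to-polar conversion of Problem 5.19:
# rows of the output index radius r, columns index angle theta, so rotational
# blur becomes 1-D blur along the theta axis.
def to_polar(img, n_r=None, n_theta=360):
    M, N = img.shape
    cx, cy = (M - 1) / 2.0, (N - 1) / 2.0
    r_max = min(cx, cy)
    n_r = n_r or int(r_max)
    r = np.linspace(0, r_max, n_r).reshape(-1, 1)
    theta = np.linspace(0, 2 * np.pi, n_theta, endpoint=False).reshape(1, -1)
    x = cx + r * np.cos(theta)            # sample locations back in the
    y = cy + r * np.sin(theta)            # rectangular grid
    return map_coordinates(img, [x, y], order=1)

polar = to_polar(np.random.rand(256, 256))
print(polar.shape)                        # (n_r, n_theta)
```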
Problem 5.20
Measure the average value of the background. Set all pixels in the image, except the cross hairs, to that intensity value. Denote the Fourier transform of this image by $G(u,v)$. Because the characteristics of the cross hairs are given with a high degree of accuracy, we can construct an image of the background (of the same size) using the background intensity levels determined previously. We then construct a model of the cross hairs in the correct location (determined from the given image) using the dimensions provided and the intensity level of the cross hairs. Denote by $F(u,v)$ the Fourier transform of this new image. The ratio $G(u,v)/F(u,v)$ is an estimate of the blurring function $H(u,v)$. In the likely event of vanishing values in $F(u,v)$, we can construct a radially limited filter using the method discussed in connection with Fig. 5.27. Because we know $F(u,v)$ and $G(u,v)$, and an estimate of $H(u,v)$, we can refine our estimate of the blurring function by substituting $G$ and $H$ in Eq. (5.8-3) and adjusting $K$ to get as close as possible to a good result for $F(u,v)$ (the result can be evaluated visually by taking the inverse Fourier transform). The resulting filter in either case can then be used to deblur the image of the heart, if desired.

Problem 5.21
The key to solving this problem is to recognize that the given function,

$h(x,y) = \frac{x^2 + y^2 - 2\sigma^2}{\sigma^4}\, e^{-\frac{x^2 + y^2}{2\sigma^2}}$,

is the second derivative (Laplacian) of the function (see Section 3.6.2 regarding the Laplacian)

$s(x,y) = e^{-\frac{x^2 + y^2}{2\sigma^2}}$.

That is,

$\nabla^2\left[s(x,y)\right] = \frac{\partial^2 s(x,y)}{\partial x^2} + \frac{\partial^2 s(x,y)}{\partial y^2} = \frac{x^2 + y^2 - 2\sigma^2}{\sigma^4}\, e^{-\frac{x^2 + y^2}{2\sigma^2}}$.

(This result is derived in Section 10.2.6.) So it follows that

$H(u,v) = \Im\left[h(x,y)\right] = \Im\left[\nabla^2 s(x,y)\right]$.

But we know from the statement of Problem 4.26(a) that

$\Im\left[\nabla^2 s(x,y)\right] = -4\pi^2(u^2 + v^2)\, F(u,v)$

where

$F(u,v) = \Im\left[s(x,y)\right] = \Im\left[e^{-\frac{x^2 + y^2}{2\sigma^2}}\right]$.

Therefore, we have reduced the problem to computing the Fourier transform of a Gaussian function. From the basic form of the Gaussian Fourier transform pair given in entry 13 of Table 4.3 [note that $(x,y)$ and $(u,v)$ in the present problem are the reverse of the entry in the table], we have

$\Im\left[e^{-\frac{x^2 + y^2}{2\sigma^2}}\right] = 2\pi\sigma^2\, e^{-2\pi^2\sigma^2(u^2 + v^2)}$

so we have the final result

$H(u,v) = -4\pi^2(u^2 + v^2)\left[2\pi\sigma^2\, e^{-2\pi^2\sigma^2(u^2 + v^2)}\right] = -8\pi^3\sigma^2(u^2 + v^2)\, e^{-2\pi^2\sigma^2(u^2 + v^2)}$

as desired. Keep in mind that the preceding derivations are based on assuming continuous variables. A discrete filter is obtained by sampling the continuous function.

Problem 5.22
This is a simple plug-in problem. Its purpose is to gain familiarity with the various terms of the Wiener filter. From Eq. (5.8-3),

$H_W(u,v) = \left[\frac{1}{H(u,v)}\right]\frac{|H(u,v)|^2}{|H(u,v)|^2 + K}$

where

$|H(u,v)|^2 = H^*(u,v)H(u,v) = H^2(u,v) = 64\pi^6\sigma^4(u^2 + v^2)^2\, e^{-4\pi^2\sigma^2(u^2 + v^2)}$.

Then,

$H_W(u,v) = \frac{-8\pi^3\sigma^2(u^2 + v^2)\, e^{-2\pi^2\sigma^2(u^2 + v^2)}}{64\pi^6\sigma^4(u^2 + v^2)^2\, e^{-4\pi^2\sigma^2(u^2 + v^2)} + K}$.

Problem 5.23
This also is a simple plug-in problem, whose purpose is the same as the previous problem. From Eq. (5.9-4),

$H_C(u,v) = \frac{H^*(u,v)}{|H(u,v)|^2 + \gamma|P(u,v)|^2} = \frac{-8\pi^3\sigma^2(u^2 + v^2)\, e^{-2\pi^2\sigma^2(u^2 + v^2)}}{64\pi^6\sigma^4(u^2 + v^2)^2\, e^{-4\pi^2\sigma^2(u^2 + v^2)} + \gamma|P(u,v)|^2}$

where $P(u,v)$ is the Fourier transform of the Laplacian operator [Eq. (5.9-5)]. This is as far as we can reasonably carry this problem. It is worthwhile pointing out to students that a filter in the frequency domain for the Laplacian operator is discussed in Section 4.9.4 [see Eq. (4.9-5)]. However, substituting that solution for $P(u,v)$ here would only increase the number of terms in the filter and would not help in simplifying the expression.

Problem 5.24
Because the system is assumed linear and position invariant, it follows that Eq. (5.5-17) holds. Furthermore, we can use superposition and obtain the response of the system first to $F(u,v)$ and then to $N(u,v)$, because we know that the image and noise are uncorrelated. The sum of the two individual responses then gives the complete response. First, using only $F(u,v)$,

$G_1(u,v) = H(u,v)F(u,v)$ and $|G_1(u,v)|^2 = |H(u,v)|^2|F(u,v)|^2$.

Then, using only $N(u,v)$,

$G_2(u,v) = N(u,v)$ and $|G_2(u,v)|^2 = |N(u,v)|^2$

so that

$|G(u,v)|^2 = |G_1(u,v)|^2 + |G_2(u,v)|^2 = |H(u,v)|^2|F(u,v)|^2 + |N(u,v)|^2$.

Problem 5.25
(a) It is given that

$\left|\hat F(u,v)\right|^2 = |R(u,v)|^2\, |G(u,v)|^2$.

From Problem 5.24 (recall that the image and noise are assumed to be uncorrelated),

$\left|\hat F(u,v)\right|^2 = |R(u,v)|^2\left[|H(u,v)|^2|F(u,v)|^2 + |N(u,v)|^2\right]$.

Forcing $\left|\hat F(u,v)\right|^2$ to equal $|F(u,v)|^2$ gives

$R(u,v) = \left[\frac{|F(u,v)|^2}{|H(u,v)|^2|F(u,v)|^2 + |N(u,v)|^2}\right]^{1/2}$.

(b)

$\hat F(u,v) = R(u,v)G(u,v) = \left[\frac{|F(u,v)|^2}{|H(u,v)|^2|F(u,v)|^2 + |N(u,v)|^2}\right]^{1/2} G(u,v) = \left[\frac{1}{|H(u,v)|^2 + \dfrac{|N(u,v)|^2}{|F(u,v)|^2}}\right]^{1/2} G(u,v)$

and, because $|F(u,v)|^2 = S_f(u,v)$ and $|N(u,v)|^2 = S_\eta(u,v)$,

$\hat F(u,v) = \left[\frac{1}{|H(u,v)|^2 + \dfrac{S_\eta(u,v)}{S_f(u,v)}}\right]^{1/2} G(u,v)$.
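The parametric Wiener filter of Eq. (5.8-3), which appears throughout Problems 5.20, 5.22, and 5.26, is easy to exercise numerically. The following Python sketch is not part of the original solution; the blur transfer function and the value of $K$ are illustrative choices.

```python
import numpy as np

# A minimal sketch of Eq. (5.8-3): Fhat = [ (1/H) |H|^2 / (|H|^2 + K) ] G,
# written as conj(H) G / (|H|^2 + K) to avoid dividing by small H.
def wiener_restore(g, H, K=0.01):
    G = np.fft.fft2(g)
    Fhat = np.conj(H) * G / (np.abs(H)**2 + K)
    return np.real(np.fft.ifft2(Fhat))

# Illustrative use: blur a test image with a known H, then restore it.
M = N = 128
f = np.zeros((M, N)); f[40:90, 40:90] = 1.0
u = np.fft.fftfreq(M).reshape(-1, 1); v = np.fft.fftfreq(N).reshape(1, -1)
H = np.exp(-200 * (u**2 + v**2))           # arbitrary Gaussian blur OTF
g = np.real(np.fft.ifft2(H * np.fft.fft2(f)))
print(np.abs(wiener_restore(g, H, K=1e-4) - f).mean())   # small residual
```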
Problem 5.26
One possible solution: (1) Perform image averaging to reduce noise. (2) Obtain a blurred image of a bright, single star to simulate an impulse (the star should be as small as possible in the field of view of the telescope to simulate an impulse as closely as possible). (3) The Fourier transform of this image will give $H(u,v)$. (4) Use a Wiener filter and vary $K$ until the sharpest image possible is obtained.

Problem 5.27
The basic idea behind this problem is to use the camera and representative coins to model the degradation process, and then utilize the results in an inverse filter operation. The principal steps are as follows:

1. Select coins as close as possible in size and content to the lost coins. Select a background that approximates the texture and brightness of the photos of the lost coins.

2. Set up the museum photographic camera in a geometry as close as possible to the one that gave the images of the lost coins (this includes paying attention to illumination). Obtain a few test photos. To simplify experimentation, obtain a TV camera capable of giving images that resemble the test photos. This can be done by connecting the camera to an image processing system and generating digital images, which will be used in the experiment.

3. Obtain sets of images of each coin with different lens settings. The resulting images should approximate the aspect angle, size (in relation to the area occupied by the background), and blur of the photos of the lost coins.

4. The lens setting for each image in (3) is a model of the blurring process for the corresponding image of a lost coin. For each such setting, remove the coin and background and replace them with a small, bright dot on a uniform background, or other mechanism to approximate an impulse of light. Digitize the impulse. Its Fourier transform is the transfer function of the blurring process.

5. Digitize each (blurred) photo of a lost coin and obtain its Fourier transform. At this point, we have $H(u,v)$ and $G(u,v)$ for each coin.

6. Obtain an approximation to $F(u,v)$ by using a Wiener filter. Equation (5.8-3) is particularly attractive because it gives an additional degree of freedom ($K$) for experimenting.

7. The inverse Fourier transform of each approximation $\hat F(u,v)$ gives the restored image for a coin.

In general, several experimental passes of these basic steps with various different settings and parameters are required to obtain acceptable results in a problem such as this.

Problem 5.28
The solutions are shown in Fig. P5.28. In each figure the horizontal axis is $\rho$ and the vertical axis is $\theta$, with $\theta = 0°$ at the bottom and going up to $180°$. In (b) the fat lobes occur at $45°$ and the single point of intersection is at $135°$. The intensity at that point is double the intensity of all other points.

Figure P5.28

Problem 5.29
Because $f(x,y)$ is rotationally symmetric, its projections are the same for all angles, so all we have to do is obtain the projection for $\theta = 0°$. Equation (5.11-3) then becomes

$g(\rho,\theta) = \int_{-\infty}^{\infty}\!\int_{-\infty}^{\infty} f(x,y)\,\delta(x - \rho)\, dx\, dy = \int_{-\infty}^{\infty} f(\rho, y)\, dy = A\int_{-\infty}^{\infty} e^{-\rho^2 - y^2}\, dy = A e^{-\rho^2}\int_{-\infty}^{\infty} e^{-y^2}\, dy$.

Because

$\frac{1}{\sqrt{2\pi}\,\sigma}\int_{-\infty}^{\infty} e^{-z^2/2\sigma^2}\, dz = 1$

it follows by letting $\sigma^2 = 1/2$ in this equation that

$\frac{1}{\sqrt{\pi}}\int_{-\infty}^{\infty} e^{-z^2}\, dz = 1$.

Then

$\int_{-\infty}^{\infty} e^{-y^2}\, dy = \sqrt{\pi}$

and

$g(\rho,\theta) = A\sqrt{\pi}\, e^{-\rho^2}$.

Problem 5.30
(a) From Eq. (5.11-3),

$g(\rho,\theta) = \int_{-\infty}^{\infty}\!\int_{-\infty}^{\infty} f(x,y)\,\delta(x\cos\theta + y\sin\theta - \rho)\, dx\, dy = \int\!\int \delta(x,y)\,\delta(x\cos\theta + y\sin\theta - \rho)\, dx\, dy = \int\!\int 1\times\delta(0 - \rho)\, dx\, dy = \begin{cases} 1 & \text{if } \rho = 0 \\ 0 & \text{otherwise} \end{cases}$

where the third step follows from the fact that $\delta(x,y)$ is zero if $x$ and/or $y$ are not zero.

(b) Similarly, substituting into Eq. (5.11-3),

$g(\rho,\theta) = \int\!\int \delta(x - x_0, y - y_0)\,\delta(x\cos\theta + y\sin\theta - \rho)\, dx\, dy = \int\!\int 1\times\delta(x_0\cos\theta + y_0\sin\theta - \rho)\, dx\, dy$.

From the definition of the impulse, this result is 0 unless

$\rho = x_0\cos\theta + y_0\sin\theta$

which is the equation of a sinusoidal curve in the $\rho\theta$-plane.
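The sinusoidal trace predicted by Problem 5.30(b) can be visualized with a brute-force discrete Radon transform. The following Python sketch is not part of the original solution; the rotate-and-sum projection and the sign conventions are illustrative, so the observed trace may be a shifted or reflected version of $\rho = x_0\cos\theta + y_0\sin\theta$.

```python
import numpy as np
from scipy.ndimage import rotate

# A minimal sketch: projections of an off-center impulse at successive
# angles trace a sinusoid in the rho-theta plane (a "sinogram").
N = 101
f = np.zeros((N, N)); f[30, 70] = 1.0           # impulse away from the center
angles = np.arange(0, 180, 2)
sinogram = np.stack([
    rotate(f, -a, reshape=False, order=1).sum(axis=0)   # projection at angle a
    for a in angles
])
peaks = sinogram.argmax(axis=1) - N // 2        # rho of the peak per angle
print(peaks)                                    # varies sinusoidally with angle
```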
Problem 5.31
(a) From Section 2.6, we know that an operator $O$ is linear if $O(af_1 + bf_2) = aO(f_1) + bO(f_2)$. From the definition of the Radon transform in Eq. (5.11-3),

$O(af_1 + bf_2) = \int_{-\infty}^{\infty}\!\int_{-\infty}^{\infty}(af_1 + bf_2)\,\delta(x\cos\theta + y\sin\theta - \rho)\, dx\, dy$

$= a\int\!\int f_1\,\delta(x\cos\theta + y\sin\theta - \rho)\, dx\, dy + b\int\!\int f_2\,\delta(x\cos\theta + y\sin\theta - \rho)\, dx\, dy = aO(f_1) + bO(f_2)$

thus showing that the Radon transform is a linear operation.

(b) Let $p = x - x_0$ and $q = y - y_0$. Then $dp = dx$ and $dq = dy$. From Eq. (5.11-3), the Radon transform of $f(x - x_0, y - y_0)$ is

$g(\rho,\theta) = \int\!\int f(x - x_0, y - y_0)\,\delta(x\cos\theta + y\sin\theta - \rho)\, dx\, dy$

$= \int\!\int f(p,q)\,\delta\left[(p + x_0)\cos\theta + (q + y_0)\sin\theta - \rho\right] dp\, dq$

$= \int\!\int f(p,q)\,\delta\left[p\cos\theta + q\sin\theta - (\rho - x_0\cos\theta - y_0\sin\theta)\right] dp\, dq$

$= g(\rho - x_0\cos\theta - y_0\sin\theta,\ \theta)$.

(c) From Chapter 4 (Problem 4.11), we know that the convolution of two functions $f$ and $h$ is defined as

$c(x,y) = f(x,y) \star h(x,y) = \int_{-\infty}^{\infty}\!\int_{-\infty}^{\infty} f(\alpha,\beta)\, h(x - \alpha, y - \beta)\, d\alpha\, d\beta$.

We want to show that $\Re\{c\} = \Re\{f\} \star \Re\{h\}$, where $\Re$ denotes the Radon transform. We do this by substituting the convolution expression into Eq. (5.11-3). That is,

$\Re\{c\} = \int\!\int\left[\int\!\int f(\alpha,\beta)\, h(x - \alpha, y - \beta)\, d\alpha\, d\beta\right]\delta(x\cos\theta + y\sin\theta - \rho)\, dx\, dy$

$= \int_\alpha\!\int_\beta f(\alpha,\beta)\left[\int_x\!\int_y h(x - \alpha, y - \beta)\,\delta(x\cos\theta + y\sin\theta - \rho)\, dx\, dy\right] d\alpha\, d\beta$

where we used the subscripts in the integrals for clarity between the integrals and their variables. All integrals are understood to be between $-\infty$ and $\infty$. Working with the integrals inside the brackets, with $x' = x - \alpha$ and $y' = y - \beta$, we have

$\int_x\!\int_y h(x - \alpha, y - \beta)\,\delta(x\cos\theta + y\sin\theta - \rho)\, dx\, dy = \int_{x'}\!\int_{y'} h(x', y')\,\delta\left(x'\cos\theta + y'\sin\theta - \left[\rho - \alpha\cos\theta - \beta\sin\theta\right]\right) dx'\, dy' = \Re\{h\}(\rho - \alpha\cos\theta - \beta\sin\theta,\ \theta)$.

We recognize the second integral as the Radon transform of $h$, but instead of being with respect to $\rho$ and $\theta$, it is a function of $\rho - \alpha\cos\theta - \beta\sin\theta$ and $\theta$. The notation in the last line is used to indicate "the Radon transform of $h$ as a function of $\rho - \alpha\cos\theta - \beta\sin\theta$ and $\theta$." Then,

$\Re\{c\} = \int_\alpha\!\int_\beta f(\alpha,\beta)\,\Re\{h\}(\rho - \rho',\theta)\, d\alpha\, d\beta$

where $\rho' = \alpha\cos\theta + \beta\sin\theta$. Based on the properties of the impulse, we can write

$\Re\{h\}(\rho - \rho',\theta) = \int_{\rho'}\Re\{h\}(\rho - \rho',\theta)\,\delta(\alpha\cos\theta + \beta\sin\theta - \rho')\, d\rho'$.

Then,

$\Re\{c\} = \int_\alpha\!\int_\beta f(\alpha,\beta)\,\Re\{h\}(\rho - \rho',\theta)\, d\alpha\, d\beta$

$= \int_\alpha\!\int_\beta f(\alpha,\beta)\left[\int_{\rho'}\Re\{h\}(\rho - \rho',\theta)\,\delta(\alpha\cos\theta + \beta\sin\theta - \rho')\, d\rho'\right] d\alpha\, d\beta$

$= \int_{\rho'}\Re\{h\}(\rho - \rho',\theta)\left[\int_\alpha\!\int_\beta f(\alpha,\beta)\,\delta(\alpha\cos\theta + \beta\sin\theta - \rho')\, d\alpha\, d\beta\right] d\rho'$

$= \int_{\rho'}\Re\{h\}(\rho - \rho',\theta)\,\Re\{f\}(\rho',\theta)\, d\rho' = \Re\{f\} \star \Re\{h\}$

where the fourth step follows from the definition of the Radon transform and the fifth step follows from the definition of convolution. This completes the proof.

Problem 5.32
The solution is as follows:

$f(x,y) = \int_0^{2\pi}\!\int_0^{\infty} G(\omega,\theta)\, e^{j2\pi\omega(x\cos\theta + y\sin\theta)}\,\omega\, d\omega\, d\theta$

$= \int_0^{\pi}\!\int_0^{\infty} G(\omega,\theta)\, e^{j2\pi\omega(x\cos\theta + y\sin\theta)}\,\omega\, d\omega\, d\theta + \int_0^{\pi}\!\int_0^{\infty} G(\omega,\theta + 180°)\, e^{j2\pi\omega(x\cos[\theta + 180°] + y\sin[\theta + 180°])}\,\omega\, d\omega\, d\theta$.

But $G(\omega,\theta + 180°) = G(-\omega,\theta)$, so the preceding equation can be expressed as

$f(x,y) = \int_0^{\pi}\!\int_{-\infty}^{\infty} |\omega|\, G(\omega,\theta)\, e^{j2\pi\omega(x\cos\theta + y\sin\theta)}\, d\omega\, d\theta$

which agrees with Eq. (5.11-15).

Problem 5.33
The argument of function $s$ in Eq. (5.11-24) may be written as

$r\cos(\beta + \alpha - \varphi) - D\sin\alpha = r\cos(\beta - \varphi)\cos\alpha - \left[r\sin(\beta - \varphi) + D\right]\sin\alpha$.

From Fig. 5.47,

$R'\cos\alpha' = D + r\sin(\beta - \varphi)$ and $R'\sin\alpha' = r\cos(\beta - \varphi)$.
Then, substituting into the earlier expression,

$r\cos(\beta + \alpha - \varphi) - D\sin\alpha = R'\sin\alpha'\cos\alpha - R'\cos\alpha'\sin\alpha = R'\left(\sin\alpha'\cos\alpha - \cos\alpha'\sin\alpha\right) = R'\sin(\alpha' - \alpha)$

which agrees with Eq. (5.11-25).

Problem 5.34
From the explanation of Eq. (5.11-18),

$s(\rho) = \int_{-\infty}^{\infty} |\omega|\, e^{j2\pi\omega\rho}\, d\omega$.

Let $\rho = R\sin\alpha$, and keep in mind that $\alpha/R\sin\alpha$ is always positive. Then,

$s(R\sin\alpha) = \int_{-\infty}^{\infty} |\omega|\, e^{j2\pi\omega R\sin\alpha}\, d\omega$.

Next, define the transformation

$\omega' = \frac{\omega R\sin\alpha}{\alpha}$.

Then $d\omega' = \frac{R\sin\alpha}{\alpha}\, d\omega$, and we can write

$s(R\sin\alpha) = \left(\frac{\alpha}{R\sin\alpha}\right)^2 \int_{-\infty}^{\infty} |\omega'|\, e^{j2\pi\omega'\alpha}\, d\omega' = \left(\frac{\alpha}{R\sin\alpha}\right)^2 s(\alpha)$

as desired.

Chapter 6
Problem Solutions

Problem 6.1
From Fig. 6.5 in the book, $x = 0.43$ and $y = 0.4$. Since $x + y + z = 1$, it follows that $z = 0.17$. These are the trichromatic coefficients. We are interested in the tristimulus values $X$, $Y$, and $Z$, which are related to the trichromatic coefficients by Eqs. (6.1-1) through (6.1-3). Note, however, that all the tristimulus coefficients are divided by the same constant, so their percentages relative to the trichromatic coefficients are the same as those of the coefficients. Therefore, the answer is $X = 0.43$, $Y = 0.40$, and $Z = 0.17$.

Problem 6.2
Denote by $c$ the given color, and let its coordinates be denoted by $(x_0, y_0)$. The distance between $c$ and $c_1$ is

$d(c, c_1) = \left[(x_0 - x_1)^2 + (y_0 - y_1)^2\right]^{1/2}$.

Similarly, the distance between $c_1$ and $c_2$ is

$d(c_1, c_2) = \left[(x_1 - x_2)^2 + (y_1 - y_2)^2\right]^{1/2}$.

The percentage $p_1$ of $c_1$ in $c$ is

$p_1 = \frac{d(c_1, c_2) - d(c, c_1)}{d(c_1, c_2)} \times 100$.

The percentage $p_2$ of $c_2$ is simply $p_2 = 100 - p_1$. In the preceding equation we see, for example, that when $c = c_1$, then $d(c, c_1) = 0$ and it follows that $p_1 = 100\%$ and $p_2 = 0\%$. Similarly, when $d(c, c_1) = d(c_1, c_2)$, it follows that $p_1 = 0\%$ and $p_2 = 100\%$. Values in between are easily seen to follow from these simple relations.
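The percentages in Problem 6.2 reduce to two distance computations. The following Python sketch is not part of the original solution; the chromaticity coordinates are illustrative.

```python
import numpy as np

# A minimal sketch of Problem 6.2: mixture percentages of c1 and c2 for a
# color c lying on the line segment joining them.
def mixture_percentages(c, c1, c2):
    d   = np.linalg.norm(np.subtract(c,  c1))   # d(c, c1)
    d12 = np.linalg.norm(np.subtract(c1, c2))   # d(c1, c2)
    p1 = 100.0 * (d12 - d) / d12
    return p1, 100.0 - p1

print(mixture_percentages((0.32, 0.40), (0.30, 0.40), (0.35, 0.40)))
# a point 40% of the way from c1 to c2 -> (60.0, 40.0)
```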
Problem 6.3
Consider Fig. P6.3, in which $c_1$, $c_2$, and $c_3$ are the given vertices of the color triangle and $c$ is an arbitrary color point contained within the triangle or on its boundary. The key to solving this problem is to realize that any color on the border of the triangle is made up of proportions from the two vertices defining the line segment that contains the point. The contribution to a point on the line by the color vertex opposite this line is 0%.

Figure P6.3

The line segment connecting points $c_3$ and $c$ is shown extended (dashed segment) until it intersects the line segment connecting $c_1$ and $c_2$. The point of intersection is denoted $c_0$. Because we have the values of $c_1$ and $c_2$, if we knew $c_0$ we could compute the percentages of $c_1$ and $c_2$ contained in $c_0$ by using the method described in Problem 6.2. Let the ratio of the content of $c_1$ and $c_2$ in $c_0$ be denoted by $R_{12}$. If we now add color $c_3$ to $c_0$, we know from Problem 6.2 that the point will start to move toward $c_3$ along the line shown. For any position of a point along this line we could determine the percentage of $c_3$ and $c_0$, again by using the method described in Problem 6.2. What is important to keep in mind is that the ratio $R_{12}$ will remain the same for any point along the segment connecting $c_3$ and $c_0$. The color of the points along this line is different for each position, but the ratio of $c_1$ to $c_2$ will remain constant.

So, if we can obtain $c_0$, we can then determine the ratio $R_{12}$ and the percentage of $c_3$ in color $c$. The point $c_0$ is not difficult to obtain. Let $y = a_{12}x + b_{12}$ be the straight line containing points $c_1$ and $c_2$, and $y = a_{3c}x + b_{3c}$ the line containing $c_3$ and $c$. The intersection of these two lines gives the coordinates of $c_0$. The lines can be determined uniquely because we know the coordinates of the two point pairs needed to determine the line coefficients. Solving for the intersection in terms of these coordinates is straightforward, but tedious. Our interest here is in the fundamental method, not the mechanics of manipulating simple equations, so we do not give the details.

At this juncture we have the percentage of $c_3$ and the ratio between $c_1$ and $c_2$. Let the percentages of these three colors composing $c$ be denoted by $p_1$, $p_2$, and $p_3$, respectively. We know that $p_1 + p_2 = 100 - p_3$ and that $p_1/p_2 = R_{12}$, so we can solve for $p_1$ and $p_2$. Finally, note that this problem could have been solved the same way by intersecting one of the other two sides of the triangle. Going to another side would be necessary, for example, if the line we used in the preceding discussion had an infinite slope. A simple test to determine if the color of $c$ is equal to any of the vertices should be the first step in the procedure; in this case no additional calculations would be required.

Problem 6.4
Use color filters that are sharply tuned to the wavelengths of the colors of the three objects. With a specific filter in place, only the objects whose color corresponds to that wavelength will produce a significant response on the monochrome camera. A motorized filter wheel can be used to control filter position from a computer. If one of the colors is white, then the response of the three filters will be approximately equal and high. If one of the colors is black, the response of the three filters will be approximately equal and low.

Problem 6.5
At the center point we have

$\frac{1}{2}R + \frac{1}{2}B + G = \frac{1}{2}(R + G + B) + \frac{1}{2}G = \text{midgray} + \frac{1}{2}G$

which looks to a viewer like pure green with a boost in intensity due to the additive gray component.

Problem 6.6
For the image given, the maximum intensity and saturation requirement means that the RGB component values are 0 or 1. We can create Table P6.6, with 0 and 255 representing black and white, respectively.

Table P6.6
Color     R    G    B    Mono R  Mono G  Mono B
Black     0    0    0       0       0       0
Red       1    0    0     255       0       0
Yellow    1    1    0     255     255       0
Green     0    1    0       0     255       0
Cyan      0    1    1       0     255     255
Blue      0    0    1       0       0     255
Magenta   1    0    1     255       0     255
White     1    1    1     255     255     255
Gray     0.5  0.5  0.5    128     128     128

Thus, we get the monochrome displays shown in Fig. P6.6.

Problem 6.7
There are $2^8 = 256$ possible values in each 8-bit image. For a color to be gray, all RGB components have to be equal, so there are 256 shades of gray.

Problem 6.8
(a) All pixel values in the Red image are 255. In the Green image, the first column is all 0's, the second column all 1's, and so on until the last column, which is composed of all 255's. In the Blue image, the first row is all 255's, the second row all 254's, and so on until the last row, which is composed of all 0's.

(b) Let the axis numbering be the same as in Fig. 6.7 in the book. Then: $(0,0,0)$ = white, $(1,1,1)$ = black, $(1,0,0)$ = cyan, $(1,1,0)$ = blue, $(1,0,1)$ = green, $(0,1,1)$ = red, $(0,0,1)$ = yellow, $(0,1,0)$ = magenta.

(c) The ones that do not contain the black or white point are fully saturated. The others decrease in saturation from the corners toward the black or white point.
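The component images described in Problem 6.8(a) are simple to generate. The following Python sketch is not part of the original solution; the image size of 256×256 is an illustrative assumption.

```python
import numpy as np

# A minimal sketch of the RGB image of Problem 6.8(a): constant red,
# left-to-right green ramp, top-to-bottom blue ramp.
R = np.full((256, 256), 255, dtype=np.uint8)           # all red
G = np.tile(np.arange(256, dtype=np.uint8), (256, 1))  # columns 0..255
B = np.tile(np.arange(255, -1, -1, dtype=np.uint8).reshape(-1, 1), (1, 256))
rgb = np.dstack([R, G, B])                             # rows 255..0
print(rgb[0, 0], rgb[255, 255])                        # [255 0 255], [255 255 0]
```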
Problem 6.9
(a) For the image given, the maximum intensity and saturation requirement means that the RGB component values are 0 or 1. We can create Table P6.9 using Eq. (6.2-1). Thus, we get the monochrome displays shown in Fig. P6.9(a).

Table P6.9
Color     R    G    B    C    M    Y    Mono C  Mono M  Mono Y
Black     0    0    0    1    1    1      255     255     255
Red       1    0    0    0    1    1        0     255     255
Yellow    1    1    0    0    0    1        0       0     255
Green     0    1    0    1    0    1      255       0     255
Cyan      0    1    1    1    0    0      255       0       0
Blue      0    0    1    1    1    0      255     255       0
Magenta   1    0    1    0    1    0        0     255       0
White     1    1    1    0    0    0        0       0       0
Gray     0.5  0.5  0.5  0.5  0.5  0.5     128     128     128

(b) The resulting display is the complement of the starting RGB image. From left to right, the color bars are (in accordance with Fig. 6.32) white, cyan, blue, magenta, red, yellow, green, and black. The middle gray background is unchanged.

Figure P6.9

Problem 6.10
Equation (6.2-1) reveals that each component of the CMY image is a function of a single component of the corresponding RGB image—$C$ is a function of $R$, $M$ of $G$, and $Y$ of $B$. For clarity, we will use a prime to denote the CMY components. From Eq. (6.5-6), we know that

$s_i = k r_i$ for $i = 1, 2, 3$

(for the R, G, and B components). And from Eq. (6.2-1), we know that the CMY components corresponding to $r_i$ and $s_i$ (which we are denoting with primes) are

$r_i' = 1 - r_i$ and $s_i' = 1 - s_i$.

Thus, $r_i = 1 - r_i'$ and

$s_i' = 1 - s_i = 1 - k r_i = 1 - k\left(1 - r_i'\right)$

so that

$s_i' = k r_i' + (1 - k)$.

Problem 6.11
(a) The purest green is 00FF00, which corresponds to cell (7, 18).
(b) The purest blue is 0000FF, which corresponds to cell (12, 13).

Problem 6.12
Using Eqs. (6.2-2) through (6.2-4), we get the results shown in Table P6.12. Note that, in accordance with Eq. (6.2-2), hue is undefined when $R = G = B$, since then $\theta = \cos^{-1}(0/0)$. In addition, saturation is undefined when $R = G = B = 0$, since Eq. (6.2-3) yields $S = 1 - 3\min(0)/(3\times 0) = 1 - (0/0)$. Thus, we get the monochrome display shown in Fig. P6.12.

Table P6.12
Color     R    G    B     H     S     I    Mono H  Mono S  Mono I
Black     0    0    0     –     –     0       –       –       0
Red       1    0    0     0     1    0.33     0     255      85
Yellow    1    1    0    0.17   1    0.67    43     255     170
Green     0    1    0    0.33   1    0.33    85     255      85
Cyan      0    1    1    0.5    1    0.67   128     255     170
Blue      0    0    1    0.67   1    0.33   170     255      85
Magenta   1    0    1    0.83   1    0.67   213     255     170
White     1    1    1     –     0     1       –       0     255
Gray     0.5  0.5  0.5    –     0    0.5      –       0     128

Figure P6.12
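The entries of Table P6.12 follow from Eqs. (6.2-2) through (6.2-4). The following Python sketch is not part of the original solution; it evaluates the conversion for a few of the table's fully saturated colors, with $H$ normalized to $[0, 1]$.

```python
import numpy as np

# A minimal sketch of the RGB-to-HSI equations (6.2-2)-(6.2-4) used to fill
# Table P6.12; the formulas are undefined for R = G = B, as noted in the text.
def rgb_to_hsi(R, G, B):
    I = (R + G + B) / 3.0
    S = 1.0 - 3.0 * min(R, G, B) / (R + G + B)     # undefined if R+G+B == 0
    num = 0.5 * ((R - G) + (R - B))
    den = np.sqrt((R - G)**2 + (R - B) * (G - B))  # undefined if R == G == B
    theta = np.degrees(np.arccos(num / den))
    H = theta if B <= G else 360.0 - theta
    return H / 360.0, S, I

for name, rgb in [("red", (1, 0, 0)), ("yellow", (1, 1, 0)), ("cyan", (0, 1, 1))]:
    print(name, np.round(rgb_to_hsi(*rgb), 2))
# red (0, 1, 0.33), yellow (0.17, 1, 0.67), cyan (0.5, 1, 0.67)
```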
Problem 6.13
With reference to the HSI color circle in Fig. 6.14(b), deep purple is found at approximately 270°. To generate a color rectangle with the properties required in the problem statement, we choose a fixed intensity $I$ and maximum saturation $S$ (these are spectrum colors, which are supposed to be fully saturated). The first column in the rectangle uses these two values and a hue of 270°. The next column (and all subsequent columns) would use the same values of $I$ and $S$, but the hue would be decreased to 269°, and so on, all the way down to a hue of 0°, which corresponds to red. If the image is limited to 8 bits, then we can only have 256 variations of hue in the range from 270° down to 0°, which will require a different uniform spacing than one-degree increments or, alternatively, starting at 255° and proceeding in decrements of one degree, but this would leave out most of the purple. If we have more than eight bits, then the increments can be smaller. Longer strips also can be made by duplicating column values.

Problem 6.14
There are two important aspects to this problem. One is to approach it in the HSI space, and the other is to use polar coordinates to create a hue image whose values grow as a function of angle. The center of the image is the middle of whatever image area is used. Then, for example, the values of the hue image along a radius when the angle is 0° would be all 0's. Then the angle is incremented by, say, one degree, and all the values along that radius would be 1's, and so on. Values of the saturation image decrease linearly in all radial directions from the origin. The intensity image is just a specified constant. With these basics in mind it is not difficult to write a program that generates the desired result.
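The program outlined in Problem 6.14 is only a few lines long. The following Python sketch is not part of the original solution; the image size and the constant intensity value are illustrative choices, and a final HSI-to-RGB conversion would be needed for display.

```python
import numpy as np

# A minimal sketch of Problem 6.14: hue grows with angle about the image
# center, saturation falls off linearly with radius, intensity is constant.
N = 256
y, x = np.mgrid[0:N, 0:N] - (N - 1) / 2.0
H = (np.degrees(np.arctan2(y, x)) % 360.0) / 360.0   # angle -> hue in [0, 1)
r = np.hypot(x, y)
S = np.clip(1.0 - r / r.max(), 0.0, 1.0)             # linear radial falloff
I = np.full((N, N), 0.7)                             # arbitrary constant
hsi = np.dstack([H, S, I])                           # convert to RGB to view
print(hsi.shape)
```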
Problem 6.15
The hue, saturation, and intensity images are shown in Fig. P6.15, from left to right.

Figure P6.15

Problem 6.16
(a) It is given that the colors in Fig. 6.16(a) are primary spectrum colors. It also is given that the gray-level images in the problem statement are 8-bit images. The latter condition means that hue (angle) can only be divided into a maximum of 256 values. Because hue values are represented in the interval from 0° to 360°, this means that for an 8-bit image the increments between contiguous hue values are now 360/255. Another way of looking at this is that the entire [0, 360] hue scale is compressed to the range [0, 255]. Thus, for example, yellow (the first color we encounter), which is 60°, now becomes 43 (the closest integer) in the integer scale of the 8-bit image shown in the problem statement. Similarly, green, which is 120°, becomes 85 in this image. From this we easily compute the values of the other two regions as being 170 and 213. The region in the middle is pure white [equal proportions of red, green, and blue in Fig. 6.61(a)], so its hue by definition is 0. This also is true of the black background.

(b) The colors are spectrum colors, so they are fully saturated. Therefore, the value 255 shown applies to all circle regions. The region in the center of the color image is white, so its saturation is 0.

(c) The key to getting the values in this figure is to realize that the center portion of the color image is white, which means equal intensities of fully saturated red, green, and blue. Therefore, the value of both darker gray regions in the intensity image is 85 (i.e., the same value as the other corresponding region). Similarly, equal proportions of the secondaries yellow, cyan, and magenta produce white, so the two lighter gray regions have the same value (170) as the region shown in the figure. The center of the image is white, so its value is 255.

Problem 6.17
(a) Because the infrared image, which was used in place of the red component image, has very high gray-level values.

(b) The water appears as solid black (0) in the near infrared image [Fig. 6.27(d)]. Threshold the image with a threshold value slightly larger than 0. The result is shown in Fig. P6.17. It is clear that coloring all the black points in the desired shade of blue presents no difficulties.

Figure P6.17

(c) Note that the predominant color of natural terrain is in various shades of red. We already know how to take out the water from (b). Therefore, a method that actually removes the "background" of red and black would leave predominantly the other man-made structures, which appear mostly in a bluish light color. Removal of the red [and the black, if you do not want to use the method as in (b)] can be done by using the technique discussed in Section 6.7.2.

Problem 6.18
Using Eq. (6.2-3), we see that the basic problem is that many different colors have the same saturation value. This was demonstrated in Problem 6.12, where pure red, yellow, green, cyan, blue, and magenta all had a saturation of 1. That is, as long as any one of the RGB components is 0, Eq. (6.2-3) yields a saturation of 1. Consider RGB colors $(1, 0, 0)$ and $(0, 0.59, 0)$, which represent shades of red and green. The HSI triplets for these colors [per Eqs. (6.4-2) through (6.4-4)] are $(0, 1, 0.33)$ and $(0.33, 1, 0.2)$, respectively. Now, the complements of the beginning RGB values (see Section 6.5.2) are $(0, 1, 1)$ and $(1, 0.41, 1)$, respectively; the corresponding colors are cyan and magenta. Their HSI values [per Eqs. (6.4-2) through (6.4-4)] are $(0.5, 1, 0.66)$ and $(0.83, 0.48, 0.8)$, respectively. Thus, for the red, a starting saturation of 1 yielded the cyan "complemented" saturation of 1, while for the green, a starting saturation of 1 yielded the magenta "complemented" saturation of 0.48. That is, the same starting saturation resulted in two different "complemented" saturations. Saturation alone is not enough information to compute the saturation of the complemented color.

Problem 6.19
The complement of a color is the color opposite it on the color circle of Fig. 6.32. The hue component is the angle from red in a counterclockwise direction normalized by 360 degrees. For a color on the top half of the circle (i.e., $0 \le H \le 0.5$), the hue of the complementary color is $H + 0.5$. For a color on the bottom half of the circle (i.e., for $0.5 \le H \le 1$), the hue of the complement is $H - 0.5$.

Problem 6.20
The RGB transformations for a complement [from Fig. 6.33(b)] are

$s_i = 1 - r_i$

where $i = 1, 2, 3$ (for the R, G, and B components). But from the definition of the CMY space in Eq. (6.2-1), we know that the CMY components corresponding to $r_i$ and $s_i$, which we will denote using primes, are

$r_i' = 1 - r_i$ and $s_i' = 1 - s_i$.

Thus, $r_i = 1 - r_i'$ and

$s_i' = 1 - s_i = 1 - (1 - r_i) = 1 - \left[1 - \left(1 - r_i'\right)\right]$

so that

$s_i' = 1 - r_i'$.

Problem 6.21
The RGB transformation should darken the highlights and lighten the shadow areas, effectively compressing all values toward the midtones. The red, green, and blue components should be transformed with the same mapping function so that the colors do not change. The general shape of the curve would be as shown in Fig. P6.21.

Figure P6.21

Problem 6.22
Based on the discussion in Section 6.5.4 and with reference to the color wheel in Fig. 6.32, we can decrease the proportion of yellow by (1) decreasing yellow, (2) increasing blue, (3) increasing cyan and magenta, or (4) decreasing red and green.

Problem 6.23
The $L^*a^*b^*$ components are computed using Eqs. (6.5-9) through (6.5-12). Reference white is $R = G = B = 1$. The computations are best done in a spreadsheet, as shown in Table P6.23.

Problem 6.24
The simplest approach conceptually is to transform every input image to the HSI color space, perform histogram specification per the discussion in Section 3.3.2 on the intensity ($I$) component only (leaving $H$ and $S$ alone), and convert the resulting intensity component, with the original hue and saturation components, back to the starting color space.
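The intensity-only processing of Problem 6.24 can be sketched as follows. This Python code is not part of the original solution; it uses HSV's value channel as a stand-in for HSI intensity, and histogram equalization as a stand-in for full histogram specification.

```python
import numpy as np
from matplotlib.colors import rgb_to_hsv, hsv_to_rgb

# A minimal sketch of Problem 6.24's idea: remap only the intensity-like
# channel, leaving hue and saturation untouched.
def equalize_value_channel(rgb):                 # rgb in [0, 1], shape (M, N, 3)
    hsv = rgb_to_hsv(rgb)
    v = hsv[..., 2]
    hist, bins = np.histogram(v, bins=256, range=(0, 1))
    cdf = hist.cumsum() / v.size                 # the equalizing transformation
    hsv[..., 2] = np.interp(v, bins[:-1], cdf)   # remap the value channel only
    return hsv_to_rgb(hsv)

out = equalize_value_channel(np.random.rand(64, 64, 3))
print(out.shape, out.min(), out.max())
```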
Problem 6.25
The given color image is shown in Fig. P6.25(a). Assume that the component image values of the HSI image are in the range [0, 1]. Call the component images $H$ (hue), $S$ (saturation), and $I$ (intensity).

(a) It is given that the image is fully saturated, so image $S$ will be constant with value 1. Similarly, all the squares are at their maximum value so, from Eq. (6.2-4), the intensity image also will be constant, with value 1/3 [the maximum value of any (normalized) pixel in the RGB image is 1, and it is given that none of the squares overlap]. The hue component image, $H$, is shown in Fig. P6.25(b). Recall from Fig. 6.14 that the value of hue is an angle. Because the range of values of $H$ is normalized to [0, 1], we see from that figure, for example, that as we go around the circle in the counterclockwise direction a hue value of 0 corresponds to red, a value of 1/3 to green, and a value of 2/3 to blue. Thus, the important point to be made in Fig. P6.25(b) is that the gray value in the image corresponding to the red square should be black, the value corresponding to the green square should be a lower-mid gray, and the value of the square corresponding to blue should be a lighter shade of gray than the square corresponding to green. As you can see, this indeed is the case for the squares in Fig. P6.25(b). For the shades of red, green, and blue in Fig. P6.25(a), the exact values are $H = 0$, $H = 0.33$, and $H = 0.67$, respectively.

Figure P6.25

(b) The saturation image is constant, so smoothing it will produce the same constant value.

(c) Figure P6.25(c) shows the result of blurring the hue image. When the averaging mask is fully contained in a square, there is no blurring because the value of each square is constant. When the mask contains portions of two or more squares, the value produced at the center of the mask will be between the values of the two squares, and will depend on the relative proportions of the squares occupied by the mask. To see exactly what the values are, consider a point in the center of the red mask in Fig. P6.25(c) and a point in the center of the green mask on the top left. We know from (a) above that the value of the red point is 0 and the value of the green point is 0.33. Thus, the values in the blurred band between red and green vary from 0 to 0.33, because averaging is a linear operation. Figure P6.25(d) shows the result of generating an RGB image with the blurred hue component image and the original saturation and intensity images. The values along the line just discussed are transitions from green to red. From Fig. 6.14 we see that those transitions encompass the spectrum from green to red that includes colors such as yellow [all those colors are present in Fig. P6.25(d), although they are somewhat difficult to see]. The reason for the diagonal green line in this figure is that the average values along that region are nearly midway between red and blue, which we know from Fig. 6.14 is green.

Problem 6.26
This is a simple problem to encourage the student to think about the meaning of the elements in Eq. (6.7-2). When $\mathbf{C} = \mathbf{I}$, it follows that $\mathbf{C}^{-1} = \mathbf{I}$ and Eq. (6.7-2) becomes

$D(\mathbf{z}, \mathbf{a}) = \left[(\mathbf{z} - \mathbf{a})^T(\mathbf{z} - \mathbf{a})\right]^{1/2}$.

But the term inside the brackets is recognized as the inner product of the vector $(\mathbf{z} - \mathbf{a})$ with itself, which, by definition, is equal to the right side of Eq. (6.7-1).

Problem 6.27
(a) The cube is composed of six intersecting planes in RGB space. The general equation for such planes is

$a z_R + b z_G + c z_B + d = 0$

where $a$, $b$, $c$, and $d$ are parameters and the $z$'s are the components of any point (vector) $\mathbf{z}$ in RGB space lying on the plane. If an RGB point $\mathbf{z}$ does not lie on the plane, and its coordinates are substituted in the preceding equation, the equation will give either a positive or a negative value; it will not yield zero. We say that $\mathbf{z}$ lies on the positive or negative side of the plane, depending on whether the result is positive or negative. We can change the positive side of a plane by multiplying its coefficients (except $d$) by $-1$. Suppose that we test the point $\mathbf{a}$ given in the problem statement to see whether it is on the positive or negative side of each of the six planes composing the box, and change the coefficients of any plane for which the result is negative. Then, $\mathbf{a}$ will lie on the positive side of all planes composing the bounding box. In fact, all points inside the bounding box will yield positive values when their coordinates are substituted in the equations of the planes. Points outside the box will give at least one negative (or zero, if the point is on a plane) value. Thus, the method consists of substituting an unknown color point in the equations of all six planes. If all the results are positive, the point is inside the box; otherwise it is outside the box. A flow diagram is asked for in the problem statement to make it simpler to evaluate the student's line of reasoning.

(b) If the box is lined up with the RGB coordinate axes, then the planes intersect the RGB coordinate planes perpendicularly. The intersections of pairs of parallel planes establish a range of values along each of the RGB axes that must be checked to see if an unknown point lies inside the box or not. This can be done on an image-per-image basis (i.e., the three component images of an RGB image), designating by 1 a coordinate that is within its corresponding range and 0 otherwise. This will produce three binary images which, when ANDed, will give all the points inside the box.
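The axis-aligned test of Problem 6.27(b) is a three-way range check followed by a logical AND. The following Python sketch is not part of the original solution; the image and the box limits are illustrative.

```python
import numpy as np

# A minimal sketch of Problem 6.27(b): range-check each RGB channel, then
# AND the three binary masks to mark points inside the bounding box.
def in_box(rgb, lo, hi):                       # rgb: (M, N, 3); lo, hi: len 3
    masks = [(rgb[..., i] >= lo[i]) & (rgb[..., i] <= hi[i]) for i in range(3)]
    return masks[0] & masks[1] & masks[2]

img = np.random.rand(4, 4, 3)
mask = in_box(img, lo=(0.2, 0.2, 0.2), hi=(0.8, 0.8, 0.8))
print(mask.astype(int))                        # 1 = inside the bounding box
```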
Problem 6.28
The sketch is an elongated ellipsoidal figure in which the length lined up with the Red (R) axis is 8 times longer than the other two dimensions. In other words, the figure looks like a blimp aligned with the R-axis.

Problem 6.29
Set one of the three primary images to a constant value (say, 0), then consider the two images shown in Fig. P6.29. If we formed an RGB composite image by letting the image on the left be the red component and the image on the right the green component, then the result would be an image with a green region on the left separated by a vertical edge from a red region on the right. To compute the gradient of each component image we take first-order partial derivatives. In this case, only the component of the derivative in the horizontal direction is nonzero. If we model the edge as a ramp edge, then a profile of the derivative image would appear as shown in Fig. P6.29 (see Section 10.2.4 for more detail on the profile of edges). The magnified view shows clearly that the derivatives of the two images are mirrors of each other. Thus, if we computed the gradient vector of each image and added the results as suggested in the problem statement, the components of the gradient would cancel out, giving a zero gradient for a color image that has a clearly defined edge between two different color regions. This simple example illustrates that the gradient vector of a color image is not equivalent to the result of forming a color gradient vector from the sum of the gradient vectors of the individual component images.

Figure P6.29
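The cancellation argued in Problem 6.29 is exact and easy to verify. The following Python sketch is not part of the original solution; the ramp profile is an illustrative stand-in for the images in Fig. P6.29.

```python
import numpy as np

# A minimal sketch of the counterexample in Problem 6.29: the per-channel
# horizontal derivatives of a red/green vertical ramp edge are mirror images,
# so their sum is identically zero and the edge "disappears".
ramp = np.clip((np.arange(16) - 6) / 4.0, 0, 1)   # ramp edge profile
red = np.tile(ramp, (16, 1))                      # dark -> bright
green = 1.0 - red                                 # bright -> dark (mirror)
gx_r = np.gradient(red, axis=1)
gx_g = np.gradient(green, axis=1)
print(np.abs(gx_r + gx_g).max())                  # 0.0
```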
Chapter 7
Problem Solutions

Problem 7.1
Following the explanation in Example 7.1, the decoder is as shown in Fig. P7.1.

Figure P7.1 (the level $j-1$ approximation is upsampled by 2, passed through the interpolation filter, and added to the level $j$ prediction residual to produce the level $j$ approximation)

Problem 7.2
A mean approximation pyramid is created by forming $2\times 2$ block averages. Since the starting image is of size $4\times 4$, $J = 2$, and $f(x,y)$ is placed in level 2 of the mean approximation pyramid. The level 1 approximation is (by taking $2\times 2$ block averages over $f(x,y)$ and subsampling)

$\begin{bmatrix} 3.5 & 5.5 \\ 11.5 & 13.5 \end{bmatrix}$

and the level 0 approximation is similarly $[8.5]$. The completed mean approximation pyramid is

$\begin{bmatrix} 1 & 2 & 3 & 4 \\ 5 & 6 & 7 & 8 \\ 9 & 10 & 11 & 12 \\ 13 & 14 & 15 & 16 \end{bmatrix} \qquad \begin{bmatrix} 3.5 & 5.5 \\ 11.5 & 13.5 \end{bmatrix} \qquad [8.5]$.

Pixel replication is used in the generation of the complementary prediction residual pyramid. Level 0 of the prediction residual pyramid is the lowest-resolution approximation, $[8.5]$. The level 2 prediction residual is obtained by upsampling the level 1 approximation and subtracting it from the level 2 approximation (original image). Thus, we get

$\begin{bmatrix} 1 & 2 & 3 & 4 \\ 5 & 6 & 7 & 8 \\ 9 & 10 & 11 & 12 \\ 13 & 14 & 15 & 16 \end{bmatrix} - \begin{bmatrix} 3.5 & 3.5 & 5.5 & 5.5 \\ 3.5 & 3.5 & 5.5 & 5.5 \\ 11.5 & 11.5 & 13.5 & 13.5 \\ 11.5 & 11.5 & 13.5 & 13.5 \end{bmatrix} = \begin{bmatrix} -2.5 & -1.5 & -2.5 & -1.5 \\ 1.5 & 2.5 & 1.5 & 2.5 \\ -2.5 & -1.5 & -2.5 & -1.5 \\ 1.5 & 2.5 & 1.5 & 2.5 \end{bmatrix}$.

Similarly, the level 1 prediction residual is obtained by upsampling the level 0 approximation and subtracting it from the level 1 approximation to yield

$\begin{bmatrix} 3.5 & 5.5 \\ 11.5 & 13.5 \end{bmatrix} - \begin{bmatrix} 8.5 & 8.5 \\ 8.5 & 8.5 \end{bmatrix} = \begin{bmatrix} -5 & -3 \\ 3 & 5 \end{bmatrix}$.

The prediction residual pyramid is therefore

$\begin{bmatrix} -2.5 & -1.5 & -2.5 & -1.5 \\ 1.5 & 2.5 & 1.5 & 2.5 \\ -2.5 & -1.5 & -2.5 & -1.5 \\ 1.5 & 2.5 & 1.5 & 2.5 \end{bmatrix} \qquad \begin{bmatrix} -5 & -3 \\ 3 & 5 \end{bmatrix} \qquad [8.5]$.

Problem 7.3
The number of elements in a $(J+1)$-level pyramid, where $N = 2^J$, is bounded by $\frac{4}{3}N^2$, or $\frac{4}{3}\left(2^J\right)^2 = \frac{4}{3}2^{2J}$ (see Section 7.1.1):

$2^{2J}\left[1 + \frac{1}{(4)^1} + \frac{1}{(4)^2} + \cdots + \frac{1}{(4)^J}\right] \leq \frac{4}{3}\, 2^{2J}$

for $J > 0$. We can generate the following table:

J     Pyramid Elements    Compression Ratio
0            1                    1
1            5               5/4 = 1.25
2           21              21/16 = 1.3125
3           85              85/64 = 1.328
...         ...                  ...
∞                            4/3 = 1.33

All but the trivial case, $J = 0$, are expansions. The expansion factor is a function of $J$ and bounded by 4/3, or 1.33.
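The numbers in the Problem 7.2 example can be reproduced with a few lines of code. The following Python sketch is not part of the original solution; it simply implements the 2×2 block averaging and pixel-replication steps described above.

```python
import numpy as np

# A minimal sketch reproducing Problem 7.2: a mean approximation pyramid and
# its complementary prediction residual pyramid for the 4x4 test image.
f = np.arange(1, 17, dtype=float).reshape(4, 4)

def block_mean(a):          # one pyramid reduction step (2x2 averages)
    return a.reshape(a.shape[0]//2, 2, a.shape[1]//2, 2).mean(axis=(1, 3))

def replicate(a):           # pixel-replication upsampling
    return a.repeat(2, axis=0).repeat(2, axis=1)

approx = [f, block_mean(f)]                          # levels 2 and 1
approx.append(block_mean(approx[-1]))                # level 0: [[8.5]]
residuals = [approx[2],                              # level 0 of the residual
             approx[1] - replicate(approx[2]),       # [[-5,-3],[3,5]]
             approx[0] - replicate(approx[1])]       # the +/-2.5, +/-1.5 array
for r in residuals:
    print(r)
```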
Problem 7.4
First check for orthonormality. If the filters are orthonormal, they are biorthogonal by definition. Because $K_{even} = 2$, $n = \{0, 1\}$, and $g_0(n) = \left\{\frac{1}{\sqrt 2}, \frac{1}{\sqrt 2}\right\}$ for $n = 0, 1$, Eq. (7.1-14) yields

$g_1(n) = (-1)^n g_0(1 - n) = \left\{\frac{1}{\sqrt 2}, \frac{-1}{\sqrt 2}\right\}$

$h_0(n) = g_0(1 - n) = \left\{\frac{1}{\sqrt 2}, \frac{1}{\sqrt 2}\right\}$

$h_1(n) = g_1(1 - n) = \left\{\frac{-1}{\sqrt 2}, \frac{1}{\sqrt 2}\right\}$.

Thus, the filters are orthonormal and will also satisfy Eq. (7.1-13). In addition, they will satisfy the biorthogonality conditions stated in Eqs. (7.1-12) and (7.1-11), but not (7.1-10). The filters are both orthonormal and biorthogonal.

Problem 7.5
(a) $f_a(n) = -f(n) = \{0, -0.5, -0.25, -1\}$.
(b) $f_b(n) = f(-n) = \{1, 0.25, 0.5, 0\}$.
(c) $f_c(n) = (-1)^n f(n) = \{0, -0.5, 0.25, -1\}$.
(d) $f_d(n) = f_c(-n) = \{-1, 0.25, -0.5, 0\}$.
(e) $f_e(n) = (-1)^n f_b(n) = \{1, -0.25, 0.5, 0\}$.
(f) Filter $f_e(n)$ in (e) corresponds to Eq. (7.1-9).

Problem 7.6
Table 7.1 in the book defines $g_0(n)$ for $n = 0, 1, 2, \ldots, 7$ to be approximately {0.23, 0.72, 0.63, −0.03, −0.19, 0.03, 0.03, −0.01}. Using Eq. (7.1-14) with $K_{even} = 8$, we can write

$g_1(n) = (-1)^n g_0(7 - n)$.

Thus, $g_1(n)$ is an order-reversed and modulated copy of $g_0(n)$—that is, {−0.01, −0.03, 0.03, 0.19, −0.03, −0.63, 0.72, −0.23}. To numerically prove the orthonormality of the filters, let $m = 0$ in Eq. (7.1-13):

$\left\langle g_i(n),\, g_j(n)\right\rangle = \delta(i - j)$ for $i, j = \{0, 1\}$.

Iterating over $i$ and $j$, we get

$\sum_n g_0^2(n) = \sum_n g_1^2(n) = 1$ and $\sum_n g_0(n)\, g_1(n) = 0$.

Substitution of the filter coefficient values into these equations yields

$\sum_n g_0(n)g_1(n) = (0.23)(-0.01) + (0.72)(-0.03) + (0.63)(0.03) + (-0.03)(0.19) + (-0.19)(-0.03) + (0.03)(-0.63) + (0.03)(0.72) + (-0.01)(-0.23) = 0$

$\sum_n g_0^2(n) = \sum_n g_1^2(n) = (\pm 0.23)^2 + (0.72)^2 + (\pm 0.63)^2 + (-0.03)^2 + (\pm 0.19)^2 + (0.03)^2 + (\pm 0.03)^2 + (-0.01)^2 = 1$.

Problem 7.7
Reconstruction is performed by reversing the decomposition process—that is, by replacing the downsamplers with upsamplers and the analysis filters by their synthesis filter counterparts, as Fig. P7.7 shows.

Figure P7.7 (a 2-d synthesis bank: the approximation $a(m,n)$ and details $d^H(m,n)$, $d^V(m,n)$, $d^D(m,n)$ are upsampled by 2 and filtered by $g_0$ and $g_1$, first along the columns and then along the rows, to rebuild $f(m,n)$)

Problem 7.8
The Haar transform matrix for $N = 8$ is

$\mathbf{H}_8 = \frac{1}{\sqrt 8}\begin{bmatrix} 1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 \\ 1 & 1 & 1 & 1 & -1 & -1 & -1 & -1 \\ \sqrt 2 & \sqrt 2 & -\sqrt 2 & -\sqrt 2 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & \sqrt 2 & \sqrt 2 & -\sqrt 2 & -\sqrt 2 \\ 2 & -2 & 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 2 & -2 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 2 & -2 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 & 2 & -2 \end{bmatrix}$.

Problem 7.9
(a) Equation (7.1-18) defines the $2\times 2$ Haar transformation matrix as

$\mathbf{H}_2 = \frac{1}{\sqrt 2}\begin{bmatrix} 1 & 1 \\ 1 & -1 \end{bmatrix}$.

Thus, using Eq. (7.1-17), we get

$\mathbf{T} = \mathbf{H}_2\mathbf{F}\mathbf{H}_2^T = \frac{1}{2}\begin{bmatrix} 1 & 1 \\ 1 & -1 \end{bmatrix}\begin{bmatrix} 3 & -1 \\ 6 & 2 \end{bmatrix}\begin{bmatrix} 1 & 1 \\ 1 & -1 \end{bmatrix} = \begin{bmatrix} 5 & 4 \\ -3 & 0 \end{bmatrix}$.

(b) First, compute

$\mathbf{H}_2^{-1} = \begin{bmatrix} a & b \\ c & d \end{bmatrix}$ such that $\begin{bmatrix} a & b \\ c & d \end{bmatrix}\frac{1}{\sqrt 2}\begin{bmatrix} 1 & 1 \\ 1 & -1 \end{bmatrix} = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}$.

Solving this matrix equation yields

$\mathbf{H}_2^{-1} = \frac{1}{\sqrt 2}\begin{bmatrix} 1 & 1 \\ 1 & -1 \end{bmatrix} = \mathbf{H}_2 = \mathbf{H}_2^T$.

Thus,

$\mathbf{F} = \mathbf{H}_2^T\mathbf{T}\mathbf{H}_2 = \frac{1}{2}\begin{bmatrix} 1 & 1 \\ 1 & -1 \end{bmatrix}\begin{bmatrix} 5 & 4 \\ -3 & 0 \end{bmatrix}\begin{bmatrix} 1 & 1 \\ 1 & -1 \end{bmatrix} = \begin{bmatrix} 3 & -1 \\ 6 & 2 \end{bmatrix}$.

Problem 7.10
(a) The basis is orthonormal and the coefficients are computed by the vector equivalent of Eq. (7.2-5):

$\alpha_0 = \begin{bmatrix} \frac{1}{\sqrt 2} & \frac{1}{\sqrt 2} \end{bmatrix}\begin{bmatrix} 3 \\ 2 \end{bmatrix} = \frac{5\sqrt 2}{2}$ and $\alpha_1 = \begin{bmatrix} \frac{1}{\sqrt 2} & -\frac{1}{\sqrt 2} \end{bmatrix}\begin{bmatrix} 3 \\ 2 \end{bmatrix} = \frac{\sqrt 2}{2}$

so that

$\frac{5\sqrt 2}{2}\varphi_0 + \frac{\sqrt 2}{2}\varphi_1 = \frac{5\sqrt 2}{2}\begin{bmatrix} \frac{1}{\sqrt 2} \\ \frac{1}{\sqrt 2} \end{bmatrix} + \frac{\sqrt 2}{2}\begin{bmatrix} \frac{1}{\sqrt 2} \\ -\frac{1}{\sqrt 2} \end{bmatrix} = \begin{bmatrix} 3 \\ 2 \end{bmatrix}$.

(b) The basis is biorthonormal and the coefficients are computed by the vector equivalent of Eq. (7.2-3):

$\alpha_0 = \begin{bmatrix} 1 & -1 \end{bmatrix}\begin{bmatrix} 3 \\ 2 \end{bmatrix} = 1$ and $\alpha_1 = \begin{bmatrix} 0 & 1 \end{bmatrix}\begin{bmatrix} 3 \\ 2 \end{bmatrix} = 2$

so that

$\varphi_0 + 2\varphi_1 = \begin{bmatrix} 1 \\ 0 \end{bmatrix} + 2\begin{bmatrix} 1 \\ 1 \end{bmatrix} = \begin{bmatrix} 3 \\ 2 \end{bmatrix}$.

(c) The basis is overcomplete and the coefficients are computed by the vector equivalent of Eq. (7.2-3):

$\alpha_0 = \begin{bmatrix} \frac{2}{3} & 0 \end{bmatrix}\begin{bmatrix} 3 \\ 2 \end{bmatrix} = 2$

$\alpha_1 = \begin{bmatrix} -\frac{1}{3} & \frac{\sqrt 3}{3} \end{bmatrix}\begin{bmatrix} 3 \\ 2 \end{bmatrix} = -1 + \frac{2\sqrt 3}{3}$

$\alpha_2 = \begin{bmatrix} -\frac{1}{3} & -\frac{\sqrt 3}{3} \end{bmatrix}\begin{bmatrix} 3 \\ 2 \end{bmatrix} = -1 - \frac{2\sqrt 3}{3}$

so that

$2\varphi_0 + \left(-1 + \frac{2\sqrt 3}{3}\right)\varphi_1 + \left(-1 - \frac{2\sqrt 3}{3}\right)\varphi_2 = 2\begin{bmatrix} 1 \\ 0 \end{bmatrix} + \left(-1 + \frac{2\sqrt 3}{3}\right)\begin{bmatrix} -\frac{1}{2} \\ \frac{\sqrt 3}{2} \end{bmatrix} + \left(-1 - \frac{2\sqrt 3}{3}\right)\begin{bmatrix} -\frac{1}{2} \\ -\frac{\sqrt 3}{2} \end{bmatrix} = \begin{bmatrix} 3 \\ 2 \end{bmatrix}$.

Problem 7.11
As can be seen in Fig. P7.11, scaling function $\varphi_{0,0}(x)$ cannot be written as a sum of double-resolution copies of itself. Note the gap between $\varphi_{1,0}(x)$ and $\varphi_{1,1}(x)$.

Figure P7.11

Problem 7.12
Substituting $j = 3$ into Eq. (7.2-13), we get

$V_3 = \overline{\underset{k}{\text{Span}}\left\{\varphi_{3,k}(x)\right\}} = \overline{\underset{k}{\text{Span}}\left\{2^{3/2}\varphi\!\left(2^3x - k\right)\right\}} = \overline{\underset{k}{\text{Span}}\left\{2\sqrt 2\,\varphi(8x - k)\right\}}$.

Using the Haar scaling function [Eq. (7.2-14)], we then get the result shown in Fig. P7.12.

Figure P7.12

Problem 7.13
From Eq. (7.2-19), we find that

$\psi_{3,3}(x) = 2^{3/2}\psi\!\left(2^3x - 3\right) = 2\sqrt 2\,\psi(8x - 3)$

and, using the Haar wavelet function definition from Eq. (7.2-30), obtain the plot in Fig. P7.13. To express $\psi_{3,3}(x)$ as a function of scaling functions, we employ Eq. (7.2-28) and the Haar wavelet vector defined in Example 7.6—that is, $h_\psi(0) = 1/\sqrt 2$ and $h_\psi(1) = -1/\sqrt 2$. Thus we get

$\psi(x) = \sum_n h_\psi(n)\,\sqrt 2\,\varphi(2x - n)$

so that

$\psi(8x - 3) = \sum_n h_\psi(n)\,\sqrt 2\,\varphi\!\left(2[8x - 3] - n\right) = \frac{1}{\sqrt 2}\sqrt 2\,\varphi(16x - 6) + \frac{-1}{\sqrt 2}\sqrt 2\,\varphi(16x - 7) = \varphi(16x - 6) - \varphi(16x - 7)$.

Then, since $\psi_{3,3}(x) = 2\sqrt 2\,\psi(8x - 3)$ from above, substitution gives

$\psi_{3,3}(x) = 2\sqrt 2\,\psi(8x - 3) = 2\sqrt 2\,\varphi(16x - 6) - 2\sqrt 2\,\varphi(16x - 7)$.

Figure P7.13 (a plot of $\psi_{3,3}(x) = 2\sqrt 2\,\psi(8x - 3)$, which is $2\sqrt 2$ on $[6/16, 7/16)$, $-2\sqrt 2$ on $[7/16, 8/16)$, and 0 elsewhere)

Problem 7.14
Using Eq. (7.2-22),

$V_3 = V_2 \oplus W_2 = V_1 \oplus W_1 \oplus W_2 = V_0 \oplus W_0 \oplus W_1 \oplus W_2$.

The scaling and wavelet functions are plotted in Fig. P7.14.

Figure P7.14 (plots of the scaling and wavelet functions spanning $V_0$, $W_0$, $W_1$, and $W_2$)
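The matrix computations in Problem 7.9 above are easily checked numerically. The following Python sketch is not part of the original solution; the same pattern extends to the $8\times 8$ matrix of Problem 7.8.

```python
import numpy as np

# A minimal sketch checking Problem 7.9: T = H F H^T, and F is recovered
# because the 2x2 Haar matrix is its own inverse (H^-1 = H = H^T).
H = np.array([[1, 1], [1, -1]]) / np.sqrt(2)
F = np.array([[3.0, -1.0], [6.0, 2.0]])
T = H @ F @ H.T
print(T)                       # [[ 5.  4.] [-3.  0.]]
print(H.T @ T @ H)             # recovers F exactly
```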
Problem 7.15
With $j_0 = 1$ the approximation coefficients are $c_1(0)$ and $c_1(1)$:

$c_1(0) = \int_0^{1/2} x^2\,\sqrt 2\, dx = \frac{\sqrt 2}{24}$ and $c_1(1) = \int_{1/2}^{1} x^2\,\sqrt 2\, dx = \frac{7\sqrt 2}{24}$.

Therefore, the $V_1$ approximation is

$\frac{\sqrt 2}{24}\varphi_{1,0}(x) + \frac{7\sqrt 2}{24}\varphi_{1,1}(x)$

which, when plotted, is identical to the $V_1$ approximation in Fig. 7.15(d). The last two coefficients are $d_1(0)$ and $d_1(1)$, which are computed as in the example. Thus, the expansion is

$y = \frac{\sqrt 2}{24}\varphi_{1,0}(x) + \frac{7\sqrt 2}{24}\varphi_{1,1}(x) + \left[-\frac{\sqrt 2}{32}\psi_{1,0}(x) - \frac{3\sqrt 2}{32}\psi_{1,1}(x)\right] + \cdots$

Problem 7.16
(a) Because $M = 4$, $J = 2$, and $j_0 = 1$, the summations in Eqs. (7.3-5) through (7.3-7) are performed over $n = 0, 1, 2, 3$, $j = 1$, and $k = 0, 1$. Using Haar functions and assuming that they are distributed over the range of the input sequence, we get

$W_\varphi(1,0) = \frac{1}{2}\left[f(0)\varphi_{1,0}(0) + f(1)\varphi_{1,0}(1) + f(2)\varphi_{1,0}(2) + f(3)\varphi_{1,0}(3)\right] = \frac{1}{2}\left[(1)(\sqrt 2) + (4)(\sqrt 2) + (-3)(0) + (0)(0)\right] = \frac{5\sqrt 2}{2}$

$W_\varphi(1,1) = \frac{1}{2}\left[(1)(0) + (4)(0) + (-3)(\sqrt 2) + (0)(\sqrt 2)\right] = \frac{-3\sqrt 2}{2}$

$W_\psi(1,0) = \frac{1}{2}\left[(1)(\sqrt 2) + (4)(-\sqrt 2) + (-3)(0) + (0)(0)\right] = \frac{-3\sqrt 2}{2}$

$W_\psi(1,1) = \frac{1}{2}\left[(1)(0) + (4)(0) + (-3)(\sqrt 2) + (0)(-\sqrt 2)\right] = \frac{-3\sqrt 2}{2}$

so that the DWT is $\left\{5\sqrt 2/2,\ -3\sqrt 2/2,\ -3\sqrt 2/2,\ -3\sqrt 2/2\right\}$.

(b) Using Eq. (7.3-7),

$f(n) = \frac{1}{2}\left[W_\varphi(1,0)\varphi_{1,0}(n) + W_\varphi(1,1)\varphi_{1,1}(n) + W_\psi(1,0)\psi_{1,0}(n) + W_\psi(1,1)\psi_{1,1}(n)\right]$

which, with $n = 1$, becomes

$f(1) = \frac{1}{2}\left[\frac{5\sqrt 2}{2}(\sqrt 2) + \frac{-3\sqrt 2}{2}(0) + \frac{-3\sqrt 2}{2}(-\sqrt 2) + \frac{-3\sqrt 2}{2}(0)\right] = \frac{1}{2}\left[5 + 3\right] = 4$

in agreement with the input sequence $f(n) = \{1, 4, -3, 0\}$.
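The one-scale Haar DWT of Problem 7.16 can be verified with the sampled basis functions. The following Python sketch is not part of the original solution; it reproduces both the forward coefficients and the inverse reconstruction.

```python
import numpy as np

# A minimal sketch verifying Problem 7.16 with the sampled Haar basis:
# one scale, M = 4, j0 = 1.
f = np.array([1.0, 4.0, -3.0, 0.0])
s = np.sqrt(2)
phi = np.array([[s, s, 0, 0], [0, 0, s, s]])     # phi_{1,0}, phi_{1,1}
psi = np.array([[s, -s, 0, 0], [0, 0, s, -s]])   # psi_{1,0}, psi_{1,1}
W_phi = phi @ f / 2                               # [ 5*sqrt(2)/2, -3*sqrt(2)/2]
W_psi = psi @ f / 2                               # [-3*sqrt(2)/2, -3*sqrt(2)/2]
print(W_phi, W_psi)
f_rec = (W_phi @ phi + W_psi @ psi) / 2           # inverse DWT, Eq. (7.3-7)
print(f_rec)                                      # [ 1.  4. -3.  0.]
```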
The input sequence must be of the form $\{0,0,0,0,C,-C,0,0\}$ for some C. To determine C, use Eq. (7.3-6) to write

$$W_\psi(2,2) = \frac{1}{\sqrt 8}\left[f(0)\psi_{2,2}(0)+f(1)\psi_{2,2}(1)+\cdots+f(7)\psi_{2,2}(7)\right] = \frac{1}{\sqrt 8}\left[(C)(2)+(-C)(-2)\right] = \frac{4C}{\sqrt 8} = \sqrt 2\,C.$$

Because this coefficient is known to have the value B, we have $\sqrt 2\,C = B$, or $C = \frac{\sqrt 2}{2}B$. Thus, the input sequence is $\{0, 0, 0, 0, \sqrt 2 B/2, -\sqrt 2 B/2, 0, 0\}$. To check the result, substitute these values into Eq. (7.3-6):

$$W_\psi(2,2) = \frac{1}{\sqrt 8}\left[\left(\frac{\sqrt 2}{2}B\right)(2)+\left(-\frac{\sqrt 2}{2}B\right)(-2)\right] = \frac{1}{\sqrt 8}\left[\sqrt 2 B + \sqrt 2 B\right] = B.$$

Problem 7.22
They are both multi-resolution representations that employ a single reduced-resolution approximation image and a series of "difference" images. For the FWT, these "difference" images are the transform detail coefficients; for the pyramid, they are the prediction residuals.

To construct the approximation pyramid that corresponds to the transform in Fig. 7.10(a), we use the FWT$^{-1}$ 2-D synthesis bank of Fig. 7.24(c). First, place the 64 × 64 approximation "coefficients" from Fig. 7.10(a) at the top of the pyramid being constructed. Then use them, along with the 64 × 64 horizontal, vertical, and diagonal detail coefficients from the upper left of Fig. 7.10(a), to drive the filter bank inputs in Fig. 7.24(c). The output will be a 128 × 128 approximation of the original image, which is used as the next level of the approximation pyramid. The 128 × 128 approximation is then used with the three 128 × 128 detail coefficient images in the upper quarter of the transform in Fig. 7.10(a) to drive the synthesis filter bank a second time, producing a 256 × 256 approximation that is placed as the next level of the pyramid. This process is repeated a third time to recover the 512 × 512 original image, which is placed at the bottom of the approximation pyramid. Thus, the approximation pyramid has 4 levels.

Problem 7.23
One pass through the FWT 2-D filter bank of Fig. 7.24(a) is all that is required (see Fig. P7.23).

[Figure P7.23: one pass of the 2-D FWT filter bank applied to the 2 × 2 input $W_\varphi(1,m,n) = \begin{bmatrix}3 & -1\\ 6 & 2\end{bmatrix}$, using the filters $\{1/\sqrt 2, 1/\sqrt 2\}$ and $\{-1/\sqrt 2, 1/\sqrt 2\}$ on the rows (along m) and columns (along n), each followed by downsampling by 2. Ordered per Fig. 7.24(b), the outputs are $W_\varphi(0,0,0) = [5]$, $W_\psi^H(0,0,0) = [4]$, $W_\psi^V(0,0,0) = [-3]$, and $W_\psi^D(0,0,0) = [0]$.]

Problem 7.24
As can be seen in the sequence of images that are shown, the DWT is not shift invariant: if the input is shifted, the transform changes. Since all original images in the problem are 128 × 128, they become the $W_\varphi(7,m,n)$ inputs for the FWT computation process. The filter bank of Fig. 7.24(a) can be used with j + 1 = 7. For a single-scale transform, transform coefficients $W_\varphi(6,m,n)$ and $W_\psi^i(6,m,n)$ for i = H, V, D are generated. With Haar wavelets, the transformation process subdivides the image into non-overlapping 2 × 2 blocks and computes 2-point averages and differences (per the scaling and wavelet vectors). Thus, there are no horizontal, vertical, or diagonal detail coefficients in the first two transforms shown; the input images are constant in all 2 × 2 blocks (so all differences are 0). If the original image is shifted by one pixel, detail coefficients are generated, since there are then 2 × 2 areas that are not constant. This is the case in the third transform shown.
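The single-scale Haar DWT computations of Problems 7.16 and 7.21 can be checked with a few lines of MATLAB. The fragment below is an added sketch (not part of the original solutions); the filtering is written out directly rather than with a wavelet toolbox:

>> f = [1 4 -3 0];                          % input sequence of Problem 7.16
>> a = (f(1:2:end) + f(2:2:end))/sqrt(2)    % W_phi(1,k) = {5/sqrt(2), -3/sqrt(2)}
>> d = (f(1:2:end) - f(2:2:end))/sqrt(2)    % W_psi(1,k) = {-3/sqrt(2), -3/sqrt(2)}
>> g = zeros(1,4);                          % inverse transform: upsample and combine
>> g(1:2:end) = (a + d)/sqrt(2);
>> g(2:2:end) = (a - d)/sqrt(2);
>> g                                        % recovers [1 4 -3 0]

Applying the same two lines to a constant input of eight 1s gives a detail vector of all zeros, consistent with Problem 7.21(a).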
Problem 7.25
The table is completed as shown in Fig. P7.25. The functions are determined using Eqs. (7.2-18) and (7.2-28) with the Haar scaling and wavelet vectors from Examples 7.5 and 7.6:

$$\varphi(x) = \varphi(2x) + \varphi(2x-1), \qquad \psi(x) = \varphi(2x) - \varphi(2x-1).$$

To order the wavelet functions in frequency, count the number of transitions that are made by each function. For example, $V_0$ has the fewest transitions (only 2) and the lowest frequency content, while $W_{2,AA}$ has the most (9) transitions and correspondingly the highest frequency content. From top to bottom in the figure, there are 2, 3, 5, 4, 9, 8, 6, and 7 transitions, respectively. Therefore, the frequency-ordered subspaces are (from low to high frequency) $V_0$, $W_0$, $W_{1,D}$, $W_{1,A}$, $W_{2,DA}$, $W_{2,DD}$, $W_{2,AD}$, and $W_{2,AA}$.

[Figure P7.25: the completed wavelet packet tree, with $V_3$ split into $V_2$ and $W_2$; $V_2$ into $V_1$ and $W_1$; $V_1$ into $V_0$ and $W_0$; $W_1$ into $W_{1,A}$ and $W_{1,D}$; $W_2$ into $W_{2,A}$ and $W_{2,D}$; and the latter two into $W_{2,AA}$, $W_{2,AD}$ and $W_{2,DA}$, $W_{2,DD}$; together with plots of the corresponding basis functions.]

Problem 7.26
(a) The analysis tree is shown in Fig. P7.26(a).

[Figure P7.26(a): $V_J$ is decomposed into $V_{J-1}$, $W^H_{J-1}$, $W^V_{J-1}$, and $W^D_{J-1}$; $V_{J-1}$ into $V_{J-2}$, $W^H_{J-2}$, $W^V_{J-2}$, and $W^D_{J-2}$; $W^D_{J-1}$ into $W^D_{J-1,A}$, $W^D_{J-1,H}$, $W^D_{J-1,V}$, and $W^D_{J-1,D}$; and $W^D_{J-1,H}$ into $W^D_{J-1,HA}$, $W^D_{J-1,HH}$, $W^D_{J-1,HV}$, and $W^D_{J-1,HD}$.]

(b) The corresponding frequency spectrum is shown in Fig. P7.26(b).

[Figure P7.26(b): the two-dimensional spectrum over $\omega_{HORIZ}, \omega_{VERT} \in [-\pi, \pi]$, partitioned among $V_{J-2}$, the $W_{J-2}$ and $W_{J-1}$ subbands, and the packets listed in (a).]

Problem 7.27
First use the entropy measure to find the starting value for the input sequence, which is

$$E\{f(n)\} = \sum_{n=0}^{7} f^2(n)\ln\left[f^2(n)\right] = 2.7726.$$

Then perform an iteration of the FWT and compute the entropy of the generated approximation and detail coefficients. They are 2.0794 and 0, respectively. Since their sum is less than the starting entropy of 2.7726, we use the decomposition. Because the detail entropy is 0, no further decomposition of the detail is warranted. Thus, we perform another FWT iteration on the approximation to see if it should be decomposed again. This process is repeated until no further decompositions are called for. The resulting optimal tree is shown in Fig. P7.27.

[Figure P7.27: the optimal decomposition tree; 2.7726 splits into 2.0794 and 0, 2.0794 into 1.3863 and 0, and 1.3863 into 0.6931 and 0.]

Chapter 8
Problem Solutions

Problem 8.1
(a) A histogram-equalized image (in theory) has an intensity distribution that is uniform; that is, all intensities are equally probable. Eq. (8.1-4) thus becomes

$$L_{avg} = \frac{1}{2^n}\sum_{k=0}^{2^n-1} l(r_k)$$

where $1/2^n$ is the probability of occurrence of any intensity. Since all intensities are equally probable, there is no advantage to assigning any particular intensity fewer bits than any other. Thus, we assign each the fewest possible bits required to cover the $2^n$ levels. This, of course, is n bits, and $L_{avg}$ becomes n bits also:

$$L_{avg} = \frac{1}{2^n}\sum_{k=0}^{2^n-1} n = \frac{1}{2^n}\,2^n\,n = n.$$

(b) Since spatial redundancy is associated with the geometric arrangement of the intensities in the image, it is possible for a histogram-equalized image to contain a high level of spatial redundancy, or none at all.

Problem 8.2
(a) A single line of raw data contains $n_1 = 2^n$ bits. The maximum run length is $2^n$ and thus requires n bits for representation. The starting coordinate of each run also requires n bits, since it may be arbitrarily located within the $2^n$-pixel line. Since a run length of 0 cannot occur and the run-length pair (0,0) is used to signal the start of each new line, an additional 2n bits are required per line.
Thus, the total number of bits required to code any scan line is

$$n_2 = 2n + N_{avg}(n+n) = 2n\left(1 + N_{avg}\right)$$

where $N_{avg}$ is the average number of run-length pairs on a line. To achieve some level of compression, C must be greater than 1. So,

$$C = \frac{n_1}{n_2} = \frac{2^n}{2n\left(1+N_{avg}\right)} > 1 \qquad\text{and}\qquad N_{avg} < \frac{2^{n-1}}{n} - 1.$$

(b) For n = 10, $N_{avg}$ must be less than 50.2 run-length pairs per line.

Problem 8.3
The original pixel intensities, their 4-bit quantized counterparts, and the differences between them are shown in Table P8.3. Note that the quantized intensities must be multiplied by 16 to decode or decompress them for the rms error and signal-to-noise calculations.

Table P8.3
f(x,y)                 f-hat(x,y)   f-hat(x,y)   16 f-hat(x,y) - f(x,y)
base 10   base 2       base 2       base 10      base 10
108       01101100     0110         6            -12
139       10001011     1000         8            -11
135       10000111     1000         8            -7
244       11110100     1111         15           -4
172       10101100     1010         10           -12
173       10101101     1010         10           -13
56        00111000     0011         3            -8
99        01100011     0110         6            -3

Using Eq. (8.1-10), the rms error is

$$e_{rms} = \left[\frac18\sum_{x=0}^{0}\sum_{y=0}^{7}\left[16\hat f(x,y)-f(x,y)\right]^2\right]^{1/2} = \sqrt{\frac18\left[(-12)^2+(-11)^2+(-7)^2+(-4)^2+(-12)^2+(-13)^2+(-8)^2+(-3)^2\right]} = \sqrt{\frac{716}{8}} = 9.46$$

or about 9.5 intensity levels. From Eq. (8.1-11), the signal-to-noise ratio is

$$SNR_{ms} = \frac{\sum_x\sum_y\left[16\hat f(x,y)\right]^2}{\sum_x\sum_y\left[16\hat f(x,y)-f(x,y)\right]^2} = \frac{96^2+128^2+128^2+240^2+160^2+160^2+48^2+96^2}{716} = \frac{162304}{716} \approx 227.$$

Problem 8.4
(a) Table P8.4 shows the starting intensity values, their 8-bit codes, the IGS sum used in each step, the 4-bit IGS code and its equivalent decoded value (the decimal equivalent of the IGS code multiplied by 16), the error between the decoded IGS intensities and the input values, and the squared error.

Table P8.4
Intensity   8-bit Code   Sum        IGS Code   Decoded IGS   Error   Square Error
                         00000000
108         01101100     01101100   0110       96            -12     144
139         10001011     10010111   1001       144           5       25
135         10000111     10001110   1000       128           -7      49
244         11110100     11110100   1111       240           -4      16
172         10101100     10110000   1011       176           4       16
173         10101101     10101101   1010       160           -13     169
56          00111000     01000101   0100       64            8       64
99          01100011     01101000   0110       96            -3      9

(b) Using Eq. (8.1-10) and the squared error values from Table P8.4, the rms error is

$$e_{rms} = \sqrt{\frac18\left(144+25+49+16+16+169+64+9\right)} = \sqrt{\frac{492}{8}} = 7.84$$

or about 7.8 intensity levels. From Eq. (8.1-11), the signal-to-noise ratio is

$$SNR_{ms} = \frac{96^2+144^2+128^2+240^2+176^2+160^2+64^2+96^2}{492} = \frac{173824}{492} \approx 353.$$

Problem 8.5
(a) The maximum compression with Huffman coding is C = 8/5.3 = 1.509.

(b) No, but Huffman coding does provide the smallest possible number of bits per intensity value subject to the constraint that the intensities are coded one at a time.

(c) One possibility is to eliminate spatial redundancies in the image before Huffman encoding. For instance, compute the differences between adjacent pixels and Huffman code them.

Problem 8.6
The conversion factors are computed using the logarithmic relationship

$$\log_a x = \frac{1}{\log_b a}\log_b x.$$

Thus, 1 Hartley = 3.3219 bits and 1 nat = 1.4427 bits.
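The error figures of Problem 8.3 can be verified in a few lines of MATLAB. This check is an addition of mine, not part of the manual's solution; floor(f/16) implements the 4-bit quantization (keeping the four most significant bits):

>> f    = [108 139 135 244 172 173 56 99];   % original intensities
>> fhat = floor(f/16);                       % 4-bit quantized values of Table P8.3
>> e    = 16*fhat - f;                       % decoding errors; sum(e.^2) = 716
>> erms = sqrt(mean(e.^2))                   % 9.46 intensity levels
>> snr  = sum((16*fhat).^2)/sum(e.^2)        % approximately 227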
Problem 8.7
Let the set of source symbols be $a_1, a_2, \ldots, a_q$ with probabilities $\left[P(a_1), P(a_2), \ldots, P(a_q)\right]^T$. Then, using Eq. (8.1-6) and the fact that the sum of all $P(a_j)$ is 1, we get

$$\log q - H = \log q\sum_{j=1}^{q}P(a_j) + \sum_{j=1}^{q}P(a_j)\log P(a_j) = \sum_{j=1}^{q}P(a_j)\log q + \sum_{j=1}^{q}P(a_j)\log P(a_j) = \sum_{j=1}^{q}P(a_j)\log\left[qP(a_j)\right].$$

Using the log relationship from Problem 8.6, this becomes

$$\log q - H = \log e\sum_{j=1}^{q}P(a_j)\ln\left[qP(a_j)\right].$$

Then, multiplying the inequality $\ln x \le x - 1$ by $-1$ to get $\ln(1/x) \ge 1 - x$, and applying it to this last result,

$$\log q - H \ge \log e\sum_{j=1}^{q}P(a_j)\left[1 - \frac{1}{qP(a_j)}\right] \ge \log e\left[\sum_{j=1}^{q}P(a_j) - \frac1q\sum_{j=1}^{q}\frac{P(a_j)}{P(a_j)}\right] \ge \log e\,[1-1] \ge 0$$

so that $\log q \ge H$. Therefore, H is always less than, or equal to, $\log q$. Furthermore, in view of the equality condition ($x = 1$) for $\ln(1/x) \ge 1 - x$, which was introduced at only one point in the above derivation, we will have strict equality if and only if $P(a_j) = 1/q$ for all j.

Problem 8.8
(a) There are two unique codes.

(b) The codes are: (1) 0, 11, 10 and (2) 1, 00, 01. The codes are complements of one another. They are constructed by following the Huffman procedure for three symbols of arbitrary probability.
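Both the bound just proved and the entropy estimate used in the next problem can be checked numerically. The following MATLAB fragment is an added sketch (the anonymous helper H is mine); it shows that H equals $\log_2 q$ only for a uniform source:

>> H = @(p) -sum(p(p>0).*log2(p(p>0)));   % first-order entropy estimate, Eq. (8.1-7)
>> q = 4;
>> [H(ones(1,q)/q)  log2(q)]              % uniform source: both are 2 bits/symbol
>> H([3/8 3/8 1/8 1/8])                   % the source of Problem 8.9(a): 1.811 bits/symbol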
Problem 8.9
(a) The entropy of the image is estimated using Eq. (8.1-7) to be

$$\tilde H = -\sum_{k=0}^{255}p_r(r_k)\log_2 p_r(r_k) = -\left[\frac{12}{32}\log_2\frac{12}{32} + \frac{4}{32}\log_2\frac{4}{32} + \frac{4}{32}\log_2\frac{4}{32} + \frac{12}{32}\log_2\frac{12}{32}\right] = -[-0.5306 - 0.375 - 0.375 - 0.5306] = 1.811 \text{ bits/pixel}.$$

The probabilities used in the computation are given in Table P8.9-1.

Table P8.9-1
Intensity   Count   Probability
21          12      3/8
95          4       1/8
169         4       1/8
243         12      3/8

(b) Figure P8.9 shows one possible Huffman source reduction and code assignment. Use the procedures described in Section 8.2.1. The intensities are first arranged in order of probability from top to bottom (at the left of the source reduction diagram). The least probable symbols are then combined to create a reduced source, and the process is repeated from left to right in the diagram. Code words are then assigned to the reduced source symbols from right to left. The codes assigned to each intensity value are read from the left side of the code assignment diagram.

[Figure P8.9: source reductions 3/8, 3/8, 1/8, 1/8 to 3/8, 3/8, 2/8 to 5/8, 3/8, and the resulting code assignments 21 to 1, 243 to 00, 95 to 010, 169 to 011.]

(c) Using Eq. (8.1-4), the average number of bits required to represent each pixel in the Huffman coded image (ignoring the storage of the code itself) is

$$L_{avg} = 1\left(\frac38\right) + 2\left(\frac38\right) + 3\left(\frac18\right) + 3\left(\frac18\right) = \frac{15}{8} = 1.875 \text{ bits/pixel}.$$

Thus, the compression achieved is C = 8/1.875 = 4.27. Because the theoretical compression resulting from the elimination of all coding redundancy is 8/1.811 = 4.417, the Huffman coded image achieves (4.27/4.417) × 100, or 96.67%, of the maximum compression possible through the removal of coding redundancy alone.

(d) We can compute the relative frequency of pairs of pixels by assuming that the image is connected from line to line and end to beginning. The resulting probabilities are listed in Table P8.9-2.

Table P8.9-2
Intensity pair   Count   Probability
(21, 21)         8       1/4
(21, 95)         4       1/8
(95, 169)        4       1/8
(169, 243)       4       1/8
(243, 243)       8       1/4
(243, 21)        4       1/8

The entropy of the intensity pairs is estimated using Eq. (8.1-7) and dividing by 2 (because the pixels are considered in pairs):

$$\frac12\tilde H = -\frac12\left[\frac14\log_2\frac14 + \frac18\log_2\frac18 + \frac18\log_2\frac18 + \frac18\log_2\frac18 + \frac14\log_2\frac14 + \frac18\log_2\frac18\right] = \frac{2.5}{2} = 1.25 \text{ bits/pixel}.$$

The difference between this value and the entropy in (a) tells us that a mapping can be created to eliminate (1.811 − 1.25) = 0.56 bits/pixel of spatial redundancy.

(e) Construct a difference image by replicating the first column of the original image and using the arithmetic difference between adjacent columns for the remaining elements. The difference image is

21 0 0 74 74 74 0 0
21 0 0 74 74 74 0 0
21 0 0 74 74 74 0 0
21 0 0 74 74 74 0 0

The probabilities of its various elements are given in Table P8.9-3.

Table P8.9-3
Intensity difference   Count   Probability
21                     4       1/8
0                      16      1/2
74                     12      3/8

The entropy of the difference image is estimated using Eq. (8.1-7) to be

$$\tilde H = -\left[\frac18\log_2\frac18 + \frac12\log_2\frac12 + \frac38\log_2\frac38\right] = 1.41 \text{ bits/pixel}.$$

(f) The entropy calculated in (a) is based on the assumption of statistically independent pixels. The entropy (of the pixel pairs) computed in (d), which is smaller than the value found in (a), reveals that the pixels are not statistically independent. There is at least (1.811 − 1.25) = 0.56 bits/pixel of spatial redundancy in the image. The difference image mapping used in (e) removes most of that spatial redundancy, leaving only (1.41 − 1.25) = 0.16 bits/pixel.

Problem 8.10
The decoded message is $a_3 a_6 a_6 a_2 a_5 a_2 a_2 a_2 a_4$.

Problem 8.11
The Golomb code is shown in Table P8.11. It is computed by the procedure outlined in Section 8.2.2 in conjunction with Eq. (8.2-1).

Table P8.11
Integer n   Unary code of floor(n/m)   Truncated n mod m   Golomb code G3(n)
0           0                          0                   00
1           0                          10                  010
2           0                          11                  011
3           10                         0                   100
4           10                         10                  1010
5           10                         11                  1011
6           110                        0                   1100
7           110                        10                  11010
8           110                        11                  11011
9           1110                       0                   11100
10          1110                       10                  111010
11          1110                       11                  111011
12          11110                      0                   111100
13          11110                      10                  1111010
14          11110                      11                  1111011
15          111110                     0                   1111100

Problem 8.12
To decode $G_m(n)$, let $k = \lceil\log_2 m\rceil$ and $c = 2^k - m$. Then:

1. Count the number of 1s in a left-to-right scan of a concatenated $G_m(n)$ bit sequence before reaching the first 0, and multiply the number of 1s by m.

2. If the decimal equivalent of the next k − 1 bits is less than c, add it to the result from step 1; else add the decimal equivalent of the next k bits and subtract c.

For example, to decode the first $G_3(n)$ code in the Golomb coded bit sequence 1111010011..., let $k = \lceil\log_2 3\rceil = 2$, $c = 2^2 - 3 = 1$, and note that there are four 1s in a left-to-right scan of the bit stream before reaching the first 0. Multiplying the number of 1s by 3 yields 12 (the result of step 1). The bit following the 0 identified in step 1 is a 1, whose decimal equivalent is not less than c (that is, 1 is not less than 1). So add the decimal equivalent of the 2 bits following the 0 identified in step 1 and subtract 1. Thus, the first integer is 12 + 2 − 1 = 13. Repeat the process for the next code word, which begins with bits 011...

Problem 8.13
The probability mass function defined by Eq. (8.2-2) is for an infinite set of integers. It is not possible to list an infinite set of source symbols and perform the source reductions required by Huffman's approach.

Problem 8.14
The exponential Golomb code is shown in Table P8.14. It is computed by the procedure outlined in Section 8.2.2 in conjunction with Eqs. (8.2-5) and (8.2-6).

Table P8.14
Integer n   Parameter i   Golomb code G2exp(n)
0           0             000
1           0             001
2           0             010
3           0             011
4           1             10000
5           1             10001
6           1             10010
7           1             10011
8           1             10100
9           1             10101
10          1             10110
11          1             10111
12          2             1100000
13          2             1100001
14          2             1100010
15          2             1100011
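The Golomb construction of Problems 8.11 and 8.12 is mechanical enough to script. The following MATLAB sketch is an addition of mine (the variable names are arbitrary); for m = 3 it reproduces Table P8.11 line by line:

% Golomb code G_m(n): unary code of floor(n/m) followed by the truncated
% binary code of n mod m. For m = 3: k = 2 and c = 1.
m = 3;  k = ceil(log2(m));  c = 2^k - m;
for n = 0:15
    q = floor(n/m);  r = mod(n, m);
    if r < c
        tail = dec2bin(r, k-1);         % truncated remainder: k-1 bits
    else
        tail = dec2bin(r + c, k);       % otherwise k bits, offset by c
    end
    fprintf('%2d  %s\n', n, [repmat('1', 1, q) '0' tail]);
end

Running the loop with, say, n = 8 prints 11011, matching the table; when m is a power of 2, c = 0 and the remainder is always k bits (a Rice code).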
Problem 8.15
To decode $G^k_{exp}(n)$:

1. Count the number of 1s in a left-to-right scan of a concatenated $G^k_{exp}(n)$ bit sequence before reaching the first 0, and let i be the number of 1s counted.

2. Get the k + i bits following the 0 identified in step 1 and let d be its decimal equivalent.

3. The decoded integer is then

$$d + \sum_{j=0}^{i-1}2^{j+k}.$$

For example, to decode the first $G^2_{exp}(n)$ code in the bit stream 10111011..., let i = 1, the number of 1s in a left-to-right scan of the bit stream before finding the first 0. Get the 2 + 1 = 3 bits following the 0, that is, 111, so d = 7. The decoded integer is then

$$7 + \sum_{j=0}^{0}2^{j+2} = 7 + 2^2 = 11.$$

Repeat the process for the next code word, which begins with the bit sequence 011...

Problem 8.16
The graph shown in Fig. P8.16 was obtained using MATLAB. Note that the function is not defined for ρ = 0 or ρ = 1.

[Figure P8.16: plot of Eq. (8.2-3), m versus ρ for 0 < ρ < 1; the vertical axis runs from 0 to 70.]

Problem 8.17
Figure P8.17 shows the arithmetic coding process as described in Section 8.2.3. Any value in the interval [0.1544, 0.15536) at the right side of the figure can be used to code the sequence; for example, the value 0.155.

[Figure P8.17: the encoding sequence; the initial interval [0, 1) is subdivided among the symbols a, b, c, d and successively narrowed to [0.1, 0.5), [0.14, 0.3), [0.14, 0.156), [0.1528, 0.156), and finally [0.1544, 0.15536).]

Problem 8.18
The arithmetic decoding process is the reverse of the encoding procedure. Start by dividing the [0, 1) interval according to the symbol probabilities. This is shown in Table P8.18. The decoder immediately knows the message 0.23355 begins with an "e", since the coded message lies in the interval [0.2, 0.5). This makes it clear that the second symbol is an "a", which narrows the interval to [0.2, 0.26). To further see this, divide the interval [0.2, 0.5) according to the symbol probabilities. Proceeding like this, which is the same procedure used to code the message, we get "eaii!".

Table P8.18
Symbol   Probability   Range
a        0.2           [0.0, 0.2)
e        0.3           [0.2, 0.5)
i        0.1           [0.5, 0.6)
o        0.2           [0.6, 0.8)
u        0.1           [0.8, 0.9)
!        0.1           [0.9, 1.0)

Problem 8.19
Assume that the first 256 codes in the starting dictionary are the ASCII codes. If you assume 7-bit ASCII, the first 128 locations are all that are needed. In either case, the ASCII "a" corresponds to location 97. The coding proceeds as shown in Table P8.19. The encoded output is 97 256 257 258 97.

Table P8.19
Recognized   Character   Output   Dict. Address   Dict. Entry
             a
a            a           97       256             aa
a            a
aa           a           256      257             aaa
a            a
aa           a
aaa          a           257      258             aaaa
a            a
aa           a
aaa          a
aaaa         a           258      259             aaaaa
a                        97
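The coding steps of Problem 8.19 can be traced with a short MATLAB script. This sketch is mine, not the manual's; it assumes the input is a run of eleven 'a' characters, which is what the coding table above implies (the decoded phrases a, aa, aaa, aaaa, a total 11 symbols), and it initializes only the printable-ASCII portion of the dictionary for simplicity:

% LZW encoder sketch for Problem 8.19.
msg  = repmat('a', 1, 11);
dict = containers.Map('KeyType','char','ValueType','double');
for i = 32:126, dict(char(i)) = i; end        % printable-ASCII starting dictionary
next = 256;  w = '';  out = [];
for c = msg
    wc = [w c];
    if isKey(dict, wc)
        w = wc;                               % keep growing the recognized string
    else
        out(end+1) = dict(w);                 % emit code for the recognized string
        dict(wc) = next;  next = next + 1;    % add the new phrase to the dictionary
        w = c;
    end
end
out(end+1) = dict(w);
disp(out)                                     % 97 256 257 258 97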
Table P8.20
Recognized   Encoded Value   Pixels      Dict. Address   Dict. Entry
             39              39
39           39              39          256             39-39
39           126             126         257             39-126
126          126             126         258             126-126
126          256             39-39       259             126-39
256          258             126-126     260             39-39-126
258          260             39-39-126   261             126-126-39
260          259             126-39      262             39-39-126-126
259          257             39-126      263             126-39-39
257          126             126         264             39-126-126

Note, for example, in row 5 of the table that the new dictionary entry for location 259 is 126-39, the concatenation of the currently recognized sequence, 126, and the first element of the encoded value being processed, that is, the 39 from the 39-39 entry in dictionary location 256. The output is then read from the third column of the table to yield

39 39 126 126 39 39 126 126 39 39 126 126 39 39 126 126

where it is assumed that the decoder knows or is given the size of the image that was received. Note that the dictionary is generated as the decoding is carried out.

Problem 8.21
Using the BMP specification given in Example 8.8 of Section 8.2.5, the first two bytes indicate that the uncompressed data begins with a run of 4s with length 3. In a similar manner, the second two bytes call for a run of 6s with length 5. The first four bytes of the BMP encoded sequence are encoded mode. Because the 5th byte is 0 and the 6th byte is 3, absolute mode is entered and the next three values are taken as uncompressed data. Because the total number of bytes in absolute mode must be aligned on a 16-bit word boundary, the 0 in the 10th byte of the encoded sequence is padding and should be ignored. The final two bytes specify an encoded mode run of 47s with length 2. Thus, the complete uncompressed sequence is {4, 4, 4, 6, 6, 6, 6, 6, 103, 125, 67, 47, 47}.

Problem 8.22
(a) Using Eq. (8.2-9), form Table P8.22.

Table P8.22
Binary   Gray Code      Binary   Gray Code
0000     0000           1000     1100
0001     0001           1001     1101
0010     0011           1010     1111
0011     0010           1011     1110
0100     0110           1100     1010
0101     0111           1101     1011
0110     0101           1110     1001
0111     0100           1111     1000

(b) The procedure is to work from the most significant bit to the least significant bit using the equations

$$a_{m-1} = g_{m-1}, \qquad a_i = g_i \oplus a_{i+1}, \quad 0 \le i \le m-2.$$

The decoded binary value is thus 0101100111010.

Problem 8.23
Following the procedure in the flow chart of Fig. 8.14 in the book, the proper code is

0001 010 1 0011000011 0001

where the spaces have been inserted for readability alone. The coding mode sequence is pass, vertical (1 left), vertical (directly below), horizontal (distances 3 and 4), and pass.

Problem 8.24
(a) - (b) Following the procedure outlined in Section 8.2.8, we obtain the results shown in Table P8.24.

Table P8.24
DC Coefficient Difference   Two's Complement Value   Code
-7                          1...1001                 00000
-6                          1...1010                 00001
-5                          1...1011                 00010
-4                          1...1100                 00011
4                           0...0100                 00100
5                           0...0101                 00101
6                           0...0110                 00110
7                           0...0111                 00111

Problem 8.25
In general, the number of MAD computations for single-pixel precision and displacements ±dx and ±dy is (2dx + 1)(2dy + 1). With both dx and dy equal to 8, the number of MAD computations for this problem is 17², or 289, per macroblock. With 8 × 8 macroblocks, each MAD computation involves 8²(3) = 192 operations (64 subtractions, absolute values, and additions). So, the total number of math operations for 8 × 8 macroblocks and single-pixel displacements dx = dy = 8 is 289 × 192 = 55,488.

For 1/4-pixel precision, the number of MAD computations is multiplied by 16, yielding 16 × 289 = 4,624 MAD computations per macroblock. The number of math operations for each MAD computation is increased from 192 to 192 + 8²(4) = 448, where the additional operations are for bilinear interpolation. So, the total number of math operations for 8 × 8 macroblocks and 1/4-pixel displacements dx = dy = 8 (using bilinear interpolation) is 4,624 × 448 = 2,071,552.

Problem 8.26
There are several ways in which backward motion-compensated prediction can help:

1. It provides a means of predicting uncovered background when an object is moving.

2. It provides a means of predicting pixels on the edges of frames when the camera is panning.

3. Backward and forward prediction can average the noise in two reference frames to yield a more accurate prediction.
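The Gray-code conversions of Problem 8.22 are also easy to verify. The MATLAB fragment below is an added check (the prefix-XOR decoding trick is mine; the equations are those of Eq. (8.2-9) and part (b)):

>> b = 0:15;                               % all 4-bit binary values
>> g = bitxor(b, bitshift(b, -1));         % Eq. (8.2-9): g = b XOR (b >> 1)
>> dec2bin(g(11), 4)                       % binary 1010 maps to Gray 1111, as in Table P8.22
>> t = bitxor(g, bitshift(g, -1));         % decode a_i = g_i XOR a_(i+1) by prefix XOR;
>> t = bitxor(t, bitshift(t, -2));         % two shift-XOR steps suffice for 4-bit words
>> isequal(t, b)                           % 1: the round trip recovers the binary values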
Problem 8.27
The appropriate MPEG decoder is shown in Fig. P8.27.

[Figure P8.27: the encoded macroblock passes through a variable-length decoder, an inverse quantizer, and an inverse mapper (e.g., DCT⁻¹); the result is summed with the output of a motion estimator and compensator, which is driven by the variable-length decoded motion vector, to produce the decoded image macroblock.]

Problem 8.28
(a) Substituting $\rho_h = 0$ into Eq. (8.2-49) and evaluating it to form the elements of R and r, we get

$$\mathbf{R} = \sigma^2\begin{bmatrix}1 & \rho \\ \rho & 1\end{bmatrix}, \qquad \mathbf{r} = \sigma^2\begin{bmatrix}\rho \\ \rho^2\end{bmatrix}.$$

(b) First form the inverse of R,

$$\mathbf{R}^{-1} = \frac{1}{\sigma^2\left(1-\rho^2\right)}\begin{bmatrix}1 & -\rho \\ -\rho & 1\end{bmatrix}.$$

Then perform the matrix multiplication of Eq. (8.2-45):

$$\boldsymbol{\alpha} = \mathbf{R}^{-1}\mathbf{r} = \frac{\sigma^2}{\sigma^2\left(1-\rho^2\right)}\begin{bmatrix}\rho\left(1-\rho^2\right) \\ 0\end{bmatrix} = \begin{bmatrix}\rho \\ 0\end{bmatrix}.$$

Thus, $\alpha_1 = \rho$ and $\alpha_2 = 0$.

(c) The variance is computed using Eq. (8.2-48):

$$\sigma_e^2 = \sigma^2 - \boldsymbol{\alpha}^T\mathbf{r} = \sigma^2 - \begin{bmatrix}\rho & 0\end{bmatrix}\sigma^2\begin{bmatrix}\rho \\ \rho^2\end{bmatrix} = \sigma^2\left(1-\rho^2\right).$$

Problem 8.29
The derivation proceeds by substituting the uniform probability function into Eqs. (8.2-57) through (8.2-59) and solving the resulting simultaneous equations with L = 4. Equation (8.2-58) yields

$$s_0 = 0, \qquad s_1 = \frac12\left(t_1+t_2\right), \qquad s_2 = \infty.$$

Substituting these values into the integrals defined by Eq. (8.2-57), we get two equations. The first is (assuming $s_1 \le A$)

$$\int_{s_0}^{s_1}(s-t_1)\,p(s)\,ds = 0$$
$$\frac{1}{2A}\int_0^{(t_1+t_2)/2}(s-t_1)\,ds = \left[\frac{s^2}{2}-t_1 s\right]_0^{(t_1+t_2)/2} = 0$$
$$(t_1+t_2)^2 - 4t_1(t_1+t_2) = 0$$
$$(t_1+t_2)(t_2-3t_1) = 0$$

so

$$t_1 = -t_2 \qquad\text{or}\qquad t_2 = 3t_1.$$

The first of these relations does not make sense, since both $t_1$ and $t_2$ must be positive; the second relationship is valid. The second integral yields (noting that $s_1$ is less than A, so the integral from A to ∞ is 0 by the definition of p(s))

$$\int_{s_1}^{s_2}(s-t_2)\,p(s)\,ds = 0$$
$$\frac{1}{2A}\int_{(t_1+t_2)/2}^{A}(s-t_2)\,ds = \left[\frac{s^2}{2}-t_2 s\right]_{(t_1+t_2)/2}^{A} = 0$$
$$4A^2 - 8At_2 - (t_1+t_2)^2 + 4t_2(t_1+t_2) = 0.$$

Substituting $t_2 = 3t_1$ from the first integral simplification into this result, we get

$$8t_1^2 - 6At_1 + A^2 = 0$$
$$\left(t_1-\frac{A}{2}\right)\left(8t_1-2A\right) = 0$$
$$t_1 = \frac{A}{2} \qquad\text{or}\qquad t_1 = \frac{A}{4}.$$

Back-substituting these values of $t_1$, we find the corresponding $t_2$ and $s_1$ values:

$$t_2 = \frac{3A}{2} \text{ and } s_1 = A \quad\text{for } t_1 = \frac{A}{2}; \qquad t_2 = \frac{3A}{4} \text{ and } s_1 = \frac{A}{2} \quad\text{for } t_1 = \frac{A}{4}.$$

Because $s_1 = A$ is not a real solution (the second integral equation would then be evaluated from A to A, yielding 0, or no equation), the solution is given by the second. That is,

$$s_0 = 0, \quad s_1 = \frac{A}{2}, \quad s_2 = \infty, \quad t_1 = \frac{A}{4}, \quad t_2 = \frac{3A}{4}.$$
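The optimality conditions behind Problem 8.29 (each reconstruction level must be the centroid of its decision interval) can be spot-checked numerically. The MATLAB sketch below is an addition of mine; it assumes a source uniform on [-A, A] and confirms that the centroid integrals vanish for the values just derived:

>> A  = 1;  t1 = A/4;  t2 = 3*A/4;  s1 = A/2;
>> c1 = integral(@(s) (s - t1)/(2*A), 0, s1)    % ~0: t1 is the centroid of [0, A/2]
>> c2 = integral(@(s) (s - t2)/(2*A), s1, A)    % ~0: t2 is the centroid of [A/2, A]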
If we assume that 12 refinements are made and that each refinement corresponds to the "differences" between one of the 12 bits in the original X-ray and the JPEG reconstructed approximation, then the compression that must be obtained per bit (to allow a 6-second average transfer time for each bit) is

$$C = \frac{4096\times4096\times1}{9.264\times10^6} = 1.81$$

where, as before, the denominator is the number of bits that can be transmitted over a T1 line in 6 seconds. Thus, the "difference" data for each bit must be compressed by a factor just less than 2. One simple way to generate the "difference information" is to XOR the actual X-ray with the reconstructed JPEG approximation. The resulting binary image will contain a 1 in every bit position at which the approximation differs from the original. If the XOR result is transmitted one bit at a time beginning with the MSB and ending with the LSB, and each bit is compressed by an average factor of 1.81:1, we will achieve the performance that is required in the problem statement. To achieve an average error-free bit-plane compression of 1.81:1 (see Section 8.2.7), the XOR data can be Gray coded, run-length coded, and finally variable-length coded. A conceptual block diagram for both the encoder and decoder is given in Fig. P8.30. Note that the decoder computes the bit refinements by XORing the decoded XOR data with the reconstructed JPEG approximation.

[Figure P8.30: the X-ray encoder JPEG-codes the input and also JPEG-decodes it, forms the 1-bit XOR of the original and the approximation, and Gray codes, run-length codes, and variable-length codes the selected bit plane before multiplexing it onto the T1 transmitter under a control/sequencer. The decoder reverses these steps; a 12-bit XOR of the decoded refinements with the reconstructed JPEG approximation, a zero-pad-and-position-bit stage, a frame buffer, and a display generator produce the X-ray to display.]

Problem 8.31
To demonstrate the equivalence of the lifting-based approach and the traditional FWT filter bank method, we simply derive general expressions for one of the odd and even outputs of the lifting algorithm of Eq. (8.2-62). For example, the Y(0) output of step 4 of the algorithm can be written as

$$Y_4(0) = Y_2(0) + \delta\left[Y_3(-1)+Y_3(1)\right] = X(0) + \beta\left[Y_1(-1)+Y_1(1)\right] + \delta\left[Y_3(-1)+Y_3(1)\right]$$

where the subscripts on the Y's have been added to identify the step of the lifting algorithm from which the value is generated. Continuing this substitution pattern from earlier steps of the algorithm until $Y_4(0)$ is a function of X's only, we get

$$\begin{aligned}Y(0) = {} & \left(1 + 2\alpha\beta + 2\alpha\delta + 6\alpha\beta\gamma\delta + 2\gamma\delta\right)X(0) \\ & + \left(\beta + 3\beta\gamma\delta + \delta\right)\left[X(1)+X(-1)\right] \\ & + \left(\alpha\beta + 4\alpha\beta\gamma\delta + \alpha\delta + \gamma\delta\right)\left[X(2)+X(-2)\right] \\ & + \beta\gamma\delta\left[X(3)+X(-3)\right] \\ & + \alpha\beta\gamma\delta\left[X(4)+X(-4)\right].\end{aligned}$$

Thus, we can form the lowpass analysis filter coefficients shown in Table P8.31-1.

Table P8.31-1
Coefficient Index   Expression                                Value
±4                  αβγδ/K                                    0.026748757
±3                  βγδ/K                                     -0.016864118
±2                  (αβ + 4αβγδ + αδ + γδ)/K                  -0.07822326
±1                  (β + 3βγδ + δ)/K                          0.26686411
0                   (1 + 2αβ + 2αδ + 6αβγδ + 2γδ)/K           0.60294901

Here, the coefficient expressions are taken directly from our expansion of Y(0), and the division by K is in accordance with step 6 of Eq. (8.2-62). The coefficient values in column 3 are determined by substituting the values of α, β, γ, δ, and K from the text into the expressions of column 2.
A similar derivation beginning with

$$Y_3(1) = Y_1(1) + \gamma\left[Y_2(0)+Y_2(2)\right]$$

yields

$$Y(1) = \left(\alpha + 3\alpha\beta\gamma + \gamma\right)\left[X(0)+X(2)\right] + \left(1+2\beta\gamma\right)X(1) + \beta\gamma\left[X(-1)+X(3)\right] + \alpha\beta\gamma\left[X(-2)+X(4)\right]$$

from which we can obtain the highpass analysis filter coefficients shown in Table P8.31-2. (Expanding $Y_3(1)$ in terms of the X's produces only α, β, and γ terms; the stated coefficient values are consistent with the expression $\alpha + 3\alpha\beta\gamma + \gamma$ for the X(0) and X(2) terms.)

Table P8.31-2
Coefficient Index   Expression            Value
-2                  -K(αβγ)               -0.091271762
-1                  -K(βγ)                0.057543525
0                   -K(α + 3αβγ + γ)      0.591271766
1                   -K(1 + 2βγ)           -1.115087053
2                   -K(α + 3αβγ + γ)      0.591271766
3                   -K(βγ)                0.057543525
4                   -K(αβγ)               -0.091271762

Problem 8.32
From Eq. (8.2-65) and the problem statement, we get

$$\mu_{2LL} = \mu_0 = 8, \qquad \varepsilon_{2LL} = \varepsilon_0 + 2 - 2 = \varepsilon_0 = 8.$$

Substituting these values into Eq. (8.2-64), we find that for the 2LL subband

$$\Delta_{2LL} = 2^{(8+0)-8}\left(1+\frac{8}{2^{11}}\right) = 1.00390625.$$

Here, we have assumed an 8-bit image so that $R_b = 8$. Likewise, using Eqs. (8.2-65) and (8.2-64) and Fig. 8.48 (to find the analysis gain bits for each subband), we get

$$\Delta_{2HH} = 2^{(8+2)-8}\left(1+\frac{8}{2^{11}}\right) = 4.015625$$
$$\Delta_{2HL} = \Delta_{2LH} = 2^{(8+1)-8}\left(1+\frac{8}{2^{11}}\right) = 2.0078125$$
$$\Delta_{1HH} = 2^{(8+2)-8}\left(1+\frac{8}{2^{11}}\right) = 4.015625$$
$$\Delta_{1HL} = \Delta_{1LH} = 2^{(8+1)-8}\left(1+\frac{8}{2^{11}}\right) = 2.0078125.$$

Problem 8.33
One approach is to implement Eq. (8.3-1) using Fourier transforms. Using the linearity of the Fourier transform,

$$f_w = (1-\alpha)f + \alpha w$$
$$\Im\{f_w\} = \Im\{(1-\alpha)f + \alpha w\}$$
$$F_w = (1-\alpha)\Im\{f\} + \alpha\Im\{w\}.$$

So, visible watermarking in the transform domain can be accomplished by adding a scaled (by α) version of a watermark's Fourier transform to a scaled (by 1 − α) version of an image's Fourier transform and taking the inverse transform of the sum. This is obviously more computationally demanding than the equivalent spatial domain approach [Eq. (8.3-1)].

Problem 8.34
A variety of methods for inserting invisible watermarks into the DFT coefficients of an image have been reported in the literature. Here is a simplified outline of one in which watermark insertion is done as follows:

1. Create a watermark by generating a P-element pseudo-random sequence of numbers, $\omega_1, \omega_2, \ldots, \omega_P$, taken from a Gaussian distribution with zero mean and unit variance.

2. Compute the DFT of the image to be watermarked. We assume that the transform has not been centered by pre-multiplying the image by $(-1)^{x+y}$.

3. Choose P/2 coefficients from each of the four quadrants of the DFT in the middle frequency range. This is easily accomplished by choosing coefficients in the order shown in Fig. P8.34 and skipping the first K coefficients (the low-frequency coefficients) in each quadrant.

[Figure P8.34: the four DFT quadrants, labeled I through IV, with the coefficient-selection order indicated.]

4. Insert the first half of the watermark into the chosen DFT coefficients, $c_i$ for $1 \le i \le P/2$, in quadrants I and III of the DFT using

$$c_i' = c_i\left(1 + \alpha\omega_i\right).$$

5. Insert the second half of the watermark into the chosen DFT coefficients of quadrants II and IV of the DFT in a similar manner. Note that this process maintains the symmetry of the transform of a real-valued image. In addition, constant α determines the strength of the inserted watermark.

6. Compute the inverse DFT with the watermarked coefficients replacing the unmarked coefficients.

Watermark extraction is performed as follows:

1. Locate the DFT coefficients containing the watermark by following the insertion process in the embedding algorithm.

2. Compute the watermark $\hat\omega_1, \hat\omega_2, \ldots, \hat\omega_P$ using $\hat\omega_i = \hat c_i - c_i$.

3. Compute the correlation between ω and $\hat\omega$ and compare it to a pre-determined threshold T to determine if the mark is present.
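Returning to the visible-watermarking relation of Problem 8.33, the equivalence of the spatial and transform-domain blends follows from the linearity of the DFT and is easy to demonstrate. The MATLAB fragment below is an added sketch; the image and watermark are arbitrary synthetic arrays and alpha = 0.3 is an arbitrary choice:

>> f   = magic(8);  w = ones(8);  alpha = 0.3;
>> fw1 = (1 - alpha)*f + alpha*w;                 % spatial-domain blend, Eq. (8.3-1)
>> Fw  = (1 - alpha)*fft2(f) + alpha*fft2(w);     % blend of the Fourier transforms
>> fw2 = real(ifft2(Fw));                         % back to the spatial domain
>> max(abs(fw1(:) - fw2(:)))                      % ~1e-15: the two results agree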
A simple example that has been reported in the literature uses the following watermark insertion technique:

1. Create a watermark by generating a 1000-element pseudo-random sequence of numbers, $\omega_1, \omega_2, \ldots, \omega_{1000}$, taken from a Gaussian distribution with zero mean and unit variance.

2. Compute the (L+1)-level DWT of the image to be watermarked. Choose L so that the number of approximation coefficients at level J − L is about 1000. Recall from Chapter 7 that these coefficients were denoted $W_\varphi(J-L,m,n)$, where $J = \log_2 N$ and the image is of size N × N. For the purposes of this algorithm, we will call the selected coefficients $c_i$ for $1 \le i \le 1000$.

3. Compute the average $\bar c$ of the selected approximation coefficients.

4. Embed watermark ω into the selected approximation coefficients using

$$c_i' = \bar c + \left[c_i - \bar c\right]\left(1 + \alpha\omega_i\right)$$

where α determines the "intensity" of the watermark.

5. Compute the watermarked image by taking the inverse DWT with the marked approximation coefficients replacing the original unmarked coefficients.

Watermark detection is accomplished as follows:

1. Compute the (L+1)-level DWT of the image in question.

2. Compute the average $\bar{\hat c}$ and standard deviation $\sigma(\hat c)$ of the level J − L approximation coefficients $\hat c_1, \hat c_2, \ldots, \hat c_{1000}$.

3. Extract watermark $\hat\omega$ using

$$\hat\omega_i = \frac{\left(\hat c_i - \bar{\hat c}\right)\dfrac{\sigma(c)}{\sigma(\hat c)} - \left(c_i - \bar c\right)}{c_i - \bar c}.$$

4. Compute the correlation between ω and $\hat\omega$ and compare it to a pre-determined threshold T to determine if the mark is present.

Chapter 9
Problem Solutions

Problem 9.1
(a) Converting a rectangular to a hexagonal grid basically requires that even and odd lines be displaced horizontally with respect to each other by one-half the horizontal distance between adjacent pixels (see the figure in the problem statement). Because in a rectangular grid there are no pixel values defined at the new locations, a rule must be specified for their creation. A simple approach is to double the image resolution in both dimensions by interpolation (see Section 2.4.4). Then, the appropriate 6-connected points are picked out of the expanded array. The resolution of the new image will be the same as the original (but the former will be slightly blurred due to interpolation). Figure P9.1(a) illustrates this approach. The black points are the original pixels and the white points are the new points created by interpolation. The squares are the image points picked for the hexagonal grid arrangement.

(b) Rotations in a 6-neighbor arrangement are invariant in 60° increments.

(c) Yes. Ambiguities arise when there is more than one path that can be followed from one 6-connected pixel to another. Figure P9.1(b) shows an example, in which the 6-connected points of interest are in black.

[Figure P9.1: (a) the interpolated grid and the points selected for the hexagonal arrangement; (b) an example of an ambiguous multi-path configuration.]

Problem 9.2
(a) With reference to the discussion in Section 2.5.2, m-connectivity is used to avoid the multiple paths that are inherent in 8-connectivity. In one-pixel-thick, fully connected boundaries, these multiple paths manifest themselves in the four basic patterns shown in Fig. P9.2(a). The solution to the problem is to use the hit-or-miss transform to detect the patterns and then to change the center pixel to 0, thus eliminating the multiple paths. A basic sequence of morphological steps to accomplish this is as follows:

$$X_1 = A \circledast B^1, \qquad Y_1 = A \cap X_1^c$$
$$X_2 = Y_1 \circledast B^2, \qquad Y_2 = Y_1 \cap X_2^c$$
$$X_3 = Y_2 \circledast B^3, \qquad Y_3 = Y_2 \cap X_3^c$$
$$X_4 = Y_3 \circledast B^4, \qquad Y_4 = Y_3 \cap X_4^c$$

where A is the input image containing the boundary.

(b) Only one pass is required.
Application of the hit-or-miss transform using a given $B^i$ finds all instances of occurrence of the pattern described by that structuring element.

(c) The order does matter. For example, consider the sequence of points in Fig. P9.2(b), and assume that we are traveling from left to right. If $B^1$ is applied first, point a will be deleted and point b will remain after application of all other structuring elements. If, on the other hand, $B^3$ is applied first, point b will be deleted and point a will remain. Thus, we would end up with different (but acceptable) m-paths.

[Figure P9.2: (a) the four basic multiple-path patterns; (b) an example sequence of points.]

Problem 9.3
See Fig. P9.3. Keep in mind that erosion is the set formed from the locations of the origin of the structuring element such that the structuring element is contained within the set being eroded.

[Figure P9.3: the erosion results.]

Problem 9.4
(a) Erosion is set intersection. The intersection of two convex sets is convex also.

(b) See Fig. P9.4(a). Keep in mind that the digital sets in question are the larger black dots. The lines are shown for convenience in visualizing what the continuous sets would be; they are not part of the sets being considered here. The result of dilation in this case is not convex because the center point is not in the set.

(c) See Fig. P9.4(b). Here, we see that the lower right point is not connected to the others.

(d) See Fig. P9.4(c). The two inner points are not in the set.

[Figure P9.4: the counterexamples for (b) through (d).]

Problem 9.5
Refer to Fig. P9.5. The center of each structuring element is shown as a black dot.

(a) This solution was obtained by eroding the original set (shown dashed) with the structuring element shown (note that the origin is at the bottom right).

(b) This solution was obtained by eroding the original set with the tall rectangular structuring element shown.

(c) This solution was obtained by first eroding the image shown down to two vertical lines using the rectangular structuring element (note that this element is slightly taller than the center section of the "U" figure). This result was then dilated with the circular structuring element.

(d) This solution was obtained by first dilating the original set with the large disk shown. The dilated image was then eroded with a disk whose diameter was equal to one-half the diameter of the disk used for dilation.

[Figure P9.5: the solutions for (a) through (d).]

Problem 9.6
The solutions to (a) through (d) are shown from top to bottom in Fig. P9.6.

[Figure P9.6: the four results.]

Problem 9.7
(a) The dilated image will grow without bound.

(b) A one-element set (i.e., a one-pixel image).

Problem 9.8
(a) The image will erode to one element.

(b) The smallest set that contains the structuring element.

Problem 9.9
The proof, which consists of showing that

$$\left\{x \in Z^2 \mid x+b \in A \text{ for every } b \in B\right\} \equiv \left\{x \in Z^2 \mid (B)_x \subseteq A\right\},$$

follows directly from the definition of translation, because the set $(B)_x$ has elements of the form $x+b$ for $b \in B$. That is, $x+b \in A$ for every $b \in B$ implies that $(B)_x \subseteq A$. Conversely, $(B)_x \subseteq A$ implies that all elements of $(B)_x$ are contained in A, or $x+b \in A$ for every $b \in B$.

Problem 9.10
(a) Let $x \in A \ominus B$. Then, from the definition of erosion given in the problem statement, for every $b \in B$, $x+b \in A$. But $x+b \in A$ implies that $x \in (A)_{-b}$. Thus, for every $b \in B$, $x \in (A)_{-b}$, which implies that $x \in \bigcap_{b\in B}(A)_{-b}$. Suppose now that $x \in \bigcap_{b\in B}(A)_{-b}$. Then, for every $b \in B$, $x \in (A)_{-b}$, so for every $b \in B$, $x+b \in A$, which, from the definition of erosion, means that $x \in A \ominus B$.
(b) Suppose that $x \in A \ominus B = \bigcap_{b\in B}(A)_{-b}$. Then, for every $b \in B$, $x \in (A)_{-b}$, or $x+b \in A$. But, as shown in Problem 9.9, $x+b \in A$ for every $b \in B$ implies that $(B)_x \subseteq A$, so that $x \in A \ominus B = \left\{x \in Z^2 \mid (B)_x \subseteq A\right\}$. Similarly, $(B)_x \subseteq A$ implies that all elements of $(B)_x$ are contained in A, or $x+b \in A$ for every $b \in B$; or, as in (a), $x+b \in A$ implies that $x \in (A)_{-b}$. Thus, if for every $b \in B$, $x \in (A)_{-b}$, then $x \in \bigcap_{b\in B}(A)_{-b}$.

Problem 9.11
The approach is to prove that

$$\left\{x \in Z^2 \mid (\hat B)_x \cap A \ne \emptyset\right\} \equiv \left\{x \in Z^2 \mid x = a+b \text{ for some } a \in A \text{ and } b \in B\right\}.$$

The elements of $(\hat B)_x$ are of the form $x-b$ for $b \in B$. The condition $(\hat B)_x \cap A \ne \emptyset$ implies that for some $b \in B$, $x-b \in A$, or $x-b = a$ for some $a \in A$ (note that this gives $x = a+b$). Conversely, if $x = a+b$ for some $a \in A$ and $b \in B$, then $x-b = a$, or $x-b \in A$, which implies that $(\hat B)_x \cap A \ne \emptyset$.

Problem 9.12
(a) Suppose that $x \in A \oplus B$. Then, for some $a \in A$ and $b \in B$, $x = a+b$. Thus, $x \in (A)_b$ and, therefore, $x \in \bigcup_{b\in B}(A)_b$. On the other hand, suppose that $x \in \bigcup_{b\in B}(A)_b$. Then, for some $b \in B$, $x \in (A)_b$. However, $x \in (A)_b$ implies that there exists an $a \in A$ such that $x = a+b$. But, from the definition of dilation given in the problem statement, $a \in A$, $b \in B$, and $x = a+b$ imply that $x \in A \oplus B$.

(b) Suppose that $x \in \bigcup_{b\in B}(A)_b$. Then, for some $b \in B$, $x \in (A)_b$. However, $x \in (A)_b$ implies that there exists an $a \in A$ such that $x = a+b$. But, if $x = a+b$ for some $a \in A$ and $b \in B$, then $x-b = a$, or $x-b \in A$, which implies that $x \in \left\{x \mid (\hat B)_x \cap A \ne \emptyset\right\}$. Now suppose that $x \in \left\{x \mid (\hat B)_x \cap A \ne \emptyset\right\}$. The condition $(\hat B)_x \cap A \ne \emptyset$ implies that for some $b \in B$, $x-b \in A$, or $x-b = a$ (i.e., $x = a+b$) for some $a \in A$. But, if $x = a+b$ for some $a \in A$ and $b \in B$, then $x \in (A)_b$ and, therefore, $x \in \bigcup_{b\in B}(A)_b$.

Problem 9.13
From the definition of dilation, Eq. (9.2-3),

$$(A \oplus B)^c = \left\{z \mid (\hat B)_z \cap A \ne \emptyset\right\}^c.$$

The complement of the set of z's that satisfy $(\hat B)_z \cap A \ne \emptyset$ is the set of z's such that $(\hat B)_z \cap A = \emptyset$. In turn, $(\hat B)_z \cap A = \emptyset$ implies that $(\hat B)_z$ is contained in $A^c$. That is, $(\hat B)_z \subseteq A^c$, and we can write

$$(A \oplus B)^c = \left\{z \mid (\hat B)_z \subseteq A^c\right\} = A^c \ominus \hat B$$

where the second step follows from the definition of erosion, Eq. (9.2-1). This completes the proof.

Problem 9.14
Starting with the definition of closing,

$$(A \bullet B)^c = \left[(A \oplus B) \ominus B\right]^c = (A \oplus B)^c \oplus \hat B = \left(A^c \ominus \hat B\right) \oplus \hat B = A^c \circ \hat B.$$

The proof of the other duality property follows a similar approach.
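The dualities of Problems 9.13 and 9.14 can be confirmed numerically on arbitrary binary images. The MATLAB sketch below is an addition of mine (it requires the Image Processing Toolbox, which this manual already uses in Problem 9.17):

>> A    = rand(64) > 0.5;                  % a random binary set
>> B    = [1 1 0; 1 1 1; 0 1 1];           % an asymmetric structuring element
>> Bhat = rot90(B, 2);                     % reflection of B about its origin
>> lhs  = ~imdilate(A, B);                 % (A dilated by B), complemented
>> rhs  = imerode(~A, Bhat);               % complement eroded by the reflected SE
>> isequal(lhs, rhs)                       % 1: (A + B)^c = A^c eroded by B-hat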
Problem 9.15
(a) Erosion of a set A by B is defined as the set of all values of translates, z, of B such that $(B)_z$ is contained in A. If the origin of B is contained in B, then the set of points describing the erosion is simply all the possible locations of the origin of B such that $(B)_z$ is contained in A. Then it follows from this interpretation (and the definition of erosion) that erosion of A by B is a subset of A. Similarly, dilation of a set C by B is the set of all locations of the origin of $\hat B$ such that the intersection of C and $(\hat B)_z$ is not empty. If the origin of B is contained in B, this implies that C is a subset of the dilation of C by B. From Eq. (9.3-1), we know that $A \circ B = (A \ominus B) \oplus B$. Let C denote the erosion of A by B. It was already established that C is a subset of A. From the preceding discussion, we know also that C is a subset of the dilation of C by B. But C is a subset of A, so the opening of A by B (the erosion of A by B followed by a dilation of the result) is a subset of A.

(b) From Eq. (9.3-3),

$$C \circ B = \bigcup\left\{(B)_z \mid (B)_z \subseteq C\right\} \qquad\text{and}\qquad D \circ B = \bigcup\left\{(B)_z \mid (B)_z \subseteq D\right\}.$$

Therefore, if $C \subseteq D$, it follows that $C \circ B \subseteq D \circ B$.

(c) From (a), $(A \circ B) \circ B \subseteq (A \circ B)$. From the definition of opening,

$$(A \circ B) \circ B = \left[(A \circ B) \ominus B\right] \oplus B = \left[\left[(A \ominus B) \oplus B\right] \ominus B\right] \oplus B = \left[(A \ominus B) \bullet B\right] \oplus B \supseteq (A \ominus B) \oplus B \supseteq A \circ B.$$

But the only way that $(A \circ B) \circ B \subseteq (A \circ B)$ and $(A \circ B) \circ B \supseteq (A \circ B)$ can hold is if $(A \circ B) \circ B = (A \circ B)$. The next-to-last step in the preceding sequence follows from the fact that the closing of a set by another contains the original set [this is from Problem 9.16(a)].

Problem 9.16
(a) From Problem 9.14, $(A \bullet B)^c = A^c \circ \hat B$, and, from Problem 9.15(a), it follows that $(A \bullet B)^c = A^c \circ \hat B \subseteq A^c$. Taking the complement of both sides of this relation reverses the inclusion sign, and we have that $A \subseteq (A \bullet B)$, as desired.

(b) From Problem 9.15(b), if $D^c \subseteq C^c$, then $D^c \circ \hat B \subseteq C^c \circ \hat B$, where we used $D^c$, $C^c$, and $\hat B$ instead of C, D, and B. From Problem 9.14, $(C \bullet B)^c = C^c \circ \hat B$ and $(D \bullet B)^c = D^c \circ \hat B$. Therefore, if $D^c \subseteq C^c$, then $(D \bullet B)^c \subseteq (C \bullet B)^c$. Taking complements reverses the inclusion, so we have that if $C \subseteq D$, then $(C \bullet B) \subseteq (D \bullet B)$, as desired.

(c) Using results from Problems 9.14 and 9.15,

$$(A \bullet B) \bullet B = \left[\left[(A \bullet B)^c \circ \hat B\right]\right]^c = \left[\left(A^c \circ \hat B\right) \circ \hat B\right]^c = \left[A^c \circ \hat B\right]^c = \left[(A \bullet B)^c\right]^c = (A \bullet B)$$

where the third step follows from Problem 9.15(c) and the fourth step follows from Problem 9.14.

Problem 9.17
Figure P9.17 shows the solution. Although the images shown could be sketched by hand, they were done in MATLAB for clarity of presentation. The MATLAB code follows:

>> f = imread('FigProb0917.tif');
>> se = strel('disk', 11, 0);   % structuring element
>> fa = imerode(f, se);
>> fb = imdilate(fa, se);
>> fc = imdilate(fb, se);
>> fd = imerode(fc, se);

[Figure P9.17: the four result images, (a) through (d).]

The size of the original image is 648 × 624 pixels. A disk structuring element of radius 11 was used. This structuring element was just large enough to encompass each noise element, as given in the problem statement. The images shown in Fig. P9.17 are: (a) erosion of the original, (b) dilation of the result, (c) another dilation, and finally (d) an erosion. The main points we are looking for from the student's answer are the following. The first erosion should take out all noise elements that do not touch the rectangle, should increase the size of the noise elements completely contained within the rectangle, and should decrease the size of the rectangle. If worked by hand, the student may or may not realize that some "imperfections" are left along the boundary of the object. We do not consider this an important issue because it is scale-dependent, and nothing is said in the problem statement about this. The first dilation should shrink the noise components that were increased in erosion, should increase the size of the rectangle, and should round the corners. The next dilation should eliminate the internal noise components completely and further increase the size of the rectangle. The final erosion (bottom right) should then decrease the size of the rectangle. The rounded corners in the final answer are an important point that should be recognized by the student.

Problem 9.18
It was possible to reconstruct the three large squares to their original size because they were not completely eroded and the geometry of the objects and structuring element was the same (i.e., they were squares). This also would have been true if the objects and structuring elements were rectangular. However, a complete reconstruction, for instance, by dilating a rectangle that was partially eroded by a circle, would not be possible.
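The idempotence results of Problems 9.15(c) and 9.16(c) are easy to confirm numerically. The following MATLAB fragment is an added check (any binary image and structuring element will do):

>> f  = rand(128) > 0.7;                   % a random binary image
>> se = strel('disk', 3);
>> o1 = imopen(f, se);                     % one opening
>> isequal(o1, imopen(o1, se))             % 1: opening again changes nothing
>> c1 = imclose(f, se);                    % closing is idempotent as well
>> isequal(c1, imclose(c1, se))            % 1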
Problem 9.19
Select a one-pixel border around the structuring element (subimage of the T), and let the origin be located at the horizontal/vertical midpoint of this subimage. The result of applying the hit-or-miss transform would be a single point where the two T's were in perfect registration. The location of the point would be the same as the origin of the structuring element.

Problem 9.20
The key difference between the Lake and the other two features is that the former forms a closed contour. Assuming that the shapes are processed one at a time, a basic two-step approach for differentiating between the three shapes is as follows:

Step 1. Apply an end-point detector to the object. If no end points are found, the object is a Lake. Otherwise it is a Bay or a Line.

Step 2. There are numerous ways to differentiate between a Bay and a Line. One of the simplest is to determine a line joining the two end points of the object. If the AND of the object and this line contains only two points, the figure is a Bay. Otherwise it is a Line. There are pathological cases in which this test will fail, and additional "intelligence" needs to be built into the process, but these pathological cases become less probable with increasing resolution of the thinned figures.

Problem 9.21
(a) The entire image would be filled with 1's.

(b) The background would be filled with 1's.

(c) See Fig. P9.21.

[Figure P9.21: the result for part (c).]

Problem 9.22
(a) With reference to the example shown in Fig. P9.22, the boundary that results from using the structuring element in Fig. 9.15(c) generally forms an 8-connected path (leftmost figure), whereas the boundary resulting from the structuring element in Fig. 9.13(b) forms a 4-connected path (rightmost figure).

[Figure P9.22: the 8-connected and 4-connected boundary results.]

(b) Using a 3 × 3 structuring element of all 1's would introduce corner pixels into segments characterized by diagonally connected pixels. For example, square (2,2) in Fig. 9.15(e) would be a 1 instead of a 0. That value of 1 would carry all the way to the final result in Fig. 9.15(i). There would be other 1's introduced that would turn Fig. 9.15(i) into a much more distorted object.
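A small MATLAB sketch of boundary extraction, added here to make the connectivity contrast of Problem 9.22 concrete; it computes β(A) = A − (A eroded by B) for two choices of B (the pairing of SE shape with boundary connectivity follows the discussion above, and the test image is an arbitrary solid square):

>> A  = false(32);  A(8:24, 8:24) = true;   % a solid square
>> B4 = [0 1 0; 1 1 1; 0 1 0];              % cross-shaped SE (4-neighborhood)
>> B8 = ones(3);                            % 3 x 3 SE of all 1's (8-neighborhood)
>> b8 = A & ~imerode(A, B4);                % boundary that forms an 8-connected path
>> b4 = A & ~imerode(A, B8);                % boundary that forms a 4-connected path

On shapes with diagonal edges, b8 and b4 differ exactly in the corner pixels discussed in part (b).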
This approach works in this case because the objects are spherical, which implies that they have small areas of contact. To handle the case of spheres touching the border of the image, we simply set all border points to black. We then proceed to find all background points. To do this, we pick a point on the boundary of the image (which we know is black due to preprocessing) and find all black points connected to it using a connected component algorithm (Section 9.5.3). These connected components are labeled with a value different from 1 or 0. The remaining black points are interior to the spheres. We can fill all spheres with white by applying the hole-filling algorithm in Section 9.5.2 until all such interior black points have been turned into white points. The alert student will realize that if the interior points are already known, they can all simply be turned into white points, thus filling the spheres without having to do region filling as a separate procedure. Note that the erosion of white areas makes the black areas interior to the spheres grow, so the possibility exists that such an area near the border of a sphere could grow into the background. This issue introduces further complications that the student may not have the tools to solve yet. We recommend making the assumption that the interior black areas are small and near the center. Recognition of the potential problem by the student should be sufficient in the context of this problem.

Problem 9.24
Denote the original image by A. Create an image B of the same size as the original, but consisting of all 0's. Choose an arbitrary point labeled 1 in A, call it $p_1$, and apply the connected component algorithm. When the algorithm converges, a connected component has been detected. Label and copy into B the set of all points in A belonging to the connected component just found, set those points to 0 in A, and call the modified image $A_1$. Choose an arbitrary point labeled 1 in $A_1$, call it $p_2$, and repeat the procedure just given. If there are K connected components in the original image, this procedure will result in an image consisting of all 0's after K applications, and image B will contain K labeled connected components.

Problem 9.25
From Section 9.5.9, the following expression is an image H in which all the holes in image f have been filled:

$$H = \left[R^D_{f^c}(F)\right]^c.$$

The only difference between this expression and the original image f is that the holes have been filled. Therefore, the intersection of $f^c$ and H is an image containing only the (filled) holes. The complement of that result is an image containing only the holes:

$$f_{holes} = \left[f^c \cap \left[R^D_{f^c}(F)\right]^c\right]^c.$$

Figure P9.25 shows the holes extracted from Fig. 9.31(a). Keep in mind that the holes in the original image are black; that is the reason for the complement in the preceding equation.

[Figure P9.25: the extracted holes.]

Problem 9.26
(a) If all the border points are 1, then F in Eq. (9.5-28) will be all 0's. The dilation of F by B will also be all 0's, and the intersection of this result with $f^c$ will be all 0's as well. Then H, which is the complement, will be all 1's.

(b) If all the points in the border of f are 1, that means that the interior of the entire image is a hole, so H in (a) is the correct result; that is, the entire image should be filled with 1's. This algorithm is not capable of detecting holes within holes, so the result is as expected.
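The hole-extraction relation of Problem 9.25 can be tried directly in MATLAB. The sketch below is mine; it uses imfill, which carries out the morphological-reconstruction fill, on a synthetic object with a single square hole:

>> f      = false(64);  f(16:48, 16:48) = true;   % a white object ...
>> f(28:36, 28:36) = false;                       % ... with a black 9 x 9 hole
>> filled = imfill(f, 'holes');                   % H: the holes filled
>> holes  = filled & ~f;                          % the holes alone
>> nnz(holes)                                     % 81 pixels: the 9 x 9 hole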
Problem 9.27
Erosion is the set of points z such that B, translated by z, is contained in A. If B is a single point, this definition will be satisfied only by the points comprising A, so erosion of A by B is simply A. Similarly, dilation is the set of points z such that $\hat B$ ($\hat B = B$ in this case), translated by z, overlaps A by at least one point. Because B is a single point, the only set of points that satisfy this definition is the set of points comprising A, so the dilation of A by B is A.

Problem 9.28
The dark image structures that are completely filled by morphological closing remain closed after the reconstruction.

Problem 9.29
Consider first the case for n = 1:

$$E_G^{(1)}(F) = \left[\left[E_G^{(1)}(F)\right]^c\right]^c = \left[\left[(F\ominus B)\cup G\right]^c\right]^c = \left[(F\ominus B)^c \cap G^c\right]^c = \left[\left(F^c\oplus\hat B\right)\cap G^c\right]^c = \left[\left(F^c\oplus B\right)\cap G^c\right]^c = \left[D_{G^c}^{(1)}(F^c)\right]^c$$

where the third step follows from DeMorgan's law, $(A\cup B)^c = A^c\cap B^c$, the fourth step follows from the duality property of erosion and dilation (see Section 9.2.3), the fifth step follows from the symmetry of the SE, and the last step follows from the definition of geodesic dilation. The next step, $E_G^{(2)}(F)$, would involve the geodesic erosion of size 1 of the above result. But that result is simply a set, so we can obtain it in terms of dilation: complement the result just mentioned, complement G, compute the geodesic dilation of size 1 of the two, and complement the result. Continuing in this manner, we conclude that

$$E_G^{(n)}(F) = \left[D_{G^c}^{(1)}\left(\left[E_G^{(n-1)}(F)\right]^c\right)\right]^c = \left[D_{G^c}^{(1)}\left(D_{G^c}^{(n-1)}(F^c)\right)\right]^c.$$

Similarly,

$$D_G^{(1)}(F) = \left[\left[D_G^{(1)}(F)\right]^c\right]^c = \left[\left[(F\oplus B)\cap G\right]^c\right]^c = \left[(F\oplus B)^c\cup G^c\right]^c = \left[\left(F^c\ominus\hat B\right)\cup G^c\right]^c = \left[\left(F^c\ominus B\right)\cup G^c\right]^c = \left[E_{G^c}^{(1)}(F^c)\right]^c.$$

As before,

$$D_G^{(n)}(F) = \left[E_{G^c}^{(1)}\left(\left[D_G^{(n-1)}(F)\right]^c\right)\right]^c = \left[E_{G^c}^{(1)}\left(E_{G^c}^{(n-1)}(F^c)\right)\right]^c.$$

Problem 9.30

$$R_G^D(F) = D_G^{(k)}(F) = \left[E_{G^c}^{(1)}\left(E_{G^c}^{(k-1)}(F^c)\right)\right]^c = \left[E_{G^c}^{(k)}(F^c)\right]^c = \left[R_{G^c}^E(F^c)\right]^c$$

where we used the result from Problem 9.29. The other duality property is proved in a similar manner.

Problem 9.31
(a) Consider the case when n = 2:

$$\left[F\ominus 2B\right]^c = \left[(F\ominus B)\ominus B\right]^c = (F\ominus B)^c\oplus\hat B = F^c\oplus\hat B\oplus\hat B = F^c\oplus 2\hat B$$

where the second and third steps follow from the duality property in Eq. (9.2-5). For an arbitrary number of erosions,

$$\left[F\ominus nB\right]^c = \left[\left(F\ominus(n-1)B\right)\ominus B\right]^c = \left[F\ominus(n-1)B\right]^c\oplus\hat B$$

which, when expanded, yields

$$\left[F\ominus nB\right]^c = F^c\oplus n\hat B.$$

(b) Proved in a similar manner.

Problem 9.32

$$O_R^{(n)}(F) = R_F^D\left(F\ominus nB\right) = \left[R_{F^c}^E\left(\left[F\ominus nB\right]^c\right)\right]^c = \left[R_{F^c}^E\left(F^c\oplus n\hat B\right)\right]^c = \left[R_{F^c}^E\left(F^c\oplus nB\right)\right]^c = \left[C_R^{(n)}(F^c)\right]^c$$

where the second step follows from the duality of reconstruction by dilation (Problem 9.30), the third step follows from the result in Problem 9.31, the fourth step follows from the symmetry of B, and the last step follows from the definition of closing by reconstruction. The other duality property can be proved in a similar manner.
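Opening by reconstruction, the operation manipulated in Problems 9.30 and 9.32, can be demonstrated in a few lines of MATLAB. The sketch below is an addition of mine; imreconstruct performs the iterated geodesic dilation, and cameraman.tif is just a convenient test image that ships with the Image Processing Toolbox:

>> f      = imread('cameraman.tif');       % any 8-bit gray-scale image works
>> se     = strel('disk', 5);
>> marker = imerode(f, se);                % the erosion F eroded by nB
>> obr    = imreconstruct(marker, f);      % opening by reconstruction of f
>> o      = imopen(f, se);                 % plain opening, for comparison

Unlike the plain opening, the reconstruction restores the exact shapes of the objects that survive the erosion, which is the behavior described in Problem 9.28.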
Problem 9.33
(a) From Eq. (9.6-1),

$$[f \ominus b]^c = \left[\min_{(s,t)\in b} f(x+s,\,y+t)\right]^c = \left[-\max_{(s,t)\in b}\{-f(x+s,\,y+t)\}\right]^c = \max_{(s,t)\in b}\{-f(x+s,\,y+t)\} = (-f) \oplus \hat{b} = f^c \oplus \hat{b}.$$

The second step follows from the definition of the complement of a gray-scale function; that is, the minimum of a set of numbers is equal to the negative of the maximum of the negatives of those numbers. The third step follows from the definition of the complement. The fourth step follows from the definition of gray-scale dilation in Eq. (9.6-2), using the fact that $\hat{b}(x,y) = b(-x,-y)$. The last step follows from the definition of the complement, $-f = f^c$. The other duality property is proved in a similar manner.

(b) We prove the second duality property:

$$[f \circ b]^c = \left[(f \ominus b) \oplus b\right]^c = [f \ominus b]^c \ominus \hat{b} = (f^c \oplus \hat{b}) \ominus \hat{b} = f^c \bullet \hat{b}.$$

The second and third steps follow from the duality property of dilation and erosion, and the last step follows from the definition of closing in Eq. (9.6-8). The other property in the problem statement is proved in a similar manner.

(c) We prove the first duality property. Start with a geodesic dilation of size 1:

$$D^{(1)}_g(f) = \left\{\left[D^{(1)}_g(f)\right]^c\right\}^c = \left\{\left[(f \oplus b) \wedge g\right]^c\right\}^c = \left\{\left[-\left(-(f \oplus b) \vee -g\right)\right]^c\right\}^c = \left[-(f \oplus b) \vee -g\right]^c = \left[(f \oplus b)^c \vee g^c\right]^c = \left[(f^c \ominus b) \vee g^c\right]^c = \left[E^{(1)}_{g^c}(f^c)\right]^c.$$

The second step follows from the definition of geodesic dilation. The third step follows from the fact that the point-wise minimum of two sets of numbers is the negative of the point-wise maximum of their negatives. The fourth and fifth steps follow from the definition of the complement. The sixth step follows from the duality of dilation and erosion (we used the given fact that $\hat{b} = b$). The last step follows from the definition of geodesic erosion.

The next step in the iteration, $D^{(2)}_g(f)$, would involve the geodesic dilation of size 1 of the preceding result. But that result is simply a set, so we could obtain it in terms of erosion. That is, we would complement the result just mentioned, complement g, compute the geodesic erosion of the two, and complement the result. Continuing in this manner we conclude that

$$D^{(n)}_g(f) = \left\{E^{(1)}_{g^c}\left(E^{(n-1)}_{g^c}(f^c)\right)\right\}^c.$$

The other property is proved in a similar way.

(d) We prove the first property:

$$R^D_g(f) = D^{(k)}_g(f) = \left\{E^{(1)}_{g^c}\left(E^{(k-1)}_{g^c}(f^c)\right)\right\}^c = \left[E^{(k)}_{g^c}(f^c)\right]^c = \left[R^E_{g^c}(f^c)\right]^c.$$

The other property is proved in a similar manner.

(e) We prove the first property. Consider the case when n = 2:

$$[f \ominus 2b]^c = \left[(f \ominus b) \ominus b\right]^c = [f \ominus b]^c \oplus \hat{b} = f^c \oplus \hat{b} \oplus \hat{b} = f^c \oplus 2\hat{b}$$

where the second and third lines follow from the duality property in Eq. (9.6-5). For an arbitrary number of erosions,

$$[f \ominus nb]^c = \left[(f \ominus (n-1)b) \ominus b\right]^c = [f \ominus (n-1)b]^c \oplus \hat{b}$$

which, when expanded, will yield $[f \ominus nb]^c = f^c \oplus n\hat{b}$. The other property is proved in a similar manner.

(f) We prove the first property:

$$O^{(n)}_R(f) = R^D_f(f \ominus nb) = \left\{R^E_{f^c}\left([f \ominus nb]^c\right)\right\}^c = \left[R^E_{f^c}(f^c \oplus n\hat{b})\right]^c = \left[R^E_{f^c}(f^c \oplus nb)\right]^c = \left[C^{(n)}_R(f^c)\right]^c$$

where the second step follows from the duality of reconstruction by dilation given in (d), the third step follows from the result in (e), the fourth step follows from the symmetry of b, and the last step follows from the definition of closing by reconstruction. The other duality property can be proved in a similar manner.

Problem 9.34
The method is predicated on image openings and closings, having nothing to do with spatial arrangement. Because the blobs do not touch, there is no difference between the fundamental arrangement in Fig. 9.43 and the figure in this problem. The steps to the solution will be the same. The one thing to watch is that the SEs used to remove the small blobs do not remove or severely attenuate large blobs and thus open a potential path to the boundary of the image larger than the diameter of the large blobs. In this case, disks of radius 30 and 60 (the same as those used in Fig. 9.43) do the job properly. The solution images are shown in Fig. P9.34. The explanation is the same as for Fig. 9.43. The MATLAB code used to generate the solution follows.

>> f = imread('FigP0934(blobs_in_circular_arrangement).tif');
>> figure, imshow(f)
>> % Remove small blobs.
>> fsm = imclose(f, strel('disk',30));
>> figure, imshow(fsm)
>> % Remove large blobs.
>> flrg = imopen(fsm, strel('disk',60));
>> figure, imshow(flrg)
>> % Use a morphological gradient to obtain the boundary
>> % between the regions.
>> se = ones(3);
>> grad = imsubtract(imdilate(flrg, se), imerode(flrg, se));
>> figure, imshow(grad)
>> % Superimpose the boundary on the original image.
>> idx = find(grad > 0);
>> final = f;
>> final(idx) = 255;
>> figure, imshow(final)

Problem 9.35
(a) The noise spikes are of the general form shown in Fig. P9.35(a), with other possibilities in between. The amplitude is irrelevant in this case; only the shape of the noise spikes is of interest. To remove these spikes we perform an opening with a cylindrical structuring element of radius greater than $R_{max}$, as shown in Fig. P9.35(b). Note that the shape of the structuring element is matched to the known shape of the noise spikes.
(b) The basic solution is the same as in (a), but now we have to take into account the various possible overlapping geometries shown in Fig. P9.35(c). A structuring element like the one used in (a), but with radius slightly larger than $4R_{max}$, will do the job. Note in (a) and (b) that other parts of the image would be affected by this approach. The bigger $R_{max}$, the bigger the structuring element that would be needed and, consequently, the greater the effect on the image as a whole.

Problem 9.36
(a) Color the image border pixels the same color as the particles (white). Call the resulting set of border pixels B. Apply the connected component algorithm (Section 9.5.3). All connected components that contain elements from B are particles that have merged with the border of the image.
(b) It is given that all particles are of the same size (this is done to simplify the problem; more general analysis requires tools from Chapter 11). Determine the area (number of pixels) of a single particle; denote the area by A. Eliminate from the image the particles that were merged with the border of the image. Apply the connected component algorithm. Count the number of pixels in each component. A component is then designated as a single particle if the number of pixels is less than or equal to A + ε, where ε is a small quantity added to account for variations in size due to noise.
(c) Subtract from the image the single particles and the particles that have merged with the border; the remaining particles are the overlapping particles.

Problem 9.37
As given in the problem statement, interest lies in deviations from round in the inner and outer boundaries of the washers. It is stated also that we can ignore errors due to digitizing and positioning. This means that the imaging system has enough resolution so that objectionable artifacts will not be introduced as a result of digitization. The mechanical accuracy similarly tells us that no appreciable errors will be introduced as a result of positioning. This is important if we want to do matching without having to register the images.

The first step in the solution is the specification of an illumination approach. Because we are interested in boundary defects, the method of choice is a backlighting system that will produce a binary image. We are assured from the problem statement that the illumination system has enough resolution so that we can ignore defects due to digitizing. The next step is to specify a comparison scheme. The simplest way to match binary images is to AND one image with the complement of the other. Here, we match the input binary image with the complement of the golden image (this is more efficient than computing the complement of each input image and comparing it to the golden image). If the images are identical (and perfectly registered), the result of the AND operation will be all 0s. Otherwise, there will be 1s in the areas where the two images do not match. Note that this requires that the images be of the same size and be registered; thus the assumption of mechanical accuracy given in the problem statement.
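As a concrete MATLAB sketch of this comparison (our own illustration; the file names are hypothetical, the images are assumed registered and of equal size, and we combine the one-sided AND test just described with its mirror image so that both excess and missing material are flagged):

g = imread('golden_washer.tif') > 0;   % golden (defect-free) image
t = imread('test_washer.tif')   > 0;   % input image of the washer under test
d = (t & ~g) | (g & ~t);               % mismatches in both directions (an XOR)
cc = bwconncomp(d);                    % group mismatches into connected components
cc.NumObjects                          % acceptance criterion: 0 for a perfect match

The grouping into connected components anticipates the acceptance criteria discussed next.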
As noted, differences in the images will appear as regions of 1s in the AND image. These we group into regions (connected components) by using the algorithm given in Section 9.5.3. Once all connected components have been extracted, we can compare them against specified criteria for acceptance or rejection of a given washer. The simplest criterion is to set a limit on the number and size (number of pixels) of connected components. The most stringent criterion is 0 connected components, which means a perfect match. The next level for "relaxing" acceptance is one connected component of size 1, and so on. More sophisticated criteria might involve measures like the shape of connected components and their relative locations with respect to each other. These types of descriptors are studied in Chapter 11.

Chapter 10
Problem Solutions

Problem 10.1
Expand f(x + Δx) into a Taylor series about x:

$$f(x + \Delta x) = f(x) + \Delta x\, f'(x) + \frac{(\Delta x)^2}{2!} f''(x) + \cdots$$

The increment in the spatial variable x is defined in Section 2.4.2 to be 1, so by letting Δx = 1 and keeping only the linear terms we obtain the result

$$f'(x) = f(x+1) - f(x)$$

which agrees with Eq. (10.2-1).

Problem 10.2
The masks would have the coefficients shown in Fig. P10.2. Each mask would yield a value of 0 when centered on a pixel of an unbroken 3-pixel segment oriented in the direction favored by that mask. Conversely, the response would be +2 when a mask is centered on a one-pixel gap in a 3-pixel segment oriented in the direction favored by that mask.

Problem 10.3
The key to solving this problem is to find all end points (see Section 9.5.8 for a definition of end point) of line segments in the image. Once all end points have been found, the $D_8$ distance between all pairs of such end points gives the lengths of the various gaps. We choose the smallest distance between end points of every pair of segments, and any such distance less than or equal to K satisfies the statement of the problem. This is a rudimentary solution, and numerous embellishments can be added to build intelligence into the process. For example, it is possible for end points of different, but closely adjacent, lines to be less than K pixels apart, and heuristic tests that attempt to sort out things like this are quite useful. Although the problem statement does not call for any such tests, they normally are needed in practice and it is worthwhile to bring this up in class if this problem is assigned as homework.

Problem 10.4
(a) The lines were thicker than the width of the line detector masks. Thus, when, for example, a mask was centered on the line it "saw" a constant area and gave a response of 0.
(b) Via connectivity analysis.

Problem 10.5
(a) The first row in Fig. P10.5 shows a step, a ramp, and a roof edge image, and horizontal profiles through their centers. Similarly, the second row shows the corresponding gradient images and horizontal profiles through their centers.
The thin dark borders in the images are included for clarity in defining the borders of the images; they are not part of the image data.
(b) The third row in Fig. P10.5 shows the angle images and their profiles. All images were scaled for display, so only the shapes are of interest. In particular, the profile of the angle image of the roof edge has zero, negative, and positive values. The gray in the scaled angle image denotes zero, and the light and dark areas correspond to positive and equally negative values, respectively. The black in all other images denotes 0 and the pure white is the maximum value (255, because these are 8-bit images). Grays are values in between.

Figure P10.5 (edges, gradient images, and angle images, each with their profiles).

Problem 10.6
Figure P10.6 shows the solution.

Problem 10.7
Figure P10.7 shows the solution.

Problem 10.8
(a) Inspection of the Sobel masks shows that $g_x = 0$ for edges oriented vertically and $g_y = 0$ for edges oriented horizontally. Therefore, it follows in this case that, for vertical edges, $\nabla f = \sqrt{g_y^2} = |g_y|$, and similarly for horizontal edges.
(b) The same argument applies to the Prewitt masks.

Problem 10.9
Consider first the Sobel masks of Figs. 10.14 and 10.15. A simple way to prove that these masks give isotropic results for edge segments oriented at multiples of 45° is to obtain the mask responses for the four general edge segments shown in Fig. P10.9, which are oriented at increments of 45°. The objective is to show that the responses of the Sobel masks are indistinguishable for these four edges. That this is the case is evident from Table P10.9, which shows the response of each Sobel mask to the four general edge segments. We see that in each case the response of the mask that matches the edge direction is (4a − 4b), and the response of the corresponding orthogonal mask is 0. The response of the remaining two masks is either (3a − 3b) or (3b − 3a). The sign difference is not significant because the gradient is computed by either squaring or taking the absolute value of the mask responses. The same line of reasoning applies to the Prewitt masks.

Table P10.9
Edge direction | Sobel (g_x) | Sobel (g_y) | Sobel (g_45) | Sobel (g_-45)
Horizontal     | 4a - 4b     | 0           | 3a - 3b      | 3b - 3a
Vertical       | 0           | 4a - 4b     | 3a - 3b      | 3a - 3b
+45°           | 3a - 3b     | 3a - 3b     | 4a - 4b      | 0
-45°           | 3b - 3a     | 3a - 3b     | 0            | 4a - 4b

Problem 10.10
Consider first the 3 × 3 smoothing mask mentioned in the problem statement, and a general 3 × 3 subimage area with intensities a through i, whose center point has value e, as Fig. P10.10 shows. Recall that value e is replaced by the response of the 3 × 3 mask when its center is at that location. Ignoring the 1/9 scale factor, the response of the mask when centered at that location is (a + b + c + d + e + f + g + h + i).

The idea with the one-dimensional masks is the same: we replace the value of a pixel by the response of the mask when it is centered on that pixel. With this in mind, the row mask [1 1 1] would yield the following responses when centered at the pixels with values b, e, and h, respectively: (a + b + c), (d + e + f), and (g + h + i). Next, we pass the column mask [1 1 1]^T through these results. When this mask is centered at the pixel with value e, its response will be [(a + b + c) + (d + e + f) + (g + h + i)], which is the same as the result produced by the 3 × 3 smoothing mask.
Returning now to the problem at hand, when the $g_x$ Sobel mask is centered at the pixel with value e, its response is $g_x = (g + 2h + i) - (a + 2b + c)$. If we pass the one-dimensional differencing column mask [−1 0 1]^T through the image, its responses when its center is at the pixels with values d, e, and f, respectively, would be (g − a), (h − b), and (i − c). Next we apply the smoothing row mask [1 2 1] to these results. When the mask is centered at the pixel with value e, its response would be [(g − a) + 2(h − b) + (i − c)], which is [(g + 2h + i) − (a + 2b + c)]. This is the same as the response of the 3 × 3 Sobel mask for $g_x$. The process to show equivalence for $g_y$ is basically the same. Note, however, that the orientations of the one-dimensional masks would be reversed, in the sense that the differencing mask would be a row mask and the smoothing mask would be a column mask.

Problem 10.11
(a) The operators, listed in the order E, NE, N, NW, W, SW, S, SE, are as follows (in the original figure, negative coefficients are shown underlined):

E:   1  1  1    NE:  1  1  0    N:   1  0 -1    NW:  0 -1 -1
     0  0  0         1  0 -1         1  0 -1        1  0 -1
    -1 -1 -1         0 -1 -1         1  0 -1        1  1  0

W:  -1 -1 -1    SW: -1 -1  0    S:  -1  0  1    SE:  0  1  1
     0  0  0        -1  0  1        -1  0  1        -1  0  1
     1  1  1         0  1  1        -1  0  1        -1 -1  0

(b) The solution is as follows:

Compass gradient operator: E   NE  N   NW  W   SW  S   SE
Gradient direction:        N   NW  W   SW  S   SE  E   NE

Problem 10.12
(a) The solution is shown in Fig. P10.12(a). The numbers in brackets are values of $[g_x, g_y]$.
(b) The solution is shown in Fig. P10.12(b). The angle was not computed for the trivial cases in which $g_x = g_y = 0$. The histogram follows directly from this table.
(c) The solution is shown in Fig. P10.12(c).

Problem 10.13
(a) The local average at a point (x,y) in an image is given by

$$\bar{f}(x,y) = \frac{1}{n^2}\sum_{z_i \in S_{xy}} z_i$$

where $S_{xy}$ is the region in the image encompassed by the n × n averaging mask when it is centered at (x,y), and the $z_i$ are the intensities of the image pixels in that region. The partial derivative $\partial\bar{f}/\partial x = \bar{f}(x+1,y) - \bar{f}(x,y)$ is thus given by

$$\frac{\partial\bar{f}}{\partial x} = \frac{1}{n^2}\sum_{z_i \in S_{x+1,y}} z_i - \frac{1}{n^2}\sum_{z_i \in S_{xy}} z_i.$$

The first summation on the right can be interpreted as consisting of all the pixels in the second summation, minus the pixels in the first row of the mask, plus the row picked up by the mask as it moved from (x,y) to (x+1,y). Thus, we can write the preceding equation as

$$\begin{aligned}
\frac{\partial\bar{f}}{\partial x} &= \frac{1}{n^2}\sum_{z_i \in S_{xy}} z_i + \frac{1}{n^2}(\text{sum of pixels in new row}) - \frac{1}{n^2}(\text{sum of pixels in 1st row}) - \frac{1}{n^2}\sum_{z_i \in S_{xy}} z_i\\
&= \frac{1}{n^2}\sum_{k=y-\frac{n-1}{2}}^{y+\frac{n-1}{2}} f\!\left(x+\tfrac{n+1}{2},\,k\right) - \frac{1}{n^2}\sum_{k=y-\frac{n-1}{2}}^{y+\frac{n-1}{2}} f\!\left(x-\tfrac{n-1}{2},\,k\right)\\
&= \frac{1}{n^2}\sum_{k=y-\frac{n-1}{2}}^{y+\frac{n-1}{2}} \left[f\!\left(x+\tfrac{n+1}{2},\,k\right) - f\!\left(x-\tfrac{n-1}{2},\,k\right)\right].
\end{aligned}$$

This expression gives the value of $\partial\bar{f}/\partial x$ at coordinates (x,y) of the smoothed image. Similarly,

$$\frac{\partial\bar{f}}{\partial y} = \frac{1}{n^2}\sum_{k=x-\frac{n-1}{2}}^{x+\frac{n-1}{2}} \left[f\!\left(k,\,y+\tfrac{n+1}{2}\right) - f\!\left(k,\,y-\tfrac{n-1}{2}\right)\right].$$

The edge magnitude image corresponding to the smoothed image $\bar{f}(x,y)$ is then given by

$$\bar{M}(x,y) = \sqrt{\left(\partial\bar{f}/\partial x\right)^2 + \left(\partial\bar{f}/\partial y\right)^2}.$$

(b) From the preceding equation for $\partial\bar{f}/\partial x$, the maximum value this term can have occurs when each difference in the summation is maximum which, for an m-bit image, is $L = 2^m - 1$ (e.g., 255 for an 8-bit image). Thus,

$$\left|\frac{\partial\bar{f}}{\partial x}\right|_{max} = \frac{1}{n^2}(nL) = \frac{L}{n}$$

and similarly for the other derivative term. Therefore,

$$\bar{M}_{max}(x,y) = \sqrt{\frac{L^2}{n^2} + \frac{L^2}{n^2}} = \frac{L}{n}\sqrt{2}.$$
For the original image, the maximum value that $\partial f/\partial x$ can have is L, and similarly for $\partial f/\partial y$. Therefore, $M_{max}(x,y) = L\sqrt{2}$ and the ratio is

$$\frac{\bar{M}_{max}(x,y)}{M_{max}(x,y)} = \frac{1}{n}$$

which shows that the edge strength of a smoothed image, compared to the original image, is inversely proportional to the size of the smoothing mask.

Problem 10.14
(a) We proceed as follows:

$$\begin{aligned}
\text{Average}\left\{\nabla^2 G(x,y)\right\} &= \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} \nabla^2 G(x,y)\,dx\,dy\\
&= \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} \frac{x^2+y^2-2\sigma^2}{\sigma^4}\, e^{-\frac{x^2+y^2}{2\sigma^2}}\,dx\,dy\\
&= \frac{1}{\sigma^4}\int_{-\infty}^{\infty} x^2 e^{-\frac{x^2}{2\sigma^2}}dx \int_{-\infty}^{\infty} e^{-\frac{y^2}{2\sigma^2}}dy + \frac{1}{\sigma^4}\int_{-\infty}^{\infty} y^2 e^{-\frac{y^2}{2\sigma^2}}dy \int_{-\infty}^{\infty} e^{-\frac{x^2}{2\sigma^2}}dx - \frac{2}{\sigma^2}\int_{-\infty}^{\infty}\int_{-\infty}^{\infty} e^{-\frac{x^2+y^2}{2\sigma^2}}dx\,dy\\
&= \frac{1}{\sigma^4}\left[\sqrt{2\pi}\,\sigma\cdot\sigma^2\right]\left[\sqrt{2\pi}\,\sigma\right] + \frac{1}{\sigma^4}\left[\sqrt{2\pi}\,\sigma\cdot\sigma^2\right]\left[\sqrt{2\pi}\,\sigma\right] - \frac{2}{\sigma^2}\left[2\pi\sigma^2\right]\\
&= 4\pi - 4\pi = 0.
\end{aligned}$$

The fourth line follows from the fact that

$$\text{variance}(z) = \sigma^2 = \frac{1}{\sqrt{2\pi}\,\sigma}\int_{-\infty}^{\infty} z^2 e^{-\frac{z^2}{2\sigma^2}}dz$$

and

$$\frac{1}{\sqrt{2\pi}\,\sigma}\int_{-\infty}^{\infty} e^{-\frac{z^2}{2\sigma^2}}dz = 1.$$

(b) Convolving an image f(x,y) with $\nabla^2 G(x,y)$ in the spatial domain is the same as multiplying the Fourier transforms of these two functions in the frequency domain. From Chapter 4, we know that the average value of a function is proportional to the value of its transform at the origin of the frequency domain. We showed in (a) that the average value of $\nabla^2 G(x,y)$ is zero, so its Fourier transform at the origin must be zero. Therefore, the product of the Fourier transforms of f(x,y) and $\nabla^2 G(x,y)$ evaluated at the origin is zero, from which it follows that the convolution of these two functions has a zero average value.

(c) The answer is yes. From Problem 3.16 we know that convolving a mask whose coefficients sum to zero (like the Laplacian masks in the problem statement) with any image gives a result whose average value is zero. The result is digital, so if its average is zero, this means that all the elements sum to zero. Thus, convolving this result with any image also gives a result whose average value is zero.

Problem 10.15
(a) Let $g_L(x,y)$ denote the LoG image, as in Fig. 10.22(b). This image has both positive and negative values, and we know that the zero crossings are such that at least a pair of opposing locations in a 3 × 3 neighborhood differ in sign. Consider a binary image $g_P(x,y)$ formed by letting $g_P(x,y) = 1$ if $g_L(x,y) > 0$ and $g_P(x,y) = 0$ otherwise. Figure P10.15(a) shows $g_P(x,y)$ for the Laplacian image in Fig. 10.22(b). The important thing about this image is that it consists of connected components. Furthermore (for the 3 × 3 mask used to detect zero crossings and a threshold of 0), each point on the boundary of these connected components is either a point of transition between positive and negative (i.e., it is the center point of the mask) or it is adjacent to the point of transition. In other words, all zero-crossing points are adjacent to the boundaries of the connected components just described. But boundaries of connected components form a closed path (a path exists between any two points of a connected component, and the points forming the boundary of a connected component are part of the connected component, so the boundary of a connected component is a closed path). Figure P10.15(b) shows the closed paths for the problem in question. Compare these closed paths and the binary regions in Fig. P10.15(a).

(b) The answer is yes for functions that meet certain mild conditions, and if the zero crossing method is based on rotational operators like the LoG function and a threshold of 0. Geometrical properties of zero crossings in general are explained in some detail in the paper "On Edge Detection," by V. Torre and T. Poggio, IEEE Trans.
Pattern Analysis and Machine Intelligence, vol. 8, no. 2, 1986, pp. 147-163. Looking up this paper and becoming familiar with the mathematical underpinnings of edge detection is an excellent reading assignment for graduate students.

Problem 10.16

$$G'(r) = -\frac{r}{\sigma^2}\, e^{-\frac{r^2}{2\sigma^2}}$$

$$\nabla^2 G(x,y) = G''(r) = \left[\frac{r^2}{\sigma^4} - \frac{1}{\sigma^2}\right] e^{-\frac{r^2}{2\sigma^2}} = \left[\frac{r^2 - \sigma^2}{\sigma^4}\right] e^{-\frac{r^2}{2\sigma^2}}.$$

Letting $r^2 = x^2 + y^2$ we obtain

$$\nabla^2 G(x,y) = \left[\frac{x^2 + y^2 - \sigma^2}{\sigma^4}\right] e^{-\frac{x^2+y^2}{2\sigma^2}}.$$

Comparing this derivation with the derivation of Eq. (10.2-15), we see that the reason for the difference is in taking derivatives with respect to r as opposed to derivatives with respect to x and y. Another way of explaining the difference is that working with r is basically working with a radial slice of a 2-D function, which is the same as working with a 1-D Gaussian; the 2-D nature of the problem is thus lost. As it turns out, the difference of only 2 (rather than 1) in the numerator term multiplying σ² makes a significant difference in the filtered result.

Problem 10.17
(a) From Eq. (10.2-26), the DoG function is zero when

$$\frac{1}{2\pi\sigma_1^2}\, e^{-\frac{x^2+y^2}{2\sigma_1^2}} = \frac{1}{2\pi\sigma_2^2}\, e^{-\frac{x^2+y^2}{2\sigma_2^2}}.$$

Taking the natural log of both sides yields

$$\ln\left(\frac{1}{2\pi\sigma_1^2}\right) - \frac{x^2+y^2}{2\sigma_1^2} = \ln\left(\frac{1}{2\pi\sigma_2^2}\right) - \frac{x^2+y^2}{2\sigma_2^2}.$$

Combining terms,

$$\left(x^2+y^2\right)\left[\frac{1}{2\sigma_2^2} - \frac{1}{2\sigma_1^2}\right] = \ln\left(\frac{1}{2\pi\sigma_2^2}\right) - \ln\left(\frac{1}{2\pi\sigma_1^2}\right) = \ln\left(\frac{\sigma_1^2}{\sigma_2^2}\right).$$

The LoG function [Eq. (10.2-23)] is zero when $x^2 + y^2 = 2\sigma^2$. Then, from the preceding equation,

$$2\sigma^2\left[\frac{\sigma_1^2 - \sigma_2^2}{2\sigma_1^2\sigma_2^2}\right] = \ln\left(\frac{\sigma_1^2}{\sigma_2^2}\right).$$

Finally, solving for σ²,

$$\sigma^2 = \frac{\sigma_1^2\,\sigma_2^2}{\sigma_1^2 - \sigma_2^2}\,\ln\left(\frac{\sigma_1^2}{\sigma_2^2}\right)$$

which agrees with Eq. (10.2-27).

(b) To obtain an expression in terms of k, we let $\sigma_1 = k\sigma_2$ in the preceding equation:

$$\sigma^2 = \frac{k^2\sigma_2^4}{k^2\sigma_2^2 - \sigma_2^2}\,\ln\left(\frac{k^2\sigma_2^2}{\sigma_2^2}\right) = \frac{k^2}{k^2-1}\,\sigma_2^2\,\ln(k^2)$$

with k > 1.

Problem 10.18
(a) Equation (10.2-21) can be written in the following separable form:

$$G(x,y) = e^{-\frac{x^2+y^2}{2\sigma^2}} = e^{-\frac{x^2}{2\sigma^2}}\, e^{-\frac{y^2}{2\sigma^2}} = G(x)\,G(y).$$

From Eq. (3.4-2) and the preceding equation, the convolution of G(x,y) and f(x,y) can be written as

$$\begin{aligned}
G(x,y) \star f(x,y) &= \sum_{s=-a}^{a}\sum_{t=-a}^{a} G(s,t)\, f(x-s,\,y-t)\\
&= \sum_{s=-a}^{a}\sum_{t=-a}^{a} e^{-\frac{s^2}{2\sigma^2}}\, e^{-\frac{t^2}{2\sigma^2}}\, f(x-s,\,y-t)\\
&= \sum_{s=-a}^{a} e^{-\frac{s^2}{2\sigma^2}} \left[\sum_{t=-a}^{a} e^{-\frac{t^2}{2\sigma^2}}\, f(x-s,\,y-t)\right]
\end{aligned}$$

where a = (n−1)/2 and n is the size of the n × n mask obtained by sampling Eq. (10.2-21). The expression inside the brackets is the 1-D convolution of the exponential term $e^{-t^2/2\sigma^2}$ with the rows of f(x,y). Then the outer summation is the convolution of $e^{-s^2/2\sigma^2}$ with the columns of the result. Stated another way,

$$G(x,y) \star f(x,y) = G(x) \star \left[G(y) \star f(x,y)\right].$$

(b) Direct implementation of 2-D convolution requires n² multiplications at each location of f(x,y), so the total number of multiplications is n² × M × N. 1-D convolution requires n multiplications at each location of every row in the image, for a total of n × M × N for the pass along the rows. Then, n × M × N multiplications are required for the pass along the columns, for a total of 2nMN multiplications. The computational advantage, A, is then

$$A = \frac{n^2 MN}{2nMN} = \frac{n}{2}$$

which is independent of image size. For example, if n = 25, A = 12.5, so it takes 12.5 times more multiplications to implement 2-D convolution directly than it does to implement the procedure just outlined using 1-D convolutions.

Problem 10.19
(a) As Eq. (10.2-25) shows, the first two steps of the algorithm can be summarized into one equation:

$$g(x,y) = \nabla^2\left[G(x,y) \star f(x,y)\right].$$
Using the definition of the Laplacian operator we can express this equation as

$$g(x,y) = \frac{\partial^2}{\partial x^2}\left[G(x,y)\star f(x,y)\right] + \frac{\partial^2}{\partial y^2}\left[G(x,y)\star f(x,y)\right] = \frac{\partial^2}{\partial x^2}\left[G(x)\star G(y)\star f(x,y)\right] + \frac{\partial^2}{\partial y^2}\left[G(x)\star G(y)\star f(x,y)\right]$$

where the second step follows from Problem 10.18, with $G(x) = e^{-\frac{x^2}{2\sigma^2}}$ and $G(y) = e^{-\frac{y^2}{2\sigma^2}}$. The terms inside the two brackets are the same, so only two convolutions are required to implement them. Using the definitions in Section 10.2.1, the partials may be written as

$$\frac{\partial^2 f}{\partial x^2} = f(x+1) + f(x-1) - 2f(x) \quad\text{and}\quad \frac{\partial^2 f}{\partial y^2} = f(y+1) + f(y-1) - 2f(y).$$

The first term can be implemented via convolution with a 1 × 3 mask having coefficients [1 −2 1], and the second with a 3 × 1 mask having the same coefficients. Letting $\nabla^2_x$ and $\nabla^2_y$ represent these two operator masks, we have the final result:

$$g(x,y) = \nabla^2_x \star \left[G(x)\star G(y)\star f(x,y)\right] + \nabla^2_y \star \left[G(x)\star G(y)\star f(x,y)\right]$$

which requires a total of four different 1-D convolution operations.

(b) If we use the algorithm as stated in the book, convolving an M × N image with an n × n mask will require n² × M × N multiplications (see the solution to Problem 10.18). Then convolution with a 3 × 3 Laplacian mask adds another 9 × M × N multiplications, for a total of (n² + 9) × M × N multiplications. Decomposing the 2-D convolution into 1-D passes requires 2nMN multiplications, as indicated in the solution to Problem 10.18. Two more convolutions of the resulting image with the 3 × 1 and 1 × 3 derivative masks add 3MN + 3MN = 6MN multiplications. The computational advantage is then

$$A = \frac{(n^2 + 9)MN}{2nMN + 6MN} = \frac{n^2 + 9}{2n + 6}$$

which is independent of image size. For example, for n = 25, A = 11.32, so it takes on the order of 11 times more multiplications if direct 2-D convolution is used.

Problem 10.20
(a) The solution is basically the same as in Problem 10.19, but using the 1-D masks in Fig. 10.13 to implement Eqs. (10.2-12) and (10.2-13).
(b) Smoothing with a Gaussian filter (step 1 of the algorithm), which convolves an M × N image with an n × n filter mask, requires n² × M × N multiplications, as explained in the solution of Problem 10.19. Then two convolutions with a 3 × 3 horizontal and a 3 × 3 vertical edge detector require 2(9 × M × N) = 18MN multiplications, for a total of (n² + 18) × M × N multiplications. The magnitude image requires two multiplications per pixel, one to compute ∂f/∂x times itself and one for ∂f/∂y times itself, for a total of 2MN multiplications. The total for the 2-D approach is then (n² + 20)MN multiplications. The 1-D convolution for step 1 requires 2nMN multiplications (see the solution to Problem 10.19). The y component of the gradient can be implemented using the difference ∂f/∂y = f(x, y+1) − f(x, y) which, as stated in Section 10.2.5, can be implemented by convolving the smoothed image with a 1-D mask with coefficients [−1 1]. This will require 2MN multiplications. Similarly, the other partial derivative can be implemented with a vertical mask with the same coefficients, for a total of 4MN multiplications. The squares are the same as for the 2-D implementation: 2MN multiplications. The total for the 1-D implementation is then (2n + 6)MN. The ratio of 2-D to 1-D multiplications is

$$A = \frac{(n^2 + 20)MN}{(2n + 6)MN} = \frac{n^2 + 20}{2n + 6}$$

which is independent of image size. When, for example, n = 25, then A = 11.52, so it would take approximately 11 times more multiplications to implement the first part of the Canny algorithm with 2-D convolutions.
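The 1-D decomposition in Problems 10.18 through 10.20 is easy to confirm numerically. The following MATLAB sketch (our own illustration, not part of the original solutions) compares direct 2-D Gaussian convolution against the two 1-D passes:

sigma = 3; n = 25; a = (n-1)/2;
g1 = exp(-(-a:a).^2 / (2*sigma^2));    % sampled 1-D Gaussian, G(x)
g2 = g1' * g1;                         % separable 2-D mask: G(x,y) = G(x)G(y)
f  = rand(256);                        % arbitrary test image
r2 = conv2(f, g2, 'same');             % direct 2-D convolution
r1 = conv2(g1, g1, f, 'same');         % column pass, then row pass
max(abs(r2(:) - r1(:)))                % agrees to within round-off error

The conv2(u, v, A) form convolves the columns of A with u and the rows of the result with v, which is exactly the G(x) ★ [G(y) ★ f] factorization derived above.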
Problem 10.21
Parts (a) through (e) are shown in rows 2 through 6 of Fig. P10.21.

Figure P10.21 (rows: edges; gradient images; Laplacian images; images from Steps 1 and 2 of the Marr-Hildreth algorithm; images from Steps 1 and 2 of the Canny algorithm; Canny angle images; each with their profiles).

Problem 10.22
(a) Express x cos θ + y sin θ = ρ in the form y = −(cot θ)x + ρ/sin θ. Equating terms with the slope-intercept form, y = ax + b, gives a = −cot θ and b = ρ/sin θ. This gives θ = cot⁻¹(−a) and ρ = b sin θ. Once obtained from a and b of a given line, the parameters θ and ρ completely specify the normal representation of that line.
(b) θ = cot⁻¹(2) = 26.6° and ρ = (1) sin θ = 0.45.

Problem 10.23
(a) Point 1 has coordinates x = 0 and y = 0. Substituting into Eq. (10.2-38) yields ρ = 0, which, in a plot of ρ vs. θ, is a straight line.
(b) Only the origin (0,0) would yield this result.
(c) At θ = +90°, it follows from Eq. (10.2-38) that x·(0) + y·(1) = ρ, or y = ρ. At θ = −90°, x·(0) + y·(−1) = ρ, or −y = ρ. Thus the reflective adjacency.

Problem 10.24
Subdividing the θ-axis into K increments gives, for every point $(x_k, y_k)$, K values of ρ corresponding to the K possible values of θ. With n image points, this method involves nK computations. Thus the procedure is linear in n.

Problem 10.25
This problem is a natural for the Hough transform, which is set up as follows: The θ axis is divided into six subdivisions, corresponding to the six specified directions and their error bands. For example (because the angle directions specified in the problem statement are with respect to the horizontal), the first band for angle θ extends from −30° to −20°, corresponding to the −25° direction and its ±5° band. The ρ axis extends from ρ = −D to ρ = +D, where D is the largest distance between opposite corners of the image, properly calibrated to fit the particular imaging setup used. The subdivisions of the ρ axis are chosen finely enough to resolve the minimum expected distance between tracks that may be parallel but have different origins, thus satisfying the last condition of the problem statement. Set up in this way, the Hough transform can be used as a "filter" to categorize all points in a given image into groups of points in the six specified directions. Each group is then processed further to determine if its points satisfy the criteria for a valid track: (1) each group must have at least 100 points; and (2) it cannot have more than three gaps, each of which cannot be more than 10 pixels long (see Problem 10.3 regarding the estimation of gaps of a given length).

Problem 10.26
The essence of the algorithm is to compute, at each step, the mean value, m₁, of all pixels whose intensities are less than or equal to the previous threshold and, similarly, the mean value, m₂, of all pixels with values that exceed the threshold. Let $p_i = n_i/n$ denote the ith component of the image histogram, where $n_i$ is the number of pixels with intensity i, and n is the total number of pixels in the image. Valid values of i are in the range 0 ≤ i ≤ L − 1, where L is the number of intensities and i is an integer. The means can be computed at any step k of the algorithm:

$$m_1(k) = \sum_{i=0}^{I(k-1)} i\,p_i \Big/ P(k)$$

where

$$P(k) = \sum_{i=0}^{I(k-1)} p_i,$$

and

$$m_2(k) = \sum_{i=I(k-1)+1}^{L-1} i\,p_i \Big/ \left[1 - P(k)\right].$$

The term I(k−1) is the smallest integer less than or equal to T(k−1), and T(0) is given. The next value of the threshold is then

$$T(k+1) = \frac{1}{2}\left[m_1(k) + m_2(k)\right].$$
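The iteration in Problem 10.26 is the basic global thresholding algorithm of Section 10.3.2. The following is a compact MATLAB sketch of it (our own, working with image samples directly rather than with the histogram; it assumes the initial threshold lies strictly between the minimum and maximum intensities, so both classes stay nonempty, which is the condition required in the next problem):

f  = double(imread('some_image.tif'));   % hypothetical input image
T  = 0.5*(min(f(:)) + max(f(:)));        % a safe initial threshold T(0)
dT = inf;
while dT > 0.5                           % stop when T essentially stops changing
    m1 = mean(f(f <= T));                % mean of the class at or below T
    m2 = mean(f(f >  T));                % mean of the class above T
    Tnew = 0.5*(m1 + m2);                % T(k+1) = [m1(k) + m2(k)]/2
    dT   = abs(Tnew - T);
    T    = Tnew;
end

The histogram-based form in the equations above computes the same two means from the $p_i$, which is faster when the histogram is already available.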
Problem 10.27
As stated in Section 10.3.2, we assume that the initial threshold is chosen between the minimum and maximum intensities in the image. To begin, consider the histogram in Fig. P10.27. It shows the threshold at the kth iterative step, and the fact that the mean m₁(k+1) will be computed using the intensities greater than T(k) times their histogram values. Similarly, m₂(k+1) will be computed using the intensities less than or equal to T(k) times their histogram values. Then T(k+1) = 0.5[m₁(k+1) + m₂(k+1)]. The proof consists of two parts. First, we prove that the threshold is bounded between 0 and L−1. Then we prove that the algorithm converges to a value between these two limits.

To prove that the threshold is bounded, we write T(k+1) = 0.5[m₁(k+1) + m₂(k+1)]. If m₂(k+1) = 0, then m₁(k+1) will be equal to the image mean, M, and T(k+1) will equal M/2, which is less than L−1. If m₁(k+1) is zero, the same will be true. Both m₁ and m₂ cannot be zero simultaneously, so T(k+1) will always be greater than 0 and less than L−1.

To prove convergence, we have to consider three possible conditions:
1. T(k+1) = T(k), in which case the algorithm has converged.
2. T(k+1) < T(k), in which case the threshold moves to the left.
3. T(k+1) > T(k), in which case the threshold moves to the right.

Figure P10.27 (histogram showing the threshold T(k); the intensities above T(k) are used to compute m₁(k+1), and those at or below T(k) to compute m₂(k+1)).

In case (2), when the threshold moves to the left, m₂ will decrease or stay the same, and m₁ will also decrease or stay the same (the fact that m₁ decreases or stays the same is not necessarily obvious; if you don't see it, draw a simple histogram and convince yourself that it does), depending on how much the threshold moved and on the values of the histogram. However, neither mean can increase. If neither mean changes, then T(k+2) will equal T(k+1) and the algorithm will stop. If either (or both) mean decreases, then T(k+2) < T(k+1), and the new threshold moves further to the left. This will cause the conditions just stated to happen again, so the conclusion is that if the threshold starts moving left, it will always move left, and the algorithm will eventually stop with a value T > 0, which we know is the lower bound for T. Because the threshold always decreases or stops changing, no oscillations are possible, so the algorithm is guaranteed to converge.

Case (3) causes the threshold to move to the right. An argument similar to the preceding discussion establishes that if the threshold starts moving to the right, it will either converge or continue moving to the right and will stop eventually with a value less than L−1. Because the threshold always increases or stops changing, no oscillations are possible, so the algorithm is guaranteed to converge.

Problem 10.28
Consider an image whose intensities (of both background and objects) are greater than L/2 and less than L−1, where L is the number of intensity levels and L−1 is the maximum intensity level (recall that we work with the intensity scale in the range [0, L−1]). Suppose that we choose the initial threshold as T = 0. Then
T will be the sum of the two means divided by 2, or M/2. However, the image mean, M, is between L/2andL−1 because there are no pixels with values below L/2. So, the new threshold, T = M/2willbebelowL/2. Because there are no pixels with values below L/2, m2 will be zero again, and m1 will be the mean value of the intensities and T will again be equal to M/2. So, the algorithm will terminate with the wrong value of threshold. The same concept is restated more formally and used as a counter-example in the solution of Problem 10.29. Problem 10.29 The value of the the threshold at convergence is independent of the initial value if the initial value of the threshold is chosen between the minimum and max- imum intensity of the image (we know from Problem 10.27 that the algorithm converges under this condition). The final threshold is not independent of the initial value chosen for T if that value does not satisfy this condition. For ex- ample, consider an image with the histogram in Fig. P10.29. Suppose that we select the initial threshold T(1)=0. Then, at the next iterative step, m2(2)=0, m1(2)=M,andT(2)=M/2. Because m2(2)=0, it follows that m2(3)=0, m1(3)=M,andT (3)=T(2)=M/2. Any following iterations will yield the same result, so the algorithm converges with the wrong value of threshold. If we had started with Imin < T (1) < Imax, the algorithm would have converged properly. 212 CHAPTER 10. PROBLEM SOLUTIONS Problem 10.30 (a) For a uniform histogram, we can view the intensity levels as points of unit mass along the intensity axis of the histogram. Any values m1(k) and m2(k) are the means of the two groups of intensity values G1 and G2.Becausethehis- togram is uniform, these are the centers of mass of G1 andG2.Weknowfromthe solution of Problem 10.27 that if T starts moving to the right, it will always move in that direction, or stop. The same holds true for movement to the left. Now, assume that T(k) has arrived at the center of mass (average intensity). Because all points have equal "weight" (remember the histogram is uniform), if T (k + 1) moves to the right G2 will pick up, say, Q new points. But G1 will lose the same number of points, so the sum m1 + m2 will be the same and the algorithm will stop. (b) The prove is similar to (a) because the modes are identical. When the algo- rithm arrives at the point between the two means, further motion of the thresh- old toward one or the other mean will cause one group to pick up and the other to lose the same "mass". Thus, their sum will be the same and the algorithm will stop. Problem 10.31 (a) A1 = A2 and σ1 = σ2 = σ, which makes the two modes identical [this is the same as Problem 10.30(b)]. (b) You know from Problem 10.30 that if the modes were symmetric and iden- tical, the algorithm would converge to the point midway bewteen the means. In the present problem, the modes are symmtric but they may not be identical. Then, all we can say is that the algorithm will converge to a point somewhere be- tween the means (from Section 10.3.2 we know that the algorithm must start at some point between the minimum and maximum image intensities). So, if both A1 and A2 are greater than 0, we are assured that the algorithm will converge to a point somewhere between m1 and m2 . (c) σ1 >> σ2. This will ”pull” the threshold toward m1 during iteration. 
Problem 10.32
(a)

$$\begin{aligned}
\sigma_B^2 &= P_1(m_1 - m_G)^2 + P_2(m_2 - m_G)^2\\
&= P_1\left[m_1 - (P_1 m_1 + P_2 m_2)\right]^2 + P_2\left[m_2 - (P_1 m_1 + P_2 m_2)\right]^2\\
&= P_1\left[m_1(1 - P_1) - P_2 m_2\right]^2 + P_2\left[m_2(1 - P_2) - P_1 m_1\right]^2\\
&= P_1\left[P_2 m_1 - P_2 m_2\right]^2 + P_2\left[P_1 m_2 - P_1 m_1\right]^2\\
&= P_1 P_2^2 (m_1 - m_2)^2 + P_2 P_1^2 (m_1 - m_2)^2\\
&= (m_1 - m_2)^2\left[P_1 P_2^2 + P_2 P_1^2\right]\\
&= (m_1 - m_2)^2\, P_1 P_2 (P_2 + P_1)\\
&= P_1 P_2 (m_1 - m_2)^2
\end{aligned}$$

where we used the facts that $m_G = P_1 m_1 + P_2 m_2$ and $P_1 + P_2 = 1$. This proves the first part of Eq. (10.3-15).

(b) First, we have to show that

$$m_2(k) = \frac{m_G - m(k)}{1 - P_1(k)}.$$

This we do as follows:

$$m_2(k) = \frac{1}{P_2(k)}\sum_{i=k+1}^{L-1} i\,p_i = \frac{1}{1 - P_1(k)}\sum_{i=k+1}^{L-1} i\,p_i = \frac{1}{1 - P_1(k)}\left[\sum_{i=0}^{L-1} i\,p_i - \sum_{i=0}^{k} i\,p_i\right] = \frac{m_G - m(k)}{1 - P_1(k)}.$$

Then,

$$\sigma_B^2 = P_1 P_2 (m_1 - m_2)^2 = P_1 P_2\left[\frac{m}{P_1} - \frac{m_G - m}{1 - P_1}\right]^2 = P_1(1 - P_1)\left[\frac{m - P_1 m_G}{P_1(1 - P_1)}\right]^2 = \frac{(m_G P_1 - m)^2}{P_1(1 - P_1)}.$$

Problem 10.33
From Eq. (10.3-15),

$$\sigma_B^2 = P_1(k)\,P_2(k)\left[m_1(k) - m_2(k)\right]^2 = P_1(k)\left[1 - P_1(k)\right]\left[m_1(k) - m_2(k)\right]^2.$$

As stated in the book, the measure $\sigma_B^2$ (or η) takes a minimum value of 0 when m₁ = m₂, which implies that there is only one class of pixels (the image is constant), in which case P₁(k) = 1 or P₁(k) = 0 for some value of k. In any other case, 0 < P₁(k) < 1, so P₁(k)[1 − P₁(k)] > 0 and the measure takes on positive and bounded values; a finite maximum is guaranteed to exist for some value of k if the image is not constant.

Problem 10.34
From the definition in Eq. (10.3-12),

$$\eta = \frac{\sigma_B^2}{\sigma_G^2}$$

where

$$\sigma_B^2 = P_1(m_1 - m_G)^2 + P_2(m_2 - m_G)^2 = P_1 P_2 (m_1 - m_2)^2$$

and

$$\sigma_G^2 = \sum_{i=0}^{L-1}(i - m_G)^2\, p_i.$$

As in the text, we have omitted k in $\sigma_B^2$ for the sake of notational clarity, but the assumption is that 0 ≤ k ≤ L−1. As explained in the book, the minimum value of η is zero, and it occurs when the image is constant. It remains to be shown that the maximum value is 1, and that it occurs for two-valued images with values 0 and L−1.

From the second form of the expression for $\sigma_B^2$, we see that the maximum occurs when the quantity (m₁ − m₂)² is maximum, because P₁ and P₂ are positive. The intensity scale extends from 0 to L−1, so the maximum difference between means occurs when m₁ = 0 and m₂ = L−1. But the only way this can happen is if the variance of each of the two classes of pixels is zero, which implies that the image only has these two values, thus proving the assertion that the maximum occurs only when the image is two-valued with intensity values 0 and L−1.

It remains to be shown that the maximum possible value of η is 1. When m₁ = 0 and m₂ = L−1,

$$\sigma_B^2 = P_1 P_2 (m_1 - m_2)^2 = P_1 P_2 (L-1)^2.$$

For an image with values 0 and L−1,

$$\sigma_G^2 = \sum_{i=0}^{L-1}(i - m_G)^2\, p_i = (0 - m_G)^2 P_1 + (L - 1 - m_G)^2 P_2.$$

From Eq. (10.3-10), $m_G = P_1 m_1 + P_2 m_2 = P_2(L-1)$. So,

$$\begin{aligned}
\sigma_G^2 &= (0 - m_G)^2 P_1 + (L - 1 - m_G)^2 P_2\\
&= P_2^2(L-1)^2 P_1 + (L-1)^2(1 - P_2)^2 P_2\\
&= (L-1)^2\left[P_2^2 P_1 + P_1^2 P_2\right]\\
&= (L-1)^2\, P_1 P_2 (P_2 + P_1)\\
&= P_1 P_2 (L-1)^2
\end{aligned}$$

where we used the fact that P₁ + P₂ = 1. We see from the preceding results for $\sigma_B^2$ and $\sigma_G^2$ that $\sigma_B^2/\sigma_G^2 = 1$ when the image is two-valued with values 0 and L−1. This completes the proof.

Problem 10.35
(a) Let R₁ and R₂ denote the regions whose pixel intensities are greater than T and less than or equal to T, respectively. The threshold T is simply an intensity value, so it gets mapped by the transformation function to the value T′ = 1 − T. Values in R₁ are mapped to R′₁ and values in R₂ are mapped to R′₂. The important thing is that all values in R′₁ are below T′ and all values in R′₂ are equal to or above T′.
The sense of the inequalities has been reversed, but the separability of the intensities in the two regions has been preserved.
(b) The solution in (a) is a special case of a more general problem. A threshold is simply a location in the intensity scale. Any transformation function that preserves the order of intensities will preserve the separability established by the threshold. Thus, any monotonic function (increasing or decreasing) will preserve this order. The value of the new threshold is simply the old threshold processed with the transformation function.

Figure P10.36 (probability density p(z) with modes centered at 60 and 170 on the intensity scale [0, 255]).

Problem 10.36
With reference to Fig. P10.36, the key is to note that the means are more than 10 standard deviations apart, so the valley between the two modes of the histogram of the image is wide and deep. The simple global thresholding algorithm of Section 10.3.2 or Otsu's algorithm will give a performance that easily exceeds the 90% specification. In fact, the solution will give close to 100% accuracy.

Problem 10.37
(a) The first column would be black and all other columns would be white. The reason: a point in the segmented image is set to 1 if the value of the image at that point exceeds b at that point. But b = 0, so all points in the image that are greater than 0 will be set to 1 and all other points will be set to 0. The only points in the image that do not exceed 0 are the points that are 0, which are the points in the first column.
(b) The rightmost column would be 0 and all others would be 1, for the reason stated in (a).
(c) As in (a), all the pixels in the first column of the segmented image are labeled 0. Consider all other pixels in the first row of the segmented image, and keep in mind that n = 2 and b = 1. Because the ramp in the original image increases in intensity to the right, the value of a pixel at location k in the image is always greater than the (moving) average of that pixel and its predecessor. Thus, all points along the first row, with the exception of the first, are labeled 1. When the scan reaches the end of the first row and reverses direction as it starts the second row, the reverse condition will exist and all pixels along the second row of the segmented image will be labeled 0. The conditions in the third row are the same as in the first row, so the net effect is that the segmented image will consist of all 0s in the first column, and all 0s in alternating rows, starting with the second. All other pixels will be 1.
(d) The first pixel in the first row of the segmented image will be 0 and the rest will be 1s, for the reason stated in (a). However, in subsequent rows the conditions change. Consider the change in direction between the end of the first row and the reverse scan of the second row. Because n is now much greater than 2, it will take "a while" before the elements in that row of the original image become smaller than the running average, which reached its highest value at the end of the first row. The result will be K white pixels before the values in the original image drop below the running average and the pixels in the segmented image become black. This condition is reversed when the turn is made from the second to the third row: K black pixels will now be present at the start of the row before the values in the original image increase past the running average. Once they do, the corresponding locations in the segmented image will be labeled white through the end of that row.
The segmented image will thus look as follows: the first pixel in the first row will be black and all others will be white. The next row will have K black pixels on the left and K white pixels on the right, with all other pixels being black. The left and right of the next row will be the same, but the pixels in between will be white. These two alternating row patterns apply to the rest of the rows in the segmented image.

Problem 10.38
The means are at 60 and 170, and the standard deviation of the noise is 10 intensity levels. A range of ±3σ about 60 gives the range [30, 90], and a similar range about 170 gives [140, 200], so significant separation exists between the two intensity populations. Choosing 170 as the seed value from which to grow the objects is thus quite adequate. One approach is to grow regions by appending to a seed any pixel that is 8-connected to any pixel previously appended to that seed, and whose intensity is 170 ± 3σ.

Problem 10.39
The region splitting is shown in Fig. P10.39(a). The corresponding quadtree is shown in Fig. P10.39(b).

Figure P10.39 (regions 1-4 are each split into four quadrants, 11-14, 21-24, 31-34, and 41-44; region 32 is further split into 321-324, and region 41 into 411-414; the quadtree mirrors this subdivision).

Problem 10.40
To obtain the sparse outer region, we simply form the AND of the mask with the outer region. Because the mask completely separates the inner region from the background, its complement will produce two disjoint regions. We find these two regions by using a connected-components algorithm (see Section 9.5.3). We then identify the region containing the background by determining which of the two regions contains at least one point on the boundary of the image (say, any one of the four corner points). We then AND an image containing only this region with the original to obtain the background. ANDing the other region with the image isolates only the inner region.

Problem 10.41
(a) The elements of T[n] are the coordinates of points in the image below the plane g(x,y) = n, where n is an integer that represents a given step in the execution of the algorithm. Because n never decreases, the set of elements in T[n−1] is a subset of the elements in T[n]. In addition, we note that all the points below the plane g(x,y) = n−1 are also below the plane g(x,y) = n, so the elements of T[n] are never replaced. Similarly, $C_n(M_i)$ is formed by the intersection of $C(M_i)$ and T[n], where $C(M_i)$ (whose elements never change) is the set of coordinates of all points in the catchment basin associated with regional minimum $M_i$. Because the elements of $C(M_i)$ never change, and the elements of T[n] are never replaced, it follows that the elements in $C_n(M_i)$ are never replaced either. In addition, we see that $C_{n-1}(M_i) \subseteq C_n(M_i)$.
(b) This part of the problem is answered by the same argument as in (a). Because (1) n always increases; (2) the elements of neither $C_n(M_i)$ nor T[n] are ever replaced; and (3) $T[n-1] \subseteq T[n]$ and $C_{n-1}(M_i) \subseteq C_n(M_i)$, it follows that the number of elements of both $C_n(M_i)$ and T[n] either increases or remains the same.
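Returning to the region-growing recipe in Problem 10.38, growth by 8-connectivity can be phrased as a morphological reconstruction (Section 9.5.9): the seed points are the marker and the 170 ± 3σ band is the mask. A minimal MATLAB sketch of this idea (our own; the file name is hypothetical):

f     = double(imread('two_population_image.tif'));
sigma = 10;
mask  = abs(f - 170) <= 3*sigma;            % pixels eligible to join a region
seeds = (f >= 170) & mask;                  % high-confidence seed pixels
grown = imreconstruct(seeds, mask);         % 8-connected growth of seeds inside mask
figure, imshow(grown)

imreconstruct repeatedly dilates the seeds while staying inside the mask, which appends exactly the 8-connected pixels the solution calls for.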
Problem 10.42
Using the terminology of the watershed algorithm, a break in a boundary between two catchment basins would cause water between the two basins to merge. However, the heart of the algorithm is to build a dam higher than the highest intensity level in the image any time a break in such boundaries occurs. Because the entire topography is enclosed by such a dam, dams are built any time there is a break that causes water to merge between two regions, and segmentation boundaries are precisely the tops of the dams. It follows that the watershed algorithm always produces closed boundaries.

Problem 10.43
The first step in the application of the watershed segmentation algorithm is to build a dam of height max + 1 to prevent the rising water from running off the ends of the function, as shown in Fig. P10.43(b). For an image function we would build a box of height max + 1 around its border. The algorithm is initialized by setting C[1] = T[1]. In this case, T[1] = {g(2)}, as shown in Fig. P10.43(c) (note the water level). There is only one connected component in this case: Q[1] = {q₁} = {g(2)}.

Next, we let n = 2 and, as shown in Fig. P10.43(d), T[2] = {g(2), g(14)} and Q[2] = {q₁; q₂}, where, for clarity, different connected components are separated by semicolons. We start construction of C[2] by considering each connected component in Q[2]. When q = q₁, the term q ∩ C[1] is equal to {g(2)}, so condition 2 is satisfied and, therefore, C[2] = {g(2)}. When q = q₂, q ∩ C[1] = ∅ (the empty set), so condition 1 is satisfied and we incorporate q in C[2], which then becomes C[2] = {g(2); g(14)}, where, as above, different connected components are separated by semicolons.

When n = 3 [Fig. P10.43(e)], T[3] = {2, 3, 10, 11, 13, 14} and Q[3] = {q₁; q₂; q₃} = {2, 3; 10, 11; 13, 14}, where, in order to simplify the notation, we let k denote g(k). Proceeding as above, q₁ ∩ C[2] = {2} satisfies condition 2, so q₁ is incorporated into the new set to yield C[3] = {2, 3; 14}. Similarly, q₂ ∩ C[2] = ∅ satisfies condition 1 and C[3] = {2, 3; 10, 11; 14}. Finally, q₃ ∩ C[2] = {14} satisfies condition 2 and C[3] = {2, 3; 10, 11; 13, 14}. It is easily verified that C[4] = C[3] = {2, 3; 10, 11; 13, 14}.

When n = 5 [Fig. P10.43(f)], we have T[5] = {2, 3, 5, 6, 10, 11, 12, 13, 14} and Q[5] = {q₁; q₂; q₃} = {2, 3; 5, 6; 10, 11, 12, 13, 14} (note the merging of two previously distinct connected components). It is easily verified that q₁ ∩ C[4] satisfies condition 2 and that q₂ ∩ C[4] satisfies condition 1. Proceeding with these two connected components exactly as above yields C[5] = {2, 3; 5, 6; 10, 11; 13, 14} up to this point. Things get more interesting when we consider q₃. Now, q₃ ∩ C[4] = {10, 11; 13, 14} which, because it contains two connected components of C[4], satisfies condition 3. As mentioned previously, this is an indication that water from two different basins has merged and a dam must be built to prevent this condition. Dam building is nothing more than separating q₃ into the two original connected components. In this particular case, this is accomplished by the dam shown in Fig. P10.43(g), so that now q₃ = {q₃₁; q₃₂} = {10, 11; 13, 14}. Then q₃₁ ∩ C[4] and q₃₂ ∩ C[4] each satisfy condition 2, and we have the final result for n = 5: C[5] = {2, 3; 5, 6; 10, 11; 13, 14}.

Continuing in the manner just explained yields the final segmentation result shown in Fig. P10.43(h), where the "edges" are visible (from the top) just above the water line. A final post-processing step would remove the outer dam walls to yield the inner edges of interest.

Problem 10.44
With reference to Eq. (10.6-4), we see that comparing the negative ADI against a positive, rather than a negative, threshold would yield the image negative of the positive ADI [see Eq. (10.6-3)]. The result is shown on the left of Fig. P10.44.
The image on the right is the positive ADI from Fig. 10.59(b). We have included it here for convenience in making the comparison.

Problem 10.45
(a) True, assuming that the threshold is not set larger than all the differences encountered as the object moves. The easiest way to see this is to draw a simple reference image, such as a white rectangle on a black background. Let that rectangle be the object that moves. Because the absolute ADI value at any location is the absolute difference between the reference and the new image, it is easy to see that as the object enters areas that are background in the reference image, the absolute difference will change from zero to nonzero in the new area occupied by the moving object. Thus, as long as the object moves, the dimension of the absolute ADI will grow.
(b) True. The positive ADI is stationary and equal to the dimensions of the moving object, because the differences between the reference and the moving object never exceed the threshold in areas that are background in the reference image [assuming, as in Eq. (10.6-3), that the background has lower intensity values than the object].
(c) True. From Eq. (10.6-4), we see that differences between the background and the object always will be negative [assuming, as in Eq. (10.6-4), that the intensity levels in the object exceed the value of the background]. Assuming also that the differences are more negative than the threshold, we see, for the same reason as in (a), that all new background areas occupied by the moving object will have nonzero counts, thus increasing the dimension of the nonzero entries in the negative ADI (keep in mind that the values in this image are counts).

Problem 10.46
Consider first the fact that motion in the x-direction is zero. When all components of an image are stationary, $g_x(t, a_1)$ is a constant, and its Fourier transform yields an impulse at the origin. Therefore, Fig. 10.63 would now consist of a single impulse at the origin; the other two peaks shown in the figure would no longer be present. To handle the motion in the positive y-direction and its change in the opposite direction, recall that the Fourier transform is a linear process, so we can use superposition to obtain a solution. The first part of the motion is in the positive y-direction at 1 pixel/frame. This is the same as in Example 10.28, so the peaks corresponding to this part of the motion are the same as the ones shown in Fig. 10.64. The reversal of motion is instantaneous, so the 33rd frame would show the object traveling in exactly the opposite direction. To handle this, we simply change $a_2$ to $-a_2$ in Eq. (10.6-7). Based on the discussion in connection with Eq. (10.6-5), all this change would do is produce peaks at frequencies $u = -a_2 V_2$ and $u = K + a_2 V_2$. From Example 10.28 we know that the value of $a_2$ is 4. From the problem statement, we know that $V_2 = 1$ and K = 32. Thus, we have two new peaks added to Fig. 10.64: one at u = −4 and the other at u = 36. As noted above, the original peaks correspond to the motion in the positive y-direction given in the problem statement, which is the same as in Example 10.28. Note that the frame count was restarted from 0 to 31 with the change in direction.
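The ADIs used in Problems 10.44 through 10.46 are simple per-pixel accumulators. The following MATLAB sketch (our own illustration, with a synthetic bright object moving against a dark background) builds all three, per Eq. (10.6-3), Eq. (10.6-4), and the analogous absolute-value definition:

T = 20; M = 64; N = 64; K = 8;
frames = zeros(M, N, K);
for k = 1:K
    frames(20:30, (5:15) + 2*(k-1), k) = 255;   % object moves right 2 pixels/frame
end
ref = frames(:,:,1);                            % reference image
Aadi = zeros(M,N); Padi = zeros(M,N); Nadi = zeros(M,N);
for k = 2:K
    d    = ref - frames(:,:,k);
    Aadi = Aadi + (abs(d) > T);                 % absolute ADI: keeps growing [Prob. 10.45(a)]
    Padi = Padi + (d >  T);                     % positive ADI: stationary [Prob. 10.45(b)]
    Nadi = Nadi + (d < -T);                     % negative ADI: grows [Prob. 10.45(c)]
end

Displaying the three count images reproduces the behavior argued in Problem 10.45.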
To determine whether velocity is positive or negative at a specific time, n, we compute the instantaneous acceleration (rate of change of speed) at that point; that is we compute the second derivate of g x .Viewed another way, we determine direction by computing the derivative of the deriva- 224 CHAPTER 10. PROBLEM SOLUTIONS tive of g x . But, the derivative at a point is simply the tangent at that point. If the tangent has a positive slope, the velocity is positive; otherwise it is negative or zero. Because g x is a complex quantity, its tangent is given by the ratio of its imaginary to its real part. This ratio is positive when S1x and S2x have the same sign, which is what we started out to prove. Problem 10.48 Set up a small target of known, uniform reflectivity, R, near a corner of the view- ing area, at coordinates (x0,y0). The small target will become part of the images captured by the system and can thus be monitored by the software. With refer- ence to the image model discussed in Section 2.3.4, you can determine A(0) by imaging the target when the bulb is new (t = 0) because then f (x0,y 0)=iR = A(0)R and f and R will be known in the region of the imaged target. The expo- nential term is known, so can compute it once and store it. The image at any time t will be given by f (x,y )=i(x,y )r(x,y )= A(t ) − t 2e −[(x−M/2)2+(y −N /2)2] r(x,y ). It is given that Otsu’s algorithm works well when f = A(0)r(x,y ) (i.e., when the lamp is new (and t = 0) so, if we can force the time and spatially varying compo- nent of the previous equation to have the value A(0), then segmentation of the corrected image will be as desired. Consider the portion of the image at contain- ing the known reflectivity target at any time tn f n (x0,y0)= A(tn ) − t 2 n e −[(x0−M/2)2+(y0−N /2)2] R. (Note: in practice we would use the average value of f around (x0,y0) instead of f (x0,y0) to reduce the effects of noise.) The exponential term is known, and so are R, f n (x0,y0),andtn .SowecansolveforA(tn ). Then we can correct the input image as follows: g (x,y )= A(0)f (x,y ) A(tn ) − t 2 n e −[(x0−M/2)2+(y0−N /2)2] = A(0) ⎡ ⎣ A(t ) − t 2e −[(x−M/2)2+(y −N /2)2] A(tn ) − t 2 n e −[(x0−M/2)2+(y0−N /2)2] ⎤ ⎦r(x,y ). At t = tn , g (x,y )=A(0)r(x,y ), and we know that Otsu’s algorithm will perform satisfactorily. Because we don’t know how A(t ) behavesasafunctionoftimeand 225 the lamps are experimental, the safest course of action is to perform the preced- ing correction as many times as possible during the time the lamps are oper- ating. This requires frequent computation of A(tn ) using information obtained from the reflective target, as we did above. If continuous correction presents a computational burden that is too great, or too expensive to implement, then performing experiments with various lamps can help in determining a longer acceptable period between corrections. It is likely that longer periods between corrections will be acceptable because the change in physical devices generally is much slower than the speed of imaging devices. Problem 10.49 (a) It is given that 10% of the image area in the horizontal direction is occupied by a bullet that is 2.5 cm long. Because the imaging device is square (256 × 256 elements) the camera looks at an area that is 25 cm × 25 cm, assuming no optical distortions. Thus, the distance between pixels is 25/256=0.098 cm/pixel. The maximum speed of the bullet is 1000 m/sec = 100,000 cm/sec. At this speed, the bullet will travel 100,000/0.98 = 1.02×106 pixels/sec. 
Problem 10.49
(a) It is given that 10% of the image area in the horizontal direction is occupied by a bullet that is 2.5 cm long. Because the imaging device is square (256 × 256 elements), the camera looks at an area that is 25 cm × 25 cm, assuming no optical distortions. Thus, the distance between pixels is 25/256 = 0.098 cm/pixel. The maximum speed of the bullet is 1000 m/sec = 100,000 cm/sec. At this speed, the bullet will travel $100{,}000/0.098 \approx 1.02 \times 10^6$ pixels/sec. It is required that the bullet not travel more than one pixel during exposure. That is, $(1.02 \times 10^6\ \text{pixels/sec}) \times K\ \text{sec} \le 1$ pixel, so $K \le 9.8 \times 10^{-7}$ sec.

(b) The frame rate must be fast enough to capture at least two images of the bullet in successive frames so that the speed can be computed. If the frame rate is set so that the bullet cannot travel a distance longer (between successive frames) than one half the width of the image, then we have the cases shown in Fig. P10.49. In cases A and E we get two shots of the entire bullet, in frames t2 and t3 and in frames t1 and t2, respectively. In the other cases we get partial bullets. Although these cases could be handled with some processing (e.g., by determining size, leading and trailing edges, and so forth), it is possible to guarantee that at least two complete shots of every bullet will be available by setting the frame rate so that a bullet cannot travel more than one half the width of the frame, minus the length of the bullet. The length of the bullet in pixels is $(2.5\ \text{cm})/(0.098\ \text{cm/pixel}) \approx 26$ pixels. One half of the image frame is 128 pixels, so the maximum travel distance allowed is 102 pixels. Because the bullet travels at a maximum speed of $1.02 \times 10^6$ pixels/sec, the minimum frame rate is $1.02 \times 10^6 / 102 = 10^4$ frames/sec.

(c) In a flashing situation with a reflective object, the images will tend to be dark, with the object shining brightly. The techniques discussed in Section 10.6.1 would then be quite adequate.

Figure P10.49

(d) First we have to determine if a partial or whole image of the bullet has been obtained. After the pixels corresponding to the object have been identified using motion segmentation, we determine if the object runs into the left boundary (see the solution to Problem 9.36 regarding a method for determining if a binary object runs into the boundary of an image). If it does, we look at the next two frames, with the assurance that a complete image of the bullet has been obtained in each because of the frame rate in (b). If the object does not run into the left boundary, we are similarly assured of two full shots in two of the three frames. We then compute the centroid of the object in each image and count the number of pixels between the centroids. Because the distance between pixels and the time between frames are known, computation of the speed is a trivial problem. The principal uncertainty in this approach is how well the object is segmented. However, because the images are of the same object in basically the same geometry, consistency of segmentation between frames can be expected.

Chapter 11
Problem Solutions

Problem 11.1
(a) The key to this problem is to recognize that the value of every element in a chain code is relative to the value of its predecessor. The code for a boundary that is traced in a consistent manner (e.g., clockwise) is a unique circular set of numbers. Starting at different locations in this set does not change the structure of the circular sequence. Selecting the smallest integer as the starting point simply identifies the same point in the sequence. Even if the starting point is not unique, this method would still give a unique sequence. For example, the sequence 101010 has three possible starting points, but they all yield the same smallest integer, 010101.

(b) Code: 11076765543322. The starting point is the 0, yielding the sequence 07676554332211.
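The smallest-integer normalization is easy to demonstrate in a few lines; a sketch, assuming the code is given as a list of digits:

```python
def normalize_chain_code(code):
    """Make a chain code invariant to the starting point by picking the
    circular rotation that forms the smallest integer (Problem 11.1)."""
    s = "".join(str(d) for d in code)
    return min(s[i:] + s[:i] for i in range(len(s)))

# The boundary of Problem 11.1(b):
print(normalize_chain_code([1,1,0,7,6,7,6,5,5,4,3,3,2,2]))  # 07676554332211
```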
Problem 11.2
(a) The first difference only counts the number of directions that separate adjacent elements of the code. Because the counting process is independent of direction, the first difference is independent of boundary rotation. (It is worthwhile to point out to students that the assumption here is that rotation does not change the code itself.)

(b) Code: 0101030303323232212111. Difference: 3131331313031313031300. The code was treated as a circular sequence, so the first element of the difference is the transition between the last and first elements of the code, as explained in the text.

Figure P11.3

Problem 11.3
(a) The rubber-band approach forces the polygon to have vertices at every inflection of the cell wall. That is, the locations of the vertices are fixed by the structure of the inner and outer walls. Because the vertices are joined by straight lines, this produces the minimum-perimeter polygon for any given wall configuration.

(b) If a corner of a cell is centered at a pixel on the boundary, and the cell is such that the rubber band is tightened on the opposite corner, we would have the situation in Fig. P11.3. Assuming that the cell is of size $d \times d$, the maximum difference between the pixel and the boundary in that cell is $\sqrt{2}\,d$. If cells are centered on pixels, the maximum difference is $\sqrt{2}\,d/2$.

Problem 11.4
(a) When the B vertices are mirrored, they coincide with the two white vertices in the corners, so they become collinear with the corner vertices. The algorithm ignores collinear vertices, so the small indentation will not be detected.

(b) When the indentation is deeper than one pixel (but still 1 pixel wide), we have the situation shown in Fig. P11.4. Note that the B vertices cross after mirroring. Referring to the bottom figure, when the algorithm gets to vertex 2, vertex 1 will be identified as a vertex of the MPP, so the algorithm is initialized at that step. Because of initialization, vertex 2 is visited again. It will be collinear with $W_C$ and $V_L$, so $B_C$ will be set at the location of vertex 2. When vertex 3 is visited, sgn($V_L$, $W_C$, $V_3$) will be 0, so $B_C$ will be set at vertex 3. When vertex 4 is visited, sgn(1, 3, 4) will be negative, so $V_L$ will be set to vertex 3 and the algorithm is reinitialized. Because vertex 2 will never be visited again, it will never become a vertex of the MPP. The next MPP vertex to be detected will be vertex 4. Therefore, indentations 2 pixels or greater in depth and 1 pixel wide will be represented by the sequence 1-3-4 in the second figure. Thus, the algorithm resolves the crossing caused by the mirroring of the two B vertices by keeping only one vertex. This is a general result for 1-pixel wide, 2-pixel (or greater) deep intrusions.

Figure P11.4 (top: order and location of the vertices before mirroring the B vertices; bottom: after mirroring)

(c) When the B vertices of a 1-pixel deep protrusion are mirrored, they become aligned with the two convex vertices of the protrusion, which are known to be vertices of the MPP (otherwise they would be the corners of the protrusion). A line passing from the first convex vertex to the previous MPP vertex would show that the corresponding mirrored B vertex would be outside the MPP, so it could not be a vertex of the MPP. A similar conclusion is reached by passing a line from the second corner to the next vertex of the MPP, so the second mirrored vertex could not be part of the MPP. The conclusion is that the small protrusion would be missed by the algorithm.
(d) A drawing similar to the one used in (b) would show that both vertices of the protrusion would be detected normally for any protrusion extending for more than one pixel.

Problem 11.5
(a) The resulting polygon would contain all the boundary pixels.

(b) Actually, in both cases the resulting polygon would contain all the boundary pixels.

Figure P11.6

Problem 11.6
(a) The solution is shown in Fig. P11.6(b).

(b) The solution is shown in Fig. P11.6(c).

Problem 11.7
(a) From Fig. P11.7(a), we see that the distance from the origin to the triangle is given by
$$r(\theta) = \begin{cases} D_0/\cos\theta & 0^\circ \le \theta < 60^\circ \\ D_0/\cos(120^\circ - \theta) & 60^\circ \le \theta < 120^\circ \\ D_0/\cos(180^\circ - \theta) & 120^\circ \le \theta < 180^\circ \\ D_0/\cos(240^\circ - \theta) & 180^\circ \le \theta < 240^\circ \\ D_0/\cos(300^\circ - \theta) & 240^\circ \le \theta < 300^\circ \\ D_0/\cos(360^\circ - \theta) & 300^\circ \le \theta < 360^\circ \end{cases}$$
where $D_0$ is the perpendicular distance from the origin to one of the sides of the triangle, and $D = D_0/\cos(60^\circ) = 2D_0$. Once the coordinates of the vertices of the triangle are given, determining the equation of each straight line is a simple problem, and $D_0$ (which is the same for the three straight lines) follows from elementary geometry.

Figure P11.7

(b) From Fig. P11.7(c),
$$r(\theta) = \begin{cases} B/(2\cos\theta) & 0^\circ \le \theta < \varphi \\ A/(2\cos(90^\circ - \theta)) & \varphi \le \theta < 90^\circ \\ A/(2\cos(\theta - 90^\circ)) & 90^\circ \le \theta < 180^\circ - \varphi \\ B/(2\cos(180^\circ - \theta)) & 180^\circ - \varphi \le \theta < 180^\circ \\ B/(2\cos(\theta - 180^\circ)) & 180^\circ \le \theta < 180^\circ + \varphi \\ A/(2\cos(270^\circ - \theta)) & 180^\circ + \varphi \le \theta < 270^\circ \\ A/(2\cos(\theta - 270^\circ)) & 270^\circ \le \theta < 270^\circ + \varphi \\ B/(2\cos(360^\circ - \theta)) & 270^\circ + \varphi \le \theta < 360^\circ \end{cases}$$
where $\varphi = \tan^{-1}(A/B)$.

(c) The equation of the ellipse in Fig. P11.7(e) is
$$\frac{x^2}{a^2} + \frac{y^2}{b^2} = 1.$$
We are interested in the distance from the origin to an arbitrary point $(x, y)$ on the ellipse. In polar coordinates, $x = r\cos\theta$ and $y = r\sin\theta$, where $r = \sqrt{x^2 + y^2}$ is the distance from the origin to $(x, y)$. Substituting into the equation of the ellipse, we obtain
$$\frac{r^2\cos^2\theta}{a^2} + \frac{r^2\sin^2\theta}{b^2} = 1$$
from which we obtain the desired result:
$$r(\theta) = \left[\left(\frac{\cos\theta}{a}\right)^2 + \left(\frac{\sin\theta}{b}\right)^2\right]^{-1/2}.$$
When $b = a$, we have the familiar equation of a circle, $r(\theta) = a$, or $x^2 + y^2 = a^2$.

Figure P11.8

Problem 11.8
The solutions are shown in Fig. P11.8.

Problem 11.9
(a) In the first case, $N(p) = 5$, $S(p) = 1$, $p_2 \cdot p_4 \cdot p_6 = 0$, and $p_4 \cdot p_6 \cdot p_8 = 0$, so Eq. (11.1-4) is satisfied and $p$ is flagged for deletion. In the second case, $N(p) = 1$, so Eq. (11.1-4) is violated and $p$ is left unchanged. In the third case, $p_2 \cdot p_4 \cdot p_6 = 1$ and $p_4 \cdot p_6 \cdot p_8 = 1$, so conditions (c) and (d) of Eq. (11.1-4) are violated and $p$ is left unchanged. In the fourth case, $S(p) = 2$, so condition (b) is violated and $p$ is left unchanged.

(b) In the first case, $p_2 \cdot p_6 \cdot p_8 = 1$, so condition (d) of Eq. (11.1-6) is violated and $p$ is left unchanged. In the second case, $N(p) = 1$, so $p$ is left unchanged. In the third case, conditions (c) and (d) are violated and $p$ is left unchanged. In the fourth case, $S(p) = 2$ and $p$ is left unchanged.

Figure P11.10

Problem 11.10
(a) The result is shown in Fig. P11.10(b).

(b) The result is shown in Fig. P11.10(c).

Problem 11.11
(a) The number of symbols in the first difference is equal to the number of segment primitives in the boundary, so the shape order is 12.

(b) Starting at the top left corner:
Chain code: 000332123211
Difference: 300303311330
Shape number: 003033113303
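The first difference of Problem 11.2 and the shape number of Problem 11.11 can both be computed with a short sketch, assuming a 4-directional code (for an 8-directional code the modulus would be 8):

```python
def first_difference(code, directions=4):
    """Circular first difference: counterclockwise direction changes
    between consecutive code elements; the first entry is the
    last-to-first transition (Problem 11.2)."""
    return [(code[i] - code[i - 1]) % directions for i in range(len(code))]

def shape_number(code, directions=4):
    """Shape number: the rotation of the first difference that forms the
    smallest integer (Problem 11.11)."""
    s = "".join(str(d) for d in first_difference(code, directions))
    return min(s[i:] + s[:i] for i in range(len(s)))

code = [0,0,0,3,3,2,1,2,3,2,1,1]  # the boundary of Problem 11.11(b)
print("".join(map(str, first_difference(code))))  # 300303311330
print(shape_number(code))                         # 003033113303
```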
Problem 11.12
With reference to Chapter 4, the DFT can be real only if the data sequence is conjugate symmetric. Only contours that are symmetric with respect to the origin have this property. The axis system of Fig. 11.19 would have to be set up so that this condition is satisfied for symmetric figures. This can be accomplished by placing the origin at the center of gravity of the contour.

Problem 11.13
Suppose that we have a particle moving in the xy-plane. At each time $t$ the particle will be at some point whose coordinates can be written as $[x(t), y(t)]$. The situation with which we are dealing is in the complex plane. Recalling that a point in the complex plane is written as $c = x + jy$, we can write the location of the moving particle in the complex plane as $z(t) = x(t) + jy(t)$. As $t$ varies, $z(t)$ describes a path in the complex plane. This is called the parametric representation of a curve because it depends on the parameter $t$. The "standard" equation of a circle with center at $(b, c)$ and radius $r$ is given by $(x - b)^2 + (y - c)^2 = r^2$. Using polar coordinates, we can write
$$x(\theta) = b + r\cos\theta, \qquad y(\theta) = c + r\sin\theta.$$
Then the circle can be represented in parametric form as the pair $(x(\theta), y(\theta))$, with $\theta$ being the parameter. The circle can thus be represented in the complex plane as the curve
$$z(\theta) = x(\theta) + jy(\theta) = (b + jc) + r(\cos\theta + j\sin\theta).$$
Using only two descriptors in Eq. (11.2-5) gives
$$\hat{s}(k) = \frac{1}{P}\left[a(0) + a(1)\left(\cos\frac{2\pi k}{K} + j\sin\frac{2\pi k}{K}\right)\right] = \frac{a(0)}{P} + \frac{a(1)}{P}\left(\cos\frac{2\pi k}{K} + j\sin\frac{2\pi k}{K}\right)$$
which we see is in the form of a circle by letting $(b + jc) = a(0)/P$, $r = a(1)/P$, and $\theta = 2\pi k/K$.

Problem 11.14
The mean is sufficient.

Problem 11.15
Two ellipses with different, say, major axes have signatures with the same mean and third statistical moment descriptors (both due to symmetry) but different second moments (due to spread).

Problem 11.16
This problem can be solved by using two descriptors: holes and the convex deficiency (see Section 9.5.4 regarding the convex hull and convex deficiency of a set). The decision-making process can be summarized in the form of a simple decision, as follows: If the character has two holes, it is an 8. If it has one hole, it is a 0 or a 9. Otherwise, it is a 1 or an X. To differentiate between 0 and 9 we compute the convex deficiency. The presence of a "significant" deficiency (say, having an area greater than 20% of the area of a rectangle that encloses the character) signifies a 9; otherwise we classify the character as a 0. We follow a similar procedure to separate a 1 from an X. The presence of a convex deficiency with four components whose centroids are located approximately in the North, East, South, and West quadrants of the character indicates that the character is an X. Otherwise we say that the character is a 1. This is the basic approach. Implementation of this technique in a real character recognition environment has to take into account other factors such as multiple "small" components in the convex deficiency due to noise, differences in orientation, open loops, and the like. However, the material in Chapters 3, 9, and 11 provides a solid base from which to formulate solutions.

Problem 11.17
(a) Co-occurrence matrix:
$$\begin{bmatrix} 19600 & 200 \\ 0 & 20000 \end{bmatrix}$$

(b) Normalize the matrix by dividing each component by $19600 + 200 + 20000 = 39800$:
$$\begin{bmatrix} 0.4925 & 0.0050 \\ 0 & 0.5025 \end{bmatrix}$$
so $p_{11} = 0.4925$, $p_{12} = 0.005$, $p_{21} = 0$, and $p_{22} = 0.5025$.

(c) Descriptors:

(1) Maximum probability: 0.5025.

(2) Correlation: The means are
$$m_r = \sum_{i=1}^{2} i \sum_{j=1}^{2} p_{ij} = 1(p_{11} + p_{12}) + 2(p_{21} + p_{22}) = 1(0.4925 + 0.005) + 2(0 + 0.5025) = 1.5025$$
and
$$m_c = \sum_{j=1}^{2} j \sum_{i=1}^{2} p_{ij} = 1(p_{11} + p_{21}) + 2(p_{12} + p_{22}) = 1.5075.$$
Similarly, the variances are
$$\sigma_r^2 = \sum_{i=1}^{2} (i - m_r)^2 \sum_{j=1}^{2} p_{ij} = (1 - m_r)^2(p_{11} + p_{12}) + (2 - m_r)^2(p_{21} + p_{22}) = 0.2500$$
and
$$\sigma_c^2 = \sum_{j=1}^{2} (j - m_c)^2 \sum_{i=1}^{2} p_{ij} = (1 - m_c)^2(p_{11} + p_{21}) + (2 - m_c)^2(p_{12} + p_{22}) = 0.2499$$
so $\sigma_r = 0.5000$ and $\sigma_c = 0.4999$. Then the correlation measure is computed as
$$\sum_{i=1}^{2}\sum_{j=1}^{2} \frac{(i - m_r)(j - m_c)\,p_{ij}}{\sigma_r\sigma_c} = \frac{1}{\sigma_r\sigma_c}\sum_{i=1}^{2}\sum_{j=1}^{2} (i - m_r)(j - m_c)\,p_{ij} = 0.9900.$$

(3) Contrast:
$$\sum_{i=1}^{2}\sum_{j=1}^{2} (i - j)^2 p_{ij} = (1-1)^2 p_{11} + (1-2)^2 p_{12} + (2-1)^2 p_{21} + (2-2)^2 p_{22} = 0.005.$$

(4) Uniformity:
$$\sum_{i=1}^{2}\sum_{j=1}^{2} p_{ij}^2 = p_{11}^2 + p_{12}^2 + p_{21}^2 + p_{22}^2 = 0.4951.$$

(5) Homogeneity:
$$\sum_{i=1}^{2}\sum_{j=1}^{2} \frac{p_{ij}}{1 + |i - j|} = \frac{p_{11}}{1} + \frac{p_{12}}{2} + \frac{p_{21}}{2} + \frac{p_{22}}{1} = 0.9975.$$

(6) Entropy:
$$-\sum_{i=1}^{2}\sum_{j=1}^{2} p_{ij}\log_2 p_{ij} = -\left[p_{11}\log_2 p_{11} + p_{12}\log_2 p_{12} + p_{21}\log_2 p_{21} + p_{22}\log_2 p_{22}\right] = 1.0405$$
where we used $0\log_2 0 \equiv 0$.
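These descriptor computations are mechanical and easy to verify numerically; a minimal sketch, assuming NumPy and the normalized matrix of part (b):

```python
import numpy as np

p = np.array([[0.4925, 0.0050],
              [0.0,    0.5025]])      # normalized matrix from part (b)
i, j = np.indices(p.shape) + 1        # intensity levels 1 and 2

mr, mc = (i * p).sum(), (j * p).sum()                 # 1.5025, 1.5075
sr = np.sqrt((((i - mr) ** 2) * p).sum())             # ~0.5000
sc = np.sqrt((((j - mc) ** 2) * p).sum())             # ~0.4999

print(p.max())                                        # 0.5025
print(((i - mr) * (j - mc) * p).sum() / (sr * sc))    # ~0.9900
print((((i - j) ** 2) * p).sum())                     # 0.005
print((p ** 2).sum())                                 # ~0.4951
print((p / (1 + np.abs(i - j))).sum())                # ~0.9975
print(-(p[p > 0] * np.log2(p[p > 0])).sum())          # ~1.0405
```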
Problem 11.18
We can use the position operator P: "2m pixels to the right and 2m pixels below."

Problem 11.19
(a) The image is

0 1 0 1 0
1 0 1 0 1
0 1 0 1 0
1 0 1 0 1
0 1 0 1 0

Let $z_1 = 0$ and $z_2 = 1$. Because there are only two intensity levels, matrix G is of order 2 × 2. Element $g_{11}$ is the number of pixels valued 0 located one pixel to the right of a 0. By inspection, $g_{11} = 0$. Similarly, $g_{12} = 10$, $g_{21} = 10$, and $g_{22} = 0$. The total number of pixels satisfying the predicate P is 20, so the normalized co-occurrence matrix is
$$G = \begin{bmatrix} 0 & 1/2 \\ 1/2 & 0 \end{bmatrix}.$$

(b) In this case, $g_{11}$ is the number of 0's two pixels to the right of a pixel valued 0. By inspection, $g_{11} = 8$. Similarly, $g_{12} = 0$, $g_{21} = 0$, and $g_{22} = 7$. The number of pixels satisfying P is 15, so the normalized co-occurrence matrix is
$$G = \begin{bmatrix} 8/15 & 0 \\ 0 & 7/15 \end{bmatrix}.$$

Problem 11.20
When assigning this problem, the Instructor may wish to point the student to the review of matrices and vectors in the book web site. From Eq. (11.4-6), $y = A(x - m_x)$. Then
$$m_y = E\{y\} = E\{A(x - m_x)\} = A[E\{x\} - E\{m_x\}] = A(m_x - m_x) = 0.$$
This establishes the validity of Eq. (11.4-7).

To prove the validity of Eq. (11.4-8), we start with the definition of the covariance matrix given in Eq. (11.4-3):
$$C_y = E\{(y - m_y)(y - m_y)^T\}.$$
Because $m_y = 0$, it follows that
$$C_y = E\{yy^T\} = E\{[A(x - m_x)][A(x - m_x)]^T\} = A\,E\{(x - m_x)(x - m_x)^T\}\,A^T = AC_xA^T.$$

Showing the validity of Eq. (11.4-9) is a little more complicated. We start by noting that covariance matrices are real and symmetric. From basic matrix algebra, it is known that a real symmetric matrix of order $n$ has $n$ linearly independent eigenvectors (which are easily orthonormalized by, say, the Gram-Schmidt procedure). The rows of matrix $A$ are the orthonormal eigenvectors of $C_x$. Then
$$C_xA^T = C_x[e_1, e_2, \ldots, e_n] = [C_xe_1, C_xe_2, \ldots, C_xe_n] = [\lambda_1e_1, \lambda_2e_2, \ldots, \lambda_ne_n] = A^TD$$
where we used the definition of an eigenvector (i.e., $C_xe_i = \lambda_ie_i$) and $D$ is a diagonal matrix composed of the eigenvalues of $C_x$:
$$D = \begin{bmatrix} \lambda_1 & 0 & \cdots & 0 \\ 0 & \lambda_2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \lambda_n \end{bmatrix}.$$
Premultiplying both sides of the preceding equation by matrix $A$ gives
$$AC_xA^T = AA^TD = D$$
where we used the fact that $A^TA = AA^T = I$ because the rows of $A$ are orthonormal vectors. Therefore, because $C_y = AC_xA^T$, we have shown that $C_y$ is a diagonal matrix produced by diagonalizing matrix $C_x$ using a transformation matrix composed of its eigenvectors. The eigenvalues of $C_y$ are seen to be the same as the eigenvalues of $C_x$. (Recall that the eigenvalues of a diagonal matrix are its diagonal terms.) The fact that $C_ye_i = De_i = \lambda_ie_i$ shows that the eigenvectors of $C_y$ are equal to the eigenvectors of $C_x$.
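A quick numerical check of these identities (a sketch; the covariance matrix below is an arbitrary symmetric example, not one from the book):

```python
import numpy as np

Cx = np.array([[4.0, 1.0],
               [1.0, 3.0]])          # real, symmetric covariance matrix

lam, e = np.linalg.eigh(Cx)          # orthonormal eigenvectors (columns)
A = e.T                              # rows of A = eigenvectors of Cx

Cy = A @ Cx @ A.T                    # Eq. (11.4-8): Cy = A Cx A^T
print(np.allclose(Cy, np.diag(lam))) # True: Cy is diagonal (Eq. 11.4-9)
```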
Problem 11.21
The mean square error, given by Eq. (11.4-12), is the sum of the eigenvalues whose corresponding eigenvectors are not used in the transformation. In this particular case, the four smallest eigenvalues are applicable (see Table 11.6), so the mean square error is
$$e_{ms} = \sum_{j=3}^{6} \lambda_j = 1729.$$
The maximum error occurs when $K = 0$ in Eq. (11.4-12), which then is the sum of all the eigenvalues, or 15039 in this case. Thus, the error incurred by using only the two eigenvectors corresponding to the largest eigenvalues is just 11.5% of the total possible error.

Problem 11.22
This problem is similar to the previous one. The covariance matrix is of order 4096 × 4096 because the images are of size 64 × 64. It is given that the covariance matrix is the identity matrix, so all its 4096 eigenvalues are equal to 1. From Eq. (11.4-12), the mean square error is
$$e_{ms} = \sum_{j=1}^{4096} \lambda_j - \sum_{i=1}^{2048} \lambda_i = 2048.$$

Problem 11.23
When the boundary is symmetric about both the major and minor axes and both axes intersect at the centroid of the boundary.

Problem 11.24
A solution using the relationship "connected to" is shown in Fig. P11.24.

Figure P11.24

Problem 11.25
We can compute a measure of texture using the expression
$$R(x,y) = 1 - \frac{1}{1 + \sigma^2(x,y)}$$
where $\sigma^2(x,y)$ is the intensity variance computed in a neighborhood of $(x,y)$. The size of the neighborhood must be sufficiently large to contain enough samples for a stable estimate of the mean and variance. Neighborhoods of size 7 × 7 or 9 × 9 generally are appropriate for a low-noise case such as this. Because the variance of normal wafers is known to be 400, we can obtain a normal value for $R(x,y)$ by using $\sigma^2 = 400$ in the above equation. An abnormal region will have a variance of about $(50)^2 = 2500$ or higher, yielding a larger value of $R(x,y)$. The procedure then is to compute $R(x,y)$ at every point $(x,y)$ and label that point 0 if it is normal and 1 if it is not. At the end of this procedure we look for clusters of 1's using, for example, connected components (see Section 9.5.3 regarding computation of connected components). If the area (number of pixels) of any connected component exceeds 400 pixels, then we classify the sample as defective.
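A compact sketch of this detector, assuming SciPy is available for the local averaging (the window size and variance figures come from the problem statement):

```python
import numpy as np
from scipy.ndimage import uniform_filter

def texture_R(image, size=9):
    """R(x,y) = 1 - 1/(1 + var(x,y)), with the variance estimated over a
    size x size neighborhood (Problem 11.25)."""
    f = image.astype(np.float64)
    mean = uniform_filter(f, size)
    var = uniform_filter(f ** 2, size) - mean ** 2
    return 1.0 - 1.0 / (1.0 + var)

# Label a pixel 1 (abnormal) when R exceeds the normal value obtained
# with sigma^2 = 400; clusters of 1's are then screened by area.
# defect_map = texture_R(img) > 1.0 - 1.0 / (1.0 + 400.0)
```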
Problem 11.26
This problem has four major parts: (1) detecting individual bottles in an image; (2) finding the top of each bottle; (3) finding the neck and shoulder of each bottle; and (4) determining the level of the liquid in the region between the neck and the shoulder.

(1) Finding individual bottles. Note that the background in the sample image is much darker than the bottles. We assume that this is true in all images. Then a simple way to find individual bottles is to find vertical black stripes in the image having a width determined by the average separation between bottles, a number that is easily computable from images representative of the actual setup during operation. We can find these stripes in various ways. One way is to smooth the image to reduce the effects of noise (we assume that, say, a 3 × 3 or 5 × 5 averaging mask is sufficient). Then we run a horizontal scan line through the middle of the image. The low values in the scan line will correspond to the black or nearly black background. Each bottle will produce a significant rise and fall of intensity level in the scan line for the width of the bottle. Bottles that are fully in the field of view of the camera will have a predetermined average width. Bottles that are only partially in the field of view will have narrower profiles and can be eliminated from further analysis (but we need to make sure that the trailing incomplete bottles are analyzed in the next image; presumably, the leading partial bottle was already processed).

(2) Finding the top of each bottle. Once the location of each (complete or nearly complete) bottle is determined, we again can use the contrast between the bottles and the background to find the top of the bottle. One possible approach is to compute a gradient image (sensitive only to horizontal edges) and look for a horizontal line near the top of the gradient image. An easier method is to run a vertical scan line through the center of the locations found in the previous step. The first major transition in gray level (from the top of the image) in the scan line will give a good indication of the location of the top of a bottle.

(3) Finding the neck and shoulder of a bottle. In the absence of other information, we assume that all bottles are of the same size, as shown in the sample image. Then, once we know where the top of a bottle is, the locations of the neck and shoulder are known to be at a fixed distance from the bottle top.

(4) Determining the level of the liquid. The area defined by the bottom of the neck and the top of the shoulder is the only area that needs to be examined to determine acceptable vs. unacceptable fill level in a given bottle. In fact, as shown in the sample image, an area of a bottle that is void of liquid appears quite bright in an image, so we have various options. We could run a single vertical scan line again, but note that the bottles have areas of reflection that could confuse this approach. This computation is at the core of what this system is designed to do, so a more reliable method should be used. One approach is to threshold the area spanning a rectangle defined by the bottom of the neck, the shoulder, and the sides of the bottle. Then we count the number of white pixels above the midpoint of this rectangle. If this number is greater than a pre-established value, we know that enough liquid is missing and declare the bottle improperly filled. A slightly more sophisticated technique would be to actually find the level of the liquid. This would consist of looking for a horizontal edge in the region within the bottle defined by the sides of the bottle, the bottom of the neck, and a line passing midway between the shoulder and the bottom of the neck. A gradient/edge-linking approach, as described in Chapter 10, would be suitable. Note, however, that if no edge is found, the region is either filled (dark values in the region) or completely void of liquid (white, or near white, values in the region). A computation to resolve these two possible conditions has to follow if the system fails to find an edge.
Problem 11.27
The key specification of the desired system is that it be able to detect individual bubbles. No specific sizes are given. We assume that bubbles are nearly round, as shown in the test image. One solution consists of (1) segmenting the image; (2) post-processing the result; (3) finding the bubbles and bubble clusters, and determining bubbles that merged with the boundary of the image; (4) detecting groups of touching bubbles; (5) counting individual bubbles; and (6) determining the ratio of the area occupied by all bubbles to the total image area.

(1) Segmenting the image. We assume that the sample image is truly representative of the class of images that the system will encounter. The image shown in the problem statement is typical of images that can be segmented by a global threshold. As shown by the histogram in Fig. P11.27, the intensity levels of the objects of interest are high on the gray scale. A simple adaptive threshold method for data that is that high on the scale is to choose a threshold equal to the mean plus a multiple of the standard deviation. We chose a threshold equal to $m + 2\sigma$, which, for the image in the problem statement, was 195. The segmented result is shown on the right of Fig. P11.27. Obviously this is not the only approach we could take, but it is a simple method that adapts to overall changes in intensity.

(2) Post-processing. As shown in the segmented image of Fig. P11.27, many of the bubbles appear as broken disks, or disks with interior black components. These are mostly due either to reflection or to actual voids within a bubble. We could attempt to build a morphological procedure to repair and/or fill the bubbles. However, this can turn into a computationally expensive process that is not warranted unless stringent measurement standards are required, a fact not mentioned in the problem statement. An alternative is to calculate, on average (as determined from a set of sample images), the percentage of bubble areas that are filled with black or have black "bays" that merge their black areas with the background. Then, once the dimensions of each bubble (or bubble cluster) have been established, a correction factor based on area would be applied.

(3) Finding the bubbles. Refer to the solution to Problem 9.36. The solution is based on connected components, which also yields all bubbles and bubble clusters.

Figure P11.27

(4) In order to detect bubble clusters we make use of shape analysis. For each connected component, we find the eigen axes (see Section 11.4) and the standard deviation of the data along these axes (square root of the eigenvalues of the covariance matrix). One simple solution is to compute the ratio of the large to the small variance of each connected component along the eigen axes. A single, uniformly-filled, perfectly round bubble will have a ratio of 1. Deviations from 1 indicate elongations about one of the axes. We look for elliptical shapes as being formed by clusters of bubbles. A threshold to classify bubbles as single vs. clusters has to be determined experimentally. Note that single pixels or pixel streaks one pixel wide have a standard deviation of zero, so they must be processed separately. We have the option of considering connected components that consist of only one pixel to be either noise or the smallest detectable bubble. No information is given in the problem statement about this. In theory, it is possible for a cluster to be formed such that its shape would be symmetrical about both axes, in which case the system would classify the cluster as a single bubble. Resolution of conflicts such as this would require additional processing. However, there is no evidence in the sample image to suggest that this in fact is a problem. Bubble clusters tend to appear as elliptical shapes. In cases where the ratio of the standard deviations is close to the threshold value, we could add additional processing to reduce the chances of making a mistake.

(5) Counting individual bubbles. A bubble that does not merge with the border of the image and is not a cluster is, by definition, a single bubble. Thus, counting these bubbles is simply counting the connected components that have not been tagged as clusters or as merged with the boundary of the image.

(6) Ratio of the areas. This ratio is simply the number of pixels in all the connected components, plus the correction factors mentioned in (2), divided by the total number of pixels in the image.

The problem also asks for the size of the smallest bubble the system can detect. If, as mentioned in (4), we elect to call a one-pixel connected component a bubble, then the smallest bubble dimension detectable is the physical size of one pixel. From the problem statement, 700 pixels cover 7 cm, so the dimension of one pixel is 0.1 mm.
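A sketch of steps (1) and (3) above, assuming SciPy's labeling routine for the connected-component analysis:

```python
import numpy as np
from scipy import ndimage

def segment_and_label(image, k=2.0):
    """Threshold at m + k*sigma (k = 2 reproduces the m + 2*sigma rule
    used above), then label connected components."""
    T = image.mean() + k * image.std()
    binary = image > T
    # ndimage.label uses 4-connectivity by default; pass
    # structure=np.ones((3, 3)) for 8-connectivity.
    labels, count = ndimage.label(binary)
    return binary, labels, count
```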
Chapter 12
Problem Solutions

Problem 12.1
(a) By inspection, the mean vectors of the three classes are, approximately, $m_1 = (1.5, 0.3)^T$, $m_2 = (4.3, 1.3)^T$, and $m_3 = (5.5, 2.1)^T$ for the classes Iris setosa, versicolor, and virginica, respectively. The decision functions are of the form given in Eq. (12.2-5). Substituting the preceding values of the mean vectors gives:
$$d_1(x) = x^Tm_1 - \tfrac{1}{2}m_1^Tm_1 = 1.5x_1 + 0.3x_2 - 1.2$$
$$d_2(x) = x^Tm_2 - \tfrac{1}{2}m_2^Tm_2 = 4.3x_1 + 1.3x_2 - 10.1$$
$$d_3(x) = x^Tm_3 - \tfrac{1}{2}m_3^Tm_3 = 5.5x_1 + 2.1x_2 - 17.3.$$

(b) The decision boundaries are given by the equations
$$d_{12}(x) = d_1(x) - d_2(x) = -2.8x_1 - 1.0x_2 + 8.9 = 0$$
$$d_{13}(x) = d_1(x) - d_3(x) = -4.0x_1 - 1.8x_2 + 16.1 = 0$$
$$d_{23}(x) = d_2(x) - d_3(x) = -1.2x_1 - 0.8x_2 + 7.2 = 0.$$
Figure P12.1 shows a plot of these boundaries.

Figure P12.1

Problem 12.2
From the definition of the Euclidean distance,
$$D_j(x) = \|x - m_j\| = \left[(x - m_j)^T(x - m_j)\right]^{1/2}.$$
Because $D_j(x)$ is non-negative, choosing the smallest $D_j(x)$ is the same as choosing the smallest $D_j^2(x)$, where
$$D_j^2(x) = \|x - m_j\|^2 = (x - m_j)^T(x - m_j) = x^Tx - 2x^Tm_j + m_j^Tm_j = x^Tx - 2\left(x^Tm_j - \tfrac{1}{2}m_j^Tm_j\right).$$
We note that the term $x^Tx$ is independent of $j$ (that is, it is a constant with respect to $j$ in $D_j^2(x)$, $j = 1, 2, \ldots$). Thus, choosing the minimum of $D_j^2(x)$ is equivalent to choosing the maximum of $x^Tm_j - \tfrac{1}{2}m_j^Tm_j$.

Problem 12.3
The equation of the decision boundary between a pair of mean vectors is
$$d_{ij}(x) = x^T(m_i - m_j) - \tfrac{1}{2}(m_i^Tm_i - m_j^Tm_j).$$
The midpoint between $m_i$ and $m_j$ is $(m_i + m_j)/2$ (see Fig. P12.3). First, we show that this point is on the boundary by substituting it for $x$ in the above equation and showing that the result is equal to 0:
$$\tfrac{1}{2}(m_i + m_j)^T(m_i - m_j) - \tfrac{1}{2}(m_i^Tm_i - m_j^Tm_j) = \tfrac{1}{2}(m_i^Tm_i - m_j^Tm_j) - \tfrac{1}{2}(m_i^Tm_i - m_j^Tm_j) = 0.$$

Figure P12.3

Next, we show that the vector $(m_i - m_j)$ is perpendicular to the hyperplane boundary. There are several ways to do this. Perhaps the easiest is to show that $(m_i - m_j)$ is in the same direction as the unit normal to the hyperplane. For a hyperplane with equation $w_1x_1 + w_2x_2 + \cdots + w_nx_n + w_{n+1} = 0$, the unit normal is
$$u = \frac{w_o}{\|w_o\|}$$
where $w_o = (w_1, w_2, \ldots, w_n)^T$. Comparing the above equation for $d_{ij}(x)$ with the general equation of a hyperplane just given, we see that $w_o = (m_i - m_j)$ and $w_{n+1} = -(m_i^Tm_i - m_j^Tm_j)/2$. Thus, the unit normal of our decision boundary is
$$u = \frac{m_i - m_j}{\|m_i - m_j\|}$$
which is in the same direction as the vector $(m_i - m_j)$. This concludes the proof.

Figure P12.4

Problem 12.4
The solution is shown in Fig. P12.4, where the x's are treated as voltages and the Y's denote admittances. From basic circuit theory, the currents, I's, are the products of the voltages and the admittances.
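A minimal sketch of the minimum distance classifier for the three iris classes of Problem 12.1, using the decision functions of Eq. (12.2-5):

```python
import numpy as np

means = np.array([[1.5, 0.3],    # m1: Iris setosa
                  [4.3, 1.3],    # m2: Iris versicolor
                  [5.5, 2.1]])   # m3: Iris virginica

def classify(x):
    """Evaluate d_j(x) = x^T m_j - 0.5 m_j^T m_j and pick the largest,
    which (per Problem 12.2) is equivalent to choosing the nearest mean."""
    d = means @ x - 0.5 * np.sum(means ** 2, axis=1)
    return int(np.argmax(d)) + 1

print(classify(np.array([1.0, 0.5])))   # 1 (nearest to m1)
```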
Problem 12.5
Assume that the mask is of size $J \times K$. For any value of displacement $(s, t)$, we can express the area of the image under the mask, as well as the mask $w(x,y)$, in vector form by letting the first row of the subimage under the mask represent the first $K$ elements of a column vector $a$, the elements of the next row the next $K$ elements of $a$, and so on. At the end of the procedure we subtract the average value of the intensity levels in the subimage from every element of $a$. The vector $a$ is of size $(J \cdot K) \times 1$. A similar approach yields a vector, $b$, of the same size, for the mask $w(x,y)$ minus its average. This vector does not change as $(s,t)$ varies because the coefficients of the mask are fixed. With this construction in mind, we see that the numerator of Eq. (12.2-8) is simply the vector inner product $a^Tb$. Similarly, the first term in the denominator is the norm squared of $a$, denoted $a^Ta = \|a\|^2$, while the second term has a similar interpretation for $b$. The correlation coefficient then becomes
$$\gamma(s,t) = \frac{a^Tb}{\left[(a^Ta)(b^Tb)\right]^{1/2}}.$$
When $a = b$ (a perfect match), $\gamma(s,t) = \|a\|^2/(\|a\|\,\|a\|) = 1$, which is the maximum value obtainable by the above expression. Similarly, the minimum value occurs when $a = -b$, in which case $\gamma(s,t) = -1$. Thus, although the vector $a$ varies in general for every value of $(s,t)$, the values of $\gamma(s,t)$ are all in the range $[-1, 1]$.
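A sketch of this computation for a single displacement, with the mean subtraction and the normalized inner product made explicit (it assumes neither region is constant, so the denominator is nonzero):

```python
import numpy as np

def gamma(subimage, mask):
    """Correlation coefficient of Eq. (12.2-8) at one displacement (s,t):
    gamma = a^T b / sqrt((a^T a)(b^T b)), where a and b are the flattened,
    mean-subtracted subimage and mask."""
    a = subimage.astype(np.float64).ravel()
    b = mask.astype(np.float64).ravel()
    a -= a.mean()
    b -= b.mean()
    return (a @ b) / np.sqrt((a @ a) * (b @ b))
```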
Problem 12.6
The solution to the first part of this problem is based on being able to extract connected components (see Chapters 2 and 11) and then determining whether a connected component is convex or not (see Chapter 11). Once all connected components have been extracted, we perform a convexity check on each and reject the ones that are not convex. All that is left after this is to determine if the remaining blobs are complete or incomplete. To do this, the region consisting of the extreme rows and columns of the image is declared a region of 1's. Then, if the pixel-by-pixel AND of this region with a particular blob yields at least one result that is a 1, it follows that the actual boundary touches that blob, and the blob is called incomplete. When only a single pixel in a blob yields an AND of 1, we have a marginal result in which only one pixel in a blob touches the boundary. We can arbitrarily declare the blob incomplete or not. From the point of view of implementation, it is much simpler to have a procedure that calls a blob incomplete whenever the AND operation yields one or more results valued 1.

After the blobs have been screened using the method just discussed, they need to be classified into one of the three classes given in the problem statement. We perform the classification based on vectors of the form $x = (x_1, x_2)^T$, where $x_1$ and $x_2$ are, respectively, the lengths of the major and minor axes of an elliptical blob, the only type left after screening. Alternatively, we could use the eigen axes for the same purpose. (See Section 11.2.1 on obtaining the major axes, or the end of Section 11.4 regarding the eigen axes.) The mean vector of each class needed to implement a minimum distance classifier is given in the problem statement as the average length of each of the two axes for each class of blob. If they were not given, they could be obtained by measuring the length of the axes for complete ellipses that have been classified a priori as belonging to each of the three classes. The given set of ellipses would thus constitute a training set, and learning would consist of computing the principal axes for all ellipses of one class and then obtaining the average. This would be repeated for each class. A block diagram outlining the solution to this problem is straightforward.

Problem 12.7
(a) Because it is given that the pattern classes are governed by Gaussian densities, only knowledge of the mean vector and covariance matrix of each class is required to specify the Bayes classifier. Substituting the given patterns into Eqs. (12.2-22) and (12.2-23) yields
$$m_1 = \begin{bmatrix} 1 \\ 1 \end{bmatrix}, \quad m_2 = \begin{bmatrix} 5 \\ 5 \end{bmatrix}, \quad C_1 = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} = C_1^{-1}, \quad C_2 = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} = C_2^{-1}.$$
Because $C_1 = C_2 = I$, the decision functions are the same as for a minimum distance classifier:
$$d_1(x) = x^Tm_1 - \tfrac{1}{2}m_1^Tm_1 = 1.0x_1 + 1.0x_2 - 1.0$$
and
$$d_2(x) = x^Tm_2 - \tfrac{1}{2}m_2^Tm_2 = 5.0x_1 + 5.0x_2 - 25.0.$$
The Bayes decision boundary is given by the equation $d(x) = d_1(x) - d_2(x) = 0$, or
$$d(x) = -4.0x_1 - 4.0x_2 + 24.0 = 0.$$

(b) Figure P12.7 shows a plot of the boundary.

Figure P12.7

Problem 12.8
(a) As in Problem 12.7,
$$m_1 = m_2 = \begin{bmatrix} 0 \\ 0 \end{bmatrix}, \quad C_1 = \frac{1}{2}\begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}, \quad C_1^{-1} = 2\begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}, \quad |C_1| = 0.25$$
and
$$C_2 = 2\begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}, \quad C_2^{-1} = \frac{1}{2}\begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}, \quad |C_2| = 4.00.$$
Because the covariance matrices are not equal, it follows from Eq. (12.2-26) that
$$d_1(x) = -\tfrac{1}{2}\ln(0.25) - \tfrac{1}{2}x^T\begin{bmatrix} 2 & 0 \\ 0 & 2 \end{bmatrix}x = -\tfrac{1}{2}\ln(0.25) - (x_1^2 + x_2^2)$$
and
$$d_2(x) = -\tfrac{1}{2}\ln(4.00) - \tfrac{1}{2}x^T\begin{bmatrix} 0.5 & 0 \\ 0 & 0.5 \end{bmatrix}x = -\tfrac{1}{2}\ln(4.00) - \tfrac{1}{4}(x_1^2 + x_2^2)$$
where the term $\ln P(\omega_j)$ was not included because it is the same for both decision functions in this case. The equation of the Bayes decision boundary is
$$d(x) = d_1(x) - d_2(x) = 1.39 - \tfrac{3}{4}(x_1^2 + x_2^2) = 0.$$

(b) Figure P12.8 shows a plot of the boundary (a circle of radius approximately 1.36).

Figure P12.8

Problem 12.9
The basic mechanics are the same as in Problem 12.6, but we have the additional requirement of computing covariance matrices from the training patterns of each class.

Problem 12.10
From basic probability theory,
$$p(c) = \sum_{x} p(c/x)\,p(x).$$
For any pattern belonging to class $\omega_j$, $p(c/x) = p(\omega_j/x)$. Therefore,
$$p(c) = \sum_{x} p(\omega_j/x)\,p(x).$$
Substituting into this equation the formula $p(\omega_j/x) = p(x/\omega_j)p(\omega_j)/p(x)$ gives
$$p(c) = \sum_{x} p(x/\omega_j)\,p(\omega_j).$$
Because the argument of the summation is positive, $p(c)$ is maximized by maximizing $p(x/\omega_j)p(\omega_j)$ for each $j$. That is, if for each $x$ we compute $p(x/\omega_j)p(\omega_j)$ for $j = 1, 2, \ldots, W$, and use the largest value each time as the basis for selecting the class from which $x$ came, then $p(c)$ will be maximized. Because $p(e) = 1 - p(c)$, the probability of error is minimized by this procedure.
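A sketch of the Gaussian decision function of Eq. (12.2-26), applied to the classes of Problem 12.8 (equal priors, so the $\ln P(\omega_j)$ terms cancel in the difference):

```python
import numpy as np

def bayes_d(x, m, C, prior=0.5):
    """d_j(x) = ln P(w_j) - 0.5 ln|C_j| - 0.5 (x - m)^T C_j^{-1} (x - m)."""
    v = x - m
    return (np.log(prior) - 0.5 * np.log(np.linalg.det(C))
            - 0.5 * v @ np.linalg.inv(C) @ v)

m = np.zeros(2)
C1, C2 = 0.5 * np.eye(2), 2.0 * np.eye(2)
x = np.array([1.0, 1.0])
# d1 - d2 is negative here because ||x|| exceeds the boundary radius ~1.36.
print(bayes_d(x, m, C1) - bayes_d(x, m, C2))
```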
Problem 12.11
(a) For class $\omega_1$ we let $y(1) = (0,0,0,1)^T$, $y(2) = (1,0,0,1)^T$, $y(3) = (1,0,1,1)^T$, $y(4) = (1,1,0,1)^T$. Similarly, for class $\omega_2$, $y(5) = (0,0,1,1)^T$, $y(6) = (0,1,1,1)^T$, $y(7) = (0,1,0,1)^T$, $y(8) = (1,1,1,1)^T$. Then, using $c = 1$ and $w(1) = (-1,-2,-2,0)^T$, it follows from Eqs. (12.2-34) through (12.2-36) that:

$w^T(1)y(1) = 0$, so $w(2) = w(1) + y(1) = (-1,-2,-2,1)^T$;
$w^T(2)y(2) = 0$, so $w(3) = w(2) + y(2) = (0,-2,-2,2)^T$;
$w^T(3)y(3) = 0$, so $w(4) = w(3) + y(3) = (1,-2,-1,3)^T$;
$w^T(4)y(4) = 2$, so $w(5) = w(4) = (1,-2,-1,3)^T$;
$w^T(5)y(5) = 2$, so $w(6) = w(5) - y(5) = (1,-2,-2,2)^T$;
$w^T(6)y(6) = -2$, so $w(7) = w(6) = (1,-2,-2,2)^T$;
$w^T(7)y(7) = 0$, so $w(8) = w(7) - y(7) = (1,-3,-2,1)^T$;
$w^T(8)y(8) = -3$, so $w(9) = w(8) = (1,-3,-2,1)^T$.

A complete iteration through all patterns with no errors was not achieved. Therefore, the patterns are recycled by letting $y(9) = y(1)$, $y(10) = y(2)$, and so on, which gives:

$w^T(9)y(9) = 1$, so $w(10) = w(9) = (1,-3,-2,1)^T$;
$w^T(10)y(10) = 2$, so $w(11) = w(10) = (1,-3,-2,1)^T$;
$w^T(11)y(11) = 0$, so $w(12) = w(11) + y(11) = (2,-3,-1,2)^T$;
$w^T(12)y(12) = 1$, so $w(13) = w(12) = (2,-3,-1,2)^T$;
$w^T(13)y(13) = 1$, so $w(14) = w(13) - y(13) = (2,-3,-2,1)^T$;
$w^T(14)y(14) = -4$, so $w(15) = w(14) = (2,-3,-2,1)^T$;
$w^T(15)y(15) = -2$, so $w(16) = w(15) = (2,-3,-2,1)^T$;
$w^T(16)y(16) = -2$, so $w(17) = w(16) = (2,-3,-2,1)^T$.

Again, a complete iteration over all patterns without an error was not achieved, so the patterns are recycled by letting $y(17) = y(1)$, $y(18) = y(2)$, and so on, which gives:

$w^T(17)y(17) = 1$, so $w(18) = w(17) = (2,-3,-2,1)^T$;
$w^T(18)y(18) = 3$, so $w(19) = w(18) = (2,-3,-2,1)^T$;
$w^T(19)y(19) = 1$, so $w(20) = w(19) = (2,-3,-2,1)^T$;
$w^T(20)y(20) = 0$, so $w(21) = w(20) + y(20) = (3,-2,-2,2)^T$;
$w^T(21)y(21) = 0$, so $w(22) = w(21) - y(21) = (3,-2,-3,1)^T$.

It is easily verified that no more corrections take place after this step, so $w(22) = (3,-2,-3,1)^T$ is a solution weight vector.

(b) The decision surface is given by the equation
$$w^Ty = 3y_1 - 2y_2 - 3y_3 + 1 = 0.$$
A section of this surface is shown schematically in Fig. P12.11. The positive side of the surface faces the origin.

Figure P12.11

Problem 12.12
We start by taking the partial derivative of $J$ with respect to $w$:
$$\frac{\partial J}{\partial w} = \frac{1}{2}\left[y\,\mathrm{sgn}(w^Ty) - y\right]$$
where, by definition, $\mathrm{sgn}(w^Ty) = 1$ if $w^Ty > 0$, and $\mathrm{sgn}(w^Ty) = -1$ otherwise. Substituting the partial derivative into the general expression given in the problem statement gives
$$w(k+1) = w(k) + \frac{c}{2}\left[y(k) - y(k)\,\mathrm{sgn}\!\left(w^T(k)y(k)\right)\right]$$
where $y(k)$ is the training pattern being considered at the $k$th iterative step. Substituting the definition of the sgn function into this result yields
$$w(k+1) = w(k) + c\begin{cases} 0 & \text{if } w^T(k)y(k) > 0 \\ y(k) & \text{otherwise} \end{cases}$$
where $c > 0$ and $w(1)$ is arbitrary. This expression agrees with the formulation given in the problem statement.
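The iteration in Problem 12.11 is tedious by hand but trivial to reproduce in code; a sketch of the fixed-increment rule, with the class-ω2 patterns negated so that a single test $w^Ty > 0$ covers both classes:

```python
import numpy as np

def perceptron_train(ys, w, c=1.0, max_epochs=100):
    """Fixed-increment perceptron rule: add c*y whenever w^T y <= 0.
    ys are augmented patterns with class-2 patterns multiplied by -1."""
    ys = [np.asarray(y, float) for y in ys]
    for _ in range(max_epochs):
        corrections = 0
        for y in ys:
            if w @ y <= 0:          # misclassified or on the boundary
                w = w + c * y
                corrections += 1
        if corrections == 0:        # a full error-free pass: done
            return w
    return w

w1 = [(0,0,0,1), (1,0,0,1), (1,0,1,1), (1,1,0,1)]
w2 = [(0,0,1,1), (0,1,1,1), (0,1,0,1), (1,1,1,1)]
ys = w1 + [tuple(-v for v in y) for y in w2]
print(perceptron_train(ys, np.array([-1., -2., -2., 0.])))
# [ 3. -2. -3.  1.], the solution weight vector found above
```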
Problem 12.13
Let the training set of patterns be denoted by $y_1, y_2, \ldots, y_N$. It is assumed that the training patterns of class $\omega_2$ have been multiplied by $-1$. If the classes are linearly separable, we want to prove that the perceptron training algorithm yields a solution weight vector, $w^*$, with the property
$$w^{*T}y_i \ge T_0$$
where $T_0$ is a nonnegative threshold. With this notation, the perceptron algorithm (with $c = 1$) is expressed as $w(k+1) = w(k)$ if $w^T(k)y_i(k) \ge T_0$, or $w(k+1) = w(k) + y_i(k)$ otherwise.

Suppose that we retain only the values of $k$ for which a correction takes place (these are the only indices of interest). Then, re-adapting the index notation, we may write
$$w(k+1) = w(k) + y_i(k) \quad\text{and}\quad w^T(k)y_i(k) \le T_0.$$
With these simplifications in mind, the proof of convergence is as follows. From the above equation,
$$w(k+1) = w(1) + y_i(1) + y_i(2) + \cdots + y_i(k).$$
Taking the inner product of the solution weight vector with both sides of this equation gives
$$w^T(k+1)w^* = w^T(1)w^* + y_i^T(1)w^* + y_i^T(2)w^* + \cdots + y_i^T(k)w^*.$$
Each term $y_i^T(j)w^*$, $j = 1, 2, \ldots, k$, is greater than or equal to $T_0$, so
$$w^T(k+1)w^* \ge w^T(1)w^* + kT_0.$$
Using the Cauchy-Schwartz inequality, $\|a\|^2\|b\|^2 \ge (a^Tb)^2$, results in
$$\left[w^T(k+1)w^*\right]^2 \le \|w(k+1)\|^2\,\|w^*\|^2$$
or
$$\|w(k+1)\|^2 \ge \frac{\left[w^T(k+1)w^*\right]^2}{\|w^*\|^2}.$$
Another line of reasoning leads to a contradiction regarding $\|w(k+1)\|^2$. From above,
$$\|w(j+1)\|^2 = \|w(j)\|^2 + 2w^T(j)y_i(j) + \|y_i(j)\|^2.$$
Let $Q = \max_i \|y_i(j)\|^2$. Then, because $w^T(j)y_i(j) \le T_0$,
$$\|w(j+1)\|^2 - \|w(j)\|^2 \le 2T_0 + Q.$$
Adding these inequalities for $j = 1, 2, \ldots, k$ yields
$$\|w(k+1)\|^2 \le \|w(1)\|^2 + [2T_0 + Q]\,k.$$
This inequality establishes a bound on $\|w(k+1)\|^2$ that conflicts, for sufficiently large $k$, with the bound established by our earlier inequality. In fact, $k$ can be no larger than $k_m$, which is a solution to the equation
$$\frac{\left[w^T(1)w^* + k_mT_0\right]^2}{\|w^*\|^2} = \|w(1)\|^2 + [2T_0 + Q]\,k_m.$$
This equation says that $k_m$ is finite, thus proving that the perceptron training algorithm converges in a finite number of steps to a solution weight vector $w^*$ if the patterns of the training set are linearly separable.

Note: The special case with $T_0 = 0$ is proved in a slightly different manner. Under this condition we have
$$w^T(k+1)w^* \ge w^T(1)w^* + ka$$
where
$$a = \min_i\left[y_i^T(j)w^*\right].$$
Because, by hypothesis, $w^*$ is a solution weight vector, we know that $y_i^T(j)w^* \ge 0$. Also, because $w^T(j)y_i(j) \le (T_0 = 0)$,
$$\|w(j+1)\|^2 - \|w(j)\|^2 \le \|y_i(j)\|^2 \le Q.$$
The rest of the proof remains the same. The bound on the number of steps is the value of $k_m$ that satisfies the following equation:
$$\frac{\left[w^T(1)w^* + k_ma\right]^2}{\|w^*\|^2} = \|w(1)\|^2 + Qk_m.$$

Problem 12.14
The single decision function that implements a minimum distance classifier for two classes is of the form
$$d_{ij}(x) = x^T(m_i - m_j) - \tfrac{1}{2}(m_i^Tm_i - m_j^Tm_j).$$
Thus, for a particular pattern vector $x$, when $d_{ij}(x) > 0$, $x$ is assigned to class $\omega_1$ and, when $d_{ij}(x) < 0$, $x$ is assigned to class $\omega_2$. Values of $x$ for which $d_{ij}(x) = 0$ are on the boundary (hyperplane) separating the two classes. By letting $w = (m_i - m_j)$ and $w_{n+1} = -\tfrac{1}{2}(m_i^Tm_i - m_j^Tm_j)$, we can express the above decision function in the form
$$d(x) = w^Tx + w_{n+1}.$$
This is recognized as a linear decision function in $n$ dimensions, which is implemented by a single-layer neural network with coefficients
$$w_k = (m_{ik} - m_{jk}), \quad k = 1, 2, \ldots, n$$
and
$$\theta = w_{n+1} = -\tfrac{1}{2}(m_i^Tm_i - m_j^Tm_j).$$

Problem 12.15
The approach to solving this problem is basically the same as in Problem 12.14. The idea is to combine the decision functions in the form of a hyperplane and then equate coefficients. For equal covariance matrices, the decision function for two pattern classes is obtained from Eq. (12.2-27):
$$d_{ij}(x) = d_i(x) - d_j(x) = \ln P(\omega_i) - \ln P(\omega_j) + x^TC^{-1}(m_i - m_j) - \tfrac{1}{2}(m_i + m_j)^TC^{-1}(m_i - m_j).$$
As in Problem 12.14, this is recognized as a linear decision function of the form
$$d(x) = w^Tx + w_{n+1}$$
which is implemented by a single-layer perceptron with coefficients
$$w_k = v_k, \quad k = 1, 2, \ldots, n$$
and
$$\theta = w_{n+1} = \ln P(\omega_i) - \ln P(\omega_j) - \tfrac{1}{2}(m_i + m_j)^TC^{-1}(m_i - m_j)$$
where the $v_k$ are elements of the vector $v = C^{-1}(m_i - m_j)$.

Problem 12.16
(a) When $P(\omega_i) = P(\omega_j)$ and $C = I$.

(b) No. The minimum distance classifier implements a decision function that is the perpendicular bisector of the line joining the two means. If the probability densities are known, the Bayes classifier is guaranteed to implement an optimum decision function in the minimum average loss sense. The generalized delta rule for training a neural network says nothing about these two criteria, so it cannot be expected to yield the decision functions in Problems 12.14 or 12.15.

Figure P12.17

Problem 12.17
The classes and the boundary needed to separate them are shown in Fig. P12.17(a). The boundary of minimum complexity in this case is a triangle, but it would be so tight in this arrangement that even small perturbations in the position of the patterns could result in classification errors. Thus, we use a network with the capability to implement four surfaces (lines) in 2D. The network, shown in Fig. P12.17(b), is an extension of the concepts discussed in the text in connection with Fig. 12.22.
In this case, the output node acts like an AND gate with four inputs. The output node outputs a 1 (high) when the outputs of the preceding four nodes are all high simultaneously. This corresponds to a pattern being on the + side of all four lines and, therefore, belonging to class $\omega_1$. Any other combination yields a 0 (low) output, indicating class $\omega_2$.

Problem 12.18
All that is needed is to generate for each class training vectors of the form $x = (x_1, x_2)^T$, where $x_1$ is the length of the major axis and $x_2$ is the length of the minor axis of the blobs comprising the training set. These vectors would then be used to train a neural network using, for example, the generalized delta rule. (Because the patterns are in 2D, it is useful to point out to students that the neural network could be designed by inspection, in the sense that the classes could be plotted, the decision boundary of minimum complexity obtained, and then its coefficients used to specify the neural network. In this case the classes are far apart with respect to their spread, so most likely a single-layer network implementing a linear decision function could do the job.)

Problem 12.19
This problem, although it is a simple exercise in differentiation, is intended to help the student fix in mind the notation used in the derivation of the generalized delta rule. From Eq. (12.2-50), with $\theta_0 = 1$,
$$h_j(I_j) = \frac{1}{1 + e^{-\left(\sum_{k=1}^{N_K} w_{jk}O_k + \theta_j\right)}}.$$
Because, from Eq. (12.2-48),
$$I_j = \sum_{k=1}^{N_K} w_{jk}O_k$$
it follows that
$$h_j(I_j) = \frac{1}{1 + e^{-(I_j + \theta_j)}}.$$
Taking the partial derivative of this expression with respect to $I_j$ gives
$$h_j'(I_j) = \frac{\partial h_j(I_j)}{\partial I_j} = \frac{e^{-(I_j + \theta_j)}}{\left[1 + e^{-(I_j + \theta_j)}\right]^2}.$$
From Eq. (12.2-49),
$$O_j = h_j(I_j) = \frac{1}{1 + e^{-(I_j + \theta_j)}}.$$
It is easily shown that
$$O_j(1 - O_j) = \frac{e^{-(I_j + \theta_j)}}{\left[1 + e^{-(I_j + \theta_j)}\right]^2}$$
so $h_j'(I_j) = O_j(1 - O_j)$. This completes the proof.

Problem 12.20
The first part of Eq. (12.3-3) is proved by noting that the degree of similarity, $k$, is non-negative, so $D(A, B) = 1/k \ge 0$. Similarly, the second part follows from the fact that $k$ is infinite when (and only when) the shapes are identical.

To prove the third part we use the definition of $D$ to write $D(A, C) \le \max[D(A, B), D(B, C)]$ as
$$\frac{1}{k_{ac}} \le \max\left[\frac{1}{k_{ab}}, \frac{1}{k_{bc}}\right]$$
or, equivalently,
$$k_{ac} \ge \min[k_{ab}, k_{bc}]$$
where $k_{ij}$ is the degree of similarity between shape $i$ and shape $j$. Recall from the definition that $k$ is the largest order for which the shape numbers of shape $i$ and shape $j$ still coincide. As Fig. 12.24(b) illustrates, this is the point at which the figures "separate" as we move further down the tree (note that $k$ increases as we move further down the tree). We prove that $k_{ac} \ge \min[k_{ab}, k_{bc}]$ by contradiction. For $k_{ac} < \min[k_{ab}, k_{bc}]$ to hold, shape $A$ has to separate from shape $C$ (1) before shape $A$ separates from shape $B$, and (2) before shape $B$ separates from shape $C$; otherwise $k_{ab} \le k_{ac}$ or $k_{bc} \le k_{ac}$, which automatically violates the condition $k_{ac} < \min[k_{ab}, k_{bc}]$. But, if (1) has to hold, then Fig. P12.20 shows the only way that $A$ can separate from $C$ before separating from $B$. This, however, violates (2), which means that the condition $k_{ac} < \min[k_{ab}, k_{bc}]$ is violated (we can also see this in the figure by noting that $k_{ac} = k_{bc}$ which, because $k_{bc} < k_{ab}$, violates the condition). We use a similar argument to show that if (2) holds then (1) is violated. Thus, we conclude that it is impossible for the condition $k_{ac} < \min[k_{ab}, k_{bc}]$ to hold, thereby proving that $k_{ac} \ge \min[k_{ab}, k_{bc}]$ or, equivalently, that $D(A, C) \le \max[D(A, B), D(B, C)]$.

Figure P12.20
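A one-line numerical confirmation of the identity $h_j'(I_j) = O_j(1 - O_j)$ derived in Problem 12.19 (a sketch; the offset value here is arbitrary):

```python
import numpy as np

theta = 0.5
I = np.linspace(-5.0, 5.0, 11)
O = 1.0 / (1.0 + np.exp(-(I + theta)))                    # h_j(I_j)
h_prime = np.exp(-(I + theta)) / (1.0 + np.exp(-(I + theta))) ** 2
print(np.allclose(h_prime, O * (1.0 - O)))                # True
```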
Problem 12.21
$Q = 0$ implies that $\max(|A|, |B|) = M$. Suppose that $|A| > |B|$. Then it must follow that $|A| = M$ and, therefore, that $M > |B|$. But $M$ is obtained by matching $A$ and $B$, so it must be bounded by $M \le \min(|A|, |B|)$. Because we have stipulated that $|A| > |B|$, the condition $M \le \min(|A|, |B|)$ implies $M \le |B|$. But this contradicts the above result, so the only way for $\max(|A|, |B|) = M$ to hold is if $|A| = |B|$. This, in turn, implies that $A$ and $B$ must be identical strings ($A \equiv B$) because $|A| = |B| = M$ means that all symbols of $A$ and $B$ match. The converse result, that if $A \equiv B$ then $Q = 0$, follows directly from the definition of $Q$.

Problem 12.22
There are various possible approaches to this problem, and our students have shown over the years a tendency to surprise us with new and novel approaches to problems of this type. We give here a set of guidelines that should be satisfied by most practical solutions, and also offer suggestions for specific solutions to various parts of the problem. Depending on the level of maturity of the class, some of these may be offered as "hints" when the problem is assigned.

Because speed and cost are essential system specifications, we conceptualize a binary approach in which image acquisition, preprocessing, and segmentation are combined into one basic operation. This approach leads us to global thresholding as the method of choice. In this particular case this is possible because we can solve the inspection problem by concentrating on the white parts of the flag (stars and white stripes). As discussed in Chapter 10, uniform illumination is essential, especially when global thresholding is used for segmentation. The student should mention something about uniform illumination, or compensation for nonuniform illumination. A discussion by the student of color filtering to improve contrast between the white and (red/blue/background) parts of an image is a plus in the design.

The first step is to specify the size of the viewing area and the resolution required to detect the smallest components of interest, in this case the stars. Because the images are moving and the exact location of each flag is not known, it is necessary to specify a field of view that will guarantee that every image will contain at least one complete flag. In addition, the frame rate must be fast enough so that no flags are missed. The first part of the problem is easy to solve. The field of view has to be wide enough to encompass an area slightly greater across than two flags plus the maximum separation between them. Thus, the width, $W$, of the viewing area must be at least $W = 2(5) + 2.05 \approx 12.1$ in. If we use a standard CCD camera of resolution 640 × 480 elements and view an area 12.8 in. wide, this will give us a sampling rate of approximately 50 pixels/inch, or 250 pixels across a single flag. Visual inspection of a typical flag will show that the blue portion of a flag occupies about 0.4 times the length of the flag, which in this case gives us about 100 pixels per line in the blue area. There is a maximum of six stars per line, and the blue space between them is approximately 1.5 times the width of a star, so the number of pixels across a star is $100/([1 + 1.5] \times 6) \approx 6$ pixels/star.

The next two problems are to determine the shutter speed and the frame rate. Because the number of pixels across each object of interest is only 6, we fix the blur at less than one pixel.
Following the approach used in the solution of Problem 10.49, we first determine the distance between pixels as $(12.8\ \text{in})/(640\ \text{pixels}) = 0.02$ in/pixel. The maximum speed of the flags is 21 in/sec. At this speed, the flags travel $21/0.02 = 1050$ pixels/sec. We are requiring that a flag not travel more than one pixel during exposure; that is, $(1050\ \text{pixels/sec}) \times T\ \text{sec} \le 1$ pixel. So $T \le 9.52 \times 10^{-4}$ sec is the shutter speed needed.

The frame rate must be fast enough to capture an image of every flag that passes the inspection point. It takes a flag $(12.8\ \text{in})/(21\ \text{in/sec}) \approx 0.6$ sec to cross the entire field of view, so we have to capture a frame every 0.3 sec in order to guarantee that every image will contain a whole flag and that no flag will be missed. We assume that the camera is computer controlled to fire from a clock signal. We also make the standard assumption that it takes $1/30\ \text{sec} \approx 330 \times 10^{-4}$ sec to read a captured image into a frame buffer. Therefore, the total time needed to acquire an image is $(330 + 9.5) \times 10^{-4} \approx 340 \times 10^{-4}$ sec. Subtracting this quantity from the 0.3 sec frame interval leaves us with about 0.27 sec to do all the processing required for inspection and to output an appropriate signal to some other part of the manufacturing process.
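This timing budget reduces to a few lines of arithmetic; a sketch with the numbers from the text:

```python
# Flag-inspection timing budget (units: inches, seconds, pixels).
field_width = 12.8          # in
pixels = 640
speed = 21.0                # in/sec

in_per_pixel = field_width / pixels            # 0.02 in/pixel
px_per_sec = speed / in_per_pixel              # 1050 pixels/sec
shutter = 1.0 / px_per_sec                     # ~9.52e-4 sec max exposure
crossing = field_width / speed                 # ~0.6 sec to cross the view
frame_interval = crossing / 2                  # ~0.3 sec between frames
budget = frame_interval - (1.0 / 30.0 + shutter)
print(shutter, frame_interval, budget)         # ~0.27 sec for processing
```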
Because a global thresholding function can be incorporated in most digitizers as part of the data acquisition process, no additional time is needed to generate a binary image. That is, we assume that the digitizer outputs the image in binary form. The next step is to isolate the data corresponding to a complete flag. Given the imaging geometry and frame rate discussed above, four basic binary image configurations are expected: (1) part of a flag on the left of the image, followed by a whole flag, followed by another partial flag; (2) one entire flag touching the left border, followed by a second entire flag, and then a gap before the right border; (3) the opposite of (2); and (4) two entire flags, with neither flag touching the boundary of the image. Cases (2), (3), and (4) are not likely to occur with any significant frequency, but we will check for each of these conditions. As will be seen below, Cases (2) and (3) can be handled the same as Case (1), but, given the tight bounds on processing time, the output each time Case (4) occurs will be to reject both flags.

To handle Case (1) we have to identify a whole flag lying between two partial flags. One of the quickest ways to do this is to run a window as long as the image vertically, but narrow in the horizontal direction, say, corresponding to 0.35 in. (based on a window size of one half of [12.8 − 12.1]), which is approximately $(0.35)(640)/12.8 \approx 17$ pixels wide. This window is used to look for a significant gap between high counts of 1's, and it is narrow enough to detect Case (4). For Case (1), this approach will produce high counts starting on the left of the image, then drop to very few counts (corresponding to the background) for about two inches, pick up again as the center (whole) flag is encountered, continue like this for about five inches, drop again for about two inches as the next gap is encountered, then pick up again until the right border is encountered. The 1's between the two inner gaps correspond to a complete flag and are processed further by the methods discussed below; the other 1's are ignored. (A more elegant and potentially more rugged way is to determine all connected components first, and then look for vertical gaps, but time and cost are fundamental here.) Cases (2) and (3) are handled in a similar manner with slightly different logic, being careful to isolate the data corresponding to an entire flag (i.e., the flag with a gap on each side). Case (4) corresponds to a gap-data-gap-data-gap sequence, but, as mentioned above, it is likely that time and cost constraints would dictate rejecting both flags as a more economical approach than increasing the complexity of the system to handle this special case. Note that this approach to extracting 1's is based on the assumption that the background is not excessively noisy. In other words, the imaging system must be such that the background is reliably segmented as black, with acceptable noise.

With reference to Fig. 1.23, the preceding discussion has carried us through the segmentation stage. The approach followed here for description, recognition, and the use of knowledge is twofold. For the stars we use connected component analysis. For the stripes we use signature analysis. The system knows the coordinates of the two vertical lines that contain the whole flag between them. First, we do a connected components analysis on the left half of the region (to save time) and filter out all components smaller and larger than the expected size of stars, say (to give some flexibility), all components less than 9 (3 × 3) pixels and larger than 64 (8 × 8) pixels. The simplest test at this point is to count the number of remaining connected components (which we assume to be stars). If the number is 50, we continue with the next test on the stripes. Otherwise, we reject the flag. Of course, the logic can be made much more complicated than this. For instance, it could include a regularity analysis in which the relative locations of the components are analyzed. There are likely to be as many answers here as there are students in the class, but the key objective should be to base the analysis on a rugged method such as connected component analysis.

To analyze the stripes, we assume that the flags are printed on white stock material. Thus, "dropping a stripe" means creating a white stripe twice as wide as normal. This is a simple defect detectable by running a vertical scan line in an area guaranteed to contain stripes, and then looking at the intensity signature for the number of pulses of the right height and duration. The fact that the data is binary helps in this regard, but the scan line should be preprocessed to bridge small gaps due to noise before it is analyzed. In spite of the ±15° variation in direction, a region, say, 1 in. to the right of the blue region is independent enough of the rotational variation in terms of showing only stripes along a scan line run vertically in that region.

It is important that any answer to this problem show awareness of the limits on available computation time. Because no mention is made in the problem statement about available processors, it is not possible to establish with absolute certainty whether a solution will meet the requirements or not. However, the student should be expected to address this issue. The guidelines given in the preceding solution are among the fastest ways to solve the problem. A solution along these lines, and a mention that multiple systems may be required if a single system cannot meet the specifications, is an acceptable solution to the problem.
