Advances in Machine Learning and Data Analysis




Lecture Notes in Electrical Engineering, Volume 48

Editors:
Sio-Iong Ao, Harvard School of Engineering and Applied Sciences, Harvard University, Room 403, 60 Oxford Street, Cambridge, MA 02138, USA
Burghard B. Rieger, Universität Trier, FB II Linguistische Datenverarbeitung / Computerlinguistik, Universitätsring 15, 54286 Trier, Germany
Mahyar Amouzegar, College of Engineering, California State University, Long Beach

ISSN 1876-1100, e-ISSN 1876-1119
ISBN 978-90-481-3176-1, e-ISBN 978-90-481-3177-8
DOI 10.1007/978-90-481-3177-8
Springer Dordrecht Heidelberg London New York
Library of Congress Control Number: 2009930421
© Springer Science+Business Media B.V. 2010

No part of this work may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, electronic, mechanical, photocopying, microfilming, recording or otherwise, without written permission from the Publisher, with the exception of any material supplied specifically for the purpose of being entered and executed on a computer system, for exclusive use by the purchaser of the work.

Cover design: eStudio Calamar S.L. Printed on acid-free paper. Springer is part of Springer Science+Business Media.

Preface

A large international conference on Advances in Machine Learning and Data Analysis was held at UC Berkeley, CA, USA, October 22-24, 2008, under the auspices of the World Congress on Engineering and Computer Science (WCECS 2008). The WCECS is organized by the International Association of Engineers (IAENG), a non-profit international association for engineers and computer scientists, which was founded in 1968 and has been expanding rapidly in recent years.
The WCECS conferences have served as excellent venues for the engineering community to meet and exchange ideas, and WCECS continues to strike a balance between theoretical and application-oriented development. The conference committees comprise over two hundred members, mainly research center heads, deans, department heads (chairs), professors, and research scientists from over thirty countries, and the conference participants are likewise truly international, with a high level of representation from many countries. The response to the congress has been excellent: in 2008 we received more than six hundred manuscripts, and after a thorough peer-review process 56.71% of the papers were accepted.

This volume contains sixteen revised and extended research articles written by prominent researchers who participated in the conference. Topics covered include expert systems, intelligent decision making, knowledge-based systems, knowledge extraction, data analysis tools, computational biology, optimization algorithms, experiment design, complex system identification, computational modeling, and industrial applications. The book presents the state of the art of recent advances in machine learning and data analysis, and serves as an excellent reference for researchers and graduate students working in these fields.

Sio-Iong Ao, Harvard University, USA
Burghard B. Rieger, University of Trier, Germany
Mahyar Amouzegar, California State University, Long Beach, USA

Contents

1. 2D/3D Image Data Analysis for Object Tracking and Classification (p. 1)
   Seyed Eghbal Ghobadi, Omar Edmond Loepprich, Oliver Lottner, Klaus Hartmann, Wolfgang Weihs, and Otmar Loffeld
2. Robot Competence Development by Constructive Learning (p. 15)
   Q. Meng, M.H. Lee, and C.J. Hinde
3. Using Digital Watermarking for Securing Next Generation Media Broadcasts (p. 27)
   Dominik Birk and Seán Gaines
4. A Reduced-Dimension Processor Model (p. 43)
   Azam Beg
5. Hybrid Machine Learning Model for Continuous Microarray Time Series (p. 57)
   Sio-Iong Ao
6. An Asymptotic Method to a Financial Optimization Problem (p. 79)
   Dejun Xie, David Edwards, and Giberto Schleiniger
7. Analytical Design of Robust Multi-loop PI Controller for Multi-time Delay Processes (p. 95)
   Truong Nguyen Luan Vu and Moonyong Lee
8. Automatic and Semi-automatic Methods for the Detection of Quasars in Sky Surveys (p. 109)
   Sio-Iong Ao
9. Improving Low-Cost Sail Simulator Results by Artificial Neural Networks Models (p. 139)
   V. Díaz Casás, P. Porca Belío, F. López Peña, and R.J. Duro
10. Rough Set Approaches to Unsupervised Neural Network Based Pattern Classifier (p. 151)
    Ashwin Kothari and Avinash Keskar
11. A New Robust Combined Method for Auto Exposure and Auto White-Balance (p. 165)
    Quoc Kien Vuong, Se-Hwan Yun, and Suki Kim
12. A Mathematical Analysis Around Capacitive Characteristics of the Current of CSCT: Optimum Utilization of Capacitors of Harmonic Filters (p. 179)
    Mohammad Golkhah and Mohammad Tavakoli Bina
13. Harmonic Analysis and Optimum Allocation of Filters in CSCT (p. 191)
    Mohammad Golkhah and Mohammad Tavakoli Bina
14. Digital Pen and Paper Technology as a Means of Classroom Administration Relief (p. 203)
    Jan Broer, Tim Wendisch, and Nina Willms
15. A Conceptual Model for a Network-Based Assessment Security System (p. 217)
    Nathan Percival, Jennifer Percival, and Clemens Martin
16. Incorrect Weighting of Absolute Performance in Self-Assessment (p. 231)
    Scott A. Jeffrey and Brian Cozzarin

Chapter 1
2D/3D Image Data Analysis for Object Tracking and Classification
Seyed Eghbal Ghobadi, Omar Edmond Loepprich, Oliver Lottner, Klaus Hartmann, Wolfgang Weihs, and Otmar Loffeld

Abstract. Object tracking and classification is of utmost importance for many kinds of applications in computer vision. In this chapter, we analyze 2D/3D image data to address solutions to some aspects of object tracking and classification. We conclude our work with a real-time hand-based robot control that yields promising results, even under challenging, varying lighting conditions.
Keywords: 2D/3D image data; registration; fusion; feature extraction; tracking; classification; hand-based robot control

1.1 Introduction

Object tracking and classification are the main tasks in many kinds of applications, such as safety, surveillance, man-machine interaction, driving assistance systems, and traffic monitoring. In each of these applications, the aim is to detect and locate the desired object at each point in time. In safety applications, personnel are tracked in hazardous environments to keep them safe from the machinery, while in surveillance they are tracked to analyze their motion behavior for conformity to a desired norm, for social control and security. Man-machine interaction, on the other hand, has become an important topic for the robotics community: a powerful, intuitive interaction between man and machine requires the robot to detect the presence of the user and interpret his gesture motion. A driving assistance system detects and tracks obstacles, vehicles, and pedestrians in order to avoid collisions along the moving path. The goal of traffic monitoring in an intelligent transportation system is to improve the efficiency and reliability of the transport system, making it safe and convenient for people. There are many further significant applications in our daily life in which object tracking and classification play an important role.

S.E. Ghobadi, O.E. Loepprich, O. Lottner, K. Hartmann, W. Weihs, and O. Loffeld: Center for Sensor Systems (ZESS), University of Siegen, Paul-Bonatz-Str. 9-11, D-57068 Siegen, Germany

S.-I. Ao et al. (eds.), Advances in Machine Learning and Data Analysis, Lecture Notes in Electrical Engineering 48, DOI 10.1007/978-90-481-3177-8_1, © Springer Science+Business Media B.V. 2010
Nowadays, 3D vision systems based on Time of Flight (TOF), which deliver range information, have the key advantage of observing objects three-dimensionally, and they have therefore become very attractive for the aforementioned applications. However, current TOF sensors have a low lateral resolution, which makes them inefficient for accurate processing tasks in real-world problems. In this work, we first propose a solution to this problem by introducing our novel monocular 2D/3D camera system, and then we study some aspects of object tracking and classification using 2D/3D image data.

1.2 2D/3D Vision System

Although current optical TOF sensors [13-16] can provide intensity images in addition to the range data, they suffer from a low lateral resolution. This drawback can be obviated by combining a TOF camera with a conventional one. This combination is a trend in recent research, because even with regard to the emerging new generation of high-resolution TOF sensors,¹ an additional 2D sensor still yields a higher resolution and provides additional color information. With regard to the measurement range, however, the problem of parallax does not allow one to simply position two cameras next to each other and overlay the generated images.

The multimodal data acquisition device used in this work is a recently developed monocular 2D/3D imaging system, named MultiCam. This camera, depicted in Fig. 1.1, consists of two imaging sensors: a conventional 10-bit CMOS gray-scale sensor with VGA resolution and a Photonic Mixer Device (PMD) with a resolution of 64 × 48 pixels. The PMD is an implementation of an optical Time of Flight (TOF) sensor, able to deliver range data at quite high frame rates.

Fig. 1.1 Left: 2D/3D vision system (MultiCam) developed at ZESS. Middle: F-mount optical setup. Right: C-mount optical setup. (Labeled components: camera body, infrared lighting, camera lens (rear aperture), VIS-NIR beam splitter at 760 nm (AR-coated for NIR), sensor window with optional IR cut at 730 nm in front of the 2D CMOS sensor for VIS, and an NIR edge filter at 1000 nm in front of the 3D PMD sensor for the modulated NIR light.)

¹ For example, PMD-40K (200 × 200 pixels), Swissranger 4000 (176 × 144 pixels) and ZCam prototype (320 × 480 pixels).

The principles of this sensor will be presented briefly in the next subsection. In addition, a beam splitter (see again Fig. 1.1), a near-infrared lighting system, an FPGA-based processing unit, and a USB 2.0 communication interface are the remaining main components of this camera. It should be mentioned that the dichroic beam splitter behind the camera lens divides the incident light into two spectral ranges: the visible part, which is forwarded to the CMOS chip, and the near-infrared part, which goes to the TOF sensor [6]. Thus, the MultiCam is actually a multispectral device. In fact, with the MultiCam one obtains not just distance data at high frame rates (100 FPS and above) but also high-resolution color images from the CMOS sensor. The novelty here is that a monocular setup is used, which avoids parallax effects, makes camera calibration much simpler, and allows the 2D and 3D images to be synchronized down to a few microseconds.

1.2.1 3D Time-of-Flight Camera

Basically, the principle of range measurement in a TOF camera relies on the time difference Δt that the light needs to travel a distance d:

    Δt = d / c    (1.1)

where c represents the speed of light.
As lighting source, we use a modulated light signal (f_mod = 20 MHz), which is generated using a MOSFET-based driver and a bank of high-speed infrared emitting diodes. The illuminated scene is then observed by an intelligent pixel array (the PMD chip), where each pixel samples the amount of modulated light. To determine the distance d, we measure the phase delay φ in each pixel. Recall that φ = 2π f_mod Δt, which in turn leads us to

    d = c · φ / (2π f_mod).    (1.2)

Since the maximal phase difference φ can be 2π, the unambiguous distance interval for range measurement at a modulation frequency of 20 MHz is 15 m. This leads to a maximal (unambiguous) target distance of 7.5 m, since the light has to travel the distance twice; hence, in order to use (1.2) for the distance computation in the TOF camera, the equation has to be multiplied by a factor of 0.5.

To calculate the phase delay φ, the autocorrelation function of the electrical and optical signals is analyzed by a phase-shift algorithm. Using four samples A₁, A₂, A₃, and A₄, each shifted by π/2, the phase delay can be calculated as [1]

    φ = arctan((A₁ − A₃) / (A₂ − A₄)).    (1.3)

In addition, the strength a of the signal, which can in fact be seen as its quality, along with the gray-scale value b, can be formulated as follows [5]:

    a = (1/2) √((A₁ − A₃)² + (A₂ − A₄)²),    (1.4)

    b = (1/4) Σᵢ₌₁⁴ Aᵢ.    (1.5)

The environmental lighting conditions in the background have to be considered in all optical TOF sensors. There are various techniques for dealing with this issue, such as optical filters which pass only the band of interest, or algorithmic techniques that remove the noise artifacts of ambient light [8]. In our case, the PMD chip used has an in-pixel so-called SBI circuitry (Suppression of Background Illumination), which increases the sensor dynamics under strong light conditions [1, 13].
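Equations (1.2)-(1.5) can be sketched in a few lines of code. This is a minimal illustration, not the chapter's implementation; the function name and the atan2-based quadrant handling are our own choices:

```python
import math

C = 299_792_458.0  # speed of light [m/s]

def pmd_pixel(a1, a2, a3, a4, f_mod=20e6):
    """Phase delay, distance, signal strength and gray value of one PMD
    pixel from its four phase samples (Eqs. 1.2-1.5)."""
    phi = math.atan2(a1 - a3, a2 - a4) % (2.0 * math.pi)  # phase delay in [0, 2*pi)
    d = C * phi / (2.0 * math.pi * f_mod) * 0.5           # 0.5: light travels d twice
    a = 0.5 * math.hypot(a1 - a3, a2 - a4)                # signal strength (quality)
    b = (a1 + a2 + a3 + a4) / 4.0                         # gray-scale value
    return phi, d, a, b
```

With f_mod = 20 MHz, the maximal phase 2π maps to c/(2 f_mod) · 0.5 ≈ 7.5 m, matching the unambiguous range stated in the text.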
1.2.2 2D/3D Image Registration and Synchronization

As a prerequisite to profit from the 2D/3D multimodality, the temporal and spatial relations of the individual sensors' images must be determined.

Temporal Synchronization

A detailed analysis of the temporal synchronization of the individual sensors of the MultiCam shows that the camera's internal control unit (FPGA) can synchronize the 2D and the 3D sensor in the temporal domain within the limits of the clock resolution, with minimal jitter due to the signal run times in the electronics. The synchronization can refer either to the beginning or to the end of the integration time of a 2D image and a single phase image. While the most common configuration is the acquisition of one 2D image per four phase images, such that a new 2D image is available along with each new range image, it is also possible to acquire a new 2D image per phase image if necessary. Figure 1.2 gives an overview of the different possibilities.

If the synchronization of the 2D image relates to the first phase image, the temporal distance between the individual phase images is not equal, as the second phase image is captured only after the end of the 2D sensor's integration time. In contrast, synchronizing to the fourth phase image has the advantage of temporally equidistant phase images. In both configurations it can happen that a change in the scene is represented by only one of the two sensors, if this change falls outside the other's actual integration time. With regard to the total time needed for the acquisition of a complete 2D/3D image, these two configurations do not differ from each other. However, synchronization to the range image rather than to any of the phase images is advantageous in that the total acquisition time is kept to a minimum, and in that the temporal equidistance of the phase images is maintained. The discussion of motion artifacts in [10] gives details on the impact of the individual configurations. Binocular setups using two complete cameras are evidently not as flexible and as precise as the tightly controlled approach of the MultiCam.

Fig. 1.2 Different possibilities of synchronizing the 2D image to the 3D data

Spatial Registration

In a 2D/3D vision system, regardless of the kind of setup, a 3D scene is imaged by two two-dimensional matrices of pixels with a degree of overlap which is a priori unknown but qualitatively assumed to be high, without loss of generality. The sensors operate in different spectra (NIR vs. VIS) and have different modalities, i.e., the gray values of the scene represent different physical properties. Due to the operating principle, the sensors work with different illumination sources, meaning that the effects of the illumination must be taken into consideration (corresponding features may be dissimilar due to different lighting conditions). The sensors also differ in resolution, the 2D sensor's being higher. The TOF sensor acquires the distance to an observed point with an accuracy and reproducibility in the range of a centimeter. The relative arrangement of the two sensors is not a function of time, but it is known in advance only with insufficient accuracy, meaning that the configuration needs to be calibrated initially.

The aim of image registration is to establish a spatial transform that maps points from one image to homologous points in a target image:

    [x₁, y₁, z₁]ᵀ = f([x₂, y₂, z₂]ᵀ).    (1.6)

The actual transform model depends on the individual configuration. In the following, the monocular setup is presented.
Considering the special case of the MultiCam, the sensors share a common lens and a common extrinsic calibration, and the same scene is imaged at the same scale. First, the uncorrected view after sensor alignment is described. This analysis is useful for detecting angle errors, which can occur if the 45° angle between the beam splitter and the 2D sensor (90° for the PMD sensor, respectively) is not exactly adjusted. For this purpose, a test pattern consisting of a grid of circles is placed in front of the camera and recorded with both sensors. It is assumed that the reflectivity of this test pattern in the visible spectrum does not differ significantly from its reflectivity in the near-infrared spectrum. In that case, the circles' middle points can be detected reliably with both sensors, which yields two sets of matching control points, P_PMD and P_2D. Figure 1.3 shows the average displacement between these two sets, in units of 2D pixels, as a function of the distance between the camera and the pattern, for a constant focal length. The average and the standard deviation are computed over all the circles' middle points in the image. It can be observed that the displacement averages are stable over distance, which means that there is virtually no angle error in the sensor alignment in the observed range. By examining the displacement distribution over the whole image, it can further be concluded that the displacement arrangement does not reveal significant local deviations. Consequently, a global rigid transformation model can be used which is independent of the location in the image. Additionally, the uncorrected view shows that the pixel-to-pixel mapping is fixed, with a negligible rotational component, and that it is, in particular, independent of depth. What remains is a transformation composed only of a two-dimensional translational displacement.
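For matched control points, the purely translational model has a closed-form least-squares solution, namely the mean displacement between corresponding points. A small sketch of this step (the array layout is our assumption):

```python
import numpy as np

def estimate_translation(p_pmd, p_2d):
    """Least-squares 2D translation between matched control-point sets
    (rows are (x, y) points in 2D-pixel units); for a pure translation
    the optimum is the mean displacement of corresponding points."""
    return np.mean(np.asarray(p_2d) - np.asarray(p_pmd), axis=0)
```

When correspondences are not given in advance, they must be established iteratively, as the chapter does with an iterative closest point scheme.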
Consequently, an iterative closest point algorithm² is used to find an optimal solution.

Fig. 1.3 Uncorrected dislocation of the PMD and the 2D sensor in the MultiCam: average displacement (in 2D pixels) as a function of the distance to a calibration target (in m), uncorrected view

² Levenberg-Marquardt algorithm; the optimization criterion is the sum of squared distances of the individual points.

1.3 Multimodal Data Fusion and Segmentation

The TOF camera delivers three data items for each pixel at each time step: intensity, range, and amplitude of the received modulated light. The intensity image of the TOF camera, comparable to the intensity images of CCD or CMOS cameras, depends on the environmental lighting conditions, whereas the range image and the amplitude of the received modulated light are mutually dependent. None of these individual data can be used on its own for robust segmentation under varying lighting conditions. Fusing these data provides new feature information, which is used to improve the performance of the segmentation technique. In this work we use a basic technique for fusing range and intensity data which has already been used in other fields, such as SAR imaging. We observed that the range data of our TOF sensor depends on the reflection factor of the object surface (how much light is reflected back from the object). Therefore, there is a correlation between the intensity and range vector sets in a TOF image. These two vector sets are fused to derive a new data set, the so-called "phase", which indicates the angle between the intensity and range vector sets. The details of this technique are presented in our previous work [12].
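The angle between the two vector sets can be illustrated as follows. This is only our reading of the "phase" feature, using local windows as the vectors; the exact formulation is given in [12]:

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def phase_feature(intensity, rng, win=3):
    """Angle between the local intensity and range vectors around each
    pixel (hypothetical window-based variant of the 'phase' feature):
    0 where intensity and range are locally proportional."""
    I = sliding_window_view(intensity.astype(float), (win, win))
    R = sliding_window_view(rng.astype(float), (win, win))
    I = I.reshape(I.shape[0], I.shape[1], -1)   # one vector per pixel window
    R = R.reshape(R.shape[0], R.shape[1], -1)
    cos = (I * R).sum(-1) / (np.linalg.norm(I, axis=-1)
                             * np.linalg.norm(R, axis=-1) + 1e-12)
    return np.arccos(np.clip(cos, -1.0, 1.0))
```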
Another type of fusion used in our work is to weight the range value of each pixel by the modulation amplitude, which adjusts the range level in regions where the range data might be wrong. With the MultiCam, we can thus acquire low-resolution TOF images with their corresponding fusion-derived features, together with high-resolution 2D images. For segmentation, as in [12], we first apply an unsupervised clustering technique to segment the low-resolution TOF images. Next, we map the 3D segmented image to the 2D image. Due to the monocular setup of the MultiCam, mapping the 3D range image to the 2D image is a trivial and fast task, which consequently makes the segmentation of the high-resolution 2D image computationally cheap. This kind of segmentation has two main advantages over pure 2D segmentation: on the one hand, 3D range segmentation is more reliable and robust in natural environments where lighting conditions may change; on the other hand, due to the low resolution of the 3D image, segmentation is faster. An example of such a segmentation is shown in Fig. 1.4.

Fig. 1.4 Segmentation in 2D/3D images. Top left: low-resolution range image from the TOF sensor. Top middle: high-resolution 2D image. Top right: modulation amplitude image. Bottom left: 3D rescaled segmented image. Bottom middle: rescaled segmented image using fusion of range and modulation amplitude data. Bottom right: 2D segmented image resulting from mapping

1.4 Object Tracking and Classification

One approach to object identification in a tracking process is to use a classifier directly to distinguish between different detected objects. In fact, if the classification method is fast enough to operate at the image acquisition frame rate, it can be used directly for tracking as well. For example, supervised learning techniques such as Support Vector Machines (SVM) and AdaBoost can be employed directly to classify the objects in each frame, because they are fast enough to run at real-time rates for many applications.

In this section, we describe tracking with a classifier in more detail by applying a supervised classifier based on AdaBoost to 2D/3D videos in order to detect and track the desired object. After segmentation of the image, which was described in the previous section, Haar-like features are extracted and used as the input data for the AdaBoost classifier. Haar-like features, which have been used successfully in face tracking and classification problems, encode some information about the object to be detected; for a much more in-depth treatment the reader is referred to [11]. However, there are two main issues in real-time object detection based on Haar-like features and the AdaBoost technique. The first is that background noise in the training images degrades detection accuracy significantly, especially for cluttered backgrounds with varying lighting conditions, which is the case in many real-world problems. The second is that computing all sub-windows (search windows) of an image at every scale is too costly if real-time constraints are to be met. The fundamental idea of our algorithm is to address these problems using the fusion of 3D range data with 2D images. To remove the background issue from the object recognition problem, the object detection procedure is divided into two levels. In the low level we use range data in order to: (i) define a 3D volume in which the object of interest appears (volume of interest) and eliminate the background, to achieve robustness against cluttered backgrounds, and (ii) segment the foreground image into different clusters.
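The low-level step can be sketched as follows. The thresholds, the cluster count, and the tiny 1-D k-means are illustrative choices of ours; the chapter's actual unsupervised clustering follows [12]:

```python
import numpy as np

def segment_low_level(range_img, z_min=0.3, z_max=1.5, k=2, iters=20):
    """(i) Keep only pixels inside the volume of interest [z_min, z_max],
    (ii) cluster the remaining range values with a 1-D k-means.
    Returns a label image: -1 = background, 0..k-1 = foreground clusters."""
    labels = np.full(range_img.shape, -1, dtype=int)
    mask = (range_img >= z_min) & (range_img <= z_max)
    vals = range_img[mask]
    if vals.size == 0:
        return labels                       # nothing inside the VOI
    centers = np.linspace(vals.min(), vals.max(), k)
    for _ in range(iters):
        assign = np.abs(vals[:, None] - centers[None, :]).argmin(axis=1)
        for j in range(k):
            if np.any(assign == j):
                centers[j] = vals[assign == j].mean()
    labels[mask] = assign
    return labels
```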
In the high level, we map the 3D segmented image to its corresponding 2D color image and apply the Viola-Jones method [11] (searching with Haar-like features) to find the desired object in the image. Figure 1.5 shows some examples of this procedure for hand detection and tracking.

Fig. 1.5 Solution to the background issue in object detection using the Viola-Jones method. Using range data, the cluttered background is removed and the foreground image is segmented and mapped to the 2D image. The Viola-Jones technique is applied to the 2D segmented image to find the object of interest

Fig. 1.6 Selection of search windows using range information for hand detection. Left: the hand is far from the camera, so the image is searched with small search windows. Right: the hand is close to the camera, so the image is scanned with large search windows to find the hand

The second issue, the cost of computing all search windows of an image at every scale, can be addressed by using the range information directly. After segmentation, the distance of the segmented object from the camera can easily be derived from the 3D range image. Knowing the distance of the object from the camera, its size can be roughly estimated, and a set of search windows fitting that size is selected, so there is no need to try every possible search-window size to find the object. This reduces the computational cost of the Viola-Jones technique to a great extent, which is a significant point in real-time applications. An example of selecting search windows for hand detection is illustrated in Fig. 1.6.

1.5 Real-Time Hand-Based Robot Control Using 2D/3D Images

Nowadays, robots are used in many different domains, ranging from search and rescue in dangerous environments to interactive entertainment.
The more robots are employed in our daily life, the more natural communication with them is required. Current communication devices, like keyboard, mouse, joystick, and electronic pen, are not intuitive and natural enough. Hand gesture, on the other hand, as a natural interface, has attracted considerable attention for interactive communication with robots in recent years [2-4, 9]. In this context, vision-based hand detection and tracking techniques are used to provide an efficient real-time interface with the robot. However, the problem of visual hand recognition and tracking is quite challenging. Many early approaches used position markers or colored gloves to make hand recognition easier, but due to their inconvenience they cannot be considered a natural interface for robot control. Thanks to the latest advances in computer vision, recent vision-based approaches do not need any extra hardware except a camera. These techniques can be categorized as model-based and appearance-based methods [7]. While model-based techniques can recognize the hand motion and its shape exactly, they are computationally expensive and therefore infeasible for a real-time control application. Appearance-based techniques, on the other hand, are faster but still face issues such as the complex nature of the hand with more than 20 degrees of freedom, cluttered and varying backgrounds, variation in lighting conditions, and real-time computational demands. In this section we present the results of our work on a real-time hand-based tracking system as an innovative, natural commanding system for Human-Robot Interaction (HRI).

1.5.1 Set-Up

The set-up mainly consists of three parts: (1) a six-axis, harmonic-drive robot from Kuka, of type KR 3, with an attached magnetic grabber; the robot itself has been mounted onto an aluminium rack along with the second system component;
(2) a dedicated robot control unit, responsible for robot operation and communication, running proprietary software from the Kuka company; and (3) the main PC, responsible for data acquisition from the 2D/3D imaging system (MultiCam) and for running the algorithms. Communication between the robot control unit and the application PC is done by exchanging XML-wrapped messages via TCP/IP. The network architecture follows a strict client-server model, with the control unit as the client connecting during startup to the main PC, which runs a server thread.

1.5.2 Control Application

To make the communication system more convenient for the user, all the commands necessary to control the robot, such as moving it in the six directions (x+, x−, y+, y−, z+, z−) or (de)activating the grabber (palm-to-fist or vice versa), are issued through a self-developed GUI-based application, illustrated in Fig. 1.7. As a first step, we track the user's hand movement in a predefined volume covered by the MultiCam, and then map its real-world position into a virtual space represented by a cuboid of defined size, which correlates with the MultiCam's view frustum. Hand movement is visualized by placing a 3D hand model at the corresponding location within the cuboid. Depending on the hand's distance from the cuboid's center, a velocity vector is generated, along with some other state information, and sent to the robot's control unit, which is in charge of sending the appropriate commands to the robot itself. By placing the virtual hand in the cuboid's center, the system can be put into a receptive mode for special commands. For this purpose, a rudimentary gesture classification algorithm has been implemented which is able to distinguish between a fist and a palm.
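The mapping from hand position to velocity vector can be sketched as below. The dead-zone radius and gain are hypothetical parameters of ours; the chapter does not specify them:

```python
import numpy as np

def velocity_command(hand_pos, cuboid_center, dead_zone=0.05, gain=0.5):
    """Map the virtual hand position to a robot velocity vector.
    Inside the dead zone around the cuboid center the robot idles and
    the system becomes receptive to special (gesture) commands."""
    offset = np.asarray(hand_pos, float) - np.asarray(cuboid_center, float)
    if np.linalg.norm(offset) < dead_zone:
        return np.zeros(3), True       # idle; receptive mode active
    return gain * offset, False        # velocity grows with distance from center
```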
We use defined fist-to-palm transition sequences (e.g., a palm-fist-palm transition) in order to perform a robot reset, to put the system into predefined modes, and to (de)activate the magnetic grabber, which in turn enables the whole system to handle ferric objects (Fig. 1.8).

Fig. 1.7 Left: hand-based robot control using the MultiCam, Hannover Fair 2008. Right: graphical user interface (GUI)

Fig. 1.8 Some results. Left: examples of correctly detected images (true positives). Right: examples of wrongly detected images (first row: missed hand; second row: misclassified)

Table 1.1 Confusion table for the hand detection system

                Hand     Non-hand
    Hand        2,633    87
    Non-hand    224      2,630
    Sum         2,857    2,717

1.5.3 Experimental Results

For the Hannover Fair 2008, a simple task was defined for the visitors to perform, in order to put the system's performance to the test: commanding the robot to move in six directions by moving the hand, with any kind of posture, in the corresponding directions; picking up a metal object with the magnetic grabber using a palm-to-fist gesture; moving the object with the motion of the hand; and finally dropping it in the defined areas with a palm-to-fist gesture. It turned out that handling the system was quite intuitive, since different people were able to operate the robot instantly. In terms of reliability, the whole system worked flawlessly during the entire time it was exhibited at the fair. For training the classifier, we took 1,037 positive hand images from seven people and 1,269 negative images from non-hand objects in our lab environment. Using OpenCV, we trained our classifier with 20 stages and a window size of 32 × 32. Although the classifier was trained under lab conditions, it worked quite well under the extreme lighting conditions at the fair. In order to analyze the performance of the system, we recorded the hand detection results from our GUI in video format while different users were commanding the robot.
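The rates reported in Sect. 1.5.3 follow directly from Table 1.1 (reading the columns as the actual classes and the rows as the detector output, which is our interpretation of the table's orientation):

```python
def detection_metrics(tp, fp, fn, tn):
    """Hit rate, false-positive rate and accuracy from a confusion table:
    tp = hands detected as hands, fp = non-hands detected as hands,
    fn = hands missed, tn = non-hands correctly rejected."""
    hit_rate = tp / (tp + fn)
    fpr = fp / (fp + tn)
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    return hit_rate, fpr, accuracy
```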
Likewise, we moved the camera and took videos of the environment. These videos are labeled as "positive" and "negative" data, where positive stands for the hand and negative represents non-hand objects. The data were acquired using a PC with a dual-core 2.4 GHz CPU. The exposure time for the 3D sensor was set to 2 ms, while for the 2D sensor it was about 10 ms. Under these conditions, we had about 15 detected images per second (including the computation time of all algorithms). The confusion matrix derived from these videos, with 2,857 hand images and 2,717 non-hand images, is shown in Table 1.1. As can be calculated from this table, the system has a hit rate of 0.921, a false positive rate of 0.032 and a recognition accuracy of 94.4%.

1.6 Conclusion

In this work we studied some aspects of object detection and tracking using 2D/3D images. These images are provided by a monocular 2D/3D vision system, the so-called MultiCam. The principle of this camera system as well as the registration and fusion of 2D/3D data were discussed. The work concluded with results from a real-time hand-based robot control application which was demonstrated at the Hannover Fair in Germany in 2008.

Acknowledgments This work has been funded by the German Research Foundation (DFG) under contract number LO 455/10-2, which is gratefully appreciated.

References

1. Moeller, T., Kraft, H. and Frey, J.: Robust 3D Measurement with PMD Sensors, PMD Technologies GmbH.
2. Wang, C.C. and Wang, K.C.: Hand Posture Recognition Using Adaboost with SIFT for Human Robot Interaction, International Conference on Advanced Robotics, 2007.
3. Cerlinca, T.L., Pentiuc, S.G. and Cerlinca, M.C.: Hand Posture Recognition for Human-Robot Interaction. Proceedings of the 2007 Workshop on Multimodal Interfaces in Semantic Interaction, 2007.
4. Malima, A., Ozgur, E. and Cetin, M.: A Fast Algorithm for Vision-based Hand Gesture Recognition for Robot Control.
IEEE Conference on Signal Processing and Communications Applications, 2006.
5. Ghobadi, S.E., Hartmann, K., Weihs, W., Netramai, C., Loffeld, O. and Roth, H.: Detection and Classification of Moving Objects - Stereo or Time-of-Flight Images, IEEE Conference on Computational Intelligence and Security, 2006, China.
6. Lottner, O., Hartmann, K., Weihs, W. and Loffeld, O.: Image Registration and Calibration Aspects for a New 2D/3D Camera, EOS Conference on Frontiers in Electronic Imaging, June 2007, Munich, Germany.
7. Fang, Y., Wang, K., Cheng, J. and Lu, H.: A Real-Time Hand Gesture Recognition Method. IEEE International Conference on Multimedia and Expo, 2007.
8. Gokturk, S.B., Yalcin, H. and Bamji, C.: A Time of Flight Depth Sensor - System Description, Issues and Solutions, IEEE Workshop on Real-Time 3D Sensors and Their Use, in conjunction with the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Washington, USA, 2004.
9. Rogalla, O., Ehrenmann, M., Zoellner, R., Becher, R. and Dillmann, R.: Using Gesture and Speech Control for Commanding a Robot Assistant. 11th IEEE International Workshop on Robot and Human Interactive Communication, 2002.
10. Lottner, O., Sluiter, A., Hartmann, K. and Weihs, W.: Movement Artefacts in Range Images of Time-of-Flight Cameras, DOI: 10.1109/ISSCS.2007.4292665, 2007, Romania.
11. Viola, P. and Jones, M.: Rapid Object Detection Using a Boosted Cascade of Simple Features. Conference on Computer Vision and Pattern Recognition, 2001.
12. Ghobadi, S.E., Loepprich, O., Hartmann, K. and Loffeld, O.: Hand Segmentation Using 2D/3D Images, IVCNZ 2007 Conference, Hamilton, New Zealand, 5-7 December 2007.
13. PMD-Technologie;
14. Swissranger; C.C.S. d'Electronique SA,
15. Canesta, Inc.,
16. 3DV Systems, ZCam;

Chapter 2
Robot Competence Development by Constructive Learning

Q. Meng, M.H. Lee, and C.J. Hinde

Abstract This paper presents a constructive learning approach for developing sensor-motor mapping in autonomous systems.
The system's adaptation to environment changes is discussed and three methods are proposed to deal with long-term and short-term changes. The proposed constructive learning allows autonomous systems to develop their network topology and adjust network parameters. The approach is supported by findings from psychology and neuroscience, especially concerning infants' cognitive development at early stages. A growing radial basis function network is introduced as a computational substrate for sensory-motor mapping learning. Experiments were conducted on a robot eye/hand coordination testbed, and the results show the incremental development of sensory-motor mapping and its adaptation to changes such as in tool-use.

Keywords Developmental robotics · Biologically inspired systems · Constructive learning · Adaptation

2.1 Introduction

In many situations, such as home services for elderly and disabled people, artificial autonomous systems (e.g., robots) need to perform various tasks in an unstructured environment; system designers cannot anticipate every situation and program the system to cope with all of them. This is different from traditional industrial robots, which mostly work in structured environments and are programmed each time for a specific task. Autonomy, self-learning and self-organization, and adaptation to environment changes are crucial for these artificial systems to successfully fulfil various challenging tasks. Traditional controllers for intelligent systems are designed by hand, and they do not have such flexibility and adaptivity.

Q. Meng () and C.J. Hinde
Department of Computer Science, Loughborough University, LE11 3TU, UK
e-mail:
M.H. Lee
Department of Computer Science, University of Wales, Aberystwyth, SY23 3DB, UK

S.-I. Ao et al. (eds.), Advances in Machine Learning and Data Analysis, Lecture Notes in Electrical Engineering 48, DOI 10.1007/978-90-481-3177-8_2, © Springer Science+Business Media B.V. 2010
The general cognitivist approach to cognition is based on symbolic information processing and representation, and does not need to be embodied or to physically interact with the environment. Most cognitivist-based artificial cognitive systems rely on the experience of human designers. Human beings [1] and animals face similar problems during their development of sensor-motor coordination; however, we can tackle these problems without too much effort. During human cognitive development, especially at the early stages, each individual undergoes changes both physically and mentally through interaction with environments. These cognitive developments are usually staged, exhibited as behavioural changes, and supported by neuron growth and shrinking in the brain. Two kinds of development in the brain support sensory-motor coordination: quantitative adjustments and qualitative growth [19]. Quantitative adjustments refer to the adjustments of the synapse connection weights in the network, and qualitative growth refers to changes of the topology of the network. Inspired by developmental psychology, especially Piaget's sensory-motor development theory of infants [12], developmental robotics focuses on mechanisms, algorithms and architectures for robots to incrementally and automatically build their skills through interaction with their environment [21]. The key features of developmental robotics share similar mechanisms with human cognitive development, which include learning through sensory-motor interaction; scaffolding by constraints; staged, incremental and self-organizing learning; intrinsic motivation-driven exploration and active learning; and neural plasticity, task transfer and adaptation. In this paper, we examine the robot sensory-motor coordination development process at early stages through a constructive learning algorithm.
Constructive learning, which is inspired by psychological constructivism, allows both quantitative adjustments and qualitative network growth to support the developmental learning process. Most static neural networks need a predefined network structure and learning can only affect the connection weights; they are therefore not consistent with developmental psychology. Constructive learning is supported by recent neuroscience findings of synaptogenesis and neurogenesis occurring under pressures to learn [16, 20]. In this paper, a self-growing radial basis function (RBF) network is introduced as the computational substrate, and a constructive learning algorithm is utilized to build the sensory-motor coordination development. We investigate the plasticity of the network in terms of self-growth of the network topology (growing and shrinking) and adjustments of the parameters of each neuron: the neuron position, the size of the receptive field of each neuron, and the connection weights. The network's adaptation to system changes is further investigated and demonstrated in an eye/hand coordination test scenario with tool-use.

2.2 Sensory-Motor Mapping Development via Constructive Learning

In order to support the development of sensor-motor coordination, a self-growing RBF network is introduced due to its biological plausibility. There exists very strong evidence that humans use basis functions to perform sensorimotor transformations [15]; Poggio proposed that the brain uses modules as basis components for several of its information processing subsystems and that these modules can be realized by generalized RBF networks [13, 14]. There are three layers in the RBF network: the input layer, the hidden layer and the output layer. The hidden layer consists of radial basis function units (neurons); the size of the receptive field of each neuron varies and the overlaps between fields differ. Each neuron has its own centre and coverage.
The output is a linear combination of the hidden neurons. The RBF network is expressed as:

f(\mathbf{x}) = \mathbf{a}_0 + \sum_{k=1}^{N} \mathbf{a}_k \phi_k(\mathbf{x})    (2.1)

\phi_k(\mathbf{x}) = \exp\left( -\frac{1}{2\sigma_k^2} \|\mathbf{x} - \boldsymbol{\mu}_k\|^2 \right)    (2.2)

where f(\mathbf{x}) = (f_1(\mathbf{x}), f_2(\mathbf{x}), \ldots, f_{N_o}(\mathbf{x}))^T is the vector of system outputs, N_o is the number of outputs and \mathbf{x} is the system input. \mathbf{a}_k is the weight vector from the hidden unit \phi_k(\mathbf{x}) to the outputs, N is the number of radial basis function units, and \boldsymbol{\mu}_k and \sigma_k are the kth hidden unit's centre and width, respectively.

2.2.1 Why Constructive Learning?

According to Shultz [19, 20], in addition to constructive learning being supported by biological and psychological findings, it has several advantages over static learning. First, constructive-network algorithms learn fast (in polynomial time) compared with static learning (exponential time), and static learning may never solve some problems, as the designer of a static network must first find a suitable network topology. Second, constructive learning may find optimal solutions to the bias/variance tradeoff by reducing bias via incrementally adding hidden units to expand the network and the hypothesis space, and by reducing variance via adjusting connection weights to approach the correct hypothesis. Third, static learning cannot learn a particular hypothesis if it cannot be correctly represented: a network may be too weak to learn or too powerful to generalize. Constructive learning avoids this problem because its network growth enables it to represent a hypothesis that could not be represented previously with limited network power.

2.2.2 Topological Development of the Sensory-Motor Mapping Network

During the development of the sensory-motor mapping network, two mechanisms exist: topological changes of the mapping network and network parameter adjustments.
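The RBF mapping of Eqs. (2.1) and (2.2) amounts to a bias plus a weighted sum of Gaussian bumps. A minimal plain-Python sketch for a single output, with arbitrary toy values:

```python
import math

def rbf_output(x, a0, units):
    """f(x) = a0 + sum_k a_k * exp(-||x - mu_k||^2 / (2 sigma_k^2)),
    following Eqs. (2.1)-(2.2); `units` is a list of (a_k, mu_k, sigma_k)
    tuples for the hidden layer."""
    total = a0
    for a_k, mu_k, sigma_k in units:
        sq_dist = sum((xi - mi) ** 2 for xi, mi in zip(x, mu_k))
        total += a_k * math.exp(-sq_dist / (2.0 * sigma_k ** 2))
    return total

# Toy example: a single hidden unit centred exactly at the query point
# contributes its full weight, since the basis function evaluates to 1 there.
units = [(0.5, (1.0, 2.0), 0.8)]
print(rbf_output((1.0, 2.0), a0=0.25, units=units))  # -> 0.75
```
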
The qualitative growth of the sensory-motor mapping network depends on the novelty of the sensory-motor information which the system obtains during its interaction with the environment in development; the growth is incremental and self-organizing. The sensory-motor mapping network starts with no hidden units, and with each development step, i.e., after the system observes the consequence of an action, the network grows or shrinks when necessary or adjusts the network parameters accordingly. The network growth criteria are based on the novelty of the observations: whether the current network prediction error for the current learning observation is bigger than a threshold, \|\mathbf{e}(t)\| = \|\mathbf{y}(t) - f(\mathbf{x}(t))\| > e_1, and whether the node to be added is far enough from the existing nodes in the network, \|\mathbf{x}(t) - \boldsymbol{\mu}_r(t)\| > e_3. In order to ensure smooth growth of the network, the prediction error is also checked within a sliding window:

\sqrt{ \frac{1}{m} \sum_{j=t-(m-1)}^{t} \|\mathbf{e}(j)\|^2 } > e_2

where (\mathbf{x}(t), \mathbf{y}(t)) is the learning data at the tth step, \boldsymbol{\mu}_r(t) is the centre vector of the nearest node to the current input \mathbf{x}(t), and m is the length of the observation window. If the above three conditions are met, then a new node is inserted into the network with the following parameters: \mathbf{a}_{N+1} = \mathbf{e}(t), \boldsymbol{\mu}_{N+1} = \mathbf{x}(t), \sigma_{N+1} = k \|\mathbf{x}(t) - \boldsymbol{\mu}_r(t)\|, where k is the overlap factor between hidden units.

The above network growth strategy does not include any network pruning, which means the network size will become large: some of the hidden nodes may not contribute much to the outputs, and the network may become overfitted. In order to overcome this problem, we use a pruning strategy as in [8] to remove, over a period of learning steps, those hidden units with an insignificant contribution to the network outputs. Let o_{nj} be the jth output component of the nth hidden neuron:

o_{nj} = a_{nj} \exp\left( -\frac{\|\mathbf{x}(t) - \boldsymbol{\mu}_n\|^2}{\sigma_n^2} \right), \quad r_{nj} = \frac{o_{nj}}{\max(o_{1j}, o_{2j}, \ldots, o_{Nj})}

If r_{nj} < \delta for M consecutive learning steps, then the nth node is removed, where \delta is a threshold.
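The three growth conditions and the pruning rule can be sketched as two small predicates; the thresholds e1, e2, e3, the window length m, and the pruning parameters delta and M follow the text, while the numeric values in the tests are arbitrary:

```python
import math

def should_grow(pred_error, dist_to_nearest, recent_errors, e1, e2, e3, m):
    """Insert a new hidden unit only if all three novelty conditions hold:
    large current error, input far from the nearest centre, and large RMS
    error over a sliding window of the last m observations."""
    window = recent_errors[-m:]
    rms = math.sqrt(sum(e * e for e in window) / len(window))
    return pred_error > e1 and dist_to_nearest > e3 and rms > e2

def should_prune(contribution_history, delta, M):
    """A unit is removed once its normalized contribution r_nj has stayed
    below delta for M consecutive learning steps."""
    tail = contribution_history[-M:]
    return len(tail) == M and all(r < delta for r in tail)
```

A new node inserted by `should_grow` would take the current residual as its weight, the current input as its centre, and a width proportional to the distance to the nearest existing node, as in the text.
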
2.2.3 Parameter Adjustments of the Sensory-Motor Mapping Network

There are two types of parameters in the network: the first type is the connection weights; the second is the parameters of each neuron, namely the position and the size of the receptive field. A simplified node-decoupled EKF (ND-EKF) algorithm was proposed to update the parameters of each node independently in order to speed up the process. The parameters of the network are grouped into N_o + N components. The first N_o groups are the weights, \mathbf{w}_k = [a_{0k}, a_{1k}, \ldots, a_{Nk}]^T, k = 1, 2, \ldots, N_o (a_{ij} is the weight from the ith hidden node to the jth output); the remaining N groups are the hidden units' parameters, \mathbf{w}_k = [\boldsymbol{\mu}_k^T, \sigma_k]^T, k = 1, 2, \ldots, N. The superscript T stands for the transpose of a matrix. For the kth parameter group at the tth learning step, ND-EKF is given by:

\mathbf{w}_k(t) = \mathbf{w}_k(t-1) + K_k(t) e_k(t)    (2.3)

where

e_k(t) = \begin{cases} y_k(t) - f_k(\mathbf{x}(t)) & k = 1, 2, \ldots, N_o \\ \mathbf{y}(t) - f(\mathbf{x}(t)) & k = N_o + 1, \ldots, N_o + N \end{cases}    (2.4)

K_k(t) is the Kalman gain, y_k(t) is the kth component of \mathbf{y}(t) in the training data (\mathbf{x}(t), \mathbf{y}(t)), and B_k(t) is the submatrix of the derivatives of the network outputs with respect to the kth group's parameters at the tth learning step. R_k(t) is the variance of the measurement noise, set to a constant diagonal matrix in this paper. A scalar q determines the allowed random step in the direction of the gradient vector.

In our algorithm, an extended Kalman filter is used to adjust the system's parameters. A similar mechanism may exist in our brain: recent research has found evidence that Kalman filtering occurs in visual information processing [17, 18], motor coordination control [22], and spatial learning and localization in the hippocampus [1, 21]. In hippocampus studies, a Kalman filtering framework has been mapped to the entorhinal-hippocampal loop in a biologically plausible way [1, 21].
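Returning to the ND-EKF update of Eq. (2.3): for an output-weight group, the derivative matrix B_k(t) is just the vector of basis-function activations, so one decoupled step reduces to a Kalman-gain-weighted correction of that group alone. A minimal single-group sketch (the covariance recursion, and the values of R and q, are standard EKF bookkeeping assumed here, not quoted from the chapter):

```python
def ekf_weight_update(w, P, b, error, R=0.1, q=1e-4):
    """One node-decoupled EKF step for one weight group:
    K = P b / (R + b^T P b);  w <- w + K * error;  P <- P - K b^T P + q I.
    `w` is the group's parameter vector, `P` its covariance matrix,
    `b` the derivative (activation) vector, `error` the scalar residual."""
    n = len(w)
    Pb = [sum(P[i][j] * b[j] for j in range(n)) for i in range(n)]
    s = R + sum(b[i] * Pb[i] for i in range(n))   # innovation variance
    K = [Pb[i] / s for i in range(n)]             # Kalman gain
    w_new = [w[i] + K[i] * error for i in range(n)]
    P_new = [[P[i][j] - K[i] * Pb[j] + (q if i == j else 0.0)
              for j in range(n)] for i in range(n)]
    return w_new, P_new
```

Repeating the update on the same observation drives the predicted output toward the target, with the gain shrinking as the covariance contracts.
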
According to this mapping, region CA1 in the hippocampus holds the system reconstruction error signal, and the internal representation is maintained by Entorhinal Cortex (EC) layers V-VI. The output of CA1 corrects the internal representation, which in turn corrects the reconstruction of the input at EC layers II-III. O'Keefe also provided a biologically plausible mechanism by which matrix inversions might be performed by the CA1 layer through an iterated update scheme and in conjunction with the subiculum [11]. In addition, the matrix inversion lemma has been widely used in computational neuroscience [4].

2.3 Adaptation of Sensory-Motor Mapping

Two kinds of changes in our daily life may require the learned sensory-motor mapping to update: short-term changes and long-term changes. For the short term, humans may just reuse learned knowledge and quickly adjust some parameters to adapt to the environment changes. But for the longer term, after an adult is trained in a special environment or for special tasks for a long time, they may grow new neurons to gain new skills and to enhance the already acquired knowledge. Examples of these two kinds of changes can be found during human development: the kinematics of limbs and bodily structures are not fixed during human growth but may change, either slowly over long periods during growth and bodily maturation, or rapidly, such as when we use tools to extend the reach or function of our manipulation abilities. It has been discovered that infants learn and update sensorimotor mappings by associating spontaneous motor actions and their sensory consequences [12]. It takes a relatively long time to build up the mapping skills, which involves neuron growth processes in the brain to support the sensorimotor transformation. After an adult has gained the basic skills, they can quickly adapt to different situations; for example, an adult can quickly adapt to the use of a pointer to point to a seen target.
This indicates that after rapid structural changes we do not learn new sensorimotor skills from scratch; rather, we reuse the existing knowledge and simply (and quickly) adjust some parameters. Maguire et al. [9] studied the structural changes in the hippocampi of licensed London taxi drivers. They found that taxi drivers had a significantly greater volume in the posterior hippocampus, whereas control subjects showed greater volume in the anterior hippocampus. Maguire's study suggests that the human brain grows or shrinks to reflect the cognitive demands of the environment, even for adults.

In autonomous systems, some parameters may gradually change after long use, and the systems need to adapt to these changes automatically. Autonomous systems face additional situations where structures may change suddenly; these changes may be unintentional, for example when damage occurs through collisions, or by design, when a new tool is fitted to the arm end-effector. For these reasons it is important for autonomous systems in unstructured environments to have the ability to quickly adjust the existing mapping network parameters so as to automatically regain their eye/hand coordination skills. We note that humans handle this problem very well. Recent neurophysiological, psychological and neuropsychological research provides strong evidence that temporal, parietal and frontal areas within the left cerebral hemisphere in humans and animals are involved and change during activities where the hand has been extended physically, such as when using tools [2, 3, 5, 6, 10]. Japanese macaque monkeys were trained to use a rake to pull food closer which was originally placed beyond the reach of their hands [2, 3].
The researchers found that, in monkeys trained in tool-use, a group of bimodal neurons in the anterior bank of the intraparietal sulcus, which respond both to somatosensory and visual stimuli related to the hand, dynamically altered their visual receptive field properties (the region where a neuron responds to certain visual stimuli) during training in the tool-use. In this paper, approaches for adapting to such changes, and more specifically to different robot limb sizes in our experiments, were investigated and compared. These adaptation skills are usually not available in commercial calibration-based eye/hand mapping systems.

In our plastic RBF network for robotic eye/hand mapping, the knowledge learned for the mapping is stored in the network in terms of the number of neurons, their positions and sizes of receptive fields, and the node weights. In order to quickly adapt to structural changes of the robotic system, this knowledge needs to be reused in some way rather than setting up the network again from empty. In this paper, we considered three methods for such adaptation, all of which reuse the learned knowledge by adjusting the learned network:

1. Full adjustment of the learned network after a structural change. This includes network topological changes, by adding new hidden nodes or removing existing ones if necessary, and adjusting the following parameters: the centres and widths of the existing nodes, and the weights from the hidden nodes to the outputs.
2. Adjusting the weights of the learned network and removing the insignificant hidden units, but keeping the rest of the hidden units unchanged.
3. Only adjusting the weights, and keeping the hidden unit structure of the learned network completely unchanged.
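The three reuse strategies above differ only in which parts of the learned network remain trainable after a structural change. As a compact sketch (the strategy names are ours, not the authors'):

```python
def adaptation_plan(strategy):
    """Which parts of a learned RBF mapping network may change under each
    of the three reuse strategies described in the text."""
    plans = {
        # Method 1: full adjustment, including topological changes.
        "full":          dict(grow=True, prune=True, centers=True,
                              widths=True, weights=True),
        # Method 2: weights only, plus pruning of insignificant units.
        "weights_prune": dict(grow=False, prune=True, centers=False,
                              widths=False, weights=True),
        # Method 3: weights only, hidden-unit structure frozen.
        "weights_only":  dict(grow=False, prune=False, centers=False,
                              widths=False, weights=True),
    }
    return plans[strategy]
```

A learning loop would consult such a plan at every step to decide whether the growth and pruning criteria are even evaluated, and which parameter groups the ND-EKF update is allowed to touch.
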
2.4 Experimental Studies

2.4.1 Experimental System

In this paper, robot eye/hand coordination is used as a testbed to demonstrate the process of constructive learning and the adaptation of the sensory-motor mapping network to changes. The experimental robot system has two manipulator arms and a motorized pan/tilt head carrying a colour CCD camera, as shown in Fig. 2.1. Each arm can move with six degrees of freedom. The whole system is controlled by a PC running Windows XP, which is responsible for controlling the two manipulator arms, any tools and the pan/tilt head, and also for processing images from the CCD camera and other sensory information. The control program is written in C++.

Fig. 2.1 Experimental system for developmental coordination learning

In this paper only one of the robot arms was used. In the experiments we commanded the robot arm to move randomly at a fixed height above the table by driving joint 2 and joint 3 of the robot arm. After each movement, if the hand was in the current field of view of the camera, the eye system moved the camera to centre on the end of the robot finger, and then the pan/tilt head position (p, t) and the current values of the two arm joints used (j2, j3) were obtained to form a training sample for the system; otherwise, if the hand tip was out of the view of the camera, the trial was ignored, because the eye could not locate the arm end before the mapping between pan/tilt and robot arm had been set up. After each trial, the obtained data (p, t, j2, j3) was used to train the mapping network, and each data sample was used only once. In order to simplify the image processing task of finding the end of the robot finger, we marked the finger end with a blue cover. The position of the blue marker could be slid up and down the finger to effectively alter the length of the finger.

2.4.2 Constructive Learning and Adaptation in Tool-Use

To illustrate the network topological growth and parameter adjustments in constructive learning, Fig.
2.2 gives the structures of the hidden units at the 100th and the 1597th learning steps in eye/hand mapping. The results show that at the beginning the system used large neurons to quickly cover the whole space, and later on it gradually built up the details with smaller neurons where necessary, letting the system achieve more accuracy. This neuron growing process from coarse to fine, using different neuron coverages, is similar to infant development, where the decrease in the size of neural receptive fields in the cortical areas relates to object recognition ability [24]. Figure 2.2 also demonstrates the changes of the position and size of the receptive field of each neuron. It should be noted that some neurons are removed in the learning process due to their small contribution to the sensory-motor mapping network.

Fig. 2.2 Distribution of the hidden units and their coverage in eye/hand mapping by RBF with SDEKF: (a) structure of hidden units at the 100th learning step; (b) structure of hidden units at the 1597th learning step. The background points are the input learning points in the pan and tilt space of the camera head, and the circles are the hidden units of the eye/hand mapping network

Our next experiment was to test the network's adaptability to sudden changes in the motor-sensory relationship due to structural changes. We chose changes in

[Panels of Fig. 2.3, each plotting output error against learning steps: (a) fully updating the learned network; (b) updating the weights of the learned network, with pruning; (c) only updating weights of the learned network, no pruning] Fig.
2.3 Adapting to structural change by reusing the learned network in different ways

finger length as a scenario to test this adaptability. Using a variety of tools of different sizes is necessary for a robot system to conduct different tasks, and the eye/hand mapping network's ability to quickly adapt to such changes is crucial for the robot to regain its eye/hand coordination skills. We tested three approaches to reusing and adjusting the learned eye/hand mapping network in order to regain coordination skills. As a test, at the 1598th trial (a purely arbitrary point) the finger length was changed from 27.5 cm to 20.5 cm, and we investigated the adaptation of the system to this sudden change. Figure 2.3(a) shows the output error when all the parameters of the learned network are adjusted, including adding possible nodes, moving node centres, adjusting the widths of each node, and updating the weights. Figures 2.3(b) and 2.3(c) show the results of only adjusting the weights and keeping the parameters of the hidden units unchanged; Fig. 2.3(b) used a pruning procedure, as described in Section 2.2.2, to remove the insignificant hidden units, while Fig. 2.3(c) kept the hidden unit structure completely unchanged. From the results, we can see that all three methods quickly adapt to the sudden change in finger size. The method of adjusting the full network parameters achieved the best result. Although the other two methods did not change the parameters of the hidden units of the learned network, they obtained reasonably small errors. It is important to note that the third method, which completely reused the original hidden unit structure in the mapping network and only adjusted the weights, achieved a quite similar result to the second method with pruning. This may be similar to the approach that adults adopt to handle tool changes.
We can quickly adapt to structural changes with little effort, but during such short time-scales we cannot regenerate receptive fields in our brain, and so may only reuse the knowledge already learned and quickly adjust the weights of the existing neurons. But if we are trained to use a tool for a long time, we may improve our operation skills as we grow new neurons to support the changes, as in Fig. 2.3(a).

Fig. 2.4 The number of hidden units for the three approaches: updating all parameters; weights only, without pruning; weights only, with pruning

Now considering network size, as shown in Fig. 2.4, the first method, with full updating of all network parameters, required by far the largest network, 48 nodes; the second method removed three hidden units, reducing the network to 16 nodes; the third method kept the original network size, 19 nodes.

We have also studied the staged development of the sensory-motor mapping learning process [7]. The system constructs sensory-motor schemas in terms of interlinked topological mappings of sensory-motor events, and demonstrates that constructive learning moves to the next stage once stable behaviour patterns emerge.

2.5 Conclusions

Constructive learning has advantages over static learning in sensory-motor mapping development for autonomous systems. It supports both topological network growth and parameter adjustments, which is consistent with findings in psychology and neuroscience. It also has the advantage of adaptation to system changes such as in tool-use. A growing radial basis function network trained by constructive learning constitutes the computational substrate for such sensory-motor mapping development.
It forms a platform to examine the relationship between behaviour development and the growth of the internal sensory-motor mapping network; the staged, developmental learning process through various constraints in motors and sensors; and active behaviour learning driven by intrinsic motivation. The experimental results on robot eye/hand coordination demonstrate the incremental growth of the mapping network and the system's adaptation to environmental changes.

References

1. Bousquet, O., Balakrishnan, K., and Honavar, V. (1998). Is the hippocampus a Kalman filter? In Pacific Symposium on Biocomputing, pages 655-666, Hawaii.
2. Hihara, S., Notoya, T., Tanaka, M., Ichinose, S., Ojima, H., Obayashi, S., Fujii, N., and Iriki, A. (2006). Extension of corticocortical afferents into the anterior bank of the intraparietal sulcus by tool-use training in adult monkeys. Neuropsychologia, 44(13):2636-2646.
3. Hihara, S., Obayashi, S., Tanaka, M., and Iriki, A. (2003). Rapid learning of sequential tool use by macaque monkeys. Physiology and Behavior, 78:427-434.
4. Huys, Q.J.M., Zemel, R.S., Natarajan, R., and Dayan, P. (2007). Fast population coding. Neural Computation, 19(2):404-441.
5. Imamizu, H., Miyauchi, S., Tamada, T., Sasaki, Y., Takino, R., Puetz, B., Yoshioka, T., and Kawato, M. (2000). Human cerebellar activity reflecting an acquired internal model of a novel tool. Nature, 403:192-195.
6. Johnson-Frey, S.H. (2004). The neural bases of complex tool use in humans. Trends in Cognitive Science, 8(2):71-78.
7. Lee, M.H., Meng, Q., and Chao, F. (2007). Developmental learning for autonomous robots. Robotics and Autonomous Systems, 55(9):750-759.
8. Lu, Y., Sundararajan, N., and Saratchandran, P. (1998). Performance evaluation of a sequential minimal radial basis function (RBF) neural network learning algorithm. IEEE Transactions on Neural Networks, 9(2):308-318.
9.
Maguire, E.A., Gadian, D.G., Johnsrude, I.S., Good, C.D., Ashburner, J., Frackowiak, R.S.J., and Frith, C.D. (2000). Navigation-related structural change in the hippocampi of taxi drivers. PNAS, 97(8):4398-4403.
10. Maravita, A. and Iriki, A. (2004). Tools for the body (schema). Trends in Cognitive Science, 8(2):79-86.
11. O'Keefe, J. (1989). Computations the hippocampus might perform. In Nadel, L., Cooper, L.A., Culicover, P., and Harnish, R.M., editors, Neural Connections, Mental Computation. MIT Press, Cambridge, MA.
12. Piaget, J. (1952). The Origins of Intelligence in Children. Norton, New York, NY.
13. Poggio, T. (1990). A theory of how the brain might work. MIT AI Memo No. 1253.
14. Poggio, T. and Girosi, F. (1990). Networks for approximation and learning. Proceedings of the IEEE, 78(9):1481-1497.
15. Pouget, A. and Snyder, L.H. (2000). Computational approaches to sensorimotor transformations. Nature Neuroscience Supplement, 3:1192-1198.
16. Quartz, S.R. and Sejnowski, T.J. (1997). The neural basis of cognitive development: A constructivist manifesto. Behavioral and Brain Sciences, 20:537-596.
17. Rao, R. and Ballard, D. (1997). Dynamic model of visual recognition predicts neural response properties in the visual cortex. Neural Computation, 9(4):721-763.
18. Rao, R. and Ballard, D. (1999). Predictive coding in the visual cortex. Nature Neuroscience, 2(1):79-87.
19. Shultz, T.R. (2006). Constructive learning in the modeling of psychological development. In Munakata, Y. and Johnson, M.H., editors, Processes of Change in Brain and Cognitive Development: Attention and Performance XXI, pages 61-86. Oxford: Oxford University Press.
20. Shultz, T.R., Mysore, S.P., and Quartz, S.R. (2007). Why let networks grow. In Mareschal, D., Sirois, S., Westermann, G., and Johnson, M.H., editors, Neuroconstructivism: Perspectives and Prospects, volume 2, chapter 4, pages 65-98. Oxford: Oxford University Press.
Chapter 3
Using Digital Watermarking for Securing Next Generation Media Broadcasts

Dominik Birk and Seán Gaines

Abstract The Internet presents a problem for the protection of intellectual property. Those who create content must be adequately compensated for the use of their works. Rights agencies that monitor the use of these works exist in many jurisdictions. In the traditional broadcast environment this monitoring is a difficult task. With Internet Protocol Television (IPTV) and Next Generation Networks (NGN) the situation is further complicated.

In this work we focus on digitally Watermarking next generation media broadcasts. We present a framework for monitoring media broadcasts that utilises a Public Key Infrastructure (PKI) and Digital Certificates. Furthermore, we introduce the concept of an independent monitoring agency that would operate the framework and act as an arbiter. Finally, we evaluate appropriate short signature schemes, suitable Watermarking algorithms and Watermark robustness.

Keywords Next generation networks · Broadcast monitoring · Public key watermarking · IPTV · PKI · Short signature

3.1 Introduction

Radio and television are audio and video broadcasting services, typically broadcast over the air, cable networks or satellite networks.
Since the advent of the Internet, the distribution of media content has been a principal goal; for many years, however, this was not realised due to the prohibitive cost and limited capabilities of personal computers.

D. Birk, Horst Görtz Institute for IT Security, Ruhr-University Bochum, Building IC 4, D-44780 Bochum, Germany
S. Gaines, VICOMTech Research Center, Paseo Mikeletegi 57, E-20006, San Sebastian, Spain

With the transition to digital media streams received over the Internet, new challenges loom. Today, the practices of recording, distributing and copying multimedia content are easy and straightforward [11]. As a result, it is increasingly difficult to enforce copyright and to safeguard intellectual property for broadcast media.

Digital Watermarking [6], which may be considered a form of steganography [8], attempts to address this problem by embedding information within the digital signal. The embedded Watermark is invisible to the user, should not affect the perceived aesthetic quality of the final signal, and should not reveal any clues about the technique used to embed it. However, it is debatable whether traditional Watermarking systems, which are based on disclosure of the key needed to embed and to detect the Watermark, are generally suitable for proving ownership or authentication. Therefore, we established a framework based on asymmetric public-key cryptography which is used for exhaustive authentication with the help of Blind Watermarking techniques.

In many jurisdictions broadcasters have regulatory obligations which attempt to protect the intellectual property [8] and copyrights of authors, songwriters, performers, actors, publishers, etc.
Furthermore, in some jurisdictions there exist bodies charged with the defence of the rights of intellectual property and copyright holders, and with the calculation, charging and collection of performance royalties on the use of these protected works. Currently, there are several cases in which broadcasters cannot confidently confirm that their royalty liabilities are correctly calculated, because they do not employ a viable automated system to measure which protected works are broadcast, how often and when. A gap has therefore opened up between the actual amount charged by the rights bodies and the correct payable royalty liability of the broadcaster.

This paper describes a specific Watermarking concept that may be used for identifying next generation media broadcast streams based on PKI authentication. We introduce a formal PKI framework in Section 3.3 allocating authentication methods, and then focus on procedures and measures for Watermarking media streams in legacy networks as well as NGNs using PKI. We demonstrate our proposal through an example scenario on video stream Watermarking.

3.2 Framework Overview

A general overview of the framework with its three parties can be seen in Fig. 3.1. It makes use of two additional frameworks, the PKI Framework and the Watermarking Framework. The PKI Framework, described in Section 3.3, is used for establishing a trust network between all of the involved entities. The broadcaster (BC) can initialise the monitoring process for metering his use of protected works, and the rights entity to which royalties are payable can likewise launch the monitoring process for billing purposes. In practice, the PKI procedures (3.1) should be established as the first step in the
deployment of the framework.

Fig. 3.1 General framework overview (parties: broadcaster, monitoring agency, rights entity; steps: file selection, request for monitoring, hash table transfer, Watermarking, tracking and processing)

The PKI is necessary for establishing a trusted relationship with the purpose of distributing authenticated private and public keys utilising Digital Certificates. To start the process of monitoring, a "Request for Monitoring" (3.2) is sent to the monitoring agency (MA). Afterwards, the broadcaster selects a piece of content which he wants to stream (3.3) and computes the corresponding hash table (see Section ). This hash table is carried over a secure and authenticated channel to the MA as well as to the rights entity (EX). Afterwards, the broadcaster initiates the process defined by the Watermarking Framework.

The Watermarking Framework specifies procedures for Watermark embedding, retrieval and verification in media streams (3.4). The rights entity is the entity which charges broadcasters for distributing media content over a chosen distribution network. It also tracks and processes broadcast media with the help of information obtained by the monitoring agency. The broadcaster signs the stream which is about to be broadcast with his private key. Subsequently, the corresponding signature is embedded into the media stream with known Watermarking techniques. Later in the process, the monitoring agency extracts the Watermark and verifies the signature. The agency can thus be sure that only the original broadcaster broadcast the media content, because additional security metadata, such as timestamps and identifiers, are used. Additionally, EX can also verify the signature in order to prevent abuse by the MA (3.5).

The objective of the whole framework is to let the broadcaster mark the file stream uniquely while also providing the monitoring agency with the possibility to identify the broadcast stream and therefore the corresponding broadcaster. Within this framework, non-repudiation is also provided.
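The hash-table step (3.3) can be sketched as follows. This is a minimal illustration: the choice of SHA-256, the fixed chunk size and the table layout are all assumptions, since the chapter does not specify how the hash table is built.

```python
import hashlib

def build_hash_table(content: bytes, chunk_size: int = 4096) -> list:
    """Per-chunk digests of the selected content file.

    Hypothetical layout: one SHA-256 hex digest per fixed-size chunk. The
    broadcaster would send this table to the MA and EX over the secure,
    authenticated channel established by the PKI Framework.
    """
    return [
        hashlib.sha256(content[i:i + chunk_size]).hexdigest()
        for i in range(0, len(content), chunk_size)
    ]

# The MA later recomputes the table over the received stream and compares.
table = build_hash_table(b"\x00" * 10000)
```

Because the channel is authenticated, the MA can later match extracted Watermark signatures against entries of this table without further interaction with the broadcaster.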
This non-repudiation means that the broadcaster cannot deny having broadcast a Watermarked media stream.

3.3 PKI Framework

The PKI framework makes use of a root Certificate Authority (CA) which each participating entity must trust. The monitoring agency, the rights entity and the broadcaster submit their created public keys (PK) to the CA, or create the keys directly at the CA, in order to receive the corresponding Certificate and the Certificates of all other participants.

3.3.1 Overview

Trust forms the basis of all communication, be it physical or electronic. In the case of electronic communication, building trust is quite difficult as the identity of the other entity remains concealed. While a PKI normally provides confidentiality, non-repudiation, authentication and integrity, our framework mainly focuses on authentication and non-repudiation. A detailed description of the three stages in Fig. 3.2 is given below.

1. This step demonstrates a particular need of a PKI. The public key (PK_CA) of the CA has to be pre-distributed in an authenticated manner to every involved entity; otherwise no secure communication with the CA is possible.
2. As soon as the entities have received the authenticated PK_CA over a secure, authenticated channel, they create their own secret keys (SK_X, SK_MA and SK_BC) and public keys (PK_X, PK_MA and PK_BC) and subsequently send a PKCS#10 Certificate request to the Certificate Authority. Because Certificate requests carry Digital Signatures, the CA can be sure that the sender holds a private key related to the public key. The sender therefore has a proof of possession [4], but the receiver still needs to assure that the entity with which he is communicating is not spoofed.
3. When the CA receives the Certification request, it behaves like a Registration Authority (RA) and tries to validate the information stored in the PKCS#10 file. If it is valid, an X.509 Certificate is issued by signing the corresponding PK of the entity.
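Step 3, issuing a Certificate by signing the entity's public key, can be modelled with a deliberately tiny textbook-RSA CA key. Everything here (the 16-bit modulus, the string encoding of the Certificate body, the entity names) is an illustrative assumption rather than the chapter's design; a real deployment would use standard X.509 tooling.

```python
import hashlib
from dataclasses import dataclass

# Toy CA signing key: textbook RSA with primes 61 and 53 (insecure, demo only).
N, E, D = 3233, 17, 2753  # public modulus, public exponent, private exponent

@dataclass
class Certificate:
    subject: str      # e.g. "BC", "MA" or "X" (the rights entity)
    subject_pk: int   # the public key being certified
    signature: int    # CA signature over (subject, subject_pk)

def digest(subject: str, pk: int) -> int:
    """Hash the Certificate body down to an integer below the CA modulus."""
    h = hashlib.sha256(f"{subject}:{pk}".encode()).digest()
    return int.from_bytes(h, "big") % N

def issue(subject: str, pk: int) -> Certificate:
    """The CA validates the PKCS#10 request, then signs the entity's PK."""
    return Certificate(subject, pk, pow(digest(subject, pk), D, N))

def authenticate(cert: Certificate) -> bool:
    """Any holder of the pre-distributed CA public key can check a Certificate."""
    return pow(cert.signature, E, N) == digest(cert.subject, cert.subject_pk)
```

With the CA public key (N, E) pre-distributed as in step 1, the broadcaster can authenticate the Certificate of EX without ever contacting EX directly.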
Afterwards, all Certificates are distributed to all entities for authentication purposes. The broadcaster now owns a Certificate for the PK of EX, issued by the corresponding CA. The Certificate is used during the Watermarking processes in order to authenticate the sender.

Fig. 3.2 PKI framework overview: the superior Certificate Authority issues Certificates (Cert_X.509 = sign_SK(PK)) to the monitoring agency, broadcaster and rights entity, each of which creates its own key pair

As previously mentioned, the main purpose of these protocol steps is to provide full authentication. Now the broadcasting entity can sign a message and send it to the monitoring agency, or indeed to the rights entity, and both entities can be assured that the message was sent by the broadcaster.

3.4 Watermarking Framework

The Watermarking Framework, illustrated in Fig. 3.3, specifies the communication protocol between the broadcaster and the monitoring agency; the rights entity is not involved. Furthermore, the Watermarking Framework provides a detailed insight into the procedures for creating, detecting, extracting and verifying the Watermark.

3.4.1 Overview

The framework is initialised at the moment the broadcaster has chosen a content file and transferred the corresponding hash table to the MA (see Algorithm ). Afterwards, no further information needs to be sent to the MA, due to the use of message-recovering signatures. So the MA can be sure about who broadcast the stream and which stream has been broadcast.

Fig. 3.3 Watermarking framework overview: the broadcaster hashes and signs the stream (Nyberg-Rueppel algorithm with SK_BC over ID-str, ID-num and ID-time), embeds the signature as a Watermark and broadcasts; the monitoring agency detects and extracts the Watermark, recovers the signature and compares it against the stored hash table
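The embedding and extraction boxes of Fig. 3.3 can be toy-modelled with least-significant-bit Watermarking of a raw sample stream. LSB embedding is merely the simplest stand-in (it is fragile, and not the kind of robust algorithm the chapter evaluates); the point is the cycle of embedding the signature repeatedly and recovering it at the MA.

```python
def embed(stream: bytes, payload: bytes) -> bytearray:
    """Write payload bits into the LSB of each stream byte, repeating the
    payload for the whole stream so the MA can lock on anywhere."""
    bits = [(b >> i) & 1 for b in payload for i in range(7, -1, -1)]
    marked = bytearray(stream)
    for pos in range(len(marked)):
        marked[pos] = (marked[pos] & 0xFE) | bits[pos % len(bits)]
    return marked

def extract(stream: bytes, payload_len: int) -> bytes:
    """Read payload_len bytes back out of the first payload_len * 8 LSBs."""
    bits = [b & 1 for b in stream[:payload_len * 8]]
    return bytes(
        sum(bit << (7 - i) for i, bit in enumerate(bits[n * 8:(n + 1) * 8]))
        for n in range(payload_len)
    )
```

In the framework, the payload would be the broadcaster's signature; since the NR scheme offers message recovery, extraction alone already yields the signed identifier.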
This information, who broadcast which stream, is sufficient for metering and charging purposes.

The chief characteristic of a Watermarking scheme for copyright protection, or DRM, is that the Watermark cannot be separated from the medium without knowledge of a secret value. In our specific case we target another characteristic: sender authentication. It should be possible to identify the broadcasting station unambiguously and show exactly who broadcast which stream and when. Therefore, our Watermark information contains a digital signature issued by the broadcaster. Each entity that receives the broadcast stream and owns the corresponding broadcaster Certificate can verify the distributed stream with the help of the corresponding PK.

Below, we discuss suitable signature schemes and Watermarking algorithms. We introduce adequate procedures for embedding and retrieving the Watermark with the help of a beacon, in addition to verifying the signature.

3.4.2 Signature Schemes

A principal requirement of all Watermarking systems is the need for a small Watermark: the larger the Watermark, the greater the chance of adversely affecting the quality of the streamed media. In our case, the Watermark depends on the corresponding signature which has to authenticate the media stream. Interference and transmission defects could cause problems in extracting the Watermark. Therefore, the signature scheme output has to be as small as possible [9], so that the Watermark can be embedded, and repeated, as often as possible throughout the stream.

The Nyberg-Rueppel ([10], hereafter NR) signature scheme focuses on the size of the input and output and is a DSA-like signature with message recovery.

The Nyberg-Rueppel Signature Scheme

NR is perfectly suited to messages shorter than ten bytes, but leaves the question of dealing with short messages of, say, fifteen bytes unanswered.
In our specific case, the hash to be signed is exactly 10 bytes long and carries only a marginal risk of collision (see Section for further details). Message recovery [1], another characteristic of NR signatures, means that the original message can be extracted from the signature. In our case this characteristic aligns with our goals: the hash value of the message m does not need to be recomputed by the monitoring agency, because it can be recovered from the signature. However, it is necessary to transfer a hash table (see Section ) once from the BC to the MA; this could happen at periodic intervals.

The complete NR algorithm is shown in Algorithm 3.1, using the standard discrete logarithm (DL) problem.

Algorithm 3.1 Nyberg-Rueppel signature generation and verification
Summary: the broadcaster signs a message m ∈ M. The monitoring agency can verify the broadcaster's signature and recover the message m from the signature.
1. Signature generation. The broadcaster does the following:
   (a) Compute m̃ = R(m).
   (b) Select a random secret integer k, 1 ≤ k ≤ q − 1, and compute r = α^k mod p.
   (c) Compute e = m̃ r mod p.
   (d) Compute s = a e + k mod q.
   (e) The broadcaster's signature for the specific m is the pair (e, s).
2. Verification. To verify the broadcaster's signature (e, s) on m, the monitoring agency does the following:
   (a) Obtain the broadcaster's authentic public key (p, q, α, y) and verify it with the corresponding Certificate delivered by the CA earlier (see Fig. 3.2).
   (b) Verify that 0 < e < p.

Short Hash Methods

Hash functions are often used in digital signature algorithms. The message m that is about to be hashed consists, in our case, of an identifier string ID-str concatenated with an ID number ID-num and a unique timestamp ID-time:

m = ID-str ‖ ID-num ‖ ID-time
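Algorithm 3.1 can be exercised end-to-end in a few lines. The following sketch uses a deliberately tiny DL group (p = 467, q = 233, α = 4) and the identity redundancy function R(m) = m, both of which are illustrative assumptions; the recovery step v = α^(−s) · y^e mod p, m̃ = v · e mod p is the standard completion of the NR verification that the excerpt truncates.

```python
import secrets

# Toy DL parameters: q divides p - 1 and alpha = 2^((p-1)/q) has order q mod p.
P, Q, ALPHA = 467, 233, 4

def keygen():
    a = secrets.randbelow(Q - 1) + 1   # secret key a, 1 <= a <= q - 1
    return a, pow(ALPHA, a, P)         # public key y = alpha^a mod p

def sign(m: int, a: int):
    m_tilde = m % P                    # step (a): toy redundancy R(m) = m
    k = secrets.randbelow(Q - 1) + 1   # step (b): fresh per-signature secret
    r = pow(ALPHA, k, P)
    e = (m_tilde * r) % P              # step (c)
    s = (a * e + k) % Q                # step (d)
    return e, s                        # step (e): the signature pair

def verify_and_recover(e: int, s: int, y: int):
    if not (0 < e < P and 0 <= s < Q):
        return None
    v = (pow(ALPHA, -s, P) * pow(y, e, P)) % P   # v = alpha^(-s) y^e = r^(-1)
    return (v * e) % P                           # recovers m_tilde, i.e. m

a, y = keygen()
e, s = sign(42, a)                     # 42 stands in for the 10-byte hash
recovered = verify_and_recover(e, s, y)
```

Note how verification itself yields m̃: the monitoring agency needs no separate copy of the hash, only the table transferred earlier against which the recovered value is matched.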


