Mathematical Game Theory and Applications

Vladimir Mazalov
Research Director of the Institute of Applied Mathematical Research, Karelia Research Center of Russian Academy of Sciences, Russia

This edition first published 2014 © 2014 John Wiley & Sons, Ltd Registered office John Wiley & Sons Ltd, The Atrium, Southern Gate, Chichester, West Sussex, PO19 8SQ, United Kingdom For details of our global editorial offices, for customer services and for information about how to apply for permission to reuse the copyright material in this book please see our website at www.wiley.com. The right of the author to be identified as the author of this work has been asserted in accordance with the Copyright, Designs and Patents Act 1988. All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form or by any means, electronic, mechanical, photocopying, recording or otherwise, except as permitted by the UK Copyright, Designs and Patents Act 1988, without the prior permission of the publisher. Wiley also publishes its books in a variety of electronic formats. Some content that appears in print may not be available in electronic books. Designations used by companies to distinguish their products are often claimed as trademarks. All brand names and product names used in this book are trade names, service marks, trademarks or registered trademarks of their respective owners. The publisher is not associated with any product or vendor mentioned in this book.
Limit of Liability/Disclaimer of Warranty: While the publisher and author have used their best efforts in preparing this book, they make no representations or warranties with respect to the accuracy or completeness of the contents of this book and specifically disclaim any implied warranties of merchantability or fitness for a particular purpose. It is sold on the understanding that the publisher is not engaged in rendering professional services and neither the publisher nor the author shall be liable for damages arising herefrom. If professional advice or other expert assistance is required, the services of a competent professional should be sought. Library of Congress Cataloging-in-Publication Data Mazalov, V. V. (Vladimir Viktorovich), author. Mathematical game theory and applications / Vladimir Mazalov. pages cm Includes bibliographical references and index. ISBN 978-1-118-89962-5 (hardback) 1. Game theory. I. Title. QA269.M415 2014 519.3–dc23 2014019649 A catalogue record for this book is available from the British Library. ISBN: 978-1-118-89962-5 Set in 10/12pt Times by Aptara Inc., New Delhi, India. 
Contents

Preface
Introduction

1 Strategic-Form Two-Player Games
  Introduction
  1.1 The Cournot Duopoly
  1.2 Continuous Improvement Procedure
  1.3 The Bertrand Duopoly
  1.4 The Hotelling Duopoly
  1.5 The Hotelling Duopoly in 2D Space
  1.6 The Stackelberg Duopoly
  1.7 Convex Games
  1.8 Some Examples of Bimatrix Games
  1.9 Randomization
  1.10 Games 2 × 2
  1.11 Games 2 × n and m × 2
  1.12 The Hotelling Duopoly in 2D Space with Non-Uniform Distribution of Buyers
  1.13 Location Problem in 2D Space
  Exercises

2 Zero-Sum Games
  Introduction
  2.1 Minimax and Maximin
  2.2 Randomization
  2.3 Games with Discontinuous Payoff Functions
  2.4 Convex-Concave and Linear-Convex Games
  2.5 Convex Games
  2.6 Arbitration Procedures
  2.7 Two-Point Discrete Arbitration Procedures
  2.8 Three-Point Discrete Arbitration Procedures with Interval Constraint
  2.9 General Discrete Arbitration Procedures
  Exercises

3 Non-Cooperative Strategic-Form n-Player Games
  Introduction
  3.1 Convex Games. The Cournot Oligopoly
  3.2 Polymatrix Games
  3.3 Potential Games
  3.4 Congestion Games
  3.5 Player-Specific Congestion Games
  3.6 Auctions
  3.7 Wars of Attrition
  3.8 Duels, Truels, and Other Shooting Accuracy Contests
  3.9 Prediction Games
  Exercises

4 Extensive-Form n-Player Games
  Introduction
  4.1 Equilibrium in Games with Complete Information
  4.2 Indifferent Equilibrium
  4.3 Games with Incomplete Information
  4.4 Total Memory Games
  Exercises

5 Parlor Games and Sport Games
  Introduction
  5.1 Poker. A Game-Theoretic Model
    5.1.1 Optimal Strategies
    5.1.2 Some Features of Optimal Behavior in Poker
  5.2 The Poker Model with Variable Bets
    5.2.1 The Poker Model with Two Bets
    5.2.2 The Poker Model with n Bets
    5.2.3 The Asymptotic Properties of Strategies in the Poker Model with Variable Bets
  5.3 Preference. A Game-Theoretic Model
    5.3.1 Strategies and Payoff Function
    5.3.2 Equilibrium in the Case of $\frac{B-A}{B+C} \le \frac{3A-B}{2(A+C)}$
    5.3.3 Equilibrium in the Case of $\frac{3A-B}{2(A+C)} < \frac{B-A}{B+C}$
    5.3.4 Some Features of Optimal Behavior in Preference
  5.4 The Preference Model with Cards Play
    5.4.1 The Preference Model with Simultaneous Moves
    5.4.2 The Preference Model with Sequential Moves
  5.5 Twenty-One. A Game-Theoretic Model
    5.5.1 Strategies and Payoff Functions
  5.6 Soccer. A Game-Theoretic Model of Resource Allocation
  Exercises

6 Negotiation Models
  Introduction
  6.1 Models of Resource Allocation
    6.1.1 Cake Cutting
    6.1.2 Principles of Fair Cake Cutting
    6.1.3 Cake Cutting with Subjective Estimates by Players
    6.1.4 Fair Equal Negotiations
    6.1.5 Strategy-Proofness
    6.1.6 Solution with the Absence of Envy
    6.1.7 Sequential Negotiations
  6.2 Negotiations of Time and Place of a Meeting
    6.2.1 Sequential Negotiations of Two Players
    6.2.2 Three Players
    6.2.3 Sequential Negotiations. The General Case
  6.3 Stochastic Design in the Cake Cutting Problem
    6.3.1 The Cake Cutting Problem with Three Players
    6.3.2 Negotiations of Three Players with Non-Uniform Distribution
    6.3.3 Negotiations of n Players
    6.3.4 Negotiations of n Players. Complete Consent
  6.4 Models of Tournaments
    6.4.1 A Game-Theoretic Model of Tournament Organization
    6.4.2 Tournament for Two Projects with the Gaussian Distribution
    6.4.3 The Correlation Effect
    6.4.4 The Model of a Tournament with Three Players and Non-Zero Sum
  6.5 Bargaining Models with Incomplete Information
    6.5.1 Transactions with Incomplete Information
    6.5.2 Honest Negotiations in Conclusion of Transactions
    6.5.3 Transactions with Unequal Forces of Players
    6.5.4 The “Offer-Counteroffer” Transaction Model
    6.5.5 The Correlation Effect
    6.5.6 Transactions with Non-Uniform Distribution of Reservation Prices
    6.5.7 Transactions with Non-Linear Strategies
    6.5.8 Transactions with Fixed Prices
    6.5.9 Equilibrium Among n-Threshold Strategies
    6.5.10 Two-Stage Transactions with Arbitrator
  6.6 Reputation in Negotiations
    6.6.1 The Notion of Consensus in Negotiations
    6.6.2 The Matrix Form of Dynamics in the Reputation Model
    6.6.3 Information Warfare
    6.6.4 The Influence of Reputation in Arbitration Committee. Conventional Arbitration
    6.6.5 The Influence of Reputation in Arbitration Committee. Final-Offer Arbitration
    6.6.6 The Influence of Reputation on Tournament Results
  Exercises

7 Optimal Stopping Games
  Introduction
  7.1 Optimal Stopping Game: The Case of Two Observations
  7.2 Optimal Stopping Game: The Case of Independent Observations
  7.3 The Game $\Gamma_N(G)$ Under N ≥ 3
  7.4 Optimal Stopping Game with Random Walks
    7.4.1 Spectra of Strategies: Some Properties
    7.4.2 Equilibrium Construction
  7.5 Best Choice Games
  7.6 Best Choice Game with Stopping Before Opponent
  7.7 Best Choice Game with Rank Criterion. Lottery
  7.8 Best Choice Game with Rank Criterion. Voting
    7.8.1 Solution in the Case of Three Players
    7.8.2 Solution in the Case of m Players
  7.9 Best Mutual Choice Game
    7.9.1 The Two-Shot Model of Mutual Choice
    7.9.2 The Multi-Shot Model of Mutual Choice
  Exercises

8 Cooperative Games
  Introduction
  8.1 Equivalence of Cooperative Games
  8.2 Imputations and Core
    8.2.1 The Core of the Jazz Band Game
    8.2.2 The Core of the Glove Market Game
    8.2.3 The Core of the Scheduling Game
  8.3 Balanced Games
    8.3.1 The Balance Condition for Three-Player Games
  8.4 The τ-Value of a Cooperative Game
    8.4.1 The τ-Value of the Jazz Band Game
  8.5 Nucleolus
    8.5.1 The Nucleolus of the Road Construction Game
  8.6 The Bankruptcy Game
  8.7 The Shapley Vector
    8.7.1 The Shapley Vector in the Road Construction Game
    8.7.2 Shapley’s Axioms for the Vector $\varphi_i(v)$
  8.8 Voting Games. The Shapley–Shubik Power Index and the Banzhaf Power Index
    8.8.1 The Shapley–Shubik Power Index for Influence Evaluation in the 14th Bundestag
    8.8.2 The Banzhaf Power Index for Influence Evaluation in the 3rd State Duma
    8.8.3 The Holler Power Index and the Deegan–Packel Power Index for Influence Evaluation in the National Diet (1998)
  8.9 The Mutual Influence of Players. The Hoede–Bakker Index
  Exercises

9 Network Games
  Introduction
  9.1 The KP-Model of Optimal Routing with Indivisible Traffic. The Price of Anarchy
  9.2 Pure Strategy Equilibrium. Braess’s Paradox
  9.3 Completely Mixed Equilibrium in the Optimal Routing Problem with Inhomogeneous Users and Homogeneous Channels
  9.4 Completely Mixed Equilibrium in the Optimal Routing Problem with Homogeneous Users and Inhomogeneous Channels
  9.5 Completely Mixed Equilibrium: The General Case
  9.6 The Price of Anarchy in the Model with Parallel Channels and Indivisible Traffic
  9.7 The Price of Anarchy in the Optimal Routing Model with Linear Social Costs and Indivisible Traffic for an Arbitrary Network
  9.8 The Mixed Price of Anarchy in the Optimal Routing Model with Linear Social Costs and Indivisible Traffic for an Arbitrary Network
  9.9 The Price of Anarchy in the Optimal Routing Model with Maximal Social Costs and Indivisible Traffic for an Arbitrary Network
  9.10 The Wardrop Optimal Routing Model with Divisible Traffic
  9.11 The Optimal Routing Model with Parallel Channels. The Pigou Model. Braess’s Paradox
  9.12 Potential in the Optimal Routing Model with Indivisible Traffic for an Arbitrary Network
  9.13 Social Costs in the Optimal Routing Model with Divisible Traffic for Convex Latency Functions
  9.14 The Price of Anarchy in the Optimal Routing Model with Divisible Traffic for Linear Latency Functions
  9.15 Potential in the Wardrop Model with Parallel Channels for Player-Specific Linear Latency Functions
  9.16 The Price of Anarchy in an Arbitrary Network for Player-Specific Linear Latency Functions
  Exercises

10 Dynamic Games
  Introduction
  10.1 Discrete-Time Dynamic Games
    10.1.1 Nash Equilibrium in the Dynamic Game
    10.1.2 Cooperative Equilibrium in the Dynamic Game
  10.2 Some Solution Methods for Optimal Control Problems with One Player
    10.2.1 The Hamilton–Jacobi–Bellman Equation
    10.2.2 Pontryagin’s Maximum Principle
  10.3 The Maximum Principle and the Bellman Equation in Discrete- and Continuous-Time Games of N Players
  10.4 The Linear-Quadratic Problem on Finite and Infinite Horizons
  10.5 Dynamic Games in Bioresource Management Problems. The Case of Finite Horizon
    10.5.1 Nash-Optimal Solution
    10.5.2 Stackelberg-Optimal Solution
  10.6 Dynamic Games in Bioresource Management Problems. The Case of Infinite Horizon
    10.6.1 Nash-Optimal Solution
    10.6.2 Stackelberg-Optimal Solution
  10.7 Time-Consistent Imputation Distribution Procedure
    10.7.1 Characteristic Function Construction and Imputation Distribution Procedure
    10.7.2 Fish Wars. Model without Information
    10.7.3 The Shapley Vector and Imputation Distribution Procedure
    10.7.4 The Model with Informed Players
  Exercises

References
Index

Preface

This book offers a combined course of lectures on game theory which the author has delivered for several years in Russian and foreign universities. In addition to classical branches of game theory, our analysis covers modern branches that most textbooks on the subject leave aside (negotiation models, potential games, parlor games, best choice games, and network games). The fundamentals of mathematical analysis, algebra, and probability theory are the necessary prerequisites for reading. The book can be useful for students specializing in applied mathematics and informatics, as well as economic cybernetics. Moreover, it should be of interest both to mathematicians working in the field of game theory and to experts in the fields of economics, management science, and operations research.

Each chapter concludes with a series of exercises intended for better understanding. Some exercises represent open problems for conducting independent investigations. As a matter of fact, stimulating the reader’s own research is the main priority of the book. A comprehensive bibliography will guide the audience in an appropriate scientific direction.
For many years, the author has enjoyed the opportunity to discuss derived results with Russian colleagues L.A. Petrosjan, V.V. Zakharov, N.V. Zenkevich, I.A. Seregin, and A.Yu. Garnaev (St. Petersburg State University), A.A. Vasin (Lomonosov Moscow State University), D.A. Novikov (Trapeznikov Institute of Control Sciences, Russian Academy of Sciences), A.V. Kryazhimskii and A.B. Zhizhchenko (Steklov Mathematical Institute, Russian Academy of Sciences), as well as with foreign colleagues M. Sakaguchi (Osaka University), M. Tamaki (Aichi University), K. Szajowski (Wroclaw University of Technology), B. Monien (University of Paderborn), K. Avratchenkov (INRIA, Sophia-Antipolis), and N. Perrin (University of Lausanne). They all have my deep and sincere appreciation. The author expresses profound gratitude to young colleagues A.N. Rettieva, J.S. Tokareva, Yu.V. Chirkova, A.A. Ivashko, A.V. Shiptsova, and A.Y. Kondratjev from the Institute of Applied Mathematical Research (Karelian Research Center, Russian Academy of Sciences) for their assistance in typing and formatting the book. I also sincerely thank A.Yu. Mazurov for his careful translation, constant feedback, and contribution to the English version of the book.

A series of scientific results included in the book were established within the framework of research supported by the Russian Foundation for Basic Research (projects no. 13-01-00033-a, 13-01-91158), the Russian Academy of Sciences (Branch of Mathematics), and the Strategic Development Program of Petrozavodsk State University.

Introduction

“Equilibrium arises from righteousness, and righteousness arises from the meaning of the cosmos.”
From Hermann Hesse’s The Glass Bead Game

Game theory is a branch of mathematics that analyzes models of optimal decision-making under conditions of conflict.
Game theory belongs to operations research, a science originally intended for planning and conducting military operations. However, the range of its applications is much wider. Game theory always deals with models involving several participants; this is its fundamental distinction from optimization theory. Here the very notion of an optimal solution is a matter of principle. There exist many definitions of the solution of a game. Generally, the solution of a game is called an equilibrium, but one can choose different concepts of an equilibrium (a Nash equilibrium, a Stackelberg equilibrium, a Wardrop equilibrium, to name a few).

In recent decades, a number of outstanding researchers in the field of game theory have been awarded the Nobel Prize in Economic Sciences. They are J.C. Harsanyi, J.F. Nash Jr., and R. Selten (1994) “for their pioneering analysis of equilibria in the theory of non-cooperative games,” F.E. Kydland and E.C. Prescott (2004) “for their contributions to dynamic macroeconomics: the time consistency of economic policy and the driving forces behind business cycles,” R.J. Aumann and T.C. Schelling (2005) “for having enhanced our understanding of conflict and cooperation through game-theory analysis,” and L. Hurwicz, E.S. Maskin, and R.B. Myerson (2007) “for having laid the foundations of mechanism design theory.” Throughout the book, we will repeatedly cite these names and the corresponding problems.

Depending on the structure of the payoffs, one can distinguish between zero-sum games (antagonistic games) and nonzero-sum games. Strategy sets may be finite or infinite (matrix games and games on compact sets, respectively). Next, players may act independently or form coalitions; the corresponding models represent non-cooperative games and cooperative games. There are games with complete or incomplete information.

Game theory admits numerous applications.
One would hardly find a field of the life or social sciences that does not use game-theoretic methods. In the first place, it is necessary to mention economic models: models of market relations and competition, pricing models, models of seller-buyer relations, negotiation, stable agreements, etc. The pioneering book by J. von Neumann and O. Morgenstern, the founders of game theory, was entitled Theory of Games and Economic Behavior. The behavior of market participants and the modeling of their psychological features form the subject of a new science known as experimental economics.

Game-theoretic methods have generated fundamental results in evolutionary biology. The notion of evolutionarily stable strategies introduced by the British biologist J. Maynard Smith made it possible to explain the evolution of several behavioral peculiarities of animals such as aggressiveness, migration, and struggle for survival. Game-theoretic methods are intensively used in rational nature management problems. For instance, the distribution of fishing quotas in the ocean, timber extraction by several participants, and agricultural pricing are problems of game theory. Today, it seems impossible to implement intergovernmental agreements on natural resources utilization and environmental pollution reduction (e.g., the Kyoto Protocol) without game-theoretic analysis.

In political sciences, game theory concerns voting models in parliaments, influence assessment models for certain political factions, as well as models of defense resources distribution for achieving stable peace. In jurisprudence, game theory is applied in arbitration for assessing the impact of the behavior of conflicting sides on judicial decisions.

We have recently observed a technological breakthrough in the analysis of the virtual information world.
In terms of game theory, all participants of the global computer network (the Internet) and mobile communication networks represent interacting players that receive and transmit information over appropriate data channels. Each player pursues individual interests (to acquire some information or to complicate this process). Players strive for channels with high capacities, and the problem of channel distribution among numerous players arises naturally. Game-theoretic methods are of assistance here. Another problem concerns the impact of user service centralization on system efficiency. The estimate of the centralization effect in a system where each participant follows individual interests (maximal channel capacity, minimal delay, the maximal amount of received information, etc.) is known as the price of anarchy. Finally, an important problem lies in defining the influence of information network topology on the efficiency of player service. These are non-trivial problems that give rise to certain paradoxes. We describe the corresponding phenomena in the book.

Which fields of knowledge manage without game-theoretic methods? Perhaps medical science and finance do so, although game-theoretic methods have also recently found some applications in these fields.

The approach to material presentation in this book differs from conventional ones. We intentionally avoid a detailed treatment of matrix games, since they are described in many publications. Our study begins with nonzero-sum games and the fundamental theorem on equilibrium existence in convex games. Later on, this result is extended to the class of zero-sum games. The discussion covers several classical models used in economics (the models of market competition suggested by Cournot, Bertrand, Hotelling, and Stackelberg, as well as auctions). Next, we pass from normal-form games to extensive-form games and parlor games.
The early chapters of the book consider two-player games, and further analysis embraces n-player games (first, non-cooperative games, and then cooperative ones). Subsequently, we provide fundamental results in new branches of game theory: best choice games, network games, and dynamic games. The book proposes new schemes of negotiations, and much attention is paid to arbitration procedures. Some results belong to the author and his colleagues.

The fundamentals of mathematical analysis, algebra, and probability theory are the necessary prerequisites for reading.

This book has an accompanying website. Please visit www.wiley.com/go/game_theory.

1 Strategic-form two-player games

Introduction

Our analysis of game problems begins with the case of two-player strategic-form (equivalently, normal-form) games. The basic notions of game theory are players, strategies, and payoffs. In the sequel, denote the players by I and II. A normal-form game is organized in the following way. Player I chooses a certain strategy $x$ from a set $X$, while player II simultaneously chooses some strategy $y$ from a set $Y$. In fact, the sets $X$ and $Y$ may possess any structure (a finite set of values, a subset of $R^n$, a set of measurable functions, etc.). As a result, players I and II obtain the payoffs $H_1(x, y)$ and $H_2(x, y)$, respectively.

Definition 1.1 A normal-form game is an object $\Gamma = \langle I, II, X, Y, H_1, H_2 \rangle$, where $X$, $Y$ designate the sets of strategies of players I and II, whereas $H_1$, $H_2$ indicate their payoff functions, $H_i : X \times Y \to R$, $i = 1, 2$.

Each player selects his strategy without knowing the opponent’s choice and strives to maximize his own payoff. However, a player’s payoff depends both on his own strategy and on the behavior of the opponent. This aspect is what makes game theory specific.

How should one understand the solution of a game? There exist several approaches to constructing solutions in game theory. Some of them will be discussed below.
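To make Definition 1.1 concrete, the following minimal sketch stores a game with finite strategy sets as a small Python object. The class name `NormalFormGame`, its method names, and the 2 × 2 payoffs are illustrative assumptions rather than anything taken from the book; the only point is that each player's payoff is a function of both strategies.

```python
from dataclasses import dataclass
from typing import Callable, Sequence, Tuple


@dataclass(frozen=True)
class NormalFormGame:
    """Finite stand-in for the object <I, II, X, Y, H1, H2> of Definition 1.1."""
    X: Sequence[int]                   # strategy set of player I
    Y: Sequence[int]                   # strategy set of player II
    H1: Callable[[int, int], float]    # payoff function of player I
    H2: Callable[[int, int], float]    # payoff function of player II

    def payoffs(self, x: int, y: int) -> Tuple[float, float]:
        """Payoffs of both players in the strategy profile (x, y)."""
        return self.H1(x, y), self.H2(x, y)

    def best_reply_I(self, y: int) -> int:
        """A strategy of player I that maximizes H1 against a fixed y."""
        return max(self.X, key=lambda x: self.H1(x, y))

    def best_reply_II(self, x: int) -> int:
        """A strategy of player II that maximizes H2 against a fixed x."""
        return max(self.Y, key=lambda y: self.H2(x, y))


# Hypothetical payoffs (not from the text): player I gains 1 when the
# choices coincide, player II gains 1 when they differ.
game = NormalFormGame(
    X=(0, 1),
    Y=(0, 1),
    H1=lambda x, y: 1.0 if x == y else 0.0,
    H2=lambda x, y: 1.0 if x != y else 0.0,
)

print(game.payoffs(0, 1))
print([game.best_reply_I(y) for y in game.Y])
print([game.best_reply_II(x) for x in game.X])
```

In this hypothetical game the preferred strategy of each player changes with the opponent's choice, which is exactly the dependence described above.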
First, let us consider the notion of a Nash equilibrium as a central concept in game theory.

Mathematical Game Theory and Applications, First Edition. Vladimir Mazalov. © 2014 John Wiley & Sons, Ltd. Published 2014 by John Wiley & Sons, Ltd. Companion website: http://www.wiley.com/go/game_theory

Definition 1.2 A Nash equilibrium in a game $\Gamma$ is a set of strategies $(x^*, y^*)$ meeting the conditions

$$H_1(x, y^*) \le H_1(x^*, y^*), \quad H_2(x^*, y) \le H_2(x^*, y^*) \tag{1.1}$$

for arbitrary strategies $x$, $y$ of the players.

Inequalities (1.1) imply that a player who unilaterally deviates from a Nash equilibrium cannot increase his payoff. Hence, deviations from the equilibrium are not beneficial to any player. Interestingly, there may exist no Nash equilibria. Therefore, a major issue in game problems concerns their existence. Suppose that a Nash equilibrium exists; in this case, we say that the payoffs $H_1^* = H_1(x^*, y^*)$, $H_2^* = H_2(x^*, y^*)$ are optimal. A set of strategies $(x, y)$ is often called a strategy profile.

1.1 The Cournot duopoly

We mention the Cournot duopoly [1838] among the pioneering game models that gained wide popularity in economic research. The term “duopoly” corresponds to a two-player game. Imagine two companies, I and II, manufacturing some quantities of the same product ($q_1$ and $q_2$, respectively). In this model, the quantities represent the strategies of the players. The market price of the product equals an initial price $p$ after deduction of the total quantity $Q = q_1 + q_2$. And so, the unit price constitutes $(p - Q)$. Let $c$ be the unit cost such that $c < p$. Consequently, the players’ payoffs take the form

$$H_1(q_1, q_2) = (p - q_1 - q_2)q_1 - cq_1, \quad H_2(q_1, q_2) = (p - q_1 - q_2)q_2 - cq_2. \tag{1.2}$$

In the current notation, the game is defined by $\Gamma = \langle I, II, Q_1 = [0, \infty), Q_2 = [0, \infty), H_1, H_2 \rangle$. Nash equilibrium evaluation (see formula (1.1)) calls for solving two problems, viz., $\max_{q_1} H_1(q_1, q_2^*)$ and $\max_{q_2} H_2(q_1^*, q_2)$.
Moreover, we have to demonstrate that the maxima are attained at $q_1 = q_1^*$, $q_2 = q_2^*$. The quadratic functions $H_1(q_1, q_2^*)$ and $H_2(q_1^*, q_2)$ are maximized by

$$q_1 = \frac{1}{2}(p - c - q_2^*), \quad q_2 = \frac{1}{2}(p - c - q_1^*).$$

Naturally, these quantities must be non-negative, which dictates that

$$q_i^* \le p - c, \quad i = 1, 2. \tag{1.3}$$

By solving the derived system of equations in $q_1^*$, $q_2^*$, we find

$$q_1^* = q_2^* = \frac{p - c}{3},$$

which satisfy the conditions (1.3). And the optimal payoffs become

$$H_1^* = H_2^* = \frac{(p - c)^2}{9}.$$

1.2 Continuous improvement procedure

Imagine that player I knows the strategy $q_2$ of player II. Then his best response lies in the strategy $q_1$ yielding the maximal payoff $H_1(q_1, q_2)$. Recall that $H_1(q_1, q_2)$ is a concave parabola in $q_1$ possessing its vertex at the point

$$q_1 = \frac{1}{2}(p - c - q_2). \tag{2.1}$$

We denote the best response function by $q_1 = R(q_2) = \frac{1}{2}(p - c - q_2)$. Similarly, if the strategy $q_1$ of player I becomes known to player II, his best response is the strategy $q_2$ corresponding to the maximal payoff $H_2(q_1, q_2)$. In other words,

$$q_2 = R(q_1) = \frac{1}{2}(p - c - q_1). \tag{2.2}$$

Draw the lines of the best responses (2.1)–(2.2) on the plane $(q_1, q_2)$ (see Figure 1.1). For any initial strategy $q_2^{(0)}$, construct the sequence of the best responses

$$q_2^{(0)} \to q_1^{(1)} = R(q_2^{(0)}) \to q_2^{(1)} = R(q_1^{(1)}) \to \cdots \to q_1^{(n)} = R(q_2^{(n-1)}) \to q_2^{(n)} = R(q_1^{(n)}) \to \cdots$$

The sequence $(q_1^{(n)}, q_2^{(n)})$ is said to be the best response sequence. Such an iterative procedure agrees with the behavior of sellers on a market (each of them modifies his strategy depending on the actions of the competitors). According to Figure 1.1, the best response sequence of the players tends to an equilibrium for any initial strategy $q_2^{(0)}$. However, we emphasize that, in general, a best response sequence does not necessarily converge to a Nash equilibrium.

[Figure 1.1 The Cournot duopoly. Best-response lines on the plane $(q_1, q_2)$, with the marks $(p - c)/2$ and $p - c$ on both axes, the initial strategy $q_2^{(0)}$, and the equilibrium point $(q_1^*, q_2^*)$.]
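The best response sequence of this section can be sketched in a few lines of Python. The numerical values $p = 10$, $c = 1$, the starting point, and the function names are illustrative assumptions (they do not come from the text); the truncation at zero only reflects the requirement that quantities be non-negative.

```python
def best_response(q_other, p, c):
    """R(q) = (p - c - q) / 2, truncated at zero (quantities are non-negative)."""
    return max(0.0, 0.5 * (p - c - q_other))


def payoff(q_own, q_other, p, c):
    """Cournot payoff (p - q_own - q_other) * q_own - c * q_own, as in (1.2)."""
    return (p - q_own - q_other) * q_own - c * q_own


def best_response_sequence(q2_start, p, c, steps=50):
    """Alternate the best responses (2.1) and (2.2), starting from q2_start."""
    q2 = q2_start
    path = []
    for _ in range(steps):
        q1 = best_response(q2, p, c)   # player I replies to q2
        q2 = best_response(q1, p, c)   # player II replies to the new q1
        path.append((q1, q2))
    return path


p, c = 10.0, 1.0                        # illustrative price and unit cost
path = best_response_sequence(q2_start=0.0, p=p, c=c)
q1, q2 = path[-1]

print("first steps:", path[:3])
print("limit of the sequence:", (q1, q2))
print("(p - c) / 3 =", (p - c) / 3)
print("payoff of player I at the limit:", payoff(q1, q2, p, c))
```

Under these assumptions, each pair of replies shrinks the distance to $(p - c)/3$ by a factor of four, so the printed limit can be compared directly with the closed-form equilibrium of Section 1.1.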
1.3 The Bertrand duopoly

Another two-player game that models market pricing is the Bertrand duopoly [1883]. Consider two companies, I and II, manufacturing products A and B, respectively. Here the players choose product prices as their strategies. Assume that company I declares the unit price $c_1$, while company II declares the unit price $c_2$. As the result of price quotation, one observes the demands for each product on the market, i.e., $Q_1(c_1, c_2) = q - c_1 + kc_2$ and $Q_2(c_1, c_2) = q - c_2 + kc_1$. The symbol $q$ means an initial demand, and the coefficient $k$ reflects the interchangeability of products A and B. By analogy to the Cournot model, the unit cost will be specified by $c$. Consequently, the players’ payoffs acquire the form

$$H_1(c_1, c_2) = (q - c_1 + kc_2)(c_1 - c), \quad H_2(c_1, c_2) = (q - c_2 + kc_1)(c_2 - c).$$

The game is completely defined by $\Gamma = \langle I, II, Q_1 = [0, \infty), Q_2 = [0, \infty), H_1, H_2 \rangle$.

Fix the strategy $c_1$ of player I. Then the best response of player II consists in the strategy $c_2$ guaranteeing the maximal payoff $\max_{c_2} H_2(c_1, c_2)$. Since $H_2(c_1, c_2)$ forms a concave parabola in $c_2$, its vertex is at the point

$$c_2 = \frac{1}{2}(q + kc_1 + c). \tag{3.1}$$

Similarly, if the strategy $c_2$ of player II is fixed, the best response of player I becomes the strategy $c_1$ ensuring the maximal payoff $\max_{c_1} H_1(c_1, c_2)$. We easily find

$$c_1 = \frac{1}{2}(q + kc_2 + c). \tag{3.2}$$

There exists a unique solution to the system of equations (3.1)–(3.2):

$$c_1^* = c_2^* = \frac{q + c}{2 - k}.$$

We seek positive solutions; therefore, $k < 2$. The resulting solution represents a Nash equilibrium. Indeed, the best response of player II to the strategy $c_1^*$ is the strategy $c_2^*$; and vice versa, the best response of player I to the strategy $c_2^*$ is the strategy $c_1^*$. The optimal payoffs of the players in the equilibrium are given by

$$H_1^* = H_2^* = \left[ \frac{q - c(1 - k)}{2 - k} \right]^2.$$

Draw the lines of the best responses (3.1)–(3.2) on the plane $(c_1, c_2)$ (see Figure 1.2).
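The closed-form equilibrium above can be checked numerically. The sketch below is only an illustration: the parameter values $q = 10$, $k = 0.5$, $c = 2$ and the function names are assumptions, not data from the text. It verifies that $c^*$ is a fixed point of the best response (3.1)–(3.2), that the payoff matches the formula for $H^*$, and that unilateral price changes on a grid do not pay.

```python
def demand(c_own, c_other, q, k):
    """Demand q - c_own + k * c_other for a company's own product."""
    return q - c_own + k * c_other


def payoff(c_own, c_other, q, k, c):
    """Bertrand payoff: demand times the margin (c_own - c)."""
    return demand(c_own, c_other, q, k) * (c_own - c)


def best_response(c_other, q, k, c):
    """Vertex of the concave parabola, formulas (3.1)-(3.2)."""
    return 0.5 * (q + k * c_other + c)


q, k, c = 10.0, 0.5, 2.0                       # illustrative parameters, k < 2
c_star = (q + c) / (2 - k)                     # equilibrium price
h_star = ((q - c * (1 - k)) / (2 - k)) ** 2    # equilibrium payoff

# Payoffs of player I after unilateral deviations c_star + d, d in [-2, 2].
deviations = [payoff(c_star + 0.1 * i, c_star, q, k, c) for i in range(-20, 21)]

print("equilibrium price:", c_star)
print("best response to it:", best_response(c_star, q, k, c))
print("payoff from the formula:", h_star)
print("payoff computed directly:", payoff(c_star, c_star, q, k, c))
print("best payoff among deviations:", max(deviations))
```

Because the game is symmetric, checking the deviations of one player is enough in this sketch.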
Denote by $R(c_1)$, $R(c_2)$ the right-hand sides of (3.1) and (3.2). For any initial strategy $c_2^{(0)}$, construct the best response sequence

$$c_2^{(0)} \to c_1^{(1)} = R(c_2^{(0)}) \to c_2^{(1)} = R(c_1^{(1)}) \to \cdots \to c_1^{(n)} = R(c_2^{(n-1)}) \to c_2^{(n)} = R(c_1^{(n)}) \to \cdots$$

[Figure 1.2 The Bertrand duopoly. Best-response lines on the plane $(c_1, c_2)$, with the initial strategy $c_2^{(0)}$ and the equilibrium point $(c_1^*, c_2^*)$, where $c_1^* = c_2^* = (q + c)/(2 - k)$.]

Figure 1.2 demonstrates the following. The best response sequence tends to the equilibrium $(c_1^*, c_2^*)$ for any initial strategy $c_2^{(0)}$.

1.4 The Hotelling duopoly

This two-player game introduced by Hotelling [1929] also belongs to pricing problems but takes account of the location of companies on a market. Consider a linear market (see Figure 1.3) representing the unit segment $[0, 1]$. There exist two companies, I and II, located at points $x_1$ and $x_2$. Each company quotes its price for the same product (the parameters $c_1$ and $c_2$, respectively). Subsequently, each customer situated at point $x$ compares his costs to visit each company, $L_i(x) = c_i + |x - x_i|$, $i = 1, 2$, and chooses the one corresponding to smaller costs. Within the framework of the Hotelling model, the costs $L_i(x)$ can be interpreted as the product price supplemented by transport costs. And all customers are decomposed into two sets, $[0, x)$ and $(x, 1]$. The former prefer company I, whereas the latter choose company II. The boundary $x$ of these sets follows from the equality $L_1(x) = L_2(x)$:

$$x = \frac{x_1 + x_2}{2} + \frac{c_2 - c_1}{2}.$$

In this case, we understand the payoffs as the incomes of the players, i.e.,

$$H_1(c_1, c_2) = c_1 x = c_1 \left[ \frac{x_1 + x_2}{2} + \frac{c_2 - c_1}{2} \right], \tag{4.1}$$

$$H_2(c_1, c_2) = c_2(1 - x) = c_2 \left[ 1 - \frac{x_1 + x_2}{2} - \frac{c_2 - c_1}{2} \right]. \tag{4.2}$$

[Figure 1.3 The Hotelling duopoly on a segment. The segment $[0, 1]$ with companies I and II at the points $x_1$ and $x_2$ and the boundary point $x$ between them.]

A Nash equilibrium $(c_1^*, c_2^*)$ satisfies the equations $\frac{\partial H_1(c_1, c_2^*)}{\partial c_1} = 0$, $\frac{\partial H_2(c_1^*, c_2)}{\partial c_2} = 0$. And so,

$$\frac{\partial H_1(c_1, c_2)}{\partial c_1} = \frac{c_2 - c_1}{2} + \frac{x_1 + x_2}{2} - \frac{c_1}{2} = 0,$$

$$\frac{\partial H_2(c_1, c_2)}{\partial c_2} = 1 - \frac{c_2 - c_1}{2} - \frac{x_1 + x_2}{2} - \frac{c_2}{2} = 0.$$
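Each of the two first-order conditions is linear in the player's own price, so it can be solved for that price and read as a best reply. The following minimal sketch iterates these replies numerically; the locations $x_1 = 0.2$, $x_2 = 0.7$, the starting prices, and all function names are illustrative assumptions, and the payoff formulas (4.1)–(4.2) are used under the assumption that the boundary point $x$ lies between $x_1$ and $x_2$.

```python
def boundary(c1, c2, x1, x2):
    """Indifferent customer: x = (x1 + x2) / 2 + (c2 - c1) / 2."""
    return 0.5 * (x1 + x2) + 0.5 * (c2 - c1)


def payoff_I(c1, c2, x1, x2):
    """Income of company I, formula (4.1)."""
    return c1 * boundary(c1, c2, x1, x2)


def payoff_II(c1, c2, x1, x2):
    """Income of company II, formula (4.2)."""
    return c2 * (1.0 - boundary(c1, c2, x1, x2))


def reply_I(c2, x1, x2):
    """Solve dH1/dc1 = 0 for c1."""
    return 0.5 * (c2 + x1 + x2)


def reply_II(c1, x1, x2):
    """Solve dH2/dc2 = 0 for c2."""
    return 0.5 * (2.0 + c1 - x1 - x2)


x1, x2 = 0.2, 0.7          # illustrative locations of the companies
c1, c2 = 0.0, 0.0          # arbitrary starting prices
for _ in range(60):
    c1 = reply_I(c2, x1, x2)
    c2 = reply_II(c1, x1, x2)

print("prices:", c1, c2)
print("sum of prices:", c1 + c2)
print("boundary point:", boundary(c1, c2, x1, x2))
print("payoffs:", payoff_I(c1, c2, x1, x2), payoff_II(c1, c2, x1, x2))
```

The printed prices can be compared with the closed-form expressions obtained from the same two equations.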
Summing up the above equations yields $c_1^* + c_2^* = 2$, which leads to the equilibrium prices

$$c_1^* = \frac{2 + x_1 + x_2}{3}, \quad c_2^* = \frac{4 - x_1 - x_2}{3}.$$

Substitute the equilibrium prices into (4.1)–(4.2) to get the equilibrium payoffs:

$$H_1(c_1^*, c_2^*) = \frac{[2 + x_1 + x_2]^2}{18}, \quad H_2(c_1^*, c_2^*) = \frac{[4 - x_1 - x_2]^2}{18}.$$

Just like in the previous case, here the payoff functions (4.1)–(4.2) are concave parabolas in the player’s own price. Hence, the strategy improvement procedure tends to the equilibrium.

1.5 The Hotelling duopoly in 2D space

The preceding section proceeded from the idea that a market forms a linear segment. In reality, a market is a set in 2D space. Let a city be a unit disk $S$ with a uniform distribution of customers (see Figure 1.4). For the sake of simplicity, suppose that companies I and II are located at the diametrically opposite points $(-1, 0)$ and $(1, 0)$. Each company announces a certain product price $c_i$, $i = 1, 2$. Without loss of generality, we assume that $c_1 < c_2$.

[Figure 1.4 The Hotelling duopoly in 2D space. The unit disk divided into the regions $S_1$ and $S_2$, with a customer at the point $(x, y)$, the marks $-1$, $0$, $1$ on the horizontal axis, and the mark $b^2$ on the vertical axis.]

A customer situated at point $(x, y) \in S$ compares the costs to visit the companies. Denote by $\rho_1(x, y) = \sqrt{(x + 1)^2 + y^2}$ and $\rho_2(x, y) = \sqrt{(x - 1)^2 + y^2}$ the distance to each company. Again, the total costs comprise a product price and transport costs: $L_i(x, y) = c_i + \rho_i(x, y)$, $i = 1, 2$. The set of all customers is divided into two subsets, $S_1$ and $S_2$, whose boundary meets the equation

$$c_1 + \sqrt{(x + 1)^2 + y^2} = c_2 + \sqrt{(x - 1)^2 + y^2}.$$

After simple manipulations, one obtains

$$\frac{x^2}{a^2} - \frac{y^2}{b^2} = 1,$$

where

$$a = (c_2 - c_1)/2, \quad b = \sqrt{1 - a^2}. \tag{5.1}$$

Therefore, the boundary of the sets $S_1$ and $S_2$ represents a hyperbola. The players’ payoffs take the form

$$H_1(c_1, c_2) = c_1 s_1, \quad H_2(c_1, c_2) = c_2 s_2,$$

with $s_i$ ($i = 1, 2$) meaning the areas of the corresponding sets. Since $s_1 + s_2 = \pi$, it suffices to evaluate $s_2$.
Using Figure 1.4, we have

$$s_2 = \frac{\pi}{2} - 2\left[a\int_0^{b^2}\sqrt{1 + \frac{y^2}{b^2}}\,dy + \int_{b^2}^{1}\sqrt{1 - y^2}\,dy\right] = \frac{\pi}{2} - 2\left[ab\int_0^{b}\sqrt{1 + y^2}\,dy + \int_{b^2}^{1}\sqrt{1 - y^2}\,dy\right]. \tag{5.2}$$

The Nash equilibrium $(c_1^*, c_2^*)$ of this game follows from the conditions

$$\frac{\partial H_1(c_1, c_2)}{\partial c_1} = \pi - s_2 - c_1\frac{\partial s_2}{\partial c_1} = 0, \tag{5.3}$$

$$\frac{\partial H_2(c_1, c_2)}{\partial c_2} = s_2 + c_2\frac{\partial s_2}{\partial c_2} = 0. \tag{5.4}$$

Revert to formula (5.2) to derive

$$\frac{\partial s_2}{\partial c_1} = \frac{b^2 - a^2}{b}\int_0^{b}\sqrt{1 + y^2}\,dy + a^2\sqrt{1 + b^2}. \tag{5.5}$$

By virtue of $\frac{\partial a}{\partial c_1} = -\frac{\partial a}{\partial c_2}$, we arrive at

$$\frac{\partial s_2}{\partial c_2} = -\frac{\partial s_2}{\partial c_1}. \tag{5.6}$$

The function $s_2(c_1, c_2)$ strictly increases with respect to the argument $c_1$. This is immediate from the following observation: if player I quotes a higher price, then any customer from $S_2$ (for whom visiting company I already costs more than visiting company II) still benefits by visiting company II.

To proceed, let us evaluate the equilibrium in this game. Owing to (5.6), the expressions (5.3)–(5.4) yield

$$s_2\left(1 + \frac{c_1}{c_2}\right) = \pi.$$

And so, if $c_1 < c_2$, then $s_2$ must exceed $\pi/2$. However, this contradicts the following argument. If the price declared by company I is smaller than the one offered by the opponent, then the set of customers preferring company I ($S_1$) is larger than $S_2$, i.e., $s_2 < \pi/2$. Therefore, a solution to the system (5.3)–(5.4) (if any) exists only under $c_1 = c_2$. Generally speaking, this conclusion also follows from the symmetry of the problem.

Thus, we look for the solution in the class of identical prices: $c_1 = c_2$. Then $s_1 = s_2 = \pi/2$, and substituting $a = 0$, $b = 1$ into (5.5) gives

$$\frac{\partial s_2}{\partial c_1} = \int_0^1\sqrt{1 + y^2}\,dy = \frac{1}{2}\left[\sqrt{2} + \ln(1 + \sqrt{2})\right].$$

Formulas (5.3)–(5.4) lead to the equilibrium prices

$$c_1^* = c_2^* = \frac{\pi}{\sqrt{2} + \ln(1 + \sqrt{2})} \approx 1.3685.$$

1.6 The Stackelberg duopoly

Up to here, we have studied two-player games in which the opponents have equal rights (they choose their decisions simultaneously). The Stackelberg duopoly [1934] deals with a certain hierarchy of players.
Notably, player I chooses his decision first, and then player II does. Player I is called the leader, and player II is called the follower.

Definition 1.3 A Stackelberg equilibrium in a game $\Gamma$ is a pair of strategies $(x^*, y^*)$ such that $y^* = R(x^*)$ is the best response of player II to the strategy $x^*$, where $x^*$ solves the problem

$$H_1(x^*, y^*) = \max_x H_1(x, R(x)).$$

Therefore, in a Stackelberg equilibrium, the leader knows that the follower chooses the best response to any strategy, and finds the strategy $x^*$ maximizing his own payoff.

Now, analyze the Stackelberg model within the Cournot duopoly. There are two companies, I and II, manufacturing the same product. At step 1, company I announces its product output $q_1$. Subsequently, company II chooses its strategy $q_2$. Recall the outcomes of Section 1.1: the best response of player II to the strategy $q_1$ is the strategy $q_2 = R(q_1) = (p - c - q_1)/2$. Knowing this, player I maximizes his payoff

$$H_1(q_1, R(q_1)) = q_1(p - c - q_1 - R(q_1)) = q_1(p - c - q_1)/2.$$

Clearly, the optimal strategy of this player is

$$q_1^* = \frac{p - c}{2}.$$

Accordingly, the optimal strategy of player II is

$$q_2^* = \frac{p - c}{4}.$$

The equilibrium payoffs of the players equal

$$H_1^* = \frac{(p - c)^2}{8}, \qquad H_2^* = \frac{(p - c)^2}{16}.$$

Obviously, the leader gains twice as much as the follower does.

1.7 Convex games

Nash equilibria exist in all games discussed above. Generally speaking, the class of games admitting no equilibria is much wider. The current section focuses on this issue. For the time being, note that the existence of Nash equilibria in the duopolies relates to the form of the payoff functions (all economic examples considered employ continuous concave functions).

Definition 1.4 A function $H(x)$ is called concave (convex) on a set $X \subseteq R^n$, if for any $x, y \in X$ and $\alpha \in [0, 1]$ the inequality

$$H(\alpha x + (1 - \alpha)y) \ge (\le)\ \alpha H(x) + (1 - \alpha)H(y)$$

holds true.

This definition directly implies the following result.
Concave functions also meet the inequality

$$H\left(\sum_{i=1}^{p}\alpha_i x_i\right) \ge \sum_{i=1}^{p}\alpha_i H(x_i)$$

for any convex combination of the points $x_i \in X$, $i = 1, \ldots, p$, where $\alpha_i \ge 0$, $i = 1, \ldots, p$, and $\sum_{i=1}^{p}\alpha_i = 1$.

The Nash theorem [1951] is the central statement regarding equilibrium existence in such games. Prior to introducing this theorem, we prove an auxiliary result that can be viewed as an alternative definition of a Nash equilibrium.

Lemma 1.1 A Nash equilibrium exists in a game $\Gamma = \langle I, II, X, Y, H_1, H_2 \rangle$ iff there is a pair of strategies $(x^*, y^*)$ such that

$$\max_{x, y}\left\{H_1(x, y^*) + H_2(x^*, y)\right\} = H_1(x^*, y^*) + H_2(x^*, y^*). \tag{7.1}$$

Proof: The necessity part. Suppose that a Nash equilibrium $(x^*, y^*)$ exists. According to Definition 1.2, for arbitrary $(x, y)$ we have

$$H_1(x, y^*) \le H_1(x^*, y^*), \qquad H_2(x^*, y) \le H_2(x^*, y^*).$$

Summing these inequalities up yields

$$H_1(x, y^*) + H_2(x^*, y) \le H_1(x^*, y^*) + H_2(x^*, y^*) \tag{7.2}$$

for arbitrary strategies $x$, $y$ of the players. The expression (7.1) follows immediately, since (7.2) holds with equality at $x = x^*$, $y = y^*$.

The sufficiency part. Assume that there exists a pair $(x^*, y^*)$ satisfying (7.1) and, hence, (7.2). By choosing $x = x^*$ and, subsequently, $y = y^*$ in inequality (7.2), we arrive at the conditions (1.1) that define a Nash equilibrium. The proof of Lemma 1.1 is finished.

Lemma 1.1 allows one to use the conditions (7.1) or (7.2) instead of verifying the equilibrium conditions (1.1).

Theorem 1.1 Consider a two-player game $\Gamma = \langle I, II, X, Y, H_1, H_2 \rangle$. Let the sets of strategies $X$, $Y$ be compact convex sets in the space $R^n$, and the payoffs $H_1(x, y)$, $H_2(x, y)$ be continuous functions that are concave in $x$ and $y$, respectively. Then the game possesses a Nash equilibrium.

Proof: We argue by contradiction. Suppose that no Nash equilibria exist. In this case, the above lemma implies that, for any pair of strategies $(x, y)$, there is a pair $(x', y')$ violating the condition (7.2), i.e.,

$$H_1(x', y) + H_2(x, y') > H_1(x, y) + H_2(x, y).$$
Take the sets

$$S_{(x', y')} = \left\{(x, y) : H_1(x', y) + H_2(x, y') > H_1(x, y) + H_2(x, y)\right\},$$

which are open due to the continuity of the functions $H_1(x, y)$ and $H_2(x, y)$. The whole space of strategies $X \times Y$ is covered by the sets $S_{(x', y')}$, i.e., $\bigcup_{(x', y') \in X \times Y} S_{(x', y')} = X \times Y$. Owing to the compactness of $X \times Y$, one can extract a finite subcovering

$$\bigcup_{i = 1, \ldots, p} S_{(x_i, y_i)} = X \times Y.$$

For each $i = 1, \ldots, p$, denote

$$\varphi_i(x, y) = \left[H_1(x_i, y) + H_2(x, y_i) - (H_1(x, y) + H_2(x, y))\right]^+, \tag{7.3}$$

where $a^+ = \max\{a, 0\}$. All functions $\varphi_i(x, y)$ are non-negative; moreover, for every $(x, y)$ at least one function $\varphi_i(x, y)$, $i = 1, \ldots, p$, is positive according to the definition of $S_{(x_i, y_i)}$. Hence,

$$\sum_{i=1}^{p}\varphi_i(x, y) > 0, \quad \forall (x, y).$$

Now, we define the mapping $\varphi(x, y) : X \times Y \to X \times Y$ by

$$\varphi(x, y) = \left(\sum_{i=1}^{p}\alpha_i(x, y)x_i,\ \sum_{i=1}^{p}\alpha_i(x, y)y_i\right),$$

where

$$\alpha_i(x, y) = \frac{\varphi_i(x, y)}{\sum_{k=1}^{p}\varphi_k(x, y)}, \quad i = 1, \ldots, p, \qquad \sum_{i=1}^{p}\alpha_i(x, y) = 1.$$

The functions $H_1(x, y)$, $H_2(x, y)$ are continuous, whence it follows that the mapping $\varphi(x, y)$ is continuous. By the premise, $X$ and $Y$ are convex sets; consequently, the convex combinations satisfy $\sum_{i=1}^{p}\alpha_i x_i \in X$, $\sum_{i=1}^{p}\alpha_i y_i \in Y$. Thus, $\varphi(x, y)$ is a self-mapping of the convex compact set $X \times Y$. The Brouwer fixed point theorem states that this mapping has a fixed point $(\bar{x}, \bar{y})$ such that $\varphi(\bar{x}, \bar{y}) = (\bar{x}, \bar{y})$, or

$$\bar{x} = \sum_{i=1}^{p}\alpha_i(\bar{x}, \bar{y})x_i, \qquad \bar{y} = \sum_{i=1}^{p}\alpha_i(\bar{x}, \bar{y})y_i.$$

Recall that the functions $H_1(x, y)$ and $H_2(x, y)$ are concave in $x$ and $y$, respectively. And so, we arrive at

$$H_1(\bar{x}, \bar{y}) + H_2(\bar{x}, \bar{y}) = H_1\left(\sum_{i=1}^{p}\alpha_i x_i, \bar{y}\right) + H_2\left(\bar{x}, \sum_{i=1}^{p}\alpha_i y_i\right) \ge \sum_{i=1}^{p}\alpha_i H_1(x_i, \bar{y}) + \sum_{i=1}^{p}\alpha_i H_2(\bar{x}, y_i). \tag{7.4}$$

On the other hand, by definition $\alpha_i(x, y)$ is positive exactly when $\varphi_i(x, y)$ is positive. For a positive value $\varphi_i(\bar{x}, \bar{y})$ (there exists at least one such index $i$), one obtains (see (7.3))

$$H_1(x_i, \bar{y}) + H_2(\bar{x}, y_i) > H_1(\bar{x}, \bar{y}) + H_2(\bar{x}, \bar{y}). \tag{7.5}$$

Indexes $j$ corresponding to $\alpha_j(\bar{x}, \bar{y}) = 0$ satisfy the equality (both sides vanish)

$$\alpha_j(\bar{x}, \bar{y})\left(H_1(x_j, \bar{y}) + H_2(\bar{x}, y_j)\right) = \alpha_j(\bar{x}, \bar{y})\left(H_1(\bar{x}, \bar{y}) + H_2(\bar{x}, \bar{y})\right). \tag{7.6}$$

Multiply the expression (7.5) by $\alpha_i(\bar{x}, \bar{y})$ and sum up with (7.6) over all indexes $i, j = 1, \ldots, p$. Since $\sum_{i=1}^{p}\alpha_i = 1$, these manipulations yield the inequality

$$\sum_{i=1}^{p}\alpha_i H_1(x_i, \bar{y}) + \sum_{i=1}^{p}\alpha_i H_2(\bar{x}, y_i) > H_1(\bar{x}, \bar{y}) + H_2(\bar{x}, \bar{y}),$$

which contradicts (7.4). The existence of a Nash equilibrium in convex games follows immediately. This concludes the proof of Theorem 1.1.

1.8 Some examples of bimatrix games

Consider a two-player game $\Gamma = \langle I, II, M, N, A, B \rangle$, where the players have finite sets of strategies, $M = \{1, 2, \ldots, m\}$ and $N = \{1, 2, \ldots, n\}$, respectively. Their payoffs are defined by matrices $A$ and $B$. In this game, player I chooses row $i$, whereas player II chooses column $j$; their payoffs are then $a(i, j)$ and $b(i, j)$, respectively. Such games will be called bimatrix games. The following examples show that Nash equilibria may or may not exist in such games.

Prisoners' dilemma. Two prisoners are arrested on suspicion of a crime. Each of them chooses between two actions, viz., admitting the crime (strategy "Yes") and remaining silent (strategy "No"). The payoff matrices take the form

$$A = \begin{array}{c|cc} & \text{Yes} & \text{No} \\ \hline \text{Yes} & -6 & 0 \\ \text{No} & -10 & -1 \end{array} \qquad B = \begin{array}{c|cc} & \text{Yes} & \text{No} \\ \hline \text{Yes} & -6 & -10 \\ \text{No} & 0 & -1 \end{array}$$

Therefore, if both prisoners admit the crime, each is sentenced to 6 years. When both remain silent, each receives a small sentence of 1 year. However, admitting the crime seems very beneficial: if one prisoner admits the crime and the other does not, the former is set free and the latter receives a major sentence of 10 years. Clearly, a Nash equilibrium is the strategy profile (Yes, Yes), where the players' payoffs are $(-6, -6)$. Indeed, by unilaterally deviating from this strategy, a player gains $-10$.
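The equilibrium check for the prisoners' dilemma can be sketched as a brute-force search over all pure strategy profiles. The snippet below is only an illustration under stated assumptions: the function name `pure_nash_equilibria`, the 0-based indexing, and the `LABELS` list are hypothetical and do not come from the text; the matrices are the ones given above.

```python
from itertools import product


def pure_nash_equilibria(A, B):
    """Return all pure profiles (i, j) where neither player gains by deviating.

    A[i][j] is the payoff of player I (who picks row i) and
    B[i][j] is the payoff of player II (who picks column j).
    """
    m, n = len(A), len(A[0])
    equilibria = []
    for i, j in product(range(m), range(n)):
        # Row i must be a best response of player I to column j.
        row_is_best = all(A[k][j] <= A[i][j] for k in range(m))
        # Column j must be a best response of player II to row i.
        col_is_best = all(B[i][l] <= B[i][j] for l in range(n))
        if row_is_best and col_is_best:
            equilibria.append((i, j))
    return equilibria


# Prisoners' dilemma: index 0 stands for "Yes", index 1 for "No" (assumed encoding).
A = [[-6, 0], [-10, -1]]
B = [[-6, -10], [0, -1]]
LABELS = ["Yes", "No"]

pd_equilibria = [(LABELS[i], LABELS[j]) for i, j in pure_nash_equilibria(A, B)]
print(pd_equilibria)
```

The same helper applies to any bimatrix game given as two nested lists of equal shape; it inspects only pure strategies, so it says nothing about equilibria in randomized strategies.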
Prisoners' dilemma has become popular in game theory due to the following feature: it models a Nash equilibrium that leads to guaranteed payoffs which are nevertheless appreciably worse than the payoffs under coordinated actions of the players.

Battle of sexes. This game involves two players, a "husband" and a "wife." They decide how to spend a weekend. Each spouse chooses between two strategies, "boxing" and "theater." Depending on their choice, the payoffs are defined by the matrices

$$A = \begin{array}{c|cc} & \text{Boxing} & \text{Theater} \\ \hline \text{Boxing} & 4 & 0 \\ \text{Theater} & 0 & 1 \end{array} \qquad B = \begin{array}{c|cc} & \text{Boxing} & \text{Theater} \\ \hline \text{Boxing} & 1 & 0 \\ \text{Theater} & 0 & 4 \end{array}$$

In the previous game, we have obtained a single Nash equilibrium. By contrast, the battle of sexes admits two pure strategy equilibria (in fact, there exist three Nash equilibria; see the discussion below). The list of Nash equilibria includes the strategy profiles (Boxing, Boxing) and (Theater, Theater), but the spouses have different payoffs: one gains 1, whereas the other gains 4.

The Hawk-Dove game. This game is often used to model the behavior of animals; it proceeds from the following assumption. While competing for some resource $V$ (e.g., a territory), each individual chooses between two strategies, namely, an aggressive strategy (Hawk) or a passive strategy (Dove). In their rivalry, Hawk always captures the whole resource from Dove. When two Doves meet, they share the resource equally. Finally, two individuals with the aggressive strategy fight for the resource. In this case, each individual obtains the resource with the same probability of $1/2$, but both Hawks suffer losses of $c$. The corresponding payoff matrices are

$$A = \begin{array}{c|cc} & \text{Hawk} & \text{Dove} \\ \hline \text{Hawk} & \frac{1}{2}V - c & V \\ \text{Dove} & 0 & V/2 \end{array} \qquad B = \begin{array}{c|cc} & \text{Hawk} & \text{Dove} \\ \hline \text{Hawk} & \frac{1}{2}V - c & 0 \\ \text{Dove} & V & V/2 \end{array}$$

Depending on the relationship between the available quantity of the resource and the losses, one obtains a game of one of the above types.
If the losses $c$ are smaller than $V/2$, the prisoners' dilemma arises (a single equilibrium, where the optimal strategy is Hawk for both players). At the same time, the condition $c \ge V/2$ corresponds to the battle of sexes (two equilibria, (Hawk, Dove) and (Dove, Hawk)).

The Stone-Scissors-Paper game. In this game, two players play for 1 USD by simultaneously announcing one of the following words: "Stone," "Scissors," and "Paper." The payoff is defined according to the rule: Stone breaks Scissors, Scissors cut Paper, and Paper wraps up Stone. And so, the players' payoffs are expressed by the matrices

$$A = \begin{array}{c|ccc} & \text{Stone} & \text{Scissors} & \text{Paper} \\ \hline \text{Stone} & 0 & 1 & -1 \\ \text{Scissors} & -1 & 0 & 1 \\ \text{Paper} & 1 & -1 & 0 \end{array} \qquad B = \begin{array}{c|ccc} & \text{Stone} & \text{Scissors} & \text{Paper} \\ \hline \text{Stone} & 0 & -1 & 1 \\ \text{Scissors} & 1 & 0 & -1 \\ \text{Paper} & -1 & 1 & 0 \end{array}$$

Unfortunately, the Stone-Scissors-Paper game admits no Nash equilibria among the considered strategies. It is impossible to suggest a strategy profile such that no player would benefit by unilateral deviation from his strategy.

1.9 Randomization

In the preceding examples, we have observed that there may exist no equilibria in finite games. The way out is randomization. For instance, recall the Stone-Scissors-Paper game; obviously, one should announce a strategy randomly, so that the opponent cannot guess it.

Let us extend the class of strategies and seek a Nash equilibrium among probability distributions defined on the sets $M = \{1, 2, \ldots, m\}$ and $N = \{1, 2, \ldots, n\}$.

Definition 1.5 A mixed strategy of player I is a vector $x = (x_1, x_2, \ldots, x_m)$, where $x_i \ge 0$, $i = 1, \ldots, m$, and $\sum_{i=1}^{m} x_i = 1$. Similarly, a mixed strategy of player II is a vector $y = (y_1, y_2, \ldots, y_n)$, where $y_j \ge 0$, $j = 1, \ldots, n$, and $\sum_{j=1}^{n} y_j = 1$.

Therefore, $x_i$ ($y_j$) represents the probability that player I (II) chooses strategy $i$ ($j$, respectively). In contrast to these new strategies, we call $i \in M$, $j \in N$ pure strategies.
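Definition 1.5 can be illustrated with a minimal sketch that checks whether a vector is a mixed strategy and draws pure strategies from it. The helper names `is_mixed_strategy` and `sample_pure_strategy`, the tolerance `tol`, the 0-based indexing, and the fixed seed are assumptions made for this illustration only.

```python
import random


def is_mixed_strategy(x, tol=1e-12):
    """Definition 1.5: all components are non-negative and they sum to one."""
    return all(xi >= 0 for xi in x) and abs(sum(x) - 1.0) <= tol


def sample_pure_strategy(x, rng):
    """Draw a pure strategy index (0-based) with probabilities given by x."""
    return rng.choices(range(len(x)), weights=x, k=1)[0]


# Announcing Stone, Scissors, or Paper with equal probabilities.
uniform = [1 / 3, 1 / 3, 1 / 3]

rng = random.Random(0)  # fixed seed so that the run is reproducible
draws = [sample_pure_strategy(uniform, rng) for _ in range(3000)]
frequencies = [draws.count(i) / len(draws) for i in range(len(uniform))]
print(is_mixed_strategy(uniform), frequencies)
```

The empirical frequencies only approximate the probabilities $x_i$; the sketch illustrates the interpretation of a mixed strategy as a random choice among pure strategies, not an equilibrium computation.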
Note that pure strategy $i$ corresponds to the mixed strategy $x = (0, \ldots, 0, 1, 0, \ldots, 0)$, where 1 occupies position $i$ (in the sequel, we simply write $x = i$ for compactness). Denote by $X$ ($Y$) the set of mixed strategies of player I (player II, respectively). Those pure strategies adopted with a positive probability in a mixed strategy form the support or spectrum of the mixed strategy.

Now, any strategy profile $(i, j)$ is realized with the probability $x_i y_j$. Hence, the expected payoffs of the players become

$$H_1(x, y) = \sum_{i=1}^{m}\sum_{j=1}^{n} a(i, j)x_i y_j, \qquad H_2(x, y) = \sum_{i=1}^{m}\sum_{j=1}^{n} b(i, j)x_i y_j. \tag{9.1}$$

Thus, the extension of the original discrete game acquires the form $\Gamma = \langle I, II, X, Y, H_1, H_2 \rangle$, where the players' strategies are the probability distributions $x$ and $y$, and the payoff functions have the bilinear representation (9.1). The sets of strategies are the simplexes

$$X = \left\{x : \sum_{i=1}^{m} x_i = 1,\ x_i \ge 0,\ i = 1, \ldots, m\right\} \quad \text{and} \quad Y = \left\{y : \sum_{j=1}^{n} y_j = 1,\ y_j \ge 0,\ j = 1, \ldots, n\right\}$$

in the spaces $R^m$ and $R^n$, respectively. The sets $X$ and $Y$ form convex polyhedra in $R^m$ and $R^n$, and the payoff functions $H_1(x, y)$, $H_2(x, y)$ are linear in each variable. And so, the resulting game $\Gamma = \langle I, II, X, Y, H_1, H_2 \rangle$ belongs to the class of convex games, and Theorem 1.1 is applicable.

Theorem 1.2 Bimatrix games admit a Nash equilibrium in the class of mixed strategies.

The Nash theorem establishes the existence of a Nash equilibrium, but offers no algorithm to evaluate it. In a series of cases, one can benefit from the following assertion.

Theorem 1.3 A strategy profile $(x^*, y^*)$ represents a Nash equilibrium iff for any pure strategies $i \in M$ and $j \in N$:

$$H_1(i, y^*) \le H_1(x^*, y^*), \qquad H_2(x^*, j) \le H_2(x^*, y^*). \tag{9.2}$$

Proof: The necessity part is immediate from the definition of a Nash equilibrium. Indeed, the conditions (1.1) hold true for arbitrary strategies $x$ and $y$ (including pure strategies). The sufficiency of the conditions (9.2) can be shown as follows.
Multiply the first inequality $H_1(i, y^*) \le H_1(x^*, y^*)$ by $x_i$ and perform summation over all $i = 1, \ldots, m$. These operations yield the condition

$$H_1(x, y^*) \le H_1(x^*, y^*)$$

for an arbitrary strategy $x$. Analogous reasoning applies to the second inequality in (9.2). The proof of Theorem 1.3 is completed.

Theorem 1.4 (on complementary slackness) Let $(x^*, y^*)$ be a Nash equilibrium strategy profile in a bimatrix game. If $x_i^* > 0$ for some $i$, then the equality

$$H_1(i, y^*) = H_1(x^*, y^*)$$

takes place. Similarly, if $y_j^* > 0$ for some $j$, we have

$$H_2(x^*, j) = H_2(x^*, y^*).$$

Proof: We argue by contradiction. Suppose that for a certain index $i'$ such that $x_{i'}^* > 0$ one obtains $H_1(i', y^*) < H_1(x^*, y^*)$. Theorem 1.3 implies that the inequality $H_1(i, y^*) \le H_1(x^*, y^*)$ is valid for the remaining indexes $i \ne i'$. Therefore, we arrive at the system of inequalities

$$H_1(i, y^*) \le H_1(x^*, y^*), \quad i = 1, \ldots, m, \tag{9.2′}$$

where inequality $i'$ is strict. Multiply (9.2′) by $x_i^*$ and perform summation to get the contradiction

$$H_1(x^*, y^*) < H_1(x^*, y^*).$$

By analogy, one easily proves the second part of the theorem.

Theorem 1.4 claims that a Nash equilibrium involves only those pure strategies leading to the optimal payoff of a player. Such strategies are called equalizing.

Theorem 1.5 A strategy profile $(x^*, y^*)$ represents a mixed strategy Nash equilibrium profile iff there exist pure strategy subsets $M_0 \subseteq M$, $N_0 \subseteq N$ and values $H_1$, $H_2$ such that

$$\sum_{j \in N_0} H_1(i, j)y_j^* \begin{cases} = H_1, & i \in M_0, \\ \le H_1, & i \notin M_0, \end{cases} \tag{9.3}$$

$$\sum_{i \in M_0} H_2(i, j)x_i^* \begin{cases} = H_2, & j \in N_0, \\ \le H_2, & j \notin N_0, \end{cases} \tag{9.4}$$

and

$$\sum_{i \in M_0} x_i^* = 1, \qquad \sum_{j \in N_0} y_j^* = 1. \tag{9.5}$$

Proof (the necessity part). Assume that $(x^*, y^*)$ is an equilibrium in a bimatrix game. Set $H_1 = H_1(x^*, y^*)$, $H_2 = H_2(x^*, y^*)$ and $M_0 = \{i \in M : x_i^* > 0\}$, $N_0 = \{j \in N : y_j^* > 0\}$. Then the conditions (9.3)–(9.5) directly follow from Theorems 1.3 and 1.4.

The sufficiency part. Suppose that the conditions (9.3)–(9.5) hold true for a certain strategy profile $(x^*, y^*)$.
Formula (9.5) implies that (a) $x_i^* = 0$ for $i \notin M_0$ and (b) $y_j^* = 0$ for $j \notin N_0$. Multiply (9.3) by $x_i^*$ and (9.4) by $y_j^*$, and perform summation over all $i \in M$ and $j \in N$, respectively. Such operations bring us to the equalities

$$H_1(x^*, y^*) = H_1, \qquad H_2(x^*, y^*) = H_2.$$

This result and Theorem 1.3 show that $(x^*, y^*)$ is an equilibrium. The proof of Theorem 1.5 is concluded.

Theorem 1.5 can be used to evaluate Nash equilibria in bimatrix games. Imagine that we know the optimal strategy spectra $M_0$, $N_0$. It is possible to employ the equalities from the conditions (9.3)–(9.5) and find the optimal mixed strategies $x^*$, $y^*$ and the optimal payoffs $H_1^*$, $H_2^*$ from the system of linear equations. However, this system can generate negative solutions (which contradicts the concept of mixed strategies). Then one should modify the spectra and go over them until an equilibrium appears. Theorem 1.5 is especially efficient if all $x_i$, $i \in M$, and $y_j$, $j \in N$, have positive values in an equilibrium.

Definition 1.6 An equilibrium strategy profile $(x^*, y^*)$ is called completely mixed, if $x_i^* > 0$, $i \in M$, and $y_j^* > 0$, $j \in N$.

Suppose that a bimatrix game admits a completely mixed equilibrium strategy profile $(x^*, y^*)$. According to Theorem 1.5, it satisfies the system of linear equations

$$\sum_{j \in N} H_1(i, j)y_j^* = H_1,\ i \in M, \qquad \sum_{i \in M} H_2(i, j)x_i^* = H_2,\ j \in N, \qquad \sum_{i \in M} x_i^* = 1, \qquad \sum_{j \in N} y_j^* = 1. \tag{9.6}$$

The system (9.6) comprises $n + m + 2$ equations with $n + m + 2$ unknowns. Its solution gives a Nash equilibrium in the bimatrix game and the values of the optimal payoffs.

1.10 Games 2 × 2

A series of bimatrix games can be treated via geometric considerations. The simplest case covers players choosing between two strategies. The mixed strategies of players I and II take the form $(x, 1 - x)$ and $(y, 1 - y)$, respectively. Their payoffs are defined by the matrices

$$A = \begin{array}{c|cc} & y & 1 - y \\ \hline x & a_{11} & a_{12} \\ 1 - x & a_{21} & a_{22} \end{array} \qquad B = \begin{array}{c|cc} & y & 1 - y \\ \hline x & b_{11} & b_{12} \\ 1 - x & b_{21} & b_{22} \end{array}$$
The mixed strategy payoffs of the players become

$$H_1(x, y) = a_{11}xy + a_{12}x(1 - y) + a_{21}(1 - x)y + a_{22}(1 - x)(1 - y) = Axy + (a_{12} - a_{22})x + (a_{21} - a_{22})y + a_{22},$$

$$H_2(x, y) = b_{11}xy + b_{12}x(1 - y) + b_{21}(1 - x)y + b_{22}(1 - x)(1 - y) = Bxy + (b_{12} - b_{22})x + (b_{21} - b_{22})y + b_{22},$$

where $A = a_{11} - a_{12} - a_{21} + a_{22}$, $B = b_{11} - b_{12} - b_{21} + b_{22}$. By virtue of Theorem 1.3, the equilibrium $(x, y)$ follows from inequalities (9.2), i.e.,

$$H_1(0, y) \le H_1(x, y), \qquad H_1(1, y) \le H_1(x, y), \tag{10.1}$$

$$H_2(x, 0) \le H_2(x, y), \qquad H_2(x, 1) \le H_2(x, y). \tag{10.2}$$

Rewrite inequalities (10.1) as

$$(a_{21} - a_{22})y + a_{22} \le Axy + (a_{12} - a_{22})x + (a_{21} - a_{22})y + a_{22},$$

$$Ay + (a_{21} - a_{22})y + a_{12} \le Axy + (a_{12} - a_{22})x + (a_{21} - a_{22})y + a_{22},$$

and, consequently,

$$(a_{22} - a_{12})x \le Axy, \tag{10.3}$$

$$Ay(1 - x) \le (a_{22} - a_{12})(1 - x). \tag{10.4}$$

Now, take the unit square $0 \le x \le 1$, $0 \le y \le 1$ (see Figure 1.5) and draw the set of points $(x, y)$ meeting the conditions (10.3)–(10.4). If $x = 0$, then (10.3) is immediate, whereas the condition (10.4) implies the inequality $Ay \le a_{22} - a_{12}$. In the case of $x = 1$, the expression (10.4) is valid, and (10.3) leads to $Ay \ge a_{22} - a_{12}$. Finally, under $0 < x < 1$, the conditions (10.3)–(10.4) give $Ay = a_{22} - a_{12}$.

Similar analysis of inequalities (10.2) yields the following. If $y = 0$, then $Bx \le b_{22} - b_{21}$. In the case of $y = 1$, we have $Bx \ge b_{22} - b_{21}$. If $0 < y < 1$, then $Bx = b_{22} - b_{21}$.

Depending on the signs of $A$ and $B$, these conditions result in different sets of feasible equilibria in a bimatrix game (zigzags inside the unit square).

Prisoners' dilemma. Recall the example studied above; here $A = B = -6 + 10 + 0 - 1 = 3$ and $a_{22} - a_{12} = b_{22} - b_{21} = -1$. Since $3y \ge -1$ and $3x \ge -1$ hold strictly on the whole unit square, an equilibrium is the intersection of the two lines $x = 1$ and $y = 1$ (see Figure 1.6). Therefore, the equilibrium is unique and takes the form $x = 1$, $y = 1$.

Figure 1.5 A zigzag in a bimatrix game.

Figure 1.6 A unique equilibrium in the prisoners' dilemma game.

Battle of sexes.
In this example, one obtains $A = B = 4 - 0 - 0 + 1 = 5$ and $a_{22} - a_{12} = 1$, $b_{22} - b_{21} = 4$. The zigzag defining all equilibrium strategy profiles is shown in Figure 1.7. Obviously, the game under consideration has three equilibria. Among them, two equilibria correspond to pure strategies, $(x = 1, y = 1)$ and $(x = 0, y = 0)$, while the third one has the mixed strategy type $(x = 4/5, y = 1/5)$. The payoffs in these equilibria are $(H_1^* = 4, H_2^* = 1)$, $(H_1^* = 1, H_2^* = 4)$ and $(H_1^* = 4/5, H_2^* = 4/5)$, respectively.

Figure 1.7 Three equilibria in the battle of sexes game.

The stated examples illustrate the following aspect. Depending on the shape of the zigzags, bimatrix games may admit one, two, or three equilibria, or even a continuum of equilibria.

1.11 Games 2 × n and m × 2

Suppose that player I chooses between two strategies, whereas player II has $n$ strategies available. Consequently, their payoffs are defined by the matrices

$$A = \begin{array}{c|cccc} & y_1 & y_2 & \ldots & y_n \\ \hline x & a_{11} & a_{12} & \ldots & a_{1n} \\ 1 - x & a_{21} & a_{22} & \ldots & a_{2n} \end{array} \qquad B = \begin{array}{c|cccc} & y_1 & y_2 & \ldots & y_n \\ \hline x & b_{11} & b_{12} & \ldots & b_{1n} \\ 1 - x & b_{21} & b_{22} & \ldots & b_{2n} \end{array}$$

In addition, assume that player I uses the mixed strategy $(x, 1 - x)$. If player II chooses strategy $j$, his payoff equals

$$H_2(x, j) = b_{1j}x + b_{2j}(1 - x), \quad j = 1, \ldots, n.$$

We show these payoffs (linear functions) in Figure 1.8. According to Theorem 1.3, the equilibrium $(x, y)$ corresponds to $\max_j H_2(x, j) = H_2(x, y)$. For any $x$, construct the maximal envelope $l(x) = \max_j H_2(x, j)$. As a matter of fact, $l(x)$ represents a broken line composed of at most $n + 1$ segments. Denote by $x_0 = 0, x_1, \ldots, x_k = 1$, $k \le n + 1$, the salient points (breakpoints) of this envelope. Since the function $H_1(x, y)$ is linear in $x$, its maximum under a fixed strategy of player II is attained at the points $x_i$, $i = 0, \ldots, k$. Hence, equilibria can occur only at these points. Imagine that the point $x_i$ results from the intersection of the straight lines $H_2(x, j_1)$ and $H_2(x, j_2)$.
This means that player II optimally plays a mix of his strategies $j_1$ and $j_2$ in response to the strategy $x_i$ of player I. Thus, we obtain a game 2 × 2 with the payoff matrices

$$A = \begin{pmatrix} a_{1j_1} & a_{1j_2} \\ a_{2j_1} & a_{2j_2} \end{pmatrix} \qquad B = \begin{pmatrix} b_{1j_1} & b_{1j_2} \\ b_{2j_1} & b_{2j_2} \end{pmatrix}.$$

Such a game has been solved in the previous section.

Figure 1.8 The maximal envelope $l(x)$.

To verify the optimality of the strategy $x_i$, one can adhere to the following reasoning. The strategy $x_i$ and the mix $(y, 1 - y)$ of the strategies $j_1$ and $j_2$ form an equilibrium, if there exists $y$, $0 \le y \le 1$, such that $H_1(1, y) = H_1(2, y)$. In this case, the payoff of player I is independent of $x$, and the best response of player II to the strategy $x_i$ lies in mixing the strategies $j_1$ and $j_2$. Rewrite the last condition as

$$a_{1j_1}y + a_{1j_2}(1 - y) = a_{2j_1}y + a_{2j_2}(1 - y). \tag{11.1}$$

Let us consider this procedure using an example.

Road selection. Points A and B are connected by three roads. One of them is a one-way road running right-to-left (see Figure 1.9). A car (player I) leaves A, and another car (player II) moves from B. The journey time on these roads varies (4, 5, and 6 hours, respectively, for a single car on a road). If both players choose the same road, the journey time doubles. Each player has to select a road for the journey. And so, player I chooses between two strategies, whereas player II chooses among three strategies.

Figure 1.9 The road selection game.

The payoff matrices take the form

$$A = \begin{array}{c|ccc} x & -8 & -4 & -4 \\ 1 - x & -5 & -10 & -5 \end{array} \qquad B = \begin{array}{c|ccc} x & -8 & -5 & -6 \\ 1 - x & -4 & -10 & -6 \end{array}$$

Find the payoffs of player II: $H_2(x, 1) = -4x - 4$, $H_2(x, 2) = 5x - 10$, $H_2(x, 3) = -6$. Draw these functions and the maximal envelope $l(x)$ in Figure 1.10 (see the thick line).

Figure 1.10 The maximal envelope in the road selection game.

The salient points of $l(x)$ are located at $x = 0$, $x = 0.5$, $x = 0.8$, and $x = 1$. The endpoints give the equilibria $(x = 0, y = (1, 0, 0))$ and $(x = 1, y = (0, 1, 0))$. The point $x = 1/2$ corresponds to the intersection of the functions $H_2(x, 1)$ and $H_2(x, 3)$. The condition (11.1) implies that $y = (1/4, 0, 3/4)$. However, this condition fails at the point $x = 0.8$, where it would require a negative probability. Therefore, the game in question admits three equilibrium solutions:

1. car I moves on the first road, and car II chooses the second road,
2. car I moves on the second road, and car II chooses the first road,
3. car I moves on the first or second road with identical probabilities, and car II chooses the first road with the probability of 1/4 or the third road with the probability of 3/4.

Interestingly, in the third equilibrium, the mean journey time of player I is 5 h, whereas player II spends 6 h. It would seem that player II holds the better cards (owing to the additional option of using the third road). However, suppose that the third road is closed. Then, in the worst case, player II requires just 5 h for the journey. This counterintuitive result is known as the Braess paradox. We will discuss it later. In fact, if player I informs the opponent that he chooses the road by coin-tossing, player II has nothing to do but follow the third strategy above.

1.12 The Hotelling duopoly in 2D space with non-uniform distribution of buyers

Let us revert to the problem studied in Section 1.5. Consider the Hotelling duopoly in 2D space with a non-uniform distribution of buyers in a city (according to some density function $f(x, y)$). As previously, we assume that the city is a circle $S$ of radius 1 (see Figure 1.11). It contains two companies (players I and II) located at different points $P_1$ and $P_2$. The players seek optimal prices for their products depending on their location in the city.

Again, players I and II quote prices for their products (some quantities $c_1$ and $c_2$, respectively).
A buyer located at a point $P \in S$ compares his costs (for the sake of simplicity, here we define them by $F(c_i, \rho(P, P_i)) = c_i + \rho^2(P, P_i)$) and chooses the company with the minimal value.

Figure 1.11 The Hotelling duopoly.

Figure 1.12 $P_1$, $P_2$ have the same ordinate $y$.

Therefore, all buyers in $S$ are divided into two subsets ($S_1$ and $S_2$) according to their preferences for companies I and II. Then the payoffs of players I and II are specified by

$$H_1(c_1, c_2) = c_1\mu(S_1), \qquad H_2(c_1, c_2) = c_2\mu(S_2), \tag{12.1}$$

where $\mu(S) = \int_S f(x, y)\,dx\,dy$ denotes the probability measure of the set $S$. First, we evaluate the equilibrium prices under the uniform distribution of buyers.

Rotate the circle $S$ such that the points $P_1$ and $P_2$ have the same ordinate $y$ (see Figure 1.12). Designate the abscissas of $P_1$ and $P_2$ by $x_1$ and $x_2$, respectively. Without loss of generality, assume that $x_1 \ge x_2$.

Within the framework of the Hotelling scheme, the sets $S_1$ and $S_2$ form segments of the circle divided by the straight line

$$c_1 + (x - x_1)^2 = c_2 + (x - x_2)^2,$$

which is parallel to the axis $Oy$ and has the abscissa

$$x = \frac{1}{2}(x_1 + x_2) + \frac{c_1 - c_2}{2(x_1 - x_2)}. \tag{12.2}$$

According to (12.1), the payoffs of the players in this game are

$$H_1(c_1, c_2) = c_1\left(\arccos x - x\sqrt{1 - x^2}\right)/\pi, \tag{12.3}$$

$$H_2(c_1, c_2) = c_2\left(\pi - \arccos x + x\sqrt{1 - x^2}\right)/\pi, \tag{12.4}$$

where $x$ meets (12.2). We find the equilibrium prices via the equations $\frac{\partial H_1}{\partial c_1} = \frac{\partial H_2}{\partial c_2} = 0$. Evaluate the derivative of (12.3) with respect to $c_1$:

$$\pi\frac{\partial H_1}{\partial c_1} = \arccos x - x\sqrt{1 - x^2} + c_1\left[-\frac{1}{\sqrt{1 - x^2}} - \sqrt{1 - x^2} + \frac{2x^2}{2\sqrt{1 - x^2}}\right]\frac{1}{2(x_1 - x_2)}.$$

Using the first-order necessary optimality conditions, we get

$$c_1 = (x_1 - x_2)\left[\frac{\arccos x}{\sqrt{1 - x^2}} - x\right]. \tag{12.5}$$

Similarly, the condition $\frac{\partial H_2}{\partial c_2} = 0$ gives

$$c_2 = (x_1 - x_2)\left[x + \frac{\pi - \arccos x}{\sqrt{1 - x^2}}\right]. \tag{12.6}$$

Finally, the expressions (12.2), (12.5), and (12.6) imply that the equilibrium prices can be rewritten as

$$c_1 = \frac{x_1 - x_2}{2}\left[\frac{\pi}{\sqrt{1 - x^2}} - 2\left(\frac{x_1 + x_2}{2} - x\right)\right], \tag{12.7}$$

$$c_2 = \frac{x_1 - x_2}{2}\left[\frac{\pi}{\sqrt{1 - x^2}} + 2\left(\frac{x_1 + x_2}{2} - x\right)\right], \tag{12.8}$$

where

$$x = \frac{x_1 + x_2}{4} - \frac{\pi/2 - \arccos x}{2\sqrt{1 - x^2}}. \tag{12.9}$$

Remark 1.1 If $x_1 + x_2 = 0$, then $x = 0$ due to (12.2). Hence, $c_1 = c_2 = \pi x_1$ according to (12.5)–(12.6), and $H_1 = H_2 = \pi x_1/2$ according to (12.3)–(12.4). The maximal equilibrium prices are achieved under $x_1 = 1$ and $x_2 = -1$; they equal $c_1 = c_2 = \pi$. The optimal payoffs take the values $H_1 = H_2 = \pi/2 \approx 1.570$. Thus, if buyers are distributed uniformly in the circle, the companies should be located as far as possible from each other (in the optimal solution).

To proceed, suppose that buyers are distributed non-uniformly in the circle. Analyze the case when the density function in polar coordinates has the form (see Figure 1.13)

$$f(r, \theta) = 3(1 - r)/\pi, \quad 0 \le r \le 1, \quad 0 \le \theta \le 2\pi. \tag{12.10}$$

Obviously, buyers are concentrated closer to the city center.

Figure 1.13 Duopoly in the polar coordinates.

Note that it suffices to consider the situation of $x_1 + x_2 \ge 0$ (otherwise, simply reverse the signs of $x_1$, $x_2$). The expected incomes of the players (12.1) are given by

$$H_1(c_1, c_2) = \frac{6}{\pi}c_1 A(x), \qquad H_2(c_1, c_2) = c_2\left(1 - \frac{6}{\pi}A(x)\right), \tag{12.11}$$

where

$$A(x) = \int_x^1 r(1 - r)\arccos\left(\frac{x}{r}\right)dr = \frac{1}{6}\left[\arccos x - x\sqrt{1 - x^2} - 2x\int_x^1\sqrt{r^2 - x^2}\,dr\right] = \frac{1}{6}\left[\arccos x - 2x\sqrt{1 - x^2} - x^3\log x + x^3\log\left(1 + \sqrt{1 - x^2}\right)\right],$$

such that

$$\frac{\pi}{6}\frac{\partial H_1}{\partial c_1} = A(x) + c_1 A'(x)\frac{\partial x}{\partial c_1} = A(x) - \frac{c_1}{2(x_1 - x_2)}\int_x^1\sqrt{r^2 - x^2}\,dr,$$

$$\frac{\partial H_2}{\partial c_2} = 1 - \frac{6}{\pi}A(x) - c_2\frac{6}{\pi}A'(x)\frac{\partial x}{\partial c_2} = 1 - \frac{6}{\pi}A(x) - \frac{6}{\pi}\frac{c_2}{2(x_1 - x_2)}\int_x^1\sqrt{r^2 - x^2}\,dr,$$

since

$$A'(x) = -\int_x^1\frac{r(1 - r)}{\sqrt{r^2 - x^2}}\,dr = -\int_x^1\sqrt{r^2 - x^2}\,dr. \tag{12.12}$$

The conditions $\frac{\partial H_1}{\partial c_1} = \frac{\partial H_2}{\partial c_2} = 0$ yield

$$c_1 = 2(x_1 - x_2)A(x)\Big/\int_x^1\sqrt{r^2 - x^2}\,dr, \tag{12.13}$$

$$c_2 = 2(x_1 - x_2)\left(\frac{\pi}{6} - A(x)\right)\Big/\int_x^1\sqrt{r^2 - x^2}\,dr. \tag{12.14}$$

By substituting $c_1$ and $c_2$ into

$$x = \frac{1}{2}(x_1 + x_2) + \frac{c_1 - c_2}{2(x_1 - x_2)}, \tag{12.2′}$$

we arrive at

$$x - \frac{1}{2}(x_1 + x_2) = (2A(x) - \pi/6)\Big/\int_x^1\sqrt{r^2 - x^2}\,dr. \tag{12.15}$$

Remark 1.2 It follows from (12.12) that $A(x)$ is a convex decreasing function such that $A(0) = \pi/12$ and $A(1) = 0$. The right-hand side of (12.15) is negative, which leads to $x \le (x_1 + x_2)/2$.

Below we demonstrate that equation (12.15) admits a unique solution. Rewrite it as

$$B(x) = -\left[x - \frac{1}{2}(x_1 + x_2)\right]A'(x) - (2A(x) - \pi/6) = 0. \tag{12.16}$$

The derivative of the function $B(x)$, i.e.,

$$B'(x) = -3A'(x) - A''(x)\left(x - \frac{x_1 + x_2}{2}\right) = \int_x^1\left[3\sqrt{r^2 - x^2} + \frac{x}{\sqrt{r^2 - x^2}}\left(\frac{x_1 + x_2}{2} - x\right)\right]dr,$$

takes only positive values. Hence, $B(x)$ increases within the interval $\left[0, \frac{x_1 + x_2}{2}\right]$, with $B(0) = -\frac{x_1 + x_2}{4} < 0$ and $B\left(\frac{x_1 + x_2}{2}\right) = \pi/6 - 2A\left(\frac{x_1 + x_2}{2}\right) \ge 0$.

If $x_1 + x_2 = 0$, then $x = 0$ satisfies equation (12.15). Moreover, the conditions (12.13)–(12.14) lead to $c_1 = c_2 = \frac{2}{3}\pi x_1$, whereas formula (12.11) implies that $H_1 = H_2 = \frac{1}{3}\pi x_1$. Under $x_1 = 1$, $x_2 = -1$, we have $c_1 = c_2 = \frac{2}{3}\pi \approx 2.094$ and $H_1 = H_2 = \frac{1}{3}\pi \approx 1.047$.

1.13 Location problem in 2D space

In the preceding section, readers have observed the following fact. If the location points $P_1$ and $P_2$ are fixed, there exist equilibrium prices $c_1$ and $c_2$. In other words, $c_1$ and $c_2$ are functions of $x_1$, $x_2$. In this context, an interesting question arises immediately. Are there equilibrium location points $x_1^*$, $x_2^*$ for these companies? Such a problem often appears during infrastructure planning for regional socioeconomic systems.

Consider the posed problem in the case of the non-uniform distribution of buyers. Suppose that player II chooses some point $x_2 < 0$.
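Player I aims at finding a point $x_1$ which maximizes his income $H_1(c_1, c_2)$. Let us solve the equation $\partial H_1/\partial x_1 = 0$. By virtue of (12.11),

$$\frac{\pi}{6}\frac{\partial H_1}{\partial x_1} = \frac{\partial c_1}{\partial x_1}A(x) + c_1 A'(x)\frac{\partial x}{\partial x_1} = 0.$$ (13.1)

Differentiation of (12.13) and (12.16) with respect to $x_1$ gives

$$\frac{1}{2}\frac{\partial c_1}{\partial x_1} = -\frac{A(x)}{A'(x)} - (x_1 - x_2)\left[1 - \frac{A(x)A''(x)}{[A'(x)]^2}\right]\frac{\partial x}{\partial x_1}$$ (13.2)

and

$$-\left(\frac{\partial x}{\partial x_1} - \frac{1}{2}\right)A'(x) - \left[x - \frac{1}{2}(x_1 + x_2)\right]A''(x)\frac{\partial x}{\partial x_1} - 2A'(x)\frac{\partial x}{\partial x_1} = 0.$$

Therefore,

$$\frac{\partial x}{\partial x_1} = A'(x)\left[6A'(x) + 2\left(x - \frac{x_1 + x_2}{2}\right)A''(x)\right]^{-1}.$$ (13.3)

Equations (13.1)–(13.3) can serve for obtaining the optimal response $x_1$ of player I.

Owing to the symmetry of this problem, if an equilibrium exists, it has the form $(x_1, x_2 = -x_1)$. In this case, $x = 0$, $A(0) = \pi/12$, $A'(0) = -1/2$, $A''(0) = 0$. The expression (13.3) yields

$$\frac{\partial x}{\partial x_1} = \frac{-1/2}{-3 + 0} = \frac{1}{6}.$$

On the other hand, formula (13.2) yields

$$\frac{\partial c_1}{\partial x_1} = \frac{\pi}{3} - \frac{2}{3}x_1.$$

Substitute these results into (13.1) to derive

$$\left(\frac{\pi}{3} - \frac{2}{3}x_1\right)\frac{\pi}{12} + \left(\frac{2}{3}\pi x_1\right)\cdot\left(-\frac{1}{2}\right)\cdot\frac{1}{6} = 0,$$

and, finally,

$$x_1^* = \frac{\pi}{4}.$$

Thus, the optimal location points of the companies are $x_1^* = \pi/4$, $x_2^* = -\pi/4$; the corresponding equilibrium prices and incomes are $c_1 = c_2 = \pi^2/6$ and $H_1 = H_2 = \pi^2/12$, respectively.

Remark 1.3 Recall the case of the uniform distribution of buyers discussed earlier. A similar argument gives the following results. It follows from (12.3), (12.7), and (12.9) that

$$\pi\frac{\partial H_1}{\partial x_1} = \frac{\partial c_1}{\partial x_1}\left(\arccos x - x\sqrt{1 - x^2}\right) - 2c_1\sqrt{1 - x^2}\,\frac{\partial x}{\partial x_1},$$

$$\frac{\partial c_1}{\partial x_1} = \frac{\pi}{2\sqrt{1 - x^2}} + x - x_1 + \frac{x_1 - x_2}{2}\left(2 + \pi x(1 - x^2)^{-3/2}\right)\frac{\partial x}{\partial x_1},$$

$$\frac{\partial x}{\partial x_1} = \frac{1}{4}\left[1 + \frac{1}{2(1 - x^2)} + \frac{x}{2(1 - x^2)^{3/2}}\left(\frac{\pi}{2} - \arccos x\right)\right]^{-1}.$$

Consequently,

$$\pi\left[\frac{\partial H_1}{\partial x_1}\right]_{x=0} = \left[\frac{\partial c_1}{\partial x_1}\right]_{x=0}\frac{\pi}{2} - 2[c_1]_{x=0}\left[\frac{\partial x}{\partial x_1}\right]_{x=0} = \left(\frac{\pi}{2} - \frac{2x_1}{3}\right)\frac{\pi}{2} - 2\pi x_1\cdot\frac{1}{6} = \frac{\pi}{4}\left(\pi - \frac{8}{3}x_1\right) > 0, \quad \forall x_1 \in (0, 1).$$

And so, the maximal incomes are attained at the points $x_1^* = -x_2^* = 1$.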
According to (12.3) and (12.7), these points correspond to

$$c_i^* = \pi \approx 3.1416 \quad \text{and} \quad H_i^* = \pi/2 \approx 1.5708, \quad i = 1, 2.$$ (13.4)

Exercises

1. The crossroad problem. Two drivers move along two mutually perpendicular roads and simultaneously arrive at a crossroad. Each of them may stop (strategy I) or continue moving (strategy II). By assumption, a player would rather stop than suffer a crash; on the other hand, a player would rather continue moving if the opponent stops. This conflict can be represented by a bimatrix game with the payoff matrix

$$\begin{pmatrix} (1, 1) & (1 - \varepsilon, 2) \\ (2, 1 - \varepsilon) & (0, 0) \end{pmatrix}.$$

Here $\varepsilon \ge 0$ is a number characterizing a player's displeasure at stopping to let the opponent pass. Find pure strategy Nash equilibria and mixed strategy Nash equilibria in the crossroad problem.

2. Games 2 × 2. Evaluate Nash equilibria in the bimatrix games below:

$$A = \begin{pmatrix} -6 & 0 \\ -9 & -1 \end{pmatrix}, \quad B = \begin{pmatrix} -6 & -9 \\ 0 & -1 \end{pmatrix}.$$

$$A = \begin{pmatrix} 1 & -2 \\ 3 & 2 \end{pmatrix}, \quad B = \begin{pmatrix} -1 & 2 \\ 1 & -1 \end{pmatrix}.$$

3. Find a Nash equilibrium in a bimatrix game defined by

$$A = \begin{pmatrix} 3 & 6 & 8 \\ 4 & 3 & 2 \\ 7 & -5 & -1 \end{pmatrix}, \quad B = \begin{pmatrix} 7 & 4 & 3 \\ 7 & 7 & 3 \\ 4 & 6 & 6 \end{pmatrix}.$$

4. Evaluate a Stackelberg equilibrium in a two-player game with the payoff functions

$$H_1(x_1, x_2) = bx_1(c - x_1 - x_2) - d, \quad H_2(x_1, x_2) = bx_2(c - x_1 - x_2) - d.$$

5. Consider a general 2 × 2 bimatrix game and demonstrate that $(x, y)$ is a mixed strategy equilibrium profile iff the following inequalities hold true:

$$(x - 1)(ay - \alpha) \ge 0, \quad x(ay - \alpha) \ge 0, \quad (y - 1)(bx - \beta) \ge 0, \quad y(bx - \beta) \ge 0,$$

where

$$a = a_{11} - a_{12} - a_{21} + a_{22}, \quad \alpha = a_{22} - a_{12}, \quad b = b_{11} - b_{12} - b_{21} + b_{22}, \quad \beta = b_{22} - b_{21}.$$

6. Prove the following result. If a bimatrix game admits a completely mixed Nash equilibrium strategy profile, then $n = m$.

7. Find an equilibrium in a game 2 × n described by the payoff matrices

$$A = \begin{pmatrix} 2 & 0 & 5 \\ 2 & 2 & 3 \end{pmatrix}, \quad B = \begin{pmatrix} 2 & 2 & 1 \\ 0 & 7 & 8 \end{pmatrix}.$$

8. Find an equilibrium in a game m × 2 described by the payoff matrices

$$A = \begin{pmatrix} 8 & 2 \\ 2 & 7 \\ 3 & 9 \\ 6 & 4 \end{pmatrix}, \quad B = \begin{pmatrix} 1 & 4 \\ 8 & 4 \\ 7 & 2 \\ 2 & 9 \end{pmatrix}.$$

9.
Consider the company allocation problem in 2D space. Evaluate equilibrium prices $(p_1, p_2)$ under the cost function $F_2 = p_2 + \rho^2$.

10. Consider the company allocation problem in 2D space. Find the optimal allocation of companies $(x_1, x_2)$ under the cost function $F_2 = p_2 + \rho^2$.

2 Zero-sum games

Introduction

The previous chapter focused on games $\Gamma = \langle I, II, X, Y, H_1, H_2 \rangle$, where players' payoffs $H_1(x, y)$ and $H_2(x, y)$ are arbitrary functions defined on the product set $X \times Y$. However, there exists a special case of normal-form games in which $H_1(x, y) + H_2(x, y) = 0$ for all $(x, y)$. Such games are called zero-sum games or antagonistic games. Here players have opposite goals: the payoff of a player equals the loss of the opponent. It suffices to specify the payoff function of player I for a complete description of a game.

Definition 2.1 A zero-sum game is a normal-form game $\Gamma = \langle I, II, X, Y, H \rangle$, where $X$, $Y$ indicate the strategy sets of players I and II, and $H(x, y)$ is the payoff function of player I, $H : X \times Y \to R$.

Each player chooses his strategy independently of the opponent. Player I strives to maximize the payoff $H(x, y)$, whereas player II seeks to minimize this function. Zero-sum games satisfy all properties established for normal-form games. Nevertheless, this class of games enjoys a series of specific features. First, let us reformulate the notion of a Nash equilibrium.

Definition 2.2 A Nash equilibrium in a game $\Gamma$ is a pair of strategies $(x^*, y^*)$ meeting the conditions

$$H(x, y^*) \le H(x^*, y^*) \le H(x^*, y)$$ (1.1)

for arbitrary strategies $x$, $y$ of the players.

Inequalities (1.1) imply that, as player I deviates from a Nash equilibrium, his payoff does not increase. If player II deviates from the equilibrium, his opponent gains at least as much (accordingly, player II loses at least as much). Hence, none of the players benefits by deviating from a Nash equilibrium.

Mathematical Game Theory and Applications, First Edition. Vladimir Mazalov. © 2014 John Wiley & Sons, Ltd.
Published 2014 by John Wiley & Sons, Ltd. Companion website: http://www.wiley.com/go/game_theory

Remark 2.1 Constant-sum games ($H_1(x, y) + H_2(x, y) = c = \text{const}$ for arbitrary strategies $x$, $y$) can be reduced to zero-sum games. Namely, find a solution to the zero-sum game with the payoff function $H_1(x, y)$. Then any Nash equilibrium $(x^*, y^*)$ in this game is also a Nash equilibrium in the corresponding constant-sum game. Indeed, according to Definition 2.2, for any $x$, $y$ we have

$$H_1(x, y^*) \le H_1(x^*, y^*) \le H_1(x^*, y).$$

At the same time, $H_1(x, y) = c - H_2(x, y)$, and the second inequality can be rewritten as

$$c - H_2(x^*, y^*) \le c - H_2(x^*, y),$$

or

$$H_2(x^*, y) \le H_2(x^*, y^*), \quad \forall y.$$

Thus, $(x^*, y^*)$ is also a Nash equilibrium in the constant-sum game.

Like general normal-form games, zero-sum games may admit no Nash equilibria. A major role in the analysis of zero-sum games belongs to the concepts of minimax and maximin.

2.1 Minimax and maximin

Suppose that player I employs some strategy $x$. In the worst case, he receives the payoff $\inf_y H(x, y)$. Naturally, he would endeavor to maximize this quantity. Thus, the guaranteed payoff of player I is $\sup_x \inf_y H(x, y)$. Similarly, player II can guarantee that his loss does not exceed $\inf_y \sup_x H(x, y)$.

Definition 2.3 The minimax $\bar{v} = \inf_y \sup_x H(x, y)$ is called the upper value of a game $\Gamma$, and the maximin $\underline{v} = \sup_x \inf_y H(x, y)$ is called the lower value of this game.

The lower value of any game does not exceed its upper value.

Lemma 2.1 $\underline{v} \le \bar{v}$.

Proof: For any $(x, y)$, the inequality $H(x, y) \le \sup_x H(x, y)$ holds true. By taking $\inf_y$ of both sides, we obtain $\inf_y H(x, y) \le \inf_y \sup_x H(x, y)$. The left-hand side of this inequality is a function of $x$; this function is bounded above by the quantity $\inf_y \sup_x H(x, y)$. Therefore, $\sup_x \inf_y H(x, y) \le \inf_y \sup_x H(x, y)$.

Now, we provide a simple criterion to verify Nash equilibrium existence in this game.
Theorem 2.1 A Nash equilibrium $(x^*, y^*)$ in the zero-sum game exists iff $\inf_y \sup_x H(x, y) = \min_y \sup_x H(x, y)$ and $\sup_x \inf_y H(x, y) = \max_x \inf_y H(x, y)$ and, moreover,

$$\underline{v} = \bar{v}.$$ (1.2)

Proof: Assume that $(x^*, y^*)$ forms a Nash equilibrium. Definition (1.1) implies that $H(x, y^*) \le H(x^*, y^*)$, $\forall x$. Then it follows that $\sup_x H(x, y^*) \le H(x^*, y^*)$; hence,

$$\bar{v} = \inf_y \sup_x H(x, y) \le \sup_x H(x, y^*) \le H(x^*, y^*).$$ (1.3)

Similarly,

$$H(x^*, y^*) \le \inf_y H(x^*, y) \le \sup_x \inf_y H(x, y) = \underline{v}.$$ (1.4)

However, Lemma 2.1 claims that $\underline{v} \le \bar{v}$. Therefore, all inequalities in (1.3)–(1.4) become equalities, i.e., the values corresponding to the external operators sup, inf are attained and $\underline{v} = \bar{v}$. This proves necessity.

The sufficiency part. Denote by $x^*$ a point where $\max_x \inf_y H(x, y) = \inf_y H(x^*, y)$. By analogy, $y^*$ designates a point such that $\min_y \sup_x H(x, y) = \sup_x H(x, y^*)$. Consequently,

$$H(x^*, y^*) \ge \inf_y H(x^*, y) = \underline{v}.$$

On the other hand,

$$H(x^*, y^*) \le \sup_x H(x, y^*) = \bar{v}.$$

In combination with the condition (1.2) $\underline{v} = \bar{v}$, this fact leads to

$$H(x^*, y^*) = \inf_y H(x^*, y) = \sup_x H(x, y^*).$$

The last expression immediately shows that for all $(x, y)$:

$$H(x, y^*) \le \sup_x H(x, y^*) = H(x^*, y^*) = \inf_y H(x^*, y) \le H(x^*, y),$$

i.e., $(x^*, y^*)$ is a Nash equilibrium.

Theorem 2.1 implies that, in the case of several equilibria, the optimal payoffs coincide. This value $v = H(x^*, y^*)$ (identical for all equilibria) is called the value of the game.

Moreover, readers can easily verify the following fact. Any combination of optimal strategies also represents a Nash equilibrium.

Theorem 2.2 Suppose that $(x_1, y_1)$ and $(x_2, y_2)$ are Nash equilibria in a zero-sum game. Then $(x_1, y_2)$ and $(x_2, y_1)$ are also Nash equilibria.

Proof: According to the definition of a Nash equilibrium, for any $(x, y)$:

$$H(x, y_1) \le H(x_1, y_1) \le H(x_1, y)$$ (1.5)

and

$$H(x, y_2) \le H(x_2, y_2) \le H(x_2, y).$$
(1.6)

Set $x = x_2$, $y = y_2$ in inequality (1.5) and $x = x_1$, $y = y_1$ in inequality (1.6). This generates the chain of inequalities

$$H(x_2, y_1) \le H(x_1, y_1) \le H(x_1, y_2) \le H(x_2, y_2) \le H(x_2, y_1)$$

with the same quantity $H(x_2, y_1)$ on its left- and right-hand sides. Therefore, all inequalities in this chain are in fact equalities. And $(x_1, y_2)$ is a Nash equilibrium, since for any $(x, y)$:

$$H(x, y_2) \le H(x_2, y_2) = H(x_1, y_2) = H(x_1, y_1) \le H(x_1, y).$$

A similar result applies to $(x_2, y_1)$.

These properties distinguish the games in question from nonzero-sum games. We have observed that, in nonzero-sum games, different combinations of optimal strategies may form no equilibria, and players' payoffs in equilibria may vary appreciably.

In addition, the values of minimaxes and maximins play a considerable role in antagonistic games. It is possible to evaluate maximins for each player in nonzero-sum games; they give the guaranteed payoff if the opponent plays against a given player (ignoring his own payoff). This approach will be adopted in the analysis of negotiations.

2.2 Randomization

Imagine that Nash equilibria do not exist. In this case, one can employ randomization, i.e., extend the strategy set by mixed strategies.

Definition 2.4 Mixed strategies of players I and II are probability measures $\mu$ and $\nu$ defined on the sets $X$, $Y$.

Randomization generates a new game, where players' strategies are probability distributions and the payoff function is the expected value of the payoff

$$H(\mu, \nu) = \int_X \int_Y H(x, y)\, d\mu(x)\, d\nu(y).$$

This formula contains the Lebesgue–Stieltjes integral. In the sequel, we also write

$$H(\mu, y) = \int_X H(x, y)\, d\mu(x), \quad H(x, \nu) = \int_Y H(x, y)\, d\nu(y).$$

Find a Nash equilibrium in the stated extension of the game.

Definition 2.5 A mixed strategy Nash equilibrium in the game $\Gamma$ is a pair of measures $(\mu^*, \nu^*)$ meeting the inequalities

$$H(\mu, \nu^*) \le H(\mu^*, \nu^*) \le H(\mu^*, \nu)$$

for arbitrary measures $\mu$, $\nu$.
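The notions of lower value, upper value, and expected payoff under mixed strategies can be illustrated on a finite game. The following is a minimal sketch, not part of the original text: the function names and the "matching pennies" matrix are hypothetical choices made for illustration, and the matrix is assumed to list the payoffs of player I (rows are strategies of player I, columns are strategies of player II).

```python
from fractions import Fraction

def lower_upper_values(a):
    """Pure-strategy maximin and minimax of a payoff matrix `a` (payoffs of player I)."""
    lower = max(min(row) for row in a)                    # max over rows of the row minimum
    upper = min(max(row[j] for row in a)                  # min over columns of the column maximum
                for j in range(len(a[0])))
    return lower, upper

def expected_payoff(a, x, y):
    """Expected payoff H(x, y) when the players use mixed strategies x and y."""
    return sum(a[i][j] * x[i] * y[j]
               for i in range(len(a)) for j in range(len(a[0])))

# Hypothetical example (matching pennies): no saddle point in pure strategies.
pennies = [[1, -1], [-1, 1]]
lower, upper = lower_upper_values(pennies)    # lower < upper, so no pure equilibrium

# Against the uniform mixture of the opponent, every pure strategy earns the same amount.
half = [Fraction(1, 2), Fraction(1, 2)]
value_at_uniform = expected_payoff(pennies, half, half)
```

In this example the lower value is strictly less than the upper value, so by Theorem 2.1 there is no pure strategy equilibrium, whereas the uniform mixtures leave neither player with a profitable pure deviation.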
We begin with the elementary case when each player chooses his strategy from a finite set ($X = \{1, \ldots, m\}$ and $Y = \{1, \ldots, n\}$). Consequently, the payoff of player I can be defined using some matrix $A = [a(i, j)]$, $i = 1, \ldots, m$, $j = 1, \ldots, n$. Such games are referred to as matrix games. Mixed strategies form vectors $x = (x_1, \ldots, x_m) \in R^m$ and $y = (y_1, \ldots, y_n) \in R^n$. In terms of the new strategies, the payoff acquires the form

$$H(x, y) = \sum_{i=1}^{m}\sum_{j=1}^{n} a(i, j)x_i y_j.$$

Note that matrix games are a special case of the bimatrix games discussed in Chapter 1. Therefore, they enjoy the following property.

Theorem 2.3 Matrix games always admit a mixed strategy Nash equilibrium, i.e., a strategy profile $(x^*, y^*)$ such that

$$\sum_{i=1}^{m}\sum_{j=1}^{n} a(i, j)x_i y_j^* \le \sum_{i=1}^{m}\sum_{j=1}^{n} a(i, j)x_i^* y_j^* \le \sum_{i=1}^{m}\sum_{j=1}^{n} a(i, j)x_i^* y_j \quad \forall x, y.$$

Interestingly, games with continuous payoff functions on compact strategy sets always possess a Nash equilibrium in mixed strategies. Prior to demonstrating this fact rigorously, we establish an intermediate result.

Lemma 2.2 If the function $H(x, y)$ is continuous on a compact set $X \times Y$, then $H(\mu, y) = \int_X H(x, y)\, d\mu(x)$ is continuous in $y$.

Proof: Let $H(x, y)$ be continuous on a compact set $X \times Y$; hence, this function is uniformly continuous. Namely, $\forall \epsilon > 0$ $\exists \delta$ such that, if $\rho(y_1, y_2) < \delta$, then $|H(x, y_1) - H(x, y_2)| < \epsilon$ for all $x \in X$. And so, it follows that

$$|H(\mu, y_1) - H(\mu, y_2)| = \left|\int_X [H(x, y_1) - H(x, y_2)]\, d\mu(x)\right| \le \int_X |H(x, y_1) - H(x, y_2)|\, d\mu(x) \le \epsilon \int_X d\mu(x) = \epsilon.$$

Theorem 2.4 Consider a zero-sum game $\Gamma = \langle I, II, X, Y, H \rangle$. Suppose that the strategy sets $X \subset R^m$ and $Y \subset R^n$ are compact, while the function $H(x, y)$ is continuous. Then this game has a mixed strategy Nash equilibrium.

Proof: According to Theorem 2.1, it suffices to show that minimax and maximin are attained and coincide. First, we prove that

$$\underline{v} = \sup_\mu \inf_\nu H(\mu, \nu) = \max_\mu \min_\nu H(\mu, \nu).$$

Lemma 2.2 claims that, for an arbitrary strategy $\mu$, the function $H(\mu, y) = \int_X H(x, y)\, d\mu(x)$ is continuous in $y$.
By the hypothesis, $Y$ is a compact set; therefore, the function $H(\mu, y)$ attains its minimum in $y$. Consequently, $\sup_\mu \inf_\nu H(\mu, \nu) = \sup_\mu \min_y H(\mu, y)$.

By definition of sup, for any $n$ there exists a measure $\mu_n$ such that

$$\min_y H(\mu_n, y) > \underline{v} - \frac{1}{n}.$$ (2.1)

Recall that $X$ is a compact set. By virtue of Helly's theorem (see Shiryaev, 1996), from the sequence $\{\mu_n, n = 1, 2, \ldots\}$ one can choose a subsequence $\mu_{n_k}$, $k = 1, 2, \ldots$, which converges to a probability measure $\mu^*$. Moreover, for an arbitrary continuous function $f(x)$, the sequence of integrals $\int_X f(x)\, d\mu_{n_k}(x)$ tends to the integral $\int_X f(x)\, d\mu^*(x)$. Then for any $y$ we obtain

$$\int_X H(x, y)\, d\mu_{n_k}(x) \to \int_X H(x, y)\, d\mu^*(x) = H(\mu^*, y).$$

The inequality $\min_y H(\mu^*, y) \le \underline{v}$ (which holds by the definition of $\underline{v}$) and the condition (2.1) yield

$$\underline{v} = \min_y H(\mu^*, y) = \max_\mu \min_y H(\mu, y).$$

By analogy, one can demonstrate that minimax is also attained.

Now, we should verify that $\underline{v} = \bar{v}$. Owing to the compactness of $X$ and $Y$, for any $n$ there exists a finite $1/n$-network (i.e., finite sets of points $X_n = \{x_1, \ldots, x_k\} \subset X$ and $Y_n = \{y_1, \ldots, y_m\} \subset Y$) such that for any $x \in X$, $y \in Y$ it is possible to find points $x_i \in X_n$ and $y_j \in Y_n$ satisfying the conditions $\rho(x, x_i) < 1/n$ and $\rho(y, y_j) < 1/n$.

Fix some positive number $\epsilon$. Select a sufficiently large $n$ such that, if for arbitrary $(x, y)$, $(x', y')$ we have $\rho(x, x') < 1/n$ and $\rho(y, y') < 1/n$, then $|H(x, y) - H(x', y')| < \epsilon$. This is always feasible due to the continuity of $H(x, y)$ on the corresponding compact set (ergo, due to the uniform continuity of this function).

To proceed, construct the payoff matrix $[H(x_i, y_j)]$, $i = 1, \ldots, k$, $j = 1, \ldots, m$ at the nodes of the $1/n$-network and solve the resulting matrix game. Denote by $p(n) = (p_1(n), \ldots, p_k(n))$ and $q(n) = (q_1(n), \ldots, q_m(n))$ the optimal mixed strategies in this game, and let the game value be designated by $v_n$. The mixed strategy $p(n)$ corresponds to the probability measure $\mu_n$, where for $A \subset X$:

$$\mu_n(A) = \sum_{i : x_i \in A} p_i(n).$$

In this case, for any $y_j \in Y_n$ we have

$$H(\mu_n, y_j) = \sum_{i=1}^{k} H(x_i, y_j)p_i(n) \ge v_n.$$
(2.2)

By the construction of the $1/n$-network, for any $y \in Y$ there exists $y_j \in Y_n$ such that $\rho(y, y_j) < 1/n$; and so, $|H(x, y) - H(x, y_j)| < \epsilon$ for all $x$. As in the proof of Lemma 2.2, this immediately leads to

$$|H(\mu_n, y) - H(\mu_n, y_j)| \le \epsilon.$$

In combination with (2.2), the above condition yields the inequality $H(\mu_n, y) > v_n - \epsilon$ for any $y \in Y$. Therefore,

$$\underline{v} = \max_\mu \min_y H(\mu, y) \ge \min_y H(\mu_n, y) > v_n - \epsilon.$$ (2.3)

Similar reasoning gives

$$\bar{v} < v_n + \epsilon.$$ (2.4)

It follows from (2.3)–(2.4) that $\bar{v} < \underline{v} + 2\epsilon$. Since $\epsilon$ is arbitrary, we derive $\bar{v} \le \underline{v}$. This result and Lemma 2.1 give the equality $\bar{v} = \underline{v}$. The proof of Theorem 2.4 is completed.

2.3 Games with discontinuous payoff functions

The preceding section has demonstrated that games with continuous payoff functions and compact strategy sets admit mixed strategy equilibria. Here we show that, if a payoff function has discontinuities, an equilibrium may fail to exist even in the class of mixed strategies.

The Colonel Blotto game. Colonel Blotto has to capture two mountain passes (see Figure 2.1). His forces form a unit resource to be allocated between the two passes. His opponent performs similar actions. If the forces of a player exceed those of the opponent at a given pass, then his payoff equals unity (and vanishes otherwise). Furthermore, at one of the passes Colonel Blotto's opponent has already concentrated additional forces of size 1/2.

Figure 2.1 The Colonel Blotto game.

Figure 2.2 The payoff function of player I.

Therefore, we face a constant-sum game $\Gamma = \langle I, II, X, Y, H \rangle$, where $X = [0, 1]$, $Y = [0, 1]$ indicate the strategy sets of players I and II. Suppose that Colonel Blotto and his opponent have allocated their forces $(x, 1 - x)$ and $(y, 1 - y)$ between the passes. Then the payoff function of player I takes the form

$$H(x, y) = \operatorname{sgn}(x - y) + \operatorname{sgn}\left(1 - x - \left(\frac{1}{2} + 1 - y\right)\right).$$

For its graph, see Figure 2.2.
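The payoff function above is easy to tabulate. The sketch below is an illustration rather than part of the original text: it evaluates $H$ in exact rational arithmetic and computes expected payoffs against two discrete measures, taken to be the ones used in the analysis that follows (weights 1/3 at the points 0, 1/2, 1 for player I; weights 1/7, 2/7, 4/7 at the points 1/4, 1/2, 1 for player II). The helper names and the grid step are assumptions, and a grid check only samples the interval, so it supports the bounds without proving them.

```python
from fractions import Fraction as Fr

def sgn(t):
    """Sign function with values -1, 0, 1."""
    return (t > 0) - (t < 0)

def H(x, y):
    """Payoff of player I in the Colonel Blotto game (formula from the text)."""
    return sgn(x - y) + sgn((1 - x) - (Fr(1, 2) + 1 - y))

def H_mu_y(mu, y):
    """Expected payoff of a discrete measure mu (dict point -> weight) against pure y."""
    return sum(w * H(x, y) for x, w in mu.items())

def H_x_nu(x, nu):
    """Expected payoff of pure x against a discrete measure nu (dict point -> weight)."""
    return sum(w * H(x, y) for y, w in nu.items())

# Grid on [0, 1] with step 1/200 (an arbitrary illustrative choice).
grid = [Fr(k, 200) for k in range(201)]

mu = {Fr(0): Fr(1, 3), Fr(1, 2): Fr(1, 3), Fr(1): Fr(1, 3)}
nu = {Fr(1, 4): Fr(1, 7), Fr(1, 2): Fr(2, 7), Fr(1): Fr(4, 7)}

worst_for_player_1 = min(H_mu_y(mu, y) for y in grid)   # what mu guarantees on the grid
best_for_player_1 = max(H_x_nu(x, nu) for x in grid)    # what nu concedes on the grid
```

On this grid the first measure never yields less than −2/3 and the second never concedes more than −4/7, so the two guarantees do not meet.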
The function $H(x, y)$ possesses discontinuities at $x = y$ and $y = x + 1/2$. Evaluate maximin and minimax in this game.

Assume that the measure $\mu$ is concentrated at the points $x \in \{0, 1/2, 1\}$ with identical weights of 1/3. Then for any $y \in [0, 1]$ the inequality

$$H(\mu, y) = \frac{1}{3}H(0, y) + \frac{1}{3}H(1/2, y) + \frac{1}{3}H(1, y) \ge -\frac{2}{3}$$

holds true. And it follows that $\underline{v} = \sup_\mu \inf_\nu H(\mu, \nu) \ge -2/3$.

On the other hand, choose strategy $y$ by the following rule: if $\mu[1/2, 1] \ge 2/3$, set $y = 1$, and then $H(\mu, 1) \le -2/3$; if $\mu[1/2, 1] < 2/3$, then $\mu[0, 1/2) > 1/3$, and there exists $\delta$ such that $\mu[0, 1/2 - \delta) \ge 1/3$; set $y = 1/2 - \delta$. Obviously, we also obtain $H(\mu, 1/2 - \delta) \le -2/3$ in this case. Hence, for any $\mu$:

$$\inf_y H(\mu, y) \le -2/3,$$

which means that $\sup_\mu \inf_\nu H(\mu, \nu) \le -2/3$.

We have successfully evaluated the lower value of the game:

$$\underline{v} = \sup_\mu \inf_\nu H(\mu, \nu) = -2/3.$$ (3.1)

Now, calculate the upper value $\bar{v}$ of the game. Suppose that the measure $\nu$ is concentrated at the points $y \in \{1/4, 1/2, 1\}$ and has the weights $\nu(1/4) = 1/7$, $\nu(1/2) = 2/7$, and $\nu(1) = 4/7$.

Using Figure 2.2, we find $H(0, \nu) = H(1, \nu) = -4/7$, $H(1/4, \nu) = -5/7$, $H(1/2, \nu) = -6/7$, and $H(x, \nu) = -6/7$ for $x \in (0, 1/4)$, $H(x, \nu) = -4/7$ for $x \in (1/4, 1/2)$, and $H(x, \nu) = -8/7$ for $x \in (1/2, 1)$.

And so, this strategy of player II leads to $H(x, \nu) \le -4/7$ for all $x$. This gives

$$\bar{v} = \inf_\nu \sup_\mu H(\mu, \nu) \le -4/7.$$

To prove the inverse inequality, select the strategy $x$ of player I according to the following rule. If $\nu(1) \le 4/7$, set $x = 1$; then player I guarantees his payoff $H(1, \nu) \ge -4/7$. Now, assume that $\nu(1) > 4/7$, i.e., $\nu[0, 1) < 3/7$. Two alternatives appear here, viz., either $\nu[0, 1/2) \le 2/7$, or $\nu[0, 1/2) > 2/7$. In the first case, set $x = 0$, and player I ensures his payoff $H(0, \nu) \ge -4/7$. In the second case, there exists $\delta > 0$ such that $\nu[0, 1/2 - \delta) \ge 2/7$. In combination with the condition $\nu[0, 1) < 3/7$, this yields $\nu[1/2 - \delta, 1) < 1/7$. By choosing the strategy $x = 1/2 - \delta$, we obtain $H(1/2 - \delta, \nu) > -2/7 > -4/7$.
Evidently, under an arbitrary mixed strategy $\nu$, player I guarantees his payoff $\sup_x H(x, \nu) \ge -4/7$. Hence it follows that $\bar{v} = \inf_\nu \sup_\mu H(\mu, \nu) \ge -4/7$. We have found the exact upper value of the game:

$$\bar{v} = \inf_\nu \sup_\mu H(\mu, \nu) = -4/7.$$ (3.2)

Direct comparison of the expressions (3.1) and (3.2) indicates the following. In the Colonel Blotto game, the lower and upper values differ, so this game admits no equilibria.

However, in a series of cases, equilibria may exist under discontinuous payoff functions. The general equilibria evaluation scheme for such games is described below.

Theorem 2.5 Consider an infinite game $\Gamma = \langle I, II, X, Y, H \rangle$. Suppose that there exists a Nash equilibrium $(\mu^*, \nu^*)$, while the payoff functions $H(\mu^*, y)$ and $H(x, \nu^*)$ are continuous in $y$ and $x$, respectively. Then the following conditions hold:

$$H(\mu^*, y) = v, \quad \forall y \text{ on the support of the measure } \nu^*,$$ (3.3)

$$H(x, \nu^*) = v, \quad \forall x \text{ on the support of the measure } \mu^*,$$ (3.4)

where $v$ corresponds to the value of the game $\Gamma$.

Proof: Let $\mu^*$ be the optimal mixed strategy of player I. In this case, $H(\mu^*, y) \ge v$ for all $y \in Y$. Assume that (3.3) fails, i.e., $H(\mu^*, y') > v$ at a certain point $y'$ on the support of $\nu^*$. Due to the continuity of the function $H(\mu^*, y)$, this inequality is then valid in some neighborhood $U_{y'}$ of the point $y'$. The point $y'$ belongs to the support of the measure $\nu^*$, which means that $\nu^*(U_{y'}) > 0$. And we arrive at the contradiction with $H(\mu^*, \nu^*) = v$:

$$H(\mu^*, \nu^*) = \int_Y H(\mu^*, y)\, d\nu^*(y) = \int_{U_{y'}} H(\mu^*, y)\, d\nu^*(y) + \int_{Y \setminus U_{y'}} H(\mu^*, y)\, d\nu^*(y) > v.$$

This proves (3.3). A similar line of reasoning demonstrates the validity of the condition (3.4).

By performing differentiation in (3.3)–(3.4), we obtain the differential equations

$$\frac{\partial H(\mu^*, y)}{\partial y} = 0, \quad \forall y \text{ on the support of the measure } \nu^*,$$

and

$$\frac{\partial H(x, \nu^*)}{\partial x} = 0, \quad \forall x \text{ on the support of the measure } \mu^*.$$

They serve to find optimal strategies. We will illustrate their application by discrete arbitration procedures.
Note that Theorem 2.5 provides necessary conditions for mixed strategy equilibrium evaluation in games with discontinuous payoff functions $H(x, y)$. Moreover, it is possible to obtain optimal strategies even if the functions $H(\mu^*, y)$ and $H(x, \nu^*)$ are discontinuous. Most importantly, we need the conditions (3.3)–(3.4) on the supports of the distributions (the remaining $x$ and $y$ must meet the inequalities $H(x, \nu^*) \le v \le H(\mu^*, y)$).

2.4 Convex-concave and linear-convex games

Games where the strategy sets $X \subset R^m$, $Y \subset R^n$ are compact convex sets and the payoff function $H(x, y)$ is continuous, concave in $x$ and convex in $y$, are called concave-convex games. According to Theorem 1.1, we can formulate the following result.

Theorem 2.6 Concave-convex games always admit a pure strategy Nash equilibrium.

A special case of concave-convex games concerns linear convex games $\Gamma = \langle X, Y, H(x, y) \rangle$, where strategies are points from the simplexes $X = \{(x_1, \ldots, x_m) : x_i \ge 0, i = 1, \ldots, m; \sum_{i=1}^{m} x_i = 1\}$ and $Y = \{(y_1, \ldots, y_n) : y_j \ge 0, j = 1, \ldots, n; \sum_{j=1}^{n} y_j = 1\}$, while the payoff function is described by some matrix $A = [a(i, j)]$, $i = 1, \ldots, m$, $j = 1, \ldots, n$:

$$H(x, y) = \sum_{i=1}^{m} x_i f\left(\sum_{j=1}^{n} a(i, j)y_j\right).$$ (4.1)

In formula (4.1), we assume that $f$ is a non-decreasing convex function. Interestingly, there exists a connection between equilibria in such games and equilibria in the matrix game defined by $A$.

Theorem 2.7 Any Nash equilibrium in the matrix game defined by the matrix $A$ gives an equilibrium for the corresponding linear convex game.

Proof: Let $(x^*, y^*)$ be a Nash equilibrium in the matrix game. By the definition,

$$\sum_{i=1}^{m}\sum_{j=1}^{n} a(i, j)x_i y_j^* \le \sum_{i=1}^{m}\sum_{j=1}^{n} a(i, j)x_i^* y_j^* \le \sum_{i=1}^{m}\sum_{j=1}^{n} a(i, j)x_i^* y_j \quad \forall x, y.$$ (4.2)

The convexity of $f$ implies that

$$H(x^*, y) = \sum_{i=1}^{m} x_i^* f\left(\sum_{j=1}^{n} a(i, j)y_j\right) \ge f\left(\sum_{i=1}^{m} x_i^* \sum_{j=1}^{n} a(i, j)y_j\right).$$

This result, the monotonicity of $f$, and inequality (4.2) lead to

$$f\left(\sum_{i=1}^{m} x_i^* \sum_{j=1}^{n} a(i, j)y_j\right) \ge f\left(\sum_{i=1}^{m} x_i^* \sum_{j=1}^{n} a(i, j)y_j^*\right).$$
(4.3)

Now, notice that the left-hand inequality in (4.2) holds true for arbitrary $x$, in particular, for all pure strategies of player I:

$$\sum_{j=1}^{n} a(i, j)y_j^* \le \sum_{i=1}^{m}\sum_{j=1}^{n} a(i, j)x_i^* y_j^*, \quad i = 1, \ldots, m.$$

The monotonicity of $f$ gives

$$f\left(\sum_{j=1}^{n} a(i, j)y_j^*\right) \le f\left(\sum_{i=1}^{m}\sum_{j=1}^{n} a(i, j)x_i^* y_j^*\right), \quad i = 1, \ldots, m.$$

By multiplying these inequalities by $x_i$ and summing up the resulting expressions, we arrive at

$$\sum_{i=1}^{m} x_i f\left(\sum_{j=1}^{n} a(i, j)y_j^*\right) \le f\left(\sum_{i=1}^{m}\sum_{j=1}^{n} a(i, j)x_i^* y_j^*\right).$$ (4.4)

It follows from (4.3) and (4.4) that the inequalities

$$H(x^*, y) \ge f\left(\sum_{i=1}^{m}\sum_{j=1}^{n} a(i, j)x_i^* y_j^*\right) \ge H(x, y^*)$$

hold for arbitrary $x$, $y$. Setting $x = x^*$, $y = y^*$ shows that the middle term equals $H(x^*, y^*)$. This immediately implies that $(x^*, y^*)$ is an equilibrium in the game under consideration.

We underline that the converse proposition fails. For instance, if $f$ is a constant function, then any strategy profile $(x, y)$ forms an equilibrium in this game (but the matrix game may have a specific set of equilibria).

Linear convex games arise in resource allocation problems. As an example, study the city defense problem.

The city defense problem. Imagine the following situation. Player I (Colonel Blotto) attacks a city using tanks, whereas player II conducts a defense by anti-tank artillery. Consider a game where player I must allocate his resources between light and heavy tanks, and player II distributes his resources between light and heavy artillery. For simplicity, suppose that the resources of both players equal unity.

Define the efficiency of different weaponry. Let the rate of fire of heavy artillery units be three times higher than that of light artillery ones. In addition, light artillery units must open fire on heavy tanks five times more quickly than heavy artillery units do. The survival probabilities of tanks are specified by the table below.

$$\begin{array}{l|cc} & \text{Light artillery} & \text{Heavy artillery} \\ \hline \text{Light tanks} & 1/2 & 1/4 \\ \text{Heavy tanks} & 3/4 & 1/2 \end{array}$$

Suppose that player I has $x$ light and $1 - x$ heavy tanks.
On the other hand, player II organizes city defense by $y$ light and $1 - y$ heavy artillery units. After a battle, Colonel Blotto will possess the following average number of tanks:

$$H(x, y) = x\left(\frac{1}{2}\right)^{\alpha y}\left(\frac{1}{4}\right)^{\beta(1-y)/3} + (1 - x)\left(\frac{3}{4}\right)^{5\alpha y}\left(\frac{1}{2}\right)^{\beta(1-y)/3}.$$

Here $\alpha$ and $\beta$ are certain parameters of the problem. Rewrite the function $H(x, y)$ as

$$H(x, y) = x\exp\left[-\ln 2\left(\alpha y + \tfrac{2}{3}\beta(1 - y)\right)\right] + (1 - x)\exp\left[-5\alpha y\ln\tfrac{4}{3} - \tfrac{1}{3}\beta(1 - y)\ln 2\right].$$

Clearly, this game is linear convex with the function $f(x) = \exp[x]$. An equilibrium follows from solving the matrix game described by the matrix

$$-\ln 2\begin{pmatrix} \alpha & \frac{2}{3}\beta \\ 5\alpha\left(2 - \ln 3/\ln 2\right) & \frac{1}{3}\beta \end{pmatrix}.$$

For instance, if $\alpha = 1$, $\beta = 2$, the optimal strategy of player I becomes $x^* \approx 0.809$ (accordingly, the optimal strategy of player II is $y^* \approx 0.383$). In this case, the game has the value $v \approx 0.433$. Thus, under optimal behavior, Colonel Blotto will have less than 50% of his tanks available after a battle.

2.5 Convex games

Assume that a payoff function is continuous and concave in $x$ or convex in $y$. According to the theory of continuous games, generally an equilibrium exists in the class of mixed strategies. However, in the convex case, we can establish the structure of optimal mixed strategies. The following theorem (Helly's theorem) from convex analysis is useful here.

Theorem 2.8 Let $S$ be a family of compact convex sets in $R^m$ whose number is not smaller than $m + 1$. Moreover, suppose that the intersection of any $m + 1$ sets from this family is non-empty. Then there exists a point belonging to all sets.

Proof: First, suppose that $S$ contains a finite number of sets. We argue by induction. If $S$ consists of $m + 1$ sets, the above assertion clearly holds true. Next, assume its validity for any family of $k \ge m + 1$ sets; we have to demonstrate this result for $S$ comprising $k + 1$ sets. Denote $S = \{X_1, \ldots, X_{k+1}\}$ and consider $k + 1$ new families of the form $S \setminus X_i$, $i = 1, \ldots, k + 1$.
Each family $S \setminus X_i$ consists of $k$ sets; owing to the induction hypothesis, there exists a certain point $x_i$ belonging to all sets from the family $S$ except the set $X_i$, $i = 1, \ldots, k + 1$. The number of such points is $k + 1 \ge m + 2$. Now, take the system of $m + 1$ linear equations

$$\sum_{i=1}^{k+1} \lambda_i x_i = 0, \quad \sum_{i=1}^{k+1} \lambda_i = 0.$$ (5.1)

The number of unknowns $\lambda_i$, $i = 1, \ldots, k + 1$ exceeds the number of equations; hence, this system possesses a non-zero solution. Decompose the numbers $\lambda_i$ into two groups, depending on their signs. Without loss of generality, we assume that $\lambda_i > 0$, $i = 1, \ldots, l$ and $\lambda_i \le 0$, $i = l + 1, \ldots, k + 1$. By virtue of formula (5.1),

$$\sum_{i=1}^{l} \lambda_i = -\sum_{i=l+1}^{k+1} \lambda_i = \lambda > 0,$$

and

$$x = \sum_{i=1}^{l} \frac{\lambda_i}{\lambda} x_i = \sum_{i=l+1}^{k+1} \frac{-\lambda_i}{\lambda} x_i.$$ (5.2)

According to the construction procedure, for $i = 1, \ldots, l$ all $x_i \in X_{l+1}, \ldots, X_{k+1}$. The convexity of these sets implies that the convex combination $x = \sum_{i=1}^{l} \frac{\lambda_i}{\lambda} x_i$ belongs to the intersection of these sets. Similarly, for all $i = l + 1, \ldots, k + 1$ we have $x_i \in X_1, \ldots, X_l$, ergo, $x \in \bigcap_{i=1}^{l} X_i$. And so, there exists a point $x$ belonging to all sets $X_i$, $i = 1, \ldots, k + 1$.

Thus, if $S$ is a finite family, Theorem 2.8 is proved. In what follows, we establish it for an arbitrary family. Let a family $S = \{S_\alpha\}$ be such that any finite subfamily of it possesses a non-empty intersection. Choose some set $X$ from this family and consider the new family $S_\alpha^x = S_\alpha \cap X$. It also consists of compact convex sets.

Assume that $\bigcap_\alpha S_\alpha^x = \emptyset$. Then, denoting complements by an overbar, we have $\overline{\bigcap_\alpha S_\alpha^x} = \bigcup_\alpha \overline{S_\alpha^x} = R^m$. Due to the compactness of $X$, a finite subcovering $\{\overline{S_{\alpha_i}^x}, i = 1, \ldots, r\}$ of the set $X$ can be extracted from the infinite covering $\{\overline{S_\alpha^x}\}$. However, in this case, the finite family $\{S_{\alpha_i}^x, i = 1, \ldots, r\}$ has an empty intersection, which contradicts the premise. Consequently, $\bigcap_\alpha S_\alpha^x = \bigcap_\alpha S_\alpha$ is non-empty.
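The simplest instance of Theorem 2.8 is the case $m = 1$: compact convex sets on the real line are closed segments, and the theorem states that segments which intersect pairwise (i.e., any $m + 1 = 2$ of them) have a common point. The sketch below illustrates this special case only; the function names and sample segments are hypothetical and do not come from the original text.

```python
def pairwise_intersect(segments):
    """True if every two closed segments (a, b) have a common point."""
    return all(max(a1, a2) <= min(b1, b2)
               for i, (a1, b1) in enumerate(segments)
               for (a2, b2) in segments[i + 1:])

def common_point(segments):
    """Return a point lying in all closed segments, or None if there is none."""
    left = max(a for a, _ in segments)     # largest left endpoint
    right = min(b for _, b in segments)    # smallest right endpoint
    return left if left <= right else None

# Pairwise intersecting segments: Helly's theorem (m = 1) promises a common point.
family = [(0.0, 3.0), (1.0, 4.0), (2.0, 5.0), (2.5, 6.0)]
point = common_point(family)

# If some pair is disjoint, the premise fails and no common point exists.
broken = [(0.0, 1.0), (2.0, 3.0), (0.0, 3.0)]
```

Here the largest left endpoint does not exceed the smallest right endpoint exactly when every pair of segments intersects, which is the one-dimensional content of the theorem.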
Theorem 2.9 Let $X \subset R^m$ and $Y \subset R^n$ be compact sets, let $Y$ be convex, and let the function $H(x, y)$ be continuous in both arguments and convex in $y$. Then player II possesses an optimal pure strategy, whereas the optimal strategy of player I belongs to the class of mixed strategies and is concentrated in at most $(m + 1)$ points of the set $X$. Moreover, the game has the value

$$v = \max_{x_1, \ldots, x_{m+1}} \min_y \max\{H(x_1, y), \ldots, H(x_{m+1}, y)\} = \min_y \max_x H(x, y).$$

Proof: Introduce the function

$$h(x_1, \ldots, x_{m+1}, y) = \max\{H(x_1, y), \ldots, H(x_{m+1}, y)\}, \quad x_i \in X, \ i = 1, \ldots, m + 1, \quad y \in Y.$$

It is continuous in both arguments. Indeed,

$$|h(x_1', \ldots, x_{m+1}', y') - h(x_1'', \ldots, x_{m+1}'', y'')| = |H(x_{i_1}', y') - H(x_{i_2}'', y'')|,$$ (5.3)

where

$$H(x_{i_1}', y') = \max\{H(x_1', y'), \ldots, H(x_{m+1}', y')\}, \quad H(x_{i_2}'', y'') = \max\{H(x_1'', y''), \ldots, H(x_{m+1}'', y'')\}.$$

In formula (5.3), we have either

$$H(x_{i_1}', y') \ge H(x_{i_2}'', y''),$$ (5.4)

or the inverse inequality. For definiteness, suppose that (5.4) holds true (in the second case, the reasoning is analogous).

The function $H(x, y)$ is continuous on the compact set $X \times Y$, ergo, uniformly continuous. And so, for any $\epsilon > 0$ there exists $\delta > 0$ such that, if $\|x_i' - x_i''\| < \delta$, $i = 1, \ldots, m + 1$ and $\|y' - y''\| < \delta$, then

$$0 \le H(x_{i_1}', y') - H(x_{i_2}'', y'') \le H(x_{i_1}', y') - H(x_{i_1}'', y'') < \epsilon.$$

This proves continuity of the function $h$.

The established fact directly implies the existence of

$$w = \max_{x_1, \ldots, x_{m+1}} \min_y h(x_1, \ldots, x_{m+1}, y) = \max_{x_1, \ldots, x_{m+1}} \min_y \max\{H(x_1, y), \ldots, H(x_{m+1}, y)\}.$$

So long as $\min_y H(x, y) \le \max_\mu \min_y H(\mu, y)$ for any $x$, we obtain the inequality $w \le v$.

To derive the inverse inequality, consider the infinite family of sets $S_x = \{y : H(x, y) \le w\}$. Since $H(x, y)$ is convex in $y$ and continuous, all sets from this family are convex and compact. Note that any finite subsystem of this family, consisting of $m + 1$ sets $S_{x_i} = \{y : H(x_i, y) \le w\}$, $i = 1, \ldots, m + 1$, possesses a common point.
Indeed, if $x_1, \ldots, x_{m+1}$ are fixed, the function $h(x_1, \ldots, x_{m+1}, y)$ attains its minimum in $y$ at some point $\bar{y}$ (due to continuity of $h$ and compactness of $Y$). Consequently,

$$w = \max_{x_1, \ldots, x_{m+1}} \min_y h(x_1, \ldots, x_{m+1}, y) \ge \max\{H(x_1, \bar{y}), \ldots, H(x_{m+1}, \bar{y})\} \ge H(x_i, \bar{y}), \quad i = 1, \ldots, m + 1,$$

i.e., $\bar{y}$ belongs to all $S_{x_i}$, $i = 1, \ldots, m + 1$. Helly's theorem claims the existence of a point $y^*$ such that

$$H(x, y^*) \le w, \quad \forall x \in X.$$

Hence, $\max_x H(x, y^*) \le w$, which means that $v \le w$. Therefore, we have shown that $v = w$ or

$$v = \max_{x_1, \ldots, x_{m+1}} \min_y \max\{H(x_1, y), \ldots, H(x_{m+1}, y)\}.$$

Suppose that the maximal value corresponds to the points $\bar{x}_1, \ldots, \bar{x}_{m+1}$. Then

$$v = \min_y \max\{H(\bar{x}_1, y), \ldots, H(\bar{x}_{m+1}, y)\} = \min_y \max_{\bar{\mu}} \sum_{i=1}^{m+1} H(\bar{x}_i, y)\mu_i,$$

where $\bar{\mu} = (\mu_1, \ldots, \mu_{m+1})$ is a discrete distribution located at the points $\bar{x}_1, \ldots, \bar{x}_{m+1}$. A concave-convex game on a compact set with the payoff function

$$H(\bar{\mu}, y) = \sum_{i=1}^{m+1} H(\bar{x}_i, y)\mu_i$$

admits a Nash equilibrium $(\bar{\mu}^*, y^*)$ such that

$$\max_{\bar{\mu}} \min_y H(\bar{\mu}, y) = \min_y \max_{\bar{\mu}} H(\bar{\mu}, y) = v.$$

This expression confirms the optimal character of the pure strategy $y^*$ and the mixed strategy $\bar{\mu}^*$ located at $m + 1$ points, since

$$H(\bar{\mu}^*, y^*) = \min_y H(\bar{\mu}^*, y) \le H(\bar{\mu}^*, y), \quad \forall y,$$

$$H(\bar{\mu}^*, y^*) = \max_{\bar{\mu}} H(\bar{\mu}, y^*) = \max_\mu H(\mu, y^*) \ge H(x, y^*), \quad \forall x.$$

The proof of Theorem 2.9 is finished.

Corollary 2.1 Consider a convex game where the strategy sets of the players are segments of the real line. Player II has an optimal pure strategy, whereas the optimal strategy of player I is either pure or a probabilistic mixture of two pure strategies.

A similar result applies to a concave game.

Theorem 2.10 Let $X \subset R^m$ and $Y \subset R^n$ be compact sets, let $X$ be convex, and let the function $H(x, y)$ be continuous in both arguments and concave in $x$. Then player I possesses an optimal pure strategy, whereas the optimal strategy of player II belongs to the class of mixed strategies and is concentrated in at most $(n + 1)$ points of the set $Y$.
Moreover, the game has the value

$$v = \min_{y_1,\dots,y_{n+1}} \max_x \min\{H(x, y_1), \dots, H(x, y_{n+1})\} = \max_x \min_y H(x, y).$$

2.6 Arbitration procedures

Consider a two-player game involving player I (the Company Trade Union) and player II (the Company Manager). The participants of this game have to negotiate a raise for company employees. Each player submits some offer ($x$ and $y$, respectively). In the case of a conflict (i.e., $x > y$), both sides go to an arbitration court, which must support one of the players. There exist various arbitration procedures, namely, final-offer arbitration, conventional arbitration, bonus/penalty arbitration, as well as their combinations.

We begin the analysis with final-offer arbitration. Without a conflict (if $x \le y$), this procedure leads to a successfully negotiated raise in the interval between $x$ and $y$. For the sake of definiteness, suppose that the negotiated raise makes up $(x + y)/2$. In the case of $x > y$, the sides address a third party (call it an arbitrator). The arbitrator possesses a specific opinion $\alpha$ and takes the side whose offer is closer to $\alpha$.

Actually, we have described a game $\Gamma = \langle I, II, R^1, R^1, H_\alpha \rangle$ with the payoff function

$$H_\alpha(x, y) = \begin{cases} \dfrac{x + y}{2}, & \text{if } x \le y, \\ x, & \text{if } x > y,\ |x - \alpha| < |y - \alpha|, \\ y, & \text{if } x > y,\ |x - \alpha| > |y - \alpha|, \\ \alpha, & \text{if } x > y,\ |x - \alpha| = |y - \alpha|. \end{cases} \tag{6.1}$$

The parameter $\alpha$ being fixed, an equilibrium lies in the pair of strategies $(\alpha, \alpha)$. However, the problem becomes non-trivial when the arbitrator may modify his opinion.

Consider the non-deterministic case, i.e., $\alpha$ is a random variable with some continuous distribution function $F(a)$, $a \in R^1$. Suppose that both players know $F(a)$ and that the corresponding density function $f(a)$ exists. If $y < x$, the arbitrator accepts the offer $y$ when $\alpha < (x + y)/2$ (see the payoff function formula and Figure 2.3). Otherwise, he accepts the offer $x$. Player I has the expected payoff $H(x, y) = EH_\alpha(x, y)$:

$$H(x, y) = F\left(\frac{x + y}{2}\right) y + \left(1 - F\left(\frac{x + y}{2}\right)\right) x.$$
(6.2)

To evaluate the minimax strategies, differentiate (6.2):

$$\frac{\partial H}{\partial x} = 1 - F\left(\frac{x + y}{2}\right) + \frac{y - x}{2} f\left(\frac{x + y}{2}\right) = 0,$$

$$\frac{\partial H}{\partial y} = F\left(\frac{x + y}{2}\right) + \frac{y - x}{2} f\left(\frac{x + y}{2}\right) = 0.$$

The difference of these equations yields

$$F\left(\frac{x + y}{2}\right) = \frac{1}{2},$$

i.e., the point $(x + y)/2$ coincides with the median of the distribution $F$. Hence, $(x + y)/2 = m_F$. On the other hand, summing up these equations gives

$$(x - y) f(m_F) = 1.$$

[Figure 2.3: An arbitration game. The offers $y < x$ of players II and I and their midpoint $(x + y)/2$ on the axis $a$.]

Therefore, if a pure strategy equilibrium does exist, it acquires the form

$$x = m_F + \frac{1}{2f(m_F)}, \quad y = m_F - \frac{1}{2f(m_F)}, \tag{6.3}$$

and the game has the value $m_F$. The following sufficient condition guarantees that (6.3) is an equilibrium:

$$H\left(x, m_F - \frac{1}{2f(m_F)}\right) \le m_F, \quad \forall x \ge m_F, \tag{6.4}$$

$$H\left(m_F + \frac{1}{2f(m_F)}, y\right) \ge m_F, \quad \forall y \le m_F. \tag{6.5}$$

Recall that $m_F$ is the median of the distribution $F$ and rewrite (6.4) as

$$\left(\frac{1}{2} + \int_{m_F}^{U(x)} f(a)\,da\right)\left(x - m_F + \frac{1}{2f(m_F)}\right) \ge x - m_F,$$

where $U(x) = \left(x + m_F - \frac{1}{2f(m_F)}\right)/2$, or

$$\int_{m_F}^{U(x)} f(a)\,da \ge \frac{x - m_F - 1/(2f(m_F))}{2\,(x - m_F + 1/(2f(m_F)))}, \quad \forall x > m_F. \tag{6.6}$$

By analogy, the condition (6.5) can be reexpressed as

$$\int_{V(y)}^{m_F} f(a)\,da \ge \frac{y - m_F + 1/(2f(m_F))}{2\,(y - m_F - 1/(2f(m_F)))}, \quad \forall y < m_F. \tag{6.7}$$

Here $V(y) = \left(y + m_F + \frac{1}{2f(m_F)}\right)/2$.

Theorem 2.11 Consider the final-offer arbitration procedure and let the distribution $F(a)$ satisfy the conditions (6.6)–(6.7). Then a Nash equilibrium exists in the class of pure strategies and takes the form

$$x = m_F + \frac{1}{2f(m_F)}, \quad y = m_F - \frac{1}{2f(m_F)}.$$

For instance, suppose that $F(a)$ is the Gaussian distribution with the parameters $\bar a$ and $\sigma$. The median coincides with $\bar a$ and $f(\bar a) = 1/(\sigma\sqrt{2\pi})$; according to (6.3), the optimal offers of the players become

$$x = \bar a + \sigma\sqrt{\pi/2}, \quad y = \bar a - \sigma\sqrt{\pi/2}.$$

Under the uniform distribution $F(a)$ on a segment $[c, d]$, the median is $(c + d)/2$ and the optimal offers of the players correspond to the ends of this segment:

$$x = \frac{c + d}{2} + \frac{d - c}{2} = d, \quad y = \frac{c + d}{2} - \frac{d - c}{2} = c.$$
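The Gaussian example can be checked numerically. The sketch below is only an illustration under stated assumptions: it takes $\bar a = 0$ and $\sigma = 1$, evaluates the expected payoff (6.2) on a finite grid of offers, and uses helper names (`expected_payoff`, `candidate_offers`) that are not part of the text. It computes the candidate offers (6.3) and checks that no unilateral deviation on the grid improves on the value $m_F$.

```python
import math

def normal_cdf(a, mean=0.0, sigma=1.0):
    """Gaussian distribution function F(a)."""
    return 0.5 * (1.0 + math.erf((a - mean) / (sigma * math.sqrt(2.0))))

def normal_pdf(a, mean=0.0, sigma=1.0):
    """Gaussian density f(a)."""
    z = (a - mean) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def expected_payoff(x, y, cdf):
    """Expected payoff (6.2) of player I; without a conflict the raise is (x + y)/2."""
    if x <= y:
        return 0.5 * (x + y)
    p = cdf(0.5 * (x + y))  # probability that the arbitrator chooses the offer y
    return p * y + (1.0 - p) * x

def candidate_offers(median, pdf):
    """Candidate equilibrium offers (6.3)."""
    half_gap = 1.0 / (2.0 * pdf(median))
    return median + half_gap, median - half_gap

# Assumed example parameters (not from the text): mean 0, sigma 1.
mean, sigma = 0.0, 1.0
cdf = lambda a: normal_cdf(a, mean, sigma)
pdf = lambda a: normal_pdf(a, mean, sigma)
x_star, y_star = candidate_offers(mean, pdf)

# Unilateral deviations on a grid of offers in [mean - 10, mean + 10].
grid = [mean + 0.01 * k for k in range(-1000, 1001)]
best_for_I = max(expected_payoff(x, y_star, cdf) for x in grid)
best_for_II = min(expected_payoff(x_star, y, cdf) for y in grid)
print(x_star, y_star, best_for_I, best_for_II)
```

A grid check of this kind does not replace the conditions (6.6)–(6.7); it merely makes the saddle-point property of (6.3) visible for one particular distribution.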
Interestingly, games with the payoff function (6.2) need not be concave-convex. Moreover, if the distribution $F$ is discontinuous, the payoff function (6.2) has discontinuities as well. Therefore, this game may have no pure strategy equilibrium. Hence, once the strategies (6.3) have been evaluated, one should verify the equilibrium conditions. Below we will demonstrate that, even for simple distributions, the final-offer arbitration procedure admits a mixed strategy equilibrium.

Conventional arbitration. In contrast to final-offer arbitration (where an arbitrator chooses one of the submitted offers), conventional arbitration settles a conflict through the arbitrator's own opinion. Of course, his opinion depends on the offers of both players. In what follows, we study the combined arbitration procedure proposed by S.J. Brams and S. Merrill. In this procedure, the arbitrator's opinion $\alpha$ again is a random variable with a known distribution function $F(a)$ and density function $f(a)$. The Company Trade Union (player I) and the Company Manager (player II) submit their offers, $x$ and $y$. If $\alpha$ belongs to the offer interval, the conflict is settled by final-offer arbitration; otherwise, the arbitrator imposes his own decision. Therefore, we obtain a game $\Gamma = \langle I, II, R^1, R^1, H \rangle$ with the payoff function $H(x, y) = EH_\alpha(x, y)$, where

$$H_\alpha(x, y) = \begin{cases} \dfrac{x + y}{2}, & \text{if } x \le y, \\ x, & \text{if } x > \alpha > y,\ x - \alpha < \alpha - y, \\ y, & \text{if } x > \alpha > y,\ x - \alpha > \alpha - y, \\ \alpha, & \text{otherwise.} \end{cases} \tag{6.8}$$

Suppose that the density function $f(a)$ is a symmetrical unimodal function (i.e., it possesses a unique maximum). We demonstrate that (a) an equilibrium exists in the class of pure strategies and (b) the optimal strategy of both players is $m_F$, the median of the distribution $F(a)$.

Let player II apply the strategy $y = m_F$. According to the arbitration rules, the payoff $H(x, m_F)$ of player I equals $(x + m_F)/2$ if $x < m_F$. This is smaller than the payoff gained by the strategy $x = m_F$.
In the case of $x \ge m_F$, formula (6.8) implies that the payoff becomes

$$H(x, m_F) = \int_{-\infty}^{m_F} a\,dF(a) + \int_{x}^{\infty} a\,dF(a) + \int_{m_F}^{\frac{x + m_F}{2}} m_F\,dF(a) + \int_{\frac{x + m_F}{2}}^{x} x\,dF(a)$$

$$= m_F - \int_{m_F}^{\frac{x + m_F}{2}} (a - m_F)\,dF(a) + \int_{\frac{x + m_F}{2}}^{x} (x - a)\,dF(a)$$

(by symmetry, the mean of the distribution $F$ coincides with its median $m_F$). The last expression does not exceed $m_F$, since the function

$$g(x) = -\int_{m_F}^{\frac{x + m_F}{2}} (a - m_F)\,dF(a) + \int_{\frac{x + m_F}{2}}^{x} (x - a)\,dF(a)$$

vanishes at $x = m_F$ and is non-increasing. Indeed, its derivative takes the form

$$g'(x) = -\frac{x - m_F}{2} f\left(\frac{x + m_F}{2}\right) + \int_{\frac{x + m_F}{2}}^{x} f(a)\,da \le 0$$

(the value of $f(a)$ at the point $(x + m_F)/2$ is not smaller than at the points $a \in [(x + m_F)/2, x]$, and this interval has length $(x - m_F)/2$).

Consequently, we have $H(x, m_F) \le m_F$ for all $x \in R^1$. Hence, the best response of player I also is $m_F$. Similar arguments cover the behavior of player II. Thus, the arbitration procedure in question also admits a pure strategy equilibrium coinciding with the median of the distribution $F(a)$.

Theorem 2.12 Consider the conventional arbitration procedure. If the density function $f(a)$ is symmetrical and unimodal, the arbitration game has a Nash equilibrium consisting of identical pure strategies $m_F$.

Penalty arbitration. In the arbitration procedures studied above, each player may submit any offer (including offers discriminating against the opponent). To avoid such situations, an arbitrator can apply penalty arbitration procedures. Let us analyze the scheme introduced by Zeng (2003). The arbitrator's opinion $\alpha$ is a random variable with a known distribution function $F(a)$ and density function $f(a)$. Denote by $E$ the expected value $E = E\alpha = \int_{R^1} a\,dF(a)$. The Company Trade Union (player I) and the Company Manager (player II) submit their offers $x$ and $y$, respectively. The arbitrator follows the conventional mechanism, but adds some quantity to his decision. This quantity depends on the announced offers and can be interpreted as a player's penalty.
Suppose that the arbitrator has the decision $a$. If $|x - a| < |y - a|$, the arbitrator supports player I and "penalizes" player II by the quantity $a - y$. In other words, the arbitrator's decision becomes $a + (a - y) = 2a - y$. The penalty is higher for greater differences between the arbitrator's opinion and the offer of player II. In the case of $|x - a| > |y - a|$, the arbitrator "penalizes" player I, and his decision makes up $a - (x - a) = 2a - x$. Hence, this arbitration game has the payoff function $H(x, y) = EH_\alpha(x, y)$, where

$$H_\alpha(x, y) = \begin{cases} \dfrac{x + y}{2}, & \text{if } x \le y, \\ 2\alpha - y, & \text{if } x > y,\ |x - \alpha| < |\alpha - y|, \\ 2\alpha - x, & \text{if } x > y,\ |x - \alpha| > |\alpha - y|, \\ \alpha, & \text{if } x > y,\ |x - \alpha| = |\alpha - y|. \end{cases} \tag{6.9}$$

Theorem 2.13 Consider the penalty arbitration procedure with the payoff function (6.9). It admits a unique pure strategy Nash equilibrium, which consists of identical pure strategies $E$.

Proof: First, we demonstrate that the strategy profile $(E, E)$ forms an equilibrium. Suppose that the players choose pure strategies $x$ and $y$ such that $x > y$. Then the arbitration game leads to the payoff

$$H(x, y) = \int_{-\infty}^{\frac{x + y}{2} - 0} (2a - x)\,dF(a) + \int_{\frac{x + y}{2} + 0}^{\infty} (2a - y)\,dF(a) + \frac{x + y}{2}\left[F\left(\frac{x + y}{2}\right) - F\left(\frac{x + y}{2} - 0\right)\right]. \tag{6.10}$$

This formula takes into account that the distribution function is right continuous and that the point $(x + y)/2$ may be a discontinuity point of $F(a)$. Rewrite (6.10) as

$$H(x, y) = \int_{-\infty}^{\infty} 2a\,dF(a) - 2\,\frac{x + y}{2}\left(F\left(\frac{x + y}{2}\right) - F\left(\frac{x + y}{2} - 0\right)\right) - xF\left(\frac{x + y}{2} - 0\right) - y\left(1 - F\left(\frac{x + y}{2}\right)\right) + \frac{x + y}{2}\left[F\left(\frac{x + y}{2}\right) - F\left(\frac{x + y}{2} - 0\right)\right]$$

$$= 2E - y - \frac{x - y}{2}\left[F\left(\frac{x + y}{2}\right) + F\left(\frac{x + y}{2} - 0\right)\right]. \tag{6.11}$$

Assume that player II employs the pure strategy $y = E$. The expression (6.11) implies that, for $x \ge E$, the payoff of player I is defined by

$$H(x, E) = E - \frac{x - E}{2}\left[F\left(\frac{x + E}{2}\right) + F\left(\frac{x + E}{2} - 0\right)\right] \le E.$$

In the case of $x < E$, we have

$$H(x, E) = \frac{x + E}{2} < E.$$

By analogy, one easily verifies that $H(E, y) \ge E$ for all $y \in R^1$.
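The two inequalities just obtained can be illustrated numerically. The following sketch is not taken from the text: it assumes a particular asymmetric distribution (exponential with rate 1, so that $E = 1$ differs from the median) and evaluates the closed form (6.11), which for a continuous $F$ reduces to $H(x, y) = 2E - y - (x - y)F((x + y)/2)$ when $x > y$. The grid bounds and the names `payoff`, `MEAN` are illustrative choices.

```python
import math

MEAN = 1.0  # E for the exponential distribution with rate 1 (assumed example)

def cdf(a):
    """Distribution function of the assumed exponential law."""
    return 1.0 - math.exp(-a) if a > 0.0 else 0.0

def payoff(x, y):
    """Expected payoff of player I: (6.11) for continuous F, (x + y)/2 without a conflict."""
    if x <= y:
        return 0.5 * (x + y)
    return 2.0 * MEAN - y - (x - y) * cdf(0.5 * (x + y))

# Unilateral deviations from the profile (E, E) on a grid of offers in [-5, 10].
grid = [0.05 * k for k in range(-100, 201)]
max_dev_I = max(payoff(x, MEAN) for x in grid)   # best payoff player I can reach against y = E
min_dev_II = min(payoff(MEAN, y) for y in grid)  # lowest payoff player II can enforce against x = E
print(payoff(MEAN, MEAN), max_dev_I, min_dev_II)
```

For this distribution neither player gains by deviating from $E$ on the grid, in line with the first part of the proof; the check says nothing about uniqueness.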
Second, we establish the uniqueness of the equilibrium $(E, E)$. Suppose that there exists another pure strategy equilibrium $(x, y)$. In an equilibrium, the condition $x < y$ cannot hold (otherwise, player I would raise his payoff by increasing the offer within the interval $[x, y]$). Suppose that $x > y$. Let $z$ designate the midpoint of the interval $[y, x]$. If $F(z) = 0$, formula (6.11) gives

$$H(x, y) - H(x, z) = (z - y) + \frac{x - z}{2}\left[F\left(\frac{x + z}{2}\right) + F\left(\frac{x + z}{2} - 0\right)\right] - \frac{x - y}{2}\left[F(z) + F(z - 0)\right].$$

However, $F(z) = 0$, which immediately gives $F(z - 0) = 0$. Therefore,

$$H(x, y) - H(x, z) = (z - y) + \frac{x - z}{2}\left[F\left(\frac{x + z}{2}\right) + F\left(\frac{x + z}{2} - 0\right)\right] > 0.$$

Hence, the strategy $z$ dominates the strategy $y$, and $y$ is not the optimal strategy of player II.

In the case of $F(z) > 0$, study the difference

$$H(z, y) - H(x, y) = \frac{x - y}{2}\left[F(z) + F(z - 0)\right] - \frac{z - y}{2}\left[F\left(\frac{z + y}{2}\right) + F\left(\frac{z + y}{2} - 0\right)\right].$$

Since $x - y = 2(z - y)$, we obtain

$$H(z, y) - H(x, y) = (z - y)\left[F(z) + F(z - 0)\right] - \frac{z - y}{2}\left[F\left(\frac{z + y}{2}\right) + F\left(\frac{z + y}{2} - 0\right)\right]$$

$$= \frac{z - y}{2}\left[2F(z) - F\left(\frac{z + y}{2}\right) + 2F(z - 0) - F\left(\frac{z + y}{2} - 0\right)\right] > 0,$$

since $F(z) > 0$ and $F(z) \ge F\left(\frac{z + y}{2}\right)$, $F(z - 0) \ge F\left(\frac{z + y}{2} - 0\right)$. Thus, here the strategy $z$ of player I dominates $x$, and $(x, y)$ is not an equilibrium. The equilibrium $(E, E)$ turns out to be unique. This concludes the proof of Theorem 2.13.

The above arbitration schemes possess pure strategy equilibria. This result is guaranteed by several assumptions concerning the distribution function $F(a)$ of the arbitrator's opinion. In what follows, we demonstrate the non-existence of pure strategy equilibria in a wider class of distributions and look for Nash equilibria among mixed strategies. At the same time, this approach provides new interesting applications of arbitration procedures: both sides must submit random offers, which brings a series of practical benefits.
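The absence of a pure strategy equilibrium for a discrete arbitrator can be illustrated with a small computation. The sketch below is an assumption-laden illustration, not the author's argument: it takes the final-offer payoff (6.1) with $\alpha$ equal to $-1$ or $1$ with probability $1/2$ each (the case treated in the next section), restricts both players to a finite symmetric grid of pure offers, and compares the maximin and minimax values on that grid.

```python
def payoff_alpha(x, y, alpha):
    """Final-offer payoff (6.1) for a fixed arbitrator's opinion alpha."""
    if x <= y:
        return 0.5 * (x + y)
    dx, dy = abs(x - alpha), abs(y - alpha)
    if dx < dy:
        return x
    if dx > dy:
        return y
    return alpha

def payoff(x, y):
    """Expected payoff when alpha is -1 or 1 with equal probabilities."""
    return 0.5 * payoff_alpha(x, y, -1.0) + 0.5 * payoff_alpha(x, y, 1.0)

# Pure offers restricted to a symmetric grid on [-6, 6] (an illustrative choice).
grid = [0.25 * k for k in range(-24, 25)]
lower = max(min(payoff(x, y) for y in grid) for x in grid)  # maximin of player I
upper = min(max(payoff(x, y) for x in grid) for y in grid)  # minimax of player II
print(lower, upper)
```

On this grid the maximin is strictly negative and the minimax strictly positive, so no pair of pure offers forms a saddle point there; this motivates the search for mixed strategies.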
2.7 Two-point discrete arbitration procedures

For the sake of simplicity, let $\alpha$ be a random variable taking the values $-1$ and $1$ with identical probabilities $p = 1/2$. The strategies of players I and II are arbitrary values $x, y \in R^1$. The payoff function in this game is the expectation of (6.1). This game has an equilibrium in the class of mixed strategies. We prove this fact rigorously. Mixed strategies are defined below either through distribution functions or through their density functions.

Owing to its symmetry, the game under consideration has zero value, and the optimal strategies of the players must be symmetrical with respect to the origin. Therefore, it suffices to construct an optimal strategy for one player (e.g., player II).

Denote by $F(y)$ the mixed strategy of player II. Assume that the support of the distribution $F$ lies on the negative semiaxis. Then the conditions of the game imply the following. For all $x < 0$, the payoff of player I satisfies $H(x, F) \le 0$. In the case of $x \ge 0$, his payoff becomes

$$H(x, F) = \frac{1}{2}\left[F(-2 - x)\,x + \int_{-2 - x}^{\infty} y\,dF(y)\right] + \frac{1}{2}\left[F(2 - x)\,x + \int_{2 - x}^{\infty} y\,dF(y)\right]. \tag{7.1}$$

We search for a distribution function $F(y)$ such that (a) its support belongs to the interval $[-c - 4, -c]$, where $0 < c < 1$, and (b) the payoff function $H(x, F)$ vanishes on the interval $[c, c + 4]$ and is negative for the remaining $x$. Figure 2.4 illustrates the idea.

[Figure 2.4: The payoff function $H(x, F)$. The curve $H = H(x, F)$ over the axis $x$, with the marked points $c$ and $c + 4$.]

According to (7.1), we have

$$H(x, F) = \frac{1}{2}\left[F(-2 - x)\,x + \int_{-2 - x}^{-c} y\,dF(y)\right] + \frac{1}{2}x, \quad x \in [c, c + 2]. \tag{7.2}$$

Since $H(x, F)$ is constant on the interval $[c, c + 2]$, we uniquely determine the distribution function $F(y)$. For this, differentiate (7.2) and equate the result to zero:

$$\frac{dH}{dx} = \frac{1}{2}\left[-F'(-2 - x)\,x + F(-2 - x) + (-2 - x)F'(-2 - x)\right] + \frac{1}{2} = 0. \tag{7.3}$$

Substitute $-2 - x = y$ into (7.3) to get the differential equation

$$2F'(y)(y + 1) = -[F(y) + 1], \quad y \in [-c - 4, -c - 2].$$
Its solution yields the distribution function $F(y)$ on the interval $[-c - 4, -c - 2]$:

$$F(y) = -1 + \frac{\text{const}}{\sqrt{-y - 1}}, \quad y \in [-c - 4, -c - 2].$$

Finally, the condition $F(-c - 4) = 0$ leads to

$$F(y) = -1 + \frac{\sqrt{3 + c}}{\sqrt{-y - 1}}, \quad y \in [-c - 4, -c - 2]. \tag{7.4}$$

On the interval $[c + 2, c + 4]$, the function $H(x, F)$ takes the form

$$H(x, F) = \frac{1}{2}\int_{-c - 4}^{-c} y\,dF(y) + \frac{1}{2}\left[F(2 - x)\,x + \int_{2 - x}^{-c} y\,dF(y)\right], \quad x \in [c + 2, c + 4]. \tag{7.5}$$

By requiring its constancy, we find

$$\frac{dH}{dx} = \frac{1}{2}\left[-F'(2 - x)\,x + F(2 - x) + (2 - x)F'(2 - x)\right] = 0.$$

Next, set $2 - x = y$ and obtain the differential equation

$$2F'(y)(y - 1) = -F(y), \quad y \in [-c - 2, -c].$$

The condition $F(-c) = 1$ leads to

$$F(y) = \frac{\sqrt{1 + c}}{\sqrt{1 - y}}, \quad y \in [-c - 2, -c]. \tag{7.6}$$

Let us demand continuity of the function $F(y)$. For this, match the functions (7.4) and (7.6) at the point $y = -c - 2$. The condition

$$\frac{\sqrt{1 + c}}{\sqrt{3 + c}} = -1 + \frac{\sqrt{3 + c}}{\sqrt{1 + c}}$$

generates the quadratic equation

$$(1 + c)(3 + c) = 4. \tag{7.7}$$

Its solution can be represented as

$$c = 2z - 1 \approx 0.236,$$

where $z$ indicates the "golden section" of the interval $[0, 1]$ (the positive solution of the quadratic equation $z^2 + z - 1 = 0$).

Therefore, we have constructed a continuous distribution function $F(y)$, $y \in [-c - 4, -c]$, such that the payoff function $H(x, F)$ of player I possesses a constant value on the interval $[c, c + 4]$. It forms the optimal strategy of player II, provided that the function $H(x, F)$ has the shape illustrated by Figure 2.4 (its curve does not rise above the abscissa axis).

The solution to this game is provided by

Theorem 2.14 In the discrete arbitration procedure, optimal strategies acquire the form

$$G(x) = \begin{cases} 0, & x \in (-\infty, c], \\ 1 - \dfrac{\sqrt{1 + c}}{\sqrt{x + 1}}, & x \in (c, c + 2], \\ 2 - \dfrac{\sqrt{3 + c}}{\sqrt{x - 1}}, & x \in (c + 2, c + 4], \\ 1, & x \in (c + 4, \infty), \end{cases} \tag{7.8}$$

$$F(y) = \begin{cases} 0, & y \in (-\infty, -c - 4], \\ -1 + \dfrac{\sqrt{3 + c}}{\sqrt{-y - 1}}, & y \in (-c - 4, -c - 2], \\ \dfrac{\sqrt{1 + c}}{\sqrt{1 - y}}, & y \in (-c - 2, -c], \\ 1, & y \in (-c, \infty), \end{cases} \tag{7.9}$$

where $c = \sqrt{5} - 2$.
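The statement of Theorem 2.14 lends itself to a direct numerical check. The sketch below is an illustration, not part of the original proof: it implements $F$ from (7.9), uses antiderivatives of $y\,F'(y)$ on the two pieces of the support (worked out for this example; the helper names `A_low`, `A_high`, `tail_mean` are hypothetical) and evaluates the payoff (7.1) of a pure offer $x \ge 0$ against $F$.

```python
import math

c = math.sqrt(5.0) - 2.0
K1, K3 = math.sqrt(1.0 + c), math.sqrt(3.0 + c)
LO, MID, HI = -c - 4.0, -c - 2.0, -c  # ends and junction point of the support of F

def F(y):
    """Distribution function (7.9) of player II."""
    if y <= LO:
        return 0.0
    if y <= MID:
        return -1.0 + K3 / math.sqrt(-y - 1.0)
    if y <= HI:
        return K1 / math.sqrt(1.0 - y)
    return 1.0

def A_low(y):
    """Antiderivative of y * F'(y) on [LO, MID]."""
    u = -y - 1.0
    return K3 * (math.sqrt(u) - 1.0 / math.sqrt(u))

def A_high(y):
    """Antiderivative of y * F'(y) on [MID, HI]."""
    u = 1.0 - y
    return K1 * (math.sqrt(u) + 1.0 / math.sqrt(u))

def tail_mean(t):
    """Integral of y dF(y) over [t, +infinity)."""
    if t >= HI:
        return 0.0
    if t >= MID:
        return A_high(HI) - A_high(t)
    t = max(t, LO)
    return (A_high(HI) - A_high(MID)) + (A_low(MID) - A_low(t))

def H(x):
    """Payoff (7.1) of a pure offer x >= 0 of player I against F."""
    total = 0.0
    for threshold in (-2.0 - x, 2.0 - x):
        total += 0.5 * (F(threshold) * x + tail_mean(threshold))
    return total

mean_offer = tail_mean(LO)  # mean value of F, expected to equal -c - 2
print(c, mean_offer, H(c), H(c + 2.0), H(c + 4.0), H(0.0), H(c + 5.0))
```

Within floating-point accuracy the payoff vanishes on $[c, c + 4]$ and is negative at the sample points outside this interval, which is the shape required of $H(x, F)$.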
To prove the theorem, it suffices to show that $H(x, F) \le 0$ for all $x \in R^1$. In the case of $x \le 0$, this inequality is obvious ($y < 0$ holds true almost surely and, due to (6.1), $H(x, F)$ is negative). Recall that, within the interval $[c, c + 4]$, the function $H(x, F)$ possesses a constant value. Let us find it. Formula (7.2) yields

$$H(x, F) = H(c + 2, F) = \frac{1}{2}\left[F(-c - 4)(c + 2) + \int_{-c - 4}^{-c} y\,dF(y)\right] + \frac{1}{2}(c + 2) = \frac{1}{2}(\bar y + c + 2), \quad x \in [c, c + 2], \tag{7.10}$$

where $\bar y$ indicates the mean value of the random variable $y$ having the distribution (7.4), (7.6). By performing simple computations, we arrive at

$$\bar y = \int_{-c - 4}^{-c - 2} y\,d\left(\frac{\sqrt{3 + c}}{\sqrt{-y - 1}}\right) + \int_{-c - 2}^{-c} y\,d\left(\frac{\sqrt{1 + c}}{\sqrt{1 - y}}\right) = -c - 2.$$

It follows from (7.10) that

$$H(x, F) = 0, \quad x \in [c, c + 2].$$

Similarly, one obtains $H(x, F) = 0$ on the interval $[c + 2, c + 4]$.

If $x \in [c + 4, c + 6]$, the function $H(x, F)$ acquires the form (7.5). After the substitution $2 - x = y$, its derivative becomes

$$\frac{dH}{dx} = \frac{1}{2}\left[F'(2 - x)(2 - 2x) + F(2 - x)\right] = \frac{1}{2}\left[F'(y)(2y - 2) + F(y)\right].$$

Using the expression (7.4) for $F$, it is possible to get

$$\frac{dH}{dx} = \frac{1}{2}\left[\frac{\sqrt{3 + c}}{\sqrt{-y - 1}}\,\frac{2}{y + 1} - 1\right], \quad y \in [-c - 4, -c - 2]. \tag{7.11}$$

Since $y + 1 < 0$ on this interval, both terms in the square brackets of (7.11) are negative. This testifies that the function $H(x, F)$ is decreasing for $x \in [c + 4, c + 6]$ and vanishing at the point $x = c + 4$. For $x > c + 6$, both $F(-2 - x)$ and $F(2 - x)$ vanish, and (7.1) gives $H(x, F) = \bar y < 0$. Consequently,

$$H(x, F) \le 0, \quad x \ge c + 4.$$

If $x \in [0, c]$, then $H(x, F)$ takes the form (7.2). After the substitution $-2 - x = y$, its derivative (7.3) is determined by

$$\frac{dH}{dx} = \frac{1}{2}\left[F'(y)(2y + 2) + F(y) + 1\right], \quad y \ge -c - 2. \tag{7.12}$$

Again, employ the expression (7.6) for $F$ to obtain

$$\frac{dH}{dx} = \frac{1}{2}\left[\frac{\sqrt{1 + c}}{\sqrt{(1 - y)^3}}(y + 1) + \frac{\sqrt{1 + c}}{\sqrt{1 - y}} + 1\right].$$
This function, which simplifies to $\frac{1}{2}\left[\frac{2\sqrt{1 + c}}{\sqrt{(1 - y)^3}} + 1\right]$, increases monotonically with respect to $y$ on the interval $[-c - 2, -2]$, reaching its minimum at the point $y = -c - 2$. Furthermore, by virtue of (7.7), the minimal value

$$\frac{1}{2}\left[\frac{2\sqrt{1 + c}}{\sqrt{(3 + c)^3}} + 1\right] = \frac{1}{2}\left[\frac{4}{(3 + c)^2} + 1\right]$$

is positive. Hence, the function $H(x, F)$ increases for $x \in [0, c]$ and vanishes at the point $x = c$. This fact dictates that

$$H(x, F) < 0, \quad x < c.$$

Therefore, we have demonstrated that

$$H(x, F) \le 0, \quad x \in R.$$

Hence, any mixed strategy $G$ of player I satisfies the condition $H(G, F) \le 0$. This directly implies that $F$, see (7.9), is the optimal strategy of player II. Finally, we take advantage of the problem's symmetry. Being symmetrical to $F$ with respect to the origin, the strategy $G$ defined by (7.8) is optimal for player I. This finishes the proof of Theorem 2.14.

Remark 2.2 Interestingly, the optimal rule in the above arbitration game relates to the golden section. The optimal rule guides player II to place his offers within the interval $[-3 - 2z, 1 - 2z] \approx [-4.236, -0.236]$ on the negative semiaxis. On the other hand, player I must submit offers within the interval $[2z - 1, 3 + 2z] \approx [0.236, 4.236]$. Therefore, the game avoids strategy profiles in which the offers of player II exceed those of player I. In addition, we emphasize that the mean value of each of the distributions $F$ and $G$ coincides with the midpoint of the interval forming the support of the distribution.

2.8 Three-point discrete arbitration procedures with interval constraint

Suppose that the random variable $\alpha$ is concentrated in the points $a_1 = -1$, $a_2 = 0$ and $a_3 = 1$ with identical probabilities $p = 1/3$. In contrast to the model scrutinized in Section 2.6, we suppose that the players submit offers within a bounded interval: $x, y \in [-a, a]$.

Let us search for an equilibrium in the class of mixed strategies. Denote by $f(x)$ and $g(y)$ the strategies of players I and II, respectively.
Assume that the support of the distribution $g(y)$ ($f(x)$) lies on the negative semiaxis (positive semiaxis, respectively). In other words,

$$f(x) \ge 0,\ x \in [0, a],\ \int_0^a f(x)\,dx = 1, \qquad g(y) \ge 0,\ y \in [-a, 0],\ \int_{-a}^0 g(y)\,dy = 1.$$

Owing to the symmetry, the game has zero value, and the optimal strategies must be symmetrical with respect to the ordinate axis: $g(y) = f(-y)$. Hence, it suffices to construct the optimal strategy of one player (e.g., player I).

Theorem 2.15 For $a \in (0, 8/3]$, the optimal strategy acquires the form

$$f(x) = \begin{cases} 0, & 0 \le x < \dfrac{a}{4}, \\ \dfrac{\sqrt{a}}{2\sqrt{x^3}}, & \dfrac{a}{4} \le x \le a. \end{cases} \tag{8.1}$$

In the case of $a \in (8/3, \infty)$, it becomes

$$f(x) = \begin{cases} 0, & 0 \le x < \dfrac{2}{3}, \\ \sqrt{\dfrac{2}{3}}\,\dfrac{1}{\sqrt{x^3}}, & \dfrac{2}{3} \le x \le \dfrac{8}{3}, \\ 0, & \dfrac{8}{3} < x \le a. \end{cases} \tag{8.2}$$

Proof: We begin with the case when $a \in (0, 2]$. According to the rules of this game, for $y \in [-a, 0]$ the payoff of player I is

$$H(f, y) = \frac{1}{3}\int_0^a y f(x)\,dx + \frac{1}{3}\left(\int_0^{-y} x f(x)\,dx + \int_{-y}^a y f(x)\,dx\right) + \frac{1}{3}\int_0^a x f(x)\,dx.$$

Seek $f$ in the following class of strategies:

$$f(x) = \begin{cases} 0, & 0 \le x < \alpha, \\ \varphi(x), & \alpha \le x \le \beta, \\ 0, & \beta < x \le a, \end{cases} \tag{8.3}$$

where $\varphi(x) > 0$, $x \in [\alpha, \beta]$, and $\varphi$ is continuously differentiable on $(\alpha, \beta)$.

The strategy (8.3) is optimal if $H(f, y) = 0$ for $y \in [-\beta, -\alpha]$ and $H(f, y) \ge 0$ for $y \in [-a, -\beta) \cup (-\alpha, 0]$. Note that $H(f, 0) = \frac{1}{3}\int_0^a x f(x)\,dx > 0$. The condition $H(f, -\alpha) = H(f, -\beta) = 0$ implies that $\beta = 4\alpha$ and $\int_\alpha^\beta x\varphi(x)\,dx = 2\alpha$. Clearly, $0 < \alpha \le \frac{a}{4}$. At the same time, $H(f, -a) = \frac{1}{3}[-a + 4\alpha]$. Therefore, $H(f, -a) \ge 0$ iff $a \le 4\alpha$. Hence, $\alpha = \frac{a}{4}$ and $\beta = a$.

Let us find the function $\varphi(x)$. The condition $H(f, y) = 0$, $y \in [-\beta, -\alpha]$, yields $H'(f, y) = H''(f, y) = 0$. Consequently (up to the factor $1/3$),

$$H'(f, y) = 1 + 2y f(-y) + \int_{-y}^{a} f(x)\,dx = 0,$$

$$H''(f, y) = 3f(-y) - 2y f'(-y) = 0.$$

By setting $y = -x$, we arrive at the differential equation

$$3f(x) + 2x f'(x) = 0. \tag{8.4}$$

It has the solution

$$f(x) = \frac{c}{\sqrt{x^3}}. \tag{8.5}$$

Since $1 = \int_0^a f(x)\,dx = \int_{a/4}^a \frac{c}{\sqrt{x^3}}\,dx = \frac{2c}{\sqrt{a}}$, we evaluate $c = \frac{\sqrt{a}}{2}$.
Thus,

$$f(x) = \begin{cases} 0, & 0 \le x < a/4, \\ \dfrac{\sqrt{a}}{2\sqrt{x^3}}, & a/4 \le x \le a. \end{cases}$$

To proceed, verify the optimality condition. For $y \in [-a, -a/4]$, we have

$$3H(f, y) = y + \int_{a/4}^{-y} \frac{\sqrt{a}}{2\sqrt{x}}\,dx + y\int_{-y}^{a} \frac{\sqrt{a}}{2\sqrt{x^3}}\,dx + \int_{a/4}^{a} \frac{\sqrt{a}}{2\sqrt{x}}\,dx = y + \sqrt{a}\sqrt{-y} - \frac{a}{2} - y - \sqrt{a}\sqrt{-y} + \frac{a}{2} = 0.$$

In the case of $y \in (-a/4, 0]$,

$$3H(f, y) = y + y\int_{a/4}^{a} \frac{\sqrt{a}}{2\sqrt{x^3}}\,dx + \frac{a}{2} = 2\left(y + \frac{a}{4}\right) > 0.$$

This guarantees the optimal character of the strategy (8.1).

Now, let $a \in (2, \frac{8}{3}]$. Consider $H(f, y)$ for $y \in [-a, -a/4]$, where $f$ meets (8.1). The support of the distribution $f$ is $[a/4, a]$ and $a \le \frac{8}{3}$. Hence, $-1 + (-1 - y) \le a/4$ and $-y \ge a/4$ for all $y \in [-a, -a/4]$. This means that, for $y \in [-a, -a/4]$, we get

$$3H(f, y) = \int_{a/4}^{a} y f(x)\,dx + \left(\int_{a/4}^{-y} x f(x)\,dx + \int_{-y}^{a} y f(x)\,dx\right) + \int_{a/4}^{a} x f(x)\,dx.$$

Differentiation again leads to equation (8.4). Its solution $f(x)$ is given by (8.1). The same line of reasoning as above establishes that $H(f, y) > 0$ for $y \in (-a/4, 0]$. Therefore, the strategy (8.1) is also optimal for $a \in (2, \frac{8}{3}]$.

Finally, assume that $a \in (\frac{8}{3}, \infty)$. In this case, the function $H(f, y)$ becomes somewhat more complicated. As an example, consider the situation when $a = 4$. For $y \in [-4, -2]$, we have

$$3H(f, y) = \left[\int_0^{-2 - y} x f(x)\,dx + \int_{-2 - y}^{4} y f(x)\,dx\right] + \left[\int_0^{-y} x f(x)\,dx + \int_{-y}^{4} y f(x)\,dx\right] + \int_0^4 x f(x)\,dx. \tag{8.6}$$

For $y \in [-2, 0]$,

$$3H(f, y) = \int_0^4 y f(x)\,dx + \left[\int_0^{-y} x f(x)\,dx + \int_{-y}^{4} y f(x)\,dx\right] + \left[\int_0^{2 - y} x f(x)\,dx + \int_{2 - y}^{4} y f(x)\,dx\right]. \tag{8.7}$$

Find $f$ in the form (8.3), where $\beta = \alpha + 2$. The conditions $H(f, -\beta) = H(f, -\alpha) = 0$ yield

$$\int_\alpha^\beta x f(x)\,dx = 2\alpha = \frac{\beta}{2}.$$

Consequently, $\beta = 4\alpha$; in combination with $\beta = \alpha + 2$, this result leads to $\alpha = 2/3$, $\beta = 8/3$.

According to (8.6), on the interval $[-\beta, -2]$ the condition $H''(f, y) = 0$ turns out equivalent to

$$[3f(-y) - 2y f'(-y)] + [3f(-2 - y) - (2 + 2y) f'(-2 - y)] = 0.$$
(8.8)

If $y \in [-\beta, -2]$, then $x = -y \in [2, \beta]$ and $-2 - y \in [0, \beta - 2]$, i.e., $-2 - y \in [0, \alpha]$. However, for $x \in [0, \alpha]$ we have $f(x) = 0$, $f'(x) = 0$. Hence, the second expression in square brackets in (8.8) equals zero, and the following equation in $f(-y)$ arises immediately:

$$[3f(-y) - 2y f'(-y)] = 0.$$

As a matter of fact, it completely matches (8.4) under $x = -y$.

Similarly, it is possible to rewrite the condition $H''(f, y) = 0$ for $y \in [-2, -\alpha]$ as

$$[3f(-y) - 2y f'(-y)] + [3f(2 - y) + (2 - 2y) f'(2 - y)] = 0.$$

Here $-y \in [\alpha, 2]$ and $2 - y \in [\alpha + 2, 4]$. Therefore, $f(2 - y) = f'(2 - y) = 0$ and we derive the same equation (8.4) in $f(x)$.

Solving (8.4) within the interval $(2/3, 8/3)$, we obtain the strategy

$$f(x) = \begin{cases} 0, & 0 \le x < 2/3, \\ \sqrt{\dfrac{2}{3}}\,\dfrac{1}{\sqrt{x^3}}, & 2/3 \le x \le 8/3, \\ 0, & 8/3 < x \le 4. \end{cases} \tag{8.9}$$

The optimality of (8.9) can be verified by analogy to the case when $a \in (0, \frac{8}{3}]$. For $a \in (\frac{8}{3}, \infty)$, the complete proof of this theorem requires a thorough analysis of the intervals $a \in (8/3, 4]$, $a \in [4, 14/3]$ and $a \in (14/3, \infty)$. The function $H(f, y)$ has the same form as (8.6) and (8.7).

2.9 General discrete arbitration procedures

Consider the case when the arbitrator's offer is a random variable $\alpha$ taking the values $\{-n, -(n - 1), \dots, -1, 0, 1, \dots, n - 1, n\}$ with identical probabilities $p = 1/(2n + 1)$. The offers submitted by the players must belong to the interval $[-a, a]$: $x, y \in [-a, a]$ (see Figure 2.5).

[Figure 2.5: The discrete distribution of offers by an arbitrator, $p = \frac{1}{2n + 1}$. The points $-n, \dots, -1, 0, 1, \dots, n$ carry equal probabilities $p$; the offers $y$ and $x$ of players II and I are marked on the same axis.]

As earlier, we seek a mixed strategy equilibrium. Denote by $f(x)$ and $g(y)$ the mixed strategies of players I and II. Suppose that the supports of the distributions $g(y)$ ($f(x)$) lie in the negative (positive) domain, i.e.,

$$f(x) \ge 0,\ x \in [0, a],\ \int_0^a f(x)\,dx = 1, \qquad g(y) \ge 0,\ y \in [-a, 0],\ \int_{-a}^0 g(y)\,dy = 1. \tag{9.1}$$

Let us search for the strategy of one player (say, player I).
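Theorem 2.16 For $a \in \left(0, \frac{2(n + 1)^2}{2n + 1}\right]$, the optimal strategy of player I takes the form

$$f(x) = \begin{cases} 0, & 0 \le x < \left(\dfrac{n}{n + 1}\right)^2 a, \\ \dfrac{n\sqrt{a}}{2\sqrt{x^3}}, & \left(\dfrac{n}{n + 1}\right)^2 a \le x \le a. \end{cases} \tag{9.2}$$

In the case of $a \in \left(\frac{2(n + 1)^2}{2n + 1}, +\infty\right)$, it becomes

$$f(x) = \begin{cases} 0, & 0 \le x < \dfrac{2n^2}{2n + 1}, \\ \dfrac{n(n + 1)}{\sqrt{2(2n + 1)}}\,\dfrac{1}{\sqrt{x^3}}, & \dfrac{2n^2}{2n + 1} \le x \le \dfrac{2(n + 1)^2}{2n + 1}, \\ 0, & \dfrac{2(n + 1)^2}{2n + 1} < x \le a. \end{cases} \tag{9.3}$$

Proof: First, consider the case of $a \in (0, 2]$. For $y \in [-a, 0]$, the payoff of player I equals

$$H(f, y) = \frac{1}{2n + 1}\left[n\int_0^a y f(x)\,dx + \left(\int_0^{-y} x f(x)\,dx + \int_{-y}^a y f(x)\,dx\right) + n\int_0^a x f(x)\,dx\right].$$

Find the strategy $f$ in the form

$$f(x) = \begin{cases} 0, & 0 \le x < \alpha, \\ \varphi(x), & \alpha \le x \le \beta, \\ 0, & \beta < x \le a, \end{cases} \tag{9.4}$$

where $\varphi(x) > 0$, $x \in [\alpha, \beta]$, and $\varphi$ is continuously differentiable on $(\alpha, \beta)$.

The strategy (9.4) is optimal if $H(f, y) = 0$ for $y \in [-\beta, -\alpha]$ and $H(f, y) \ge 0$ for $y \in [-a, -\beta) \cup (-\alpha, 0]$. Note that $H(f, 0) = \frac{n}{2n + 1}\int_0^a x f(x)\,dx > 0$. It follows from $H(f, -\alpha) = H(f, -\beta) = 0$ that

$$H(f, -\alpha) = \frac{1}{2n + 1}\left[-(n + 1)\alpha + n\int_\alpha^\beta x\varphi(x)\,dx\right] = 0,$$

$$H(f, -\beta) = \frac{1}{2n + 1}\left[-n\beta + (n + 1)\int_\alpha^\beta x\varphi(x)\,dx\right] = 0.$$

This system yields

$$\int_\alpha^\beta x\varphi(x)\,dx = \frac{n + 1}{n}\alpha = \frac{n}{n + 1}\beta$$

and $\beta = \left(\frac{n + 1}{n}\right)^2 \alpha$ or $\alpha = \left(\frac{n}{n + 1}\right)^2 \beta$. For $y = -a$, we have $H(f, -a) = \frac{1}{2n + 1}[-na + n\beta] = \frac{n}{2n + 1}(\beta - a)$. Hence, if $\beta$ […] $0$. These conditions lead to the optimality of (9.2).

To proceed, analyze the case of $2 < a \le \frac{2(n + 1)^2}{2n + 1}$. Consider $H(f, y)$ provided that $y \in \left[-a, -\left(\frac{n}{n + 1}\right)^2 a\right]$, where $f$ is defined by (9.2). Recall that the distribution $f$ possesses the support $\left[\left(\frac{n}{n + 1}\right)^2 a, a\right]$ and $a \le \frac{2(n + 1)^2}{2n + 1}$. These facts imply that $a - \left(\frac{n}{n + 1}\right)^2 a \le 2$. Consequently, for $y \in \left[-a, -\left(\frac{n}{n + 1}\right)^2 a\right]$, we have

$$(2n + 1)H(f, y) = n\int_{\left(\frac{n}{n + 1}\right)^2 a}^{a} y f(x)\,dx + \left(\int_{\left(\frac{n}{n + 1}\right)^2 a}^{-y} x f(x)\,dx + \int_{-y}^{a} y f(x)\,dx\right) + n\int_{\left(\frac{n}{n + 1}\right)^2 a}^{a} x f(x)\,dx.$$

Again, differentiation yields equation (9.5). Its solution $f(x)$ acquires the form (9.2). Thus, $H(f, y) \equiv 0$ for $y \in \left[-a, -\left(\frac{n}{n + 1}\right)^2 a\right]$.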
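The identity $H(f, y) \equiv 0$ on the support can be checked numerically for particular $n$ and $a$. The sketch below is only an illustration under stated assumptions: it uses a simple composite midpoint rule (the helper `integrate` and the chosen step count are not from the text), the density (9.2), and the displayed expression for $(2n + 1)H(f, y)$, which for $a \le 2$ is valid on the whole interval $y \in [-a, 0]$. The case $n = 1$ corresponds to Theorem 2.15.

```python
import math

def make_strategy(n, a):
    """Left end of the support and density (9.2); assumes 0 < a <= 2(n+1)^2/(2n+1)."""
    alpha = (n / (n + 1)) ** 2 * a
    def density(x):
        return n * math.sqrt(a) / (2.0 * x ** 1.5) if alpha <= x <= a else 0.0
    return alpha, density

def integrate(g, lo, hi, steps=4000):
    """Composite midpoint rule on [lo, hi]; returns 0 for an empty interval."""
    if hi <= lo:
        return 0.0
    h = (hi - lo) / steps
    return h * sum(g(lo + (k + 0.5) * h) for k in range(steps))

def scaled_payoff(n, a, y):
    """(2n+1) H(f, y): on [-a, -alpha] in general, on all of [-a, 0] when a <= 2."""
    alpha, f = make_strategy(n, a)
    t = max(-y, alpha)                                # offers below alpha carry no mass
    total_mass = integrate(f, alpha, a)
    mass_above = integrate(f, t, a)                   # probability that x > -y
    mean_below = integrate(lambda x: x * f(x), alpha, t)
    mean_total = integrate(lambda x: x * f(x), alpha, a)
    return n * y * total_mass + (mean_below + y * mass_above) + n * mean_total

for n, a in [(1, 2.0), (3, 2.0), (2, 1.0)]:
    alpha, f = make_strategy(n, a)
    on_support = [scaled_payoff(n, a, y) for y in (-a, -0.5 * (a + alpha), -alpha)]
    off_support = scaled_payoff(n, a, -0.5 * alpha)
    print(n, a, on_support, off_support)
```

For the sampled parameters the payoff vanishes (up to quadrature error) for $y$ in $[-a, -\alpha]$ and is positive for an offer of player II closer to zero, as the optimality conditions for (9.4) require.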
Now, it is necessary to show that $H(f, y) > 0$ for $y \in \left(-\left(\frac{n}{n + 1}\right)^2 a, 0\right]$. Find the sign of $H(f, y)$ within the interval $\left[-\left(\frac{n}{n + 1}\right)^2 a, -\left(\frac{n}{n + 1}\right)^2 a + 2\right]$.

If $y \in \left[-\left(\frac{n}{n + 1}\right)^2 a, -a + 2\right]$, then

$$H(f, y) = \frac{n + 1}{2n + 1}\,y + \frac{n}{2n + 1}\int_{\left(\frac{n}{n + 1}\right)^2 a}^{a} x f(x)\,dx = \frac{n + 1}{2n + 1}\left[y + \left(\frac{n}{n + 1}\right)^2 a\right] > 0.$$

On the other hand, for $y \in \left[-a + 2, -\left(\frac{n}{n + 1}\right)^2 a + 2\right]$, we obtain

$$H(f, y) = \frac{n + 1}{2n + 1}\,y + \frac{1}{2n + 1}\left(\int_{\left(\frac{n}{n + 1}\right)^2 a}^{2 - y} x f(x)\,dx + \int_{2 - y}^{a} y f(x)\,dx\right) + \frac{n - 1}{2n + 1}\int_{\left(\frac{n}{n + 1}\right)^2 a}^{a} x f(x)\,dx.$$

Then

$$H'(f, y) = \frac{1}{2n + 1}\left[n + 1 + (2y - 2) f(2 - y) + \int_{2 - y}^{a} f(x)\,dx\right] = \frac{1}{2n + 1}\left[n + 1 + (y - 1)\frac{n\sqrt{a}}{\sqrt{(2 - y)^3}} - n + \frac{n\sqrt{a}}{\sqrt{2 - y}}\right] = \frac{1}{2n + 1}\left(1 + \frac{n\sqrt{a}}{\sqrt{(2 - y)^3}}\right) > 0.$$

Hence, $H(f, y) > 0$ for $y \in \left(-\left(\frac{n}{n + 1}\right)^2 a, -\left(\frac{n}{n + 1}\right)^2 a + 2\right]$. If $-\left(\frac{n}{n + 1}\right)^2 a + 2 \ge 0$, the proof is completed. Otherwise, shift the interval to the right and demonstrate that $H(f, y) > 0$ for $y \in \left(-\left(\frac{n}{n + 1}\right)^2 a + 2, -\left(\frac{n}{n + 1}\right)^2 a + 4\right]$, etc. Thus, we have established the optimality of the strategy (9.2) for $a \in \left(2, \frac{2(n + 1)^2}{2n + 1}\right]$.

Finally, investigate the case when $\frac{2(n + 1)^2}{2n + 1} < a \le \infty$. Here the function $H(f, y)$ becomes more complicated. Let us analyze the unbounded case $a = \infty$ only. Assume that player I employs the strategy (9.3) and find the payoff function $H(f, y)$. For the sake of simplicity, introduce the notation $\alpha = \frac{2n^2}{2n + 1}$ and $\beta = \alpha + 2 = \frac{2(n + 1)^2}{2n + 1}$. For $y \in (-\infty, -2n - \beta]$, we accordingly have

$$H(f, y) = \int_\alpha^\beta x f(x)\,dx = \frac{2n(n + 1)}{2n + 1} > 0.$$

Set $k = 3\left[\frac{n}{2}\right] + 2$ if $n$ is an odd number and $k = 3\,\frac{n}{2}$ otherwise. For $y \in [-2n + 2r - \beta, -2n + 2r - \alpha]$, where $r = 0, 1, \dots, n, \dots, k - 1$, and for $y \in [-2n + 2r - \beta, 0]$, where $r = k$, one has the following chain of calculations:

$$H(f, y) = \frac{r}{2n + 1}\,y + \frac{1}{2n + 1}\left[\int_\alpha^{-2n + 2r - y} x f(x)\,dx + \int_{-2n + 2r - y}^{\beta} y f(x)\,dx\right] + \frac{2n - r}{2n + 1}\int_\alpha^\beta x f(x)\,dx$$

$$= \int_\alpha^\beta x f(x)\,dx - \frac{r}{2n + 1}\int_\alpha^\beta (x - y) f(x)\,dx - \frac{1}{2n + 1}\int_{-2n + 2r - y}^{\beta} (x - y) f(x)\,dx.$$
(9.8)

Differentiate formula (9.8), where $f$ is determined by (9.3), to derive the equation

$$H'(f, y) = \frac{r}{2n + 1} + \frac{1}{2n + 1}\int_{-2n + 2r - y}^{\beta} f(x)\,dx + \frac{1}{2n + 1}(2y + 2n - 2r) f(-2n + 2r - y) = \frac{r - n}{2n + 1}\left(1 + \frac{2n(n + 1)}{\sqrt{2(2n + 1)(-2n + 2r - y)^3}}\right). \tag{9.9}$$

According to (9.9), the expected payoff $H(f, y)$ is constant within the interval $y \in [-\beta, -\alpha]$, where $r = n$. Furthermore, since

$$H(f, -\beta) = \int_\alpha^\beta x f(x)\,dx - \frac{n}{2n + 1}\int_\alpha^\beta (x + \beta) f(x)\,dx = \frac{n + 1}{2n + 1}\int_\alpha^\beta x f(x)\,dx - \frac{n}{2n + 1}\beta = \frac{n + 1}{2n + 1}\cdot\frac{2n(n + 1)}{2n + 1} - \frac{n}{2n + 1}\cdot\frac{2(n + 1)^2}{2n + 1} = 0,$$

we have $H(f, y) \equiv 0$ for $y \in [-\beta, -\alpha]$. In the case of $r < n$ ($r > n$), formula (9.9) gives $H'(f, y) < 0$ ($H'(f, y) > 0$, respectively) on the interval $y \in [-2n + 2r - \beta, -2n + 2r - \alpha]$. Hence, $H(f, y) \ge 0$ for all $y$. This testifies to the optimality of the strategy (9.3). For $a \in \left(\frac{2(n + 1)^2}{2n + 1}, \infty\right)$, the complete proof of Theorem 2.16 follows the same reasoning as for $a = \infty$.

Obviously, optimal strategies in the discrete arbitration scheme with uniform distribution are randomized. This result differs from the continuous setting discussed in Section 2.6 (where the players have optimal strategies in the class of pure strategies). By analogy to the uniform case, the optimal strategies of the players are concentrated near the boundaries of the interval $[-a, a]$. Note the following aspect. According to Theorem 2.16, the optimal strategy (9.2) in the discrete scheme with $a = n$ possesses a non-zero measure only on the interval $\left[\left(\frac{n}{n + 1}\right)^2 a, a\right]$. The length of this interval relative to $a$ tends to zero for large $n$. In other words, the solutions to the discrete and continuous settings of the above arbitration game do coincide for sufficiently large $n$.

Exercises

1. Find a pure strategy solution of a convex-concave game with the payoff function

$$H(x, y) = -5x^2 + 2y^2 + xy - 3x - y$$

and a mixed strategy solution of a convex game with the payoff function

$$H(x, y) = y^3 - 4xy + x^3.$$

2.
Obtain a mixed strategy solution of a duel with the payoff function

$$H(x, y) = \begin{cases} 2x - y + xy, & x < y, \\ 0, & x = y, \\ x - 2y - xy, & x > y. \end{cases}$$

3. Find a solution to the following duel. Player I has two bullets, whereas player II has one bullet.

4. The birthday game.
Peter goes home from work and suddenly wonders whether today is Kate's birthday. Peter chooses between two strategies, namely, visiting Kate with or without a present. Suppose that today is not Kate's birthday; if Peter visits Kate with a present, his payoff is 1 (and 0 otherwise). Suppose that today is Kate's birthday; if Peter visits Kate with a present, his payoff equals 1.5 (and −10 otherwise). Construct the payoff matrix and evaluate an equilibrium in the stated game.

5. The high-quality amplifier game.
A company manufactures amplifiers. Their operation strongly depends on some parameters of a small (yet scarce) capacitor. The standard price of this capacitor is 100 USD. However, the company's costs for warranty return of a failed capacitor constitute 1000 USD. The company chooses between the following strategies: 1) applying an inspection method for capacitors, which costs 100 USD and guarantees failure identification three times out of four; 2) applying a reliable and cheap inspection method, which causes breakdown of an operable capacitor nine times out of ten; 3) purchasing the desired capacitors at the price of 400 USD with full warranty. Construct the payoff matrix for the game involving nature (possible failures of a capacitor) and the company. Evaluate an equilibrium in this game.

6. Obtain a solution of a $3 \times 3$ game with the payoff matrix

$$A = \begin{pmatrix} 3 & 6 & 8 \\ 4 & 3 & 2 \\ 7 & -5 & -1 \end{pmatrix}.$$

7. The game of words.
Two players announce a letter as follows.
Player I chooses between letters “a” and “i,” whereas player II chooses between letters “f,” “m,” and “t.” If the selected letters form a word, player I gains 1; moreover, player I receives the additional reward of 3 if this word corresponds to an animate noun or pronoun. When the announced letters form no word, player II gains 2. Therefore, the payoff matrix takes the form

$$\begin{array}{c|ccc} & f & m & t \\ \hline a & -2 & 1 & 1 \\ i & 1 & -2 & 4 \end{array}$$

Find a solution in this game.

8. Demonstrate that a game with the payoff function

$$H(x, y) = \begin{cases} -1, & x = 1,\ y < 1 \ \text{ and } \ x < y < 1, \\ 0, & x = y, \\ 1, & y = 1,\ x < 1 \ \text{ and } \ y < x < 1 \end{cases}$$

admits no solution.

9. Provide the complete proof of Theorem 2.15 in the case of $a \in \left(\frac{8}{3}, \infty\right)$.

10. Find an equilibrium in the arbitration procedure provided that the arbitrator's offers are located at the points $-n, -(n-1), \ldots, -1, 1, \ldots, n$.

3 Non-cooperative strategic-form n-player games

Introduction

In Chapter 1, we explored nonzero-sum games of two players. The introduced definitions of strategies, strategy profiles, payoffs, and the optimality principle extend naturally to the case of $n$ players. Let us give the basic definitions for $n$-player games.

Definition 3.1 A normal-form $n$-player game is an object $\Gamma = \langle N, \{X_i\}_{i \in N}, \{H_i\}_{i \in N} \rangle$, where $N = \{1, 2, \ldots, n\}$ indicates the set of players, $X_i$ represents the strategy set of player $i$, and $H_i : \prod_{i=1}^{n} X_i \to R$ is the payoff function of player $i$, $i = 1, \ldots, n$.

As previously, player $i$ chooses some strategy $x_i \in X_i$, being unaware of the opponents' choices. Player $i$ strives to maximize his payoff $H_i(x_1, \ldots, x_n)$, which depends on the strategies of all players. A set of strategies of all players is called a strategy profile of a game. Consider some strategy profile $x = (x_1, \ldots, x_n)$.
For this profile, the associated notation

$$(x_{-i}, x'_i) = (x_1, \ldots, x_{i-1}, x'_i, x_{i+1}, \ldots, x_n)$$

designates a strategy profile where player $i$ has modified his strategy from $x_i$ to $x'_i$, while the rest of the players use the same strategies as before. The major solution approach to $n$-player games is still the concept of Nash equilibria.

Mathematical Game Theory and Applications, First Edition. Vladimir Mazalov. © 2014 John Wiley & Sons, Ltd. Published 2014 by John Wiley & Sons, Ltd. Companion website: http://www.wiley.com/go/game_theory

Definition 3.2 A Nash equilibrium in a game $\Gamma$ is a strategy profile $x^* = (x^*_1, \ldots, x^*_n)$ such that the following conditions hold true for any player $i \in N$:

$$H_i(x^*_{-i}, x_i) \le H_i(x^*), \quad \forall x_i.$$

All strategies in such an equilibrium are called optimal. Definition 3.2 implies that, if any single player deviates from a Nash equilibrium, his payoff cannot increase. Therefore, no player benefits from a unilateral deviation from a Nash equilibrium. Of course, this does not cover two or three players deviating from a Nash equilibrium simultaneously. Later on, we will study such games.

3.1 Convex games. The Cournot oligopoly

Our analysis of $n$-player games begins with the following case. Suppose that payoff functions are concave, whereas strategy sets are convex. The equilibrium existence theorem for two-player games extends naturally to the general case.

Theorem 3.1 Consider an $n$-player game $\Gamma = \langle N, \{X_i\}_{i \in N}, \{H_i\}_{i \in N} \rangle$. Assume that strategy sets $X_i$ are compact convex sets in the space $R^n$, and payoff functions $H_i(x_1, \ldots, x_n)$ are continuous and concave in $x_i$. Then the game always admits a Nash equilibrium.

Convex games comprise different oligopoly models, where $n$ companies compete on a market. Similarly to duopolies, one can distinguish between Cournot oligopolies and Bertrand oligopolies. Let us confine ourselves to the Cournot oligopoly.
Imagine that there exist $n$ companies $1, 2, \ldots, n$ on a market. They manufacture some amounts of a product ($x_1, x_2, \ldots, x_n$, respectively) that correspond to their strategies. Denote by $x$ the above strategy profile. Suppose that the product price is a linear function, viz., an initial price $p$ minus the total amount of products $Q = \sum_{i=1}^{n} x_i$ multiplied by some factor $b$. Therefore, the unit price of the product is $p - bQ$. The unit costs will be indicated by $c_i$, $i = 1, \ldots, n$. The payoff functions of the players acquire the form

$$H_i(x) = \Big(p - b\sum_{j=1}^{n} x_j\Big)x_i - c_i x_i, \quad i = 1, \ldots, n.$$

The payoff functions $H_i(x)$ are concave in $x_i$, and the strategy set of player $i$ is convex. Consequently, oligopolies represent an example of convex games with pure strategy equilibria. A Nash equilibrium satisfies the following system of equations:

$$\frac{\partial H_i(x^*)}{\partial x_i} = 0, \quad i = 1, \ldots, n. \tag{1.1}$$

Equations (1.1) lead to the expressions

$$p - c_i - b\sum_{j=1}^{n} x_j - b x_i = 0, \quad i = 1, \ldots, n. \tag{1.2}$$

By summing up these equalities, we arrive at

$$np - \sum_{j=1}^{n} c_j - b(n+1)\sum_{j=1}^{n} x_j = 0,$$

and it appears that

$$\sum_{j=1}^{n} x_j = \frac{np - \sum_{j=1}^{n} c_j}{b(n+1)}.$$

Substituting this sum back into (1.2), we find that an equilibrium in the oligopoly model is given by

$$x^*_i = \frac{1}{b}\left(\frac{p}{n+1} - \left(c_i - \frac{\sum_{j=1}^{n} c_j}{n+1}\right)\right), \quad i = 1, \ldots, n. \tag{1.3}$$

By (1.2), the margin $p - c_i - bQ$ equals $b x^*_i$, so the corresponding optimal payoffs become

$$H^*_i = b\,(x^*_i)^2, \quad i = 1, \ldots, n.$$

3.2 Polymatrix games

Consider $n$-player games $\Gamma = \langle N, \{X_i = \{1, 2, \ldots, m_i\}\}_{i \in N}, \{H_i\}_{i \in N} \rangle$, where players' strategies form finite sets and payoffs are defined by a set of multi-dimensional matrices $H_i = H_i(j_1, \ldots, j_n)$, $i \in N$. Such games are known as polymatrix games. They may have no pure strategy Nash equilibrium.

As previously, perform randomization and introduce the class of mixed strategies $\bar{X}_i = \{x^{(i)} = (x^{(i)}_1, \ldots, x^{(i)}_{m_i})\}$. Here $x^{(i)}_j$ gives the probability that player $i$ chooses strategy $j \in \{1, \ldots, m_i\}$.
Under a strategy profile $x = (x^{(1)}, \ldots, x^{(n)})$, the expected payoff of player $i$ takes the form

$$H_i(x^{(1)}, \ldots, x^{(n)}) = \sum_{j_1=1}^{m_1} \cdots \sum_{j_n=1}^{m_n} H_i(j_1, \ldots, j_n)\, x^{(1)}_{j_1} \cdots x^{(n)}_{j_n}, \quad i = 1, \ldots, n. \tag{2.1}$$

The payoff functions (2.1) are linear (hence concave) in $x^{(i)}$, while the strategy sets of players are compact and convex. According to Theorem 3.1, the stated game always has a Nash equilibrium.

Theorem 3.2 There exists a mixed strategy Nash equilibrium in an $n$-player polymatrix game $\Gamma = \langle N, \{X_i = \{1, 2, \ldots, m_i\}\}_{i \in N}, \{H_i\}_{i \in N} \rangle$.

In a number of cases, it is possible to solve polymatrix games by analytic methods. For the time being, we explore the case when each player chooses between two strategies. Then equilibrium evaluation proceeds from geometric considerations. For simplicity of exposition, let us study three-player games $\Gamma = \langle N = \{I, II, III\}, \{X_i = \{1, 2\}\}_{i=1,2,3}, \{H_i\}_{i=1,2,3} \rangle$ with payoff matrices $H_1 = \{a_{ijk}\}_{i,j,k=1}^{2}$, $H_2 = \{b_{ijk}\}_{i,j,k=1}^{2}$, and $H_3 = \{c_{ijk}\}_{i,j,k=1}^{2}$.

Recall that each player possesses just two strategies. And so, we understand mixed strategies as $x_1, x_2, x_3$, the probabilities of choosing strategy 1 by players I, II, and III, respectively. The opposite event occurs with the probability $\bar{x} = 1 - x$.

A strategy profile $(x^*_1, x^*_2, x^*_3)$ is an equilibrium strategy profile if for any strategies $x_1, x_2, x_3$ the following conditions hold:

$$H_1(x_1, x^*_2, x^*_3) \le H_1(x^*), \quad H_2(x^*_1, x_2, x^*_3) \le H_2(x^*), \quad H_3(x^*_1, x^*_2, x_3) \le H_3(x^*).$$

In particular, the equilibrium conditions hold true under $x_i = 0$ and $x_i = 1$, $i = 1, 2, 3$. For instance, consider these inequalities for player III:

$$H_3(x^*_1, x^*_2, 0) \le H_3(x^*), \quad H_3(x^*_1, x^*_2, 1) \le H_3(x^*).$$

According to (2.1), the first inequality acquires the form

$$c_{112}x_1x_2 + c_{122}x_1\bar{x}_2 + c_{212}\bar{x}_1x_2 + c_{222}\bar{x}_1\bar{x}_2 \le c_{112}x_1x_2\bar{x}_3 + c_{122}x_1\bar{x}_2\bar{x}_3 + c_{212}\bar{x}_1x_2\bar{x}_3 + c_{222}\bar{x}_1\bar{x}_2\bar{x}_3 + c_{111}x_1x_2x_3 + c_{211}\bar{x}_1x_2x_3 + c_{121}x_1\bar{x}_2x_3 + c_{221}\bar{x}_1\bar{x}_2x_3.$$
Rewrite it as

$$x_3\big(c_{111}x_1x_2 + c_{211}\bar{x}_1x_2 + c_{121}x_1\bar{x}_2 + c_{221}\bar{x}_1\bar{x}_2 - c_{112}x_1x_2 - c_{122}x_1\bar{x}_2 - c_{212}\bar{x}_1x_2 - c_{222}\bar{x}_1\bar{x}_2\big) \ge 0. \tag{2.2}$$

Similarly, the second inequality becomes

$$(1 - x_3)\big(c_{111}x_1x_2 + c_{211}\bar{x}_1x_2 + c_{121}x_1\bar{x}_2 + c_{221}\bar{x}_1\bar{x}_2 - c_{112}x_1x_2 - c_{122}x_1\bar{x}_2 - c_{212}\bar{x}_1x_2 - c_{222}\bar{x}_1\bar{x}_2\big) \le 0. \tag{2.3}$$

Denote by $C(x_1, x_2)$ the bracketed expression in (2.2) and (2.3); it is the expected gain of player III from choosing strategy 1 instead of strategy 2. For $x_3 = 0$, inequality (2.2) is immediate, whereas (2.3) leads to

$$C(x_1, x_2) \le 0. \tag{2.4}$$

Next, for $x_3 = 1$, we directly have (2.3), while inequality (2.2) requires that

$$C(x_1, x_2) \ge 0. \tag{2.5}$$

And finally, for $0 < x_3 < 1$, inequalities (2.2)–(2.3) dictate that

$$C(x_1, x_2) = 0. \tag{2.6}$$

The conditions (2.4)–(2.6) determine the set of strategy profiles acceptable for player III. By analogy, we can derive the conditions and corresponding sets of acceptable strategy profiles for players I and II. Subsequently, all equilibrium strategy profiles are obtained as the intersection of these sets. Let us provide an illustrative example.

Struggle for markets. Imagine that companies I, II, and III manufacture some product and sell it on market A or market B. In comparison with the latter, the former market is characterized by a doubled product price. However, the payoff on any market decreases with the number of companies that have selected this market. Notably, the payoff of a company on market A is 6, 4, or 2 if one, two, or three companies, respectively, operate on this market; the corresponding payoffs on market B are 3, 2, and 1, respectively. Construct the set of strategy profiles acceptable for player III, taking strategy 1 to be market A. In the present case, $C(x_1, x_2)$ is given by

$$2x_1x_2 + 4\bar{x}_1x_2 + 4x_1\bar{x}_2 + 6\bar{x}_1\bar{x}_2 - 3x_1x_2 - 2x_1\bar{x}_2 - 2\bar{x}_1x_2 - \bar{x}_1\bar{x}_2 = -3x_1 - 3x_2 + 5.$$

The conditions (2.4)–(2.6) take the form

$$x_3 = 0,\ x_1 + x_2 \ge \tfrac{5}{3}; \qquad x_3 = 1,\ x_1 + x_2 \le \tfrac{5}{3}; \qquad 0 < x_3 < 1,\ x_1 + x_2 = \tfrac{5}{3}.$$

Figure 3.1 demonstrates the set of strategy profiles acceptable for player III.
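The expression for $C(x_1, x_2)$ in the struggle-for-markets example can be checked mechanically. The following is a minimal sketch, assuming the payoffs 6, 4, 2 (market A) and 3, 2, 1 (market B) stated above and identifying strategy 1 with market A; the names `PAYOFF`, `payoff_III`, `C`, and `acceptable_for_III` are illustrative and not part of the text. Exact rational arithmetic is used so that equalities can be tested without rounding.

```python
from fractions import Fraction
from itertools import product

# Payoff of a company on market A / B when k companies share that market.
PAYOFF = {"A": {1: 6, 2: 4, 3: 2}, "B": {1: 3, 2: 2, 3: 1}}


def payoff_III(m1, m2, m3):
    """Payoff of player III for pure market choices (each 'A' or 'B')."""
    k = [m1, m2, m3].count(m3)  # number of companies on player III's market
    return PAYOFF[m3][k]


def C(x1, x2):
    """Expected gain of player III from choosing market A instead of market B,
    when players I and II choose A with probabilities x1 and x2."""
    total = 0
    for m1, m2 in product("AB", repeat=2):
        prob = (x1 if m1 == "A" else 1 - x1) * (x2 if m2 == "A" else 1 - x2)
        total += prob * (payoff_III(m1, m2, "A") - payoff_III(m1, m2, "B"))
    return total


def acceptable_for_III(x1, x2, x3):
    """Conditions (2.4)-(2.6): is (x1, x2, x3) acceptable for player III?"""
    c = C(x1, x2)
    if x3 == 0:
        return c <= 0
    if x3 == 1:
        return c >= 0
    return c == 0


# Compare C with the closed form 5 - 3*x1 - 3*x2 on a grid of rational points.
grid = [Fraction(k, 12) for k in range(13)]
matches_closed_form = all(C(a, b) == 5 - 3 * a - 3 * b for a in grid for b in grid)
```

Passing `Fraction` arguments keeps the boundary case $x_1 + x_2 = 5/3$ exact; with floating-point inputs the equality test in `acceptable_for_III` would need a tolerance.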
Owing to the symmetry of the problem, similar conditions and sets of acceptable strategy profiles apply to players I and II. The intersection of these sets yields four equilibrium strategy profiles. Three of them represent pure strategy equilibria, viz., (A,A,B), (A,B,A), (B,A,A). And the fourth one is a mixed strategy profile: $x_1 = x_2 = x_3 = \frac{5}{6}$. In the first of them, players I and II have the payoff of 4, whereas player III gains 3; the other two pure strategy equilibria are symmetric. With the fourth equilibrium, all players receive the identical payoff:

$$H^* = 2\left(\tfrac{5}{6}\right)^3 + 4\left(\tfrac{1}{6}\right)\left(\tfrac{5}{6}\right)^2 + 4\left(\tfrac{1}{6}\right)\left(\tfrac{5}{6}\right)^2 + 6\left(\tfrac{1}{6}\right)^2\left(\tfrac{5}{6}\right) + 3\left(\tfrac{5}{6}\right)^2\left(\tfrac{1}{6}\right) + 2\left(\tfrac{5}{6}\right)\left(\tfrac{1}{6}\right)^2 + 2\left(\tfrac{5}{6}\right)\left(\tfrac{1}{6}\right)^2 + \left(\tfrac{1}{6}\right)^3 = \tfrac{8}{3}.$$

[Figure 3.1: The set of strategy profiles acceptable for player III.]

In a pure strategy equilibrium above, some player gets the worst of the game (his payoff is smaller than that of the opponents). The fourth equilibrium is remarkable for the following. All players enjoy equal rights; nevertheless, their payoff is smaller even than the minimal payoff of the players under a pure strategy equilibrium.

3.3 Potential games

Games with potentials were studied by Monderer and Shapley [1996]. Consider a normal-form $n$-player game $\Gamma = \langle N, \{X_i\}_{i \in N}, \{H_i\}_{i \in N} \rangle$. Suppose that there exists a certain function $P : \prod_{i=1}^{n} X_i \to R$ such that for any $i \in N$ we have the equality

$$H_i(x_{-i}, x'_i) - H_i(x_{-i}, x_i) = P(x_{-i}, x'_i) - P(x_{-i}, x_i) \tag{3.1}$$

for arbitrary $x_{-i} \in \prod_{j \ne i} X_j$ and any strategies $x_i, x'_i \in X_i$. If this function exists, it is called the potential of the game $\Gamma$, whereas the game itself is referred to as a potential game.

Traffic jamming. Suppose that companies I and II, each possessing two trucks, have to deliver some cargo from point A to point B. These points are connected by two roads (see Figure 3.2), and one road allows twice the speed of the other. Moreover, assume that the journey time on any road is proportional to the number of trucks moving on it.
Figure 3.2 indicates the journey time on each road depending on the number of moving trucks (consistently with the payoffs below, a truck needs time $k$ on road 1 and $2k$ on road 2 when $k$ trucks use that road). Players choose the distribution of their trucks by roads as their strategies. And so, the possible strategies of players are the combinations (2, 0), (1, 1), (0, 2). The costs of a player equal the total journey time of both his trucks. Consequently, the payoff matrix is determined by

$$\begin{array}{c|ccc} & (2,0) & (1,1) & (0,2) \\ \hline (2,0) & (-8,-8) & (-6,-5) & (-4,-8) \\ (1,1) & (-5,-6) & (-6,-6) & (-7,-12) \\ (0,2) & (-8,-4) & (-12,-7) & (-16,-16) \end{array}$$

[Figure 3.2: Traffic jamming.]

Obviously, the described game admits three pure strategy equilibria. These are strategy profiles where (a) both trucks of one player move on road 1, whereas the other player sends one truck along each road (two profiles), and (b) both players send one truck along each road. The game in question possesses the potential

$$P = \begin{array}{c|ccc} & (2,0) & (1,1) & (0,2) \\ \hline (2,0) & 13 & 16 & 13 \\ (1,1) & 16 & 16 & 10 \\ (0,2) & 13 & 10 & 1 \end{array}$$

Animal foraging. Two animals choose one or two areas among three areas for their foraging (see Figure 3.3). These areas provide 2, 4, and 6 units of food, respectively. If both animals visit the same area, they share the available food equally. The payoff of each player is the total units of food gained at each area minus the costs to visit this area (we set them equal to 1). Therefore, the strategies of players consist in choosing areas for their foraging: (1), (2), (3), (1, 2), (1, 3), and (2, 3).

[Figure 3.3: Animal foraging.]

And the payoff matrix becomes

$$\begin{array}{c|cccccc} & (1) & (2) & (3) & (1,2) & (1,3) & (2,3) \\ \hline (1) & (0,0) & (1,3) & (1,5) & (0,3) & (0,5) & (1,8) \\ (2) & (3,1) & (1,1) & (3,5) & (1,2) & (3,6) & (1,6) \\ (3) & (5,1) & (5,3) & (2,2) & (5,4) & (2,3) & (2,5) \\ (1,2) & (3,0) & (2,1) & (4,5) & (1,1) & (3,5) & (2,6) \\ (1,3) & (5,0) & (6,3) & (3,2) & (5,3) & (2,2) & (3,5) \\ (2,3) & (8,1) & (6,1) & (5,2) & (6,2) & (5,3) & (3,3) \end{array}$$
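The entries of the foraging matrix follow mechanically from the rules above, so they can be regenerated and searched for pure strategy equilibria by brute force. The following is a minimal sketch under the stated assumptions (food amounts 2, 4, 6, equal sharing, unit visiting cost); the names `FOOD`, `VISIT_COST`, `STRATEGIES`, `payoff`, and `pure_equilibria` are illustrative only.

```python
from itertools import product

FOOD = {1: 2, 2: 4, 3: 6}   # units of food available in areas 1, 2, 3
VISIT_COST = 1              # cost of visiting one area
STRATEGIES = [(1,), (2,), (3,), (1, 2), (1, 3), (2, 3)]


def payoff(own, other):
    """Food collected on the chosen areas (halved where both animals forage)
    minus the visiting costs."""
    total = 0
    for area in own:
        sharers = 2 if area in other else 1
        total += FOOD[area] / sharers - VISIT_COST
    return total


def pure_equilibria():
    """All profiles (s1, s2) where neither animal gains by a unilateral change."""
    result = []
    for s1, s2 in product(STRATEGIES, repeat=2):
        best1 = max(payoff(t, s2) for t in STRATEGIES)
        best2 = max(payoff(t, s1) for t in STRATEGIES)
        if payoff(s1, s2) == best1 and payoff(s2, s1) == best2:
            result.append((s1, s2))
    return result


# The matrix entry in row s1, column s2 is (payoff(s1, s2), payoff(s2, s1)).
matrix = {(s1, s2): (payoff(s1, s2), payoff(s2, s1))
          for s1, s2 in product(STRATEGIES, repeat=2)}
```

Because the game is symmetric, a single `payoff` function with the arguments swapped serves both animals.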
This game has three pure strategy equilibria. In the first one, both animals choose areas 2 and 3. In the second and third pure strategy equilibria, one player selects areas 1 and 3, while the other chooses areas 2 and 3. The game under consideration also admits the potential

$$P = \begin{array}{c|cccccc} & (1) & (2) & (3) & (1,2) & (1,3) & (2,3) \\ \hline (1) & 1 & 4 & 6 & 4 & 6 & 9 \\ (2) & 4 & 4 & 8 & 5 & 9 & 9 \\ (3) & 6 & 8 & 7 & 9 & 8 & 10 \\ (1,2) & 4 & 5 & 9 & 5 & 9 & 10 \\ (1,3) & 6 & 9 & 8 & 9 & 8 & 11 \\ (2,3) & 9 & 9 & 10 & 10 & 11 & 11 \end{array}$$

Theorem 3.3 Let an $n$-player game $\Gamma = \langle N, \{X_i\}_{i \in N}, \{H_i\}_{i \in N} \rangle$ have a potential $P$. Then a Nash equilibrium in the game $\Gamma$ represents a Nash equilibrium in the game $\Gamma' = \langle N, \{X_i\}_{i \in N}, P \rangle$, and vice versa. Furthermore, the game $\Gamma$ admits at least one pure strategy equilibrium.

Proof: The first assertion follows from the definition of a potential. Indeed, due to (3.1), the conditions

$$H_i(x^*_{-i}, x_i) \le H_i(x^*), \quad \forall x_i,$$

and

$$P(x^*_{-i}, x_i) \le P(x^*), \quad \forall x_i$$

coincide. Hence, if $x^*$ is a Nash equilibrium in the game $\Gamma$, it forms a Nash equilibrium in the game $\Gamma'$, and vice versa.

Now, we argue that the game $\Gamma'$ always has a pure strategy equilibrium. Let $x^*$ be the pure strategy profile maximizing the potential $P(x)$ on the set $\prod_{i=1}^{n} X_i$. For any $x \in \prod_{i=1}^{n} X_i$, the inequality $P(x) \le P(x^*)$ holds true; in particular,

$$P(x^*_{-i}, x_i) \le P(x^*), \quad \forall x_i.$$

Therefore, $x^*$ represents a Nash equilibrium in the game $\Gamma'$ and, hence, in the game $\Gamma$.

And so, if the game admits a potential, it necessarily has a pure strategy equilibrium. For instance, recall the examples of traffic jamming and animal foraging.

The Cournot oligopoly. In the previous section, we have considered the Cournot oligopoly with the payoff functions

$$H_i(x) = \Big(p - b\sum_{j=1}^{n} x_j\Big)x_i - c_i x_i, \quad i = 1, \ldots, n.$$

This game is potential, as well.
Here the potential is the function

$$P(x_1, \ldots, x_n) = \sum_{j=1}^{n} (p - c_j)x_j - b\Big(\sum_{j=1}^{n} x_j^2 + \sum_{1 \le i < j \le n} x_i x_j\Big).$$

[…] where $N = \{1, \ldots, n\}$ stands for the set of players, and $M = \{1, \ldots, m\}$ means the set of some objects for strategy formation. A strategy of player $i$ is the choice of a certain subset from $M$. The set of all feasible strategies makes the strategy set of player $i$, denoted by $S_i$, $i = 1, \ldots, n$. Each object $j \in M$ is associated with a function $c_j(k)$, $1 \le k \le n$, which represents the payoff (or costs) of each of the $k$ players that have selected strategies containing $j$. This function depends only on the total number $k$ of such players.

Imagine that players have chosen strategies $s = (s_1, \ldots, s_n)$. Each $s_i$ forms a set of objects from $M$. Then the payoff function of player $i$ is determined by the total payoff on each object:

$$H_i(s_1, \ldots, s_n) = \sum_{j \in s_i} c_j(k_j(s_1, \ldots, s_n)), \quad i = 1, \ldots, n.$$

Here $k_j(s_1, \ldots, s_n)$ gives the number of players whose strategies incorporate object $j$.

Theorem 3.4 A symmetrical congestion game is a potential game and hence admits a pure strategy equilibrium.

Proof: Consider the function

$$P(s_1, \ldots, s_n) = \sum_{j \in \cup_{i \in N} s_i} \left(\sum_{k=1}^{k_j(s_1, \ldots, s_n)} c_j(k)\right)$$

and demonstrate that this is a potential of the game. Let us verify the conditions (3.1). On the one hand,

$$H_i(s_{-i}, s'_i) - H_i(s_{-i}, s_i) = \sum_{j \in s'_i} c_j(k_j(s_{-i}, s'_i)) - \sum_{j \in s_i} c_j(k_j(s_{-i}, s_i)).$$

For all $j \in s_i \cap s'_i$, the payoffs $c_j$ in the first and second sums are identical. Therefore, writing $s = (s_{-i}, s_i)$,

$$H_i(s_{-i}, s'_i) - H_i(s_{-i}, s_i) = \sum_{j \in s'_i \setminus s_i} c_j(k_j(s) + 1) - \sum_{j \in s_i \setminus s'_i} c_j(k_j(s)).$$

On the other hand, we find

$$P(s_{-i}, s'_i) - P(s_{-i}, s_i) = \sum_{j \in \cup_{l \ne i} s_l \cup s'_i} \left(\sum_{k=1}^{k_j(s_{-i}, s'_i)} c_j(k)\right) - \sum_{j \in \cup_{l \in N} s_l} \left(\sum_{k=1}^{k_j(s_{-i}, s_i)} c_j(k)\right).$$
Under $j \notin s_i \cup s'_i$, the corresponding summands in these expressions coincide, and for $j \in s_i \cap s'_i$ the congestion $k_j$ does not change either, which means that

$$P(s_{-i}, s'_i) - P(s_{-i}, s_i) = \sum_{j \in s_i \cup s'_i} \left(\sum_{k=1}^{k_j(s_{-i}, s'_i)} c_j(k) - \sum_{k=1}^{k_j(s_{-i}, s_i)} c_j(k)\right)$$

$$= \sum_{j \in s'_i \setminus s_i} \left(\sum_{k=1}^{k_j(s_{-i}, s'_i)} c_j(k) - \sum_{k=1}^{k_j(s_{-i}, s_i)} c_j(k)\right) + \sum_{j \in s_i \setminus s'_i} \left(\sum_{k=1}^{k_j(s_{-i}, s'_i)} c_j(k) - \sum_{k=1}^{k_j(s_{-i}, s_i)} c_j(k)\right).$$

In the case of $j \in s'_i \setminus s_i$, we have $k_j(s_{-i}, s'_i) = k_j(s) + 1$; if $j \in s_i \setminus s'_i$, the equality $k_j(s_{-i}, s'_i) = k_j(s) - 1$ takes place. Consequently,

$$P(s_{-i}, s'_i) - P(s_{-i}, s_i) = \sum_{j \in s'_i \setminus s_i} \left(\sum_{k=1}^{k_j(s)+1} c_j(k) - \sum_{k=1}^{k_j(s)} c_j(k)\right) + \sum_{j \in s_i \setminus s'_i} \left(\sum_{k=1}^{k_j(s)-1} c_j(k) - \sum_{k=1}^{k_j(s)} c_j(k)\right)$$

$$= \sum_{j \in s'_i \setminus s_i} c_j(k_j(s) + 1) - \sum_{j \in s_i \setminus s'_i} c_j(k_j(s)).$$

This result matches the expression of $H_i(s_{-i}, s'_i) - H_i(s_{-i}, s_i)$. The proof of Theorem 3.4 is finished.

Thus, symmetrical congestion games admit (at least) one pure strategy equilibrium. Generally speaking, a mixed strategy equilibrium may exist, as well. In applications, the major role belongs to the existence of pure strategy equilibria. We also note that the existence of such equilibria is connected with (a) the additive form of payoff functions and (b) the homogeneous form of players' payoffs in symmetrical games.

To continue, let us explore player-specific congestion games, where different players may have different payoffs. Our analysis focuses on the case of simple strategies: each player chooses just one object from the set $M = \{1, \ldots, m\}$.

3.5 Player-specific congestion games

Definition 3.4 A player-specific congestion game is an $n$-player game $\Gamma = \langle N, M, \{c_{ij}\}_{i \in N, j \in M} \rangle$, where $N = \{1, \ldots, n\}$ designates the set of players and $M = \{1, \ldots, m\}$ specifies the finite set of objects. The strategy of player $i$ is choosing some object from $M$. Therefore, $M$ can be interpreted as the set of players' strategies.
The payoff of player $i$ selecting strategy $j$ is defined by a function $c_{ij} = c_{ij}(k_j)$, where $k_j$ denotes the number of players employing strategy $j$, $0 \le k_j \le n$. For the time being, suppose that $c_{ij}$ are non-increasing functions. In other words, the more players have chosen a given strategy, the smaller is the payoff.

Denote by $s = (s_1, \ldots, s_n)$ the strategy profile composed of strategies selected by players. Each strategy profile $s$ corresponds to the congestion vector $k = (k_1, \ldots, k_m)$, where $k_j$ is the number of players choosing strategy $j$. Then the payoff function of player $i$ is defined by

$$H_i(s_1, \ldots, s_n) = c_{i s_i}(k_{s_i}), \quad i = 1, \ldots, n.$$

We are concerned with pure strategy profiles. For such games, the definition of a pure strategy Nash equilibrium can be reformulated as follows. In an equilibrium $s^*$, no player $i$ benefits by deviating from the optimal strategy $s^*_i$. And so, the optimal strategy payoff $c_{i s^*_i}(k_{s^*_i})$ is not smaller than the payoff ensured by switching to any other strategy $j$, i.e., $c_{ij}(k_j + 1)$.

Definition 3.5 A Nash equilibrium in a game $\Gamma = \langle N, M, \{c_{ij}\}_{i \in N, j \in M} \rangle$ is a strategy profile $s^* = (s^*_1, \ldots, s^*_n)$, where the following conditions hold true for any player $i \in N$:

$$c_{i s^*_i}(k_{s^*_i}) \ge c_{ij}(k_j + 1), \quad \forall j \in M. \tag{5.1}$$

We provide another constructive concept proposed by Monderer and Shapley [1996]. It will serve to establish equilibrium existence in congestion games.

Definition 3.6 Suppose that a game $\Gamma = \langle N, M, \{c_{ij}\}_{i \in N, j \in M} \rangle$ has a sequence of strategy profiles $s(t)$, $t = 0, 1, \ldots$, where (a) each profile differs from the preceding one in a single component and (b) the payoff of a player that has modified his strategy is strictly higher. Then such a sequence is called an improvement sequence. If any improvement sequence in $\Gamma$ is finite, we say that this game meets the finite improvement property (FIP).
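Definition 3.6 can be made concrete with a short simulation. The following is a minimal sketch that builds one improvement sequence in a player-specific congestion game: at each step some player with a strictly profitable unilateral switch moves, until no such player remains. The weights `WEIGHTS` and the payoff form $c_{ij}(k) = w_{ij}/k$ are hypothetical illustration data (chosen only because they are non-increasing in $k$), objects are indexed from 0, and the step cap `max_steps` is an assumption added because improvement sequences need not terminate in general.

```python
def congestion(profile, m):
    """Congestion vector: number of players on each object 0..m-1."""
    return [profile.count(j) for j in range(m)]


def improving_move(profile, payoff, m):
    """Return (player, new_object) for some strictly profitable unilateral
    switch, or None if the profile satisfies conditions (5.1)."""
    k = congestion(profile, m)
    for i, current in enumerate(profile):
        for j in range(m):
            if j != current and payoff(i, j, k[j] + 1) > payoff(i, current, k[current]):
                return i, j
    return None


def improvement_sequence(start, payoff, m, max_steps=1000):
    """Follow improving moves from `start`; return (path, finished)."""
    profile = list(start)
    path = [tuple(profile)]
    for _ in range(max_steps):
        move = improving_move(profile, payoff, m)
        if move is None:
            return path, True        # terminal profile: a pure Nash equilibrium
        player, new_object = move
        profile[player] = new_object
        path.append(tuple(profile))
    return path, False               # cap reached without terminating


# Hypothetical data: 4 players, 2 objects, c_ij(k) = WEIGHTS[i][j] / k.
WEIGHTS = [[6, 4], [3, 5], [4, 4], [2, 7]]


def payoff(i, j, k):
    return WEIGHTS[i][j] / k


path, finished = improvement_sequence((0, 0, 0, 0), payoff, m=2)
```

The function `improving_move` picks the first profitable switch it finds; any other selection rule would also produce an improvement sequence in the sense of the definition.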
Clearly, if an improvement sequence is finite and cannot be extended, then the terminal strategy profile represents a Nash equilibrium (it meets the conditions (5.1)). However, there exist games with Nash equilibria which do not enjoy the FIP. In such games, improvement sequences can be infinite and have cyclic repetitions; the repetitions follow from the finiteness of strategy sets. Nevertheless, games $\Gamma$ with two-element strategy sets $M$ possess the FIP.

Theorem 3.5 A player-specific congestion game $\Gamma = \langle N, M, \{c_{ij}\}_{i \in N, j \in M} \rangle$, where $M = \{1, 2\}$, admits a pure strategy Nash equilibrium.

Proof: We show that a congestion game with two strategies possesses the FIP. Suppose that this is not the case. In other words, there exists an infinite improvement sequence $s(0), s(1), \ldots$. Extract its cyclic subsequence $s(0), \ldots, s(T)$, i.e., $s(0) = s(T)$ and $T > 1$. In this chain, each strategy profile $s(t)$ corresponds to a congestion vector $k(t) = (k_1(t), k_2(t))$, $t = 0, \ldots, T$. Obviously, $k_2 = n - k_1$. Find the element with the maximal value of $k_2(t)$. Without loss of generality, assume that this element is $k_2(1)$; otherwise, just renumber the elements of the sequence owing to its cyclic character. Then $k_1(1) = n - k_2(1)$ is the minimal element in the chain. And so, at the initial instant some player $i$ switches from strategy 1 to strategy 2, i.e.,

$$c_{i2}(k_2(1)) > c_{i1}(k_1(1) + 1). \tag{5.2}$$

Since $k_2(1) \ge k_2(t)$ for all $t$, the monotonicity of the payoff function implies that $c_{i2}(k_2(1)) \le c_{i2}(k_2(t))$, $t = 0, \ldots, T$. On the other hand, $c_{i1}(k_1(1) + 1) \ge c_{i1}(k_1(t) + 1)$, $t = 0, \ldots, T$. In combination with (5.2), this leads to the inequality

$$c_{i2}(k_2(t)) > c_{i1}(k_1(t) + 1), \quad t = 0, \ldots, T,$$

i.e., player $i$ strictly prefers strategy 2 throughout and never switches back. But at the initial instant $t = 0$, and hence at the instant $t = T$, he applied strategy 1.

The resulting contradiction indicates that a congestion game with two strategies necessarily enjoys the FIP. Consequently, such a game has a pure strategy equilibrium profile.

We emphasize a relevant aspect.
The proof of Theorem 3.5 rests on the fact that the maximal congestion of one strategy corresponds to the minimal congestion of the other. Generally speaking, this fails even for games with three strategies. Congestion games with three or more strategies may violate the FIP.

A congestion game without the FIP. Consider a two-player congestion game with three strategies and the payoff matrix

$$\begin{pmatrix} (0, 4) & (5, 6) & (5, 3) \\ (4, 5) & (3, 1) & (4, 3) \\ (2, 5) & (2, 6) & (1, 2) \end{pmatrix}$$

This game has the infinite cyclic improvement sequence

$$(1, 1) \to (3, 1) \to (3, 2) \to (2, 2) \to (2, 3) \to (1, 3) \to (1, 1).$$

Therefore, it does not satisfy the FIP. Still, there exist two (!) pure strategy equilibrium profiles: (1, 2) and (2, 1).

To establish equilibrium existence in the class of pure strategies in the general case, we introduce a stronger solution improvement condition. Notably, assume that each player in an improvement sequence chooses the best response under a given strategy profile (ensuring his maximal payoff). If there are several best responses, a player selects one of them. Such an improvement sequence will be called a best response sequence.

Definition 3.7 Suppose that a game $\Gamma = \langle N, M, \{c_{ij}\}_{i \in N, j \in M} \rangle$ admits a sequence of strategy profiles $s(t)$, $t = 0, 1, \ldots$ such that (a) each profile differs from the preceding one in a single component and (b) the payoff of a player that has modified his strategy is strictly higher and is the maximal payoff available to him in this strategy profile. Such a sequence is called a best response sequence. If any best response sequence in the game $\Gamma$ is finite, we say that this game meets the finite best-reply property (FBRP).

Evidently, any best response sequence forms an improvement sequence. The converse statement is false. Now, we prove the basic result.

Theorem 3.6 A player-specific congestion game $\Gamma = \langle N, M, \{c_{ij}\}_{i \in N, j \in M} \rangle$ has a pure strategy Nash equilibrium.
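Before turning to the proof, the counterexample above can be checked directly. The following is a minimal sketch that takes the payoff matrix and the cycle exactly as printed, verifies that every step of the cycle is an improvement step in the sense of Definition 3.6, and enumerates the pure strategy equilibria; the names `PAYOFF`, `CYCLE`, `pay`, `is_improvement_step`, and `is_pure_equilibrium` are illustrative only.

```python
# PAYOFF[r][c] = (payoff of player I, payoff of player II) when player I
# plays strategy r + 1 and player II plays strategy c + 1.
PAYOFF = [
    [(0, 4), (5, 6), (5, 3)],
    [(4, 5), (3, 1), (4, 3)],
    [(2, 5), (2, 6), (1, 2)],
]
CYCLE = [(1, 1), (3, 1), (3, 2), (2, 2), (2, 3), (1, 3), (1, 1)]
STRATEGIES = (1, 2, 3)


def pay(profile, player):
    """Payoff of `player` (0 for player I, 1 for player II) in `profile`."""
    r, c = profile
    return PAYOFF[r - 1][c - 1][player]


def is_improvement_step(old, new):
    """True if exactly one player changed strategy and strictly gained."""
    changed = [p for p in (0, 1) if old[p] != new[p]]
    return len(changed) == 1 and pay(new, changed[0]) > pay(old, changed[0])


def is_pure_equilibrium(profile):
    """True if no player has a strictly profitable unilateral deviation."""
    r, c = profile
    return (all(pay((t, c), 0) <= pay(profile, 0) for t in STRATEGIES)
            and all(pay((r, t), 1) <= pay(profile, 1) for t in STRATEGIES))


cycle_ok = all(is_improvement_step(a, b) for a, b in zip(CYCLE, CYCLE[1:]))
equilibria = [(r, c) for r in STRATEGIES for c in STRATEGIES
              if is_pure_equilibrium((r, c))]
```

Note that the first step of the cycle, (1, 1) → (3, 1), is an improvement for player I but not his best response (strategy 2 would give him more), which is the distinction that Definition 3.7 draws.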
Proof: Apply induction on the number of players. For $n = 1$, the assertion is trivial: the player chooses the best strategy from $M$. Assume that the result holds for $n - 1$ players and prove it for $n$ players.

Consider an $n$-player game $\Gamma = \langle N, M, \{c_{ij}\}_{i \in N, j \in M} \rangle$. First, eliminate player $n$ from further analysis. In the reduced game $\Gamma'$ with $n - 1$ players and $m$ strategies, the induction hypothesis implies that there exists an equilibrium $s' = (s_1(0), \ldots, s_{n-1}(0))$. Denote by $k'(0) = (k'_1(0), \ldots, k'_m(0))$ the corresponding congestion vector. Then

$$c_{i s_i(0)}(k'_{s_i(0)}) \ge c_{ij}(k'_j + 1), \quad \forall j \in M, \ i = 1, \ldots, n - 1.$$

Return to the game $\Gamma$ and let player $n$ choose the best response (the strategy $j(0) = s_n(0)$). Consequently, just one component varies in the congestion vector $k' = (k'_1, \ldots, k'_m)$ (component $j(0)$ increases by unity).

Now, construct the best response sequence. The initial term of the sequence takes the form $s(0) = (s_1(0), \ldots, s_{n-1}(0), s_n(0))$. The corresponding congestion vector will be designated by $k(0) = (k_1(0), \ldots, k_m(0))$. In the strategy profile $s(0)$, payoffs possibly decrease only for players having the strategy $j(0)$; the rest of the players obtain the same payoff and do not benefit by modifying their strategy. If no player applying the strategy $j(0)$ can guarantee a higher payoff by another strategy, an equilibrium in the game $\Gamma$ is achieved. Otherwise, suppose that player $i_1$ (applying the strategy $j(0)$) can do so. Select his best response $j(1)$ and denote by $s(1)$ the new strategy profile. In the corresponding congestion vector $k(1)$, component $j(0)$ decreases by unity and component $j(1)$ increases by unity. Under the new strategy profile $s(1)$, payoffs can be improved only by players adhering to the strategy $j(1)$. The rest of the players gain the same payoffs (in comparison with the original strategy profile). Assume that player $i_2$ can improve his payoff. Choose his best response $j(2)$, and continue the procedure.
Therefore, we have built the best response sequence $s(t)$. It corresponds to a sequence of congestion vectors $k(t)$, where any component $k_j(t)$ either equals the value at the initial instant $k'_j(0)$, or exceeds it by unity. The latter situation occurs for $j = j(t)$, the strategy to which player $i_t$ switches at the instant $t - 1$. For every other strategy $j \ne j(t)$, the number of players employing it constitutes $k'_j(0)$.

Interestingly, each player can switch to another strategy just once. Indeed, imagine that at the instant $t - 1$ player $i_t$ switches to the strategy $j$; then at the instant $t$ this strategy is adopted by the maximal number of players. Hence, at the subsequent instants the number of players with such strategy remains the same or even goes down (accordingly, the number of players choosing other strategies remains the same or goes up). Due to the monotonicity of payoff functions, player $i_t$ is unable to get a higher payoff.

The game involves a finite number of players; hence, the resulting best response sequence is finite: $s(t)$, $t = 1, 2, \ldots, T$, where $T \le n$. There may exist several sequences of this form. Among them, take the one $s(t)$, $t = 1, \ldots, T$ with the maximal value of $T$.

Finally, demonstrate that the last strategy profile $s(T) = (s_1(T), \ldots, s_n(T))$ is a Nash equilibrium. We have mentioned that players who deviated from their original strategies cannot increase their payoffs in the strategy profile $s(T)$. And so, consider players preserving their strategies during the period of $T$. Suppose that, among them, there is a player belonging to the group with the strategy $j(T)$; if he could improve his payoff, we would extend the best response sequence to the instant $T + 1$. However, this contradicts the maximality of $T$. Assume that, among them, there is a player belonging to the group with a strategy $j \ne j(T)$. The number of players in this group is the same as at the initial instant (see the discussion above). And this player is unable to increase his payoff, as well.
Therefore, we have argued that, under the strategy profile $s(T) = (s_1(T), \ldots, s_n(T))$, any player $i = 1, \ldots, n$ meets the conditions

$$c_{i s_i(T)}(k_{s_i(T)}(T)) \ge c_{ij}(k_j(T) + 1), \quad \forall j \in M.$$

Consequently, $s(T)$ represents a Nash equilibrium for $n$ players. The proof of Theorem 3.6 is finished.

3.6 Auctions

We analyze non-cooperative $n$-player games in the class of mixed strategies. The present section deals with models of auctions. For simplicity, consider the symmetrical case when all $n$ players are in identical conditions. An auction offers for sale some item possessing the same value $V$ for all players. Players simultaneously bid for the item (suggest prices $(x_1, \ldots, x_n)$, respectively). The item passes to the player announcing the highest price. As a matter of fact, there are different schemes of auctions. We will study first-price auctions and second-price auctions.

First-price auction. Imagine the following auction rules. A winner (a player suggesting the maximal price) gets the item and actually pays nothing. The rest of the players have to pay the price they have announced (for participation). If several players bid the maximal price, they share the payoff equally. And so, the payoff function of this game acquires the form

$$H_i(x_1, \ldots, x_n) = \begin{cases} -x_i, & \text{if } x_i < y_{-i}, \\ \dfrac{V}{m_i(x)} - x_i, & \text{if } x_i = y_{-i}, \\ V, & \text{if } x_i > y_{-i}, \end{cases} \tag{6.1}$$

where $y_{-i} = \max_{j \ne i}\{x_j\}$ and $m_i(x)$ is the number of players whose bids coincide with $x_i$, $i = 1, \ldots, n$.

Obviously, the game admits no pure strategy equilibrium, and we search in the class of mixed strategies. By virtue of symmetry, consider player 1 only. Suppose that players $\{2, \ldots, n\}$ apply the same mixed strategy with a distribution function $F(x)$, $x \in [0, \infty)$. The payoff of player 1 depends on the distribution of $y_{-1} = \max\{x_2, \ldots, x_n\}$. Clearly, the distribution of this maximum is simply $F_{n-1}(x) = [F(x)]^{n-1}$. Accordingly, the bid of player 1 turns out maximal with the probability of $[F(x)]^{n-1}$, and he gains the payoff $V$.
Another player announces a higher price with the probability of $1 - [F(x)]^{n-1}$, and player 1 would have to pay $x$. Under his pure strategy $x$, player 1 has the payoff function

$$H_1(x, \underbrace{F, \ldots, F}_{n-1}) = V[F(x)]^{n-1} - x\left(1 - [F(x)]^{n-1}\right) = (V + x)[F(x)]^{n-1} - x. \tag{6.2}$$

The following sufficient condition guarantees that the strategy profile $(F(x), \ldots, F(x))$ forms an equilibrium:

$$H_1(x, \underbrace{F, \ldots, F}_{n-1}) = \text{const} \quad \text{or} \quad \partial H_1(x, \underbrace{F, \ldots, F}_{n-1})/\partial x = 0.$$

The last expression leads to the differential equation

$$\frac{dF_{n-1}(x)}{dx} = \frac{1 - F_{n-1}(x)}{V + x}, \quad 0 \le x < \infty,$$

with the boundary condition $F_{n-1}(0) = 0$. Here integration yields

$$F_{n-1}(x) = \frac{x}{V + x}.$$

Hence, the optimal mixed strategy is defined by

$$F^*(x) = \left(\frac{x}{V + x}\right)^{1/(n-1)},$$

while the density function of this distribution becomes

$$f^*(x) = \frac{1}{n-1}\left(\frac{x}{V + x}\right)^{-\frac{n-2}{n-1}} \frac{V}{(V + x)^2}.$$

Substitute the derived distribution into (6.2) to find $H_1(x, \underbrace{F^*, \ldots, F^*}_{n-1}) = 0$ for any $x \ge 0$. Therefore, player 1 receives zero payoff regardless of his strategy. And so, the game has zero value.

Theorem 3.7 A first-price auction with the payoff function (6.1) admits the mixed strategy equilibrium

$$F^*(x) = \left(\frac{x}{V + x}\right)^{1/(n-1)},$$

and the game value is zero.

Second-price auction. Here the losing players pay their announced prices for participation in an auction, while a winner pays merely the second highest price. Such auctions are called Vickrey auctions. If several players make the maximal bid, they share $V$ equally. Therefore, the payoff function takes the form

$$H_i(x_1, \ldots, x_n) = \begin{cases} -x_i, & \text{if } x_i < y_{-i}, \\ \dfrac{V}{m_i} - x_i, & \text{if } x_i = y_{-i}, \\ V - y_{-i}, & \text{if } x_i > y_{-i}, \end{cases} \tag{6.3}$$

where $y_{-i} = \max_{j \ne i}\{x_j\}$ and $m_i$ have the same interpretations as in the first-price auction model.

Unfortunately, Vickrey auctions admit no pure strategy equilibria. If all bids do not exceed $V$, one should increase the bid as much as possible; however, if at least one bid is higher than $V$, it is necessary to bid zero. Let us evaluate a mixed strategy equilibrium.
Again, symmetry enables considering just player 1. Suppose that players $\{2, \ldots, n\}$ adopt the same mixed strategy with some distribution function $F(x)$, $x \in [0, \infty)$. The payoff of player 1 depends on the distribution of the variable $y_{-1} = \max\{x_2, \ldots, x_n\}$. Recall that its distribution function is simply $F_{n-1}(x) = [F(x)]^{n-1}$ (see the discussion above). Now, we express the payoff of player 1 under the pure strategy $x$:
\[ H_1(x, \underbrace{F, \ldots, F}_{n-1}) = \int_0^x (V - t)\,dF_{n-1}(t) - \int_x^\infty x\,dF_{n-1}(t). \]
Since the support of $F(x)$ is $[0, \infty)$, the sufficient condition of equilibrium existence ($H_1(x, F, \ldots, F) = \mathrm{const}$ or, equivalently, $\partial H_1(x, F, \ldots, F)/\partial x = 0$) naturally leads to the differential equation
\[ \frac{dF_{n-1}(x)}{dx} = \frac{1 - F_{n-1}(x)}{V}. \]
Its general solution possesses the form
\[ F_{n-1}(x) = 1 - c\exp\left(-\frac{x}{V}\right). \]
Since $F(0) = 0$, we find $F_{n-1}(x) = 1 - \exp(-x/V)$. And so, the function $F^*(x)$ is defined by
\[ F^*(x) = \left(1 - \exp\left(-\frac{x}{V}\right)\right)^{\frac{1}{n-1}}. \tag{6.4} \]
Therefore, if players $\{2, \ldots, n\}$ adhere to the mixed strategy $F^*(x)$, the payoff of player 1 turns out constant: $H_1(x, F^*, \ldots, F^*) = 0$. Regardless of the strategy selected by player 1, his payoff in this strategy profile equals zero. This means optimality of the strategies $F^*(x)$.

Theorem 3.8 Consider a second-price auction with the payoff function (6.3). An equilibrium consists in the mixed strategies
\[ F^*(x) = \left(1 - \exp\left(-\frac{x}{V}\right)\right)^{\frac{1}{n-1}}. \]

For $n = 2$, the density function of (6.4) takes the form $f^*(x) = V^{-1}e^{-x/V}$. In the case of $n \ge 3$, we obtain
\[ f^*(x) = \frac{1}{n-1}\left(1 - e^{-x/V}\right)^{\frac{1}{n-1} - 1} \cdot \frac{1}{V}e^{-x/V} \to \begin{cases} +\infty, & \text{if } x \downarrow 0, \\ 0, & \text{if } x \uparrow \infty. \end{cases} \]
Despite the slight difference in the conditions of the above auctions, the corresponding optimal strategies vary appreciably. In the former case, the distribution is built from power functions, whereas the latter case yields an exponential distribution. Surprisingly, both optimal strategies can produce bids exceeding the value $V$ if $n = 2$.
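The zero-payoff property stated in Theorems 3.7 and 3.8 can be checked numerically. The following Python sketch is only an illustration under stated assumptions: the function names are hypothetical, the value V = 2 is an arbitrary choice, and the integral in the second-price payoff is approximated by a plain midpoint rule. It evaluates the payoff of player 1 for a pure bid x when the highest rival bid follows the equilibrium distribution, that is, x/(V + x) in the first-price model and 1 - exp(-x/V) in the second-price model.

```python
import math


def first_price_payoff(x, V):
    """Payoff (6.2) of player 1 for a pure bid x (hypothetical helper).

    G is the distribution function of the highest rival bid in equilibrium,
    G(x) = [F*(x)]^(n-1) = x / (V + x); it does not depend on n.
    """
    G = x / (V + x)
    return (V + x) * G - x


def second_price_payoff(x, V, steps=20000):
    """Payoff of player 1 in the second-price model for a pure bid x.

    The highest rival bid has distribution G(t) = 1 - exp(-t / V) with
    density exp(-t / V) / V. The winning part of the payoff is the integral
    of (V - t) against this density over [0, x], approximated here by the
    midpoint rule; the losing part is x * (1 - G(x)).
    """
    h = x / steps
    win = 0.0
    for i in range(steps):
        t = (i + 0.5) * h
        win += (V - t) * math.exp(-t / V) / V
    win *= h
    lose = x * math.exp(-x / V)
    return win - lose


if __name__ == "__main__":
    V = 2.0  # arbitrary illustrative value of the item
    for x in (0.5, 1.0, 3.0, 10.0):
        print(x, first_price_payoff(x, V), second_price_payoff(x, V))
```

Both payoffs come out as zero up to rounding and quadrature error for every bid tried, in line with the claim that any pure bid is a best response to the equilibrium mixed strategy.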
In first-price auctions, the probability of a bid exceeding $V$ for $n = 2$ makes up $1 - F^*(V) = 1 - (1/2)^{1/(n-1)} = 0.5$. In second-price auctions, it is smaller: $1 - F^*(V) = 1 - (1 - \exp(-1))^{1/(n-1)} \approx 0.368$.

3.7 Wars of attrition

Actually, there exists another biological interpretation of the game studied in Section 3.5. This model is close to the animal competition model for some resource $V$, suggested by the British biologist J. Maynard Smith. Assume that $V = V(x)$, a positive decreasing function of $x$, represents a certain resource on a given area. Next, $n$ animals (players) struggle for this resource. The game runs on the unit interval. Animal $i$ shows its strength for a specific period $x_i \in [0, 1]$, $i = 1, \ldots, n$. The resource is captured by the animal with the longest strength period. The costs of players are proportional to their strength periods, and the winner's costs coincide with the period when the last competitor "leaves the battlefield."

We seek a mixed strategy equilibrium in the form of distribution functions
\[ F(x) = I(0 \le x < a)\int_0^x h(t)\,dt + I(a \le x \le 1), \]
where $a$ is some value from $[0, 1]$ and $I_A$ denotes the indicator of event $A$. Imagine that all players $\{2, \ldots, n\}$ adopt the same strategy $F$, while player 1 chooses a pure strategy $x \in [0, 1]$. His expected payoff becomes
\[ H_1(x, \underbrace{F, \ldots, F}_{n-1}) = \begin{cases} \displaystyle\int_0^x (V(x) - t)\,d(F(t))^{n-1} - x\left\{1 - (F(x))^{n-1}\right\}, & \text{if } 0 \le x < a, \\ \displaystyle\int_0^a (V(x) - t)\,d(F(t))^{n-1}, & \text{if } a < x \le 1, \end{cases} \tag{7.1} \]
where $t$ indicates the instant when the second strongest player leaves the battlefield.

Let
\[ Q(x) = V(x)(F(x))^{n-1}, \quad \text{for } 0 < x < a. \tag{7.2} \]
Under $0 < x < a$, formula (7.1) can be rewritten as
\[ H_1(x, F, \ldots, F) = Q(x) - \int_0^x t\,d(F(t))^{n-1} - x\left\{1 - \frac{Q(x)}{V(x)}\right\} = Q(x) + \int_0^x \frac{Q(t)}{V(t)}\,dt - x. \tag{7.3} \]
The condition $\partial H_1/\partial x = 0$ yields the linear differential equation
\[ Q'(x) + \frac{Q(x)}{V(x)} = 1, \quad Q(0) = 0. \tag{7.4} \]
Its solution is determined by
\[ Q(x) = e^{-\int (V(x))^{-1}dx}\left[\int e^{\int (V(x))^{-1}dx}\,dx + c\right], \tag{7.5} \]
where $c$ designates an arbitrary constant.

For instance, set $V(x) = 1 - x$, $0 \le x \le 1$. In this case, we have
\[ Q(x) = (1 - x)\left[\int_0^x \frac{dt}{1 - t} + c\right] = (1 - x)\bigl(-\log(1 - x) + c\bigr). \]
The boundary condition $Q(0) = 0$ implies that $c = 0$; hence,
\[ Q(x) = -(1 - x)\log(1 - x). \tag{7.6} \]
In combination with (7.2), this leads to
\[ F(x) = \bigl(-\log(1 - x)\bigr)^{\frac{1}{n-1}}, \quad 0 \le x \le a. \tag{7.7} \]
The above function is increasing, with $F(0) = 0$ and $F(a) = \bigl(-\log(1 - a)\bigr)^{\frac{1}{n-1}}$. The condition $F(a) = 1$ yields $a = 1 - e^{-1} \approx 0.63212$.

For $F(x)$ defined by (7.7), the payoff (7.1)–(7.3) of player 1 equals
\[ H_1(x, F, \ldots, F) = -(1 - x)\log(1 - x) + \int_0^x \bigl(-\log(1 - t)\bigr)\,dt - x = 0 \]
within the interval $0 < x < a$. Indeed, the second term in the right-hand side is given by $(1 - x)\log(1 - x) + x$ (use the expression $\int (1 + \log(1 - t))\,dt = -(1 - t)\log(1 - t)$). In the case of $a < x \le 1$, the function $H_1(x, F, \ldots, F)$ decreases in $x$ according to (7.1). Consequently, if the choice of $F^*(x)$ meets (7.4), then
\[ H_1(F, F^*, \ldots, F^*) \le H_1(F^*, F^*, \ldots, F^*) = 0 \quad \text{for any distribution function } F(x). \]
In other words, we finally arrive at the following result.

Theorem 3.9 Consider a war of attrition with the resource $V(x) = 1 - x$. A Nash equilibrium is achieved in the class of mixed strategies
\[ F^*(x) = I(0 \le x \le a)\bigl(-\log(1 - x)\bigr)^{\frac{1}{n-1}} + I(a < x \le 1), \]
with zero payoff for each player. Here $a = 1 - e^{-1}$ $(\approx 0.632)$.

[Figure 3.4 The solution under n = 2 and n = 3, V(x) = 1 − x. Curves: $f_2^*(x)$ and $f_3^*(x)$. Notation: $b = 1 - e^{-1/4} \approx 0.221$, $c = 1 - e^{-1/2} \approx 0.393$, and $a \approx 0.632$.]

For instance, under $n = 2$, the optimal density function becomes $f_2^*(x) = 1/(1 - x)$. In the case of $n = 3$, we accordingly obtain
\[ f_3^*(x) = \frac{1}{2(1 - x)\bigl(-\log(1 - x)\bigr)^{1/2}} \to \begin{cases} +\infty, & \text{if } x \downarrow 0, \\ e/2 \approx 1.359, & \text{if } x \uparrow a. \end{cases} \]
Their curves are illustrated in Figure 3.4. Interestingly, the form of mixed strategies changes drastically.
If $n = 2$, with higher probability one should struggle for the resource as long as possible. As the number of opponents grows, with higher probability one should immediately leave the battlefield.

Similar argumentation serves to establish a more general result.

Theorem 3.10 For $V(x) = \frac{1}{k}(1 - x)$, $0 < k \le 1$, a Nash equilibrium is achieved in the class of mixed strategies
\[ F^*(x) = \left[\frac{k}{1 - k}\left\{(1 - x)^{k-1} - 1\right\}\right]^{\frac{1}{n-1}}, \quad 0 \le x < a, \]
where $a$ stands for the unique root of the equation
\[ -(1 - k)\log(1 - a) = -\log k \]
within the interval $(0, 1)$. Furthermore, each player has zero optimal payoff.

Note that
\[ \lim_{k \to 1-0} \frac{(1 - x)^{k-1} - 1}{1 - k} = -\log(1 - x) \]
and, hence, $\lim_{k \to 1-0} F^*(x) = \bigl(-\log(1 - x)\bigr)^{\frac{1}{n-1}}$.

3.8 Duels, truels, and other shooting accuracy contests

Consider shooting accuracy contests involving $n$ players. It is required to hit some target (in the special case, an opponent). Each player has one bullet and can shoot at any instant from the interval $[0, 1]$. Starting at the instant $t = 0$, he moves to the target and can reach it at the instant $t = 1$; the player must shoot at the target at some instant. Let $A(t)$ be the probability of hitting the target provided that shooting occurs at instant $t \in [0, 1]$. We assume that the function $A(t)$ is differentiable, $A'(t) > 0$, $A(0) = 0$ and $A(1) = 1$. The payoff of a player makes up 1 if he successfully hits the target earlier than the opponents (and 0 otherwise). The payoff of several players simultaneously hitting the target equals 0. Each player strives for a strategy maximizing his expected payoff.

The following assumption seems natural owing to problem symmetry: the optimal strategies of all players coincide in an equilibrium. Suppose that all players choose the same mixed strategy with a distribution function $F(t)$ and density function $f(t)$, $a \le t \le 1$, where $a \in [0, 1]$ is a parameter.
If player 1 shoots at instant $x$ and the other players apply the mixed strategy $F(t)$, his expected payoff becomes
\[ H_1(x, \underbrace{F, \ldots, F}_{n-1}) = \begin{cases} A(x), & \text{if } 0 \le x < a, \\ A(x)\left[1 - \displaystyle\int_a^x A(t)f(t)\,dt\right]^{n-1}, & \text{if } a \le x \le 1. \end{cases} \tag{8.1} \]
Indeed, under $a \le x \le 1$, player 1 obtains the payoff of 1 only if he hits the target and each of the opponents $2, \ldots, n$ either has not shot yet or shot before the instant $x$ but missed the target. Let $v$ be the optimal payoff common for all players. Then the sufficient condition of an equilibrium takes the form
\[ H_1(x, F, \ldots, F) \begin{cases} \le v, & \text{for } 0 \le x < a, \\ = v, & \text{for } a \le x \le 1. \end{cases} \tag{8.2} \]
In the case of $a \le x \le 1$, apply the first-order necessary optimality conditions to (8.1) to obtain the differential equation
\[ \frac{f'(x)}{f(x)} = -\frac{2n-1}{n-1}\,\frac{A'(x)}{A(x)} + \frac{A''(x)}{A'(x)}. \tag{8.3} \]
Integration from $a$ to $x$ yields
\[ \frac{f(x)}{f(a)} = \frac{A'(x)}{A'(a)}\left(\frac{A(x)}{A(a)}\right)^{-\frac{2n-1}{n-1}}, \tag{8.4} \]
whence it appears that
\[ f(x) = c\,(A(x))^{-\frac{2n-1}{n-1}}A'(x). \tag{8.5} \]
The condition $\int_a^1 f(t)\,dt = 1$ gives
\[ c^{-1} = \int_a^1 (A(x))^{-\frac{2n-1}{n-1}}A'(x)\,dx = \frac{n-1}{n}\left[(A(a))^{-\frac{n}{n-1}} - 1\right]. \tag{8.6} \]
The condition (8.2) on the interval $a \le x \le 1$ requires that
\[ A(x)\left[1 - \int_a^x A(t)f(t)\,dt\right]^{n-1} \equiv v. \]
After some simplifications, this result and formula (8.5) lead to the equality
\[ c(n-1)\left[(A(a))^{-\frac{1}{n-1}} - (A(x))^{-\frac{1}{n-1}}\right] = 1 - v^{\frac{1}{n-1}}(A(x))^{-\frac{1}{n-1}}, \quad \forall x \in [a, 1]. \tag{8.7} \]
Eliminate $c$ according to (8.6) to derive the equality
\[ (A(a))^{-\frac{1}{n-1}} - (A(x))^{-\frac{1}{n-1}} = \frac{1}{n}\left[1 - \left(\frac{v}{A(x)}\right)^{\frac{1}{n-1}}\right]\left[(A(a))^{-\frac{n}{n-1}} - 1\right], \quad \forall x \in (a, 1). \tag{8.8} \]
Since (8.8) must hold for all $x$, the constant terms and the coefficients of $(A(x))^{-\frac{1}{n-1}}$ on both sides must coincide. Hence, the following expressions must be valid:
\[ (A(a))^{-\frac{n}{n-1}} - n(A(a))^{-\frac{1}{n-1}} - 1 = 0 \quad \text{and} \quad v^{\frac{1}{n-1}}\left[(A(a))^{-\frac{n}{n-1}} - 1\right] = n. \tag{8.9} \]
These equations yield $v^{-\frac{1}{n-1}} = (A(a))^{-\frac{1}{n-1}}$, and $v = A(a)$. Moreover, by multiplying both sides of the first equation in (8.9) by $(A(a))^{\frac{n}{n-1}}$, we arrive at the equation
\[ (A(a))^{\frac{n}{n-1}} + nA(a) - 1 = 0. \tag{8.10} \]
And finally, it suffices to establish the condition $H_1(x, F, \ldots, F) \le v$, $\forall x \in [0, a]$.
It holds true, since $A(x) \le A(a) = v$, $\forall x \in [0, a)$ due to the above assumptions. The stated reasoning immediately leads to the following result.

Theorem 3.11 Let $\alpha_n$ be the unique root of the equation
\[ \alpha^{\frac{n}{n-1}} + n\alpha - 1 = 0 \tag{8.11} \]
within the interval $[0, 1]$. Then the game admits the mixed strategy Nash equilibrium
\[ f^*(x) = \frac{1}{n-1}(\alpha_n)^{\frac{1}{n-1}}(A(x))^{-\frac{2n-1}{n-1}}A'(x), \quad \text{for } A^{-1}(\alpha_n) = a_n \le x \le 1. \tag{8.12} \]
In the equilibrium, the optimal payoffs of players constitute $\alpha_n$.

Readers can make a series of observations. First, the optimal payoff of players ($\alpha_n$) is independent of the shooting accuracy function $A(t)$. Second, the initial point $a$ of the optimal strategy support depends on $A(t)$. Furthermore, formula (8.12) implies that the probability of a draw (i.e., all players gain nothing) becomes $(\alpha_n)^{\frac{n}{n-1}}$.

In the case of $n = 2$ (a duel), the expected payoff equals $\alpha_n = \sqrt{2} - 1 \approx 0.414$. In a truel ($n = 3$), this quantity is $\alpha_n \approx 0.283$. The interval of the distribution support depends on the form of the shooting accuracy function.

Example 3.1 Select $A(x) = x^\gamma$, $\gamma > 0$. Then $a_n = A^{-1}(\alpha_n) = \alpha_n^{1/\gamma}$, and the optimal strategy possesses the density function
\[ f^*(x) = \frac{\gamma}{n-1}(\alpha_n)^{\frac{1}{n-1}}x^{-\left(\frac{n}{n-1}\gamma + 1\right)}, \quad \text{for } \alpha_n^{1/\gamma} \le x \le 1. \]
If $\gamma = 1$ and $n = 2$ (a duel), we have $a_n = \alpha_n = \sqrt{2} - 1$. In other words, players should shoot after the instant 0.414. For any $n \ge 2$, the quantity $a_n$ increases if the parameter $\gamma$ does. This fact agrees with the following intuitive expectation: the lower the shooting accuracy of a player, the later he should shoot.

Example 3.2 Now, choose $A(x) = \dfrac{e^x - 1}{e - 1}$. Consequently, $a_n = A^{-1}(\alpha_n) = \log\{1 + (e - 1)\alpha_n\}$. Hence, $a_n$ decreases if $n$ goes up. In the case of a duel ($n = 2$), $a_n = \log\{(\sqrt{2} - 1)(e + \sqrt{2})\} \approx 0.537$. For a truel ($n = 3$), we obtain $a_n \approx 0.396$. The optimal strategies are defined by the density function
\[ f^*(x) = \frac{1}{n-1}(\alpha_n)^{\frac{1}{n-1}}(e - 1)^{\frac{n}{n-1}}(e^x - 1)^{-\frac{2n-1}{n-1}}e^x, \quad \text{for } a_n \le x \le 1. \]
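The quantities in Theorem 3.11 and Examples 3.1 and 3.2 are easy to reproduce numerically. The sketch below is an illustration only: the function names are hypothetical, and bisection is an assumed (not the author's) way of solving equation (8.11). It relies on the fact that the left-hand side of (8.11) is increasing in $\alpha$, equals $-1$ at $\alpha = 0$ and $n$ at $\alpha = 1$.

```python
import math


def alpha(n, tol=1e-12):
    """Root of alpha**(n/(n-1)) + n*alpha - 1 = 0 on [0, 1], by bisection."""
    def g(a):
        return a ** (n / (n - 1)) + n * a - 1

    lo, hi = 0.0, 1.0  # g(0) = -1 < 0 and g(1) = n > 0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if g(mid) > 0:
            hi = mid
        else:
            lo = mid
    return (lo + hi) / 2


def start_power(n, gamma):
    """Left end a_n of the support for A(x) = x**gamma (Example 3.1)."""
    return alpha(n) ** (1 / gamma)


def start_exponential(n):
    """Left end a_n of the support for A(x) = (e**x - 1)/(e - 1) (Example 3.2)."""
    return math.log(1 + (math.e - 1) * alpha(n))


if __name__ == "__main__":
    for n in (2, 3):
        print(n, alpha(n), start_power(n, 1.0), start_exponential(n))
```

For n = 2 the computed root agrees with $\sqrt{2} - 1$, and the values for the exponential accuracy function agree with the rounded figures 0.537 and 0.396 quoted in Example 3.2.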
3.9 Prediction games

Imagine that $n$ players endeavor to predict the value $u$ of a random variable $U$ which has the uniform distribution $U[0,1]$ on the interval $[0, 1]$. The game is organized as follows. The winner is the player who predicts the value closest to $u$ (but not exceeding the latter). His payoff makes up 1, whereas the remaining $n - 1$ players gain nothing. Each player strives to maximize his expected payoff.

We search for an equilibrium in the form of distributions whose support belongs to some interval $[0, a]$, $a \le 1$. Notably, let
\[ G(x) = I(x < a)\int_0^x g(t)\,dt + I(x \ge a). \]
Suppose that player 1 predicts $x$ and his opponents choose the mixed strategies with the distribution function $G(t)$ and density function $g(t)$. Then the expected payoff of player 1 is defined by
\[ H_1(x, \underbrace{G, \ldots, G}_{n-1}) = 1 - x, \quad \text{if } a < x < 1. \tag{9.1} \]
According to the conditions, for $0 < x < a$ we have
\[ H_1(x, G, \ldots, G) = (G(x))^{n-1}(1 - x) + \sum_{k=1}^{n-1}\binom{n-1}{k}k\,(G(x))^{n-1-k}\int_x^a g(t)\bigl(1 - G(t)\bigr)^{k-1}(t - x)\,dt, \tag{9.2} \]
since $k$ players ($1 \le k \le n - 1$) can predict higher values than $x$, and the remaining $n - 1 - k$ players predict smaller values than $x$. The density function of the random variable $\min(X_1, \ldots, X_k)$ takes the form $k\bigl(1 - G(t)\bigr)^{k-1}g(t)$.

Integration by parts yields the equality
\[ \int_x^a (t - x)\bigl(1 - G(t)\bigr)^{k-1}g(t)\,dt = \frac{1}{k}\int_x^a \bigl(1 - G(t)\bigr)^k\,dt. \]
And so, we rewrite (9.2) as
\[ H_1(x, \underbrace{G, \ldots, G}_{n-1}) = (G(x))^{n-1}(1 - x) + \sum_{k=1}^{n-1}\binom{n-1}{k}(G(x))^{n-1-k}\int_x^a \bigl(1 - G(t)\bigr)^k\,dt, \tag{9.3} \]
provided that $0 < x < a$.

Denote by $v$ the optimal expected payoff of each player. We address the mixed equilibrium condition for $G(x)$:
\[ H_1(x, G, \ldots, G) \begin{cases} \equiv v, & \text{for } 0 \le x < a, \\ \le v, & \text{for } a < x \le 1. \end{cases} \tag{9.4} \]
Using (9.3)–(9.4), transform the equation $\frac{\partial}{\partial x}H_1(x, G, \ldots, G) = 0$ on the interval $0 \le x < a$.
Divide both sides of the equation by $(G(x))^{n-1}$ and perform some simplifying operations to get the equation
\[ 1 + \sum_{k=1}^{n-1}\binom{n-1}{k}\left(\frac{1 - G(x)}{G(x)}\right)^k = \frac{g(x)}{G(x)}\left[(n-1)(1 - x) + \sum_{k=1}^{n-1}\binom{n-1}{k}(n-1-k)\int_x^a\left(\frac{1 - G(t)}{G(x)}\right)^k dt\right]. \tag{9.5} \]
The left-hand side of (9.5) equals $[G(x)]^{-(n-1)}$, whereas its right-hand counterpart can be reexpressed as
\[ \frac{g(x)}{G(x)} \cdot (n-1)\left[(1 - a) + \int_x^a\left\{1 + \frac{1 - G(t)}{G(x)}\right\}^{n-2}dt\right]. \]
Therefore, we rewrite (9.5) as
\[ (1 - a)[G(x)]^{n-2} + \int_x^a \bigl(G(x) + 1 - G(t)\bigr)^{n-2}dt = [(n-1)g(x)]^{-1}, \quad 0 < x < a, \ \forall n \ge 2. \tag{9.6} \]
Undoubtedly, $g(x)$, $G(x)$ and $a$ depend on $n$. For compact notation, we omit the subscript $n$. Consider the sequence of functions
\[ s_k(x) = \left[(1 - a)[G(x)]^k + \int_x^a \bigl(G(x) + 1 - G(t)\bigr)^k dt\right] \Big/ (1 - x), \quad \forall k = 1, 2, \ldots, n - 2. \tag{9.7} \]
Obviously, the following inequalities hold true:
\[ 1 \equiv s_0(x) \ge s_1(x) \ge s_2(x) \ge \cdots \ge s_{n-2}(x) \ge 0, \quad \forall x \in [0, a]. \tag{9.8} \]
Multiply both sides of (9.7) by $1 - x$ and perform differentiation. Such manipulations yield the recurrent differential equations
\[ (1 - x)s_k'(x) - s_k(x) = k\,g(x)(1 - x)s_{k-1}(x) - 1, \]
or, equivalently,
\[ s_k'(x) + \frac{1 - s_k(x)}{1 - x} = k\,g(x)s_{k-1}(x), \quad \forall k = 1, 2, \ldots, n - 2, \tag{9.9} \]
with the boundary conditions $s_k(a) = 1$, $\forall k = 1, 2, \ldots, n - 2$.

Formulas (9.6)–(9.7) imply that
\[ s_{n-2}(x) = [(n-1)(1 - x)g(x)]^{-1}, \tag{9.10} \]
which is equivalent to
\[ g(x) = \bigl[(n-1)(1 - x)s_{n-2}(x)\bigr]^{-1} \ge \bigl[(n-1)(1 - x)\bigr]^{-1} \quad \text{(from (9.8))}. \]
The mean value of this distribution is defined by
\[ \int_0^a x\,g(x)\,dx = \int_0^a \frac{x\,dx}{(n-1)(1 - x)s_{n-2}(x)}. \tag{9.11} \]

Theorem 3.12 Let $\{s_1, \ldots, s_{n-2}\}$ be the solution to the system of differential equations (9.9) and
\[ g(x) = \frac{1}{(n-1)(1 - x)s_{n-2}(x)}. \]
Choose $a$ according to the condition $\int_0^a g(x)\,dx = 1$. Then $g(x)$ gives the optimal mixed strategy in the prediction game.

The system (9.9) and formula (9.10) can serve to solve the problem. We describe the corresponding solution algorithm. First, fix the initial value of the parameter $a$ and consider the system of differential equations (9.9) on the interval $[0, a]$.
As soon as the solution with the boundary condition $s_k(a) = 1$, $k = 1, \ldots, n - 2$ is found, define the density function $g(x) = [(n-1)s_{n-2}(x)(1 - x)]^{-1}$, $x \in [0, a]$. Next, evaluate $a$ from the condition $\int_0^a g(x)\,dx = 1$.

The case of n = 2. It appears from (9.1)–(9.3) that
\[ H_1(x, G) = \begin{cases} G(x)(1 - x) + \displaystyle\int_x^a (t - x)g(t)\,dt, & \text{for } 0 < x < a, \\ 1 - x, & \text{for } a < x < 1. \end{cases} \]
Under $0 < x < a$, equation (9.6) yields $g(x) = 1/(1 - x)$, whence it follows that $G(x) = -\log(1 - x)$, $a = 1 - e^{-1} \approx 0.632$. If $a < x < 1$, we obtain $H_1(x, g^*) = 1 - x \le 1 - a = H_1(a, g^*)$; hence, the condition (9.4) is satisfied. The value of the game constitutes $e^{-1} \approx 0.367$.

The case of n = 3.
\[ H_1(x, G, G) = \begin{cases} (G(x))^2(1 - x) + 2G(x)\displaystyle\int_x^a (t - x)g(t)\,dt + 2\displaystyle\int_x^a (t - x)\bigl(1 - G(t)\bigr)g(t)\,dt, & \text{if } 0 < x < a, \\ 1 - x, & \text{if } a < x < 1. \end{cases} \tag{9.12} \]
Under $n = 3$, equation (9.9) leads to the differential equation
\[ s_1'(x) + \frac{1 - s_1(x)}{1 - x} = g(x)s_0(x) = g(x), \quad s_1(a) = 1. \tag{9.13} \]
After some simplifications, formula (9.7) yields
\[ (1 - x)s_1(x) = (1 - a)G(x) + \int_x^a \bigl(G(x) + 1 - G(t)\bigr)\,dt = \frac{1}{2g(x)} \quad \text{(from (9.6) under } n = 3\text{)}. \tag{9.14} \]
By eliminating $g(x)$ from (9.13)–(9.14), we arrive at the differential equation
\[ \frac{s_1 s_1'}{s_1^2 - s_1 + \frac{1}{2}} = \frac{1}{1 - x}, \quad 0 < x < a, \quad s_1(a) = 1. \tag{9.15} \]
The function $g(x) = \bigl(2s_1(x)(1 - x)\bigr)^{-1}$ is positive and continuous; it represents a density function if $\int_0^a g(x)\,dx = 1$. And so,
\[ 1 = \int_0^a g(x)\,dx = \int_0^a\left\{s_1'(x) + \frac{1 - s_1(x)}{1 - x}\right\}dx = 1 - s_1(0) + \int_0^a \frac{1 - s_1(x)}{1 - x}\,dx, \]
leading to
\[ s_1(0) = \int_0^a \frac{1 - s_1(x)}{1 - x}\,dx = \int_{s_1(0)}^1 \frac{s_1(1 - s_1)}{s_1^2 - s_1 + \frac{1}{2}}\,ds_1 \quad \text{(from (9.15))} \]
\[ = -1 + s_1(0) + \frac{\pi}{4} - \tan^{-1}\bigl(2s_1(0) - 1\bigr) \quad \left(\text{since } \int \frac{ds_1}{s_1^2 - s_1 + \frac{1}{2}} = 2\tan^{-1}(2s_1 - 1)\right). \]
Therefore,
\[ s_1(0) = \frac{1}{2}\left\{1 - \tan\left(1 - \frac{\pi}{4}\right)\right\} \approx 0.391. \tag{9.16} \]
Perform integration in both sides of (9.15) from $x$ to $a$ to obtain the relation
\[ \left(s_1^2 - s_1 + \frac{1}{2}\right)^{\frac{1}{2}}e^{\tan^{-1}(2s_1 - 1)} = \frac{1}{\sqrt{2}}\,e^{\pi/4}\,\frac{1 - a}{1 - x}. \tag{9.17} \]
Here, substitution of $x = 0$ and $s_1(0) \approx 0.391$ from (9.16) yields
\[ a = 1 - \left\{2\bigl(s_1(0)\bigr)^2 - 2s_1(0) + 1\right\}^{1/2}e^{-1} \approx 0.7156. \tag{9.18} \]
By virtue of (9.12), the condition (9.4) holds true with $v = 1 - a \approx 0.284$. The corresponding solutions under $n = 2$ and $n = 3$ are shown in Figure 3.5.

[Figure 3.5 The solutions under n = 2 and n = 3. Curves: $s_0(x)$, $s_1(x)$, $g_2(x) = 1/(1 - x)$, $g_3(x) = (2(1 - x)s_1(x))^{-1}$; marked points $a_2 \approx 0.632$, $a_3 \approx 0.716$.]

The case of n = 4. If $n = 4$, formulas (9.1)–(9.3) lead to
\[ H_1(x, G, G, G) = \begin{cases} (G(x))^3(1 - x) + \displaystyle\sum_{k=1}^{3}\binom{3}{k}(G(x))^{3-k}\int_x^a \bigl(1 - G(t)\bigr)^k dt, & 0 < x < a, \\ 1 - x, & a < x < 1. \end{cases} \tag{9.19} \]
The system (9.9)–(9.10) acquires the form
\[ \begin{cases} s_1'(x) + \dfrac{1 - s_1(x)}{1 - x} = g(x)s_0(x) = g(x), & s_1(a) = 1, \\ s_2'(x) + \dfrac{1 - s_2(x)}{1 - x} = 2g(x)s_1(x), & s_2(a) = 1, \\ s_2(x) = \bigl(3g(x)(1 - x)\bigr)^{-1}. \end{cases} \tag{9.20} \]
The density function is defined by $g(x) = \bigl(3(1 - x)s_2(x)\bigr)^{-1}$, and we can choose $a$ such that
\[ 1 = \int_0^a \frac{dx}{3(1 - x)s_2(x)}. \]
Since the right-hand side meets the inequality
\[ \int_0^a \frac{dx}{3(1 - x)s_2(x)} \ge \frac{2}{3}\int_0^a \frac{dx}{2(1 - x)s_1(x)}, \]
it is possible to adopt the solution under $n = 3$. Elimination of $g(x)$ from (9.20) yields the system of ordinary differential equations
\[ \begin{cases} s_1'(x) + \dfrac{1 - s_1(x)}{1 - x} = \bigl(3(1 - x)s_2(x)\bigr)^{-1}, \\ s_2'(x) + \dfrac{1 - s_2(x)}{1 - x} = \dfrac{2}{3}\,\dfrac{s_1(x)}{(1 - x)s_2(x)}. \end{cases} \tag{9.21} \]
Computations lead to $a \approx 0.791$. The condition (9.4) is valid with $v = 1 - a \approx 0.208$.

Other examples. For $n \ge 5$, computations proceed similarly. The results are summarized below (the mean $\int_0^a x\,g(x)\,dx$ is not reported for $n = 3$ and $n = 4$):

n     a        v        mean
2     0.632    0.367    0.367
3     0.715    0.284
4     0.791    0.208
5     0.828    0.171    0.425
7     0.873    0.126    0.442
10    0.908    0.091    0.457

Apparently, as $n$ increases, $a \uparrow 1$; at the same time, the optimal payoffs $\downarrow 0$. In an equilibrium, the density functions asymptotically tend to the uniform distribution $U[0,1]$.

Exercises

1. The city transport game. This game involves $n$ players. Each player chooses transport for today's trip, namely, $x_i = 0$ (private automobile) or $x_i = 1$ (public transport).
The payoff of player $i$ depends on the number of other players choosing the same transport as he does. Notably, the payoff function takes the form
\[ H_i(x_1, \ldots, x_n) = \begin{cases} a(t), & x_i = 1, \\ b(t), & x_i = 0, \end{cases} \]
where $t = \frac{1}{n}\sum_{j=1}^n x_j$, and $a(t)$ and $b(t)$ are demonstrated in Figure 3.6.

[Figure 3.6 The payoff function in the city transport game. Axes: $t$ (horizontal, with marked points $0$, $t_0$, $t_1$, $1$) and $H$ (vertical, with marked values $a(0)$, $b(0)$, $a(1)$, $b(1)$).]

This figure illustrates the following aspect. If the share of players choosing 1 exceeds $t_1$, city traffic is less intensive, and automobilists feel better than public transport passengers. However, if the share of automobilists is higher than $1 - t_0$, city traffic gets so intensive that public transport becomes preferable. Prove that the solution to this game lies in a set $x^* = (x_1^*, \ldots, x_n^*)$ such that
\[ t_0 + \frac{1}{n} \le \frac{1}{n}\sum_{j=1}^n x_j^* \le t_1 - \frac{1}{n}. \]

2. The commune problem. Imagine that $n$ dwellers keep sheep on a farm. Each dweller has $q_i$ sheep. Denote by $G = q_1 + \cdots + q_n$ the total number of sheep. The maximal number of sheep kept by dwellers is $G_{\max}$. Each sheep gains some profit $v(G)$ and requires costs $c$ for keeping. Suppose that $v(G) > 0$ under $G < G_{\max}$ and $v(G) = 0$ under $G > G_{\max}$. Find the payoff of the commune provided that dwellers keep the same numbers of sheep. Construct a Nash equilibrium and show that the number of sheep is higher than in the case of their uniform distribution among dwellers.

3. The environmental protection problem. Three enterprises (players I, II, and III) exploit water resources from a natural reservoir. Each of them chooses between the following pure strategies: building water purification facilities or releasing polluted water. By assumption, water in the reservoir remains usable if just one enterprise releases polluted water. In this case, enterprises incur no costs. However, if at least two enterprises release polluted water in the reservoir, then each player has the costs of 3.
The maintenance of water purification facilities requires the costs of 1 from each enterprise. Draw the strategy profile cube and players' payoffs at the corresponding nodes. Find the set of equilibrium strategy profiles by intersection of the sets of acceptable strategy profiles of each player.

4. Bayesian games. A Bayesian game is a game of the form
\[ G = \langle N, \{x_i\}_{i=1}^n, T_i, H_i \rangle, \]
where $N = \{1, 2, \ldots, n\}$ is the set of players, and $t_i \in T_i$ denote the types of players (unknown to their opponents). The game is organized as follows. Nature reports to player $i$ his type, players choose their strategies $x_i(t_i)$ and receive the payoffs $H_i(x_1, \ldots, x_n, t_i)$. An equilibrium in a Bayesian game is a set of strategies $(x_1^*(t_1), \ldots, x_n^*(t_n))$ such that for any player $i$ and any type $t_i$ the function
\[ \sum_{t_{-i}} H_i\bigl(x_1^*(t_1), \ldots, x_{i-1}^*(t_{i-1}), x, x_{i+1}^*(t_{i+1}), \ldots, x_n^*(t_n), t_i\bigr)P(t_{-i} \mid t_i), \]
where $t_{-i} = (t_1, \ldots, t_{i-1}, t_{i+1}, \ldots, t_n)$, attains its maximum at the point $x_i^*$. Reexpress the environmental protection problem as a Bayesian game provided that the payoff of $-4$ is supplemented by random variables $t_i$ with the uniform distribution on $[0, 1]$.

5. Auction. This game employs two players bidding for an item at an auction. Player I offers a price $b_1$ and estimates the item's value by $v_1 \in [0, 1]$. Player II offers a price $b_2$, and his value of the item is $v_2 \in [0, 1]$. The payoff of player $i$ takes the form
\[ H_i(b_1, b_2, v) = \begin{cases} v - b_i, & b_i > b_j, \\ 0, & b_i < b_j, \\ (v - b_i)/2, & b_i = b_j, \end{cases} \]
where $i, j = 1, 2$, $i \ne j$. Evaluate equilibrium prices in this auction.

6. Demonstrate that $x^*$ represents a Nash equilibrium in an n-player game iff $x^*$ is the global maximum point of the function
\[ F(x) = \sum_{i=1}^n \Bigl(H_i(x) - \max_{y_i \in X_i} H_i(x_{-i}, y_i)\Bigr) \]
on the set $X$.

7. Consider the Cournot oligopoly model. Find optimal strategy profiles in the sense of Nash for $n$ players with the payoff functions
\[ H_i(x) = x_i\Bigl(a - b\sum_{j=1}^n x_j - c_i\Bigr). \]

8. The traffic jamming game with three players. The game engages three companies each possessing two automobiles. They have to move from point A to point B along one of two roads. Delay on the first (second) road equals $2k$ ($3k$, respectively), where $k$ indicates the number of moving automobiles. Evaluate a Nash equilibrium in this game.

9. Prove that the Bertrand oligopoly is a potential game.

10. Find solutions to the duel problem and truel problem in the following case. The target hitting probability is given by $A(x) = \dfrac{\ln(x + 1)}{\ln 2}$.

4 Extensive-form n-player games

Introduction

Chapters 1–3 have studied normal-form games, where players make their offers at the very beginning of a game and their payoffs are determined accordingly. However, real games evolve in time: players can modify their strategies depending on opponents' strategy profiles and their own interests. Therefore, we naturally arrive at the concept of dynamic games that vary depending on the behavior of players. Furthermore, a game may incorporate uncertainties occurring by chance. In such positions, a game evolves in a random way. The mentioned factors lead to extensive-form games (also known as positional games).

Definition 4.1 An extensive-form game with complete information is a pair $\Gamma = \langle N, G \rangle$, where $N = \{1, 2, \ldots, n\}$ indicates the set of players and $G = \{X, Z\}$ represents a directed graph without cycles (a finite tree) having the initial node $x_0$, the set of nodes (positions) $X$ and $Z(x)$ as the set of nodes directly following node $x$.

Figure 4.1 demonstrates the tree of such a game with the initial state $x_0$. For each player, it is necessary to define the positions where he makes decisions.

Definition 4.2 A partition of the position set $X$ into $n + 1$ non-intersecting subsets $X = X_1 \cup X_2 \cup \ldots \cup X_n \cup T$ is called a partition into the personal position sets of the players. Player $i$ moves in positions belonging to the set $X_i$, $i = 1, \ldots, n$. The set $T$ contains terminal nodes, where the game ends.
Terminal nodes $x \in T$ satisfy the property $Z(x) = \emptyset$. The payoffs of all players are specified in terminal positions: $H(x) = (H_1(x), \ldots, H_n(x))$, $x \in T$. In each position $x$ from the personal position set $X_i$, player $i$ chooses a node from the set $Z(x) = \{y_1, \ldots, y_k\}$ (referred to as an alternative in the position $x$), and the game passes to a new position. Sometimes, it appears convenient to identify alternatives with arcs incident to $x$. Thus, each player has to choose a next position in each of his personal positions.

Mathematical Game Theory and Applications, First Edition. Vladimir Mazalov. © 2014 John Wiley & Sons, Ltd. Published 2014 by John Wiley & Sons, Ltd. Companion website: http://www.wiley.com/go/game_theory

[Figure 4.1 The tree of an extensive-form game G and a subtree $G_y$.]

Definition 4.3 A strategy of player $i$ is a function $u_i(x)$ defined on the personal position set $X_i$, $i = 1, \ldots, n$, whose values are alternatives of the position $x$. A set of strategies of all players $u = (u_1, \ldots, u_n)$ is a strategy profile in the game.

For each strategy profile, one can uniquely define a corresponding play in an extensive-form game. Indeed, this game begins in the position $x_0$. Suppose that $x_0 \in X_{i_1}$. Hence, player $i_1$ makes his move. Following his strategy $u_{i_1}(x_0) = x_1 \in Z(x_0)$, the play passes to the position $x_1$. Next, $x_1$ belongs to the personal position set of some player $i_2$. His strategy $u_{i_2}(x_1)$ shifts the play to the position $x_2 \in Z(x_1)$. The play continues until it reaches an end position $x_k \in T$. The game ends in a terminal position, and player $i$ obtains the payoff $H_i(x_k)$, $i = 1, \ldots, n$.

Therefore, each strategy profile in an extensive-form game corresponds to a certain payoff of each player. It is possible to treat payoffs as functions of players' strategies, i.e., $H = H(u_1, \ldots, u_n)$. Now, we introduce the notion of a solution in such games.
4.1 Equilibrium in games with complete information

For convenience, set $u = (u_1, \ldots, u_i', \ldots, u_n)$ as a strategy profile, where just one strategy $u_i$ is replaced by $u_i'$. Denote the new strategy profile by $(u_{-i}, u_i')$.

Definition 4.4 A strategy profile $u^* = (u_1^*, \ldots, u_n^*)$ is a Nash equilibrium, if for each player the following condition holds true:
\[ H_i(u_{-i}^*, u_i) \le H_i(u^*), \quad \forall u_i, \ i = 1, \ldots, n. \tag{1.1} \]
Inequalities (1.1) imply that no player $i$ can increase his payoff by unilaterally deviating from an equilibrium. We will show that the game $\Gamma$ may have many equilibria. But one of them is special. To extract it, we present the notion of a subgame of $\Gamma$.

Definition 4.5 Let $y \in X$. A subgame $\Gamma(y)$ of the game $\Gamma$, which begins in the position $y$, is a game $\Gamma(y) = \langle N, G_y \rangle$, where the subgraph $G_y = \{X_y, Z\}$ contains all nodes following $y$, the personal position sets of players are defined by the intersection $Y_i^y = X_i \cap X_y$, $i = 1, \ldots, n$, the set of terminal positions is $T_y = T \cap X_y$, and the payoff of player $i$ in the subgame is given by $H_i^y(x) = H_i(x)$, $x \in T_y$.

Figure 4.1 illustrates the subtree corresponding to a subgame with the initial position $y$. We understand a strategy of player $i$ in a subgame $\Gamma(y)$ as a restriction of the strategy $u_i(x)$ in the game $\Gamma$ to the set $Y_i^y$. Designate such strategies by $u_i^y(x)$. A set of strategies $u^y = (u_1^y, \ldots, u_n^y)$ is a strategy profile in the subgame. Each strategy profile in the subgame corresponds to a play in the subgame and the payoff of each player $H^y(u_1^y, \ldots, u_n^y)$.

Definition 4.6 A Nash equilibrium strategy profile $u^* = (u_1^*, \ldots, u_n^*)$ in the game $\Gamma$ is called a subgame-perfect equilibrium, if for any $y \in X$ the strategy profile $(u^*)^y$ forms a Nash equilibrium in the subgame $\Gamma(y)$.

Below we argue that any finite game with complete information admits a subgame-perfect equilibrium. For this, separate out all positions preceding terminal positions and denote the resulting set by $Z_1$.
Assume that a position $x \in Z_1$ belongs to the personal position set of player $i$. Consider the set of terminal positions $Z(x) = \{y_1, \ldots, y_{k_i}\}$ that follow the position $x$, and select the one maximizing the payoff of player $i$: $H_i(y_j) = \max\{H_i(y_1), \ldots, H_i(y_{k_i})\}$. Subsequently, shift the payoff vector $H(y_j)$ to the position $x$ and make it terminal. Proceed in this way for all positions $x \in Z_1$. As a result, the game tree decreases its length by unity. Similarly, extract the set $Z_2$ of preterminal positions followed by positions from $Z_1$. Take a position $x \in Z_2$ and suppose that $x \in X_l$ (i.e., player $l$ moves in this position). Consider the set of positions $Z(x) = \{y_1, \ldots, y_{k_l}\}$ that follow the position $x$ and separate out the one (e.g., $y_m$) maximizing the payoff of player $l$. Transfer the payoff vector $H(y_m)$ to the position $x$ and make it terminal. Repeat the procedure $T \to Z_1 \to Z_2 \to \ldots$ until the initial state $x_0$ is reached. This surely happens, since the game possesses a finite tree.

At each step, this algorithm yields equilibrium strategies in each subgame. In the final analysis, it leads to a subgame-perfect equilibrium. Actually, we have proven Kuhn's theorem.

Theorem 4.1 An extensive-form game with complete information possesses a subgame-perfect equilibrium.

Figure 4.2 presents an extensive-form two-player game. The set of personal positions of player I is indicated by circles, while boxes mark positions corresponding to player II moves. Therefore, $X_1 = \{x_0, x_1, \ldots, x_4\}$ and $X_2 = \{y_1, y_2\}$. The payoffs of both players are specified in terminal positions $T = \{t_1, \ldots, t_8\}$. And so, the strategy of player I is a vector $u = (u_0, \ldots, u_4)$, whereas the strategy of player II is a vector $v = (v_1, v_2)$. Their components can be written as "l" (left alternative) and "r" (right alternative). For instance, the strategy profile $u = (l, r, l, r, l)$, $v = (r, l)$ corresponds to a play leading to the terminal node $t_3$ with players' payoffs $H_1 = 2$, $H_2 = 4$.
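The correspondence between strategy profiles and plays, and the backward induction procedure described above, can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than part of the original exposition: the dictionary encoding, the names TREE, PAYOFF, play and backward_induction are hypothetical, and the tree shape and terminal payoffs are those read off Figure 4.2 (player I moves at $x_0, \ldots, x_4$, player II at $y_1$, $y_2$). The recursion assumes that the moving player is never indifferent between alternatives, which holds for this particular tree.

```python
# Non-terminal nodes: name -> (moving player, {alternative: next node}).
TREE = {
    "x0": (1, {"l": "y1", "r": "y2"}),
    "y1": (2, {"l": "x1", "r": "x2"}),
    "y2": (2, {"l": "x3", "r": "x4"}),
    "x1": (1, {"l": "t1", "r": "t2"}),
    "x2": (1, {"l": "t3", "r": "t4"}),
    "x3": (1, {"l": "t5", "r": "t6"}),
    "x4": (1, {"l": "t7", "r": "t8"}),
}

# Terminal payoffs (H1, H2) as read from Figure 4.2.
PAYOFF = {
    "t1": (4, 1), "t2": (6, 3), "t3": (2, 4), "t4": (5, 1),
    "t5": (0, 2), "t6": (7, 1), "t7": (2, 0), "t8": (5, 3),
}


def play(profile, root="x0"):
    """Follow the alternatives prescribed by `profile` until a terminal node."""
    node = root
    while node in TREE:
        _, alternatives = TREE[node]
        node = alternatives[profile[node]]
    return node, PAYOFF[node]


def backward_induction(node="x0"):
    """Return (payoff vector, chosen alternative per node) for the subgame at `node`."""
    if node in PAYOFF:
        return PAYOFF[node], {}
    player, alternatives = TREE[node]
    best_move, best_value, strategy = None, None, {}
    for move, child in alternatives.items():
        value, sub_strategy = backward_induction(child)
        strategy.update(sub_strategy)
        # The mover compares only his own component of the payoff vector.
        if best_value is None or value[player - 1] > best_value[player - 1]:
            best_move, best_value = move, value
    strategy[node] = best_move
    return best_value, strategy


# The profile u = (l, r, l, r, l), v = (r, l) from the text.
PROFILE = {"x0": "l", "x1": "r", "x2": "l", "x3": "r", "x4": "l",
           "y1": "r", "y2": "l"}

if __name__ == "__main__":
    print(play(PROFILE))          # ('t3', (2, 4))
    print(backward_induction())   # payoff vector (6, 3) and the moves chosen
```

Because the recursion stores a choice for every node, including those off the resulting play, the returned dictionary is a complete strategy profile and not just a path through the tree.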
Figure 4.3 illustrates the backward induction method that leads to the subgame-perfect equilibrium $u^* = (l, r, r, r, r)$, $v^* = (l, r)$, where players' payoffs are $H_1 = 6$, $H_2 = 3$. These strategies yield a Nash equilibrium in any subgame shown by Figure 4.2.

At the same time, we underline a relevant aspect. There exist other strategy profiles representing a Nash equilibrium but not a subgame-perfect equilibrium. For instance, consider the following strategies of the players: $\bar{u} = (r, r, r, r, l)$, $\bar{v} = (l, l)$. This strategy profile corresponds to a play leading to the terminal node $t_6$ with players' payoffs $H_1 = 7$, $H_2 = 1$.

[Figure 4.2 An extensive-form game of length 3. Terminal payoffs $(H_1, H_2)$: $t_1$ (4, 1), $t_2$ (6, 3), $t_3$ (2, 4), $t_4$ (5, 1), $t_5$ (0, 2), $t_6$ (7, 1), $t_7$ (2, 0), $t_8$ (5, 3). Notation: circles are personal positions of player I; boxes are personal positions of player II; (∗) marks the subgame-perfect equilibrium.]

The strategy profile $\bar{u}, \bar{v}$ forms a Nash equilibrium. This is obvious for player I, since he gains the maximal payoff $H_1 = 7$. If player II deviates from his strategy $\bar{v}$ and chooses alternative "r" in the position $y_2$, the game ends in the terminal position $t_7$ and player II receives the payoff $H_2 = 0$ (smaller than in the equilibrium). Thus, we have argued that $\bar{u}, \bar{v}$ is a Nash equilibrium. However, this strategy profile is not a subgame-perfect equilibrium, since it is not a Nash equilibrium in the subgame with initial node $x_4$. In this position, player I moves left and gains 2 (instead of choosing the right alternative and obtaining 5). Such a situation can be treated as pressure by player I on the opponent (the threat of ending in $t_7$) in order to reach the terminal position $t_6$.

4.2 Indifferent equilibrium

A subgame-perfect equilibrium may appear non-unique in an extensive-form game. This happens when the payoffs of some player coincide in terminal positions. Then his behavior depends on his attitude to the opponent. And the concept of a player's type arises naturally.
We will distinguish between benevolent and malevolent attitudes of players toward each other. For instance, consider the extensive-form game in Figure 4.4. To find a subgame-perfect equilibrium, apply the backward induction method. In the position $x_1$, the payoff of player I does not depend on his choice (in both cases, it equals 6). However, the choice of player I matters to his opponent: if player I selects “l,” player II gains 1, whereas “r” gives him 2. Imagine that player I is benevolent to player II. Consequently, he chooses “r” in the position $x_1$. A similar situation occurs in the position $x_3$. Being benevolent, player I also chooses the alternative “r.” In the positions $x_2$ and $x_4$, the payoffs of player I differ across the alternatives, so he moves in the usual way by maximizing his own payoff. The backward induction method finally leads to the subgame-perfect equilibrium $\bar u = (l, r, l, r, r)$, $\bar v = (l, l)$ and the payoffs $H_1 = 6$, $H_2 = 2$.

Figure 4.3 Subgame-perfect equilibrium evaluation by the backward induction method.

Figure 4.4 An extensive-form game. Notation: (∗) marks an equilibrium for benevolent players; (∗∗) marks an equilibrium for malevolent players. Terminal payoffs $(H_1, H_2)$: $t_1$: (6, 1), $t_2$: (6, 2), $t_3$: (3, 2), $t_4$: (1, 3), $t_5$: (5, 2), $t_6$: (5, 6), $t_7$: (2, 4), $t_8$: (7, 3).

Now, suppose that the players are malevolent toward each other. In the positions $x_1$ and $x_3$, player I then chooses the alternative “l.” The backward induction method yields the subgame-perfect equilibrium $\bar u = (r, l, l, l, r)$, $\bar v = (r, r)$ with the payoffs $H_1 = 7$, $H_2 = 3$. Therefore, we face a paradoxical situation: the malevolent attitude of the players results in higher payoffs for both of them than the benevolent attitude. No doubt, the opposite situation is possible as well (when benevolence increases the payoffs of the players).
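The two attitudes differ only in how a tie in the mover's own payoff is resolved, which can be sketched as a tie-breaking rule inside backward induction. The code below is a hypothetical illustration for two players (the names `TREE`, `PAYOFF`, and `equilibrium` are not from the text); the tree and payoffs are those read from Figure 4.4.

```python
# Game tree of Figure 4.4: node -> (moving player, {alternative: successor}).
TREE = {
    "x0": (0, {"l": "y1", "r": "y2"}),
    "y1": (1, {"l": "x1", "r": "x2"}),
    "y2": (1, {"l": "x3", "r": "x4"}),
    "x1": (0, {"l": "t1", "r": "t2"}),
    "x2": (0, {"l": "t3", "r": "t4"}),
    "x3": (0, {"l": "t5", "r": "t6"}),
    "x4": (0, {"l": "t7", "r": "t8"}),
}
# Terminal payoffs (H1, H2) as read from Figure 4.4.
PAYOFF = {
    "t1": (6, 1), "t2": (6, 2), "t3": (3, 2), "t4": (1, 3),
    "t5": (5, 2), "t6": (5, 6), "t7": (2, 4), "t8": (7, 3),
}


def equilibrium(attitude):
    """Backward induction with ties resolved by the players' attitude.

    attitude = "benevolent": among own-optimal moves pick the best for the opponent;
    attitude = "malevolent": among own-optimal moves pick the worst for the opponent.
    """
    sign = 1 if attitude == "benevolent" else -1
    strategy = {}

    def solve(node):
        if node in PAYOFF:
            return PAYOFF[node]
        player, moves = TREE[node]
        other = 1 - player
        options = [(move, solve(child)) for move, child in moves.items()]
        # Primary criterion: own payoff; secondary: the opponent's payoff.
        move, value = max(
            options, key=lambda mv: (mv[1][player], sign * mv[1][other])
        )
        strategy[node] = move
        return value

    return solve("x0"), strategy


for attitude in ("benevolent", "malevolent"):
    value, strategy = equilibrium(attitude)
    u = [strategy[n] for n in ("x0", "x1", "x2", "x3", "x4")]
    v = [strategy[n] for n in ("y1", "y2")]
    print(attitude, u, v, value)
```

Note that the rule is applied to both players: in the benevolent case player II is also indifferent in the position y1 and resolves the tie in favor of player I.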
The example of Figure 4.4 shows that the effect of benevolence is non-trivial. There exists another approach to avoiding ambiguity in a player's behavior when several continuations of a game yield him the same payoff. This approach was proposed by L. Petrosjan [1996]. It uses the following idea: in such a position, a player randomizes over the feasible alternatives with identical probabilities. The corresponding equilibrium is called an “indifferent equilibrium.”

Let us pass from the original game $\Gamma$ to a new game described below. In each position $x \in X$ where player $i(x)$ is indifferent among the feasible alternatives $y \in Z_i(x)$, he chooses each of them, $y_k \in Z_i(x)$, $k = 1, \dots, |Z_i(x)|$, with the identical probability $1/|Z_i(x)|$. Denote the new game by $\bar\Gamma$.

Definition 4.7 A Nash equilibrium strategy profile $u^* = (u_1^*, \dots, u_n^*)$ in the game $\Gamma$ is an indifferent equilibrium if it forms a subgame-perfect equilibrium in the game $\bar\Gamma$.

For instance, let us find an indifferent equilibrium for the game in Figure 4.4 by the backward induction method. In the position $x_1$, player I is indifferent between the two feasible alternatives. Therefore, he chooses each of them with probability $1/2$. Such behavior yields the expected payoffs $H_1 = 6$, $H_2 = 3/2$. In the position $x_2$, the alternative “l” yields a higher payoff. In the position $x_3$, player I is again indifferent and selects each of the two feasible alternatives with probability $1/2$. The expected payoffs of the players in this position become $H_1 = 5$, $H_2 = 4$. Finally, the alternative “r” is optimal in the position $x_4$. Now, analyze the subgame $\Gamma(y_1)$. In the position $y_1$, player II moves. The alternative “l” corresponds to the payoff $3/2$, which is smaller than the payoff 2 gained by the alternative “r.” Thus, his optimal strategy in the position $y_1$ is the alternative “r.” Consider the subgame $\Gamma(y_2)$ and the position $y_2$.
The alternative “l” ensures the expected payoff of 4 to player II, whereas the alternative “r” leads to 3. Hence, the optimal strategy of player II in the position $y_2$ is “l.” Finally, study the subgame $\Gamma(x_0)$. The first move belongs to player I. The alternative “l” yields him the payoff of 3, while the alternative “r” gives 5. Evidently, the optimal strategy of player I is “r.” Therefore, we have established the indifferent equilibrium $u^* = (r, \tfrac12 l + \tfrac12 r, l, \tfrac12 l + \tfrac12 r, r)$, $v^* = (r, l)$ and the corresponding payoffs $H_1 = 5$, $H_2 = 4$.

4.3 Games with incomplete information

Some extensive-form games incorporate positions where a play evolves randomly. For instance, in parlor games, players first receive cards (in a random way), and the play continues according to the strategies selected by the players. Therefore, players do not know the current position of a play for sure; they can merely make certain assumptions about it. In this case, we have the so-called games with incomplete information. A key role here belongs to the concept of an information set.

Definition 4.8 An extensive-form game with incomplete information is an $n$-player game $\Gamma = \langle N, G \rangle$ on a tree graph $G = \{X, Z\}$ with an initial node $x_0$ and a set of nodes (positions) $X$ such that
1. There is a given partition of the position set $X$ into $n + 2$ non-intersecting subsets $X = X_0 \cup X_1 \cup \dots \cup X_n \cup T$, where $X_i$ is the personal position set of player $i$, $i = 1, \dots, n$, $X_0$ is the position set of random moves, and the set $T$ contains the terminal nodes with defined payoffs of all players $H(x) = (H_1(x), \dots, H_n(x))$.
2. There is a given partition of each set $X_i$, $i = 1, \dots, n$, into non-intersecting subsets $X_i^j$, $j = 1, \dots, J_i$ (the so-called information sets of player $i$) with the following property: all nodes of the same information set have an identical number of alternatives, and none of them follows another node from the same information set.
Each position $x \in X_0$ with random moves has a given probability distribution on the set of alternatives of the node $x$. For instance, if $Z(x) = \{y_1, \dots, y_k\}$, then the probabilities of play transition to the next position, $p(y \mid x)$, $y \in Z(x)$, are defined in the node $x$.

We provide a series of examples to show possible informational partitions and their impact on the optimal solution.

Example 4.1 A non-cooperative game with complete information.
Move 1. Player I chooses between the alternatives “l” and “r.”
Move 2. Random move; one of the alternatives, “l” or “r,” is selected equiprobably.
Move 3. Being aware of the choice of player I and the random move, player II chooses between the alternatives “l” and “r.”
The payoffs of player I in the terminal positions are $H(l,l,l) = 1$, $H(l,l,r) = -1$, $H(l,r,l) = 0$, $H(l,r,r) = 1$, $H(r,l,l) = -2$, $H(r,l,r) = 3$, $H(r,r,l) = 1$, $H(r,r,r) = -2$, and Figure 4.5 demonstrates the corresponding tree of the game.

Figure 4.5 A game with complete information.

The strategy $u$ of player I takes two values, “l” and “r.” The strategy $v = (v_1, v_2, v_3, v_4)$ of player II has $2^4 = 16$ feasible values. However, to avoid complete enumeration of all strategies, let us simplify the game. In the positions $y_1$, $y_2$, $y_3$, and $y_4$, player II optimally chooses the alternatives “r,” “l,” “l,” and “r,” respectively. Therefore, in the initial position $x_0$, player I obtains the payoff $\tfrac12(-1) + \tfrac12 \cdot 0 = -\tfrac12$ by choosing “l” or $\tfrac12(-2) + \tfrac12(-2) = -2$ by choosing “r.” The resulting equilibrium is $u = (l)$, $v = (r, l, l, r)$, and the game has the value $-1/2$, i.e., it is beneficial to player II.

Example 4.2 A non-cooperative game without information.
Move 1. Player I chooses between the alternatives “l” and “r.”
Move 2. Random move; one of the alternatives, “l” or “r,” is selected equiprobably.
Move 3.
Knowing neither the choice of player I nor the outcome of the random move, player II chooses between the alternatives “l” and “r.”
The payoffs of player I in the terminal positions are the same as in the previous example. Figure 4.6 demonstrates the corresponding tree of the game (the dashed line highlights the information set of player II).

Figure 4.6 A game without information.

Again, the strategy $u$ of player I takes two values, “l” and “r.” This is also the case for the strategy $v$ of player II, since he does not know the current position of the play. The payoff matrix of the game is

$$\begin{array}{c|cc} & l & r \\ \hline l & \tfrac12 & 0 \\ r & -\tfrac12 & \tfrac12 \end{array}$$

Indeed, $H(l,l) = \tfrac12 \cdot 1 + \tfrac12 \cdot 0 = \tfrac12$, $H(l,r) = \tfrac12(-1) + \tfrac12 \cdot 1 = 0$, $H(r,l) = \tfrac12(-2) + \tfrac12 \cdot 1 = -\tfrac12$, $H(r,r) = \tfrac12 \cdot 3 + \tfrac12(-2) = \tfrac12$. The equilibrium of this game is attained in the mixed strategies $(\tfrac23, \tfrac13)$ and $(\tfrac13, \tfrac23)$, and the value of the game equals $1/6$. Apparently, the lack of information makes the game unfavorable for player II.

Example 4.3 A non-cooperative game with incomplete information.
Move 1. Player I chooses between the alternatives “l” and “r.”
Move 2. Random move; one of the alternatives, “l” or “r,” is selected equiprobably.
Move 3. Being aware of the random move only, player II chooses between the alternatives “l” and “r.”
The payoffs of player I in the terminal positions coincide with those of Example 4.1. Figure 4.7 presents the corresponding tree of the game (the dashed lines indicate the information sets of player II). It differs from Example 4.2, since the partition comprises two information sets $X_2^1$ and $X_2^2$.

Figure 4.7 A game with incomplete information.

Here, the strategy $u$ of player I takes two values, “l” and “r.” The strategy $v = (v_1, v_2)$ of player II consists of two components (one for each of the information sets $X_2^1$ and $X_2^2$) and has four possible values.
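The three examples share the same terminal payoffs and differ only in the information partition of player II, so their normal forms can be generated mechanically. The sketch below is an illustration under stated assumptions: `payoff_matrix`, `is_equilibrium`, and the labeling function `info` are hypothetical names, and the mixed strategies being checked are the ones quoted in the text for Examples 4.2 and 4.3. Exact rational arithmetic is used to avoid rounding.

```python
from fractions import Fraction
from itertools import product

F = Fraction
MOVES = ("l", "r")
# Payoffs of player I: H[(move of I, random move, move of II)].
H = {
    ("l", "l", "l"): 1, ("l", "l", "r"): -1, ("l", "r", "l"): 0, ("l", "r", "r"): 1,
    ("r", "l", "l"): -2, ("r", "l", "r"): 3, ("r", "r", "l"): 1, ("r", "r", "r"): -2,
}


def payoff_matrix(info):
    """Normal form of the game for a given information partition of player II.

    info(a, c) labels the information set containing the node reached after
    player I's move a and the random move c. A pure strategy of player II
    assigns one alternative to every label.
    """
    labels = sorted({info(a, c) for a in MOVES for c in MOVES}, key=str)
    strategies = list(product(MOVES, repeat=len(labels)))
    matrix = []
    for a in MOVES:
        row = []
        for v in strategies:
            choice = dict(zip(labels, v))
            # The random move is equiprobable: average over c.
            row.append(sum(F(1, 2) * H[(a, c, choice[info(a, c)])] for c in MOVES))
        matrix.append(row)
    return strategies, matrix


def is_equilibrium(matrix, x, y, value):
    """Check the saddle-point conditions for mixed strategies x (rows), y (columns)."""
    row_payoffs = [sum(m * q for m, q in zip(row, y)) for row in matrix]
    col_payoffs = [
        sum(x[i] * matrix[i][j] for i in range(len(matrix)))
        for j in range(len(matrix[0]))
    ]
    return max(row_payoffs) == value == min(col_payoffs)


_, matrix_41 = payoff_matrix(lambda a, c: (a, c))  # II observes both moves
_, matrix_42 = payoff_matrix(lambda a, c: 0)       # II observes nothing
_, matrix_43 = payoff_matrix(lambda a, c: c)       # II observes the random move

value_41 = max(min(row) for row in matrix_41)      # pure saddle point exists here
ok_42 = is_equilibrium(matrix_42, (F(2, 3), F(1, 3)), (F(1, 3), F(2, 3)), F(1, 6))
ok_43 = is_equilibrium(
    matrix_43, (F(5, 7), F(2, 7)), (F(0), F(1, 7), F(0), F(6, 7)), F(1, 7)
)
print(value_41, ok_42, ok_43)
```

Changing only the function passed as `info` switches between the three games, which makes the role of the information partition explicit.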
The payoff matrix of the game in Example 4.3 is

$$\begin{array}{c|cccc} & ll & lr & rl & rr \\ \hline l & \tfrac12 & 1 & -\tfrac12 & 0 \\ r & -\tfrac12 & -2 & 2 & \tfrac12 \end{array}$$

Indeed, $H(l,ll) = \tfrac12 \cdot 1 + \tfrac12 \cdot 0 = \tfrac12$, $H(l,lr) = \tfrac12 \cdot 1 + \tfrac12 \cdot 1 = 1$, $H(l,rl) = \tfrac12(-1) + \tfrac12 \cdot 0 = -\tfrac12$, $H(l,rr) = \tfrac12(-1) + \tfrac12 \cdot 1 = 0$, $H(r,ll) = \tfrac12(-2) + \tfrac12 \cdot 1 = -\tfrac12$, $H(r,lr) = \tfrac12(-2) + \tfrac12(-2) = -2$, $H(r,rl) = \tfrac12 \cdot 3 + \tfrac12 \cdot 1 = 2$, $H(r,rr) = \tfrac12 \cdot 3 + \tfrac12(-2) = \tfrac12$. The game admits the equilibrium in the mixed strategies $(\tfrac57, \tfrac27)$ and $(0, \tfrac17, 0, \tfrac67)$ and has the value $1/7$. Thus, the partial information available to player II reduces his loss in comparison with the previous example.

Examples 4.1–4.3 show the relevance of informational partitions for game trees. Being in an information set, a player does not know the current position of a play. All positions in a given information set appear identical to the player. Thus, his strategy depends on the information set only.

Let the personal position set of player $i$ be decomposed into the information sets $X_i^1 \cup \dots \cup X_i^{J_i}$. Here we understand alternatives as arcs connecting nodes $x$ and $y \in Z(x)$.

Definition 4.9 Suppose that, in a position $x \in X_i^j$, player $i$ chooses among $k_j$ alternatives, i.e., $Z(x) = \{y_1, \dots, y_{k_j}\}$. A pure strategy of player $i$ in a game with incomplete information is a function $u_i = u_i(X_i^j)$, $j = 1, \dots, J_i$, which assigns some alternative $k \in \{1, \dots, k_j\}$ to each information set.

Similarly to games with complete information, a pure strategy profile $(u_1, \dots, u_n)$ together with the alternatives realized at random moves uniquely defines a play of the game and the payoffs of all players. Each player possesses a finite set of strategies; their number equals $k_1 \times \dots \times k_{J_i}$, $i = 1, \dots, n$.

Definition 4.10 A mixed strategy of player $i$ in a game with incomplete information is a probability distribution $\mu_i = \mu_i(u_i)$ on the set of pure strategies of player $i$.
Here $\mu_i(u_i)$ means the realization probability of the pure strategy $(u_i(X_i^1) = k_1, \dots, u_i(X_i^{J_i}) = k_{J_i})$.

Definition 4.11 A position $x \in X$ is feasible for a pure strategy $u_i$ (a mixed strategy $\mu_i$) if there exists a strategy profile $u = (u_1, \dots, u_i, \dots, u_n)$ (respectively, $\mu = (\mu_1, \dots, \mu_i, \dots, \mu_n)$) such that a play passes through the position $x$ with a positive probability. Denote by $\mathrm{Poss}\,u_i$ ($\mathrm{Poss}\,\mu_i$) the set of such positions. An information set $X_i^j$ is relevant for $u_i$ ($\mu_i$) if it contains at least one feasible position for $u_i$ ($\mu_i$). The collection of sets relevant for $u_i$ ($\mu_i$) will be designated by $\mathrm{Rel}\,u_i$ ($\mathrm{Rel}\,\mu_i$).

Consider some terminal node $t \in T$; denote by $[x_0, t]$ the play beginning at $x_0$ and ending at $t$. Assume that player $i$ has at least one position in the play $[x_0, t]$. Let $x$ be his last position in the play, $x \in X_i^j \cap [x_0, t]$, and let $k$ be the alternative in this position that belongs to the play $[x_0, t]$, $i = 1, \dots, n$. Under a given strategy profile $\mu = (\mu_1, \dots, \mu_n)$, the realization probability of this play becomes

$$P_\mu[x_0, t] = \Bigg( \sum_{u:\ X_i^j \in \mathrm{Rel}\,u_i,\ u_i(X_i^j) = k} \ \prod_{i=1}^{n} \mu_i(u_i) \Bigg) \prod_{x \in X_0 \cap [x_0, t],\ y \in Z(x) \cap [x_0, t]} p(y \mid x). \tag{3.1}$$

Formula (3.1) implies summation over all pure strategy profiles realizing the given play and multiplication by the probabilities of the alternatives belonging to this play (for random moves). Under a given mixed strategy profile, the payoffs of the players are the mean values

$$H_i(\mu_1, \dots, \mu_n) = \sum_{t \in T} H_i(t)\, P_\mu[x_0, t], \quad i = 1, \dots, n. \tag{3.2}$$

Recall that the number of pure strategies is finite. Hence, this extensive-form game is equivalent to some non-cooperative normal-form game. The general theory of non-cooperative games guarantees the existence of a mixed strategy Nash equilibrium.

Theorem 4.2 An extensive-form game with incomplete information has a mixed strategy Nash equilibrium.
4.4 Total memory games

Although extensive-form games with incomplete information possess solutions in the class of mixed strategies, such solutions are hardly practicable due to high dimensionality. Subsequent models of real games involve the so-called behavioral strategies.

Definition 4.12 A behavioral strategy of player $i$ is a vector function $\beta_i$ defining, for each information set $X_i^j$, a probability distribution on the alternative set $\{1, \dots, k_j\}$ for positions $x \in X_i^j$, $j = 1, \dots, J_i$.

Clearly,

$$\sum_{k=1}^{k_j} \beta_i(X_i^j, k) = 1, \quad j = 1, \dots, J_i.$$

Consider some terminal position $t$ and the corresponding play $[x_0, t]$. Under a given behavioral strategy profile $\beta = (\beta_1, \dots, \beta_n)$, the realization probability of the play $[x_0, t]$ takes the form

$$P_\beta[x_0, t] = \prod_{i \in N,\ j = 1, \dots, J_i,\ k \in [x_0, t]} \beta_i(X_i^j, k) \prod_{x \in X_0 \cap [x_0, t],\ y \in Z(x) \cap [x_0, t]} p(y \mid x). \tag{4.1}$$

The expected payoffs of the players are described by

$$H_i(\beta_1, \dots, \beta_n) = \sum_{t \in T} H_i(t)\, P_\beta[x_0, t], \quad i = 1, \dots, n. \tag{4.2}$$

Naturally enough, each behavioral strategy corresponds to a certain mixed strategy (the converse statement fails). Still, one can look for behavioral strategy equilibria in a wide class of games known as total memory games.

Definition 4.13 A game $\Gamma$ is a total memory game for player $i$ if, for any pure strategy $u_i$ and any information set $X_i^j$ such that $X_i^j \in \mathrm{Rel}\,u_i$, any position $x \in X_i^j$ is feasible.

According to this definition, any position from a relevant information set is feasible in a total memory game. Moreover, any player can exactly recover his alternatives at preceding moves.

Theorem 4.3 In a total memory game $\Gamma$, any mixed strategy $\mu$ corresponds to some behavioral strategy ensuring the same probability distribution on the set of plays.

Proof: Consider a total memory game $\Gamma$ and a mixed strategy $\mu$. Using it, we construct a special behavioral strategy for each player.
Let $X_i^j$ be an information set of player $i$ and let $k$ represent a certain alternative in a position $x \in X_i^j$, $k = 1, \dots, k_j$. Introduce

$$P_\mu(X_i^j) = \sum_{u_i:\ X_i^j \in \mathrm{Rel}\,u_i} \mu_i(u_i) \tag{4.3}$$

as the choice probability of a pure strategy $u_i$ admitting the information set $X_i^j$, and

$$P_\mu(X_i^j, k) = \sum_{u_i:\ X_i^j \in \mathrm{Rel}\,u_i,\ u_i(X_i^j) = k} \mu_i(u_i) \tag{4.4}$$

as the choice probability of a pure strategy $u_i$ admitting the information set $X_i^j$ and the alternative $u_i(X_i^j) = k$. The following equality holds true:

$$\sum_{k=1}^{k_j} P_\mu(X_i^j, k) = P_\mu(X_i^j).$$

Evidently, a total memory game enjoys the following property. If the play $[x_0, t]$ with the terminal position $t$ passes through the position $x_1 \in X_i^j$ of player $i$ with the alternative $k$, and the subsequent position of player $i$ is $x_2 \in X_i^l$ (see Figure 4.8), then the pure strategy sets

$$\{u_i : X_i^j \in \mathrm{Rel}\,u_i,\ u_i(X_i^j) = k\} \quad \text{and} \quad \{u_i : X_i^l \in \mathrm{Rel}\,u_i\}$$

coincide. Therefore,

$$P_\mu(X_i^j, k) = P_\mu(X_i^l). \tag{4.5}$$

Figure 4.8 A total memory game.

For each player $i = 1, \dots, n$, define a behavioral strategy as follows. If $X_i^j$ is relevant for $\mu_i$, then

$$\beta_i(X_i^j, k) = \frac{P_\mu(X_i^j, k)}{P_\mu(X_i^j)}, \quad k = 1, \dots, k_j. \tag{4.6}$$

Otherwise, the denominator in (4.6) vanishes, and we set

$$\beta_i(X_i^j, k) = \sum_{u_i:\ u_i(X_i^j) = k} \mu_i(u_i).$$

For instance, analyze the play $[x_0, t]$ in Figure 4.8. It passes through two information sets of player $i$, and each of his pure strategies is a pair of alternatives, $u \in \{ll, lm, lr, rl, rm, rr\}$. In this case, his mixed strategy can be written as the vector $\mu = (\mu_1, \dots, \mu_6)$. By virtue of (4.6), the corresponding behavioral strategy acquires the following form. In the first information set, we obtain

$$\beta(X_1, l) = \mu_1 + \mu_2 + \mu_3, \quad \beta(X_1, r) = \mu_4 + \mu_5 + \mu_6.$$

In the second information set, we obtain

$$\beta(X_2, l) = \frac{\mu_4}{\mu_4 + \mu_5 + \mu_6}, \quad \beta(X_2, m) = \frac{\mu_5}{\mu_4 + \mu_5 + \mu_6}, \quad \beta(X_2, r) = \frac{\mu_6}{\mu_4 + \mu_5 + \mu_6}.$$

Obviously, in this play the behavioral strategy gives $\beta(X_1, r)\,\beta(X_2, m) = \mu_5$, which is exactly the probability of the realization $(r, m)$ under the mixed strategy.
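The construction (4.6) for the example of Figure 4.8 can be sketched as follows. The function name `behavioral` and the numerical mixed strategy are hypothetical; the code assumes, as the denominators above indicate, that the second information set is reached only after the alternative “r” in the first one.

```python
from fractions import Fraction

F = Fraction
# Pure strategies: first letter is the move at X1, second letter the move at X2.
PURE = ["ll", "lm", "lr", "rl", "rm", "rr"]


def behavioral(mu):
    """Convert a mixed strategy {pure strategy: probability} by formula (4.6)."""
    # X1 is relevant for every pure strategy, so P_mu(X1) = 1.
    beta1 = {k: sum(p for u, p in mu.items() if u[0] == k) for k in "lr"}
    # X2 is relevant only for the strategies choosing "r" at X1.
    reach = beta1["r"]
    if reach > 0:
        beta2 = {
            k: sum(p for u, p in mu.items() if u[0] == "r" and u[1] == k) / reach
            for k in "lmr"
        }
    else:
        # Irrelevant information set: fall back to the marginal distribution.
        beta2 = {k: sum(p for u, p in mu.items() if u[1] == k) for k in "lmr"}
    return beta1, beta2


# An arbitrary mixed strategy (mu_1, ..., mu_6) used only for illustration.
mu = dict(zip(PURE, [F(1, 10), F(1, 10), F(2, 10), F(1, 10), F(2, 10), F(3, 10)]))
beta1, beta2 = behavioral(mu)
print(beta1, beta2)
print(beta1["r"] * beta2["m"] == mu["rm"])  # realization (r, m) has probability mu_5
```

The same check works for the realizations (r, l) and (r, r), while every play starting with “l” has probability $\mu_1 + \mu_2 + \mu_3$ under both descriptions.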
Now, we demonstrate that the behavioral strategy (4.6) yields precisely the same probability distribution on all plays as the mixed strategy $\mu$. Select a play $[x_0, t]$, where $t$ is a terminal node. Suppose that the play $[x_0, t]$ sequentially intersects the information sets $X_i^1, \dots, X_i^{J_i}$ of player $i$ and that the alternatives $k_1, \dots, k_{J_i}$ belonging to the path $[x_0, t]$ are chosen. If at least one of these sets is irrelevant for $\mu_i$, then $P_\mu[x_0, t] = P_\beta[x_0, t] = 0$. Therefore, suppose that all $X_i^j \in \mathrm{Rel}\,\mu_i$, $j = 1, \dots, J_i$. It follows from (4.5) that, for $\beta$ determined by (4.6) and any $i$, we have the equality

$$\prod_{j = 1, \dots, J_i,\ k \in [x_0, t]} \beta_i(X_i^j, k) = \prod_{j = 1, \dots, J_i,\ k \in [x_0, t]} \frac{P_\mu(X_i^j, k)}{P_\mu(X_i^j)} = P_\mu(X_i^{J_i}, k_{J_i}).$$

Evaluate $P_\beta[x_0, t]$ for $\beta$ defined by (4.6). To this end, transform the first product in formula (4.1):

$$\prod_{i \in N,\ j = 1, \dots, J_i,\ k \in [x_0, t]} \beta_i(X_i^j, k) = \prod_{i \in N,\ k_{J_i} \in [x_0, t]} P_\mu(X_i^{J_i}, k_{J_i}) = \prod_{i \in N} \Bigg( \sum_{u_i:\ X_i^{J_i} \in \mathrm{Rel}\,u_i,\ u_i(X_i^{J_i}) = k_{J_i}} \mu_i(u_i) \Bigg) = \sum_{u:\ X_i^{J_i} \in \mathrm{Rel}\,u_i,\ u_i(X_i^{J_i}) = k_{J_i}} \ \prod_{i=1}^{n} \mu_i(u_i).$$

Thus,

$$P_\beta[x_0, t] = \Bigg( \sum_{u:\ X_i^{J_i} \in \mathrm{Rel}\,u_i,\ u_i(X_i^{J_i}) = k_{J_i}} \ \prod_{i=1}^{n} \mu_i(u_i) \Bigg) \prod_{x \in X_0 \cap [x_0, t],\ y \in Z(x) \cap [x_0, t]} p(y \mid x).$$

The last expression coincides with the representation (3.1) for $P_\mu[x_0, t]$.

We have shown that, in total memory games, mixed strategies and the corresponding behavioral strategies induce identical distributions on plays. Hence, the expected payoffs also coincide for such strategies. Consequently, while searching for equilibrium strategy profiles in such games, one can confine the search to the class of behavioral strategies. The application of behavioral strategies will be illustrated in forthcoming chapters of the book.

Exercises

1. Consider a game with complete information described by the tree in Figure 4.9. Find a subgame-perfect equilibrium in this game.

Figure 4.9 A game with complete information.

2.
Evaluate an equilibrium in the game described by the tree in Figure 4.10: (a) under benevolent behavior and (b) under malevolent behavior of the players.

Figure 4.10 A game with complete information.

3. Establish a condition under which malevolent behavior yields higher payoffs to both players than benevolent behavior.

4. Find an indifferent equilibrium in game no. 2.

5. Reduce the following game (see the tree in Figure 4.11) to the normal form.

Figure 4.11 Zero-sum game in extensive form.

6. Consider a game with incomplete information described by the tree in Figure 4.12. Find a subgame-perfect equilibrium in this game.

Figure 4.12 A game with incomplete information.

7. Evaluate an equilibrium in a game with incomplete information described by the tree in Figure 4.13.

Figure 4.13 A game with incomplete information.

8. Give an example of a partial memory game in the extensive form.

9. A card game. This game involves two players. Each of them has two cards: $x_1 = 0$, $x_2 = 1$ (player I) and $y_1 = 0$, $y_2 = 1$ (player II). Two additional cards lie on the table: $z_1 = 0$, $z_2 = 1$. The top card on the table is turned up. Each player chooses one of his cards and puts it on the table. The winner is the player who puts the higher card; his payoff is the value of the opponent's card. If both players put identical cards, the game is drawn. Construct the corresponding tree and specify the information sets.

10. Construct the tree for game no.
9 provided that x, y, z represent independent random variables with the uniform distribution on [0, 1].

5 Parlor games and sport games

Introduction

Parlor games include various card games, chess, draughts, etc. Many famous mathematicians (J. von Neumann, R. Bellman, S. Karlin, T. Ferguson, M. Sakaguchi, to name a few) endeavored to apply game theory methods to parlor games. We have mentioned that chess is of little interest from the game-theoretic viewpoint (it is a finite game with complete information, so an equilibrium does exist). Recent years have been remarkable for the development of very powerful chess computers (e.g., Junior, Hydra, Pioneer) that surpass human capabilities. On the other hand, card games are games with incomplete information. Therefore, it seems attractive to model psychological effects (risk, bluffing, etc.) by game theory methods. Here we will search for equilibria in the class of behavioral strategies.

Our investigation begins with poker. Let us describe this popular card game. A poker pack consists of 52 cards of four suits (spades, clubs, diamonds, and hearts). Cards within a suit differ by their denomination. There exist 13 denominations: 2, 3, …, 10, jack, queen, king, and ace. In poker, each player is dealt five cards. Different combinations of cards (called hands) have specific rankings. A typical hand ranking system is as follows. The highest ranking belongs to
(a) royal flush (10, jack, queen, king, ace, all of the same suit); the corresponding probability is approximately $1.5 \cdot 10^{-6}$.
Lower rankings (listed in ascending order of their probabilities) are assigned to the following hands:
(b) four of a kind or quads (all four cards of one denomination and any other (unmatched) card); the probability is 0.0002;
(c) full house (three matching cards of one denomination and two matching cards of another denomination); the probability equals 0.0014;
(d) straight (five cards of sequential denomination in at least two different suits); the probability is 0.0035;
(e) three of a kind or trips (three cards of the same denomination, plus two cards which are not of this denomination nor the same as each other); the probability is 0.0211;
(f) two pairs (two cards of the same denomination, plus two cards of another denomination (that match each other but not the first pair), plus any card not of either denomination); the probability is 0.0475;
(g) one pair (two cards of one denomination, plus three cards which are not of this denomination nor the same as each other); the probability is 0.4225.

After the deal, the players make bets. Subsequently, they reveal their cards, and the player with the highest-ranking hand takes the bank. Players do not know the cards of their opponents, i.e., poker is a game with incomplete information.

Mathematical Game Theory and Applications, First Edition. Vladimir Mazalov. © 2014 John Wiley & Sons, Ltd. Published 2014 by John Wiley & Sons, Ltd. Companion website: http://www.wiley.com/go/game_theory

5.1 Poker. A game-theoretic model

As a mathematical model of this card game, let us consider a two-player game. In the beginning of a play, both players (e.g., Peter and Paul) contribute a buy-in of 1 each. Afterwards, each of them is dealt one card, of denominations x and y, respectively (a player has no information on the opponent's card). Peter moves first. He either passes (losing his buy-in) or makes a bet c > 1. In the latter case, the move is given to Paul, who chooses between the same alternatives. If Paul passes, he loses his buy-in; otherwise, the players reveal their cards and the one with the higher denomination becomes the winner.

Note that the cards of the players possess random denominations. It is necessary to define the probabilistic character of all possible outcomes. Assume that the denominations of cards lie within the interval from 0 to 1 and appear equiprobable.
In other words, the random variables $x$ and $y$ obey the uniform distribution on the interval $[0, 1]$.

Now, specify strategies in this game. Each player knows only his own card; hence, his decision is based on this knowledge. Therefore, we understand Peter's strategy as a function $\alpha(x)$, the probability of betting given that he holds the card $x$. Since $\alpha$ represents a probability, its values satisfy $0 \le \alpha \le 1$, and the function $\bar\alpha = 1 - \alpha$ corresponds to the probability of passing. Similarly, if Peter bets, Paul's strategy is a function $\beta(y)$, the probability of calling given that he holds the card $y$. Obviously, $0 \le \beta \le 1$.

Different combinations of cards (hands) appear in the course of a play. Thus, the payoff of each player is a random quantity. As a criterion, we adopt the expected value of the payoff. Imagine that the players have selected their strategies ($\alpha$ and $\beta$). By virtue of the game conditions, the payoff of player I equals

$$\begin{cases} -1 & \text{with probability } \bar\alpha(x), \\ +1 & \text{with probability } \alpha(x)\bar\beta(y), \\ (c + 1)\,\mathrm{sgn}(x - y) & \text{with probability } \alpha(x)\beta(y). \end{cases}$$

Here the function $\mathrm{sgn}(x - y)$ equals 1 if $x > y$, $-1$ if $x < y$, and 0 if $x = y$.

Due to these expressions, the expected payoff of Peter becomes

$$H(\alpha, \beta) = \int_0^1 \!\! \int_0^1 \left[ -\bar\alpha(x) + \alpha(x)\bar\beta(y) + (c + 1)\,\mathrm{sgn}(x - y)\,\alpha(x)\beta(y) \right] dx\,dy. \tag{1.1}$$

The game is now completely defined: we have described the strategies and payoffs of both players. Player I strives to maximize the expected payoff (1.1), whereas player II seeks to minimize it.

5.1.1 Optimal strategies

Readers would easily guess the form of optimal strategies. By extracting the terms containing $\alpha(x)$, rewrite the payoff (1.1) as

$$H(\alpha, \beta) = \int_0^1 \alpha(x) \left[ 1 + \int_0^1 \left( \bar\beta(y) + (c + 1)\,\mathrm{sgn}(x - y)\,\beta(y) \right) dy \right] dx - 1. \tag{1.2}$$

Denote by $Q(x)$ the bracketed expression in (1.2).
It follows from (1.2) that Peter's optimal strategy $\alpha^*(x)$, maximizing his payoff, takes the following form. If $Q(x) > 0$, then $\alpha^*(x) = 1$, and if $Q(x) < 0$, then $\alpha^*(x) = 0$. In the case of $Q(x) = 0$, the function $\alpha^*(x)$ may take any value. The function $\mathrm{sgn}(x - y)$, as well as the function $Q(x)$ itself, is non-decreasing in $x$. Figure 5.1 illustrates that the optimal strategy $\alpha^*(x)$ must be defined by some threshold $a$. If the dealt card $x$ has a denomination smaller than $a$, the player should pass (and bet otherwise).

Figure 5.1 The function Q(x).

Similarly, we re-express the payoff $H(\alpha, \beta)$ as

$$H(\alpha, \beta) = \int_0^1 \beta(y) \left[ \int_0^1 \alpha(x) \left( -(c + 1)\,\mathrm{sgn}(y - x) - 1 \right) dx \right] dy + \int_0^1 (2\alpha(x) - 1)\,dx. \tag{1.3}$$

Paul's optimal strategy $\beta^*(y)$ is also determined by a certain threshold $b$. If the denomination of his card exceeds this threshold, Paul calls the bet (and passes otherwise).

Let us evaluate the optimal thresholds $a^*$, $b^*$. Suppose that Peter employs the strategy $\alpha$ with a threshold $a$. According to (1.3), the payoff of Peter (i.e., the loss of Paul) equals

$$H(\alpha, \beta) = \int_0^1 \beta(y)\,G(y)\,dy + 2(1 - a) - 1, \tag{1.4}$$

where $G(y) = \int_a^1 \left[ -(c + 1)\,\mathrm{sgn}(y - x) - 1 \right] dx$. A series of standard calculations leads to

$$G(y) = \int_a^1 c\,dx = c(1 - a), \quad \text{if } y < a,$$

$$G(y) = \int_a^y (-c - 2)\,dx + \int_y^1 c\,dx = -2(c + 1)y + a(c + 2) + c, \quad \text{if } y \ge a.$$

Figure 5.2 shows the curve of $G(y)$. Obviously, the optimal threshold $b$ is defined by $-2(c + 1)b + a(c + 2) + c = 0$, whence it follows that

$$b = \frac{1}{2(c + 1)}\,[a(c + 2) + c]. \tag{1.5}$$

Figure 5.2 The function G(y).

Therefore, the optimal threshold of player II is uniquely defined by the threshold of the opponent. The minimal value of Paul's loss equals

$$H(\alpha, \beta) = \int_b^1 G(y)\,dy + 2(1 - a) - 1 = \int_b^1 \left[ -2(c + 1)y + a(c + 2) + c \right] dy + 2(1 - a) - 1.$$

Integration yields

$$H(\alpha, \beta) = -(c + 1)(1 - b^2) + [a(c + 2) + c](1 - b) - 2a + 1 = (c + 1)b^2 - b[a(c + 2) + c] + ac. \tag{1.6}$$

Substitute the optimal value of $b$ (see (1.5)) into formula (1.6) to represent Paul's minimal loss as a function of the argument $a$:

$$H(a) = \frac{1}{4(c + 1)}\,[a(c + 2) + c]^2 - \frac{1}{2(c + 1)}\,[a(c + 2) + c]^2 + ac.$$

Some straightforward manipulations give

$$H(a) = \frac{(c + 2)^2}{4(c + 1)} \left[ -a^2 + 2a\,\frac{c^2}{(c + 2)^2} - \frac{c^2}{(c + 2)^2} \right]. \tag{1.7}$$

Recall that $a$ is Peter's strategy: he strives to maximize the minimal loss of Paul, see (1.7). Therefore, we finally arrive at the maximization problem for the parabola

$$h(a) = -a^2 + 2a\,\frac{c^2}{(c + 2)^2} - \frac{c^2}{(c + 2)^2}.$$

This function is demonstrated in Figure 5.3. Its maximum lies at the point $a^* = \left( \frac{c}{c + 2} \right)^2$ within the interval $[0, 1]$. Substitute this value into (1.5) to find the optimal threshold of player II:

$$b^* = \frac{c}{c + 2}.$$

Figure 5.3 The parabola h(a).

The payoff of player I (under the optimal behavior of both Peter and Paul) results from substituting the optimal threshold $a^*$ into (1.7):

$$H^* = H(a^*, b^*) = \frac{(c + 2)^2}{4(c + 1)} \left[ \left( \frac{c}{c + 2} \right)^4 - \left( \frac{c}{c + 2} \right)^2 \right] = -\left( \frac{c}{c + 2} \right)^2.$$

Apparently, the game has a negative value, i.e., player I (Peter) is disadvantaged.

5.1.2 Some features of optimal behavior in poker

We have evaluated the optimal strategies of both players and the game value. The optimal threshold of player I is smaller than that of his opponent. In other words, Peter should be more careful. The game possesses a negative value. This aspect admits the following explanation: the move of player I provides some information on his card to player II.

Now, we discuss the uniqueness of optimal strategies. Figure 5.2 elucidates the following. If player I (Peter) employs the optimal strategy $\alpha^*(x)$ with the threshold $a^* = \left( \frac{c}{c + 2} \right)^2$, then the best response of player II (Paul) is also a threshold strategy $\beta^*(y)$, with the threshold $b^* = \frac{c}{c + 2}$. Notably, Paul's optimal strategy is uniquely defined.

Fix Paul's strategy with the threshold $b^*$ and find the best response of Peter.
To this end, address the expression (1.2) and compute the function $Q(x)$. Under the given $b^*$, we establish that, if $x < b^*$, then

$$Q(x) = 1 + \int_0^1 \left( \bar\beta(y) + (c + 1)\,\mathrm{sgn}(x - y)\,\beta(y) \right) dy = 1 + \int_0^{b^*} dy - \int_{b^*}^1 (c + 1)\,dy = 1 + b^* - (c + 1)(1 - b^*) = 0.$$

On the other hand, if $x \ge b^*$,

$$Q(x) = 1 + b^* + \int_{b^*}^x (c + 1)\,dy - \int_x^1 (c + 1)\,dy = 1 + b^* + (c + 1)(x - b^*) - (c + 1)(1 - x) = 2(c + 1)x - c(b^* + 1) = 2(c + 1)(x - b^*),$$

where the last equality uses $(c + 2)b^* = c$.

Figure 5.4 The function Q(x).

Figure 5.4 demonstrates that the function $Q(x)$ is positive on the interval $(b^*, 1]$. If Peter has a card $x > b^*$, his best response consists in betting. However, if $x$ lies within the interval $[0, b^*]$, then $Q(x) = 0$ and $\alpha^*(x)$ may take any value (this does not affect the payoff (1.2)). Of course, the strategy with the threshold $a^*$ found above meets this condition. Is there another strategy $\alpha(x)$ of Peter such that Paul's optimal strategy coincides with $\beta^*(y)$?

Such strategies do exist. For instance, consider the following strategy $\alpha(x)$. If $x \ge b^*$, player I makes a bet; in the case of $x < b^*$, he makes a bet with the probability $p = \frac{2}{c + 2}$ (and passes with the probability $\bar p = 1 - p = \frac{c}{c + 2}$, accordingly). Find the best response of player II to the described strategy of the opponent. Again, rewrite the payoff in the form (1.4). The function $G(y)$ is then defined by

$$G(y) = \int_0^{b^*} p \left( -(c + 1)\,\mathrm{sgn}(y - x) - 1 \right) dx + \int_{b^*}^1 \left( -(c + 1)\,\mathrm{sgn}(y - x) - 1 \right) dx,$$

whence it follows that, for $y < b^*$,

$$G(y) = p \int_0^y (-(c + 1) - 1)\,dx + p \int_y^{b^*} (c + 1 - 1)\,dx + \int_{b^*}^1 (c + 1 - 1)\,dx = -2p(c + 1)y + pcb^* + c(1 - b^*),$$

and, for $y \ge b^*$,

$$G(y) = p \int_0^{b^*} (-c - 2)\,dx + \int_{b^*}^y (-c - 2)\,dx + \int_y^1 c\,dx = -2(c + 1)y + (c + 2)b^*(1 - p) + c.$$

Figure 5.5 The function G(y).

The curve of $G(y)$ can be observed in Figure 5.5. Interestingly, this choice of $p$ leads to $G(b^*) = 0$, and the best strategy of Paul is still $\beta^*(y)$. Therefore, we have obtained another solution to this game. It fundamentally differs from the previous one for player I.
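The closed-form thresholds and the bluffing probability can be checked numerically. The sketch below is an illustration, not part of the original derivation: the function names `payoff` and `G_bluff`, the bet size c = 2, and the grid of 101 thresholds are all assumptions chosen for the check. The function `payoff` computes Peter's expected payoff directly from the probabilities of the three outcomes when both players use threshold strategies.

```python
from fractions import Fraction

F = Fraction


def payoff(a, b, c):
    """Peter's expected payoff when he bets iff x >= a and Paul calls iff y >= b."""
    passed = -a                      # Peter passes and loses the buy-in
    stolen = (1 - a) * b             # Peter bets, Paul passes
    if a <= b:
        # Showdown: Peter loses c + 1 whenever x lies in [a, b);
        # for x, y both in [b, 1] the outcomes cancel by symmetry.
        showdown = -(c + 1) * (b - a) * (1 - b)
    else:
        # Showdown: Peter wins c + 1 whenever y lies in [b, a).
        showdown = (c + 1) * (1 - a) * (a - b)
    return passed + stolen + showdown


def G_bluff(y, c):
    """G(y) for Peter's bluffing strategy: bet if x >= b*, else bet with prob. p."""
    b, p = c / (c + 2), 2 / (c + 2)
    if y < b:
        return -2 * p * (c + 1) * y + p * c * b + c * (1 - b)
    return -2 * (c + 1) * y + (c + 2) * b * (1 - p) + c


c = F(2)                             # bet size, chosen only for illustration
b_star = c / (c + 2)
a_star = b_star ** 2
value = payoff(a_star, b_star, c)

grid = [F(i, 100) for i in range(101)]
# Saddle point: Peter cannot gain by changing a, Paul cannot gain by changing b.
peter_ok = all(payoff(a, b_star, c) <= value for a in grid)
paul_ok = all(payoff(a_star, b, c) >= value for b in grid)
print(a_star, b_star, value, peter_ok, paul_ok)
print(G_bluff(b_star, c))            # vanishes, so Paul's threshold stays b*
```

For thresholds $a \le b^*$ the function `payoff(a, b_star, c)` is constant, which reflects the fact that $Q(x) = 0$ on $[0, b^*]$ and explains why Peter's best response is not unique.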
Now, Peter can make a bet even with a card of a small denomination. This effect in card games is well known as bluffing. A player feigns that he has a high-rank card, thus compelling the opponent to pass. However, the probability of bluffing $p = \frac{2}{c+2}$ is smaller for larger bets $c$. For instance, if $c = 100$, the probability of bluffing is $2/102 \approx 0.0196$, i.e., less than 0.02.

[Figure 5.5: The function $G(y)$.]

5.2 The poker model with variable bets

The above poker model proceeds from fixed bets $c$. In real games, bets vary. Consider a variable-bet modification of the model. As usual, Peter and Paul contribute the buy-ins of 1 in the beginning of a play. Afterwards, they are dealt two cards of denominations $x$ and $y$, respectively (a player has no information on the opponent's card). At the first shot, Peter makes a bet $c(x)$ depending on the denomination $x$ of his card. The move then passes to Paul, who chooses between passing and calling. If Paul passes, he loses his buy-in; otherwise, he calls the opponent's bet and adds $c(x)$ to the bank. The players open up the cards and the one having a higher denomination becomes the winner. In this model, Peter wins 1 (if Paul passes) or $(1 + c(x))\operatorname{sgn}(x-y)$ (if Paul calls). The problem is to find the optimal function $c(x)$ and the optimal response of player II. It was originally formulated by R. Bellman in the late 1950s [Bellman et al. 1958].

Our analysis starts with the discrete model of this game. Suppose that Peter's bet takes any value from a finite set $0 < c_1 < c_2 < \cdots < c_n$. Then the strategy of player I is a mixed strategy $\alpha(x) = (\alpha_1(x), \ldots, \alpha_n(x))$, where $\alpha_i(x)$ denotes the probability of betting $c_i$, $i = 1, \ldots, n$, provided that his card has the denomination $x$. Consequently, $\sum_{i=1}^{n} \alpha_i(x) = 1$. The strategy of player II is a behavioral strategy $\beta(y) = (\beta_1(y), \ldots, \beta_n(y))$, where $\beta_i(y)$ designates the probability of calling the bet $c_i$, $0 \le \beta_i \le 1$, $i = 1, \ldots, n$, under the selected card $y$.
Accordingly, $\bar\beta_i(y) = 1 - \beta_i(y)$ gives the probability of passing under the bet $c_i$ and the card $y$. The expected payoff of player I acquires the form

$$H(\alpha, \beta) = \int_0^1\!\!\int_0^1 \sum_{i=1}^{n}\left[\alpha_i(x)\bar\beta_i(y) + (1+c_i)\operatorname{sgn}(x-y)\,\alpha_i(x)\beta_i(y)\right]dx\,dy. \tag{2.1}$$

First, consider the case of $n = 2$.

5.2.1 The poker model with two bets

Assume that player I can bet $c_1$ or $c_2$ ($c_1 < c_2$) depending on a selected card $x$. Therefore, his strategy can be defined via the function $\alpha(x)$, the probability of the bet $c_1$. The quantity $\bar\alpha(x) = 1 - \alpha(x)$ indicates the probability of the bet $c_2$. The strategy of player II is completely described by two functions, $\beta_1(y)$ and $\beta_2(y)$, that specify the probabilities of calling the bets $c_1$ and $c_2$, respectively. The payoff function (2.1) becomes

$$H(\alpha, \beta) = \int_0^1\!\!\int_0^1 \left[\alpha(x)\bar\beta_1(y) + (1+c_1)\operatorname{sgn}(x-y)\,\alpha(x)\beta_1(y) + (1-\alpha(x))\bar\beta_2(y) + (1+c_2)\operatorname{sgn}(x-y)(1-\alpha(x))\beta_2(y)\right]dx\,dy. \tag{2.2}$$

We evaluate the optimal strategy of player II. Take formula (2.2) and extract the terms with $\beta_1$ and $\beta_2$. They are

$$\int_0^1 \beta_1(y)\,dy\left[\int_0^1 \alpha(x)\left(-1 + (1+c_1)\operatorname{sgn}(x-y)\right)dx\right] \tag{2.3}$$

and

$$\int_0^1 \beta_2(y)\,dy\left[\int_0^1 (1-\alpha(x))\left(-1 + (1+c_2)\operatorname{sgn}(x-y)\right)dx\right]. \tag{2.4}$$

The function $\operatorname{sgn}(x-y)$ is non-increasing in $y$. Hence, the bracketed expressions in (2.3)–(2.4) (denote them by $G_i(y)$, $i = 1, 2$) represent non-increasing functions of $y$ (see Figure 5.6). Suppose that the functions $G_i(y)$ cross the axis $Oy$ within the interval $[0, 1]$ at some points $b_i$, $i = 1, 2$.

Player II aims at minimizing the functionals (2.3)–(2.4). The integrals in these formulas attain their minimal values under the following necessary condition: the function $\beta_i(y)$ vanishes for $G_i(y) > 0$ and equals 1 for $G_i(y) < 0$, $i = 1, 2$. And so, the optimal strategy of player II has the form $\beta_i(y) = I(y \ge b_i)$, $i = 1, 2$, where $I(A)$ means the indicator of the set $A$.
In other words, player II calls the opponent's bet under sufficiently high denominations of cards (exceeding the threshold $b_i$, $i = 1, 2$).

[Figure 5.6: The function $G_i(y)$.]

So long as $c_1 < c_2$, a natural supposition is the following. The threshold for calling a higher bet must be greater, as well: $b_1 < b_2$. The thresholds $b_1, b_2$ are defined by the equations $G_i(b_i) = 0$, $i = 1, 2$, or, according to (2.3)–(2.4),

$$\int_0^{b_1} (-2-c_1)\,\alpha(x)\,dx + \int_{b_1}^{1} c_1\,\alpha(x)\,dx = 0 \tag{2.5}$$

and

$$\int_0^{b_2} (-2-c_2)\,\bar\alpha(x)\,dx + \int_{b_2}^{1} c_2\,\bar\alpha(x)\,dx = 0. \tag{2.6}$$

Now, construct the optimal strategy of player I, the function $\alpha(x)$. In the payoff (2.2), extract the expression containing $\alpha(x)$:

$$\int_0^1 \alpha(x)\,dx\left[\int_0^1 \left(\beta_2(y) - \beta_1(y) + \operatorname{sgn}(x-y)\left((1+c_1)\beta_1(y) - (1+c_2)\beta_2(y)\right)\right)dy\right].$$

Designate by $Q(x)$ the bracketed expression above. For $x$ such that $Q(x) < 0$ ($Q(x) > 0$), the optimal strategy $\alpha(x)$ equals zero (unity, respectively). In the case of $Q(x) = 0$, the function $\alpha(x)$ takes arbitrary values. After some transformations, we obtain

$$Q(x) = \int_0^{x} \left(c_1\beta_1(y) - c_2\beta_2(y)\right)dy + \int_{x}^{1} \left((2+c_2)\beta_2(y) - (2+c_1)\beta_1(y)\right)dy.$$

Recall the form of the strategies $\beta_i(y)$, $i = 1, 2$. The derivative of the function $Q(x)$,

$$Q'(x) = (2+2c_1)\beta_1(x) - (2+2c_2)\beta_2(x),$$

can be represented as

$$Q'(x) = \begin{cases} 0, & \text{if } x \in [0, b_1],\\ 2 + 2c_1, & \text{if } x \in (b_1, b_2),\\ -2(c_2 - c_1), & \text{if } x \in [b_2, 1]. \end{cases}$$

Therefore, the function $Q(x)$ is constant on the interval $[0, b_1]$, increases on the interval $(b_1, b_2)$, and decreases on the interval $[b_2, 1]$. Require that the function $Q(x)$ vanishes on the interval $[0, b_1]$ and crosses the axis $Ox$ at some point $a$ on the interval $[b_2, 1]$ (see Figure 5.7). Consequently, we will have $b_1 < b_2 < a$.

[Figure 5.7: The function $Q(x)$.]

For this, it is necessary that

$$Q(0) = \int_{b_2}^{1} (2+c_2)\,dy - \int_{b_1}^{1} (2+c_1)\,dy = 0$$

and

$$Q(a) = \int_{b_1}^{a} c_1\,dy - \int_{b_2}^{a} c_2\,dy + \int_{a}^{1} (c_2 - c_1)\,dy = 0.$$
Further simplification of the above conditions yields

$$(1-b_1)(2+c_1) = (1-b_2)(2+c_2), \qquad (c_2-c_1)(2a-1) = c_2b_2 - c_1b_1.$$

Under these conditions, the optimal strategy of player I acquires the form

$$\alpha(x) = \begin{cases} \text{an arbitrary value}, & \text{if } x \in [0, b_1],\\ 1, & \text{if } x \in (b_1, a),\\ 0, & \text{if } x \in [a, 1]. \end{cases}$$

And the conditions (2.5)–(2.6) are rewritten as

$$\int_0^{b_1} \alpha(x)\,dx = \frac{c_1(a-b_1)}{2+c_1} = b_1 - \frac{c_2(1-a)}{2+c_2}. \tag{2.7}$$

Therefore, the parameters of the players' optimal strategies meet the system of equations

$$(1-b_1)(2+c_1) = (1-b_2)(2+c_2),$$
$$(c_2-c_1)(2a-1) = c_2b_2 - c_1b_1, \tag{2.8}$$
$$\frac{c_1(a-b_1)}{2+c_1} = b_1 - \frac{c_2(1-a)}{2+c_2}.$$

The system of equations (2.8) possesses a solution $0 \le b_1 < b_2 \le a \le 1$. We demonstrate this fact in the general case.

The optimal strategy of player I is remarkable for the following reasons. It takes arbitrary values on the interval $[0, b_1]$ such that the condition (2.7) holds true. This corresponds to a bluffing strategy, since player I can make a high bet with small-rank cards. The optimal strategy of player II dictates passing under small-rank cards (and calling a certain bet of the opponent under sufficiently high denominations of the cards).

For instance, we select $c_1 = 2$, $c_2 = 4$ to obtain the following optimal parameters: $b_1 \approx 0.345$, $b_2 \approx 0.564$, $a \approx 0.891$. If the rank of his card is less than 0.345, player I bluffs. He bets 2 if the card rank exceeds this threshold yet is smaller than 0.891. And finally, for cards whose denomination is higher than 0.891, player I bets 4. Player II calls the bet of 2 if the rank of his card is at least 0.345 and calls the bet of 4 if the rank of his card is at least 0.564. In the remaining situations, player II prefers to pass.

5.2.2 The poker model with n bets

Now, assume that player I is dealt a card $x$ and can bet any value from a finite set $0 < c_1 < \cdots < c_n$.
Then his strategy is a mixed strategy $\alpha(x) = (\alpha_1(x), \ldots, \alpha_n(x))$, where $\alpha_i(x)$ represents the probability of making the bet $c_i$. The next shot belongs to player II. Depending on a selected card $y$, he either passes (losing his buy-in in the bank), or continues the play. In the latter case, player II has to call the opponent's bet. Subsequently, both players open up their cards; the winner is the one whose card possesses a higher denomination.

The strategy of player II is a behavioral strategy $\beta(y) = (\beta_1(y), \ldots, \beta_n(y))$, where $\beta_i(y)$ indicates the probability of calling the bet of player I (the quantity $c_i$, $i = 1, \ldots, n$). And the payoff function takes the form

$$H(\alpha, \beta) = \int_0^1\!\!\int_0^1 \sum_{i=1}^{n}\left[\alpha_i(x)\bar\beta_i(y) + (1+c_i)\operatorname{sgn}(x-y)\,\alpha_i(x)\beta_i(y)\right]dx\,dy. \tag{2.9}$$

First, we evaluate the optimal strategy of player II. To succeed, rewrite the function (2.9) as

$$H(\alpha, \beta) = \sum_{i=1}^{n} \int_0^1 \beta_i(y)\,dy\left[\int_0^1 \alpha_i(x)\left(-1 + (1+c_i)\operatorname{sgn}(x-y)\right)dx\right] + 1. \tag{2.10}$$

Denote by $G_i(y)$ the bracketed expression in the previous formula. For each fixed strategy $\alpha(x)$ and bet $c_i$, player II strives to minimize (2.10). Therefore, for any $i = 1, \ldots, n$, his optimal strategy is given by

$$\beta_i(y) = \begin{cases} 0, & \text{if } G_i(y) > 0,\\ 1, & \text{if } G_i(y) < 0. \end{cases}$$

Obviously, the function

$$G_i(y) = -(2+c_i)\int_0^{y} \alpha_i(x)\,dx + c_i\int_{y}^{1} \alpha_i(x)\,dx$$

does not increase in $y$. Furthermore, $G_i(0) = c_i\int_0^1 \alpha_i(x)\,dx \ge 0$ and $G_i(1) = -(2+c_i)\int_0^1 \alpha_i(x)\,dx \le 0$. Hence, the equation $G_i(y) = 0$ always admits a root $b_i$ (see Figure 5.6). The quantity $b_i$ satisfies the equation

$$\int_0^{b_i} \alpha_i(x)\,dx = \frac{c_i}{2+c_i}\int_{b_i}^{1} \alpha_i(x)\,dx. \tag{2.11}$$

And so, the optimal strategy of player II becomes

$$\beta_i(y) = \begin{cases} 0, & \text{if } 0 \le y < b_i,\\ 1, & \text{if } b_i \le y \le 1, \end{cases} \qquad i = 1, \ldots, n.$$

Interestingly, the values $b_i$, $i = 1, \ldots, n$, meeting (2.11) do exist for any strategy $\alpha(x)$.

Construct the optimal strategy of player I.
Reexpress the payoff function (2.9) as

$$H(\alpha, \beta) = \sum_{i=1}^{n} \int_0^1 \alpha_i(x)\,Q_i(x)\,dx, \tag{2.12}$$

where

$$Q_i(x) = \int_0^1 \left(\bar\beta_i(y) + (1+c_i)\operatorname{sgn}(x-y)\,\beta_i(y)\right)dy$$

or

$$Q_i(x) = b_i + (1+c_i)\left(\int_0^{x} \beta_i(y)\,dy - \int_{x}^{1} \beta_i(y)\,dy\right). \tag{2.13}$$

For each $x$, player I seeks a strategy $\alpha(x)$ maximizing the payoff (2.12). Actually, this is another optimization problem which differs from the one arising for player II. Here $\alpha(x)$ forms a mixed strategy, $\sum_{i=1}^{n} \alpha_i(x) = 1$. The maximal value of the payoff (2.12) is attained by $\alpha(x)$ such that $\alpha_i(x) = 1$ if, for a given $x$, the function $Q_i(x)$ takes a greater value than the other functions $Q_j(x)$, $j \ne i$, and $\alpha_i(x) = 0$ otherwise. The function $\alpha(x)$ may take arbitrary values if all values $Q_i(x)$ coincide for a given $x$.

We search for the optimal strategy $\alpha(x)$ in a special class. Let all functions $Q_i(x)$ coincide on the interval $[0, b_1)$, i.e., $Q_1(x) = \ldots = Q_n(x)$. This agrees with bluffing by player I. Set $a_1 = b_1$ and suppose that $Q_1(x) > \max\{Q_j(x), j \ne 1\}$ on the interval $[a_1, a_2)$, $Q_2(x) > \max\{Q_j(x), j \ne 2\}$ on the interval $[a_2, a_3)$, and so on. Moreover, assume that the maximal value on the interval $[a_n, 1]$ belongs to $Q_n(x)$. Then the optimal strategy of player I acquires the form

$$\alpha_i(x) = \begin{cases} \text{an arbitrary value}, & \text{if } x \in [0, b_1],\\ 1, & \text{if } x \in [a_i, a_{i+1}),\\ 0, & \text{otherwise}. \end{cases} \tag{2.14}$$

We specify the function $Q_i(x)$. Further simplification of (2.13) yields

$$Q_i(x) = \begin{cases} b_i - (1+c_i)(1-b_i), & \text{if } 0 \le x < b_i,\\ (1+c_i)(2x-1) - c_ib_i, & \text{if } b_i \le x \le 1. \end{cases}$$

The function $Q_i(x)$ has constant values on the interval $[0, b_i]$. Require that these values are identical for all functions $Q_i(x)$, $i = 1, \ldots, n$:

$$b_i - (1+c_i)(1-b_i) = k, \qquad i = 1, \ldots, n.$$

In this case, all $b_i$, $i = 1, \ldots, n$, satisfy the formula

$$b_i = \frac{1+k+c_i}{2+c_i} = 1 - \frac{1-k}{2+c_i}, \qquad i = 1, \ldots, n. \tag{2.15}$$

It is immediate from (2.15) that $b_1 < b_2 < \cdots < b_n$. This fact conforms with the intuitive reasoning that player II must call a higher bet under higher-rank cards.
The function $Q_i(x)$ is linear on the interval $[b_i, 1]$. Let $a_i$, $i = 2, \ldots, n$, designate the intersection points of the functions $Q_{i-1}(x)$ and $Q_i(x)$. In addition, $a_1 = b_1$ and $a_{n+1} = 1$. To assign the form (2.14) to the optimal strategy $\alpha(x)$, we require that $a_1 < a_2 < \cdots < a_n$. Then the function $Q_i(x)$ ($i = 1, \ldots, n$) is maximal on the interval $[a_i, a_{i+1})$. Figure 5.8 demonstrates the functions $Q_i(x)$, $i = 1, \ldots, n$.

The intersection points $a_i$ result from the equations

$$(1+c_{i-1})(2a_i-1) - c_{i-1}b_{i-1} = (1+c_i)(2a_i-1) - c_ib_i, \qquad i = 2, \ldots, n,$$

or, after simple transformations,

$$a_i = 1 - \frac{\bar k}{(2+c_{i-1})(2+c_i)}, \qquad i = 2, \ldots, n, \tag{2.16}$$

where $\bar k = 1 - k$.

[Figure 5.8: Optimal strategies. The functions $Q_1, Q_2, Q_3$ versus $x$, with the level $k$ and the points $b_1, b_2, b_3, a_2, a_3$ marked.]

It remains to find $k$. Recall that the optimal thresholds $b_i$ of player II strategies satisfy equation (2.11). By virtue of (2.14), this equation takes the form

$$\int_0^{b_1} \alpha_i(x)\,dx = \frac{c_i}{2+c_i}(a_{i+1} - a_i), \qquad i = 1, \ldots, n. \tag{2.17}$$

By summing up all equations (2.17) and considering the condition $\sum_{i=1}^{n} \alpha_i(x) = 1$, we arrive at

$$b_1 = \sum_{i=1}^{n} \frac{c_i}{2+c_i}(a_{i+1} - a_i).$$

Hence, it follows that

$$\frac{1+k+c_1}{2+c_1} = \bar k A,$$

where

$$A = \sum_{i=1}^{n} \frac{c_i(c_{i+1} - c_{i-1})}{(2+c_{i-1})(2+c_i)^2(2+c_{i+1})}.$$

Here we set $c_0 = -1$ and $c_{n+1} = \infty$ in the sum above. Consequently,

$$k = 1 - \frac{2+c_1}{A(2+c_1)+1}.$$

Clearly, $A$ and $\bar k$ are both positive. Therefore, the sequence $a_i$ is monotone, $a_1 < a_2 < \cdots < a_n$. And all thresholds $a_i$ lie within the interval $[0, 1]$.

Let us summarize the outcomes. The optimal strategy of player I is defined by

$$\alpha_i^*(x) = \begin{cases} \text{an arbitrary function meeting the condition (2.17)}, & \text{if } x \in [0, b_1],\\ 1, & \text{if } x \in [a_i, a_{i+1}),\\ 0, & \text{otherwise}, \end{cases}$$

where $a_i = 1 - \frac{\bar k}{(2+c_{i-1})(2+c_i)}$, $i = 2, \ldots, n$. Note that player I bluffs on the interval $[0, b_1)$. Under small denominations of cards, he can bet anything.
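The formulas for $A$, $k$, $b_i$ and $a_i$ are easy to evaluate mechanically. The Python sketch below is an illustration under stated assumptions rather than part of the original derivation: the function name `n_bet_solution` and the helper list `u` are hypothetical, the conventions $c_0 = -1$, $c_{n+1} = \infty$ and $a_{n+1} = 1$ are encoded through `u`, and the game value is obtained by integrating $Q_i(x)$ over $[a_i, a_{i+1})$ according to (2.12) and (2.14), with all $Q_i$ equal to $k$ on $[0, b_1)$.

```python
from fractions import Fraction


def n_bet_solution(bets):
    """Thresholds and value of the n-bet model for bets 0 < c_1 < ... < c_n."""
    c = [Fraction(x) for x in bets]
    n = len(c)
    # u[i] multiplies (1 - k) in a_{i+1} = 1 - (1 - k) * u[i] (0-based index):
    # u[0] uses the convention c_0 = -1, u[n] = 0 encodes c_{n+1} = infinity.
    u = [1 / (2 + c[0])]
    u += [1 / ((2 + c[i - 1]) * (2 + c[i])) for i in range(1, n)]
    u.append(Fraction(0))
    # A = sum_i c_i / (2 + c_i) * (a_{i+1} - a_i) / (1 - k)
    A = sum(c[i] / (2 + c[i]) * (u[i] - u[i + 1]) for i in range(n))
    k_bar = (2 + c[0]) / (A * (2 + c[0]) + 1)
    k = 1 - k_bar
    b = [1 - k_bar / (2 + ci) for ci in c]   # calling thresholds, formula (2.15)
    a = [1 - k_bar * ui for ui in u]         # a[0] = b[0], a[n] = 1
    value = k * b[0]                         # contribution of the bluffing zone
    for i in range(n):
        # integral of Q_i(x) = (1 + c_i)(2x - 1) - c_i b_i over [a_i, a_{i+1}]
        width = a[i + 1] - a[i]
        value += width * ((1 + c[i]) * (a[i] + a[i + 1]) - (1 + c[i] + c[i] * b[i]))
    return A, k, b, a, value


A, k, b, a, value = n_bet_solution([2, 4])
print([round(float(t), 3) for t in b])   # [0.345, 0.564]
print([round(float(t), 3) for t in a])   # [0.345, 0.891, 1.0]
```

For the bets $c_1 = 2$, $c_2 = 4$ this reproduces, up to rounding, the thresholds of the two-bet example in Section 5.2.1. Exact fractions are used so that identities such as $a_1 = b_1$ and the summed condition (2.17) can be checked without floating-point noise.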
For definiteness, it is possible to decompose the interval $[0, b_1)$ into successive subintervals of the length $\frac{c_i}{2+c_i}(a_{i+1} - a_i)$, $i = 1, \ldots, n$ (by construction, their sum equals $b_1$) and set $\alpha_i^*(x) = 1$ on the corresponding subinterval, i.e., bet $c_i$ there. For $x > b_1$, player I has to bet $c_i$ on the interval $[a_i, a_{i+1})$.

The optimal strategy of player II is defined by

$$\beta_i^*(y) = \begin{cases} 0, & \text{if } 0 \le y < b_i,\\ 1, & \text{if } b_i \le y \le 1, \end{cases}$$

where $b_i = \frac{1+k+c_i}{2+c_i}$, $i = 1, \ldots, n$.

Find the value of this game from (2.12):

$$H(\alpha^*, \beta^*) = \sum_{i=1}^{n} \int_0^1 \alpha_i^*(x)\,Q_i(x)\,dx = \int_0^{b_1} k\sum_{i=1}^{n} \alpha_i^*(x)\,dx + \sum_{i=1}^{n} \int_{a_i}^{a_{i+1}} Q_i(x)\,dx,$$

whence it appears that

$$H(\alpha^*, \beta^*) = kb_1 + \sum_{i=1}^{n} (a_{i+1} - a_i)\left[(1+c_i)(a_i + a_{i+1}) - (1 + c_i + c_ib_i)\right]. \tag{2.18}$$

As an illustration, select the bets $c_1 = 1$, $c_2 = 3$, and $c_3 = 6$. In this case, readers easily obtain the following values of the parameters:

$$A = \frac{c_1(c_2+1)}{(2+c_1)^2(2+c_2)} + \frac{c_2(c_3-c_1)}{(2+c_1)(2+c_2)^2(2+c_3)} + \frac{c_3}{(2+c_2)(2+c_3)^2} \approx 0.133,$$

$$k = 1 - \frac{2+c_1}{A(2+c_1)+1} \approx -1.146,$$

and the corresponding optimal strategies

$$b_1 = \frac{1+k+c_1}{2+c_1} \approx 0.285, \quad b_2 = \frac{1+k+c_2}{2+c_2} \approx 0.571, \quad b_3 = \frac{1+k+c_3}{2+c_3} \approx 0.732,$$

$$a_1 = b_1 \approx 0.285, \quad a_2 = 1 - \frac{1-k}{(2+c_1)(2+c_2)} \approx 0.857, \quad a_3 = 1 - \frac{1-k}{(2+c_2)(2+c_3)} \approx 0.946.$$

Finally, the value of the game constitutes $H(\alpha^*, \beta^*) \approx -0.073$. This quantity is negative, so the game is non-beneficial for player I.

5.2.3 The asymptotic properties of strategies in the poker model with variable bets

Revert to the problem formulated in the beginning of Section 5.2. Suppose that, being dealt a card $x$, player I can make a bet $c(x)$ possessing an arbitrary value from $\mathbb{R}$. Our analysis employs the results established in subsection 5.2.2.

Choose a positive value $B$ and draw a uniform net $\{B/n, 2B/n, \ldots, Bn/n\}$ on the segment $[0, B]$, where $n$ is a positive integer. Imagine that the nodes of this net represent bets in a play, i.e., $c_i = Bi/n$. Moreover, increase $n$ and $i$ infinitely such that the equality $Bi/n = c$ holds for some $c$.
Afterwards, we will increase $B$ infinitely. Find the limit values of the parameters determining the optimal strategies of players in a play with such bets.

First, evaluate the limit of $A$:

$$A = \sum_{i=1}^{n} \frac{c_i(c_{i+1} - c_{i-1})}{(2+c_{i-1})(2+c_i)^2(2+c_{i+1})} = \sum_{i=1}^{n} \frac{B\frac{i}{n}\cdot 2\frac{B}{n}}{\left(2 + B\frac{i-1}{n}\right)\left(2 + B\frac{i}{n}\right)^2\left(2 + B\frac{i+1}{n}\right)}.$$

As $n \to \infty$, the above integral sum tends to the integral:

$$A \to \int_0^{B} \frac{2c}{(2+c)^4}\,dc = \frac{1}{12} - \frac{2B}{3(2+B)^3} - \frac{1}{3(2+B)^2},$$

which has the limit $A = 1/12$ as $B \to \infty$. This immediately leads to the limit value $k = 1 - \frac{2}{2A+1} = -5/7$. It is possible to compute the threshold for player I bluffing:

$$b_1 = a_1 = 1 - \frac{1-k}{2 + B/n} \to 1 - \frac{1 + 5/7}{2} = \frac{1}{7}.$$

Thus, if player I receives cards with denominations less than $1/7$, he should bluff.

Now, we define the bet of player I depending on the rank $x$ of his card. According to the above optimal strategy $\alpha^*(x)$ of player I, he makes a bet $c_i$ within the interval $[a_i, a_{i+1})$, where

$$a_i = 1 - \frac{1-k}{(2+c_{i-1})(2+c_i)}.$$

Therefore, the bet $c = c(x)$ corresponding to the card $x$ satisfies the equation

$$x = 1 - \frac{1-k}{(2+c)^2}.$$

And so,

$$c(x) = \sqrt{\frac{12}{7(1-x)}} - 2. \tag{2.19}$$

The expression (2.19) is non-negative if $x \ge 4/7$; hence, player I bets nothing under $1/7 \le x < 4/7$. In the case of $x \ge 4/7$, his bet obeys formula (2.19).

Let us explore the asymptotic behavior of player II. Under $y < 1/7$, he should pass. If $y \ge 1/7$, the expression (2.15) states that player II should call the bet $c$ of the opponent provided that

$$y \ge 1 - \frac{1-k}{2+c} = 1 - \frac{12}{7(2+c)}.$$

The optimal behavior of both players can be observed in Figure 5.9.

[Figure 5.9: Optimal strategies in poker. Axes $x$ and $y$ on $[0, 1]$, with the marks $1/7$ and $4/7$.]

It remains to compute the limit value of the game. Take advantage of the expression (2.18).
Passing to the limit yields

$$kb_1 + \int_0^{\infty} \frac{2(1-k)}{(2+c)^3}(1+c)\left[2\left(1 - \frac{1-k}{(2+c)^2}\right) - \left(1 + \frac{c(1+k+c)}{(1+c)(2+c)}\right)\right]dc$$
$$= -\frac{5}{49} + \frac{24}{7}\int_0^{\infty} \frac{1+c}{(2+c)^3}\left(1 - \frac{24}{7(2+c)^2} - \frac{c\left(\frac{2}{7} + c\right)}{(1+c)(2+c)}\right)dc = -\frac{5}{49} + \frac{18}{49}. \tag{2.20}$$

We emphasize an important feature. The limit value (2.20) must be added to the payoff resulting from the cards of player I from the interval $[1/7, 4/7]$, i.e.,

$$\int_{1/7}^{4/7}\left(\frac{1}{7} + \int_{1/7}^{1} \operatorname{sgn}(x-y)\,dy\right)dx = \frac{3}{49} + \int_{1/7}^{4/7}\left(2x - \frac{8}{7}\right)dx = -\frac{6}{49}. \tag{2.21}$$

Summing up (2.20) and (2.21) yields

$$H(\alpha^*, \beta^*) = -\frac{5}{49} + \frac{18}{49} - \frac{6}{49} = \frac{1}{7}.$$

Therefore, the limit value of the game equals $1/7$. The game turns out to be beneficial for player I. Optimal strategies are completely defined. Interestingly, the payoff and strategies are expressed via 7: this number has repeatedly emerged in the analytical formulas.

5.3 Preference. A game-theoretic model

Preference represents another popular card game. It engages two, three, four, or five players. A preference pack consists of 32 cards of four suits. Suits have the following ranking (in the ascending order): spades, clubs, diamonds, and hearts. Cards within a suit differ by their denomination. There exist eight denominations: 7, 8, 9, 10, jack, queen, king, and ace. In the beginning of a play, each player is dealt a certain number of cards. And the remaining cards form a talon. Next, players bid for the privilege of gaining the talon (declaring the contract and trump suit and playing as the soloist). Players gradually increase their bids; accordingly, they choose whisting or passing. As soon as bidding is finished, players start the game proper, revealing their cards one by one.

Preference has many rules, and we do not claim to cover them completely. Instead, we describe the elementary model of the preference card game and endeavor to identify the characteristic features of this game. When should a player choose whisting or passing?
When should a player take a talon? What is bluffing in preference?

As a game-theoretic model of preference, consider a two-player game $P$ played by Peter and Paul. In the beginning of a play, players are dealt cards of denominations $x$ and $y$. Another card of rank $z$ forms a talon. Suppose that card denominations represent random variables within the interval $[0, 1]$ and any values appear equiprobable. In this case, we say that the random variables $x, y, z$ possess the uniform distribution on the interval $[0, 1]$.

Peter moves first. He chooses between whisting (i.e., improving his cards) or passing. In the former case, he may get the talon $z$, select the higher-rank card and discard the one having a smaller denomination. Subsequently, players open up their cards. If Peter's card is higher than Paul's, he receives some payment $A$ from Paul. Similarly, if Paul's card ranks over Peter's, Peter pays a sum $B$ to Paul, where $B > A$. If the cards have identical denominations, the play is drawn.

Imagine that Peter passes at the first shot. Then the move belongs to the opponent, and Paul chooses between the same alternatives. If Paul chooses whisting, he may get the talon $z$ and discard the smaller-rank card. Next, both players open up the cards. Paul receives some payment $A$ from Peter if his card is higher than that of the opponent. Otherwise, he pays a sum $B$ to Peter. The play is drawn when the cards have the same ranks.

A distinctive feature of preference is the so-called all-pass game. If both players pass, the talon remains untouched and they immediately open up the cards. But the situation reverses totally. The winner is the player having the lower-rank card. If $x < y$, Peter receives a payment $C$ from Paul; if $x > y$, Peter pays an amount $C$ to Paul. Otherwise, the play is drawn.

All possible situations can be described by the following table. For compactness, we use the notation $\max(x, z) = x \vee z$.
Table 5.1 The payoffs of players.

Peter (shot 1)   Paul (shot 2)   Peter's payoff
Talon                            $A$, if $x \vee z > y$;   $0$, if $x \vee z = y$;   $-B$, if $x \vee z < y$
Pass             Talon           $-A$, if $y \vee z > x$;  $0$, if $y \vee z = x$;   $B$, if $y \vee z < x$
Pass             Pass            $C$, if $x < y$;          $0$, if $x = y$;          $-C$, if $x > y$

5.3.1 Strategies and payoff function

Let us define strategies in this game. Each player is aware of his card only; hence, his decision is based on this knowledge exclusively. Therefore, we understand Peter's strategy as a function $\alpha(x)$, the probability of whisting provided that his card possesses the rank $x$. So long as $\alpha$ represents a probability, it takes values between 0 and 1. Accordingly, the quantity $\bar\alpha = 1 - \alpha$ specifies the probability of passing. If Peter passes, Paul's strategy consists in a function $\beta(y)$, the probability of whisting provided that his card has the denomination $y$. Obviously, $0 \le \beta \le 1$.

Recall that the payoff in this game forms a random variable. As a criterion, we use the mean payoff (i.e., the expected value of the payoff). Suppose that players have chosen their strategies $\alpha, \beta$. By virtue of the definition of this game (see Table 5.1), the payoff of player I has the following form depending on a given combination of cards $x$, $y$, and $z$:

1. with the probability of $\alpha(x)$, the payoff equals $A$ on the set $x \vee z > y$ and $-B$ on the set $x \vee z < y$;
2. with the probability of $\bar\alpha(x)\beta(y)$, the payoff equals $-A$ on the set $y \vee z > x$ and $B$ on the set $y \vee z < x$;
3. with the probability of $\bar\alpha(x)\bar\beta(y)$, the payoff equals $C$ on the set $x < y$ and $-C$ on the set $x > y$.

Since $x$, $y$, and $z$ take any values from the interval $[0, 1]$, the expected payoff of player I represents the triple integral

$$H(\alpha, \beta) = \int_0^1\!\!\int_0^1\!\!\int_0^1 \Big\{\alpha(x)\left[A I_{\{x \vee z > y\}} - B I_{\{x \vee z < y\}}\right] + \bar\alpha(x)\beta(y)\left[-A I_{\{y \vee z > x\}} + B I_{\{y \vee z < x\}}\right] + \bar\alpha(x)\bar\beta(y)\left[C I_{\{x < y\}} - C I_{\{x > y\}}\right]\Big\}\,dx\,dy\,dz. \tag{3.1}$$

For convenience, this expression incorporates the function $I_A(x, y, z)$ (the so-called indicator of the set $A$), which is 1 if $(x, y, z)$ belongs to $A$ and 0 otherwise.
We illustrate calculation of the first integral:

$$\int_0^1\!\!\int_0^1\!\!\int_0^1 \alpha(x)\,I_{\{x \vee z > y\}}\,dx\,dy\,dz = \int_0^1 \alpha(x)\,dx\left[\int_0^1\!\!\int_0^1 I_{\{x \vee z > y\}}\,dy\,dz\right]. \tag{3.2}$$

The double integral in brackets can be computed as the iterated integral

$$J(x) = \int_0^1 dz \int_0^1 I_{\{x \vee z > y\}}\,dy$$

by dividing it into two integrals. Notably,

$$J(x) = \int_0^{x} dz \int_0^1 I_{\{x \vee z > y\}}\,dy + \int_{x}^{1} dz \int_0^1 I_{\{x \vee z > y\}}\,dy.$$

In the first integral, we have $x \ge z$; hence, $I_{\{x \vee z > y\}} = I_{\{x > y\}}$. On the contrary, in the second integral $I_{\{x \vee z > y\}} = I_{\{z > y\}}$. And it follows that

$$J(x) = \int_0^{x} dz \int_0^1 I_{\{x > y\}}\,dy + \int_{x}^{1} dz \int_0^1 I_{\{z > y\}}\,dy = x\int_0^{x} dy + \int_{x}^{1} dz \int_0^{z} dy = x^2 + \int_{x}^{1} z\,dz = \frac{x^2+1}{2}.$$

Therefore, the triple integral in (3.2) is transformed into the integral $\int_0^1 \alpha(x)\frac{x^2+1}{2}\,dx$. Proceeding by analogy, readers can easily calculate the remaining integrals in (3.1). After certain manipulations, we rewrite (3.1) as

$$H(\alpha, \beta) = \int_0^1 \alpha(x)\left[x^2(A+B)/2 + (A-B)/2 - C(1-2x)\right]dx$$
$$+ \int_0^1 \beta(y)\left[-y^2(A+B)/2 - (A-B)/2 + C(1-2y)\right]dy$$
$$+ \int_0^1 \alpha(x)\,dx\left[(A - x(A+B) - C)\int_0^{x} \beta(y)\,dy + (A+C)\int_{x}^{1} \beta(y)\,dy\right]. \tag{3.3}$$

The payoff $H(\alpha, \beta)$ in formula (3.3) has the following structure. The first and second rows contain expressions with $\alpha$ and $\beta$ separately, whereas the third row includes their product.

Now, find the strategies $\alpha^*(x)$ and $\beta^*(y)$ that meet the equations

$$\max_{\alpha} H(\alpha, \beta^*) = \min_{\beta} H(\alpha^*, \beta) = H(\alpha^*, \beta^*). \tag{3.4}$$

Then for any other strategies $\alpha$ and $\beta$ we have the inequalities

$$H(\alpha, \beta^*) \le H(\alpha^*, \beta^*) \le H(\alpha^*, \beta),$$

i.e., the strategies $\alpha^*, \beta^*$ form an equilibrium.

5.3.2 Equilibrium in the case of $\frac{B-A}{B+C} \le \frac{3A-B}{2(A+C)}$

Assume that, at shot 1, Peter always adheres to whisting: $\alpha^*(x) = 1$. In this case, nothing depends on Paul's behavior. Formula (3.1) implies that Peter's payoff $H(\alpha^*, \beta)$ constitutes the quantity

$$\int_0^1\!\!\int_0^1\!\!\int_0^1 \left[A I_{\{x \vee z > y\}} - B I_{\{x \vee z}\right. \ldots$$

[…] $> 0$; $0$, if $G(x) < 0$; an arbitrary value in the interval $[0, 1]$, if $G(x) = 0$.
(3.9)

Clearly, the function $G(x)$ consists of two parabolas, see (3.8). Below we demonstrate that, if $a$ is an arbitrary value from the interval

$$U = [0, 1] \cap \left[\frac{B-A}{B+C}, \frac{3A-B}{2(A+C)}\right],$$

then the function $G(x)$ has the shape shown in Figure 5.10 (for given values of the parameters $A$, $B$, $C$, $a$). Indeed, for any $a \in U$, formula (3.8) implies that

$$G(0) = \frac{A-B}{2} - C + (A+C)(1-a) = \frac{3A-B}{2} - (A+C)a \ge 0,$$

since $a \le \frac{3A-B}{2(A+C)}$, and

$$G(1) = A - B + a(B+C) \ge 0$$

due to $a \ge \frac{B-A}{B+C}$. The function $G(x)$ has the curve presented in the figure. And it appears from (3.9) that the strategy $\alpha^*(x)$ maximizing $H(\alpha, \beta^*)$ takes the form $\alpha^* \equiv 1$.

Therefore, if Peter and Paul choose the strategies $\alpha^* \equiv 1$ and $\beta^*(y) = I_{\{y \ge a\}}$, respectively, where $a \in U$, the players arrive at the following outcome. Peter guarantees the payoff of $\frac{2A-B}{3}$, and Paul will not let him gain more. In other words,

$$\max_{\alpha} H(\alpha, \beta^*) = \min_{\beta} H(\alpha^*, \beta) = H(\alpha^*, \beta^*) = \frac{2A-B}{3},$$

which proves optimality of the strategies $\alpha^*, \beta^*$.

And so, when $\frac{B-A}{B+C} \le \frac{3A-B}{2(A+C)}$, the optimal strategies in the game $P$ are defined by $\alpha^* \equiv 1$ and $\beta^*(y) = I_{\{y \ge a\}}$, where $a$ represents an arbitrary value from the interval $U = [0, 1] \cap \left[\frac{B-A}{B+C}, \frac{3A-B}{2(A+C)}\right]$. Moreover, the game has the value $H^* = \frac{2A-B}{3}$.

5.3.3 Equilibrium in the case of $\frac{3A-B}{2(A+C)} < \frac{B-A}{B+C}$

Now, suppose that

$$\frac{3A-B}{2(A+C)} < \frac{B-A}{B+C}.$$

If Paul adopts the strategy $\beta(y)$ (3.6), formula (3.8) shows the following. For $a$ belonging to the interval $U = \left[\frac{3A-B}{2(A+C)}, \frac{B-A}{B+C}\right]$, one obtains

$$G(0) = \frac{3A-B}{2} - (A+C)a \le 0, \qquad G(1) = (A-B) + a(B+C) \le 0.$$

The curve of $y = G(x)$, see (3.8), intersects the axis $x$ according to Figure 5.11. Thus, when Paul prefers the strategy $\beta(y) = I_{\{y \ge a\}}$ with some $a \in U$, the best response of Peter is the strategy

$$\alpha^*(x) = \begin{cases} 1, & \text{if } b_1 \le x \le b_2,\\ 0, & \text{if } x < b_1 \text{ or } x > b_2. \end{cases} \tag{3.10}$$

This function admits the compact form $\alpha^*(x) = I_{\{b_1 \le x \le b_2\}}$, where $b_1, b_2$ solve the system of equations $G(b_1) = 0$, $G(b_2) = 0$.
Departing from (3.8), we derive the system of equations

$$b_1^2\,\frac{A+B}{2} + \frac{A-B}{2} - C(1-2b_1) + (A+C)(1-a) = 0, \tag{3.11}$$

$$-b_2^2\,\frac{A+B}{2} + b_2(A+B)a + \frac{A-B}{2} + A - a(A-C) = 0. \tag{3.12}$$

[Figure 5.11: The function $G(x)$.]

To proceed, assume that Peter follows the strategy (3.10) $\alpha^*(x) = I_{\{b_1 \le x \le b_2\}}$ and find Paul's best response $\beta^*(y)$, which minimizes $H(\alpha^*, \beta)$ in $\beta$. By substituting $\alpha^*(x)$ into (3.1), one can rewrite the function $H(\alpha^*, \beta)$ as

$$H(\alpha^*, \beta) = \int_0^1 \beta(y)R(y)\,dy + \int_{b_1}^{b_2}\left[\frac{x^2(A+B)}{2} + \frac{A-B}{2} - C(1-2x)\right]dx. \tag{3.13}$$

Here the second component is independent of $\beta(y)$, and $R(y)$ acquires the form

$$R(y) = \begin{cases} -y^2\frac{A+B}{2} - 2Cy - \frac{A-B}{2} + C + (A-C)(b_2-b_1) - (A+B)\frac{b_2^2-b_1^2}{2}, & \text{if } y < b_1,\\[4pt] -\frac{A-B}{2} + C + b_2(A-C) - b_2^2\frac{A+B}{2} - b_1(A+C), & \text{if } b_1 \le y \le b_2,\\[4pt] -y^2\frac{A+B}{2} - 2Cy - \frac{A-B}{2} + C + (A+C)(b_2-b_1), & \text{if } b_2 < y \le 1. \end{cases} \tag{3.14}$$

The representation (3.13) immediately implies that the optimal strategy $\beta^*(y)$ is defined by the expressions

$$\beta^*(y) = \begin{cases} 1, & \text{if } R(y) < 0,\\ 0, & \text{if } R(y) > 0,\\ \text{an arbitrary value within the interval } [0, 1], & \text{if } R(y) = 0. \end{cases} \tag{3.15}$$

Interestingly, the function $R(y)$ possesses a constant value on the interval $[b_1, b_2]$. Set it equal to zero:

$$-b_2^2\,\frac{A+B}{2} + b_2(A-C) - b_1(A+C) - \frac{A-B}{2} + C = 0. \tag{3.16}$$

Then $R(y)$ takes the form shown in Figure 5.12.

[Figure 5.12: The function $R(y)$.]

Table 5.2 The optimal strategies of players.
                     $\alpha^*(x) = I_{\{b_1 \le x \le b_2\}}$   $\beta^*(y) = I_{\{y \ge a\}}$
 A     B     C       b1        b2                                a           H*
 5     6     8       0         1                                 0.071       1.333
 3     4     1       0         1                                 0.2         0.666
 3     5     0       0         1                                 0.4         0.333
 1     20    0       0.948     0.951                             0.948      −0.024
 1     2     1       0.055     0.962                             0.307      −0.344
 1     2     2       0.046     0.964                             0.229      −0.417
 1     2     3       0.039     0.968                             0.184      −0.467
 1     2     4       0.034     0.971                             0.154      −0.504
 1     4     3       0.288     0.824                             0.359      −0.519
 1     3     4       0.146     0.892                             0.242      −0.592
 1     2     20      0.010     0.989                             0.044      −0.669
 1     10    10      0.356     0.792                             0.392      −1.366
 3     8     2       0.282     0.845                             0.413      −2.058

According to (3.15), Paul's optimal strategy is, in particular, $\beta^*(y) = I_{\{y \ge a\}}$, where $a$ means an arbitrary value from the interval $[b_1, b_2]$. The system of equations (3.11), (3.12), (3.16) yields the solution to the game.

5.3.4 Some features of optimal behavior in preference

Let us analyze the obtained solution. In the case of $\frac{B-A}{B+C} \le \frac{3A-B}{2(A+C)}$, Peter should choose whisting, get the talon and open up the cards. His payoff makes up $\frac{2A-B}{3}$. Interestingly, if $B < 2A$, the game becomes beneficial for him.

However, under $\frac{3A-B}{2(A+C)} < \frac{B-A}{B+C}$, the optimal strategy changes. Peter should adhere to whisting when his card possesses an intermediate rank (between low and high ones). Otherwise, he should pass. Paul's optimal strategy seems easier. He selects whisting if his card has a high denomination (otherwise, Paul passes). This game incorporates the bluffing effect. As soon as Peter announces passing, Paul has to guess the rank of Peter's card (low or high).

Table 5.2 combines the optimal strategies of players and the value of this game under different values of $A$, $B$, and $C$. The value of this game can be computed by formula (3.13) through substituting $\beta^*(y) = I_{\{y \ge a\}}$, with the values $a$, $b_1$, $b_2$ resulting from the system (3.11), (3.12), (3.16).

5.4 The preference model with cards play

In the preceding sections, we have modeled different combinations of cards in poker and preference by a single random variable from the unit interval.
For instance, many things in preference depend on the specific cards entering a combination, as well as on the specific order in which players open up their cards. Notably, cards are revealed one by one, a higher-rank card beats a lower-rank one, and the move comes to the corresponding player. Here we introduce a natural generalization of the previous model, where the set of cards of each player is modeled by two random variables.

Consider a two-player game engaging Peter and Paul. Both players contribute the buy-ins of 1. In the beginning of a play, each of them is dealt two cards whose denominations represent random variables within the interval $[0, 1]$. A player chooses between two alternatives, viz., passing or making a bet $A > 1$. If a player passes, the opponent gets the bank. When Peter and Paul pass simultaneously, the game is drawn. If both players make bets, they open up the cards and the winner is the player whose lowest-rank card exceeds the highest-rank card of the opponent. And the winner sweeps the board.

5.4.1 The preference model with simultaneous moves

First, analyze the preference model with simultaneous moves of players. Suppose that their cards $x_i, y_i$ ($i = 1, 2$) are independent random variables uniformly distributed on the interval $[0, 1]$. Without loss of generality, we assume that $x_1 \le x_2$, $y_1 \le y_2$. Reexpress the payoff of player I as the matrix

$$\begin{array}{c|cc} & \text{betting} & \text{passing}\\ \hline \text{betting} & A\left(I_{\{x_1 > y_2\}} - I_{\{y_1 > x_2\}}\right) & 1\\ \text{passing} & -1 & 0 \end{array}$$

where $I_{\{A\}}$ designates the indicator of $A$.

Define strategies in this game. Denote by $\alpha(x_1, x_2)$ the strategy of player I. Actually, this is the probability that player I makes a bet under given cards $x_1, x_2$. Then the quantity $\bar\alpha = 1 - \alpha$ characterizes the probability of his passing. Similarly, for player II, the function $\beta(y_1, y_2)$ specifies the probability of making a bet under given cards $y_1, y_2$, whereas $\bar\beta = 1 - \beta$ equals the probability of his passing.
The expected payoff of player I becomes
$$\begin{aligned}
H(\alpha, \beta) &= \int_0^1\!\!\int_0^1\!\!\int_0^1\!\!\int_0^1 \Bigl\{\alpha\bar\beta - \bar\alpha\beta + A\alpha\beta\bigl[I_{\{x_1>y_2\}} - I_{\{y_1>x_2\}}\bigr]\Bigr\}\,dx_1\,dx_2\,dy_1\,dy_2 \\
&= 2\int_0^1 dx_1\int_{x_1}^1 \alpha(x_1,x_2)\,dx_2 - 2\int_0^1 dy_1\int_{y_1}^1 \beta(y_1,y_2)\,dy_2 \\
&\quad + 4A\int_0^1 dx_1\int_{x_1}^1 dx_2\int_0^1 dy_1\int_{y_1}^1 \alpha(x_1,x_2)\beta(y_1,y_2)\bigl[I_{\{x_1>y_2\}} - I_{\{y_1>x_2\}}\bigr]\,dy_2 \\
&= 2\int_0^1 dx_1\int_{x_1}^1 \alpha(x_1,x_2)\,dx_2 - 2\int_0^1 dy_1\int_{y_1}^1 \beta(y_1,y_2)\,dy_2 \\
&\quad + 4A\int_0^1\!\!\int_{y_1}^1 \beta(y_1,y_2)\Bigl[\int_{y_2}^1 dx_1\int_{x_1}^1 \alpha(x_1,x_2)\,dx_2 - \int_0^{y_1} dx_1\int_{x_1}^{y_1} \alpha(x_1,x_2)\,dx_2\Bigr]dy_1\,dy_2 .
\end{aligned}$$

[Figure 5.13: The strategy of player I in the $(x_1, x_2)$ plane, with threshold $a$.]

Theorem 5.1 In the game with the payoff function $H(\alpha, \beta)$, the optimal strategies take the form
$$\alpha^*(x_1, x_2) = I_{\{x_2 \ge a\}}, \qquad \beta^*(y_1, y_2) = I_{\{y_2 \ge a\}},$$
where $a = 1 - \frac{1}{\sqrt{A}}$. The value of the game is zero.

Proof: Assume that player I applies the strategy $\alpha^*(x_1, x_2) = I_{\{x_2 \ge a\}}$ (see Figure 5.13), where $a = 1 - \frac{1}{\sqrt{A}}$. Find the best response of player II. Rewrite the payoff function as
$$H(\alpha^*, \beta) = 2\int_0^1\!\!\int_{x_1}^1 \alpha^*(x_1,x_2)\,dx_1\,dx_2 + 2\int_0^1\!\!\int_{y_1}^1 \beta(y_1,y_2)\,R(y_1,y_2)\,dy_1\,dy_2, \tag{4.1}$$
with the function
$$R(y_1, y_2) = \begin{cases}
-2Ay_2(1-a) - Aa^2 + A - 1, & \text{if } y_1 \le y_2 < a, \\
Ay_2^2 - 2Ay_2 + A - 1, & \text{if } y_1 < a \le y_2, \\
A(y_2^2 - y_1^2) - 2Ay_2 + Aa^2 + A - 1, & \text{if } a \le y_1 \le y_2.
\end{cases} \tag{4.2}$$
The first summand in (4.1) does not depend on $\beta(y_1, y_2)$. Hence, the optimal strategy of player II, which minimizes the payoff $H(\alpha^*, \beta)$, is given by
$$\beta^*(y_1, y_2) = \begin{cases}
1, & \text{if } R(y_1,y_2) < 0, \\
0, & \text{if } R(y_1,y_2) > 0, \\
\text{an arbitrary value from } [0,1], & \text{if } R(y_1,y_2) = 0.
\end{cases}$$
Formula (4.2) implies that the function $R(y_1, y_2)$ depends on $y_2$ only in the domains $y_1 \le y_2 < a$ and $y_1 < a \le y_2$. Furthermore, $R(y_1, y_2)$ decreases in $y_2$ on this set and vanishes at the point $y_2 = a$ due to the choice of $a$. Hence, the expression in the first row of (4.2) is always positive, and the expression in the second row is always non-positive. The expression in the third row takes non-positive values in the domain $a \le y_1 \le y_2$.
Indeed, there the function $R(y_1, y_2)$ decreases in both variables $y_2$ and $y_1$ and therefore reaches its maximum at $y_2 = a$, $y_1 = a$; the maximal value equals $R(a, a) = Aa^2 - 2Aa + A - 1 = 0$ for $a = 1 - \frac{1}{\sqrt{A}}$. Therefore, $R(y_1, y_2) > 0$ for $y_2 < a$, and $R(y_1, y_2) \le 0$ for $y_2 \ge a$. Consequently, the best response of player II is $\beta^*(y_1, y_2) = I_{\{y_2 \ge a\}}$, where $a = 1 - \frac{1}{\sqrt{A}}$.

By symmetry of the problem, the best response of player I to the strategy $\beta^*(y_1, y_2) = I_{\{y_2 \ge a\}}$ is the strategy $\alpha^*(x_1, x_2) = I_{\{x_2 \ge a\}}$. Hence
$$\max_\alpha H(\alpha, \beta^*) = \min_\beta H(\alpha^*, \beta),$$
which means that the strategies $\alpha^*, \beta^*$ form an equilibrium in the game.

5.4.2 The preference model with sequential moves

Now, imagine that the players announce their decisions sequentially. The structure of a play is as follows:

Move 1 (player I, cards x1, x2)    Move 2 (player II, cards y1, y2)    Payment to player I
passing                            -                                   −1
whisting                           passing                             1
whisting                           whisting                            A[I{x1>y2} − I{y1>x2}]

At the beginning of a play, both players contribute buy-ins of 1. The first move belongs to player I. He selects between two alternatives: passing or making a bet $A > 1$. In the latter case, the move goes to player II, who may choose passing (and the game is over) or calling the bet. If he calls the opponent's bet, both players open up the cards. The winner is the player whose lowest-rank card exceeds the highest-rank card of the opponent, and the winner gains the payoff $A > 1$. Otherwise, the game is drawn.
In this setting of the game, the payoff function of player I acquires the form
$$\begin{aligned}
H(\alpha, \beta) &= \int_0^1\!\!\int_0^1\!\!\int_0^1\!\!\int_0^1 \Bigl\{-\bar\alpha + \alpha\bar\beta + A\alpha\beta\bigl[I_{\{x_1>y_2\}} - I_{\{y_1>x_2\}}\bigr]\Bigr\}\,dx_1\,dx_2\,dy_1\,dy_2 \\
&= 2\int_0^1\!\!\int_0^1 \alpha(x_1,x_2)\,dx_1\,dx_2 - 1 - \int_0^1\!\!\int_0^1\!\!\int_0^1\!\!\int_0^1 \alpha(x_1,x_2)\beta(y_1,y_2)\,dx_1\,dx_2\,dy_1\,dy_2 \\
&\quad + 4A\int_0^1 dx_1\int_{x_1}^1 dx_2\int_0^1 dy_1\int_{y_1}^1 \alpha(x_1,x_2)\beta(y_1,y_2)\bigl[I_{\{x_1>y_2\}} - I_{\{y_1>x_2\}}\bigr]\,dy_2 \\
&= 4\int_0^1 dx_1\int_{x_1}^1 \alpha(x_1,x_2)\,dx_2 - 1 \\
&\quad + 2\int_0^1\!\!\int_{x_1}^1 \alpha(x_1,x_2)\Bigl[2A\int_0^{x_1} dy_1\int_{y_1}^{x_1}\beta(y_1,y_2)\,dy_2 - 2A\int_{x_2}^1 dy_1\int_{y_1}^1 \beta(y_1,y_2)\,dy_2 \\
&\qquad\qquad - 2\int_0^1 dy_1\int_{y_1}^1 \beta(y_1,y_2)\,dy_2\Bigr]dx_1\,dx_2 .
\end{aligned}$$
Changing the order of integration brings us to the expression
$$\begin{aligned}
H(\alpha, \beta) &= 4\int_0^1 dx_1\int_{x_1}^1 \alpha(x_1,x_2)\,dx_2 - 1 \\
&\quad + 2\int_0^1\!\!\int_{y_1}^1 \beta(y_1,y_2)\Bigl[2A\int_{y_2}^1 dx_1\int_{x_1}^1 \alpha(x_1,x_2)\,dx_2 - 2A\int_0^{y_1} dx_1\int_{x_1}^{y_1}\alpha(x_1,x_2)\,dx_2 \\
&\qquad\qquad - 2\int_0^1 dx_1\int_{x_1}^1 \alpha(x_1,x_2)\,dx_2\Bigr]dy_1\,dy_2 .
\end{aligned}$$
Suppose that player I uses the strategy $\alpha^*(x_1, x_2) = I_{\{x_2 \ge a\}}$ with some threshold $a$ such that
$$a \le \frac{A-1}{A+1}. \tag{4.3}$$
Find the best response of player II. Rewrite the payoff function $H(\alpha, \beta)$ as
$$H(\alpha^*, \beta) = 4\int_0^1\!\!\int_{x_1}^1 \alpha^*(x_1,x_2)\,dx_1\,dx_2 - 1 + 2\int_0^1\!\!\int_{y_1}^1 \beta(y_1,y_2)\,R(y_1,y_2)\,dy_1\,dy_2,$$
where the function $R(y_1, y_2)$ takes the form
$$R(y_1, y_2) = \begin{cases}
-2Ay_2(1-a) + (A-1)(1-a^2), & \text{if } y_1 \le y_2 \le a, \\
A(1-y_2)^2 - (1-a^2), & \text{if } y_1 \le a < y_2, \\
A\bigl[(1-y_2)^2 - (y_1^2 - a^2)\bigr] - (1-a^2), & \text{if } a < y_1 \le y_2.
\end{cases}$$
Let us repeat the same line of reasoning as above. The optimal strategy $\beta^*$ depends only on the sign of $R(y_1, y_2)$. In the domain $y_1 \le y_2 \le a$, the function $R(y_1, y_2)$ depends on $y_2 \in [0, a]$ only. For a fixed $y_1$, it decreases on this interval, and $R(y_1, 0) > 0$. Owing to the choice of $a$ (see formula (4.3)), we have $R(y_1, a) = A(1-a)^2 - (1-a^2) \ge 0$, i.e., the function $R(y_1, y_2)$ is non-negative in the domain $y_1 \le y_2 \le a$. This means that, in the domain under consideration, the best response of player II is $\beta(y_1, y_2) = 0$.
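The role of condition (4.3) in this step can be made explicit by a short factorization. The following worked step is a sketch that uses only the first row of $R(y_1, y_2)$ given above:

$$R(y_1, a) = A(1-a)^2 - (1-a^2) = (1-a)\bigl[A(1-a) - (1+a)\bigr] = (1-a)\bigl[(A-1) - (A+1)a\bigr].$$

For $0 \le a < 1$ the first factor is positive, so $R(y_1, a) \ge 0$ holds exactly when $a \le \frac{A-1}{A+1}$, which is condition (4.3). Likewise, $R(y_1, 0) = (A-1)(1-a^2) > 0$ for $A > 1$. Since $R$ is linear and decreasing in $y_2$ on $[0, a]$, non-negativity at the right endpoint gives non-negativity on the whole interval.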
In the domain $y_1 \le a < y_2$, the function $R(y_1, y_2) = A(1-y_2)^2 - (1-a^2)$ depends on $y_2$ only; moreover, it is a continuous decreasing function such that $R(y_1, a) = A(1-a)^2 - (1-a^2) \ge 0$ and $R(y_1, 1) = -(1-a^2) < 0$. Hence, there exists a point $b \in [a, 1)$ meeting the condition $R(y_1, b) = 0$. This point $b$ is a root of the equation $A(1-b)^2 = 1 - a^2$, i.e.,
$$b = 1 - \sqrt{\frac{1-a^2}{A}}. \tag{4.4}$$
Thus, the best response $\beta^*(y_1, y_2)$ of player II in the domain $y_1 \le a < y_2$ takes the form $\beta^*(y_1, y_2) = I_{\{y_2 \ge b\}}$, where $b$ is defined by (4.4).

Let us partition the domain $a < y_1 \le y_2$ into two subsets, $\{a < y_1 \le c,\ y_1 \le y_2\}$ and $\{c < y_1 \le y_2\}$, where
$$c = a^2\,\frac{A+1}{2A} + \frac{A-1}{2A}.$$
Evidently, $a \le c \le b$ for $a \in \bigl[0, \frac{A-1}{A+1}\bigr]$.

Consider the equation $R(y_1, y_2) = 0$ in the domain $\{a < y_1 \le c,\ y_1 \le y_2\}$. It can be rewritten as
$$y_1 = f(y_2) = \sqrt{y_2^2 - 2y_2 + a^2\,\frac{A+1}{A} + \frac{A-1}{A}}. \tag{4.5}$$
The function $f(y_2)$ is continuous and decreases on the interval $y_2 \in [c, b]$; in addition, $f(c) = c$ and $f(b) = a$. Therefore, the optimal strategy of player II in the domain $\{a < y_1 \le c,\ y_1 \le y_2\}$ has the following form: $\beta^*(y_1, y_2) = 1$ for $y_1 \ge f(y_2)$, and $\beta^*(y_1, y_2) = 0$ for $y_1 < f(y_2)$.

The set $\{c < y_1 \le y_2\}$ corresponds to $R(y_1, y_2) < 0$; thus, the best response there is $\beta^*(y_1, y_2) = 1$.

The above argument leads to an important conclusion. The optimal strategy of player II is given by $\beta^*(y_1, y_2) = I_{\{(y_1, y_2) \in \mathcal{S}\}}$; see Figure 5.14, which shows the set $\mathcal{S}$. Recall that the boundary of the domain $\mathcal{S}$ on the set $[a, c] \times [c, b]$ obeys equation (4.5).

[Figure 5.14: The optimal strategies of players I and II. Left panel: the $(x_1, x_2)$ plane with threshold $a$. Right panel: the $(y_1, y_2)$ plane with the parameters $a$, $b$, and $c$.]

The parameters $a$, $b$, and $c$ specifying this domain satisfy the conditions
$$\begin{cases}
0 \le a \le c \le b \le \dfrac{A-1}{A+1}, \\[2mm]
c = a^2\,\dfrac{A+1}{2A} + \dfrac{A-1}{2A}, \\[2mm]
b = 1 - \sqrt{\dfrac{1-a^2}{A}}.
\end{cases} \tag{4.6}$$
Now, suppose that player II applies the strategy $\beta^*(y_1, y_2) = I_{\{(y_1, y_2) \in \mathcal{S}\}}$, where the domain $\mathcal{S}$ has the above shape with the parameters $a$, $b$, and $c$.
According to the definition, the payoff of player I becomes
$$H(\alpha, \beta^*) = 2\int_0^1\!\!\int_{x_1}^1 \alpha(x_1,x_2)\,G(x_1,x_2)\,dx_1\,dx_2 - 1,$$
where the function
$$G(x_1, x_2) = 2A\int_0^{x_1} dy_1\int_{y_1}^{x_1}\beta^*(y_1,y_2)\,dy_2 - 2A\int_{x_2}^1 dy_1\int_{y_1}^1 \beta^*(y_1,y_2)\,dy_2 - 2\int_0^1 dy_1\int_{y_1}^1 \beta^*(y_1,y_2)\,dy_2 + 2. \tag{4.7}$$
Formula (4.7) implies the following. The function $G(x_1, x_2)$ is non-decreasing in both arguments and, in the domain $x_1 \le x_2 \le a$, depends on $x_2$ only:
$$G(x_1, x_2) = 2Ax_2(1-b) - 2AS + 2(1-S), \tag{4.8}$$
where
$$S = \int_0^1\!\!\int_{y_1}^1 \beta^*(y_1,y_2)\,dy_1\,dy_2$$
gives the area of the domain $\mathcal{S}$. This quantity can be reexpressed as
$$S = \frac{1-b^2}{2} + \int_c^b \Bigl(y_2 - \sqrt{y_2^2 - 2y_2 + a^2\,\frac{A+1}{A} + \frac{A-1}{A}}\,\Bigr)dy_2. \tag{4.9}$$
Due to the conditions imposed on $a$, $b$, and $c$, we have
$$2c = a^2\,\frac{A+1}{A} + \frac{A-1}{A} \quad\text{and}\quad b^2 - 2b + 2c = a^2.$$
By virtue of these relations, $S$ takes the form
$$S = \frac{1-c}{2} + a\,\frac{1-b}{2} + \frac{2c-1}{2}\,\ln\left|\frac{2c-1}{a+b-1}\right|.$$
Choose $a$ as
$$a = \frac{S}{1-b} - \frac{1-S}{A(1-b)}. \tag{4.10}$$
In this case, the function $G(x_1, x_2)$ in (4.8) takes negative values in the domain $x_1 \le x_2 < a$ and vanishes on its boundary $x_2 = a$: $G(x_1, a) = 0$. As mentioned above, the function $G(x_1, x_2)$ is non-decreasing in both arguments, hence non-negative in the remaining domain $x_2 > a$. It follows immediately that the best response $\alpha^*$ of player I, which maximizes the payoff $H(\alpha, \beta^*)$, has the form $\alpha^* = I_{\{x_2 \ge a\}}$.

Finally, it is necessary to establish the existence of a solution to the system of equations (4.6) and (4.10). Earlier, we have shown that any $a \in \bigl[0, \frac{A-1}{A+1}\bigr]$ corresponds to unique $b$, $c$, and $S$ meeting (4.6) and (4.9). Now, let us show that there exists $a^*$ such that condition (4.10) holds true. Introduce the function
$$\Delta(a) = \Delta(a, b(a), c(a)) = \frac{S}{1-b} - \frac{1-S}{A(1-b)} - a.$$
The equation $\Delta(a) = 0$ admits a solution $a^*$, since $\Delta(a)$ is continuous and takes values of opposite signs at the endpoints of the interval $\bigl[0, \frac{A-1}{A+1}\bigr]$.
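The existence argument suggests a direct numerical procedure: bisect $\Delta(a)$ on $\bigl[0, \frac{A-1}{A+1}\bigr]$. The sketch below is an illustration under stated assumptions rather than the author's computation. The function names are hypothetical, and the logarithm is evaluated through the identity $\frac{2c-1}{a+b-1} = 1 + a - b$, which follows from the relations for $b$ and $c$ and avoids a $0/0$ expression at $a = 1/\sqrt{A+1}$.

```python
import math


def parameters(a, A):
    """Return (b, c, S) for a given threshold a, using the relations
    b = 1 - sqrt((1 - a^2)/A), c = a^2 (A+1)/(2A) + (A-1)/(2A)
    and the closed form of the area S."""
    b = 1.0 - math.sqrt((1.0 - a * a) / A)
    c = a * a * (A + 1.0) / (2.0 * A) + (A - 1.0) / (2.0 * A)
    # (2c - 1)/(a + b - 1) equals 1 + a - b, which is always positive,
    # so the absolute value in the logarithm is not needed here.
    log_term = 0.5 * (2.0 * c - 1.0) * math.log(1.0 + a - b)
    S = (1.0 - c) / 2.0 + a * (1.0 - b) / 2.0 + log_term
    return b, c, S


def delta(a, A):
    """Delta(a) = S/(1-b) - (1-S)/(A(1-b)) - a."""
    b, _, S = parameters(a, A)
    return S / (1.0 - b) - (1.0 - S) / (A * (1.0 - b)) - a


def solve_threshold(A, tol=1e-10):
    """Bisection for a root of Delta on [0, (A-1)/(A+1)], assuming A > 1."""
    lo, hi = 0.0, (A - 1.0) / (A + 1.0)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if delta(mid, A) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

For $A = 2$, `solve_threshold(2.0)` should land close to the value $a \approx 0.2569$ listed in Table 5.3, and `parameters` then gives the matching $b$, $c$, and $S$.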
The signs at the endpoints follow by direct computation:
$$\Delta(0) = \frac{(A-1)^2 + (A+1)\ln A}{4A\sqrt{A}} \ge 0 \quad\text{for } A \ge 1,$$
since this function increases in $A$ and vanishes at $A = 1$. Furthermore,
$$\Delta\Bigl(\frac{A-1}{A+1}\Bigr) = -\frac{(A-1)^2}{2A(A+1)} < 0.$$
The resulting solution is formulated as

Theorem 5.2 The optimal solution of the game with the payoff function $H(\alpha, \beta)$ is defined by
$$\alpha^*(x_1, x_2) = I_{\{x_2 \ge a^*\}}, \qquad \beta^*(y_1, y_2) = I_{\{(y_1, y_2) \in \mathcal{S}\}},$$
where $a = a^*$, $b$, $c$, and $S$ follow from the system of equations
$$\begin{cases}
\dfrac{S}{1-b} - \dfrac{1-S}{A(1-b)} - a = 0, \\[2mm]
S = \dfrac{1-c}{2} + a\,\dfrac{1-b}{2} + \dfrac{2c-1}{2}\,\ln\left|\dfrac{2c-1}{a+b-1}\right|, \\[2mm]
c = a^2\,\dfrac{A+1}{2A} + \dfrac{A-1}{2A}, \\[2mm]
b = 1 - \sqrt{\dfrac{1-a^2}{A}},
\end{cases}$$
and the set $\mathcal{S}$ is described above.

Table 5.3 provides the parameters of the optimal strategies in different cases. Evidently, the game has a negative value. The unfairness of this game for player I can be explained as follows: he moves first and thereby gives essential information to the opponent, and player II uses this information in his optimal play. The value of the game equals
$$\begin{aligned}
H(\alpha^*, \beta^*) &= -1 + 4\int_0^1\!\!\int_{x_1}^1 \alpha^*(x_1,x_2)\,dx_1\,dx_2 + 2\int_0^1\!\!\int_{y_1}^1 \beta^*(y_1,y_2)\,R(y_1,y_2)\,dy_1\,dy_2 \\
&= 1 - 2a^2 + \frac{4}{3}Aa^3(b-c) + 2\Bigl\{a\int_c^1 \bigl[A(1-y)^2 - (1-a^2)\bigr]dy + \frac{A}{3}\int_c^b (y^2 - 2y + 2c)^{3/2}\,dy \\
&\quad + \int_c^1 dy_2\int_a^{y_2}\bigl[A(1-y_2)^2 - A(y_1^2 - a^2) - (1-a^2)\bigr]dy_1 \\
&\quad - \int_c^b \sqrt{y^2 - 2y + 2c}\,\bigl[A(1-y)^2 + Aa^2 - (1-a^2)\bigr]dy\Bigr\}.
\end{aligned}$$

Table 5.3 The parameters of optimal strategies.
A         a        b        c        S        H(α*, β*)
1.00      0.0000   0.0000   0.0000   0.5000    0.0000
2.00      0.2569   0.3166   0.2995   0.4503   −0.0070
3.00      0.3604   0.4614   0.4199   0.3956   −0.0302
4.00      0.4248   0.5473   0.4878   0.3538   −0.0590
5.00      0.4711   0.6055   0.5331   0.3215   −0.0883
6.00      0.5069   0.6481   0.5665   0.2957   −0.1163
7.00      0.5359   0.6809   0.5927   0.2746   −0.1424
8.00      0.5600   0.7071   0.6139   0.2569   −0.1667
9.00      0.5806   0.7286   0.6317   0.2418   −0.1892
10.00     0.5984   0.7466   0.6469   0.2287   −0.2100
100.00    0.8600   0.9489   0.8685   0.0533   −0.6572
1000.00   0.9552   0.9906   0.9561   0.0099   −0.8833

Remark 5.1 In the symmetrical model with two cards, the optimal strategies of both players are such that calling a bet is reasonable only when holding a sufficiently high card; the second card may have a very small denomination. In the model with sequential moves, the shape of the domain $\mathcal{S}$ demonstrates an important aspect. The optimal strategy of player II prescribes calling a bet if one of his cards has a sufficiently high rank or, in some cases only, if both cards have an intermediate denomination.

5.5 Twenty-one. A game-theoretic model

Twenty-one is a card game for two players. A pack of 36 cards is dealt to the players sequentially, one card at a time. In the Russian version of the game, each card has a certain denomination (jack: 2, queen: 3, king: 4, ace: 11; all other cards are counted as the numeric value shown on the card). Players choose a certain number of cards and calculate their total denomination. Then all cards are opened up, and the winner is the player having the maximal sum of values that does not exceed the threshold of 21. If the total denomination of the cards held by a player exceeds 21 and the opponent has a smaller combination, the latter wins anyway. In all other cases, the game is drawn.

Twenty-one (also known as blackjack) is the most widespread casino banking game in the world. A common strategy adopted by bankers is to keep drawing cards until their sum exceeds 17.
5.5.1 Strategies and payoff functions

Let us suggest the following structure as a game-theoretic model of twenty-one. Suppose that each player actually observes the sum of independent identically distributed random variables
$$s_n^{(k)} = \sum_{i=1}^n x_i^{(k)}, \quad k = 1, 2.$$
For simplicity, assume that the cards $\{x_i^{(k)}\}$, $i = 1, 2, \ldots$, have the uniform distribution on the interval $[0, 1]$. Set the threshold (the maximal admissible sum of cards) equal to 1. Imagine that a certain player employs a threshold strategy $u$, $0 < u < 1$, i.e., stops choosing cards as soon as the total denomination $s_n$ of his cards reaches or exceeds $u$. Denote by $\tau$ the stopping time:
$$\tau = \min\{n \ge 1 : s_n \ge u\}.$$
To define the payoff function in this game, we should find the distribution of the stopping sum $s_\tau$. For $u \le x \le 1$,
$$P\{s_\tau \le x\} = \sum_{n=1}^\infty P\{s_n \le x,\ \tau = n\} = \sum_{n=1}^\infty P\{s_1 \le u, \ldots, s_{n-1} \le u,\ s_n \in [u, x]\}.$$
Then
$$P\{s_\tau \le x\} = \sum_{n=1}^\infty \frac{u^{n-1}}{(n-1)!}\,(x-u) = \exp(u)(x-u). \tag{5.1}$$
Hence, the probability that the stopping sum exceeds the admissible value 1 takes the form
$$P\{s_\tau > 1\} = 1 - P\{s_\tau \le 1\} = 1 - \exp(u)(1-u). \tag{5.2}$$
Now, it is possible to construct an equilibrium in twenty-one. Assume that player II uses the threshold strategy $u$. Find the best response of player I. Let $s_n^{(1)} = x$ be the current value of player I's sum. If $x \le u$, the expected payoff of player I when stopping becomes
$$h(x|u) = +P\{s_{\tau_2}^{(2)} > 1\} - P\{s_{\tau_2}^{(2)} \le 1\} = 2P\{s_{\tau_2}^{(2)} > 1\} - 1.$$
In the case of $x > u$, the expected payoff is given by
$$h(x|u) = +P\{s_{\tau_2}^{(2)} < x\} + P\{s_{\tau_2}^{(2)} > 1\} - P\{x < s_{\tau_2}^{(2)} \le 1\} = 2\bigl[P\{s_{\tau_2}^{(2)} < x\} + P\{s_{\tau_2}^{(2)} > 1\}\bigr] - 1.$$
Taking into account (5.1) and (5.2), we obtain the following. When stopping in the state $x$, the payoff becomes
$$h(x|u) = 2\bigl[\exp(u)(x-u) + 1 - \exp(u)(1-u)\bigr] - 1 = 1 - 2\exp(u)(1-x).$$
If player I continues and stops at the next step (receiving some card $y$), his payoff equals $1 - 2\exp(u)(1-x-y)$ for $x + y \le 1$ and $-P\{s_{\tau_2}^{(2)} \le 1\} = -\exp(u)(1-u)$ for $x + y > 1$.
Hence, the expected payoff in the case of continuation is
$$Ph(x|u) = \int_0^{1-x} \bigl(1 - 2\exp(u)(1-x-y)\bigr)\,dy - \int_{1-x}^1 \exp(u)(1-u)\,dy.$$
Evaluating the integrals yields
$$Ph(x|u) = 1 - x - \exp(u)\bigl(1 - x(1+u) + x^2\bigr).$$
Obviously, the function $h(x|u)$ increases monotonically in $x$ for $x \ge u$, whereas $Ph(x|u)$ decreases monotonically. The latter fact follows from the negativity of the derivative
$$\frac{dPh(x|u)}{dx} = -1 + \exp(u)(1 + u - 2x),$$
since for $u > 0$ and $x \ge u$ we have $\exp(-u) > 1 - u \ge 1 + u - 2x$.

Therefore, being aware of player II's strategy $u$, his opponent can evaluate the best response by comparing the payoffs in the case of stopping and continuation. The optimal threshold $x_u$ satisfies the equation $h(x|u) = Ph(x|u)$, or
$$1 - 2\exp(u)(1-x) = 1 - x - \exp(u)\bigl(1 - x(1+u) + x^2\bigr).$$
Rewrite the last equation as
$$x = \exp(u)\bigl(1 - x(1-u) - x^2\bigr). \tag{5.3}$$
Due to the monotonicity of the functions $h(x|u)$ and $Ph(x|u)$, such a threshold is unique (if it exists). By symmetry of the game, an equilibrium must comprise identical strategies, and so we set $x_u = u$. According to (5.3), such a strategy obeys the equation
$$\exp(u) = \frac{u}{1-u}. \tag{5.4}$$
The solution to (5.4) exists and equals $u^* \approx 0.659$.

Thus, both players have the following optimal behavior. They choose cards until the total denomination exceeds the threshold $u^* \approx 0.659$. Subsequently, they stop and open up the cards. The probability of exceeding the admissible sum 1 under this threshold becomes
$$P\{s_\tau > 1\} = 1 - \exp(u^*)(1-u^*) = 1 - u^* \approx 0.341.$$

Remark 5.2 To find the optimal strategy of a player, we have compared the payoffs in the case of stopping and of continuation by one more step. However, rigorous analysis requires comparing the payoffs in the case of continuation by an arbitrary number of steps. Chapter 9 treats the general setting of optimal stopping games and establishes when the one-step comparison is sufficient.
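The one-step comparison behind (5.3) and (5.4) is easy to check numerically. The sketch below is illustrative only: the function names, the bisection tolerance, and the simulation size are assumptions. It solves $\exp(u)(1-u) - u = 0$, which is (5.4) rearranged, and then estimates by simulation how often a player with threshold $u$ ends above the admissible sum 1.

```python
import math
import random


def equilibrium_threshold(tol=1e-12):
    """Bisection for the root of exp(u) * (1 - u) - u on (0, 1).

    The left-hand side equals 1 at u = 0 and -1 at u = 1 and is
    strictly decreasing, so the root is unique.
    """
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if math.exp(mid) * (1.0 - mid) - mid > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def bust_frequency(u, n_rounds=200_000, seed=1):
    """Simulated frequency of the event {s_tau > 1} for threshold u."""
    rng = random.Random(seed)
    busts = 0
    for _ in range(n_rounds):
        s = 0.0
        while s < u:            # keep drawing uniform cards until s >= u
            s += rng.random()
        busts += s > 1.0        # True counts as 1
    return busts / n_rounds
```

Calling `equilibrium_threshold()` should reproduce $u^* \approx 0.659$, and `bust_frequency(u)` with that threshold should be close to $1 - u^*$, in line with (5.2).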
In the monotone case (the payoff under stopping does not decrease and the payoff under continuation by one more step does not increase), it suffices to compare these two payoff functions.

5.6 Soccer. A game-theoretic model of resource allocation

Imagine the coaches of teams I and II allocating their players on a football ground. In a real match, each team has 11 footballers in the starting line-up. Suppose that the goal of player I (player II) is located on the right (left, respectively) half of the ground. Let us partition the ground into $n$ sectors. For convenience, assume that $n$ is an odd number. A match starts in the center, which belongs to sector $m = (n+1)/2$, and continues until the ball crosses either the left or the right boundary of the ground, i.e., a goal line (player I or player II wins, respectively).

Represent the movements of the ball as a random walk on the state set $\{0, 1, \ldots, n, n+1\}$, where 0 and $n+1$ are absorbing states (see Figure 5.15). Assume that each state $i \in \{1, \ldots, n\}$ is associated with transition probabilities $q_i$ and $p_i = 1 - q_i$ to state $i-1$ (to the left) and to state $i+1$ (to the right), respectively.

[Figure 5.15: Random walks on a football ground. States $0, 1, 2, \ldots, i-1, i, i+1, \ldots, n, n+1$ with transition probabilities $q_i$ and $p_i$ out of state $i$.]

These probabilities depend on the ratio of players in the sector. A natural conjecture is the following: the higher the number of team I players in sector $i$, the greater the probability of ball transition to the left. Denote by $x_i$ ($y_i$) the number of team I players (team II players, respectively) in sector $i$. We assume that the probability $p_i$ is some non-increasing differentiable function $g(x_i/y_i)$ meeting the condition $g(1) = 1/2$. In other words, the random walk is symmetrical in a sector if the teams place identical resources there. For instance, $g$ can be defined by $g(x_i/y_i) = y_i/(x_i + y_i)$.
In contrast to classical random walks, the transition probabilities depend on the state and on the strategies of the players. Let $\pi_i$ designate the probability that player I wins provided that the ball is in sector $i$. With probability $q_i$ the ball moves to the left, where player I wins with probability $\pi_{i-1}$, and with probability $p_i$ it moves to the right, where player I wins with probability $\pi_{i+1}$. Write down the system of Kolmogorov equations for the probabilities $\{\pi_i\}$, $i = 0, \ldots, n+1$:
$$\begin{aligned}
\pi_1 &= q_1 + p_1\pi_2, \\
\pi_i &= q_i\pi_{i-1} + p_i\pi_{i+1}, \quad i = 2, \ldots, n-1, \\
\pi_n &= q_n\pi_{n-1}.
\end{aligned} \tag{6.1}$$
The first and last equations take into account that $\pi_0 = 1$, $\pi_{n+1} = 0$. Set $s_i = q_i/p_i$, $i = 1, \ldots, n$, and rewrite the system (6.1) as
$$\begin{aligned}
\pi_1 - \pi_2 &= s_1(1 - \pi_1), \\
\pi_i - \pi_{i+1} &= s_i(\pi_{i-1} - \pi_i), \quad i = 2, \ldots, n-1, \\
\pi_n &= s_n(\pi_{n-1} - \pi_n).
\end{aligned} \tag{6.2}$$
Now, denote $c_i = s_1 \cdots s_i$, $i = 1, \ldots, n$, and, using (6.2), find
$$\pi_n = c_n(1 - \pi_1)$$
and
$$\pi_i - \pi_{i+1} = c_i(1 - \pi_1) \quad\text{for } i = 1, \ldots, n-1.$$
Summing up these formulas gives
$$\pi_1 = (c_1 + \cdots + c_n)(1 - \pi_1),$$
whence it follows that
$$\pi_1 = \frac{c_1 + \cdots + c_n}{1 + c_1 + \cdots + c_n}$$
and
$$\pi_i = \frac{c_i + c_{i+1} + \cdots + c_n}{1 + c_1 + \cdots + c_n}, \quad i = 2, \ldots, n. \tag{6.3}$$
Therefore, for a known distribution of footballers by sectors, it is possible to compute the quantities
$$c_i = \prod_{j=1}^i \frac{1 - g(x_j/y_j)}{g(x_j/y_j)}, \quad i = 1, \ldots, n, \tag{6.4}$$
and the corresponding winning probabilities $\pi_i$ for each position $i$ of the ball.

For convenience, suppose that the players have unit amounts of infinitely divisible resources. The strategy of player I is a resource allocation vector $x = (x_1, \ldots, x_n)$, $x_i \ge 0$, $i = 1, \ldots, n$, over the sectors, where $\sum_{i=1}^n x_i = 1$. Similarly, as his strategy, player II chooses a vector $y = (y_1, \ldots, y_n)$, $y_j \ge 0$, $j = 1, \ldots, n$, where $\sum_{j=1}^n y_j = 1$. Player I strives to maximize $\pi_m$, whereas player II seeks to minimize this probability.
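Formulas (6.3) and (6.4) translate directly into a few lines of code. The sketch below is a minimal illustration: the function names are hypothetical, and the transition function is the example $g(x/y) = y/(x+y)$ mentioned in the text, which is only one admissible choice.

```python
def g(ratio):
    """Example transition function g(x/y) = y/(x + y) = 1/(1 + x/y)."""
    return 1.0 / (1.0 + ratio)


def win_probabilities(x, y):
    """Return [pi_1, ..., pi_n], the probabilities that player I wins
    when the ball is in sector i, for allocations x and y (all y_i > 0)."""
    c = []
    prod = 1.0
    for xi, yi in zip(x, y):
        p = g(xi / yi)              # probability of a move to sector i + 1
        prod *= (1.0 - p) / p       # c_i = s_1 * ... * s_i with s_i = q_i / p_i
        c.append(prod)
    denom = 1.0 + sum(c)
    # pi_i = (c_i + ... + c_n) / (1 + c_1 + ... + c_n)
    return [sum(c[i:]) / denom for i in range(len(c))]
```

With identical allocations, for example `x = y = [0.25, 0.5, 0.25]`, every $c_i$ equals 1 and the probability for the central sector is $1/2$, as one would expect from symmetry.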
To evaluate an equilibrium in this antagonistic game, construct the Lagrange function
$$L(x, y) = \pi_m + \lambda_1(x_1 + \cdots + x_n - 1) + \lambda_2(y_1 + \cdots + y_n - 1),$$
and find the optimal strategies $(x^*, y^*)$ from the conditions $\partial L/\partial x = 0$, $\partial L/\partial y = 0$. Notably,
$$\frac{\partial L}{\partial x_k} = \sum_{j=1}^n \frac{\partial \pi_m}{\partial c_j}\frac{\partial c_j}{\partial x_k} + \lambda_1.$$
According to (6.4), $\frac{\partial c_j}{\partial x_k} = 0$ for $k > j$. Hence,
$$\frac{\partial L}{\partial x_k} = \sum_{j=k}^n \frac{\partial \pi_m}{\partial c_j}\frac{\partial c_j}{\partial x_k} + \lambda_1. \tag{6.5}$$
If $j \ge k$, we have
$$\frac{\partial c_j}{\partial x_k} = -\frac{g'\bigl(\frac{x_k}{y_k}\bigr)}{y_k\, g\bigl(\frac{x_k}{y_k}\bigr)\bigl(1 - g\bigl(\frac{x_k}{y_k}\bigr)\bigr)}\,c_j = -\alpha_k c_j, \quad j = k, \ldots, n. \tag{6.6}$$
It follows from (6.5) and (6.6) that, for $k \ge m$,
$$\begin{aligned}
\frac{\partial L}{\partial x_k} &= \sum_{j=k}^n \frac{1 + c_1 + \cdots + c_{m-1}}{(1 + c_1 + \cdots + c_n)^2}\,(-\alpha_k c_j) + \lambda_1 \\
&= -\frac{\alpha_k}{(1 + c_1 + \cdots + c_n)^2}\,(1 + c_1 + \cdots + c_{m-1})(c_k + \cdots + c_n) + \lambda_1.
\end{aligned} \tag{6.7}$$
In the case of $k < m$,
$$\begin{aligned}
\frac{\partial L}{\partial x_k} &= \sum_{j=k}^{m-1} \frac{-(c_m + \cdots + c_n)}{(1 + c_1 + \cdots + c_n)^2}\,(-\alpha_k c_j) + \sum_{j=m}^n \frac{1 + c_1 + \cdots + c_{m-1}}{(1 + c_1 + \cdots + c_n)^2}\,(-\alpha_k c_j) + \lambda_1 \\
&= -\frac{\alpha_k}{(1 + c_1 + \cdots + c_n)^2}\,(1 + c_1 + \cdots + c_{k-1})(c_m + \cdots + c_n) + \lambda_1.
\end{aligned} \tag{6.8}$$
The expressions (6.7) and (6.8) can be united:
$$\frac{\partial L}{\partial x_k} = -\frac{\alpha_k}{(1 + c_1 + \cdots + c_n)^2}\,(1 + c_1 + \cdots + c_{m\wedge k-1})(c_{m\vee k} + \cdots + c_n) + \lambda_1, \tag{6.9}$$
where $i \wedge j = \min\{i, j\}$ and $i \vee j = \max\{i, j\}$.

Similar formulas hold for $\partial L/\partial y_k$, $k = 1, \ldots, n$. First, we find
$$\frac{\partial c_j}{\partial y_k} = \begin{cases}
0, & \text{if } j < k, \\[1mm]
\dfrac{x_k\, g'\bigl(\frac{x_k}{y_k}\bigr)}{y_k^2\, g\bigl(\frac{x_k}{y_k}\bigr)\bigl(1 - g\bigl(\frac{x_k}{y_k}\bigr)\bigr)}\,c_j = \dfrac{x_k}{y_k}\,\alpha_k c_j, & \text{if } j \ge k,
\end{cases} \tag{6.10}$$
and then
$$\frac{\partial L}{\partial y_k} = \frac{x_k}{y_k}\,\frac{\alpha_k}{(1 + c_1 + \cdots + c_n)^2}\,(1 + c_1 + \cdots + c_{m\wedge k-1})(c_{m\vee k} + \cdots + c_n) + \lambda_2. \tag{6.11}$$
Now, evaluate the stationary point of the function $L(x, y)$. The condition $\partial L/\partial x_1 = 0$ and the expression (6.9) with $k = 1$ lead to the equation
$$\lambda_1 = \alpha_1\,\frac{c_m + \cdots + c_n}{(1 + c_1 + \cdots + c_n)^2}.$$
Accordingly, the condition $\partial L/\partial y_1 = 0$ and (6.11) yield the equation
$$\lambda_2 = -\frac{x_1}{y_1}\,\lambda_1.$$
In the case of $k \ge 2$, the conditions $\partial L/\partial x_k = 0$, $\partial L/\partial y_k = 0$ imply the following. For $k > m$, we have the equalities
$$\begin{aligned}
\alpha_k(1 + c_1 + \cdots + c_{m-1})(c_k + \cdots + c_n) &= \alpha_1(c_m + \cdots + c_n), \\
\frac{x_k}{y_k}\,\alpha_k(1 + c_1 + \cdots + c_{m-1})(c_k + \cdots + c_n) &= \frac{x_1}{y_1}\,\alpha_1(c_m + \cdots + c_n).
\end{aligned} \tag{6.12}$$
If $k \le m$, we obtain
$$\begin{aligned}
\alpha_k(1 + c_1 + \cdots + c_{k-1}) &= \alpha_1, \\
\frac{x_k}{y_k}\,\alpha_k(1 + c_1 + \cdots + c_{k-1}) &= \frac{x_1}{y_1}\,\alpha_1.
\end{aligned} \tag{6.13}$$
It follows from (6.12)–(6.13) that
$$\frac{x_k}{x_1} = \frac{y_k}{y_1} \quad\text{for all } k = 1, \ldots, n.$$
Since $\sum_{k=1}^n x_k = \sum_{k=1}^n y_k = 1$, summing up these expressions gives $x_1 = y_1$ and, hence, $x_k = y_k$, $k = 1, \ldots, n$. In an equilibrium, both players have an identical allocation of their resources.

According to the condition $g(1) = 1/2$ and the definition (6.4), identical allocations imply that $c_1 = \cdots = c_n = 1$. It follows from (6.12)–(6.13) that
$$\alpha_k = \begin{cases}
\dfrac{1}{k}\,\alpha_1, & \text{for } k \le m, \\[2mm]
\dfrac{n-m+1}{m(n-k+1)}\,\alpha_1, & \text{for } k > m.
\end{cases}$$
Formula (6.6) shows that $\alpha_k = 4g'(1)/y_k$ in an equilibrium. Therefore, $y_k = y_1\alpha_1/\alpha_k$, which means that
$$x_k = y_k = \begin{cases}
k\,y_1, & \text{for } k \le m, \\[1mm]
\dfrac{m(n-k+1)}{n-m+1}\,y_1, & \text{for } k > m.
\end{cases}$$
Sum up all $y_k$ over $k = 1, \ldots, n$:
$$\begin{aligned}
\sum_{k=1}^m k\,y_1 + \sum_{k=m+1}^n \frac{m(n-k+1)}{n-m+1}\,y_1 &= y_1\Bigl(\sum_{k=1}^m k + \frac{m}{n-m+1}\sum_{k=1}^{n-m} k\Bigr) \\
&= y_1\Bigl(\frac{m(m+1)}{2} + \frac{m}{n-m+1}\cdot\frac{(n-m)(n-m+1)}{2}\Bigr) = y_1\,\frac{m(n+1)}{2}.
\end{aligned}$$
Since $\sum_{k=1}^n y_k = 1$, we get $y_1 = \frac{2}{m(n+1)}$. Thus,
$$x_k = y_k = \begin{cases}
\dfrac{2k}{m(n+1)}, & \text{for } k \le m, \\[2mm]
\dfrac{2(n-k+1)}{(n+1)(n-m+1)}, & \text{for } k > m.
\end{cases}$$
A match begins in the center of the ground: $m = (n+1)/2$. Consequently, the optimal distribution of players on the ground is given by
$$x_k = y_k = \begin{cases}
\dfrac{4k}{(n+1)^2}, & \text{for } k \le (n+1)/2, \\[2mm]
\dfrac{4(n-k+1)}{(n+1)^2}, & \text{for } k > (n+1)/2.
\end{cases} \tag{6.14}$$

Remark 5.3 The optimal distribution of players has a triangular form. Moreover, the players must be located such that the ball performs a symmetrical random walk. Suppose that a football ground comprises three sectors, namely, defense, midfield, and attack. According to (6.14), the optimal distribution must be $x^* = y^* = (1/4, 1/2, 1/4)$. In other words, the center contains 50% of the resources, and the remaining resources are equally shared by defense and attack. In real soccer with 11 footballers in the starting line-up, an equilibrium distribution corresponds to the formations (3, 6, 2) or (3, 5, 3).

Exercises

1. Poker with two players.
In the beginning of the game, players I and II contribute the buy-ins of 1. Subsequently, they are dealt cards of ranks x and y, respectively. Each player chooses between two strategies: passing or playing. In the second case, a player has to contribute the buy-in of c > 0. The extended form of this game is illustrated by Figure 5.16.

[Figure 5.16: Poker with two players.]

The player whose card has a higher denomination wins. Find optimal behavioral strategies and the value of this game.

2. Poker with bet raising.
In the beginning of the game, players I and II contribute the buy-ins of 1. Subsequently, they are dealt cards of ranks x and y, respectively. Each player chooses among three strategies: passing, playing, or raising. In the second and third cases, players have to contribute the buy-in of c > 0 and d > 0, respectively. The extended form of this game is illustrated by Figure 5.17. Find optimal behavioral strategies and the value of this game.

[Figure 5.17: Poker with two players.]

3. Poker with double bet raising.
In the beginning of the game, players I and II contribute the buy-ins of 1. Subsequently, they are dealt cards of ranks x and y, respectively. Each player chooses among three strategies: passing, playing, or raising. In the second (third) case, players have to contribute the buy-in of 2 (6, respectively). The extended form of this game is illustrated by Figure 5.18.

[Figure 5.18: Poker with two players.]

Find optimal behavioral strategies and the value of this game.

4. Construct the poker model for three or more players.

5. Suggest the two-player preference model with three cards and cards play.

6. The exchange game.
Players I and II are dealt cards of ranks x and y; these quantities represent independent random variables with the uniform distribution on [0,1]. Having looked at his card, each player may suggest exchange to the opponent. If both players agree, they exchange the cards.
Otherwise, the exchange occurs with probability $p$ (when only one of the players agrees) or does not take place (when both disagree). The payoff of a player is the value of his card. Find the optimal strategies of the players.

7. The exchange game with dependent cards.
Here the rules are the same as in game no. 6. However, the random variables $x$ and $y$ are dependent: they have the joint density
$$f(x, y) = 1 - \gamma(1 - 2x)(1 - 2y), \quad 0 \le x, y \le 1.$$
Find optimal strategies in the game.

8. Construct the preference model for three or more players.

9. Twenty-one.
Two players observe the sums of independent identically distributed random variables $S_n^{(1)} = \sum_{i=1}^n x_i^{(1)}$ and $S_n^{(2)} = \sum_{i=1}^n x_i^{(2)}$, where $x_i^{(1)}$ and $x_i^{(2)}$ have the exponential distribution with the parameters $\lambda_1$ and $\lambda_2$. The threshold that must not be exceeded equals $K = 21$. The winner is the player who terminates observations with a higher sum than the opponent (but not exceeding $K$). Find the optimal strategies and the value of the game.

10. Twenty-one with Gaussian distribution.
Consider game no. 9 with the observations defined by $S_n^{(1)} = \sum_{i=1}^n (x_i^{(1)})^+$ and $S_n^{(2)} = \sum_{i=1}^n (x_i^{(2)})^+$, where $a^+ = \max(0, a)$, and $x_i^{(1)}$ and $x_i^{(2)}$ have normal distributions with different mean values ($a_1 = -1$ and $a_2 = 1$) and the identical variance $\sigma^2 = 1$. Find the optimal strategies and the value of the game.

6 Negotiation models

Introduction

Negotiation models represent a traditional field of research in game theory. Different negotiations take place in everyday life. The basic requirements to negotiations are the following:

1. a well-defined list of negotiators;
2. a well-defined sequence of proposals;
3. well-defined payoffs of players;
4. negotiations finish at some instant;
5. equal negotiators have equal payoffs.

6.1 Models of resource allocation

A classical problem in negotiation theory is the so-called cake cutting (see Figure 6.1).
The word "cake" is a visual metaphor that stands for any (homogeneous or inhomogeneous) resource to be divided among parties with proper consideration of their interests.

6.1.1 Cake cutting

Imagine a cake and two players striving to divide it into two equal pieces. How could this be done to satisfy both players? The solution seems easy: one participant cuts the cake, whereas the other chooses an appropriate piece. Both players are satisfied: one believes that he has cut the cake into equal pieces, whereas the other has selected the "best" piece. We call the described procedure the cutting-choosing procedure.

Mathematical Game Theory and Applications, First Edition. Vladimir Mazalov. © 2014 John Wiley & Sons, Ltd. Published 2014 by John Wiley & Sons, Ltd. Companion website: http://www.wiley.com/go/game_theory

[Figure 6.1: Cake cutting.]

Now, suppose that cake cutting engages three players. Here the solution is the following (see Figure 6.2). Two players divide the cake by the cutting-choosing procedure. Subsequently, each of them divides his piece into three equal portions and invites player 3 to choose the best portion. All players are satisfied: each believes that he has obtained at least $1/3$ of the cake.

In the case of $n$ players, we can argue by induction. Let the problem be solved for $n - 1$ players: the cake is divided into $n - 1$ pieces and all $n - 1$ players feel satisfied. Subsequently, each player cuts his piece into $n$ portions and invites player $n$. The latter chooses the best portion from each piece, and his total portion of the cake makes up $(n-1) \times \frac{1}{(n-1)n} = \frac{1}{n}$. All $n$ players are satisfied.

Even at this stage, such a cake cutting procedure is open to criticism. For instance, consider the case of $n \ge 3$ players; a participant who has already shared his piece with player $n$ can be displeased that another participant does the same (shares his piece with player $n$).
On the one hand, he is sure that his piece is not smaller than $1/n$. At the same time, he can be displeased that his piece appears smaller than the opponent's one. Therefore, the notion of "fairness" may have different interpretations. Let us discuss this notion in greater detail. In this model, the players are identical but estimate the size of the cake in various ways (the cake is homogeneous). In the general case, the problem seems appreciably more sophisticated.

[Figure 6.2: Cake cutting for three players.]

[Figure 6.3: Cake cutting by the moving-knife procedure.]

Analysis of the cake cutting problem can proceed from another negotiation model. Suppose that there exists an arbitrator moving a long knife along the cake (thus gradually increasing the size of the cut portion). It is convenient to represent the cake as the unit segment (see Figure 6.3), where the arbitrator moves the knife left-to-right. As soon as a participant utters "Stop!", he receives the given piece and is eliminated from further allocation. If several participants ask to stop simultaneously, one of them is chosen at random. Subsequently, the arbitrator continues this procedure with the rest of the participants. Such a cake cutting procedure will be called the moving-knife procedure.

If the participants are identical and the cake is homogeneous, the optimal strategy is the following: stop the arbitrator when his knife passes the boundary of $1/n$. Stopping the arbitrator before or after this instant is not beneficial (some participant gets a smaller piece).

6.1.2 Principles of fair cake cutting

The above allocation procedures are fair if all participants estimate the different pieces of the cake identically. Let us identify the basic principles of fair cake cutting. Regardless of the players' estimates, the cutting-choosing procedure with two participants guarantees that each of them surely receives the maximal piece.
Player I achieves this during cutting, whereas his opponent does so during choosing. In other words, this procedure agrees with the following principles.

P1. The absence of discrimination. Each of the $n$ participants feels sure that he receives at least $1/n$ of the cake.

P2. The absence of envy. Each participant feels sure that he receives not less than the other players (does not envy them).

P3. Pareto optimality. The procedure gives no opportunity to increase the piece of any participant such that the remaining players stay satisfied with their pieces.

The moving-knife procedure also meets these principles for two participants. Imagine that the knife passes the point where the left and right portions are equivalent for some participant. Then he stops the arbitrator, since further cutting reduces his piece. For the other player, the right portion seems more beneficial.

Nevertheless, principle P2 (the absence of envy) is violated for three players in both procedures. Consider the cutting-choosing procedure; a certain player can think that cake cutting without his participation is incorrect. In the case of the moving-knife procedure, the player who stops the arbitrator first (he chooses 1/3 of the cake) can think that further allocation is incorrect, as, in his opinion, one of the remaining players receives more than 1/3 of the cake.

Still, there exists a moving-knife procedure that matches the principle of the absence of envy for three players. It was pioneered by W. Stromquist [1980]. Here, an arbitrator moves the knife left-to-right, whereas the three players hold their knives over the right piece (at the points corresponding to the middle of this piece from their viewpoints, see Figure 6.4). As soon as some player utters "Stop!", he receives the left piece and the remaining players allocate the right piece as follows.

[Figure 6.4: Cake cutting with an arbitrator: the case of three players.]
The cake is cut at the point corresponding to the middle-position knife of these players. Moreover, the player holding his knife to the left of the opponent’s (closer to the arbitrator) receives the greater portion.

The stated procedure satisfies principle P2. Indeed, the player stopping the arbitrator estimates the left piece as not less than 2/3 of the right piece (not smaller than the portions received by other players). His opponents estimate the cut (left) piece as not greater than their portions (in the right piece). Furthermore, each player believes that he receives the greatest piece.

Unfortunately, this procedure cannot be generalized to the case of $n > 3$ players.

6.1.3 Cake cutting with subjective estimates by players

Let us formalize the problem. Again, apply the negotiation model with a moving knife. For this, represent the cake as the unit segment (see Figure 6.3). An arbitrator moves a knife left-to-right. As soon as a participant utters “Stop!”, he receives the given piece and is eliminated from further allocation. Subsequently, the arbitrator continues this procedure with the remaining participants.

Imagine that players estimate different portions of the cake in different ways. For instance, somebody prefers a rose on a portion, another player loves chocolate filler or biscuit, etc. Describe such estimation subjectivity by a certain measure. Accordingly, we represent the subjective measure of the interval $[0, x]$ for player $i$ by the distribution function $M_i(x)$ with the density function $\mu_i(x)$, $x \in [0, 1]$. Moreover, set $M_i(1) = 1$, $i = 1, \dots, n$. This means the following: when the arbitrator’s knife passes the point $x$, player $i$ is sure that the size of the corresponding piece is $M_i(x)$, $i = 1, \dots, n$. Then, for player $i$, half of the cake is either the left or the right part of the segment with the boundary $x$ such that $M_i(x) = 1/2$. Assume that the functions $M_i(x)$, $i = 1, \dots, n$, are continuous.

Consider the posed problem for two players.
Players I and II strive to obtain half of the cake in their subjective measure. Allocate the cake according to the procedure below. Denote by $x$ and $y$ the medians of the distributions $M_1$ and $M_2$, i.e., $M_1(x) = M_2(y) = 1/2$. If $x \le y$, then player I receives the portion $[0, x]$, and his opponent gets $[y, 1]$. In the case of $x < y$, give the players the additional pieces $(x, z]$ and $(z, y)$, respectively, where $z$ satisfies $M_1(z) = 1 - M_2(z)$. Both players feel satisfied, since each believes he has obtained more than half of the cake.

Now, suppose that $x > y$. Then the arbitrator gives the right piece $[x, 1]$ to player I and the left piece $[0, y]$ to player II. Furthermore, the arbitrator grants additional portions to the participants, $(z, x)$ (player I) and $(y, z]$ (player II), where $z$ satisfies $M_2(z) = 1 - M_1(z)$. Again, both players are pleased: each believes he has obtained more than half of the cake.

Example 6.1 Take a biscuit cake with 50% chocolate coverage. For instance, player I values the chocolate portion of the cake twice as high as the biscuit portion. Consequently, his subjective measure can be defined by the density function $\mu_1(x) = 4/3$, $x \in [0, 1/2]$, and $\mu_1(x) = 2/3$, $x \in [1/2, 1]$. Player II has the uniformly distributed measure on $[0, 1]$. The condition $M_1(z) = 1 - M_2(z)$ leads to the equation $\frac{4}{3} z = 1 - z$, yielding the cutting point $z = 3/7$.

Example 6.2 Imagine that the subjective measure of player I is determined by $M_1(x) = 2x - x^2$, $x \in [0, 1]$, whereas player II possesses the uniformly distributed measure on $[0, 1]$. Similarly, the condition $M_1(z) = 1 - M_2(z)$ leads to the equation $2z - z^2 = 1 - z$. Its solution yields the cutting point $z = (3 - \sqrt{5})/2 \approx 0.382$. As a matter of fact, this represents the well-known golden section.

Now, consider the problem for $n$ players. Demonstrate the feasibility of cake cutting such that each player receives a piece not smaller than $1/n$.
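As a numerical aside on the two-player rule, the cut point $z$ with $M_1(z) = 1 - M_2(z)$ from Examples 6.1 and 6.2 can be located by bisection. The sketch below is only an illustration: the helper name `cut_point` and the choice of bisection are assumptions made here, and the search relies on $M_1(z) + M_2(z) - 1$ being continuous and nondecreasing on $[0, 1]$.

```python
import math

def cut_point(M1, M2, tol=1e-12):
    """Solve M1(z) = 1 - M2(z) on [0, 1] by bisection (hypothetical helper)."""
    g = lambda z: M1(z) + M2(z) - 1.0   # g(0) = -1, g(1) = 1, nondecreasing
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if g(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# Player II in both examples: uniform measure on [0, 1].
uniform = lambda x: x

# Example 6.1: density 4/3 on [0, 1/2] and 2/3 on [1/2, 1].
M1_choc = lambda x: 4.0 * x / 3.0 if x <= 0.5 else 2.0 / 3.0 + 2.0 * (x - 0.5) / 3.0

# Example 6.2: M1(x) = 2x - x^2.
M1_quad = lambda x: 2.0 * x - x * x

z1 = cut_point(M1_choc, uniform)   # compare with 3/7
z2 = cut_point(M1_quad, uniform)   # compare with (3 - sqrt(5))/2
print(z1, z2)
```

The same equation covers both orderings of the medians, so one root-finder serves the cases $x < y$ and $x > y$.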
Let the subjective measures $M_i(x)$, $x \in [0, 1]$, be defined for all players $i = 1, \dots, n$. Choose a point $x_1$ meeting the condition $\max_{i=1,\dots,n} M_i(x_1) = 1/n$. Player $i_1$ corresponding to the above maximum is called player 1. Cut the portion $[0, x_1]$ for him. This player feels satisfied, since he receives $1/n$. For the remaining players, the residual piece $(x_1, 1]$ has the subjective measure $1 - M_i(x_1) \ge 1 - 1/n$, $i = 2, \dots, n$.

Then we choose the next portion $(x_1, x_2]$ such that $\max_{i=2,\dots,n} M_i(x_2) = 2/n$. By analogy, the player corresponding to this maximum is said to be player 2. He receives the portion $(x_1, x_2]$. Again, this player feels satisfied: his piece in the subjective measure is greater than or equal to $1/n$. The remaining players are also pleased, as the residual piece $(x_2, 1]$ possesses a subjective measure not smaller than $1 - 2/n$ for them.

We continue the described procedure by induction and arrive at the following result. The last player obtains the residual piece whose subjective measure is not less than $1 - (n - 1)/n = 1/n$. Therefore, we have constructed a procedure guaranteeing everybody’s satisfaction (each player receives a piece with the subjective measure not smaller than $1/n$). Nevertheless, this procedure disagrees with the following important principle.

P4. The principle of equality. All players get identical pieces in their subjective measures.

6.1.4 Fair equal negotiations

We endeavor to improve the cake cutting procedure suggested in the previous subsection. The ultimate goal lies in making it fair in the sense of principle P4. Let us start similarly to the original statement.

Introduce the parameter $z$ and, for the time being, believe that $0 \le z \le 1/n$. Choose a point $x_1$ such that $\max_{i=1,\dots,n} M_i(x_1) = z$. Player $i_1$ corresponding to this maximum is called player 1; cut the portion $[0, x_1]$ for him.
According to the subjective measure of this player, the portion has the value $z$. Choose the next portion $(x_1, x_2]$ by the condition $\max_{i=2,\dots,n} \{M_i(x_2) - M_i(x_1)\} = z$. The player corresponding to the above maximum is called player 2; cut the portion $(x_1, x_2]$ for him. In the subjective measure of player 2, this portion is estimated by $z$ precisely. Next, cut the portion for player 3, and so on. The procedure repeats until the portion $(x_{n-2}, x_{n-1}]$ is cut for player $n - 1$. Recall that $x_{n-1} \le 1$. And finally, player $n$ receives the piece $(x_{n-1}, 1]$.

As the result of this procedure, each of the players $\{1, 2, \dots, n - 1\}$ possesses a piece estimated by $z$ in his subjective measure. Player $n$ has the piece $(x_{n-1}, 1]$. While $z$ is smaller than $1/n$, this piece appears greater than or equal to $z$ (in his subjective measure).

Now, gradually increase $z$ by making it greater than $1/n$. Owing to continuity and monotonicity of the functions $M_i(x)$, the cutting points $x_i$ ($i = 1, \dots, n - 1$) also represent continuous and monotone functions of the argument $z$ such that $x_1 \le x_2 \le \dots \le x_{n-1}$. Moreover, the subjective measure of player $n$, $1 - M_n(x_{n-1})$, decreases in $z$ (from 1 down to 0). And so, there exists $z^*$ meeting the equality

$$M_1(x_1) = M_2(x_2) - M_2(x_1) = \dots = M_n(1) - M_n(x_{n-1}) = z^*.$$

The modified procedure above leads to the following. Each player gets a piece of value $z^* \ge 1/n$ in his subjective measure. All players obtain equal pieces, so negotiations are fair.

Example 6.3 Consider the cake cutting problem for three players whose subjective measures are defined as follows. The subjective measure of player 1 takes the form $M_1(x) = 2x - x^2$, $x \in [0, 1]$ (he prefers the left boundary of the cake). Player 2 has the uniformly distributed subjective measure. And the density function for player 3 is given by $\mu_3(x) = |2 - 4x|$, $x \in [0, 1]$ (he prefers the boundaries of the cake).
Then the distribution functions acquire the form $M_1(x) = 2x - x^2$, $M_2(x) = x$, $x \in [0, 1]$, and $M_3(x) = 2x - 2x^2$ for $x \in [0, 1/2]$, $M_3(x) = 1 - 2x + 2x^2$ for $x \in [1/2, 1]$ (see Figure 6.5). The condition $M_1(x_1) = M_2(x_2) - M_2(x_1) = 1 - M_3(x_2)$ leads to the system of equations

$$2x_1 - x_1^2 = x_2 - x_1 = 1 - (1 - 2x_2 + 2x_2^2).$$

Figure 6.5 The subjective measures of three players.

Its solution gives the cutting points $x_1 \approx 0.2476$, $x_2 \approx 0.6816$. Each player receives a piece estimated (in his subjective measure) by $z^* \approx 0.4339$, which exceeds $1/3$.

6.1.5 Strategy-proofness

Interestingly, the negotiation model studied in subsection 6.1.4 compels players to be fair. We focus on the model with two players. Suppose that, e.g., player 2 acts according to his subjective measure $M_2$, whereas player 1 reports another measure $M_1'$ to the arbitrator. And each player knows nothing about the subjective preferences of the opponent. In this case, it may happen that the median $m_1$ ($m_1'$) of the distribution $M_1$ ($M_1'$) lies to the left (to the right, respectively) of that of the distribution $M_2$. By virtue of the procedure, player 1 gets the piece from the right boundary $(m_1', 1]$, which makes his payoff less than $1/2$.

6.1.6 Solution with the absence of envy

The procedure proposed in the previous subsections ensures cake cutting into identical portions in the subjective measures of players. Nevertheless, this does not guarantee the absence of envy (see principle P2). Generally speaking, the subjective measure of some player $i$ may estimate the piece of another player $j$ higher than his own piece.

To establish the existence of a solution without envy, introduce new notions. Still, we treat the cake as the unit segment $[0, 1]$ to be allocated among $n$ players.

Definition 6.1 An allocation $x = (x_1, \dots, x_n)$ is a vector of cake portions of appropriate players (in their order left-to-right).
Consequently, $x_j \ge 0$, $j = 1, \dots, n$, and $\sum_{j=1}^{n} x_j = 1$. The set of allocations forms a simplex $S$ in $R^n$. This simplex possesses nodes $e_i = (0, \dots, 0, 1, 0, \dots, 0)$, where 1 occupies position $i$ (see Figure 6.6). Node $e_i$ corresponds to cake cutting where portion $i$ makes the whole cake. Denote by $S_i = \{x \in S : x_i = 0\}$ the set of allocations such that player $i$ receives nothing.

Figure 6.6 The set of allocations under n = 3.

Assume that, for each player $i$ and allocation $x$, there is a given estimation function $f_i^j(x)$, $i, j = 1, \dots, n$, for piece $j$. We believe that this function takes values in $R$ and is continuous in $x$. For instance, in terms of the previous subsection,

$$f_i^j(x) = M_i(x_1 + \dots + x_j) - M_i(x_1 + \dots + x_{j-1}), \quad i, j = 1, \dots, n.$$

Definition 6.2 For a given allocation $x = (x_1, \dots, x_n)$, we say that player $i$ prefers piece $j$ if

$$f_i^j(x) \ge f_i^k(x), \quad \forall k = 1, \dots, n.$$

Note that a player may prefer one or more pieces. In addition, suppose that none of the players prefers an empty piece. Then an allocation where each player receives the piece he actually prefers matches the principle of the absence of envy.

Theorem 6.1 There exists an allocation with the absence of envy.

Proof: Denote by $A_{ij}$ the set of $x \in S$ such that player $i$ prefers piece $j$. Since the functions $f_i^j(x)$ enjoy continuity, these sets turn out closed. For any player $i$, the sets $A_{ij}$ cover $S$ (see Figure 6.7). Moreover, due to the assumption, none of the players prefers an empty piece. Hence, it follows that the sets $A_{ij}$ and $S_j$ do not intersect for any $i, j$.

Figure 6.7 The preference set of player i.

Consider the sets

$$B_{ij} = \bigcap_{k \ne j} (S - A_{ik}), \quad i, j = 1, \dots, n.$$

For given $i$ and $j$, $B_{ij}$ represents the set of all allocations where player $i$ prefers only piece $j$. This is an open set.
The sets $B_{ij}$ do not cover $S$ under a fixed $i$. The set $S - \bigcup_j B_{ij}$ consists of boundaries, where a player may prefer two or more pieces.

Now, introduce the structure $U_j = \bigcup_i B_{ij}$. In fact, this is the set of allocations such that a certain player prefers piece $j$ exclusively. Each set $U_j$ is open and does not intersect $S_j$. Consider $U = \bigcap_j U_j$. To prove the theorem, it suffices to argue that the set $U$ is non-empty. Indeed, if $x \in U$, then $x$ belongs to each $U_j$. In other words, each piece $j$ is the only piece preferred by some player. Recall that the number of players and the number of pieces coincide. Therefore, each player prefers just one piece. And we have to demonstrate non-emptiness of $U$.

Lemma 6.1 The intersection of the sets $U_1, \dots, U_n$ is non-empty.

Proof: We will consider two cases. First, suppose that the sets $U_j$, $j = 1, \dots, n$, cover $S$. Let $d_j(x)$ be the distance from the point $x$ to the set $S - U_j$ and denote $D(x) = \sum_j d_j(x)$. Since $S = \bigcup_j U_j$, then $x$ belongs to some $U_j$ and $d_j(x) > 0$. Hence, $D(x) > 0$ for all $x \in S$.

Define the mapping $f : S \to S$ by

$$f(x) = \sum_{j=1}^{n} \frac{d_j(x)}{D(x)} e_j.$$

This is a continuous self-mapping of the simplex $S$, where each face $S_i$ corresponds to the interior of the simplex $S$. Really, if $x \in S_i$ (i.e., $x_i = 0$), then $x \notin U_i$, since the sets $U_i$ and $S_i$ do not intersect. In this case, $d_i(x) > 0$, which means that component $i$ of the point $f(x)$ is greater than 0.

By Brouwer’s fixed-point theorem, there exists a fixed point of the mapping $f(x)$, which lies in the interior of the simplex $S$. It immediately follows that there is an allocation $x$ such that $d_j(x) > 0$ for all $j$. Consequently, $x \in U_j$ for all $j = 1, \dots, n$, ergo $x \in U$.

The case when the sets $U_j$, $j = 1, \dots, n$, do not cover $S$ can be reduced to case 1. This is possible if the preferences of some players coincide (for such allocations, all players prefer two or more pieces). Then we modify the preferences of players and approximate the sets $A_{ij}$ by the sets $A_{ij}'$, where all preferences differ. Afterwards, we pass to the limit.
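To make Definition 6.2 concrete, the sketch below evaluates the estimation functions $f_i^j$ for the equal allocation of Example 6.3, using the rounded cut points $x_1 \approx 0.2476$ and $x_2 \approx 0.6816$ reported there. The function and variable names are illustrative assumptions. With these numbers each player values his own piece at about 0.434, yet player 1 values the middle piece even higher, which illustrates why equal subjective shares need not be envy-free.

```python
def M1(x):
    return 2.0 * x - x * x

def M2(x):
    return x

def M3(x):
    return 2.0 * x - 2.0 * x * x if x <= 0.5 else 1.0 - 2.0 * x + 2.0 * x * x

measures = [M1, M2, M3]
cuts = [0.0, 0.2476, 0.6816, 1.0]   # rounded cut points from Example 6.3

def estimates(M, cuts):
    """Values f^j = M(right end) - M(left end) of every piece j for one player."""
    return [M(b) - M(a) for a, b in zip(cuts[:-1], cuts[1:])]

# table[i][j] is the value of piece j in the measure of player i (0-based).
table = [estimates(M, cuts) for M in measures]

# Piece preferred by each player, and players whose own piece (index i)
# is not the one they prefer.
preferred = [max(range(3), key=lambda j: row[j]) for row in table]
envious = [i for i in range(3) if preferred[i] != i]

for i, row in enumerate(table):
    print(i + 1, [round(v, 4) for v in row])
print("envious players (1-based):", [i + 1 for i in envious])
```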
This theorem claims that there exists an allocation without envy; each player receives a piece estimated in his subjective measure at least as high as the pieces of other players. However, such an allocation is not necessarily fair.

6.1.7 Sequential negotiations

We study another cake cutting model with two players, pioneered by A. Rubinstein [1982]. Suppose that players make sequential offers for cake cutting and the process finishes as soon as one of them accepts the offer of the other. Assume that the payoff gets discounted over time. At shot 1, the size of the cake is 1; at shot 2, it is $\delta < 1$; at shot 3, $\delta^2$; and so on. For definiteness, we believe that player I makes his offer at shot 1 and all subsequent odd shots (and even shots correspond to the offers of player II). An offer can be represented as a pair $(x_1, x_2)$, where $x_1$ indicates the share of the cake for player I, and $x_2$ means the share for player II. We will seek a subgame-perfect equilibrium, i.e., an equilibrium in all subgames of this game. Apply the backward induction technique.

We begin with the case of three shots. The scheme of negotiations is as follows.

1. Player I makes the offer $(x_1, 1 - x_1)$, where $x_1 \le 1$. If player II agrees, the game finishes and players I and II receive the payoffs $x_1$ and $1 - x_1$, respectively. Otherwise, the game continues to the next shot.

2. Player II makes the new offer $(x_2, 1 - x_2)$, where $x_2 \le 1$. If player I accepts it, the game finishes. Players I and II gain the payoffs $x_2$ and $1 - x_2$, respectively. If player I rejects the offer, the game continues to shot 3.

3. The game finishes such that players I and II get the payoffs $y$ and $1 - y$, respectively ($y \le 1$ is a given value).

In the sequel, we will establish the following fact. This value has no impact on the optimal solution under a sufficiently large duration of negotiations.

To find a subgame-perfect equilibrium, apply the backward induction method.
Suppose that negotiations run at shot 2 and player II makes an offer. He should make a certain offer $x_2$ to player I such that the payoff of player I is not smaller than at shot 3. Due to the discounting effect, the payoff of player I at the last shot constitutes $\delta y$. Therefore, player I agrees with the offer $x_2$ iff

$$x_2 \ge \delta y.$$

On the other hand, if player II offers $x_2 = \delta y$ to the opponent, his payoff becomes $1 - \delta y$. However, if his offer appears non-beneficial to player I, the game continues to shot 3 and player II gains $\delta(1 - y)$ (recall the discounting effect). Note that $\delta(1 - y) < 1 - \delta y$. Hence, the optimal offer of player II is $x_2^* = \delta y$.

Now, imagine that negotiations run at shot 1 and the offer belongs to player I. He knows the opponent’s offer at the next shot. And so, player I should make an offer $1 - x_1$ to the opponent such that the latter’s payoff is at least the same as at shot 2:

$$\delta(1 - x_2^*) = \delta(1 - \delta y).$$

Player II feels satisfied if $1 - x_1 \ge \delta(1 - \delta y)$, or

$$x_1 \le 1 - \delta(1 - \delta y).$$

Thus, the following offer of player I is surely accepted by his opponent: $x_1 = 1 - \delta(1 - \delta y)$. If player I offers less to the opponent, the offer is rejected and he receives the discounted payoff at shot 2: $\delta x_2^* = \delta^2 y$. Still, this quantity turns out smaller than $1 - \delta(1 - \delta y)$. Therefore, the optimal offer of player I is $x_1^* = 1 - \delta(1 - \delta y)$, and it will be accepted by player II.

The sequence $\{x_1^*, x_2^*\}$ represents a subgame-perfect equilibrium in this negotiation game with three shots. Arguing by induction, assume that a subgame-perfect equilibrium in the negotiation game with $n$ shots is such that

$$x_1^{(n)} = 1 - \delta + \delta^2 - \dots + (-\delta)^{n-2} + (-\delta)^{n-1} y. \qquad (1.1)$$

Now, consider shot 1 in the negotiation game consisting of $n + 1$ shots. Player I should offer to the opponent the share $1 - x_1^{(n+1)}$, which is not smaller than the discounted income of player II at the next shot. By the induction hypothesis, this income takes the form $\delta x_1^{(n)}$, since player II makes the first offer in the remaining game with $n$ shots.
And so, the offer is accepted by player II if

$$1 - x_1^{(n+1)} \ge \delta\left(1 - \delta + \delta^2 - \dots + (-\delta)^{n-2} + (-\delta)^{n-1} y\right),$$

or

$$x_1^{(n+1)} = 1 - \delta\left(1 - \delta + \delta^2 - \dots + (-\delta)^{n-2} + (-\delta)^{n-1} y\right). \qquad (1.2)$$

The expression (1.2) coincides with (1.1) for the negotiation game with $n + 1$ shots.

For large $n$, the last summand in (1.1), containing $y$, becomes infinitesimal, whereas the optimal offer of player I equals $x_1^* = 1/(1 + \delta)$.

Theorem 6.2 The sequential negotiation game of two players admits the subgame-perfect equilibrium

$$\left(\frac{1}{1 + \delta}, \frac{\delta}{1 + \delta}\right).$$

Again, these results can be generalized by induction to the case of sequential negotiations among $n$ players. However, we evaluate a subgame-perfect equilibrium differently. First, we describe the scheme of negotiations with $n$ players.

1. Player 1 makes an offer $(x_1, x_2, \dots, x_n)$ to the players, where $x_1 + \dots + x_n = 1$. If all players agree, the game finishes and player $i$ gets the payoff $x_i$, $i = 1, \dots, n$. If somebody disagrees, the game continues at the next shot, and player 1 becomes the last one.

2. Player 2 acts as the leader and makes a new offer $(x_2', x_3', \dots, x_n', x_1')$, where $x_1' + \dots + x_n' = 1$. The game finishes if all players accept it; then player $i$ gains $x_i'$, $i = 1, \dots, n$. Otherwise (some player rejects the offer), the game continues at the next shot. And player 2 becomes the last one, accordingly.

3. Player 3 acts as the leader and makes his offer. And so on for all players.

Actually, the game may have infinite duration. By analogy, suppose that the payoff gets discounted by the quantity $\delta$ at each new shot. Due to this effect, players gain nothing from long-term negotiations.

We evaluate a subgame-perfect equilibrium via the following considerations. Player 1 makes an offer at shot 1. He should offer to player 2 a quantity $x_2$ not smaller than the quantity player 2 would choose at the next shot. This quantity equals $\delta x_1$ by virtue of the discounting effect. Therefore, player 1 should make the following offer to player 2:

$$x_2 \ge \delta x_1.$$
Similarly, player 1 should offer to player 3 some quantity $x_3$ not smaller than the quantity player 3 would choose at shot 3: $\delta^2 x_1$. This immediately leads to the inequality

$$x_3 \ge \delta^2 x_1.$$

Adhering to the same line of reasoning, we arrive at an important conclusion. The offer of player 1 satisfies the remaining players under the conditions

$$x_i \ge \delta^{i-1} x_1, \quad i = 2, \dots, n.$$

By offering $x_i = \delta^{i-1} x_1$, $i = 2, \dots, n$, player 1 pleases the rest of the players. And he receives the residual share of

$$1 - \left(\delta x_1 + \delta^2 x_1 + \dots + \delta^{n-1} x_1\right).$$

This share must coincide with his own offer $x_1$. The requirement

$$1 - \left(\delta x_1 + \delta^2 x_1 + \dots + \delta^{n-1} x_1\right) = x_1$$

yields

$$x_1^* = \left(1 + \delta + \dots + \delta^{n-1}\right)^{-1} = \frac{1 - \delta}{1 - \delta^n}.$$

Theorem 6.3 The sequential negotiation game of $n$ players admits the subgame-perfect equilibrium

$$\left(\frac{1 - \delta}{1 - \delta^n}, \frac{\delta(1 - \delta)}{1 - \delta^n}, \dots, \frac{\delta^{n-1}(1 - \delta)}{1 - \delta^n}\right). \qquad (1.3)$$

Formula (1.3) implies that player 1 has an advantage over the others, and his payoff increases as the discount factor $\delta$ goes down. This is natural: when $\delta$ is small, the cake rapidly vanishes as time evolves, and the role of players with large order numbers becomes negligible.

6.2 Negotiations of time and place of a meeting

An important problem in negotiation theory concerns the time and place of a meeting. As a matter of fact, the time and place of a meeting represent key factors for participants of a business talk or a conference. These factors may predetermine a certain result of an event. For instance, a suggested time or place can be inconvenient for some negotiators. The “convenience” or “inconvenience” admits a rigorous formulation via a utility function. In this case, each participant strives to maximize his utility. And the problem is to suggest a design of negotiations and find a solution accepted by the negotiators. Let us apply the formal scheme proposed by C. Ponsati [2007, 2011] in [24–25]. For definiteness, we study the negotiation problem for the time of a meeting.
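For reference in what follows, the equilibrium shares of Section 6.1.7 are easy to tabulate numerically. The sketch below is an illustration under stated assumptions: it runs the backward induction for a finite odd number of shots (so that, as in the three-shot case, player I proposes first and receives $y$ at the final shot) and evaluates formula (1.3). The function names `first_offer` and `n_player_shares` are hypothetical.

```python
def first_offer(delta, y, shots):
    """Player I's equilibrium share at shot 1 for an odd number of shots."""
    if shots % 2 == 0:
        raise ValueError("this sketch assumes an odd number of shots")
    share = y                            # player I's share at the final shot
    for t in range(shots - 1, 0, -1):    # walk back from shot shots-1 to shot 1
        if t % 2 == 0:
            # Player II proposes: he offers player I the discounted continuation.
            share = delta * share
        else:
            # Player I proposes: he leaves player II the discounted continuation.
            share = 1.0 - delta * (1.0 - share)
    return share

def n_player_shares(delta, n):
    """Shares from formula (1.3): delta^(i-1) * (1 - delta) / (1 - delta^n)."""
    x1 = (1.0 - delta) / (1.0 - delta ** n)
    return [delta ** i * x1 for i in range(n)]

delta = 0.9
print(first_offer(delta, 0.3, 3))      # equals 1 - delta * (1 - delta * 0.3)
print(first_offer(delta, 0.3, 401))    # close to 1 / (1 + delta)
print(n_player_shares(delta, 4))       # shares decrease geometrically
```

As the number of shots grows, the dependence on $y$ disappears at the rate $\delta^{n-1}$, in line with the remark after formula (1.2).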
6.2.1 Sequential negotiations of two players

Imagine two players negotiating the time of their meeting. Suppose that their utilities are described by continuous unimodal functions $u_1(x)$ and $u_2(x)$, $x \in [0, 1]$, with maximum points $c_1$ and $c_2$, respectively. If $c_1 = c_2$, this value makes the solution. And so, we believe that $c_1 > c_2$.

Assume that players sequentially announce feasible alternatives, and decision making requires the consent of both participants. Players may insist indefinitely on alternatives beneficial for them. To avoid such situations, introduce the discounting factor $\delta < 1$. After each session of negotiations, the utility functions of both players decrease proportionally to $\delta$. Therefore, if players have not agreed on some alternative until instant $t$, their utilities at this instant acquire the form $\delta^{t-1} u_i(x)$, $i = 1, 2$.

For definiteness, suppose that $u_1(x) = x$ and $u_2(x) = 1 - x$. In this case, the problem becomes equivalent to the cake cutting problem with two players (see Section 6.1.7). Indeed, treat $x$ as a portion of the cake. Then player II receives the residual piece $1 - x$. It seems interesting to study geometrical interpretations of the solution. Figure 6.8 shows the curves of the utilities $u_1(x)$ and $u_2(x)$, as well as their curves at the next shot, i.e., $\delta u_1(x)$ and $\delta u_2(x)$.

Figure 6.8a The best response of player I.

Imagine that player II knows the alternative $x$ chosen by the opponent at the next shot. His own offer is accepted if he offers to player I an alternative $y$ such that the utility $u_1(y)$ is not smaller than the utility at the next shot, $\delta u_1(x)$ (see Figure 6.8a). This leads to the inequality $y \ge \delta x$. Furthermore, the maximal utility of player II is achieved under $y = \delta x$. Therefore, his optimal response to the opponent’s strategy $x$ is $x_2 = \delta x$. Now, suppose that player I knows the strategy $x_2$ selected by player II at the next shot.
Then his offer $y$ is accepted by player II at the current shot if the corresponding utility $u_2(y)$ of player II appears not smaller than at the next shot (the quantity $\delta u_2(x_2)$). This condition is equivalent to the inequality

$$1 - y \ge \delta(1 - \delta x), \quad \text{or} \quad y \le 1 - \delta(1 - \delta x).$$

Hence, the best response of player I at the current shot is $x_1 = 1 - \delta(1 - \delta x)$. The solution $x$ gives a subgame-perfect equilibrium in negotiations if $x_1 = x$, or $x = 1 - \delta(1 - \delta x)$. It follows that

$$x = \frac{1}{1 + \delta}.$$

This result coincides with the solution obtained in Section 6.1.7.

Figure 6.8b The best response of player I.

6.2.2 Three players

Now, add player III to the negotiation process. Suppose that his utility function possesses a unique maximum on the interval $[0, 1]$; denote it by $c$, $0 < c < 1$. For simplicity, let $u_3(x)$ be a piecewise linear function (see Figure 6.9):

$$u_3(x) = \begin{cases} \dfrac{x}{c}, & \text{if } 0 \le x < c, \\ \dfrac{1 - x}{1 - c}, & \text{if } c \le x \le 1. \end{cases}$$

Figure 6.9 The best response of player III.

Players sequentially offer different alternatives; accepting an offer requires the consent of all players. We demonstrate the following aspect. An equilibrium may have different forms depending on the relation between $c$ and $\delta$.

First, consider the case of $c \le 1/3$. Figure 6.9 provides the corresponding illustrations. The sequence of moves is I → II → III → I → …. Suppose that player I announces his strategy $x$ : $x \le 1/3$. Being informed of that, player III can find his best response. His offer $y$ will be accepted by player I if $u_1(y)$ is not smaller than $\delta u_1(x)$, i.e., $y \ge \delta x$.

The offer $y$ will be accepted by player II if $u_2(y) \ge \delta u_2(x)$, i.e., $1 - y \ge \delta(1 - x)$. Therefore, any offer from the interval $I_3 = [\delta x, 1 - \delta(1 - x)]$ will be accepted. Finally, player III strives to maximize his utility $u_3(y)$ within the interval $I_3$.
Under the condition $c < \delta x$, the best response consists in $x_3 = \delta x$; if $c \ge \delta x$, the best response becomes $x_3 = c$ (see Figure 6.9).

We begin with the case of $c < \delta x$. The best response of player III is $x_3 = \delta x$. Now, find the best response of player II to this strategy. His offer $y$ is surely accepted by player I if $u_1(y) \ge \delta u_1(x_3)$, or $y \ge \delta^2 x$. The offer $y$ is accepted by player III if $u_3(y) \ge \delta u_3(x_3)$, which is equivalent to

$$\delta(1 - \delta x)\frac{c}{1 - c} \le y \le 1 - \delta(1 - \delta x).$$

Clearly, the condition $c < \delta x$ implies that $\delta(1 - \delta x)\frac{c}{1 - c} \le \delta^2 x$. And so, the offer $y$ is accepted by players I and III if it belongs to the interval $I_2 = [\delta^2 x, 1 - \delta(1 - \delta x)]$. Consequently, the best response of player II is $x_2 = \delta^2 x$ (see Figure 6.10a).

Figure 6.10a The best response of player II.

Evaluate the best response of player I to this strategy adopted by player II. His offer $y$ is accepted by player II if $u_2(y) \ge \delta u_2(x_2)$, or $y \le 1 - \delta(1 - \delta^2 x)$, and by player III if $u_3(y) \ge \delta u_3(x_2)$, which is equivalent to the condition

$$\delta^3 x \le y \le 1 - \delta^3 x \frac{1 - c}{c}.$$

Hence, any offer from the interval $I_1 = [\delta^3 x, 1 - \delta(1 - \delta^2 x)]$ is accepted by players II and III. The best response of player I is $x_1 = 1 - \delta(1 - \delta^2 x)$ (see Figure 6.10b).

Figure 6.10b The best response of player I.

The subgame-perfect equilibrium corresponds to a strategy $x^*$, where $x_1 = x$. This yields the equation

$$x = 1 - \delta(1 - \delta^2 x),$$

whence it follows that

$$x^* = \frac{1}{1 + \delta + \delta^2}.$$

Recall that this formula holds under the condition $c < \delta x^*$, which is equivalent to

$$c < \frac{\delta}{1 + \delta + \delta^2}.$$

Now, suppose that

$$c \ge \frac{\delta}{1 + \delta + \delta^2}.$$

In this case, the best response of player III becomes $x_3 = c$. As earlier, we find the best responses of player II and then of player I. These are the quantities

$$x_2 = c\delta \quad \text{and} \quad x_1 = 1 - \delta + \delta^2 c,$$

respectively. This result holds true under the condition $c \le 1 - \delta(1 - x)$.
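The chain of best responses derived above can also be iterated numerically. The following sketch is a hedged illustration rather than a definitive implementation: it assumes $u_1(y) = y$, $u_2(y) = 1 - y$ and the piecewise linear $u_3$, composes the replies of players III, II and I, and repeats the cycle until the offer of player I stabilizes. The names `best_response_cycle` and `equilibrium_offer`, the starting point 0.5 and the iteration count are assumptions.

```python
def u3(x, c):
    """Piecewise linear utility of player III with its maximum at c."""
    return x / c if x < c else (1.0 - x) / (1.0 - c)

def best_response_cycle(x, c, delta):
    """Map player I's offer x to his best response after III and II reply."""
    # Player III: any y in [delta*x, 1 - delta*(1 - x)] is accepted by I and II;
    # he picks the point of this interval closest to his optimum c.
    x3 = min(max(c, delta * x), 1.0 - delta * (1.0 - x))
    # Player II wants the smallest y accepted by player I (y >= delta*x3)
    # and by player III (u3(y) >= delta*u3(x3)).
    x2 = max(delta * x3, c * delta * u3(x3, c))
    # Player I wants the largest y accepted by player II and by player III.
    x1 = min(1.0 - delta * (1.0 - x2), 1.0 - (1.0 - c) * delta * u3(x2, c))
    return x1

def equilibrium_offer(c, delta, x=0.5, iters=500):
    """Iterate the best-response cycle starting from an arbitrary offer x."""
    for _ in range(iters):
        x = best_response_cycle(x, c, delta)
    return x

delta = 0.9
print(equilibrium_offer(0.1, delta), 1 / (1 + delta + delta**2))
print(equilibrium_offer(0.5, delta), 1 - delta + delta**2 * 0.5)
```

For $\delta = 0.9$ the two printed pairs should agree: $c = 0.1$ falls in the case $c < \delta/(1 + \delta + \delta^2)$, while $c = 0.5$ falls in the case where player III can insist on $x_3 = c$.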
Finally, we arrive at the following assertion.

Theorem 6.4 For $n = 3$, the subgame-perfect equilibrium is defined by

$$x^* = \begin{cases} \dfrac{1}{1 + \delta + \delta^2}, & \text{if } c < \dfrac{\delta}{1 + \delta + \delta^2}, \\ 1 - \delta + \delta^2 c, & \text{if } \dfrac{\delta}{1 + \delta + \delta^2} \le c \le \dfrac{1 + \delta}{1 + \delta + \delta^2}, \\ \dfrac{1 + \delta^2}{1 + \delta + \delta^2}, & \text{if } c > \dfrac{1 + \delta}{1 + \delta + \delta^2}. \end{cases}$$

Theorem 6.4 implies that, in the subgame-perfect equilibrium, the offer of player I is not less than $1/3$; under small values of $\delta$, the offer appears arbitrarily close to its maximum. In this sense, player I dominates the opponents.

6.2.3 Sequential negotiations. The general case

Let us scrutinize the general case of $n$ players. Their utilities are described by continuous quasiconcave unimodal functions $u_i(x)$, $i = 1, 2, \dots, n$. Recall that a function $u(x)$ is said to be quasiconcave if the set $\{x : u(x) \ge a\}$ is convex for any $a$. Denote by $c_1, c_2, \dots, c_n$ the maximum points of the utility functions. Players sequentially offer different alternatives; accepting an alternative requires the consent of all participants. The sequence of moves is 1 → 2 → … → n → 1 → 2 → …. We involve the same idea as in the case of three players.

Assume that player 1 announces his strategy $x$. Knowing this strategy, player $n$ can compute his best response. His offer $y$ will be accepted by player $j$ if $u_j(y)$ appears not less than $\delta u_j(x)$; denote the set of such offers by $I_j(x)$. Note that, for any $j$, the set $I_j(x)$ is non-empty, since $x \in I_j(x)$. Since $u_j(x)$ is quasiconcave, $I_j(x)$ represents a closed interval. Consequently, the intersection $\bigcap_{j=1}^{n-1} I_j(x)$ is a closed interval; we designate it by $[a_n, b_n](x)$. Maximize the function $u_n(y)$ on the interval $[a_n, b_n](x)$. This is the best response of player $n$, and, by virtue of the above assumptions, it takes the form

$$x_n(x) = \begin{cases} a_n, & \text{if } a_n > c_n, \\ b_n, & \text{if } b_n < c_n, \\ c_n, & \text{if } a_n \le c_n \le b_n. \end{cases}$$

Now, imagine that player $n - 1$ is informed of the strategy $x_n$ to be selected by player $n$ at the next shot. Similarly, he will make an offer $y$ to player $j$, and this offer is accepted if $u_j(y) \ge \delta u_j(x_n)$, $j \ne n - 1$.
For each $j$, the set of such offers forms a closed interval; moreover, the intersection of all such intervals is non-empty and turns out to be a closed interval $[a_{n-1}, b_{n-1}](x_n)$. Again, maximize the function $u_{n-1}(y)$ on this interval. This maximum is attained at the point

$$x_{n-1}(x_n) = \begin{cases} a_{n-1}, & \text{if } a_{n-1} > c_{n-1}, \\ b_{n-1}, & \text{if } b_{n-1} < c_{n-1}, \\ c_{n-1}, & \text{if } a_{n-1} \le c_{n-1} \le b_{n-1}. \end{cases}$$

Here $x_{n-1}$ indicates the best response of player $n - 1$ to the strategy $x_n$ chosen by player $n$. Following this line of reasoning, we finally arrive at the best response of player 1, viz., the function $x_1(x_2)$.

By virtue of the assumptions, all constructed functions $x_i(x)$, $i = 1, \dots, n$, appear continuous. And the superposition of the mappings $x_1(x_2(\dots x_{n-1}(x_n(x)) \dots))$ is a continuous self-mapping of the closed interval $[0, 1]$. Brouwer’s fixed-point theorem claims that there exists a fixed point $x^*$ such that

$$x_1(x_2(\dots x_{n-1}(x_n(x^*)) \dots)) = x^*.$$

Consequently, we have established the following result.

Theorem 6.5 Negotiations of meeting time with continuous quasiconcave unimodal utility functions admit a subgame-perfect equilibrium.

In fact, Theorem 6.5 is non-constructive: it merely points to the existence of an optimal behavior in negotiations. It is possible to evaluate an equilibrium, e.g., by progressive approximation (start with some offer of player 1 and compute the best responses of the remaining players). Naturally enough, the utility functions of players may possess several equilibria. In this case, the issue regarding the existence of a subgame-perfect equilibrium remains far from settled.

6.3 Stochastic design in the cake cutting problem

Revert to the cake cutting problem with a unit cake and $n$ players. Modify the design of negotiations by introducing another independent participant (an arbitrator). The latter submits offers, whereas players decide to agree or disagree with them. The ultimate decision is made either by majority or by complete consent.
Assume that the arbitrator represents a random generator. Negotiations run on a given time interval $K$. At each shot, the arbitrator makes random offers. Players observe their offers and support or reject them. Next, it is necessary to calculate the number of negotiators satisfied by their offer; if this number turns out not less than a given threshold $p$, the offer is accepted. Otherwise, the offered alternative is rejected and players proceed to the next shot for considering another alternative. The size of the cake is discounted by some quantity $\delta$, where $\delta < 1$. If negotiations result in no decision, each player receives a certain portion $b$, where $b \ll 1/n$.

Let the random generator be described by the Dirichlet distribution with the density function

$$f(x_1, \dots, x_n) = \frac{1}{B(k)} \prod_{i=1}^{n} x_i^{k_i - 1},$$

where $x_i \ge 0$, $\sum_{i=1}^{n} x_i = 1$ and $k_i \ge 1$.

The constant $B(k)$ in this formula,

$$B(k) = B(k_1, \dots, k_n) = \frac{\prod_{i=1}^{n} \Gamma(k_i)}{\Gamma(k_1 + \dots + k_n)},$$

depends on the set of parameters $(k_1, \dots, k_n)$. They serve for adjusting the weights of the appropriate negotiators.

6.3.1 The cake cutting problem with three players

We begin with the case of three players. Negotiations cover the horizon of $K$ shots. Let us count down and suppose that $k$ shots remain. Players receive offers that form a vector $(x_1^k, x_2^k, x_3^k)$. At each shot, offers represent random variables distributed according to the Dirichlet law. In other words, the joint density function takes the form

$$f(x_1, x_2, x_3) = \frac{\Gamma(k_1 + k_2 + k_3)}{\Gamma(k_1)\Gamma(k_2)\Gamma(k_3)} x_1^{k_1 - 1} x_2^{k_2 - 1} x_3^{k_3 - 1},$$

where $x_1 + x_2 + x_3 = 1$.

For a given offer vector $(x_1, x_2, x_3)$, each player has to choose between two alternatives: (a) accepting the current offer, or (b) rejecting the current offer (waiting for a better offer at the next shots). Below we analyze two possible scenarios of the negotiation scheme, namely, complete consent and majority. In the former case, an allocation $(x_1, x_2, x_3)$ takes place if all players agree at some shot of negotiations.
In the latter case, an allocation occurs when the majority of players accept the offer (otherwise, players move to the next shot $k - 1$), and discounting reduces the size of the cake by the factor $\delta \le 1$. The described process continues until all players (complete consent) or at least two of them (majority) support an offer, or until shot $k = 0$ comes. If negotiations fail, all players receive small portions $b \ll 1/3$.

Complete consent. Consider negotiations where the final decision requires the complete consent of players. Denote by $H_k$ the value of this game when $k$ shots remain until the end of negotiations. Suppose that each player is informed of his personal offer only.

Let $(x_1, x_2, x_3)$ specify the offers for players I, II, III, respectively. Since $x_1 + x_2 + x_3 = 1$, it suffices to handle the variables $x_1, x_2$. First, study the symmetrical case of the Dirichlet distribution, where $k_1 = k_2 = k_3 = 1$:

$$f(x_1, x_2) = 2, \quad x_1 + x_2 \le 1, \quad x_1, x_2 \ge 0.$$

Introduce the strategies $\mu_i(x_i)$, where $i = 1, 2, 3$. These are the probabilities that player $i$ accepts a current offer $x_i$. By virtue of the problem's symmetry, an equilibrium (if any) belongs to the class of identical strategies of players.

Theorem 6.6 The optimal strategies of players at shot $k$ have the form

$$\mu_i(x_i) = I\{x_i \ge \delta H_{k-1}\}, \quad i = 1, 2, 3,$$

where $I_A$ means the indicator of event $A$. The value of this game satisfies the recurrent formulas

$$H_k = \delta H_{k-1} + \frac{1}{3}\left(1 - 3\delta H_{k-1}\right)^3, \quad H_0 = b.$$

Proof: The optimality equation for the payoff of player I at shot $k$ is defined by

$$H_k = \sup_{\mu_1} 2 \int_0^1 dx_1 \int_0^{1 - x_1} dx_2 \left\{ \mu_1 \mu_2 \mu_3 x_1 + (1 - \mu_1 \mu_2 \mu_3)\, \delta H_{k-1} \right\}, \quad k = 1, 2, \ldots, \quad H_0 = b. \qquad (3.1)$$

Here $\mu_1 = \mu_1(x_1)$, $\mu_2 = \mu_2(x_2)$, $\mu_3 = \mu_3(1 - x_1 - x_2)$. Rewrite (3.1) as

$$H_k = \sup_{\mu_1} 2 \int_0^1 \mu_1(x_1)\, dx_1 \int_0^{1 - x_1} (x_1 - \delta H_{k-1})\, \mu_2 \mu_3\, dx_2 + \delta H_{k-1}. \qquad (3.2)$$

Player I aims at maximizing his payoff. In the expression (3.2), he can influence the value of the first integral only. Denote

$$G_k(x_1) = (x_1 - \delta H_{k-1}) \int_0^{1 - x_1} \mu_2 \mu_3\, dx_2.$$
Clearly, the optimal strategy of player I becomes

$$\mu_1(x_1) = \begin{cases} 1, & \text{if } G_k(x_1) \ge 0, \\ 0, & \text{otherwise.} \end{cases} \qquad (3.3)$$

Owing to the problem's symmetry, the optimal behavior of players II and III must be identical: $\mu_2 = \mu_3$. Note that $G_k(0) \le 0$ and $G_k(1) \ge 0$, since $0 \le \delta H_{k-1} \le 1$. Hence there exists $a$ such that $G_k(a) = 0$. We seek an equilibrium among threshold strategies. Let $\mu_2 = I\{x_2 \ge a\}$ and $\mu_3 = I\{x_3 \ge a\}$. Clearly, $G_k(x_1)$ has the form

$$G_k(x_1) = (x_1 - \delta H_{k-1}) \int_0^{1 - x_1} I\{x_2 \ge a,\ 1 - x_1 - x_2 \ge a\}\, dx_2 = (x_1 - \delta H_{k-1})(1 - x_1 - 2a)\, I\{a \le x_1 \le 1 - 2a\} + 0 \cdot I\{x_1 > 1 - 2a\}.$$

Since $G_k(a) = 0$, one obtains $a = \delta H_{k-1}$.

Therefore, if players II and III adopt the threshold strategies $\mu_2 = I\{x_2 \ge \delta H_{k-1}\}$ and $\mu_3 = I\{x_3 \ge \delta H_{k-1}\}$, then the best response of player I must be $\mu_1 = I\{x_1 \ge \delta H_{k-1}\}$. Substitution of $G_k(x_1)$ into (3.2) yields the following equation in $H_k$:

$$H_k = 2 \int_{\delta H_{k-1}}^{1 - 2\delta H_{k-1}} (x_1 - \delta H_{k-1})(1 - x_1 - 2\delta H_{k-1})\, dx_1 + \delta H_{k-1} = \delta H_{k-1} + \frac{1}{3}\left(1 - 3\delta H_{k-1}\right)^3.$$

Remark 6.1 If $\delta = 1$, then $\lim_{k \to \infty} H_k = 1/3$. This is natural: in the case of no discounting and an infinite horizon of negotiations, players wait for a shot when the arbitrator suggests 1/3 of the cake to everybody.

Majority rule. Now, suppose that negotiations in the cake cutting problem obey the majority rule. An offer is accepted if at least two of the three players support it. Again, we assume that (a) the horizon of negotiations is $K$ shots and (b) offers at shot $k$, i.e., the components of the vector $(x_1^k, x_2^k, x_3^k)$, have the Dirichlet distribution with the parameters $k_1 = k_2 = k_3 = 1$. Denote by $H_k$ the value of this game when $k$ shots remain until the end.

Let $(x_1, x_2, x_3)$ specify the offers for players I, II, and III, respectively. By analogy, introduce the strategies $\mu_i(x_i)$, where $i = 1, 2, 3$; $\mu_i(x_i)$ is the probability that player $i$ accepts a current offer $x_i$. Set $\bar{\mu}(x) = 1 - \mu(x)$. Owing to symmetry, we search for an equilibrium among identical strategies.
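Theorem 6.7 The optimal strategies of players at shot $k$ have the form

$$\mu_i(x_i) = I\{x_i \ge \delta H_{k-1}\}, \quad i = 1, 2, 3.$$

The value of this game satisfies the recurrent formulas

$$H_k = \frac{1}{3} - 2\delta^2 H_{k-1}^2 \left(1 - 3\delta H_{k-1}\right), \quad H_0 = b.$$

Proof: The optimality equation for the payoff of player I at shot $k$ is given by

$$H_k = \sup_{\mu_1} 2 \int_0^1 dx_1 \int_0^{1 - x_1} dx_2 \left\{ (\mu_1 \mu_2 \mu_3 + \bar{\mu}_1 \mu_2 \mu_3 + \mu_1 \bar{\mu}_2 \mu_3 + \mu_1 \mu_2 \bar{\mu}_3)\, x_1 + (\bar{\mu}_1 \bar{\mu}_2 \bar{\mu}_3 + \mu_1 \bar{\mu}_2 \bar{\mu}_3 + \bar{\mu}_1 \mu_2 \bar{\mu}_3 + \bar{\mu}_1 \bar{\mu}_2 \mu_3)\, \delta H_{k-1} \right\}, \quad k = 1, 2, \ldots, \qquad (3.4)$$

with $H_0 = b$. Here $\mu_1 = \mu_1(x_1)$, $\mu_2 = \mu_2(x_2)$, $\mu_3 = \mu_3(1 - x_1 - x_2)$. Rewrite (3.4) as

$$H_k = \sup_{\mu_1} 2 \int_0^1 \mu_1(x_1)\, dx_1 \left[ \int_0^{1 - x_1} \left\{ (x_1 - \delta H_{k-1})(\mu_2 + \mu_3 - 2\mu_2 \mu_3) \right\} dx_2 \right] + 2 \int_0^1 dx_1 \int_0^{1 - x_1} \left\{ (x_1 - \delta H_{k-1})\, \mu_2 \mu_3 + \delta H_{k-1} \right\} dx_2. \qquad (3.5)$$

Take the bracketed expression in the first integral and denote it by

$$G_k(x_1) = \int_0^{1 - x_1} \left\{ (x_1 - \delta H_{k-1})(\mu_2 + \mu_3 - 2\mu_2 \mu_3) \right\} dx_2.$$

Evidently, the optimal strategy of player I is $\mu_1(x_1) = I\{G_k(x_1) \ge 0\}$.

As mentioned, due to the problem's symmetry, the optimal behavior of players II and III is identical: $\mu_2 = \mu_3$. Since $G_k(0) \le 0$ and $G_k(1) \ge 0$, there exists $a$ such that $G_k(a) = 0$. Seek an equilibrium among threshold strategies. Let $\mu_2 = I\{x_2 \ge a\}$ and $\mu_3 = I\{x_3 \ge a\}$. The factor $\mu_2 + \mu_3 - 2\mu_2\mu_3$ is the indicator of the event that exactly one of players II and III accepts, so $G_k(x_1)$ has a different shape on three intervals:

$$G_k(x_1) = (x_1 - \delta H_{k-1}) \Big( 2a\, I\{x_1 \le 1 - 2a\} + 2(1 - a - x_1)\, I\{1 - 2a < x_1 \le 1 - a\} + 0 \cdot I\{1 - a < x_1 \le 1\} \Big).$$

It follows from $G_k(a) = 0$ that $a = \delta H_{k-1}$. Thus, $G_k(x_1)$ can be expressed by

$$G_k(x_1) = (x_1 - \delta H_{k-1}) \Big( 2\delta H_{k-1}\, I\{x_1 \le 1 - 2\delta H_{k-1}\} + 2(1 - \delta H_{k-1} - x_1)\, I\{1 - 2\delta H_{k-1} < x_1 \le 1 - \delta H_{k-1}\} + 0 \cdot I\{1 - \delta H_{k-1} < x_1 \le 1\} \Big).$$

And so, if players II and III select the threshold strategies $\mu_2 = I\{x_2 \ge \delta H_{k-1}\}$ and $\mu_3 = I\{x_3 \ge \delta H_{k-1}\}$, then the best response of player I must be $\mu_1 = I\{x_1 \ge \delta H_{k-1}\}$. Substitute $G_k(x_1)$ into (3.5) to derive

$$H_k = 2 \int_0^1 \mu_1(x_1) G_k(x_1)\, dx_1 + 2 \int_0^1 \int_0^{1 - x_1} \left\{ (x_1 - \delta H_{k-1})\, \mu_2 \mu_3 + \delta H_{k-1} \right\} dx_2\, dx_1$$

$$= 4\delta H_{k-1} \int_{\delta H_{k-1}}^{1 - 2\delta H_{k-1}} (x_1 - \delta H_{k-1})\, dx_1 + 4 \int_{1 - 2\delta H_{k-1}}^{1 - \delta H_{k-1}} (x_1 - \delta H_{k-1})(1 - \delta H_{k-1} - x_1)\, dx_1 + 2 \int_0^{1 - 2\delta H_{k-1}} (x_1 - \delta H_{k-1})(1 - 2\delta H_{k-1} - x_1)\, dx_1 + \delta H_{k-1}.$$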
This result yields the recurrent formula

$$H_k = \delta H_{k-1} + \frac{1}{3}\left(1 - 3\delta H_{k-1}\right)\left(1 - 6\delta^2 H_{k-1}^2\right),$$

which coincides with the formula stated in the theorem. The proof of Theorem 6.7 is completed.

6.3.2 Negotiations of three players with non-uniform distribution

We now change the parameters of the Dirichlet distribution and analyze the properties of the corresponding optimal solution. For instance, set $k_1 = k_2 = k_3 = 2$. Then the joint density function takes the form

$$f(x_1, x_2) = 120\, x_1 x_2 (1 - x_1 - x_2),$$

where $x_1, x_2 > 0$ and $x_1 + x_2 \le 1$. Solve this problem under the majority rule. As previously, $\mu_i(x_i)$ ($i = 1, 2, 3$) indicates the probability that player $i$ accepts a current offer $x_i$.

Theorem 6.8 The optimal strategies of players at shot $k$ are defined by

$$\mu_i(x_i) = I\{x_i \ge \delta H_{k-1}\}, \quad i = 1, 2, 3.$$

The value of this game satisfies the recurrent formulas

$$H_k = \frac{1}{3} - 10\delta^4 H_{k-1}^4 \left(1 - 3\delta H_{k-1}\right)\left(3 - 4\delta H_{k-1}\right), \quad H_0 = b.$$

Proof: For the payoff at shot $k$, the optimality equation is given by

$$H_k = 120 \int_0^1 x_1\, dx_1 \int_0^{1 - x_1} x_2 (1 - x_1 - x_2)\, dx_2 \left\{ (\mu_1 \mu_2 \mu_3 + \bar{\mu}_1 \mu_2 \mu_3 + \mu_1 \bar{\mu}_2 \mu_3 + \mu_1 \mu_2 \bar{\mu}_3)\, x_1 + (\bar{\mu}_1 \bar{\mu}_2 \bar{\mu}_3 + \mu_1 \bar{\mu}_2 \bar{\mu}_3 + \bar{\mu}_1 \mu_2 \bar{\mu}_3 + \bar{\mu}_1 \bar{\mu}_2 \mu_3)\, \delta H_{k-1} \right\}, \quad k = 1, 2, \ldots, \qquad (3.6)$$

where $H_0 = b$, $\mu_1 = \mu_1(x_1)$, $\mu_2 = \mu_2(x_2)$, and $\mu_3 = \mu_3(1 - x_1 - x_2)$. Some transformations of (3.6) yield

$$H_k = 120 \int_0^1 x_1\, \mu_1(x_1)\, dx_1 \left[ \int_0^{1 - x_1} \left\{ (x_1 - \delta H_{k-1})(\mu_2 + \mu_3 - 2\mu_2 \mu_3) \right\} x_2 (1 - x_1 - x_2)\, dx_2 \right] + 120 \int_0^1 x_1\, dx_1 \int_0^{1 - x_1} \left\{ (x_1 - \delta H_{k-1})\, \mu_2 \mu_3 + \delta H_{k-1} \right\} x_2 (1 - x_1 - x_2)\, dx_2. \qquad (3.7)$$

Take the bracketed expression in the first integral, multiplied by $x_1$, and denote it by

$$G_k(x_1) = x_1 \int_0^{1 - x_1} \left\{ (x_1 - \delta H_{k-1})(\mu_2 + \mu_3 - 2\mu_2 \mu_3) \right\} x_2 (1 - x_1 - x_2)\, dx_2.$$

The optimal strategy of player I acquires the form (3.3). Find an equilibrium in the class of threshold strategies. Let $\mu_2 = I\{x_2 \ge a\}$, $\mu_3 = I\{x_3 \ge a\}$ and study three cases as follows:

1. If $0 \le x_1 \le 1 - 2a$, we have

$$\int_0^{1 - x_1} (\mu_2 + \mu_3 - 2\mu_2 \mu_3)\, x_2 (1 - x_1 - x_2)\, dx_2 = \int_0^{a} x_2 (1 - x_1 - x_2)\, dx_2 + \int_{1 - x_1 - a}^{1 - x_1} x_2 (1 - x_1 - x_2)\, dx_2 = \frac{1}{3} a^2 (3 - 3x_1 - 2a).$$

2. If $1 - 2a < x_1 \le 1 - a$, the value of this integral obeys the formula

$$\int_0^{1 - x_1 - a} x_2 (1 - x_1 - x_2)\, dx_2 + \int_{a}^{1 - x_1} x_2 (1 - x_1 - x_2)\, dx_2 = \frac{1}{3} (1 - x_1 + 2a)(1 - x_1 - a)^2.$$

3. If $1 - a < x_1 \le 1$, the integral under consideration vanishes.

Obtain the corresponding expression for the second integral in (3.7):

$$\int_0^{1 - x_1} \mu_2 \mu_3 \cdot x_2 (1 - x_1 - x_2)\, dx_2 = \int_{a}^{1 - a - x_1} x_2 (1 - x_1 - x_2)\, dx_2 = \frac{1}{6} (1 - x_1 - 2a)\left(1 + 2a - 2a^2 - 2x_1 - 2a x_1 + x_1^2\right).$$

By virtue of the above relationships, it is possible to write down

$$G_k(x_1) = x_1 (x_1 - \delta H_{k-1}) \Big( \tfrac{1}{3} a^2 (3 - 3x_1 - 2a) \cdot I\{x_1 \le 1 - 2a\} + \tfrac{1}{3} (1 - x_1 + 2a)(1 - x_1 - a)^2 \cdot I\{1 - 2a < x_1 \le 1 - a\} + 0 \cdot I\{1 - a < x_1 \le 1\} \Big).$$

Since $G_k(a) = 0$, we have $a = \delta H_{k-1}$. Consequently,

$$G_k(x_1) = x_1 (x_1 - \delta H_{k-1}) \Big( \tfrac{1}{3} \delta^2 H_{k-1}^2 (3 - 3x_1 - 2\delta H_{k-1}) \cdot I\{x_1 \le 1 - 2\delta H_{k-1}\} + \tfrac{1}{3} (1 - x_1 + 2\delta H_{k-1})(1 - x_1 - \delta H_{k-1})^2 \cdot I\{1 - 2\delta H_{k-1} < x_1 \le 1 - \delta H_{k-1}\} + 0 \cdot I\{1 - \delta H_{k-1} < x_1 \le 1\} \Big).$$

Therefore, if players II and III adopt the threshold strategies $\mu_2 = I\{x_2 \ge \delta H_{k-1}\}$ and $\mu_3 = I\{x_3 \ge \delta H_{k-1}\}$, then the best response of player I is $\mu_1 = I\{x_1 \ge \delta H_{k-1}\}$ as well. Thus,

$$H_k = 120 \int_0^1 \mu_1(x_1)\, G_k(x_1)\, dx_1 + 120 \int_0^1 x_1\, dx_1 \int_0^{1 - x_1} \left\{ (x_1 - \delta H_{k-1})\, \mu_2 \mu_3 + \delta H_{k-1} \right\} x_2 (1 - x_1 - x_2)\, dx_2$$

$$= 40 \delta^2 H_{k-1}^2 \int_{\delta H_{k-1}}^{1 - 2\delta H_{k-1}} x_1 (x_1 - \delta H_{k-1})(3 - 3x_1 - 2\delta H_{k-1})\, dx_1 + 40 \int_{1 - 2\delta H_{k-1}}^{1 - \delta H_{k-1}} x_1 (x_1 - \delta H_{k-1})(1 - x_1 + 2\delta H_{k-1})(1 - x_1 - \delta H_{k-1})^2\, dx_1$$

$$+ 20 \int_0^{1 - 2\delta H_{k-1}} x_1 (x_1 - \delta H_{k-1})(1 - x_1 - 2\delta H_{k-1})\left(1 + 2\delta H_{k-1} - 2\delta^2 H_{k-1}^2 - 2x_1 - 2\delta H_{k-1} x_1 + x_1^2\right) dx_1 + \delta H_{k-1}.$$

And finally, we derive the recurrent formula

$$H_k = \delta H_{k-1} + \frac{1}{3}\left(1 - 3\delta H_{k-1}\right)\left(1 - 90\delta^4 H_{k-1}^4 + 120\delta^5 H_{k-1}^5\right).$$

6.3.3 Negotiations of n players

This subsection deals with the general case of negotiations engaging $n$ participants. Decision making requires at least $p \ge 1$ votes. Assume that $k_i = 1$, $i = 1, \ldots, n$. The joint density function of the Dirichlet distribution is described by

$$f(x_1, \ldots, x_n) = (n - 1)!, \quad \text{where } x_i > 0,\ i = 1, \ldots, n, \text{ and } \sum_{i=1}^{n} x_i = 1.$$
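For the uniform case $k_i = 1$, a single shot of the voting procedure is easy to simulate. The sketch below is only an illustration, not part of the original derivation: it assumes every player uses the same threshold `a` (standing for $\delta H_{k-1}$), draws offers uniformly from the simplex by normalising exponential variables, and estimates the expected payoff of player I under a "$p$ votes out of $n$" rule. The function names and the sample size are hypothetical choices. For $n = 3$ the estimate can be compared with the closed forms of Theorems 6.6 and 6.7.

```python
import random

def sample_uniform_simplex(n, rng):
    """Draw (x_1, ..., x_n) from the Dirichlet law with all k_i = 1."""
    # Normalised i.i.d. exponential variables are uniform on the simplex.
    e = [rng.expovariate(1.0) for _ in range(n)]
    total = sum(e)
    return [v / total for v in e]

def one_shot_value(n, p, a, cont, trials=200_000, seed=1):
    """Monte Carlo estimate of player I's expected payoff in one shot.

    Each player accepts iff his share is >= a. The offer passes with at
    least p supporters; otherwise player I gets the continuation value cont.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        x = sample_uniform_simplex(n, rng)
        votes = sum(1 for xi in x if xi >= a)
        total += x[0] if votes >= p else cont
    return total / trials

a = 0.2  # plays the role of delta * H_{k-1}; must not exceed 1/3 here
majority_estimate = one_shot_value(n=3, p=2, a=a, cont=a)
majority_formula = 1 / 3 - 2 * a**2 * (1 - 3 * a)   # Theorem 6.7
print(round(majority_estimate, 3), round(majority_formula, 3))
```

Calling `one_shot_value(n=3, p=3, a=a, cont=a)` gives the analogous check for complete consent against $a + \frac{1}{3}(1 - 3a)^3$.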
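Let $H^n_k$ indicate the value of this game at shot $k$. Introduce the symbols $\mu^1 = \mu$ and $\mu^0 = 1 - \mu$. In the sequel, we use the notation $\mu^\sigma$, where $\sigma \in \{0, 1\}$. Accordingly,

$$H^n_k = (n - 1)!\, \sup_{\mu_1} \left\{ \int_0^1 \int_0^{1 - x_1} \cdots \int_0^{1 - x_1 - \ldots - x_{n-2}} \sum_{(\sigma_1 \sigma_2 \ldots \sigma_n)} \mu_1^{\sigma_1} \mu_2^{\sigma_2} \cdots \mu_n^{\sigma_n} \cdot \begin{cases} x_1, & \text{if } \sum_{i=1}^{n} \sigma_i \ge p, \\ \delta H^n_{k-1}, & \text{if } \sum_{i=1}^{n} \sigma_i < p \end{cases} \; dx_1\, dx_2 \ldots dx_{n-1} \right\}$$

$$= (n - 1)!\, \sup_{\mu_1} \left\{ \int_0^1 \int_0^{1 - x_1} \cdots \int_0^{1 - x_1 - \ldots - x_{n-2}} \mu_1 \sum_{(\sigma_2 \ldots \sigma_n)} \left\{ \mu_2^{\sigma_2} \cdots \mu_n^{\sigma_n} \cdot F_{1,k} \right\} dx_1 \ldots dx_{n-1} + \int_0^1 \int_0^{1 - x_1} \cdots \int_0^{1 - x_1 - \ldots - x_{n-2}} (1 - \mu_1) \sum_{(\sigma_2 \ldots \sigma_n)} \left\{ \mu_2^{\sigma_2} \cdots \mu_n^{\sigma_n} \cdot F_{2,k} \right\} dx_1 \ldots dx_{n-1} \right\},$$

where

$$F_{1,k} = \begin{cases} x_1, & \text{if } \sum_{i=2}^{n} \sigma_i \ge p - 1, \\ \delta H^n_{k-1}, & \text{if } \sum_{i=2}^{n} \sigma_i < p - 1, \end{cases} \qquad F_{2,k} = \begin{cases} x_1, & \text{if } \sum_{i=2}^{n} \sigma_i \ge p, \\ \delta H^n_{k-1}, & \text{if } \sum_{i=2}^{n} \sigma_i < p. \end{cases}$$

Certain transformations and grouping by $\mu_1$ lead to

$$H^n_k = \sup_{\mu_1} (n - 1)! \int_0^1 \int_0^{1 - x_1} \cdots \int_0^{1 - x_1 - \ldots - x_{n-2}} \mu_1 \sum_{(\sigma_2 \ldots \sigma_n)} \left\{ \mu_2^{\sigma_2} \cdots \mu_n^{\sigma_n} \cdot (F_{1,k} - F_{2,k}) \right\} dx_1 \ldots dx_{n-1} + (n - 1)! \int_0^1 \int_0^{1 - x_1} \cdots \int_0^{1 - x_1 - \ldots - x_{n-2}} \sum_{(\sigma_2 \ldots \sigma_n)} \left\{ \mu_2^{\sigma_2} \cdots \mu_n^{\sigma_n} \cdot F_{2,k} \right\} dx_1 \ldots dx_{n-1}. \qquad (3.8)$$

Then

$$F_k = F_{1,k} - F_{2,k} = \begin{cases} x_1 - \delta H^n_{k-1}, & \text{if } p - 1 \le \sum_{i=2}^{n} \sigma_i < p, \\ 0, & \text{otherwise.} \end{cases}$$

The inequality $p - 1 \le \sum_{i=2}^{n} \sigma_i < p$ is equivalent to $\sum_{i=2}^{n} \sigma_i = p - 1$; only under this condition can $F_k$ differ from zero. As a result, the first integral in (3.8) admits the representation

$$\int_0^1 \mu_1(x_1)\, dx_1 \int_0^{1 - x_1} \cdots \int_0^{1 - x_1 - \ldots - x_{n-2}} \sum_{(\sigma_2 \ldots \sigma_n)} \left\{ \mu_2^{\sigma_2} \cdots \mu_n^{\sigma_n} \cdot F_k \right\} dx_2 \ldots dx_{n-1} = \int_0^1 \mu_1(x_1)\, G^n_k(x_1)\, dx_1.$$

Select $\mu_i = I\{x_i \ge a\}$, $i = 2, \ldots, n$, and find the best response of player I. It becomes

$$\mu_1(x_1) = \begin{cases} 1, & \text{if } G^n_k(x_1) \ge 0, \\ 0, & \text{otherwise.} \end{cases}$$

To evaluate $G^n_k(x_1)$, we introduce the notation $S_i(1) = \{x_i : x_i \ge a\} \cap [0, 1 - x_1 - \ldots - x_{i-1}]$ and $S_i(0) = \{x_i : x_i < a\} \cap [0, 1 - x_1 - \ldots - x_{i-1}]$ for $i = 2, \ldots, n - 1$. Then $G^n_k(x_1)$ can be reexpressed as

$$G^n_k(x_1) = \sum_{(\sigma'_2 \ldots \sigma'_{n-1})} \int_{S_2(\sigma'_2)} \cdots \int_{S_{n-1}(\sigma'_{n-1})} \sum_{(\sigma_2 \ldots \sigma_n)} \left\{ \mu_2^{\sigma_2} \cdots \mu_n^{\sigma_n} \cdot F_k \right\} dx_2 \ldots dx_{n-1}.$$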
The following equality holds true:

$$\int_{S_2(\sigma'_2)} \cdots \int_{S_{n-1}(\sigma'_{n-1})} \sum_{(\sigma_2 \ldots \sigma_n)} \left\{ \mu_2^{\sigma_2} \cdots \mu_n^{\sigma_n} \cdot F_k(x_1, \sigma_2, \ldots, \sigma_n) \right\} dx_2 \ldots dx_{n-1} = \int_{S_2(\sigma'_2)} \cdots \int_{S_{n-1}(\sigma'_{n-1})} \left\{ 1 \cdot F_k(x_1, \sigma'_2, \ldots, \sigma'_{n-1}, \sigma_n) \right\} dx_2 \ldots dx_{n-1}.$$

Note that $F_k(x_1, \sigma_2, \ldots, \sigma_n) \ne 0$ only if $\sum_{i=2}^{n} \sigma_i = p - 1$. The number of sets $(\sigma_2, \ldots, \sigma_n)$ satisfying this condition equals $C_{n-1}^{p-1}$. Hence,

$$G^n_k(x_1) = C_{n-1}^{p-1} \left( x_1 - \delta H^n_{k-1} \right) \int_0^{1 - x_1} \cdots \int_0^{1 - x_1 - \ldots - x_{n-2}} I\left\{ \bigcap_{i=2}^{p} \{x_i \ge a\} \cap \bigcap_{i=p+1}^{n} \{x_i < a\} \right\} dx_2 \ldots dx_{n-1}. \qquad (3.9)$$

Thus, the optimal strategy of player I belongs to the class of threshold strategies; its threshold is $a = \delta H^n_{k-1}$.

Theorem 6.9 Consider the cake cutting game with $n$ players. The optimal strategies of players at shot $k$ take the form

$$\mu_i(x_i) = I\{x_i \ge \delta H^n_{k-1}\}, \quad i = 1, \ldots, n.$$

The value of this game satisfies the recurrent expressions

$$H^n_k = (n - 1)! \left\{ \int_{\delta H^n_{k-1}}^{1} G^n_k(x_1)\, dx_1 + \int_0^1 dx_1 \sum_{(\sigma_2 \ldots \sigma_{n-1})} \int_{S_2(\sigma_2)} \cdots \int_{S_{n-1}(\sigma_{n-1})} F_{2,k}\, dx_2 \ldots dx_{n-1} \right\}. \qquad (3.10)$$

6.3.4 Negotiations of n players. Complete consent

Consider the cake cutting problem with $n$ participants, where decision making requires complete consent: $p = n$. In this case, the optimality equation is defined by

$$H^n_k = (n - 1)! \int_{\delta H^n_{k-1}}^{1} G^n_k(x_1)\, dx_1 + \delta H^n_{k-1}. \qquad (3.11)$$

According to (3.9), the function $G^n_k(x_1)$ acquires the form

$$G^n_k(x_1) = \left( x_1 - \delta H^n_{k-1} \right) \int_{\delta H^n_{k-1}}^{1 - x_1} \cdots \int_{\delta H^n_{k-1}}^{1 - x_1 - \ldots - x_{n-2}} dx_2 \ldots dx_{n-1} = \begin{cases} \dfrac{\left( x_1 - \delta H^n_{k-1} \right) \left( 1 - x_1 - (n - 1)\delta H^n_{k-1} \right)^{n-2}}{(n - 2)!}, & x_1 \le 1 - (n - 1)\delta H^n_{k-1}, \\ 0, & x_1 > 1 - (n - 1)\delta H^n_{k-1}. \end{cases}$$

Substitute this result into (3.11) and simplify to get the recurrent equation

$$H^n_k = \delta H^n_{k-1} + \frac{\left( 1 - n\delta H^n_{k-1} \right)^n}{n}. \qquad (3.12)$$

Therefore, we have arrived at the following statement.

Theorem 6.10 Consider the cake cutting problem with $n$ players and decision making by complete consent. The optimal strategies of players at shot $k$ are determined by

$$\mu_i(x_i) = I\{x_i \ge \delta H^n_{k-1}\}, \quad i = 1, \ldots, n.$$

The value of this game $H^n_k$ satisfies the recurrent formulas (3.12).
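The recurrence (3.12) is straightforward to iterate numerically. The following minimal sketch (the function name `consent_values` and the chosen parameter values are illustrative assumptions, not taken from the text) computes $H^n_0, H^n_1, \ldots$ for complete consent. With $n = 3$ and $\delta = 1$ it reproduces the behavior described in Remark 6.1: the values increase towards $1/3$, although rather slowly.

```python
def consent_values(n, delta, b, horizon):
    """Iterate H_k = delta*H_{k-1} + (1 - n*delta*H_{k-1})**n / n with H_0 = b."""
    values = [b]
    for _ in range(horizon):
        a = delta * values[-1]              # acceptance threshold at this shot
        values.append(a + (1 - n * a) ** n / n)
    return values

# Three players, no discounting, tiny fallback share b (illustrative numbers).
values = consent_values(n=3, delta=1.0, b=0.01, horizon=2000)
print(values[1], values[-1])
```

With discounting ($\delta < 1$) the same function can be used to see how the value of the game levels off below $1/n$ as the horizon grows.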
Remark 6.2 The described stochastic procedure of cake allocation can be adapted to different real situations. If participants possess identical weights, the parameters of the Dirichlet distribution should coincide. In this case, the cake allocation procedure guarantees equal opportunities for all players. However, if a certain participant has a higher weight, his parameter in the Dirichlet distribution should be increased. Moreover, the solution depends on the length of the negotiations horizon.

6.4 Models of tournaments

Generally, negotiations cover not just a single point (e.g., price, the volume of supplies or the period of a contract), but a package of interconnected points. Any change in one point of a package may affect the remaining points and terms. When compiling a project, one should maximize the expected profit and take into account the behavior of opponents.

Suppose that players $i \in N = \{1, 2, \ldots, n\}$ submit their projects for a tournament. Projects are characterized by a set of parameters $x^i = (x^i_1, \ldots, x^i_m)$. An arbitrator or arbitration committee considers the incoming projects and chooses a certain project by a stochastic procedure with a known probability distribution. The winner (player $k$) obtains the payoff $h_k(x^k)$, which depends on the parameters of his project. Assume that project selection employs a multidimensional arbitration procedure choosing the project closest to the arbitrator's opinion.

6.4.1 A game-theoretic model of tournament organization

Study the following non-cooperative $n$-player game with zero sum. Players $\{1, 2, \ldots, n\}$ submit projects for a tournament. Projects are characterized by the vectors $\{x^1, \ldots, x^n\}$ from some feasible set $S$ in the space $R^m$. For instance, the description of a project can include required costs, implementation period, the number of employees, etc. An arbitrator analyzes the incoming proposals and chooses a project by the following stochastic procedure.
In the space $R^m$, one models a random vector $a$ with a certain probability distribution $\mu(x_1, \ldots, x_m)$ (all tournament participants are aware of this distribution). The vector $a$ is called the arbitrator's decision. The project $x^k$ located at the shortest distance from $a$ is the winner of this tournament. The corresponding player $k$ receives the payoff $h_k(x^k)$, which depends on the project parameters. Another interpretation of the vector $a$ is the following. This is a set of experts' opinions, where each component specifies the decision of the corresponding expert. Moreover, experts can be independent or make correlated decisions.

Figure 6.11 The Voronoi diagram on the set of projects.

Note that the decision of the arbitrator is random. For a given set of projects $\{x^1, \ldots, x^n\}$, the set $S \subset R^m$ gets partitioned into $n$ subsets $S_1, \ldots, S_n$ such that, if $a \in S_k$, then the arbitrator selects project $k$ (see Figure 6.11). The described partition is known as the Voronoi diagram. Therefore, the payoff of player $k$ in this game can be defined as the mean value of his payoff over the event that the arbitrator's decision hits the set $S_k$, i.e.,

$$H_k(x^1, \ldots, x^n) = \int_{S_k} h_k(x^k)\, \mu(dx_1, \ldots, dx_m) = h_k(x^k)\, \mu(S_k), \quad k = 1, \ldots, n.$$

And so, we seek a Nash equilibrium in this game, i.e., a strategy profile $x^* = (x^1, \ldots, x^n)$ such that

$$H_k(x^* \| y^k) \le H_k(x^*), \quad \forall y^k, \quad k = 1, \ldots, n.$$

For the sake of simplicity, consider the two-dimensional case when a project is described by a couple of parameters. Suppose that players have submitted their projects $x^i = (x_i, y_i)$, $i = 1, \ldots, n$, for a tournament, and two independent arbitrators assess them. The decision of the arbitrators is modeled by a random vector in the 2D space, whose density function takes the form $f(x, y) = g(x) g(y)$. For definiteness, focus on player 1.
The set $S_1$ (corresponding to the approval of his project) is a polygon with sides $l_{i_1}, \ldots, l_{i_k}$. Here $l_j$ designates a straight-line segment passing through the midpoint of the segment $[x^1, x^j]$ perpendicular to the latter (see Figure 6.11). Clearly, the boundary $l_j$ satisfies the equation

$$x(x_1 - x_j) + y(y_1 - y_j) = \frac{x_1^2 + y_1^2 - x_j^2 - y_j^2}{2},$$

or

$$y = l_j(x) = -\frac{x_1 - x_j}{y_1 - y_j}\, x + \frac{x_1^2 + y_1^2 - x_j^2 - y_j^2}{2(y_1 - y_j)}.$$

Let $x_{i_j}$, $j = 1, \ldots, k$, denote the abscissas of all vertices of the polygon $S_1$. For convenience, we renumber them such that

$$x_{i_0} \le x_{i_1} \le x_{i_2} \le \ldots \le x_{i_k} \le x_{i_{k+1}},$$

where $x_{i_0} = -\infty$, $x_{i_{k+1}} = \infty$. All interior points $(x, y) \in S_1$ meet the following condition: the function $l_{i_j}(x)$ possesses the same sign as $l_{i_j}(x_1)$, or $l_{i_j}(x)\, l_{i_j}(x_1) > 0$, $j = 1, \ldots, k$. In this case, the measure $\mu(S_1)$ admits the representation

$$\mu(S_1) = \sum_{j=0}^{k} \int_{x_{i_j}}^{x_{i_{j+1}}} g(x)\, dx \int_{l_{i_j}(x)\, l_{i_j}(x_1) > 0,\ j = 1, \ldots, k} g(y)\, dy.$$

A similar formula can be derived for any domain $S_i$, $i = 1, \ldots, n$.

6.4.2 Tournament for two projects with the Gaussian distribution

Consider the model of a tournament with two participants and zero sum, where projects are characterized by two parameters. For instance, imagine a dispute on the partition of property (movable property $x$ and real estate $y$). Player I strives to maximize the sum $x + y$, whereas the opponent (player II) seeks to minimize it. Suppose that, settling such a dispute, an arbitrator applies a procedure with the Gaussian distribution

$$f(x, y) = \frac{1}{2\pi} \exp\{-(x^2 + y^2)/2\}.$$

Players submit their offers $(x_1, y_1)$ and $(x_2, y_2)$. The 2D space of the arbitrator's decisions is divided into two subsets, $S_1$ and $S_2$. Their boundary is the straight line passing through the midpoint of the segment connecting the points $(x_1, y_1)$ and $(x_2, y_2)$, perpendicular to this segment (see Figure 6.12). The equation of this line is defined by

$$y = -\frac{x_1 - x_2}{y_1 - y_2}\, x + \frac{x_1^2 - x_2^2 + y_1^2 - y_2^2}{2(y_1 - y_2)}.$$
Figure 6.12 A tournament with two projects in the 2D space.

Therefore, the payoff of player I in this game acquires the form

$$H(x_1, y_1; x_2, y_2) = (x_1 + y_1)\, \mu(S_1) = (x_1 + y_1) \int_R \int_R f(x, y)\, I\left\{ y \ge -\frac{x_1 - x_2}{y_1 - y_2}\, x + \frac{x_1^2 - x_2^2 + y_1^2 - y_2^2}{2(y_1 - y_2)} \right\} dx\, dy, \qquad (4.1)$$

where $I\{A\}$ means the indicator of the set $A$. The problem's symmetry dictates that, in the optimal strategies, all players have identical values of the parameters. Let $x_2 = y_2 = -a$. Then it follows from (4.1) that

$$H(x_1, y_1) = (x_1 + y_1) \int_R \int_R f(x, y)\, I\left\{ y \ge -\frac{x_1 + a}{y_1 + a}\, x + \frac{x_1^2 + y_1^2 - 2a^2}{2(y_1 + a)} \right\} dx\, dy.$$

The best response of player I satisfies the condition

$$\frac{\partial H}{\partial x_1} = 0, \quad \frac{\partial H}{\partial y_1} = 0.$$

And so,

$$\frac{\partial H}{\partial x_1} = \mu(S_1) + (x_1 + y_1) \frac{\partial \mu(S_1)}{\partial x_1} = \mu(S_1) + (x_1 + y_1) \int_R \frac{1}{2\pi}\, \frac{x - x_1}{y_1 + a} \exp\left\{ -\frac{1}{2} \left( x^2 + \left( -\frac{x_1 + a}{y_1 + a}\, x + \frac{x_1^2 + y_1^2 - 2a^2}{2(y_1 + a)} \right)^2 \right) \right\} dx. \qquad (4.2)$$

Equate the expression (4.2) to zero and require that the solution is achieved at the point $x_1 = y_1 = a$. This leads to the optimal value of the parameter $a$. Note that, owing to symmetry, $\mu(S_1) = 1/2$. Consequently,

$$\frac{1}{2} - 2a \int_R \frac{1}{2\pi} \exp\left\{ -\frac{1}{2}(x^2 + x^2) \right\} \frac{-x + a}{2a}\, dx = 0,$$

whence it follows that

$$\int_{-\infty}^{\infty} (-x + a)\, \frac{1}{2\pi}\, e^{-x^2}\, dx = \frac{1}{2}.$$

The term with $-x$ vanishes by symmetry, and $\int_{-\infty}^{\infty} e^{-x^2} dx = \sqrt{\pi}$, so we obtain the optimal value

$$a = \sqrt{\pi}.$$

Readers can easily verify the sufficient maximum conditions of the function $H(x, y)$ at the point $(a, a)$. Therefore, the optimal strategies in this game are the offers $(\sqrt{\pi}, \sqrt{\pi})$ for player I and $(-\sqrt{\pi}, -\sqrt{\pi})$ for player II.

6.4.3 The correlation effect

We have studied the model of a tournament, where projects are assessed by two criteria and the arbitrator's decisions are Gaussian random variables. Consider the same problem under the assumption that the arbitrator's decisions are dependent. This corresponds to the case when each criterion belongs to a separate expert, and the decisions of experts are correlated.
Suppose that the winner is selected by a procedure with the Gaussian distribution

$$f(x, y) = \frac{1}{2\pi\sqrt{1 - r^2}} \exp\left\{ -\frac{1}{2(1 - r^2)} \left( x^2 + y^2 - 2rxy \right) \right\}.$$

Here $r$, $|r| < 1$, means the correlation factor.

Again, we take advantage of the problem symmetry. Imagine that player II adheres to the strategy $(-a, -a)$ and find the best response of player I in the form $x_1 = y_1 = a$. Differentiate the payoff function (4.1) with the new distribution and substitute the values $x_1 = y_1 = a$ to derive the equation

$$\int_{-\infty}^{\infty} (-x + a)\, \frac{1}{2\pi\sqrt{1 - r^2}}\, e^{-\frac{x^2}{1 - r}}\, dx = \frac{1}{2}.$$

Its solution yields

$$a = \sqrt{\pi(1 + r)}.$$

Hence, positively correlated decisions of the arbitrators ($r > 0$) increase the optimal offers of the players compared with the independent case.

6.4.4 The model of a tournament with three players and non-zero sum

Now, analyze a tournament of projects submitted by three players. Here player I aims at maximizing the sum $x + y$, whereas players II and III strive for minimization of $x$ and $y$, respectively. Suppose that an arbitrator is described by the Gaussian distribution in the 2D space: $f(x, y) = g(x) g(y)$, where $g(x) = \frac{1}{\sqrt{2\pi}} \exp\{-x^2/2\}$.

As usual, we utilize the problem symmetry. The optimal strategies must have the following form:

for player I: $(c, c)$; for player II: $(-a, 0)$; for player III: $(0, -a)$.

To evaluate $a$ and $c$, we proceed as follows. Assume that players II and III submit to the tournament the projects $(-a, 0)$ and $(0, -a)$, respectively. On the other hand, player I submits the project $(x_1, y_1)$, where $x_1, y_1 \ge 0$. In this case, the space of projects gets decomposed into three sets (see Figure 6.13) delimited by the lines $y = x$ and

$$l_2 : y = -\frac{x_1 + a}{y_1}\, x + \frac{x_1^2 + y_1^2 - a^2}{2y_1}, \qquad l_3 : y = -\frac{x_1}{y_1 + a}\, x + \frac{x_1^2 + y_1^2 - a^2}{2(y_1 + a)}.$$

Figure 6.13 A tournament with three projects in the 2D space.

The three lines intersect at the same point $x = y = x_0$, where

$$x_0 = \frac{x_1^2 + y_1^2 - a^2}{2(x_1 + y_1 + a)}.$$
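The expression for $x_0$ can be checked directly; the short derivation below is added only as a worked step. Writing $l_2$ as $x(x_1 + a) + y\,y_1 = (x_1^2 + y_1^2 - a^2)/2$ and $l_3$ as $x\,x_1 + y(y_1 + a) = (x_1^2 + y_1^2 - a^2)/2$, substitution of $y = x$ gives the same equation in both cases:

```latex
\begin{aligned}
l_2:\quad & x_0 (x_1 + a) + x_0\, y_1 = \tfrac{1}{2}\left(x_1^2 + y_1^2 - a^2\right), \\
l_3:\quad & x_0\, x_1 + x_0 (y_1 + a) = \tfrac{1}{2}\left(x_1^2 + y_1^2 - a^2\right), \\
\Longrightarrow\quad & x_0 = \frac{x_1^2 + y_1^2 - a^2}{2\,(x_1 + y_1 + a)} .
\end{aligned}
```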
Figure 6.14 A tournament with three projects in the 2D space.

We are mostly concerned with the domain $S_1$ having the boundaries $l_2$ and $l_3$. Reexpress the payoff of player I as

$$H_1(x_1, y_1) = (x_1 + y_1) \left[ \int_{-\infty}^{x_0} g(x)\, dx \int_{u}^{\infty} g(y)\, dy + \int_{x_0}^{\infty} g(x)\, dx \int_{v}^{\infty} g(y)\, dy \right], \qquad (4.3)$$

where

$$u = -\frac{x_1 + a}{y_1}\, x + \frac{x_1^2 + y_1^2 - a^2}{2y_1}, \qquad v = -\frac{x_1}{y_1 + a}\, x + \frac{x_1^2 + y_1^2 - a^2}{2(y_1 + a)}.$$

Further simplifications of (4.3) yield

$$H_1(x_1, y_1) = (x_1 + y_1) \left[ 1 - \int_{-\infty}^{x_0} g(x) G(u)\, dx - \int_{x_0}^{\infty} g(x) G(v)\, dx \right], \qquad (4.4)$$

where $G(x)$ is the Gaussian distribution function. The maximum of (4.4) is attained at $x_1 = y_1 = c$, where $c$ is a certain function of $a$.

Now, fix a strategy $(c, c)$ of player I such that $c > 0$. Suppose that player III chooses the strategy $(0, -b)$ and seek the best response $(-a, 0)$ of player II to the strategies adopted by the opponents. The space of projects is divided into three domains (see Figure 6.14). The boundaries of the domain $S_2$ are defined by

$$l_1 : y = -\frac{c + a}{c}\, x + \frac{2c^2 - a^2}{2c}$$

and

$$l_3 : y = \frac{a}{b}\, x - \frac{b^2 - a^2}{2b}.$$

The intersection point of these boundaries has the abscissa

$$z = \left( \frac{2c^2 - a^2}{2c} - \frac{a^2 - b^2}{2b} \right) \frac{1}{a/b + 1 + a/c}.$$

And the payoff of player II constitutes

$$H_2(a) = a \left[ \int_{-\infty}^{z} g(x)\, dx \int_{v_1}^{v_2} g(y)\, dy \right] = a \int_{-\infty}^{z} \left( G(v_2) - G(v_1) \right) g(x)\, dx, \qquad (4.5)$$

where

$$v_1 = \frac{a}{b}\, x - \frac{b^2 - a^2}{2b}, \qquad v_2 = -\frac{c + a}{c}\, x + \frac{2c^2 - a^2}{2c}.$$

Due to the considerations of symmetry, the minimum of (4.5) must be attained at $a = b$. These optimization problems yield the optimal values of the parameters $a$ and $c$. Numerical simulation leads to the following approximate values of the optimal parameters:

$$a = b \approx 1.7148, \quad c \approx 1.3736.$$

The equilibrium payoffs of the players make up

$$H_1 \approx 0.920, \quad H_2 = H_3 \approx 0.570,$$

and the probabilities of entering the appropriate domains equal

$$\mu(S_1) \approx 0.335, \quad \mu(S_2) = \mu(S_3) \approx 0.332.$$
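The probabilities $\mu(S_i)$ quoted above can be cross-checked by simulating the arbitrator directly. The sketch below is an illustrative Monte Carlo check rather than the procedure used to obtain the equilibrium: it takes the reported values of $a$ and $c$ as given, draws standard Gaussian decisions, and awards each draw to the nearest project. The function name `region_probabilities`, the seed, and the sample size are hypothetical choices.

```python
import random

def region_probabilities(projects, trials=200_000, seed=7):
    """Estimate mu(S_i): the chance that project i is nearest to the
    arbitrator's decision, drawn from the standard 2D Gaussian law."""
    rng = random.Random(seed)
    wins = [0] * len(projects)
    for _ in range(trials):
        ax, ay = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        dists = [(ax - px) ** 2 + (ay - py) ** 2 for px, py in projects]
        wins[dists.index(min(dists))] += 1   # Voronoi cell of the decision
    return [w / trials for w in wins]

a, c = 1.7148, 1.3736                        # values reported in the text
mu = region_probabilities([(c, c), (-a, 0.0), (0.0, -a)])
# Payoffs: player I gets x + y = 2c, players II and III get a, times mu(S_i).
payoffs = [2 * c * mu[0], a * mu[1], a * mu[2]]
print([round(m, 3) for m in mu], [round(h, 3) for h in payoffs])
```

The same function works for any finite set of projects, which makes it a convenient way to explore how a unilateral deviation changes a player's winning probability.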
Remark 6.3 The game-theoretic model of tournaments with arbitration procedures admits a simple implementation in a software environment. To solve a practical task (e.g., house construction), one organizes a tournament and creates a corresponding commission. Experts (arbitrators) assess this task in terms of each parameter. Subsequently, it is necessary to construct a probability distribution which agrees with the opinions of experts. Then players submit their offers for the tournament. The commission may immediately reject the projects whose parameter values are dominated by other projects. And the phase of winner selection follows. The decisions of an arbitrator (or several arbitrators) are modeled by random variables in the space of projects. The winner is the project lying closest to the arbitrator's decision. Voting takes place in the case of an arbitration committee.

6.5 Bargaining models with incomplete information

Negotiations accompany any transaction on a market. Here participants are sellers and buyers. In recent years, such transactions have employed systems of electronic tenders. There exist different mechanisms of negotiations. We begin with the double auction model proposed by K. Chatterjee and W.F. Samuelson [1983].

6.5.1 Transactions with incomplete information

Consider a two-player game with incomplete information. It engages player I (a seller) and player II (a buyer). Each player possesses private information unavailable to the opponent. Notably, the seller knows the manufacturing costs of a product (denote them by $s$), whereas the buyer assigns some value $b$ to this product. These quantities are often called reservation prices. Assume that reservation prices (both for sellers and buyers) have the uniform distribution on a market.
In other words, if we randomly select a seller and a buyer on the market, their reservation prices $s$ and $b$ are independent random variables with the uniform distribution on the interval $[0, 1]$.

Players appear on the market and announce their prices for a product, $S$ and $B$, respectively. Note that these quantities may differ from the reservation prices. We assume that $S = S(s)$ and $B = B(b)$, i.e., they are some functions of the reservation prices. The transaction occurs if $B \ge S$. A natural supposition is that $S(s) \ge s$ and $B(b) \le b$, i.e., a seller overprices a product and a buyer underprices it. If the transaction takes place, we suppose that the negotiated price is $(S(s) + B(b))/2$. Players gain the difference between the reservation prices and the negotiated price: $(S(s) + B(b))/2 - s$ (the seller) and $b - (S(s) + B(b))/2$ (the buyer). Recall that $b$ and $s$ are random variables, and we define the payoff functions as the mean values

$$H_s(B, S) = E_{b,s} \left( \frac{S(s) + B(b)}{2} - s \right) I\{B(b) \ge S(s)\} \qquad (5.1)$$

and

$$H_b(B, S) = E_{b,s} \left( b - \frac{S(s) + B(b)}{2} \right) I\{B(b) \ge S(s)\}. \qquad (5.2)$$

The stated Bayesian game includes the functions $B(b)$ and $S(s)$ as the strategies of players. It seems logical that these are non-decreasing functions (the higher the seller's costs or the buyer's valuation, the greater the offers of the players). Find a Bayesian equilibrium in the game with the payoff functions (5.1)–(5.2).

Figure 6.15 Optimal strategies.

Theorem 6.11 The optimal strategies in the transaction problem have the following form:

$$B(b) = \begin{cases} b, & \text{if } b \le \frac{1}{4}, \\ \frac{2}{3} b + \frac{1}{12}, & \text{if } \frac{1}{4} \le b \le 1, \end{cases} \qquad (5.3)$$

$$S(s) = \begin{cases} \frac{2}{3} s + \frac{1}{4}, & \text{if } 0 \le s \le \frac{3}{4}, \\ s, & \text{if } \frac{3}{4} \le s \le 1. \end{cases} \qquad (5.4)$$

Moreover, the probability of transaction constitutes $9/32$, and each player gains $9/128$ (so that the total gain of both players is $9/64$).

Proof: The strategies (5.3)–(5.4) are illustrated in Figure 6.15.
Assume that the buyer selects the strategy (5.3) and establish the best response of the seller under different values of the parameter $s$.

Let $s \ge 1/4$. Then the transaction occurs under the condition $B(b) \ge S$. By virtue of (5.3), this is equivalent to

$$\frac{2}{3} b + \frac{1}{12} \ge S.$$

The last inequality can be reduced to

$$b \ge \frac{3}{2} S - \frac{1}{8},$$

where $b$ denotes a random variable with the uniform distribution on $[0, 1]$. The seller's payoff acquires the form

$$H_s(B, S) = E_b \left( \frac{S + B(b)}{2} - s \right) I\{B(b) \ge S\} = \int_{\frac{3}{2} S - \frac{1}{8}}^{1} \left( \frac{\frac{2}{3} b + \frac{1}{12} + S}{2} - s \right) db = \frac{3}{128} (-3 + 16s - 12S)(-3 + 4S). \qquad (5.5)$$

Figure 6.16 The domain of successful negotiations (the transaction domain).

As a function of $S$, the payoff (5.5) is a parabola with the roots $3/4$ and $\frac{4}{3} s - \frac{1}{4}$. The maximum is achieved at the midpoint between the roots,

$$S = \frac{1}{2} \left( \frac{4}{3} s - \frac{1}{4} + \frac{3}{4} \right) = \frac{2}{3} s + \frac{1}{4}. \qquad (5.6)$$

Interestingly, if $s > 3/4$, the quantity (5.6) is smaller than $s$. Therefore, the best response of the seller becomes $S(s) = s$. And so, in the case of $s \ge 1/4$, the best response of the seller to the strategy (5.3) is given by $S = \max\left\{ \frac{2}{3} s + \frac{1}{4},\ s \right\}$.

Now, suppose that $s < 1/4$. We demonstrate that the inequality $S(s) \ge 1/4$ holds true in this case. Indeed, if $S(s) < 1/4$, the transaction occurs under the condition $B(b) = b \ge S$. Consequently, the seller's payoff makes up

$$H_s(B, S) = E_b \left( \frac{S(s) + B(b)}{2} - s \right) I\{B(b) \ge S\} = \int_{S}^{\frac{1}{4}} \left( \frac{b + S}{2} - s \right) db + \int_{\frac{1}{4}}^{1} \left( \frac{\frac{2}{3} b + \frac{1}{12} + S}{2} - s \right) db = -\frac{3}{4} S^2 + \left( \frac{1}{2} + s \right) S - s + \frac{13}{64}. \qquad (5.7)$$

The function (5.7) increases within the interval $S \in [0, 1/4]$. Hence, the optimal response satisfies $S(s) \ge 1/4$. The payoff function $H_s(B, S)$ acquires the form (5.5), and the optimal strategy of the seller is defined by (5.4).

Similarly, readers can show that the best response of the buyer to the strategy (5.4) coincides with the strategy (5.3).

The optimal strategies enjoy the property $B(b) \le 3/4$ and $S(s) \ge 1/4$.
Thus, the transaction fails if $b < 1/4$ or $s > 3/4$ (see Figure 6.16). However, if $b \ge 1/4$ and $s \le 3/4$, the transaction takes place under $B(b) \ge S(s)$, which is equivalent to $b \ge s + 1/4$.

Now, we compute the probability that the transaction occurs under the optimal behavior of the players:

$$P\{B(b) > S(s)\} = \int_{\frac{1}{4}}^{1} \int_{0}^{b - \frac{1}{4}} ds\, db = \frac{9}{32} \approx 0.281.$$

In this case, the payoffs of the players equal

$$H_s = H_b = \int_{\frac{1}{4}}^{1} \int_{0}^{b - \frac{1}{4}} \left( \frac{\frac{2}{3} b + \frac{1}{12} + \frac{2}{3} s + \frac{1}{4}}{2} - s \right) ds\, db = \frac{9}{128} \approx 0.070.$$

Remark 6.4 Compare the payoffs ensured by optimal behavior and honest negotiations (when players announce true reservation prices). In the equilibrium, the probability of transaction $P\{B(b) > S(s)\} = 0.281$ turns out appreciably smaller than in honest negotiations ($P\{b \ge s\} = 0.5$). Furthermore, the corresponding mean payoff of $0.070$ is also lower than in the case of truth-telling:

$$\int_0^1 db \int_0^b \left( \frac{b + s}{2} - s \right) ds = \frac{1}{12} \approx 0.0833.$$

In this sense, the transaction problem resembles the prisoners' dilemma, where a good solution is unstable: the equilibrium yields smaller payoffs to the players than truth-telling does.

6.5.2 Honest negotiations in conclusion of transactions

The notion of "honesty" plays a key role in the transaction problem. A transaction is called honest if its equilibrium belongs to the class of pure strategies and the optimal strategies have the form $B(b) = b$ and $S(s) = s$. In other words, in honest transactions players benefit from announcing their reservation prices.

To make the game honest, we redefine it. There exist two approaches as follows.

The honest transaction model with a bonus. Assume that, having concluded a transaction, players receive some bonus. Let $t_s(B, S)$ and $t_b(B, S)$ designate the seller's bonus and the buyer's bonus, respectively.
Then the payoff functions in this game acquire the form

$$H_s(B, S) = E_{b,s}\left(\frac{S(s) + B(b)}{2} - s + t_s(B(b), S(s))\right) I_{\{B(b) \ge S(s)\}}$$

and

$$H_b(B, S) = E_{b,s}\left(b - \frac{S(s) + B(b)}{2} + t_b(B(b), S(s))\right) I_{\{B(b) \ge S(s)\}}.$$

It turns out that, if the functions $t_s(B, S)$ and $t_b(B, S)$ are selected as $\frac{(B - S)^{+}}{2}$, the game becomes honest. Indeed, if

$$t_s(B, S) = t_b(B, S) = \frac{(B - S)^{+}}{2},$$

then, for an arbitrary strategy $B(b)$ of the buyer, the seller's payoff constitutes

$$H_s(B, S) = E_b\left(\frac{S + B(b)}{2} - s + \frac{B(b) - S}{2}\right) I_{\{B(b) \ge S\}} = E_b\left(B(b) - s\right) I_{\{B(b) \ge S\}}. \tag{5.8}$$

The integrand in (5.8) is non-negative as long as $B(b) \ge S(s) \ge s$. As $S$ goes down, the payoff (5.8) increases, since the domain of integration grows. Hence, for a given $s$, the maximum of (5.8) corresponds to the minimal admissible value of $S(s)$. Consequently, $S(s) = s$. Similarly, we can argue that, for an arbitrary strategy of the seller, the buyer's optimal strategy acquires the form $B(b) = b$.

The honest transaction model with a penalty. The previous model has a shortcoming: honest negotiations require that somebody pays the players for concluding a transaction. Moreover, players may act in collusion to receive the maximal bonus from the third party. For instance, they can announce extreme values and share the bonus equally.

Another approach to honest negotiations dictates that players pay for participation in the transaction. Denote by $q_s(B, S)$ ($q_b(B, S)$) the residual payoff of the seller (buyer, respectively) after the transaction; of course, these quantities make sense only if $B(b) \ge S(s)$. The payoffs of the players are defined by

$$H_s(B, S) = E_{b,s}\left(\frac{S(s) + B(b)}{2} - s\right) q_s(B(b), S(s))\, I_{\{B(b) \ge S(s)\}}$$

and

$$H_b(B, S) = E_{b,s}\left(b - \frac{S(s) + B(b)}{2}\right) q_b(B(b), S(s))\, I_{\{B(b) \ge S(s)\}}.$$

Choose the functions $q_s, q_b$ as

$$q_s = (B(b) - S(s))c_s, \qquad q_b = (B(b) - S(s))c_b,$$

where $c_s, c_b$ stand for positive constants. Such choice of penalties has the following grounds.
It stimulates players to increase the difference between their offers and thereby compels truth-telling. Now, we establish this fact rigorously. For convenience, let $c_s = c_b = 1$.

Suppose that the buyer's strategy is some non-decreasing function $B(b)$. Then the seller's payoff acquires the form

$$H_s(B, S) = E_b\left(\frac{S + B(b)}{2} - s\right)(B(b) - S)\, I_{\{B(b) \ge S\}} = \int_{B^{-1}(S)}^{1}\left(\frac{B^2(b) - S^2}{2} - s(B(b) - S)\right) db. \tag{5.9}$$

The function (5.9) decreases with respect to $S$. Indeed, the integrand vanishes at the lower limit $b = B^{-1}(S)$, so

$$\frac{\partial H_s}{\partial S} = \int_{B^{-1}(S)}^{1}(-S + s)\, db \le 0.$$

It follows that the maximal value of the seller's payoff is attained at the minimal admissible value of $S(s)$. Since $S(s) \ge s$, the optimal strategy is honest: $S(s) = s$. By analogy, one can show that the above choice of the payoff functions leads to the honest optimal strategy of the buyer: $B(b) = b$.

6.5.3 Transactions with unequal forces of players

In the transaction model studied above, the players are on an equal footing. This is not the case in many applications. Assume that, if players reach an agreement, the transaction is concluded at the price $kB(b) + (1 - k)S(s)$. Here $k \in (0, 1)$ is a parameter characterizing "the alignment of forces" of sellers and buyers. In the symmetrical case, we have $k = 1/2$. If $k = 0$ ($k = 1$), the price is determined by the offer of the seller (buyer, respectively) alone. Accordingly, a game with incomplete information arises naturally, where the payoff functions are defined by

$$H_s(B, S) = E_{b,s}\left(kB(b) + (1 - k)S(s) - s\right) I_{\{B(b) \ge S(s)\}}$$

and

$$H_b(B, S) = E_{b,s}\left(b - kB(b) - (1 - k)S(s)\right) I_{\{B(b) \ge S(s)\}}.$$

Reasoning as in the symmetrical case yields the following result.

Theorem 6.12 Consider the transaction problem with a force level $k$. The optimal strategies of the players have the form

$$B(b) = \begin{cases} b & \text{if } b \le \frac{1-k}{2}, \\ \frac{1}{1+k}\,b + \frac{(1-k)k}{2(1+k)} & \text{if } \frac{1-k}{2} \le b \le 1, \end{cases} \qquad S(s) = \begin{cases} \frac{1}{2-k}\,s + \frac{1-k}{2} & \text{if } 0 \le s \le \frac{2-k}{2}, \\ s & \text{if } \frac{2-k}{2} \le s \le 1. \end{cases}$$
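The piecewise-linear strategies of Theorem 6.12 are easy to tabulate. The sketch below is only an illustration under stated assumptions: it takes the formulas of the theorem as given, assumes independent reservation prices uniformly distributed on $[0, 1]$, and estimates the probability of the event $B(b) \ge S(s)$ on a midpoint grid. The function names and the grid size are hypothetical choices.

```python
def buyer_offer(b, k):
    """Buyer strategy from Theorem 6.12 for force level k."""
    if b <= (1.0 - k) / 2.0:
        return b
    return b / (1.0 + k) + (1.0 - k) * k / (2.0 * (1.0 + k))


def seller_offer(s, k):
    """Seller strategy from Theorem 6.12 for force level k."""
    if s <= (2.0 - k) / 2.0:
        return s / (2.0 - k) + (1.0 - k) / 2.0
    return s


def trade_probability(k, n=400):
    """Share of midpoint-grid cells (b, s) in [0,1]^2 with B(b) >= S(s)."""
    hits = 0
    for i in range(n):
        b_offer = buyer_offer((i + 0.5) / n, k)
        for j in range(n):
            if b_offer >= seller_offer((j + 0.5) / n, k):
                hits += 1
    return hits / (n * n)


if __name__ == "__main__":
    for k in (0.0, 0.5, 1.0):
        print(k, trade_probability(k))
```

For $k = 1/2$ the strategies reduce to (5.3) and (5.4), so the estimate should be close to the value $9/32$ obtained in the symmetrical case.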
Figure 6.17 The transaction domain (boundaries for $k = 0$, $k = 0.5$, and $k = 1$).

The domain of successful negotiations $B(b) \ge S(s)$ (the transaction domain) is given by

$$b \ge \frac{1 + k}{2 - k}\,s + \frac{1 - k}{2}.$$

This domain varies depending on $k$. For $k = 1/2$, we recover $b \ge s + 1/4$; for $k = 0$, the formula gives $b \ge (1 + s)/2$; and if $k = 1$, then $b \ge 2s$ (see Figure 6.17). Recall that, in the symmetrical case, the probability of transaction equals $9/32 \approx 0.281$, which is higher than the corresponding probability under $k = 0$ or $k = 1$, i.e., $1/4 = 0.25$.

6.5.4 The "offer-counteroffer" transaction model

Transactions with unequal forces of players make payoffs essentially dependent on the force level $k$ (see Section 6.5.3). Moreover, if $k = 0$ or $k = 1$, the payoffs of the players de facto depend on the behavior of one player (known as a strong player). This player moves by announcing a certain price. If this price is acceptable to the other player given his reservation price, the transaction occurs. Otherwise, the second side makes its offer. The described model of transactions is called a sealed-bid auction (see Perry [1986]).

Suppose that the reservation prices of sellers and buyers are distributed on the interval $[0, 1]$ with distribution functions $F(s)$ and $G(b)$, respectively. The corresponding density functions are $f(s)$, $g(b)$, $s, b \in [0, 1]$.

Imagine that the first offer is made by the seller. The other case, when the buyer moves first, can be treated by analogy. Under a reservation price $s$, the seller may submit an offer $S(s) \ge s$. A random buyer purchases the product with the probability $1 - G(S)$. Therefore, the seller's payoff becomes

$$H_s(S) = (S - s)(1 - G(S)).$$

The maximum of this function follows from the equation

$$1 - G(S) - g(S)(S - s) = 0.$$
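The first-order condition above rarely needs to be solved by hand. A minimal sketch follows, assuming that the left-hand side $1 - G(S) - g(S)(S - s)$ is positive below the optimal offer and non-positive above it on $[s, 1]$, so that bisection applies; this assumption is ours, not a claim of the text. The names `optimal_offer`, `G`, and `g` are illustrative.

```python
def optimal_offer(s, G, g, tol=1e-10):
    """Bisection for the first offer S in [s, 1] at which
    h(S) = 1 - G(S) - g(S) * (S - s) stops being positive.

    G is the buyers' distribution function and g its density.
    Assumes h > 0 below the optimum and h <= 0 above it."""
    lo, hi = s, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if 1.0 - G(mid) - g(mid) * (mid - s) > 0.0:
            lo = mid          # payoff still increasing: move right
        else:
            hi = mid          # payoff no longer increasing: move left
    return (lo + hi) / 2.0


if __name__ == "__main__":
    # Uniform buyers: G(b) = b, g(b) = 1.
    print(optimal_offer(0.2, lambda x: x, lambda x: 1.0))
    # Linear density g(b) = 2(1 - b), so G(b) = 2b - b^2.
    print(optimal_offer(0.2, lambda x: 2 * x - x * x, lambda x: 2 * (1 - x)))
```

Any other pair of a distribution function and its density on $[0, 1]$ can be passed in the same way, provided the sign assumption holds.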
For instance, in the case of the uniform distribution of buyers, the optimal strategy of the seller satisfies the equation $1 - S - (S - s) = 0$, whence it follows that

$$S = \frac{1 + s}{2}.$$

The resulting transaction domain $b \ge S(s)$, i.e., $b \ge (1 + s)/2$, coincides with the domain given by the strategies of Theorem 6.12 for $k = 0$ (see Figure 6.17). The probability of transaction equals $1/4 = 0.25$. The seller's payoff is

$$H_s = \int_{1/2}^{1}\int_{0}^{2b - 1}\left(\frac{1 + s}{2} - s\right) ds\, db = \frac{1}{12},$$

whereas the buyer's payoff constitutes

$$H_b = \int_{1/2}^{1}\int_{0}^{2b - 1}\left(b - \frac{1 + s}{2}\right) ds\, db = \frac{1}{24}.$$

On average, the payoff of the buyers is half that of the sellers.

Now, analyze a non-uniform distribution of reservation prices on the market. For instance, suppose that the density function has the form $g(b) = 2(1 - b)$, so that $G(b) = 2b - b^2$. This corresponds to the case when many buyers value the product at a low price. The optimal strategy of the seller meets the equation

$$1 - (2S - S^2) - 2(1 - S)(S - s) = 0.$$

Dividing by $1 - S$ gives $1 - S = 2(S - s)$. Therefore, $S = (1 + 2s)/3$. In comparison with the uniform case, the seller should reduce the announced price.

6.5.5 The correlation effect

Up to this point, we have discussed the case of independent random variables representing the reservation prices of sellers and buyers. On a real market, reservation prices may be interdependent. In this context, it is important to discover the impact of the correlation of reservation prices on the optimal strategies and payoffs of the players.

Here we consider the case when the reservation prices $(b, s)$ have the joint density function

$$f(b, s) = 1 + \gamma(1 - 2b)(1 - 2s), \quad b, s \in [0, 1].$$

Figure 6.18 The optimal strategies $S(s)$ for different values of $\gamma$ (curves labeled r = 0 and r = 1).

The marginal distribution with respect to each parameter is uniform, and the correlation factor equals $\gamma/3$. Assume that the seller makes the first offer. Under a reservation price $s$, the seller can submit an offer $S(s) \ge s$.
And the seller's payoff becomes

$$H_s(S) = \int_{S}^{1}(S - s)\left(1 + \gamma(1 - 2s)(1 - 2b)\right) db = (S - s)(1 - S)\left(1 + \gamma(2s - 1)S\right).$$

The maximum of this function lies at the point

$$S = \frac{-1 + \gamma(1 + s)(2s - 1) + \sqrt{1 + \gamma^2(2s - 1)^2(1 - s + s^2) + \gamma(1 + s)(2s - 1)}}{3\gamma(2s - 1)}.$$

Figure 6.18 shows the strategy $S(s)$ for different values of $\gamma$. Evidently, as the correlation factor grows, the optimal behavior of the seller requires further reduction in his offers.

To proceed, evaluate the payoffs of the players. The seller's payoff constitutes

$$H_s = \int_{0}^{1} ds \int_{S(s)}^{1}(S(s) - s) f(b, s)\, db = \int_{0}^{1}(S(s) - s)(1 - S(s))\left(1 + \gamma(2s - 1)S(s)\right) ds,$$

whereas the buyer receives the payoff

$$H_b = \int_{0}^{1} ds \int_{S(s)}^{1}(b - S(s)) f(b, s)\, db = \frac{1}{6}\int_{0}^{1}(1 - S(s))^2\left(3 - \gamma(1 - 2s)(1 + 2S(s))\right) ds.$$

The payoffs of sellers and buyers go down as the correlation of their reservation prices becomes stronger. This phenomenon admits a simple explanation: correlation reduces the uncertainty about the price suggested by the partner, and so a player must behave more moderately.

6.5.6 Transactions with non-uniform distribution of reservation prices

Now, suppose that the reservation prices of sellers and buyers are distributed non-uniformly on the interval $[0, 1]$. For instance, let the reservation prices $s$ and $b$ be independent random variables with the density functions

$$f(s) = 2s, \quad s \in [0, 1], \qquad g(b) = 2(1 - b), \quad b \in [0, 1]. \tag{5.10}$$

This corresponds to the following situation on a market. There are many sellers with high manufacturing costs of a product, and there are many buyers assessing the product at a low price.

Find the optimal strategies of the players. As usual, we assume that such strategies are functions of the reservation prices, $S = S(s)$ and $B = B(b)$. The transaction occurs provided that $B \ge S$. If this is the case, assume that the transaction runs at the price $(S(s) + B(b))/2$.
The payoff functions of the players have the form (5.1) and (5.2); in the cited expressions, the expectation is evaluated with respect to the appropriate distributions. To establish an equilibrium, we use the same considerations as in subsection 6.5.1. Suppose that the buyer selects the strategy

$$B(b) = \begin{cases} b & \text{if } b \le \frac{1}{6}, \\ \frac{4}{5}b + \frac{1}{30} & \text{if } \frac{1}{6} \le b \le 1. \end{cases} \tag{5.11}$$

Find the best response of the seller under different values of the parameter $s$. Let $s \ge 1/6$. Then the transaction occurs under the condition $B(b) \ge S$, or $\frac{4}{5}b + \frac{1}{30} \ge S$. The last inequality is equivalent to $b \ge \frac{5}{4}S - \frac{1}{24}$, where $b$ designates a random variable with the density $g(b)$, $b \in [0, 1]$. Calculate the payoff of the seller:

$$H_s(B, S) = E_b\left(\frac{S + B(b)}{2} - s\right) I_{\{B(b) \ge S\}} = \int_{\frac{5}{4}S - \frac{1}{24}}^{1}\left(\frac{\frac{4}{5}b + \frac{1}{30} + S}{2} - s\right) 2(1 - b)\, db = -\frac{25}{12^4}(-5 + 36s - 30S)(5 - 6S)^2. \tag{5.12}$$

The derivative of this function acquires the form

$$\frac{\partial H_s}{\partial S} = \frac{25}{1152}(5 + 24s - 30S)(5 - 6S).$$

It follows that the maximum of the payoff (5.12) is achieved at $S = \frac{4}{5}s + \frac{1}{6}$. If $s > 5/6$, this value becomes smaller than $s$. Therefore, in the case of $s \ge 1/6$, the seller's best response to the strategy (5.11) is defined by

$$S = \max\left\{\frac{4}{5}s + \frac{1}{6},\; s\right\}. \tag{5.13}$$

Using the same technique as before, one can demonstrate optimality of this strategy in the case of $s < 1/6$ as well.

Now, suppose that the seller adopts the strategy (5.13). Evaluate the buyer's best response under different values of the parameter $b$. Let $b \le 5/6$. The transaction occurs provided that $\frac{4}{5}s + \frac{1}{6} \le B$, which is equivalent to $s \le \frac{5}{4}B - \frac{5}{24}$. Here $s$ represents a random variable with the density $f(s)$, $s \in [0, 1]$. Find the buyer's payoff:

$$H_b(B, S) = E_s\left(b - \frac{S(s) + B}{2}\right) I_{\{B \ge S(s)\}} = \int_{0}^{\frac{5}{4}B - \frac{5}{24}}\left(b - \frac{\frac{4}{5}s + \frac{1}{6} + B}{2}\right) 2s\, ds = -\frac{25}{12^4}(1 - 36b + 30B)(1 - 6B)^2.$$

The derivative of this function takes the form

$$\frac{\partial H_b}{\partial B} = \frac{25}{1152}(1 + 24b - 30B)(-1 + 6B).$$
It follows that the maximal payoff is achieved at $B = \frac{4}{5}b + \frac{1}{30}$. If $b < 1/6$, this expression exceeds $b$. Therefore, the best response of the buyer to the strategy (5.13) becomes

$$B = \min\left\{\frac{4}{5}b + \frac{1}{30},\; b\right\}.$$

We have thus established the following fact.

Figure 6.19 The optimal strategies of the players.

Theorem 6.13 Consider the transaction problem with the reservation prices distribution (5.10). The optimal strategies of the players have the form

$$S = \max\left\{\frac{4}{5}s + \frac{1}{6},\; s\right\}, \qquad B = \min\left\{\frac{4}{5}b + \frac{1}{30},\; b\right\}.$$

These optimal strategies are illustrated in Figure 6.19. In this situation, the transaction takes place if $B(b) \ge S(s)$, i.e., $b \ge s + 1/6$. Figure 6.20 demonstrates the domain of successful negotiations.

Figure 6.20 The transaction domain.

The optimal behavior results in the transaction with the probability

$$P\{B(b) > S(s)\} = \int_{1/6}^{1}\int_{0}^{b - 1/6} 2s \cdot 2(1 - b)\, ds\, db = \frac{1}{6}\left(\frac{5}{6}\right)^4 \approx 0.080.$$

This quantity is smaller than the probability of honest transaction

$$P\{b > s\} = \int_{0}^{1}\int_{0}^{b} 2s \cdot 2(1 - b)\, ds\, db = \frac{1}{6} \approx 0.167.$$

Moreover, the players receive the payoffs

$$H_s = H_b = \int_{1/6}^{1}\int_{0}^{b - 1/6}\left(\frac{\frac{4}{5}b + \frac{1}{30} + \frac{4}{5}s + \frac{1}{6}}{2} - s\right) 2s \cdot 2(1 - b)\, ds\, db = \frac{1}{36}\left(\frac{5}{6}\right)^4 \approx 0.0134,$$

which is less than in the case of honest transaction:

$$\bar{H}_s = \bar{H}_b = \int_{0}^{1}\int_{0}^{b}\left(\frac{b + s}{2} - s\right) 2s \cdot 2(1 - b)\, ds\, db = \frac{1}{60} \approx 0.0167.$$

The honest game yields higher payoffs to the players, yet the corresponding strategy profile is unstable. As in the prisoners' dilemma, in the honest game a player is tempted to modify his strategy. We show this rigorously. Suppose that the sellers adhere to truth-telling: $S(s) = s$. Obtain the optimal response of the buyers. The payoff of the buyer is

$$H_b(B, S) = \int_{0}^{B}\left(b - \frac{B + s}{2}\right) 2s\, ds = B^2\left(b - \frac{5}{6}B\right).$$

To define the optimal strategy, write down the derivative

$$\frac{\partial H_b}{\partial B} = B\left(2b - \frac{5}{2}B\right).$$
Hence, the optimal strategy of the buyer is $B(b) = \frac{4}{5}b$. The seller's payoff then drops to about half of its honest value $1/60$:

$$H_s\left(\tfrac{4}{5}b, s\right) = \int_{0}^{1}\int_{0}^{\frac{4}{5}b}\left(\frac{\frac{4}{5}b + s}{2} - s\right) 2s \cdot 2(1 - b)\, ds\, db = \frac{1}{3}\left(\frac{2}{5}\right)^4 \approx 0.0085.$$

6.5.7 Transactions with non-linear strategies

We study another class of non-uniformly distributed reservation prices of buyers and sellers within the interval $[0, 1]$. Namely, consider linear densities as above, but interchange their roles. In other words, suppose that the reservation prices $s$ and $b$ are independent random variables with the density functions

$$f(s) = 2(1 - s), \quad s \in [0, 1], \qquad g(b) = 2b, \quad b \in [0, 1]. \tag{5.14}$$

This corresponds to the following situation on a market. There are many sellers supplying a product at a low production cost and many rich buyers.

Find the optimal strategies of the players. As before, we assume that these are functions of the reservation prices, $S = S(s)$ and $B = B(b)$, respectively (naturally enough, monotonically increasing functions). Then there exist the inverse functions $U = B^{-1}$ and $V = S^{-1}$, where $s = V(S)$ and $b = U(B)$.

Let us state optimality conditions for the distributions (5.14) of the reservation prices. The transaction occurs provided that $B \ge S$. If the transaction takes place, we assume that the corresponding price is $(S(s) + B(b))/2$. The payoff functions of the players have the form (5.1) and (5.2), where the expectation is taken with respect to the appropriate distributions. It turns out that an equilibrium is now achieved in the class of non-linear functions.

To find it, fix a buyer's strategy $B(b)$ and find the best response of the seller under different values of the parameter $s$. The condition $B(b) \ge S$ is equivalent to $b \ge U(S)$. The seller's payoff equals

$$H_s(B, S) = E_b\left(\frac{S + B(b)}{2} - s\right) I_{\{B(b) \ge S\}} = \int_{U(S)}^{1}\left(\frac{B(b) + S}{2} - s\right) 2b\, db. \tag{5.15}$$

Differentiate (5.15) with respect to $S$, using $B(U(S)) = S$. The best response of the seller meets the condition

$$\frac{\partial H_s}{\partial S} = -2(S - s)U(S)U'(S) + \frac{1 - U^2(S)}{2} = 0.$$
It yields the differential equation for the optimal strategies (i.e., the inverse functions) $U(B)$, $V(S)$:

$$U'(S)\,(S - V(S))\,U(S) = \frac{1 - U^2(S)}{4}. \tag{5.16}$$

By analogy, let $S(s)$ be a seller's strategy. We find the best response of the buyer under different values of the parameter $b$. Evaluate his payoff:

$$H_b(B, S) = E_s\left(b - \frac{S(s) + B}{2}\right) I_{\{B \ge S(s)\}} = E_s\left(b - \frac{S(s) + B}{2}\right) I_{\{s \le V(B)\}} = \int_{0}^{V(B)}\left(b - \frac{S(s) + B}{2}\right) 2(1 - s)\, ds. \tag{5.17}$$

By differentiating (5.17) with respect to $B$, evaluate the best response of the buyer. Namely,

$$\frac{\partial H_b}{\partial B} = 2(b - B)(1 - V(B))V'(B) - \frac{2V(B) - V^2(B)}{2} = 0,$$

which gives the second differential equation for the optimal strategies $U(B)$, $V(S)$:

$$V'(B)\,(U(B) - B)\,(1 - V(B)) = \frac{2V(B) - V^2(B)}{4}. \tag{5.18}$$

Figure 6.21 The curves of $u(x)$, $v(x)$.

Introduce the change of variables $u(x) = U^2(x)$, $v(x) = (1 - V(x))^2$ into (5.16) and (5.18) to derive the system of equations

$$u'(x)\left(x - 1 + \sqrt{v(x)}\right) = \frac{1 - u(x)}{2}, \qquad v'(x)\left(x - \sqrt{u(x)}\right) = \frac{1 - v(x)}{2}. \tag{5.19}$$

Due to (5.19), the functions $u(x)$ and $v(x)$ are related by the expressions

$$u(x) = v(1 - x), \quad v(x) = u(1 - x), \quad u'(x) = -v'(1 - x), \quad v'(x) = -u'(1 - x). \tag{5.20}$$

Rewrite the system (5.19) as

$$x - 1 + \sqrt{v(x)} = \frac{1 - u(x)}{2u'(x)}, \qquad x - \sqrt{u(x)} = \frac{1 - v(x)}{2v'(x)}.$$

By taking into account the expressions (5.20), we arrive at the following equation in $v(x)$:

$$\left(\sqrt{v(x)} + \frac{1 - v(x)}{2v'(x)}\right) + \left(\sqrt{v(1 - x)} + \frac{1 - v(1 - x)}{2v'(1 - x)}\right) = 1. \tag{5.21}$$

Suppose that the function $v(x)$ decreases, i.e., $v'(x) < 0$, $x \in [0, 1]$. The second equation in (5.19) then implies $x \le \sqrt{u(x)}$, i.e., the function $u(x)$ lies above the parabola $x^2$. By virtue of symmetry, $v(x)$ lies above the parabola $(1 - x)^2$ and $u'(x) > 0$. Figure 6.21 demonstrates the curves of the functions $u(x)$ and $v(x)$; $x_0$ and $1 - x_0$ are the points where these functions reach the value 1.

We have $v(x_0) = 1$ at the point $x_0$. Then the second equation in (5.19) requires that $u(x_0) = x_0^2$ and, subsequently, $v(1 - x_0) = x_0^2$. By setting $x = x_0$ in (5.21), we obtain

$$1 + x_0 + \frac{1 - x_0^2}{2v'(1 - x_0)} = 1.$$
Figure 6.22 The curves of $U(x)$, $V(x)$.

And it follows that

$$v'(1 - x_0) = -u'(x_0) = -\frac{1 - x_0^2}{2x_0}. \tag{5.22}$$

Figure 6.22 shows the functions $U(x)$ and $V(x)$. Finally, it is possible to present the optimal strategies $B(b)$ and $S(s)$ (see Figure 6.23).

The remaining uncertainty consists in the value of $x_0$. This is the marginal threshold for successful negotiations: the seller would not agree to a price lower than $x_0$, whereas the buyer would not suggest a price higher than $1 - x_0$.

Assume that the derivative $v'(x_0)$ is finite and non-zero. Apply L'Hospital's rule in (5.19) to get

$$v'(x_0) = \lim_{x \to x_0 + 0}\frac{1 - v(x)}{2\left(x - \sqrt{u(x)}\right)} = \frac{-v'(x_0)}{2\left(1 - \dfrac{u'(x_0)}{2\sqrt{u(x_0)}}\right)}.$$

Figure 6.23 The optimal strategies.

And so,

$$1 = -\frac{\sqrt{u(x_0)}}{2\sqrt{u(x_0)} - u'(x_0)},$$

or $u'(x_0) = 3\sqrt{u(x_0)} = 3x_0$. In combination with (5.22), we obtain that

$$\frac{1 - x_0^2}{2x_0} = 3x_0.$$

Hence, $x_0 = 1/\sqrt{7} \approx 0.3780$. Therefore, the following assertion has been established.

Theorem 6.14 Consider the transaction problem with the reservation prices distribution (5.14). The optimal strategies of the players have the form

$$S = V^{-1}(s), \qquad B = U^{-1}(b),$$

where the functions $u = U^2$, $v = (1 - V)^2$ satisfy the system of differential equations (5.19). The corresponding transaction takes place if the prices belong to the interval $[1/\sqrt{7},\, 1 - 1/\sqrt{7}] \approx [0.3780, 0.6220]$.

Figure 6.24 illustrates the domain of successful negotiations. It has a curved boundary.

Figure 6.24 The transaction domain.

Figure 6.25 The seller's strategy.

6.5.8 Transactions with fixed prices

As before, we focus on the transaction models with non-uniformly distributed reservation prices. Assume that the reservation prices of the sellers and buyers ($s$ and $b$) are independent random variables. Denote the corresponding distribution functions and density functions by $F(s)$, $f(s)$, $s \in [0, 1]$ and $G(b)$, $g(b)$, $b \in [0, 1]$.
Suppose that the seller adopts the threshold strategy (see Figure 6.25)

$$S(s) = \begin{cases} a & \text{if } s \le a, \\ s & \text{if } a \le s \le 1. \end{cases}$$

For small reservation prices, the seller quotes a fixed price $a$; only if $s$ exceeds $a$ does he announce the actual price $s$. Find the best response of the buyer under different values of the parameter $b$.

Note that the transaction occurs only if the buyer's reservation price $b$ is not less than $a$. In the case of $b \ge a$, the transaction may take place provided that $B \ge S(s)$. Evaluate the buyer's payoff for $B \ge a$:

$$H_b(B, S) = E_s\left(b - \frac{S(s) + B}{2}\right) I_{\{B \ge S(s)\}} = \int_{0}^{a}\left(b - \frac{a + B}{2}\right) f(s)\, ds + \int_{a}^{B}\left(b - \frac{s + B}{2}\right) f(s)\, ds.$$

The derivative of this function acquires the form

$$\frac{\partial H_b}{\partial B} = (b - B)f(B) - \frac{F(B)}{2}. \tag{5.23}$$

If the expression $(1 - B)f(B) - \frac{F(B)}{2}$ is non-positive within the interval $B \in [a, 1]$, then, since $b \le 1$, the derivative (5.23) takes non-positive values as well. Consequently, the maximal payoff is achieved under $B(b) = a$, $b \in [a, 1]$. We naturally arrive at

Figure 6.26 The buyer's strategy.

Lemma 6.2 Let the seller's strategy be defined by $S(s) = \max\{a, s\}$. If the condition

$$(1 - x)f(x) - \frac{F(x)}{2} \le 0$$

holds true for all $x \in [a, 1]$, then the best response of the buyer is the strategy $B(b) = \min\{a, b\}$.

Similar reasoning applies to the seller. Imagine that the buyer selects the strategy $B(b) = \min\{a, b\}$ (see Figure 6.26). In other words, he establishes a fixed price $a$ for high values of $b$ and prefers truth-telling for small ones, $b \le a$. The corresponding payoff of the seller for $S \le a$ is

$$H_s(B, S) = E_b\left(\frac{S + B(b)}{2} - s\right) I_{\{B(b) \ge S\}} = \int_{S}^{a}\left(\frac{b + S}{2} - s\right) g(b)\, db + \int_{a}^{1}\left(\frac{a + S}{2} - s\right) g(b)\, db. \tag{5.24}$$

Again, we perform differentiation:

$$\frac{\partial H_s}{\partial S} = (s - S)g(S) + \frac{1 - G(S)}{2}.$$

If, for all $x \in [0, a]$,

$$-xg(x) + \frac{1 - G(x)}{2} \ge 0,$$

then, since $s \ge 0$, the derivative of (5.24) is non-negative.

Lemma 6.3 Let the buyer's strategy be given by $B(b) = \min\{a, b\}$.
If the condition

$$xg(x) - \frac{1 - G(x)}{2} \le 0$$

holds true for all $x \in [0, a]$, then the best response of the seller is the strategy $S(s) = \max\{a, s\}$.

Figure 6.27 The transaction domain.

Lemmas 6.2 and 6.3 lead to

Theorem 6.15 Assume that the following inequalities are valid for some $a \in [0, 1]$:

$$(1 - x)f(x) - \frac{F(x)}{2} \le 0, \quad x \in [a, 1]; \qquad xg(x) - \frac{1 - G(x)}{2} \le 0, \quad x \in [0, a].$$

Then the optimal strategies in the transaction problem have the form

$$S(s) = \max\{a, s\}, \qquad B(b) = \min\{a, b\}.$$

The domain of successful negotiations is shown in Figure 6.27.

Therefore, the transaction always runs at the fixed price $a$ under the conditions of Theorem 6.15. This is the case, for example, for the distributions $F(s) = 1 - (1 - s)^n$, $G(b) = b^n$, $n \ge 3$, with $a = 1/2$. The equilibrium becomes

$$S(s) = \max\left\{\frac{1}{2}, s\right\}, \qquad B(b) = \min\left\{\frac{1}{2}, b\right\}.$$

The expected payoffs of the players constitute

$$H_b = H_s = \int_{1/2}^{1} n b^{n-1}\, db \int_{0}^{1/2}\left(\frac{1}{2} - s\right) n(1 - s)^{n-1}\, ds = \frac{(2^n - 1)\left(2^n(n - 1) + 1\right)}{2(n + 1)4^n}.$$

As $n$ grows, they converge to $1/2$ (for $n = 3$, we have $H_b = H_s \approx 0.232$).

The conditions of Theorem 6.15 fail when the reservation prices have the uniform distribution ($n = 1$) or the linear distribution ($n = 2$). To find an equilibrium in these cases, introduce two-threshold strategies.

6.5.9 Equilibrium among n-threshold strategies

First, suppose that the seller chooses a strategy with two thresholds $\sigma_1, \sigma_2$ as follows:

$$S(s) = \begin{cases} a_1 & \text{if } 0 \le s < \sigma_1, \\ a_2 & \text{if } \sigma_1 < s \le \sigma_2, \\ s & \text{if } \sigma_2 \le s \le 1. \end{cases}$$

Here $a_1 \le a_2$ and $\sigma_2 = a_2$. For small reservation prices, the seller quotes a fixed price $a_1$; for medium reservation prices, he sets a fixed price $a_2$. And finally, if $s$ exceeds $a_2$, the seller prefers truth-telling and announces the actual price $s$. Find the best response of the buyer under different values of the parameter $b$. Note an important aspect. The transaction occurs only if the buyer's reservation price $b$ is not less than $a_1$.
In the case of $b \ge a_1$, the transaction may take place provided that $B \ge S(s)$. To proceed, compute the payoff of the buyer whose reservation price equals $b$. For $a_1 \le B < a_2$, the payoff is defined by

$$H_b(B, S) = E_s\left(b - \frac{S(s) + B}{2}\right) I_{\{B \ge S(s)\}} = \int_{0}^{\sigma_1}\left(b - \frac{a_1 + B}{2}\right) f(s)\, ds. \tag{5.25}$$

If $a_2 \le B \le b$, we accordingly obtain

$$H_b(B, S) = \int_{0}^{\sigma_1}\left(b - \frac{a_1 + B}{2}\right) f(s)\, ds + \int_{\sigma_1}^{\sigma_2}\left(b - \frac{a_2 + B}{2}\right) f(s)\, ds + \int_{\sigma_2}^{B}\left(b - \frac{s + B}{2}\right) f(s)\, ds. \tag{5.26}$$

Recall that the transaction fails under $b < a_1$. Therefore, $B(b)$ may take arbitrary values there.

Assume that $a_1 \le b < a_2$. Then $H_b(B, S)$ acquires the form (5.25) (since $B \le b$) and is a decreasing function of $B$. The maximal value of (5.25) is attained at $B = a_1$, and the corresponding payoff becomes

$$H_b(a_1, S) = (b - a_1)F(\sigma_1). \tag{5.27}$$

In the case of $b \ge a_2$, the payoff $H_b(B, S)$ may also have the form (5.26). Its derivative is determined by

$$\frac{\partial H_b}{\partial B} = -\frac{1}{2}F(B) + (b - B)f(B). \tag{5.28}$$

Suppose that the inequality

$$(1 - x)f(x) - \frac{F(x)}{2} \le 0$$

holds true on the interval $[a_2, 1]$. Then the expression (5.28) is non-positive for all $B \in [a_2, 1]$. Hence, the function $H_b(B, S)$ does not increase in $B$. Its maximal value, attained at the point $B = a_2$, equals

$$H_b(a_2, S) = \left(b - \frac{a_1 + a_2}{2}\right)F(\sigma_1) + (b - a_2)\left(F(\sigma_2) - F(\sigma_1)\right). \tag{5.29}$$

For $b = a_2$, the expression (5.29) takes the value $\frac{a_2 - a_1}{2}F(\sigma_1)$, which is half the payoff (5.27) at the same point. Thus, the buyer's best response to the strategy $S(s)$ is the strategy

$$B(b) = \begin{cases} b & \text{if } 0 \le b \le a_1, \\ a_1 & \text{if } a_1 \le b < \beta_2, \\ a_2 & \text{if } \beta_2 \le b \le 1. \end{cases}$$

Here $\beta_2$ follows from the equality of (5.27) and (5.29):

$$(b - a_1)F(\sigma_1) = \left(b - \frac{a_1 + a_2}{2}\right)F(\sigma_1) + (b - a_2)\left(F(\sigma_2) - F(\sigma_1)\right).$$

Readers can easily verify that

$$\beta_2 = a_2 + \frac{(a_2 - a_1)F(\sigma_1)}{2\left(F(\sigma_2) - F(\sigma_1)\right)}. \tag{5.30}$$

Now, suppose that the buyer employs the strategy

$$B(b) = \begin{cases} b & \text{if } 0 \le b \le \beta_1, \\ a_1 & \text{if } \beta_1 \le b < \beta_2, \\ a_2 & \text{if } \beta_2 \le b \le 1, \end{cases}$$

where $\beta_1 = a_1 \le a_2 \le \beta_2$. What is the best response of the seller having the reservation price $s$?
Skipping the intermediate arguments (they are almost the same as in the case of the buyer), we provide the final result. Under the condition

$$xg(x) - \frac{1 - G(x)}{2} \le 0$$

on the interval $[0, a_1]$, the best strategy of the seller acquires the form

$$S(s) = \begin{cases} a_1 & \text{if } 0 \le s < \sigma_1, \\ a_2 & \text{if } \sigma_1 < s \le \sigma_2, \\ s & \text{if } \sigma_2 \le s \le 1, \end{cases}$$

where

$$\sigma_1 = a_1 - \frac{(a_2 - a_1)\left(1 - G(\beta_2)\right)}{2\left(G(\beta_2) - G(\beta_1)\right)}. \tag{5.31}$$

Furthermore, $\sigma_1 \le a_1 \le a_2 = \sigma_2$.

Theorem 6.16 Assume that, for some constants $a_1, a_2 \in [0, 1]$ such that $a_1 \le a_2$, the following inequalities are valid:

$$(1 - x)f(x) - \frac{F(x)}{2} \le 0, \quad x \in [a_2, 1]; \qquad xg(x) - \frac{1 - G(x)}{2} \le 0, \quad x \in [0, a_1].$$

Then the players engaged in the transaction problem have the optimal strategies

$$S(s) = \begin{cases} a_1 & \text{if } 0 \le s < \sigma_1, \\ a_2 & \text{if } \sigma_1 < s \le \sigma_2, \\ s & \text{if } \sigma_2 \le s \le 1, \end{cases} \qquad B(b) = \begin{cases} b & \text{if } 0 \le b \le \beta_1, \\ a_1 & \text{if } \beta_1 \le b < \beta_2, \\ a_2 & \text{if } \beta_2 \le b \le 1. \end{cases}$$

In these formulas, the quantities $\beta_2$ and $\sigma_1$ are given by the expressions (5.30) and (5.31), respectively. In addition, $\sigma_1 \le \beta_1 = a_1 \le a_2 = \sigma_2 \le \beta_2$.

Let us generalize this scheme to the case of n-threshold strategies. Suppose that the seller chooses a strategy with $n$ thresholds $\sigma_i$, $i = 1, \ldots, n$:

$$S(s) = \begin{cases} a_i & \text{if } \sigma_{i-1} \le s < \sigma_i, \; i = 1, \ldots, n, \\ s & \text{if } \sigma_n \le s \le 1, \end{cases}$$

where $\{a_i\}$, $i = 1, \ldots, n$ and $\{\sigma_i\}$, $i = 1, \ldots, n$ are non-decreasing sequences such that $\sigma_i \le a_i$, $i = 1, \ldots, n$. For convenience, we set $\sigma_0 = 0$.

Therefore, all sellers are divided into $n + 1$ groups depending on the values of their reservation prices. If the reservation price $s$ belongs to group $i$, i.e., $s \in [\sigma_{i-1}, \sigma_i)$, the seller announces the price $a_i$, $i = 1, \ldots, n$. If the reservation price is sufficiently high ($s \ge a_n$), the seller quotes the actual price $s$. Find the best response of the buyer under different values of the parameter $b$.

Note that the transaction occurs only if the buyer's reservation price $b$ is not less than $a_1$. In the case of $b \ge a_1$, the transaction takes place provided that $B \ge S(s)$.
Evaluate the payoff of the buyer whose reservation price equals $b$. For $a_1 \le B < a_2$, the payoff is defined by

$$H_b(B, S) = E_s\left(b - \frac{S(s) + B}{2}\right) I_{\{B \ge S(s)\}} = \int_{0}^{\sigma_1}\left(b - \frac{a_1 + B}{2}\right) f(s)\, ds = \left(b - \frac{a_1 + B}{2}\right)F(\sigma_1). \tag{5.32}$$

Next, if $a_{i-1} \le B < a_i$, we obtain

$$H_b(B, S) = \sum_{j=1}^{i-1}\int_{\sigma_{j-1}}^{\sigma_j}\left(b - \frac{a_j + B}{2}\right) f(s)\, ds = \sum_{j=1}^{i-1}\left(b - \frac{a_j + B}{2}\right)\left(F(\sigma_j) - F(\sigma_{j-1})\right), \quad i = 2, \ldots, n. \tag{5.33}$$

And finally, for $a_n \le B \le b$, the payoff is

$$H_b(B, S) = \sum_{j=1}^{n}\left(b - \frac{a_j + B}{2}\right)\left(F(\sigma_j) - F(\sigma_{j-1})\right) + \int_{\sigma_n}^{B}\left(b - \frac{s + B}{2}\right) f(s)\, ds. \tag{5.34}$$

If $b < a_1$, the transaction fails; hence, $B(b)$ may take arbitrary values. Set $\beta_1 = a_1$.

Suppose that $a_1 \le b < a_2$. Since $B \le b < a_2$, the function $H_b(B, S)$ has the form (5.32) and decreases in $B$. The maximal payoff is attained at $B = a_1$, i.e.,

$$H_b(a_1, S) = (b - a_1)F(\sigma_1).$$

This function increases in $b$; at the point $b = a_2$, its value becomes

$$(a_2 - a_1)F(\sigma_1). \tag{5.35}$$

Now, assume that $a_2 \le b < a_3$. If $B < a_2$, then $H_b$ acquires the form (5.32). Let instead $B \ge a_2$. In this case, formula (5.33) yields

$$H_b(B, S) = \left(b - \frac{a_2 + B}{2}\right)F(\sigma_2) + \frac{a_2 - a_1}{2}F(\sigma_1).$$

The above function is maximized in $B$ at the point $B = a_2$:

$$H_b(a_2, S) = (b - a_2)F(\sigma_2) + \frac{1}{2}(a_2 - a_1)F(\sigma_1).$$

At the point $b = a_2$, this payoff is half the one gained by the strategy $B = a_1$ (see (5.35)). And so, switching from the strategy $B = a_1$ to the strategy $B = a_2$ occurs at the point $b = \beta_2 \ge a_2$, where $\beta_2$ satisfies the equation

$$(b - a_1)F(\sigma_1) = (b - a_2)F(\sigma_2) + \frac{1}{2}(a_2 - a_1)F(\sigma_1).$$

It follows immediately that

$$\beta_2 = a_2 + \frac{(a_2 - a_1)F(\sigma_1)}{2\left(F(\sigma_2) - F(\sigma_1)\right)}.$$

Let $a_3$ be chosen such that $a_3 \ge \beta_2$.

Further reasoning employs induction. Suppose that the inequality $H_b(a_{k-1}, S) \le H_b(a_k, S)$ holds true for some $k$ and any $b \in (\beta_{k-1}, a_k]$; here $a_{k-1} \le \beta_{k-1}$, $k = 1, \ldots, n$.
Consider the interval $a_k \le b < a_{k+1}$. Since $B \le b < a_{k+1}$, the function $H_b(B, S)$ has the form (5.33), where $i \le k + 1$. Moreover, this is a decreasing function of $B$. The maximal value of (5.33) is attained at $B = a_{i-1}$, and the corresponding payoff constitutes

$$H_b(a_{i-1}, S) = \sum_{j=1}^{i-1}\left(b - \frac{a_j + a_{i-1}}{2}\right)\left(F(\sigma_j) - F(\sigma_{j-1})\right) = (b - a_{i-1})F(\sigma_{i-1}) + \frac{1}{2}\sum_{j=1}^{i-2}(a_{j+1} - a_j)F(\sigma_j).$$

Note that, if $i = k + 1$, i.e., $B \in [a_k, a_{k+1})$, the maximal payoff equals

$$H_b(a_k, S) = (b - a_k)F(\sigma_k) + \frac{1}{2}\sum_{j=1}^{k-1}(a_{j+1} - a_j)F(\sigma_j). \tag{5.37}$$

In the case of $i = k$, the maximal value is

$$H_b(a_{k-1}, S) = (b - a_{k-1})F(\sigma_{k-1}) + \frac{1}{2}\sum_{j=1}^{k-2}(a_{j+1} - a_j)F(\sigma_j). \tag{5.38}$$

At the point $b = a_k$, the expression (5.38) takes at least the same value as the expression (5.37). By equating (5.37) and (5.38), we find the value of $\beta_k$, which corresponds to switching from the strategy $B = a_{k-1}$ to the strategy $B = a_k$:

$$\beta_k = a_k + \frac{(a_k - a_{k-1})F(\sigma_{k-1})}{2\left(F(\sigma_k) - F(\sigma_{k-1})\right)}, \quad k = 1, \ldots, n. \tag{5.39}$$

The value $\beta_k$ lies within the interval $[a_k, a_{k+1})$ under the following condition:

$$a_k + \frac{(a_k - a_{k-1})F(\sigma_{k-1})}{2\left(F(\sigma_k) - F(\sigma_{k-1})\right)} \le a_{k+1}, \quad k = 1, \ldots, n.$$

If $b \ge a_n$, the payoff $H_b(B, S)$ can also acquire the form (5.34). The derivative of this function is given by

$$\frac{\partial H_b}{\partial B} = -\frac{1}{2}F(B) + (b - B)f(B).$$

Assume that the inequality

$$(1 - x)f(x) - \frac{F(x)}{2} \le 0$$

holds on the interval $[a_n, 1]$. Then the derivative is non-positive for all $B \in [a_n, 1]$, i.e., the function $H_b(B, S)$ does not increase in $B$. Its maximal value, attained at the point $B = a_n$, is

$$H_b(a_n, S) = (b - a_n)F(\sigma_n) + \frac{1}{2}\sum_{j=1}^{n-1}(a_{j+1} - a_j)F(\sigma_j).$$

Switching from the strategy $B = a_{n-1}$ to the strategy $B = a_n$ occurs at $b = \beta_n$, where the quantity $\beta_n$ solves the equation

$$(b - a_n)F(\sigma_n) + \frac{1}{2}\sum_{j=1}^{n-1}(a_{j+1} - a_j)F(\sigma_j) = (b - a_{n-1})F(\sigma_{n-1}) + \frac{1}{2}\sum_{j=1}^{n-2}(a_{j+1} - a_j)F(\sigma_j).$$

Simple manipulations lead to the formula

$$\beta_n = a_n + \frac{(a_n - a_{n-1})F(\sigma_{n-1})}{2\left(F(\sigma_n) - F(\sigma_{n-1})\right)}.$$
Thus, the best response of the buyer to the strategy $S(s)$ is the strategy

$$B(b) = \begin{cases} b & \text{if } 0 \le b \le \beta_1 = a_1, \\ a_i & \text{if } \beta_i \le b < \beta_{i+1}, \; i = 1, \ldots, n, \end{cases} \tag{5.40}$$

where $\beta_k$, $k = 1, \ldots, n$, is determined by (5.39) and $\beta_{n+1} = 1$.

Similar arguments serve for calculating the seller's optimal response to a threshold strategy adopted by the buyer. By supposing that the buyer chooses the strategy (5.40), one can show that the seller's optimal response is

$$S(s) = \begin{cases} a_i & \text{if } \sigma_{i-1} \le s < \sigma_i, \; i = 1, \ldots, n, \\ s & \text{if } \sigma_n \le s \le 1. \end{cases}$$

Figure 6.28 The equilibrium with n thresholds.

Here $\sigma_k$, $k = 1, \ldots, n$, obey the formulas

$$\sigma_k = a_k - \frac{(a_{k+1} - a_k)\left(1 - G(\beta_{k+1})\right)}{2\left(G(\beta_{k+1}) - G(\beta_k)\right)}, \quad k = 1, \ldots, n, \tag{5.41}$$

and, in addition, $\sigma_n = a_n$.

Thus, the following assertion holds in the general case with $n$ thresholds.

Theorem 6.17 Let a non-decreasing sequence $\{a_i\}$, $i = 1, \ldots, n$, on the interval $[0, 1]$ meet the conditions

$$xg(x) - \frac{1 - G(x)}{2} \le 0, \quad x \in [0, a_1], \qquad (1 - x)f(x) - \frac{F(x)}{2} \le 0, \quad x \in [a_n, 1],$$

and $\beta_{k-1} \le a_k \le \sigma_{k+1}$, $k = 1, \ldots, n$. Then the optimal strategies in the transaction problem have the form (see Figure 6.28)

$$S(s) = \begin{cases} a_i & \text{if } \sigma_{i-1} \le s < \sigma_i, \; i = 1, \ldots, n, \\ s & \text{if } \sigma_n \le s \le 1, \end{cases} \qquad B(b) = \begin{cases} b & \text{if } 0 \le b \le \beta_1 = a_1, \\ a_i & \text{if } \beta_i \le b < \beta_{i+1}, \; i = 1, \ldots, n. \end{cases}$$

Here the quantities $\{\beta_i\}$ and $\{\sigma_i\}$ are defined by (5.39) and (5.41), respectively.

The transaction domain is illustrated in Figure 6.29.

Figure 6.29 The transaction domain with n thresholds.

The uniform distribution. Two-threshold strategies. Assume that the reservation prices of the sellers and buyers have the uniform distribution on the market, i.e., $F(s) = s$, $s \in [0, 1]$ and $G(b) = b$, $b \in [0, 1]$. Let the players apply two-threshold strategies. The conditions of Theorem 6.16 hold true for $a_1 \le 1/3$ and $a_2 \ge 2/3$. With $\beta_1 = a_1$ and $\sigma_2 = a_2$, it follows from (5.30) and (5.31) that

$$\sigma_1 = a_1 - \frac{(a_2 - a_1)(1 - \beta_2)}{2(\beta_2 - a_1)}, \qquad \beta_2 = a_2 + \frac{(a_2 - a_1)\sigma_1}{2(a_2 - \sigma_1)}.$$
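The two coupled equations for $\sigma_1$ and $\beta_2$ can be solved numerically. The sketch below is a minimal illustration, assuming uniform reservation prices and using plain fixed-point iteration started from $\sigma_1 = a_1$; the iteration scheme, the function name `solve_two_thresholds`, and the iteration count are our assumptions, and convergence is only checked empirically for the values tried here.

```python
import math


def solve_two_thresholds(a1, a2, iters=200):
    """Fixed-point iteration for the uniform two-threshold system

        beta2  = a2 + (a2 - a1) * sigma1 / (2 * (a2 - sigma1)),
        sigma1 = a1 - (a2 - a1) * (1 - beta2) / (2 * (beta2 - a1)).

    Returns the pair (sigma1, beta2)."""
    sigma1 = a1  # starting guess (an arbitrary choice)
    beta2 = a2
    for _ in range(iters):
        beta2 = a2 + (a2 - a1) * sigma1 / (2.0 * (a2 - sigma1))
        sigma1 = a1 - (a2 - a1) * (1.0 - beta2) / (2.0 * (beta2 - a1))
    return sigma1, beta2


if __name__ == "__main__":
    sigma1, beta2 = solve_two_thresholds(1.0 / 3.0, 2.0 / 3.0)
    print(sigma1, beta2)
    # For a1 = 1/3, a2 = 2/3 the system reduces by symmetry to the
    # quadratic 18 x^2 - 21 x + 4 = 0 in x = sigma1.
    print((7.0 - math.sqrt(17.0)) / 12.0)
```

Other admissible pairs $a_1 \le 1/3$, $a_2 \ge 2/3$ can be passed in the same way, although the iteration is not guaranteed to converge for every choice.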
Under $a_1 = 1/3$ and $a_2 = 2/3$, we obtain

\[
\sigma_1 = \frac{7 - \sqrt{17}}{12} \approx 0.240, \qquad \beta_2 = 1 - \sigma_1 = \frac{5 + \sqrt{17}}{12} \approx 0.760.
\]

Note that the equilibrium exists for arbitrary values of the thresholds $a_1, a_2$ such that $a_1 \le 1/3$ and $a_2 \ge 2/3$. Therefore, it is important to establish the values $a_1, a_2$ that maximize the total payoff of the sellers and buyers. This problem was posed by R. Myerson [1984]. The appropriate set of the parameters $a_1, a_2$ is called efficient. The total payoff of the sellers and buyers selecting two-threshold strategies takes the form

\[
H_b(B, S) + H_s(B, S) = E_{b,s}\,(b - s)\,I\{B(b) \ge S(s)\} = \int_{\beta_1}^{\beta_2} db \int_0^{\sigma_1} (b - s)\,ds + \int_{\beta_2}^{1} db \int_0^{\sigma_2} (b - s)\,ds.
\]

It is possible to show that the maximal total payoff corresponds to $a_1 = 1/3$, $a_2 = 2/3$ and equals

\[
H_b(B, S) + H_s(B, S) = \frac{23 - \sqrt{17}}{144} \approx 0.1311.
\]

Recall that the class of one-threshold strategies admits no equilibrium in the case of the uniform distribution. However, a continuum of equilibria exists in the class of two-threshold strategies.

The uniform distribution. Strategies with n thresholds. Consider the uniform distribution of the reservation prices in the class of n-threshold strategies, where $n \ge 4$. For the optimal thresholds of the buyer and seller, the conditions (5.39) and (5.41) become

\[
\beta_k = a_k + \frac{(a_k - a_{k-1})\sigma_{k-1}}{2(\sigma_k - \sigma_{k-1})}, \quad k = 1, \dots, n,
\]

\[
\sigma_k = a_k - \frac{(a_{k+1} - a_k)(1 - \beta_{k+1})}{2(\beta_{k+1} - \beta_k)}, \quad k = 1, \dots, n,
\]

where we set $\sigma_0 = 0$ and $\beta_{n+1} = 1$. As $n \to \infty$, these threshold strategies converge uniformly to the continuous strategies found in Theorem 6.13. Furthermore, the total payoff (the probability of transaction) tends to $9/64$ ($9/32$, respectively).

6.5.10 Two-stage transactions with arbitrator

Consider another design of negotiations between a seller and buyer involving an arbitrator. It was pioneered by D.M. Kilgour [1994].
Again, suppose that the reservation prices of the sellers and buyers (the quantities $s$ and $b$, respectively) represent independent random variables on the interval $[0, 1]$. Their distribution functions and density functions are defined by $F(s)$, $G(b)$ and $f(s)$, $s \in [0, 1]$, $g(b)$, $b \in [0, 1]$, respectively.

The transaction comprises two stages. At stage 1, the seller and buyer inform the arbitrator of their reservation prices. Note that both players may report some prices $\bar{s}$ and $\bar{b}$ differing from their actual values $s$ and $b$. If $\bar{b} < \bar{s}$, the arbitrator announces that the transaction fails. In the case of $\bar{s} \le \bar{b}$, he announces that the transaction is possible, and the players continue to the next stage. At stage 2, the players make their offers for the transaction, $S$ and $B$. If both offers $S$, $B$ fall into the interval $[\bar{s}, \bar{b}]$, then the transaction occurs at the mean price $(S + B)/2$. If just one offer falls into the interval $[\bar{s}, \bar{b}]$, the transaction runs at the corresponding price, but with the probability of $1/2$. In all other situations, no transaction takes place.

Therefore, the strategies of the seller and the buyer are pairs $(\bar{s}(s), S(s))$ and $(\bar{b}(b), B(b))$, respectively. These are some functions of the reservation prices. Naturally, the functions $(\bar{s}(s), S(s))$ and $(\bar{b}(b), B(b))$ do not decrease, $S \ge \bar{s}$ and $B \le \bar{b}$.

For further analysis, it is convenient to modify the rules as follows. One of the players is chosen equiprobably at stage 2. For instance, assume that we have randomly selected the seller. If $S \le \bar{b}$, the transaction occurs at the price $S$. Otherwise ($S > \bar{b}$), the transaction fails. Similar rules apply to the buyer. In other words, if $B \ge \bar{s}$, the transaction runs at the price $B$.

Such modifications do not affect negotiations and represent an equivalent transformation. Indeed, if the offers $B$ and $S$ belong to the interval $[\bar{s}, \bar{b}]$, the second design of negotiations requires that the seller or buyer is chosen equiprobably.
Hence, the seller's payoff is

\[
\frac{1}{2}(S - s) + \frac{1}{2}(B - s) = \frac{S + B}{2} - s,
\]

whereas the buyer's payoff becomes

\[
\frac{1}{2}(b - B) + \frac{1}{2}(b - S) = b - \frac{S + B}{2}.
\]

Both quantities coincide with their counterparts in the original design of negotiations. And so, we proceed from the modified rules of stage 2. Below, $\bar{s}$ and $\bar{b}$ denote the prices reported to the arbitrator, as opposed to the actual reservation prices $s$ and $b$.

Theorem 6.18 Any seller's strategy $(\bar{s}, S)$ is dominated by the honest strategy $(s, S)$ and any buyer's strategy $(\bar{b}, B)$ is dominated by the honest strategy $(b, B)$.

Proof: Let us demonstrate the first part of the theorem (the second one is argued by analogy). Assume that the buyer adopts the strategy $(\bar{b}(b), B(b))$. Find the best response of the seller whose reservation price is $s$ and whose reported price is $\bar{s} \ge s$.

Either the buyer or seller is chosen equiprobably. In the former case, the transaction occurs at the price $B$ if $B(b) \ge \bar{s}$ (equivalently, $b \ge B^{-1}(\bar{s})$). In the latter case, the transaction runs at the price $S$ if $S \le \bar{b}(b)$ (equivalently, $b \ge \bar{b}^{-1}(S)$). Note that the inverse functions $B^{-1}(\bar{s})$, $\bar{b}^{-1}(S)$ exist owing to the monotonicity of the functions $B(b)$, $\bar{b}(b)$. Therefore, the expected payoff of the seller acquires the form

\[
H_s(\bar{s}, S) = \frac{1}{2}\int_{B^{-1}(\bar{s})}^{1} (B(b) - s)\,dG(b) + \frac{1}{2}\int_{\bar{b}^{-1}(S)}^{1} (S - s)\,dG(b). \tag{5.42}
\]

Evaluate the seller's best response in $\bar{s}$. The second summand in (5.42) is independent of $\bar{s}$. Differentiate (5.42) with respect to $\bar{s}$:

\[
\frac{\partial H_s}{\partial \bar{s}} = -\frac{1}{2}(\bar{s} - s)\,g\bigl(B^{-1}(\bar{s})\bigr)\,\frac{dB^{-1}(\bar{s})}{d\bar{s}}. \tag{5.43}
\]

Clearly, the expression (5.43) is non-negative for $\bar{s} < s$ and non-positive for $\bar{s} > s$. Thus, the function $H_s$ is maximized by $\bar{s} = s$ for any $S$.

We have shown that the players should select truth-telling at stage 1 (i.e., report the actual reservation prices to the arbitrator). Now, the payoff (5.42) can be rewritten as

\[
H_s(s, S) = \frac{1}{2}\int_{B^{-1}(s)}^{1} (B(b) - s)\,dG(b) + \frac{1}{2}\int_{S}^{1} (S - s)\,dG(b)
= \frac{1}{2}\int_{B^{-1}(s)}^{1} (B(b) - s)\,dG(b) + \frac{1}{2}(S - s)(1 - G(S)). \tag{5.44}
\]

Figure 6.30 The optimal strategies of the players (curves $b/2$ and $(1 + s)/2$ over the unit interval).
Find the seller's best response in $S$. The first expression in (5.44) does not depend on $S$. Consequently,

\[
\frac{\partial H_s}{\partial S} = \frac{1}{2}(1 - G(S)) - \frac{1}{2}(S - s)g(S) = 0. \tag{5.45}
\]

The optimal strategy $S$ is defined implicitly by equation (5.45). A solution to equation (5.45) exists, since its left-hand side is non-negative at $S = s$ and non-positive at $S = 1$.

Theorem 6.19 The equilibrium strategies $(S^*, B^*)$ follow from the system of equations

\[
1 - G(S) = (S - s)g(S), \qquad F(B) = (b - B)f(B). \tag{5.46}
\]

For the uniform distributions $F(s) = s$ and $G(b) = b$, the optimal strategies of the seller and buyer are

\[
S^* = \frac{s + 1}{2}, \qquad B^* = \frac{b}{2}.
\]

See Figure 6.30 for the curves of the optimal strategies. If the seller's offer is chosen, the transaction takes place if $\frac{s+1}{2} \le b$. If the buyer's offer is chosen, the transaction occurs provided that $\frac{b}{2} \ge s$. Figure 6.31 illustrates the transaction domain.

Finally, we compute the negotiators' payoffs in this equilibrium:

\[
H_s^* = H_b^* = \frac{1}{2}\int_0^1 ds \int_{\frac{s+1}{2}}^{1} \left(\frac{s + 1}{2} - s\right) db + \frac{1}{2}\int_0^1 db \int_0^{\frac{b}{2}} \left(\frac{b}{2} - s\right) ds = \frac{1}{16} \approx 0.0625.
\]

In comparison with the one-stage negotiations under the uniform distribution of the reservation prices, we observe a reduction of the transaction value.

Figure 6.31 The transaction domain.

For the square distributions $F(s) = s^2$ and $G(b) = 1 - (1 - b)^2$, the optimal strategies of the seller and buyer are

\[
S^* = \frac{2s + 1}{3}, \qquad B^* = \frac{2b}{3}.
\]

6.6 Reputation in negotiations

An important aspect of negotiations is the reputation of players. This characteristic is formed by the behavior of negotiators during decision making. Therefore, at each stage of negotiations, players should not only maximize their payoff at this stage but also think of their reputation (it predetermines their future payoffs). In the sequel, we provide some possible approaches to formalization of the concept of reputation in negotiations.
6.6.1 The notion of consensus in negotiations

Let $N = \{1, 2, \dots, n\}$ be a set of players participating in negotiations to solve some problem. Each player possesses a specific opinion on the correct solution of this problem. Denote by $x = (x_1, x_2, \dots, x_n)$ the set of opinions of all players, $x_i \in S$, $i = 1, \dots, n$, where $S \subset R^k$ indicates the admissible set in the solution space. Designate by $x(0)$ the opinions of players at the initial instant. Subsequently, players meet to discuss the problem, share their opinions and (possibly) change their opinions. In the general case, such discussion can be described by the dynamic system

\[
x(t + 1) = f_t(x(t)), \quad t = 0, 1, \dots
\]

If the above sequence admits the limit $x = \lim_{t \to \infty} x(t)$ such that all components of the vector $x$ coincide, this value is called a consensus in negotiations. However, are there such dynamic systems?

The next subsection presents the matrix model of opinion dynamics, where a consensus does exist. This model was first proposed by M.H. DeGroot [1974] and extended by many authors.

6.6.2 The matrix form of dynamics in the reputation model

Suppose that $R$ is the solution space. The key role belongs to the so-called confidence matrix $A \in [0, 1]^{n \times n}$, whose elements $a_{ij}$ specify the confidence level of player $i$ in player $j$. By assumption, the matrix $A$ is stochastic, i.e., $a_{ij} \ge 0$ and $\sum_{j=1}^{n} a_{ij} = 1$ for all $i$. In other words, the confidence of player $i$ is distributed among all players (including player $i$).

After each successive stage of negotiations, the opinion of every player coincides with the weighted opinion of all negotiators (taking into account his confidence levels):

\[
x_i(t + 1) = \sum_{j=1}^{n} a_{ij}x_j(t), \quad \forall i.
\]

This formula has the matrix representation

\[
x(t + 1) = Ax(t), \quad t = 0, 1, \dots, \qquad x(0) = x_0. \tag{6.1}
\]

Iterate (6.1) $t$ times to obtain

\[
x(t) = A^t x(0). \tag{6.2}
\]

The main issue concerns the existence of the limit matrix $A^{\infty} = \lim_{t \to \infty} A^t$.
The behavior of stochastic matrices has been intensively studied within the framework of the theory of Markov chains. By an appropriate renumbering of players, any stochastic matrix can be rewritten as

\[
A = \begin{pmatrix}
A_1 & 0 & \dots & 0 & 0 \\
0 & A_2 & \dots & 0 & 0 \\
\vdots & \vdots & \ddots & \vdots & \vdots \\
0 & 0 & \dots & A_m & 0 \\
\multicolumn{5}{c}{A_{m+1}}
\end{pmatrix}.
\]

Here the matrices $A_i$, $i = 1, \dots, m$, are stochastic and correspond to different classes of communicating states. All states from the class $A_{m+1}$ are non-essential; zeros correspond to these states in the limit matrix.

In terms of the theory of reputation, this fact means the following. A player belonging to a class $A_i$ takes into account the reputation of players from this class only. Players from the class $A_{m+1}$ have no influence during negotiations.

A consensus exists iff the Markov chain is non-periodic and there is exactly one class of communicating states ($m = 1$). Then there exists $\lim_{t \to \infty} A^t$, known as the limit matrix $A^{\infty}$. It comprises identical rows $(a_1, a_2, \dots, a_n)$. Therefore,

\[
\lim_{t \to \infty} A^t x(0) = A^{\infty} x(0) = x(\infty) = (x, x, \dots, x).
\]

The quantity $a_i$ is the influence level of player $i$.

Example 6.4 The reputation matrix takes the form

\[
A = \begin{pmatrix} 1/2 & 1/2 \\ 1/4 & 3/4 \end{pmatrix}.
\]

Player I equally trusts himself and player II. However, player II trusts himself three times as much as he trusts player I. The influence levels of the players are $a_1 = 1/3$ and $a_2 = 2/3$.

Example 6.5 The reputation matrix takes the form

\[
A = \begin{pmatrix} 1/3 & 1/3 & 1/3 \\ 1/6 & 1/3 & 1/2 \\ 1/6 & 1/2 & 1/3 \end{pmatrix}.
\]

Here player I equally trusts all players. Player II has higher confidence in player III, whereas the latter trusts player II more. The influence levels of the players are $a_1 = 1/5$, $a_2 = a_3 = 2/5$.

All rows of the matrix $A^{\infty}$ are identical. And so, all components of the limit vector $x(\infty)$ coincide, representing the consensus $x$. Interestingly,

\[
x = \sum_{i=1}^{n} a_i x_i(0),
\]

where $a_i$ and $x_i(0)$ denote the influence level and initial opinion of player $i$.
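The influence levels in Examples 6.4 and 6.5 can be reproduced numerically. The sketch below uses only the standard library and assumes the chain is aperiodic with a single communicating class, so that repeated multiplication converges. The function names and the initial opinions `x0` are hypothetical choices made for illustration.

```python
def influence_levels(A, iterations=500):
    """Approximate a row of the limit matrix by iterating w <- w A.

    Assumption: A is stochastic, aperiodic, with one communicating class.
    """
    n = len(A)
    w = [1.0 / n] * n  # any probability vector works as a starting point
    for _ in range(iterations):
        w = [sum(w[i] * A[i][j] for i in range(n)) for j in range(n)]
    return w


def consensus(A, x0, iterations=500):
    """Iterate the opinion dynamics x(t + 1) = A x(t) from x(0) = x0."""
    n = len(A)
    x = list(x0)
    for _ in range(iterations):
        x = [sum(A[i][j] * x[j] for j in range(n)) for i in range(n)]
    return x


A_64 = [[1 / 2, 1 / 2],
        [1 / 4, 3 / 4]]
A_65 = [[1 / 3, 1 / 3, 1 / 3],
        [1 / 6, 1 / 3, 1 / 2],
        [1 / 6, 1 / 2, 1 / 3]]

x0 = [0.0, 3.0]  # hypothetical initial opinions for Example 6.4
print(influence_levels(A_64))
print(influence_levels(A_65))
print(consensus(A_64, x0))
```

With these hypothetical opinions, every component of the iterated vector should approach the weighted sum of the initial opinions with the influence levels as weights.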
In the context of negotiations, we have the following interpretation. As the result of lengthy negotiations, the players arrive at the final common opinion owing to their mutual confidence.

6.6.3 Information warfare

According to the stated concept, negotiations lead to some consensus $x$. A consensus may represent, e.g., budgetary funds allocation to a certain construction project, assignment of fishing quotas or settlement of territorial problems. The resulting solution depends on the reputation of negotiators and their opinions. And so, the final solution can be affected by modifying the initial opinion of some participant. Of course, such manipulation is more efficient if the participant possesses higher reputation. However, this inevitably incurs costs. Therefore, we arrive at the following optimization problem. Allocate a given amount of financial resources $c$ among negotiators to maximize some utility function

\[
H(y_1, \dots, y_n) = F\Bigl(\sum_{j=1}^{n} a_j\bigl(x_j(0) + k_j y_j\bigr)\Bigr) - G(y_1, \dots, y_n), \qquad \sum_{j=1}^{n} y_j \le c.
\]

Here the first summand accounts for the payoff gained by variations of the initial opinion of negotiator $j$ ($k_j$ takes positive or negative values). And the second summand specifies the corresponding costs.

Imagine that several negotiators strive to affect the final solution. A game-theoretic problem known as information warfare arises immediately. This game engages $m$ players operating certain amounts $c_i$ of financial resources ($i = 1, \dots, m$). They allocate the above amounts among negotiators in order to maximize their own payoffs.
Definition 6.3 Information warfare is the game

\[
\Gamma = \langle M, \{Y_i\}_{i=1}^{m}, \{H_i\}_{i=1}^{m} \rangle,
\]

where $M = \{1, 2, \dots, m\}$ denotes the set of players, $Y_i \subset R^n$ indicates the strategy set of player $i$, representing the simplex $\sum_{j=1}^{n} y_j^i \le c_i$, $y_j^i \ge 0$, $j = 1, \dots, n$, and the payoff of player $i$ takes the form

\[
H_i(y^1, \dots, y^m) = F_i\Bigl(\sum_{j=1}^{n} a_j\Bigl(x_j(0) + \sum_{l=1}^{m} k_j^l y_j^l\Bigr)\Bigr) - G_i(y^1, \dots, y^m), \qquad \sum_{j=1}^{n} y_j^i \le c_i, \quad i = 1, \dots, m.
\]

6.6.4 The influence of reputation in arbitration committee. Conventional arbitration

The described influence approach can be adopted in negotiation models involving arbitration committees. Consider the conventional arbitration model in a salary conflict of two sides, the Labor Union (L) and the Manager (M). They address an arbitration court to resolve the conflict. Suppose that the arbitrators have some initial opinions of the appropriate salary; designate these quantities by $x_i(0)$, $i = 1, \dots, n$. In addition, the arbitrators enjoy certain reputation and corresponding influence levels $a_i$, $i = 1, \dots, n$. Their influence levels play an important role during conflict settlement.

Both sides of the conflict (players) submit their offers to the arbitration committee. The arbitrators meet to discuss the offers and make the final decision. In the course of discussions, an arbitrator may correct his opinion according to the reputation model above. After lengthy negotiations, the arbitrators reach the consensus $x = \sum_{i=1}^{n} a_i x_i(0)$, which resolves the conflict.

Assume that player M has some amount $c_M$ of financial resources to influence the original opinion of the arbitrators. This is an optimization problem defined by

\[
H_M(y_1, \dots, y_n) = \sum_{j=1}^{n} a_j\bigl(x_j(0) - k_j y_j\bigr) + \sum_{j=1}^{n} y_j \to \min
\]

subject to the constraints

\[
\sum_{j=1}^{n} y_j \le c_M, \qquad y_j \ge 0, \quad j = 1, \dots, n.
\]

Here the quantities $k_j$, $j = 1, \dots, n$, are non-negative. Interestingly, the initial opinions do not depend on $y$.
Hence, the posed problem is equivalent to the optimization problem

\[
H_M(y_1, \dots, y_n) = \sum_{j=1}^{n} (a_j k_j - 1)\,y_j \to \max
\]

subject to the constraints

\[
\sum_{j=1}^{n} y_j \le c_M, \qquad y_j \ge 0, \quad j = 1, \dots, n.
\]

Its solution is straightforward. Consider only arbitrators $j$ such that $a_j k_j > 1$ and choose the one with the maximal value of $a_j k_j$. Subsequently, invest all financial resources $c_M$ in this arbitrator.

Now, suppose that player 2 (the Trade Union) disposes of some amount $c_L$ of financial resources; it can be allocated to the arbitrators to "tilt the balance" in the Trade Union's favor. We assume that an arbitrator supports the player that has offered him a greater amount of financial resources. Such a statement leads to a two-player game with the payoff functions

\[
H_M(y^M, y^L) = \sum_{j=1}^{n} (a_j k_j - 1)\,y_j^M\, I\{y_j^M > y_j^L\}, \qquad
H_L(y^M, y^L) = \sum_{j=1}^{n} (a_j k_j - 1)\,y_j^L\, I\{y_j^L > y_j^M\}.
\]

The strategies of both players meet the constraints

\[
\sum_{j=1}^{n} y_j^M \le c_M, \qquad \sum_{j=1}^{n} y_j^L \le c_L, \qquad y_j^M, y_j^L \ge 0, \quad j = 1, \dots, n.
\]

This is a modification of the well-known Colonel Blotto game. Generally, this game is considered under $n = 2$. Both players allocate some resource between two objects; the winner on a given position is the player who has allocated the greater amount of the resource to it. In our case, readers can easily obtain the following result. If there are two arbitrators and $c_M = c_L$, in the equilibrium each player should allocate his resource to arbitrator 1 with the probability of $a_1 k_1 / (a_1 k_1 + a_2 k_2)$ and to arbitrator 2 with the probability of $a_2 k_2 / (a_1 k_1 + a_2 k_2)$.

6.6.5 The influence of reputation in arbitration committee. Final-offer arbitration

Now, we analyze the reputation model in the final-offer arbitration procedure with several arbitrators. Their opinions obey some probability distributions. Assume that arbitrators have certain reputations that are significant in conflict settlement. The sides of a conflict submit their offers to an arbitration committee.
The arbitrators meet to discuss the offers and make the final decision.

In the course of discussions, an arbitrator may correct his opinion according to the reputation model above. After lengthy negotiations, the arbitrators reach the consensus described by the common probability distribution. For instance, let the committee include $n$ arbitrators whose opinions are expressed by the distribution functions $F_1, \dots, F_n$. Then the consensus is expressed by the distribution function $F_a = a_1 F_1 + \dots + a_n F_n$, where $a_i$ means the influence level of arbitrator $i$ in the committee. This quantity depends on his reputation.

Clearly, all theorems from subsection 6.2.6 are applicable to this case, and $F_a$ acts as the distribution function of one arbitrator.

As an example, study the salary conflict involving an arbitration committee with two members. The opinions of arbitrators 1 and 2 are defined by the Gaussian distribution functions $N(1, 1)$ and $N(2, 1)$, respectively. Take the reputation matrix from Example 6.4, so that the influence levels of the arbitrators are $a_1 = 1/3$ and $a_2 = 2/3$. As the result of negotiations, the arbitrators reach a common opinion expressed by the common distribution $\frac{1}{3}N(1, 1) + \frac{2}{3}N(2, 1)$. Following Theorem 2.11, find the median of the common distribution ($m_F \approx 1.679$) and the optimal strategies of players I and II:

\[
x^* = m_F + \frac{1}{2f_a(m_F)} \approx 3.075, \qquad y^* = m_F - \frac{1}{2f_a(m_F)} \approx 0.283.
\]

6.6.6 The influence of reputation on tournament results

We explore the impact of reputation on the optimal behavior in the tournament problem. Consider the tournament model in the form of a two-player zero-sum game. Projects are characterized by two parameters. As a matter of fact, this problem has been solved in Section 6.4. Player I strives to maximize the sum $x + y$, whereas his opponent (player II) seeks to minimize it. Assume that two invited arbitrators choose the winner.
Their reputation is expressed by a certain matrix $A$. For simplicity, we focus on the symmetrical case: the opinions of the arbitrators are modeled by the two-dimensional Gaussian distributions

\[
f_1(x, y) = \frac{1}{2\pi}\exp\{-((x + c)^2 + (y - c)^2)/2\} \quad \text{and} \quad f_2(x, y) = \frac{1}{2\pi}\exp\{-((x - c)^2 + (y + c)^2)/2\},
\]

respectively. Here $c$ stands for the model's parameter.

Recall that lengthy negotiations of the arbitrators yield the final distribution

\[
f_a(x, y) = a_1 f_1(x, y) + a_2 f_2(x, y),
\]

where $a_1$ and $a_2$ specify the influence levels of the arbitrators, $a_2 = 1 - a_1$.

Repeat the line of reasoning used in Section 6.4. The players submit their offers $(x_1, y_1)$ and $(x_2, y_2)$. The solution plane is divided into two sets $S_1$ and $S_2$. Their boundary is the line passing through the midpoint of the segment which connects the points $(x_1, y_1)$ and $(x_2, y_2)$. This line has the following equation:

\[
y = -\frac{x_1 - x_2}{y_1 - y_2}\,x + \frac{x_1^2 - x_2^2 + y_1^2 - y_2^2}{2(y_1 - y_2)}.
\]

Thus, the payoff of player I takes the form

\[
H(x_1, y_1; x_2, y_2) = (x_1 + y_1)\mu(S_1) = (x_1 + y_1)\int_R\int_R f_a(x, y)\, I\Bigl\{ y \ge -\frac{x_1 - x_2}{y_1 - y_2}\,x + \frac{x_1^2 - x_2^2 + y_1^2 - y_2^2}{2(y_1 - y_2)} \Bigr\}\,dx\,dy.
\]

Fix the strategy $(x_2, y_2)$ of player II; find the best response of his opponent from the conditions $\frac{\partial H}{\partial x_1} = 0$, $\frac{\partial H}{\partial y_1} = 0$. First, we obtain the derivatives:

\[
\frac{\partial H}{\partial x_1} = \mu(S_1) + (x_1 + y_1)\frac{\partial \mu(S_1)}{\partial x_1}
= \mu(S_1) + (x_1 + y_1)\int_R \frac{x - x_1}{y_1 - y_2}\, f_a\Bigl(x, -\frac{x_1 - x_2}{y_1 - y_2}\,x + \frac{x_1^2 - x_2^2 + y_1^2 - y_2^2}{2(y_1 - y_2)}\Bigr)\,dx, \tag{6.3}
\]

\[
\frac{\partial H}{\partial y_1} = \mu(S_1) + (x_1 + y_1)\frac{\partial \mu(S_1)}{\partial y_1}
= \mu(S_1) + (x_1 + y_1)\int_R \Bigl(-\frac{x_1 - x_2}{(y_1 - y_2)^2}\,x + \frac{x_1^2 - x_2^2}{2(y_1 - y_2)^2} - \frac{1}{2}\Bigr) f_a\Bigl(x, -\frac{x_1 - x_2}{y_1 - y_2}\,x + \frac{x_1^2 - x_2^2 + y_1^2 - y_2^2}{2(y_1 - y_2)}\Bigr)\,dx. \tag{6.4}
\]

Set the functions (6.3) and (6.4) equal to zero. Require that the solution is achieved at the point $x_1 = -y_2$, $y_1 = -x_2$. This follows from the problem's symmetry with respect to the line $y = -x$. Note that, in this case, $\mu(S_1) = 1/2$.
Consequently, we arrive at the system of equations

\[
\frac{1}{2} + \int_R (x + y_2) f_a(x, -x)\,dx = 0, \qquad \frac{1}{2} + \int_R (x_2 - x) f_a(x, -x)\,dx = 0.
\]

The first equation

\[
\int_{-\infty}^{\infty} (y_2 + x)\Bigl(a_1 e^{-(x + c)^2} + a_2 e^{-(x - c)^2}\Bigr)\,dx = -\pi
\]

gives the optimal value of $y_2$:

\[
y_2 = -\sqrt{\pi} + c(a_1 - a_2).
\]

Similarly, the second equation yields

\[
x_2 = -\sqrt{\pi} + c(a_2 - a_1).
\]

And the optimal offer of player I is

\[
x_1 = \sqrt{\pi} + c(a_2 - a_1), \qquad y_1 = \sqrt{\pi} + c(a_1 - a_2).
\]

Therefore, the optimal strategies of players in this game depend on the reputation of arbitrators. If the latter enjoy identical reputation, the equilibrium coincides with the above-mentioned one (both components of the offer are the same). If the reputation of the arbitrators differs, the components of the offer are shifted toward the arbitrator having a higher weight.

Exercises

1. Cake cutting. Suppose that three players cut a cake using the scheme of random offers with the Dirichlet distribution, where $k_1 = 2$, $k_2 = 1$, and $k_3 = 1$. Negotiations have the maximal duration of $K = 3$ shots. Find the optimal strategies of the players when the final decision is made by (a) the majority of votes, (b) the consent of all players.

2. Meeting schedule for two players. Imagine that two players negotiate the day of their meeting during a week. Player I prefers a day closer to Tuesday, whereas player II would like to meet closer to Thursday. The strategy set takes the form $X = Y = \{1, 2, \dots, 7\}$. To choose the day of their meeting, the players adopt the random offer scheme as follows. The probabilistic mechanism equiprobably generates offers from 1 to 7. The players agree or disagree. The number of shots is $k = 5$. Find the optimal strategies of the players.

3. Meeting schedule for two players: the case of an arbitration procedure. Two players negotiate the day of their meeting during a week $(1, 2, \dots, 7)$. Player I prefers the beginning of the week, whereas player II would like to meet at the end of the week.
To choose the day of their meeting, the players make offers; subsequently, a random arbitrator generates some number $a$ (3 or 4 equiprobably) and compares the offers with this number. As the day of meeting, he chooses the offer closest to $a$. Find the optimal strategies of the players.

4. Meeting schedule for three or more players. Players $1, 2, \dots, n$ negotiate the date of a conference. For simplicity, we consider the months of a year: $T = \{1, 2, \dots, 12\}$. The players make their offers $a_1, \dots, a_n \in T$. For player $i$, the inconvenience of other months is assessed by $f_i(t) = \min\{|t - a_i|, |t + 12 - a_i|\}$, $t \in T$, $i = 1, \dots, n$. Using the random offer scheme, find the optimal behavioral strategies of the players on the horizon $K = 5$. Study the case of three players and the majority rule. Analyze the case of four players and compare the payoffs under the thresholds of 2 and 3.

5. Equilibrium in the transaction model. Consider the following transaction model. The reservation prices of the sellers have the uniform distribution on the market, whereas the reservation prices of the buyers possess the linear density $g(b) = 2b$, $b \in [0, 1]$. Evaluate a Bayesian equilibrium.

6. Networked auction. The seller exhibits some product in an information network. The buyers do not know the quoted price $c$ and bid for the product, gradually raising their offers. They act simultaneously, increasing their offers either by the same quantity or by a specific quantity $\alpha > 1$ (i.e., the players may have different values of $\alpha$). The player who first announces a price higher than or equal to $c$ receives the product. If there are several such players, the seller prefers the highest offer. Each player wants to purchase the product. Find the optimal strategies of the players.

7. Traffic quotas. Users $1, \dots, n$ order from a provider the monthly traffic sizes of $x_1, \dots, x_n$. Assume that $\sum_{i=1}^{n} x_i$ exceeds the channel's capacity $c$. Evaluate the quotas using the random offer scheme.

8. The 2D arbitration procedure. Consider the tournament model in the form of a two-player zero-sum game in the 2D space. Player I strives to maximize the sum $x + y$, whereas player II seeks to minimize it. The arbitrator is defined by the density function of the Cauchy distribution:

\[
f(x, y) = \frac{1}{\pi^2 (1 + x^2)(1 + y^2)}.
\]

Find the optimal strategies of the players.

9. Tournament of construction projects. Two companies seek to receive an order for house construction. Their offers represent a couple of numbers $(t, c)$, where $t$ indicates the period of construction and $c$ specifies the costs of construction. The customer is interested in the reduction of the construction costs and period, whereas the companies have opposite goals. A random arbitrator is distributed in a unit circle with the density function $f(r, \theta) = \frac{3(1 - r)}{\pi}$ (in polar coordinates). Evaluate the optimal offers of the players under the following condition. Player I (player II) strives to maximize the construction period (the construction costs, respectively).

10. Reputation models. Suppose that the reputation matrix in three-player negotiations has the form

\[
A = \begin{pmatrix} 1/4 & 1/4 & 1/2 \\ 2/3 & 1/6 & 1/6 \\ 1/6 & 1/6 & 2/3 \end{pmatrix}.
\]

Find the influence levels of the players.

11. The salary problem. Consider the salary problem with final-offer arbitration. The arbitration committee consists of three arbitrators; the reputation matrix is given in exercise 10. The opinions of the arbitrators obey the Gaussian distributions $N(100, 10)$, $N(120, 10)$, and $N(130, 10)$. Evaluate the optimal strategies of the players.

12. The filibuster problem. A group of 10 filibusters captures a ship with 100 bars of gold. They have to divide the plunder. The allocation procedure is as follows. First, the captain suggests a possible allocation, and the players vote. If at least half of the crew supports his offer, the allocation takes place. Otherwise, the filibusters kill the captain, and the second player makes his offer.
The procedure repeats until the players reach a consensus. Find the subgame-perfect equilibrium in this game.

7 Optimal stopping games

Introduction

In this chapter, we consider games with the following specifics. As their strategies, players choose stopping times for some observation processes. This class of problems is close to the negotiation models discussed in Chapter 6. Players independently observe the values of some random process; at any time moment, they can terminate such observations, saving the current value observed. Subsequently, the values of players are compared and their payoffs are calculated. Similar problems arise in choice models (the best object in a group of objects), behavioral models of agents on a stock exchange, dynamic games, etc.

Our analysis begins with a two-player game, where players I and II sequentially observe the values of some independent random processes $x_n$ and $y_n$ ($n = 0, 1, \dots, N$). At any moment, players can stop (the time moments $\tau$ and $\sigma$, respectively) with the current values $x_\tau$ and $y_\sigma$. Then these values are compared, and the winner is the player who has selected the greatest value. Therefore, the payoff function in the above non-cooperative game acquires the form

\[
H(\tau, \sigma) = P\{x_\tau > y_\sigma\} - P\{x_\tau < y_\sigma\}.
\]

Since observations represent random variables, the same applies to the strategies of the players. They take random integer values from the set $\{0, 1, \dots, N\}$. The decision on stopping at time $n$ must be made only using the observed values $x_1, \dots, x_n$, $n = 0, 1, \dots, N$. In other words, the events $\{\tau \le n\}$ must be measurable with respect to a non-decreasing sequence of $\sigma$-algebras $\mathcal{F}_n = \sigma\{x_1, \dots, x_n\}$, $n = 0, 1, \dots, N$.

To solve this class of games, we describe the general scheme of equilibrium construction via the backward induction method. Imagine that the strategy of a player (e.g., player II) is fixed. Then the maximization problem $\sup_\tau H(\tau, \sigma)$ reduces to the optimal stopping problem for one player with a certain payoff function under stopping: $\sup_\tau E f(x_\tau)$, where

\[
f(x) = P\{y_\sigma < x\} - P\{y_\sigma > x\}.
\]

In order to find $\sup_\tau E f(x_\tau)$, we employ the backward induction method. Denote by $v(x, n) = \sup_\tau E\{f(x_\tau) \mid x_n = x\}$ the optimal expected payoff of a player in the state when the first $n - 1$ observations have been skipped and the current observation at shot $n$ equals $x$. In this state, the player obtains the payoff $f(x_n) = f(x)$ by terminating the observations. If he continues observations and follows the optimal strategy further, his expected payoff is $E\{v(x_{n+1}, n + 1) \mid x_n = x\}$. By comparing these quantities, one can suggest the optimal strategy. And the optimality equation takes the recurrent form

\[
v(x, n) = \max\{f(x),\, E\{v(x_{n+1}, n + 1) \mid x_n = x\}\}, \quad n = 1, \dots, N - 1, \qquad v(x, N) = f(x).
\]

For instance, if $\{x_1, x_2, \dots, x_N\}$ is a sequence of independent identically distributed random variables with some distribution function $G(x)$, the expected payoff under continuation is

\[
E v(x_{n+1}, n + 1) = \int_R v(x, n + 1)\,dG(x).
\]

If this is a Markov process on a finite set $E = \{1, 2, \dots, k\}$ with a certain transition matrix $p_{ij}$ ($i, j = 1, \dots, k$), the expected payoff under continuation becomes

\[
E\{v(x_{n+1}, n + 1) \mid x_n = x\} = \sum_{y=1}^{k} v(y, n + 1)\,p_{xy}.
\]

First, we establish an equilibrium for a simple game using standard techniques from game theory. Second, we construct a solution in a wider class of games by the backward induction method.

7.1 Optimal stopping game: The case of two observations

This game consists of two shots. At shot 1, players I and II are offered the values of some random variables, $x_1$ and $y_1$, respectively. They can select or reject these values. In the latter case, both players are offered new values, $x_2$ to player I and $y_2$ to player II.
Then the game ends, and the players show their values to each other. The winner is the player with the highest value. A player possesses no information on the opponent's behavior. All random variables are independent. For convenience, we assume they have the uniform distribution on the unit interval $[0, 1]$.

It is convenient to define the strategies of the players by thresholds $u$ and $v$ ($0 \le u, v \le 1$) such that, if $x_1 \ge u$ ($y_1 \ge v$), player I (player II) stops on $x_1$ (on $y_1$, respectively). Otherwise, player I (player II) chooses the second value $x_2$ ($y_2$, respectively). Therefore, for given strategies the selected observations have the form

\[
x_\tau = \begin{cases} x_1, & \text{if } x_1 \ge u, \\ x_2, & \text{if } x_1 < u, \end{cases}
\qquad
y_\sigma = \begin{cases} y_1, & \text{if } y_1 \ge v, \\ y_2, & \text{if } y_1 < v. \end{cases}
\]

To find the payoff function in this game, we need the distribution functions of the random variables $x_\tau$ and $y_\sigma$. For $x < u$, the event $\{x_\tau \le x\}$ occurs only if the first observation $x_1$ is smaller than $u$, whereas the second observation turns out to be at most $x$. For $x \ge u$, the event $\{x_\tau \le x\}$ happens either if the first observation $x_1$ falls into the interval $[u, x]$, or if the first observation is less than $u$, whereas the second observation is smaller than or equal to $x$. Therefore,

\[
F_1(x) = P\{x_\tau \le x\} = ux + I\{x \ge u\}(x - u).
\]

Similarly,

\[
F_2(y) = P\{y_\sigma \le y\} = vy + I\{y \ge v\}(y - v).
\]

For chosen strategies $(u, v)$, the payoff function can be re-expressed as

\[
H(u, v) = P\{x_\tau > y_\sigma\} - P\{x_\tau < y_\sigma\} = \int_0^1 \bigl[P\{y_\sigma < x\} - P\{y_\sigma > x\}\bigr]\,dF_1(x) = \int_0^1 \bigl[2F_2(x) - 1\bigr]\,dF_1(x). \tag{1.1}
\]

By making successive simplifications in (1.1) for $v \le u$, i.e.,

\[
H(u, v) = 2\int_0^1 F_2(x)\,dF_1(x) - 1
= 2\left[\int_0^v u\,dx \int_0^x v\,dy + \int_v^u u\,dx\left(\int_0^v v\,dy + \int_v^x (1 + v)\,dy\right) + \int_u^1 (1 + u)\,dx\left(\int_0^v v\,dy + \int_v^x (1 + v)\,dy\right)\right] - 1,
\]

we arrive at the formula

\[
H(u, v) = (u - v)(1 - u - uv). \tag{1.2}
\]

In the case of $v > u$, the problem symmetry implies that

\[
H(u, v) = -H(v, u) = (u - v)(1 - v - uv). \tag{1.3}
\]
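The closed-form payoff can be cross-checked against a direct numerical evaluation of the integral of $2F_2(x) - 1$ with respect to $F_1$. The sketch below assumes the uniform case described above and uses a plain midpoint rule; the function names and the grid size are illustrative choices, not part of the text.

```python
def F2(x, v):
    """Distribution function of y_sigma for threshold v (uniform observations)."""
    return v * x + (x - v if x >= v else 0.0)


def f1(x, u):
    """Density of x_tau for threshold u: u below the threshold, 1 + u above it."""
    return u + (1.0 if x >= u else 0.0)


def payoff_numeric(u, v, n=200_000):
    """Midpoint-rule approximation of the integral of (2 F2(x) - 1) f1(x) on [0, 1]."""
    h = 1.0 / n
    total = 0.0
    for k in range(n):
        x = (k + 0.5) * h
        total += (2.0 * F2(x, v) - 1.0) * f1(x, u)
    return total * h


def payoff_closed_form(u, v):
    """Piecewise closed form of H(u, v): one branch for v <= u, one for v > u."""
    if v <= u:
        return (u - v) * (1.0 - u - u * v)
    return (u - v) * (1.0 - v - u * v)


for u, v in [(0.6, 0.5), (0.3, 0.7)]:
    print(u, v, payoff_numeric(u, v), payoff_closed_form(u, v))
```

The two printed columns should agree up to the discretization error of the midpoint rule, which is small for the grid size assumed here.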
Imagine that the strategy $v$ of player II is fixed. Then the best response of player I, which maximizes the function (1.2), meets the condition

$$\frac{\partial H(u, v)}{\partial u} = 1 - u - uv - (v + 1)(u - v) = 0.$$

Hence, it follows that

$$u(v) = \frac{v^2 + v + 1}{2(v + 1)}.$$

By the symmetry of the problem, the optimal strategies of the players must coincide. By setting $u(v) = v$, we obtain the equation

$$\frac{v^2 + v + 1}{2(v + 1)} = v,$$

which is equivalent to

$$v^2 + v - 1 = 0.$$

Its solution is the well-known "golden section"

$$u^* = v^* = \frac{\sqrt{5} - 1}{2}.$$

By analogy, we evaluate the best response of player I from the expression (1.3) under $v > u$. This yields the equation

$$\frac{\partial H(u, v)}{\partial u} = 1 - v - uv - v(u - v) = 0,$$

whence

$$u(v) = \frac{v^2 - v + 1}{2v}.$$

By setting $u = v$, we again arrive at the golden section $u^* = v^* = \frac{\sqrt{5} - 1}{2}$.

Therefore, if some player adopts the golden section strategy, the opponent's best response is the same strategy. In other words, the strategy profile $(u^*, v^*)$ forms a Nash equilibrium in this game.

The stated approach is applicable to games with an arbitrary number of observations; however, it becomes extremely cumbersome. There exists an alternative solution method for optimal stopping problems, and we address it below. It utilizes backward induction adapted to the class of problems under consideration. This method makes it possible to solve the optimal stopping problem for an arbitrary number of observations, as well as many other optimal stopping games.

7.2 Optimal stopping game: The case of independent observations

Let $\{x_n, n = 1, \ldots, N\}$ and $\{y_n, n = 1, \ldots, N\}$ be two sets of independent identically distributed random variables with a continuous distribution function $G(x)$, $x \in R$, and the density function $g(x)$, $x \in R$. Consider the following game $\Gamma_N(G)$.
At each time moment $n = 1, \ldots, N$, players I and II receive some value of the corresponding random variable. They can either stop on the current values $x_n$ and $y_n$, respectively, or continue observations. At the last shot $N$, the observations are terminated (if the players have not yet made their choice); a player receives the value of the last random variable. The strategies in this game are the stopping times $\tau$ and $\sigma$, which are random variables with integer values from the set $\{1, 2, \ldots, N\}$. Each player seeks to stop observations with a higher value than the opponent.

Find optimal stopping rules in the class of threshold strategies $u = (u_1, \ldots, u_{N-1})$ and $v = (v_1, \ldots, v_{N-1})$ of the form

$$\tau(u) = \begin{cases} 1, & \text{if } x_1 \ge u_1, \\ n, & \text{if } x_1 < u_1, \ldots, x_{n-1} < u_{n-1},\; x_n \ge u_n, \\ N, & \text{if } x_1 < u_1, \ldots, x_{N-1} < u_{N-1}, \end{cases}$$

and

$$\sigma(v) = \begin{cases} 1, & \text{if } y_1 \ge v_1, \\ n, & \text{if } y_1 < v_1, \ldots, y_{n-1} < v_{n-1},\; y_n \ge v_n, \\ N, & \text{if } y_1 < v_1, \ldots, y_{N-1} < v_{N-1}. \end{cases}$$

Here we assume that

$$u_{N-1} < u_{N-2} < \ldots < u_1, \quad \text{and} \quad v_{N-1} < v_{N-2} < \ldots < v_1. \tag{2.1}$$

Similarly to the previous section, for the chosen class of strategies $(u, v)$ the payoff function can be rewritten as

$$H(u, v) = P\{x_\tau > y_\sigma\} - P\{x_\tau < y_\sigma\} = \int_0^1 \left[P\{y_\sigma < x\} - P\{y_\sigma > x\}\right] dF_1(x) = E\{2F_2(x_{\tau(u)}) - 1\}, \tag{2.2}$$

where $F_1(x)$ and $F_2(y)$ are the distribution functions of the random variables $x_{\tau(u)}$ and $y_{\sigma(v)}$, respectively.

Lemma 7.1 For strategies $\tau(u)$ and $\sigma(v)$, the distribution functions $F_1(x)$ and $F_2(y)$ have the density functions

$$f_1(u, x) = \left[\prod_{i=1}^{N-1} G(u_i) + \sum_{i=1}^{N-1} \prod_{j=1}^{i-1} G(u_j)\, I\{x \ge u_i\}\right] g(x), \tag{2.3}$$

and

$$f_2(v, y) = \left[\prod_{i=1}^{N-1} G(v_i) + \sum_{i=1}^{N-1} \prod_{j=1}^{i-1} G(v_j)\, I\{y \ge v_i\}\right] g(y). \tag{2.4}$$

Proof: We employ induction on $N$ and demonstrate (2.3); equality (2.4) is established by analogy. The base case: for $N = 1$, we have $f_1(x) = g(x)$. The inductive step: assume that equality (2.3) holds true for some $N = n$, and show its validity for $N = n + 1$.
By the definition of the threshold strategy $u = (u_1, \ldots, u_n)$, one obtains

$$f_1(u_1, \ldots, u_n, x) = G(u_1) f_1(u_2, \ldots, u_n, x) + I\{x \ge u_1\}\, g(x).$$

In combination with the inductive hypothesis, this yields

$$f_1(u_1, \ldots, u_n, x) = G(u_1)\left[\prod_{i=2}^{n} G(u_i) + \sum_{i=2}^{n} \prod_{j=2}^{i-1} G(u_j)\, I\{x \ge u_i\}\right] g(x) + I\{x \ge u_1\}\, g(x) = \left[\prod_{i=1}^{n} G(u_i) + \sum_{i=1}^{n} \prod_{j=1}^{i-1} G(u_j)\, I\{x \ge u_i\}\right] g(x).$$

The proof of Lemma 7.1 is concluded.

Fix the strategy $\sigma(v)$ of player II and find the best response of player I. In fact, player I has to maximize the expression $E\{2F_2(x_{\tau(u)}) - 1\}$ with respect to $u$. For simpler exposition, suppose that all observations have the uniform distribution on the interval $[0, 1]$. This causes no loss of generality; indeed, it is always possible to pass to the observations $G(x_n)$, $G(y_n)$, $n = 1, \ldots, N$, which have the uniform distribution. To evaluate $\sup_u E\{2F_2(x_{\tau(u)}) - 1\}$, we apply the backward induction method (see the beginning of Chapter 7). Write down the optimality equation

$$v(x, n) = \max\left\{2\int_0^x f_2(v, t)\, dt - 1,\; E v(x_{n+1}, n+1)\right\}, \quad n = 1, \ldots, N-1.$$

Illustrate its application in the case of $N = 2$, when player I observes the random variables $x_1, x_2$, whereas player II observes the random variables $y_1, y_2$. Lemma 7.1 claims that the density function $f_2(v, y)$ has the form

$$f_2(v, y) = \begin{cases} v, & \text{if } 0 \le y < v, \\ 1 + v, & \text{if } v \le y \le 1. \end{cases}$$

Then the payoff function $f(x) = 2\int_0^x f_2(v, t)\, dt - 1$ under stopping in the state $x$ can be expressed by

$$f(x) = \begin{cases} 2vx - 1, & \text{if } 0 \le x < v, \\ 1 - 2(1 - x)(1 + v), & \text{if } v \le x \le 1. \end{cases} \tag{2.5}$$

Imagine that player I has received the observation $x$. If he decides to continue to the next shot, his payoff is

$$E v(x_2, 2) = \int_0^1 f(t)\, dt = \int_0^v (2vt - 1)\, dt + \int_v^1 \left(1 - 2(1 - t)(1 + v)\right) dt = v^2 - v. \tag{2.6}$$

According to the optimality equation, player I stops at shot 1 (having received the observation $x$) if $f(x) \ge E v(x_2, 2)$, and passes to the next observation if $f(x) < E v(x_2, 2)$.
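The comparison between the payoff under stopping (2.5) and the payoff under continuation (2.6) can also be carried out numerically. The sketch below is a Python illustration under the uniform-distribution assumption of this section; the function names and the bisection search are hypothetical helpers introduced here, not part of the text. It locates the point where $f(x)$ first reaches $E v(x_2, 2)$ for a given threshold $v$ of player II.

```python
import math


def stop_payoff(x, v):
    """Payoff f(x) from (2.5) for stopping on the observation x against threshold v."""
    if x < v:
        return 2.0 * v * x - 1.0
    return 1.0 - 2.0 * (1.0 - x) * (1.0 + v)


def continuation_payoff(v):
    """Expected payoff (2.6) of rejecting the first observation."""
    return v * v - v


def best_response_threshold(v, tol=1e-12):
    """Bisection for the smallest x in [0, 1] with f(x) >= E v(x_2, 2).

    Relies on f being increasing with f(0) = -1 and f(1) = 1.
    """
    lo, hi = 0.0, 1.0
    target = continuation_payoff(v)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if stop_payoff(mid, v) >= target:
            hi = mid
        else:
            lo = mid
    return hi


if __name__ == "__main__":
    golden = (math.sqrt(5.0) - 1.0) / 2.0
    # Against the golden section threshold, the best response is the same threshold.
    print(golden, best_response_threshold(golden))
```

Running the best-response map at the golden section returns (up to the bisection tolerance) the same value, which is the fixed-point property behind the equilibrium.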
The function $f(x)$ defined by (2.5) increases monotonically, while the function $E v(x_2, 2)$ of the form (2.6) does not depend on $x$ (see Figure 7.1). Hence, there exists a unique intersection point of these functions; denote it by $u' \in [0, 1]$. In the case of $u' \le v$, the quantity $u'$ meets the condition

$$2vu' - 1 = v^2 - v. \tag{2.7}$$

Figure 7.1 The payoff function $f(x)$.

If $u' > v$, it satisfies

$$1 - 2(1 - u')(1 + v) = v^2 - v. \tag{2.8}$$

Thus, if player II adopts the strategy with the threshold $v$, the optimal strategy of player I is determined by the stopping set $S = [u', 1]$ and the continuation set $C = [0, u')$.

Choose $v$ such that it coincides with $u'$. Equations (2.7) and (2.8) lead to the same equation

$$v^2 + v - 1 = 0,$$

whose solution is the golden section $v^* = \frac{\sqrt{5} - 1}{2}$.

Therefore, if player II adheres to the threshold strategy $v^*$, the best response of player I is the threshold strategy with the same threshold $u^* = v^*$. The converse statement also holds true. This means optimality of the threshold strategies based on the golden section.

We have derived the same solution as in the previous section. However, the suggested scheme for evaluating the optimal stopping time is more efficient in the given class of problems. Furthermore, there is no a priori need to conjecture that the optimal strategies belong to the class of threshold ones; this follows directly from the optimality equation. Section 7.3 shows the applicability of this scheme in stopping games with arbitrary numbers of observations.

7.3 The game $\Gamma_N(G)$ under $N \ge 3$

Consider the general case of the game $\Gamma_N(G)$, where the players receive independent uniformly distributed random variables $\{x_n, n = 1, \ldots, N\}$ and $\{y_n, n = 1, \ldots, N\}$, and $N \ge 3$. Suppose that player II uses a threshold strategy $\sigma(v)$ with thresholds meeting the condition

$$v_{N-1} < v_{N-2} < \ldots < v_1. \tag{3.1}$$

Due to Lemma 7.1, the density function $f_2(v, y)$ has the form

$$f_2(v, y) = \sum_{i=k}^{N} \prod_{j=0}^{i-1} v_j, \quad \text{if } v_k \le y \le v_{k-1},$$

where $k = 1, \ldots, N$ and $v_0 = 1$, $v_N = 0$ for convenience. Therefore, the function $f_2(v, y)$ jumps by the quantity $\prod_{j=0}^{k-1} v_j$ at the point $y = v_k$.

To construct the best response of player I to this strategy, we involve the optimality equation

$$v(x, n) = \max\left\{2\int_0^x f_2(v, t)\, dt - 1,\; E v(x_{n+1}, n+1)\right\}, \quad n = 1, \ldots, N-1, \tag{3.2}$$

with the boundary condition

$$v(x, N) = 2\int_0^x f_2(v, t)\, dt - 1.$$

The payoff function $f(x) = 2\int_0^x f_2(v, t)\, dt - 1$ under stopping in the state $x$ can be rewritten as

$$f(x) = f(v_k) + 2(x - v_k) \sum_{i=k}^{N} \prod_{j=0}^{i-1} v_j, \quad v_k \le x \le v_{k-1}, \tag{3.3}$$

or

$$f(x) = f(v_{k-1}) + 2(x - v_{k-1}) \sum_{i=k}^{N} \prod_{j=0}^{i-1} v_j, \quad v_k \le x \le v_{k-1}.$$

The curve of $y = f(x)$ is a broken line ascending from the point $(0, -1)$ to the point $(1, 1)$ (see Figure 7.2).

For any $n$, the maximands in equation (3.2) are the monotonically increasing function $f(x)$ and the function $E v(x_{n+1}, n+1)$, which does not depend on $x$. This feature makes it possible to simplify the optimality equation:

$$v(x, n) = \begin{cases} E v(x_{n+1}, n+1), & 0 \le x \le u_n, \\ f(x), & u_n \le x \le 1, \end{cases}$$

Figure 7.2 The payoff function $f(x)$.

where $u_n$ designates the intersection point of the functions $y = f(x)$ and $y = E v(x_{n+1}, n+1)$. Therefore, if at shot $n$ the observation exceeds $u_n$, the player should stop observations (and continue them otherwise). Clearly,

$$u_{N-1} < \ldots < u_2 < u_1.$$

In an equilibrium, the problem symmetry requires validity of the equalities $u_n = v_n$, $n = 1, \ldots, N-1$. This leads to the system of conditions

$$f(u_n) = E v(x_{n+1}, n+1), \quad n = 1, \ldots, N-1. \tag{3.4}$$

Under $n = N - 1$, formula (3.4) implies that

$$f(u_{N-1}) = E v(x_N, N) = \int_0^1 f(t)\, dt. \tag{3.5}$$

According to (3.3), we obtain

$$\sum_{i=1}^{N-1} \prod_{j=1}^{i} u_j\, (1 - u_i) + 2 u_{N-1} \prod_{i=1}^{N-1} u_i = 1. \tag{3.6}$$

Next, note that

$$f(u_{N-2}) = E v(x_{N-1}, N-1) = \int_0^{u_{N-1}} v(t, N-1)\, dt + \int_{u_{N-1}}^1 f(t)\, dt = \int_0^{u_{N-1}} f(u_{N-1})\, dt + \int_{u_{N-1}}^1 f(t)\, dt = \int_0^1 f(t)\, dt + \frac{u_{N-1}}{2}\left[1 + f(u_{N-1})\right].$$
Hence, by virtue of (3.5) and the notation $u_N = 0$, $f(u_N) = -1$, we have

$$f(u_{N-2}) - f(u_{N-1}) = \frac{u_{N-1} + u_N}{2}\left(f(u_{N-1}) - f(u_N)\right).$$

Readers can easily demonstrate the following equality by induction:

$$f(u_{n-1}) - f(u_n) = \frac{u_n + u_{n+1}}{2}\left(f(u_n) - f(u_{n+1})\right), \quad n = 2, \ldots, N-1. \tag{3.7}$$

It follows from (3.3) that

$$f(u_n) - f(u_{n+1}) = 2\sum_{k=n+1}^{N} \prod_{j=1}^{k-1} u_j\, (u_n - u_{n+1}).$$

In combination with (3.7), this yields

$$2\sum_{k=n}^{N} \prod_{j=1}^{k-1} u_j\, (u_{n-1} - u_n) = 2\sum_{k=n+1}^{N} \prod_{j=1}^{k-1} u_j\, \frac{u_n^2 - u_{n+1}^2}{2}.$$

Dividing both sides by $2\prod_{j=1}^{n-1} u_j$, we obtain

$$(u_{n-1} - u_n)\left[1 + \sum_{k=n+1}^{N} \prod_{j=n}^{k-1} u_j\right] = \sum_{k=n+1}^{N} \prod_{j=n}^{k-1} u_j\, \frac{u_n^2 - u_{n+1}^2}{2}.$$

Now, express $u_{n-1}$ from the above relationship:

$$u_{n-1} = u_n + \frac{u_n^2 - u_{n+1}^2}{2} \cdot \frac{\sum_{k=n+1}^{N} \prod_{j=n}^{k-1} u_j}{1 + \sum_{k=n+1}^{N} \prod_{j=n}^{k-1} u_j}, \quad n = 2, \ldots, N-1. \tag{3.8}$$

Standard analysis of the system of equations (3.6), (3.8) shows that there exists a solution of this system which satisfies the condition $u_{N-1} < \ldots < u_2 < u_1$. Furthermore, as $N$ increases, the value of $u_1$ gets arbitrarily close to 1, whereas $u_{N-1}$ tends to some threshold value $u^*_{N-1} \approx 0.715$. Table 7.1 provides the numerical values of the optimal thresholds under different values of $N$.

Therefore, if player II chooses the strategy $\sigma(u^*)$ defined by the thresholds $u^*_n$, $n = 1, \ldots, N-1$, that meet the system (3.6), (3.8), we have the following result. According to the backward induction method, the best response of player I has the same threshold form as that of player II. This means optimality of the corresponding strategy in the game under consideration. We summarize the above reasoning in Theorem 7.1.

Table 7.1 The optimal thresholds for different N.
$$\begin{array}{c|ccccc} N & u^*_1 & u^*_2 & u^*_3 & u^*_4 & u^*_5 \\ \hline 2 & 0.618 & & & & \\ 3 & 0.742 & 0.657 & & & \\ 4 & 0.805 & 0.768 & 0.676 & & \\ 5 & 0.842 & 0.821 & 0.781 & 0.686 & \\ 6 & 0.869 & 0.855 & 0.833 & 0.791 & 0.693 \end{array}$$

Theorem 7.1 The game $\Gamma_N(G)$ admits an equilibrium in the class of threshold strategies of the form

$$\tau(u^*) = \begin{cases} 1, & \text{if } G(x_1) \ge u^*_1, \\ n, & \text{if } G(x_1) < u^*_1, \ldots, G(x_{n-1}) < u^*_{n-1},\; G(x_n) \ge u^*_n, \\ N, & \text{if } G(x_1) < u^*_1, \ldots, G(x_{N-1}) < u^*_{N-1}, \end{cases}$$

and

$$\sigma(u^*) = \begin{cases} 1, & \text{if } G(y_1) \ge u^*_1, \\ n, & \text{if } G(y_1) < u^*_1, \ldots, G(y_{n-1}) < u^*_{n-1},\; G(y_n) \ge u^*_n, \\ N, & \text{if } G(y_1) < u^*_1, \ldots, G(y_{N-1}) < u^*_{N-1}, \end{cases}$$

where $u^*_n$, $n = 1, \ldots, N-1$, satisfy the system (3.6), (3.8).

For instance, if $N = 3$, the system (3.6), (3.8) implies that the optimal thresholds are $u^*_1 = 0.742$, $u^*_2 = 0.657$. In the case of $N = 4$, we obtain the optimal thresholds $u^*_1 = 0.805$, $u^*_2 = 0.768$, and $u^*_3 = 0.676$.

Therefore, the above game with optimal stopping of a sequence of independent identically distributed random variables has a pure strategy Nash equilibrium. At the beginning of the game, the players evaluate their thresholds, and then compare the incoming observations with the thresholds. As soon as an observation exceeds the corresponding threshold, a player terminates observations. However, an equilibrium does not necessarily consist of pure strategies. In what follows, we find an equilibrium in the game with random walks, and it exists among mixed strategies.

7.4 Optimal stopping game with random walks

Consider a two-player game $\Gamma(a, b)$ defined on random walks as follows. Let $x_n$ and $y_n$ be symmetrical random walks on the set of integers $E = \{0, 1, \ldots, k\}$, starting in states $a \in E$ and $b \in E$, respectively. For definiteness, we assume that $a \le b$. In any inner state $i \in E$, a walk moves left or right with equal probabilities, and it gets absorbed in the end points ($0$ and $k$). Players I and II observe the walks $x_n$ and $y_n$ and can stop them at certain time moments $\tau$ and $\sigma$. These random time moments represent the strategies of the players. Consequently, if $x_\tau > y_\sigma$, player I wins.
In the case of $x_\tau < y_\sigma$, player II wins accordingly. Finally, the game is drawn provided that $x_\tau = y_\sigma$. A player has no information on the opponent's behavior. As usual, this game is antagonistic with the payoff function

$$H(\tau, \sigma) = E\{I\{x_\tau > y_\sigma\} - I\{x_\tau < y_\sigma\}\}.$$

Finding an equilibrium $(\tau^*, \sigma^*)$ requires solving two problems. The first problem $\sup_\tau H(\tau, \sigma^*)$ is the optimal stopping problem $\sup_\tau E f_1(x_\tau)$ for player I with the payoff function under stopping determined by

$$f_1(x) = P\{y_{\sigma^*} < x\} - P\{y_{\sigma^*} > x\}, \quad x = 0, 1, \ldots, k.$$

Here the solution is based on the backward induction method. Similarly, the second problem $\inf_\sigma H(\tau^*, \sigma)$ is the optimal stopping problem $\sup_\sigma E f_2(y_\sigma)$ for player II with the payoff function under stopping determined by

$$f_2(y) = P\{x_{\tau^*} < y\} - P\{x_{\tau^*} > y\}, \quad y = 0, \ldots, k.$$

To simplify the form of the payoff functions under stopping, introduce vectors $s = (s_0, s_1, \ldots, s_k)$ and $t = (t_0, t_1, \ldots, t_k)$, where

$$s_i = P\{x_\tau = i\}, \quad t_i = P\{y_\sigma = i\}, \quad i = 0, 1, \ldots, k.$$

These vectors are called the spectra of the strategies $\tau$ and $\sigma$.

Now, if the strategy $\sigma$ is fixed, the problem $\sup_\tau H(\tau, \sigma)$ reduces to the optimal stopping problem for the random walk $x_n$ with the payoff function under stopping

$$f_1(x) = 2\sum_{i=0}^{x} t_i - t_x - 1, \quad x = 0, 1, \ldots, k. \tag{4.1}$$

By analogy, the problem $\inf_\sigma H(\tau, \sigma)$ with fixed $\tau$ is the optimal stopping problem for the random walk $y_n$ with the payoff function under stopping

$$f_2(y) = 2\sum_{i=0}^{y} s_i - s_y - 1, \quad y = 0, 1, \ldots, k. \tag{4.2}$$

We take advantage of the backward induction method to solve the derived problems. Write down the optimality equation (for definiteness, select player I). The optimal expected payoff of player I, provided that the walk is in the state $x_n = x \in E$, is denoted by

$$v_1(x) = \sup_\tau E\{f_1(x_\tau) / x_n = x\}.$$

By terminating observations in this state, the player gains the payoff $f_1(x_n) = f_1(x)$. If he continues observations and acts in the optimal way, his expected payoff is

$$E\{v_1(x_{n+1}) / x_n = x\} = \frac{1}{2} v_1(x - 1) + \frac{1}{2} v_1(x + 1).$$

Comparing these payoffs yields the optimal strategy, and the optimality equation takes the recurrent form

$$v_1(x) = \max\left\{f_1(x),\; \frac{1}{2} v_1(x - 1) + \frac{1}{2} v_1(x + 1)\right\}, \quad x = 1, \ldots, k - 1. \tag{4.3}$$

In absorbing states, we have

$$v_1(0) = f_1(0), \quad v_1(k) = f_1(k).$$

Equation (4.3) can be solved geometrically. It follows from (4.3) that the solution meets the conditions $v_1(x) \ge f_1(x)$, $x = 0, \ldots, k$. In other words, the curve of $y = v_1(x)$ lies above the curve of $y = f_1(x)$. In addition,

$$v_1(x) \ge \frac{1}{2} v_1(x - 1) + \frac{1}{2} v_1(x + 1), \quad x = 1, \ldots, k - 1.$$

The last inequality implies that the function $v_1(x)$ is concave.

Therefore, to solve (4.3), it suffices to draw the curve $y = f_1(x)$, $x = 0, \ldots, k$, and stretch a thread over it. The position of the thread yields the solution $v_1(x)$ and thereby defines the optimal strategy of player I. Notably, this player should stop in the states $S = \{x : v_1(x) = f_1(x)\}$, and continue observations in the states $C = \{x : v_1(x) > f_1(x)\}$.

The same reasoning applies to the optimality equation for player II. Consequently, the functions $v_1(x)$ and $v_2(y)$ being available, we have

$$\sup_\tau H(\tau, \sigma^*) = \sup_\tau E f_1(x_\tau) = v_1(a), \qquad \inf_\sigma H(\tau^*, \sigma) = -\sup_\sigma E f_2(y_\sigma) = -v_2(b).$$

If some strategies $(\tau^*, \sigma^*)$ meet the equality

$$v_1(a) = -v_2(b) = H^*, \tag{4.4}$$

they are optimal strategies.

Prior to designing optimal strategies, let us discuss several properties of the spectra of strategies.

7.4.1 Spectra of strategies: Some properties

We demonstrate that the spectra of strategies form a certain polyhedron in the space $R^{k+1}$. This makes it possible to express the solution to the corresponding optimal stopping problem via a linear programming problem.

Theorem 7.2 A vector $s = (s_0, \ldots, s_k)$ represents the spectrum of some strategy $\tau$ iff the following conditions hold true:

$$\sum_{i=0}^{k} i s_i = a, \tag{4.5}$$

$$\sum_{i=0}^{k} s_i = 1, \tag{4.6}$$

$$s_i \ge 0, \quad i = 0, \ldots, k. \tag{4.7}$$

Proof: Necessity. Suppose that $\tau$ is a stopping time with the spectrum $s$. The definition of $s$ directly leads to validity of the conditions (4.6)–(4.7). Next, the condition (4.5) results from the considerations below.
A symmetrical random walk enjoys the remarkable relationship

$$E\{x_{n+1} / x_n = x\} = \frac{1}{2}(x - 1) + \frac{1}{2}(x + 1) = x,$$

or

$$E\{x_{n+1} / x_n\} = x_n. \tag{4.8}$$

In this case, we say that the sequence $x_n$, $n \ge 0$, is a martingale. It follows from the condition (4.8) that the mean value of the martingale is time-invariant:

$$E x_n = E x_0, \quad n = 1, 2, \ldots$$

In particular, this holds for the stopping time $\tau$. Hence,

$$E x_\tau = E x_0 = a.$$

By noticing that

$$E x_\tau = \sum_{i=0}^{k} i P\{x_\tau = i\} = \sum_{i=0}^{k} i s_i,$$

we arrive at (4.5).

Among other things, this also implies the following. Consider the stopping time $\tau_{ij}$ defined by two thresholds $i$ and $j$ ($0 \le i < a < j \le k$), which prescribes continuing observations until the random walk reaches one of the states $i$ or $j$. Its spectrum (see (4.5)–(4.6)) agrees with the conditions

$$s_i + s_j = 1, \qquad s_i \cdot i + s_j \cdot j = a.$$

And so, $s_i = \frac{j - a}{j - i}$ and $s_j = \frac{a - i}{j - i}$. Consequently, the spectrum of the strategy $\tau_{ij}$ is

$$s_{ij} = \left(0, \ldots, 0, \frac{j - a}{j - i}, 0, \ldots, 0, \frac{a - i}{j - i}, 0, \ldots, 0\right). \tag{4.9}$$

Finally, note that the spectrum of the strategy $\tau_0 \equiv 0$ equals

$$s_0 = (0, \ldots, 0, 1, 0, \ldots, 0), \tag{4.10}$$

where all components except component $a$ are zeros.

Sufficiency. The set $S$ of all vectors $s = (s_0, \ldots, s_k)$ satisfying the conditions (4.5)–(4.7) is a convex polyhedron and hence coincides with the convex hull of its finite set of extreme points.

A point $s$ of the convex set $S$ is called an extreme point if it does not admit the representation $s = (s^{(1)} + s^{(2)})/2$, where $s^{(1)} \ne s^{(2)}$ and $s^{(1)}, s^{(2)} \in S$.

Show that the extreme points of the set $S$ have the form (4.9) and (4.10).

Indeed, any vector $s$ described by (4.5)–(4.7) and having at least three non-zero components ($s_l$, $s_i$, $s_j$, where $0 \le l < i < j \le k$) can be expressed as the half-sum of the vectors

$$s^{(1)} = (\ldots, s_l - e_1, \ldots, s_i + e, \ldots, s_j - e_2, \ldots), \qquad s^{(2)} = (\ldots, s_l + e_1, \ldots, s_i - e, \ldots, s_j + e_2, \ldots).$$

Here $e_1 + e_2 = e$, $e_1 = \frac{j - i}{i - l} \cdot e_2$, $0 < e \le \min\{s_l, s_i, s_j\}$, and both vectors $s^{(1)}$, $s^{(2)}$ belong to the set $S$.
Therefore, all extreme points of the polyhedron $S$ can only have the form (4.9) or (4.10). Hence, any vector $s$ enjoying the properties (4.5)–(4.7) can be written as a convex combination of the vectors $s_0$ and $s_{ij}$:

$$s = \nu_0 s_0 + \sum \nu_{ij} s_{ij},$$

where $\nu_0 \ge 0$, $\nu_{ij} \ge 0$, $\nu_0 + \sum \nu_{ij} = 1$. Here summation runs over all $(i, j)$ such that $0 \le i < a < j \le k$. By choosing the strategy $\tau_0$ with the probability $\nu_0$ and the strategies $\tau_{ij}$ with the probabilities $\nu_{ij}$, we build the mixed strategy $\nu$, which possesses the spectrum $s$. This substantiates the sufficiency of the conditions (4.5)–(4.7). The proof of Theorem 7.2 is finished.

7.4.2 Equilibrium construction

To proceed, we evaluate the equilibrium $(\tau^*, \sigma^*)$. According to Theorem 7.2, it suffices to construct the spectra of the optimal strategies $(s^*, t^*)$. These are vectors meeting the conditions

$$s_i \ge 0, \; i = 0, \ldots, k; \qquad \sum_{i=0}^{k} s_i = 1; \qquad \sum_{i=0}^{k} s_i\, i = a; \tag{4.11}$$

$$t_i \ge 0, \; i = 0, \ldots, k; \qquad \sum_{i=0}^{k} t_i = 1; \qquad \sum_{i=0}^{k} t_i\, i = b. \tag{4.12}$$

Lemma 7.2 For player II, there exists a strategy $\sigma^*$ such that

$$\sup_\tau H(\tau, \sigma^*) = (a - b)/b. \tag{4.13}$$

Proof: Let us employ Theorem 7.2 and, instead of the strategy $\sigma^*$, construct its spectrum $t^*$. First, assume that

$$k < 2b - 1. \tag{4.14}$$

Define the vector $t^* = (t_0, t_1, \ldots, t_k)$ by the expressions

$$t_i = \begin{cases} 1/b, & i = 1, 3, 5, \ldots, 2(k - b) - 1, \\ 1 - (k - b)/b, & i = k, \\ 0, & \text{for the rest } i \in E. \end{cases} \tag{4.15}$$

Verify the conditions (4.12) for the vector (4.15). For all $i \in E$, the inequality $t_i \ge 0$ takes place (for $i = k$, this follows from (4.14)). Next, we find

$$\sum_{i=0}^{k} t_i = \frac{1}{b}(k - b) + 1 - \frac{k - b}{b} = 1,$$

$$\sum_{i=0}^{k} t_i\, i = \frac{1}{b}\left[1 + 3 + \ldots + (2(k - b) - 1)\right] + k\left(1 - \frac{k - b}{b}\right) = b.$$

And so, the conditions of Theorem 7.2 hold true, and the vector $t^*$ represents the spectrum of some strategy $\sigma^*$.

Second, evaluate $\sup_\tau H(\tau, \sigma^*)$. For this (see the discussion above), it is necessary to solve the optimal stopping problem with the payoff function under stopping $f_1(x)$ defined by (4.1).
Substitute (4.15) into (4.1) to get

$$f_1(i) = \begin{cases} i/b - 1, & i = 0, \ldots, 2(k - b), \\ 2(k - b)/b - 1, & i = 2(k - b) + 1, \ldots, k - 1, \\ k/b - 1, & i = k. \end{cases}$$

To solve the optimality equation, we use the same geometrical considerations as before. Figure 7.3 provides the curve of $y = f_1(x)$. The thread stretched over it is the straight line connecting the points $(x = 0, y = -1)$ and $(x = k, y = k/b - 1)$. The equation of this line takes the form

$$y = x/b - 1.$$

Therefore, the optimality equation (4.3) is solved by the function $v_1(i) = i/b - 1$. In the state $i = a$, we accordingly obtain

$$v_1(a) = (a - b)/b.$$

This proves (4.13).

Figure 7.3 The payoff function under stopping $f_1(x)$.

If $k \ge 2b - 1$, define $t^*$ by

$$t_i = \begin{cases} 1/b, & \text{if } i = 1, 3, \ldots, 2b - 1, \\ 0, & \text{for the rest } i \in E. \end{cases}$$

The vector $t^*$ of this form meets the conditions (4.12) and, hence, is the spectrum of a certain strategy. The function $f_1(x)$ acquires the form

$$f_1(i) = \begin{cases} i/b - 1, & i = 0, \ldots, 2b - 1, \\ 1, & i = 2b, \ldots, k. \end{cases}$$

Its curve is illustrated in Figure 7.4. Clearly, the function $v_1(x)$ coincides with the function $f_1(x)$. It follows that, within the interval $[0, 2b]$, the function $v_1(x)$ has the form $v_1(i) = i/b - 1$, which proves (4.13). The proof of Lemma 7.2 is concluded.

Figure 7.4 The payoff function under stopping $f_1(x)$.

Lemma 7.3 For player I, there exists a strategy $\tau^*$ such that

$$\inf_\sigma H(\tau^*, \sigma) = (a - b)/b. \tag{4.16}$$

Proof: It resembles that of Lemma 7.2. First, consider the case of $k \le 2b$.

Determine the vector $s^* = (s_0, s_1, \ldots, s_k)$ by the expressions

$$s_i = \begin{cases} 1 - \dfrac{a}{b + 1}, & \text{if } i = 0, \\[4pt] \dfrac{a}{b(b + 1)}, & \text{if } i = 2, 4, \ldots, 2(k - b - 1), \\[4pt] \dfrac{a(2b - k + 1)}{b(b + 1)}, & \text{if } i = k, \\[4pt] 0, & \text{for the rest } i \in E. \end{cases} \tag{4.17}$$

Evidently, the vector (4.17) agrees with the conditions (4.11). According to Theorem 7.2, it is the spectrum of some strategy $\tau^*$ of player I.
Substitution of (4.17) into (4.2) gives

$$f_2(i) = \begin{cases} -\dfrac{a}{b + 1}, & i = 0, \\[4pt] \dfrac{a}{b(b + 1)}(i - b) + \dfrac{b - a}{b}, & i = 1, \ldots, 2(k - b) - 1, \\[4pt] f_2(2(k - b) - 1), & i = 2(k - b), \ldots, k - 1, \\[4pt] \dfrac{a}{b(b + 1)}(k - b) + \dfrac{b - a}{b}, & i = k. \end{cases} \tag{4.18}$$

Figure 7.5 demonstrates the curve of the function (4.18). The straight line obeying the equation

$$y = \frac{a}{b(b + 1)}(x - b) + \frac{b - a}{b}$$

coincides with the curve of $y = f_2(x)$ in the points $x = 1, 2, \ldots, 2(k - b) - 1$ and $x = k$ (see (4.18)). Moreover, the former lies above the latter in the points $x = 2(k - b), 2(k - b) + 1, \ldots, k - 1$ and $x = 0$. For instance, at $x = 0$ we obtain

$$y(0) = -\frac{a}{b + 1} + \frac{b - a}{b} \ge -\frac{a}{b + 1} = f_2(0).$$

Figure 7.5 The payoff function under stopping $f_2(x)$.

This implies that, within the interval $[1, k]$, the function $v_2(x)$ has the form

$$v_2(i) = \frac{a}{b(b + 1)}(i - b) + \frac{b - a}{b}.$$

As a result,

$$v_2(b) = \frac{b - a}{b},$$

which proves formula (4.16).

Now, suppose that $k \ge 2b + 1$. Define the spectrum $s^*$ by

$$s_i = \begin{cases} 1 - \dfrac{a}{b + 1}, & \text{if } i = 0, \\[4pt] \dfrac{a}{b(b + 1)}, & \text{if } i = 2, 4, \ldots, 2b, \\[4pt] 0, & \text{for the rest } i \in E. \end{cases}$$

In this case,

$$f_2(i) = \begin{cases} -\dfrac{a}{b + 1}, & i = 0, \\[4pt] \dfrac{a}{b(b + 1)}(i - b) + \dfrac{b - a}{b}, & i = 1, \ldots, 2b, \\[4pt] 1, & i = 2b + 1, \ldots, k. \end{cases}$$

Figure 7.6 shows that the function $v_2(x)$ coincides with $f_2(x)$, whence

$$v_2(b) = f_2(b) = (b - a)/b.$$

The proof of Lemma 7.3 is finished.

Figure 7.6 The payoff function under stopping $f_2(x)$.

Therefore, if we choose the strategies $\tau^*$ and $\sigma^*$ according to Lemmas 7.2 and 7.3, the expressions (4.13), (4.16) lead to

$$\sup_\tau H(\tau, \sigma^*) = \inf_\sigma H(\tau^*, \sigma) = (a - b)/b.$$

In turn, this yields the following assertion.

Theorem 7.3 Let $x_n(w)$ and $y_n(w)$ be symmetrical random walks on the set $E$. Then the value of the game $\Gamma(a, b)$ equals $H^* = (a - b)/b$.
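The constructions in Lemmas 7.2 and 7.3 can be checked numerically for particular parameters. The sketch below, in Python, builds the spectra $t^*$ and $s^*$, forms the payoffs under stopping via (4.1) and (4.2), and solves the optimality equation (4.3) by repeated sweeps instead of the geometric "thread" argument. The function names, the iterative solver, and its tolerance are assumptions made for illustration; the parameters are assumed to satisfy $0 < a \le b < k$.

```python
def spectrum_t(k, b):
    """Spectrum t* of player II as defined in the proof of Lemma 7.2."""
    t = [0.0] * (k + 1)
    if k < 2 * b - 1:
        for i in range(1, 2 * (k - b), 2):      # i = 1, 3, ..., 2(k - b) - 1
            t[i] = 1.0 / b
        t[k] = 1.0 - (k - b) / b
    else:
        for i in range(1, 2 * b, 2):            # i = 1, 3, ..., 2b - 1
            t[i] = 1.0 / b
    return t


def spectrum_s(k, a, b):
    """Spectrum s* of player I as defined in the proof of Lemma 7.3."""
    s = [0.0] * (k + 1)
    s[0] = 1.0 - a / (b + 1)
    weight = a / (b * (b + 1))
    if k <= 2 * b:
        for i in range(2, 2 * (k - b - 1) + 1, 2):   # i = 2, 4, ..., 2(k - b - 1)
            s[i] = weight
        s[k] = a * (2 * b - k + 1) / (b * (b + 1))
    else:
        for i in range(2, 2 * b + 1, 2):             # i = 2, 4, ..., 2b
            s[i] = weight
    return s


def stopping_payoff(spectrum):
    """Payoff under stopping, as in (4.1) and (4.2): 2 * sum_{i<=x} p_i - p_x - 1."""
    payoff, cumulative = [], 0.0
    for p in spectrum:
        cumulative += p
        payoff.append(2.0 * cumulative - p - 1.0)
    return payoff


def stopping_value(payoff, tol=1e-13):
    """Solve (4.3) by repeated sweeps; the end states 0 and k stay absorbing."""
    value = list(payoff)
    while True:
        change = 0.0
        for x in range(1, len(value) - 1):
            updated = max(payoff[x], 0.5 * (value[x - 1] + value[x + 1]))
            change = max(change, abs(updated - value[x]))
            value[x] = updated
        if change < tol:
            return value


def game_bounds(k, a, b):
    """Return (v1(a), -v2(b)): the upper and lower estimates of the game value."""
    v1 = stopping_value(stopping_payoff(spectrum_t(k, b)))
    v2 = stopping_value(stopping_payoff(spectrum_s(k, a, b)))
    return v1[a], -v2[b]


if __name__ == "__main__":
    for k, a, b in [(10, 3, 7), (10, 2, 4)]:
        print(k, a, b, game_bounds(k, a, b), (a - b) / b)
```

For the two parameter sets shown, both estimates should coincide with $(a - b)/b$ up to the tolerance of the iteration, in line with Theorem 7.3.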
Thus, the solution of the game belongs to the class of mixed strategies, which are probability distributions on the set of two-threshold strategies. With some probabilities, each player selects left and right thresholds $i$, $j$ with respect to the starting point of the walk; he continues observations until the walk leaves these limits. Taken with the opposite sign, the value of the game equals the probability that the walk starting at point $a$ reaches zero earlier than point $b$. Interestingly, the value does not depend on the right limit of the walking interval.

7.5 Best choice games

Best choice games are an alternative setting of optimal stopping games. Imagine $N$ objects sorted by their quality; the best object has number 1. The objects are supplied randomly to a player, one by one. He can compare them, but is unable to return to previously viewed objects. The player's aim consists in choosing the best object. This problem is also known as the secretary problem, the bride problem, the parking problem, etc. Here we deal with a special process of observations.

Let us provide a formal mathematical description. Suppose that all objects are assigned numbers $\{1, 2, \ldots, N\}$, where object 1 possesses the highest quality. The number of an object is called its rank. The objects arrive in a random order. All $N!$ permutations are equiprobable. Denote by $a_n$ the absolute rank of the object that appears at time moment $n$, $n = 1, \ldots, N$. The whole difficulty lies in the following. The player receives an object at time moment $n$, but does not know the absolute rank of this object. Being able to compare the objects with each other, the player knows merely the relative rank $y_n$ of this object among the viewed objects. This rank is

$$y_n = \sum_{i=1}^{n} I(a_i \le a_n),$$

where $I(A)$ is the indicator of the event $A$. If all objects which arrived before time moment $n$ have larger rank numbers (i.e., lower quality) than the given object, the relative rank of the latter is 1.
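The definition of the relative rank can be illustrated by a short Python sketch. The function name and the sample arrival order are illustrative assumptions; the computation itself follows the formula $y_n = \sum_{i=1}^{n} I(a_i \le a_n)$ directly.

```python
def relative_ranks(absolute_ranks):
    """Relative rank y_n of each arriving object among the objects viewed so far.

    absolute_ranks[n - 1] is a_n, the absolute rank of the object arriving at
    time moment n (rank 1 is the best object).
    """
    ranks = []
    for n, current in enumerate(absolute_ranks):
        viewed = absolute_ranks[: n + 1]
        ranks.append(sum(1 for earlier in viewed if earlier <= current))
    return ranks


if __name__ == "__main__":
    arrival_order = [3, 5, 1, 4, 2]   # hypothetical arrival order for N = 5
    print(relative_ranks(arrival_order))
```

In this sample order, the first and third objects have relative rank 1 at their arrival times, although only the third one (absolute rank 1) is the best object overall.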
Therefore, by observing the relative ranks $y_n$, the player has to draw conclusions regarding the absolute rank $a_n$. The relative ranks $y_n$ are random variables; owing to the equal probability of all permutations, we obtain $P\{y_n = i\} = 1/n$, $i = 1, \ldots, n$. In other words, the relative rank of object $n$ takes any value from 1 to $n$ with equal probability.

The best choice problem suggests two possible goals (criteria) for the player: (1) find the stopping rule $\tau$ which maximizes the probability of finding the best object, i.e., $P\{a_\tau = 1\}$, or (2) minimize the expected rank of the chosen object $E\{a_\tau\}$.

We begin with the best choice problem where the player maximizes the probability of finding the best object. Introduce the random sequence $x_n = P\{a_n = 1 / y_n\}$, $n = 1, 2, \ldots, N$. Note that, for any stopping time $\tau$,

$$E x_\tau = \sum_{n=1}^{N} E\{x_n I\{\tau = n\}\} = \sum_{n=1}^{N} E\{P\{a_n = 1 / y_n\} I\{\tau = n\}\}.$$

The decision on stopping $\tau = n$ is made depending on the value of $y_n$. By the properties of conditional expectations,

$$E x_\tau = \sum_{n=1}^{N} E\{I\{a_n = 1\} I\{\tau = n\}\} = E\{I\{a_\tau = 1\}\} = P\{a_\tau = 1\}.$$

Therefore, the problem of finding the stopping rule which maximizes the probability of choosing the best object is the optimal stopping problem for the random process $x_n$, $n = 1, \ldots, N$. This sequence forms a Markov chain on the extended set of states $E = \{0, 1, \ldots, N\}$ (we have added the stopping state 0).

Lemma 7.4 The following formula holds true:

$$x_n = P\{a_n = 1 / y_n\} = \begin{cases} \dfrac{n}{N}, & \text{if } y_n = 1, \\[4pt] 0, & \text{if } y_n > 1. \end{cases}$$

Proof: Obviously, if $y_n > 1$, this object is not the best one. And so, $x_n = P\{a_n = 1 / y_n\} = 0$. On the other hand, if $y_n = 1$ (object $n$ is the best among the viewed objects), we have

$$x_n = P\{a_n = 1 / y_n = 1\} = \frac{P\{a_n = 1\}}{P\{a_n < \min\{a_1, \ldots, a_{n-1}\}\}}.$$

Due to the equiprobability of all permutations, $P\{a_n = 1\} = 1/N$. The probability $P\{a_n < \min\{a_1, \ldots, a_{n-1}\}\}$ that the minimal element in a permutation of $n$ values holds position $n$ is $1/n$. It follows that

$$x_n = P\{a_n = 1 / y_n = 1\} = \frac{1/N}{1/n} = \frac{n}{N}.$$
This concludes the proof of Lemma 7.4.

According to Lemma 7.4, the optimal behavior prescribes stopping only on objects with the relative rank of 1. Such objects are called candidates. If a candidate comes at time moment $n$ and we choose it, the probability that this is the best object is $n/N$. By comparing the payoffs in the cases of stopping and continuation of observations, one can find the optimal stopping rule.

We resort to the backward induction method to establish the optimal rule. We define it by the optimal expected payoff function $v_n$, $n = 1, \ldots, N$. Consider the end time moment $n = N$. The player's payoff is $x_N$. This is either 1 or 0, depending on the status of the given object (a candidate or not, respectively). Let us set $v_N = 1$.

At shot $n = N - 1$, the player's payoff under stopping equals $x_{N-1}$ (either 0 or $(N - 1)/N$). If the player continues observations, his expected payoff is

$$E x_N = \frac{1}{N} \cdot 1 + \left(1 - \frac{1}{N}\right) \cdot 0 = \frac{1}{N}.$$

By comparing these payoffs, we get the optimal stopping rule

$$v_{N-1} = \max\{x_{N-1}, E x_N\} = \max\left\{\frac{N - 1}{N}, \frac{1}{N}\right\}.$$

At shot $n = N - 2$, the player's payoff under stopping equals $x_{N-2}$ (either 0 or $(N - 2)/N$). If the player continues observations, a candidate appears at shot $N - 1$ with the probability $1/(N - 1)$ and at shot $N$ with the probability

$$\left(1 - \frac{1}{N - 1}\right) \frac{1}{N} = \frac{N - 2}{(N - 1)N}.$$

His expected payoff becomes

$$E x_{N-1} = \frac{1}{N - 1} \cdot \frac{N - 1}{N} + \frac{N - 2}{(N - 1)N} = \frac{N - 2}{N}\left(\frac{1}{N - 2} + \frac{1}{N - 1}\right).$$

Consequently,

$$v_{N-2} = \max\{x_{N-2}, E x_{N-1}\} = \max\left\{\frac{N - 2}{N},\; \frac{N - 2}{N}\left(\frac{1}{N - 2} + \frac{1}{N - 1}\right)\right\}.$$

Repeat these arguments for subsequent shots. At shot $n$, we arrive at the equation

$$v_n = \max\{x_n, E x_{n+1}\} = \max\left\{\frac{n}{N},\; \frac{n}{N}\sum_{i=n+1}^{N} \frac{1}{i - 1}\right\}, \quad n = N, \ldots, 1.$$

If the player continues at shot $n$, the next candidate can appear at time moment $n + 1$ with the probability $1/(n + 1)$ and at time moment $i$, $i > n$, with the probability

$$\left(1 - \frac{1}{n + 1}\right)\left(1 - \frac{1}{n + 2}\right) \cdot \ldots \cdot \left(1 - \frac{1}{i - 1}\right) \frac{1}{i} = \frac{n}{(i - 1)i}. \tag{5.1}$$

The stopping rule is defined as follows.
The player should stop on candidates such that the payoff under stopping is greater than or equal to the payoff under continuation. The stopping set is given by the inequalities

$$S = \left\{n : \frac{n}{N} \ge \frac{n}{N}\sum_{i=n+1}^{N} \frac{1}{i - 1}\right\}.$$

Therefore, the set $S$ has the form $\{r, r + 1, \ldots, N\}$, where $r$ meets the inequalities

$$\sum_{i=r}^{N-1} \frac{1}{i} \le 1 < \sum_{i=r-1}^{N-1} \frac{1}{i}. \tag{5.2}$$

Theorem 7.4 Consider the best choice game, where the player seeks to maximize the probability of finding the best object. His optimal strategy is to stop on the first object that arrives starting from time moment $r$ defined by (5.2) and that is the best among all viewed objects.

Under large $N$, inequalities (5.2) can be rewritten in the integral form:

$$\int_{r}^{N-1} \frac{1}{t}\, dt \le 1 < \int_{r-1}^{N-1} \frac{1}{t}\, dt.$$

This immediately yields $\lim_{N \to \infty} \frac{r}{N} = 1/e \approx 0.368$. Therefore, under large $N$, the player should stop on the first candidate after $N/e$ viewed objects.

We have argued that the optimal behavior in this game is described by threshold strategies $\tau(r)$. The player chooses some threshold $r$; up to this time moment he merely observes the incoming objects and keeps track of the best one among them. From time moment $r$ on, the player stops on the first object that is better than all previous ones. Of course, he may skip the best object (if it appears among the first $r - 1$ objects) or not reach the best object (by terminating observations on a relatively best object). To compute the probability of finding the best object under the optimal strategy, we find the spectrum of the threshold strategy $\tau(r)$.

Lemma 7.5 Consider the best choice problem, where the player uses the threshold strategy $\tau(r)$. The spectrum of such a strategy (the probability of choosing object $i$) takes the form

$$P\{\tau(r) = i\} = \begin{cases} 0, & \text{if } i = 1, \ldots, r - 1, \\[4pt] \dfrac{r - 1}{i(i - 1)}, & \text{if } i = r, \ldots, N, \\[4pt] \dfrac{r - 1}{N}, & \text{if } i = 0. \end{cases}$$

Proof: The strategy $\tau(r)$ prescribes stopping only in the states $r, r + 1, \ldots, N$; therefore, we have $P\{\tau(r) = i\} = 0$ for $i = 1, \ldots, r - 1$.
The event $\{\tau(r) = r\}$ is equivalent to the event that the last element in the sequence $a_1, \ldots, a_r$ is minimal. The corresponding probability makes up $P\{\tau(r) = r\} = 1/r$. On the other hand, the event $\{\tau(r) = i\}$, where $i = r+1, \ldots, N$, is equivalent to the following event. The minimal element in the sequence $a_1, \ldots, a_{r-1}, a_r, \ldots, a_i$ holds position $i$, whereas the second smallest element is located on position $j$ ($1 \le j \le r-1$). The probability of such a complex event constitutes

$$P\{\tau(r) = i\} = \sum_{j=1}^{r-1} \frac{(i-2)!}{i!} = \frac{(i-2)!}{i!}(r-1) = \frac{r-1}{i(i-1)}.$$

Finally, we find the probability of break $P\{\tau(r) = 0\}$ from the equality

$$P\{\tau(r) = 0\} = 1 - \sum_{i=r}^{N} \frac{r-1}{i(i-1)} = \frac{r-1}{N}.$$

The proof of Lemma 7.5 is completed.

As a matter of fact, the quantity $P\{\tau(r) = 0\} = (r-1)/N$ represents exactly the probability that the player skips the best object under the given threshold rule $\tau(r)$. Evidently, the optimal behavior ensures best object finding with the approximate probability of 0.368.

7.6 Best choice game with stopping before opponent

In the previous section, we have studied players acting independently (their payoff functions depend only on their individual behavior). Now, consider a nonzero-sum two-player game, where each player strives to find the best object earlier than the opponent. A possible interpretation of this game lies in the following. Two companies competing on a market wait for a favorable order or conduct research work to improve their products; the earlier a company succeeds, the more profits it makes.

And so, let us analyze the following game. Players I and II randomly receive objects ordered from 1 to $N$. Each player has a specific set of objects, and all $N!$ permutations appear equiprobable. Players make their choice at some time moments, and the chosen objects are compared. The payoff of a player equals 1, if he stops on the best object earlier than the opponent. We adopt the same notation as in Section 7.5.
Designate by $a_n, a'_n$ and $y_n, y'_n$ the absolute and relative ranks for players I and II, respectively. Consequently, the payoff functions in this game acquire the form

$$H_1(\tau, \sigma) = E\{I\{a_\tau = 1, a'_\sigma \ne 1\} + I\{a_\tau = 1, a'_\sigma = 1, \tau < \sigma\}\} = P\{a_\tau = 1, a'_\sigma \ne 1\} + P\{a_\tau = 1, a'_\sigma = 1, \tau < \sigma\}, \tag{6.1}$$

and

$$H_2(\tau, \sigma) = E\{I\{a_\tau \ne 1, a'_\sigma = 1\} + I\{a_\tau = 1, a'_\sigma = 1, \tau > \sigma\}\} = P\{a_\tau \ne 1, a'_\sigma = 1\} + P\{a_\tau = 1, a'_\sigma = 1, \tau > \sigma\}. \tag{6.2}$$

So long as the game enjoys symmetry, it suffices to find the optimal strategy of one player. For instance, we select player I. Fix a certain strategy $\sigma$ of player II and evaluate the best response of the opponent. Recall the scheme involved in the preceding section. Notably, take a random sequence

$$x_n = E\{I\{a_n = 1, a'_\sigma \ne 1\} + I\{a_n = 1, a'_\sigma = 1, n < \sigma\} \mid y_n\}, \quad n = 1, 2, \ldots, N.$$

Using similar arguments, one can show that $E\{x_\tau\} = H_1(\tau, \sigma)$. Therefore, the optimal response problem of player I represents the optimal stopping problem of the random sequence $x_n$, $n = 1, \ldots, N$.

The independence of the random variables $a_n$, $a'_n$ implies that $x_n$ can be reexpressed by

$$x_n = P\{a_n = 1 \mid y_n\}\left[P\{a'_\sigma \ne 1\} + P\{a'_\sigma = 1, n < \sigma\}\right] = P\{a_n = 1 \mid y_n\}\left[1 - P\{a'_\sigma = 1\} + P\{a'_\sigma = 1, n < \sigma\}\right].$$

Hence,

$$x_n = P\{a_n = 1 \mid y_n\}\left[1 - P\{a'_\sigma = 1, \sigma \le n\}\right], \quad n = 1, \ldots, N.$$

By virtue of Lemma 7.4, we arrive at the representation

$$x_n = \frac{n}{N} I\{y_n = 1\}\left[1 - P\{a'_\sigma = 1, \sigma \le n\}\right], \quad n = 1, \ldots, N. \tag{6.3}$$

Formula (6.3) specifies the payoff of player I under stopping on object $n$. Clearly, to establish the best response of player I, one should follow up only the appearance of candidates in the sequence. Again, we address the backward induction method. Seek for the optimal strategies among threshold strategies $\tau(r)$ dictating to terminate observations on the first candidate that appears at a time moment from the set $\{r, r+1, \ldots, N\}$.

Suppose that player II chooses the threshold strategy $\sigma(r)$. Then formula (6.3) immediately implies that, for $n \le r - 1$, the payoff under stopping equals

$$x_n = \frac{n}{N} I\{y_n = 1\}, \quad n = 1, \ldots, r-1.$$
In the case of $n \ge r$, we compute the probability $P\{a'_\sigma = 1, \sigma \le n\}$:

$$P\{a'_\sigma = 1, \sigma \le n\} = \sum_{j=r}^{n} P\{a'_j = 1, \sigma = j\} = \sum_{j=r}^{n} P\{\sigma = j\} P\{a'_j = 1 \mid y'_j = 1\}.$$

According to Lemma 7.5, we have $P\{\sigma = j\} = \frac{r-1}{j(j-1)}$; and Lemma 7.4 yields $P\{a'_j = 1 \mid y'_j = 1\} = j/N$. Therefore,

$$P\{a'_\sigma = 1, \sigma \le n\} = \sum_{j=r}^{n} \frac{r-1}{j(j-1)} \cdot \frac{j}{N} = \frac{r-1}{N}\sum_{j=r}^{n} \frac{1}{j-1}.$$

Using (6.3), readers easily find that

$$x_n = \frac{n}{N} I\{y_n = 1\}\left[1 - \frac{r-1}{N}\sum_{j=r}^{n} \frac{1}{j-1}\right], \quad n = r, \ldots, N. \tag{6.4}$$

The payoff under stopping in state $n$ has been successfully established. We can proceed and define the optimal stopping rule. This will employ the optimal expected payoff function $v_n$, $n = 1, \ldots, N$, and the backward induction method. Assume that incoming object $n$ is the best among all previous ones.

Consider the end time moment $n = N$. Here the player's payoff equals $x_N$. Due to (6.4), this is the quantity

$$x_N = 1 - \frac{r-1}{N}\sum_{j=r}^{N} \frac{1}{j-1}.$$

At the last shot, a player must stop, since his payoff under continuation makes up 0. Set $v_N = x_N$. At shot $n = N - 1$, the player's payoff under stopping is given by

$$x_{N-1} = \frac{N-1}{N}\left[1 - \frac{r-1}{N}\sum_{j=r}^{N-1} \frac{1}{j-1}\right].$$

If he continues, the expected payoff becomes

$$E x_N = \frac{1}{N}\left[1 - \frac{r-1}{N}\sum_{j=r}^{N} \frac{1}{j-1}\right].$$

By comparing these expressions, we find the optimal stopping rule:

$$v_{N-1} = \max\{x_{N-1}, E x_N\} = \max\left\{\frac{N-1}{N}\left[1 - \frac{r-1}{N}\sum_{j=r}^{N-1} \frac{1}{j-1}\right], \frac{1}{N}\left[1 - \frac{r-1}{N}\sum_{j=r}^{N} \frac{1}{j-1}\right]\right\}.$$

Repeat these considerations accordingly; using the transition rate formulas (5.1), at shot $n$ we get the equation

$$v_n = \max\{x_n, E x_{n+1}\} = \max\left\{x_n, \sum_{i=n+1}^{N} \frac{n}{i(i-1)} x_i\right\}, \quad n = N, \ldots, 1.$$

Calculate the expected payoff under continuation by one shot. For $r \le n \le N - 1$, we have the representation

$$E x_{n+1} = \sum_{i=n+1}^{N} \frac{n}{i(i-1)} \cdot \frac{i}{N}\left[1 - \frac{r-1}{N}\sum_{j=r}^{i} \frac{1}{j-1}\right] = \frac{n}{N}\sum_{i=n}^{N-1} \frac{1}{i} - \frac{n(r-1)}{N^2}\sum_{i=n}^{N-1} \frac{1}{i}\sum_{j=r-1}^{i} \frac{1}{j}. \tag{6.5}$$

In the case of $1 \le n \le r - 1$, the following relationship holds true:

$$E x_{n+1} = \sum_{i=n+1}^{r-1} \frac{n}{i(i-1)} \cdot \frac{i}{N} + \sum_{i=r}^{N} \frac{n}{i(i-1)} \cdot \frac{i}{N}\left[1 - \frac{r-1}{N}\sum_{j=r}^{i} \frac{1}{j-1}\right] = \frac{n}{N}\sum_{i=n}^{N-1} \frac{1}{i} - \frac{n(r-1)}{N^2}\sum_{i=r-1}^{N-1} \frac{1}{i}\sum_{j=r-1}^{i} \frac{1}{j}. \tag{6.6}$$

Figure 7.7 demonstrates the curves of the functions $y = x_n$ and $y = E x_{n+1}$, $n = 0, \ldots, N$, under $N = 10$ and $r = 4$. We have mentioned that, owing to the problem's symmetry, the optimal strategies of the players do coincide. Therefore, choose $r$ such that

$$x_{r-1} < E x_r, \quad x_r \ge E x_{r+1}.$$

Figure 7.7 The functions $x_n$ and $E x_{n+1}$.

After simplifications, formulas (6.4)–(6.6) imply that $r$ satisfies the inequalities

$$1 < \sum_{i=r-1}^{N-1} \frac{1}{i}\left(1 - \frac{r-1}{N}\sum_{j=r-1}^{i} \frac{1}{j}\right) \le 1 + \frac{N-r}{N(r-1)}. \tag{6.7}$$

Theorem 7.5 Consider the fastest best choice game. The equilibrium strategy profile is achieved among threshold strategies $(\tau(r), \sigma(r))$, where $r$ meets the conditions (6.7).

Proof: Suppose that player II adheres to the strategy $\sigma(r)$, where $r$ agrees with (6.7). Below we demonstrate that the best response of player I represents the strategy $\tau(r)$ with the same threshold $r$. In fact, it suffices to show that

$$x_n < E x_{n+1}, \quad n = 1, \ldots, r-1, \qquad x_n \ge E x_{n+1}, \quad n = r, \ldots, N.$$

According to (6.6), for $n = 1, \ldots, r-1$ we have

$$E x_{n+1} - x_n = \frac{n}{N}\left[\sum_{i=n}^{r-2} \frac{1}{i} + \sum_{i=r-1}^{N-1} \frac{1}{i}\left(1 - \frac{r-1}{N}\sum_{j=r-1}^{i} \frac{1}{j}\right) - 1\right].$$

This expression is strictly positive by virtue of the first condition in (6.7). In the case of $n = r, \ldots, N$, formulas (6.4), (6.5) lead to

$$E x_{n+1} - x_n = \frac{n}{N}\left[\sum_{i=n}^{N-1} \frac{1}{i}\left(1 - \frac{r-1}{N}\sum_{j=r-1}^{i} \frac{1}{j}\right) - 1 + \frac{r-1}{N}\sum_{i=r-1}^{n-1} \frac{1}{i}\right] = \frac{n}{N} G(n).$$

Due to the second condition in (6.7), the bracketed expression $G(n)$ appears non-positive in the point $n = r$. This property remains in force in the rest points $n = r+1, \ldots, N-1$, since the function $G(n)$ is non-increasing in $n$.
Really, this follows from

$$G(n+1) - G(n) = -\frac{1}{n}\left[1 - \frac{r-1}{N}\sum_{i=r-1}^{n} \frac{1}{i} - \frac{r-1}{N}\right]$$

and non-negativity of the expression

$$1 - \frac{r-1}{N}\sum_{i=r-1}^{n} \frac{1}{i} - \frac{r-1}{N}, \quad n = r, \ldots, N-1.$$

The last fact is immediate from the inequalities

$$1 - \frac{r-1}{N}\sum_{i=r-1}^{n} \frac{1}{i} - \frac{r-1}{N} \ge 1 - \frac{r-1}{N}\sum_{i=r-1}^{N-1} \frac{1}{i} - \frac{r-1}{N} \ge \frac{r-1}{N}\left(\frac{N-r+1}{r-1} - \sum_{i=r-1}^{N-1} \frac{1}{i}\right) \ge 0.$$

The proof of Theorem 7.5 is finished.

Figure 7.8 The optimal thresholds.

Figure 7.8 shows the optimal thresholds in the fastest best choice problem under different values of $N$. Finally, we explore the asymptotical setting of this game as $N \to \infty$. Imagine that the ratio $r/N$ tends to some limit $z \in [0, 1]$. Under large $N$, the conditions (6.7) get reduced to the equation

$$-\ln z - \frac{z \ln^2 z}{2} = 1.$$

Its solution $z^* \approx 0.295$ yields the asymptotically optimal value of $r/N$. In contrast to the solution of the previous problem (0.368), a player should stop earlier. As a result, errors grow appreciably; this is the cost of taking the lead over the opponent.

7.7 Best choice game with rank criterion. Lottery

Lemma 7.6 Assume that $y$ is the relative rank of the candidate at shot $n$. Then the expected value of its absolute rank makes up

$$E\{a_n \mid y_n = y\} = Q(n, y) = \frac{N+1}{n+1}\, y.$$

Proof: Let the relative rank of candidate $n$ be equal to $y$. Find the probability $P\{a_n = r \mid y_n = y\}$ (the absolute rank of this candidate is $r$, where $r = y, y+1, \ldots, N-n+y$). Consider the event that, after choice of $n$ objects, the last object with the relative rank of $y$ possesses the absolute rank of $r$; actually, this event is equivalent to the following.
While choosing $n$ objects $k_1, \ldots, k_{y-1}, k_y, \ldots, k_n$ from $N$ objects $1, 2, \ldots, r-1, r, \ldots, N$, one chooses objects $k_1, \ldots, k_{y-1}$ from objects $1, \ldots, r-1$ and objects $k_y, \ldots, k_n$ from objects $r, \ldots, N$. And the desired probability is defined by

$$P\{a_n = r \mid y_n = y\} = \frac{\binom{r-1}{y-1}\binom{N-r}{n-y}}{\binom{N}{n}}, \quad r = y, y+1, \ldots, N-n+y. \tag{7.1}$$

Formula (7.1) specifies the negative hypergeometric distribution. Now, we evaluate the expected absolute rank of candidate $n$ provided that its relative rank constitutes $y$. Notably,

$$Q(n, y) \equiv \sum_{r=y}^{N-(n-y)} r\,\frac{\binom{r-1}{y-1}\binom{N-r}{n-y}}{\binom{N}{n}} = \frac{N+1}{n+1}\, y \sum_{r=y}^{N-(n-y)} \frac{\binom{r}{y}\binom{N-r}{n-y}}{\binom{N+1}{n+1}} = \frac{N+1}{n+1}\, y.$$

This concludes the proof of Lemma 7.6.

And so, as candidate $n$ appears, the players observe the relative ranks $(y_n, z_n) = (y, z)$. If both players choose (R-R), candidate $n$ is rejected. Subsequently, the players interview candidate $n + 1$ and pass to state $(y_{n+1}, z_{n+1})$. However, if the players choose (A-A), the game ends with the payoffs $\frac{N+1}{n+1} y$ (player I) and $\frac{N+1}{n+1} z$ (player II). In the case of different choices, lottery selects the decision of player I (or player II) with the probability $p$ (the probability $\bar{p} = 1 - p$, respectively). At the last shot, the last candidate is accepted anyway.

Define state $(n, y, z)$, where

1. first $n - 1$ candidates are rejected and players are shown candidate $n$,
2. the relative ranks of the current candidate equal $y_n = y$ and $z_n = z$.

Denote by $u_n, v_n$ the optimal expected payoffs of the players at shot $n$, when first $n$ candidates are rejected. Apply the backward induction method and write down the optimality equation:

$$(u_{n-1}, v_{n-1}) = n^{-2}\sum_{y,z=1}^{n} \mathrm{Val}\, M_n(y, z). \tag{7.2}$$

Here $\mathrm{Val}\, M_n(y, z)$ represents the value of the game with the matrix $M_n(y, z)$ defined by (rows are the strategies of player I, columns are the strategies of player II)

$$\begin{array}{c|cc} & \text{R} & \text{A} \\ \hline \text{R} & u_n,\ v_n & \bar{p}Q(n, y) + p u_n,\ \bar{p}Q(n, z) + p v_n \\ \text{A} & pQ(n, y) + \bar{p}u_n,\ pQ(n, z) + \bar{p}v_n & Q(n, y),\ Q(n, z) \end{array} \tag{7.3}$$

$$\left(n = 1, 2, \ldots, N-1; \quad u_{N-1} = v_{N-1} = \frac{1}{N}\sum_{y=1}^{N} y = \frac{N+1}{2}\right).$$
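The payoffs $Q(n, y)$ entering the matrix (7.3) come from Lemma 7.6. As a numerical sanity check, the following Python sketch evaluates the sum over the distribution (7.1) directly and compares it with the closed form $\frac{N+1}{n+1} y$. The function names are illustrative and not part of the original text; exact rational arithmetic is used so that the comparison is free of rounding.

```python
from fractions import Fraction
from math import comb


def rank_distribution(N, n, y):
    """P{a_n = r | y_n = y} from (7.1), for r = y, ..., N - n + y."""
    total = comb(N, n)
    return {
        r: Fraction(comb(r - 1, y - 1) * comb(N - r, n - y), total)
        for r in range(y, N - n + y + 1)
    }


def expected_absolute_rank(N, n, y):
    """Direct evaluation of the sum that defines Q(n, y)."""
    return sum(r * p for r, p in rank_distribution(N, n, y).items())


def q_closed_form(N, n, y):
    """Closed form stated in Lemma 7.6: Q(n, y) = (N + 1) y / (n + 1)."""
    return Fraction((N + 1) * y, n + 1)
```

For instance, `expected_absolute_rank(10, 4, 2)` and `q_closed_form(10, 4, 2)` can be compared for equality; the same comparison can be looped over all admissible pairs $(n, y)$ for a small $N$.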
Without loss of generality, suppose that $1/2 \le p \le 1$.

Theorem 7.6 The optimal strategies of the players in the game with the matrix (7.3) have the following form:

player I chooses A (R), if $Q(n, y) \le (>)\ u_n$ (regardless of $z$),
player II chooses A (R), if $Q(n, z) \le (>)\ v_n$ (regardless of $y$).

The quantities $u_n, v_n$ satisfy the recurrent equations

$$u_{n-1} = pE[Q(n, y_n) \wedge u_n] + \bar{p}E\left[\frac{N+1}{2} I\{Q(n, z_n) \le v_n\} + u_n I\{Q(n, z_n) > v_n\}\right], \tag{7.4}$$

$$v_{n-1} = \bar{p}E[Q(n, z_n) \wedge v_n] + pE\left[\frac{N+1}{2} I\{Q(n, y_n) \le u_n\} + v_n I\{Q(n, y_n) > u_n\}\right] \tag{7.5}$$

$$\left(n = 1, 2, \ldots, N-1; \quad u_{N-1} = v_{N-1} = \frac{N+1}{2}\right),$$

where $I\{C\}$ means the indicator of the event $C$. The optimal payoffs in the game $\Gamma_N(p)$ are $U_N = u_0$ and $V_N = v_0$.

Proof: Obviously, for any $(y, z) \in \{1, \ldots, n\} \times \{1, \ldots, n\}$, the bimatrix game (7.3) admits the pure strategy equilibrium defined by

$$\begin{array}{c|cc} & Q(n, z) > v_n & Q(n, z) \le v_n \\ \hline Q(n, y) > u_n & \text{R-R}:\ u,\ v & \text{R-A}:\ \bar{p}Q(n, y) + pu,\ \bar{p}Q(n, z) + pv \\ Q(n, y) \le u_n & \text{A-R}:\ pQ(n, y) + \bar{p}u,\ pQ(n, z) + \bar{p}v & \text{A-A}:\ Q(n, y),\ Q(n, z) \end{array} \tag{7.6}$$

In each cell, we have the payoffs of players I and II, where the indexes of $u_n, v_n$ are omitted for simplicity. Consider component 1 only and sum up all payoffs multiplied by $n^{-2}$:

$$n^{-2}\sum_{y,z=1}^{n} Q(n, y)\big[I\{Q(n, y) \le u, Q(n, z) \le v\} + pI\{Q(n, y) \le u, Q(n, z) > v\} + \bar{p}I\{Q(n, y) > u, Q(n, z) \le v\}\big]$$
$$+\ n^{-2}u\sum_{y,z=1}^{n} \big[\bar{p}I\{Q(n, y) \le u, Q(n, z) > v\} + pI\{Q(n, y) > u, Q(n, z) \le v\} + I\{Q(n, y) > u, Q(n, z) > v\}\big]. \tag{7.7}$$

The first sum in (7.7) equals

$$n^{-2}\sum_{y,z=1}^{n} Q(n, y)\big[pI\{Q(n, y) \le u\} + \bar{p}I\{Q(n, z) \le v\}\big] = pn^{-1}\sum_{y=1}^{n} Q(n, y) I\{Q(n, y) \le u\} + \bar{p}n^{-1}\sum_{z=1}^{n} \frac{N+1}{2} I\{Q(n, z) \le v\}, \tag{7.8}$$

so far as

$$n^{-1}\sum_{y=1}^{n} Q(n, y) = \frac{1}{n}\sum_{y=1}^{n} \frac{N+1}{n+1}\, y = \frac{N+1}{2}.$$

Consider the second sum in (7.7):

$$n^{-2}u\sum_{y,z=1}^{n} \big[pI\{Q(n, y) > u\} + \bar{p}I\{Q(n, z) > v\}\big] = n^{-1}u\left[p\sum_{y=1}^{n} I\{Q(n, y) > u\} + \bar{p}\sum_{z=1}^{n} I\{Q(n, z) > v\}\right]. \tag{7.9}$$

By substituting (7.8) and (7.9) into (7.7), we obtain (7.4). Similarly, readers can establish the representation (7.5).
The proof of Theorem 7.6 is finished.

Introduce the designation $\bar{y}_n = u_n \frac{n+1}{N+1}$ and $\bar{z}_n = v_n \frac{n+1}{N+1}$ for $n = 0, 1, \ldots, N-1$ to reexpress the system (7.4)–(7.5) as

$$\bar{y}_{n-1} = \frac{p}{n+1}\left[\frac{1}{2}[\bar{y}_n]([\bar{y}_n] + 1) + \bar{y}_n(n - [\bar{y}_n])\right] + \frac{\bar{p}}{n+1}\left[\frac{1}{2}(n+1)[\bar{z}_n] + \bar{y}_n(n - [\bar{z}_n])\right],$$

$$\bar{z}_{n-1} = \frac{\bar{p}}{n+1}\left[\frac{1}{2}[\bar{z}_n]([\bar{z}_n] + 1) + \bar{z}_n(n - [\bar{z}_n])\right] + \frac{p}{n+1}\left[\frac{1}{2}(n+1)[\bar{y}_n] + \bar{z}_n(n - [\bar{y}_n])\right].$$

Here $[y]$ indicates the integer part of $y$, and $\bar{y}_{N-1} = \bar{z}_{N-1} = N/2$. Note that states with candidate acceptance, i.e.,

$$Q(n, y) = \frac{N+1}{n+1}\, y \le u_n, \quad Q(n, z) = \frac{N+1}{n+1}\, z \le v_n,$$

acquire the following form: $y \le \bar{y}_n$, $z \le \bar{z}_n$. In other words, $\bar{y}_n, \bar{z}_n$ represent the optimal thresholds for accepting candidates with the given relative ranks.

Under $p = 1/2$, the values of $\bar{y}_n$ and $\bar{z}_n$ do coincide. Denote these values by $x_n$; they meet the recurrent expressions

$$x_{n-1} = x_n + \frac{[x_n]}{4} - \frac{1}{n+1}([x_n] + 1)\left(x_n - \frac{[x_n]}{4}\right), \quad n = 1, \ldots, N-1. \tag{7.10}$$

We investigate their behavior for large $N$.

Theorem 7.7 Under $N \ge 10$, the following inequalities hold true:

$$\frac{n+1}{3} \le x_n \le \frac{n}{2} \quad (n = 5, \ldots, N-2). \tag{7.11}$$

Proof: It follows from (7.10) that

$$x_{N-2} = \frac{N}{2} + \frac{[N/2]}{4} - \frac{1}{N}([N/2] + 1)\left(\frac{N}{2} - \frac{[N/2]}{4}\right).$$

Clearly, $x_{N-2} \ge (N-1)/3$ for all $N$, and $x_{N-2} \le (N-2)/2$ provided that $N \ge 10$. Therefore, formula (7.11) remains in force for $n = N - 2$. Suppose that these conditions take place for $6 \le n \le N-2$. Below we demonstrate their validity for $n - 1$.

Introduce the operator

$$T_x(s) = x + \frac{s}{4} - \frac{1}{n+1}(s + 1)\left(x - \frac{s}{4}\right).$$

Then for $x - 1 \le s \le x$ we have

$$T'_x(s) = \frac{1}{4(n+1)}(2s - 4x + 2 + n) \ge \frac{1}{4(n+1)}(2(x - 1) - 4x + 2 + n) = \frac{1}{4(n+1)}(-2x + n).$$

So long as $x_n \le n/2$, the inequality $T'_{x_n}(s) \ge 0$ is the case for $x_n - 1 \le s \le x_n$. Hence,

$$x_{n-1} = T_{x_n}([x_n]) \le T_{x_n}(x_n) = \frac{1}{4}x_n\left(5 - \frac{3}{n+1}(x_n + 1)\right) \le \frac{n}{8}\left(5 - \frac{3(n+2)}{2(n+1)}\right) \le \frac{n-1}{2}, \quad \text{for } n \ge 6.$$

Moreover, owing to $x_n \ge (n+1)/3$, we obtain

$$x_{n-1} = T_{x_n}([x_n]) \ge T_{x_n}(x_n - 1) = \frac{5x_n - 1}{4} - \frac{x_n(3x_n + 1)}{4(n+1)} \ge \frac{5(n+1) - 3}{12} - \frac{n+2}{12} \ge \frac{n}{3}.$$
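The recursion (7.10) is easy to iterate numerically. The sketch below is an illustration rather than part of the original text: it starts from $x_{N-1} = N/2$ and steps backward to $x_0$, using exact fractions so that the integer part $[x_n]$ is not affected by rounding. The function name is an assumption made for this example.

```python
from fractions import Fraction


def thresholds_fair_lottery(N):
    """Iterate (7.10) backward from x_{N-1} = N/2.

    Returns the list [x_0, x_1, ..., x_{N-1}] for the case p = 1/2.
    """
    x = [Fraction(0)] * N
    x[N - 1] = Fraction(N, 2)
    for n in range(N - 1, 0, -1):
        xn = x[n]
        s = xn.numerator // xn.denominator  # integer part [x_n], x_n > 0
        x[n - 1] = xn + Fraction(s, 4) - Fraction(s + 1, n + 1) * (xn - Fraction(s, 4))
    return x
```

With `x = thresholds_fair_lottery(100)`, the bounds (7.11) can be checked for every $n = 5, \ldots, N-2$, and the expected rank $u_0 = x_0 (N + 1)$ can be read off from `x[0]`.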
Now, it is possible to estimate $x_0$.

Corollary 7.1 $0.387 \le x_0 \le 0.404$.

Theorem 7.7 implies that $2 \le x_5 \le 2.5$. Using (7.10), we find successively

$$1.75 \le x_4 \le 2, \quad 1.4 \le x_3 \le 1.6, \quad 1.075 \le x_2 \le 1.175, \quad 0.775 \le x_1 \le 0.808,$$

and, finally, $0.387 \le x_0 \le 0.404$.

The following fact is well-known in the theory of optimal stopping. In the non-game setting of the problem ($p = 1$), there exists the limit value of $U(n)$ as $n \to \infty$. It equals

$$\prod_{j=1}^{\infty}\left(1 + \frac{2}{j}\right)^{\frac{1}{j+1}} \approx 3.8695.$$

Therefore, a priority player guarantees a secretary whose mean rank is smaller than 4. Moreover, for player 2 this secretary possesses the rank of 50% among all candidates.

In the case of $p < 1$, such limit value does not exist. For instance, if $p = 1/2$, the corollary of Theorem 7.7 claims that $x_0 \ge 0.387$. And it appears that $U(N) = u_0 = x_0(N + 1) \ge 0.387(N + 1) \to \infty$ as $N \to \infty$. Table 7.2 combines the limit values $y_0 = \lim_{n \to \infty}\{U(n)/n\}$ and $z_0 = \lim_{n \to \infty}\{V(n)/n\}$ under different values of $p$.

Table 7.2 The limit values $y_0 = U(N)/N$, $z_0 = V(N)/N$ under different values of $p$.

  p     0.5     0.6     0.7     0.8     0.9     1
  y0    0.390   0.368   0.337   0.305   0.249   0
  z0    0.390   0.407   0.397   0.422   0.438   0.5

Concluding this section, we compare the optimal payoffs with those ensured by other simple rules. If both players accept the first candidate, their payoffs (expected ranks) do coincide:

$$U_N = V_N = \frac{1}{N}\sum_{y=1}^{N} y = \frac{N+1}{2}.$$

However, random strategies (i.e., choosing between (A) and (R) with the same probability of $1/2$) lead to

$$u_{n-1} = E\left[\frac{1}{4}u_n + \frac{1}{4}(\bar{p}Q(n, y) + pu_n) + \frac{1}{4}(pQ(n, y) + \bar{p}u_n) + \frac{1}{4}Q(n, y)\right] = \frac{1}{2}\left[u_n + EQ(n, y)\right] = \frac{1}{2}\left[u_n + \frac{N+1}{2}\right]$$

$(n = 1, 2, \ldots, N-1)$, see Theorem 7.6. Then the condition $u_{N-1} = \frac{N+1}{2}$ implies that $u_0 = \ldots = u_{N-1} = \frac{N+1}{2}$. Similarly, $v_0 = \ldots = v_{N-1} = \frac{N+1}{2}$. Consequently, the first candidate strategy and the random strategy are equivalent. They both result in the candidate whose rank is the mean rank of all candidates (regardless of the players' priority $p$).
Still, the optimal strategies found above yield appreciably better payoffs (smaller expected ranks) to the players.

7.8 Best choice game with rank criterion. Voting

Consider the best choice game with $m$ participants and final decision by voting. Assume that a commission of $m$ players has to fill a vacancy. There exist $N$ pretenders for this vacancy. For each player, the pretenders are sorted by their absolute ranks (e.g., communicability, language qualifications, PC skills, etc.). A pretender possessing the lowest rank is actually the best one. Pretenders appear in the commission one-by-one randomly such that all $N!$ permutations are equiprobable. During an interview, each player observes the relative rank of a current pretender against preceding pretenders. The relative ranks are independent for different players. A pretender is accepted, if at least $k$ members of the commission agree to accept him (and the game ends). Otherwise, the pretender is rejected and the commission proceeds to the next pretender (the rejected one gets eliminated from further consideration). At shot $N$, the players have to accept the last pretender. Each player seeks to minimize the absolute rank of the selected pretender. We will find the optimal rule of decision making depending on the voting threshold $k$.

7.8.1 Solution in the case of three players

To begin, we take the case of $m = 3$ players. Imagine that a commission of three members has to fill a vacancy. There are $N$ pretenders sorted by three absolute ranks. During an interview, each player observes the relative rank of a current pretender against preceding pretenders. Based on this information, he decides whether to accept or reject the pretender. A pretender is accepted, if the majority of the members (here, 2) agree to accept him (and the game ends). The pretender is rejected provided that at least two players disagree.
And the commission proceeds to the next pretender (the rejected one gets eliminated from further consideration). At shot $N$, the players have to accept the last pretender.

Denote by $x_n$, $y_n$ and $z_n$ the relative ranks of the pretender at shot $n$ for player 1, player 2, and player 3, respectively. The sequence $\{(x_n, y_n, z_n)\}_{n=1}^{N}$ composed of independent random variables obeys the distribution law $P\{x_n = x, y_n = y, z_n = z\} = \frac{1}{n^3}$, where $x$, $y$, and $z$ take values from 1 to $n$.

After interviewing a current pretender, the players have to accept or reject him. Pretender $n$ being rejected, the players pass to pretender $n + 1$. If pretender $n$ is accepted, the game ends. In this case, the expected absolute rank for players 1–3 makes up $Q(n, x)$, $Q(n, y)$, and $Q(n, z)$, respectively. We have noticed that

$$Q(n, x) = \frac{N+1}{n+1}\, x.$$

When all pretenders except the last one are rejected, the players must accept the last pretender. Each player strives for minimization of his expected payoff.

Let $u_n, v_n, w_n$ designate the expected payoffs of players 1–3, respectively, provided that $n$ pretenders are skipped. At shot $n$, this game can be described by the following matrices; as their strategies, players choose between A ("accept") and R ("reject"). Rows correspond to the strategies of player 1, columns to the strategies of player 2.

Player 3 chooses R:

$$\begin{array}{c|cc} & \text{R} & \text{A} \\ \hline \text{R} & u_n,\ v_n,\ w_n & u_n,\ v_n,\ w_n \\ \text{A} & u_n,\ v_n,\ w_n & Q(n, x),\ Q(n, y),\ Q(n, z) \end{array}$$

Player 3 chooses A:

$$\begin{array}{c|cc} & \text{R} & \text{A} \\ \hline \text{R} & u_n,\ v_n,\ w_n & Q(n, x),\ Q(n, y),\ Q(n, z) \\ \text{A} & Q(n, x),\ Q(n, y),\ Q(n, z) & Q(n, x),\ Q(n, y),\ Q(n, z) \end{array}$$

According to the form of this matrix, strategy A dominates strategy R for players 1, 2, and 3 under $Q(n, x) \le u_n$, $Q(n, y) \le v_n$, and $Q(n, z) \le w_n$, respectively. Therefore, the optimal behavior of player 1 lies in accepting pretender $n$, if $Q(n, x) \le u_n$; by analogy, player 2 accepts pretender $n$ if $Q(n, y) \le v_n$, and player 3 accepts pretender $n$ if $Q(n, z) \le w_n$.
Then

$$u_{n-1} = \frac{1}{n^3}\sum_{x,y,z=1}^{n} Q(n, x)\big[I\{Q(n, x) \le u_n, Q(n, y) \le v_n, Q(n, z) \le w_n\} + I\{Q(n, x) \le u_n, Q(n, y) \le v_n, Q(n, z) > w_n\}$$
$$+\ I\{Q(n, x) \le u_n, Q(n, y) > v_n, Q(n, z) \le w_n\} + I\{Q(n, x) > u_n, Q(n, y) \le v_n, Q(n, z) \le w_n\}\big]$$
$$+\ \frac{1}{n^3}u_n\sum_{x,y,z=1}^{n} \big[I\{Q(n, x) > u_n, Q(n, y) > v_n, Q(n, z) > w_n\} + I\{Q(n, x) > u_n, Q(n, y) \le v_n, Q(n, z) > w_n\}$$
$$+\ I\{Q(n, x) \le u_n, Q(n, y) > v_n, Q(n, z) > w_n\} + I\{Q(n, x) > u_n, Q(n, y) > v_n, Q(n, z) \le w_n\}\big]$$

or

$$u_{n-1} = \frac{1}{n^2}\left[\sum_{x,y=1}^{n} Q(n, x) I\{Q(n, x) \le u_n, Q(n, y) \le v_n\} + \sum_{x,z=1}^{n} Q(n, x) I\{Q(n, x) \le u_n, Q(n, z) \le w_n\}\right.$$
$$\left. +\ \sum_{y,z=1}^{n} \frac{N+1}{2} I\{Q(n, y) \le v_n, Q(n, z) \le w_n\} - \frac{2}{n}\sum_{x,y,z=1}^{n} Q(n, x) I\{Q(n, x) \le u_n, Q(n, y) \le v_n, Q(n, z) \le w_n\}\right]$$
$$+\ \frac{1}{n^2}u_n\left[\sum_{x,y=1}^{n} I\{Q(n, x) > u_n, Q(n, y) > v_n\} + \sum_{x,z=1}^{n} I\{Q(n, x) > u_n, Q(n, z) > w_n\}\right.$$
$$\left. +\ \sum_{y,z=1}^{n} I\{Q(n, y) > v_n, Q(n, z) > w_n\} - \frac{2}{n}\sum_{x,y,z=1}^{n} I\{Q(n, x) > u_n, Q(n, y) > v_n, Q(n, z) > w_n\}\right],$$

where $n = 1, 2, \ldots, N-1$ and $u_{N-1} = \frac{1}{N}\sum_{x=1}^{N} x = \frac{N+1}{2}$. Here $I\{A\}$ is the indicator of the event $A$.

Owing to the problem's symmetry, $u_n = v_n = w_n$. And so, the optimal thresholds make up $\bar{x}_n = u_n \frac{n+1}{N+1}$. Consequently,

$$\bar{x}_{n-1} = u_{n-1}\frac{n}{N+1} = \frac{1}{n(N+1)}\left[\frac{N+1}{n+1}[\bar{x}_n]^2([\bar{x}_n] + 1) + \frac{N+1}{2}[\bar{x}_n]^2 - \frac{N+1}{n(n+1)}[\bar{x}_n]^3([\bar{x}_n] + 1)\right] + \frac{\bar{x}_n}{n(n+1)}\left[3(n - [\bar{x}_n])^2 - \frac{2}{n}(n - [\bar{x}_n])^3\right],$$

where $\bar{x}_{N-1} = \frac{N}{2}$, and $[x]$ means the integer part of $x$. Certain transformations lead to

$$\bar{x}_{n-1} = \frac{1}{2n^2(n+1)}\Big[[\bar{x}_n]^2\big(2([\bar{x}_n] + 1)(n - [\bar{x}_n]) + n(n+1)\big) + 2\bar{x}_n(n + 2[\bar{x}_n])(n - [\bar{x}_n])^2\Big].$$

By substituting $N = 100$ into this formula, we obtain the optimal expected rank of 33. Compare this quantity with the expected rank of 39.425 in the problem with two players ($p = 1/2$) and with the optimal rank of 3.869 in the non-game problem. Obviously, the voting procedure ensures a better result than the equiprobable scheme involving two players.

Theorem 7.8 Under $N \ge 19$, the optimal payoff in the best choice game with voting is better (the expected rank is smaller) than in the game with an arbitrator.
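As a numerical illustration of the recursion for $\bar{x}_n$ obtained above (majority voting of three players), the following Python sketch iterates it backward from $\bar{x}_{N-1} = N/2$. It is a minimal sketch under the symmetric-threshold assumption $u_n = v_n = w_n$ made in the text; the function name is hypothetical, and exact fractions are used so that the integer part is computed without rounding.

```python
from fractions import Fraction


def thresholds_majority_of_three(N):
    """Iterate the three-player threshold recursion backward.

    Returns [x_0, ..., x_{N-1}], where x_{N-1} = N/2 and
    u_0 = x_0 * (N + 1) is the expected absolute rank.
    """
    x = [Fraction(0)] * N
    x[N - 1] = Fraction(N, 2)
    for n in range(N - 1, 0, -1):
        xn = x[n]
        s = xn.numerator // xn.denominator  # integer part [x_n], x_n > 0
        accept_part = s * s * (2 * (s + 1) * (n - s) + n * (n + 1))
        reject_part = 2 * xn * (n + 2 * s) * (n - s) ** 2
        x[n - 1] = (accept_part + reject_part) / (2 * n * n * (n + 1))
    return x
```

Calling `thresholds_majority_of_three(100)` and multiplying the first entry by $N + 1 = 101$ gives the quantity that the text reports as the optimal expected rank for $N = 100$.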
Proof: It is required to demonstrate that, for $N \ge 19$, the inequality $\frac{n+2}{4} < \bar{x}_n < \frac{n-1}{2}$ holds true for $14 \le n \le N-2$. Apply the backward induction method. In the case of $N \ge 19$, we have $\frac{N}{4} < \bar{x}_{N-2} < \frac{N-3}{2}$. Suppose that the inequality takes place for $15 \le n \le N-2$. Prove its validity under $n - 1$, i.e., $\frac{n+1}{4} < \bar{x}_{n-1} < \frac{n-2}{2}$ for $15 \le n \le N-2$.

Introduce the operator

$$T(x, y) = \frac{1}{2n^2(n+1)}\Big[y^2\big(2(y + 1)(n - y) + n(n+1)\big) + 2x(n + 2y)(n - y)^2\Big],$$

where $x - 1 < y \le x$. Find the first derivative:

$$T'_y(x, y) = \frac{1}{n^2(n+1)}\big(-4y^3 + 3y^2(n - 1 + 2x) + y(n^2 + 3n - 6xn)\big) = \frac{y(n - y)(3 + n - 6x + 4y)}{n^2(n+1)}.$$

So far as $x - 1 < y \le x$ and $\frac{n+2}{4} < x < \frac{n-1}{2}$, we obtain $T'_y(x, y) > 0$. Then the function $T(x, y)$ increases in $y$. Hence, owing to $\bar{x}_n < \frac{n-1}{2}$,

$$\bar{x}_{n-1} = T(\bar{x}_n, [\bar{x}_n]) < T(\bar{x}_n, \bar{x}_n) = \frac{1}{2n^2(n+1)}\big(-2\bar{x}_n^4 + 2\bar{x}_n^3(n - 1 + 2\bar{x}_n) + \bar{x}_n^2(n^2 + 3n - 6\bar{x}_n n) + 2\bar{x}_n n^3\big) < \frac{(7n^2 - 3)(n - 1)}{16n^2} < \frac{n-2}{2} \quad \text{for } n \ge 9.$$

Similarly, so long as $\bar{x}_n > \frac{n+2}{4}$, the following inequality holds true:

$$\bar{x}_{n-1} = T(\bar{x}_n, [\bar{x}_n]) > T(\bar{x}_n, \bar{x}_n - 1) = \frac{1}{2n^2(n+1)}\big(2\bar{x}_n^4 - 2\bar{x}_n^3(2n + 3) + \bar{x}_n^2(n^2 + 9n + 6) + 2\bar{x}_n(n^3 - n^2 - 3n - 1) + n^2 + n\big)$$
$$> \frac{65n^4 + 116n^3 + 32n^2 - 16n - 16}{256n^2(n+1)} > \frac{n+1}{4} \quad \text{for } n \ge 19.$$

By taking into account the inequality $\frac{n+2}{4} < \bar{x}_n < \frac{n-1}{2}$ for $14 \le n \le N-2$ and $N \ge 19$, we get

$$4 < \bar{x}_{14} < 6.5, \quad 3.837 < \bar{x}_{13} < 5.650, \quad 3.580 < \bar{x}_{12} < 4.984, \quad \ldots,$$
$$1.838 < \bar{x}_5 < 2.029, \quad 1.526 < \bar{x}_4 < 1.736, \quad 1.230 < \bar{x}_3 < 1.372,$$
$$0.961 < \bar{x}_2 < 1.040, \quad 0.641 < \bar{x}_1 < 0.763, \quad 0.320 < \bar{x}_0 < 0.382.$$

Recall that $0.387 \le x_0 \le 0.404$ in the problem with two players. Thus, the derived thresholds in the case of three players are smaller than in the case of two players. Voting with three players guarantees a better result than fair lottery.

7.8.2 Solution in the case of m players

This subsection concentrates on the scenario with $m$ players. Designate by $x_n^j$ ($j = 1, \ldots, m$) the relative rank of pretender $n$ for player $j$.
Then the vector $\{(x_n^1, \ldots, x_n^m)\}_{n=1}^{N}$ possesses the distribution $P\{x_n^1 = x^1, \ldots, x_n^m = x^m\} = \frac{1}{n^m}$ for $x^l = 1, \ldots, n$, where $l = 1, \ldots, m$.

A current pretender is accepted, if at least $k$ members of the commission agree, $k = 1, \ldots, m$. If after the interview pretender $n$ is accepted, the game ends. In this case, the expected value of the absolute rank for player $j$ makes up the quantity

$$Q(n, x^j) = \frac{N+1}{n+1}\, x^j, \quad j = 1, \ldots, m.$$

Let $u_n^j$, $j = 1, \ldots, m$, indicate the expected payoff of player $j$ provided that $n$ pretenders are skipped. As above, the optimal strategy of player $j$ consists in accepting pretender $n$ if $Q(n, x^j) \le u_n^j$. Then

$$u_{n-1}^j = \frac{1}{n^m}\left[\sum_{x^1, x^2, \ldots, x^m = 1}^{n} Q(n, x^j)\big[J_m + J_{m-1} + \ldots + J_{k+1} + J_k\big] + u_n^j\sum_{x^1, x^2, \ldots, x^m = 1}^{n} \big[J_{k-1} + J_{k-2} + \ldots + J_0\big]\right],$$

where $J_l$ gives the number of all events when the pretender has been accepted by $l$ players exactly, $l = 0, 1, \ldots, m$.

Problem's symmetry dictates that $u_n^1 = u_n^2 = \ldots = u_n^m = u_n$. We set $x_n = \frac{n+1}{N+1}u_n$. The optimal strategies acquire the form

$$x_{n-1} = \frac{1}{2n^{m-1}(n+1)}\left\{\sum_{j=1}^{m-k}\left[\left(\binom{m}{j}([x_n] + 1 + n) - \binom{m-1}{j}n\right)[x_n]^{m-j}(n - [x_n])^j\right] + [x_n]^m([x_n] + 1)\right\}$$
$$+\ \frac{x_n}{n^{m-1}(n+1)}\left\{\sum_{j=1}^{k-1}\left[\binom{m}{j}[x_n]^j(n - [x_n])^{m-j}\right] + (n - [x_n])^m\right\};$$

$$u_n = x_n\frac{N+1}{n+1}; \quad x_{N-1} = \frac{N}{2};$$

where $n = 1, \ldots, N-1$, and $[x]$ corresponds to the integer part of $x$.

Table 7.3 The optimal expected absolute ranks.

  m    u0 (k = 1)   u0 (k = 2)   u0 (k = 3)   u0 (k = 4)   u0 (k = 5)   k*
  1    3.603                                                            1
  3    47.815       33.002       19.912                                 3
  4    49.275       44.967       26.335       27.317                    3
  5    49.919       47.478       40.868       26.076       33.429       4

Table 7.3 provides some numerical results for different $m$ and $k$ under $N = 100$. Clearly, the best result $k^*$ is achieved by the commission of three members. Interestingly, decision making by simple majority appears insufficient in small commissions.

7.9 Best mutual choice game

The previous sections have studied best choice games with decision making by just one side. However, a series of problems include mutual choice.
For instance, such problems arise in biology and sociology (mate choice problems), in economics (modeling of market relations between buyers and sellers) and other fields.

Let us imagine the following situation. There is a certain population of male and female individuals. Individuals choose each other depending on some quality index. Each individual seeks to maximize the mate's quality. It may happen that one mate accepts another, whereas the latter disagrees. Thus, the choice rule must concern both mates.

Suppose that the populations of both genders have identical sizes and their quality levels are uniformly distributed on the interval $[0, 1]$. Denote by $x$ and $y$ the quality level of females and males, respectively; accordingly, their populations are designated by $X$ and $Y$. Choose randomly two individuals of non-identical genders. This pair $(x, y)$ is called the state of the game. Each player has some threshold for the mate's quality level (he/she does not agree to pair with a mate whose quality level is below the threshold). If at least one mate disagrees, the pair is not formed and both mates return to their populations. If they both agree, the pair is formed and the mates leave their populations.

Consider a multi-shot game, which models random meetings of all individuals from these populations. After each shot, the number of individuals with high quality levels decreases, since they form pairs and leave their populations. Players have fewer opportunities to find mates with sufficiently high quality levels. Hence, the demands of the remaining players (their thresholds) must be reduced with each shot. Our analysis begins with the game of two shots.

7.9.1 The two-shot model of mutual choice

Consider the following situation. At shot 1, all players from the populations meet each other; if there are happy pairs, they leave the game.
At shot 2, the remaining players again meet each other randomly and form the pair regardless of the quality levels. Establish the optimal behavior of the players.

Assume that each player can receive the observations $x_1$ and $x_2$ ($y_1$ and $y_2$, respectively) and adopts the threshold rule $z$: $0 \le z \le 1$. Due to the problem's symmetry, we apply the same rule to both genders. If a current mate possesses a smaller quality level than $z$, the mate is rejected, and the players proceed to the next shot. If the quality levels of both mates are greater than or equal to $z$, the pair is formed, and the players leave the game. Imagine that at shot 1 same-gender players have the uniform distribution on the segment $[0, 1]$; after shot 1, this distribution changes, since some players with quality levels higher than $z$ leave the population (see Figure 7.9).

Figure 7.9 The two-shot model.

For instance, we find the above distribution for $x$. In the beginning of the game, the power of the player set $X$ makes up 1. After shot 1, the remaining players are the ones whose quality levels belong to $[0, z)$ and the share $(1 - z)z$ of the players whose quality levels are between $z$ and 1. Those players whose quality levels exceed $z$ and who meet a mate with a quality level above $z$ (they are $(1 - z)^2$ totally) leave the game. Therefore, just $z + (1 - z)z$ players of the same gender continue the game at shot 2. And the density function of players' distribution by their quality levels acquires the following form (see Figure 7.10):

$$f(x) = \begin{cases} \dfrac{1}{z + (1 - z)z}, & x \in [0, z), \\ \dfrac{z}{z + (1 - z)z}, & x \in [z, 1]. \end{cases}$$

Hence, if some player fails to find an appropriate pair at shot 1, he/she obtains the mean quality level of all opposite-gender mates at shot 2, i.e.,

$$E x_2 = \int_0^1 x f(x)\,dx = \int_0^z \frac{x}{z + (1 - z)z}\,dx + \int_z^1 \frac{zx}{z + (1 - z)z}\,dx.$$

By performing integration, we arrive at the formula

$$E x_2 = \frac{1 + z - z^2}{2(2 - z)}.$$

Get back to shot 1.
A player with a quality level y decides to choose a mate with a quality level x (and vice versa), if the quality level x appears greater or equal to the mean quality level Ex2 at the next shot. Therefore, the optimal threshold for mate choice at shot 1 obeys the equation z = 1 + z − z2 2(2 − z) . Figure 7.10 The density function. www.it-ebooks.info 272 MATHEMATICAL GAME THEORY AND APPLICATIONS Again, its solution z = (3 − √ 5)∕2 ≈ 0.382 has close connection with the golden section (z = 1 − z∗, where z∗ represents the golden section). 7.9.2 The multi-shot model of mutual choice Now, suppose that players have n + 1 shots for making pairs. Let player II adhere to a threshold strategy with the thresholds z1, … , zn, where 0 < zn ≤ zn−1 ≤ ... ≤ z1 ≤ z0 = 1. Evaluate the best response of player I and require that it coincides with the above threshold strategy. For this, we analyze possible variations of the players’ distribution by their quality levels after each shot. Initially, this distribution is uniform. Assume that the power of the player set equals N0 = 1. After shot 1, players with quality levels higher than z1 can create pairs and leave the game. Therefore, as shot 1 is finished, the mean power of the player set becomes N1 = z1+ (1 − z1)z1. This number can be rewritten as N1 = 2z1 − z2 1∕N0. After shot 2, players whose quality levels exceed z2 can find pairs and leave the game. And the mean power of the player set after shot 2 is given by N2 = z2 + (z1 − z2)z2∕N1 + (1 − z1)z1z2∕(N1N0). This quantity can be reexpressed as N2 = 2z2 − z2 2∕N1. Apply such reasoning further to find that, after shot i, the number of remaining players is Ni = zi + i−1∑ j=1 (zj − zj+1) i−1∏ k=j zk+1 Nk , i = 1, … , n. For convenience, rewrite the above formula in the recurrent form: Ni = 2zi − z2 i Ni−1 , i = 1, … , n. 
(9.1)

After each shot, the distribution of players by quality level has the following density function. After shot 1,

$$f_1(x) = \begin{cases} 1/N_1, & 0 \le x < z_1, \\ z_1/N_1, & z_1 \le x \le 1; \end{cases}$$

after shot 2,

$$f_2(x) = \begin{cases} \dfrac{1}{N_2}, & 0 \le x < z_2, \\[2mm] \dfrac{z_2}{N_1 N_2}, & z_2 \le x < z_1, \\[2mm] \dfrac{z_1 z_2}{N_0 N_1 N_2}, & z_1 \le x \le 1; \end{cases}$$

and finally, after shot $i$, $i = 1, \dots, n$,

$$f_i(x) = \begin{cases} \dfrac{1}{N_i}, & 0 \le x < z_i, \\[2mm] \dfrac{1}{N_i} \displaystyle\prod_{j=k}^{i-1} \frac{z_{j+1}}{N_j}, & z_{k+1} \le x < z_k, \quad k = i - 1, \dots, 0, \end{cases}$$

where $z_0 = 1$ and $N_0 = 1$.

Now, apply the backward induction method and consider the optimality equation. Denote by $v_i(x)$, $i = 1, \dots, n$, the optimal expected payoff of the player after shot $i$ provided that he/she deals with a mate of the quality level $x$. Suppose that, at shot $n$, the player observes a mate of the quality level $x$. If the player continues, he/she expects the quality level $Ex_{n+1}$, where $x_{n+1}$ obeys the distribution $f_n(x)$. Hence,

$$v_n(x) = \max\left\{ x, \int_0^1 y f_n(y)\,dy \right\},$$

or

$$v_n(x) = \max\left\{ x, \int_0^{z_n} \frac{y}{N_n}\,dy + \int_{z_n}^{z_{n-1}} \frac{z_n y}{N_n N_{n-1}}\,dy + \dots + \int_{z_1}^{1} \frac{z_n \cdots z_1\, y}{N_n \cdots N_1}\,dy \right\}. \quad (9.2)$$

The maximand in equation (9.2) comprises an increasing function and a constant function. They intersect in one point, which represents the optimal threshold for accepting the candidate at shot $n$. Let us require that this value coincides with $z_n$. This condition yields the equation

$$z_n = \int_0^{z_n} \frac{y}{N_n}\,dy + \int_{z_n}^{z_{n-1}} \frac{z_n y}{N_n N_{n-1}}\,dy + \dots + \int_{z_1}^{1} \frac{z_n \cdots z_1\, y}{N_n \cdots N_1}\,dy,$$

whence it follows that

$$z_n = \frac{z_n^2}{2N_n} + \frac{z_n \left( z_{n-1}^2 - z_n^2 \right)}{2 N_n N_{n-1}} + \dots + \frac{z_n \cdots z_1 \left( 1 - z_1^2 \right)}{2 N_n \cdots N_1}. \quad (9.3)$$

Then the function $v_n(x)$ acquires the form

$$v_n(x) = \begin{cases} z_n, & 0 \le x < z_n, \\ x, & z_n \le x \le 1. \end{cases}$$

Pass to shot $n - 1$. Assume that the player meets a mate with the quality level $x$. Under continuation, his/her expected payoff equals $Ev_n(x_n)$, where the function $v_n(x)$ has been obtained earlier and the expectation is taken with respect to the distribution $f_{n-1}(x)$.
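The optimality equation at shot $n - 1$ is defined by

$$v_{n-1}(x) = \max\left\{ x, \int_0^{z_n} \frac{z_n}{N_{n-1}}\,dy + \int_{z_n}^{z_{n-1}} \frac{y}{N_{n-1}}\,dy + \dots + \int_{z_1}^{1} \frac{z_{n-1} \cdots z_1\, y}{N_{n-1} \cdots N_1}\,dy \right\}.$$

Require that the threshold determining the optimal choice at shot $n - 1$ coincides with $z_{n-1}$. This yields

$$z_{n-1} = \frac{z_n^2}{N_{n-1}} + \frac{z_{n-1}^2 - z_n^2}{2N_{n-1}} + \frac{z_{n-1}\left( z_{n-2}^2 - z_{n-1}^2 \right)}{2 N_{n-1} N_{n-2}} + \dots + \frac{z_{n-1} \cdots z_1 \left( 1 - z_1^2 \right)}{2 N_{n-1} \cdots N_1}. \quad (9.4)$$

Repeat such arguments to arrive at the following conclusion. The optimality equation at shot $i$, i.e., $v_i(x) = \max\{x, Ev_{i+1}(x_{i+1})\}$, gives the expression

$$z_i = \frac{1}{2N_i} \left[ z_i^2 + z_{i+1}^2 + \sum_{k=0}^{i-1} \left( z_k^2 - z_{k+1}^2 \right) \prod_{j=k}^{i-1} \frac{z_{j+1}}{N_j} \right], \quad i = 1, \dots, n - 1. \quad (9.5)$$

Next, compare the two equations for $z_{i+1}$ and $z_i$. According to (9.5),

$$z_{i+1} = \frac{1}{2N_{i+1}} \left[ z_{i+1}^2 + z_{i+2}^2 + \sum_{k=0}^{i} \left( z_k^2 - z_{k+1}^2 \right) \prod_{j=k}^{i} \frac{z_{j+1}}{N_j} \right].$$

Rewrite this equation as

$$z_{i+1} = \frac{1}{2N_{i+1}} \left[ z_{i+1}^2 + z_{i+2}^2 + \left( z_i^2 - z_{i+1}^2 \right) \frac{z_{i+1}}{N_i} + \sum_{k=0}^{i-1} \left( z_k^2 - z_{k+1}^2 \right) \frac{z_{i+1}}{N_i} \prod_{j=k}^{i-1} \frac{z_{j+1}}{N_j} \right].$$

Multiplication by $2 \prod_{j=1}^{i+1} N_j$ leads to

$$2 \prod_{j=1}^{i+1} N_j \, z_{i+1} = \left( z_{i+1}^2 + z_{i+2}^2 \right) \prod_{j=1}^{i} N_j + \left( z_i^2 - z_{i+1}^2 \right) z_{i+1} \prod_{j=1}^{i-1} N_j + \sum_{k=0}^{i-1} \left( z_k^2 - z_{k+1}^2 \right) \prod_{j=k}^{i} z_{j+1} \prod_{j=1}^{k-1} N_j. \quad (9.6)$$

On the other hand, formula (9.1) implies that

$$2 \prod_{j=1}^{i+1} N_j = 2 \prod_{j=1}^{i-1} N_j \left( 2 z_{i+1} N_i - z_{i+1}^2 \right).$$

By substituting this result into (9.6), we have

$$4 \prod_{j=1}^{i} N_j \, z_{i+1}^2 = \left( z_{i+1}^2 + z_{i+2}^2 \right) \prod_{j=1}^{i} N_j + \left( z_i^2 + z_{i+1}^2 \right) z_{i+1} \prod_{j=1}^{i-1} N_j + \sum_{k=0}^{i-1} \left( z_k^2 - z_{k+1}^2 \right) \prod_{j=k}^{i} z_{j+1} \prod_{j=1}^{k-1} N_j.$$

Comparison with equation (9.5) yields the expression

$$4 z_{i+1}^2 = z_{i+1}^2 + z_{i+2}^2 + 2 z_i z_{i+1}.$$

And so,

$$z_i = \frac{3}{2} z_{i+1} - \frac{1}{2} \frac{z_{i+2}^2}{z_{i+1}}, \quad i = n - 2, \dots, 1. \quad (9.7)$$

Taking into account (9.1), we compare (9.3) with (9.4) to get

$$z_n = \frac{2}{3} z_{n-1}.$$

Then formula (9.7) yields

$$z_{n-2} = \frac{3}{2} z_{n-1} - \frac{1}{2} \left( \frac{z_n}{z_{n-1}} \right)^2 z_{n-1} = \frac{1}{2} \left( 3 - \frac{4}{9} \right) z_{n-1},$$

or

$$z_{n-1} = \frac{2}{3 - 4/9} \cdot z_{n-2}.$$

The following recurrent expressions hold true:

$$z_i = a_i z_{i-1}, \quad i = 2, \dots, n,$$

where the coefficients $a_i$ satisfy

$$a_i = \frac{2}{3 - a_{i+1}^2}, \quad i = 1, \dots, n - 1, \quad (9.8)$$

and $a_n = 2/3$.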
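The backward recurrence (9.8) is easy to evaluate numerically. The following minimal sketch, in which the function name `threshold_ratios` is an illustrative assumption and not part of the text, starts from $a_n = 2/3$ and steps down to $a_1$:

```python
def threshold_ratios(n):
    """Return [a_1, ..., a_n], where a_n = 2/3 and a_i = 2 / (3 - a_{i+1}^2)."""
    a = [0.0] * (n + 1)          # index 0 is unused so that a[i] matches a_i
    a[n] = 2.0 / 3.0
    for i in range(n - 1, 0, -1):  # backward recurrence (9.8)
        a[i] = 2.0 / (3.0 - a[i + 1] ** 2)
    return a[1:]


ratios = threshold_ratios(10)
for i, a_i in enumerate(ratios, start=1):
    print(f"a_{i} = {a_i:.3f}")
```

Under this recurrence every coefficient stays below 1 and the sequence decreases in $i$, which is what makes the thresholds $z_i = a_i z_{i-1}$ decrease from shot to shot.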
Formulas (9.8) uniquely define the coefficients $a_i$, $i = 1, \dots, n$. Unique determination of $z_i$, $i = 1, \dots, n$, calls for specifying one of these quantities. We define $z_1$ using equation (9.5). Notably,

$$z_1 = \frac{1}{2N_1} \left[ z_1^2 + z_2^2 + \left( 1 - z_1^2 \right) z_1 \right].$$

Since $z_2 = a_2 z_1$ and $N_1 = 2z_1 - z_1^2$, it appears that

$$2 \left( 2z_1 - z_1^2 \right) z_1 = z_1^2 + a_2^2 z_1^2 + \left( 1 - z_1^2 \right) z_1.$$

We naturally arrive at a quadratic equation in $z_1$:

$$z_1^2 + z_1 \left( a_2^2 - 3 \right) + 1 = 0.$$

Since formulas (9.8) give $a_2^2 - 3 = -2/a_1$, the following equation arises immediately:

$$z_1^2 - \frac{2 z_1}{a_1} + 1 = 0.$$

Hence,

$$z_1 = \frac{1}{a_1} \left( 1 - \sqrt{1 - a_1^2} \right).$$

Let us summarize the procedure. First, find the coefficients $a_i$, $i = n, n - 1, \dots, 1$. Next, evaluate $z_1$, and compute recurrently the optimal thresholds $z_2, \dots, z_n$. For instance, calculations in the case of $n = 10$ are presented in Table 7.4. Clearly, the optimal thresholds decrease monotonically. This is natural: the requirements on a mate's quality level must go down as the game evolves.

Table 7.4 The optimal thresholds in the mutual best choice problem.
i     1      2      3      4      5      6      7      8      9      10
a_i   0.940  0.934  0.927  0.918  0.907  0.891  0.870  0.837  0.782  0.666
z_i   0.702  0.656  0.608  0.559  0.507  0.452  0.398  0.329  0.308  0.205

Exercises

1. Two players observe the random walk of a particle. It starts in position 0 and moves to the right by unity with some probability p or gets absorbed in state 0 with the probability q = 1 − p. The player who stops the random walk in the rightmost position becomes the winner. Find the optimal strategies of the players.

2. Within the framework of exercise no. 1, suppose that each player observes his personal random Bernoulli sequence. These sequences are independent. Establish the optimal strategies of the players.

3. Evaluate an equilibrium in exercise no. 2 in the case of dependent observations.

4. Best choice game with incomplete information. Two players observe a sequence of candidates (pretenders) for the position of a secretary.
The candidates come in a random order. The players move in sequence: first player I, then player II. A candidate may decline the position with the probability p. Find the optimal strategies of the players.

5. Two players receive observations representing independent random walks on the set {0, 1, … , k} with absorption in the extreme states. In each state, a random walk moves by unity to the right (to the left) with the probability p (with the probability 1 − p, respectively). The winner is the player whose walk terminates in a state lying to the right of the terminal state of the opponent's walk. Find the optimal strategies of the players.

6. Evaluate an equilibrium in exercise no. 5 in the following case. Random walks in the extreme states are absorbed with a given probability 𝛽 < 1.

7. Best choice game with complete information. Two players observe a sequence of independent random variables x1, x2, … , x𝜃, x𝜃+1, … , xn, where at a random time moment 𝜃 the distribution of the random variables switches from p0(x) to p1(x). First, the decision to stop is made by player I and then by player II. The players strive to select the observation with the maximal value. Find the optimal strategies of the players.

8. Consider the game described in exercise no. 7, but with random priority of the players. An observation is shown to player I with the probability p and to player II with the probability 1 − p. Find the optimal strategies of the players.

9. Best choice game with partial information. Two players receive observations representing independent identically distributed random variables. The players are unaware of the exact values of these observations. The only available knowledge is whether an observation exceeds a given threshold or not. The first move belongs to player I. Both players employ one-threshold strategies. The winner is the player who terminates observations on a higher value than the opponent.
Find the optimal strategies and the value of this game.

10. Within the framework of exercise no. 9, assume that the priority of the players is defined by a random mechanism. Each observation is shown to player I with the probability p and to player II with the probability 1 − p. Find the optimal strategies of the players.

8 Cooperative games

Mathematical Game Theory and Applications, First Edition. Vladimir Mazalov. © 2014 John Wiley & Sons, Ltd. Published 2014 by John Wiley & Sons, Ltd. Companion website: http://www.wiley.com/go/game_theory

Introduction

In the previous chapters, we have considered games where each player pursues individual interests. In other words, players do not cooperate to increase their payoffs. Chapter 8 concentrates on games where players may form coalitions. The major problem here lies in distributing the gained payoff among the members of a coalition. The set $N = \{1, 2, \dots, n\}$ will be called the grand coalition. Denote by $2^N$ the set of all its subsets, and let $|S|$ designate the number of elements in a set $S$.

8.1 Equivalence of cooperative games

Definition 8.1 A cooperative game of $n$ players is a pair $\Gamma = \langle N, v \rangle$, where $N = \{1, 2, \dots, n\}$ indicates the set of players and $v: 2^N \to \mathbb{R}$ is a mapping which assigns to each coalition $S \in 2^N$ a certain number such that $v(\emptyset) = 0$. The function $v$ is said to be the characteristic function of the cooperative game.

Generally, characteristic functions of cooperative games are assumed superadditive, i.e., for any coalitions $S$ and $T$ such that $S \cap T = \emptyset$ the following condition holds true:

$$v(S \cup T) \ge v(S) + v(T). \quad (1.1)$$

This is a natural requirement stimulating players to build coalitions. Suppose that inequality (1.1) becomes an equality for all non-intersecting coalitions $S$ and $T$; then the corresponding characteristic function is called additive. Note that additive characteristic functions satisfy the formula $v(N) = \sum_{i \in N} v(i)$.
In this case, distribution runs in a natural way: each player receives his payoff $v(i)$. Such games are said to be inessential. In the sequel, we analyze only essential games that meet the inequality

$$v(N) > \sum_{i \in N} v(i). \quad (1.2)$$

Let us provide some examples.

Example 8.1 (A jazz band) A restaurateur invites a jazz band to perform for an evening and offers 100 USD. The jazz band consists of three musicians, namely, a pianist (player 1), a vocalist (player 2), and a drummer (player 3). They should distribute the fee. An argument during such negotiations is the characteristic function $v$ defined by the individual honoraria the players may receive by performing singly (e.g., $v(1) = 40$, $v(2) = 30$, $v(3) = 0$) or in pairs (e.g., $v(1, 2) = 80$, $v(1, 3) = 60$, $v(2, 3) = 50$).

Example 8.2 (The glove market) Consider the glove set $N = \{1, 2, \dots, n\}$, which includes left gloves (the subset $L$) and right gloves (the subset $R$). A single glove is worth nothing, whereas the price of a pair is 1 USD. Here the cooperative game $\langle N, v \rangle$ can be determined by the characteristic function

$$v(S) = \min\{|S \cap L|, |S \cap R|\}, \quad S \in 2^N.$$

Actually, it represents the number of glove pairs that can be formed from the set $S$.

Example 8.3 (Scheduling) Consider the set of players $N = \{1, 2, \dots, n\}$. Suppose that player $i$ has a machine $M_i$ and some production order $J_i$. It can be executed (a) by player $i$ on his machine during the period $t_{ii}$ or (b) by the coalition of players $i$ and $j$ on the machine $M_j$ during the period $t_{ij}$. The cost matrix $T = \{t_{ij}\}$, $i, j = 1, \dots, n$, is given. For any coalition $S \in 2^N$, it is then possible to calculate the total costs representing the minimal costs over all permutations $\sigma$ of the players entering the coalition $S$, i.e.,

$$t(S) = \min_{\sigma} \sum_{i \in S} t_{i\sigma(i)}.$$

The characteristic function $v(S)$ in this game can be specified by the total time saved by the coalition (against the case when each player executes his order on his machine).

Example 8.4 (Road construction) Farmers agree to construct a road connecting all farms with a city.
Construction of each segment of the road incurs definite costs; therefore, it seems beneficial to construct roads by cooperation. Each farm has a specific income from selling its agricultural products in the city. What is the appropriate cost sharing among the farmers?

Prior to obtaining solutions, we partition the set of cooperative games into equivalence classes.

Definition 8.2 Two cooperative games $\Gamma_1 = \langle N, v_1 \rangle$ and $\Gamma_2 = \langle N, v_2 \rangle$ are called equivalent, if there exist constants $\alpha > 0$ and $c_i$, $i = 1, \dots, n$, such that

$$v_1(S) = \alpha v_2(S) + \sum_{i \in S} c_i$$

for any coalition $S \in 2^N$. In this case, we write $\Gamma_1 \sim \Gamma_2$.

Clearly, the relation $\sim$ represents an equivalence relation.

1. $v \sim v$ (reflexivity). This takes place under $\alpha = 1$ and $c_i = 0$, $i = 1, \dots, n$.

2. $v \sim v' \Rightarrow v' \sim v$ (symmetry). By setting $\alpha' = 1/\alpha$ and $c'_i = -c_i/\alpha$, we have $v'(S) = \alpha' v(S) + \sum_{i \in S} c'_i$, $S \in 2^N$, i.e., $v' \sim v$.

3. $v \sim v_1$, $v_1 \sim v_2 \Rightarrow v \sim v_2$ (transitivity). Indeed, $v(S) = \alpha v_1(S) + \sum_{i \in S} c_i$ and $v_1(S) = \alpha_1 v_2(S) + \sum_{i \in S} c'_i$. Hence, $v(S) = \alpha \alpha_1 v_2(S) + \sum_{i \in S} (\alpha c'_i + c_i)$.

Consequently, $\sim$ is an equivalence relation. All cooperative games are decomposed into equivalence classes, and it suffices to solve one game from a given class. Clearly, all inessential games are equivalent to games with zero characteristic function.

It is convenient to find solutions for cooperative games in the 0-1 form.

Definition 8.3 A cooperative game in the 0-1 form is a game $\Gamma = \langle N, v \rangle$, where $v(i) = 0$, $i = 1, \dots, n$, and $v(N) = 1$.

Theorem 8.1 Any essential cooperative game is equivalent to a certain game in the 0-1 form.

Proof: It suffices to demonstrate that there exist constants $\alpha > 0$ and $c_i$, $i = 1, \dots, n$, such that

$$\alpha v(i) + c_i = 0, \quad i = 1, \dots, n, \qquad \alpha v(N) + \sum_{i \in N} c_i = 1. \quad (1.3)$$

The system (1.3) uniquely determines these quantities:

$$\alpha = \Big[ v(N) - \sum_{i \in N} v(i) \Big]^{-1}, \qquad c_i = -v(i) \Big[ v(N) - \sum_{i \in N} v(i) \Big]^{-1}.$$

Note that, by virtue of (1.2), we have $\alpha > 0$.
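The constants from the proof of Theorem 8.1 can be applied mechanically. The sketch below is only an illustration under stated assumptions: the function name `to_zero_one_form` and the representation of a characteristic function as a dictionary keyed by `frozenset` coalitions are choices made here, not notation from the text. It uses the jazz band values of Example 8.1 and exact rational arithmetic.

```python
from fractions import Fraction


def to_zero_one_form(players, v):
    """Return v'(S) = alpha * v(S) + sum_{i in S} c_i with v'(i) = 0 and v'(N) = 1."""
    grand = frozenset(players)
    surplus = v[grand] - sum(v[frozenset([i])] for i in players)
    if surplus <= 0:
        raise ValueError("the game is inessential, so no 0-1 form exists")
    alpha = Fraction(1) / surplus                       # alpha from system (1.3)
    c = {i: -alpha * v[frozenset([i])] for i in players}  # c_i from system (1.3)
    return {S: alpha * v[S] + sum(c[i] for i in S) for S in v}


# Characteristic function of the jazz band game (Example 8.1).
jazz = {
    frozenset(): 0,
    frozenset({1}): 40, frozenset({2}): 30, frozenset({3}): 0,
    frozenset({1, 2}): 80, frozenset({1, 3}): 60, frozenset({2, 3}): 50,
    frozenset({1, 2, 3}): 100,
}
jazz01 = to_zero_one_form([1, 2, 3], jazz)
for S in sorted(jazz01, key=lambda s: (len(s), sorted(s))):
    print(sorted(S), jazz01[S])
```

Working with `Fraction` keeps values such as 1/3 exact, which makes it easy to compare the output with hand calculations.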
8.2 Imputations and core

Now, we define the solution of a cooperative game. The solution of a cooperative game is understood as some distribution of the total payoff $v(N)$ gained by the grand coalition.

Definition 8.4 An imputation in the cooperative game $\Gamma = \langle N, v \rangle$ is a vector $x = (x_1, \dots, x_n)$ such that

$$x_i \ge v(i), \quad i = 1, \dots, n, \quad (2.1)$$

$$\sum_{i \in N} x_i = v(N). \quad (2.2)$$

According to the condition (2.1) (the property of individual rationality), each player receives no less than he can obtain on his own. The condition (2.2) is called the property of efficiency. The latter presumes that (a) it is unreasonable to distribute less than the grand coalition can receive and (b) it is impossible to distribute more than $v(N)$. We will designate the set of all imputations by $D(v)$. For equivalent characteristic functions $v$ and $v'$ such that $v(S) = \alpha v'(S) + \sum_{i \in S} c_i$, $S \in 2^N$, imputations are naturally interconnected: $x_i = \alpha x'_i + c_i$, $i \in N$. Interestingly, the set of imputations for cooperative games in the 0-1 form represents the simplex $D(v) = \{x : \sum_{i \in N} x_i = 1,\ x_i \ge 0,\ i = 1, \dots, n\}$ in $\mathbb{R}^n$.

There exist several optimality principles for choosing a point or a set of points in the set $D(v)$ that guarantee an acceptable solution of the payoff distribution problem in the grand coalition. We begin with the definition of the core. First, introduce the notion of dominated imputations.

Definition 8.5 An imputation $x$ dominates an imputation $y$ in a coalition $S$ (which is denoted by $x \succ_S y$), if

$$x_i > y_i, \quad \forall i \in S, \quad (2.3)$$

and

$$\sum_{i \in S} x_i \le v(S). \quad (2.4)$$

The condition (2.3) implies that the imputation $x$ is preferable to the imputation $y$ for all members of the coalition $S$. On the other hand, the condition (2.4) means that the imputation $x$ is implementable by the coalition $S$.

Definition 8.6 We say that an imputation $x$ dominates an imputation $y$, if there exists a coalition $S \in 2^N$ such that $x \succ_S y$.

Here the dominance $x \succ y$ indicates the following.
There exists a coalition supporting the given imputation $x$. Below we introduce the definition of the core.

Definition 8.7 The set of non-dominated imputations is called the core of a cooperative game.

Theorem 8.2 An imputation $x$ belongs to the core of a cooperative game $\langle N, v \rangle$ iff

$$\sum_{i \in S} x_i \ge v(S), \quad \forall S \in 2^N. \quad (2.5)$$

Proof: Let us demonstrate the necessity of the condition (2.5) by contradiction. Suppose that $x \in C(v)$, but for some coalition $S$:

$$\sum_{i \in S} x_i < v(S).$$

Note that $1 < |S| < n$ (otherwise, we violate the conditions of individual rationality and efficiency, see (2.1) and (2.2)). Suggest to the coalition $S$ a new imputation $y$, where

$$y_i = x_i + \frac{v(S) - \sum_{i \in S} x_i}{|S|}, \quad i \in S,$$

and distribute the residual quantity $v(N) - v(S)$ among the members of the coalition $N \setminus S$:

$$y_i = \frac{v(N) - v(S)}{|N \setminus S|}, \quad i \in N \setminus S.$$

Obviously, $y$ is an imputation and $y \succ x$. The resulting contradiction proves (2.5).

Finally, we prove the sufficiency part. Assume that $x$ meets (2.5), but is dominated by another imputation $y$ for some coalition $S$. Due to (2.3)–(2.4), we have

$$\sum_{i \in S} x_i < \sum_{i \in S} y_i \le v(S),$$

which contradicts the condition (2.5).

8.2.1 The core of the jazz band game

Construct the core of the jazz band game. Recall that the musicians have to distribute their honorarium of 100 USD. The characteristic function takes the following form:

$$v(1) = 40,\ v(2) = 30,\ v(3) = 0,\ v(1, 2) = 80,\ v(1, 3) = 60,\ v(2, 3) = 50,\ v(1, 2, 3) = 100.$$

First, rewrite this function in the 0-1 form. We evaluate $\alpha = 1/[v(N) - v(1) - v(2) - v(3)] = 1/30$ and $c_1 = -4/3$, $c_2 = -1$, $c_3 = 0$. Then the new characteristic function is defined by

$$v'(1) = 0,\ v'(2) = 0,\ v'(3) = 0,\ v'(1, 2, 3) = 1,$$

$$v'(1, 2) = \frac{8}{3} - \frac{4}{3} - 1 = \frac{1}{3}, \quad v'(1, 3) = \frac{6}{3} - \frac{4}{3} = \frac{2}{3}, \quad v'(2, 3) = \frac{5}{3} - 1 = \frac{2}{3}.$$

Figure 8.1 The core of the jazz band game.

The core of this game lies on the simplex $E = \{x = (x_1, x_2, x_3) : x_1 + x_2 + x_3 = 1,\ x_i \ge 0,\ i = 1, 2, 3\}$.
According to (2.5), it obeys the system of inequalities

$$x_1 + x_2 \ge \frac{1}{3}, \quad x_1 + x_3 \ge \frac{2}{3}, \quad x_2 + x_3 \ge \frac{2}{3}.$$

Since $x_1 + x_2 + x_3 = 1$, the inequalities can be reformulated as

$$x_3 \le \frac{2}{3}, \quad x_2 \le \frac{1}{3}, \quad x_1 \le \frac{1}{3}.$$

In Figure 8.1, the core is illustrated by the shaded domain. No element of the core is dominated by another imputation. As a feasible solution, we can choose the center of gravity of the core: $x = (2/9, 2/9, 5/9)$. Getting back to the initial game, we obtain the following imputation: $(140/3, 110/3, 50/3)$. And so, the pianist, the vocalist, and the drummer receive approximately 46.67 USD, 36.67 USD, and 16.67 USD, respectively.

8.2.2 The core of the glove market game

Construct the core of the glove market game. Reexpress the glove set as $N = L \cup R$, where $L = \{l_1, \dots, l_k\}$ is the set of left gloves and $R = \{r_1, \dots, r_m\}$ is the set of right gloves. For definiteness, we assume that $k \le m$. It is possible to compile $k$ pairs; therefore, $v(N) = k$. The characteristic function acquires the form

$$v(l_{i_1}, \dots, l_{i_s}, r_{j_1}, \dots, r_{j_t}) = \min\{s, t\}, \quad s = 1, \dots, k;\ t = 1, \dots, m.$$

Theorem 8.2 claims that the core of this game on the set of imputations

$$D = \Big\{ (x_1, \dots, x_k, y_1, \dots, y_m) : \sum_{i=1}^{k} x_i + \sum_{j=1}^{m} y_j = k,\ x \ge 0,\ y \ge 0 \Big\}$$

is described by the inequalities

$$x_{i_1} + \dots + x_{i_s} + y_{j_1} + \dots + y_{j_t} \ge \min\{s, t\}, \quad s = 1, \dots, k;\ t = 1, \dots, m.$$

If $k < m$, these inequalities imply that

$$x_1 + \dots + x_k + y_{j_1} + \dots + y_{j_k} \ge k$$

for any set of $k$ right gloves $\{j_1, \dots, j_k\}$. Since

$$\sum_{i=1}^{k} x_i + \sum_{j=1}^{m} y_j = k,$$

it appears that

$$\sum_{j \ne j_1, \dots, j_k} y_j = 0.$$

Hence, $y_j = 0$ for all $j \ne j_1, \dots, j_k$. However, the set of right gloves is arbitrary; and so, all $y_j$ equal zero. Consequently, in the case of $k < m$, the core of the game consists of the single point $(x_1 = \dots = x_k = 1,\ y_1 = \dots = y_m = 0)$.

If $k = m$, the same inequalities give $x_i + y_j = 1$ for every left glove $i$ and every right glove $j$, so the core consists of the imputations $x_1 = \dots = x_k = a$, $y_1 = \dots = y_k = 1 - a$ with $0 \le a \le 1$; the symmetric one is

$$x_1 = \dots = x_k = y_1 = \dots = y_k = \frac{1}{2}.$$
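The criterion of Theorem 8.2 can be checked by brute force for a small glove market. The sketch below is an illustration only: the names `glove_value` and `in_core`, the glove labels, and the numerical tolerance are assumptions made here. It enumerates all proper coalitions and tests inequality (2.5) for the case of one left and two right gloves ($k = 1 < m = 2$).

```python
from itertools import combinations


def glove_value(coalition, left):
    """v(S) = min(number of left gloves in S, number of right gloves in S)."""
    n_left = sum(1 for g in coalition if g in left)
    return min(n_left, len(coalition) - n_left)


def in_core(x, left, tol=1e-9):
    """Check efficiency and sum_{i in S} x_i >= v(S) for every proper coalition S."""
    gloves = list(x)
    if abs(sum(x.values()) - glove_value(gloves, left)) > tol:
        return False  # efficiency (2.2) is violated
    for size in range(1, len(gloves)):
        for S in combinations(gloves, size):
            if sum(x[g] for g in S) < glove_value(S, left) - tol:
                return False  # coalition S can claim more than it is offered
    return True


left_gloves = {"l1"}
print(in_core({"l1": 1.0, "r1": 0.0, "r2": 0.0}, left_gloves))   # left glove takes everything
print(in_core({"l1": 0.5, "r1": 0.5, "r2": 0.0}, left_gloves))   # blocked by {l1, r2}
```

The enumeration grows as $2^n$, so this direct check is practical only for a handful of gloves; it is meant to illustrate the inequalities, not to replace the argument above.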
8.2.3 The core of the scheduling game

Construct the core of the scheduling game with $N = \{1, 2, 3\}$. In other words, we have three production orders and three machines for their execution. Suppose that the time cost matrix is determined by

$$T = \begin{pmatrix} 1 & 2 & 4 \\ 3 & 5 & 8 \\ 5 & 7 & 11 \end{pmatrix}.$$

The corresponding cooperative game $\langle N, v \rangle$ is presented in Table 8.1. Time cost evaluation lies in minimization over all possible schemes of order execution using different coalitions. For instance, there exist two options for $S = \{1, 2\}$: each production order is executed on the corresponding machine, or the players exchange their production orders:

$$t(1, 2) = \min\{1 + 5,\ 2 + 3\} = 5.$$

The characteristic function $v(S)$ results from the difference $\sum_{i \in S} t_{ii} - t(S)$.

Table 8.1 The characteristic function in the scheduling game.
S      ∅   {1}  {2}  {3}  {1,2}  {1,3}  {2,3}  {1,2,3}
t(S)   0   1    5    11   5      9      15     14
v(S)   0   0    0    0    1      3      1      3

The core of such a characteristic function is defined by the conditions

$$x_1 + x_2 \ge 1, \quad x_1 + x_3 \ge 3, \quad x_2 + x_3 \ge 1, \quad x_1 + x_2 + x_3 = 3,$$

or

$$C(v) = \{x : x_1 + x_3 = 3,\ x_2 = 0,\ x_1 \ge 1,\ x_3 \ge 1\}.$$

Therefore, the optimal solution prescribes executing the second production order on machine 2, whereas players 1 and 3 should exchange their orders.

8.3 Balanced games

We emphasize that the core of a game can be empty. Then this criterion of payoff distribution fails. The existence of the core relates to the notion of balanced games suggested by O. Bondareva (1963) and L. Shapley (1967).

Definition 8.8 Let $N = \{1, 2, \dots, n\}$ and $2^N$ denote the set of all subsets of $N$. A mapping $\lambda(S): 2^N \to \mathbb{R}_+$, defined for all coalitions $S \in 2^N$ such that $\lambda(\emptyset) = 0$, is called balanced if

$$\sum_{S \in 2^N} \lambda(S) I(S) = I(N). \quad (3.1)$$

Here $I(S)$ means the indicator of the set $S$ (i.e., $I_i(S) = 1$ if $i \in S$ and $I_i(S) = 0$ otherwise). Equality (3.1) is a vector equality: it holds true for each player $i \in N$.
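Condition (3.1) says that, for each player, the weights of the coalitions containing that player sum to one. A minimal sketch of this check follows; the function name `is_balanced` and the dictionary representation (coalitions as `frozenset` keys, omitted coalitions carrying zero weight) are assumptions made for illustration. As a simple test case it uses the fact that unit weights on the blocks of any partition of $N$ form a balanced mapping.

```python
from fractions import Fraction


def is_balanced(weights, players):
    """Check (3.1): for every player i, the weights of coalitions containing i sum to 1."""
    if any(w < 0 for w in weights.values()):
        return False  # a balanced mapping takes values in R_+
    return all(
        sum(w for S, w in weights.items() if i in S) == 1
        for i in players
    )


players = [1, 2, 3]

# Unit weights on the partition {1}, {2, 3}: each player is covered exactly once.
partition = {frozenset({1}): Fraction(1), frozenset({2, 3}): Fraction(1)}

# Unit weights on {1, 2} and {2, 3}: player 2 is covered twice.
overlapping = {frozenset({1, 2}): Fraction(1), frozenset({2, 3}): Fraction(1)}

print(is_balanced(partition, players))
print(is_balanced(overlapping, players))
```

Exact `Fraction` weights are used so that the comparison with 1 is not affected by floating-point rounding.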
For instance, if $N = \{1, 2, 3\}$, the following mappings are balanced:

$$\lambda(1) = \lambda(2) = \lambda(3) = \lambda(1, 2, 3) = 0, \quad \lambda(1, 2) = \lambda(1, 3) = \lambda(2, 3) = \frac{1}{2},$$

or

$$\lambda(1) = \lambda(2) = \lambda(3) = \lambda(1, 2) = \lambda(1, 3) = \lambda(2, 3) = \frac{1}{3}, \quad \lambda(1, 2, 3) = 0.$$

Definition 8.9 A cooperative game $\langle N, v \rangle$ is called a balanced game, if for each balanced mapping $\lambda(S)$ we have the condition

$$\sum_{S \in 2^N} \lambda(S) v(S) \le v(N). \quad (3.2)$$

Theorem 8.3 Consider a cooperative game $\langle N, v \rangle$. Its core is non-empty iff the game is balanced.

Proof: The proof relies on the duality theorem of linear programming. Take the linear programming problem

$$\min \sum_{i=1}^{n} x_i, \qquad \sum_{i \in S} x_i \ge v(S), \quad \forall S \in 2^N. \quad (3.3)$$

If the core is non-empty, Theorem 8.2 (see Section 8.2) implies that this problem admits a solution whose value coincides with $v(N)$. The converse also holds: if there exists a solution of the problem (3.3) whose value equals $v(N)$, then $C(v) \ne \emptyset$.

Let us analyze the dual problem for (3.3):

$$\max \sum_{S \in 2^N} \lambda(S) v(S), \qquad \sum_{S \in 2^N} \lambda(S) I(S) = I(N), \quad \lambda \ge 0. \quad (3.4)$$

The constraints in the dual problem (3.4) define the balanced mappings $\lambda(S)$. Therefore, the problem (3.4) consists in seeking the maximal value of the functional $\sum_{S \in 2^N} \lambda(S) v(S)$ among all balanced mappings. For the balanced mapping $\lambda(N) = 1$, $\lambda(S) = 0$, $\forall S \subset N$, its value equals $v(N)$. Hence, the value of the problem (3.4) is greater than or equal to $v(N)$. The duality theory of linear programming states the following. If there exist admissible solutions to the direct and dual problems, then these problems admit optimal solutions and their values coincide. And so, a necessary and sufficient condition for the core to be non-empty is

$$\sum_{S \in 2^N} \lambda(S) v(S) \le v(N)$$

for any balanced mapping $\lambda(S)$.

8.3.1 The balance condition for three-player games

Consider a three-player game in the 0-1 form with the characteristic function

$$v(1) = v(2) = v(3) = 0, \quad v(1, 2) = a, \quad v(1, 3) = b, \quad v(2, 3) = c, \quad v(1, 2, 3) = 1.$$
For a balanced mapping $\lambda(S)$, the condition (3.2) acquires the form

$$\sum_{S \in 2^N} \lambda(S) v(S) = \lambda(1, 2)a + \lambda(1, 3)b + \lambda(2, 3)c + \lambda(1, 2, 3) \le 1,$$

which is equivalent to the inequality

$$a + b + c \le 2.$$

Therefore, the core of a three-player cooperative game is non-empty iff $a + b + c \le 2$.

8.4 The 𝝉-value of a cooperative game

In Section 8.2, we have defined a possible solution criterion for cooperative games (the core). It has been shown that the core may be empty. Even if the core is non-empty, there arises an uncertainty in choosing a specific imputation from this set. One feasible principle of such a choice was proposed by S. Tijs (1981). This is the so-called $\tau$-value.

Consider a cooperative game $\langle N, v \rangle$. Define the maximum and minimum possible payoffs of each player.

Definition 8.10 A utopia imputation (upper vector) $M(v)$ is a vector

$$M(v) = (M_1(v), \dots, M_n(v)),$$

where the payoff of player $i$ takes the form

$$M_i(v) = v(N) - v(N \setminus i), \quad i = 1, \dots, n.$$

As a matter of fact, $M_i(v)$ specifies the maximum possible payoff of player $i$. If a player wants a higher payoff, the grand coalition benefits from excluding this player.

Definition 8.11 The minimum rights vector (lower vector) $m(v) = (m_1(v), \dots, m_n(v))$ is the vector with the components

$$m_i(v) = \max_{S : i \in S} \Big[ v(S) - \sum_{j \in S \setminus i} M_j(v) \Big], \quad i = 1, \dots, n.$$

The minimum rights vector enables each player $i$ to join a coalition where all other players are satisfied with the membership of player $i$. Indeed, they are guaranteed their maximum possible (utopia) payoffs.

Theorem 8.4 Let $\langle N, v \rangle$ be a cooperative game with non-empty core. Then for any $x \in C(v)$ we have

$$m(v) \le x \le M(v), \quad (4.1)$$

or

$$m_i(v) \le x_i \le M_i(v), \quad \forall i \in N.$$

Proof: Indeed, the property of efficiency implies that for any player $i \in N$:

$$x_i = \sum_{j \in N} x_j - \sum_{j \in N \setminus i} x_j = v(N) - \sum_{j \in N \setminus i} x_j.$$

Since $x$ lies in the core,

$$\sum_{j \in N \setminus i} x_j \ge v(N \setminus i).$$

Hence,

$$x_i = v(N) - \sum_{j \in N \setminus i} x_j \le v(N) - v(N \setminus i) = M_i(v).$$

This proves the right-hand inequality in (4.1).
Since $x \in C(v)$, we obtain

$$\sum_{j \in S} x_j \ge v(S).$$

Furthermore, the results established earlier lead to the following. Any coalition $S$ containing player $i$ meets the inequality

$$\sum_{j \in S \setminus i} x_j \le \sum_{j \in S \setminus i} M_j(v).$$

It appears that

$$x_i = \sum_{j \in S} x_j - \sum_{j \in S \setminus i} x_j \ge v(S) - \sum_{j \in S \setminus i} M_j(v)$$

for any coalition $S$ containing player $i$. Therefore,

$$x_i \ge \max_{S : i \in S} \Big\{ v(S) - \sum_{j \in S \setminus i} M_j(v) \Big\} = m_i(v).$$

The proof of Theorem 8.4 is finished.

Let us summarize the outcomes. If the core is non-empty and we connect the vectors $m(v)$ and $M(v)$ by a segment, then there exists a point $x$ lying on this segment and belonging to the hyperplane $\sum_{i \in N} x_i = v(N)$ in $\mathbb{R}^n$, which contains the core. Moreover, this point is uniquely defined.

We also make an interesting observation. If the core is empty, but the inequalities

$$m(v) \le M(v), \qquad \sum_{i \in N} m_i(v) \le v(N) \le \sum_{i \in N} M_i(v) \quad (4.2)$$

hold true, the segment $[m(v), M(v)]$ necessarily intersects the hyperplane $\sum_{i \in N} x_i = v(N)$ in a unique point (see Figure 8.2).

Figure 8.2 The $\tau$-value of a quasibalanced game.

Definition 8.12 A cooperative game $\langle N, v \rangle$ obeying the conditions (4.2) is called quasibalanced.

Definition 8.13 In a quasibalanced game, the vector $\tau(v)$ representing the intersection of the segment $[m(v), M(v)]$ and the hyperplane $\sum_{i \in N} x_i = v(N)$ is said to be the $\tau$-value of this cooperative game.

8.4.1 The 𝝉-value of the jazz band game

Find the $\tau$-value of the jazz band game. The characteristic function has the form

$$v(1) = 40,\ v(2) = 30,\ v(3) = 0,\ v(1, 2) = 80,\ v(1, 3) = 60,\ v(2, 3) = 50,\ v(1, 2, 3) = 100.$$

Evaluate the utopia imputation:

$$M_1(v) = v(1, 2, 3) - v(2, 3) = 50, \quad M_2(v) = v(1, 2, 3) - v(1, 3) = 40, \quad M_3(v) = v(1, 2, 3) - v(1, 2) = 20,$$

and the minimum rights vector:

$$m_1(v) = \max\{v(1),\ v(1, 2) - M_2(v),\ v(1, 3) - M_3(v),\ v(1, 2, 3) - M_2(v) - M_3(v)\} = 40.$$

By analogy, we obtain $m_2(v) = 30$, $m_3(v) = 10$.
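Definitions 8.10 and 8.11 translate directly into a short computation. The sketch below is an illustration under assumptions made here (the function names `utopia_vector` and `minimum_rights_vector` and the dictionary representation of $v$ keyed by `frozenset` coalitions); it recomputes both vectors for the jazz band game.

```python
from itertools import combinations


def utopia_vector(players, v):
    """M_i(v) = v(N) - v(N \\ i) for every player i (Definition 8.10)."""
    grand = frozenset(players)
    return {i: v[grand] - v[grand - {i}] for i in players}


def minimum_rights_vector(players, v):
    """m_i(v) = max over coalitions S containing i of v(S) - sum_{j in S\\i} M_j(v)."""
    M = utopia_vector(players, v)
    m = {}
    for i in players:
        others = [j for j in players if j != i]
        candidates = []
        for size in range(len(others) + 1):
            for rest in combinations(others, size):
                S = frozenset(rest) | {i}          # coalition containing player i
                candidates.append(v[S] - sum(M[j] for j in rest))
        m[i] = max(candidates)
    return m


jazz = {
    frozenset({1}): 40, frozenset({2}): 30, frozenset({3}): 0,
    frozenset({1, 2}): 80, frozenset({1, 3}): 60, frozenset({2, 3}): 50,
    frozenset({1, 2, 3}): 100,
}
upper = utopia_vector([1, 2, 3], jazz)
lower = minimum_rights_vector([1, 2, 3], jazz)
print(upper)
print(lower)
```

For this game the two vectors satisfy the quasibalance conditions (4.2), since the components of the lower vector sum to 80 and those of the upper vector to 110, with $v(N) = 100$ in between.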
The $\tau$-value lies on the intersection of the segment $\lambda M(v) + (1 - \lambda)m(v)$ and the hyperplane $x_1 + x_2 + x_3 = 100$. Hence, the following equality is valid:

$$\lambda (M_1(v) + M_2(v) + M_3(v)) + (1 - \lambda)(m_1(v) + m_2(v) + m_3(v)) = 100.$$

This leads to $\lambda = 2/3$. Thus, $\tau(v) = (140/3, 110/3, 50/3)$ coincides with the center of gravity of the core of the jazz band game.

8.5 Nucleolus

The concept of the nucleolus was suggested by D. Schmeidler (1969) as a solution principle for cooperative games. Here a major role belongs to the notions of lexicographic order and excess.

Definition 8.14 The excess of a coalition $S$ is the quantity

$$e(x, S) = v(S) - \sum_{i \in S} x_i, \quad x \in D(v),\ S \in 2^N.$$

Actually, the excess measures the dissatisfaction of a coalition $S$ with an offered imputation $x$. For instance, for an imputation in the core (if it is non-empty) no coalition is dissatisfied: all excesses are non-positive.

We form the vector of excesses for all $2^n - 1$ non-empty coalitions by placing them in descending order:

$$e(x) = (e_1(x), e_2(x), \dots, e_m(x)),$$

where $e_i(x) = e(x, S_i)$, $i = 1, 2, \dots, m = 2^n - 1$, and $e_1(x) \ge e_2(x) \ge \dots \ge e_m(x)$.

A natural endeavor is to find an imputation minimizing the maximal measure of dissatisfaction. To succeed, we introduce the concept of a lexicographic order.

Definition 8.15 Let $x, y \in \mathbb{R}^m$. We say that a vector $x$ is lexicographically smaller than a vector $y$ (and denote this fact by $x \le_e y$), if $e(x) = e(y)$ or there exists $k$, $1 \le k \le m$, such that $e_i(x) = e_i(y)$ for all $i = 1, \dots, k - 1$ and $e_k(x) < e_k(y)$.

For instance, a vector with the excess $(3, 2, 0)$ is lexicographically smaller than a vector with the excess $(3, 3, -10)$.

Definition 8.16 The lexicographic minimum with respect to the preference …

Proof: It is necessary to demonstrate the existence of the lexicographic minimum.
Note that the components of the excess vector can be rewritten as

$$\begin{aligned} e_1(x) &= \max_{i = 1, \dots, m} \{e(x, S_i)\}, \\ e_2(x) &= \min_{j = 1, \dots, m} \Big\{ \max_{i \ne j} \{e(x, S_i)\} \Big\}, \\ e_3(x) &= \min_{j, k = 1, \dots, m;\ j \ne k} \Big\{ \max_{i \ne j, k} \{e(x, S_i)\} \Big\}, \\ &\ \,\vdots \\ e_m(x) &= \min_{i = 1, \dots, m} \{e(x, S_i)\}. \end{aligned} \quad (5.1)$$

For each $i$, the functions $e(x, S_i)$ are continuous. The maxima and minima of the continuous functions in (5.1) are also continuous. Thus, all functions $e_i(x)$, $i = 1, \dots, m$, turn out to be continuous.

The imputation set $D(v)$ is compact. The continuous function $e_1(x)$ attains its minimal value $m_1$ on this set. If this value is achieved in a single point $x_1$, this gives the minimal element (which proves the theorem). Suppose that the minimal value is achieved on the set $X_1 = \{x \in D(v) : e_1(x) = m_1\}$. Since the function $e_1(x)$ is continuous, $X_1$ represents a compact set. We look for the minimum of the continuous function $e_2(x)$ on the compact set $X_1$. It does exist; let the minimal value equal $m_2 \le m_1$. If this value is achieved in a single point, we obtain the lexicographic minimum; otherwise, it is achieved on the compact set $X_2 = \{x \in X_1 : e_2(x) = m_2\}$. By repeating the described process, we arrive at the following result. There exists a point or a set yielding the lexicographic minimum.

Now, we prove its uniqueness. Suppose the contrary, i.e., there exist two distinct imputations $x$ and $y$ such that $e(x) = e(y)$. Note that, despite the equal excess vectors of these imputations, the same components can correspond to different coalitions. Consider the vector $e(x)$ and let $e_1(x) = \dots = e_k(x)$ be the maximal components in it, $e_k(x) > e_{k+1}(x)$. Moreover, imagine that the above components represent the excesses for the coalitions $S_1, \dots, S_k$, respectively:

$$e_1(x) = e(x, S_1), \dots, e_k(x) = e(x, S_k).$$

Then, for these coalitions and the other imputation $y$, we obtain the conditions

$$e(y, S_i) \le e(x, S_i),$$

or

$$v(S_i) - \sum_{j \in S_i} y_j \le v(S_i) - \sum_{j \in S_i} x_j, \quad i = 1, \dots, k.$$
(5.2)

Suppose that, for $i = 1, \dots, \hat{i} - 1$, formulas (5.2) become equalities, while a strict inequality takes place for a certain coalition $S_{\hat{i}}$, i.e.,

$$v(S_{\hat{i}}) - \sum_{j \in S_{\hat{i}}} y_j < v(S_{\hat{i}}) - \sum_{j \in S_{\hat{i}}} x_j. \quad (5.3)$$

Inequality (5.3) remains in force for the new imputation $z = \epsilon y + (1 - \epsilon)x$ under any $\epsilon > 0$:

$$v(S_{\hat{i}}) - \sum_{j \in S_{\hat{i}}} z_j < v(S_{\hat{i}}) - \sum_{j \in S_{\hat{i}}} x_j.$$

Since

$$e(x, S_{\hat{i}}) > e_j(x), \quad j = k + 1, \dots, m, \quad (5.4)$$

the continuity of the functions $e_j(x)$ leads to inequality (5.4) for the imputation $z$ under sufficiently small $\epsilon$:

$$e(z, S_{\hat{i}}) > e_j(z), \quad j = k + 1, \dots, m.$$

Hence, for such $\epsilon$, the imputation $z$ is lexicographically smaller than $x$. This contradiction proves that inequalities (5.2) are actually equalities.

Further application of such considerations by induction for smaller components brings us to the following conclusion. All coalitions $S_i$, $i = 1, \dots, m$, satisfy the equality $e(y, S_i) = e(x, S_i)$ or

$$\sum_{j \in S_i} y_j = \sum_{j \in S_i} x_j, \quad i = 1, \dots, m,$$

whence it follows that $x = y$. The proof of Theorem 8.5 is concluded.

Theorem 8.6 Suppose that the core of a cooperative game $\langle N, v \rangle$ is non-empty. Then the nucleolus belongs to $C(v)$.

Proof: Denote the nucleolus by $x^*$. Take an arbitrary imputation $x$ belonging to the core. In the case of $x \in C(v)$, the excesses of all coalitions are non-positive, i.e., $e(x, S_j) \le 0$, $j = 1, \dots, m$. However, $x^*$ … in the 0-1 form, where $v(1, 2) = c_3$, $v(1, 3) = c_2$, $v(2, 3) = c_1$ and $c_1 \le c_2 \le c_3 \le 1$. The nucleolus has the form

$$NC = \begin{cases} \left( \dfrac{1}{3}, \dfrac{1}{3}, \dfrac{1}{3} \right) & \text{if } c_1 \le c_2 \le c_3 \le \dfrac{1}{3}, \\[3mm] \left( \dfrac{1 + c_3}{4}, \dfrac{1 + c_3}{4}, \dfrac{1 - c_3}{2} \right) & \text{if } c_3 > \dfrac{1}{3},\ c_2 \le \dfrac{1 - c_3}{2}, \\[3mm] \left( \dfrac{c_2 + c_3}{2}, \dfrac{1 - c_2}{2}, \dfrac{1 - c_3}{2} \right) & \text{if } c_3 > \dfrac{1}{3},\ c_2 > \dfrac{1 - c_3}{2},\ c_1 \le \dfrac{1 - c_3}{2}, \\[3mm] \left( \dfrac{1 - 2c_1 + 2c_2 + c_3}{4}, \dfrac{1 + 2c_1 - 2c_2 + c_3}{4}, \dfrac{1 - c_3}{2} \right) & \text{if } c_3 > \dfrac{1}{3},\ c_1 > \dfrac{1 - c_3}{2},\ c_1 + c_2 \le \dfrac{1 + c_3}{2}, \\[3mm] \left( \dfrac{1 - 2c_1 + c_2 + c_3}{3}, \dfrac{1 + c_1 - 2c_2 + c_3}{3}, \dfrac{1 + c_1 + c_2 - 2c_3}{3} \right) & \text{if } c_3 > \dfrac{1}{3},\ c_1 > \dfrac{1 - c_3}{2},\ c_1 + c_2 > \dfrac{1 + c_3}{2}. \end{cases}$$
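The piecewise formula above can be transcribed case by case. The following sketch is only a transcription of the formula as printed, under the assumption $c_1 \le c_2 \le c_3 \le 1$; the function name `nucleolus_three_player` is hypothetical, and exact fractions are used so that the boundary conditions between cases are tested without rounding.

```python
from fractions import Fraction


def nucleolus_three_player(c1, c2, c3):
    """Evaluate the piecewise nucleolus formula for a 0-1 game with
    v(2,3) = c1, v(1,3) = c2, v(1,2) = c3, assuming c1 <= c2 <= c3 <= 1."""
    c1, c2, c3 = Fraction(c1), Fraction(c2), Fraction(c3)
    if not (c1 <= c2 <= c3 <= 1):
        raise ValueError("renumber the players so that c1 <= c2 <= c3 <= 1")
    third = Fraction(1, 3)
    bound = (1 - c3) / 2
    if c3 <= third:                       # first case
        return (third, third, third)
    if c2 <= bound:                       # second case
        return ((1 + c3) / 4, (1 + c3) / 4, bound)
    if c1 <= bound:                       # third case
        return ((c2 + c3) / 2, (1 - c2) / 2, bound)
    if c1 + c2 <= (1 + c3) / 2:           # fourth case
        return ((1 - 2 * c1 + 2 * c2 + c3) / 4,
                (1 + 2 * c1 - 2 * c2 + c3) / 4,
                bound)
    return ((1 - 2 * c1 + c2 + c3) / 3,   # fifth case
            (1 + c1 - 2 * c2 + c3) / 3,
            (1 + c1 + c2 - 2 * c3) / 3)


print(nucleolus_three_player("1/5", "1/5", "3/5"))
```

A quick sanity check is that every branch returns components summing to one, as efficiency requires, and that a fully symmetric game ($c_1 = c_2 = c_3$) yields equal shares.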
Proof: Without loss of generality, we assume that $c_1 \le c_2 \le c_3$; otherwise, just renumber the players. For convenience, the stages of the proof are shown as a diagram in Figure 8.4.

Table 8.3 Imputations in the bankruptcy game.

            Player 1     Player 2     Player 3
Debts       d1 = 300     d2 = 200     d3 = 100
E = 100     33 1/3       33 1/3       33 1/3
E = 200     75           75           50
E = 300     150          100          50

Figure 8.4 The nucleolus of the three-player game.

We begin with the case of $c_3 \le 1/3$. Evaluate the excesses for all coalitions $S_1 = \{1\}$, $S_2 = \{2\}$, $S_3 = \{3\}$, $S_4 = \{1, 2\}$, $S_5 = \{1, 3\}$, $S_6 = \{2, 3\}$, and the imputation $x = \left(\frac{1}{3}, \frac{1}{3}, \frac{1}{3}\right)$. They are combined in Table 8.4. Obviously, for the coalitions $S_1$, $S_2$, and $S_3$, the excesses equal $e(x, S_1) = e(x, S_2) = e(x, S_3) = -1/3$. If $c_3 \le 1/3$, then $-1/3 \ge c_3 - 2/3$, and it follows that $-1/3 \ge e(x, S_4)$. Therefore, in the case of $c_3 \le 1/3$, we obtain the following order of the excesses:

$$e(x, S_1) = e(x, S_2) = e(x, S_3) \ge e(x, S_4) \ge e(x, S_5) \ge e(x, S_6).$$

Any variation of a component of the imputation increases the maximal excess. Consequently, the vector $(1/3, 1/3, 1/3)$ forms the nucleolus.

Now, suppose that $c_3 > 1/3$. Furthermore, let

$$c_2 \le \frac{1 - c_3}{2} \tag{6.2}$$

and consider the imputation $x_1 = x_2 = \frac{1 + c_3}{4}$, $x_3 = \frac{1 - c_3}{2}$. Table 8.4 provides the corresponding excesses. Notably, the maximal excesses belong to the coalitions $S_3$ and $S_4$:

$$e(x, S_3) = e(x, S_4) = -\frac{1 - c_3}{2}.$$

Table 8.4 The nucleolus of the three-player game.

Part 1. Columns 4 to 6 give e(x, S) for the imputations
x = (1/3, 1/3, 1/3), x = ((1+c3)/4, (1+c3)/4, (1−c3)/2), and x = ((c2+c3)/2, (1−c2)/2, (1−c3)/2).

S        v     e(x, S)        (1/3, 1/3, 1/3)   ((1+c3)/4, (1+c3)/4, (1−c3)/2)   ((c2+c3)/2, (1−c2)/2, (1−c3)/2)
{1}      0     −x1            −1/3              −(1+c3)/4                        −(c2+c3)/2
{2}      0     −x2            −1/3              −(1+c3)/4                        −(1−c2)/2
{3}      0     −x3            −1/3              −(1−c3)/2                        −(1−c3)/2
{1, 2}   c3    c3 − 1 + x3    c3 − 2/3          −(1−c3)/2                        −(1−c3)/2
{1, 3}   c2    c2 − 1 + x2    c2 − 2/3          (1+c3)/4 − 1 + c2                −(1−c2)/2
{2, 3}   c1    c1 − 1 + x1    c1 − 2/3          (1+c3)/4 − 1 + c1                (c2+c3)/2 − 1 + c1

Part 2. The columns give e(x, S) for the imputations
x = ((1−2c1+2c2+c3)/4, (1+2c1−2c2+c3)/4, (1−c3)/2) and x = ((1−2c1+c2+c3)/3, (1+c1−2c2+c3)/3, (1+c1+c2−2c3)/3).

S        ((1−2c1+2c2+c3)/4, (1+2c1−2c2+c3)/4, (1−c3)/2)   ((1−2c1+c2+c3)/3, (1+c1−2c2+c3)/3, (1+c1+c2−2c3)/3)
{1}      −(1−2c1+2c2+c3)/4                                −(1−2c1+c2+c3)/3
{2}      −(1+2c1−2c2+c3)/4                                −(1+c1−2c2+c3)/3
{3}      −(1−c3)/2                                        −(1+c1+c2−2c3)/3
{1, 2}   −(1−c3)/2                                        (−2+c1+c2+c3)/3
{1, 3}   (−3+2c1+2c2+c3)/4                                (−2+c1+c2+c3)/3
{2, 3}   (−3+2c1+2c2+c3)/4                                (−2+c1+c2+c3)/3

Indeed,

$$e(x, S_3) = -\frac{1 - c_3}{2} > -\frac{1 + c_3}{4} = e(x, S_1) = e(x, S_2),$$

since $c_3 > 1/3$, and the assumption (6.2) leads to

$$e(x, S_1) = -\frac{1 + c_3}{4} \ge \frac{1 + c_3}{4} - 1 + c_2 = e(x, S_5).$$

And so, the excesses in this case possess the following order:

$$e(x, S_3) = e(x, S_4) > e(x, S_1) = e(x, S_2) \ge e(x, S_5) \ge e(x, S_6).$$

The maximal excesses $e(x, S_3)$, $e(x, S_4)$ contain $x_3$ with opposite signs. Hence, any variation of $x_3$ increases the maximal excess. Fix the quantity $x_3$. The second largest excesses $e(x, S_1)$, $e(x, S_2)$ coincide. Any variation of $x_1$, $x_2$ would increase the second largest excess. Thus, the imputation $x_1 = x_2 = \frac{1 + c_3}{4}$, $x_3 = \frac{1 - c_3}{2}$ is the nucleolus.

To proceed, assume that $c_2 > \frac{1 - c_3}{2}$ and let

$$c_1 \le \frac{1 - c_3}{2}. \tag{6.3}$$

Consider the imputation $x_1 = \frac{c_2 + c_3}{2}$, $x_2 = \frac{1 - c_2}{2}$, $x_3 = \frac{1 - c_3}{2}$. Again, Table 8.4 presents the corresponding excesses. Here the maximal excesses are $e(x, S_3) = e(x, S_4) = -\frac{1 - c_3}{2}$, since

$$-\frac{1 - c_3}{2} \ge -\frac{1 - c_2}{2} = e(x, S_2) = e(x, S_5),$$

and the condition $c_2 > \frac{1 - c_3}{2}$ implies that

$$e(x, S_2) = -\frac{1 - c_2}{2} > -\frac{c_2 + c_3}{2} = e(x, S_1).$$

On the other hand, due to (6.3), we have

$$e(x, S_2) = -\frac{1 - c_2}{2} \ge \frac{c_2 + c_3}{2} - 1 + c_1 = e(x, S_6).$$

Therefore,

$$e(x, S_3) = e(x, S_4) \ge e(x, S_2) = e(x, S_5) \ge \max\{e(x, S_1), e(x, S_6)\}.$$
Recall that the maximal excesses $e(x, S_3)$, $e(x, S_4)$ coincide and contain $x_3$ with opposite signs. Hence, $x_3$ cannot be changed. Any variation of $x_2$ increases the second largest excess, since the second largest excesses $e(x, S_2)$ and $e(x, S_5)$ are equal and contain $x_2$ with opposite signs. This means that $x_1 = \frac{c_2 + c_3}{2}$, $x_2 = \frac{1 - c_2}{2}$, $x_3 = \frac{1 - c_3}{2}$ form the nucleolus.

Next, take the case of $c_1 > \frac{1 - c_3}{2}$; accordingly, we have $c_2 > \frac{1 - c_3}{2}$. Suppose that the following inequality holds:

$$c_1 + c_2 \le \frac{1 + c_3}{2}. \tag{6.4}$$

Let us demonstrate that the imputation

$$x_1 = \frac{1 - 2c_1 + 2c_2 + c_3}{4}, \quad x_2 = \frac{1 + 2c_1 - 2c_2 + c_3}{4}, \quad x_3 = \frac{1 - c_3}{2}$$

represents the nucleolus. Clearly, all $x_i \in [0, 1]$, $i = 1, 2, 3$. The corresponding excesses can be found in Table 8.4. As previously, it is necessary to establish the lexicographic order of the excesses. In this case, we have

$$e(x, S_3) = e(x, S_4) \ge e(x, S_5) = e(x, S_6) > e(x, S_2) \ge e(x, S_1).$$

The first inequality,

$$-\frac{1 - c_3}{2} \ge \frac{-3 + 2c_1 + 2c_2 + c_3}{4},$$

is equivalent to (6.4), whereas the second one is equivalent to the condition $c_1 > \frac{1 - c_3}{2}$. The first equality $e(x, S_3) = e(x, S_4)$ implies that any variation of $x_3$ would increase the maximal excess. And the second equality $e(x, S_5) = e(x, S_6)$ implies that any variation of $x_1$ and $x_2$ causes an increase in the second largest excess.

Finally, suppose that $c_1 > \frac{1 - c_3}{2}$ and

$$c_1 + c_2 > \frac{1 + c_3}{2}. \tag{6.5}$$

We endeavor to show that the imputation

$$x_1 = \frac{1 - 2c_1 + c_2 + c_3}{3}, \quad x_2 = \frac{1 + c_1 - 2c_2 + c_3}{3}, \quad x_3 = \frac{1 + c_1 + c_2 - 2c_3}{3}$$

is the nucleolus. Table 8.4 gives the corresponding excesses. They are in the following lexicographic order:

$$e(x, S_4) = e(x, S_5) = e(x, S_6) > e(x, S_3) \ge e(x, S_2) \ge e(x, S_1).$$

The first inequality is equivalent to (6.5); the rest are clear. The equalities $e(x, S_4) = e(x, S_5) = e(x, S_6)$ imply that any variation of the imputation increases the maximal excess. This concludes the proof of Theorem 8.7.

Let us return to the bankruptcy problem stated at the beginning of this section.
The characteristic function (6.1) takes the form $v(1) = v(2) = v(3) = 0$, $v(1, 2, 3) = E$ and

$$v(1, 2) = (E - d_3)^+ = (E - 100)^+, \quad v(1, 3) = (E - d_2)^+ = (E - 200)^+, \quad v(2, 3) = (E - d_1)^+ = (E - 300)^+.$$

If $E = 100$, we have $v(1, 2) = 0$, $v(1, 3) = 0$, $v(2, 3) = 0$. This agrees with the case when none of the values $v(i, j)$ exceeds 1/3 of the payoff $E$. According to the theorem, the nucleolus dictates equal sharing.

Next, if $E = 200$, the characteristic function becomes $v(1, 2) = 100$, $v(1, 3) = 0$, $v(2, 3) = 0$. This matches the second condition of the theorem, when $v(1, 2)$ is greater than 1/3 of the payoff $E$, but $v(1, 3)$ does not exceed 1/2 of the residual $E - v(1, 2)$. Then the nucleolus gives the quantity $(E - v(1, 2))/2 = (200 - 100)/2 = 50$ to player 3 and distributes the remainder equally between players 1 and 2 (75 units each).

If $E = 300$, we get $v(1, 2) = 200$, $v(1, 3) = 100$, $v(2, 3) = 0$. This corresponds to the third case, when $v(1, 3)$ is greater than 1/2 of $E - v(1, 2)$, whereas $v(2, 3)$ does not exceed this quantity. And the nucleolus distributes $E$ in proportion to the debts, i.e., 150 units to player 1, 100 units to player 2, and 50 units to player 3. Clearly, the variant described in Table 8.1 coincides with the nucleolus of the cooperative game with the characteristic function (6.1).

If $E = 400$, the characteristic function acquires the form $v(1, 2) = 300$, $v(1, 3) = 200$, $v(2, 3) = 100$. This relates to the fourth case of the theorem. By evaluating the nucleolus, we obtain the imputation (225, 125, 50). And finally, if $E = 500$, the fifth case of the theorem applies; the nucleolus equals approximately (266.67, 166.67, 66.67).

8.7 The Shapley vector

A popular solution in the theory of cooperative games is the Shapley vector [1953]. Consider a cooperative game $\langle N, v \rangle$. Denote by $\sigma = (\sigma(1), \dots, \sigma(n))$ an arbitrary permutation of players $1, \dots, n$. Imagine the following situation.
Players arrive in some room in random order to form a coalition. By assumption, all permutations $\sigma$ are equiprobable, so the probability of each permutation is $1/n!$. Consider a certain player $i$. We suppose that the coalition is finally formed with his arrival. Designate by $P_\sigma(i) = \{j \in N : \sigma^{-1}(j) < \sigma^{-1}(i)\}$ the set of his forerunners in the permutation $\sigma$. Evaluate the contribution of player $i$ to this coalition as

$$m_i(\sigma) = v(P_\sigma(i) \cup \{i\}) - v(P_\sigma(i)).$$

Definition 8.17 The Shapley vector is the mean value of contributions for each player over all possible permutations, i.e.,

$$\phi_i(v) = \frac{1}{n!} \sum_{\sigma} m_i(\sigma) = \frac{1}{n!} \sum_{\sigma} \left[v(P_\sigma(i) \cup \{i\}) - v(P_\sigma(i))\right], \quad i = 1, \dots, n. \tag{7.1}$$

8.7.1 The Shapley vector in the road construction game

Construct the Shapley vector in the road construction game stated before. Recall that the characteristic function takes the form

$$v(1) = -1, \ v(2) = -1, \ v(3) = -2, \ v(1, 2) = 12, \ v(2, 3) = 5, \ v(1, 3) = 3, \ v(1, 2, 3) = 18.$$

Computations of the Shapley vector can be found in Table 8.5 below. The left column lists all possible permutations; the other columns give the contributions of all players (for each permutation) according to the given characteristic function. For the first permutation $(1, 2, 3)$, the contribution of player 1 constitutes $v(1) - v(\emptyset) = -1$, the contribution of player 2 makes up $v(1, 2) - v(1) = 12 - (-1) = 13$, and the contribution of player 3 equals $v(1, 2, 3) - v(1, 2) = 18 - 12 = 6$. Calculations yield the Shapley vector $\phi = (7, 8, 3)$ in this problem. Note that this solution differs from the nucleolus $x^* = (9, 7, 2)$. Accordingly, the cost borne by each player in road construction changes. Now, the shares of players' costs have the form

$$c_1 = 20 - 7 = 13, \quad c_2 = 15 - 8 = 7, \quad c_3 = 10 - 3 = 7.$$

Let us find a more convenient representation for Shapley vector evaluation. The bracketed expression in (7.1) is $v(S) - v(S \setminus \{i\})$, where player $i$ belongs to the coalition $S$.
Therefore, summation in formula (7.1) can run over all coalitions $S$ containing player $i$. In each coalition $S$, player $i$ is on the last place, whereas forerunners can enter the coalition in $(|S| - 1)!$ ways. On the other hand, players from the coalition $N \setminus S$ can come after player $i$ in $(n - |S|)!$ ways. Thus, the number of permutations in the sum (7.1) which correspond to the same coalition $S$ containing player $i$ equals $(|S| - 1)!\,(n - |S|)!$. Hence, formula (7.1) can be rewritten as

$$\phi_i(v) = \sum_{S:\, i \in S} \frac{(|S| - 1)!\,(n - |S|)!}{n!} \left[v(S) - v(S \setminus \{i\})\right], \quad i = 1, \dots, n. \tag{7.2}$$

The quantities $\frac{(|S| - 1)!\,(n - |S|)!}{n!}$ stand for the probabilities that player $i$ forms the coalition $S$. And so,

$$\sum_{S:\, i \in S} \frac{(|S| - 1)!\,(n - |S|)!}{n!} = 1, \quad \forall i. \tag{7.3}$$

Table 8.5 Evaluation of the Shapley vector.

σ                Player 1   Player 2   Player 3   The total contribution
(1,2,3)          −1         13         6          18
(1,3,2)          −1         15         4          18
(2,1,3)          13         −1         6          18
(2,3,1)          13         −1         6          18
(3,1,2)          5          15         −2         18
(3,2,1)          13         7          −2         18
The mean value   7          8          3          18

We demonstrate that the introduced vector is an imputation.

Lemma 8.1 The vector $\phi$ satisfies the properties of individual rationality and efficiency, i.e., $\phi_i(v) \ge v(i)$, $\forall i$, and $\sum_{i \in N} \phi_i(v) = v(N)$.

Proof: Due to the superadditive property of the function $v$, we have the inequality $v(S) - v(S \setminus \{i\}) \ge v(i)$ for any coalition $S$ containing player $i$. Then it follows from (7.2) and (7.3) that

$$\phi_i(v) \ge v(i) \sum_{S:\, i \in S} \frac{(|S| - 1)!\,(n - |S|)!}{n!} = v(i), \quad i = 1, \dots, n.$$

Now, let us show the equality $\sum_{i \in N} \phi_i(v) = v(N)$. Address the definition (7.1) and consider the sum

$$\sum_{i \in N} \phi_i(v) = \frac{1}{n!} \sum_{\sigma} \sum_{i \in N} \left[v(P_\sigma(i) \cup \{i\}) - v(P_\sigma(i))\right]. \tag{7.4}$$

For each permutation $\sigma$, the inner sum in (7.4) comprises the contributions of all players in this permutation, i.e.,

$$
\begin{aligned}
& v(\sigma(1)) + \left[v(\sigma(1), \sigma(2)) - v(\sigma(1))\right] + \left[v(\sigma(1), \sigma(2), \sigma(3)) - v(\sigma(1), \sigma(2))\right] + \cdots\\
& \quad + \left[v(\sigma(1), \dots, \sigma(n)) - v(\sigma(1), \dots, \sigma(n - 1))\right] = v(\sigma(1), \dots, \sigma(n)) = v(N).
\end{aligned}
$$

Hence,

$$\sum_{i \in N} \phi_i(v) = \frac{1}{n!} \sum_{\sigma} v(N) = v(N).$$

The proof of Lemma 8.1 is completed.
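Formulas (7.1) and (7.2) can be checked against each other numerically. The following Python sketch is an illustration only: the function names and the dictionary representation of the characteristic function (keys are `frozenset` coalitions, with $v(\emptyset) = 0$ added by convention) are assumptions made here, not part of the text.

```python
from fractions import Fraction
from itertools import combinations, permutations
from math import factorial

def shapley_by_permutations(players, v):
    """Formula (7.1): mean marginal contribution over all orderings."""
    phi = {i: Fraction(0) for i in players}
    for sigma in permutations(players):
        before = frozenset()                 # forerunners P_sigma(i)
        for i in sigma:
            phi[i] += v[before | {i}] - v[before]
            before = before | {i}
    n_fact = factorial(len(players))
    return {i: phi[i] / n_fact for i in players}

def shapley_by_coalitions(players, v):
    """Formula (7.2): weighted sum over the coalitions S containing player i."""
    n = len(players)
    phi = {}
    for i in players:
        others = [j for j in players if j != i]
        total = Fraction(0)
        for r in range(n):
            for rest in combinations(others, r):
                S = frozenset(rest) | {i}
                weight = Fraction(factorial(len(S) - 1) * factorial(n - len(S)),
                                  factorial(n))
                total += weight * (v[S] - v[S - {i}])
        phi[i] = total
    return phi

# Road construction game from the text.
v = {frozenset(): 0,
     frozenset({1}): -1, frozenset({2}): -1, frozenset({3}): -2,
     frozenset({1, 2}): 12, frozenset({2, 3}): 5, frozenset({1, 3}): 3,
     frozenset({1, 2, 3}): 18}

print(shapley_by_permutations([1, 2, 3], v))
print(shapley_by_coalitions([1, 2, 3], v))
```

Both calls give the vector (7, 8, 3), the mean values in Table 8.5. The permutation version visits $n!$ orderings and the coalition version $2^{n-1}$ coalitions per player, so neither is meant for large $n$.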
We have already formulated some criteria for the solution of a cooperative game. For the core, the criterion is that an offer is undominated. In the case of the nucleolus, a solution minimizes the maximal dissatisfaction in comparison with other solutions. L.S. Shapley stated several desirable properties to be satisfied by an imputation in a cooperative game.

8.7.2 Shapley's axioms for the vector $\varphi_i(v)$

1. Efficiency. $\sum_{i \in N} \varphi_i(v) = v(N)$.
2. Symmetry. If players $i$ and $j$ are such that $v(S \cup \{i\}) = v(S \cup \{j\})$ for any coalition $S$ without players $i$ and $j$, then $\varphi_i(v) = \varphi_j(v)$.
3. Dummy player property. Player $i$ such that $v(S \cup \{i\}) = v(S)$ for any coalition $S$ without player $i$ meets the condition $\varphi_i(v) = 0$.
4. Linearity. If $v_1$ and $v_2$ are two characteristic functions, then $\varphi(v_1 + v_2) = \varphi(v_1) + \varphi(v_2)$.

Axiom 1 declares that the whole payoff must be completely distributed among participants. The symmetry property means that, if the characteristic function is symmetrical for players $i$ and $j$, they must receive equal shares. A player giving no additional utility to any coalition is called a dummy player. Of course, his share must equal zero. The last axiom reflects the following fact. If a series of games is played, the share of each player in the series must coincide with the sum of his shares in each game.

Theorem 8.8 There exists a unique vector $\varphi(v)$ satisfying Axioms 1–4.

Proof: Consider the elementary characteristic functions.

Definition 8.18 Let $S \subset N$. The elementary characteristic function is the function

$$
v_S(T) =
\begin{cases}
1 & \text{if } S \subset T,\\
0 & \text{otherwise.}
\end{cases}
$$

Therefore, in a cooperative game with such a characteristic function, the coalition $T$ wins if it contains the minimal winning coalition $S$. We endeavor to find the vector $\varphi(v_S)$ agreeing with Shapley's axioms. Any player outside the minimal winning coalition represents a dummy (zero) player. According to Axiom 3, $\varphi_i(v_S) = 0$ if $i \notin S$. The symmetry axiom implies that $\varphi_i(v_S) = \varphi_j(v_S)$ for all players $i, j$ entering the coalition $S$.
In combination with the efficiency axiom, this yields $\sum_{i \in N} \varphi_i(v_S) = v_S(N) = 1$. Hence, $\varphi_i(v_S) = 1/|S|$ for all players in the coalition $S$. Analogous reasoning applies to the characteristic function $c v_S$, where $c$ indicates a constant factor. Then

$$
\varphi_i(c v_S) =
\begin{cases}
\dfrac{c}{|S|} & \text{if } i \in S,\\[1.5ex]
0 & \text{if } i \notin S.
\end{cases}
$$

Lemma 8.2 The elementary characteristic functions form a basis on the set of all characteristic functions.

Proof: We establish that any characteristic function can be rewritten as a linear combination of the elementary functions, i.e., there exist constants $c_S$ such that

$$v = \sum_{S \in 2^N} c_S v_S. \tag{7.5}$$

Choose $c_\emptyset = 0$. We argue the existence of the constants $c_S$ by induction over the number of elements in the set $S$. Select

$$c_T = v(T) - \sum_{S \subset T,\ S \neq T} c_S.$$

In other words, the value of $c_T$ is determined via the quantities $c_S$, where the number of elements in the set $S$ is smaller than in the set $T$. Since $v_S(T)$ is non-zero only for coalitions $S \subset T$, the above-defined constants $c_S$ obey the equality

$$\sum_{S \in 2^N} c_S v_S(T) = \sum_{S \subset T} c_S = c_T + \sum_{S \subset T,\ S \neq T} c_S = v(T).$$

This proves formula (7.5).

Consequently, each characteristic function is uniquely represented as the sum of the elementary characteristic functions $c_S v_S$. By virtue of linearity, the vector $\varphi(v)$ is uniquely defined as well:

$$\varphi_i(v) = \sum_{S \in 2^N:\ i \in S} \frac{c_S}{|S|}.$$

The proof of Theorem 8.8 is finished.

Now, we note that the Shapley vector, see (7.2), meets Axioms 1–4. Its efficiency has been rigorously shown in Lemma 8.1. Symmetry comes from the following fact. If players $i$ and $j$ satisfy the condition $v(S \cup \{i\}) = v(S \cup \{j\})$ for any coalition $S$ without players $i$ and $j$, the contributions of the players to the sum in (7.2) coincide, and hence $\phi_i(v) = \phi_j(v)$. Imagine that player $i$ represents a zero player. Then all his contributions (see the bracketed expressions in (7.2)) vanish and $\phi_i(v) = 0$, i.e., the property of zero player holds true.
Finally, linearity follows from the additive form of the expression in (7.2). Satisfaction of Shapley's axioms and uniqueness of an imputation meeting these axioms (see Theorem 8.8) lead to an important result.

Theorem 8.9 The unique imputation agreeing with Axioms 1–4 is the Shapley vector $\phi(v) = (\phi_1(v), \dots, \phi_n(v))$, where

$$\phi_i(v) = \sum_{S \subseteq N:\ i \in S} \frac{(|S| - 1)!\,(n - |S|)!}{n!} \left[v(S) - v(S \setminus \{i\})\right], \quad i = 1, \dots, n.$$

Remark 8.1 If summation in (7.2) runs over coalitions excluding player $i$, the Shapley vector formula becomes

$$\phi_i(v) = \sum_{S \subseteq N:\ i \notin S} \frac{|S|!\,(n - |S| - 1)!}{n!} \left[v(S \cup \{i\}) - v(S)\right], \quad i = 1, \dots, n. \tag{7.6}$$

8.8 Voting games. The Shapley–Shubik power index and the Banzhaf power index

Political science (especially, political decision-making) is among the important applications of cooperative games. Generally, a political decision is made by voting in some public authority, e.g., a parliament. In these conditions, a major role belongs to the power of factions within such an authority. The matter concerns political parties possessing a certain set of votes, or other political unions. The definition of political power was studied by L. Shapley and M. Shubik [1954], as well as by J.F. Banzhaf [1965]. In their works, the researchers employed certain methods from the theory of cooperative games to define the power or influence level of voting sides.

Definition 8.19 A voting game is a cooperative game $\langle N, v \rangle$, where the characteristic function takes only two values, 0 and 1, and $v(N) = 1$. A coalition $S$ such that $v(S) = 1$ is called a winning coalition. Denote by $W$ the set of winning coalitions.

Within the framework of voting games, the contribution of each player in any coalition equals 0 or 1. Therefore, the Shapley vector concept can be modified for such games.

Definition 8.20 The Shapley–Shubik vector in a voting game $\langle N, v \rangle$ is the vector $\phi(v) = (\phi_1(v), \dots, \phi_n(v))$, where the index of player $i$ has the form

$$\phi_i(v) = \sum_{S \notin W,\ S \cup \{i\} \in W} \frac{|S|!\,(n - |S| - 1)!}{n!}, \quad i = 1, \dots, n.$$

According to Definition 8.20, the influence of player $i$ is defined as the mean number of coalitions where his participation guarantees a win and his non-participation leads to a loss. There exists another definition of a player's power, viz., the so-called Banzhaf index. For player $i$, we say that a pair of coalitions $(S \cup \{i\}, S)$ is a switching when $S \cup \{i\}$ is a winning coalition and the coalition $S$ is not. In this case, player $i$ is referred to as the key player in the coalition $S$. For each player $i \in N$, evaluate the number of all switchings in the game $\langle N, v \rangle$ and designate it by $\eta_i(v)$. The total number of switchings makes up $\eta(v) = \sum_{i \in N} \eta_i(v)$.

Definition 8.21 The Banzhaf vector in a voting game $\langle N, v \rangle$ is the vector $\beta(v) = (\beta_1(v), \dots, \beta_n(v))$, where the index of player $i$ obeys the formula

$$\beta_i(v) = \frac{\eta_i(v)}{\sum_{j \in N} \eta_j(v)}, \quad i = 1, \dots, n.$$

Now, let us concentrate on voting games proper. We suppose that each player $i$ in a voting game is described by some number of votes $w_i$, $i = 1, \dots, n$. Furthermore, the affirmative decision requires a given threshold $q$ of votes.

Definition 8.22 A weighted voting game is a cooperative game $\langle q; w_1, \dots, w_n \rangle$ with the characteristic function

$$
v(S) =
\begin{cases}
1 & \text{if } w(S) \ge q,\\
0 & \text{if } w(S) < q.
\end{cases}
$$

Here $w(S) = \sum_{i \in S} w_i$ specifies the sum of votes of players in a coalition $S$.

To compute power indices, one can use generating functions. Recall that the generating function of a sequence $\{a_n, n \ge 0\}$ is the function

$$G(x) = \sum_{n \ge 0} a_n x^n.$$

For a sequence $\{a(n_1, \dots, n_k),\ n_i \ge 0,\ i = 1, \dots, k\}$, this function acquires the form

$$G(x_1, \dots, x_k) = \sum_{n_1 \ge 0} \dots \sum_{n_k \ge 0} a(n_1, \dots, n_k)\, x_1^{n_1} \cdots x_k^{n_k}.$$

The generating function for Shapley–Shubik power index evaluation was found by D.G. Cantor [1962]. In the case of the Banzhaf power index, the generating function was obtained by S.J. Brams and P.J. Affuso [1976].

Theorem 8.10 Suppose that $\langle q; w_1, \dots, w_n \rangle$ represents a weighted voting game.
Then the Shapley–Shubik power index is defined by

$$\phi_i(v) = \sum_{s=0}^{n-1} \frac{s!\,(n - s - 1)!}{n!} \left(\sum_{k=q-w_i}^{q-1} A_i(k, s)\right), \quad i = 1, \dots, n. \tag{8.1}$$

Here $A_i(k, s)$ means the number of coalitions $S$ comprising exactly $s$ players and $i \notin S$, whose power equals $w(S) = k$. In addition, the generating function takes the form

$$G_i(x, z) = \prod_{j \neq i} (1 + z x^{w_j}). \tag{8.2}$$

Proof: Consider the product $(1 + z x^{w_1}) \cdots (1 + z x^{w_n})$. Expanding all brackets, we find that the coefficient of $z^k$ is the sum of the terms $x^{w_{i_1} + \cdots + w_{i_k}}$ over all combinations $(i_1, \dots, i_k)$, i.e.,

$$(1 + z x^{w_1}) \cdots (1 + z x^{w_n}) = \sum_{S \subset N} z^{|S|} x^{\sum_{i \in S} w_i}. \tag{8.3}$$

Now, take this sum and collect terms having identical degrees of $x$ (they correspond to coalitions with the same power $w(S)$). Such manipulations yield

$$(1 + z x^{w_1}) \cdots (1 + z x^{w_n}) = \sum_{k \ge 0} \sum_{s \ge 0} A(k, s)\, x^k z^s,$$

where the factor $A(k, s)$ equals the number of all coalitions with $s$ participants whose power constitutes $k$. By eliminating the factor $(1 + z x^{w_i})$ in the product (8.3), we construct the generating function for $A_i(k, s)$.

Formula (8.1) is immediate from the following fact. The coalitions where player $i$ is the key one are the coalitions $S$ with the power levels $w(S) \in \{q - w_i, q - w_i + 1, \dots, q - 1\}$. Indeed, in this case, we have $w(S) < q$ and $w(S \cup \{i\}) = w(S) + w_i \ge q$. The proof of Theorem 8.10 is concluded.

Theorem 8.11 Let $\langle q; w_1, \dots, w_n \rangle$ be a weighted voting game. The number of switchings $\eta_i(v)$ in the Banzhaf power index can be rewritten as

$$\eta_i(v) = \sum_{k=q-w_i}^{q-1} b_i(k), \quad i = 1, \dots, n, \tag{8.4}$$

where $b_i(k)$ stands for the number of coalitions $S$, $i \notin S$, whose power makes up $w(S) = k$. Furthermore, the generating function has the form

$$G_i(x) = \prod_{j \neq i} (1 + x^{w_j}). \tag{8.5}$$

Proof: Similarly to Theorem 8.10, consider the product $(1 + x^{w_1}) \cdots (1 + x^{w_n})$ and expand the brackets:

$$G(x) = (1 + x^{w_1}) \cdots (1 + x^{w_n}) = \sum_{S \subset N} \prod_{i \in S} x^{w_i} = \sum_{S \subset N} x^{\sum_{i \in S} w_i}. \tag{8.6}$$

Again, collect summands with coinciding degrees of $x$, which correspond to coalitions having the identical power $w(S)$. This procedure leads to

$$G(x) = \sum_{k \ge 0} b(k)\, x^k,$$

where the factor $b(k)$ is the number of all coalitions with power $k$. By eliminating the factor $(1 + x^{w_i})$ from the product (8.6), we derive the generating function (8.5) for $b_i(k)$.

Formula (8.4) follows from the fact that the coalitions where player $i$ is the key one are exactly the coalitions $S$ with the power $w(S) \in \{q - w_i, q - w_i + 1, \dots, q - 1\}$. This completes the proof of Theorem 8.11.

Theorems 8.10 and 8.11 provide a simple computation technique for the above power indices. As examples, we select the 14th Bundestag (the national parliament of the Federal Republic of Germany, 1998–2002) and the 3rd State Duma (the lower chamber of the Russian parliament, 2000–2003).

8.8.1 The Shapley–Shubik power index for influence evaluation in the 14th Bundestag

The 14th Bundestag consisted of 669 members from five political parties:

The Social Democratic Party of Germany (Sozialdemokratische Partei Deutschlands, SPD), 298 seats
The Christian Democratic Union of Germany (Christlich Demokratische Union Deutschlands, CDU), 245 seats
The Greens (Die Grünen), 47 seats
The Free Democratic Party (Freie Demokratische Partei, FDP), 43 seats
The Party of Democratic Socialism (Partei des Demokratischen Sozialismus, PDS), 36 seats.

For a draft law, the enactment threshold $q$ is the simple majority of votes, i.e., 335 votes. To find the influence levels of each party, we apply the Shapley–Shubik power indices. It is necessary to calculate the generating function (8.2). For the SPD, the generating function acquires the form

$$
\begin{aligned}
G_1(x, z) &= (1 + z x^{245})(1 + z x^{47})(1 + z x^{43})(1 + z x^{36})\\
&= 1 + z(x^{36} + x^{43} + x^{47} + x^{245}) + z^2(x^{79} + x^{83} + x^{90} + x^{281} + x^{288} + x^{292})\\
&\quad + z^3(x^{126} + x^{324} + x^{328} + x^{335}) + z^4 x^{371}.
\end{aligned}
\tag{8.7}
$$

Now, we can define the number of coalitions where the SPD is the key player. This requires that the influence level of a coalition before the SPD enters lies between $q - w_1 = 335 - 298 = 37$ and $q - 1 = 334$. The first bracketed expression shows that, for $s = 1$, the number of such coalitions makes up 3. In the cases of $s = 2$ and $s = 3$, we use the second and third bracketed expressions to find that the number of such coalitions is 6 and 3, respectively. In the remaining cases, the SPD is not the key player. Therefore,

$$\sum_{k=q-w_1}^{q-1} A_1(k, 0) = 0, \quad \sum_{k=q-w_1}^{q-1} A_1(k, 1) = 3, \quad \sum_{k=q-w_1}^{q-1} A_1(k, 2) = 6, \quad \sum_{k=q-w_1}^{q-1} A_1(k, 3) = 3, \quad \sum_{k=q-w_1}^{q-1} A_1(k, 4) = 0.$$

Hence, the Shapley–Shubik index equals

$$\phi_1(v) = \frac{1!\,3!}{5!} \cdot 3 + \frac{2!\,2!}{5!} \cdot 6 + \frac{3!\,1!}{5!} \cdot 3 = 0.5.$$

We can perform similar computations for the other parties in the 14th Bundestag. The CDU becomes the key player if the influence level of a coalition before its entrance is within the limits of $q - w_2 = 90$ and $q - 1 = 334$. For the CDU, the generating function is described by

$$
\begin{aligned}
G_2(x, z) &= (1 + z x^{298})(1 + z x^{47})(1 + z x^{43})(1 + z x^{36})\\
&= 1 + z(x^{36} + x^{43} + x^{47} + x^{298}) + z^2(x^{79} + x^{83} + x^{90} + x^{334} + x^{341} + x^{345})\\
&\quad + z^3(x^{126} + x^{377} + x^{381} + x^{388}) + z^4 x^{424},
\end{aligned}
$$

whence it follows that

$$\sum_{k=q-w_2}^{q-1} A_2(k, 0) = 0, \quad \sum_{k=q-w_2}^{q-1} A_2(k, 1) = 1, \quad \sum_{k=q-w_2}^{q-1} A_2(k, 2) = 2, \quad \sum_{k=q-w_2}^{q-1} A_2(k, 3) = 1, \quad \sum_{k=q-w_2}^{q-1} A_2(k, 4) = 0.$$

The corresponding Shapley–Shubik power index constitutes

$$\phi_2(v) = \frac{1!\,3!}{5!} \cdot 1 + \frac{2!\,2!}{5!} \cdot 2 + \frac{3!\,1!}{5!} \cdot 1 = \frac{1}{6} \approx 0.1667.$$

In the case of the Greens, we obtain

$$
\begin{aligned}
G_3(x, z) &= (1 + z x^{298})(1 + z x^{245})(1 + z x^{43})(1 + z x^{36})\\
&= 1 + z(x^{36} + x^{43} + x^{245} + x^{298}) + z^2(x^{79} + x^{281} + x^{288} + x^{334} + x^{341} + x^{543})\\
&\quad + z^3(x^{324} + x^{377} + x^{579} + x^{586}) + z^4 x^{622}.
\end{aligned}
$$

This party is the key player for coalitions whose influence levels lie between $q - w_3 = 288$ and $q - 1 = 334$. The number of coalitions where the Greens are the key player coincides, for each $s$, with the corresponding number for the CDU; thus, their power indices are identical.
The same applies to the FDP. The PDS possesses the generating function

$$
\begin{aligned}
G_5(x, z) &= (1 + z x^{298})(1 + z x^{245})(1 + z x^{47})(1 + z x^{43})\\
&= 1 + z(x^{43} + x^{47} + x^{245} + x^{298}) + z^2(x^{90} + x^{288} + x^{292} + x^{341} + x^{345} + x^{543})\\
&\quad + z^3(x^{335} + x^{388} + x^{586} + x^{590}) + z^4 x^{633}.
\end{aligned}
$$

The PDS is the key player for coalitions with the influence levels in the range from $q - w_5 = 299$ to $q - 1 = 334$. However, the generating function shows that no such coalitions exist. Hence, $\phi_5(v) = 0$. And finally,

$$\phi_1(v) = \frac{1}{2}, \quad \phi_2(v) = \phi_3(v) = \phi_4(v) = \frac{1}{6}, \quad \phi_5(v) = 0.$$

Therefore, the greatest power in the 14th Bundestag belonged to the Social Democrats, although the difference in the number of seats of this party and the CDU was not so significant (298 and 245, respectively). Meanwhile, the power indices of the CDU, the Greens, and the Free Democrats turned out the same despite considerable differences in their seats (245, 47, and 43, respectively). The PDS possessed no influence at all, as it was not the key player in any coalition.

8.8.2 The Banzhaf power index for influence evaluation in the 3rd State Duma

The 3rd State Duma of the Russian Federation was represented by the following political parties and factions:

The Agro-industrial faction (AIF), 39 seats
The Unity Party (UP), 82 seats
The Communist Party of the Russian Federation (CPRF), 92 seats
The Liberal Democratic Party (LDPR), 17 seats
The People's Deputy faction (PDF), 57 seats
Fatherland-All Russia (FAR), 46 seats
Russian Regions (RR), 41 seats
The Union of Rightist Forces (URF), 32 seats
Yabloko (YAB), 21 seats
Independent deputies (IND), 23 seats.

Table 8.6 Banzhaf index evaluation for the State Duma of the Russian Federation.

Parties                     AIF       UP        CPRF      LDPR      PDF
Switching thresholds        187-225   144-225   134-225   209-225   169-225
The number of switchings    96        210       254       42        144
The Banzhaf index           0.084     0.185     0.224     0.037     0.127

Parties                     FAR       RR        URF       YAB       IND
Switching thresholds        180-225   185-225   194-225   205-225   203-225
The number of switchings    110       100       78        50        52
The Banzhaf index           0.097     0.088     0.068     0.044     0.046

The 3rd State Duma included 450 seats in total. For a draft law, the enactment threshold $q$ is the simple majority of votes, i.e., 226 votes. To evaluate the influence level of each party and faction, we employ the Banzhaf power index. To compute the generating function $G_i(x) = \prod_{j \neq i} (1 + x^{w_j})$, $i = 1, \dots, 10$, for each player, it is necessary to expand the brackets in the product (8.5). For instance, we can use any software for symbolic computations, e.g., Mathematica (procedure Expand). For each player $i = 1, \dots, 10$, expand the brackets and sum the coefficients of the terms $x^k$, where $k$ varies from $q - w_i$ to $q - 1$. This gives the number of coalitions where player $i$ is the key one. For the AIF, these thresholds make up $q - w_1 = 226 - 39 = 187$ and $q - 1 = 225$. Table 8.6 combines the thresholds for calculating the number of switchings for each player. Moreover, it presents the resulting Banzhaf indices of the parties.

In addition to the power indices suggested by Shapley–Shubik and Banzhaf, researchers sometimes employ the Deegan–Packel power index [1978] and the Holler index [1982]. They are defined through minimal winning coalitions.

Definition 8.23 A minimal winning coalition is a winning coalition where each player is the key one.

Definition 8.24 The Deegan–Packel vector in a weighted voting game $\langle N, v \rangle$ is the vector $dp(v) = (dp_1(v), \dots, dp_n(v))$, where the index of player $i$ has the form

$$dp_i(v) = \frac{1}{m} \sum_{S \in M:\ i \in S} \frac{1}{s}, \quad i = 1, \dots, n.$$

Here $M$ denotes the set of all minimal winning coalitions, $m$ means the total number of minimal winning coalitions, and $s$ is the number of members in a coalition $S$.
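The generating-function technique of Theorems 8.10 and 8.11 used in Sections 8.8.1 and 8.8.2 can be sketched in Python without a symbolic package: multiplying by a factor $(1 + z x^{w_j})$ amounts to updating a table of coefficients. The function names below are hypothetical, and the weights are assumed to be positive integers; this is a sketch of the idea rather than the procedure used by the author (who mentions Mathematica's Expand).

```python
from fractions import Fraction
from math import factorial

def coalition_counts(weights, skip):
    """Coefficients of G_i(x, z) = prod_{j != i} (1 + z x^{w_j}).

    counts[s][k] is A_i(k, s): the number of coalitions without player
    `skip` that have exactly s members and total weight k.
    """
    n = len(weights)
    total = sum(weights) - weights[skip]
    counts = [[0] * (total + 1) for _ in range(n)]
    counts[0][0] = 1                          # the empty coalition
    for j, w in enumerate(weights):
        if j == skip:
            continue
        # Multiply by (1 + z x^w); sizes are processed downwards so that
        # player j is added to each coalition at most once.
        for s in range(n - 2, -1, -1):
            for k in range(total - w + 1):
                if counts[s][k]:
                    counts[s + 1][k + w] += counts[s][k]
    return counts

def swings_by_size(weights, q, i):
    """For each size s: coalitions S without i with q - w_i <= w(S) <= q - 1."""
    lo = max(q - weights[i], 0)
    return [sum(row[lo:q]) for row in coalition_counts(weights, i)]

def shapley_shubik(weights, q):
    """Formula (8.1)."""
    n = len(weights)
    return [sum(Fraction(factorial(s) * factorial(n - s - 1), factorial(n)) * c
                for s, c in enumerate(swings_by_size(weights, q, i)))
            for i in range(n)]

def banzhaf(weights, q):
    """Formula (8.4): switchings eta_i (summed over all sizes s) and indices."""
    eta = [sum(swings_by_size(weights, q, i)) for i in range(len(weights))]
    return eta, [Fraction(e, sum(eta)) for e in eta]

bundestag = [298, 245, 47, 43, 36]            # SPD, CDU, Greens, FDP, PDS
print(shapley_shubik(bundestag, 335))

duma = [39, 82, 92, 17, 57, 46, 41, 32, 21, 23]
eta, beta = banzhaf(duma, 226)
print(eta)
print([round(float(b), 3) for b in beta])
```

For the Bundestag data the first call gives the indices 1/2, 1/6, 1/6, 1/6, 0 obtained in Section 8.8.1. The Duma output can be compared with the rows of Table 8.6.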
Definition 8.25 The Holler vector in a weighted voting game $\langle N, v \rangle$ is the vector $h(v) = (h_1(v), \dots, h_n(v))$, where the index of player $i$ has the form

$$h_i(v) = \frac{m_i(v)}{\sum_{j \in N} m_j(v)}, \quad i = 1, \dots, n.$$

Here $m_i$ specifies the number of minimal winning coalitions containing player $i$ ($i = 1, \dots, n$).

Evaluate these indices for defining the influence levels of political parties in the parliament of Japan.

8.8.3 The Holler power index and the Deegan–Packel power index for influence evaluation in the National Diet (1998)

The National Diet is Japan's bicameral legislature. It consists of a lower house (the House of Representatives) and an upper house (the House of Councillors). We will analyze the House of Councillors only, which comprises 252 seats. After the 1998 elections, the majority of seats were accumulated by six parties:

The Liberal Democratic Party (LDP), 105 seats
The Democratic Party of Japan (DPJ), 47 seats
The Japanese Communist Party (JCP), 23 seats
The Komeito Party (KP), 22 seats
The Social Democratic Party (SDP), 13 seats
The Liberal Party (LP), 12 seats
The remaining parties (RP), 30 seats.

For a draft law, the enactment threshold $q$ is the simple majority of votes, i.e., 127 votes. The minimal winning coalitions are (LDP, DPJ), (LDP, JCP), (LDP, KP), (LDP, RP), (LDP, SDP, LP), (DPJ, JCP, KP, SDP, RP), and (DPJ, JCP, KP, LP, RP). Therefore, the LDP appears in five minimal winning coalitions; the DPJ, the JCP, the KP, and the RP each belong to three minimal winning coalitions; the SDP and the LP each enter two minimal winning coalitions. And the Holler power indices make up

$$h_1(v) = 5/21 \approx 0.238, \quad h_2(v) = h_3(v) = h_4(v) = h_7(v) = 3/21 \approx 0.143, \quad h_5(v) = h_6(v) = 2/21 \approx 0.095.$$

Now, evaluate the Deegan–Packel power indices:

$$
\begin{aligned}
dp_1(v) &= \frac{1}{7}\left(4 \cdot \frac{1}{2} + \frac{1}{3}\right) = \frac{1}{3} \approx 0.333,\\
dp_2(v) = dp_3(v) = dp_4(v) = dp_7(v) &= \frac{1}{7}\left(\frac{1}{2} + 2 \cdot \frac{1}{5}\right) \approx 0.129,\\
dp_5(v) = dp_6(v) &= \frac{1}{7}\left(\frac{1}{3} + \frac{1}{5}\right) \approx 0.076.
\end{aligned}
$$
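The list of minimal winning coalitions and both indices above can be reproduced by brute-force enumeration. The Python sketch below is illustrative only: the function names and the dictionary of party weights are assumptions introduced here, and enumerating all $2^n$ coalitions is practical only for a small number of players.

```python
from fractions import Fraction
from itertools import combinations

def minimal_winning_coalitions(weights, q):
    """Coalitions that win but lose as soon as any single member leaves
    (Definition 8.23: every member is a key player)."""
    names = list(weights)
    result = []
    for r in range(1, len(names) + 1):
        for S in combinations(names, r):
            total = sum(weights[p] for p in S)
            if total >= q and all(total - weights[p] < q for p in S):
                result.append(S)
    return result

def holler(weights, q):
    """Definition 8.25: share of memberships in minimal winning coalitions."""
    mwc = minimal_winning_coalitions(weights, q)
    m = {p: sum(p in S for S in mwc) for p in weights}
    total = sum(m.values())
    return {p: Fraction(m[p], total) for p in weights}

def deegan_packel(weights, q):
    """Definition 8.24: average over minimal winning coalitions of 1/|S|."""
    mwc = minimal_winning_coalitions(weights, q)
    return {p: sum((Fraction(1, len(S)) for S in mwc if p in S), Fraction(0))
               / len(mwc)
            for p in weights}

# House of Councillors after the 1998 elections, data from the text.
diet = {"LDP": 105, "DPJ": 47, "JCP": 23, "KP": 22,
        "SDP": 13, "LP": 12, "RP": 30}

for coalition in minimal_winning_coalitions(diet, 127):
    print(coalition)
print(holler(diet, 127))
print(deegan_packel(diet, 127))
```

The enumeration finds the seven minimal winning coalitions listed above; the exact index values are 5/21, 3/21, 2/21 for the Holler index and 1/3, 9/70, 8/105 for the Deegan–Packel index.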
Both indices rank the Liberal Democrats first; by the Deegan–Packel index, their influence is more than twice that of any other party in the House of Councillors. The remaining parties fall into two groups of players with equal influence.

8.9 The mutual influence of players. The Hoede–Bakker index

The above models of voting ignore the mutual influence of players. However, real decision-making includes situations when a certain player or some players modify an original decision under the impact of others. Consider a voting game $\langle N, v \rangle$, where the characteristic function possesses two values, 0 and 1, as follows.

Figure 8.5 Influence graph. Player 1 influences players 3, 4, and 5. Player 2 is fully independent.

Imagine that players $N = \{1, 2, \dots, n\}$ have to vote on a certain draft law. The weights of the players are represented by the vector $w = (w_1, \dots, w_n)$. Next, their initial preferences form a binary vector $\pi = (\pi_1, \dots, \pi_n)$, where $\pi_i$ equals 1 if player $i$ supports the draft law, and equals 0 otherwise.

At stage 1, players hold a consultation, and the vector $\pi$ gets transformed into a new decision vector $b = B\pi$ (which is also a binary vector). The operator $B$ can be defined, e.g., using the mutual influence graph of players (see Figure 8.5). Stage 2 lies in calculating the affirmative votes for the draft law; the affirmative decision follows if their number is not smaller than a given threshold $q$, $1 \le q \le n$. Therefore, the collective decision is defined by the characteristic function

$$v(b) = v(B\pi) = I\left\{\sum_{i=1}^{n} b_i w_i \ge q\right\}. \tag{9.1}$$

Here $I\{A\}$ means the indicator of the set $A$. Suppose that the function $v$ satisfies the two axioms below. As a matter of fact, this requirement applies to the operator $B$.

A1. Denote by $\bar{\pi}$ the complement vector, where $\bar{\pi}_i = 1 - \pi_i$. Then any preference vector $\pi$ meets the equality

$$v(B\bar{\pi}) = 1 - v(B\pi).$$

A2. Consider vectors $\pi$, $\pi'$ and define the order $\pi \le \pi'$ if $\{i \in N : \pi_i = 1\} \subset \{i \in N : \pi'_i = 1\}$.
Then any preference vectors $\pi$, $\pi'$ such that $\pi \le \pi'$ satisfy the condition $v(B\pi) \le v(B\pi')$.

According to Axiom A1, the decision-making rule must be such that, if all players reverse their initial opinions, the collective decision is also reversed. Axiom A2 claims that, if players with an initial affirmative decision are supplemented by other players, the final collective decision either remains the same or turns affirmative (if it was negative).

Definition 8.26 The Hoede–Bakker index of player $i$ is the quantity

$$HB_i(v(B)) = \frac{1}{2^{n-1}} \sum_{\pi : \pi_i = 1} v(B\pi), \quad i = 1, \ldots, n. \quad (9.2)$$

Figure 8.6 Influence graph. Player A influences player B. Players C and D are fully independent.

In the expression (9.2), summation runs over all binary preference vectors $\pi$ where the opinion of player $i$ is affirmative. The Hoede–Bakker index reflects the influence level of player $i$ on draft law adoption under equiprobable preferences of other players (the latter may have mutual impacts).

Concluding this section, we study the following example. A parliament comprises four parties A, B, C, and D that have 10, 20, 30, and 40 seats, respectively. Assume that (a) draft law enactment requires 50 votes and (b) party A exerts an impact on party B (see Figure 8.6). First, compute the Banzhaf indices:

$$\beta_1(v) = \frac{1}{12}, \quad \beta_2(v) = \beta_3(v) = \frac{3}{12}, \quad \beta_4(v) = \frac{5}{12}.$$

Evidently, the influence level of party A is minimal; however, this computation neglects the impact of party A on party B. Second, calculate the Hoede–Bakker indices, see Table 8.7. We find that

$$HB_1(v) = HB_3(v) = HB_4(v) = \frac{3}{4}, \quad HB_2(v) = \frac{1}{2}.$$

Now, the influence level of party A goes up, reaching the levels of parties C and D.

Table 8.7 Hoede–Bakker index evaluation.
π       0,0,0,0  0,0,0,1  0,0,1,0  0,1,0,0  1,0,0,0  0,0,1,1  0,1,0,1  0,1,1,0
Bπ      0,0,0,0  0,0,0,1  0,0,1,0  0,0,0,0  1,1,0,0  0,0,1,1  0,0,0,1  0,0,1,0
v(Bπ)   0        0        0        0        0        1        0        0

π       1,0,0,1  1,0,1,0  1,1,0,0  0,1,1,1  1,0,1,1  1,1,0,1  1,1,1,0  1,1,1,1
Bπ      1,1,0,1  1,1,1,0  1,1,0,0  0,0,1,1  1,1,1,1  1,1,0,1  1,1,1,0  1,1,1,1
v(Bπ)   1        1        0        1        1        1        1        1

Exercises

1. The jazz band game with four players.
A restaurateur invites a jazz band to perform one evening and offers 100 USD. The jazz band consists of four musicians, namely, a pianist (player 1), a vocalist (player 2), a drummer (player 3), and a guitarist (player 4). They should distribute the fee. An argument during such negotiations is the characteristic function $v$ defined by the individual honoraria the players may receive by performing singly, in pairs, or in triplets. The characteristic function has the form
$v(1) = 40$, $v(2) = 30$, $v(3) = 20$, $v(4) = 0$,
$v(1, 2) = 80$, $v(1, 3) = 70$, $v(1, 4) = 50$, $v(2, 3) = 60$, $v(2, 4) = 35$, $v(3, 4) = 25$,
$v(1, 2, 3) = 95$, $v(1, 2, 4) = 85$, $v(1, 3, 4) = 75$, $v(2, 3, 4) = 65$.
Construct the core and the $\tau$-equilibrium of this game.

2. Find the Shapley vector in exercise no. 1.

3. The road construction game with four players.
Four farms agree to construct a road connecting all farms with a city. Construction of each segment of the road incurs definite costs. Each farm has a specific income from selling its agricultural products in the city. The road infrastructure, the construction cost of each road segment, and the incomes of the farmers are illustrated in Figure 8.7.
Figure 8.7 Road construction.
Build the nucleolus of this game.

4. Construct the core and the $\tau$-equilibrium of the game in exercise no. 2.

5. The shoes game.
This game involves four sellers. Sellers 1 and 2 have right shoes, whereas sellers 3 and 4 have left shoes. The price of a single shoe is 0 USD, and the price of a pair makes up 10 USD. Sellers strive to obtain some income from shoes.
Build the core and the $\tau$-core of this game.

6. Take the game from exercise no. 5 and construct the Shapley vector.

7. Give an example of a cooperative game which is not quasibalanced.

8. Give an example of a cooperative game which is quasibalanced, but fails to be balanced.

9. The parliament of a small country has 40 seats distributed as follows:
Party 1, 20 seats
Party 2, 15 seats
Party 3, 5 seats
For a draft law, the enactment threshold $q$ is the simple majority of votes, i.e., 21 votes. Evaluate the Shapley–Shubik power index of the parties.

10. Consider the parliament from exercise no. 9 and find the Banzhaf power index of the parties.

9 Network games

Introduction

Games in information networks form a modern branch of game theory. Their development was connected with the expansion of the global information network (Internet), as well as with the organization of parallel computations on supercomputers. Here the key paradigm concerns the non-cooperative behavior of a large number of players acting independently (still, their payoffs depend on the behavior of the other participants). Each player strives to transmit or acquire maximum information in the minimum possible time. Therefore, the payoff function of players is determined either as the task time or as the packet transmission time over a network (to be minimized). Another definition of the payoff function is the transmitted volume of information or the channel capacity (to be maximized).

An important aspect is comparing the payoffs of players under centralized (cooperative) behavior with their equilibrium payoffs under non-cooperative behavior. Such a comparison answers the following question: should one organize management in a system (thus incurring some costs)? If this turns out to be inefficient, the system should be left to self-organize. Interesting effects arise naturally in the context of equilibration.
Generally speaking, in an equilibrium players may obtain non-maximal payoffs. Perhaps the most striking result is Braess's (1968) paradox (network expansion may reduce the equilibrium payoffs of the players).

There exist two approaches to the analysis of network games. According to the first one, a player chooses a route for packet transmission; a packet is treated as an indivisible quantity. Here we mention the work by Papadimitriou and Koutsoupias (1999). Accordingly, such models will be called the KP-models. The second approach presumes that a packet can be divided into segments and transmitted by different routes. It utilizes the equilibrium concept suggested by J.G. Wardrop (1952).

Mathematical Game Theory and Applications, First Edition. Vladimir Mazalov. © 2014 John Wiley & Sons, Ltd. Published 2014 by John Wiley & Sons, Ltd. Companion website: http://www.wiley.com/go/game_theory

9.1 The KP-model of optimal routing with indivisible traffic. The price of anarchy

We begin with an elementary information network representing $m$ parallel channels (see Figure 9.1). Consider a system of $n$ users (players). Player $i$ ($i = 1, \ldots, n$) intends to send traffic of some volume $w_i$ through a channel. Each channel $l = 1, \ldots, m$ has a given capacity $c_l$. When traffic of a volume $w$ is transmitted by a channel with a capacity $c$, the channel delay equals $w/c$.

Each user pursues individual interests, endeavoring to occupy the minimal delay channel. The pure strategy of player $i$ is the choice of channel $l$ for his traffic. Consequently, the vector $L = (l_1, \ldots, l_n)$ makes the pure strategy profile of all users; here $l_i$ means the number of the channel selected by user $i$. His mixed strategy is the probability distribution $p_i = (p_i^1, \ldots, p_i^m)$, where $p_i^l$ stands for the probability that user $i$ chooses channel $l$. The matrix $P$ composed of the vectors $p_i$ is the mixed strategy profile of the users.
In the case of pure strategies for user $i$, the traffic delay in the channel $l_i$ is determined by

$$\lambda_i = \frac{\sum_{k : l_k = l_i} w_k}{c_{l_i}}.$$

Definition 9.1 A pure strategy profile $(l_1, \ldots, l_n)$ is called a Nash equilibrium, if for each user $i$ we have

$$\lambda_i = \min_{j = 1, \ldots, m} \frac{w_i + \sum_{k \ne i : l_k = j} w_k}{c_j}.$$

In the case of mixed strategies, it is necessary to introduce the expected traffic delay for user $i$ employing channel $l$. This characteristic makes up

$$\lambda_i^l = \frac{w_i + \sum_{k = 1, k \ne i}^{n} p_k^l w_k}{c_l}.$$

The minimal expected delay of user $i$ equals $\lambda_i = \min_{l = 1, \ldots, m} \lambda_i^l$.

Definition 9.2 A strategy profile $P$ is called a Nash equilibrium, if for each user $i$ and any channel adopted by him the following condition holds true: $\lambda_i^l = \lambda_i$ if $p_i^l > 0$, and $\lambda_i^l > \lambda_i$ if $p_i^l = 0$.

Definition 9.3 A mixed strategy equilibrium $P$ is said to be a completely mixed strategy equilibrium, if each user selects each channel with a positive probability, i.e., for any $i = 1, \ldots, n$ and any $l = 1, \ldots, m$: $p_i^l > 0$.

Figure 9.1 A network of parallel channels.

The quantity $\lambda_i$ describes the minimum possible individual costs of user $i$ to send his traffic. Pursuing personal goals, each user chooses strategies ensuring this value of the expected delay. The so-called social costs characterize the general costs of the system due to channels operation. One can use the following social costs functions $SC(w, L)$ for a pure strategy profile:

1. the linear costs $LSC(w, L) = \sum_{l=1}^{m} \frac{\sum_{k : l_k = l} w_k}{c_l}$;

2. the quadratic costs $QSC(w, L) = \sum_{l=1}^{m} \frac{\left( \sum_{k : l_k = l} w_k \right)^2}{c_l}$;

3. the maximal costs $MSC(w, L) = \max_{l = 1, \ldots, m} \frac{\sum_{k : l_k = l} w_k}{c_l}$.

Definition 9.4 The social costs for a mixed strategy profile $P$ are the expected social costs $SC(w, L)$ for a random pure strategy profile $L$:

$$SC(w, P) = E\left( SC(w, L) \right) = \sum_{L = (l_1, \ldots, l_n)} \left( \prod_{k=1}^{n} p_k^{l_k} \cdot SC(w, L) \right).$$

Denote by $opt = \min_P SC(w, P)$ the optimal social costs. The global optimum in the model considered follows from social costs minimization.
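For small systems, Definition 9.1 and the maximal social costs can be checked by brute force. Below is a minimal sketch, assuming the data of Example 9.1 ($w = (20, 10, 10, 10, 5)$, $c = (20, 10, 8)$); the function names are hypothetical and channels are indexed from 0. It enumerates all pure strategy profiles, selects the pure Nash equilibria, and compares their maximal social costs with the optimum.

```python
from fractions import Fraction
from itertools import product


def channel_loads(w, profile, m):
    """Total traffic on each channel for a pure profile (profile[i] = channel of user i)."""
    loads = [0] * m
    for wk, l in zip(w, profile):
        loads[l] += wk
    return loads


def msc(w, c, profile):
    """Maximal social costs: the largest channel delay load/capacity."""
    loads = channel_loads(w, profile, len(c))
    return max(Fraction(load, cap) for load, cap in zip(loads, c))


def is_pure_nash(w, c, profile):
    """Definition 9.1: no user can strictly reduce his delay by switching channels."""
    loads = channel_loads(w, profile, len(c))
    for wi, li in zip(w, profile):
        current = Fraction(loads[li], c[li])
        for j in range(len(c)):
            if j != li and Fraction(loads[j] + wi, c[j]) < current:
                return False
    return True


# Data of Example 9.1 (integer volumes and capacities keep the arithmetic exact).
w = (20, 10, 10, 10, 5)
c = (20, 10, 8)

profiles = list(product(range(len(c)), repeat=len(w)))
opt = min(msc(w, c, p) for p in profiles)
nash_costs = [msc(w, c, p) for p in profiles if is_pure_nash(w, c, p)]

print("optimal MSC:", opt)
print("best / worst equilibrium MSC:", min(nash_costs), max(nash_costs))
print("pure price of anarchy for this instance:", max(nash_costs) / opt)
```

The enumeration visits $m^n$ profiles, so the sketch is only practical for toy instances; it is meant as a check of the definitions rather than as a solution method.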
Generally, the global optimum is found by enumeration of all admissible pure strategy profiles. However, in a series of cases, it results from solving the continuous constrained minimization problem for social costs, where the mixed strategies of users (the vector $P$) act as variables.

Definition 9.5 The price of anarchy is the ratio of the social costs in the worst-case Nash equilibrium and the optimal social costs:

$$PA = \sup_{P \text{ is an equilibrium}} \frac{SC(w, P)}{opt}.$$

Moreover, if the supremum is taken over equilibrium profiles composed of pure strategies only, we speak of the pure price of anarchy. Similarly, readers can state the notion of the mixed price of anarchy. The price of anarchy defines how much the social costs under centralized control differ from the social costs when each player acts according to his individual interests. Obviously, $PA \ge 1$, and the actual deviation from 1 reflects the efficiency of centralized control.

9.2 Pure strategy equilibrium. Braess's paradox

Let us study several examples of systems where the behavior of users is described by pure strategy profiles only. As social costs, we select the maximal social costs function. Introduce the notation $(w_{i_1}, \ldots, w_{i_k}) \to c_l$ for a situation when traffic segments $w_{i_1}, \ldots, w_{i_k}$ belonging to users $i_1, \ldots, i_k \in \{1, \ldots, n\}$ are transmitted through the channel with the capacity $c_l$.

Figure 9.2 The worst-case Nash equilibrium with the delay of 2.5.

Example 9.1 This example illustrates Braess's paradox under elimination of one channel. Consider the following set of users and channels: $n = 5$, $m = 3$, $w = (20, 10, 10, 10, 5)$, $c = (20, 10, 8)$ (see Figure 9.2). In this case, there exist several Nash equilibria. One of them is the strategy profile $\{(10, 10, 10) \to 20, \; 5 \to 10, \; 20 \to 8\}$. Readers can easily verify that no unilateral deviation from this profile decreases the delay of the deviating player (for instance, the user with traffic 20 would obtain the same delay $50/20 = 25/10 = 2.5$ on either of the other channels). However, such an equilibrium maximizes the social costs:

$$MSC(w; c; (10, 10, 10) \to 20, \; 5 \to 10, \; 20 \to 8) = 2.5.$$
We call this equilibrium the worst-case equilibrium.

Interestingly, the global optimum of the social costs is achieved in the strategy profile $(20, 10) \to 20$, $(10, 5) \to 10$, $10 \to 8$; it makes up 1.5. The same profile is the best-case pure strategy Nash equilibrium. If we remove channel 8 (see Figure 9.3), the worst-case social costs become

$$MSC(w; c; (20, 10, 10) \to 20, \; (10, 5) \to 10) = 2.$$

This strategy profile forms the best-case pure strategy equilibrium and the global optimum.

Figure 9.3 Delay reduction owing to channel elimination.

Example 9.2 Set $n = 4$, $m = 3$, $w = (15, 5, 4, 3)$, and $c = (15, 10, 8)$. The social costs in the worst-case equilibrium constitute

$$MSC(w; c; (5, 4) \to 15, \; 15 \to 10, \; 3 \to 8) = 1.5.$$

Under the best-case equilibrium, the global optimum of 1 gets attained in the strategy profile $15 \to 15$, $(5, 3) \to 10$, $4 \to 8$. The non-equilibrium strategy profile $15 \to 15$, $(5, 4) \to 10$, $3 \to 8$ is globally optimal as well. As the result of removing channel 10, the worst-case equilibrium becomes $(15, 5) \to 15$, $(4, 3) \to 8$ (the corresponding social costs equal 1.333). The global optimum and the best-case equilibrium are achieved in $(15, 3) \to 15$, $(5, 4) \to 8$, and the social costs make up 1.2.

Example 9.3 Set $n = 4$, $m = 3$, $w = (15, 8, 4, 3)$, and $c = (15, 8, 3)$. The social costs in the worst-case equilibrium constitute

$$MSC(w; c; (8, 4, 3) \to 15, \; 15 \to 8) = 1.875.$$

Under the best-case equilibrium, the global optimum of $19/15 \approx 1.267$ gets attained in the strategy profile $(15, 4) \to 15$, $8 \to 8$, $3 \to 3$. By eliminating channel 8, we obtain the worst-case equilibrium $(15, 8, 4) \to 15$, $3 \to 3$ with the social costs of 1.8. Finally, the global optimum and the best-case equilibrium are observed in $(15, 8, 3) \to 15$, $4 \to 3$, and the corresponding social costs equal 1.733.

Example 9.4 (Braess's paradox.) This model was proposed by D. Braess in 1968. Consider a road network shown in Figure 9.4.
Suppose that 60 automobiles move from point A to point B. The delay on the segments (C, B) and (A, D) does not depend on the number of automobiles (it equals 1 h). On the segments (A, C) and (D, B), the delay in minutes equals the number of automobiles moving along the segment. Obviously, here an equilibrium consists in the equal distribution of automobiles between the routes (A, C, B) and (A, D, B), i.e., 30 automobiles per route. In this case, the trip takes 1.5 h for each automobile.

Now, imagine that we have connected points C and D by a speedway, where each automobile has zero delay (see Figure 9.5). Then automobiles that have previously selected the route (A, D, B) benefit from moving along the route (A, C, D, B). The same applies to automobiles that have previously chosen the route (A, C, B): they should move along the route (A, C, D, B) as well. Hence, the Nash equilibrium (worst case) is the strategy profile where all automobiles move along the route (A, C, D, B). However, each automobile then spends 2 h on the trip. Therefore, we observe a paradoxical situation: the costs of each participant have increased as the result of highway construction. This is Braess's paradox.

Figure 9.4 In the equilibrium, players are equally distributed between the routes.

Figure 9.5 In the equilibrium, all players choose the route ACDB.

9.3 Completely mixed equilibrium in the optimal routing problem with inhomogeneous users and homogeneous channels

In the current section, we study a system with identical capacity channels. Suppose that the capacity of each channel $l$ equals $c_l = 1$. Let us select linear social costs.

Lemma 9.1 Consider a system with $n$ users and $m$ parallel channels having identical capacities. There exists a unique completely mixed Nash equilibrium such that for any user $i$ and channel $l$ the equilibrium probabilities make up $p_i^l = 1/m$.
Proof: By the definition of an equilibrium, each player $i$ has the same delay on all channels, i.e.,

$$\sum_{k \ne i} p_k^j w_k = \lambda_i, \quad i = 1, \ldots, n; \; j = 1, \ldots, m$$

(within this proof, $\lambda_i$ denotes the delay of user $i$ net of his own traffic $w_i$, which enters the delay on every channel identically). First, sum up these equations over $j = 1, \ldots, m$:

$$\sum_{j=1}^{m} \sum_{k \ne i} p_k^j w_k = \sum_{k \ne i} w_k = m \lambda_i.$$

Hence it follows that

$$\lambda_i = \frac{1}{m} \sum_{k \ne i} w_k, \quad i = 1, \ldots, n.$$

Second, sum up these equations over $i = 1, \ldots, n$:

$$\sum_{i=1}^{n} \sum_{k \ne i} p_k^j w_k = (n - 1) \sum_{k=1}^{n} p_k^j w_k = \sum_{i=1}^{n} \lambda_i.$$

This yields

$$\sum_{k=1}^{n} p_k^j w_k = \frac{1}{n - 1} \sum_{i=1}^{n} \lambda_i = \frac{1}{n - 1} \cdot \frac{(n - 1) W}{m} = \frac{W}{m},$$

where $W = w_1 + \cdots + w_n$. The equilibrium equation system leads to

$$p_i^j w_i = \sum_{k=1}^{n} p_k^j w_k - \sum_{k \ne i} p_k^j w_k = \frac{W}{m} - \frac{1}{m} \sum_{k \ne i} w_k = \frac{w_i}{m},$$

whence it appears that

$$p_i^j = \frac{1}{m}, \quad i = 1, \ldots, n; \; j = 1, \ldots, m.$$

Denote by $F$ the completely mixed equilibrium in this model and find the corresponding social costs:

$$LSC(w, F) = E\left( \sum_{l=1}^{m} \sum_{k : l_k = l} w_k \right) = \sum_{l=1}^{m} \sum_{k=1}^{n} E\left( w_k \cdot I_{l_k = l} \right) = \sum_{l=1}^{m} \sum_{k=1}^{n} w_k p_k^l = \sum_{k=1}^{n} w_k.$$

9.4 Completely mixed equilibrium in the optimal routing problem with homogeneous users and inhomogeneous channels

This section deals with a system where users send the same volumes of traffic. Suppose that the traffic volume of any user $i$ makes up $w_i = 1$. Define the total capacity of all channels: $C = \sum_{l=1}^{m} c_l$. We select linear and quadratic social costs.

Without loss of generality, sort the channels in the ascending order of their capacities: $c_1 \le c_2 \le \ldots \le c_m$.

Lemma 9.2 Consider the model with $n$ homogeneous users and $m$ parallel channels. A unique completely mixed Nash equilibrium exists iff $c_1(m + n - 1) > C$. Furthermore, for each channel $l = 1, \ldots, m$ and any user $i = 1, \ldots, n$ the equilibrium probabilities take the form

$$p_i^l = p^l = \frac{c_l(m + n - 1) - C}{C(n - 1)},$$

and the individual equilibrium delays coincide, being equal to $\frac{m + n - 1}{C}$.

Proof: Suppose that a completely mixed equilibrium exists. Then the expected traffic delay of each user $i$ on any channel must be the same:

$$\frac{1 + \sum_{k \ne i} p_k^l}{c_l} = \frac{1 + \sum_{k \ne i} p_k^j}{c_j} \quad \text{for } i = 1, \ldots, n \text{ and } l, j = 1, \ldots, m.$$
Multiply both sides of each identity by $c_l$. Next, perform summation over $l$ for each group of identities with the same indexes $i$ and $j$. Bearing in mind that $\sum_{l=1}^{m} p_k^l = 1$ for $k = 1, \ldots, n$, we obtain

$$m + (n - 1) = C \, \frac{1 + \sum_{k \ne i} p_k^j}{c_j} \quad \text{for } i = 1, \ldots, n, \; j = 1, \ldots, m,$$

i.e.,

$$\frac{m + n - 1}{C} = \frac{1 + \sum_{k \ne i} p_k^j}{c_j} = \lambda_i^j \quad \text{for } i = 1, \ldots, n, \; j = 1, \ldots, m.$$

Since the left-hand side of the identity does not depend on which term $p_i^j$ is omitted from the sum, all quantities $p_i^j$ coincide: $p_i^j = p^j$ for any $i$. The identity can be transformed to

$$\frac{m + n - 1}{C} = \frac{1 + (n - 1) p^j}{c_j} \quad \text{for } j = 1, \ldots, m,$$

whence it follows that

$$p^j = \frac{c_j(m + n - 1) - C}{C(n - 1)} \quad \text{for } j = 1, \ldots, m.$$

Clearly, the sum of the equilibrium probabilities over all channels constitutes 1. Thus, a necessary and sufficient admissibility condition of the derived solution is the inequality $p^l > 0$ valid for all $l = 1, \ldots, m$. In other words, the condition $c_1(m + n - 1) > C$ must hold true.

Social costs evaluation in a completely mixed equilibrium involves the identity below.

Lemma 9.3 For any $x \in [0, 1]$ and integer $n$, we have

$$\sum_{k=1}^{n} C_n^k \, k \, x^k (1 - x)^{n-k} = nx.$$

Proof: Address some properties of the binomial distribution. Let the independent random variables $\xi_i$, $i = 1, \ldots, n$, take values 0 or 1 with $E\xi_i = x$. In this case,

$$\sum_{k=1}^{n} C_n^k \, k \, x^k (1 - x)^{n-k} = E\left( \sum_{i=1}^{n} \xi_i \right) = \sum_{i=1}^{n} E\xi_i = nx.$$

Now, find the social costs for the completely mixed equilibrium:

$$LSC(c, F) = E\left( \sum_{l=1}^{m} \frac{\text{the number of users in channel } l}{c_l} \right) = \sum_{l=1}^{m} \frac{1}{c_l} \sum_{k=1}^{n} C_n^k \, k \, (1 - p^l)^{n-k} (p^l)^k = n \sum_{l=1}^{m} \frac{p^l}{c_l} = \frac{mn(m + n - 1)}{C(n - 1)} - \frac{n}{n - 1} \sum_{l=1}^{m} \frac{1}{c_l}.$$

Analyze the possible appearance of Braess's paradox in this model, i.e., the situation when adding a new channel worsens the completely mixed equilibrium (increases the social costs for the completely mixed equilibrium). We assume that the completely mixed equilibrium exists in the original system, viz., the condition $c_1(m + n - 1) > C$ takes place.
Adding a new channel must not violate the existence of a completely mixed equilibrium. Notably, we add a channel $c_0$ such that $c_0(m + n) > C + c_0$ and $c_1(m + n) > C + c_0$.

Theorem 9.1 Consider the model with $n$ homogeneous users and $m$ inhomogeneous parallel channels. The social costs in a completely mixed equilibrium increase as the result of adding a new channel with the capacity $\frac{C}{m + n - 1} < c_0 < \frac{C}{m}$ such that $c_0(m + n) > C + c_0$ and $c_1(m + n) > C + c_0$.

Proof: Let $F$ be the completely mixed equilibrium strategy profile in the model with $n$ homogeneous users and $m$ inhomogeneous parallel channels. Assume that we have added a channel with some capacity $c_0$, and $F_0$ indicates the completely mixed equilibrium in the resulting system. Then the variation of the linear social costs becomes

$$LSC(w, F_0) - LSC(w, F) = -\frac{n}{(n - 1) c_0} + \frac{(m + 1) n (m + n)}{(C + c_0)(n - 1)} - \frac{mn(m + n - 1)}{C(n - 1)}$$

$$= \frac{n}{(n - 1) C c_0 (C + c_0)} \left( C c_0 (2m + n - 1) - C^2 - m c_0^2 (m + n - 1) \right).$$

The above difference is positive if

$$C c_0 (2m + n - 1) - C^2 - m c_0^2 (m + n - 1) > 0.$$

The left-hand side of this inequality is a quadratic function of $c_0$ with a negative coefficient at $c_0^2$. Therefore, all positive values of the function lie between its roots $\frac{C}{m + n - 1}$ and $\frac{C}{m}$. Adding a new channel with some capacity $\frac{C}{m + n - 1} < c_0 < \frac{C}{m}$ increases the linear social costs.

Example 9.5 Choose a system with four users and two parallel channels of capacity 1. Here the completely mixed equilibrium is the strategy profile where all equilibrium probabilities equal 0.5. The linear social costs in the equilibrium make up 4. If we add a new channel with any capacity $\frac{2}{5} < c_0 < 1$, the completely mixed equilibrium exists; the equilibrium probabilities are $\frac{5 c_0 - 2}{3(c_0 + 2)}$ (the new channel) and $\frac{4 - c_0}{3(c_0 + 2)}$ (the two former channels). The linear social costs in the equilibrium equal $\frac{52 c_0 - 8 c_0^2 - 8}{3 c_0 (c_0 + 2)}$, which exceeds 4.
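Example 9.5 can be verified numerically. The sketch below assumes the formulas of Lemma 9.2 (homogeneous users with $w_i = 1$) and the expression $LSC = n \sum_l p^l / c_l$ derived above; the function names are illustrative and not taken from the book. It computes the completely mixed equilibrium before and after adding a channel of capacity $c_0 = 1/2$, a value chosen here only as a sample point of the interval $(2/5, 1)$.

```python
from fractions import Fraction


def completely_mixed_probabilities(capacities, n):
    """Equilibrium probabilities p^l = (c_l (m + n - 1) - C) / (C (n - 1)) of Lemma 9.2.

    Raises ValueError when some probability is not positive, i.e. when the
    completely mixed equilibrium does not exist.
    """
    c = [Fraction(x) for x in capacities]
    m, total = len(c), sum(c)
    probs = [(cl * (m + n - 1) - total) / (total * (n - 1)) for cl in c]
    if any(p <= 0 for p in probs):
        raise ValueError("no completely mixed equilibrium for these capacities")
    return probs


def linear_social_costs(capacities, n):
    """LSC in the completely mixed equilibrium: n * sum_l p^l / c_l."""
    c = [Fraction(x) for x in capacities]
    probs = completely_mixed_probabilities(c, n)
    return n * sum(p / cl for p, cl in zip(probs, c))


n = 4
before = linear_social_costs([1, 1], n)
c0 = Fraction(1, 2)  # sample capacity inside (2/5, 1); an assumption of this sketch
after = linear_social_costs([c0, 1, 1], n)

print("probabilities with the new channel:", completely_mixed_probabilities([c0, 1, 1], n))
print("LSC before:", before, "LSC after:", after)
```

For $c_0 = 1/2$ the closed-form expression of Example 9.5 gives $64/15$, so the added channel raises the linear social costs above 4, in line with Theorem 9.1.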
9.5 Completely mixed equilibrium: The general case

To proceed, we concentrate on the general case model, where users send traffic of different volumes through channels with different capacities. Again, select the linear social costs and let $W = \sum_{i=1}^{n} w_i$ be the total volume of user traffic and $C = \sum_{l=1}^{m} c_l$ represent the total capacity of all channels.

The following theorem provides the existence condition of a completely mixed equilibrium and the corresponding values of equilibrium probabilities.

Theorem 9.2 A unique completely mixed equilibrium exists iff the condition

$$\left( 1 - \frac{m c_l}{C} \right) \left( 1 - \frac{W}{(n - 1) w_i} \right) + \frac{c_l}{C} \in (0, 1)$$

holds true for all users $i = 1, \ldots, n$ and all channels $l = 1, \ldots, m$. The corresponding equilibrium probabilities make up

$$p_i^l = \left( 1 - \frac{m c_l}{C} \right) \left( 1 - \frac{W}{(n - 1) w_i} \right) + \frac{c_l}{C}.$$

Obviously, for any user the sum of the equilibrium probabilities over all channels equals 1. Therefore, we should verify the above condition on one side only, i.e., for all users $i = 1, \ldots, n$ and all channels $l = 1, \ldots, m$:

$$\left( 1 - \frac{m c_l}{C} \right) \left( 1 - \frac{W}{(n - 1) w_i} \right) + \frac{c_l}{C} > 0. \quad (5.1)$$

Evaluate the linear social costs for the completely mixed equilibrium $F$:

$$LSC(w, c, F) = E\left( \sum_{l=1}^{m} \frac{\sum_{k : l_k = l} w_k}{c_l} \right) = \sum_{l=1}^{m} \sum_{k=1}^{n} \frac{E\left( w_k \cdot I_{l_k = l} \right)}{c_l} = \sum_{l=1}^{m} \sum_{k=1}^{n} \frac{w_k p_k^l}{c_l} = \frac{mW(n + m - 1)}{C(n - 1)} - \frac{W}{n - 1} \sum_{l=1}^{m} \frac{1}{c_l}.$$

Study the possible appearance of Braess's paradox in this model. Suppose that the completely mixed equilibrium exists in the original system (the condition (5.1) takes place). Adding a new channel must preserve the existence of a completely mixed equilibrium. In other words, we add a certain channel $c_0$ meeting the analog of the condition (5.1) in the new system of $m + 1$ channels.

Theorem 9.3 Consider the model with $n$ inhomogeneous users and $m$ inhomogeneous parallel channels.
The linear social costs in the completely mixed equilibrium increase as the result of adding a new channel with some capacity $\frac{C}{m + n - 1} < c_0 < \frac{C}{m}$ such that for any users $i = 1, \ldots, n$ and all channels $l = 0, \ldots, m$:

$$\left( 1 - \frac{(m + 1) c_l}{C + c_0} \right) \left( 1 - \frac{W}{(n - 1) w_i} \right) + \frac{c_l}{C + c_0} > 0.$$

Proof: Let $F$ denote the completely mixed equilibrium strategy profile in the model with $n$ inhomogeneous users and $m$ inhomogeneous parallel channels. Suppose that we add a channel with a capacity $c_0$, and $F_0$ is the completely mixed equilibrium in the resulting system. Then the variation of the linear social costs becomes

$$LSC(w, c, F_0) - LSC(w, c, F) = -\frac{W}{(n - 1) c_0} + \frac{(m + 1) W (m + n)}{(C + c_0)(n - 1)} - \frac{mW(m + n - 1)}{C(n - 1)}$$

$$= \frac{W}{(n - 1) C c_0 (C + c_0)} \left( C c_0 (2m + n - 1) - C^2 - m c_0^2 (m + n - 1) \right).$$

The remaining part of the proof coincides with that of Theorem 9.1.

Example 9.6 Choose a system with two users sending their traffic of the volumes $w_1 = 1$ and $w_2 = 3$, respectively, through two parallel channels of the capacities $c_1 = c_2 = 1$. The completely mixed equilibrium is the strategy profile where all equilibrium probabilities equal 0.5. The linear social costs in the equilibrium constitute 4. If we add a new channel with some capacity $\frac{6}{7} < c_0 < 1$, a completely mixed equilibrium exists and the equilibrium probabilities make up

$$p_1^0 = \frac{7 c_0 - 6}{2 + c_0}; \quad p_2^0 = \frac{5 c_0 - 2}{3(2 + c_0)}; \quad p_1^1 = p_1^2 = \frac{4 - 3 c_0}{2 + c_0}; \quad p_2^1 = p_2^2 = \frac{4 - c_0}{3(2 + c_0)}.$$

In the new system, the linear social costs in the equilibrium become $\frac{28 c_0 - 8 c_0^2 - 8}{c_0 (c_0 + 2)} > 4$.

9.6 The price of anarchy in the model with parallel channels and indivisible traffic

Revert to the system with $m$ homogeneous parallel channels and $n$ players. Take the maximal social costs $MSC(w, c, L)$. Without loss of generality, we assume that the capacity $c$ of all channels is 1 and $w_1 \ge w_2 \ge \ldots \ge w_n$. Let $P$ be some Nash equilibrium. Designate by $p_i^j$ the probability that player $i$ selects channel $j$.
The quantity $M^j$ specifies the expected traffic in channel $j$, $j = 1, \ldots, m$. Then

$$M^j = \sum_{i=1}^{n} p_i^j w_i. \quad (6.1)$$

In the Nash equilibrium $P$, the optimal strategy of player $i$ is employing only channels $j$ where his delay $\lambda_i^j = w_i + \sum_{k = 1, k \ne i}^{n} p_k^j w_k$ attains the minimal value ($\lambda_i^j = \lambda_i$ if $p_i^j > 0$, and $\lambda_i^j > \lambda_i$ if $p_i^j = 0$). Reexpress the quantity $\lambda_i^j$ as

$$\lambda_i^j = w_i + \sum_{k = 1, k \ne i}^{n} p_k^j w_k = M^j + (1 - p_i^j) w_i. \quad (6.2)$$

Denote by $S_i$ the support of the strategy of player $i$, i.e., $S_i = \{j : p_i^j > 0\}$. In the sequel, we write $S_i^j = 1$ if $p_i^j > 0$, and $S_i^j = 0$ otherwise. Suppose that we know the supports $S_1, \ldots, S_n$ of the strategies of all players. In this case, the strategies proper are defined by

$$M^j + (1 - p_i^j) w_i = \lambda_i, \quad S_i^j > 0, \quad i = 1, \ldots, n; \; j = 1, \ldots, m.$$

Hence it appears that

$$p_i^j = \frac{M^j + w_i - \lambda_i}{w_i}. \quad (6.3)$$

According to (6.1), for all $j = 1, \ldots, m$ we have

$$M^j = \sum_{i=1}^{n} S_i^j (M^j + w_i - \lambda_i).$$

Moreover, since the equality $\sum_{j=1}^{m} p_i^j = 1$ takes place for all players $i$, we also obtain

$$\sum_{j=1}^{m} S_i^j (M^j + w_i - \lambda_i) = w_i, \quad i = 1, \ldots, n.$$

Reexpress the social costs as the expected maximal traffic over all channels:

$$SC(w, P) = \sum_{j_1 = 1}^{m} \cdots \sum_{j_n = 1}^{m} \prod_{i=1}^{n} p_i^{j_i} \max_{l = 1, \ldots, m} \sum_{k : j_k = l} w_k. \quad (6.4)$$

Denote by $opt = \min_P SC(w, P)$ the optimal social costs. Now, calculate the price of anarchy in this model. Recall that it represents the ratio of the social costs in the worst-case Nash equilibrium and the optimal social costs:

$$PA = \sup_{P \text{ is an equilibrium}} \frac{SC(w, P)}{opt}.$$

Let $P$ indicate some mixed strategy profile and $q_i$ be the probability that player $i$ chooses the maximal delay channel. Then

$$SC(w, P) = \sum_{i=1}^{n} w_i q_i.$$

In addition, introduce the probability $t_{ik}$ that players $i$ and $k$ choose the same channel. Let $A$ and $B$ be the events that players $i$ and $k$, respectively, choose the maximal delay channel, so that $P(A \cap B) \le t_{ik}$. Consequently, the relationship $P(A \cup B) = P(A) + P(B) - P(A \cap B) \le 1$ implies that

$$q_i + q_k \le 1 + t_{ik}.$$

Lemma 9.4 The following condition holds true in the Nash equilibrium $P$:

$$\sum_{k \ne i} t_{ik} w_k = \lambda_i - w_i, \quad i = 1, \ldots, n.$$
Proof: First, note that $t_{ik} = \sum_{j=1}^{m} p_i^j p_k^j$. In combination with (6.1), this yields

$$\sum_{k \ne i} t_{ik} w_k = \sum_{j=1}^{m} p_i^j \sum_{k \ne i} p_k^j w_k = \sum_{j=1}^{m} p_i^j \left( M^j - p_i^j w_i \right).$$

According to (6.3), if $p_i^j > 0$, then $M^j - p_i^j w_i = \lambda_i - w_i$. Thus, we can rewrite the last expression as

$$\sum_{k \ne i} t_{ik} w_k = \sum_{j=1}^{m} p_i^j (\lambda_i - w_i) = \lambda_i - w_i.$$

Lemma 9.5 The following estimate takes place:

$$\lambda_i \le \frac{1}{m} \sum_{k=1}^{n} w_k + \frac{m - 1}{m} w_i, \quad i = 1, \ldots, n.$$

Proof: The proof is immediate from the expressions

$$\lambda_i = \min_j \left\{ M^j + (1 - p_i^j) w_i \right\} \le \frac{1}{m} \sum_{j=1}^{m} \left\{ M^j + (1 - p_i^j) w_i \right\} = \frac{1}{m} \sum_{j=1}^{m} M^j + \frac{m - 1}{m} w_i = \frac{1}{m} \sum_{k=1}^{n} w_k + \frac{m - 1}{m} w_i.$$

Now, we can evaluate the price of anarchy in a two-channel network (see Figure 9.6).

Figure 9.6 A two-channel network.

Theorem 9.4 Consider the model with $n$ inhomogeneous users and two homogeneous parallel channels. The corresponding price of anarchy constitutes 3/2.

Proof: Construct an upper estimate for the social costs $SC(w, P)$. Rewrite them as

$$SC(w, P) = \sum_{k=1}^{n} q_k w_k = \sum_{k \ne i} q_k w_k + q_i w_i = \sum_{k \ne i} (q_i + q_k) w_k - \sum_{k \ne i} q_i w_k + q_i w_i. \quad (6.5)$$

Since $q_i + q_k \le 1 + t_{ik}$, we have

$$\sum_{k \ne i} (q_i + q_k) w_k \le \sum_{k \ne i} (1 + t_{ik}) w_k.$$

In the case of $m = 2$, Lemmas 9.4 and 9.5 imply that

$$\sum_{k \ne i} t_{ik} w_k = \lambda_i - w_i \le \frac{1}{2} \sum_{k=1}^{n} w_k - \frac{1}{2} w_i = \frac{1}{2} \sum_{k \ne i} w_k.$$

Hence it appears that

$$\sum_{k \ne i} (q_i + q_k) w_k \le \frac{3}{2} \sum_{k \ne i} w_k,$$

and the function (6.5) can be estimated by

$$SC(w, P) \le \left( \frac{3}{2} - q_i \right) \sum_{k=1}^{n} w_k + \left( 2 q_i - \frac{3}{2} \right) w_i.$$

Note that

$$opt \ge \max \left\{ w_1, \frac{1}{2} \sum_k w_k \right\}.$$

Indeed, the channel that receives the packet $w_1$ has a delay of at least $w_1$, and the more loaded of the two channels always carries at least half of the total traffic. If $w_1 \ge \frac{1}{2} \sum_k w_k$, then $w_1 \ge w_2 + \cdots + w_n$; the optimal strategy is transmitting the packet $w_1$ through one channel, whereas the remaining packets are sent by the other channel, and the delay makes up $w_1$. If $w_1 < \frac{1}{2} \sum_k w_k$, the delay cannot be smaller than $\frac{1}{2} \sum_k w_k$.

Then, if some player $i$ meets the inequality $q_i \ge 3/4$, one obtains

$$SC(w, P) \le \left( \frac{3}{2} - q_i \right) 2 \, opt + \left( 2 q_i - \frac{3}{2} \right) opt = \frac{3}{2} \, opt.$$
At the same time, if all players $i$ are such that $q_i < 3/4$, we get

$$SC(w, P) = \sum_{k=1}^{n} q_k w_k \le \frac{3}{4} \sum_k w_k \le \frac{3}{2} \, opt.$$

Therefore, all Nash equilibria $P$ satisfy the inequality $SC(w, P) \le \frac{3}{2} \, opt$. And so,

$$PA = \sup_P \frac{SC(w, P)}{opt} \le \frac{3}{2}.$$

To derive a lower estimate, consider a system with two homogeneous channels and two players, where $w_1 = w_2 = 1$. Obviously, the worst-case equilibrium is $p_i^j = 1/2$ for $i = 1, 2$; $j = 1, 2$. The expected maximal load of the network makes up $1 \cdot 1/2 + 2 \cdot 1/2 = 3/2$. The optimal value $opt = 1$ is achieved when each channel transmits a single packet. Thus, we have found the precise estimate for the price of anarchy in a system with two homogeneous channels.

9.7 The price of anarchy in the optimal routing model with linear social costs and indivisible traffic for an arbitrary network

Up to this point, we have explored networks with parallel channels. Now, switch to network games with an arbitrary topology. Interpret the optimal routing problem as a non-cooperative game $\Gamma = \langle N, G, Z, f \rangle$, where users (players) $N = (1, 2, \ldots, n)$ send traffic via some channels of a network $G = (V, E)$. The symbol $G$ stands for an undirected graph with a node set $V$ and an edge set $E$ (see Figure 9.7). For each user $i$, there exists $Z_i$, a set of routes from $s_i$ to $t_i$ via channels of $G$. We suppose that the volume of user traffic is 1. Further analysis covers two types of network games, viz., symmetrical ones (all players have identical strategy sets $Z_i$) and asymmetrical ones (the strategy sets of players may differ).

Each channel $e \in E$ possesses a given capacity $c_e > 0$. Users pursue individual interests: they choose routes of traffic transmission to minimize the traffic delay on the way from $s_i$ to $t_i$. Each user selects a specific strategy $R_i \in Z_i$, which represents the route used by player $i$ for his traffic. Consequently, the vector $R = (R_1, \ldots, R_n)$ forms the pure strategy profile of all users.
For a strategy profile $R$, we again introduce the notation $(R_{-i}, R'_i) = (R_1, \ldots, R_{i-1}, R'_i, R_{i+1}, \ldots, R_n)$. It indicates that user $i$ has modified his strategy from $R_i$ to $R'_i$, while the other users keep their strategies unchanged.

Figure 9.7 An asymmetrical network game with 10 channels.

For each channel, define its load $n_e(R)$ as the number of players using channel $e$ in the strategy profile $R$. The traffic delay on a given route depends on the loads of channels in this route. Consider the linear latency function $f_e(k) = a_e k + b_e$, where $a_e$ and $b_e$ specify non-negative constants. For the sake of simplicity, we take the case $f_e(k) = k$. All relevant results are easily extended to the general case.

Each user $i$ strives to minimize the total traffic delay over all channels in his route:

$$c_i(R) = \sum_{e \in R_i} f_e(n_e(R)) = \sum_{e \in R_i} n_e(R).$$

This function represents the individual costs of user $i$.

A Nash equilibrium is defined as a strategy profile such that none of the players benefits by unilateral deviation from this strategy profile (provided that the other players still follow their strategies).

Definition 9.6 A strategy profile $R$ is called a Nash equilibrium, if for each user $i \in N$ and any $R'_i \in Z_i$ we have $c_i(R) \le c_i(R_{-i}, R'_i)$.

We emphasize that this game is a special case of the congestion game analyzed in Section 3.4. Recall that players choose some objects (channels) from their feasible sets $Z_i$, $i \in N$, and the payoff function of a player depends on the number of other players choosing the same object. Such an observation guarantees that the game in question always admits a pure strategy equilibrium. Therefore, further consideration focuses on pure strategies only.

Take the linear (total) costs of all players as the social costs, i.e.,

$$SC(R) = \sum_{i=1}^{n} c_i(R) = \sum_{i=1}^{n} \sum_{e \in R_i} n_e(R) = \sum_{e \in E} n_e^2(R).$$

Designate by $opt$ the minimal social costs. Evaluate the ratio of the social costs in the worst-case Nash equilibrium and the optimal costs.
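These definitions are easy to test on a small instance. The sketch below uses the three-player network with channels $h_1, h_2, h_3, g_1, g_2, g_3$ that appears in the proof of Theorem 9.5 (Figure 9.8); the encoding of routes as tuples of channel names and all function names are assumptions of this illustration. It computes the individual costs $c_i(R)$ and the social costs $SC(R) = \sum_e n_e^2(R)$ with $f_e(k) = k$, and finds the Nash equilibria by enumeration.

```python
from collections import Counter
from itertools import product

# Two admissible routes per player, as in the example of Figure 9.8.
STRATEGIES = [
    [("h1", "g1"), ("h2", "h3", "g2")],  # player 1
    [("h2", "g2"), ("h1", "h3", "g3")],  # player 2
    [("h3", "g3"), ("h1", "h2", "g1")],  # player 3
]


def loads(profile):
    """n_e(R): the number of players whose route contains channel e."""
    return Counter(e for route in profile for e in route)


def player_cost(profile, i):
    """c_i(R) = sum of loads over the channels of player i's route (f_e(k) = k)."""
    n = loads(profile)
    return sum(n[e] for e in profile[i])


def social_cost(profile):
    """SC(R) = sum_e n_e(R)^2, which equals the sum of all individual costs."""
    return sum(k * k for k in loads(profile).values())


def is_nash(profile):
    """Definition 9.6: no player strictly gains by a unilateral change of route."""
    for i, options in enumerate(STRATEGIES):
        for alt in options:
            deviated = profile[:i] + (alt,) + profile[i + 1:]
            if player_cost(deviated, i) < player_cost(profile, i):
                return False
    return True


profiles = list(product(*STRATEGIES))
opt = min(social_cost(p) for p in profiles)
worst = max(social_cost(p) for p in profiles if is_nash(p))

print("optimal social costs:", opt)
print("worst-case equilibrium social costs:", worst)
print("ratio:", worst / opt)
```

On this instance the ratio of the worst-case equilibrium costs to the optimal costs equals 15/6 = 5/2.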
In other words, find the price of anarchy

\[ PA = \sup_{R\ \text{is an equilibrium}} \frac{SC(R)}{\mathrm{opt}}. \]

Theorem 9.5 In the asymmetrical model with indivisible traffic and linear delays, the price of anarchy constitutes 5/2.

Proof: We begin with upper estimate derivation. Let $R^*$ be a Nash equilibrium and $R$ form an arbitrary strategy profile (possibly, the optimal one). To construct an upper estimate for the price of anarchy, compare the social costs in these strategy profiles. In the Nash equilibrium $R^*$, the costs of player $i$ do not decrease when he switches to the strategy $R_i$:

\[ c_i(R^*) = \sum_{e \in R_i^*} n_e(R^*) \le \sum_{e \in R_i} n_e(R^*_{-i}, R_i). \]

When player $i$ switches, the number of players on each channel increases by at most one. Therefore,

\[ c_i(R^*) \le \sum_{e \in R_i} \bigl(n_e(R^*) + 1\bigr). \]

Summing up these inequalities over all $i$ yields

\[ SC(R^*) = \sum_{i=1}^{n} c_i(R^*) \le \sum_{i=1}^{n} \sum_{e \in R_i} \bigl(n_e(R^*) + 1\bigr) = \sum_{e \in E} n_e(R)\bigl(n_e(R^*) + 1\bigr). \]

We will need the following technical result.

Lemma 9.6 Any non-negative integers $\alpha, \beta$ meet the inequality

\[ \beta(\alpha + 1) \le \frac{1}{3}\alpha^2 + \frac{5}{3}\beta^2. \]

Proof: Fix $\beta$ and consider the function $f(\alpha) = \alpha^2 + 5\beta^2 - 3\beta(\alpha + 1)$. This is a parabola whose vertex lies at the point $\alpha = \frac{3}{2}\beta$. The minimal value equals

\[ f\Bigl(\frac{3}{2}\beta\Bigr) = \frac{1}{4}\beta(11\beta - 12). \]

If $\beta \ge 2$, the above value is positive. Hence, the lemma holds true for $\beta \ge 2$. In the cases of $\beta = 0, 1$, the inequality can be verified directly for integer $\alpha$.

Using Lemma 9.6 with $\alpha = n_e(R^*)$ and $\beta = n_e(R)$, we obtain the upper estimate

\[ SC(R^*) \le \frac{1}{3} \sum_{e \in E} n_e^2(R^*) + \frac{5}{3} \sum_{e \in E} n_e^2(R) = \frac{1}{3} SC(R^*) + \frac{5}{3} SC(R), \]

whence it follows that $SC(R^*) \le \frac{5}{2} SC(R)$ for any strategy profile $R$. This immediately implies that $PA \le 5/2$.

To argue that $PA \ge 5/2$, we provide an example of a network where the price of anarchy is $5/2$. Consider a network with the topology illustrated by Figure 9.8. Three players located in node 0 send their traffic through the network channels $\{h_1, h_2, h_3, g_1, g_2, g_3\}$. Each player chooses between just two pure strategies.
For player 1, these are the routes $(h_1, g_1)$ or $(h_2, h_3, g_2)$. For player 2, these are the routes $(h_2, g_2)$ or $(h_1, h_3, g_3)$. And finally, for player 3, these are the routes $(h_3, g_3)$ or $(h_1, h_2, g_1)$. Evidently, the optimal distribution of players consists in choosing the first strategies, $(h_1, g_1)$, $(h_2, g_2)$, and $(h_3, g_3)$. Each player then incurs costs of 2, so the social costs constitute 6. The worst-case Nash equilibrium results from selection of the second strategies: $(h_2, h_3, g_2)$, $(h_1, h_3, g_3)$, $(h_1, h_2, g_1)$. Here each player incurs costs of 5, and the social costs constitute 15. Indeed, imagine that, e.g., player 1 (whose equilibrium costs are 5) switches to the first strategy $(h_1, g_1)$. Then channel $h_1$ carries three players and channel $g_1$ carries two, so his costs still make up 5 and the deviation brings no gain. Therefore, the price of anarchy in the described network is $15/6 = 5/2$. This concludes the proof of Theorem 9.5.

The symmetrical model, where all players have the same strategy set, ensures a smaller price of anarchy.

Theorem 9.6 Consider the $n$-player symmetrical model with indivisible traffic and linear delays. The price of anarchy equals $(5n - 2)/(2n + 1)$.

Figure 9.8 A network with three players and three channels $(h_1, h_2, h_3, g_1, g_2, g_3)$.

Proof: Let $R^*$ be a Nash equilibrium and $R$ represent the optimal strategy profile, which minimizes the social costs. We estimate the costs of player $i$ in the equilibrium, i.e., the quantity $c_i(R^*)$. If he deviates from the equilibrium by choosing another strategy $R_j$ (this is possible, since the strategy sets of all players coincide), his costs do not decrease:

\[ c_i(R^*) = \sum_{e \in R_i^*} n_e(R^*) \le \sum_{e \in R_j} n_e(R^*_{-i}, R_j). \]

Moreover, $n_e(R^*_{-i}, R_j)$ differs from $n_e(R^*)$ by 1 only in the channels $e \in R_j \setminus R_i^*$. Hence it appears that

\[ c_i(R^*) \le \sum_{e \in R_j} n_e(R^*) + |R_j \setminus R_i^*|, \]

where $|R|$ means the number of elements in $R$. Since $|A \setminus B| = |A| - |A \cap B|$, we have

\[ c_i(R^*) \le \sum_{e \in R_j} n_e(R^*) + |R_j| - |R_j \cap R_i^*|. \]

Summation over all $j \in N$ leads to

\[ n\,c_i(R^*) \le \sum_{j=1}^{n} \sum_{e \in R_j} n_e(R^*) + \sum_{j=1}^{n} \bigl(|R_j| - |R_j \cap R_i^*|\bigr) = \sum_{e \in E} n_e(R) n_e(R^*) + \sum_{e \in E} n_e(R) - \sum_{e \in R_i^*} n_e(R). \]
Now, by summing up over all $i \in N$, we get

\[
\begin{aligned}
n\,SC(R^*) &\le \sum_{i=1}^{n} \sum_{e \in E} n_e(R) n_e(R^*) + \sum_{i=1}^{n} \sum_{e \in E} n_e(R) - \sum_{i=1}^{n} \sum_{e \in R_i^*} n_e(R) \\
&= n \sum_{e \in E} n_e(R) n_e(R^*) + n \sum_{e \in E} n_e(R) - \sum_{e \in E} n_e(R) n_e(R^*) \\
&= (n - 1) \sum_{e \in E} n_e(R) n_e(R^*) + n \sum_{e \in E} n_e(R).
\end{aligned}
\]

Rewrite this inequality as

\[ SC(R^*) \le \frac{n - 1}{n} \sum_{e \in E} \bigl(n_e(R) n_e(R^*) + n_e(R)\bigr) + \frac{1}{n} \sum_{e \in E} n_e(R), \]

and apply Lemma 9.6 together with the bound $n_e(R) \le n_e^2(R)$, which is valid for non-negative integers:

\[ SC(R^*) \le \frac{n - 1}{3n} \sum_{e \in E} n_e^2(R^*) + \frac{5(n - 1)}{3n} \sum_{e \in E} n_e^2(R) + \frac{1}{n} \sum_{e \in E} n_e^2(R) = \frac{n - 1}{3n} SC(R^*) + \frac{5n - 2}{3n} SC(R). \]

This immediately implies that

\[ SC(R^*) \le \frac{5n - 2}{2n + 1} SC(R), \]

and the price of anarchy enjoys the upper estimate $PA \le (5n - 2)/(2n + 1)$.

To obtain a lower estimate, it suffices to give an example of a network with the price of anarchy $(5n - 2)/(2n + 1)$. We leave this exercise to an interested reader.

Theorem 9.6 claims that the price of anarchy is smaller in the symmetrical model than in its asymmetrical counterpart. However, as $n$ increases, the price of anarchy approaches the level of $5/2$.

9.8 The mixed price of anarchy in the optimal routing model with linear social costs and indivisible traffic for an arbitrary network

In Section 9.7 we have estimated the price of anarchy by considering only pure strategy equilibria. To proceed, find the mixed price of anarchy for arbitrary networks with linear delays. Suppose that players can send traffic of different volumes through channels of a network $G = (V, E)$. Consider an asymmetrical optimal routing game $\Gamma = \langle N, G, Z, w, f \rangle$, where players $N = (1, 2, \dots, n)$ transmit traffic of corresponding volumes $\{w_1, w_2, \dots, w_n\}$. For each user $i$, there is a given set $Z_i$ of pure strategies, i.e., a set of routes from $s_i$ to $t_i$ via channels of the network $G$. The traffic delay on a route depends on the load of engaged channels. We understand the load of a channel as the total traffic volume transmitted through this channel.
Assume that the latency function on channel $e$ has the linear form $f_e(k) = a_e k + b_e$, where $k$ indicates the channel load, and $a_e$ and $b_e$ are non-negative constants. Then the total traffic delay on the complete route is the sum of traffic delays on all channels of the route.

Users pursue individual interests and choose routes for their traffic to minimize the delay during traffic transmission from $s$ to $t$. Each user $i \in N$ adopts a mixed strategy $P_i$, i.e., player $i$ sends his traffic $w_i$ by the route $R_i \in Z_i$ with the probability $p_i(R_i)$, $i = 1, \dots, n$,

\[ \sum_{R_i \in Z_i} p_i(R_i) = 1. \]

A set of mixed strategies forms a strategy profile $P = \{P_1, \dots, P_n\}$ in this game. Each user $i$ strives to minimize the expected delay of his traffic on all engaged routes:

\[ c_i(P) = \sum_{R \in Z} \prod_{j=1}^{n} p_j(R_j) \sum_{e \in R_i} f_e(n_e(R)) = \sum_{R \in Z} \prod_{j=1}^{n} p_j(R_j) \sum_{e \in R_i} \bigl(a_e n_e(R) + b_e\bigr), \]

where $n_e(R)$ is the load of channel $e$ under a given strategy profile $R$. The function $c_i(P)$ specifies the individual costs of user $i$. On the other hand, the function

\[ SC(P) = \sum_{i=1}^{n} w_i c_i(P) = \sum_{R \in Z} \prod_{j=1}^{n} p_j(R_j) \sum_{e \in E} n_e(R) f_e(n_e(R)) \]

gives the social costs.

Let $P^*$ be a Nash equilibrium. We underline that a Nash equilibrium exists due to the finiteness of the strategy sets. Denote by $R^*$ the optimal strategy profile ensuring the minimal social costs. Obviously, it consists of pure strategies of players, i.e., $R^* = (R_1^*, \dots, R_n^*)$. Then each user $i \in N$ obeys the inequality

\[ c_i(P^*) \le c_i(P^*_{-i}, R_i^*). \]

Here $(P^*_{-i}, R_i^*)$ means that in the strategy profile $P^*$ player $i$ chooses the pure strategy $R_i^*$ instead of the mixed strategy $P_i^*$. In the equilibrium, we have the condition

\[ c_i(P^*) \le c_i(P^*_{-i}, R_i^*) = \sum_{R \in Z} \prod_{j=1}^{n} p_j(R_j) \sum_{e \in R_i^*} f_e\bigl(n_e(R_{-i}, R_i^*)\bigr), \quad i = 1, \dots, n. \]

Note that, in any strategy profile $(R_{-i}, R_i^*)$, only player $i$ actually deviates. And so, the load of any channel in the route $R_i^*$ may increase by at most $w_i$, i.e., $f_e(n_e(R_{-i}, R_i^*)) \le f_e(n_e(R) + w_i)$. Hence it follows that

\[ c_i(P^*) \le \sum_{R \in Z} \prod_{j=1}^{n} p_j(R_j) \sum_{e \in R_i^*} f_e(n_e(R) + w_i), \quad i = 1, \dots, n. \]
Multiply these inequalities by $w_i$ and perform summation from 1 to $n$. Such manipulations lead to

\[ SC(P^*) = \sum_{i=1}^{n} w_i c_i(P^*) \le \sum_{R \in Z} \prod_{j=1}^{n} p_j(R_j) \sum_{i=1}^{n} \sum_{e \in R_i^*} w_i f_e(n_e(R) + w_i). \]

Using the linear property of the latency functions, we arrive at the inequalities

\[
\begin{aligned}
SC(P^*) &\le \sum_{R \in Z} \prod_{j=1}^{n} p_j(R_j) \sum_{i=1}^{n} \sum_{e \in R_i^*} w_i \bigl(a_e (n_e(R) + w_i) + b_e\bigr) \\
&= \sum_{R \in Z} \prod_{j=1}^{n} p_j(R_j) \sum_{i=1}^{n} \sum_{e \in R_i^*} \bigl(a_e n_e(R) w_i + a_e w_i^2\bigr) + \sum_{i=1}^{n} \sum_{e \in R_i^*} b_e w_i \\
&\le \sum_{R \in Z} \prod_{j=1}^{n} p_j(R_j) \sum_{e \in E} a_e \bigl(n_e(R) n_e(R^*) + n_e(R^*)^2\bigr) + \sum_{e \in E} b_e n_e(R^*).
\end{aligned} \tag{8.1}
\]

The last step uses the fact that the volumes $w_i$ of the players with $e \in R_i^*$ sum to $n_e(R^*)$, while the sum of their squares does not exceed $n_e(R^*)^2$.

Further exposition will employ the estimate from Lemma 9.7.

Lemma 9.7 Any non-negative numbers $\alpha, \beta$ meet the inequality

\[ \alpha\beta + \beta^2 \le \frac{z}{2}\alpha^2 + \frac{z + 3}{2}\beta^2, \tag{8.2} \]

where $z = (\sqrt{5} - 1)/2 \approx 0.618$ is the golden section of the interval $[0, 1]$.

Proof: Fix $\beta$ and consider the function

\[ f(\alpha) = \frac{z}{2}\alpha^2 + \frac{z + 3}{2}\beta^2 - \alpha\beta - \beta^2 = \frac{z}{2}\alpha^2 + \frac{z + 1}{2}\beta^2 - \alpha\beta. \]

This is a parabola with the vertex $\alpha = \beta/z$. The minimal value of the parabola equals

\[ f\Bigl(\frac{\beta}{z}\Bigr) = \frac{\beta^2}{2}\Bigl(z + 1 - \frac{1}{z}\Bigr). \]

The expression in brackets actually vanishes, since the golden section satisfies $z^2 + z - 1 = 0$. This directly gives inequality (8.2).

In combination with inequality (8.2), the condition (8.1) implies that

\[
\begin{aligned}
SC(P^*) &\le \sum_{R \in Z} \prod_{j=1}^{n} p_j(R_j) \sum_{e \in E} a_e \Bigl(\frac{z}{2} n_e(R)^2 + \frac{z + 3}{2} n_e(R^*)^2\Bigr) + \sum_{e \in E} b_e n_e(R^*) \\
&\le \frac{z}{2} \sum_{R \in Z} \prod_{j=1}^{n} p_j(R_j) \sum_{e \in E} \bigl(a_e n_e(R)^2 + b_e n_e(R)\bigr) + \frac{z + 3}{2} \sum_{e \in E} \bigl(a_e n_e(R^*)^2 + b_e n_e(R^*)\bigr) \\
&= \frac{z}{2} SC(P^*) + \frac{z + 3}{2} SC(R^*).
\end{aligned} \tag{8.3}
\]

Now, it is possible to estimate the price of anarchy for mixed strategy Nash equilibria.

Theorem 9.7 Consider the $n$-player asymmetrical model with indivisible traffic and linear delays. The mixed price of anarchy does not exceed $z + 2 = (\sqrt{5} + 3)/2 \approx 2.618$.

Proof: It follows from (8.3) that

\[ SC(P^*) \le \frac{z + 3}{2 - z} SC(R^*). \]

By virtue of golden section properties, $\frac{z + 3}{2 - z} = z + 2$.
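For completeness, this identity can be checked in one line from the defining equation of the golden section, $z^2 = 1 - z$ (a short verification added here as a sketch; it is not spelled out in the original proof). Note that $2 - z > 0$, so the division is legitimate.

```latex
(z + 2)(2 - z) = 4 - z^2 = 4 - (1 - z) = z + 3
\quad\Longrightarrow\quad
\frac{z + 3}{2 - z} = z + 2 .
```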
Consequently, the ratio of the social costs in the Nash equilibrium and the optimal costs, i.e., the quantity $PA = SC(P^*)/SC(R^*)$, does not exceed $z + 2 \approx 2.618$. The proof of Theorem 9.7 is finished.

Remark. The price of anarchy in pure strategies is $5/2 = 2.5$. Transition to mixed strategies slightly increases the estimate of the price of anarchy (up to 2.618). This seems natural, since the worst-case Nash equilibrium can be achieved in mixed strategies.

9.9 The price of anarchy in the optimal routing model with maximal social costs and indivisible traffic for an arbitrary network

We have demonstrated that, in the case of linear social costs, the price of anarchy possesses finite values. However, if we select the maximal costs of a player as the social costs, the price of anarchy takes arbitrarily large values. Illustrate this phenomenon by an example: consider the network in Figure 9.9.

The network comprises the basic nodes $\{v_0, v_1, \dots, v_k\}$. The nodes $v_i, v_{i+1}$ are connected through $k$ routes; one of them (lying on the abscissa axis) has the length of 1, whereas the remaining routes possess the length of $k$. Player 1 (suffering from the maximal costs) sends his traffic from node $v_0$ to node $v_k$. Note that he can employ only the routes lying on the abscissa axis. Each node $v_i$, $i = 0, \dots, k - 1$, contains $k - 1$ players transmitting their traffic from $v_i$ to $v_{i+1}$.

Figure 9.9 A network with $k^2 - k + 1$ players. Player 1 follows the route $(v_0, v_1, \dots, v_k)$. Node $v_i$ has $k - 1$ players following the route $(v_i, v_{i+1})$. The delay on the main channel equals 1, the delay on the remaining channels is $k$. The price of anarchy makes up $k$.

Evidently, the optimal social costs of $k$ are achieved if player 1 sends his traffic via the route $v_0, v_1, \dots, v_k$, and all $k - 1$ players in the node $v_i$ are distributed among the remaining routes (one player per route). Readers can easily observe the following.
Here, the worst-case Nash equilibrium is when all $n = (k - 1)k + 1$ players send their traffic through the routes lying on the abscissa axis. However, then the costs of player 1 (ergo, the maximal social costs) constitute $(k - 1)k + 1$. Hence, the price of anarchy in this model is defined by

\[ PA = \frac{k^2 - k + 1}{k} = \sqrt{n} + O(1). \]

Now, construct an upper estimate for the price of anarchy in an arbitrary network with indivisible traffic. Suppose that $R^*$ is a Nash equilibrium and $R$ designates the optimal strategy profile guaranteeing the minimal social costs. In our case, the social costs represent the maximal costs of players. Without loss of generality, we assume that in the equilibrium the maximal costs are attained for player 1, i.e., $SC(R^*) = c_1(R^*)$. To estimate the price of anarchy, apply the same procedure as in the proof of Theorem 9.6. Compare the maximal costs $SC(R^*)$ in the equilibrium and the maximal costs for the strategy profile $R$, i.e., the quantity $SC(R) = \max_{i \in N} c_i(R)$.

Since $R^*$ forms a Nash equilibrium, we have

\[ c_1(R^*) \le \sum_{e \in R_1} \bigl(n_e(R^*) + 1\bigr) = \sum_{e \in R_1} n_e(R^*) + |R_1| \le \sum_{e \in R_1} n_e(R^*) + c_1(R). \]

The last inequality follows from a simple consideration: if player 1 chooses channels from $R_1$, his delay is greater than or equal to the number of channels in $R_1$.

Finally, let us estimate $\sum_{e \in R_1} n_e(R^*)$. Using the inequality $\bigl(\sum_{i=1}^{n} a_i\bigr)^2 \le n \sum_{i=1}^{n} a_i^2$, we have

\[ \Bigl(\sum_{e \in R_1} n_e(R^*)\Bigr)^2 \le |R_1| \sum_{e \in R_1} n_e^2(R^*) \le |R_1| \sum_{e \in E} n_e^2(R^*) = |R_1| \sum_{i=1}^{n} c_i(R^*). \]

According to Theorem 9.5,

\[ \sum_{i=1}^{n} c_i(R^*) \le \frac{5}{2} \sum_{i=1}^{n} c_i(R). \]

And so,

\[ \Bigl(\sum_{e \in R_1} n_e(R^*)\Bigr)^2 \le |R_1| \frac{5}{2} \sum_{i=1}^{n} c_i(R), \]

which means that

\[ c_1(R^*) \le c_1(R) + \sqrt{|R_1| \frac{5}{2} \sum_{i=1}^{n} c_i(R)}. \]

Since $|R_1| \le c_1(R)$ and $c_i(R) \le SC(R)$, we get the inequality

\[ c_1(R^*) \le SC(R) \Bigl(1 + \sqrt{\tfrac{5}{2} n}\Bigr). \]

The last expression implies that the price of anarchy admits the upper estimate $1 + \sqrt{5n/2}$. As a matter of fact, we have established the following result.
Theorem 9.8 Consider the $n$-player asymmetrical model with indivisible traffic and the maximal costs of players as the social costs. The price of anarchy constitutes $O(\sqrt{n})$.

This theorem shows that the price of anarchy may possess arbitrarily large values.

9.10 The Wardrop optimal routing model with divisible traffic

The routing model studied in this section is based on the Wardrop model with divisible traffic suggested in 1952. Here the optimality criterion lies in traffic delay minimization.

The optimal traffic routing problem is treated as a game $\Gamma = \langle n, G, w, Z, f \rangle$, where $n$ users transmit their traffic by network channels; the network has the topology described by a graph $G = (V, E)$. For each user $i$, there exists a certain set $Z_i$ of routes from $s_i$ to $t_i$ via channels of $G$ and a given volume of traffic $w_i$. Next, each channel $e \in E$ possesses some capacity $c_e > 0$. All users pursue individual interests and choose routes for their traffic to minimize the maximal delay during traffic transmission from $s$ to $t$. Each user selects a specific strategy $x_i = \{x_{iR_i} \ge 0\}_{R_i \in Z_i}$. The quantity $x_{iR_i}$ determines the volume of traffic sent by user $i$ through route $R_i$, and $\sum_{R_i \in Z_i} x_{iR_i} = w_i$. Then $x = (x_1, \dots, x_n)$ represents a strategy profile of all users. For a strategy profile $x$, we again introduce the notation $(x_{-i}, x_i') = (x_1, \dots, x_{i-1}, x_i', x_{i+1}, \dots, x_n)$. It indicates that user $i$ has modified his strategy from $x_i$ to $x_i'$, while the remaining users keep their strategies unchanged.

For each channel $e \in E$, define its load (the total traffic through this channel) by

\[ \delta_e(x) = \sum_{i=1}^{n} \sum_{R_i \in Z_i:\, e \in R_i} x_{iR_i}. \]

The traffic delay on a given route depends on the loads of channels in this route. The continuous latency function $f_{iR_i}(x) = f_{iR_i}(\{\delta_e(x)\}_{e \in R_i})$ is specified for each user $i$ and each route $R_i$ engaged by him. Actually, it represents a non-decreasing function with respect to the loads of channels in a route (ergo, with respect to $x_{iR_i}$).
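To make the load definition concrete, here is a minimal sketch of how $\delta_e(x)$ could be computed for divisible traffic. The representation is an assumption made for illustration only: a strategy profile is a list with one dictionary per user, mapping a route (a tuple of channel names) to the volume $x_{iR_i}$ sent along it; the function name `channel_loads` is hypothetical.

```python
from collections import defaultdict


def channel_loads(profile):
    """Return the load delta_e(x) of every channel used in the profile.

    profile: list with one dict per user; each dict maps a route
    (tuple of channel names) to the traffic volume sent along it.
    """
    loads = defaultdict(float)
    for user_flows in profile:
        for route, volume in user_flows.items():
            for channel in route:
                # Every channel of a route carries the full volume of that route.
                loads[channel] += volume
    return dict(loads)


if __name__ == "__main__":
    # User 1 splits a unit of traffic between routes (a, b) and (c,);
    # user 2 sends two units along the single-channel route (a,).
    profile = [
        {("a", "b"): 0.5, ("c",): 0.5},
        {("a",): 2.0},
    ]
    print(channel_loads(profile))
```

Channels that no route uses simply do not appear in the result, which corresponds to a zero load.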
Each user $i$ strives to minimize the maximal traffic delay over all routes he employs:

\[ PC_i(x) = \max_{R_i \in Z_i:\, x_{iR_i} > 0} f_{iR_i}(x). \]

This function represents the individual costs of user $i$.

A Nash equilibrium is defined as a strategy profile such that none of the players benefits by unilateral deviation from this strategy profile (provided that the remaining players still follow their strategies). In terms of the current model, the matter concerns a strategy profile such that none of the players can reduce his individual costs by modifying his strategy.

Definition 9.7 A strategy profile $x$ is called a Nash equilibrium, if for each user $i$ and any strategy profile $x' = (x_{-i}, x_i')$ we have $PC_i(x) \le PC_i(x')$.

Within the framework of network models, an important role belongs to the concept of a Wardrop equilibrium.

Definition 9.8 A strategy profile $x$ is called a Wardrop equilibrium, if for each $i$ and any $R_i, \rho_i \in Z_i$ the condition $x_{iR_i} > 0$ leads to $f_{iR_i}(x) \le f_{i\rho_i}(x)$.

This definition can be restated similarly to the definition of a Nash equilibrium.

Definition 9.9 A strategy profile $x$ is a Wardrop equilibrium, if for each $i$ the following condition holds true: the inequality $x_{iR_i} > 0$ leads to $f_{iR_i}(x) = \min_{\rho_i \in Z_i} f_{i\rho_i}(x) = \lambda_i$ and the equality $x_{iR_i} = 0$ yields $f_{iR_i}(x) \ge \lambda_i$.

Such an explicit definition provides a system of equations and inequalities for evaluating Wardrop equilibrium strategy profiles. Strictly speaking, the definitions of a Nash equilibrium and a Wardrop equilibrium are not equivalent. Their equivalence depends on the type of latency functions in channels.

Theorem 9.9 If a strategy profile $x$ represents a Wardrop equilibrium, then $x$ is a Nash equilibrium.

Proof: Let $x$ be a strategy profile such that for all $i$ we have the following condition: the inequality $x_{iR_i} > 0$ leads to $f_{iR_i}(x) = \min_{\rho_i \in Z_i} f_{i\rho_i}(x) = \lambda_i$ and the equality $x_{iR_i} = 0$ implies $f_{iR_i}(x) \ge \lambda_i$.
Then for all $i$ and $R_i$ one obtains

\[ \max_{\rho_i \in Z_i:\, x_{i\rho_i} > 0} f_{i\rho_i}(x) \le f_{iR_i}(x). \]

Suppose that user $i$ modifies his strategy from $x_i$ to $x_i'$. In this case, denote by $x' = (x_{-i}, x_i')$ a strategy profile such that, for user $i$, the strategies on all his routes $R_i \in Z_i$ change to $x'_{iR_i} = x_{iR_i} + \Delta_{R_i}$, where $\sum_{R_i \in Z_i} \Delta_{R_i} = 0$. The remaining users $k \ne i$ adhere to the same strategies as before, i.e., $x_k' = x_k$.

If all $\Delta_{R_i} = 0$, then $PC_i(x) = PC_i(x')$. Assume that $x \ne x'$, viz., there exists a route $R_i$ such that $\Delta_{R_i} > 0$. This route meets the condition $f_{iR_i}(x) \le f_{iR_i}(x')$, since $f_{iR_i}(x)$ is a non-decreasing function in $x_{iR_i}$. Since $x'_{iR_i} > 0$, we get

\[ f_{iR_i}(x') \le \max_{\rho_i \in Z_i:\, x'_{i\rho_i} > 0} f_{i\rho_i}(x'). \]

Finally,

\[ \max_{\rho_i \in Z_i:\, x_{i\rho_i} > 0} f_{i\rho_i}(x) \le \max_{\rho_i \in Z_i:\, x'_{i\rho_i} > 0} f_{i\rho_i}(x'), \]

or $PC_i(x) \le PC_i(x')$. Hence, due to the arbitrary choice of $i$ and $x_i'$, we conclude that the strategy profile $x$ forms a Nash equilibrium.

Figure 9.10 A Nash equilibrium mismatches a Wardrop equilibrium.

Any Nash equilibrium in the model considered represents a Wardrop equilibrium under the following sufficient condition imposed on all latency functions. For a given user, it is possible to redistribute a small volume of his traffic from any route to other (less loaded) routes for this user such that the traffic delay on this route becomes strictly smaller.

Example 9.7 Consider a simple example explaining the difference between the definitions of a Nash equilibrium and a Wardrop equilibrium. A system contains one user, who sends traffic of volume 1 from node $s$ to node $t$ via two routes (see Figure 9.10).

Suppose that the latency functions on route 1 (which includes channels (1, 2, 4)) and on route 2 (which includes channels (1, 3, 4)) have the form $f_1(x) = \max\{1, x, 1\} = 1$ and $f_2(y) = \min\{1, y, 1\} = y$, respectively; here $x = 1 - y$. Both functions are continuous and non-decreasing in $x$ and $y$, respectively. The inequality $f_1(x) \ge f_2(y)$ takes place for all feasible strategy profiles $(x, y)$ such that $x + y = 1$, and it is strict whenever $y < 1$.
However, any reduction in $x$ (the volume of traffic through route 1) does not affect $f_1(x)$. In the described model, a Nash equilibrium is any strategy profile $(x, 1 - x)$, where $0 \le x \le 1$. Still, the delays on both routes coincide only for the strategy profile $(0, 1)$.

Definition 9.10 Let $x$ indicate some strategy profile. The social costs are the total delay of all players under this strategy profile:

\[ SC(x) = \sum_{i=1}^{n} \sum_{R_i \in Z_i} x_{iR_i} f_{iR_i}(x). \]

Note that, if $x$ represents a Wardrop equilibrium, then (by definition) for each player $i$ the delays on all used routes $R_i$ equal $\lambda_i(x)$. Therefore, the social costs in the equilibrium acquire the form

\[ SC(x) = \sum_{i=1}^{n} w_i \lambda_i(x). \]

Designate by $\mathrm{opt} = \min_x SC(x)$ the minimal social costs.

Definition 9.11 We call the price of anarchy the maximal value of the ratio $SC(x)/\mathrm{opt}$, where the social costs are evaluated only in Wardrop equilibria.

9.11 The optimal routing model with parallel channels. The Pigou model. Braess’s paradox

We analyze the Wardrop model for a network with parallel channels.

Example 9.8 The Pigou model (1920). Consider a simple network with two parallel channels (see Figure 9.11). One channel (the upper one) has the fixed delay of 1, whereas the delay in the second (lower) channel is proportional to its traffic. Imagine very many users transmitting their traffic from node $s$ to node $t$ such that the total load is 1. Each user seeks to minimize his costs. Then a Nash equilibrium lies in employing the lower channel for each user. Indeed, if a positive share of users employs the upper channel, the load of the lower channel is less than 1, so the lower channel guarantees a smaller delay than the upper one. Therefore, the costs of each player in the equilibrium make up 1. Furthermore, the social costs constitute 1 too. Now, assume that some share $x$ of users utilizes the upper channel, and the remaining users (the share $1 - x$) employ the lower channel. Then the social costs become $x \cdot 1 + (1 - x) \cdot (1 - x) = x^2 - x + 1$.
The minimal social costs of $3/4$ correspond to $x = 1/2$. Obviously, the price of anarchy in the Pigou model is $PA = 4/3$.

Figure 9.11 The Pigou model.

Example 9.9 Consider the same two-channel network, but set the delay in the lower channel equal to $x^p$, where $p$ is a certain parameter. A Nash equilibrium again consists in sending the traffic of all users through the lower channel (the social costs make up 1). Next, send some volume $\epsilon$ of traffic by the upper channel. The corresponding social costs $\epsilon \cdot 1 + (1 - \epsilon)^{p+1}$ possess arbitrarily small values as $\epsilon \to 0$ and $p \to \infty$. And so, the price of anarchy can have arbitrarily large values.

Example 9.10 (Braess’s paradox). Recall that we have explored this phenomenon in the case of indivisible traffic. Interestingly, Braess’s paradox also arises in models with divisible traffic. Select a network composed of four nodes, see Figure 9.4. There are two routes from node $s$ to node $t$ with the identical delays of $1 + x$. Suppose that the total traffic of all users equals 1. Owing to the symmetry of this network, all users get partitioned into two equal groups with the identical costs of $3/2$. This forms a Nash equilibrium. To proceed, imagine that we have constructed a new superspeed channel (CD) with zero delay. Then, for each user, the route $A \to C \to D \to B$ is never worse than the route $A \to C \to B$ or $A \to D \to B$. Nevertheless, the costs of all players increase up to 2 in the new equilibrium. This example shows that adding a new channel may raise the costs of individual players and the social costs.

9.12 Potential in the optimal routing model with indivisible traffic for an arbitrary network

Let $\Gamma = \langle n, G, w, Z, f \rangle$ be the Wardrop model, where $n$ users send traffic via channels of a network. Its topology is defined by a graph $G = (V, E)$. The quantity $W = \sum_{i=1}^{n} w_i$ specifies the total volume of data packets of all players.
Denote by $x_{iR_i}$ the strategy of player $i$; actually, this is the part of traffic transmitted through the route $R_i$. Note that $\sum_{R_i \in Z_i} x_{iR_i} = w_i$, $x_{iR_i} \ge 0$. For each edge $e$, a given strictly increasing continuous function $f_e(\delta_e(x))$ taking non-negative values on $[0, W]$ characterizes the delay on this edge. We assume that the delay of player $i$ on the route $R_i$ has the additive form

\[ f_{iR_i}(\delta(x)) = \sum_{e \in R_i} f_e(\delta_e(x)), \]

i.e., represents the sum of delays on all channels of this route. Consider a game with the payoff functions

\[ PC_i(x) = \max_{R_i \in Z_i:\, x_{iR_i} > 0} f_{iR_i}(x) = \max_{R_i \in Z_i:\, x_{iR_i} > 0} \sum_{e \in R_i} f_e(\delta_e(x)). \]

Introduce the potential

\[ P(x) = \sum_{e \in E} \int_{0}^{\delta_e(x)} f_e(t)\,dt. \]

Since $\int_{0}^{\delta} f_e(t)\,dt$ is a differentiable function of $\delta$ with non-decreasing derivative, the above function enjoys convexity.

Theorem 9.10 A strategy profile $x$ forms a Wardrop equilibrium (ergo, a Nash equilibrium) iff $P(x) = \min_y P(y)$.

Proof: Let $x$ be a Wardrop equilibrium and $y$ mean an arbitrary strategy profile. The convexity of the function $P(x)$ implies that

\[ P(y) - P(x) \ge \sum_{i=1}^{n} \sum_{R_i \in Z_i} \frac{\partial P(x)}{\partial x_{iR_i}} (y_{iR_i} - x_{iR_i}). \tag{12.1} \]

Clearly,

\[ \frac{\partial P(x)}{\partial x_{iR_i}} = \sum_{e \in R_i} f_e(\delta_e(x)). \]

According to the Wardrop equilibrium condition, for any player $i$ we have

\[ \lambda_i(x) = \sum_{e \in R_i} f_e(\delta_e(x)) \ \text{ if } x_{iR_i} > 0, \qquad \lambda_i(x) \le \sum_{e \in R_i} f_e(\delta_e(x)) \ \text{ if } x_{iR_i} = 0. \]

Split the inner sum in (12.1) into two parts as follows. For the terms with $y_{iR_i} - x_{iR_i} \ge 0$, take advantage of the inequality $\frac{\partial P(x)}{\partial x_{iR_i}} \ge \lambda_i(x)$. For the remaining terms, we have $y_{iR_i} - x_{iR_i} < 0$. Hence, $x_{iR_i} > 0$, and the equilibrium condition yields $\frac{\partial P(x)}{\partial x_{iR_i}} = \lambda_i(x)$.

As a result,

\[ P(y) - P(x) \ge \sum_{i=1}^{n} \sum_{R_i \in Z_i} \lambda_i(x) (y_{iR_i} - x_{iR_i}) = \sum_{i=1}^{n} \lambda_i(x) \sum_{R_i \in Z_i} (y_{iR_i} - x_{iR_i}). \]

On the other hand, for any player $i$ and any strategy profile we have $\sum_{R_i \in Z_i} y_{iR_i} = \sum_{R_i \in Z_i} x_{iR_i} = w_i$. Then it follows that

\[ P(y) \ge P(x), \quad \forall y. \]

And so, $x$ minimizes the potential $P(y)$.

Now, let the strategy profile $x$ be the minimum point of the function $P(y)$. Assume that $x$ is not a Wardrop equilibrium.
Then there exist a player $i$ and two routes $R_i, \rho_i \in Z_i$ such that $x_{iR_i} > 0$ and

\[ \sum_{e \in R_i} f_e(\delta_e(x)) > \sum_{e \in \rho_i} f_e(\delta_e(x)). \tag{12.2} \]

Next, take the strategy profile $x$ and redistribute the traffic of player $i$ between the routes $R_i$ and $\rho_i$ such that $y_{iR_i} = x_{iR_i} - \epsilon$ and $y_{i\rho_i} = x_{i\rho_i} + \epsilon$, leaving all other components unchanged. This is always possible for a sufficiently small $\epsilon$, since $x_{iR_i} > 0$. Then the inequality

\[ P(x) - P(y) \ge \sum_{i=1}^{n} \sum_{R_i \in Z_i} \frac{\partial P(y)}{\partial x_{iR_i}} (x_{iR_i} - y_{iR_i}) = \epsilon \Bigl( \sum_{e \in R_i} f_e(\delta_e(y)) - \sum_{e \in \rho_i} f_e(\delta_e(y)) \Bigr) > 0 \]

holds true for a sufficiently small $\epsilon$ by virtue of inequality (12.2) and the continuity of the function $f_e(\delta_e(y))$. This contradicts the hypothesis that $P(x)$ is the minimal value of the potential. The proof of Theorem 9.10 is finished.

We emphasize that the potential represents a continuous function defined on the compact set of all feasible strategy profiles $x$. Hence, this function admits a minimum, and a Nash equilibrium exists.

Generally, researchers employ linear latency functions $f_e(\delta) = a_e \delta + b_e$, as well as latency functions of the form $f_e(\delta) = 1/(c_e - \delta)$ or $f_e(\delta) = \delta/(c_e - \delta)$, where $c_e$ indicates the capacity of channel $e$.

9.13 Social costs in the optimal routing model with divisible traffic for convex latency functions

Consider a network with an arbitrary topology, where the latency functions $f_e(\delta)$ are differentiable increasing convex functions. Then the social costs acquire the form

\[ SC(x) = \sum_{i=1}^{n} \sum_{R_i \in Z_i} x_{iR_i} \sum_{e \in R_i} f_e(\delta_e(x)) = \sum_{e \in E} \delta_e(x) f_e(\delta_e(x)), \]

i.e., become a convex function. Note that

\[ \frac{\partial SC(x)}{\partial x_{iR_i}} = \sum_{e \in R_i} \bigl( f_e(\delta_e(x)) + \delta_e(x) f_e'(\delta_e(x)) \bigr) = \sum_{e \in R_i} f_e^*(\delta_e(x)). \]

The expression $f_e^*(\delta_e(x))$ will be called the marginal costs on channel $e$. By repeating the argumentation of Theorem 9.10 for the function $SC(x)$ (instead of the potential), we arrive at the following assertion.

Theorem 9.11 A strategy profile $x$ minimizes the social costs, $SC(x) = \min_y SC(y)$, iff the inequality

\[ \sum_{e \in R_i} f_e^*(\delta_e(x)) \le \sum_{e \in \rho_i} f_e^*(\delta_e(x)) \]

holds true for any $i$ and any routes $R_i, \rho_i \in Z_i$, where $x_{iR_i} > 0$.
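The marginal-cost condition can be illustrated on the two-channel network of Example 9.9. The sketch below rests on stated assumptions: the upper channel has the constant delay 1 (marginal costs 1), the lower channel has the delay $\delta^p$ (marginal costs $(p + 1)\delta^p$), the total traffic is 1, and both channels are used at the optimum, so the condition of Theorem 9.11 reduces to $(p + 1)\delta^p = 1$. The function names are illustrative only, and the constant delay is only non-decreasing, so this is a boundary case of the theorem's assumptions.

```python
def optimal_lower_share(p):
    """Load of the lower channel that equalizes marginal costs: (p + 1) * d**p = 1."""
    return (p + 1) ** (-1.0 / p)


def social_cost(lower_share, p):
    """Social costs when `lower_share` uses the lower channel and the rest the upper one."""
    upper_share = 1.0 - lower_share
    return upper_share * 1.0 + lower_share ** (p + 1)


if __name__ == "__main__":
    for p in (1, 2, 5, 20, 100):
        d = optimal_lower_share(p)
        opt = social_cost(d, p)
        # In the Wardrop equilibrium all traffic uses the lower channel, so SC = 1.
        print(p, d, opt, 1.0 / opt)
```

For $p = 1$ this reproduces the Pigou optimum of $3/4$; as $p$ grows, the optimal social costs shrink and the ratio to the equilibrium costs of 1 increases, in line with Example 9.9.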
For instance, choose the linear latency functions $f_e(\delta) = a_e \delta + b_e$. The marginal costs are determined by $f_e^*(\delta) = 2 a_e \delta + b_e$, and the minimum condition of the social costs in the strategy profile $x$ takes the following form. For any player $i$ and any routes $R_i, \rho_i \in Z_i$, where $x_{iR_i} > 0$, we have

\[ \sum_{e \in R_i} \bigl( 2 a_e \delta_e(x) + b_e \bigr) \le \sum_{e \in \rho_i} \bigl( 2 a_e \delta_e(x) + b_e \bigr). \]

The last condition can be reexpressed as follows. For any player $i$, the inequality $x_{iR_i} > 0$ implies that

\[ \sum_{e \in R_i} \bigl( 2 a_e \delta_e(x) + b_e \bigr) = \lambda_i^*(x), \]

while the equality $x_{iR_i} = 0$ leads to

\[ \sum_{e \in R_i} \bigl( 2 a_e \delta_e(x) + b_e \bigr) \ge \lambda_i^*(x). \]

Compare this result with the conditions under which a strategy profile $x$ forms a Wardrop equilibrium.

Corollary. If a strategy profile $x$ is a Wardrop equilibrium in the model $\langle n, G, w, Z, f \rangle$ with linear latency functions, then the strategy profile $x/2$ minimizes the social costs in the model $\langle n, G, w/2, Z, f \rangle$, where the traffic of all players is cut by half.

9.14 The price of anarchy in the optimal routing model with divisible traffic for linear latency functions

Consider the game $\langle n, G, w, Z, f \rangle$ with the linear latency functions $f_e(\delta) = a_e \delta + b_e$, where $a_e > 0$, $e \in E$. Let $x^*$ be a strategy profile ensuring the optimal social costs $SC(x^*) = \min_y SC(y)$.

Lemma 9.8 The social costs in the Wardrop model with doubled traffic $\langle n, G, 2w, Z, f \rangle$ grow, at least, to the quantity

\[ SC(x^*) + \sum_{i=1}^{n} \lambda_i^*(x^*) w_i. \]

Proof: Take an arbitrary strategy profile $x$ in the model with doubled traffic. The following inequality can be easily verified:

\[ \bigl(a_e \delta_e(x) + b_e\bigr) \delta_e(x) \ge \bigl(a_e \delta_e(x^*) + b_e\bigr) \delta_e(x^*) + \bigl(\delta_e(x) - \delta_e(x^*)\bigr) \bigl(2 a_e \delta_e(x^*) + b_e\bigr). \]

Indeed, the difference between its left-hand and right-hand sides equals $a_e (\delta_e(x) - \delta_e(x^*))^2$, so it is equivalent to the inequality $(\delta_e(x) - \delta_e(x^*))^2 \ge 0$. In the accepted system of symbols, this inequality takes the form

\[ f_e(\delta_e(x)) \delta_e(x) \ge f_e(\delta_e(x^*)) \delta_e(x^*) + \bigl(\delta_e(x) - \delta_e(x^*)\bigr) f_e^*(\delta_e(x^*)). \]

Summation over all $e \in E$ yields the expressions

\[ SC(x) = \sum_{e \in E} f_e(\delta_e(x)) \delta_e(x) \ge \sum_{e \in E} f_e(\delta_e(x^*)) \delta_e(x^*) + \sum_{e \in E} \bigl(\delta_e(x) - \delta_e(x^*)\bigr) f_e^*(\delta_e(x^*)). \]
And so,

\[ SC(x) \ge SC(x^*) + \sum_{i=1}^{n} \sum_{R_i \in Z_i} \bigl( x_{iR_i} - x^*_{iR_i} \bigr) \sum_{e \in R_i} f_e^*(\delta_e(x^*)). \]

Since $x^*$ specifies the minimum point of $SC(x)$, Theorem 9.11 implies that $\sum_{e \in R_i} f_e^*(\delta_e(x^*)) = \lambda_i^*(x^*)$ under $x^*_{iR_i} > 0$ and $\sum_{e \in R_i} f_e^*(\delta_e(x^*)) \ge \lambda_i^*(x^*)$ under $x^*_{iR_i} = 0$. Hence, it follows that

\[ SC(x) \ge SC(x^*) + \sum_{i=1}^{n} \lambda_i^*(x^*) \sum_{R_i \in Z_i} \bigl( x_{iR_i} - x^*_{iR_i} \bigr). \]

By the assumption, $\sum_{R_i \in Z_i} (x_{iR_i} - x^*_{iR_i}) = 2 w_i - w_i = w_i$. Therefore,

\[ SC(x) \ge SC(x^*) + \sum_{i=1}^{n} \lambda_i^*(x^*) w_i. \]

This concludes the proof of Lemma 9.8.

Theorem 9.12 The price of anarchy in the Wardrop model with linear latency functions constitutes $PA = 4/3$.

Proof: Suppose that $x$ represents a Wardrop equilibrium in the model $\langle n, G, w, Z, f \rangle$. Then, according to the corollary of Theorem 9.11, the strategy profile $x/2$ yields the minimal social costs in the model $\langle n, G, w/2, Z, f \rangle$.

Lemma 9.8 claims that, if we double traffic in this model (i.e., get back to the initial traffic $w$), for any strategy profile $y$ the social costs can be estimated as follows:

\[ SC(y) \ge SC(x/2) + \sum_{i=1}^{n} \lambda_i^*(x/2) \frac{w_i}{2} = SC(x/2) + \frac{1}{2} \sum_{i=1}^{n} \lambda_i(x) w_i. \]

Recall that $x$ forms a Wardrop equilibrium. Then $\sum_{i=1}^{n} \lambda_i(x) w_i = SC(x)$, whence it appears that

\[ SC(y) \ge SC(x/2) + \frac{1}{2} SC(x). \]

Furthermore,

\[ SC(x/2) = \sum_{e \in E} \delta_e(x/2) f_e(\delta_e(x/2)) = \sum_{e \in E} \frac{1}{2} \delta_e(x) \Bigl( \frac{1}{2} a_e \delta_e(x) + b_e \Bigr) \ge \frac{1}{4} \sum_{e \in E} \bigl( a_e \delta_e^2(x) + b_e \delta_e(x) \bigr) = \frac{1}{4} SC(x). \]

These inequalities lead to $SC(y) \ge \frac{3}{4} SC(x)$ for any strategy profile $y$ (particularly, for the strategy profile guaranteeing the minimal social costs). Consequently, we obtain the upper estimate for the price of anarchy:

\[ PA = \sup_{x\ \text{is an equilibrium}} \frac{SC(x)}{\mathrm{opt}} \le \frac{4}{3}. \]

The corresponding lower estimate has been established in the Pigou model, see Section 9.11. The proof of Theorem 9.12 is completed.
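The bound of Theorem 9.12 can be explored numerically on two parallel channels with linear latencies $a_1 \delta + b_1$ and $a_2 \delta + b_2$. The following is a minimal sketch under stated assumptions: the total traffic is split between just two channels, $a_1 + a_2 > 0$, the Wardrop equilibrium equalizes the delays, and (by Theorem 9.11) the optimum equalizes the marginal costs $2 a_e \delta + b_e$, which amounts to the same computation with doubled slopes. All function names are hypothetical.

```python
def equalizing_load(a1, b1, a2, b2, total=1.0):
    """Load d on channel 1 with a1*d + b1 == a2*(total - d) + b2, clamped to [0, total].

    Assumes a1 + a2 > 0. Clamping covers the cases where one channel
    is better even when it carries all (or none) of the traffic.
    """
    d = (a2 * total + b2 - b1) / (a1 + a2)
    return min(max(d, 0.0), total)


def social_cost(d, a1, b1, a2, b2, total=1.0):
    """Total delay when channel 1 carries d and channel 2 carries total - d."""
    rest = total - d
    return d * (a1 * d + b1) + rest * (a2 * rest + b2)


def price_of_anarchy(a1, b1, a2, b2, total=1.0):
    """Ratio of the Wardrop equilibrium social costs to the minimal social costs."""
    d_eq = equalizing_load(a1, b1, a2, b2, total)
    # The optimum equalizes the marginal costs 2*a*d + b: same formula, doubled slopes.
    d_opt = equalizing_load(2 * a1, b1, 2 * a2, b2, total)
    return (social_cost(d_eq, a1, b1, a2, b2, total)
            / social_cost(d_opt, a1, b1, a2, b2, total))


if __name__ == "__main__":
    # Pigou model: upper channel with delay 1, lower channel with delay equal to its load.
    print(price_of_anarchy(0.0, 1.0, 1.0, 0.0))
```

On the Pigou network the ratio attains the value $4/3$; for other coefficients of this two-channel family it stays between 1 and $4/3$, as the theorem predicts.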
9.15 Potential in the Wardrop model with parallel channels for player-specific linear latency functions

In the preceding sections, we have studied models with identical latency functions of all players on each channel (this latency function depends on channel load only). However, in real games channel delays may have different prices for different players. In this case, we speak about network games with player-specific delays. Consider the Wardrop model $\langle n, G, w, Z, f \rangle$ with parallel channels (Figure 9.12) and linear latency functions of the form $f_{ie}(\delta) = a_{ie} \delta$. Here the coefficients $a_{ie}$ are different for different players $i \in N$ and channels $e \in E$.

Figure 9.12 The Wardrop model with parallel channels and linear delays.

Let $x = \{x_{ie},\ i \in N,\ e \in E\}$ be some strategy profile, $\sum_{e \in E} x_{ie} = w_i$, $i = 1, \dots, n$. Introduce the function

\[ P(x) = \sum_{i=1}^{n} \sum_{e \in E} x_{ie} \ln a_{ie} + \sum_{e \in E} \delta_e(x) \ln \delta_e(x). \]

Theorem 9.13 A strategy profile $x$ makes a Wardrop equilibrium iff $P(x) = \min_y P(y)$.

Proof: We begin with the necessity part. Assume that $x$ is a Wardrop equilibrium. Find the derivative of the function $P$:

\[ \frac{\partial P(x)}{\partial x_{ie}} = 1 + \ln a_{ie} + \ln \Bigl( \sum_{k=1}^{n} x_{ke} \Bigr) = 1 + \ln \Bigl( a_{ie} \sum_{k=1}^{n} x_{ke} \Bigr). \]

The equilibrium conditions require that for all $i \in N$ and $e, l \in E$:

\[ x_{ie} > 0 \ \Rightarrow\ a_{ie} \sum_{k=1}^{n} x_{ke} \le a_{il} \sum_{k=1}^{n} x_{kl}. \]

Due to the monotonicity of the function $\ln x$, this inequality leads to

\[ x_{ie} > 0 \ \Rightarrow\ \frac{\partial P(x)}{\partial x_{ie}} \le \frac{\partial P(x)}{\partial x_{il}}, \quad \forall i, e, l. \]

Subsequent reasoning is similar to that of Theorem 9.10. The function $x \ln x$, as well as the linear function, is convex. On the other hand, the sum of convex functions also represents a convex function. Thus, $P(x)$ is a convex function. This function is continuously differentiable. The convexity of $P(x)$ implies that

\[ P(y) - P(x) \ge \sum_{i=1}^{n} \sum_{e \in E} \frac{\partial P}{\partial x_{ie}}(x) (y_{ie} - x_{ie}). \]

By virtue of the equilibrium conditions, we have

\[ x_{ie} > 0 \ \Rightarrow\ \frac{\partial P(x)}{\partial x_{ie}} = \lambda_i, \ \forall e \in E, \qquad x_{ie} = 0 \ \Rightarrow\ \frac{\partial P(x)}{\partial x_{ie}} \ge \lambda_i, \ \forall e \in E. \]
Under the second condition $x_{ie} = 0$, we get $y_{ie} - x_{ie} \ge 0$, and then

$$\frac{\partial P}{\partial x_{ie}}(x)\,(y_{ie} - x_{ie}) \ge \lambda_i (y_{ie} - x_{ie}).$$

This leads to the expressions

$$P(y) - P(x) \ge \sum_{i=1}^{n} \sum_{e \in E} \lambda_i (y_{ie} - x_{ie}) = \sum_{i=1}^{n} \lambda_i \sum_{e \in E} (y_{ie} - x_{ie}) = 0.$$

Consequently, $P(y) \ge P(x)$ for all $y$; hence, $x$ is the minimum point of the function $P$.

Now, we argue the sufficiency part of Theorem 9.13. Let $x$ be the minimum point of the function $P(y)$. Proceed by contradiction and assume that $x$ is not a Wardrop equilibrium. Then for some player $k$ there are two channels $p$ and $q$ such that $x_{kp} > 0$ and $a_{kp}\delta_p(x) > a_{kq}\delta_q(x)$. In this case, there exists a number $z$, $0 < z < x_{kp}$, meeting the condition

$$a_{kp}(\delta_p(x) - z) \ge a_{kq}(\delta_q(x) + z).$$

Define a new strategy profile $y$ such that all strategies of players $i \ne k$ remain the same, whereas the strategy of player $k$ acquires the form

$$y_{ke} = \begin{cases} x_{kp} - z, & \text{if } e = p, \\ x_{kq} + z, & \text{if } e = q, \\ x_{ke}, & \text{otherwise.} \end{cases}$$

Consider the difference

$$P(x) - P(y) = \sum_{i=1}^{n} \sum_{e \in E} (x_{ie} - y_{ie}) \ln a_{ie} + \sum_{e \in E} \left( \delta_e(x) \ln \delta_e(x) - \delta_e(y) \ln \delta_e(y) \right). \quad (15.1)$$

Both sums in (15.1) have non-zero terms corresponding to player $k$ and channels $p, q$ only:

$$P(x) - P(y) = z(\ln a_{kp} - \ln a_{kq}) + \delta_p(x) \ln \delta_p(x) + \delta_q(x) \ln \delta_q(x) - (\delta_p(x) - z) \ln(\delta_p(x) - z) - (\delta_q(x) + z) \ln(\delta_q(x) + z)$$

$$= \ln\left( a_{kp}^{z} \cdot \delta_p(x)^{\delta_p(x)} \cdot \delta_q(x)^{\delta_q(x)} \right) - \ln\left( a_{kq}^{z} \cdot (\delta_p(x) - z)^{\delta_p(x) - z} \cdot (\delta_q(x) + z)^{\delta_q(x) + z} \right).$$

Lemma 9.9 below claims that the last expression is strictly positive. But, in this case, one obtains $P(x) > P(y)$. This contradicts the condition that $x$ is the minimum point of the function $P(y)$, and the desired conclusion follows.

Lemma 9.9 Let $a, b, u, v$, and $z$ be non-negative, $u \ge z$. If $a(u - z) \ge b(v + z)$, then

$$a^{z} \cdot u^{u} \cdot v^{v} > b^{z} \cdot (u - z)^{u - z} \cdot (v + z)^{v + z}.$$

Proof: First, show the inequality

$$\left( \frac{\alpha}{\alpha - 1} \right)^{\alpha} > e > \left( 1 + \frac{1}{\beta} \right)^{\beta}, \quad \alpha > 1, \ \beta > 0. \quad (15.2)$$

It suffices to notice that the function

$$f(\alpha) = \left( 1 + \frac{1}{\alpha - 1} \right)^{\alpha} = \exp\left( \alpha \ln\left( 1 + \frac{1}{\alpha - 1} \right) \right),$$

being monotonically decreasing, tends to $e$ as $\alpha \to \infty$. The monotonicity follows from the negativity of the derivative

$$f'(\alpha) = f(\alpha) \left( \ln\left( 1 + \frac{1}{\alpha - 1} \right) - \frac{1}{\alpha - 1} \right) < 0 \quad \text{for all } \alpha > 1.$$

By analogy, readers can verify the right-hand inequality.

Now, set $\alpha = u/z$ and $\beta = v/z$. Then the condition $a(u - z) \ge b(v + z)$ implies that $a(\alpha z - z) \ge b(\beta z + z)$, whence it appears that $a(\alpha - 1) \ge b(\beta + 1)$. Due to inequality (15.2), we have

$$a \alpha^{\alpha} \beta^{\beta} > a (\alpha - 1)^{\alpha} (\beta + 1)^{\beta} \ge b (\alpha - 1)^{\alpha - 1} (\beta + 1)^{\beta + 1}.$$

Multiply the last inequality by $z^{\alpha + \beta}$,

$$a (z\alpha)^{\alpha} (z\beta)^{\beta} > b (z\alpha - z)^{\alpha - 1} (z\beta + z)^{\beta + 1},$$

and raise to the power of $z$ to get

$$a^{z} (z\alpha)^{z\alpha} (z\beta)^{z\beta} > b^{z} (z\alpha - z)^{z\alpha - z} (z\beta + z)^{z\beta + z}.$$

This proves Lemma 9.9.

9.16 The price of anarchy in an arbitrary network for player-specific linear latency functions

Consider the Wardrop model $\langle n, G, w, Z, f \rangle$ for an arbitrary network with divisible traffic and linear latency functions of the form $f_{ie}(\delta) = a_{ie}\delta$. Here the coefficients $a_{ie}$ are different for different players $i \in N$ and channels $e \in E$. An important characteristic is

$$\Delta = \max_{i,k \in N,\ e \in E} \left\{ \frac{a_{ie}}{a_{ke}} \right\},$$

i.e., the maximal ratio of delays over all players and channels. We have demonstrated that the price of anarchy in an arbitrary network with linear delays (identical for all players) equals $4/3$. The price of anarchy may grow appreciably if the latency functions become player-specific. Still, it is bounded by the quantity $\Delta$. The proof of this result employs the following inequality.

Lemma 9.10 For any $u, v \ge 0$ and $\Delta > 0$, we have

$$uv \le \frac{1}{2\Delta} u^2 + \frac{\Delta}{2} v^2.$$

The proof is immediate from the representation

$$\frac{1}{2\Delta} u^2 + \frac{\Delta}{2} v^2 - uv = \frac{\Delta}{2} \left( \frac{u}{\Delta} - v \right)^2 \ge 0.$$

Theorem 9.14 The price of anarchy in the Wardrop model with player-specific linear costs does not exceed $\Delta$.

Proof: Let $x$ be a Wardrop equilibrium and $x^*$ denote a strategy profile minimizing the social costs.
Consider the social costs in the equilibrium:

$$SC(x) = \sum_{i=1}^{n} \sum_{R_i \in Z_i} x_{iR_i} \sum_{e \in R_i} a_{ie}\delta_e(x).$$

By the definition of a Wardrop equilibrium, the delays on all used routes coincide, i.e., $\sum_{e \in R_i} a_{ie}\delta_e(x) = \lambda_i$ if $x_{iR_i} > 0$, and the delay on any unused route is not smaller than $\lambda_i$. This implies that

$$SC(x) = \sum_{i=1}^{n} \sum_{R_i \in Z_i} x_{iR_i} \sum_{e \in R_i} a_{ie}\delta_e(x) \le \sum_{i=1}^{n} \sum_{R_i \in Z_i} x^*_{iR_i} \sum_{e \in R_i} a_{ie}\delta_e(x).$$

Rewrite the last expression as

$$\sum_{i=1}^{n} \sum_{R_i \in Z_i} x^*_{iR_i} \sum_{e \in R_i} \frac{a_{ie}}{\delta_e(x^*)}\, \delta_e(x^*)\delta_e(x),$$

and take advantage of Lemma 9.10:

$$SC(x) \le \sum_{i=1}^{n} \sum_{R_i \in Z_i} x^*_{iR_i} \sum_{e \in R_i} \frac{a_{ie}}{\delta_e(x^*)} \left( \frac{\Delta}{2} \delta_e^2(x^*) + \frac{1}{2\Delta} \delta_e^2(x) \right)$$

$$= \frac{\Delta}{2} \sum_{i=1}^{n} \sum_{R_i \in Z_i} x^*_{iR_i} \sum_{e \in R_i} a_{ie}\delta_e(x^*) + \frac{1}{2\Delta} \sum_{e \in E} \sum_{i=1}^{n} \sum_{R_i \in Z_i: e \in R_i} \frac{x^*_{iR_i}}{\delta_e(x^*)}\, a_{ie}\delta_e^2(x)$$

$$= \frac{\Delta}{2} SC(x^*) + \frac{1}{2\Delta} \sum_{e \in E} \sum_{i=1}^{n} \sum_{R_i \in Z_i: e \in R_i} \frac{x^*_{iR_i}}{\delta_e(x^*)} \cdot a_{ie}\delta_e^2(x).$$

To estimate the second term in the last formula, make the following observation. If the equalities $x_1 + x_2 + \dots + x_n = y_1 + y_2 + \dots + y_n = 1$ hold true with non-negative summands, one obtains the estimate

$$\frac{a_1 x_1 + a_2 x_2 + \dots + a_n x_n}{a_1 y_1 + a_2 y_2 + \dots + a_n y_n} \le \frac{\max\{a_i\}}{\min\{a_i\}}, \quad a_i > 0, \ i = 1, \dots, n.$$

On the other hand, in the last expression for any $e \in E$ we have

$$\sum_{i=1}^{n} \sum_{R_i \in Z_i: e \in R_i} \frac{x^*_{iR_i}}{\delta_e(x^*)} = \sum_{i=1}^{n} \sum_{R_i \in Z_i: e \in R_i} \frac{x_{iR_i}}{\delta_e(x)} = 1.$$

Hence,

$$SC(x) \le \frac{\Delta}{2} SC(x^*) + \frac{1}{2\Delta} \cdot \Delta \sum_{e \in E} \sum_{i=1}^{n} \sum_{R_i \in Z_i: e \in R_i} \frac{x_{iR_i}}{\delta_e(x)} \cdot a_{ie}\delta_e^2(x).$$

Certain simplifications yield

$$SC(x) \le \frac{\Delta}{2} SC(x^*) + \frac{1}{2} \sum_{i=1}^{n} \sum_{R_i \in Z_i} x_{iR_i} \sum_{e \in R_i} a_{ie}\delta_e(x) = \frac{\Delta}{2} SC(x^*) + \frac{1}{2} SC(x).$$

And so, the estimate

$$\frac{SC(x)}{SC(x^*)} \le \Delta$$

is valid for any equilibrium. This proves Theorem 9.14.

Therefore, in an arbitrary network with player-specific linear delays, the price of anarchy is finite and depends on the ratio of latency function coefficients of different players.

Exercises

1. Three identical parallel channels are used to send four data packets $w_1 = 2$, $w_2 = 2$, $w_3 = 3$, and $w_4 = 4$. The latency function has the form $f(w) = w/c$. Find a pure strategy Nash equilibrium and a completely mixed equilibrium.

2. Three parallel channels with the capacities $c_1 = 1.5$, $c_2 = 2$, and $c_3 = 2.5$ transmit four identical data packets. Evaluate a pure strategy Nash equilibrium and a completely mixed equilibrium under linear delays.

3. Two parallel channels with the capacities $c_1 = 1$ and $c_2 = 2$ transmit five packets $w_1 = 2$, $w_2 = 2$, $w_3 = 2$, $w_4 = 4$, and $w_5 = 5$. Find a Nash equilibrium in the class of pure strategies and mixed strategies. Calculate the social costs in the linear, quadratic, and maximin forms.

4. Three parallel channels with the capacities $c_1 = 2$, $c_2 = 2.5$, and $c_3 = 4$ are used to send four data packets $w_1 = 5$, $w_2 = 7$, $w_3 = 10$, and $w_4 = 12$. The latency function is linear. Evaluate the worst-case Nash equilibrium. Compute the corresponding social costs and the price of anarchy.

5. Two channels with the capacities $c_1 = 1$ and $c_2 = 2$ transmit four data packets $w_1 = 3$, $w_2 = 4$, $w_3 = 6$, and $w_4 = 8$. The latency functions are linear. One channel is added to the network. Which capacity of the additional channel will increase the social costs?

6. Consider a network with parallel channels. The social costs have the quadratic form. One channel is added to the network. Is it possible that the resulting social costs go up?

[Figure: network connecting node $s$ to node $t$, with channel capacities $c_1 = c_2 = c_3 = c_4 = c_5 = 5$]

7. Take a Wardrop network illustrated by the figure above. Four data packets $w_1 = 1$, $w_2 = 1$, $w_3 = 2$, and $w_4 = 3$ are sent from node $s$ to node $t$. The latency function is defined by $f(w) = \frac{1}{c - w}$. Find a Wardrop equilibrium. Calculate the linear and maximal social costs.

8. In a Wardrop network, three parallel channels with the capacities $c_1 = 3$, $c_2 = 3$, and $c_3 = 4$ transmit four data packets $w_1 = 1$, $w_2 = 1$, $w_3 = 2$, and $w_4 = 3$. The latency function is defined by $f(w) = \frac{1}{c - w}$. Find a Wardrop equilibrium, calculate the linear social costs, and evaluate the price of anarchy.

9. Consider the Wardrop model. In a general form network, the delay on channel $e$ has the form $f_e(w) = \frac{1}{c_e - w}$. Find the potential of this network.

10. Choose the player-specific Wardrop model. A network comprises parallel channels. For player $i$, the delay on channel $e$ is given by $f_{ie}(w) = a_e w + b_{ie}$. Find the potential of such a network.

10 Dynamic games

Introduction

Dynamic games are distinguished by their evolution over time. Here players control some object or system whose dynamics is described by a set of difference equations or differential equations. In the case of objects moving in a certain space, pursuit games arise naturally. Players strive to approach an opponent's object in minimum time or to maximize the probability of detecting the opponent's object. Another interpretation concerns economic or ecological systems, where players seek to gain the maximal income or cause the minimal environmental damage.

Definition 10.1 A dynamic game is a game $\Gamma = \langle N, x, \{U_i\}_{i=1}^{n}, \{H_i\}_{i=1}^{n} \rangle$, where $N = \{1, 2, \dots, n\}$ denotes the set of players,

$$x'(t) = f(x, u_1, \dots, u_n, t), \quad x(0) = x_0, \quad x = (x_1, \dots, x_m), \quad 0 \le t \le T,$$

indicates a controlled system in the space $R^m$, $U_1, \dots, U_n$ are the strategy sets of players $1, \dots, n$, respectively, and a function $H_i(u_1, \dots, u_n)$ specifies the payoff of player $i \in N$.

A controlled system is considered on a time interval $[0, T]$ (finite or infinite). Player strategies are functions $u_i = u_i(t)$, $i = 1, \dots, n$. Depending on the selected strategies, each player receives the payoff

$$H_i(u_1, \dots, u_n) = \int_0^T g_i(x(t), u_1(t), \dots, u_n(t), t)\,dt + G_i(x(T), T), \quad i = 1, \dots, n.$$

It consists of the integral component and the terminal component; $g_i$ and $G_i$, $i = 1, \dots, n$, are given functions.

Mathematical Game Theory and Applications, First Edition. Vladimir Mazalov. © 2014 John Wiley & Sons, Ltd. Published 2014 by John Wiley & Sons, Ltd. Companion website: http://www.wiley.com/go/game_theory

There exist cooperative and non-cooperative dynamic games. Solutions of non-cooperative games are understood in the sense of Nash equilibria.
Definition 10.2 A Nash equilibrium in the game $\Gamma$ is a set of strategies $(u_1^*, \dots, u_n^*)$ such that

$$H_i(u^*_{-i}, u_i) \le H_i(u^*)$$

for arbitrary strategies $u_i$, $i = 1, \dots, n$.

Our analysis begins with discrete-time dynamic games often called "fish wars."

10.1 Discrete-time dynamic games

Imagine a dynamic system governed by the difference equation

$$x_{t+1} = (x_t)^{\alpha}, \quad t = 0, 1, \dots,$$

where $0 < \alpha \le 1$. For instance, this is the evolution of some fish population. The initial state $x_0$ of the system is given. The system admits the stationary state $x = 1$. If $x_0 > 1$, the population decreases, approaching the limit state $x = 1$ asymptotically. In the case of $x_0 < 1$, the population grows toward the same limit.

Suppose that two countries (players) perform fishing; they aim at maximizing the income (the amount of fish) on some time interval. The utility function of each player depends on the amount of fish $u$ caught by this player and takes the form $\log(u)$. The discounting coefficients $\beta_1$ (player 1) and $\beta_2$ (player 2) are given, $0 < \beta_i < 1$, $i = 1, 2$. Find a Nash equilibrium in this game.

10.1.1 Nash equilibrium in the dynamic game

First, we study this problem on a finite interval. Take the one-shot model. Assume that players decide to catch the amounts $u_1$ and $u_2$ at the initial instant (here $u_1 + u_2 \le x_0$). At the next instant $t = 1$, the population of fish has the size $x_1 = (x_0 - u_1 - u_2)^{\alpha}$. The game finishes and, by agreement, players divide the remaining amount of fish equally. Consequently, the payoff of player 1 makes up

$$H_1(u_1, u_2) = \log u_1 + \beta_1 \log\left( \frac{1}{2} (x - u_1 - u_2)^{\alpha} \right) = \log u_1 + \alpha\beta_1 \log(x - u_1 - u_2) - \beta_1 \log 2, \quad x = x_0.$$

The factor $\beta_1$ corresponds to payoff reduction due to the discounting effect. Similarly, player 2 obtains the payoff

$$H_2(u_1, u_2) = \log u_2 + \alpha\beta_2 \log(x - u_1 - u_2) - \beta_2 \log 2, \quad x = x_0.$$

The functions $H_1(u_1, u_2)$ and $H_2(u_1, u_2)$ are concave, and a Nash equilibrium exists.
For its evaluation, solve the system of equations $\partial H_1/\partial u_1 = 0$, $\partial H_2/\partial u_2 = 0$, or

$$\frac{1}{u_1} - \frac{\alpha\beta_1}{x - u_1 - u_2} = 0, \quad \frac{1}{u_2} - \frac{\alpha\beta_2}{x - u_1 - u_2} = 0.$$

And so, the equilibrium is defined by

$$u'_1 = \frac{\alpha\beta_2}{(1 + \alpha\beta_1)(1 + \alpha\beta_2) - 1} \cdot x, \quad u'_2 = \frac{\alpha\beta_1}{(1 + \alpha\beta_1)(1 + \alpha\beta_2) - 1} \cdot x.$$

The size of the population after fishing constitutes

$$x - u'_1 - u'_2 = \frac{\alpha^2\beta_1\beta_2}{(1 + \alpha\beta_1)(1 + \alpha\beta_2) - 1} \cdot x.$$

The players' payoffs in the equilibrium become

$$H_1(u'_1, u'_2) = (1 + \alpha\beta_1) \log x + a_1, \quad H_2(u'_1, u'_2) = (1 + \alpha\beta_2) \log x + a_2,$$

where the constants $a_1, a_2$ follow from the expressions

$$a_i = \log\left( \frac{\alpha\beta_j (\alpha^2\beta_1\beta_2)^{\alpha\beta_i}}{[(1 + \alpha\beta_1)(1 + \alpha\beta_2) - 1]^{1 + \alpha\beta_i}} \right) - \beta_i \log 2, \quad i, j = 1, 2, \ i \ne j.$$

Now, suppose that the game includes two shots, i.e., players can perform fishing twice. We have already determined the optimal behavior and payoffs of both players at the last shot (though under another initial condition). Hence, the equilibrium in the two-shot model results from maximization of the new payoff functions

$$H_1^2(u_1, u_2) = \log u_1 + \alpha\beta_1(1 + \alpha\beta_1) \log(x - u_1 - u_2) + \beta_1 a_1, \quad x = x_0,$$

$$H_2^2(u_1, u_2) = \log u_2 + \alpha\beta_2(1 + \alpha\beta_2) \log(x - u_1 - u_2) + \beta_2 a_2, \quad x = x_0.$$

The payoff functions remain concave. The Nash equilibrium follows from the system of equations

$$\frac{1}{u_1} - \frac{\alpha\beta_1(1 + \alpha\beta_1)}{x - u_1 - u_2} = 0, \quad \frac{1}{u_2} - \frac{\alpha\beta_2(1 + \alpha\beta_2)}{x - u_1 - u_2} = 0.$$

Again, it possesses the linear form:

$$u_1^2 = \frac{\alpha\beta_2(1 + \alpha\beta_2)}{(1 + \alpha\beta_1 + (\alpha\beta_1)^2)(1 + \alpha\beta_2 + (\alpha\beta_2)^2) - 1} \cdot x, \quad u_2^2 = \frac{\alpha\beta_1(1 + \alpha\beta_1)}{(1 + \alpha\beta_1 + (\alpha\beta_1)^2)(1 + \alpha\beta_2 + (\alpha\beta_2)^2) - 1} \cdot x.$$

We continue this construction procedure to arrive at the following conclusion. In the $n$-shot fish war game, the optimal strategies of players are defined by

$$u_1^n = \frac{\alpha\beta_2 \sum_{j=0}^{n-1} (\alpha\beta_2)^j}{\sum_{j=0}^{n} (\alpha\beta_1)^j \sum_{j=0}^{n} (\alpha\beta_2)^j - 1} \cdot x, \quad u_2^n = \frac{\alpha\beta_1 \sum_{j=0}^{n-1} (\alpha\beta_1)^j}{\sum_{j=0}^{n} (\alpha\beta_1)^j \sum_{j=0}^{n} (\alpha\beta_2)^j - 1} \cdot x. \quad (1.1)$$

After the first shot of the $n$-shot game, the population of fish has the size

$$x - u_1^n - u_2^n = \frac{\alpha^2\beta_1\beta_2 \sum_{j=0}^{n-1} (\alpha\beta_1)^j \sum_{j=0}^{n-1} (\alpha\beta_2)^j}{\sum_{j=0}^{n} (\alpha\beta_1)^j \sum_{j=0}^{n} (\alpha\beta_2)^j - 1} \cdot x. \quad (1.2)$$

As $n \to \infty$, the expressions (1.1), (1.2) admit the limits

$$u_1^* = \lim_{n \to \infty} u_1^n = \frac{\alpha\beta_2(1 - \alpha\beta_1)x}{1 - (1 - \alpha\beta_1)(1 - \alpha\beta_2)}, \quad u_2^* = \lim_{n \to \infty} u_2^n = \frac{\alpha\beta_1(1 - \alpha\beta_2)x}{1 - (1 - \alpha\beta_1)(1 - \alpha\beta_2)}.$$

Therefore,

$$x - u_1^* - u_2^* = \lim_{n \to \infty} \left( x - u_1^n - u_2^n \right) = kx, \quad \text{where } k = \frac{\alpha^2\beta_1\beta_2}{1 - (1 - \alpha\beta_1)(1 - \alpha\beta_2)}.$$

To proceed, we revert to the problem in its infinite-horizon setting. Suppose that at each shot players adhere to the strategies $u_1^*, u_2^*$. Starting from the initial state $x_0$, the system evolves according to the law

$$x_t = \left( x_{t-1} - u_1^*(x_{t-1}) - u_2^*(x_{t-1}) \right)^{\alpha} = k^{\alpha} x_{t-1}^{\alpha} = k^{\alpha} \left( k^{\alpha} x_{t-2}^{\alpha} \right)^{\alpha} = k^{\alpha + \alpha^2} x_{t-2}^{\alpha^2} = \dots = k^{\sum_{j=1}^{t} \alpha^j} \cdot x_0^{\alpha^t}, \quad t = 1, 2, \dots$$

Under large $t$, the system approaches the stationary state

$$\bar{x} = \left( \frac{1}{\frac{1}{\alpha\beta_1} + \frac{1}{\alpha\beta_2} - 1} \right)^{\frac{\alpha}{1 - \alpha}}. \quad (1.3)$$

In the case of $\beta_1 = \beta_2 = \beta$, the stationary state has the form $\bar{x} = \left( \frac{\alpha\beta}{2 - \alpha\beta} \right)^{\frac{\alpha}{1 - \alpha}}$.

We focus on the special linear case. Here the population of fish demonstrates the following dynamics:

$$x_{t+1} = r(x_t - u_1 - u_2), \quad r > 1.$$

Apply the same line of reasoning as before to get the optimal strategies of players in the Nash equilibrium for the multi-shot finite-horizon game:

$$u_1^n = \frac{\beta_2 \sum_{j=0}^{n-1} (\beta_2)^j}{\sum_{j=0}^{n} (\beta_1)^j \sum_{j=0}^{n} (\beta_2)^j - 1} \cdot x, \quad u_2^n = \frac{\beta_1 \sum_{j=0}^{n-1} (\beta_1)^j}{\sum_{j=0}^{n} (\beta_1)^j \sum_{j=0}^{n} (\beta_2)^j - 1} \cdot x.$$

As $n \to \infty$, we obtain the limit strategies

$$u_1^* = \frac{\beta_2(1 - \beta_1)x}{1 - (1 - \beta_1)(1 - \beta_2)}, \quad u_2^* = \frac{\beta_1(1 - \beta_2)x}{1 - (1 - \beta_1)(1 - \beta_2)}.$$

Since

$$x - u_1^* - u_2^* = \frac{x}{\frac{1}{\beta_1} + \frac{1}{\beta_2} - 1},$$

the optimal strategies of the players lead to the population dynamics

$$x_t = \frac{r}{\frac{1}{\beta_1} + \frac{1}{\beta_2} - 1} \cdot x_{t-1} = \left( \frac{r}{\frac{1}{\beta_1} + \frac{1}{\beta_2} - 1} \right)^{t} x_0, \quad t = 0, 1, \dots$$

Obviously, the population dynamics in the equilibrium essentially depends on the coefficient $r / \left( \frac{1}{\beta_1} + \frac{1}{\beta_2} - 1 \right)$. If it is smaller than 1, the population dies out; if this coefficient exceeds 1, the population grows infinitely. And finally, under strict equality to 1, the population keeps a stable size.
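The limit strategies and the stationary state (1.3) can be illustrated by a short simulation. The following is a minimal sketch, assuming that both players apply the limit strategies $u_1^*, u_2^*$ at every shot; the function names and the sample parameter values are illustrative assumptions, not part of the original exposition.

```python
def nash_limit_strategies(x, alpha, beta1, beta2):
    """Limit catches u1*, u2* for the current population size x."""
    a, b = alpha * beta1, alpha * beta2
    denom = 1.0 - (1.0 - a) * (1.0 - b)
    u1 = b * (1.0 - a) * x / denom
    u2 = a * (1.0 - b) * x / denom
    return u1, u2


def simulate(x0, alpha, beta1, beta2, steps):
    """Iterate x_{t+1} = (x_t - u1* - u2*)^alpha for the given number of steps."""
    x = x0
    for _ in range(steps):
        u1, u2 = nash_limit_strategies(x, alpha, beta1, beta2)
        x = (x - u1 - u2) ** alpha
    return x


def stationary_state(alpha, beta1, beta2):
    """Closed-form stationary state (1.3); requires alpha < 1."""
    k = 1.0 / (1.0 / (alpha * beta1) + 1.0 / (alpha * beta2) - 1.0)
    return k ** (alpha / (1.0 - alpha))


def linear_growth_factor(r, beta1, beta2):
    """Per-shot growth coefficient of the population in the linear case."""
    return r / (1.0 / beta1 + 1.0 / beta2 - 1.0)


# Hypothetical parameter values chosen only for illustration.
x_long_run = simulate(x0=2.0, alpha=0.5, beta1=0.9, beta2=0.8, steps=200)
x_bar = stationary_state(alpha=0.5, beta1=0.9, beta2=0.8)
print(x_long_run, x_bar)
print(linear_growth_factor(r=1.5, beta1=0.9, beta2=0.9))
```

The simulated trajectory settles at the value given by the closed-form expression, and the linear growth coefficient shows directly whether a given pair of discounting coefficients leads to growth or extinction.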
In the case of identical discounting coefficients ($\beta_1 = \beta_2 = \beta$), further development or extinction of the population depends on the sign of $\beta(r + 1) - 2$.

10.1.2 Cooperative equilibrium in the dynamic game

Get back to the original model $x_t = x_{t-1}^{\alpha}$ with $\alpha < 1$. Assume that players agree about joint actions and that $\beta_1 = \beta_2 = \beta$. Denote by $u = u_1 + u_2$ the total control. Arguing by analogy, readers can easily establish the optimal strategy in the $n$-shot game:

$$u^n = \frac{1 - \alpha\beta}{1 - (\alpha\beta)^{n+1}} \cdot x.$$

The corresponding limit strategy is $u^* = (1 - \alpha\beta)x$. And the population dynamics in the cooperative equilibrium acquires the form

$$x_t = (\alpha\beta x_{t-1})^{\alpha} = (\alpha\beta)^{\alpha + \alpha^2 + \dots + \alpha^t} \cdot x_0^{\alpha^t}, \quad t = 1, 2, \dots$$

For large $t$, it tends to the stationary state

$$\hat{x} = (\alpha\beta)^{\frac{\alpha}{1 - \alpha}}. \quad (1.4)$$

By comparing the stationary states (1.3) and (1.4) in the Nash equilibrium and in the cooperative equilibrium, we can observe that

$$\hat{x} = (\alpha\beta)^{\frac{\alpha}{1 - \alpha}} \ge \bar{x} = \left( \frac{\alpha\beta}{2 - \alpha\beta} \right)^{\frac{\alpha}{1 - \alpha}}.$$

In other words, cooperative actions guarantee a larger population size. Now, compare the payoffs of players in these equilibria. In the cooperative equilibrium, at each shot players have the total payoff

$$u_c = (1 - \alpha\beta)\hat{x} = (1 - \alpha\beta)(\alpha\beta)^{\frac{\alpha}{1 - \alpha}}. \quad (1.5)$$

Non-cooperative play leads to the following sum of their payoffs (under $\beta_1 = \beta_2$):

$$u_n = u_1^* + u_2^* = \frac{2\alpha\beta(1 - \alpha\beta)}{1 - (1 - \alpha\beta)^2} \cdot \bar{x} = \frac{2(1 - \alpha\beta)}{2 - \alpha\beta} \left( \frac{\alpha\beta}{2 - \alpha\beta} \right)^{\frac{\alpha}{1 - \alpha}}. \quad (1.6)$$

Obviously,

$$2 < (2 - \alpha\beta)^{\frac{1}{1 - \alpha}}, \quad 0 < \alpha, \beta < 1,$$

which means that $u_c > u_n$. Thus, cooperative behavior results in a more favorable scenario for the population and, furthermore, ensures higher payoffs to players (as against their independent actions).

This difference has the largest effect in the linear case $x_{t+1} = r x_t$, $t = 0, 1, \dots$ Here the cooperative behavior $u = (1 - \beta)x$ leads to the population dynamics

$$x_t = r\beta x_{t-1} = \dots = (r\beta)^t x_0, \quad t = 0, 1, \dots$$

Its long-run behavior depends on the value of $r\beta$.
If this expression is higher (smaller) than 1, the population grows infinitely (diminishes, respectively). The case $\beta = 1/r$ corresponds to a stable population size. By virtue of $r\beta/(2 - \beta) \le r\beta$, we may have a situation when $r\beta > 1$ and $r\beta/(2 - \beta) < 1$. This implies that, under the cooperative behavior of players, the population increases infinitely, whereas their egoistic behavior (each player pursues individual interests only) destroys the population.

10.2 Some solution methods for optimal control problems with one player

10.2.1 The Hamilton–Jacobi–Bellman equation

The principle of optimality was introduced by R. Bellman in 1958. Originally, the author suggested the following statement:

An optimal policy has the property that whatever the initial state and initial decision are, the remaining decisions must constitute an optimal policy with regard to the state resulting from the first decision.

Consider the discrete-time control problem for one player. Let a controlled system evolve according to the law

$$x_{t+1} = f_t(x_t, u_t).$$

Admissible control actions satisfy the constraints $u \in U$, where $U$ is some domain in the space $R^m$. The player seeks to minimize the functional

$$J(u) = G(x_N) + \sum_{t=0}^{N-1} g_t(x_t, u_t),$$

where $u_t = u_t(x_t)$.

Introduce the Bellman function

$$B_k(x_k) = \min_{u_k, \dots, u_{N-1} \in U} \left[ \sum_{i=k}^{N-1} g_i(x_i, u_i) + G(x_N) \right].$$

We will address Bellman's technique. Suppose that we are at the point $x_{N-1}$ and only one shot remains, i.e., we have to choose $u_{N-1}$. The cost at this shot makes up

$$J_{N-1} = g_{N-1}(x_{N-1}, u_{N-1}) + G(x_N) = g_{N-1}(x_{N-1}, u_{N-1}) + G(f_{N-1}(x_{N-1}, u_{N-1})).$$

Therefore, on the last shot the functional is a function of two variables, $x_{N-1}$ and $u_{N-1}$. The minimum of this functional in $u_{N-1}$ is the Bellman function

$$\min_{u_{N-1} \in U} J_{N-1} = B_{N-1}(x_{N-1}).$$

Take the two last shots:

$$J_{N-2} = g_{N-2}(x_{N-2}, u_{N-2}) + g_{N-1}(x_{N-1}, u_{N-1}) + G(x_N).$$
We naturally have

$$B_{N-2}(x_{N-2}) = \min_{u_{N-2}, u_{N-1}} J_{N-2} = \min_{u_{N-2}} \left\{ g_{N-2}(x_{N-2}, u_{N-2}) + \min_{u_{N-1}} \{ g_{N-1}(x_{N-1}, u_{N-1}) + G(x_N) \} \right\}$$

$$= \min_{u_{N-2}} \left\{ g_{N-2}(x_{N-2}, u_{N-2}) + B_{N-1}(x_{N-1}) \right\} = \min_{u_{N-2}} \left\{ g_{N-2}(x_{N-2}, u_{N-2}) + B_{N-1}(f_{N-2}(x_{N-2}, u_{N-2})) \right\}.$$

Proceeding by analogy, readers can construct the recurrent expression known as the Bellman equation:

$$B_{N-k}(x_{N-k}) = \min_{u_{N-k} \in U} \left\{ g_{N-k}(x_{N-k}, u_{N-k}) + B_{N-k+1}(f_{N-k}(x_{N-k}, u_{N-k})) \right\}. \quad (2.1)$$

Consequently, for optimal control evaluation, we compute $B_k$ backwards. At any shot, the minimizer in the Bellman equation gives the optimal control on this shot, $u_k^*(x_k)$.

Now, switch to the continuous-time setting of the optimal control problem:

$$x'(t) = f(t, x(t), u(t)), \quad x(0) = x_0, \quad u \in U,$$

$$J(u) = \int_0^T g(t, x(t), u(t))\,dt + G(x(T)) \to \min.$$

Introduce the Bellman function

$$V(x, t) = \min_{u(s),\ t \le s \le T} \left[ \int_t^T g(s, x(s), u(s))\,ds + G(x(T)) \right]$$

meeting the terminal condition $V(x, T) = G(x(T))$.

Using Bellman's principle of optimality, rewrite the function as

$$V(x, t) = \min_{u(s),\ t \le s \le T} \left[ \int_t^{t+\Delta t} g(s, x(s), u(s))\,ds + \int_{t+\Delta t}^{T} g(s, x(s), u(s))\,ds + G(x(T)) \right]$$

$$= \min_{u(s),\ t \le s \le t+\Delta t} \left\{ \int_t^{t+\Delta t} g(s, x(s), u(s))\,ds + \min_{u(s),\ t+\Delta t \le s \le T} \left[ \int_{t+\Delta t}^{T} g(s, x(s), u(s))\,ds + G(x(T)) \right] \right\}$$

$$= \min_{u(s),\ t \le s \le t+\Delta t} \left\{ \int_t^{t+\Delta t} g(s, x(s), u(s))\,ds + V(x(t+\Delta t), t+\Delta t) \right\}.$$

Assuming that the function $V(x, t)$ is continuously differentiable, approximate the integral to first order in $\Delta t$ and apply Taylor's expansion to $V(x(t+\Delta t), t+\Delta t)$ to obtain

$$V(x, t) = \min_{u(s),\ t \le s \le t+\Delta t} \left\{ g(t, x(t), u(t))\Delta t + V(x(t), t) + \frac{\partial V(x, t)}{\partial t}\Delta t + \frac{\partial V(x, t)}{\partial x} f(t, x(t), u(t))\Delta t + o(\Delta t) \right\}.$$

As $\Delta t \to 0$, we derive the Hamilton–Jacobi–Bellman equation

$$-\frac{\partial V(x(t), t)}{\partial t} = \min_{u(t) \in U} \left[ \frac{\partial V(x(t), t)}{\partial x} f(t, x(t), u(t)) + g(t, x(t), u(t)) \right] \quad (2.2)$$

with the terminal condition $V(x, T) = G(x(T))$.
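The backward recursion (2.1) can be sketched in a few lines. The example below is only an illustration under stated assumptions: the dynamics `f`, the running cost `g`, the terminal cost `G`, the finite control set `U`, and the horizon `N` form a hypothetical toy problem, and the brute-force enumeration is included solely to cross-check the recursion.

```python
from functools import lru_cache
from itertools import product

# Hypothetical toy problem (not from the text): integer state, three admissible controls.
N = 4            # number of shots
U = (-1, 0, 1)   # admissible control set


def f(t, x, u):
    """Dynamics x_{t+1} = f_t(x_t, u_t)."""
    return x + u


def g(t, x, u):
    """Running cost g_t(x_t, u_t)."""
    return x * x + u * u


def G(x):
    """Terminal cost G(x_N)."""
    return x * x


@lru_cache(maxsize=None)
def bellman(k, x):
    """Return (B_k(x), minimizing control at shot k) using the recursion (2.1)."""
    if k == N:
        return G(x), None
    # Compare candidates by cost first; ties are broken by the smaller control.
    return min((g(k, x, u) + bellman(k + 1, f(k, x, u))[0], u) for u in U)


def brute_force(x0):
    """Minimal cost over all control sequences; used only as a cross-check."""
    best = float("inf")
    for seq in product(U, repeat=N):
        x, cost = x0, 0
        for t, u in enumerate(seq):
            cost += g(t, x, u)
            x = f(t, x, u)
        best = min(best, cost + G(x))
    return best


value, first_control = bellman(0, 3)
print(value, first_control, brute_force(3))
```

The recursion visits each pair (shot, state) once, whereas the enumeration grows exponentially in the number of shots; both return the same minimal cost.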
Theorem 10.1 Suppose that there exists a unique continuously differentiable solution $V_0(x, t)$ of the Hamilton–Jacobi–Bellman equation (2.2) and there exists an admissible control law $u_0(x, t)$ such that

$$\min_{u \in U} \left[ \frac{\partial V_0(x, t)}{\partial x} f(t, x, u) + g(t, x, u) \right] = \frac{\partial V_0(x, t)}{\partial x} f(t, x, u_0) + g(t, x, u_0).$$

Then $u_0(x, t)$ is the optimal control law, and the corresponding Bellman function is $V_0(x, t)$.

Proof: Write down the total derivative of the function $V_0(x, t)$ by virtue of equation (2.2):

$$V'_0(x, t) = \frac{\partial V_0(x, t)}{\partial t} + \frac{\partial V_0(x, t)}{\partial x} f(t, x, u_0) = -g(t, x, u_0).$$

Substitute into this equality the state $x = x_0(t)$ corresponding to the control action $u_0(t)$ to get

$$V'_0(x_0(t), t) = -g(t, x_0(t), u_0(x_0(t), t)).$$

Integration over $t$ from 0 to $T$ yields

$$V_0(x_0, 0) = V_0(x_0(T), T) + \int_0^T g(t, x_0(t), u_0(x_0(t), t))\,dt = J(u_0).$$

Assume that $u(x, t)$ is any admissible control and $x(t)$ designates the corresponding trajectory of the process. Due to equation (2.2), we have

$$V'_0(x(t), t) \ge -g(t, x(t), u(x(t), t)).$$

Again, perform integration from 0 to $T$ to obtain

$$J(u) \ge V_0(x_0, 0),$$

whence it follows that

$$J(u_0) = V_0(x_0, 0) \le J(u).$$

This proves optimality of $u_0$.

Let the function $V(x, t)$ be twice continuously differentiable. Suppose that the Hamilton–Jacobi–Bellman equation admits a unique continuous solution $V(x, t)$ and there exists an admissible control $u_0(x, t)$ meeting the conditions of Theorem 10.1 with the trajectory $x_0(t)$. Introduce the following functions:

$$\psi(t) = -\frac{\partial V(x(t), t)}{\partial x},$$

$$H(t, x, u, \psi) = \psi(t) f(t, x, u) - g(t, x, u).$$

Using Theorem 10.1, we have

$$H(t, x_0, u_0, \psi) = -\frac{\partial V(x_0, t)}{\partial x} f(t, x_0, u_0) - g(t, x_0, u_0) = \max_{u \in U} \left[ -\frac{\partial V(x_0, t)}{\partial x} f(t, x_0, u) - g(t, x_0, u) \right] = \max_{u \in U} H(t, x_0, u, \psi).$$

According to Theorem 10.1,

$$\frac{\partial V(x, t)}{\partial t} = H(t, x, u_0, \psi).$$

By differentiating with respect to $x$ and setting $x = x_0$, one obtains

$$\frac{\partial^2 V(x_0, t)}{\partial t\,\partial x} = \frac{\partial H(t, x_0, u_0, \psi)}{\partial x} = -\psi'(t).$$

Similarly, differentiation of the terminal condition gives

$$\frac{\partial V(x_0(T), T)}{\partial x} = -\psi(T) = G'_x(x_0(T)).$$
Therefore, we have derived the maximum principle for the fixed-time problem. Actually, it can be reformulated for more general cases.

10.2.2 Pontryagin's maximum principle

Consider the continuous-time optimal control problem

$$J(u) = \int_0^T f_0(x(t), u(t))\,dt + G(x(T)) \to \min,$$

$$x'(t) = f(x(t), u(t)), \quad x(0) = x_0, \quad u \in U,$$

where $x = (x_1, \dots, x_n)$, $u = (u_1, \dots, u_r)$, and $f(x, u) = (f_1(x, u), \dots, f_n(x, u))$.

Introduce the Hamiltonian function

$$H(x, u, \psi) = \sum_{i=0}^{n} \psi_i f_i(x, u), \quad (2.3)$$

with $\psi = (\psi_0, \dots, \psi_n)$ indicating the vector of conjugate variables.

Theorem 10.2 (the maximum principle) Suppose that the functions $f_i(x, u)$ and $G(x)$ have partial derivatives and, along with these derivatives, are continuous in all their arguments for $x \in R^n$, $u \in U$, $t \in [0, T]$. A necessary condition for the control law $u^*(t)$ and the trajectory $x^*(t)$ to be optimal is that there exists a non-zero vector-function $\psi(t)$ satisfying

1. the maximum condition

$$H(x^*(t), u^*(t), \psi(t)) = \max_{u \in U} H(x^*(t), u, \psi(t));$$

2. the conjugate system in the conjugate variables

$$\psi'(t) = -\frac{\partial H(x^*, u^*, \psi)}{\partial x}; \quad (2.4)$$

3. the transversality condition

$$\psi(T) = -G'_x(x^*(T)); \quad (2.5)$$

4. the normalization condition

$$\psi_0(t) = -1.$$

The proof of the maximum principle in the general statement is rather complicated. Therefore, we confine ourselves to the problem with fixed $T$ and a free endpoint of the trajectory.

Take a controlled system described by the differential equations

$$\frac{dx}{dt} = f(x, u), \quad x(0) = x_0, \quad (2.6)$$

where $x = (x_1, \dots, x_n)$, $u = (u_1, \dots, u_r)$, and $f(x, u) = (f_1(x, u), \dots, f_n(x, u))$.

The problem consists in choosing an admissible control law $u(t)$ which minimizes the functional

$$Q = \int_0^T f_0(x(t), u(t))\,dt,$$

where $T$ is a fixed quantity.

Introduce an auxiliary variable $x_0(t)$ defined by the equation

$$\frac{dx_0}{dt} = f_0(x, u), \quad x_0(0) = 0. \quad (2.7)$$

This leads to the problem

$$Q = x_0(T) \to \min. \quad (2.8)$$

Theorem 10.3 A necessary condition for an admissible control $u(t)$ and the corresponding trajectory $x(t)$ to solve the problem (2.6), (2.8) is that there exists a non-zero continuous vector-function $\psi(t)$ meeting the conjugate system (2.4) such that

1. $H(x^*(t), u^*(t), \psi(t)) = \max_{u \in U} H(x^*(t), u, \psi(t))$;

2. $\psi(T) = (-1, 0, \dots, 0)$.

Proof: Let $u^*(t)$ be an optimal control law and $x^*(t)$ be the corresponding optimal trajectory of the system. We use the method of needle-shaped variations, a standard technique for proving the maximum principle in the general case. Notably, consider an infinitely small time interval $\tau - \varepsilon$ [...] and $\tau$ can be any moment $t$, then

$$\sum_{i=0}^{n} \psi_i(\tau) f_i(x^*(\tau), u(\tau)) \le \sum_{i=0}^{n} \psi_i(\tau) f_i(x^*(\tau), u^*(\tau)).$$

Finally, we have established that $H(x^*, u, \psi) \le H(x^*, u^*, \psi)$, which proves the validity of the maximum conditions.

Further exposition focuses on the discrete-time optimal control problem

$$I(u) = \sum_{t=0}^{N} f^0(x_t, u_t) + G(x_N) \to \min,$$

$$x_{t+1} = f(x_t, u_t), \quad x_0 = x^0, \quad u \in U, \quad (2.12)$$

where $x = (x^1, \dots, x^n)$, $u = (u^1, \dots, u^r)$, and $f(x, u) = (f^1(x, u), \dots, f^n(x, u))$.

Our intention is to formulate the discrete-time analog of Pontryagin's maximum principle. Consider the Hamiltonian function

$$H(\psi_{t+1}, x_t, u_t) = \sum_{i=0}^{n} \psi^i_{t+1} f^i(x_t, u_t), \quad t = 0, \dots, N - 1, \quad (2.13)$$

where $\psi = (\psi^0, \dots, \psi^n)$ designates the vector of conjugate variables.

Theorem 10.4 (the maximum principle for the discrete-time optimal control problem) A necessary condition for an admissible control $u_t^*$ and the corresponding trajectory $x_t^*$ to be optimal is that there exists a set of non-zero continuous vector-functions $\psi^1_t, \dots, \psi^n_t$ satisfying

1. the maximum condition

$$H(\psi_{t+1}, x_t^*, u_t^*) = \max_{u_t \in U} H(\psi_{t+1}, x_t^*, u_t), \quad t = 0, \dots, N - 1;$$

2. the conjugate system in the conjugate variables

$$\psi_t = -\frac{\partial H(\psi_{t+1}, x_t^*, u_t^*)}{\partial x_t}; \quad (2.14)$$

3. the transversality condition

$$\psi_N = -\frac{\partial G(x_N)}{\partial x_N}; \quad (2.15)$$

4. the normalization condition

$$\psi^0_t = -1.$$
We prove the discrete-time maximum principle in the terminal state optimization problem. In other words, the functional takes the form

$$I = G(x_N) \to \max. \quad (2.16)$$

As demonstrated above for the continuous-time case, the optimization problem with the sum-type performance index, i.e., the functional

$$I = \sum_{t=0}^{N} f^0(x_t, u_t),$$

can be easily reduced to the problem (2.16). The Hamiltonian function in the problem (2.16) takes the form

$$H(\psi_{t+1}, x_t, u_t) = \sum_{i=1}^{n} \psi^i_{t+1} f^i(x_t, u_t), \quad t = 0, \dots, N - 1.$$

For each $u \in U$, consider the cone of admissible variations

$$K(u) = \{ \delta u \mid u + \varepsilon\delta u \in U \}, \quad \varepsilon > 0.$$

Assume that the cone $K(u)$ is convex and contains inner points. Denote by $\delta_u H(\psi, x, u)$ the admissible differential of the Hamiltonian function:

$$\delta_u H(\psi, x, u) = \left( \frac{\partial H(\psi, x, u)}{\partial u}, \delta u \right) = \sum_{i=1}^{r} \frac{\partial H(\psi, x, u)}{\partial u^i} \delta u^i,$$

where $\delta u \in K(u)$.

Below we prove the following result.

Lemma 10.1 Let $u^* = \{u_0^*, \dots, u_{N-1}^*\}$ be the optimal control under the initial state $x_0 = x^0$ in the problem (2.16). The inequality

$$\delta_u H(\psi^*_{t+1}, x_t^*, u_t^*) \le 0$$

holds for any $\delta u_t^* \in K(u_t^*)$, where the optimal values $x^*$ follow from the system (2.12), whereas the optimal values $\psi^*$ follow from the conjugate system (2.14) with the boundary condition (2.15).

Moreover, if $u_t^*$ is an inner point of the set $U$, then $\delta H(u_t^*) = 0$ for any admissible variations at this point. In the case of $\delta H(u_t^*) < 0$, the point $u_t^*$ is a boundary point of the set $U$.

Proof: Fix the optimal process $\{u^*, x^*\}$ and consider the variation equation on this process:

$$\delta x^*_{t+1} = \frac{\partial f(x_t^*, u_t^*)}{\partial x_t} \delta x_t^* + \frac{\partial f(x_t^*, u_t^*)}{\partial u_t} \delta u_t^*, \quad t = 0, \dots, N - 1.$$

Suppose that the vectors $\psi_t^*$ are evaluated from the conjugate system. Analyze the scalar product

$$(\psi^*_{t+1}, \delta x^*_{t+1}) = (\psi_t^*, \delta x_t^*) + \left( \psi^*_{t+1}, \frac{\partial f(x_t^*, u_t^*)}{\partial u_t} \delta u_t^* \right). \quad (2.17)$$

Perform summation over $t = 0, \dots, N - 1$ in formula (2.17) and use the equalities $\delta x_0^* = 0$ and (2.15).
Such manipulations lead to

$$\delta G(x_N^*) = \sum_{t=0}^{N-1} \delta_u H(\psi^*_{t+1}, x_t^*, u_t^*), \quad (2.18)$$

where

$$\delta G(x_N^*) = \left( \frac{\partial G(x_N^*)}{\partial x_N}, \delta x_N^* \right).$$

Since $x_N^*$ is the optimal state, $\delta G(x_N^*) \le 0$ for any $\delta u_t^* \in K(u_t^*)$. Indeed, assume that there exists a variation $\delta x_N^*$ such that $\left( \frac{\partial G(x_N^*)}{\partial x_N}, \delta x_N^* \right) > 0$. By the definition of the cone $K(x^*)$, there exists $\varepsilon_1 > 0$ such that $x_N^* + \varepsilon\delta x_N^* \in R$ for any $0 < \varepsilon < \varepsilon_1$.

Consider the expansion

$$G(x + \varepsilon\delta x) - G(x) = \varepsilon\delta G(x) + o(\varepsilon) = \varepsilon \left( \frac{\partial G(x)}{\partial x}, \delta x \right) + o(\varepsilon) > 0,$$

which is valid for admissible variations ensuring an increase of the function $G(x)$.

The above expansion and our assumption imply that it is possible to choose $\varepsilon$ such that $G(x^* + \varepsilon\delta x^*) > G(x^*)$. This contradicts optimality.

Thus, we have shown that $\delta G(x_N^*) \le 0$ for any $\delta u_t^* \in K(u_t^*)$. Select $\delta u_j^* = 0$, $j \ne t$, $\delta u_t^* \ne 0$; then it follows from (2.18) that $\delta_u H(\psi^*_{t+1}, x_t^*, u_t^*) \le 0$ for any $\delta u_t^* \in K(u_t^*)$.

Now, suppose that $u_t^*$ is an inner point of the set $U$ for some $t$. In this case, the cone $K(u_t^*)$ is the whole space of variations. Therefore, if $\delta u_t^* \in K(u_t^*)$, then necessarily $-\delta u_t^* \in K(u_t^*)$. Consequently, we arrive at

$$\frac{\partial H(\psi^*_{t+1}, x_t^*, u_t^*)}{\partial u_t} = 0.$$

On the other hand, if $\delta_u H(\psi^*_{t+1}, x_t^*, u_t^*) < 0$ at a certain point $u_t^* \in U$, the latter is not an inner point of the set $U$. The proof of Lemma 10.1 is completed.

Actually, we have demonstrated that the admissible differential of the Hamiltonian function takes non-positive values on the optimal control. In other words, the necessary maximum conditions of the function $H(u_t)$ on the set $U$ hold true on the optimal control.

If $u_t^*$ is an inner point of the set $U$, then

$$\frac{\partial H(u_t^*)}{\partial u^k} = 0,$$

i.e., we obtain the standard necessary maximum conditions of a multivariable function.
Note that, if $\delta H(u_t^*) < 0$ (viz., the gradient of the Hamiltonian function has non-zero components and is non-orthogonal to all admissible variations at the point $u_t^*$), then the point $u_t^*$ provides a local maximum of the function $H(u_t)$ under some regularity assumption. This is clear from the expansion

$$H(u_t) - H(u_t^*) = \varepsilon\delta_u H(u_t^*) + o(\varepsilon).$$

10.3 The maximum principle and the Bellman equation in discrete- and continuous-time games of N players

Consider a dynamic $N$-player game in discrete time. Suppose that the dynamics obeys the equation

$$x_{t+1} = f_t(x_t, u_t^1, \dots, u_t^N), \quad t = 1, \dots, n,$$

where $x_1$ is given. The payoff functions of players have the form

$$J^i(u^1, \dots, u^N) = \sum_{j=1}^{n} g_j^i(u_j^1, \dots, u_j^N, x_j) \to \min.$$

Theorem 10.5 Let the functions $f_t$ and $g_t^i$ be continuously differentiable. If $(u^{1*}, \dots, u^{N*})$ is a Nash equilibrium in this game and $x_t^*$ denotes the corresponding trajectory of the process, then for each player $i = 1, \dots, N$ there exists a finite set of $n$-dimensional vectors $\psi_2^i, \dots, \psi_{n+1}^i$ such that the following conditions hold true:

$$x^*_{t+1} = f_t(x_t^*, u_t^{1*}, \dots, u_t^{N*}), \quad x_1^* = x_1,$$

$$u_t^{i*} = \arg\min_{u_t^i \in U_t^i} H_t^i(\psi^i_{t+1}, u_t^{1*}, \dots, u_t^{i-1*}, u_t^i, u_t^{i+1*}, \dots, u_t^{N*}, x_t^*),$$

$$\psi_t^i = \frac{\partial f_t(x_t^*, u_t^{1*}, \dots, u_t^{N*})}{\partial x_t} \psi^i_{t+1} + \frac{\partial g_t^i(u_t^{1*}, \dots, u_t^{N*}, x_t^*)}{\partial x_t}, \quad \psi^i_{n+1} = 0,$$

where

$$H_t^i(\psi^i_{t+1}, u_t^1, \dots, u_t^N, x_t) = g_t^i(u_t^1, \dots, u_t^N, x_t) + \psi^i_{t+1} f_t(x_t, u_t^1, \dots, u_t^N).$$

Proof: For each player $i$, the Nash equilibrium condition acquires the form

$$J^i(u^{1*}, \dots, u^{i-1*}, u^{i*}, u^{i+1*}, \dots, u^{N*}) \le J^i(u^{1*}, \dots, u^{i-1*}, u^i, u^{i+1*}, \dots, u^{N*}).$$

This inequality means that the minimum of $J^i$ is attained on $u^{i*}$ under the dynamics

$$x_{t+1} = f_t(x_t, u^{1*}, \dots, u^{i-1*}, u^i, u^{i+1*}, \dots, u^{N*}).$$

Actually, we have obtained the optimal control problem for a single player. And the conclusion follows according to Theorem 10.4.

Theorem 10.6 Consider the infinite discrete-time dynamic game of $N$ players.
Strategies $(u_t^{i*}(x_t))$ represent a Nash equilibrium iff there exist functions $V^i(t, x)$ meeting the conditions

$$V^i(t, x) = \min_{u_t^i \in U_t^i} \left[ g_t^i(\bar{u}_t^i, x) + V^i(t+1, f_t(x, \bar{u}_t^i)) \right] = g_t^i(u_t^{1*}(x), \dots, u_t^{N*}(x), x) + V^i(t+1, f_t(x, u_t^{1*}(x), \dots, u_t^{N*}(x))),$$

$$V^i(n+1, x) = 0,$$

where $\bar{u}_t^i = (u_t^{1*}(x), \dots, u_t^{i-1*}(x), u_t^i, u_t^{i+1*}(x), \dots, u_t^{N*}(x))$.

Proof: For each player $i$, the Nash equilibrium condition is defined by

$$J^i(u^{1*}, \dots, u^{i-1*}, u^{i*}, u^{i+1*}, \dots, u^{N*}) \le J^i(u^{1*}, \dots, u^{i-1*}, u^i, u^{i+1*}, \dots, u^{N*}).$$

Again, this inequality is the case when the minimum of $J^i$ is reached on $u^{i*}$ under the dynamics

$$x_{t+1} = f_t(x_t, u^{1*}, \dots, u^{i-1*}, u^{i*}, u^{i+1*}, \dots, u^{N*}).$$

This is the optimal control problem for a single player, and the desired result of Theorem 10.6 follows from the Bellman equation (2.1).

To proceed, we analyze a two-player game, where one player is the leader. Assume that the dynamics satisfies the equation

$$x_{t+1} = f_t(x_t, u_t^1, u_t^2), \quad t = 1, \dots, n.$$

The payoff functions of players have the form

$$J^i(u^1, u^2) = \sum_{j=1}^{n} g_j^i(u_j^1, u_j^2, x_j) \to \min,$$

where $u^i \in U^i$, and $U^i$ is a compact set in $R^m$.

Theorem 10.7 Let the function $g_t^1$ be continuously differentiable, and let the functions $f_t$ and $g_t^2$ be twice continuously differentiable. Moreover, suppose that the minimum of $H_t^2(\psi_{t+1}, u_t^1, u_t^2, x_t)$ in $u_t^2$ is achieved at an interior point for any $u_t^1 \in U^1$.
Then, if $(u^{1*}, u^{2*})$ forms a Stackelberg equilibrium in this game (player 1 is the leader) and $x_t^*$ indicates the corresponding trajectory of the process, there exist three finite sets of $n$-dimensional vectors $\lambda_1, \dots, \lambda_n$, $\mu_1, \dots, \mu_n$, $\nu_1, \dots, \nu_n$ such that

$$x_{t+1}^* = f_t(x_t^*, u_t^{1*}, u_t^{2*}), \quad x_1^* = x_1,$$

$$\nabla_{u_t^1} H_t^1(\lambda_t, \mu_t, \nu_t, \psi_{t+1}^*, u_t^{1*}, u_t^{2*}, x_t^*) = 0,$$

$$\nabla_{u_t^2} H_t^1(\lambda_t, \mu_t, \nu_t, \psi_{t+1}^*, u_t^{1*}, u_t^{2*}, x_t^*) = 0,$$

$$\lambda_{t-1} = \frac{\partial H_t^1(\lambda_t, \mu_t, \nu_t, \psi_{t+1}^*, u_t^{1*}, u_t^{2*}, x_t^*)}{\partial x_t}, \quad \lambda_n = 0,$$

$$\mu_{t+1} = \frac{\partial H_t^1(\lambda_t, \mu_t, \nu_t, \psi_{t+1}^*, u_t^{1*}, u_t^{2*}, x_t^*)}{\partial \psi_{t+1}}, \quad \mu_1 = 0,$$

$$\nabla_{u_t^2} H_t^2(\psi_{t+1}^*, u_t^{1*}, u_t^{2*}, x_t^*) = 0,$$

$$\psi_t^* = F_t(x_t^*, \psi_{t+1}^*, u_t^{1*}, u_t^{2*}), \quad \psi_{n+1} = 0,$$

where

$$H_t^1 = g_t^1(u_t^1, u_t^2, x_t) + \lambda_t f_t(x_t, u_t^1, u_t^2) + \mu_t F_t(x_t, u_t^1, u_t^2, \psi_{t+1}) + \nu_t \nabla_{u_t^2} H_t^2(\psi_{t+1}, u_t^1, u_t^2, x_t),$$

$$F_t = \frac{\partial f_t(x_t, u_t^1, u_t^2)}{\partial x_t} \psi_{t+1} + \frac{\partial g_t^2(u_t^1, u_t^2, x_t)}{\partial x_t},$$

$$H_t^2(\psi_{t+1}, u_t^1, u_t^2, x_t) = g_t^2(u_t^1, u_t^2, x_t) + \psi_{t+1} f_t(x_t, u_t^1, u_t^2).$$

Proof: First, assume that we know the leader's control $u^1$. In this case, the optimal response $\bar{u}^2$ of player 2 results from Theorem 10.5:

$$\bar{x}_{t+1} = f_t(\bar{x}_t, u_t^1, \bar{u}_t^2), \quad \bar{x}_1 = x_1,$$

$$\bar{u}_t^2 = \arg\min_{u_t^2 \in U_t^2} H_t^2(\psi_{t+1}, u_t^1, u_t^2, \bar{x}_t),$$

$$\psi_t = \frac{\partial f_t(\bar{x}_t, u_t^1, \bar{u}_t^2)}{\partial x_t} \psi_{t+1} + \frac{\partial g_t^2(u_t^1, \bar{u}_t^2, \bar{x}_t)}{\partial x_t}, \quad \psi_{n+1} = 0,$$

where

$$H_t^2(\psi_{t+1}, u_t^1, u_t^2, x_t) = g_t^2(u_t^1, u_t^2, x_t) + \psi_{t+1} f_t(x_t, u_t^1, u_t^2),$$

and $\psi_1, \dots, \psi_{n+1}$ is a sequence of $n$-dimensional conjugate vectors for this problem. Recall that, by the premise, the Hamiltonian function attains its minimum at an interior point. Therefore, the optimality condition can be rewritten as

$$\nabla_{u_t^2} H_t^2(\psi_{t+1}, u_t^1, \bar{u}_t^2, \bar{x}_t) = 0.$$
To find the leader's control, we have to solve the problem

$$\min_{u^1 \in U^1} J^1(u^1, u^2)$$

subject to the constraints

$$\bar{x}_{t+1} = f_t(\bar{x}_t, u_t^1, u_t^2),$$

$$\psi_t = F_t(x_t, \psi_{t+1}, u_t^1, u_t^2), \quad \psi_{n+1} = 0,$$

$$\nabla_{u_t^2} H_t^2(\psi_{t+1}, u_t^1, u_t^2, x_t) = 0,$$

where

$$F_t = \frac{\partial f_t(x_t, u_t^1, u_t^2)}{\partial x_t} \psi_{t+1} + \frac{\partial g_t^2(u_t^1, u_t^2, x_t)}{\partial x_t}.$$

Apply Lagrange's method of multipliers. Construct the Lagrange function for this constrained optimization problem:

$$L = \sum_t \left\{ g_t^1(u_t^1, u_t^2, x_t) + \lambda_t \left[ f_t(x_t, u_t^1, u_t^2) - x_{t+1} \right] + \mu_t \left[ F_t(x_t, \psi_{t+1}, u_t^1, u_t^2) - \psi_t \right] + \nu_t \frac{\partial H_t^2(\psi_{t+1}, u_t^1, u_t^2, x_t)}{\partial u_t^2} \right\},$$

where $\lambda_t$, $\mu_t$, and $\nu_t$ stand for the corresponding Lagrange multipliers. For $u_t^{1*}$ to be a solution of the posed problem, it is necessary that

$$\nabla_{u_t^1} L = 0, \quad \nabla_{u_t^2} L = 0, \quad \nabla_{x_t} L = 0, \quad \nabla_{\psi_{t+1}} L = 0.$$

These expressions directly lead to the conditions of Theorem 10.7.

Now, we switch to continuous-time games. Suppose that the dynamics is described by the equation

$$x'(t) = f(t, x(t), u^1(t), \dots, u^N(t)), \quad 0 \le t \le T, \quad x(0) = x_0, \quad u^i \in U^i.$$

The payoff functions of players have the form

$$J^i(u^1, \dots, u^N) = \int_0^T g^i(t, x(t), u^1(t), \dots, u^N(t))\,dt + G^i(x(T)) \to \min.$$

Theorem 10.8 Let the functions $f$ and $g^i$ be continuously differentiable. If $(u^{1*}(t), \dots, u^{N*}(t))$ represents a Nash equilibrium in this game and $x^*(t)$ specifies the corresponding trajectory of the process, then for each $i \in N$ there exists a function $\psi^i(\cdot): [0, T] \to R^n$ such that

$$x^{*\prime}(t) = f(t, x^*(t), u^{1*}(t), \dots, u^{N*}(t)), \quad x^*(0) = x_0,$$

$$u^{i*}(t) = \arg\min_{u^i \in U^i} H^i(t, \psi^i(t), x^*(t), u^{1*}(t), \dots, u^{i-1*}(t), u^i, u^{i+1*}(t), \dots, u^{N*}(t)),$$

$$\psi^{i\prime}(t) = -\frac{\partial H^i(t, \psi^i(t), x^*, u^{1*}(t), \dots, u^{N*}(t))}{\partial x}, \quad \psi^i(T) = \frac{\partial G^i(x^*(T))}{\partial x},$$

where

$$H^i(t, \psi^i, x, u^1, \dots, u^N) = g^i(t, x, u^1, \dots, u^N) + \psi^i f(t, x, u^1, \dots, u^N).$$

The proof is immediate from the maximum principle for continuous-time games (see Theorem 10.2).

Theorem 10.9 Consider a dynamic continuous-time game of N players.
Strategies $(u^{i*}(t, x))$ form a Nash equilibrium iff there exist functions $V^i: [0, T] \times R^n \to R$ meeting the conditions

$$-\frac{\partial V^i(t, x)}{\partial t} = \min_{u^i \in S^i} \left[ \frac{\partial V^i(t, x)}{\partial x} f(t, x, u^{1*}(t, x), \dots, u^{i-1*}(t, x), u^i, u^{i+1*}(t, x), \dots, u^{N*}(t, x)) + g^i(t, x, u^{1*}(t, x), \dots, u^{i-1*}(t, x), u^i, u^{i+1*}(t, x), \dots, u^{N*}(t, x)) \right]$$

$$= \frac{\partial V^i(t, x)}{\partial x} f(t, x, u^{1*}(t, x), \dots, u^{N*}(t, x)) + g^i(t, x, u^{1*}(t, x), \dots, u^{N*}(t, x)),$$

$$V^i(T, x) = G^i(x).$$

The proof follows from the Hamilton–Jacobi–Bellman equation (2.2).

In addition, we analyze a two-player game, where one player is a leader. Suppose that the dynamics has the equation

$$x'(t) = f(t, x(t), u^1(t), u^2(t)), \quad x(0) = x_0.$$

The payoff functions of players take the form

$$J^i(u^1, u^2) = \int_0^T g^i(t, x(t), u^1(t), u^2(t))\,dt + G^i(x(T)) \to \min,$$

where $u^i \in U^i$, and $U^i$ is a compact set in $R^m$.

Theorem 10.10 Let the function $g^1$ be continuously differentiable in $R^n$, while the functions $f$, $g^2$, $G^1$ and $G^2$ are twice continuously differentiable in $R^n$. Moreover, assume that the function $H^2(t, \psi, u^1, u^2)$ is continuously differentiable and strictly convex on $U^2$. If $(u^{1*}(t), u^{2*}(t))$ is a Stackelberg equilibrium in this game and $x^*(t)$ denotes the corresponding trajectory of the process, then there exist continuously differentiable functions $\psi(\cdot), \lambda_1(\cdot), \lambda_2(\cdot): [0, T] \to R^n$ and a continuous function $\lambda_3(\cdot): [0, T] \to R^m$ such that

$$x^{*\prime}(t) = f(t, x^*(t), u^{1*}(t), u^{2*}(t)), \quad x^*(0) = x_0,$$

$$\psi'(t) = -\frac{\partial H^2(t, \psi, x^*, u^{1*}, u^{2*})}{\partial x}, \quad \psi(T) = \frac{\partial G^2(x^*(T))}{\partial x},$$

$$\lambda_1'(t) = -\frac{\partial H^1(t, \psi, \lambda_1, \lambda_2, \lambda_3, x^*, u^{1*}, u^{2*})}{\partial x}, \quad \lambda_1(T) = \frac{\partial G^1(x^*(T))}{\partial x} - \frac{\partial^2 G^2(x^*(T))}{\partial x^2} \lambda_2(T),$$

$$\lambda_2'(t) = -\frac{\partial H^1(t, \psi, \lambda_1, \lambda_2, \lambda_3, x^*, u^{1*}, u^{2*})}{\partial x}, \quad \lambda_2(0) = 0,$$

$$\nabla_{u^1} H^1(t, \psi, \lambda_1, \lambda_2, \lambda_3, x^*, u^{1*}, u^{2*}) = 0,$$

$$\nabla_{u^2} H^1(t, \psi, \lambda_1, \lambda_2, \lambda_3, x^*, u^{1*}, u^{2*}) = \nabla_{u^2} H^2(t, \psi, x^*, u^{1*}, u^{2*}) = 0,$$

where

$$H^2(t, \psi, x, u^1, u^2) = g^2(t, x, u^1, u^2) + \psi f(t, x, u^1, u^2),$$

$$H^1 = g^1(t, x, u^1, u^2) + \lambda_1 f(t, x, u^1, u^2) - \lambda_2 \frac{\partial H^2}{\partial x} + \lambda_3 \nabla_{u^2} H^2.$$

Proof: We proceed by analogy to the discrete-time case. First, imagine that we know the leader's control $u^1$.
The optimal response $\bar{u}^2$ of player 2 follows from Theorem 10.7. Notably, we have

$$x'(t) = f(t, x(t), u^1(t), \bar{u}^2(t)), \quad x(0) = x_0,$$

$$\bar{u}^2(t) = \arg\min_{u^2 \in U^2} H^2(t, \psi(t), x, u^1(t), u^2(t)),$$

$$\psi'(t) = -\frac{\partial H^2(t, \psi(t), x, u^1(t), \bar{u}^2(t))}{\partial x}, \quad \psi(T) = \frac{\partial G^2(x(T))}{\partial x},$$

where

$$H^2(t, \psi, x, u^1, u^2) = g^2(t, x, u^1, u^2) + \psi f(t, x, u^1, u^2),$$

and $\psi(t)$ is the conjugate variable for this problem. By the premise, the Hamiltonian function attains its minimum at an interior point. Therefore, the optimality condition can be reexpressed as

$$\nabla_{u^2} H^2(t, \psi, x, u^1, \bar{u}^2) = 0.$$

Now, to find the leader's control, we have to solve the optimization problem

$$\min_{u^1 \in U^1} J^1(u^1, u^2)$$

subject to the constraints

$$x'(t) = f(t, x(t), u^1(t), u^2(t)), \quad x(0) = x_0,$$

$$\psi'(t) = -\frac{\partial H^2(t, \psi, x, u^1, u^2)}{\partial x}, \quad \psi(T) = \frac{\partial G^2(x(T))}{\partial x},$$

$$\nabla_{u^2} H^2(t, \psi, x, u^1, \bar{u}^2) = 0.$$

Again, take advantage of Theorem 10.7. Compile the Hamiltonian function for this problem:

$$H^1 = g^1(t, x, u^1, u^2) + \lambda_1 f(t, x, u^1, u^2) - \lambda_2 \frac{\partial H^2}{\partial x} + \lambda_3 \nabla_{u^2} H^2.$$

The conjugate variables of the problem meet the equations

$$\lambda_1'(t) = -\frac{\partial H^1(t, \psi, \lambda_1, \lambda_2, \lambda_3, x^*, u^{1*}, u^{2*})}{\partial x}, \quad \lambda_1(T) = \frac{\partial G^1(x^*(T))}{\partial x} - \frac{\partial^2 G^2(x^*(T))}{\partial x^2} \lambda_2(T),$$

$$\lambda_2'(t) = -\frac{\partial H^1(t, \psi, \lambda_1, \lambda_2, \lambda_3, x^*, u^{1*}, u^{2*})}{\partial x}, \quad \lambda_2(0) = 0.$$

In addition, the optimality conditions are valid at interior points:

$$\nabla_{u^1} H^1 = 0, \quad \nabla_{u^2} H^2 = 0,$$

which completes the proof of Theorem 10.10.

10.4 The linear-quadratic problem on finite and infinite horizons

Consider the linear-quadratic problem of bioresource management. Let the dynamics of a population have the form

$$x'(t) = \varepsilon x(t) - u_1(t) - u_2(t), \tag{4.1}$$

where $x(t) \ge 0$ is the population size at instant $t$, and $u_1(t)$ and $u_2(t)$ indicate the control laws applied by player 1 and player 2, respectively. Both players strive to maximize their profits on the time interval $[0, T]$.
We select the following payoff functionals of the players:

$$J_1 = \int_0^T e^{-\rho t} \left[ p_1 u_1(t) - c_1 u_1^2(t) \right] dt + G_1(x(T)), \quad J_2 = \int_0^T e^{-\rho t} \left[ p_2 u_2(t) - c_2 u_2^2(t) \right] dt + G_2(x(T)). \tag{4.2}$$

Here $c_1$ and $c_2$ are the fishing costs of the players, and $p_1$ and $p_2$ specify the unit price of caught fish. Denote

$$c_{i\rho} = c_i e^{-\rho t}, \quad p_{i\rho} = p_i e^{-\rho t}, \quad i = 1, 2.$$

Find a Nash equilibrium in this problem by Pontryagin's maximum principle. Construct the Hamiltonian function for player 1:

$$H_1 = p_{1\rho} u_1 - c_{1\rho} u_1^2 + \lambda_1(\varepsilon x - u_1 - u_2).$$

Hence, it appears that

$$u_1(t) = \frac{p_{1\rho} - \lambda_1(t)}{2 c_{1\rho}},$$

and the conjugate variable equation acquires the form

$$\lambda_1'(t) = -\frac{\partial H_1}{\partial x} = -\varepsilon \lambda_1(t), \quad \lambda_1(T) = G_1'(x(T)).$$

By obtaining the solution of this equation and reverting to the original variables, we express the optimal control of player 1:

$$u_1^*(t) = \frac{p_1 - G_1'(x(T)) e^{(\rho - \varepsilon)t} e^{\varepsilon T}}{2 c_1}.$$

Similarly, the Hamiltonian function for player 2 is defined by

$$H_2 = p_{2\rho} u_2 - c_{2\rho} u_2^2 + \lambda_2(\varepsilon x - u_1 - u_2).$$

This leads to

$$u_2(t) = \frac{p_{2\rho} - \lambda_2(t)}{2 c_{2\rho}},$$

and the conjugate variable equation becomes

$$\lambda_2'(t) = -\frac{\partial H_2}{\partial x} = -\varepsilon \lambda_2(t), \quad \lambda_2(T) = G_2'(x(T)).$$

Finally, the optimal control of player 2 is given by

$$u_2^*(t) = \frac{p_2 - G_2'(x(T)) e^{(\rho - \varepsilon)t} e^{\varepsilon T}}{2 c_2}.$$

Actually, we have demonstrated the following result.

Theorem 10.11 The control laws

$$u_1^*(t) = \frac{p_1 - G_1'(x(T)) e^{(\rho - \varepsilon)t} e^{\varepsilon T}}{2 c_1}, \quad u_2^*(t) = \frac{p_2 - G_2'(x(T)) e^{(\rho - \varepsilon)t} e^{\varepsilon T}}{2 c_2}$$

form the Nash-optimal solution of the problem (4.1)–(4.2).

Proof: Generally, the maximum principle states the necessary conditions of optimality. However, in the linear-quadratic case these conditions are also sufficient. Let us show such sufficiency for the above model.

Fix $u_2^*(t)$ and study the problem for player 1. Designate by $x^*(t)$ the dynamics corresponding to the optimal behavior of both players. Consider the perturbed solution $x^*(t) + \Delta x$, $u_1^*(t) + \Delta u_1$. Here $x^*(t)$ and $u_1^*(t)$ satisfy equation (4.1), whereas $\Delta x$ meets the equation $\Delta x' = \varepsilon \Delta x - \Delta u_1$ (since $(x^*)' + \Delta x' = \varepsilon x^* - u_1^* - u_2^* + \varepsilon \Delta x - \Delta u_1$).
Under the optimal behavior, the payoff constitutes

$$J_1^* = \int_0^T \left[ p_{1\rho} u_1^*(t) - c_{1\rho} (u_1^*(t))^2 \right] dt + G_1(x^*(T)).$$

Its perturbed counterpart equals

$$J_1 = \int_0^T \left[ p_{1\rho} u_1^*(t) + p_{1\rho} \Delta u_1(t) - c_{1\rho} (u_1^*(t) + \Delta u_1(t))^2 \right] dt + G_1(x^*(T) + \Delta x(T)).$$

Their difference is

$$J_1^* - J_1 = \int_0^T \left( c_{1\rho} \Delta u_1^2 - \lambda_1(t) \Delta u_1 \right) dt + G_1(x^*(T)) - G_1(x^*(T) + \Delta x(T))$$

$$= \int_0^T c_{1\rho} \Delta u_1^2\,dt - G_1'(x^*(T)) \Delta x(T) - \int_0^T \lambda_1 \Delta u_1\,dt$$

$$= \int_0^T c_{1\rho} \Delta u_1^2\,dt - G_1'(x^*(T)) \Delta x(T) - \int_0^T \lambda_1 (\varepsilon \Delta x - (\Delta x)')\,dt$$

$$= \int_0^T c_{1\rho} \Delta u_1^2\,dt - G_1'(x^*(T)) \Delta x(T) + \Delta x \lambda_1 \Big|_0^T = \int_0^T c_{1\rho} \Delta u_1^2\,dt > 0.$$

This proves the optimality of $u_1^*(t)$ for player 1.

By analogy, readers can demonstrate that the control law $u_2^*(t)$ is optimal for player 2.

To proceed, we analyze the above problem with infinite horizon. Let the dynamics of a fish population have the form (4.1). The players aim at minimizing their costs on infinite horizon. We adopt the following cost functionals of the players:

$$J_1 = \int_0^\infty e^{-\rho t} \left[ c_1 u_1^2(t) - p_1 u_1(t) \right] dt, \quad J_2 = \int_0^\infty e^{-\rho t} \left[ c_2 u_2^2(t) - p_2 u_2(t) \right] dt. \tag{4.3}$$

In these formulas, $c_1$ and $c_2$ are the fishing costs of the players, and $p_1$ and $p_2$ are the unit prices of caught fish.

Apply the Bellman principle for Nash equilibrium evaluation. Fix a control law of player 2 and consider the optimal control problem for his opponent. Define the function $V(x)$ by

$$V(x) = \min_{u_1} \left\{ \int_0^\infty e^{-\rho t} \left[ c_1 u_1^2(t) - p_1 u_1(t) \right] dt \right\}.$$

The Hamilton–Jacobi–Bellman equation acquires the form

$$\rho V(x) = \min_{u_1} \left\{ c_1 u_1^2 - p_1 u_1 + \frac{\partial V}{\partial x} (\varepsilon x - u_1 - u_2) \right\}.$$

The minimum with respect to $u_1$ is given by

$$u_1 = \left( \frac{\partial V}{\partial x} + p_1 \right) \Big/ 2 c_1.$$

Substitution of this quantity into the equation yields

$$\rho V(x) = -\frac{\left( \frac{\partial V}{\partial x} + p_1 \right)^2}{4 c_1} + \frac{\partial V}{\partial x} (\varepsilon x - u_2).$$

Interestingly, a quadratic form satisfies the derived equation. And so, set $V(x) = a_1 x^2 + b_1 x + d_1$.
In this case, the control law equals

$$u_1(x) = \frac{2 a_1 x + b_1 + p_1}{2 c_1},$$

where the coefficients follow from the system of equations

$$\begin{cases} \rho a_1 = 2 a_1 \varepsilon - \dfrac{a_1^2}{c_1}, \\ \rho b_1 = \varepsilon b_1 - 2 a_1 u_2 - \dfrac{a_1 (p_1 + b_1)}{c_1}, \\ \rho d_1 = -b_1 u_2 - \dfrac{(p_1 + b_1)^2}{4 c_1}. \end{cases} \tag{4.4}$$

Similarly, for player 2, one can get the formula

$$u_2(x) = \frac{2 a_2 x + b_2 + p_2}{2 c_2},$$

where the coefficients obey the system of equations

$$\begin{cases} \rho a_2 = 2 a_2 \varepsilon - \dfrac{a_2^2}{c_2}, \\ \rho b_2 = \varepsilon b_2 - 2 a_2 u_1 - \dfrac{a_2 (p_2 + b_2)}{c_2}, \\ \rho d_2 = -b_2 u_1 - \dfrac{(p_2 + b_2)^2}{4 c_2}. \end{cases} \tag{4.5}$$

In fact, we have established

Theorem 10.12 The control laws

$$u_1^*(x) = \frac{2 c_1 c_2 \varepsilon x (2\varepsilon - \rho) - c_1 p_2 (2\varepsilon - \rho) + \varepsilon c_2 p_1}{2 c_1 c_2 (3\varepsilon - \rho)}, \quad u_2^*(x) = \frac{2 c_1 c_2 \varepsilon x (2\varepsilon - \rho) - c_2 p_1 (2\varepsilon - \rho) + \varepsilon c_1 p_2}{2 c_1 c_2 (3\varepsilon - \rho)}$$

form a Nash equilibrium in the problem (4.1)–(4.3).

Proof: These control laws are expressed from the systems (4.4) and (4.5). Note that the Bellman principle again provides the necessary and sufficient conditions of optimality.

10.5 Dynamic games in bioresource management problems. The case of finite horizon

We divide the total area $S$ of a basin into two domains $S_1$ and $S_2$, where fishing is forbidden and allowed, respectively. Let $x_1$ and $x_2$ be the fish resources in the domains $S_1$ and $S_2$. Fish migrate between these domains with the exchange coefficients $\gamma_i$. Two fishing artels catch fish in the domain $S_2$ during $T$ time instants.

Within the framework of this model, the dynamics of a fish population is described by the following equations:

$$\begin{cases} x_1'(t) = \varepsilon x_1(t) + \gamma_1 (x_2(t) - x_1(t)), \\ x_2'(t) = \varepsilon x_2(t) + \gamma_2 (x_1(t) - x_2(t)) - u(t) - v(t), \end{cases} \quad x_i(0) = x_i^0. \tag{5.1}$$

Here $x_1(t) \ge 0$ denotes the amount of fish at instant $t$ in the forbidden domain, $x_2(t) \ge 0$ is the amount of fish at instant $t$ in the allowed domain, $\varepsilon$ is the natural growth coefficient of the population, $\gamma_i$ are the migration coefficients, and $u(t)$, $v(t)$ are the control laws of player 1 and player 2, respectively.
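To get a feel for the two-domain dynamics (5.1), the system can be integrated numerically for any given pair of catch rules. The following is a minimal sketch, assuming a plain forward-Euler scheme; the function name `simulate_two_zone` and the callable signature of the controls are illustrative choices, not part of the original model.

```python
import math


def simulate_two_zone(x1_0, x2_0, eps, gamma1, gamma2, u, v, T, steps):
    """Forward-Euler integration of system (5.1).

    u and v are hypothetical callables (t, x1, x2) -> catch rate of
    player 1 and player 2. Returns a list of (t, x1, x2) samples.
    """
    h = T / steps
    t, x1, x2 = 0.0, x1_0, x2_0
    path = [(t, x1, x2)]
    for _ in range(steps):
        # Right-hand sides of (5.1): growth, migration, and catch in S2 only.
        dx1 = eps * x1 + gamma1 * (x2 - x1)
        dx2 = eps * x2 + gamma2 * (x1 - x2) - u(t, x1, x2) - v(t, x1, x2)
        x1, x2, t = x1 + h * dx1, x2 + h * dx2, t + h
        path.append((t, x1, x2))
    return path


def no_catch(t, x1, x2):
    """Control that leaves the population untouched."""
    return 0.0


# Without fishing and with equal initial stocks, migration cancels out
# and both stocks grow like x0 * exp(eps * t).
path = simulate_two_zone(50.0, 50.0, 0.08, 0.4, 0.4, no_catch, no_catch, 10.0, 10000)
```

With zero catch and symmetric initial data the migration terms vanish, so the result can be compared against the exponential growth law as a sanity check before plugging in non-trivial control laws.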
We adopt the following payoff functionals of the players:

$$J_1 = \int_0^T e^{-rt} \left[ m_1 \left( (x_1(t) - \bar{x}_1)^2 + (x_2(t) - \bar{x}_2)^2 \right) + c_1 u^2(t) - p_1 u(t) \right] dt,$$

$$J_2 = \int_0^T e^{-rt} \left[ m_2 \left( (x_1(t) - \bar{x}_1)^2 + (x_2(t) - \bar{x}_2)^2 \right) + c_2 v^2(t) - p_2 v(t) \right] dt, \tag{5.2}$$

where $\bar{x}_i$, $i = 1, 2$, indicates the optimal population size in the sense of reproduction, $c_1$, $c_2$ are the fishing costs of the players, and $p_1$, $p_2$ designate the unit prices of produced fish.

Introduce the notation

$$c_{ir} = c_i e^{-rt}, \quad m_{ir} = m_i e^{-rt}, \quad p_{ir} = p_i e^{-rt}, \quad i = 1, 2.$$

Analyze the problem (5.1)–(5.2) by different principles of optimality.

10.5.1 Nash-optimal solution

To find optimal controls, we use Pontryagin's maximum principle. Construct the Hamiltonian function for player 1:

$$H_1 = m_{1r} \left( (x_1 - \bar{x}_1)^2 + (x_2 - \bar{x}_2)^2 \right) + c_{1r} u^2 - p_{1r} u + \lambda_{11} (\varepsilon x_1 + \gamma_1 (x_2 - x_1)) + \lambda_{12} (\varepsilon x_2 + \gamma_2 (x_1 - x_2) - u - v).$$

Hence, it appears that

$$u(t) = \frac{\lambda_{12}(t) + p_{1r}}{2 c_{1r}},$$

and the conjugate variable equations take the form

$$\lambda_{11}'(t) = -\frac{\partial H_1}{\partial x_1} = -2 m_{1r} (x_1(t) - \bar{x}_1) - \lambda_{11}(t)(\varepsilon - \gamma_1) - \lambda_{12}(t) \gamma_2,$$

$$\lambda_{12}'(t) = -\frac{\partial H_1}{\partial x_2} = -2 m_{1r} (x_2(t) - \bar{x}_2) - \lambda_{12}(t)(\varepsilon - \gamma_2) - \lambda_{11}(t) \gamma_1,$$

with the transversality conditions $\lambda_{1i}(T) = 0$, $i = 1, 2$.

In the case of player 2, the same technique leads to

$$H_2 = m_{2r} \left( (x_1 - \bar{x}_1)^2 + (x_2 - \bar{x}_2)^2 \right) + c_{2r} v^2 - p_{2r} v + \lambda_{21} (\varepsilon x_1 + \gamma_1 (x_2 - x_1)) + \lambda_{22} (\varepsilon x_2 + \gamma_2 (x_1 - x_2) - u - v).$$

Therefore,

$$v(t) = \frac{\lambda_{22}(t) + p_{2r}}{2 c_{2r}},$$

and the conjugate variable equations become

$$\lambda_{21}'(t) = -\frac{\partial H_2}{\partial x_1} = -2 m_{2r} (x_1(t) - \bar{x}_1) - \lambda_{21}(t)(\varepsilon - \gamma_1) - \lambda_{22}(t) \gamma_2,$$

$$\lambda_{22}'(t) = -\frac{\partial H_2}{\partial x_2} = -2 m_{2r} (x_2(t) - \bar{x}_2) - \lambda_{22}(t)(\varepsilon - \gamma_2) - \lambda_{21}(t) \gamma_1,$$

with the transversality conditions $\lambda_{2i}(T) = 0$, $i = 1, 2$.
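The step from the Hamiltonian of player 1 to the expression for $u(t)$ is the first-order condition in the control. As a short worked sketch, assuming the minimizer is interior (no active constraint on $u$), and using the discounted conjugate variable $\bar{\lambda}_{12} = \lambda_{12} e^{rt}$:

```latex
\frac{\partial H_1}{\partial u} = 2 c_{1r} u - p_{1r} - \lambda_{12} = 0
\quad\Longrightarrow\quad
u(t) = \frac{\lambda_{12}(t) + p_{1r}}{2 c_{1r}},
\qquad
\frac{\partial^2 H_1}{\partial u^2} = 2 c_{1r} > 0 .

% Multiplying numerator and denominator by e^{rt} removes the discount factor:
u(t) = \frac{\lambda_{12}(t) e^{rt} + p_1}{2 c_1}
     = \frac{\bar{\lambda}_{12}(t) + p_1}{2 c_1}.
```

The positive second derivative confirms that the stationary point minimizes $H_1$ in $u$, in line with the cost-type functional (5.2). The same computation with $c_{2r}$, $p_{2r}$, and $\lambda_{22}$ gives the expression for $v(t)$.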
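In terms of the new variables $\bar{\lambda}_{ij} = \lambda_{ij} e^{rt}$, the system of differential equations for optimal control laws acquires the form

$$\begin{cases} x_1'(t) = \varepsilon x_1(t) + \gamma_1 (x_2(t) - x_1(t)), & x_1(0) = x_1^0, \\ x_2'(t) = \varepsilon x_2(t) + \gamma_2 (x_1(t) - x_2(t)) - \dfrac{\bar{\lambda}_{12}(t) + p_1}{2 c_1} - \dfrac{\bar{\lambda}_{22}(t) + p_2}{2 c_2}, & x_2(0) = x_2^0, \\ \bar{\lambda}_{11}'(t) = -2 m_1 (x_1(t) - \bar{x}_1) - \bar{\lambda}_{11}(t)(\varepsilon - \gamma_1 - r) - \bar{\lambda}_{12}(t) \gamma_2, & \bar{\lambda}_{11}(T) = 0, \\ \bar{\lambda}_{12}'(t) = -2 m_1 (x_2(t) - \bar{x}_2) - \bar{\lambda}_{12}(t)(\varepsilon - \gamma_2 - r) - \bar{\lambda}_{11}(t) \gamma_1, & \bar{\lambda}_{12}(T) = 0, \\ \bar{\lambda}_{21}'(t) = -2 m_2 (x_1(t) - \bar{x}_1) - \bar{\lambda}_{21}(t)(\varepsilon - \gamma_1 - r) - \bar{\lambda}_{22}(t) \gamma_2, & \bar{\lambda}_{21}(T) = 0, \\ \bar{\lambda}_{22}'(t) = -2 m_2 (x_2(t) - \bar{x}_2) - \bar{\lambda}_{22}(t)(\varepsilon - \gamma_2 - r) - \bar{\lambda}_{21}(t) \gamma_1, & \bar{\lambda}_{22}(T) = 0. \end{cases} \tag{5.3}$$

Theorem 10.13 The control laws

$$u^*(t) = \frac{\bar{\lambda}_{12}(t) + p_1}{2 c_1}, \quad v^*(t) = \frac{\bar{\lambda}_{22}(t) + p_2}{2 c_2},$$

where the conjugate variables result from (5.3), are the Nash-optimal solution of the problem (5.1)–(5.2).

Proof: We argue the optimality of such controls. Fix $v^*(t)$ and consider the optimal control problem for player 1. Suppose that $\{x_1^*(t), x_2^*(t), u^*(t)\}$ represents the solution of the system (5.3). Take the perturbed solution $x_1^*(t) + \Delta x_1$, $x_2^*(t) + \Delta x_2$, $u^*(t) + \Delta u$, where $x_1^*(t)$, $\Delta x_1$, and $x_2^*(t)$ meet the system (5.1), whereas $\Delta x_2$ satisfies the equation

$$\Delta x_2' = \varepsilon \Delta x_2 + \gamma_2 (\Delta x_1 - \Delta x_2) - \Delta u$$

(since $(x_2^*)' + \Delta x_2' = \varepsilon x_2^* + \gamma_2 (x_1^* - x_2^*) - u^* - v^* + \varepsilon \Delta x_2 + \gamma_2 (\Delta x_1 - \Delta x_2) - \Delta u$).

Under the optimal behavior, the payoff is

$$J_1^* = \int_0^T \left[ m_{1r} \left( (x_1^*(t) - \bar{x}_1)^2 + (x_2^*(t) - \bar{x}_2)^2 \right) + c_{1r} (u^*(t))^2 - p_{1r} u^*(t) \right] dt.$$

The corresponding perturbed payoff is

$$J_1 = \int_0^T \left[ m_{1r} \left( (x_1^*(t) + \Delta x_1 - \bar{x}_1)^2 + (x_2^*(t) + \Delta x_2 - \bar{x}_2)^2 \right) + c_{1r} (u^*(t) + \Delta u)^2 - p_{1r} u^*(t) - p_{1r} \Delta u \right] dt.$$

Again, study their difference:

$$J_1 - J_1^* = \int_0^T \left[ m_{1r} \Delta x_1^2 + m_{1r} \Delta x_2^2 + \Delta x_1 (-\lambda_{11}' - \lambda_{11}(\varepsilon - \gamma_1) - \lambda_{12} \gamma_2) + c_{1r} \Delta u^2 + \Delta x_2 (-\lambda_{12}' - \lambda_{12}(\varepsilon - \gamma_2) - \lambda_{11} \gamma_1) + \lambda_{12} \Delta u \right] dt$$

$$= \int_0^T \left[ m_{1r} \Delta x_1^2 + m_{1r} \Delta x_2^2 + c_{1r} \Delta u^2 - \lambda_{11}' \Delta x_1 - \lambda_{12}' \Delta x_2 - \lambda_{11} \Delta x_1' - \lambda_{12} \Delta x_2' \right] dt$$

$$= \int_0^T \left[ m_{1r} \Delta x_1^2 + m_{1r} \Delta x_2^2 + c_{1r} \Delta u^2 \right] dt > 0.$$

This substantiates the optimality of $u^*(t)$ for player 1.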
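Similarly, one can prove that $v^*(t)$ is the optimal control law of player 2.

10.5.2 Stackelberg-optimal solution

For optimal control evaluation, we employ the following modification of Pontryagin's maximum principle for two-shot games. Compile the Hamiltonian function for player 2:

$$H_2 = m_{2r} \left( (x_1 - \bar{x}_1)^2 + (x_2 - \bar{x}_2)^2 \right) + c_{2r} v^2 - p_{2r} v + \lambda_{21} (\varepsilon x_1 + \gamma_1 (x_2 - x_1)) + \lambda_{22} (\varepsilon x_2 + \gamma_2 (x_1 - x_2) - u - v).$$

Then it appears that

$$v(t) = \frac{\lambda_{22}(t) + p_{2r}}{2 c_{2r}},$$

and the conjugate variable equations are defined by

$$\lambda_{21}'(t) = -\frac{\partial H_2}{\partial x_1} = -2 m_{2r} (x_1(t) - \bar{x}_1) - \lambda_{21}(t)(\varepsilon - \gamma_1) - \lambda_{22}(t) \gamma_2,$$

$$\lambda_{22}'(t) = -\frac{\partial H_2}{\partial x_2} = -2 m_{2r} (x_2(t) - \bar{x}_2) - \lambda_{22}(t)(\varepsilon - \gamma_2) - \lambda_{21}(t) \gamma_1,$$

with the transversality conditions $\lambda_{2i}(T) = 0$, $i = 1, 2$.

Substitute this control of player 2 to derive the system of differential equations

$$\begin{cases} x_1'(t) = \varepsilon x_1(t) + \gamma_1 (x_2(t) - x_1(t)), & x_1(0) = x_1^0, \\ x_2'(t) = \varepsilon x_2(t) + \gamma_2 (x_1(t) - x_2(t)) - u(t) - \dfrac{\lambda_{22}(t) + p_{2r}}{2 c_{2r}}, & x_2(0) = x_2^0, \\ \lambda_{21}'(t) = -2 m_{2r} (x_1(t) - \bar{x}_1) - \lambda_{21}(t)(\varepsilon - \gamma_1) - \lambda_{22}(t) \gamma_2, & \lambda_{21}(T) = 0, \\ \lambda_{22}'(t) = -2 m_{2r} (x_2(t) - \bar{x}_2) - \lambda_{22}(t)(\varepsilon - \gamma_2) - \lambda_{21}(t) \gamma_1, & \lambda_{22}(T) = 0. \end{cases}$$

Apply Pontryagin's maximum principle to find the optimal control of player 1:

$$H_1 = m_{1r} \left( (x_1 - \bar{x}_1)^2 + (x_2 - \bar{x}_2)^2 \right) + c_{1r} u^2 - p_{1r} u + \lambda_{11} (\varepsilon x_1 + \gamma_1 (x_2 - x_1)) + \lambda_{12} \left( \varepsilon x_2 + \gamma_2 (x_1 - x_2) - u - \frac{\lambda_{22} + p_{2r}}{2 c_{2r}} \right)$$

$$+ \mu_1 (-2 m_{2r} (x_1 - \bar{x}_1) - \lambda_{21}(\varepsilon - \gamma_1)) - \mu_1 \lambda_{22} \gamma_2 + \mu_2 (-2 m_{2r} (x_2 - \bar{x}_2) - \lambda_{22}(\varepsilon - \gamma_2) - \lambda_{21} \gamma_1).$$

This leads to

$$u(t) = \frac{\lambda_{12}(t) + p_{1r}}{2 c_{1r}},$$

and the conjugate variable equations have the form

$$\begin{cases} \lambda_{11}'(t) = -2 m_{1r} (x_1(t) - \bar{x}_1) - \lambda_{11}(t)(\varepsilon - \gamma_1) - \lambda_{12}(t) \gamma_2 + 2 m_{2r} \mu_1(t), \\ \lambda_{12}'(t) = -2 m_{1r} (x_2(t) - \bar{x}_2) - \lambda_{12}(t)(\varepsilon - \gamma_2) - \lambda_{11}(t) \gamma_1 + 2 m_{2r} \mu_2(t), \\ \mu_1'(t) = -\dfrac{\partial H_1}{\partial \lambda_{21}} = \mu_1(t)(\varepsilon - \gamma_1) + \mu_2(t) \gamma_1, \\ \mu_2'(t) = -\dfrac{\partial H_1}{\partial \lambda_{22}} = \dfrac{\lambda_{12}(t)}{2 c_{2r}} + \mu_2(t)(\varepsilon - \gamma_2) + \mu_1(t) \gamma_2, \end{cases}$$

with the transversality conditions $\lambda_{1i}(T) = 0$, $\mu_i(0) = 0$.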
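The equations for $\mu_1$ and $\mu_2$ can be checked by collecting the terms of the leader's Hamiltonian $H_1$ that contain the follower's conjugate variables. The following worked sketch only rearranges the Hamiltonian stated in the text; no additional assumptions are introduced.

```latex
% Terms of H_1 containing \lambda_{21}:
%   -\mu_1 \lambda_{21} (\varepsilon - \gamma_1) - \mu_2 \lambda_{21} \gamma_1
\mu_1'(t) = -\frac{\partial H_1}{\partial \lambda_{21}}
          = \mu_1(t)(\varepsilon - \gamma_1) + \mu_2(t)\gamma_1 .

% Terms of H_1 containing \lambda_{22}:
%   -\frac{\lambda_{12}\lambda_{22}}{2 c_{2r}} - \mu_1 \lambda_{22}\gamma_2
%   - \mu_2 \lambda_{22}(\varepsilon - \gamma_2)
\mu_2'(t) = -\frac{\partial H_1}{\partial \lambda_{22}}
          = \frac{\lambda_{12}(t)}{2 c_{2r}} + \mu_2(t)(\varepsilon - \gamma_2) + \mu_1(t)\gamma_2 .
```

The term $\lambda_{12}/(2 c_{2r})$ in the second equation is the only place where the leader's conjugate variable enters the $\mu$-dynamics: it arises because the follower's response $v = (\lambda_{22} + p_{2r})/(2 c_{2r})$ appears in the state equation for $x_2$.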
Finally, in terms of the new variables $\bar{\lambda}_{ij} = \lambda_{ij} e^{rt}$, the system of differential equations for optimal controls is determined by

$$\begin{cases} x_1'(t) = \varepsilon x_1(t) + \gamma_1 (x_2(t) - x_1(t)), \\ x_2'(t) = \varepsilon x_2(t) + \gamma_2 (x_1(t) - x_2(t)) - u - \dfrac{\bar{\lambda}_{22}(t) + p_2}{2 c_2}, \\ \bar{\lambda}_{11}'(t) = -2 m_1 (x_1(t) - \bar{x}_1) - \bar{\lambda}_{11}(t)(\varepsilon - \gamma_1 - r) - \bar{\lambda}_{12}(t) \gamma_2 + 2 m_2 \mu_1(t), \\ \bar{\lambda}_{12}'(t) = -2 m_1 (x_2(t) - \bar{x}_2) - \bar{\lambda}_{12}(t)(\varepsilon - \gamma_2 - r) - \bar{\lambda}_{11}(t) \gamma_1 + 2 m_2 \mu_2(t), \\ \bar{\lambda}_{21}'(t) = -2 m_2 (x_1(t) - \bar{x}_1) - \bar{\lambda}_{21}(t)(\varepsilon - \gamma_1 - r) - \bar{\lambda}_{22}(t) \gamma_2, \\ \bar{\lambda}_{22}'(t) = -2 m_2 (x_2(t) - \bar{x}_2) - \bar{\lambda}_{22}(t)(\varepsilon - \gamma_2 - r) - \bar{\lambda}_{21}(t) \gamma_1, \\ \mu_1'(t) = \mu_1(t)(\varepsilon - \gamma_1) + \mu_2(t) \gamma_1, \\ \mu_2'(t) = \dfrac{\bar{\lambda}_{12}(t)}{2 c_2} + \mu_2(t)(\varepsilon - \gamma_2) + \mu_1(t) \gamma_2, \\ \bar{\lambda}_{i1}(T) = \bar{\lambda}_{i2}(T) = 0, \quad x_i(0) = x_i^0, \quad \mu_i(0) = 0. \end{cases} \tag{5.4}$$

Consequently, we have proved

Theorem 10.14 For the control laws

$$u^*(t) = \frac{\bar{\lambda}_{12}(t) + p_1}{2 c_1}, \quad v^*(t) = \frac{\bar{\lambda}_{22}(t) + p_2}{2 c_2}$$

to be the Stackelberg-optimal solution of the problem (5.1)–(5.2), it is necessary that the conjugate variables follow from (5.4).

10.6 Dynamic games in bioresource management problems. The case of infinite horizon

As before, the dynamics of a fish population is described by the equations

$$\begin{cases} x_1'(t) = \varepsilon x_1(t) + \gamma_1 (x_2(t) - x_1(t)), \\ x_2'(t) = \varepsilon x_2(t) + \gamma_2 (x_1(t) - x_2(t)) - u(t) - v(t), \end{cases} \quad x_i(0) = x_i^0. \tag{6.1}$$

All parameters have been defined in the preceding section.

We adopt the following payoff functionals of the players:

$$J_1 = \int_0^\infty e^{-rt} \left[ m_1 \left( (x_1(t) - \bar{x}_1)^2 + (x_2(t) - \bar{x}_2)^2 \right) + c_1 u^2(t) - p_1 u(t) \right] dt,$$

$$J_2 = \int_0^\infty e^{-rt} \left[ m_2 \left( (x_1(t) - \bar{x}_1)^2 + (x_2(t) - \bar{x}_2)^2 \right) + c_2 v^2(t) - p_2 v(t) \right] dt, \tag{6.2}$$

where $\bar{x}_i$, $i = 1, 2$, is the optimal population size in the sense of reproduction, $c_1$, $c_2$ specify the fishing costs of the players, and $p_1$, $p_2$ are the unit prices of caught fish.

In the sequel, we study the problem (6.1)–(6.2) using different principles of optimality.

10.6.1 Nash-optimal solution

Fix the control law of player 2 and consider the optimal control problem for his opponent.
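Determine the function $V(x)$ by

$$V(x_1, x_2) = \min_u \left\{ \int_0^\infty e^{-rt} \left[ m_1 \left( (x_1(t) - \bar{x}_1)^2 + (x_2(t) - \bar{x}_2)^2 \right) + c_1 u^2(t) - p_1 u(t) \right] dt \right\}.$$

The Hamilton–Jacobi–Bellman equation acquires the form

$$r V(x_1, x_2) = \min_u \left\{ m_1 \left( (x_1 - \bar{x}_1)^2 + (x_2 - \bar{x}_2)^2 \right) + c_1 u^2 - p_1 u + \frac{\partial V}{\partial x_1} (\varepsilon x_1 + \gamma_1 (x_2 - x_1)) + \frac{\partial V}{\partial x_2} (\varepsilon x_2 + \gamma_2 (x_1 - x_2) - u - v) \right\}.$$

Find the minimum in $u$:

$$u = \left( \frac{\partial V}{\partial x_2} + p_1 \right) \Big/ 2 c_1.$$

Substitute this result into the above equation to get

$$r V(x_1, x_2) = m_1 \left( (x_1 - \bar{x}_1)^2 + (x_2 - \bar{x}_2)^2 \right) - \frac{\left( \frac{\partial V}{\partial x_2} + p_1 \right)^2}{4 c_1} + \frac{\partial V}{\partial x_1} (\varepsilon x_1 + \gamma_1 (x_2 - x_1)) + \frac{\partial V}{\partial x_2} (\varepsilon x_2 + \gamma_2 (x_1 - x_2) - v).$$

Interestingly, a quadratic form satisfies this equation. Set $V(x_1, x_2) = a_1 x_1^2 + b_1 x_1 + a_2 x_2^2 + b_2 x_2 + k x_1 x_2 + l$. The corresponding control law is

$$u(x) = \frac{2 a_2 x_2 + b_2 + k x_1 + p_1}{2 c_1},$$

where the coefficients meet the system of equations

$$\begin{cases} r a_1 = m_1 - \dfrac{k^2}{4 c_1} + 2 a_1 (\varepsilon - \gamma_1) + k \gamma_2, \\ r b_1 = -2 m_1 \bar{x}_1 - \dfrac{k b_2}{2 c_1} - \dfrac{k p_1}{2 c_1} + b_1 (\varepsilon - \gamma_1) + b_2 \gamma_2 - k v, \\ r a_2 = m_1 - \dfrac{a_2^2}{c_1} + 2 a_2 (\varepsilon - \gamma_2) + k \gamma_1, \\ r b_2 = -2 m_1 \bar{x}_2 - \dfrac{a_2 b_2}{c_1} - \dfrac{a_2 p_1}{c_1} + b_2 (\varepsilon - \gamma_2) + b_1 \gamma_1 - 2 a_2 v, \\ r k = -\dfrac{a_2 k}{c_1} + k (\varepsilon - \gamma_1) + 2 a_1 \gamma_1 + 2 a_2 \gamma_2 + k (\varepsilon - \gamma_2), \\ r l = m_1 \bar{x}_1^2 + m_1 \bar{x}_2^2 - \dfrac{b_2^2}{4 c_1} - \dfrac{b_2 p_1}{2 c_1} - \dfrac{p_1^2}{4 c_1} - b_2 v. \end{cases} \tag{6.3}$$

Similar reasoning for player 2 yields

$$v(x) = \frac{2 \alpha_2 x_2 + \beta_2 + k_2 x_1 + p_2}{2 c_2},$$

where the coefficients follow from the system of equations

$$\begin{cases} r \alpha_1 = m_2 - \dfrac{k_2^2}{4 c_1} + 2 \alpha_1 (\varepsilon - \gamma_1) + k_2 \gamma_2, \\ r \beta_1 = -2 m_2 \bar{x}_1 - \dfrac{k_2 \beta_2}{2 c_1} - \dfrac{k_2 p_1}{2 c_1} + \beta_1 (\varepsilon - \gamma_1) + \beta_2 \gamma_2 - k_2 u, \\ r \alpha_2 = m_2 - \dfrac{\alpha_2^2}{c_1} + 2 \alpha_2 (\varepsilon - \gamma_2) + k_2 \gamma_1, \\ r \beta_2 = -2 m_2 \bar{x}_2 - \dfrac{\alpha_2 \beta_2}{c_1} - \dfrac{\alpha_2 p_1}{c_1} + \beta_2 (\varepsilon - \gamma_2) + \beta_1 \gamma_1 - 2 \alpha_2 u, \\ r k_2 = -\dfrac{\alpha_2 k_2}{c_1} + k_2 (\varepsilon - \gamma_1) + 2 \alpha_1 \gamma_1 + 2 \alpha_2 \gamma_2 + k_2 (\varepsilon - \gamma_2), \\ r l_2 = m_2 \bar{x}_1^2 + m_2 \bar{x}_2^2 - \dfrac{\beta_2^2}{4 c_1} - \dfrac{\beta_2 p_1}{2 c_1} - \dfrac{p_1^2}{4 c_1} - \beta_2 u. \end{cases} \tag{6.4}$$

Consequently, we have established

Theorem 10.15 The control laws

$$u^*(x) = \frac{2 a_2 x_2 + b_2 + k x_1 + p_1}{2 c_1}, \quad v^*(x) = \frac{2 \alpha_2 x_2 + \beta_2 + k_2 x_1 + p_2}{2 c_2},$$

where the coefficients result from (6.3) and (6.4), are the Nash-optimal solution of the problem (6.1)–(6.2).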
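The system (6.3) is obtained by substituting the quadratic form into the Hamilton–Jacobi–Bellman equation and matching coefficients of like monomials. As a worked sketch for two of the six equations, with $v$ treated as a fixed quantity exactly as in the text:

```latex
% Partial derivatives of V = a_1 x_1^2 + b_1 x_1 + a_2 x_2^2 + b_2 x_2 + k x_1 x_2 + l:
\frac{\partial V}{\partial x_1} = 2 a_1 x_1 + b_1 + k x_2, \qquad
\frac{\partial V}{\partial x_2} = 2 a_2 x_2 + b_2 + k x_1 .

% Coefficient of x_2^2 on both sides of the substituted HJB equation:
r a_2 = m_1 - \frac{(2 a_2)^2}{4 c_1} + k \gamma_1 + 2 a_2 (\varepsilon - \gamma_2)
      = m_1 - \frac{a_2^2}{c_1} + 2 a_2 (\varepsilon - \gamma_2) + k \gamma_1 .

% Coefficient of x_1 x_2:
r k = -\frac{2 \cdot 2 a_2 \cdot k}{4 c_1} + k (\varepsilon - \gamma_1) + 2 a_1 \gamma_1
      + k (\varepsilon - \gamma_2) + 2 a_2 \gamma_2
    = -\frac{a_2 k}{c_1} + k (\varepsilon - \gamma_1) + 2 a_1 \gamma_1 + 2 a_2 \gamma_2 + k (\varepsilon - \gamma_2) .
```

Both results agree with the third and fifth equations of (6.3); the remaining four equations follow in the same way from the coefficients of $x_1^2$, $x_1$, $x_2$, and the constant term.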
10.6.2 Stackelberg-optimal solution The Hamilton–Jacobi–Bellman equation for player 2 brings to v(x) = 2𝛼2x2 + 𝛽2 + k2x1 + p2 + 𝜎u 2c2 , where the coefficients meet the system (6.4). Define the function V(x) in the optimal control problem for player 1 as V(x1, x2) =minu { ∫ ∞ 0 e−rt[m1((x1(t) − ̄x1)2 + (x2(t) − ̄x2)2) + c1u2(t) − p1u(t)] dt } . The Hamilton–Jacobi–Bellman equation takes the form rV(x1, x2) =minu lim { m1((x1 − ̄x1)2 + (x2 − ̄x2)2) + c1u2 − p1u + 𝜕V 𝜕x1 (𝜀x1 + 𝛾1(x2 − x1)) + 𝜕V 𝜕x2 ( 𝜀x2 + 𝛾2(x1 − x2) − u − 2𝛼2x2 + 𝛽2 + k2x1 + p2 + 𝜎u 2c2 )} . Again, find the minimum in u: u = ( 𝜕V 𝜕x2 (2c2 + 𝜎) + 2p1c2 )/ 4c1c2. Substitute this expression in the above equation to obtain rV(x1, x2) = m1((x1 − ̄x1)2 + (x2 − ̄x2)2) + ( 𝜕V 𝜕x2 )2 (2c2 + 𝜎)2 8c1c2 2 + p2 1 2c1 𝜕V 𝜕x1 (𝜀x1 + 𝛾1(x2 − x1)) + 𝜕V 𝜕x2 ( x1 ( 𝛾2 − k2 2c2 ) + x2 ( 𝜀 − 𝛾2 − 𝛼2 c2 ) + 2c2 + 𝜎 2c1c2 − 𝛽2 + p2 2c2 ) . Note that a quadratic form satisfies this equation. Set V(x1, x2) = a1x2 1 + b1x1 + a2x2 2 + b2x2 + gx1x2 + l. Then the control law becomes u(x) = ((2a2x2 + b2 + gx1)(2c2 + 𝜎) + 2p1c2)∕4c1c2, www.it-ebooks.info 386 MATHEMATICAL GAME THEORY AND APPLICATIONS where the coefficients follow from the system of equations ⎧ ⎪ ⎪ ⎪ ⎪ ⎪ ⎪ ⎪ ⎪ ⎨ ⎪ ⎪ ⎪ ⎪ ⎪ ⎪ ⎪ ⎪⎩ ra1 = m1 + g2 (2c2+𝜎)2 8c1c2 2 + 2a1(𝜀 − 𝛾1) + g ( 𝛾2 − k2 2c2 ) , rb1 =−2m1 ̄x1 + 2b2g (2c2+𝜎)2 8c1c2 2 + g (2c2+𝜎) 2c1c2 b1(𝜀 − 𝛾1) + b2 ( 𝛾2 − k2 2c2 ) − g(𝛽2+p2) 2c2 , ra2 = m1 + a2 (2c2+𝜎)2 2c1c2 2 + g𝛾1 + 2a2 ( 𝜀 − 𝛾2 − 𝛼2 c2 ) , rb2 =−2m1 ̄x2 + a2b2 (2c2+𝜎)2 2c1c2 2 + a2 (2c2+𝜎) c1c2 + b2 ( 𝜀 − 𝛾2 − 𝛼2 c2 ) + b1𝛾1 − a2 𝛽2+p2 c2 , rg = a2g (2c2+𝜎)2 2c1c2 2 + g(𝜀 − 𝛾1) + 2a1𝛾1 + 2a2(𝛾2 − k2 2c2 ) + g ( 𝜀 − 𝛾2 − 𝛼2 c2 ) , rl = m1 ̄x2 1 + m1 ̄x2 2 + b2 2 (2c2+𝜎)2 8c1c2 2 + b2 (2c2+𝜎) 2c1c2 + p2 1 2c1 − b2(𝛽2+p2) 2c2 . 
(6.5)

Therefore, we have proved

Theorem 10.16 The control laws

$$u^*(x) = \frac{(2 a_2 x_2 + b_2 + g x_1)(2 c_2 + \sigma) + 2 p_1 c_2}{4 c_1 c_2},$$

$$v^*(x) = \frac{\sigma (2 a_2 x_2 + b_2 + g x_1)(2 c_2 + \sigma)}{8 c_1 c_2^2} + \frac{2 c_1 (2 \alpha_2 x_2 + \beta_2 + k_2 x_1) + (\sigma p_1 + 2 c_1 p_2)}{8 c_1 c_2},$$

where the coefficients are defined from (6.4) and (6.5), represent the Stackelberg-optimal solution of the problem (6.1)–(6.2).

Finally, we provide numerical examples with the following parameters: $q = 0.2$, $\gamma_1 = \gamma_2 = 2q$, $\varepsilon = 0.08$, $m_1 = m_2 = 0.09$, $c_1 = c_2 = 10$, $p_1 = 100$, $p_2 = 100$, $T = 200$, and $r = 0.1$. Let the optimal population sizes in the sense of reproduction be equal to $\bar{x}_1 = 100$ and $\bar{x}_2 = 100$. The initial population sizes are $x_1(0) = 50$ and $x_2(0) = 50$.

In the case of Nash equilibrium, Figure 10.1 shows the curve of the population size dynamics in the forbidden domain ($S_1$). Figure 10.2 demonstrates the same curve in the allowed domain ($S_2$). And Figure 10.3 illustrates the control laws of both players (they coincide).

In the case of Stackelberg equilibrium, Figure 10.4 shows the curve of the population size dynamics in the forbidden domain ($S_1$). Figure 10.5 demonstrates the same curve in the allowed domain ($S_2$). And Figures 10.6 and 10.7 present the control laws of player 1 and player 2, respectively.

It suffices to compare the costs of both players under different equilibrium concepts.

[Figure 10.1 The values of $x_1^*(t)$; horizontal axis $t$ from 0 to 200.]

In the case of Nash equilibrium, both players are "in the same boat." And so, their control laws and payoffs appear identical ($J_1 = J_2 = 93.99514185$).

In the case of Stackelberg equilibrium, player 1 is the leader. According to the above example, this situation is more beneficial to player 1 ($J_1 = -62.73035267$) than to player 2 ($J_2 = 659.9387578$). In fact, such an equilibrium even brings a profit to player 1, whereas his opponent incurs all costs to maintain the admissible population size.
[Figure 10.2 The values of $x_2^*(t)$; horizontal axis $t$ from 0 to 200.]

[Figure 10.3 The values of $u^*(t)$; horizontal axis $t$ from 0 to 200.]

[Figure 10.4 The values of $x_1^*(t)$; horizontal axis $t$ from 0 to 200.]

[Figure 10.5 The values of $x_2^*(t)$; horizontal axis $t$ from 0 to 200.]

[Figure 10.6 The values of $u^*(t)$; horizontal axis $t$ from 0 to 200.]

[Figure 10.7 The values of $v^*(t)$; horizontal axis $t$ from 0 to 200.]

10.7 Time-consistent imputation distribution procedure

Consider a dynamic game in the cooperative setting. The payoff of the grand coalition is the total payoff of all players. Evaluating the total payoff is an optimal control problem. Once the total payoff has been found, the players have to distribute it among all participants. For this, it is necessary to calculate the characteristic function, i.e., the payoff of each coalition. Below we discuss possible methods to construct characteristic functions.

A fundamental difference of dynamic cooperative games from common cooperative games is that their characteristic functions, and hence all imputations, depend on time. To make an imputation distribution procedure time-consistent, we adopt the special distribution procedure suggested by L.A. Petrosjan (2003). To elucidate the basic ideas of this approach, let us take the "fish wars" model.

10.7.1 Characteristic function construction and imputation distribution procedure

Imagine several countries (players) $I = \{1, 2, \dots, n\}$ that plan fishing in the ocean. The dynamics of fish resources evolves in discrete time:

$$x_{t+1} = f(x_t, u_t), \quad x_0 = x,$$

where $u_t = (u_{1t}, \dots, u_{nt})$, $x_t$ denotes the amount of fish resources at instant $t$, and $u_{it}$ is the amount of fish catch by player $i$, $i = 1, \dots, n$.
Each player seeks to maximize his income, i.e., the sum of discounted incomes at each instant:

$$J_i = \sum_{t=0}^{\infty} \delta^t g_i(u_{it}).$$

The quantity $g_i(u_{it})$ designates the payoff of player $i$ at instant $t$, and $\delta$ is the discounting parameter, $0 < \delta < 1$.

Before starting fishing, the countries have to negotiate the best way of fishing. Naturally, cooperative behavior guarantees higher payoffs to each country. And so, it is necessary to solve the corresponding optimization problem, where the players aim at maximizing the total income of all participants, i.e., the function $\sum_{i=1}^{n} J_i$. Denote by $u_t^c = (u_{1t}^c, \dots, u_{nt}^c)$ the optimal strategies of the players under such cooperative behavior, and let $x_t^c$ be the corresponding behavior of the ecological system under consideration.

If cooperation fails, each country strives to maximize its individual income. Let $u_t^N = (u_{1t}^N, \dots, u_{nt}^N)$ be a Nash equilibrium in this dynamic game. The total payoff

$$J^N = \sum_{t=0}^{\infty} \delta^t \sum_{i=1}^{n} g_i(u_{it}^N)$$

appears smaller in comparison with the cooperative case.

Some countries can form coalitions. We define by $J_S(u) = \sum_{t=0}^{\infty} \delta^t \sum_{i \in S} g_i(u_{it})$ the payoff of a coalition $S \subset N$.

Suppose that the process evolves according to the cooperative scenario. Then it is necessary to distribute the income. For this, we apply certain methods from the theory of cooperative games. Define the characteristic function $V(S, 0)$ as the income of a coalition $S$ in the equilibrium, when all players from $S$ act as one player and all other players have individual strategies. In this case,

$$V(S, 0) = \max_{u_i,\, i \in S} J_S(u^N / u_S),$$

where $(u^N / u_S) = \{u_j^N, j \notin S,\ u_i, i \in S\}$.

Below we analyze two behavioral scenarios of independent players (lying outside the coalition). According to scenario 1, these players adhere to the same strategies as in the Nash equilibrium in the absence of the coalition $S$. This corresponds to the model, where players know nothing about coalition formation.
In scenario 2, players outside the coalition $S$ are informed about it and choose new strategies by forming a Nash equilibrium in the game with $N \setminus S$ players. We call these scenarios the model without information and the model with information, respectively.

As soon as the characteristic function is found, construct the imputation set

$$\xi = \left\{ \xi(0) = (\xi_1(0), \dots, \xi_n(0)) : \sum_{i=1}^{n} \xi_i(0) = V(N, 0),\ \xi_i(0) \ge V(i, 0),\ i = 1, \dots, n \right\}.$$

Similarly, one can define the characteristic function $V(S, t)$ and the imputation set $\xi(t) = (\xi_1(t), \dots, \xi_n(t))$ at instant $t$ for any subgame evolving from the state $x_t^c$.

Subsequently, it is necessary to evaluate the optimal imputation by some principle from the theory of cooperative games (e.g., a Nash arbitration solution, the C-core, the Shapley vector, etc.). Note that, once selected, the principle of imputation choice remains invariant. We follow the time-consistent imputation distribution procedure proposed by Petrosjan [1996, 2003].

Definition 10.3 A vector-function $\beta(t) = (\beta_1(t), \dots, \beta_n(t))$ forms an imputation distribution procedure (IDP) if

$$\xi_i(0) = \sum_{t=0}^{\infty} \delta^t \beta_i(t), \quad i = 1, \dots, n.$$

The major idea of this procedure lies in distributing the cooperative payoff along a game trajectory. Then $\beta_i(t)$ can be interpreted as the payment to player $i$ at instant $t$.

Definition 10.4 A vector-function $\beta(t) = (\beta_1(t), \dots, \beta_n(t))$ forms a time-consistent IDP, if for any $t \ge 0$:

$$\xi_i(0) = \sum_{\tau=0}^{t} \delta^\tau \beta_i(\tau) + \delta^{t+1} \xi_i(t+1), \quad i = 1, \dots, n.$$

This definition implies the following. Adhering to the cooperative trajectory, players can further receive payments in the form of IDP. In other words, they have no reason to leave the cooperative agreement.

Theorem 10.17 The vector-function $\beta(t) = (\beta_1(t), \dots, \beta_n(t))$, where

$$\beta_i(t) = \xi_i(t) - \delta \xi_i(t+1), \quad i = 1, 2, \dots, n,$$

forms a time-consistent IDP.

Proof: By definition,

$$\sum_{t=0}^{\infty} \delta^t \beta_i(t) = \sum_{t=0}^{\infty} \delta^t \xi_i(t) - \sum_{t=0}^{\infty} \delta^{t+1} \xi_i(t+1) = \xi_i(0).$$

Thus, $\beta(t)$ forms an IDP.
Now, we demonstrate the time-consistency of this IDP. Actually, this property is immediate from the following equalities:

$$\sum_{\tau=0}^{t} \delta^{\tau} \beta_i(\tau) + \delta^{t+1} \xi_i(t+1) = \sum_{\tau=0}^{t} \delta^{\tau} \xi_i(\tau) - \sum_{\tau=0}^{t} \delta^{\tau+1} \xi_i(\tau+1) + \delta^{t+1} \xi_i(t+1) = \xi_i(0).$$

Definition 10.5 An imputation $\xi = (\xi_1, \ldots, \xi_n)$ meets the irrational-behavior-proof condition if

$$\sum_{\tau=0}^{t} \delta^{\tau} \beta_i(\tau) + \delta^{t+1} V(i, t+1) \ge V(i, 0)$$

for all $t \ge 0$, where $\beta(t) = (\beta_1(t), \ldots, \beta_n(t))$ is a time-consistent IDP.

This condition, introduced by D.W.K. Yeung [2006], guarantees that, even if the cooperative agreement is canceled, the participants obtain payoffs not smaller than under their initial non-cooperative behavior. As applied to our model, the irrational-behavior-proof condition acquires the form

$$\xi_i(0) - \delta^t \xi_i(t) \ge V(i, 0) - \delta^t V(i, t), \quad i = 1, \ldots, n.$$

There exists another condition (see Mazalov and Rettieva [2009]), which is stronger than Yeung's condition yet easily verifiable.

Definition 10.6 An imputation $\xi = (\xi_1, \ldots, \xi_n)$ satisfies the incentive condition for rational behavior at each shot if

$$\beta_i(t) + \delta V(i, t+1) \ge V(i, t) \tag{7.1}$$

for $t \ge 0$, where $\beta(t) = (\beta_1(t), \ldots, \beta_n(t))$ is a time-consistent IDP.

The suggested condition stimulates a player to maintain cooperation, since at each shot the player benefits more from cooperation than from independent behavior. For our model, the condition (7.1) takes the form

$$\xi_i(t) - \delta \xi_i(t+1) \ge V(i, t) - \delta V(i, t+1), \quad i = 1, \ldots, n. \tag{7.2}$$

Clearly, the condition (7.2) directly leads to Yeung's condition. To see this, just consider (7.2) at instant $\tau$, multiply by $\delta^{\tau}$ and perform summation over $\tau = 0, \ldots, t$.

10.7.2 Fish wars. Model without information

We illustrate the application of the time-consistent imputation distribution procedure in the "fish war" model. In the case of two players, optimal control laws in fish wars have been established in Section 10.1. These results can be easily generalized to the case of n players.
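As a brief aside, the payment rule of Theorem 10.17 can be sanity-checked numerically on a finite horizon. The sketch below is only an illustration: the list `xi` is a made-up sequence of imputation components $\xi_i(0), \xi_i(1), \ldots$ for a single player, and the function names are hypothetical rather than taken from the text. It computes $\beta_i(t) = \xi_i(t) - \delta\xi_i(t+1)$ and measures how far the identity of Definition 10.4 is from holding.

```python
def idp_payments(xi, delta):
    """Payments beta(t) = xi(t) - delta * xi(t+1) for t = 0 .. len(xi) - 2."""
    return [xi[t] - delta * xi[t + 1] for t in range(len(xi) - 1)]


def time_consistency_gap(xi, delta):
    """Largest deviation from the identity of Definition 10.4 over all t.

    For each t it compares xi(0) with
    sum_{tau <= t} delta^tau * beta(tau) + delta^(t+1) * xi(t+1).
    """
    beta = idp_payments(xi, delta)
    paid_so_far = 0.0
    gap = 0.0
    for t, payment in enumerate(beta):
        paid_so_far += delta ** t * payment
        total = paid_so_far + delta ** (t + 1) * xi[t + 1]
        gap = max(gap, abs(xi[0] - total))
    return gap


# Hypothetical imputation components of one player along the trajectory.
xi = [10.0, 9.2, 8.9, 8.1, 7.7, 7.5]
delta = 0.9
gap = time_consistency_gap(xi, delta)
```

Up to floating-point rounding the gap vanishes for any input sequence, which mirrors the telescoping argument in the proof: the check does not depend on how the $\xi_i(t)$ were obtained.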
And so, $n$ countries participate in fishing within a fixed time period. The dynamics of this bioresource obeys the equation (see Levhari and Mirman [1980])

$$x_{t+1} = \Big(\varepsilon x_t - \sum_{i=1}^{n} u_{it}\Big)^{\alpha}, \quad x_0 = x,$$

where $x_t \ge 0$ is the population size at instant $t$, $\varepsilon \in (0, 1)$ stands for the mortality parameter, $\alpha \in (0, 1)$ corresponds to the fertility parameter, and $u_{it} \ge 0$ means the fish catch amount of player $i$, $i = 1, \ldots, n$.

Consider the dynamic game with the logarithmic utility function of the countries. Then the incomes of the players on the infinite horizon make up

$$J_i = \sum_{t=0}^{\infty} \delta^t \log(u_{it}),$$

where $0 < \delta < 1$ is the discounting coefficient, $i = 1, \ldots, n$.

Construct the characteristic function in the following case: players forming a coalition do not report this fact to the remaining players.

To evaluate the Nash equilibrium, we apply dynamic programming. It is necessary to solve the Bellman equation

$$V_i(x) = \max_{u_i \ge 0} \Big\{ \log u_i + \delta V_i\Big(\Big(\varepsilon x - \sum_{j=1}^{n} u_j\Big)^{\alpha}\Big) \Big\}, \quad i = 1, \ldots, n.$$

We seek its solution in the form $V_i(x) = A_i \log x + B_i$, $i = 1, \ldots, n$. Accordingly, the optimal controls are sought in the class $u_i = \gamma_i x$, $i = 1, \ldots, n$. Recall that all players are identical. Hence, the Bellman equation yields the optimal amounts of fish catch

$$u^N_i = \frac{1 - \alpha\delta}{n - \alpha\delta(n-1)}\, \varepsilon x \tag{7.3}$$

and the payoffs

$$V_i(x) = \frac{1}{1 - \alpha\delta} \log x + \frac{1}{1 - \delta} B_i, \tag{7.4}$$

where

$$B_i = \frac{1}{1 - \alpha\delta} \log\Big(\frac{\varepsilon}{n - \alpha\delta(n-1)}\Big) + \log(1 - \alpha\delta) + \frac{\alpha\delta}{1 - \alpha\delta} \log(\alpha\delta).$$

Next, denote $a = \alpha\delta$. The population dynamics in the non-cooperative case is given by

$$x_t = x_0^{\alpha^t}\, \hat{x}_N^{\sum_{j=1}^{t} \alpha^j}, \tag{7.5}$$

where

$$\hat{x}_N = \frac{\varepsilon a}{n - a(n-1)}.$$

Find the payoff of each coalition $K$ consisting of $k$ players. By assumption, all players outside the coalition adopt their Nash equilibrium strategies defined by (7.3).
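The closed-form expressions (7.3)-(7.5) lend themselves to a quick numerical cross-check. The following sketch, with hypothetical function names and arbitrarily chosen parameter values, iterates the population equation under the Nash catch (7.3), compares the trajectory with (7.5), and evaluates the residual of the Bellman equation for the value function (7.4) at the equilibrium control.

```python
import math


def nash_share(n, a):
    """Share gamma in the Nash catch u_i = gamma * eps * x, see (7.3)."""
    return (1 - a) / (n - a * (n - 1))


def simulate_nash(x0, n, eps, alpha, delta, steps):
    """Iterate x_{t+1} = (eps * x_t - total catch)^alpha under (7.3)."""
    a = alpha * delta
    gamma = nash_share(n, a)
    xs = [x0]
    for _ in range(steps):
        x = xs[-1]
        escapement = eps * x - n * gamma * eps * x
        xs.append(escapement ** alpha)
    return xs


def closed_form(x0, n, eps, alpha, delta, t):
    """Population size at instant t according to (7.5)."""
    a = alpha * delta
    x_hat = eps * a / (n - a * (n - 1))
    exponent = sum(alpha ** j for j in range(1, t + 1))
    return x0 ** (alpha ** t) * x_hat ** exponent


def bellman_residual(x, n, eps, alpha, delta):
    """log u + delta * V(x') - V(x) at the Nash control, with V from (7.4)."""
    a = alpha * delta
    b_i = (math.log(eps / (n - a * (n - 1))) / (1 - a)
           + math.log(1 - a) + a / (1 - a) * math.log(a))

    def value(y):
        return math.log(y) / (1 - a) + b_i / (1 - delta)

    u = nash_share(n, a) * eps * x
    x_next = (eps * x - n * u) ** alpha
    return math.log(u) + delta * value(x_next) - value(x)


# Hypothetical parameter values, chosen only for illustration.
PARAMS = dict(n=3, eps=0.9, alpha=0.5, delta=0.9)
path = simulate_nash(0.8, steps=10, **PARAMS)
```

The residual only confirms that (7.4) is consistent with the control (7.3); it does not by itself verify that (7.3) is a maximizer, which follows from the first-order condition of the Bellman equation.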
Take players from the coalition $K$ and seek the solution of the Bellman equation

$$V_K(x) = \max_{u_i,\, i \in K} \Big\{ \sum_{i \in K} \log u_i + \delta V_K\Big(\Big(\varepsilon x - \sum_{i \in K} u_i - \sum_{i \in N \setminus K} u^N_i\Big)^{\alpha}\Big) \Big\} \tag{7.6}$$

among the functions $V_K(x) = A_K \log x + B_K$. The optimal control laws have the form $u_i = \gamma^K_i x$, $i \in K$. Again, all players in the coalition $K$ are identical. It follows from equation (7.6) that the optimal amount of fish catch constitutes

$$u^K_i = \frac{(1 - a)(k - a(k-1))}{k(n - a(n-1))}\, \varepsilon x \tag{7.7}$$

and the payoff of the coalition becomes

$$V_K(x) = \frac{k}{1 - \alpha\delta} \log x + \frac{1}{1 - \delta} B_K, \tag{7.8}$$

where

$$B_K = \frac{k}{1 - a} \log\Big(\frac{\varepsilon(k - a(k-1))}{n - a(n-1)}\Big) + k(\log(1 - a) - \log k) + \frac{ka}{1 - a} \log a.$$

For further exposition, we need the equality

$$B_K = kB_i + k\Big(\frac{1}{1 - a} \log(k - a(k-1)) - \log k\Big). \tag{7.9}$$

Under the existing coalition $K$, the population dynamics acquires the form

$$x_t = x_0^{\alpha^t}\, \hat{x}_K^{\sum_{j=1}^{t} \alpha^j}, \tag{7.10}$$

where

$$\hat{x}_K = \frac{\varepsilon a(k - a(k-1))}{n - a(n-1)}.$$

Finally, find the payoff and optimal strategies in the case of complete cooperation. Setting $k = n$ in formulas (7.7) and (7.8) gives

$$u^I_i = \frac{1 - a}{n}\, \varepsilon x, \tag{7.11}$$

$$V_I(x) = \frac{n}{1 - \alpha\delta} \log x + \frac{1}{1 - \delta} B_I, \tag{7.12}$$

where

$$B_I = nB_i + n\Big(\frac{1}{1 - a} \log(n - a(n-1)) - \log n\Big).$$

The population dynamics under complete cooperation is determined by

$$x_t = x_0^{\alpha^t}\, \hat{x}_I^{\sum_{j=1}^{t} \alpha^j},$$

where $\hat{x}_I = \varepsilon a$.

Theorem 10.18 Cooperative behavior ensures a higher population size than non-cooperative behavior.

Proof: Since $n - a(n-1) = 1 + (n-1)(1-a) > 1$ for $n \ge 2$, obviously

$$\hat{x}_I = \varepsilon a > \frac{\varepsilon a}{n - a(n-1)} = \hat{x}_N.$$

However, the optimal amounts of fish catch meet the inverse inequality

$$\gamma^I_i = \frac{(1 - a)\varepsilon}{n} < \frac{(1 - a)\varepsilon}{n - a(n-1)} = \gamma^N_i.$$

Now, find the characteristic function for the game evolving from the state $x$ at instant $t$:

$$V(L, x, t) = \begin{cases} 0, & L = \emptyset, \\ V(\{i\}, x, t) = V_i(x), & L = \{i\}, \\ V(K, x, t) = V_K(x), & L = K, \\ V(I, x, t) = V_I(x), & L = I. \end{cases} \tag{7.13}$$

Here $V_i(x)$, $V_K(x)$, and $V_I(x)$ are defined by (7.4), (7.8) and (7.12), respectively.
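To make the role of the coalition size concrete, the constants entering (7.4), (7.8) and (7.10) can be evaluated directly. The sketch below uses hypothetical function names and illustrative parameter values; it checks the identity (7.9) numerically and shows how the quantity $\hat{x}_K$ grows from $\hat{x}_N$ (at $k = 1$) to $\hat{x}_I = \varepsilon a$ (at $k = n$).

```python
import math


def b_single(n, a, eps):
    """Constant B_i from the Nash payoff (7.4)."""
    return (math.log(eps / (n - a * (n - 1))) / (1 - a)
            + math.log(1 - a) + a / (1 - a) * math.log(a))


def b_coalition(k, n, a, eps):
    """Constant B_K from the coalition payoff (7.8)."""
    ratio = (k - a * (k - 1)) / (n - a * (n - 1))
    return (k / (1 - a) * math.log(eps * ratio)
            + k * (math.log(1 - a) - math.log(k))
            + k * a / (1 - a) * math.log(a))


def b_coalition_via_79(k, n, a, eps):
    """Right-hand side of the identity (7.9)."""
    bonus = math.log(k - a * (k - 1)) / (1 - a) - math.log(k)
    return k * b_single(n, a, eps) + k * bonus


def x_hat(k, n, a, eps):
    """Base of the population dynamics (7.10) for a coalition of size k."""
    return eps * a * (k - a * (k - 1)) / (n - a * (n - 1))


# Hypothetical parameter values, chosen only for illustration.
n, a, eps = 5, 0.45, 0.9
bases = [x_hat(k, n, a, eps) for k in range(1, n + 1)]
```

With these values the list `bases` is strictly increasing in $k$, in line with Theorem 10.18: the larger the coalition, the more of the stock is left to reproduce.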
Let us demonstrate that the constructed characteristic function is superadditive. For this, we take advantage of the following lemma.

Lemma 10.2 If $c > d \ge 0$, the function

$$f(z) = \frac{1}{z} \log\Big(\frac{1 + zc}{1 + zd}\Big)$$

decreases in $z > 0$.

Proof: Consider

$$f'(z) = \frac{1}{z^2} \Big[ z\Big(\frac{c}{1 + zc} - \frac{d}{1 + zd}\Big) - \log\Big(\frac{1 + zc}{1 + zd}\Big) \Big] = \frac{1}{z^2}\, g(z).$$

The function $g(z)$ is non-positive, since $g(0) = 0$ and

$$g'(z) = -z\Big(\frac{c^2}{(1 + zc)^2} - \frac{d^2}{(1 + zd)^2}\Big) \le 0$$

for $c > d$. This implies that $f'(z) < 0$.

Theorem 10.19 The characteristic function (7.13) is superadditive, i.e., for any disjoint coalitions $K$ and $L$,

$$V(K \cup L, x, t) \ge V(K, x, t) + V(L, x, t), \quad \forall t.$$

Proof: Let $x_K$ denote the population size at instant $t$ under the coalition $K$. Since $A_K = k/(1 - a)$, we have $A_{K \cup L} = A_K + A_L$, and it suffices to show that

$$V(K \cup L, x, t) - V(K, x, t) - V(L, x, t) = A_{K \cup L} \log(x_{K \cup L}) - A_K \log(x_K) - A_L \log(x_L) + \frac{1}{1 - \delta}(B_{K \cup L} - B_K - B_L)$$

$$= A_K \log\Big(\frac{x_{K \cup L}}{x_K}\Big) + A_L \log\Big(\frac{x_{K \cup L}}{x_L}\Big) + \frac{1}{1 - \delta}(B_{K \cup L} - B_K - B_L) \ge 0.$$

First of all, we notice that

$$x_K = x^{\alpha^t} \Big(\frac{\varepsilon a(k - a(k-1))}{n - a(n-1)}\Big)^{\sum_{j=1}^{t} \alpha^j},$$

and so

$$\log\Big(\frac{x_{K \cup L}}{x_K}\Big) = \sum_{j=1}^{t} \alpha^j \log\Big(\frac{k + l - a(k + l - 1)}{k - a(k-1)}\Big) > 0,$$

since

$$\frac{k + l - a(k + l - 1)}{k - a(k-1)} - 1 = \frac{l(1 - a)}{k - a(k-1)} > 0.$$

Next, consider the second part and utilize the property (7.9):

$$B_{K \cup L} - B_K - B_L = (k + l)B_i + (k + l)\Big(\frac{1}{1 - a} \log(k + l - a(k + l - 1)) - \log(k + l)\Big)$$

$$- kB_i - k\Big(\frac{1}{1 - a} \log(k - a(k-1)) - \log k\Big) - lB_i - l\Big(\frac{1}{1 - a} \log(l - a(l-1)) - \log l\Big)$$

$$= k\Big(\frac{1}{1 - a} \log\Big(\frac{k + l - a(k + l - 1)}{k - a(k-1)}\Big) - \log\Big(\frac{k + l}{k}\Big)\Big) + l\Big(\frac{1}{1 - a} \log\Big(\frac{k + l - a(k + l - 1)}{l - a(l-1)}\Big) - \log\Big(\frac{k + l}{l}\Big)\Big).$$

Analyze the expression

$$f(a) = \frac{1}{1 - a} \log\Big(\frac{k + l - a(k + l - 1)}{k - a(k-1)}\Big) - \log\Big(\frac{k + l}{k}\Big).$$

Denote $z = 1 - a$; then

$$f(z) = \frac{1}{z} \log\Big(\frac{1 + (k + l - 1)z}{1 + (k-1)z}\Big) - \log\Big(\frac{k + l}{k}\Big).$$

It is possible to use the lemma with $k + l - 1 = c > d = k - 1$. The function $f(z)$ decreases in $z$. Therefore, $f(a)$ is an increasing function of $a$, and it vanishes at $a = 0$. Hence, $f(a)$ takes non-negative values. Similarly, one can prove that

$$\frac{1}{1 - a} \log\Big(\frac{k + l - a(k + l - 1)}{l - a(l-1)}\Big) - \log\Big(\frac{k + l}{l}\Big) \ge 0.$$
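The two ingredients of this argument, the monotonicity claimed in Lemma 10.2 and the sign of $B_{K \cup L} - B_K - B_L$, can be probed numerically. The sketch below is a plain grid check under assumed parameter ranges, not a proof; the helper names are hypothetical. By (7.9) the terms with $B_i$ cancel, so the difference depends only on $k$, $l$ and $a$.

```python
import math


def lemma_f(z, c, d):
    """Function f(z) = log((1 + z c) / (1 + z d)) / z from Lemma 10.2."""
    return math.log((1 + z * c) / (1 + z * d)) / z


def bonus(k, a):
    """Per-player term of (7.9): B_K = k * B_i + k * bonus(k, a)."""
    return math.log(k - a * (k - 1)) / (1 - a) - math.log(k)


def surplus(k, l, a):
    """B_{K u L} - B_K - B_L for disjoint coalitions of sizes k and l."""
    return (k + l) * bonus(k + l, a) - k * bonus(k, a) - l * bonus(l, a)


# Grid of discount-fertility products a = alpha * delta in (0, 1).
a_grid = [0.05 * j for j in range(1, 20)]
worst = min(surplus(k, l, a)
            for k in range(1, 6) for l in range(1, 6) for a in a_grid)

# Values of f on an increasing grid of z for c = 3, d = 1.
z_grid = [0.1 * j for j in range(1, 11)]
f_values = [lemma_f(z, 3.0, 1.0) for z in z_grid]
```

On this grid `worst` stays non-negative and `f_values` is decreasing, as Theorem 10.19 and Lemma 10.2 predict.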
Therefore, we have argued that $B_{K \cup L} - B_K - B_L \ge 0$.

10.7.3 The Shapley vector and imputation distribution procedure

This subsection selects the Shapley vector as the principle of imputation distribution. In this case, the cooperative income is allocated among the participants in the quantities

$$\xi_i = \sum_{K \subset N,\ i \in K} \frac{(n - k)!(k - 1)!}{n!} \big[ V(K) - V(K \setminus i) \big], \quad i \in N = \{1, \ldots, n\},$$

where $k$ indicates the number of players in a coalition $K$, $V(K)$ is the payoff of the coalition $K$, and $V(K) - V(K \setminus i)$ gives the contribution of player $i$ to the coalition $K$.

Theorem 10.20 The Shapley vector in this game takes the form

$$\xi_i(t) = \frac{1}{1 - a} \log x_t + \frac{1}{1 - \delta}(B_i + B_{\xi}), \tag{7.14}$$

with

$$B_{\xi} = \frac{1}{1 - a} \log(1 + (n-1)(1 - a)) - \log n \ge 0.$$

Proof: Evaluate the contribution of player $i$ to the coalition $K$:

$$V_K(x_t) - V_{K \setminus i}(x_t) = (A_K - A_{K \setminus i}) \log x_t + \frac{1}{1 - \delta}(B_K - B_{K \setminus i})$$

$$= \frac{1}{1 - a} \log x_t + \frac{1}{1 - \delta} B_i + \frac{1}{1 - \delta} \Big( k\Big(\frac{1}{1 - a} \log(1 + (k-1)(1 - a)) - \log k\Big) - (k - 1)\Big(\frac{1}{1 - a} \log(1 + (k-2)(1 - a)) - \log(k - 1)\Big) \Big).$$

This expression does not depend on $i$. Moreover, exactly $\binom{n-1}{k-1}$ coalitions of size $k$ contain player $i$, so that the coalitions of each size carry the total weight $1/n$, and the sum over $k$ telescopes:

$$\xi_i(t) = \sum_{K \subset N,\ i \in K} \frac{(n - k)!(k - 1)!}{n!} \big[ V_K(x_t) - V_{K \setminus i}(x_t) \big] = \sum_{k=1}^{n} \frac{1}{n} \big[ V_K(x_t) - V_{K \setminus i}(x_t) \big]$$

$$= \frac{1}{1 - a} \log x_t + \frac{1}{1 - \delta} \Big( B_i + \frac{1}{1 - a} \log(1 + (n-1)(1 - a)) - \log n \Big).$$

Theorem 10.21 The Shapley vector (7.14) forms a time-consistent imputation distribution procedure and the incentive condition for rational behavior (7.1) holds true.

Proof: It follows from Theorem 10.17 that

$$\beta_i(t) = \frac{1}{1 - a}(\log x_t - \delta \log x_{t+1}) + B_i + B_{\xi}.$$

For each shot, the incentive condition for rational behavior (7.2) becomes

$$\frac{1}{1 - a}(\log x_t - \delta \log x_{t+1}) + B_i + B_{\xi} \ge \frac{1}{1 - a}(\log x_t - \delta \log x_{t+1}) + B_i.$$

It is valid, so long as $B_{\xi} \ge 0$.

10.7.4 The model with informed players

Now, consider another scenario when players outside a coalition $K$ are informed of its appearance.
Subsequently, they modify their strategies to achieve a new Nash equilibrium in the game played by the coalition $K$ and the players from $N \setminus K$. In comparison with the previous case, the whole difference concerns evaluation of the characteristic function $V_K$.

Let us proceed by analogy. Take players from the coalition $K$ and solve the Bellman equation

$$\tilde{V}_K(x) = \max_{u_i,\, i \in K} \Big\{ \sum_{i \in K} \log u_i + \delta \tilde{V}_K\Big(\Big(\varepsilon x - \sum_{i \in K} u_i - \sum_{i \in N \setminus K} \tilde{u}^N_i\Big)^{\alpha}\Big) \Big\}, \tag{7.15}$$

where $\tilde{u}^N_i$ corresponds to the Bellman equation solution for players outside the coalition $K$:

$$\tilde{V}_i(x) = \max_{\tilde{u}_i} \Big\{ \log \tilde{u}_i + \delta \tilde{V}_i\Big(\Big(\varepsilon x - \sum_{j \in K} u_j - \sum_{j \in N \setminus K} \tilde{u}_j\Big)^{\alpha}\Big) \Big\}, \quad i \in N \setminus K. \tag{7.16}$$

Seek solutions of these equations in the form

$$\tilde{V}_K(x) = \tilde{A}_K \log x + \tilde{B}_K, \quad \tilde{V}_i(x) = \tilde{A}_i \log x + \tilde{B}_i,$$

with the optimal control laws defined by $u_i = \gamma^K_i x$, $i \in K$, and $\tilde{u}_i = \tilde{\gamma}^N_i x$.

It follows from (7.15) that the optimal amounts of fish catch of the players belonging to the coalition $K$ are

$$\tilde{u}^K_i = \frac{1 - a}{k(1 + (n - k)(1 - a))}\, \varepsilon x. \tag{7.17}$$

And so, their payoff makes up

$$\tilde{V}_K(x) = \frac{k}{1 - a} \log x + \frac{1}{1 - \delta} \tilde{B}_K, \tag{7.18}$$

where

$$\tilde{B}_K = k\Big(\frac{1}{1 - a} \log\Big(\frac{\varepsilon}{1 + (n - k)(1 - a)}\Big) + \log(1 - a) + \frac{a}{1 - a} \log a - \log k\Big).$$

We present a relevant equality for further reasoning:

$$\tilde{B}_K = kB_i + k\Big(\frac{1}{1 - a} \log\Big(\frac{1 + (n-1)(1 - a)}{1 + (n - k)(1 - a)}\Big) - \log k\Big). \tag{7.19}$$

For players outside the coalition $K$, the optimal amounts of fish catch constitute

$$\tilde{u}^N_i = \frac{1 - a}{1 + (n - k)(1 - a)}\, \varepsilon x$$

and the payoffs equal

$$\tilde{V}_i(x) = \frac{1}{1 - a} \log x + \frac{1}{1 - \delta} \tilde{B}_i,$$

where

$$\tilde{B}_i = \frac{1}{1 - a} \log\Big(\frac{\varepsilon}{1 + (n - k)(1 - a)}\Big) + \log(1 - a) + \frac{a}{1 - a} \log a.$$

The corresponding dynamics in the case of the coalition $K$ acquires the form

$$x_t = x_0^{\alpha^t}\, \tilde{x}_K^{\sum_{j=1}^{t} \alpha^j},$$

where

$$\tilde{x}_K = \frac{\varepsilon a}{1 + (n - k)(1 - a)}.$$

In the grand coalition $I$, the optimal amounts of fish catch and payoffs coincide with those of the previous scenario. Therefore, Theorem 10.18 remains in force as well.
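The formulas of the informed scenario can be checked in the same spirit as before. The sketch below (hypothetical names, arbitrary parameter values) plugs the catches (7.17) and $\tilde{u}^N_i$ into the population equation, confirms that the remaining stock equals $\tilde{x}_K\, x$, and evaluates the residual of the coalition's Bellman equation (7.15) for the value function (7.18).

```python
import math


def informed_shares(k, n, a):
    """Shares of eps * x caught by a coalition member and by an outsider."""
    z = 1 - a
    denom = 1 + (n - k) * z
    return z / (k * denom), z / denom


def b_tilde_coalition(k, n, a, eps):
    """Constant of the coalition payoff (7.18)."""
    z = 1 - a
    return k * (math.log(eps / (1 + (n - k) * z)) / z
                + math.log(z) + a / z * math.log(a) - math.log(k))


def escapement(x, k, n, a, eps):
    """Stock left after all catches, before reproduction."""
    member, outsider = informed_shares(k, n, a)
    return eps * x * (1 - k * member - (n - k) * outsider)


def coalition_residual(x, k, n, eps, alpha, delta):
    """Residual of (7.15) at the equilibrium controls, with V from (7.18)."""
    a = alpha * delta

    def value(y):
        return (k / (1 - a) * math.log(y)
                + b_tilde_coalition(k, n, a, eps) / (1 - delta))

    member, _ = informed_shares(k, n, a)
    x_next = escapement(x, k, n, a, eps) ** alpha
    return k * math.log(member * eps * x) + delta * value(x_next) - value(x)


# Hypothetical parameter values, chosen only for illustration.
k, n, eps, alpha, delta = 2, 5, 0.9, 0.6, 0.9
a = alpha * delta
x_tilde = eps * a / (1 + (n - k) * (1 - a))
residual = coalition_residual(0.7, k, n, eps, alpha, delta)
```

Note that an outsider catches exactly $k$ times as much as a single coalition member, which is visible directly from the two shares returned by `informed_shares`.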
The characteristic function of the game evolving from the state $x$ at instant $t$ is determined by

$$V(L, x, t) = \begin{cases} 0, & L = \emptyset, \\ V(\{i\}, x, t) = V_i(x), & L = \{i\}, \\ V(K, x, t) = \tilde{V}_K(x), & L = K, \\ V(I, x, t) = V_I(x), & L = I, \end{cases}$$

where $V_i(x)$, $\tilde{V}_K(x)$, and $V_I(x)$ obey formulas (7.4), (7.18) and (7.12), respectively.

Similarly to the model without information, we find the Shapley vector and the time-consistent imputation distribution procedure. It follows from (7.18) and (7.19) that

$$\xi_i(t) = \frac{1}{1 - a} \log x_t + \frac{1}{1 - \delta}(B_i + B_{\xi}),$$

where

$$B_{\xi} = \sum_{K \subset N,\ i \in K} \frac{(n - k)!(k - 1)!}{n!} \Big[ k\Big(\frac{1}{1 - a} \log\Big(\frac{1 + (n-1)(1 - a)}{1 + (n - k)(1 - a)}\Big) - \log k\Big) - (k - 1)\Big(\frac{1}{1 - a} \log\Big(\frac{1 + (n-1)(1 - a)}{1 + (n - k + 1)(1 - a)}\Big) - \log(k - 1)\Big) \Big]$$

$$= \sum_{k=1}^{n} \frac{1}{n} \Big[ k\Big(\frac{1}{1 - a} \log\Big(\frac{1 + (n-1)(1 - a)}{1 + (n - k)(1 - a)}\Big) - \log k\Big) - (k - 1)\Big(\frac{1}{1 - a} \log\Big(\frac{1 + (n-1)(1 - a)}{1 + (n - k + 1)(1 - a)}\Big) - \log(k - 1)\Big) \Big]$$

$$= \frac{1}{1 - a} \log(1 + (n-1)(1 - a)) - \log n.$$

By analogy to Theorem 10.21, one can prove

Theorem 10.22 The Shapley vector defines the time-consistent IDP and the incentive condition for rational behavior holds true.

Finally, we compare these scenarios.

Theorem 10.23 The payoffs of free players in the second model are higher than in the first one.

Proof: Consider players outside the coalition $K$ and calculate the difference in their payoffs:

$$\tilde{V}_i(x) - V_i(x) = \frac{1}{1 - \delta}(\tilde{B}_i - B_i) = \frac{1}{(1 - \delta)(1 - a)} \log\Big(\frac{1 + (n-1)(1 - a)}{1 + (n - k)(1 - a)}\Big) > 0.$$

Theorem 10.24 The payoff of the coalition $K$ in the first model is higher than in the second one.

Proof: Consider players from the coalition $K$ and calculate the difference in their payoffs:

$$V_K(x) - \tilde{V}_K(x) = \frac{1}{1 - \delta}(B_K - \tilde{B}_K) = \frac{k}{(1 - \delta)(1 - a)} \log\Big(\frac{(1 + (n - k)(1 - a))(1 + (k-1)(1 - a))}{1 + (n-1)(1 - a)}\Big) > 0,$$

so long as

$$\frac{(1 + (n - k)(1 - a))(1 + (k-1)(1 - a))}{1 + (n-1)(1 - a)} - 1 = \frac{(k-1)(1 - a)^2(n - k)}{1 + (n-1)(1 - a)} > 0.$$
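The closed form for $B_{\xi}$ and the comparison in Theorem 10.24 can be cross-checked by brute force. The sketch below enumerates all coalitions containing a given player and applies the Shapley formula to the constant parts of the characteristic function in both scenarios, using (7.9) and (7.19). The function names and parameter values are illustrative assumptions; only the constants $B$ are treated, because the coefficient of $\log x$ is $k/(1-a)$ in both models.

```python
import math
from itertools import combinations


def b_single(n, a, eps):
    """Constant B_i from the Nash payoff (7.4)."""
    return (math.log(eps / (n - a * (n - 1))) / (1 - a)
            + math.log(1 - a) + a / (1 - a) * math.log(a))


def coalition_constant(k, n, a, eps, informed):
    """B_K via (7.9) (informed=False) or B~_K via (7.19) (informed=True)."""
    if k == 0:
        return 0.0  # the empty coalition earns nothing
    z = 1 - a
    if informed:
        bonus = math.log((1 + (n - 1) * z) / (1 + (n - k) * z)) / z - math.log(k)
    else:
        bonus = math.log(1 + (k - 1) * z) / z - math.log(k)
    return k * b_single(n, a, eps) + k * bonus


def shapley_constant(i, n, a, eps, informed):
    """Shapley share of player i in the constant part, by enumeration."""
    others = [j for j in range(n) if j != i]
    total = 0.0
    for size in range(n):
        weight = (math.factorial(size) * math.factorial(n - size - 1)
                  / math.factorial(n))
        for subset in combinations(others, size):
            k = len(subset) + 1  # coalition size once player i joins
            total += weight * (coalition_constant(k, n, a, eps, informed)
                               - coalition_constant(k - 1, n, a, eps, informed))
    return total


# Hypothetical parameter values, chosen only for illustration.
n, a, eps = 4, 0.45, 0.9
b_xi = math.log(1 + (n - 1) * (1 - a)) / (1 - a) - math.log(n)
expected = b_single(n, a, eps) + b_xi
without_info = shapley_constant(0, n, a, eps, informed=False)
with_info = shapley_constant(0, n, a, eps, informed=True)
```

Both enumerations return the same value $B_i + B_{\xi}$. This is to be expected: the players are symmetric, so the Shapley vector splits the grand-coalition payoff equally, and that payoff is identical in the two scenarios.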
Theorem 10.25 The population size under coalition formation in the first model is higher than in the second one.

Proof: Re-express the corresponding difference as

$$\hat{x}_K - \tilde{x}_K = \frac{\varepsilon a(k - a(k-1))}{1 + (n-1)(1 - a)} - \frac{\varepsilon a}{1 + (n - k)(1 - a)} = \frac{\varepsilon a (1 - a)^2(n - k)(k - 1)}{(1 + (n-1)(1 - a))(1 + (n - k)(1 - a))}.$$

For $1 < k < n$ it takes positive values, and the conclusion follows.

Exercises

1. Two companies exploit a natural resource with rates of usage $u_1(t)$ and $u_2(t)$. The resource dynamics meets the equation

$$x'(t) = \epsilon x(t) - u_1(t) - u_2(t), \quad x(0) = x_0.$$

The payoff functionals of the players take the form

$$J_i(u_1, u_2) = \int_0^{\infty} \big[ c_i u_i(t) - u_i^2(t) \big]\, dt, \quad i = 1, 2.$$

Find a Nash equilibrium in this game.

2. Two companies manufacture some commodity with rates of production $u_1(t)$ and $u_2(t)$, but pollute the atmosphere with the same rates. The pollution dynamics is described by

$$x_{t+1} = \alpha x_t + u_1(t) + u_2(t), \quad t = 0, 1, \ldots$$

The initial value $x_0$ is fixed, and the coefficient $\alpha$ is smaller than 1. The payoff functions of the players represent the difference between their incomes and the costs of purification procedures:

$$J_i(u_1, u_2) = \sum_{t=0}^{\infty} \beta^t \big[ (a - u_1(t) - u_2(t))u_i(t) - c u_i(t) \big], \quad i = 1, 2.$$

Evaluate a Nash equilibrium in this game.

3. A two-player game satisfies the equation

$$x'(t) = u_1(t) + u_2(t), \quad x(0) = 0, \quad u_1, u_2 \in [0, 1].$$

The payoff functionals of the players have the form

$$J_i(u_1, u_2) = x(1) - \int_0^1 u_i^2(t)\, dt, \quad i = 1, 2.$$

Find a Nash equilibrium in this game, and find a Stackelberg equilibrium under the assumption that player 1 is the leader.

4. Two companies exploit a natural resource with rates of usage $u_1(t)$ and $u_2(t)$. The resource dynamics meets the equation

$$x'(t) = r x(t)(1 - x(t)/K) - u_1(t) - u_2(t), \quad x(0) = x_0.$$

The payoff functionals of the players take the form

$$J_i(u_1, u_2) = \int_0^T e^{-\beta t} \big( c_i u_i(t) - u_i^2(t) \big)\, dt, \quad i = 1, 2.$$

Evaluate a Nash equilibrium in this game.

5.
Find a Nash equilibrium in exercise no. 4 provided that both players utilize the resource on an infinite time horizon.

6. Two players invest unit capital in two production processes evolving according to the equations

$$x^i_{t+1} = a_i x^i_t + b_i u^i_t, \quad y^i_{t+1} = c_i y^i_t + d_i\big(x^i_t - u^i_t\big), \quad t = 1, 2, \ldots, T - 1.$$

Their initial values $x^i_0$ and $y^i_0$ ($i = 1, 2$) are fixed. The payoffs of the players have the form

$$J_i(u_1, u_2) = \delta_i \big(x^i_T\big)^2 - \sum_{t=0}^{T-1} \Big[ \big(y^i_t - y^j_t\big)^2 + \big(u^i_t\big)^2 \Big], \quad i \ne j,\ i = 1, 2.$$

Find a Nash equilibrium in this game.

7. Consider a dynamic game of two players described by the equation

$$x'(t) = \varepsilon + u_1(t) + u_2(t), \quad x(0) = x_0.$$

The payoff functionals of the players are defined by

$$J_1(u_1, u_2) = a_1 x^2(T) - \int_0^T \big[ b_1 u_1^2(t) - c_1 u_2^2(t) \big]\, dt,$$

$$J_2(u_1, u_2) = a_2 x^2(T) - \int_0^T \big[ b_2 u_2^2(t) - c_2 u_1^2(t) \big]\, dt.$$

Evaluate a Nash equilibrium in this game.

8. Find the cooperative payoff in exercises no. 6 and 7. Construct the time-consistent imputation distribution procedure under the condition of equal payoff sharing by the players.

9. Consider the fish war model with three countries and the core as the imputation distribution criterion. Construct the time-consistent imputation distribution procedure.

10. Verify the incentive conditions for rational behavior at each shot in exercises no. 8 and 9.

References

Ahn H.K., Cheng S.W., Cheong O., Golin M., Oostrum van R. Competitive facility location: the Voronoi game, Theoretical Computer Science 310 (2004), 457–467.
Algaba E., Bilbao J.M., Fernandez Garcia J.R., Lopez J.J. Computing power indices in weighted multiple majority games, Math. Social Sciences 46, no. 1 (2003), 63–80.
Altman E., Shimkin N. Individually optimal dynamic routing in a processor sharing system, Operations Research (1998), 776–784.
d’Aspremont C., Gabszewicz J.J., Thisse J.F. On Hotelling’s stability in competition, Econometrica 47 (1979), 1145–1150.
Awerbuch B., Azar Y., Epstein A.
The price of routing unsplittable flow, Proceedings of the 37th Annual ACM Symposium on Theory of Computing (STOC 2005), 331–337.
Aumann R.J., Maschler M. Game theoretic analysis of a bankruptcy problem from the Talmud, Journal of Economic Theory 36 (1985), 195–213.
Banzhaf J.F. III Weighted voting doesn’t work: a mathematical analysis, Rutgers Law Review 19 (1965), 317–343.
Basar T., Olsder G.J. Dynamic noncooperative game theory, Academic Press, New York, 1982.
Bellman R.E., Glicksberg I., Gross O.A. Some aspects of the mathematical theory of control processes, Rand Corporation, Santa Monica, 1958.
De Berg M., van Kreveld M., Overmars M., Schwarzkopf O. Computational geometry. Springer, 2000. 367 p.
Berger R.L. A necessary and sufficient condition for reaching a consensus using De Groot’s method, Journal of American Statistical Association 76 (1981), 415–419.
Bertrand J. Theorie mathematique de la richesse sociale, Journal des Savants (1883), 499–508.
Bester H., De Palma A., Leininger W., Thomas J., Von Thadden E.L. A non-cooperative analysis of Hotelling’s location game, Games and Economic Behavior 12 (1996), 165–186.
Bilbao J.M., Fernandez J.R., Losada A.J., Lopez J.J. Generating functions for computing power indices efficiently, TOP 8, no. 2 (2000), 191–213.
Bondareva O.N. Some applications of linear programming methods to cooperative game theory, Problemi Kibernetiki 10 (1963), 119–139 (Russian).
Braess D. Uber ein Paradoxon der Verkehrsplanung, Unternehmensforschung 12 (1968), 258–268.
Brams S.J. Game theory and politics. New York: Free Press, 1975. 397 p.
Brams S.J. Negotiation games: Applying game theory to bargaining and arbitration. New York: Routledge, 1990. 280 p.
Brams S.J., Affuso P.J.
Power and size: A new paradox, Theory and Decision 7 (1976), 29–56.
Brams S.J., Kaplan T.R., Kilgour D.M. A simple bargaining mechanism that elicits truthful reservation prices, Working paper 2011/2, University of Haifa, Department of Economics. 2011.
Brams S.J., Merrill S. Binding versus final-offer arbitration: A combination is best, Management Science 32, no. 10 (1986), 1346–1355.
Brams S.J., Merrill S. Equilibrium strategies for final-offer arbitration: There is no median convergence, Management Science 29, no. 8 (1983), 927–941.
Brams S.J., Merrill S. III Final-offer arbitration with a bonus, European Journal of Political Economy 7, no. 1 (1991), 79–92.
Brams S.J., Taylor A.D. An envy-free cake division protocol, American Mathematical Monthly 102, no. 1 (1995), 9–18.
Brams S.J., Taylor A.D. Fair division: From cake-cutting to dispute resolution. Cambridge University Press, 1996. 272 p.
Cardona D., Ponsati C. Bargaining one-dimensional social choices, Journal of Economic Theory 137, Issue 1 (2007), 627–651.
Cardona D., Ponsati C. Uniqueness of stationary equilibria in bargaining one-dimensional policies under (super) majority rules, Games and Economic Behavior 73, Issue 1 (2011), 65–75.
Chatterjee K. Comparison of arbitration procedures: Models with complete and incomplete information, IEEE Transactions on Systems, Man, and Cybernetics smc-11, no. 2 (1981), 101–109.
Chatterjee K., Samuelson W. Bargaining under incomplete information, Operations Research 31, no. 5 (1983), 835–851.
Christodoulou G., Koutsoupias E. The price of anarchy of finite congestion games, Proc. of the 37th Annual ACM Symposium on Theory of Computing (STOC 2005), 67–73.
Christodoulou G., Koutsoupias E. On the price of anarchy and stability of correlated equilibria of linear congestion games, Lecture Notes in Computer Science 3669 (2005), 59–70.
Clark C.W. Bioeconomic modeling and fisheries management, Wiley, New York, 1985.
Cournot A.A.
Recherches sur les principes mathematiques de la theorie des richesses. Paris, 1838.
Cowan R. The allocation of offensive and defensive resources in a territorial game, Journal of Applied Probability 29 (1992), 190–195.
Drezner Z. Competitive location strategies for two facilities, Regional Science and Urban Economics 12 (1982), 485–493.
Dresher M. The Mathematics of games of strategy: Theory and applications. New York: Dover, 1981.
Dubins L.E., Spanier E.H. How to cut a cake fairly, American Mathematical Monthly 68 (1961), 1–17.
Epstein R.A. The theory of gambling and statistical logic, Academic Press, New York, 1977.
Farber H. An analysis of final-offer arbitration, Journal of Conflict Resolution 35 (1980), 683–705.
Feldmann R., Gairing M., Lucking T., Monien B., Rode M. Selfish routing in non-cooperative networks: a survey. Proc. of the 28th International Symposium on Mathematics Foundation of Computer Science, Lecture Notes in Computer Science 2747 (2003), 21–45.
Ferguson C., Ferguson T., Gawargy C. Uniform (0,1) two-person poker models, Game Theory and Applications 12 (2007), 17–38.
Fudenberg D., Tirole J. Game theory, Cambridge, MIT Press, 1996.
Gairing M., Monien B., Tiemann K. Routing (un-) splittable flow in games with player-specific linear latency functions. Proc. of the 33rd International Colloquium on Automata Languages and Programming (ICALP 2006), 501–512.
Gibbons R. A primer in game theory, Prentice Hall, 1992.
Gerchak Y., Greenstein E., Weissman I. Estimating arbitrator’s hidden judgement in final offer arbitration, Group Decision and Negotiation 13, no. 3 (2004), 291–298.
de Groot M.H. Reaching a consensus, Journal of American Statistical Association 69 (1974), 118–121.
Hakimi S.L. On locating new facilities in a competitive environment, European Journal of Operational Research 12 (1983), 29–35.
Harsanyi J.C., Selten R.
A generalized Nash solution for two-person bargaining games with incomplete information, Management Science 18 (1972), 80–106.
Harsanyi J.C., Selten R. A general theory of equilibrium selection in games, Cambridge, MIT Press, 1988.
Hoede C., Bakker R.R. A theory of decisional power, Journal of Mathematical Sociology 8 (1982), 309–322.
Hotelling H. Stability in competition, Economic Journal 39 (1929), 41–57.
Hurley W.J. Effects of multiple arbitrators on final-offer arbitration settlements, European Journal of Operational Research 145 (2003), 660–664.
Isaacs R. Differential games. John Wiley and Sons, 1965.
Karlin S. Mathematical methods and theory in games, programming, and economics, 2 Vols. Vol. 1: Matrix games, programming, and mathematical economics. Vol. 2: The theory of infinite games. New York: Dover, 1992.
Kats A. Location-price equilibria in a spatial model of discriminatory pricing, Economics Letters 25 (1987), 105–109.
Kilgour D.M. Game-theoretic properties of final-offer arbitration, Group Decision and Negotiation 3 (1994), 285–301.
Klemperer P. The economic theory of auction, Northampton, MA: Edward Elgar Publishing, Inc. (2000), 399–415.
Korillis Y.A., Lazar A.A., Orda A. Avoiding the Braess’s paradox for traffic networks, Journal of Applied Probability 36 (1999), 211–222.
Kuhn H.W. Extensive games and the problem of information, Contributions to the Theory of Games II, Annals of Mathematics Study 28, Princeton University Press, 1953, pp. 193–216.
Lemke C.E., Howson J.J. Equilibrium points of bimatrix games, Proceedings of the National Academy of Sciences USA 47 (1961), 1657–1662.
Levhari D., Mirman L.J. The great fish war: an example using a dynamic Cournot-Nash solution, The Bell Journal of Economics 11, no. 1 (1980), 322–334.
Lin H., Roughgarden T., Tardos E. On Braess’s paradox, Proceedings of the 15th Annual ACM-SIAM Symp. on Discrete Algorithms (SODA04) (2004), 333–334.
Lucas W.F.
Measuring power in weighted voting systems, Political and Related Models, Edited by Brams S.J., Lucas W.F., Straffin, Springer, 1975, 183–238.
Mavronicolas M., Spirakis P. The price of selfish routing, Proceedings of the 33rd Annual ACM STOC (2001), 510–519.
Mazalov V.V. Game-theoretic model of preference, Game theory and applications 1, Nova Science Publ., N.Y. (1996), 129–137.
Mazalov V.V., Mentcher A.E., Tokareva J.S. On a discrete arbitration problem, Scientiae Mathematicae Japonicae 63, no. 3 (2006), 283–288.
Mazalov V.V., Panova S.V., Piskuric M. Two-person bilateral many-rounds poker, Mathematical Methods of Operations Research 50, no. 1 (1999).
Mazalov V.V., Rettieva A.N. Incentive equilibrium in discrete-time bioresource sharing model, Dokl. Math. 78, no. 3 (2008), 953–955.
Mazalov V.V., Rettieva A.N. Incentive conditions for rational behavior in discrete-time bioresource management problem, Dokl. Math. 81, no. 3 (2009), 399–402.
Mazalov V.V., Rettieva A.N. Fish wars and cooperation maintenance, Ecological Modelling 221 (2010), 1545–1553.
Mazalov V.V., Rettieva A.N. Incentive equilibrium in bioresource sharing problem, Journal of Computer Systems Scientific International 49, no. 4 (2010), 598–606.
Mazalov V.V., Rettieva A.N. Fish wars with many players, International Game Theory Review 12, issue 4 (2010), 385–405.
Mazalov V.V., Rettieva A.N. The discrete-time bioresource sharing model, Journal of Applied Mathematical Mechanics 75, no. 2 (2011), 180–188.
Mazalov V.V., Sakaguchi M. Location game on the plane, International Game Theory Review 5 (2003), no. 1, 13–25.
Mazalov V.V., Sakaguchi M., Zabelin A.A. Multistage arbitration game with random offers, Game Theory and Applications 8 (2002), Nova Science Publishers, N.Y., 95–106.
Mazalov V., Tokareva J. Arbitration procedures with multiple arbitrators, European Journal of Operational Research 217, Issue 1 (2012), 198–203.
Mazalov V.V., Tokareva J.S.
Bargaining model on the plane. Algorithmic and Computational Theory in Algebra and Languages, RIMS Kokyuroku 1604. Kyoto University (2008), 42–49.
Mazalov V., Tokareva J. Equilibrium in combined arbitration procedure, Proc. II International Conf. in Game Theory and Applications (Qingdao, China, Sep. 17–19), (2007), 186–188.
Mazalov V.V., Zabelin A.A. Equilibrium in an arbitration procedure, Advances in Dynamic Games 7 (2004), Birkhauser, 151–162.
Milchtaich I. Congestion games with player-specific payoff functions, Games and Economic Behavior 13 (1996), 111–124.
Monderer D., Shapley L. Potential games, Games and Economic Behavior 14 (1996), 124–143.
Moulin H. Game theory for the social sciences: New York, 1982.
Myerson R., Satterthwaite M.A. Efficient mechanisms for bilateral trading, Journal of Economic Theory 29 (1983), 265–281.
Myerson R. Two-person bargaining problems with incomplete information, Econometrica 52 (1984), 461–487.
Nash J. The bargaining problem, Econometrica 18, no. 2 (1950), 155–162.
Neumann J. von, Morgenstern O. Theory of games and economic behavior. Princeton University Press, 1944.
Osborne M.J., Rubinstein A. A course in game theory, MIT Press Academic Press, New York, 1977.
Owen G. Game theory, Academic Press, 1982.
Papadimitriou C.H., Koutsoupias E. Worst-case equilibria, Lecture Notes in Comp. Sci. 1563 (1999), 404–413.
Papadimitriou C.H. Algorithms, games, and the Internet, Proceedings of the 33rd Annual ACM STOC (2001), 749–753.
Parthasarathy T., Raghavan T.E.S. Some topics in two person games, American Elsevier Publications Co., New York, 1971.
Perry M. An example of price formation in bilateral situations: A bargaining model with incomplete information, Econometrica 54, no. 2 (1986), 313–321.
Petrosjan L., Zaccour G. Time-consistent Shapley value allocation of pollution cost reduction, Journal of Economic Dynamics and Control 27, Issue 3 (2003), 381–398.
Petrosjan L. A., Zenkevich N. A.
Game theory. World Scientific Publisher, 1996.
Rosenthal R. W. A class of games possessing pure-strategy Nash equilibria, Int. Journal of Game Theory 2 (1973), 65–67.
Roughgarden T. Selfish routing and the price of anarchy, MIT Press, 2005.
Roughgarden T., Tardos E. How bad is selfish routing?, JACM, 2002.
Rubinstein A. Perfect equilibrium in a bargaining model, Econometrica 50, no. 1 (1982), 97–109.
Sakaguchi M. A time-sequential game related to an arbitration procedure, Math. Japonica 29, no. 3 (1984), 491–502.
Sakaguchi M. Solutions to a class of two-person Hi-Lo poker, Math. Japonica 30 (1985), 471–483.
Sakaguchi M. Pure strategy equilibrium in a location game with discriminatory pricing, Game Theory and Applications 6 (2001), 132–140.
Sakaguchi M., Mazalov V.V. Two-person Hi-Lo poker–stud and draw, I, Math. Japonica 44, no. 1 (1996), 39–53.
Sakaguchi M., Sakai S. Solutions to a class of two-person Hi-Lo poker, Math. Japonica 27, no. 6 (1982), 701–714.
Sakaguchi M., Szajowski K. Competitive prediction of a random variable, Math. Japonica 34, no. 3 (1996), 461–472.
Salop S. Monopolistic competition with outside goods, Bell Journal of Economics 10 (1979), 141–156.
Samuelson W.F. Final-offer arbitration under incomplete information, Management Science 37, no. 10 (1991), 1234–1247.
Schmeidler D. The nucleolus of a characteristic function game, SIAM Journal on Applied Mathematics 17, no. 6 (1969), 1163–1170.
Shapley L.S. On balanced sets and cores, Naval Research Logistics Quarterly 14 (1967), 453–460.
Shapley L.S., Shubik M. A method for evaluating the distribution of power in a committee system, American Political Science Review 48 (1954), 787–792.
Shiryaev A.N. Probability. Graduate Texts in Mathematics, New York, Springer-Verlag, 1996.
Sion M., Wolfe P. On a game without a value, in Contributions to the Theory of Games III. Princeton University Press, 1957.
Steinhaus H. The problem of fair division, Econometrica 16 (1948), 101–104.
Stevens C.M.
Is compulsory arbitration compatible with bargaining, Industrial Relations 5 (1966), 38–52.
Stromquist W. How to cut a cake fairly, American Mathematical Monthly 87, no. 8 (1980), 640–644.
Tijs S. Introduction to games theory. Hindustan Book Agency, 2003.
Vorobiev N.N. Game theory. New York, Springer-Verlag, 1977.
Walras L. Elements d’economie politique pure, Lausanne, 1874.
Wardrop J.G. Some theoretical aspects of road traffic research, Proceedings of the Inst. Civil Engineers (1952), 325–378.
Williams J.D. The Compleat strategyst: Being a primer on the theory of games of strategy, Dover Publications, 1986.
Yeung D.W.K. An irrational-behavior-proof condition in cooperative differential games, International Game Theory Review 8, no. 4 (2006), 739–744.
Yeung D.W.K., Petrosjan L.A. Cooperative stochastic differential games. Springer, 2006. 242 p.
Zeng D.-Z. An amendment to final-offer arbitration, Mathematical Social Sciences 46, no. 1 (2003), 9–19.
Zeng D.-Z., Nakamura S., Ibaraki T. Double-offer arbitration, Mathematical Social Sciences 31 (1996), 147–170.
Zhang Y., Teraoka Y. A location game of spatial competition, Math. Japonica 48 (1998), 187–190.
Index

Mathematical Game Theory and Applications, First Edition. Vladimir Mazalov. © 2014 John Wiley & Sons, Ltd. Published 2014 by John Wiley & Sons, Ltd.
Companion website: http://www.wiley.com/go/game_theory

Arbitration 37, 42, 43, 45–48, 51, 53, 56, 61, 182, 189, 190, 224–226
  conventional 42, 45–46, 224–225
  final-offer 42–45, 225–226
  with penalty 42, 46–48
  procedure 42, 46, 48, 51, 53, 56–61, 63, 182, 189, 228, 229
Auction 78, 79, 190, 196
  first-price 79–80
  second-price (Vickrey auction) 80–81
Axiom
  of dummy players 300, 301
  of efficiency 300
  of individual rationality 281, 282, 300
  of symmetry 300
Backward induction method 98–100, 164, 230, 231, 235, 240, 242, 252, 255, 256, 260, 267, 273
Best response 3–5, 8, 9, 19, 46, 77, 78, 116, 134, 135, 138–141, 143, 146, 147, 167–171, 174, 175, 178, 180, 185, 186, 188, 191, 192, 199, 200, 203, 207–211, 213, 215, 219, 220, 227, 233, 235, 237, 240, 255, 258, 272
Braess’s paradox 20
Cake cutting 155–161, 163, 166, 171–172, 174, 181, 182
Candidate object 252, 253, 259, 260, 262, 264
Coalition 278–282, 287–289, 291–293, 298–303, 306–308, 388, 391, 393, 394, 395, 398–402
Commune problem 94
Complementary slackness 14
Conjugate system 363, 366, 367
Conjugate variables 362, 364–366, 374, 380, 383
Continuous improvement procedure 3
Core 281–289, 291, 300
Delay function
  linear 18, 65, 168, 347
  player-specific 75–77, 346, 349, 351
Dominance of imputations 281
Duels 85–87
Duopoly
  Bertrand 4–5
  Cournot 2–3, 9
  Hotelling 5–6, 20
  Stackelberg 8–9
Equilibrium strategy profile 14, 15, 27, 67, 98, 100, 258
  completely mixed 15, 27, 315, 319, 320–324
Equivalence of cooperative games 278–280
Excess final best-reply property (FBRP) 77
Final improvement property (FIP) 76
Follower in Stackelberg duopoly 8
Function
  Bellman 358–360
  characteristic 278–280, 282–286, 289, 292, 293, 298–303, 309, 310, 312, 388, 390, 391, 393, 396, 399, 400
  elementary 301
  generating 304–308
  Hamilton 358, 360, 361, 373, 377, 383, 385
  superadditive 278, 300, 396
Game
  animal foraging 70, 71, 73
  antagonistic 28, 31, 149
  balanced 285–286
  bankruptcy 293–298
  battle of sexes 12, 13, 17
  best-choice 250–254, 258, 259, 264, 267, 269, 276
  bimatrix 12, 14–18, 32, 261
  city defense 38
  Colonel Blotto 34–36, 38, 39, 225
  with complete information 96–98, 101, 102, 104, 111
  with complete memory 105
  with incomplete information 101, 103–105, 109, 111, 112, 190, 195
  congestion 73–77, 329
  convex 9–12, 14, 39–42, 65–66
  convex-concave 37
  cooperative 278–311, 389, 391
  crossroad 26
  dynamic 96, 230, 352–402
  in extensive form 96–108
  fixed-sum 29
  glove market 283
  Hawk–Dove 12–13
  jazz band 279, 282, 283, 289
  in strategic form 1–26, 64–94
  linear-convex 37–39
  matrix 32, 33, 37–39, 102, 260, 261, 265
  multi-shot 270, 272
  mutual best choice 269–270
  network 314–351
  noncooperative 64–93, 101–104, 182, 230, 314, 328, 353, 395
  in normal form 1, 28, 29, 64, 96
  n-player 64–93
  optimal stopping 230–276
  poker 111–113, 116, 118, 122, 127, 128, 136
  polymatrix 66–68
  potential 69
  prediction 88–93
  preference 129–130, 136, 137, 139
  prisoner’s dilemma 12, 13, 16, 17, 193, 202
  player-specific 75–77, 346, 349
  quasibalanced 288
  road construction 279, 291–292, 299
  road selection 19
  scheduling 279, 284
  soccer 147–152
  Stone-Scissors-Paper 13
  traffic jamming 69, 71, 72
  twenty-one 145, 146
  2 × 2, 16–18
  2 × n and m × 2, 18–20
  value
    lower 29, 35
    upper 29, 35, 36
  voting 190, 264, 265, 267, 268, 302, 303, 309
  weighted 303–305, 308
  in 0–1 form 280, 281, 293
  zero-sum 28–61, 226, 229
Golden section 50, 53, 159, 233, 237, 272, 334, 335
Hamilton–Jacobi–Bellman equation 358, 360, 361, 373, 377, 383, 385
Hamiltonian 362, 365, 366, 368, 371, 374, 375, 379, 381
Imputation 281–284, 286, 289–292, 294, 296–300, 302, 388–393, 398, 400
Indicator 82, 119, 131, 137, 173, 185, 251, 261, 266, 285, 310
Influence level 222, 223, 225, 303, 306, 308, 311
Information network 314, 315
Information set 101–107
Initial state of game 96, 98, 353, 355, 367
Key player 303
KP-player 309, 314, 315
Leader in Stackelberg duopoly 8
Lexicographical
  minimum 290
  order 289, 297
Linear-quadratic problem 375
Majority rule 174, 176
Martingale 244
Maximal envelope 18, 19
Maximin 29–32, 35
Minimax 29–33, 35, 43
Minimal winning coalition 301, 308, 309
Model
  Pigou 340, 346
  Wardrop 340, 341, 344–346, 349
Nash arbitration solution 391
Negotiations 43, 155–228, 230
  consensus 221
  with lottery 259, 260, 268
  with random offers 48, 171
  with voting 264, 265, 267, 268, 302–305, 308, 309
Nucleolus 289
Oligopoly 65, 66, 71, 72
Optimal control 358–361, 363, 365, 367–370, 375–377, 380–383, 385, 388, 393, 394, 399
Optimal routing 315, 319, 320, 328, 332, 335, 337, 340, 341, 343, 344
Parallel channel network 315, 319, 320, 322–324, 327, 328, 340, 346
Pareto optimality 157
Payoff
  integral 33, 127, 130, 131, 173, 175, 177, 180, 194, 253, 352, 360
  terminal 76, 96–99, 101–107, 352, 366
Play in game 31, 97, 98, 101–107, 112, 118, 122, 127, 129, 136, 137, 139, 224
Players 1–5, 7–10, 12–14, 16, 19–21, 23, 28, 31, 35, 38, 42–45, 47, 48, 53, 56, 57, 61, 62, 64, 65–70, 72, 73, 75, 77–82, 85, 87, 88, 96–101, 104, 112, 113, 116, 118, 122, 127–130, 133, 136, 137, 139, 142, 145, 147–149, 151, 152, 155–166, 168–176, 178, 181–187, 189, 190, 193–199, 201–203, 206, 209, 212, 217–228, 230–234, 237, 241, 254, 257, 260, 261, 264, 265, 267–270, 272, 278, 279, 284, 287, 289, 292, 293, 298–304, 308–311, 314, 315, 318, 319, 324, 325, 327, 328–333, 335–341, 344, 346, 347, 349, 351, 352–357, 368–370, 372, 373, 375–377, 379, 383, 386–388, 390–394, 398–402
Pontryagin’s maximum principle for discrete-time problem 361, 365
Potential 69–74, 341–343, 346
Price of anarchy 315, 316, 324–330, 332, 334–337, 339, 340, 344–346, 349, 351
  mixed 316, 332, 335
  pure 316
Problem
  bioresource management 366, 378, 383
  salary 224
Property
  of individual rationality 281
  of efficiency 282, 300
Randomization 13–15, 31–34, 66
Rank criterion 259, 264
Set
  of personal positions 96, 98
  of positions 99
Sequence 3–5, 33, 75–78, 89, 125, 136, 155, 164, 168, 170, 212, 216, 221, 230, 231, 241, 244, 251, 254, 255, 265, 304, 371
  of best responses 3
  of improvements 76, 77
Social costs 316–318, 320–322, 325, 327, 329, 331, 333, 335–337, 339–341, 343–346, 349
  linear 319, 322–324, 328, 332, 335
  maximal 316, 324, 335, 336
  quadratic 320
Spectrum of strategy 13, 243–249, 253, 254
Stopping time 230
Strategy
  behavioral 105, 107, 118, 122
  equalizing 14
  mixed 13, 16–18, 26, 33, 34, 36, 37, 42, 45, 48, 52, 57, 75, 79, 80–82, 87, 90, 103–105, 107, 118, 122, 124, 245, 315, 316, 325, 333
  optimal 9, 13, 15, 39, 40, 42, 45, 48, 50, 53, 57, 61, 75, 87, 100, 101, 113, 114, 116, 119–124, 126, 127, 133, 135, 136, 138, 141, 145, 147, 157, 173, 175, 177, 181, 192, 194, 195, 197, 202, 220, 231, 237, 242, 243, 253, 255, 269, 324, 327, 331, 333, 336, 356
  pure 13, 15, 26, 40, 42, 44–48, 65, 68–72, 74–80, 82, 104–107, 241, 261, 315–317, 328, 329, 332–334
Strategy profile 2, 12–16, 32, 38, 47, 64–68, 71, 75–79, 81, 97–100, 104, 183, 202, 233, 258, 315–318, 322, 324, 325, 328, 329, 331, 333, 336–339, 341–347, 349
Subgame 97–101, 163, 391
Subgame-perfect equilibrium 98–100, 164–167, 169–171
  cooperative 356–357
  indifferent 99–101
  Nash 98–100
  worst-case 316, 317, 325, 329, 330, 335, 336
  Stackelberg 8, 27, 370, 373, 386, 387
  in subgame 98, 163
  Wardrop 338, 339, 342, 344, 345–347, 349
Subtree of game 97, 98
Switching 214, 215, 303, 308, 329
τ-value 286, 288, 289
Terminal node 96, 98, 101, 104, 107
Threshold strategy 116, 145, 146, 207, 215, 235, 237, 253–255, 272
Time horizon
  infinite 60, 174, 377, 383, 393
  finite 355, 356, 378
Traffic 69, 71, 72, 94, 328, 329, 333, 335, 337, 339, 340–342, 345
  indivisible 315–316, 324, 328–330, 332, 335–337, 340, 341
  divisible 337–339, 340, 343, 344, 349
Transversality condition 362, 366
Tree of game 96–98, 101–103
Truels 85–87
Type of player 94, 95, 99
Utopia imputation 286, 289
Vector 13, 98, 107, 149, 161, 172, 174, 182, 183, 221, 223, 243–248, 268, 281, 286–290, 294, 299–303, 308, 310, 315, 316, 328, 362, 365, 391
  Banzhaf 303
  congestion 75–78
  Deegan–Packel 308
  Hoede–Bakker 309
  Holler 308
  minimum rights 287
  Shapley 298–303, 391, 398, 400, 401
  Shapley–Shubik 303
Voting by majority 306, 308, 309
Voting threshold 265
Winning coalition 301, 303, 308