### October 2011

Gibbs sampling is an important method in the context of Bayesian estimation. There are other books on R that focus on Bayesian simulation (Jim Albert) and Monte Carlo simulation (Robert and Casella) that are very good but a little advanced. So, where does this book fit in? If one has never done any kind of simulation in R, this book is the ideal reference. For someone who is already aware of basic simulation, there are three chapters in the book, on Markov chains, Bayesian estimation and the Gibbs sampler, that are worth going over. I will try to summarize the main points of the book.

The authors suggest that an R newbie go over the appendix first. It covers the basic installation procedure, vectorization in R, basic syntax for loops, basic graphs using the base package, and the sample function. In all likelihood a person reading this book is already familiar with basic R syntax and operations, so the appendix is more of a formality. The book contains 10 chapters. The first seven chapters can be used as supplementary material for any frequentist-based course and the last three for a Bayesian course.

Chapter 1: Introductory Examples: Simulation, Estimation, and Graphics:

The entire chapter is about the sample function, which is used to generate bootstrap samples to answer probability questions like the birthday matching problem, the envelope mismatch problem, etc. By going over this section, a reader will understand all the arguments of the R function "sample". The chapter then talks about coverage probabilities for confidence intervals, i.e. the proportion of the time that the interval contains the true value of interest. Using a simple binomial case, the coverage probabilities are shown for various population parameter values. A first-timer would find this topic illuminating and will start developing a skeptical eye towards confidence intervals based on frequentist methods: for a simple binomial proportion, once the true parameter moves away from 0.5, the coverage probability drops a lot. There is a mention of Agresti-Coull confidence intervals, which address the coverage probability problem. The best thing about this chapter is the last exercise, where a Beta distribution is used to model the binomial proportion and Bayes' rule is used to form credible intervals for the parameter. Frankly, a person exposed to Bayesian stats will just ignore fancy fixes like the Agresti-Coull method and go ahead with prior + likelihood + posterior modeling of the situation.
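The coverage drop is easy to reproduce. Here is a minimal sketch in Python/NumPy (the book's own code is in R) of the coverage probability of the standard Wald interval; the sample size and parameter values are my own illustrative choices, not the book's:

```python
import numpy as np

rng = np.random.default_rng(7)

def wald_coverage(p, n=30, reps=10_000, z=1.96):
    """Fraction of simulated Wald intervals phat +/- z*sqrt(phat(1-phat)/n) that cover p."""
    x = rng.binomial(n, p, size=reps)          # simulated success counts
    phat = x / n
    half = z * np.sqrt(phat * (1 - phat) / n)
    return np.mean((phat - half <= p) & (p <= phat + half))

# Coverage is close to the nominal 95% near p = 0.5 but degrades toward the edges.
for p in (0.5, 0.1, 0.02):
    print(p, round(wald_coverage(p), 3))
```

For p = 0.02 and n = 30 the interval collapses to [0, 0] whenever no successes are observed, which alone destroys more than half the nominal coverage.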

Chapter 2: Generating Probabilities:

This chapter starts off by talking about ways to generate random numbers. Firstly, the random numbers generated by a computer are pseudo-random numbers, as nothing a deterministic machine produces is perfectly random; the word "random" is more appropriately applied to the process by which a number is produced than to the number itself. The section briefly traces the history of pseudo-random number generators, from the linear congruential generator and RANDU to the Mersenne twister. A few conditions are mentioned for any generator to be useful in real life: 1) large modulus and full period, 2) a histogram of the scaled values should look uniform, 3) pairwise independence of the values. R uses the Mersenne twister behind the runif() command, and its power is really amazing: its equidistribution has been tested up to 623 consecutive dimensions, and its period is about 4.3 × 10^6001. The values from runif() are mapped into the roughly 4.3 billion distinct numbers that can be represented within the precision of R. That's the power you have when you use the generators in R. The section then moves to generating random variables from uniform variates. The quantile transformation method is used to simulate values from distributions such as the beta, binomial, chi-square, exponential, normal, Poisson and Rayleigh. Knowing the functional forms of these densities is useful in general, as it helps one immediately recognize them when they appear in various contexts. For example, if you see f(x) = constant * x^(constant − 1), you can immediately decipher that the distribution is a beta in some form. These are little things, but very useful. Say you see a distribution whose cumulative distribution function is sqrt(x): you should immediately relate this to the Beta(1/2, 1) distribution. Little linkages like these help one decide the components of a model better.
This chapter will enable a reader to simulate values from the popular distributions using uniform or standard normal variates.
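The quantile (inverse-CDF) transformation mentioned above can be sketched in a few lines. This is a Python/NumPy illustration rather than the book's R code, with my own choices of rate and scale parameters: if U is Uniform(0,1) and F is a CDF, then F⁻¹(U) has distribution F.

```python
import numpy as np

rng = np.random.default_rng(42)
u = rng.uniform(size=100_000)

# Exponential(rate): F(x) = 1 - exp(-rate*x), so F^-1(u) = -ln(1-u)/rate.
# Since 1-U has the same distribution as U, -ln(u)/rate works too.
rate = 2.0
expo = -np.log(u) / rate

# Rayleigh(sigma): F(x) = 1 - exp(-x^2/(2 sigma^2)), so F^-1(u) = sigma*sqrt(-2 ln(1-u)).
sigma = 1.5
rayleigh = sigma * np.sqrt(-2 * np.log(u))

print(expo.mean())      # ≈ 1/rate = 0.5
print(rayleigh.mean())  # ≈ sigma * sqrt(pi/2) ≈ 1.88
```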

Chapter 3: Monte Carlo Integration and Limit Theorems

The chapter starts off by solving an integral problem with the Riemann integration method, Monte Carlo integration, the accept-reject method and a random sampling method. It then explores the law of large numbers and explains convergence in probability using visuals. You can't use the law of large numbers for the Cauchy distribution, a fact that is shown using integrals and drilled into every student in an elementary statistics course, but a simple visual shows it instantly. The section then talks about the limiting behavior of a Markov process and shows that its convergence can be very slow. Convergence in distribution is explained using the central limit theorem, the crucial difference between the LLN and the CLT being that the former doesn't give any indication of the uncertainty of the estimate while the latter gives an idea of the possible error. The section ends by showing the connections between the Riemann integral and Monte Carlo simulation, and the importance of Monte Carlo simulation in higher-dimensional parameter problems, where a grid-based Riemann integral turns out to be computationally inefficient.
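The Riemann-vs-Monte-Carlo connection is worth seeing side by side. A small Python/NumPy sketch (my own example integrand, not the book's): both a midpoint Riemann sum and a Monte Carlo average of f at uniform points estimate the same integral over [0, 1].

```python
import numpy as np

rng = np.random.default_rng(0)
f = lambda x: np.exp(-x**2)   # integrand with no elementary antiderivative

n = 100_000
grid = (np.arange(n) + 0.5) / n          # midpoints of n equal subintervals
riemann = f(grid).mean()                 # midpoint Riemann sum on [0, 1]
mc = f(rng.uniform(size=n)).mean()       # Monte Carlo: average f at random points

print(riemann, mc)   # both ≈ 0.7468
```

In one dimension the grid wins on accuracy, but the Monte Carlo estimate's error rate does not degrade with dimension, which is the chapter's closing point.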

Chapter 4: Sampling from Applied Probability Models

This chapter goes into sampling from known distributions to estimate probabilities in situations where closed-form solutions are painful or sometimes intractable. It starts off by looking at the Poisson process and the exponential holding times of the process, and applies sampling procedures to solve simple problems from queuing theory. Based on assumptions about the interarrival times of customers and the distribution of service times at various servers, many kinds of estimates can be obtained from the sampling distribution. Another application explored is order statistics. Most distribution-related questions about order statistics can otherwise be solved only via painful integration procedures, and this is a classic area where simulation gives you quick answers. Another advantage of simulation is that you can quickly check your intuition. Most intro stat books tell you that the sample mean and sample variance are independent for a normal distribution; for other distributions, like the exponential, this doesn't hold. To test this, one can easily simulate random samples from whatever non-normal distribution one wants to check and look at the joint distribution of the sample mean and sample variance visually. Visuals do a good job of showing the dependence between these variables, whereas relying merely on some correlation metric could be dicey (correlation is only a measure of linear dependence). The chapter ends with an introduction to bootstrapping. I vividly remember the days when I first got introduced to bootstrapping. I was so kicked about the concept, as it relieved me of memorizing painful formulae (two-sample mean comparison tests with equal/unequal sample sizes). It's a shame that I actually memorized those things way back during my MBA days and never understood their real practical applications. Thankfully, bootstrapping got me interested in stats again.
The section provides a few exercises that illustrate the difference between non-parametric and parametric bootstrapping methods.
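The non-parametric bootstrap idea is simple enough to sketch in a few lines. Below is a Python/NumPy illustration (the book uses R, and the data here are simulated exponential values of my own choosing): resample the observed data with replacement and read a percentile interval off the resampled statistics.

```python
import numpy as np

rng = np.random.default_rng(1)
sample = rng.exponential(scale=2.0, size=50)   # the "observed" data (simulated here)

# Non-parametric bootstrap: resample the data itself, with replacement.
boot_means = np.array([
    rng.choice(sample, size=sample.size, replace=True).mean()
    for _ in range(5_000)
])
lo, hi = np.percentile(boot_means, [2.5, 97.5])   # percentile interval for the mean
print(lo, hi)
```

A parametric bootstrap would instead fit a distribution to the sample and draw the replicate datasets from the fitted distribution; the exercises contrast the two.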

A cool thing one can learn from this chapter is the projection of an n-dimensional space into the space of sufficient statistics. Let's say you simulate 5 random variables from a Beta(0.3, 0.3) distribution and plot the sample mean vs. the sample sd. The data fall inside the 5-dimensional unit hypercube, which has 32 vertices. A large proportion of data points fall near the vertices, edges and faces of the hypercube. The horns in the resulting plot are images of these vertices under the transformation from the 5-dimensional data space to the 2-dimensional space of sufficient statistics; the horns correspond to choose(5,0), choose(5,1), choose(5,2), choose(5,3), choose(5,4) and choose(5,5) vertices of the hypercube. I guess such learning is valuable, as one starts visualizing things in higher-dimensional/multivariate cases.

Chapter 5: Screening Tests

This chapter introduces an estimation problem where traditional methods fail or give absurd values. It then introduces Bayes' theorem, showing the importance of conditional probabilities in statistical inference problems. The relevance of conditional distributions to sampling from target distributions becomes evident in the Gibbs sampling method covered in the subsequent chapters.
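The screening-test flavor of Bayes' theorem can be captured in a few lines. The numbers below are hypothetical illustrative values, not the book's:

```python
# Bayes' rule for a screening test:
# sensitivity = P(test + | disease), specificity = P(test - | no disease).
sens, spec, prev = 0.99, 0.97, 0.001   # hypothetical test with a rare condition

p_pos = sens * prev + (1 - spec) * (1 - prev)   # total probability of a positive test
ppv = sens * prev / p_pos                       # P(disease | positive test)
print(round(ppv, 3))   # ≈ 0.032
```

Even with a 99%-sensitive, 97%-specific test, a positive result on a rare condition is still overwhelmingly likely to be a false alarm, which is exactly the kind of counterintuitive answer the chapter uses to motivate conditional probability.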

Chapter 6: Markov Chains with Two States

Finally, after five chapters, the reader gets to see the Gibbs sampling procedure. The chapter starts off by introducing the Markov chain, a particular kind of stochastic process that allows for a limited dependence among a set of random variables. Basic terminology is introduced: state space, transition matrix, homogeneity, etc. The book is more of a "do" book, where you are expected to understand things by coding them up. An example with sample R code shows the long-run frequency of states in a simple two-state Markov chain, and acf plots are used as an exploratory tool to assess how rapidly the chain converges. Transition matrices are then introduced to show the ease of computation from a linear algebra perspective: the r-step transition matrix boils down to the transition matrix multiplied by itself r times. Subsequently, the limiting behavior of the chain is shown. One crucial point these examples bring out is that the rate of convergence of the powers of the transition matrix is not the same thing as the rate of convergence of the chain: the matrix might converge very quickly while the chain takes a very long time to converge, and vice versa. The last section of the chapter introduces Gibbs sampling, where a set of conditional distributions is used to simulate the Markov chain. The simple example in the chapter makes it amply clear that one cannot get there by simulating from just one conditional distribution; the Gibbs sampling algorithm addresses this by moving from one conditional distribution to another in a systematic manner, so that the resulting Markov chain, after an initial burn-in period, converges to the limiting distribution.
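The two-state simulation and the matrix-power view fit in a handful of lines. A Python/NumPy sketch (the book's version is in R, and this transition matrix is my own example): the long-run frequencies of the simulated chain and the rows of P^r both approach the stationary distribution, here (0.75, 0.25).

```python
import numpy as np

rng = np.random.default_rng(3)
P = np.array([[0.9, 0.1],    # hypothetical 2-state transition matrix
              [0.3, 0.7]])

n = 100_000
state = 0
visits = np.zeros(2)
for _ in range(n):
    state = rng.choice(2, p=P[state])   # one step of the chain
    visits[state] += 1

print(visits / n)                          # long-run frequencies ≈ (0.75, 0.25)
print(np.linalg.matrix_power(P, 50)[0])    # each row of P^50 ≈ the same limit
```

Note that the two convergence rates are genuinely different objects: the matrix powers converge at a rate set by the second eigenvalue, while the chain's running frequencies converge at the slower Monte Carlo rate.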

Chapter 7: Example of Markov Chains with Larger State Spaces

Personally, the main reason for going over this book was to understand the content of this chapter. I was looking for simple examples to understand and internalize the core logic of the Metropolis algorithm, the Metropolis-Hastings algorithm and the Gibbs sampling algorithm, and this chapter was a perfect fit for my requirement. The chapter starts off by expanding the state space of the Markov chain, considering K states, countably many states and a continuum of states, the kinds of state spaces we come across in real-life situations. In the context of a K-state Markov chain, the chapter mentions several methods of computing the long-run distribution:

1. Powers of the transition matrix
2. Simulation
3. Explicit algebraic solution
4. Means of geometric distributions

Of the above methods, the text uses only the first two. Powers of the transition matrix is the straightforward but rather lame way of doing things: to what power do you raise the matrix? What if it converges very slowly? These are questions one must have a clue about before using this method; I guess it is at best a diagnostic tool. The second method is simulation: you simulate a K-state Markov chain and then examine its limiting behavior. The downside is that we must be sure the chain we are simulating is ergodic. The section also has an example where the long-run behavior of a non-ergodic chain is absorption into the chain's absorbing states. The section briefly talks about countably infinite state spaces, with a few examples of simple random walks with varying drifts. Finally, continuous state spaces are covered, where the transition density plays the role that the transition matrix played in the discrete case. An important point illustrated by an example is that "a state space of finite length does not ensure useful long-run behavior". The highlight of the chapter is the use of simple bivariate normal variates to show the Metropolis algorithm, its tweak for asymmetric jumps (the Metropolis-Hastings algorithm) and, most importantly, the Gibbs sampling algorithm. I think the Gibbs sampling example is particularly easy to remember and internalize: since the conditional distribution of one normal random variable given another is again normal, one can easily apply Gibbs sampling in the bivariate normal case to see its effectiveness. Once the basic algorithm is internalized, one can use it for complicated cases. The beauty of this book is that there are enough R-based exercises that you get a pretty good idea of the concepts involved.
The exercises for this chapter mainly helped me understand the following aspects:

• What is a doubly stochastic matrix? Can it be non-ergodic?
• How do you code a reflecting barrier efficiently?
• A continuous state space Markov chain over a finite interval need not have a useful long-run behavior. Using Markov chains based on beta distributions, two problems are presented, one of which has a long-run behavior and one of which doesn't.
• How do you code the Metropolis-Hastings algorithm efficiently?
• How do you simulate a finite Markov chain efficiently?
• How do you code a Gibbs sampler efficiently?

R code is given for each of these algorithms, which makes this chapter a very useful one.
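The bivariate normal Gibbs sampler described above is compact enough to sketch here. This is my own Python/NumPy version (the book's is in R) for a standard bivariate normal with correlation rho, where each full conditional is N(rho * other, 1 − rho²):

```python
import numpy as np

rng = np.random.default_rng(8)
rho = 0.8          # target correlation (my choice for illustration)
m = 50_000

x = y = 0.0
draws = np.empty((m, 2))
for i in range(m):
    # Full conditionals of a standard bivariate normal:
    x = rng.normal(rho * y, np.sqrt(1 - rho**2))   # X | Y = y
    y = rng.normal(rho * x, np.sqrt(1 - rho**2))   # Y | X = x
    draws[i] = x, y

burned = draws[m // 10:]                  # drop an initial burn-in period
print(np.corrcoef(burned.T)[0, 1])        # ≈ rho
print(burned.mean(axis=0))                # ≈ (0, 0)
```

Since the joint target is known exactly here, the example makes a good sanity check before pointing the same machinery at a posterior with no closed form.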

Chapter 8: Introduction to Bayesian Estimation

The chapter talks about basic applications of Bayesian statistics. The crux of Bayesian stats is: the posterior is proportional to the likelihood times the prior. Four examples cover the most popular distributions:

• Beta prior + binomial likelihood = beta posterior
• Gamma prior + Poisson likelihood = gamma posterior
• Unknown mu: normal prior + normal likelihood = normal posterior
• Unknown sigma: gamma prior on the precision + normal likelihood = inverse gamma posterior for sigma squared
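The first of these conjugate updates is short enough to show end to end. A Python sketch with hypothetical prior parameters and data of my own choosing (the book does this in R):

```python
import numpy as np

# Beta prior + binomial likelihood => Beta posterior (conjugacy):
# Beta(a, b) prior and x successes in n trials give Beta(a + x, b + n - x).
a, b = 2.0, 2.0          # hypothetical Beta(2, 2) prior on the success probability
n, x = 50, 35            # hypothetical data: 35 successes in 50 trials

a_post, b_post = a + x, b + (n - x)
post_mean = a_post / (a_post + b_post)
print(post_mean)         # 37/54 ≈ 0.685, pulled slightly from 35/50 = 0.7 toward 0.5

# A 95% credible interval read off posterior draws:
rng = np.random.default_rng(5)
draws = rng.beta(a_post, b_post, size=100_000)
lo, hi = np.percentile(draws, [2.5, 97.5])
print(lo, hi)
```

No sampling is actually needed for this case (the posterior is available in closed form); the draws just preview the simulation-based workflow of the next chapter.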

Chapter 9: Using Gibbs Samplers to Compute Bayesian Posterior Distributions

In the previous chapter, the analytical form of the posterior distribution was derived and used in the computation. However, closed-form solutions for posterior distributions are very difficult to obtain, more so in higher-dimensional parameter spaces. This chapter delves into the utility of the Gibbs sampler in computing posterior distributions. If you assume a normal prior and want to estimate the mean, assuming you know the population standard deviation, then the posterior distribution of the mean turns out to be another normal distribution. If you assume a gamma prior for the precision and want to estimate the standard deviation, assuming you know the population mean, then the posterior distribution of the variance turns out to be an inverse gamma distribution. What if you know neither the mean nor the sd of the population? This is where things become slightly complicated. Through a simple example of height differences, the chapter introduces this problem and solves it using the Gibbs sampler.

Here is a hypothetical example. Let's say you are interested in a variable X, and you take a random sample of 40 data points, call them xi. You are interested in the mean and standard deviation of X. Let's say you observe the data below. (Actually I generated this data with a population mean of 40 and sd of 4, so that it would help me check the effectiveness of the Gibbs sampler.)

Data (40 numbers) :

39 41 39 38 36 32 40 47 37 41 40 46 42 43 41 38 39 32 35 42 41 34 32 38 44 41 42 37 40 35 34 36 42 30 48 32 39 43 32 37

Let's say I assume a prior for mu of Normal(30, 5). This is a vague prior: looking at the data, I have assumed that the mean might be 30, with an sd of 5. Let's say I also assume a prior for sigma² of Inverse Gamma(5, 38). How do I get these numbers? Well, my vague prior for sigma is that it lies between 2 and 6 about 95% of the time, i.e. sigma² lies between 4 and 36 about 95% of the time, i.e. 1/sigma² lies between 1/36 and 1/4 about 95% of the time. I choose Gamma(alpha, kappa) parameters such that the area under the density between 1/36 and 1/4 is 95%; simple trial and error gives the values 5 and 38. This means that 1/sigma² is Gamma(5, 38) and hence sigma² is Inverse Gamma(5, 38). Now that my priors are ready, I can use Gibbs sampling to get the posterior distributions. Since the full conditionals (mu | x, sigma²) and (sigma² | x, mu) have closed forms, it is easy to code the Gibbs sampler manually and see the results. I generated a large number of sample paths, discarded the first 50% of the steps, and obtained a posterior mean of 38.9 and a posterior sd of 4.19. As one can see, the posterior mean and sd come very close to the population values of 40 and 4 with merely 40 data points. As the sample size increases, the credible intervals for the parameters narrow and more robust estimates are obtained.
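Under these semi-conjugate priors the two full conditionals are standard, so the sampler takes only a few lines. This is my own Python/NumPy sketch of the scheme described above (the original is in R); the data are the 40 numbers listed earlier, and the conditional forms are the usual normal and gamma updates for this model:

```python
import numpy as np

rng = np.random.default_rng(2)
x = np.array([39,41,39,38,36,32,40,47,37,41,40,46,42,43,41,38,39,32,35,42,
              41,34,32,38,44,41,42,37,40,35,34,36,42,30,48,32,39,43,32,37], float)
n, xbar = x.size, x.mean()

mu0, tau0sq = 30.0, 25.0      # Normal(30, 5^2) prior on mu
alpha, beta = 5.0, 38.0       # Gamma(5, rate 38) prior on the precision 1/sigma^2

m = 20_000
mu, sig2 = xbar, x.var()      # starting values
keep = np.empty((m, 2))
for i in range(m):
    # mu | sigma^2, x ~ Normal: precision-weighted average of prior mean and xbar
    v = 1.0 / (1.0 / tau0sq + n / sig2)
    mu = rng.normal(v * (mu0 / tau0sq + n * xbar / sig2), np.sqrt(v))
    # 1/sigma^2 | mu, x ~ Gamma(alpha + n/2, rate = beta + sum((x - mu)^2)/2)
    prec = rng.gamma(alpha + n / 2, 1.0 / (beta + 0.5 * np.sum((x - mu) ** 2)))
    sig2 = 1.0 / prec
    keep[i] = mu, np.sqrt(sig2)

post = keep[m // 2:]          # discard the first half as burn-in
print(post.mean(axis=0))      # posterior means of mu and sigma, close to 40 and 4
```

Note that NumPy's gamma generator is parameterized by shape and scale, so the rate parameter goes in as its reciprocal.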

You don't always manage to get closed forms for the full conditionals; parameter1 given (data, parameter2, ..., parameter n) need not lead to a closed form, and more often than not it won't. This is where BUGS comes into the picture and makes our life easy. That is covered in the last chapter, so I will summarize BUGS in the relevant section of this post.

The chapter introduces another example where frequentist stats fall short of giving answers. In a two-step manufacturing process, there is a need to understand whether the variability is at the batch level or at the individual-unit level within a batch. Using Gibbs sampling, the book makes a strong case for using Bayes to get a clear understanding of the variability. This example is pretty involved and it took me 3-4 hours to understand it in its entirety, but the effort was well worth it. With Bayes, all you need is a good specification of the priors plus a specification of the likelihood; you don't even have to code the full conditionals, as BUGS will do the job for you. The last chapter of the book is all about BUGS.

Chapter 10: Using Win BUGS for Bayesian Estimation

The last chapter of the book is like a wonderful dessert after a nice meal. If you want to do anything in the Bayes world, you have to learn BUGS. Like any other language or tool, there is an initial gestation period one needs to go through, where things grow on you and you have to nurture them; only then can you actually produce something real. My analogy might be stretched, but I think it is apt. Some of the people I have met think that learning a skill should happen very quickly: if they fail to get it in the first few days, they assume the whole thing is boring or not worth it, and if they do get part of it right, they assume they know everything about it. I had a lot of trouble teaching calculus to undergrads as adjunct faculty. There were some kids who were willing to slog, but the majority were looking for instant solutions and instant techniques. Alas! They failed to recognize that "what comes easy never sticks". But I am going off on a tangent here; let me stick to the intent of this post, i.e. summarizing the book.

This last chapter gives a very good introduction to BUGS. After going through it with the examples mentioned, one can be fairly confident of testing out various models and meta-models. The BUGS UI is not all that difficult: the syntax is very much R-like and one can learn it easily. The content of this chapter will make a reader curious to wrap BUGS functions in R, Matlab or whatever programming environment he or she is comfortable with; that's when things start getting interesting. Let's say you want to use a truncated normal prior: you cannot do it with plain vanilla BUGS, and you have to look around and install a few plugins to specify it. Things like these can be learnt and internalized as a consequence of this wonderful chapter, which can also be read on its own by someone looking for BUGS 101 principles.

Takeaway:
This book is much more than an exposition of the Gibbs sampling method. By systematically explaining the various algorithms that go with Bayesian estimation, it changes the way you look at and formulate a statistical model in R. The tight linkage of all the relevant concepts with excellent R code is the highlight of this book; not many books manage this kind of balance between theory and practice.

Emanuel Derman's book, "Models.Behaving.Badly", gives a physicist's and quant's perspective on models vs. theories: their nature, what to expect of them, how to differentiate between them and how to cope with their inadequacies. Derman starts off by citing a few incidents from his childhood in South Africa, where political and social-movement models failed badly. He then talks about the need to understand the crucial difference between theories and models, thus providing the necessary motivation for the reader to go over the book. The book is not so much about describing the precise faults of all the financial models, which would be quite a stretch to fit in a single book; it is more about the basic things a financial modeler must always keep in mind, remaining humble in his pursuit and remembering that his model can only ever say what something is LIKE, never what something IS. Let me summarize the main chapters of this book.

Metaphors, Models and Theories

In this chapter, Derman makes the distinction between “theory” and “model”. He starts off by quoting Arthur Schopenhauer’s metaphor about sleep.

Sleep is the interest we have to pay on the capital which is called in at death; and the higher the rate of interest and the more regularly it is paid, the further the date of redemption is postponed.

In essence, this metaphor is saying that Life is temporary nonblackness. At birth you receive a loan, consciousness and light borrowed from the void, leaving a hole in the emptiness. The hole will grow bigger each day. Nightly, by yielding temporarily to the darkness of sleep, you restore some of the emptiness and keep the hole from growing limitlessly.

What is the relevance of metaphors in a book that aims to talk about models? Well, the connection is this: a language is nothing but a tower of metaphors, each "higher" one resting on a "lower" one and all of them resting on non-metaphorical words and concepts. Metaphors by default say that X is something like Y. This is similar to a model, which tells us "what something is like". Derman then cites Dirac, who came up with a similar metaphor in physics, picturing the positron as a brief fluctuation in the vacuum. However, what grounds Dirac's work is the Dirac equation: this fundamental equation, when combined with the metaphor, successfully predicted the existence of a particle no one had seen before. A metaphor by itself seldom qualifies as a theory.

Why do we need models at all? Because the world is filled with quasi-regularities that hint at deeper causes, and we need models to explain what we see and to predict what will occur. The important thing to note is that models are a kind of proxy for the deeper causes. Whatever the model, a Model T from Ford, a fashion model, an artist's model, a weather model, an economic model, the Black-Scholes model, it is a proxy for a real world that is too complex for us to understand. A model is thus a metaphor of limited applicability, not the thing itself; a caricature that overemphasizes some features at the expense of others. The world is impossible to grasp in its entirety. We focus on only a small part of its vast confusion, and this is where models come in: they reduce the number of dimensions and allow us to make a little extrapolation in that proxy world.

What is a theory?
A weather model's equations are a model, but the Dirac equation is a theory. Similarly, if you build an econometric model for interest rate prediction, its equations are a model, but Newton's equations are a theory. What's the difference? Derman gives a nice explanation of this distinction.

Models are analogies; they always describe one thing relative to something else. Models need a defense or an explanation. Theories, in contrast, are the real thing; they need confirmation rather than explanation. A theory describes essence, and a successful theory becomes a fact. So basically what he is saying is "the Dirac equation IS the electron", "Maxwell's equations ARE electricity and magnetism". A theory becomes virtually indistinguishable from the object itself. This is not the case with models; by default, there is always a gap between a model and the real thing. A theory is deep; a model is shallow. A theory doesn't simplify: it observes the world and tries to describe the principles by which the world operates. A theory can be right or wrong, but it is characterized by its intent, the discovery of essence. You can layer metaphors on top of the equations of a theory, but the equation is the essence.

The chapter ends with Derman narrating his experience with doctors. About 25 years ago, Derman went through retina surgery, and since then a peculiar problem has haunted him: monocular diplopia (seeing double in one eye). The problem keeps recurring at random times. In 2008 he had an acute episode and visited any number of doctors for a remedy. None of them could get a handle on the problem, which appeared to relate to the retina. Finally a technician cracked it through a simple examination of the symptoms, without bringing in too many assumptions about the problem. Derman then says that this expert problem persists in many of us who do not step back and question our assumptions from time to time. If you ask an active fund manager who has beaten the market for some number of consecutive years, he will attribute his success to skill rather than luck or extraneous factors. Similarly, a passive fund manager might scorn arbitrageurs, who breathe the inefficient-market hypothesis day in and day out. I remember something from Twyla Tharp's book, "The Creative Habit", that goes like this: one should always consider oneself inexperienced in whatever one pursues. Putting in X years, managing X people, executing X ideas, doing X surgeries should never make one feel like an expert, because once you think you are an expert, you will develop some amount of fear, be it financial, psychological, social or intellectual, whereas inexperience erases fear. One must constantly question the assumptions in a model so that one does not risk becoming an "expert" in one specific area or one specific type of modeling.
This also means that if you are a P-type quant (buy side), it might make sense to develop a Q-type quant model (sell side) and vice versa, the reason being that the kind of modeling on the two sides is different and you never know which model might become useful in a particular time frame.

The Absolute

Derman talks about the nature of theories by illustrating Baruch Spinoza's analysis of human emotions. Spinoza's theory is built like Euclid's geometry: Euclid starts from a few axioms and develops the entire structure of geometry, and Spinoza does the same with human emotions. Spinoza's logical structure rests on three primitives, pain, pleasure and desire, from which derivative emotions like pity and cruelty are built.

In one of the blog entries, Derman gives a beautiful summary of the Spinoza theory:

That thing is called free which exists from the necessity of its nature alone, and is determined to act by itself alone. But a thing is called necessary, or rather compelled, which is determined by another to exist and to produce an effect in a certain and determinate nature.

Underlyers <==> free. Derivatives <==> compelled. Derivatives suffer passions; underlyers have actions. Try to be an underlyer, not a derivative.

What's the relevance to the context of the book? Well, as one can see, Spinoza's theory does not say "something is like something else". It states things directly and shows the way the world IS, or in this case the way the emotions ARE. There is also something one can learn from Spinoza about extending our inadequate knowledge: he held that the ways to do so were a) via particulars, b) via generalities and c) via intuition. The author gives a set of examples showing that most breakthrough theories in science arose from the intuition the scientists had about their discoveries; only once the intuition was firmly in place was the mathematical framework used to show it to the world.

The Sublime

Derman gives a whirlwind tour of electromagnetism and the importance of Maxwell's intuition in the development of the entire theory. Along the way Derman drives home the point that Maxwell's equations ARE electromagnetism; the theory becomes indistinguishable from the object itself. Derman also compares "renormalization" in physics with "recalibration" in the quant's world and says the two are completely different, though they seem to connote the same thing. In physics the normal and the abnormal are governed by the same laws, whereas in markets the normal is normal only while people behave conventionally; in a crisis, people's behavior changes and the normal model fails. While quantum electrodynamics is a genuine theory of reality, financial models are only mediocre metaphors for a part of it.

This chapter is the core of the book, where Derman relegates almost all financial models to the status of imperfect models and says financial modeling is not the physics of markets. Just as there are theorems in mathematics and laws in physics, one tends to use phrases such as "the fundamental theorem of finance", which actually makes no sense. Derman raises a critical question: how can a field whose focus is the management of money and assets possess a theorem, when physics, a field that deals with the real world, lacks a "fundamental theorem" of its own?

The basic thing every financial modeler tries to get a grip on is the "value" of a security. Unlike physics, where a fundamental particle's mass or charge is absolute, there is nothing absolute about the value of a financial security: value is determined by people, and people change their minds. So it does appear that whatever value equity-research people, technical analysts and quants come up with, it is all a big fraud.

Derman rips apart the EMM/CAPM and raises the critical question: is the EMM a theory or a model? He concludes that the EMM is not a theory; it is not even a good model, it is an ineffectual one. No financial theory can dictate what return an investor should expect in exchange for taking risk; that depends on appetite and varies with time. CAPM is a useful way of thinking about a model world that is quite often far from the world we live in. Derman, on the other hand, does not criticize the Black-Scholes model, and says it is the closest to robustness that any model in finance has been. Why? The replication argument says that an option can be replicated by a mixture of the stock and a riskless bond, so their risk premia must agree, an argument that is robust even though one can debate the computation of the risk premium.

The chapter concludes with a reference to the movie Bedazzled, where the protagonist fails to woo the waitress in all seven scenarios he wishes for; the devil outwits him every time. Derman concludes:

The difficulty of the hopeful would-be lover is the same difficulty we face when specifying future scenarios in financial models; like the devil, markets eventually outwit us. The devil is indeed in the details. Even if the markets are not strictly random, their vagaries are too rich to capture in a few short sentences or equations.

Breaking the cycle

In the final chapter of the book, Derman comes to the rescue of models and says that they are useful in finance because:

• Models facilitate interpolation, however rudimentary the form
• Models transform intuition into dollar value (for example, the implied volatility of an option)
• Models are used to rank securities by value

And the right way to use models is by following a few rules:

• Avoid Axiomatization
• Good models are vulgar in a sophisticated way
• Sweep Dirt under the Rug, but Let Users Know about it
• Use imagination
• Think of Models as Gedankenexperiments
• Beware of Idolatry

The book ends with Financial Modeler’s Manifesto, an ethical declaration for scientists applying their skills to finance.

The Modeler’s Hippocratic Oath

• I will remember that I didn’t make the world, and it doesn’t satisfy my equations.
• Though I will use models boldly to estimate value, I will not be overly impressed by mathematics.
• I will never sacrifice reality for elegance without explaining why I have done so.
• Nor will I give the people who use my model false comfort about its accuracy. Instead, I will make explicit its assumptions and oversights.
• I understand that my work may have enormous effects on society and the economy, many of them beyond my comprehension.

This book is a must-read for financial modelers, as it shows that whatever model one comes up with is just a metaphor. If one keeps this in mind while developing models, the end result is at least likely to be a vulgar but usable and practical model instead of a mathematically elegant but totally useless one.