Takeaway:

This is a beautiful book that describes the math behind queueing systems. One learns a ton of math tools from it that can be used to analyze any system that has a queueing structure within it. The author presents the material in a highly enthusiastic tone and with superb clarity. I thoroughly enjoyed going through the book.


In the last few decades, enormous computational speed has become accessible to many. A modern desktop has enough memory and processing speed to let a data analyst compute probabilities and perform statistical inference by writing computer programs. In such a context, this book can serve as a starting point for anyone who wishes to explore the subject of computational probability. The book has 21 puzzles that can be solved via simulation.

Solving a puzzle has its own advantages. Give a dataset with one dependent variable and a set of predictors to a dozen people and ask them to fit a regression model; I bet that you will see at least a dozen models, each of which could be argued to be plausible. Puzzles are different. The constraints placed around the problem force you to arrive at the ONE RIGHT solution. In doing so, you develop much more sophisticated thinking skills.

In the introductory chapter of the book, the author provides a basic framework for computational probability by showing ways to simulate and compute probabilities. This chapter gives the reader all the ammunition required to solve the various puzzles of the book. The author provides detailed solutions, including the relevant MATLAB code, to all 21 puzzles.
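To make the "simulate and compare with the exact answer" idea concrete, here is a minimal sketch in Python (the book uses MATLAB and my own solutions are in R, so this is only an illustration). The birthday problem is my choice of example and is not one of the book's 21 puzzles; the function names are made up for this post.

```python
import random


def exact_shared_birthday(people: int, days: int = 365) -> float:
    """Exact probability that at least two of `people` share a birthday."""
    p_all_distinct = 1.0
    for k in range(people):
        p_all_distinct *= (days - k) / days
    return 1.0 - p_all_distinct


def simulated_shared_birthday(people: int, trials: int,
                              rng: random.Random, days: int = 365) -> float:
    """Monte Carlo estimate of the same probability."""
    hits = 0
    for _ in range(trials):
        birthdays = [rng.randrange(days) for _ in range(people)]
        # A repeated birthday shrinks the set below the group size.
        if len(set(birthdays)) < people:
            hits += 1
    return hits / trials


rng = random.Random(0)  # fixed seed so the run is repeatable
exact = exact_shared_birthday(23)
estimate = simulated_shared_birthday(23, 20_000, rng)
print(f"exact = {exact:.4f}, simulated = {estimate:.4f}")
```

The estimate wanders around the exact value from run to run; increasing `trials` narrows the gap, which is the whole appeal of simulation when no closed form is available.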

Some of my favorite puzzles from the book that are enlightening as well as paradoxical are:

  • The Gamow-Stern Elevator
  • The Pipe Smoker’s Discovery
  • A Toilet Paper Dilemma
  • Parrondo’s Paradox
  • How Long Is the Wait to Get the Potato Salad?
  • The Appeals Court Paradox

Here is the link to my document that fleshes out the details of all the 21 puzzles in the book:

What’s in the above document?

I have written R code that aims to computationally solve each of the puzzles in the book. For each puzzle, there are two subsections. The first subsection spells out my attempt at solving the puzzle. The second subsection contains what I learned from reading through the solution given by the author. The author provides MATLAB code so detailed that even someone with absolutely no exposure to MATLAB can follow the logic. In many cases I found that the code snippets in the book looked like elaborate pseudocode. There are many good references mentioned for each of the puzzles so that interested readers can explore further aspects. In most cases, the reader will realize that closed-form solutions are extremely tedious to derive and that simulation-based procedures make it easy to obtain solutions to many intractable problems.


In a book that has about 350 pages, the first 250-odd pages are devoted to probability, ODEs and difference equations. The last part of the book covers queueing theory for specific systems, i.e., Poisson arrivals and exponential service times with one or more servers. The most painful thing about this book is that there are innumerable typos. A book that is riddled with typos on almost every other page cannot be an appealing text for an undergrad. My guess is that this book will never make it to an undergrad’s study table unless the authors make a serious effort to publish an errata or come up with a better version of the book. Is there anything good about the book at all? Well, maybe the chapter on difference equations is worth going over just once. On second thought, I think that the first 250 pages of the book can be rewritten concisely so that they can be pushed to the appendix. That leaves the last 100 pages of the book, which read more like a cheat sheet than a book from which one can really learn something. This book desperately needs a rewrite from the authors, else it is going to languish alongside the books that die silently every year.


Every year at least a dozen pop math/stat books get published. Most of them try to illustrate a variety of mathematical/statistical principles using analogies/anecdotes/stories that are easy to understand. It is safe to assume that the authors of these books spend a considerable amount of time thinking about the apt analogies to use, those that are not too taxing on the reader but at the same time put across the key idea. I tend to read at least one pop math/stat book a year to whet my “analogy appetite”. It is one thing to write an equation about some principle and a completely different thing to be able to explain a math concept to somebody. Books such as these help in building one’s “analogy” database so that one can start seeing far more things from a math perspective. The author of this book, Jordan Ellenberg, is a math professor at the University of Wisconsin-Madison and writes a math column for “Slate”. The book is about 450-odd pages and gives a ton of analogies. In this post, I will try to list the analogies and some points made in the context of several mathematical principles illustrated in the book.

  • Survivorship bias
    • Abraham Wald’s logic of placing armor on engines that had no bullet holes
    • Mutual funds performance over a long period
    • Baltimore stockbroker parable
  • Linearity Vs. Nonlinear behavior
    • Laffer curve
  • Notion of limits in Calculus
    • Zeno’s Paradox
    • Augustin-Louis Cauchy and his work on summing infinite series
  • Regression
    • Will all Americans become obese? The dangers of extrapolation
    • Galton Vs. Secrist – “Regression towards mediocrity” was observed in the data by both, but they had different explanations. Secrist remained in the dark and attributed mediocrity to whatever he felt like. Secrist thought the regression he painstakingly documented was a new law of business physics, something that would bring more certainty and rigor to the scientific study of commerce. But it was just the opposite. Galton, on the other hand, was a mathematician and hence rightly showed that in the presence of a random effect, regression towards the mean is a necessary fact. Wherever there is a random fluctuation, one observes regression towards the mean, be it mutual funds, performance of sportsmen, mood swings etc.
    • Correlation is non-transitive. Karl Pearson’s idea of using geometry makes it easy to prove.
    • Berkson’s fallacy – Why handsome men are jerks? Why popular novels are terrible?
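Galton's point about regression towards the mean is easy to check by simulation. The following is a minimal sketch under my own assumptions (it is not from the book): every performer has a fixed skill, and each year's observed result is skill plus independent luck of the same size. The variable names and the 10% cutoff are illustrative.

```python
import random
import statistics

rng = random.Random(1)  # fixed seed so the run is repeatable
n = 10_000

# Assumed model: observed result = persistent skill + independent yearly luck.
skill = [rng.gauss(0, 1) for _ in range(n)]
year1 = [s + rng.gauss(0, 1) for s in skill]
year2 = [s + rng.gauss(0, 1) for s in skill]

# Pick the top 10% of performers based on year 1 alone.
cutoff = sorted(year1)[int(0.9 * n)]
top = [i for i in range(n) if year1[i] >= cutoff]

top_year1 = statistics.mean(year1[i] for i in top)
top_year2 = statistics.mean(year2[i] for i in top)
print(f"top decile: year 1 mean = {top_year1:.2f}, year 2 mean = {top_year2:.2f}")
```

The year-1 stars are still above average in year 2 (their skill is real), but they fall back towards the mean because part of what got them selected was luck. No new "law of business physics" is needed to explain it.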


  • Law of Large numbers
    • Small school vs. Large school performance comparison
  • Partially ordered sets
    • Comparing disasters in human history
  • Hypothesis testing + “P value” + Type I error (seeing a pattern where there is none) + Type II error (missing a pattern when there is one)
    • Experimental data from a dead-fish fMRI measurement: a dead fish has the ability to correctly assess the emotions that people in pictures display. An insane conclusion that passes statistical tests.
    • Torah dataset (a 304,805-letter document) used by a group of researchers to find hidden meanings beneath the stories, genealogies and admonitions. Dangers of data mining.
    • Underpowered test: Using binoculars to detect moons around Mars
    • Overpowered test: If you study a large enough sample, you are bound to reject the null, as your dataset will enable you to see ever-smaller effects. Just because you can detect them doesn’t mean they matter.
    • “Hot hand” in basketball: If you ask the right question, it is difficult to detect the effect statistically. The right question isn’t “Do basketball players sometimes temporarily get better or worse at making shots?”, the kind of yes/no question a significance test addresses. {Null: no “hot hand”, Alternate: “hot hand”} is an underpowered test. The right question is “How much does their ability vary with time, and to what extent can observers detect in real time whether a player is hot?” This is a tough question.
    • Skinner rejected the hypothesis that Shakespeare did not alliterate!
    • Null Hypothesis Significance Testing (NHST) is a fuzzy version of “proof by contradiction”
    • Testing whether a set of stars in one corner of a constellation (Taurus) is grouped together by chance
    • Parable by Cosma Shalizi: Examining the livers of sheep to predict future events. A very funny way to describe what’s going on with the published papers in many journals
    • John Ioannidis’s research paper “Why Most Published Research Findings Are False”
    • Tests of genetic association with disease – awash with false positives
    • Example of a low-powered study: A paper in Psychological Science (a premier journal) concluded that “Married women were more likely to support Mitt Romney when they were in the fertile portion of their ovulatory cycle”!
    • A low-powered study is only going to be able to see a pretty big effect. But sometimes you know that the effect, if it exists, is small. In other words, a study that accurately measures the effect of a gene is likely to be rejected as statistically insignificant, while any result that passes the p-value test is either a false positive or a true positive that massively overstates the effect
    • Uri Simonsohn, a professor at Penn, brilliantly summarizes the problem of replicability as “p-hacking” (somehow getting the p-value to the 0.05 level that enables one to publish papers)
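The point about low-powered studies overstating effects can be seen in a small simulation. This is a minimal sketch under assumptions of my own, not an example from the book: a true effect of 0.1 standard deviations, 20 subjects per study, known unit variance, and a two-sided z-test at the 0.05 level.

```python
import math
import random
import statistics

rng = random.Random(2)  # fixed seed so the run is repeatable

TRUE_EFFECT = 0.1   # hypothetical small effect, in standard-deviation units
SAMPLE_SIZE = 20    # hypothetical number of subjects per study
STUDIES = 20_000

standard_error = 1.0 / math.sqrt(SAMPLE_SIZE)
threshold = 1.96 * standard_error  # |estimate| beyond this gives p < 0.05

# The mean of SAMPLE_SIZE unit-variance normal observations is itself normal
# with this standard error, so each study's estimate is drawn directly.
estimates = [rng.gauss(TRUE_EFFECT, standard_error) for _ in range(STUDIES)]
significant = [e for e in estimates if abs(e) > threshold]

power = len(significant) / STUDIES
mean_significant_size = statistics.mean(abs(e) for e in significant)
exaggeration = mean_significant_size / TRUE_EFFECT

print(f"share of studies reaching p < 0.05: {power:.3f}")
print(f"significant estimates overstate the effect by a factor of about {exaggeration:.1f}")
```

Only a small share of the studies clear the bar, and every one that does reports an effect several times larger than the truth, simply because nothing smaller than the threshold can pass the filter.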


    • In 2013, the Association for Psychological Science announced that it would start publishing a new genre of articles, called Registered Replication Reports. These reports, aimed at reproducing the effects reported in widely cited studies, are treated differently from usual papers in a crucial way: the proposed experiment is accepted for publication before the study is carried out. If the outcomes support the initial finding, great news, but if not they are published anyway so that the whole community can know the full state of the evidence.
  • Utility of Randomness in math
    • “Bounded gaps” conjecture: Is there a bound for the gap between two primes? Primes get rarer and rarer as we chug along the integer axis. Then what causes the gap to be bounded?
    • How many twin primes are there in the first N numbers (Among first N numbers, about N/log N are prime)?
    • Mysteries of prime numbers need new mathematical ideas that structure the concept of structurelessness itself
  • How to explain “Logarithm” to a kid? The logarithm of a positive integer can be thought of as, roughly, the number of digits in that integer.
  • Forecast performance
    • Short-term weather forecasts have become a possibility, given the explosion of computing power and big data. However, any forecast beyond 2 weeks is dicey. On the other hand, some problems yield more accurate forecasts the more data and computing power you have, such as predicting the course of an asteroid. Whatever domain you work in, you need to consider where it lies between these two examples, i.e. one where big data + computing power helps, and one where big data + computing power + whatever else is needed does not get you any meaningful forecast beyond the short term.
  • Recommendation Algorithms
    • After decades of being fed with browsing data, recommendations for almost all the popular sites suck
    • Netflix Prize, an example that is used by many modern Machine Learning 101 courses. It took 3 years of community hacking to improve the recommendation algo. Sadly the algo was not put to use by Netflix. The world moved on in three years and Netflix was streaming movies online, which makes dud recommendations less of a big deal.
  • Bayes theorem
    • Which Facebook users are likely to be involved in terrorist activities? Suppose Facebook assigns a probability that each of its users is associated with terrorist activities. The following two questions have vastly different answers. You need to be careful about what you are asking.
      1. What is the chance that a person gets put on Facebook’s list, given that they are not a terrorist?
      2. What is the chance that a person is not a terrorist, given that they are on Facebook’s list?
    • Why must one go Bayes? P(Data | Null) is what a frequentist answers; P(Null | Data) is what a Bayesian answers
    • Are Roulette wheels biased? Use priors and experimental data to verify the same
  • Expected Value
    • Lottery ticket pricing
    • Cash WinFall: How did a few groups hijack the Massachusetts State Lottery? Link: Boston Globe, which explains why it turned out to be a private lottery.
    • Use the additivity law of expectation to solve Buffon’s Needle problem
  • Utility curve
    • If you miss your flight, how to quantify your annoyance level?
    • Utility of dollars earned for a guy moonlighting is different from that of a tenured professor
    • St Petersburg paradox
  • Error correction coding, Hamming code, Hamming distance, Shannon’s work:
    • Reducing variance of loss in the Cash WinFall lottery: Choosing the random numbers with less variance is a computationally expensive problem if brute force is used. Information theory and projective geometry could be the basis on which the successful MIT group generated random numbers that had less variance while betting.
    • Bertillon’s card system to identify criminals and Galton’s idea that redundancy in the card can be quantified were formalized by Shannon, who showed that the correlation between variables reduces the informativeness of a card
  • Condorcet Paradox
    • Deciding a three way election is riddled with many issues. There is no such thing as the public response. Electoral process defines the public response and makes peace with the many paradoxes that are inherent in deciding the public response.
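The Condorcet paradox is easy to reproduce with three voters. The ballots below are a hypothetical textbook-style example of my own, not data from the book, and the helper names are made up for illustration. Each ballot ranks the candidates from most to least preferred.

```python
# Hypothetical ballots: ranking (best first) -> number of voters holding it.
ballots = {
    ("A", "B", "C"): 1,
    ("B", "C", "A"): 1,
    ("C", "A", "B"): 1,
}


def pairwise_winner(x, y, ballots):
    """Return whichever of x, y a majority ranks higher, or None on a tie."""
    x_votes = sum(count for ranking, count in ballots.items()
                  if ranking.index(x) < ranking.index(y))
    y_votes = sum(ballots.values()) - x_votes
    if x_votes == y_votes:
        return None
    return x if x_votes > y_votes else y


def condorcet_winner(ballots):
    """Return the candidate who beats every other head-to-head, if any."""
    candidates = sorted(next(iter(ballots)))
    for c in candidates:
        if all(pairwise_winner(c, other, ballots) == c
               for other in candidates if other != c):
            return c
    return None


for x, y in [("A", "B"), ("B", "C"), ("A", "C")]:
    print(f"{x} vs {y}: majority prefers {pairwise_winner(x, y, ballots)}")
print("Condorcet winner:", condorcet_winner(ballots))
```

A beats B, B beats C, and yet C beats A, so majority preference runs in a circle and no candidate wins every head-to-head contest. Whatever "the public response" is, it depends on the rule used to extract it.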

Quotes from the book:

  • Knowing mathematics is like wearing a pair of X-ray specs that reveal hidden structures underneath the messy and chaotic surface of the world
  • Mathematics is the extension of common sense. Without the rigorous structure that math provides, common sense can lead you astray. Formal mathematics without common sense would turn math computations into a sterile exercise.
  • It is pretty hard to understand mathematics without doing mathematics. There is no royal road to any field of math. Getting your hands dirty is a prerequisite
  • People who go into mathematics for fame and glory don’t stay in mathematics for long
  • Just because we can assign whatever meaning we like to a string of mathematical symbols doesn’t mean we should. In math, as in life, there are good choices and there are bad ones. In the mathematical context, the good choices are the ones that settle unnecessary perplexities without creating new ones
  • We have to teach math that values precise answers but also intelligent approximation, that demands the ability to deploy existing algorithms fluently but also the horse sense to work things out on the fly, that mixes rigidity with a sense of play. If we don’t teach it that way, we are not teaching mathematics at all.
  • Fields Medalist David Mumford: Dispense with plane geometry entirely from the syllabus and replace it with a first course in programming.
  • “Statistically noticeable” / “Statistically detectable” is a better term than “Statistically significant”. This should be the first statement drilled into any newbie taking a stats101 course.
  • If gambling is exciting, you are doing it wrong – A powerful maxim applicable for people looking for investment opportunities too. Hot stocks provide excitement and most of the times that is all they do.
  • It is tempting to think of “very improbable” as meaning “essentially impossible”. Sadly NHST makes us infer based on “very improbable observation”. One good reason why Bayes is priceless in this aspect
  • One of the most painful aspects of teaching mathematics is seeing my students damaged by the cult of the genius. That cult tells students that it’s not worth doing math unless you’re the best at math—because those special few are the only ones whose contributions really count. We don’t treat any other subject that way. I’ve never heard a student say, "I like ‘Hamlet,’ but I don’t really belong in AP English—that child who sits in the front row knows half the plays by heart, and he started reading Shakespeare when he was 7!" Basketball players don’t quit just because one of their teammates outshines them. But I see promising young mathematicians quit every year because someone in their range of vision is "ahead" of them. And losing mathematicians isn’t the only problem. We need more math majors who don’t become mathematicians—more math-major doctors, more math-major high-school teachers, more math-major CEOs, more math-major senators. But we won’t get there until we dump the stereotype that math is worthwhile only for child geniuses

The book ends with a quote from Samuel Beckett




We see/hear/talk about “Information” in many contexts. In the last two decades or so, one can also go and make a career in the field of “Information” technology. But what is “Information”? If someone talks about a certain subject for 10 minutes in English and 10 minutes in French, is the “Information” the same in both instances? Can we quantify the two instances in some way? This book explains Claude Shannon’s remarkable achievement of measuring “Information” in terms of probabilities. Almost 50 years ago, Shannon laid out a mathematical framework, and it was an open challenge for engineers to develop the devices and technologies that Shannon had proved possible as a “mathematical certainty”. This book distils the main ideas that go into quantifying information with very little math and hence makes it accessible to a wider audience. A must read if you are curious to know a bit about “Information”, which has become a part of everyday vocabulary.
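To give a flavour of what "measuring information in terms of probabilities" means, here is a minimal sketch of Shannon entropy in Python. The function names and the toy messages are mine, not the book's, and the character-frequency estimate is a crude stand-in for a real language model.

```python
import math
from collections import Counter


def entropy_bits(probabilities):
    """Shannon entropy H = -sum(p * log2(p)), in bits per symbol."""
    return -sum(p * math.log2(p) for p in probabilities if p > 0)


def empirical_entropy_bits(message):
    """Entropy of the symbol frequencies observed in `message`."""
    counts = Counter(message)
    total = len(message)
    return entropy_bits(count / total for count in counts.values())


print(entropy_bits([0.5, 0.5]))          # fair coin: 1 bit per toss
print(entropy_bits([0.9, 0.1]))          # biased coin: less than 1 bit
print(empirical_entropy_bits("aabb"))    # two equally frequent symbols
print(empirical_entropy_bits("abcdabcd"))
```

A fair coin carries one full bit per toss, while a heavily biased coin carries less because its outcome is mostly predictable. The same calculation applied to symbol frequencies is one rough way to compare how much "Information" per symbol two messages carry.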



Takeaway:

I think this book needs to be read after gaining some understanding of the BUGS software and some R/S programming skills. That familiarity can help you simulate and check for yourself the various results and graphs that the author uses to illustrate Bayesian concepts. The book starts by explaining the essence of any econometric model and the way in which an econometrician has to put in assumptions to obtain the posterior distribution of various parameters. The core of the book is covered in three chapters, the first two chapters covering model estimation and model checking, and the fourth chapter of the book covering MCMC techniques. The rest of the chapters cover linear models, nonlinear models and time series models. There are two chapters, one on panel data and one on instrumental variables, that are essential for a practicing econometrician tackling the problem of endogenous variables. BUGS code for all the models explained in the book is given in the appendix, and hence the book can serve as a quick reference for BUGS syntax. Overall, a self-contained book and a perfect book with which to start on the Bayesian econometric analysis journey.


The author is a CS professor at SUNY, Stony Brook. This book recounts his experience of building a mathematical system to bet on the outcomes of what is considered the fastest ball game in the world, “Jai alai”. In the English vernacular this is sometimes spelled as it sounds, that is, “hi-li”. The book recounts the history of the game and how it made its way to the US from Spain and France. However, the focus of the book is on using mathematical modeling and computers to analyze the game and design a betting system. The game itself is designed in such a way that it is a textbook case for mathematical analysis. The players enter the competition based on a FIFO queue, and the first player to score 7 points is the winner. It takes hardly a few minutes to understand the game from this wiki.

With the help of some of his grad students, the author works on the following questions :

  • Given that a player starts in a specific position, what is the probability that he ends up in a Win/Place/Show?
  • What are the best combinations of numbers that have the highest probability of winning a Trifecta?
  • How does one build a statistical model to evaluate the relative skills of the players?
  • Given that two players A and B have probabilities of winning pa and pb, how does one construct a model that evaluates the probability of A winning over B?
  • How does one create a payoff model for the various bets that are allowed in the game?
  • How do you deal with missing / corrupt data?
  • Given 1) the payoffs of various bets, 2) the probabilities of a player winning from a specific position, and 3) the relative skill sets, how does one combine all of these elements to create a betting strategy?
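The first question, how the starting position affects the chance of winning, can be explored with a small simulation. What follows is a minimal sketch and not the author's model: I assume 8 equally skilled players, every point is a coin flip, the winner of a point stays on court, the loser goes to the back of the queue, and the first to 7 points wins. The optional `double_after` parameter is my own hook for scoring variants in which later points count double.

```python
import random
from collections import deque
from typing import Optional


def play_game(rng: random.Random, players: int = 8, target: int = 7,
              double_after: Optional[int] = None) -> int:
    """Play one game and return the winning post position (1-based)."""
    queue = deque(range(1, players + 1))
    scores = {post: 0 for post in queue}
    champion = queue.popleft()
    points_played = 0
    while True:
        challenger = queue.popleft()
        doubled = double_after is not None and points_played >= double_after
        value = 2 if doubled else 1
        # Equal-skill assumption: each point is a fair coin flip.
        if rng.random() < 0.5:
            winner, loser = champion, challenger
        else:
            winner, loser = challenger, champion
        scores[winner] += value
        points_played += 1
        if scores[winner] >= target:
            return winner
        queue.append(loser)   # loser rejoins at the back of the FIFO queue
        champion = winner     # winner stays on court


def win_probabilities(trials: int, seed: int = 0) -> dict:
    """Estimate the win probability of each post position."""
    rng = random.Random(seed)
    wins = {post: 0 for post in range(1, 9)}
    for _ in range(trials):
        wins[play_game(rng)] += 1
    return {post: count / trials for post, count in wins.items()}


probs = win_probabilities(20_000)
for post, p in probs.items():
    print(f"post {post}: {p:.3f}")
```

Even with identical players, the estimated probabilities are not uniform across post positions, because the early positions get to play more points before anyone else can reach 7. That structural bias is what makes the game such a clean modeling exercise.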

I have just outlined a few of the questions from the entire book. There are numerous side discussions that make the book a very interesting read. Here is one of the many examples from the book that I found interesting:

Almost every person who learns to do simulation comes across the linear congruential generator (LCG), one of the basic number-theoretic techniques for generating pseudorandom numbers. It has the following recursive form:

$$X_{k+1} = (a X_k + c) \bmod n$$

By choosing appropriate values for a, c and n, one can generate pseudorandom numbers.

The book connects the above recursive form to a roulette wheel :

Why do casinos and their patrons trust that roulette wheels generate random numbers? Why can’t the fellow in charge of rolling the ball learn to throw it so it always lands in the double-zero slot? The reason is that the ball always travels a very long path around the edge of the wheel before falling, but the final slot depends upon the exact length of the entire path. Even a very slight difference in initial ball speed means the ball will land in a completely different slot.

So how can we exploit this idea to generate pseudorandom numbers? A big number (corresponding to the circumference of the wheel) times a big number (the number of trips made around the wheel before the ball comes to rest) yields a very big number (the total distance that the ball travels). Adding this distance to the starting point (the release point of the ball) determines exactly where the ball will end up. Taking the remainder of this total with respect to the wheel circumference determines the final position of the ball by subtracting all the loops made around the wheel by the ball.

The above analogy makes the appearance of the mod operator in the LCG equation obvious.
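A linear congruential generator takes only a few lines of code. The sketch below implements the recursion "next = (a * current + c) mod n" as a Python generator. The default constants are a commonly cited set from the numerical-computing literature and are my choice, not something taken from the book; the tiny a = 5, c = 3, n = 16 example is there only so the arithmetic can be checked by hand.

```python
from itertools import islice


def lcg(seed: int, a: int = 1664525, c: int = 1013904223, n: int = 2**32):
    """Yield the sequence x -> (a * x + c) mod n, starting after `seed`."""
    x = seed
    while True:
        # Multiply (many trips around the wheel), add the offset,
        # then keep only the remainder (the final slot on the wheel).
        x = (a * x + c) % n
        yield x


# Hand-checkable toy example: 7 -> 38 mod 16 = 6 -> 33 mod 16 = 1 -> 8 -> 43 mod 16 = 11
small = list(islice(lcg(7, a=5, c=3, n=16), 4))
print(small)  # [6, 1, 8, 11]

# Scale the raw integers to [0, 1) to use them like uniform random numbers.
uniforms = [x / 2**32 for x in islice(lcg(seed=42), 5)]
print(uniforms)
```

The toy generator visits all 16 residues before repeating, which shows both the appeal and the limitation of the method: the output looks scrambled, but it is completely determined by the seed and cycles after at most n steps.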

One does not need to know much about Jai-alai to appreciate the modeling aspects of the game and statistical techniques mentioned in the book. In fact this book is a classic story of how one goes about modeling a real life scenario and profiting from it.


With total silence around me and my mind wanting to immerse in a book, I picked up this book from my inventory. I came across a reference to this work in Aaron Brown’s book on Risk Management.

First something about the cover:

The young woman on the right is the classical Goddess Fortuna, whom today we might call Lady Luck. The young man on the left is Chance. Fortuna is holding an enormous bunch of fruits, symbolizing the good luck that she can bring. But notice that she has only one sandal. That means that she can also bring bad luck. And she is sitting on a soap bubble! This is to indicate that what you get from luck does not last. Chance is holding lottery tickets. Dosso Dossi was a court painter in the northern Italian city of Ferrara, which is near Venice . Venice had recently introduced a state lottery to raise money. It was not so different from modern state-run lotteries, except that Venice gave you better odds than any state-run lottery today. Art critics say that Dosso Dossi believed that life is a lottery for everyone. Do you agree that life is a lottery for everyone? The painting is in the J. Paul Getty Museum, Los Angeles, and the above note is adapted from notes for a Dossi exhibit, 1999.

The chapter starts with a set of 7 questions, and it is suggested that readers solve them before proceeding with the book.


The first chapter deals with some basic terminology that logicians use. The following terms are defined and examples are given to explain each of them in detail:

  • Argument: A point or series of reasons presented to support a proposition which is the conclusion of the argument.
  • Premises + Conclusion: An argument can be divided in to premises and a conclusion.
  • Propositions: Premises and conclusion are propositions, statements that can be either true or false.
  • Validity of an argument: Validity has to do with the logical connection between premises and conclusion, and not with the truth of the premises or the conclusion. An argument is invalid if it is possible for all the premises to be true while the conclusion is false; a valid argument never leads from true premises to a false conclusion.
  • Soundness of an argument: Soundness for deductive logic has to do with both validity and the truth of the premises.
  • Validity vs. Truth: Validity is not truth. Logic takes the premises as given and checks whether the conclusion follows from them. If the premises are false, the reasoning can still be valid, but the conclusion need not be true.

Logic is concerned only with the reasoning. Given the premises, it can tell you whether the conclusion follows from them or not. It cannot say anything about the veracity of the premises. Hence there are two ways to criticize a deduction: 1) A premise is false, 2) The argument is invalid. So there is a division of labor. Who is an expert on the truth of premises? Detectives, nurses, surgeons, pollsters, historians, astrologers, zoologists, investigative reporters, you and me. Who is an expert on validity? A logician.

The takeaway of the chapter is that valid arguments are risk-free arguments, i.e. given true premises, you are guaranteed a true conclusion.
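For simple propositional arguments, validity can be checked mechanically: enumerate every assignment of truth values and look for a case where all premises are true and the conclusion is false. The following is a minimal sketch of that idea; the function names are mine, and it only covers arguments built from a handful of true/false propositions, not the richer examples in the book.

```python
from itertools import product


def implies(a: bool, b: bool) -> bool:
    """Material implication: 'if a then b'."""
    return (not a) or b


def is_valid(premises, conclusion, variables) -> bool:
    """Valid means: no assignment makes all premises true and the conclusion false."""
    for values in product([True, False], repeat=len(variables)):
        env = dict(zip(variables, values))
        if all(premise(env) for premise in premises) and not conclusion(env):
            return False  # found a counterexample
    return True


# Modus ponens: "if p then q; p; therefore q".
modus_ponens = is_valid(
    premises=[lambda e: implies(e["p"], e["q"]), lambda e: e["p"]],
    conclusion=lambda e: e["q"],
    variables=["p", "q"],
)

# Affirming the consequent: "if p then q; q; therefore p".
affirming_consequent = is_valid(
    premises=[lambda e: implies(e["p"], e["q"]), lambda e: e["q"]],
    conclusion=lambda e: e["p"],
    variables=["p", "q"],
)

print("modus ponens valid:", modus_ponens)
print("affirming the consequent valid:", affirming_consequent)
```

Notice that the checker never asks whether p or q is actually true in the world. That is exactly the division of labor described above: validity is about the form of the reasoning, and the truth of the premises is somebody else's job.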

Inductive Logic

The chapter introduces risky-arguments and inductive logic as a mechanism for reasoning. Valid arguments are risk-free arguments. A risky argument is one that is very good, yet its conclusion can be false, even when the premises are true. Inductive logic studies risky arguments. There are many forms of risky arguments like making a statement on population from a statement on sample, making a statement of sample from a statement on population, making a statement on a sample based on statement on another sample etc. Not all these statements can be studied via Inductive logic. Also, there may be more to risky arguments than inductive logic. Inductive logic does study risky arguments— but maybe not every kind of risky argument. The terms introduced in this chapter are

  • Inference to the best explanation
  • Risky Argument
  • Inductive Logic
  • Testimony
  • Decision theory

The takeaway of the chapter is that Inductive logic analyzes risky arguments using probability ideas.

The Gambler’s fallacy

This chapter talks about the gambler’s fallacy: a gambler justifies betting on red at a roulette wheel on the grounds that the last X outcomes on the wheel have been black. His premise is that the wheel is fair, but his action contradicts that premise, because he is doubting the independence of outcomes. Informal definitions are given for bias, randomness, complexity and no regularity. Serious thinking about risks, which uses probability models, can go wrong in two very different ways. 1) The model may not represent reality well. That is a mistake about the real world. 2) We can draw wrong conclusions from the model. That is a logical error. Criticizing the model is like challenging the premises. Criticizing the analysis of the model is like challenging the reasoning.
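A quick simulation shows why the gambler's reasoning fails on a fair wheel. This is a minimal sketch under simplifying assumptions of my own: the wheel has only red and black, each with probability 0.5 (the green slots are ignored), and spins are independent. The function name is illustrative.

```python
import random


def red_rate_after_black_run(spins: int, run_length: int, rng: random.Random):
    """Return (share of reds right after a black run, number of such spins)."""
    black_streak = 0
    follow_ups = 0
    reds = 0
    for _ in range(spins):
        is_red = rng.random() < 0.5  # fair, independent spin
        if black_streak >= run_length:
            # This spin comes right after at least `run_length` blacks.
            follow_ups += 1
            reds += is_red
        black_streak = 0 if is_red else black_streak + 1
    return reds / follow_ups, follow_ups


rate, follow_ups = red_rate_after_black_run(400_000, 5, random.Random(3))
print(f"spins that followed 5+ blacks: {follow_ups}")
print(f"share of those spins that came up red: {rate:.3f}")
```

Red shows up about half the time after a long run of blacks, just as it does at any other time. If the gambler really believes red is "due", he is no longer treating the wheel as fair and independent, which was his own premise.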

Elementary Probability Ideas

This chapter introduces some basic ideas of events, ways to compute probability of compound events etc. The chapter also gives an idea of the different terminologies used by statisticians and logicians, though they mean the same thing. Logicians are interested in arguments that go from premises to conclusions. Premises and conclusions are propositions. So, inductive logic textbooks usually talk about the probability of propositions. Most statisticians and most textbooks on probability talk about the probability of events. So there are two languages of probability. Why learn two languages when one will do? Because some students will talk the event language, and others will talk the proposition language. Some students will go on to learn more statistics, and talk the event language. Other students will follow logic, and talk the proposition language. The important thing is to be able to understand anyone who has something useful to say.

Conditional Probability

This chapter gives formulae for computing conditional probabilities. All the conditioning is done for a discrete random variable. Anything more sophisticated than a discrete RV would have alienated non-math readers of the book. A few examples are given to solidify the notions of conditional probability.
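For a discrete sample space, conditional probability can be computed by plain counting, using P(A | B) = P(A and B) / P(B). The example below is a minimal sketch with two fair dice; it is my own illustration, not one of the book's examples, and the helper names are made up.

```python
from fractions import Fraction
from itertools import product

# Sample space: all 36 equally likely outcomes of rolling two fair dice.
outcomes = list(product(range(1, 7), repeat=2))


def prob(event) -> Fraction:
    """Probability of an event given as a true/false function of an outcome."""
    favourable = sum(1 for outcome in outcomes if event(outcome))
    return Fraction(favourable, len(outcomes))


def conditional(a, b) -> Fraction:
    """P(A | B) = P(A and B) / P(B)."""
    return prob(lambda o: a(o) and b(o)) / prob(b)


def sum_is_8(o):
    return o[0] + o[1] == 8


def first_is_even(o):
    return o[0] % 2 == 0


print("P(sum is 8) =", prob(sum_is_8))
print("P(sum is 8 | first die is even) =", conditional(sum_is_8, first_is_even))
```

Knowing that the first die is even changes the probability of a total of 8 from 5/36 to 1/6, which is all that conditioning means: shrink the sample space to the outcomes where B holds, then count again.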

The Basic Rules of Probability & Bayes Rule

Rules of probability such as normality, additivity, total probability and statistical independence are explained via visuals. I think this chapter and the previous three are geared towards a person who is a total novice in probability theory. The book also gives an intuition for Bayes’ rule using elementary examples that anyone can understand. Concepts such as reliability testing are also discussed.
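Bayes' rule combines the total probability rule with the definition of conditional probability. Here is a minimal sketch with hypothetical numbers of my own (they are not from the book): a condition with a 1% base rate, and a test that flags 95% of true cases and 5% of non-cases.

```python
from fractions import Fraction


def posterior(prior, p_evidence_given_h, p_evidence_given_not_h):
    """Bayes' rule: P(H | E) = P(E | H) P(H) / P(E)."""
    # Total probability: E can occur with H or without H.
    p_evidence = (p_evidence_given_h * prior
                  + p_evidence_given_not_h * (1 - prior))
    return p_evidence_given_h * prior / p_evidence


# Hypothetical inputs, chosen only for illustration.
prior = Fraction(1, 100)            # P(H): base rate of the condition
hit_rate = Fraction(95, 100)        # P(E | H): test flags a true case
false_alarm = Fraction(5, 100)      # P(E | not H): test flags a non-case

result = posterior(prior, hit_rate, false_alarm)
print(f"P(H | positive test) = {result} (about {float(result):.3f})")
```

Even with a fairly reliable test, a positive result leaves the probability of the condition at only about 16%, because true cases are rare to begin with. P(E | H) and P(H | E) are very different quantities, and Bayes' rule is the bridge between them.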

How to combine Probabilities and Utilities?

There are three chapters under this section. The chapter on expected value introduces a measure of the utility of a consequence and explores various lottery situations to show that the cards are stacked against every lottery buyer and that the lottery owner always holds an edge. The chapter on maximizing expected value says that one way to choose amongst a set of actions is to choose the one that gives the highest expected value. To compute the expected value, one has to represent the degrees of belief by probabilities and the consequences of action via utiles (they can be converted into equivalent monetary units). Despite the obviousness of the expected value rule, there are a few paradoxes, and those are explored in the chapter, the popular one covered being the Allais Paradox. All these paradoxes have a common message: the expected value rule does not factor in attitudes such as risk aversion and other behavioral biases, and hence might just be a way to define utilities in the first place. So the whole expected value rule is not as watertight as it might seem. Also, there are situations where decision theory cannot be of help. One may disagree about the probability of the consequences; one may also disagree about the utilities (how dangerous or desirable the consequences are). Often there is a disagreement about both probability and utility. Decision theory cannot settle such disagreements. But at least it can analyze the disagreement, so that both parties can see what they are arguing about. The last chapter in this section deals with decision theory. The three decision rules explained in the chapter are 1) Dominance rule 2) Expected value rule 3) Dominant expected value rule. Pascal’s wager is introduced to explain the three decision rules.
The basic framework is to come up with a partition of possible states of affairs, the possible acts that agents can undertake, and the utilities of the consequences of each possible act in each possible state of affairs in the partition.
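The framework of states, acts and utilities translates directly into a small table. The sketch below is my own toy example (an umbrella decision with made-up probabilities and utilities), not something from the book; it shows the expected value rule and a check for dominance, where one act is at least as good in every state and strictly better in at least one.

```python
from fractions import Fraction

# Partition of states with degrees of belief (hypothetical numbers).
state_probs = {"rain": Fraction(3, 10), "dry": Fraction(7, 10)}

# Utility of each act in each state, in utiles (hypothetical numbers).
utilities = {
    "take umbrella":        {"rain": 8, "dry": 6},
    "leave umbrella":       {"rain": 0, "dry": 10},
    "take broken umbrella": {"rain": 0, "dry": 6},
}


def expected_value(act: str) -> Fraction:
    """Probability-weighted average utility of an act."""
    return sum(state_probs[s] * utilities[act][s] for s in state_probs)


def dominates(act_a: str, act_b: str) -> bool:
    """True if act_a is never worse than act_b and strictly better somewhere."""
    never_worse = all(utilities[act_a][s] >= utilities[act_b][s] for s in state_probs)
    better_once = any(utilities[act_a][s] > utilities[act_b][s] for s in state_probs)
    return never_worse and better_once


for act in utilities:
    print(f"{act}: expected value = {float(expected_value(act)):.1f}")

best_act = max(utilities, key=expected_value)
print("best act by expected value:", best_act)
print("umbrella dominates broken umbrella:",
      dominates("take umbrella", "take broken umbrella"))
```

Dominance settles some comparisons without any probabilities at all, while the expected value rule needs both the probabilities and the utilities. Two people who disagree about either input will get different answers, and the table at least makes clear where the disagreement lies.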

Kinds of Probability

What do you mean ?

This chapter brings out the real meaning of the word “probability” and is probably the most important chapter of the book.

  1. This coin is biased toward heads. The probability of getting heads is about 0.6.
  2. It is probable that the dinosaurs were made extinct by a giant asteroid hitting the Earth.
    a. The probability that the dinosaurs were made extinct by a giant asteroid hitting the Earth is very high, about 0.9.
  3. Taking all the evidence into consideration, the probability that the dinosaurs were made extinct by a giant asteroid hitting the Earth is about 90%.
  4. The dinosaurs were made extinct by a giant asteroid hitting the Earth.

Statements (1) and (4) [but not (3)] are similar in one respect. Statement (4), like (1), is either true or false, regardless of what we know about the dinosaurs. If (4) is true, it is because of how the world is, especially what happened at the end of the dinosaur era. If (3) is true, it is not true because of “how the world is,” but because of how well the evidence supports statement (4). If (3) is true, it is because of inductive logic, not because of how the world is. The evidence mentioned in (3) will go back to laws of physics (iridium), geology (the asteroid), geophysics, climatology, and biology. But these special sciences do not explain why (3) is true. Statement (3) states a relation between the evidence provided by these special sciences, and statement (4), about dinosaurs. We cannot do experiments to test (3). Notice that the tests of (1) may involve repeated tosses of the coin. But it makes no sense at all to talk about repeatedly testing (3). Statement (2.a) is different from (3), because it does not mention evidence. Unfortunately, there are at least two ways to understand (2.a). When people say that so and so is probable, they mean that relative to the available evidence, so and so is probable. This is the interpersonal/evidential way. The other way to understand (2.a) is based on a personal sense of belief.

Statement (4) was a proposition about dinosaur extinction; (2) and (3) are about how credible (believable) (4) is. They are about the degree to which someone believes, or should believe, (4). They are about how confident one can or should be in the light of that evidence. The uses of the word probability in statements (2) and (3) are related to ideas such as belief, credibility, confidence and evidence, and the general name used to describe them is “belief-type probability”.

In contrast, the truth of statement (1) seems to have nothing to do with what we believe. We seem to be making a completely factual statement about a material object, namely the coin (and the device for tossing it). We could be simply wrong, whether we know it or not. This might be a fair coin, and we may simply have been misled by the small number of times we tossed it. We are talking about a physical property of the coin, which can be investigated by experiment. The use of probability in (1) is related to ideas such as frequency, propensity and disposition, and the general name used to describe these is “frequency-type probability”.

Belief-type probabilities have been called “epistemic”— from episteme, a Greek word for knowledge. Frequency-type probabilities have been called “aleatory,” from alea, a Latin word for games of chance, which provide clear examples of frequency-type probabilities. These words have never caught on. And it is much easier for most of us to remember plain English words rather than fancy Greek and Latin ones.

Frequency-type probability statements state how the world is. They state, for example, a physical property about a coin and tossing device, or the production practices of Acme and Bolt. Belief-type probability statements express a person’s confidence in a belief, or state the credibility of a conjecture or proposition in the light of evidence.

The takeaway from the chapter is that any statement containing the word “probability” can carry one of two meanings: belief-type or frequency-type. It is important to understand which type of probability is being talked about in any statement.

Theories about Probability

The chapter describes four theories of probability,

  1. Belief type – Personal Probability
  2. Belief type – Logical Probability – Interpersonal /Evidential probability
  3. Frequency type – Limiting frequency based
  4. Frequency type – Propensity based

Probability as Measure of Belief

Personal Probabilities

This chapter explains the way in which degrees of belief can be represented as betting rates or odds ratios. Let’s say my friend and I enter into a bet about an event A, say, “India wins the next cricket world cup”. If I think that India is 3 times more likely to win than to lose, then to translate this belief into a bet, I would invite my friend to take part in a bet where the total stake is Rs 4000. My friend agrees to bet Rs 1000 AGAINST the event, and I take the other side of the bet by offering Rs 3000. Why does this bet match my beliefs? My expected payoff is (1000*3/4)+(-3000*1/4) = 0. My friend’s expected payoff is (-1000*3/4)+(3000*1/4) = 0. Hence from my point of view it is a fair bet. There can be a bet ON the event too. I bet Rs 3000 on the event and my friend is on the other side of the bet with Rs 1000. This is again a fair bet under my belief system, as my expected value is (1000*3/4)+(-3000*1/4) = 0 and my friend’s expected value is (-1000*3/4)+(3000*1/4) = 0. By agreeing to place a bet on or against the event, my friend and I are quantifying MY degree of belief as a betting fraction, i.e. my bet/total stake and my friend’s bet/total stake.

It is important to note that this might not be a fair bet according to my FRIEND’s belief system. He might think that the event “India wins the next cricket world cup” has a 50/50 chance. In that case, if my friend’s belief pans out, he will have an edge betting against the event and will be at a disadvantage betting for the event. Why? In the former case, his expected payoff would be (-1000*1/2)+(3000*1/2) > 0 and in the latter case, it would be (1000*1/2)+(-3000*1/2) < 0. As you can see, a bet in place means that the bet at least matches the belief system of one of the two players. Generalizing this to a market where investors buy and sell securities and there is a market maker, you get the picture that placing bets on securities is an act of quantifying the implicit belief system of the investors. A bookmaker or market maker never quotes fair bets; he always adds a component that keeps him safe, i.e., he doesn’t go bankrupt. The first example I ever came across in the context of pricing financial derivatives was in the book by Baxter and Rennie. Their introductory comments describing arbitrage pricing and expectation pricing set the tone for a beautiful adventure of reading the book.
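The arithmetic of the cricket bet can be checked with a few lines of Python. This is only a sketch of the calculation described above; the function names are my own.

```python
from fractions import Fraction


def expected_payoff(p_event, payoff_if_event, payoff_if_not):
    """Expected payoff of a bet for someone who believes P(event) = p_event."""
    return p_event * payoff_if_event + (1 - p_event) * payoff_if_not


def betting_rate(my_stake, other_stake):
    """My stake as a fraction of the total stake."""
    return Fraction(my_stake, my_stake + other_stake)


my_belief = Fraction(3, 4)       # I think India is 3 times more likely to win
friend_belief = Fraction(1, 2)   # my friend thinks it is 50/50

# I stake Rs 3000 ON the event; my friend stakes Rs 1000 AGAINST it.
print(betting_rate(3000, 1000))                      # 3/4
print(expected_payoff(my_belief, 1000, -3000))       # 0: fair by my beliefs
print(expected_payoff(friend_belief, -1000, 3000))   # 1000: an edge by his beliefs
```

The betting rate of 3/4 is just my degree of belief read off from the stakes, and the same bet that is fair for me looks favourable to someone with a different belief.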

The takeaways of this chapter are: 1) belief cannot be measured exactly, and 2) you can think of artificial randomizers to calibrate degrees of belief.


This chapter explains that betting rates ought to satisfy basic rules of probability. There are three steps to proving this argument,

  1. Personal degrees of belief can be represented by betting rates.
  2. Personal betting rates should be coherent.
  3. A set of betting rates is coherent if and only if it satisfies the basic rules of probability.

Via examples, the chapter shows that any inconsistency in the odds a person quotes for and against an event opens up an arbitrage in the gamble. Hence the betting fractions or odds should satisfy the basic rules of probability.
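A small Python sketch of the sure-loss idea, under my own simplifying assumption that a betting rate p on an event means paying p times the stake for a ticket that returns the full stake if the event happens. The rates used are invented.

```python
from fractions import Fraction


def net_results(rate_on_a, rate_on_not_a, stake=100):
    """Net result of buying a ticket on A and a ticket on not-A at the quoted rates.

    Exactly one of A / not-A occurs, so exactly one ticket pays out the stake.
    """
    cost = (rate_on_a + rate_on_not_a) * stake
    return {"A occurs": stake - cost, "A does not occur": stake - cost}


# Incoherent rates: 3/5 on A and 3/5 on not-A add up to more than 1.
print(net_results(Fraction(3, 5), Fraction(3, 5)))
# Coherent rates: 3/5 and 2/5 add up to exactly 1, so there is no sure loss.
print(net_results(Fraction(3, 5), Fraction(2, 5)))
```

With the incoherent rates the buyer loses 20 whatever happens. If the rates added up to less than 1, the person on the other side of both tickets would be the one facing a sure loss.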

The first systematic theory of personal probability was presented in 1926 by F. P. Ramsey, in a talk he gave to a philosophy club in Cambridge, England. He mentioned that if your betting rates don’t satisfy the basic rules of probability, then you are open to a sure-loss contract. But he had a much more profound— and difficult— argument that personal degrees of belief should satisfy the probability rules. In 1930, another young man, the Italian mathematician Bruno de Finetti, independently pioneered the theory of personal probability. He invented the word “coherence,” and did make considerable use of the sure-loss argument.

Learning from Experience

This chapter talks about the application of Bayes’ Rule. It is basically a way to combine personal probability and evidence to arrive at an updated personal probability. The theory of personal probability was independently invented by Frank Ramsey and Bruno de Finetti. But the credit for the idea, and the very name “personal probability”, goes to the American statistician L. J. Savage (1917–1971). He clarified the idea of personal probability and combined it with Bayes’ Rule. The chapter also talks about the contributions of various statisticians and scientists such as Richard Jeffrey, Harold Jeffreys, Rudolf Carnap, L. J. Savage and I. J. Good.
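Here is a minimal Python sketch of learning from experience with Bayes’ Rule. The setup is my own toy example, not the book’s: two hypotheses about a coin (fair, or biased toward heads with probability 0.6), equal prior belief in each, and a short invented sequence of tosses.

```python
from fractions import Fraction


def bayes_update(prior, likelihood):
    """Posterior is proportional to prior times likelihood, then normalized."""
    unnormalized = {h: prior[h] * likelihood[h] for h in prior}
    total = sum(unnormalized.values())
    return {h: value / total for h, value in unnormalized.items()}


# P(heads) under each hypothesis (assumed values for illustration).
p_heads = {"fair": Fraction(1, 2), "biased": Fraction(3, 5)}
# Personal prior: no preference between the two hypotheses.
belief = {"fair": Fraction(1, 2), "biased": Fraction(1, 2)}

for toss in "HHTH":  # invented evidence
    likelihood = {h: (p if toss == "H" else 1 - p) for h, p in p_heads.items()}
    belief = bayes_update(belief, likelihood)

print(belief)
```

Each toss shifts the belief a little; the posterior after one toss becomes the prior for the next.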

Probability as Frequency

The four chapters under this section explore frequentist ideas. The section starts off by describing some deductive connections between probability rules and our intuitions about stable frequencies. Subsequently, a core idea of frequency-type inductive inference, the significance idea, is presented. The last chapter in the section presents a second core idea of frequency-type inductive inference, the confidence idea. This idea explains the way opinion polls are now reported. It also explains how we can think of the use of statistics as inductive behavior. Basically, the chapters give a crash course on classical statistics without too much math.

Probability applied to Philosophy

The book introduces David Hume’s idea that there is no justification for inductive inferences. Karl Popper, another philosopher, agreed with Hume that inductive inferences are invalid but held the view that it doesn’t matter. According to Popper, “The only good reasoning is deductively valid reasoning. And that is all we need in order to get around in the world or do science”. There are two chapters that talk about evading Hume’s problem, one via the Bayesian evasion (which argues that Bayes’ Rule shows us the rational way to learn from experience) and the other via the behavior evasion (which argues that although there is no justification for any individual inductive inference, there is still a justification for inductive behavior).

The Bayesian’s response to Hume is :

Hume, you’re right. Given a set of premises, supposed to be all the reasons bearing on a conclusion, you can form any opinion you like. But you’re not addressing the issue that concerns us! At any point in our grown-up lives (let’s leave babies out of this), we have a lot of opinions and various degrees of belief about our opinions. The question is not whether these opinions are “rational.” The question is whether we are reasonable in modifying these opinions in the light of new experience, new evidence. That is where the theory of personal probability comes in. On pain of incoherence, we should always have a belief structure that satisfies the probability axioms. That means that there is a uniquely reasonable way to learn from experience— using Bayes’ Rule.

The Bayesian evades Hume’s problem by saying that Hume is right. But, continues the Bayesian, all we need is a model of reasonable change in belief. That is sufficient for us to be rational agents in a changing world.

The frequentist response to Hume is:

We do our work in two steps: 1) actively interfering in the course of nature, using a randomized experimental design; 2) using a method of inference which is right most of the time, say, 95% of the time. The frequentist says: “Hume, you are right, I do not have reasons for believing any one conclusion. But I have a reason for using my method of inference, namely that it is right most of the time.”
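What “right 95% of the time” means can be seen in a small simulation. The sketch below is my own illustration, not from the book: it repeatedly polls a population with an assumed true proportion, builds the usual normal-approximation 95% interval each time, and counts how often the interval contains the truth.

```python
import math
import random


def interval_covers(true_p, n, rng, z=1.96):
    """Run one poll of size n and check whether its 95% interval contains true_p."""
    hits = sum(rng.random() < true_p for _ in range(n))
    p_hat = hits / n
    half_width = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - half_width <= true_p <= p_hat + half_width


def coverage(true_p=0.3, n=200, trials=1000, seed=0):
    """Fraction of repeated polls whose interval contains the true proportion."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    return sum(interval_covers(true_p, n, rng) for _ in range(trials)) / trials


print(coverage())
```

No single interval comes with a guarantee, but the procedure as a whole should land close to 95%, which is the frequentist’s justification for the method rather than for any one conclusion.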

The chapter ends with a single-case objection and discusses the arguments used by Charles Sanders Peirce. In essence, the chapters under this section point to the conclusion of Peirce:

  • An argument form is deductively valid if the conclusion of an argument of such a form is always true when the premises are true.
  • An argument form is inductively good if the conclusion of an argument of such a form is usually true when the premises are true.
  • An argument form is inductively 95% good if the conclusion of an argument of such a form is true in 95% of the cases where the premises are true.


Takeaway :

The field of probability was not discovered; rather, it was created by the confusion of two concepts. The first is the frequency with which certain events recur, and the second is the degree of belief to attach to a proposition. If you want to understand these two schools of thought from a logician’s perspective and get a grasp on various philosophical takes on the word “probability”, then this book is a suitable text, as it gives a thorough exposition without too much math.


This book is about a set of letters exchanged between Pascal and Fermat in the year 1654 that led to a completely different way of looking at the future. The main content of the letters revolved around solving a particular problem, called the “problem of points”. A simpler version of the problem goes like this:

Suppose two players—call them Blaise and Pierre—place equal bets on who will win the best of five tosses of a fair coin. They start the game, but then have to stop before either player has won. How do they divide the pot? If each has won one toss when the game is abandoned after two throws, then clearly, they split the pot evenly, and if they abandon the game after four tosses when each has won twice, they do likewise. But what if they stop after three tosses, with one player ahead 2 to 1?

It is not known how many letters were exchanged between Pascal and Fermat to solve this problem, but the entire correspondence took place in 1654. By the end of it, Pascal and Fermat had managed to do what was unthinkable till then: “predict the future” and, more importantly, act on that prediction.

Pascal tried to solve the problem using recursion whereas Fermat did it in a simpler way, i.e. by enumerating the future outcomes had the game continued. The solution gave rise to a new way of thinking, and it is said that this correspondence marked the birth of risk management as we know it today.
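Both approaches are easy to sketch in Python for the coin-toss version above. This is a modern rendering under my own naming, not a reconstruction of the actual letters: the player who is ahead 2 to 1 needs one more win and the other needs two.

```python
from fractions import Fraction
from itertools import product


def share_by_enumeration(a_needs, b_needs):
    """Enumerate every way the remaining tosses could fall and count A's wins."""
    tosses = a_needs + b_needs - 1  # this many tosses always decides the game
    wins = sum(1 for seq in product("AB", repeat=tosses)
               if seq.count("A") >= a_needs)
    return Fraction(wins, 2 ** tosses)


def share_by_recursion(a_needs, b_needs):
    """Value of the position is the average of the values after the next toss."""
    if a_needs == 0:
        return Fraction(1)
    if b_needs == 0:
        return Fraction(0)
    return (share_by_recursion(a_needs - 1, b_needs)
            + share_by_recursion(a_needs, b_needs - 1)) / 2


print(share_by_enumeration(1, 2))  # 3/4
print(share_by_recursion(1, 2))    # 3/4
```

So the player who is ahead should take three quarters of the pot and the other player one quarter.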

The book is not so much an analysis of the solution (the author believes that today anyone who has had just a few hours of instruction in probability theory can solve the problem of the points with ease) as an account of the developments leading up to 1654 and the developments after 1654. In the process, the book recounts all the important personalities who played a role in turning probability from a gut-based discipline into a rigorous mathematical discipline. The book can easily be read in an hour and could have been a blog post.


I had been intending to read this book for many months but somehow never had a chance to go over it. Unfortunately I fell sick this week and lacked strength to do my regular work. Fortunately I stumbled on to this book again. So, I picked it up and read it cover to cover while still getting over my illness.

A one-phrase summary of the book is “Develop Bayesian thinking”. The book is a call to arms for acknowledging our failures in prediction and doing something about them. To paraphrase the author,

We have a prediction problem. We love to predict things and we aren’t good at it

This is the age of “Big Data”, and there seems to be a line of thought that you don’t need models anymore since you have the entire population with you. Data will tell you everything. Well, if one looks at the classical theory of statistics, where the only form of error that one deals with is “sampling error”, then the argument might make sense. But the author warns against this kind of thinking, saying that “the more the data, the more the false positives”. Indeed most of the statistical procedures that one comes across at the undergrad level are heavily frequentist in nature. They were relevant to an era where sparse data needed heavy, assumption-laden models. But with huge data sets, who needs models or estimates? The flip side is that many models fit the data that you have. So the noise level explodes and it is difficult to cull the signal from the noise. The evolutionary software installed in the human brain is such that we all love prediction, and yet there are a ton of fields where prediction has failed completely. The author analyzes some domains where predictions have failed and some where they have worked, and thus gives a nice compare-and-contrast insight into the reasons for predictive success. If you are a reader who has never been exposed to Bayesian thinking, my guess is that by the end of the book you will walk away convinced that Bayes is the way to go, or at least that Bayesian thinking is a valuable addition to your thinking toolkit.

The book is organized into 13 chapters. The first seven chapters diagnose the prediction problem and the last six chapters explore and apply Bayes’s solution. The author urges the reader to think about the following issues while reading through the various chapters:

  • How can we apply our judgment to the data without succumbing to our biases?
  • When does market competition make forecasts better- and how can it make them worse?
  • How do we reconcile the need to use the past as a guide with our recognition that the future may be different?

A Catastrophic Failure of Prediction (Recession Prediction)

The financial crisis has led to a boom in one field: “books on the financial crisis”. Since the magnitude of the impact was so large, everybody had something to say. In fact, during the first few months after 2008, I read at least half a dozen books and then gave up when every author came up with almost the same reasons for why such a thing had happened. There was nothing to read but books on the crisis. Some of the authors even started writing books as if they were crime thrillers. In this chapter, the author comes up with almost the same reasons for the crisis that one has been bombarded with earlier:

  • Homeowners thought their house prices would go up year after year.
  • Rating agencies had faulty models with faulty risk assumptions.
  • Wall Street took massive leveraged bets on the housing sector, and the housing crisis turned into a financial crisis.
  • Post crisis, there was a failure to predict the nature and extent of various economic problems.

However, the author makes a crucial point that in all of these cases, the predictions were made “out of sample”. This is where he starts making sense.

  • IF the homeowners had a prior that house prices may fall, they would have behaved differently.
  • IF the models had some prior on correlated default behavior, then the models would have brought some sanity into valuations.
  • IF Wall Street had Bayesian risk pricing, the crisis would have been less harsh.
  • IF the post-crisis scenarios had sensible priors for forecasting employment rates etc., then policy makers would have been more prudent.

As you can see, there is a big “IF”, which is usually a casualty when emotions run wild, when personal and professional incentives are misaligned and when there is a gap between what we know and what we think we know. All these conditions can be moderated by an attitudinal shift towards Bayesian thinking. Probably the author starts the chapter with this recent incident to show that our prediction problems can have disastrous consequences.

Are You Smarter than a Television Pundit? (Election Result Prediction)

How does Nate Silver crack the forecasting problem? This chapter gives a brief intro to Philip Tetlock’s study, where he found that hedgehogs fared worse than foxes. A book titled Future Babble gives a detailed look at Philip Tetlock’s study and makes for quite an interesting read. Nate Silver gives three reasons why he has succeeded with his predictions:

  • Think Probabilistically
  • Update your Probabilities
  • Look for Consensus

If you read it from a stats perspective, then the above three reasons are nothing but: form a prior, update the prior, and create a forecast based on the prior and other qualitative factors. The author makes a very important distinction between “objective” and “quantitative”. Often one wants to be the former but ends up being merely the latter. Quantitative gives us many options, depending on how the numbers are made to look. A statement on one time scale could be completely different on a different time scale. “Objective” means seeing beyond our personal biases and prejudices and seeing the truth, or at least attempting to see the truth. Hedgehogs by their very nature stick to one grand theory of the universe and selectively pick things that confirm their theory. In the long run they lose out to foxes, which are adaptive in nature, update their probabilities, and do not fear saying that they don’t know something or that they can only make a statement with wide variability.

I have seen this hedgehog vs. fox analogy in many contexts. Riccardo Rebonato has written an entire book about it, saying that volatility forecasting should be done like a fox rather than a hedgehog. In fact, one of the professors at NYU said the same thing to me years ago: “You don’t need a PhD to do well in quant finance. You need to be like a fox and comfortable with alternative hypotheses for a problem. Nobody cares whether you have a grand theory for success in trading or not. The only thing that matters is whether you are able to adapt quickly or not.”

One thing this chapter made me think about was the horde of equity research analysts on Wall Street, Dalal Street and everywhere else. How many of them have a Bayesian model of whatever securities they are investing in? How many of them truly update the probabilities based on the new information that flows into the market? Do they simulate various scenarios? Do they actively discuss priors and the various assigned probabilities? I don’t know. However, my guess is that only a few do, as most of the research reports that come out contain stories spinning yarns around various news items: terrific after-the-fact analysis but terrible before-the-fact statements.

All I Care About Is W’s and L’s (Baseball Player Performance Prediction)

Even if you are not a baseball fan, if you have managed to read “Moneyball” or watched the movie of the same title starring Brad Pitt, you know that baseball as a sport has been revolutionized by stat geeks. In the Moneyball era, insiders might have hypothesized that stats would completely displace scouts. But that never happened. In fact, Billy Beane expanded the scouting team of the Oakland A’s. It is easy to get sucked into some tool that promises to be the perfect oracle. The author narrates his experience of building one such tool, PECOTA. PECOTA crunched out similarity scores between baseball players using a nearest neighbor algorithm, the first kind of algorithm that you learn in any machine learning course. Despite its success, he is quick to caution that it is not prudent to limit oneself to gathering only quantitative information. It is always better to figure out processes to weigh the new information. In a way, this chapter says that one should not be blinded by a tool or a statistical technique. One must always weigh every piece of information that comes into the context and update the relevant probabilities.

The key is to develop tools and habits so that you are more often looking for ideas and information in the right places – and in honing the skills required to harness them in to wins and losses once you have found them. It’s hard work.(Who said forecasting isn’t?)

For Years You Have Been Telling Us That Rain Is Green (Weather Prediction)

This chapter talks about one of the success stories in the prediction business, “weather forecasting”. The National Hurricane Center predicted Katrina five days before the levees were breached, and this kind of prediction was unthinkable 20-30 years back. The chapter says that weather predictions have become 350% more accurate in the past 25 years alone.

The first attempt at weather forecasting was made by Lewis Fry Richardson in 1916. He divided the land into a set of square matrices and then used the local temperature, pressure and wind speeds to forecast the weather in the 2D matrix. Note that this method was not probabilistic in nature. Instead it was based on first principles that took advantage of a theoretical understanding of how the system works. Despite the seemingly commonsensical approach, Richardson’s method failed. There are a couple of reasons. For one, Richardson’s method required an awful lot of work. By 1950, John von Neumann had made the first computer forecast using the matrix approach. Despite using a computer, the forecasts were not good, because weather conditions are multidimensional in nature and analyzing them in a 2D world was bound to fail. Once you increase the dimensions of the analysis, the calculations explode. So one might think that, with the exponential rise in computing power, weather forecasting would be a solved problem in the current era. However, there is one thorn in the flesh: the initial conditions. Courtesy of chaos theory, a mild change in the initial conditions gives rise to a completely different forecast for a given region. This is where probability comes in. Meteorologists run simulations and report the findings probabilistically. When someone says there is a 30% chance of rain, it basically means that 30% of their simulations showed a possibility of rain. Despite this problem of initial conditions, weather forecasting and hurricane forecasting have vastly improved in the last two decades or so. Why? The author gives a tour of the World Weather office in Maryland and explains the role of human eyes in detecting patterns in weather.
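The idea of running many simulations from slightly different initial conditions can be sketched with a toy chaotic system. The Python below is only an illustration under loud assumptions: the logistic map stands in for a weather model, and the starting value, noise level and “rain” threshold are all invented. Real forecasting ensembles are vastly more elaborate.

```python
import random


def logistic_map(x, steps, r=3.9):
    """Iterate a simple chaotic map; a stand-in for a deterministic weather model."""
    for _ in range(steps):
        x = r * x * (1 - x)
    return x


def ensemble_probability(x0=0.4, noise=1e-6, members=1000, steps=100,
                         threshold=0.7, seed=1):
    """Fraction of ensemble members that end above a (made-up) 'rain' threshold."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    wet = 0
    for _ in range(members):
        start = x0 + rng.uniform(-noise, noise)  # tiny measurement uncertainty
        if logistic_map(start, steps) > threshold:
            wet += 1
    return wet / members


print(ensemble_probability())
```

Every member follows exactly the same deterministic rule, yet the members disagree about the outcome because their starting points differ in the sixth decimal place. The reported probability is simply the share of members that ended up “wet”.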

In any basic course on stats, a healthy sense of skepticism towards human eyes is drilled into students. Typically one comes across the statement that human eyes are not all that good at figuring out statistically important patterns, i.e. at picking signal from noise. However, in the case of weather forecasting, there seems to be tremendous value in human eyes. The best forecasters need to think visually and abstractly while at the same time being able to sort through the abundance of information that the computer provides.

Desperately Seeking Signal (Earthquake Prediction)

The author takes the reader into the world of earthquake prediction. An earthquake occurs when there is stress in one of the multitude of fault lines. The only recognized relationship is the Gutenberg-Richter law, where the frequency of earthquakes and the intensity of earthquakes form an inverse linear relationship on a log-log scale. Although this well-known empirical relationship holds good for various datasets, the problem is with the temporal nature of the relationship. It is one thing to say that there is a possibility of an earthquake in the coming 100 years and a completely different thing to say that it is going to hit between the Xth and Yth years. Many scientists have tried working on this temporal problem. However, a lot of them have called it quits. Why? It is governed by the same chaos-theory-type dependency on initial conditions. However, unlike the case of weather prediction, where the science is well developed, the science of earthquakes is surprisingly missing. In the absence of science, one turns to probability and statistics to give some indication for a forecast. The author takes the reader through a series of earthquake predictions that went wrong. Given the paucity of data and the problem of overfitting, many predictions have gone wrong. Scientists who predicted that gigantic earthquakes would occur at a place were wrong. Similarly, predictions that everything would be normal fell flat on their face when earthquakes wreaked massive destruction. Basically, there has been a long history of false alarms.
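The Gutenberg-Richter law is usually written as log10(N) = a - b*M, where N is the yearly number of quakes of magnitude M or more. A small Python sketch shows what it does and does not tell you. The values of a and b below are invented for illustration, and the Poisson step is my own simplifying assumption that quakes arrive independently at a constant rate.

```python
import math


def annual_rate(magnitude, a, b=1.0):
    """Expected quakes per year of at least this magnitude: log10(N) = a - b*M."""
    return 10 ** (a - b * magnitude)


def prob_at_least_one(magnitude, years, a, b=1.0):
    """Chance of one or more such quakes in the window, assuming Poisson arrivals."""
    return 1 - math.exp(-annual_rate(magnitude, a, b) * years)


a = 5.0  # hypothetical regional constant
print(annual_rate(6, a))            # about one magnitude 6+ quake per decade
print(prob_at_least_one(6, 30, a))  # chance of at least one in a 30 year window
```

With b = 1, each extra unit of magnitude makes quakes ten times rarer. The law gives a long-run frequency but says nothing about which year the quake will strike, which is exactly the temporal problem described above.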

How to Drown in Three Feet of Water (Economic Variable Prediction)

The chapter gives a brief history of US GDP prediction and makes it abundantly clear that it has been a big failure. Why do economic variable forecasts go bad?

  1. Hard to determine cause and effect
  2. Economy is forever changing
  3. Data is noisy

Besides the above reasons, policy decisions affect the economic variable at any point in time. Thus an economist has the twin job of forecasting the economic variable as well as policy. Also, the sheer number of economic indicators that come out every year is huge. There is every chance that some of the indicators might be correlated with the variable that is being predicted. It might also turn out that an economic variable is a lagging indicator in one period and a leading indicator in another. All this makes it difficult to cull out the signal. More often than not, the economist picks up on some noise and reports it.

In one way, an economist is dealing with a system that has characteristics similar to the system dealt with by a meteorologist. Both weather and the economy are highly dynamic systems. Both are extremely sensitive to initial conditions. However, the meteorologist has had some success, mainly because there is some rock solid theory that helps in making predictions. Economics, on the other hand, is a soft science. So, given this situation, it seems like predictions for any economic variable are not going to improve at all. The author suggests two alternatives:

  1. Create a market for accurate forecasts – Prediction Markets
  2. Reduce demand for inaccurate and overconfident forecasts – Make margin-of-error reporting compulsory for any forecast and see to it that there is a system that records forecast performance. To date, I have never seen a headline saying, “This year’s GDP forecast will be between X% and Y%”. Most of the headlines are point estimates and they all have an aura of absolutism. Maybe there is a tremendous demand for experts, but we don’t actually have that much demand for accurate forecasts.

Role Models (Epidemiological predictions)

This chapter gives a list of examples where flu predictions turned out to be false alarms. Complicated models are usually targeted by people who are trying to criticize a forecast failure. In the case of flu prediction, though, it is the simple models that take a beating. The author explains that most of the models used in flu prediction are very simple and that they fail miserably. Some examples of scientists trying to get a grip on flu prediction are given. These models are basically agent simulation models. However, by the end of the chapter the reader gets the feeling that flu prediction is not going to be easy at all. In fact, I had read about Google using search terms to predict flu trends. I think the period was 2008. Lately I came across an article that said Google’s flu trend prediction was not doing that well. Out of all the areas mentioned in the book, I guess flu prediction is the toughest, as it involves a multitude of factors, extremely sparse data and no clear understanding of how the flu spreads.

Less and Less and Less Wrong

The main character of the story in this chapter is Bob Voulgaris, a basketball bettor. His story is a case in point of a Bayesian who is making money by placing bets in a calculated manner. There is no one BIG secret behind his success. Instead there are a thousand little secrets that Bob has. This repertoire of secrets keeps growing day after day, year after year. There are a ton of patterns everywhere in this information-rich world. But whether a pattern is signal or noise is becoming increasingly difficult to say. In the era of Big Data, we are deluged with false positives. There is a nice visual that I came across that excellently summarizes the false positives of a statistical test. In one glance, it cautions us to be wary of false positives.
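The deluge of false positives is easy to reproduce. The Python sketch below is my own illustration with invented sizes: it generates many candidate “patterns” that are pure noise and tests each one at the usual 5% level, treating the noise as having a known standard deviation of 1.

```python
import math
import random


def looks_significant(n, rng, z_cutoff=1.96):
    """Test one pure-noise pattern: is its sample mean 'significantly' non-zero?"""
    sample_mean = sum(rng.gauss(0, 1) for _ in range(n)) / n
    z = sample_mean * math.sqrt(n)  # standard deviation is known to be 1 here
    return abs(z) > z_cutoff


def false_positive_share(patterns=1000, n=50, seed=0):
    """Share of pure-noise patterns that pass a 5% significance test."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    return sum(looks_significant(n, rng) for _ in range(patterns)) / patterns


print(false_positive_share())
```

Roughly one pattern in twenty passes the test even though there is no signal at all, so the more patterns you look at, the more spurious discoveries you collect.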


The chapter gives a basic introduction to Bayesian thinking using some extreme examples: What’s the probability that your partner is cheating on you? If a mammogram gives a positive result, what’s the probability that one has cancer? What’s the probability of a terrorist attack on the twin towers after the first attack? These examples merely reflect the wide range of areas where Bayes can be used. Even though Bayes’ theory was brought to attention in 1763, major developments in the field did not take place for a very long time. One of the reasons was Fisher, who developed the frequentist way of doing statistics, and that caught on. Fisher’s focus was on sampling error. In his framework, there can be no error other than sampling error, and that reduces as the sample size approaches the population size. I have read in some book that the main reason for the popularity of Fisher’s framework was that it contained the exact steps that a scientist needs to follow to get a statistically valid result. In one sense, he democratized the statistical testing framework. Fisher created various hypothesis testing frameworks that could be used directly by many scientists. Well, in the realm of limited samples and limited computing power, these methods thrived and probably did their job. But soon the frequentist framework started becoming a substitute for solid thinking about the context in which a hypothesis ought to be framed. That’s when people noticed that frequentist stats was becoming irrelevant. In fact, in the last decade or so, with massive computing power, everyone seems to be advocating Bayesian stats for analysis. There is also a strong opinion in favor of replacing the frequentist methodologies completely with the Bayesian paradigm in the school curriculum.

Rage against the Machines

This chapter deals with chess, a game where the initial conditions are known, the rules are known and the pieces move under deterministic constraints. Why does such a deterministic game appear in a book about forecasting? The reason is that, despite chess being deterministic, a game can proceed in any one of 10^(10^50) ways, i.e. the number of possible branches to analyze is more than the number of atoms in the universe. Chess comprises three phases: the opening game, the middle game and the end game. Computers are extremely good in the end game, as there are few pieces on the board and all the search paths can be analyzed quickly. In fact, all end games with six or fewer pieces have been solved. Computers also have an advantage in the middle game, where the complexity increases and the computer can search an enormously long sequence of possible moves. It is in the opening game that computers are considered relatively weak. The opening of a game is a little abstract. There might be multiple motives behind a move: a sacrifice to capture the center, a weak move to make a later attack stronger, etc. Can a computer beat a human? This chapter gives a brief account of the way Deep Blue was programmed to beat Kasparov. It is fascinating to learn that Deep Blue was improved in much the same way a human improves at the game: through the banal process of trial and error. The thought process behind coding Deep Blue was based on questions like:

  • Does allotting the program more time in the endgame and less in the midgame improve performance on balance?
  • Is there a better way to evaluate the value of a knight vis-à-vis a bishop in the early going?
  • How quickly should the program prune dead-looking branches on its search tree even if it knows there is some residual chance that a checkmate or a trap might be lurking there?

By tweaking these parameters and seeing how the program played with the changes, the team behind Deep Blue improved it slowly and eventually beat Kasparov. I guess the author is basically trying to say that even in such deterministic scenarios, trial-and-error, fox-like thinking is what made the machine powerful.
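For readers who have not seen a game tree search, here is a toy sketch of minimax with alpha-beta pruning in Python. It is not Deep Blue's code; the tree, its leaf scores and the function name `alphabeta` are made up for illustration. Note also that alpha-beta only skips branches that provably cannot change the result, whereas the pruning question above is about riskier heuristic cuts that trade accuracy for speed.

```python
import math

def alphabeta(node, maximizing, alpha=-math.inf, beta=math.inf, visited=None):
    """Minimax value of a toy game tree with alpha-beta pruning.

    A node is either a number (leaf score) or a list of child nodes.
    `visited`, if given, collects the leaves that were actually evaluated.
    """
    if not isinstance(node, list):          # leaf: return its score
        if visited is not None:
            visited.append(node)
        return node
    if maximizing:
        best = -math.inf
        for child in node:
            best = max(best, alphabeta(child, False, alpha, beta, visited))
            alpha = max(alpha, best)
            if alpha >= beta:               # opponent would never allow this line
                break
        return best
    best = math.inf
    for child in node:
        best = min(best, alphabeta(child, True, alpha, beta, visited))
        beta = min(beta, best)
        if alpha >= beta:                   # we already have a better option elsewhere
            break
    return best

# Hypothetical two-ply tree: three candidate moves, two replies each.
tree = [[3, 5], [2, 9], [0, 1]]
seen = []
value = alphabeta(tree, True, visited=seen)
print(value, seen)   # prints: 3 [3, 5, 2, 0]
```

Only four of the six leaves are evaluated here. A real engine has many more knobs (search depth, time allocation, piece values), and those are the kind of parameters the team kept tweaking.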

The Poker Bubble

In this interesting chapter the author recounts his experiences with playing poker, not merely as a small-time bystander but as a person who was making serious money, in six figures, in 2004 and 2005. So here is a person who is not giving some journalistic account of the game. He has actually played the game and made money, and he is talking about why he succeeded. The author introduces what he calls the prediction learning curve, where doing 20% of the things right gets your forecasts right 80% of the time. Making money this way means there must be people who don't do even these 20% of the things right. In a game like poker, you can make money if there are enough suckers. Once the game becomes competitive and the suckers are out of the game, the difference in winnings between an average player and an above-average player is not much. In the initial years of the poker bubble, every person wanted to play poker and become rich quickly. This obviously meant that there were enough suckers in the market. The author says he was able to make money precisely because of the bubble. Once the fish were out of the game, it became difficult for him to make money, and ultimately he had to give up and move on. The author's message is

It is much harder to be very good in fields where everyone else is getting the basics right—and you may be fooling yourself if you think you have much of an edge.

Think about the stock market. As the market matures, the same lot of mutual fund managers try to win the long-only game, and the same options traders try to make money off the market. Will they succeed? Yes, if there are enough fish in the market. No, if the game is played between almost equals. With equally qualified grads on the trading desks and the same colocated server infra, can HFTs thrive? Maybe for a few years but not beyond that, is the message from this chapter.

The author credits his success to picking his battles well. He went into creating software for measuring and forecasting baseball players' performance in the pre-Moneyball era. He played poker when there was a boom and when getting 20% of things right could reap good money for him. He went into election outcome forecasting when most of the election experts were not doing any quantitative analysis. In a way, this chapter is very instructive for people trying to decide on the fields where their prediction skills can be put to use. Having skills alone is not enough. It is important to pick the right fields in which to apply those skills.

If you can’t beat ‘em (Stock Market Forecasting)

The author gives an account of the prediction markets site Intrade, drawing on Justin Wolfers, a Wharton professor who studies such markets. These markets are the closest thing to Bayes land: if you believe in certain odds and see that someone else has different odds for the same event, you enter into a bet and resolve the discrepancy. One might think that stock markets perform something similar, where investors with different odds for the same event settle their scores by entering into a financial transaction. However, the price is not always right in the market. The chapter gives a whirlwind tour of Fama's efficient market theory, Robert Shiller's work, Henry Blodget's fraud case, etc. to suggest that the market might be efficient in the long run but that the short run is characterized by noise. Only a few players benefit in the short run, and the composition of the pool changes from year to year. Can we apply Bayesian thinking to markets? Prediction markets are close to Bayes land, but stock markets are very different. They have capital constraints, horizon constraints, etc. Thus, even if your view is correct, the market can stay irrational for a long time. So applying Bayesian thinking to markets is a little tricky. The author argues that the market is a two-way track: one lane is driven by fundamentals and pans out correctly in the long run; the other is a fast lane populated by HFT traders, algo traders, noise traders, bluffers, etc. According to the author, life in the fast lane is a high-risk game that not many can play and sustain over a period of time.

A climate of healthy Skepticism (Climate Prediction)

This chapter talks about climate models and the various uncertainties and issues involved in building such long-range forecasting models.

What you don’t know can hurt you (Terrorism Forecasting)

This chapter talks about terrorist attacks, military attacks, etc. and what a Bayesian approach can contribute. After Sept 11, the commission report identified "failure of imagination" as one of the biggest failures. The national security establishment just did not imagine that such a thing would happen. Basically, they were completely blind to a devastation of such scale. Yes, there were a lot of signals, but all of them seem to make sense only after the fact. The chapter mentions Aaron Clauset, a professor at the University of Colorado, who compares terrorist attack prediction to earthquake prediction. One known tool in the earthquake prediction domain is the log-log plot of frequency against intensity. In the case of terrorist attacks, one can draw such a plot to at least acknowledge that an attack that might kill a million Americans is a possibility. Once that is acknowledged, such an attack falls under the known unknown category, and at least a few steps can be taken by national security and other agencies to ward off the threat. There is also a mention of the Israeli approach to terrorism, where the Israeli govt. makes sure that people get back to their normal lives soon after a bomb attack, thus reducing the "fear" element that is one of the motives of a terrorist attack.
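The log-log idea can be sketched in a few lines of Python. The data below is synthetic, generated from an assumed power law (frequency = 10 × severity^-1), purely to show the mechanics; it is not real attack or earthquake data, and the helper names `fit_loglog` and `extrapolate` are hypothetical.

```python
import math

def fit_loglog(severities, frequencies):
    """Least-squares straight line through (log10 severity, log10 frequency)."""
    xs = [math.log10(s) for s in severities]
    ys = [math.log10(f) for f in frequencies]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Synthetic, made-up data: yearly frequency of events of a given severity,
# generated from the assumed power law freq = 10 * severity**-1.
severities = [1, 10, 100, 1000]
frequencies = [10.0 * s ** -1.0 for s in severities]

slope, intercept = fit_loglog(severities, frequencies)

def extrapolate(severity):
    """Read the fitted line at a severity beyond the observed range."""
    return 10 ** (intercept + slope * math.log10(severity))

rate = extrapolate(1_000_000)
print(f"slope = {slope:.2f}, implied yearly rate at severity 1,000,000 = {rate:.1e}")
```

A power law shows up as a straight line on the log-log plot, so extending the line gives a small but non-zero rate for events far larger than anything observed. That is what moves a catastrophic attack from the unimaginable into the known unknown category.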


The book is awesome in terms of its sheer breadth of coverage. It gives more than a bird’s eye view of forecast / prediction performance in the following areas:

  • Weather forecasts
  • Earthquake forecasts
  • Chess strategy forecasts
  • Baseball player performance forecasts
  • Stock market forecasts
  • Economic variable forecasts
  • Political outcome forecasts
  • Financial crisis forecasts
  • Epidemiological predictions
  • Basketball outcome predictions
  • Poker strategy prediction
  • Climate prediction
  • Terrorist attack prediction

The message from the author is abundantly clear to any reader at the end of the 500 pages. There is a difference between what we know and what we think we know. The strategy to closing the gap is via Bayesian thinking. We live in an incomprehensibly large universe. The virtue in thinking probabilistically is that you will force yourself to stop and smell the data—slow down, and consider the imperfections in your thinking. Over time, you should find that this makes your decision making better. “Have a prior, collect data, observe the world, update your prior and become a better fox as your work progresses” is the takeaway from the book.
