I had been intending to read this book for many months but somehow never had a chance to go over it. Unfortunately, I fell sick this week and lacked the strength to do my regular work. Fortunately, I stumbled onto this book again. So I picked it up and read it cover to cover while still getting over my illness.
A one-phrase summary of the book is “Develop Bayesian thinking”. The book is a call to arms to acknowledge our failures in prediction and to do something about them. To paraphrase the author:
We have a prediction problem. We love to predict things, and we aren’t good at it.
This is the age of “Big Data”, and there seems to be a line of thought that you don’t need models anymore since you have the entire population at hand: the data will tell you everything. Well, if one looks at the classical theory of statistics, where the only form of error one deals with is “sampling error”, then the argument might make sense. But the author warns against this kind of thinking, saying that “the more the data, the more the false positives”. Indeed, most of the statistical procedures one comes across at the undergrad level are heavily frequentist in nature. They were suited to an era when sparse data needed heavy, assumption-laden models. But with huge data sets, who needs models or estimates? The flip side is that many models fit the data you have, so the noise level explodes and it becomes difficult to cull the signal from the noise.
The evolutionary software installed in the human brain is such that we all love prediction, yet there are a ton of fields where prediction has failed completely. The author analyzes some domains where predictions have failed and some where they have worked, and thus gives a nice compare-and-contrast insight into the reasons for predictive success. If you are a reader who has never been exposed to Bayesian thinking, my guess is that by the end of the book you will walk away convinced that Bayes is the way to go, or at least that Bayesian thinking is a valuable addition to your thinking toolkit.
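The warning that more data brings more false positives is easy to check with a small simulation. The sketch below is my illustration, not something from the book: it correlates a pure-noise target with many pure-noise “predictors” and counts how many clear a conventional 5% significance bar, using the large-sample approximation |r| > 1.96/sqrt(n). The function names and parameter values are hypothetical.

```python
import math
import random

def pearson_r(x, y):
    """Sample Pearson correlation between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def count_false_positives(n_obs, n_predictors, seed=0):
    """Count pure-noise predictors that look 'significant' against a pure-noise target.

    Uses the large-sample approximation |r| > 1.96 / sqrt(n) for a two-sided
    test at roughly the 5% level. Every hit is, by construction, a false positive.
    """
    rng = random.Random(seed)
    target = [rng.gauss(0, 1) for _ in range(n_obs)]
    threshold = 1.96 / math.sqrt(n_obs)
    hits = 0
    for _ in range(n_predictors):
        noise = [rng.gauss(0, 1) for _ in range(n_obs)]
        if abs(pearson_r(noise, target)) > threshold:
            hits += 1
    return hits

for k in (10, 100, 1000):
    print(k, "noise predictors ->", count_false_positives(200, k), "'significant' hits")
```

Since every series is noise, roughly 5% of the candidates clear the bar no matter what, so screening ten times as many candidate variables yields roughly ten times as many spurious “signals”.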
The book is organized into 13 chapters. The first seven chapters diagnose the prediction problem, and the last six explore and apply Bayes’s solution. The author urges the reader to think about the following issues while reading through the various chapters:
How can we apply our judgment to the data without succumbing to our biases?
When does market competition make forecasts better, and how can it make them worse?
How do we reconcile the need to use the past as a guide with our recognition that the future may be different?
A Catastrophic Failure of Prediction (Recession Prediction)
The financial crisis has led to a boom in one field: books on the financial crisis. Since the impact was so large, everybody had something to say. In fact, during the first few months after 2008, I read at least half a dozen such books and then gave up, since every author came up with nearly the same reasons for why it happened. There was nothing to read but books on the crisis. Some authors even wrote them as if they were crime thrillers. In this chapter, the author offers almost the same reasons for the crisis that one has already been bombarded with:
Homeowners thought their house prices would go up year after year.
Rating agencies had faulty models with faulty risk assumptions.
Wall Street took massively leveraged bets on the housing sector, and the housing crisis turned into a financial crisis.
Post crisis, there was a failure to predict the nature and extent of the various economic problems.
However, the author makes a crucial point: in all of these cases, the predictions were made “out of sample”, that is, for conditions the forecasters’ historical data did not cover. This is where he starts making sense.
IF the homeowners had a prior that house prices might fall, they would have behaved differently.
IF the rating models had some prior on correlated default behavior, they would have brought some sanity into valuations.
IF Wall Street had priced risk in a Bayesian way, the crisis would have been less harsh.
IF the post-crisis forecasts had sensible priors for employment rates etc., policy makers would have been more prudent.
As you can see, there is a big “IF”, and it is usually a casualty when emotions run wild, when personal and professional incentives are misaligned, and when there is a gap between what we know and what we think we know. All these conditions can be moderated by an attitudinal shift towards Bayesian thinking. The author probably opens with this recent episode to show that our prediction problems can have disastrous consequences.
Are You Smarter Than a Television Pundit? (Election Result Prediction)
How does Nate Silver crack the forecasting problem? This chapter gives a brief intro to Philip Tetlock’s study, in which he found that “hedgehogs” (experts wedded to one big idea) fared worse than “foxes” (experts who draw on many small ideas). Future Babble, a book that takes a detailed look at Tetlock’s study, makes for quite an interesting read. Nate Silver gives three reasons why he has succeeded with his predictions:
Think Probabilistically
Update Your Probabilities
Look for Consensus
If you read it from a stats perspective, the above three reasons are nothing but: form a prior, update the prior, and create a forecast based on the prior and other qualitative factors. The author makes a very important distinction between “objective” and “quantitative”. Often one wants to be the former but ends up being the latter. Being quantitative gives us many options, depending on how the numbers are made to look. A statement on one time scale can look completely different on another. Being “objective” means seeing beyond our personal biases and prejudices and seeing the truth, or at least attempting to. Hedgehogs by their very nature stick to one grand theory of the universe and selectively pick facts that conform to it. In the long run they lose out to foxes, who are adaptive, update their probabilities, and do not fear admitting that they don’t know something or that they can only make a statement with wide uncertainty.
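The “form a prior, update the prior” loop can be made concrete with the textbook Beta-Binomial model, where a belief about an unknown probability is a Beta distribution and each batch of yes/no observations simply adds to its two counts. This is a minimal sketch of the general idea, not Silver’s actual method; the class name, the Beta(5, 5) prior and the observed counts are all hypothetical.

```python
from dataclasses import dataclass

@dataclass
class BetaBelief:
    """Belief about an unknown probability p, held as a Beta(alpha, beta) distribution."""
    alpha: float
    beta: float

    def mean(self) -> float:
        # Point estimate of p implied by the current belief.
        return self.alpha / (self.alpha + self.beta)

    def update(self, successes: int, failures: int) -> "BetaBelief":
        # Conjugate update: Beta prior + binomial data -> Beta posterior.
        return BetaBelief(self.alpha + successes, self.beta + failures)

# Hypothetical prior: mildly confident that p is around 0.5.
belief = BetaBelief(alpha=5, beta=5)

# Hypothetical batches of new evidence, as (successes, failures).
for successes, failures in [(7, 3), (6, 4), (8, 2)]:
    belief = belief.update(successes, failures)
    print(f"posterior mean = {belief.mean():.3f}")
```

Each posterior becomes the prior for the next batch, which is the fox-like habit in miniature: the estimate moves with the evidence, but not wildly, because the prior carries weight.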
I have seen this hedgehog vs. fox analogy in many contexts. Riccardo Rebonato has written an entire book about it, arguing that volatility forecasting should be done like a fox rather than a hedgehog. In fact, one of the professors at NYU told me the same thing years ago: “You don’t need a PhD to do well in quant finance. You need to be like a fox and be comfortable with alternative hypotheses for a problem. Nobody cares whether you have a grand theory for success in trading or not. The only thing that matters is whether you are able to adapt quickly.”
One thing this chapter made me think about was the horde of equity research analysts on Wall Street, Dalal Street and everywhere else. How many of them have a Bayesian model of the securities they invest in? How many of them truly update their probabilities based on the new information that flows into the market? Do they simulate various scenarios? Do they actively discuss priors and the probabilities they assign? I don’t know. However, my guess is that only a few do, as most of the research reports that come out contain stories spun around various news items: terrific after-the-fact analysis but terrible before-the-fact statements.
All I Care About Is W’s and L’s (Baseball Player Performance Prediction)
If you are not a baseball fan but have managed to read “Moneyball” or watched the movie of the same name starring Brad Pitt, you know that baseball as a sport has been revolutionized by stat geeks. In the Moneyball era, insiders might have hypothesized that stats would completely displace scouts. But that never happened. In fact, Billy Beane expanded the Oakland A’s scouting team. It is easy to get sucked into a tool that promises to be the perfect oracle. The author narrates his experience of building one such tool, PECOTA. PECOTA crunched out similarity scores between baseball players using a nearest neighbor algorithm, the first kind of algorithm you learn in any machine learning course. Despite its success, he is quick to caution that it is not prudent to limit oneself to gathering only quantitative information. It is always better to figure out processes to weigh new information. In a way, this chapter says that one should not be blinded by a tool or a statistical technique. One must always weigh every piece of information in context and update the relevant probabilities.
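The nearest-neighbor idea behind similarity scores can be sketched in a few lines: put each player’s stats on a common scale, measure the distance between players, and return the closest ones as “comparables”. This shows only the generic technique; the players, their stats and the helper names below are invented for illustration, and PECOTA’s real inputs and weighting are not described in this review.

```python
import math

# Hypothetical players: (name, age, batting average, home runs per 600 PA).
# These numbers are invented for illustration only.
PLAYERS = [
    ("A", 24, 0.280, 18),
    ("B", 25, 0.275, 20),
    ("C", 31, 0.250, 35),
    ("D", 23, 0.300, 10),
    ("E", 30, 0.245, 32),
]

def standardize(rows):
    """Scale each numeric column to mean 0, std 1 so no single stat dominates the distance."""
    cols = list(zip(*rows))
    scaled_cols = []
    for col in cols:
        mu = sum(col) / len(col)
        sd = math.sqrt(sum((v - mu) ** 2 for v in col) / len(col)) or 1.0
        scaled_cols.append([(v - mu) / sd for v in col])
    return [list(r) for r in zip(*scaled_cols)]

def nearest_neighbors(target_name, k=2):
    """Return the k players whose standardized stats are closest (Euclidean) to the target."""
    names = [p[0] for p in PLAYERS]
    features = standardize([p[1:] for p in PLAYERS])
    t = features[names.index(target_name)]
    dists = [
        (math.dist(t, f), name)
        for name, f in zip(names, features)
        if name != target_name
    ]
    return [name for _, name in sorted(dists)[:k]]

print(nearest_neighbors("A"))
```

A forecast for the target player can then be built from how the comparables’ careers actually unfolded; the author’s caution still applies, since the tool only knows the columns it was given.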
The key is to develop tools and habits so that you are more often looking for ideas and information in the right places – and in honing the skills required to harness them into wins and losses once you have found them. It’s hard work. (Who said forecasting isn’t?)
For Years You Have Been Telling Us That Rain Is Green (Weather Prediction)
This chapter talks about one of the success stories in the prediction business: weather forecasting. The National Hurricane Center predicted Katrina five days before the levees were breached, a kind of prediction that was unthinkable 20 to 30 years earlier. The chapter says that weather predictions have become 350% more accurate in the past 25 years alone.
The first attempt at numerical weather forecasting was made by Lewis Fry Richardson in 1916. He divided the land into a grid of squares and then used the local temperature, pressure and wind speed to forecast the weather in each cell of this 2D grid. Note that this method was not probabilistic in nature. Instead, it was based on first principles and took advantage of theoretical understanding of how the system works. Despite this seemingly commonsensical approach, Richardson’s method failed. There are a couple of reasons. For one, it required an awful lot of work. By 1950, John von Neumann had made the first computer forecast using the same grid approach. Despite using a computer, the forecasts were not good, because weather is multidimensional in nature and analyzing it in a 2D world was bound to fail. Once you increase the dimensions of the analysis, the calculations explode. So one might think that, with the exponential rise in computing power, weather forecasting would be a solved problem by now. However, there is one thorn in the flesh: the initial conditions. Courtesy of chaos theory, a mild change in the initial conditions can give rise to a completely different forecast for a given region. This is where probability comes in. Meteorologists run many simulations from slightly different starting conditions and report the findings probabilistically. When someone says there is a 30% chance of rain, it basically means that 30% of their simulations showed rain. Despite this problem of initial conditions, weather and hurricane forecasting have vastly improved in the last two decades or so. Why? The author gives a tour of the World Weather office in Maryland and explains the role of human eyes in detecting patterns in the weather.
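The ensemble idea (run many simulations from slightly perturbed starting points, then report the share that end in a given outcome) can be shown on a toy chaotic system. This sketch uses the logistic map with r = 4, a textbook chaotic system, as a stand-in for a weather model; the “rain” threshold, the perturbation size and the function names are hypothetical choices for illustration, not how real forecast models work internally.

```python
import random

def logistic_trajectory(x0, steps, r=4.0):
    """Iterate the logistic map x -> r*x*(1-x), chaotic at r = 4, starting from x0."""
    x = x0
    for _ in range(steps):
        x = r * x * (1.0 - x)
    return x

def ensemble_forecast(x0, steps, members=1000, perturbation=1e-6,
                      rain_threshold=0.7, seed=42):
    """Fraction of ensemble members whose toy final state exceeds a 'rain' threshold.

    Each member starts from x0 plus a tiny random error, mimicking the
    uncertainty in measured initial conditions.
    """
    rng = random.Random(seed)
    rainy = 0
    for _ in range(members):
        start = min(max(x0 + rng.uniform(-perturbation, perturbation), 0.0), 1.0)
        if logistic_trajectory(start, steps) > rain_threshold:
            rainy += 1
    return rainy / members

# Short lead time: members barely diverge, so all of them agree.
print(ensemble_forecast(0.3, steps=5))
# Long lead time: tiny initial errors have blown up, so the answer is a probability.
print(ensemble_forecast(0.3, steps=50))
```

At a short horizon the one-in-a-million measurement error stays tiny and every member gives the same answer; at a long horizon the same error has grown to the size of the whole system, which is why forecasts beyond a few days have to be stated as probabilities.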
In any basic course on stats, a healthy skepticism towards the human eye is drilled into students. Typically one comes across the statement that human eyes are not all that good at spotting statistically meaningful patterns, i.e. at picking the signal from the noise. However, in weather forecasting there seems to be tremendous value in the human eye. The best forecasters need to think visually and abstractly while at the same time sorting through the abundance of information that the computer provides.
Desperately Seeking Signal (Earthquake Prediction)
The author takes the reader into the world of earthquake prediction. An earthquake occurs when stress is released along one of a multitude of fault lines. The only recognized relationship is the Gutenberg-Richter law: the logarithm of the frequency of earthquakes falls linearly with their magnitude (itself a logarithmic measure), so frequency and intensity form an inverse linear relationship on a log-log scale. Although this well-known empirical relationship holds for various datasets, the problem is its temporal nature. It is one thing to say that there is a possibility of an earthquake in the coming 100 years, and a completely different thing to say that it will hit between year X and year Y. Many scientists have tried working on this timing problem, but many of them have given up. Why? It is governed by the same chaos-theory-style sensitivity to initial conditions. However, unlike weather prediction, where the science is well developed, the science of earthquakes is surprisingly thin. In the absence of science, one turns to probability and statistics for some indication of a forecast. The author takes the reader through a series of earthquake predictions that went wrong. Given the paucity of data and the problem of overfitting, many predictions have failed. Scientists who predicted that gigantic earthquakes would occur at a place were wrong. Similarly, predictions that everything would be normal fell flat on their face when earthquakes wreaked massive destruction. Basically, there has been a long history of false alarms.
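The Gutenberg-Richter relation is usually written as log10(N) = a - b*M, where N is the number of earthquakes of magnitude M or greater per unit time, and a and b are fitted constants (b is typically close to 1). The sketch below turns that rate into the chance of at least one quake over a horizon, under an added assumption of mine, not the book’s: quakes arrive as a memoryless Poisson process. The value of a and the function names are hypothetical.

```python
import math

def annual_rate(magnitude, a, b=1.0):
    """Expected quakes per year with magnitude >= `magnitude`,
    from the Gutenberg-Richter relation log10(N) = a - b*M."""
    return 10 ** (a - b * magnitude)

def prob_at_least_one(magnitude, years, a, b=1.0):
    """Chance of at least one such quake within `years`, ASSUMING quakes
    arrive as a memoryless Poisson process at the Gutenberg-Richter rate."""
    return 1.0 - math.exp(-annual_rate(magnitude, a, b) * years)

A = 5.0  # hypothetical regional productivity constant, for illustration only
for m in (5, 6, 7, 8):
    print(f"M>={m}: {annual_rate(m, A):.3f}/yr, "
          f"P(at least one in 100 yrs) = {prob_at_least_one(m, 100, A):.2f}")
```

With b = 1, each step up in magnitude makes quakes ten times rarer. Note what the output does and does not say: it gives odds for the next 100 years but nothing about which year, which is exactly the timing problem the chapter describes.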
How to Drown in Three Feet of Water (Economic Variable Prediction)
The chapter gives a brief history of US GDP prediction, and it makes abundantly clear that it has been a big failure. Why do economic forecasts go bad?
Hard to determine cause and effect
The economy is forever changing
Data is noisy
Besides the above reasons, policy decisions affect the economic variable at any point in time. Thus an economist has the twin job of forecasting the economic variable as well as the policy. Also, the sheer number of economic indicators released every year is huge, so there is every chance that some of them will be correlated with the variable being predicted purely by coincidence. It might also turn out that an indicator is lagging in one period and leading in another. All this makes it difficult to cull out the signal. More often than not, the economist picks up on some noise and reports it.
In one way, an economist is dealing with a system whose characteristics are similar to the one a meteorologist deals with. Both weather and the economy are highly dynamic systems, and both are extremely sensitive to initial conditions. However, meteorologists have had some success, mainly because there is rock-solid theory that helps in making predictions. Economics, on the other hand, is a soft science. Given this situation, it seems that predictions for any economic variable are not going to improve at all. The author suggests two alternatives:
Create a market for accurate forecasts – Prediction Markets
Reduce demand for inaccurate and overconfident forecasts – Make margin-of-error reporting compulsory for any forecast, and see to it that there is a system that records forecast performance. To date, I have never seen a headline saying, “This year’s GDP forecast will be between X% and Y%.” Most headlines are point estimates, and they all have an aura of absolutism. Maybe there is tremendous demand for experts but not that much demand for accurate forecasts.
Role Models (Epidemiological Predictions)
This chapter gives a list of examples where flu predictions turned out to be false alarms. When a forecast fails, critics usually go after complicated models. In the case of flu prediction, though, it is the simple models that take a beating. The author explains that most of the models used in flu prediction are very simple and that they fail miserably. Some examples of scientists trying to get a grip on flu prediction are given; their models are basically agent-based simulations. By the end of the chapter, however, the reader gets the feeling that flu prediction is not going to be easy at all. I had read about Google using search terms to predict flu trends, I think around 2008. Lately I came across an article saying that Google’s flu trend predictions were not doing that well. Out of all the areas covered in the book, I guess flu prediction is the toughest, as it involves a multitude of factors, extremely sparse data and no clear understanding of how the disease spreads.
Less and Less and Less Wrong
The main character in this chapter is Bob Voulgaris, a basketball bettor. His story is a case in point: a Bayesian who makes money by placing bets in a calculated manner. There is no one BIG secret behind his success. Instead, Bob has a thousand little secrets, and this repertoire keeps growing day after day, year after year. There are tons of patterns everywhere in this information-rich world, but whether a pattern is signal or noise is becoming increasingly difficult to say. In the era of Big Data, we are deluged with false positives. I once came across a nice visual that summarizes the false positives of a statistical test excellently; at a glance, it cautions us to be wary of them.
The chapter gives a basic introduction to Bayesian thinking using some extreme examples: What is the probability that your partner is cheating on you? If a mammogram gives a positive result, what is the probability that one has cancer? What is the probability of a terrorist attack on the twin towers after the first attack? These examples merely reflect the wide range of areas where Bayes can be used. Even though Bayes’ theorem was brought to attention in 1763, major developments in the field did not take place for a very long time. One of the reasons was Fisher, who developed the frequentist approach to statistics, which caught on. Fisher’s focus was on sampling error. In his framework, there is no error other than sampling error, and that shrinks as the sample size approaches the population size. I have read in some book that the main reason for the popularity of Fisher’s framework was that it laid out the exact steps a scientist needs to follow to get a statistically valid result. In one sense, he democratized statistical testing: his hypothesis testing frameworks could be used directly by many scientists. In a realm of limited samples and limited computing power, these methods thrived and probably did their job. But soon, the frequentist framework started to become a substitute for solid thinking about the context in which a hypothesis ought to be framed. That’s when people noticed that frequentist stats was becoming irrelevant. In the last decade or so, with massive computing power, everyone seems to be advocating Bayesian statistics for analysis. There is also a strong opinion in favor of completely replacing frequentist methods with the Bayesian paradigm in the school curriculum.
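The mammogram question is the classic worked example of Bayes’ theorem: P(cancer | positive) = P(positive | cancer) P(cancer) / P(positive). The sketch below uses illustrative inputs that are my assumptions for this example (a 1.4% base rate, 75% sensitivity, 10% false positive rate), not figures quoted from this review.

```python
def posterior(prior, p_pos_given_true, p_pos_given_false):
    """Bayes' theorem: probability of the condition given a positive test."""
    true_pos = prior * p_pos_given_true            # P(condition and positive)
    false_pos = (1.0 - prior) * p_pos_given_false  # P(no condition and positive)
    return true_pos / (true_pos + false_pos)

# Illustrative inputs, assumed for this sketch:
prior = 0.014          # base rate of the condition before the test
sensitivity = 0.75     # P(positive | condition)
false_pos_rate = 0.10  # P(positive | no condition)

p = posterior(prior, sensitivity, false_pos_rate)
print(f"P(cancer | positive mammogram) = {p:.1%}")
```

Even with a fairly accurate test, the low base rate means most positives are false positives, so the posterior stays under 10%. Feeding `p` back in as the prior for a second, independent positive test is exactly the “update your prior” step, and the probability rises sharply.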
Rage Against the Machines
This chapter deals with chess, a game where the initial conditions are known, the rules are known and the pieces move under deterministic constraints. Why does such a deterministic game appear in a book about forecasting? Because, despite being deterministic, a chess game can proceed in roughly 10^(10^50) ways, i.e. the number of possible branches to analyze is far greater than the number of atoms in the universe. Chess has three phases: the opening, the middlegame and the endgame. Computers are extremely good in the endgame, as there are few pieces on the board and all the search paths can be analyzed quickly. In fact, all endgames with six or fewer pieces have been solved. Computers also have an advantage in the middlegame, where the complexity increases and the computer can search enormously long sequences of possible moves. It is in the opening that computers are considered relatively weak. The opening is a little abstract: there might be multiple motives behind a move, such as a sacrifice to capture the center or a weak move to make an attack stronger. Can a computer beat a human? This chapter gives a brief account of how Deep Blue was programmed to beat Kasparov. It is fascinating to learn that Deep Blue was developed much the way a human improves at the game: through the banal process of trial and error. The thought process behind coding Deep Blue was based on questions like:
Does allotting the program more time in the endgame and less in the midgame improve performance on balance?
Is there a better way to evaluate the value of a knight vis-à-vis a bishop in the early going?
How quickly should the program prune dead-looking branches on its search tree even if it knows there is some residual chance that a checkmate or a trap might be lurking there?
By tweaking these parameters and seeing how the program played with the changes, the team behind Deep Blue improved it slowly and eventually beat Kasparov. I guess the author is basically trying to say that even in such deterministic settings, trial and error and fox-like thinking are what made the machine powerful.
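The pruning question in the list above refers to game-tree search. As a rough illustration, here is the generic textbook technique of minimax search with alpha-beta pruning over a tiny hypothetical tree; it is not Deep Blue’s actual code. One difference is worth noting: alpha-beta only skips branches that provably cannot change the result, whereas the Deep Blue team’s question was about riskier, heuristic pruning of branches that merely look dead.

```python
import math

def evaluate(node, maximizing, alpha=-math.inf, beta=math.inf, stats=None, prune=True):
    """Minimax search over a nested-list game tree, optionally with alpha-beta pruning.

    Leaves are numbers (static evaluations from the maximizer's point of view);
    inner lists are positions whose children are the legal moves.
    stats["leaves"] counts how many leaf positions were actually evaluated.
    """
    if stats is None:
        stats = {"leaves": 0}
    if not isinstance(node, list):
        stats["leaves"] += 1
        return node
    best = -math.inf if maximizing else math.inf
    for child in node:
        score = evaluate(child, not maximizing, alpha, beta, stats, prune)
        if maximizing:
            best = max(best, score)
            alpha = max(alpha, best)
        else:
            best = min(best, score)
            beta = min(beta, best)
        if prune and alpha >= beta:
            break  # remaining siblings cannot change the result: prune this branch
    return best

TREE = [[3, 5, 6], [1, 2, 0], [8, 7, 4]]  # hypothetical 2-ply tree, maximizer to move

for prune in (False, True):
    stats = {"leaves": 0}
    value = evaluate(TREE, True, stats=stats, prune=prune)
    print(f"prune={prune}: value={value}, leaves evaluated={stats['leaves']}")
```

Both runs find the same best value, but the pruned search skips part of the second branch once it is clear the opponent can force a worse outcome there. On a real chess tree the savings compound at every level, which is what makes deep search feasible at all.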
The Poker Bubble
This is an interesting chapter in which the author recounts his experiences playing poker, not as a small-time bystander but as someone who was making serious six-figure money in 2004 and 2005. So here is a person who is not giving a journalistic account of the game: he has actually played it, made money, and is explaining why he succeeded. The author introduces what he calls the prediction learning curve: if you do the first 20% of things right, you get your forecasts right 80% of the time. Making money this way in a game means there must be other players who don’t get those 20% of things right. In a game like poker, you can make money only if there are enough suckers. Once the game becomes competitive and the suckers are out, the difference in winnings between an average player and an above-average player is small. In the initial years of the poker bubble, everyone wanted to play poker and get rich quickly, which meant there were plenty of suckers at the tables. The author says he was able to make money precisely because of the bubble. Once the fish were out of the game, it became difficult for him to make money, and ultimately he had to give up and move on. The author’s message is:
It is much harder to be very good in fields where everyone else is getting the basics right—and you may be fooling yourself if you think you have much of an edge.
Think about the stock market. As the market matures, the same lot of mutual fund managers try to win the long-only game, and the same options traders try to make money off the market. Will they succeed? Yes, if there are enough fish in the market. No, if the game is played among near equals. With equally qualified grads on the trading desks and the same co-located server infrastructure, can HFTs thrive? Maybe for a few years but not beyond that, is the message from this chapter.
The author credits his success to picking his battles well. He went into creating software for measuring and forecasting baseball players’ performance in the pre-Moneyball era. He played poker during a boom, when getting 20% of things right could earn him good money. He went into election forecasting when most election experts were not doing any quantitative analysis. In a way, this chapter is very instructive for people trying to decide where to put their prediction skills to use. Having skills alone is not enough; it is important to pick the right fields in which to apply them.
If You Can’t Beat ’Em (Stock Market Forecasting)
The author gives an account of prediction markets such as Intrade and of the work of Justin Wolfers, a Wharton professor who studies them. These markets are the closest thing to Bayes land: if you believe in certain odds and see someone else holding different odds for the same event, you enter into a bet and resolve the discrepancy. One might think that stock markets do something similar, with investors who hold different odds for the same event settling their scores by entering into a financial transaction. However, the price in the market is not always right. The chapter gives a whirlwind tour of Fama’s efficient market hypothesis, Robert Shiller’s work, Henry Blodget’s fraud case etc. to suggest that the market might be efficient in the long run but that the short run is characterized by noise. Only a few players benefit in the short run, and the composition of that pool changes from year to year. Can we apply Bayesian thinking to markets? Prediction markets are close to Bayes land, but stock markets are very different: they have capital constraints, horizon constraints, etc. Thus even if your view is correct, the market can stay irrational for a long time, so applying Bayesian thinking to markets is a little tricky. The author argues that the market is a two-track road: one track is driven by fundamentals and pans out correctly in the long run; the other is a fast lane populated by HFT traders, algo traders, noise traders, bluffers etc. According to the author, life in the fast lane is a high-risk game that not many can play and sustain over time.
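The bet-to-resolve-disagreement logic of a prediction market fits in a few lines. This sketch assumes a binary contract that pays $1 if the event happens and trades at a price between 0 and 1; the function name and numbers are hypothetical, and it deliberately ignores fees, capital and horizon constraints.

```python
def contract_edge(market_price, my_probability):
    """Expected profit per $1-payout binary contract, ignoring fees and capital costs.

    Buying costs `market_price` and pays 1 if the event happens, so its
    expected profit is my_probability - market_price. Selling is the mirror image.
    """
    if my_probability > market_price:
        return "buy", my_probability - market_price
    if my_probability < market_price:
        return "sell", market_price - my_probability
    return "pass", 0.0

# Hypothetical: the market prices an event at 60 cents; you believe it is 70% likely.
side, edge = contract_edge(0.60, 0.70)
print(side, round(edge, 2))
```

The ignored costs are precisely what the chapter says separates stock markets from this idealized Bayes land: a correct view with positive expected value can still lose money if you run out of capital or time before the market comes around.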
A Climate of Healthy Skepticism (Climate Prediction)
This chapter talks about the climate models and the various uncertainties/issues pertinent to building such long range forecasting models.
What You Don’t Know Can Hurt You (Terrorism Forecasting)
This chapter talks about terrorist attacks, military attacks etc. and the contribution a Bayesian approach can make. After September 11, the commission report identified a “failure of imagination” as one of the biggest failures. The national security establishment simply did not imagine that such a thing could happen; it was completely blind to devastation on that scale. Yes, there were a lot of signals, but they all seem to make sense only after the fact. The chapter mentions Aaron Clauset, a professor at the University of Colorado, who compares predicting terrorist attacks to predicting earthquakes. One known tool in earthquake prediction is the log-log plot of frequency against intensity. For terrorist attacks, one can draw such a plot to at least acknowledge that an attack killing a million Americans is a possibility. Once that is acknowledged, such attacks fall into the known-unknown category, and at least some steps can be taken by national security and other agencies to ward off the threat. There is also a mention of the Israeli approach to terrorism, where the Israeli government makes sure that people get back to their normal lives soon after a bomb attack, thereby reducing the “fear” element that is one of the motives of a terrorist attack.
The book is awesome in terms of its sheer breadth of coverage. It gives more than a bird’s-eye view of forecast/prediction performance in the following areas:
Chess strategy forecasts
Baseball player performance forecasts
Stock market forecasts
Economic variable forecasts
Political outcome forecasts
Financial crisis forecasts
Baseball outcome predictions
Poker strategy prediction
Terrorist attack prediction
The message from the author is abundantly clear to any reader by the end of the 500 pages: there is a difference between what we know and what we think we know, and the way to close that gap is Bayesian thinking. We live in an incomprehensibly large universe. The virtue in thinking probabilistically is that you will force yourself to stop and smell the data: slow down, and consider the imperfections in your thinking. Over time, you should find that this makes your decision making better. “Have a prior, collect data, observe the world, update your prior and become a better fox as your work progresses” is the takeaway from the book.