The book’s intro caught my attention with this line: “This book contains stories that provide real life lessons on understanding and managing uncertainty”. That statement was motivation enough to read through the 21 short stories about uncertainty.

The book starts off by describing the sample variance equation as the most dangerous equation, dangerous because it is so widely misused. Any elementary stats book will tell you that the standard deviation of a sample mean is inversely proportional to the square root of the sample size. The author talks about various situations where a misunderstanding of this simple rule caused tremendous losses.

  • The Trial of the Pyx, where the mint owners got away with fraud in gold coin manufacturing for over 600 years.
  • The Gates Foundation giving millions of dollars to support the cause of small schools, on the assumption that small schools yield better performance.
  • Lawrence Summers's innocuous comment costing him his job at Harvard.
  • Studies where the underlying populations differ in size but the conclusions about the metric under study do not account for this, which is why a lot of the studies one sees in the newspapers are flawed.
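The small-schools story in the list above is easy to check with a quick simulation. This is only a sketch under my own assumptions: every student's score is drawn from the same normal distribution (the mean of 500 and standard deviation of 100 are made-up numbers), and the school sizes of 25 and 400 are hypothetical. Since no school is actually better than any other, any difference in the rankings comes from sample size alone.

```python
import random
import statistics

def simulate_school_means(sizes, n_schools_per_size=500, mu=500.0, sigma=100.0, seed=0):
    """Return {size: list of school mean scores}.

    Every student is drawn from the same N(mu, sigma) distribution,
    so no school is truly better than another (an assumption of this sketch).
    """
    rng = random.Random(seed)
    means = {}
    for size in sizes:
        means[size] = [
            statistics.fmean([rng.gauss(mu, sigma) for _ in range(size)])
            for _ in range(n_schools_per_size)
        ]
    return means

# Hypothetical school sizes: 25 students (small) and 400 students (large).
school_means = simulate_school_means(sizes=[25, 400])

# Spread of the school averages for each size; theory says sigma / sqrt(size).
spread = {size: statistics.stdev(vals) for size, vals in school_means.items()}

# Rank all schools together and count the small ones at both extremes.
ranked = sorted((mean, size) for size, vals in school_means.items() for mean in vals)
bottom_50_small = sum(1 for _, size in ranked[:50] if size == 25)
top_50_small = sum(1 for _, size in ranked[-50:] if size == 25)

print("spread of school means by size:", spread)
print("small schools in top 50:", top_50_small)
print("small schools in bottom 50:", bottom_50_small)
```

The spreads should come out at roughly 100/√25 = 20 and 100/√400 = 5, and the small schools should dominate the top of the ranking and the bottom as well. Looking only at the top of the list is exactly how one ends up concluding that small schools perform better.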

All these stories illustrate one single point: the smaller the sample, the more its average varies, so small samples crowd both tails of any ranking. As a side note, here is a visual that summarizes the Trial of the Pyx and the 600 years of blunder.


The desirable variation in the weight of a coin is shown as the green bar in the illustration on the left. However, trials were made on batches of 100 coins, and for 600 years the false notion was held that variation scales in proportion to the number of coins tested. Instead of limits based on the green area (shown in the illustration on the right), the limits were based on the red area! This was a big blunder, because variation scales as the square root of the number of coins tested. For 600 years, mint owners could introduce high variation in the gold coins and never be caught in the trials. Heaven for the mint people and hell for the barons who supplied gold to the mint owners! No wonder the author calls the sample variance equation the “Most Dangerous Equation”.
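To put rough numbers on the blunder, here is a small sketch comparing the two scaling rules for a batch of 100 coins. The per-coin variation of 1.0 is a made-up unit, not a figure from the book, and I am assuming the weight errors of individual coins are independent and normally distributed.

```python
import math
import random
import statistics

PER_COIN_SD = 1.0   # hypothetical per-coin weight variation, arbitrary units
N_COINS = 100       # batch size used in the trials

# The mistaken rule: allowed variation grows in proportion to the number of coins.
linear_limit = PER_COIN_SD * N_COINS

# What independent errors imply: variation of the total grows as sqrt(n).
sqrt_limit = PER_COIN_SD * math.sqrt(N_COINS)

# Quick check by simulation: total weight error of many honest batches.
rng = random.Random(42)
batch_totals = [
    sum(rng.gauss(0.0, PER_COIN_SD) for _ in range(N_COINS))
    for _ in range(2000)
]
observed_sd = statistics.stdev(batch_totals)

print("limit under linear scaling:", linear_limit)
print("limit under sqrt scaling:  ", sqrt_limit)
print("simulated sd of batch total:", round(observed_sd, 2))
```

Under these assumptions the linear rule is ten times too generous for a batch of 100 coins, which is the gap between the green and red areas in the illustration.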

Part II of the book covers political issues from a graphics and stats perspective. The takeaway from this bunch of stories is that articles cited in the media, when looked at in detail, often tell a different story from the one in the caption. One nice learning from this part of the book is “Never trust a graph which has 2 y axes, one on either side”. It is extremely easy to manipulate the scales of the y axes and tell any story!

Part III of the book takes a dig at the testing machinery in the educational system, where uncertainty and its shades have not been appreciated properly. The first story talks about ways to standardize test scores across students with and without disabilities. How can one mix the scores of a person who is given unlimited time to solve a test with those of a student who needs to answer in a stipulated time? The second story talks about a published report on White vs. Black performance on the SAT and uses a simple simulation to show the flaw in the argument. This mode of thinking is useful in many places, especially in the field of finance, where any price evolution is one realization of some stochastic process. It would be foolhardy to build a model or an estimate without taking the alternate worlds into consideration. The third story is very interesting, as it shows that the standard error is not really useful in the context of an academic entrance exam, since these exams are more like contests. The standard error is a measure of the reliability of observations, and using it to support arguments about such exam results is flawed. Whenever one uses the standard error of an estimate, it is important to distinguish whether the situation is a diagnostic test or a contest. For a contest, the statistical machinery is not useful.

Part IV of the book explores graphical aspects of depicting uncertainty. Three principles of effective display are mentioned that can be called the “THREE COMMANDMENTS of effective data display”:

  1. Remind us that the data being displayed do contain some uncertainty, and then
  2. Characterize the size of that uncertainty as it pertains to the inferences we have in mind, and in so doing
  3. Help keep us from drawing incorrect conclusions through the lack of a full appreciation of the precision of our knowledge.

I guess it makes a LOT of sense to keep the above in mind whenever a data display is produced for ANY purpose. It goes without saying that these apply with special force to the financial world.

This part of the book talks about using graphic displays such as Catalogtree, the confidence aperture plot, basic stem and leaf plots, error plots, etc. One takeaway for me was the subtle point about displaying time series data. The author clearly shows examples where doing away with the legend and labelling each series directly at both ends gives a far clearer display than using legends, colors, and the jazz associated with them.

The Mendel effect is a good story about the importance of choosing the correct binning criterion. Basically, even if two variables are uncorrelated, you can cut the data to suit your need! If you want to show that the group means are increasing, decreasing, or constant, it is very easy to cut the groups accordingly and create a FRAUD graph. The good old scatterplot is the solution to these fraud graphs.
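Here is a minimal sketch of how such cherry-picked binning could work. It is my own illustration, not the author's procedure: it draws two independent (hence uncorrelated in expectation) variables, sorts by x, and brute-forces every way of cutting the data into three contiguous bins, keeping the cut that gives the steepest rising group means and the one that gives the steepest falling group means. The function names, the 40 data points, and the minimum bin size of 3 are all hypothetical choices.

```python
import itertools
import random
import statistics

def bin_means(ys, cuts):
    """Mean of y within each contiguous bin; `cuts` are split indices into x-sorted data."""
    edges = [0, *cuts, len(ys)]
    return [statistics.fmean(ys[a:b]) for a, b in zip(edges, edges[1:])]

def trend(means):
    """Last bin mean minus first bin mean if the means are strictly monotone, else 0."""
    rising = all(a < b for a, b in zip(means, means[1:]))
    falling = all(a > b for a, b in zip(means, means[1:]))
    return means[-1] - means[0] if (rising or falling) else 0.0

def cherry_pick_bins(xs, ys, n_bins=3, min_size=3):
    """Search all contiguous binnings for the most rising and most falling trend.

    Returns ((up_trend, up_cuts), (down_trend, down_cuts)); cuts are None
    if no monotone binning of that direction exists.
    """
    order = sorted(range(len(xs)), key=xs.__getitem__)
    y_sorted = [ys[i] for i in order]
    n = len(y_sorted)
    best_up = (0.0, None)
    best_down = (0.0, None)
    for cuts in itertools.combinations(range(min_size, n - min_size + 1), n_bins - 1):
        edges = [0, *cuts, n]
        if any(b - a < min_size for a, b in zip(edges, edges[1:])):
            continue  # skip binnings with a bin that is too small
        t = trend(bin_means(y_sorted, cuts))
        if t > best_up[0]:
            best_up = (t, cuts)
        if t < best_down[0]:
            best_down = (t, cuts)
    return best_up, best_down

# Two independent variables: there is no real relationship to find.
rng = random.Random(1)
xs = [rng.random() for _ in range(40)]
ys = [rng.random() for _ in range(40)]

up, down = cherry_pick_bins(xs, ys)
print("most rising trend and its cuts: ", up)
print("most falling trend and its cuts:", down)
```

With this many possible cuts, a search like this will usually find both a rising and a falling picture in the same data, which is the whole problem. A scatterplot of xs against ys shows all the points and leaves no room for this kind of choice.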

After reading this part of the book, one would definitely be motivated to look up the works of William Playfair, Tukey’s Exploratory Data Analysis, and Edward Tufte’s books.

Part V of the book is a historical narrative of people who introduced innovative graphics and maps to display data with 4, 5, or 6 dimensions. Some of the stalwarts covered in this part are William Playfair, Charles Joseph Minard, and Jacques Bertin. Various graphics from each of these individuals are displayed and explained to give a sense of the richness that a well thought out graphic can bring out.

Takeaway

A key tool for understanding uncertainty is a graphic display. A display that shows the data in all of their variability acts as a control, preventing us from drawing inferences based on a single summary number (e.g., the mean) that can only feebly characterize a complex situation.

As they say, stories have immense power to communicate. This book is primarily in a story format, and hence a reader is likely to remember the morals of the stories forever.