Statistics stands on two pillars, estimation and inference. Pretty much anything you work on stats, you end up either estimating something or inferring something. If you take a random sample of people who have taken a stats101 course at some point in their lives and ask them what was the course all about , a most likely answer would be, "It was something to do with pvalues". Statistics at it core is about comparing a set of numbers with each other , with theoretical models and with past experience. But most of the introductory textbooks contain scary formulae and distribution tables that need to be used by students. I think if you are a teacher introducing statistics to a new batch of students, it will do a world of good, if you dramatize a specific act : Walk in to the class and tear all the pages in the appendix that have these tables and arcane formulae that only scare people out of developing a statistical mindset. It will at least drive home the point that there is no formal textbook to interpret real life data. Why do you think textbooks make assumptions about the distributions of the data ? Pause for a few seconds and think about it. Well, one of the main reasons is that unless you have some assumptions, you cannot fill up the textbook with neat formulae. Yes, think about it. Unless you assume a certain distribution, you cannot put a neat formula for estimate. You cannot put a neat formula for confidence interval and so on and so forth. What’s the use of those formulae ? Not much.
As an example, let’s say you record your evening commute times from office to home, daily for a month or two. You want to see whether the your average commute time on Monday is less than Friday.What would you do ? Well, you can calculate the mean of the commute times on both days. What you are doing is estimating the average time ? What you intended was inference , i. whether your commute time on Monday is less than Friday’s?. If you open up a stats book, it will tell you to use some formula that involves mean of the commute times,pooled standard deviation etc. From that you form a t statistic and check whether the respective pvalue for the statistic is less than 0.05. The whole procedure that is mentioned in the book is brimming with a ton of assumptions ? Why should one assume same variance across two samples? Ok, if you don’t assume that, there is another formula where you can make that adjustment?. Either use this formula or that formula..argg..There is a fundamental problem with this approach.The use of parametric statistics to determine the critical value at which observed t becomes significant. In these days of abundant and cheap computing power, nobody follows such textbook approach anymore. One uses nonparametric distribution free methods for as simple a test as mentioned above. This was unthinkable a few decades ago when computing power was expensive, when software was expensive. You can generate let’s say , for the above example, 500 resampled data and get a practical answer to the question. In todays world of free open source software, it would be unthinkable if someone were to open a textbook in trying to answer the question mentioned above.
Books such as these are helpful to the general public who don’t run statistical analysis in their day to day work, but have to interpret statistical results in their professional or personal life. The book does not have a single formula but it tries to impart knowledge more than most of the boring and sleep inducing textbooks that one comes across The book tries to weave some important stats101 concepts by telling 34 stories, stories that one can easily read and remember the associated lesson with it. Stories are always a great way to teach/learn/understand stuff. Let me attempt to list the concepts that these 34 stories cover:

When a child coughs, the mother overreacts and father underreacts ? What are the consequences ? A story that illustrates the concept of specificity and sensitivity of a test,i.e probability that the test shows positive/negative given the patient has disease/nodisease. If a doctor has to discuss a test result with a patient, specificity and sensitivity are of not much help. One is supposed to talk about probability that the patient has disease/nodisease given the test shows positive/negative, called the positive predictive value and negative predictive value.

Statistics has to link math to whatever field you are working. This linkage is what makes working in stats such a fun thing. Here is a wonderful illustration that is spot on about the real world and statistical world. Universities, Schools, Textbooks focus on the right side of the picture. They teach you all the math, probability, models to equip you enough, so that you can go in to the real world, and make all the connections.
pvalue is the probability that the data would be at least as extreme as those observed if the null hypothesis were true. The author takes the reader through a set of 34 stories to explain various concepts around pvalue with out using any math. Somewhere in the book he says,
Good Statistics is a bit like a pair of high quality stereo speakers. It allows you to hear the data clearly with out distortion. Yet the best speakers in the world isn’t going to make a CD sound good if the music was badly played or recorded.
Well real world data is messy and chaotic. Statistics via its high quality stereo speakers voices out all the screeches, scratches, crackles, and occasional melody. This helps in reducing Type I and Type II errors. Books such as these help in clearing out the unnecessary math / formula garb that one encounters in various Stats101 books.
Leave a Reply