The author says in his preface that the book is aimed not at the 1 reader in 100 who will go on to specialize in statistical analysis, but at the other 99 who will only get an overview of the subject, yet will have to deal in their professional lives with the design, analysis and interpretation of research by working with specialists in the field. Indeed, by the end of the book a reader can walk away with a decent intuition for the multivariate statistical techniques. Writing a book on multivariate stats in plain English is a great achievement, and the author deserves a big round of applause for it. I think the book should be read by any stats newbie who wants some intuition behind the multivariate math. For a seasoned stats analyst, the book might still give a few “aha” moments, as the author manages to strip away the math behind each technique and explain it in simple language. There is hardly any prerequisite for reading this book. The first two chapters cover some basic stats and probability concepts to get the reader up to speed. BTW, the first 116 of the book's 278 pages are set aside for introducing the subject, so in a sense the book does some elaborate handholding.

The book covers the following techniques:

  • Correlation Analysis
  • Regression Analysis
  • ANOVA
  • Discriminant Analysis
  • Factor Analysis
  • Cluster Analysis
  • Multidimensional Scaling

I will cite one example (canonical correlation analysis) from the book to illustrate how it builds intuition for the various techniques. A multiple regression infers the relation between a single criterion variable and a set of predictor variables. Canonical correlation analysis is a slight generalization. Let’s say you have a set of criterion variables (y1, y2) and a set of predictor variables (x1, x2, x3). The objective is to find a relationship between the set of criterion variables and the set of predictor variables. There are tons of applications of canonical correlation analysis in many areas, including statistical arbitrage. In stat arb, if you are trying to find the right combination of more than 2 stocks that mean revert, you have to analyze using canonical correlations, the math behind which is not trivial.

Where do books such as these help? The book gives the intuition behind the method by saying “Canonical correlation analysis is a technique of weighing criteria variables, i.e a y1 + b y2 and weighing the predictor variables, i.e c x1 + d x2 + e x3 in such a way that the correlation between resultant derived variables is maximum”. That one statement captures the essence of canonical correlation analysis. Once you get the intuition behind the technique, it is so much easier to understand the math behind it. There are several such examples throughout the book that will help the reader gain intuition behind the common multivariate statistical techniques. I think this book deserves to be read before a deep dive into multivariate statistics.
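To make the quoted definition concrete, here is a minimal NumPy sketch of the first canonical pair. It is not from the book: the function name `first_canonical_pair`, the synthetic data and the variable names are all my own assumptions, and the QR-plus-SVD route is just one standard way to find the weights (a, b) and (c, d, e) that maximize the correlation between the two derived variables.

```python
import numpy as np


def first_canonical_pair(X, Y):
    """Return (x_weights, y_weights, rho) for the first canonical pair.

    Hypothetical helper for illustration; X is (n, p), Y is (n, q),
    and both are assumed to have full column rank.
    """
    # Center each column so that inner products behave like covariances.
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)

    # QR gives an orthonormal basis for each set of centered variables.
    Qx, Rx = np.linalg.qr(Xc)
    Qy, Ry = np.linalg.qr(Yc)

    # The singular values of Qx^T Qy are the canonical correlations,
    # sorted from largest to smallest.
    U, s, Vt = np.linalg.svd(Qx.T @ Qy)

    # Map the leading singular vectors back to weights on the raw variables.
    x_weights = np.linalg.solve(Rx, U[:, 0])
    y_weights = np.linalg.solve(Ry, Vt[0, :])
    return x_weights, y_weights, s[0]


# Synthetic data (an assumption for the demo): one shared latent factor
# drives both the predictors x1..x3 and the criteria y1, y2.
rng = np.random.default_rng(0)
n = 500
latent = rng.normal(size=n)
X = np.column_stack([
    latent + rng.normal(size=n),
    0.5 * latent + rng.normal(size=n),
    rng.normal(size=n),
])
Y = np.column_stack([
    latent + rng.normal(size=n),
    -latent + rng.normal(size=n),
])

x_w, y_w, rho = first_canonical_pair(X, Y)

# Derived variables: c*x1 + d*x2 + e*x3 and a*y1 + b*y2.
derived_x = X @ x_w
derived_y = Y @ y_w

print("weights on x1, x2, x3:", x_w)
print("weights on y1, y2:", y_w)
print("first canonical correlation:", rho)
```

The correlation between `derived_x` and `derived_y` equals `rho`, and no single x-versus-y pairwise correlation can exceed it, which is exactly the "maximum correlation between derived variables" idea in the quote.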
