The last few weeks have been good, as most of the trades have gone right. When something works, I generally attribute it to 90% luck + 10% logic. When something doesn’t work, I usually reverse the proportions in my mind, as it helps in creating better logic/algos. It really doesn’t matter what one believes when something has already worked, but one’s attitude/belief system DOES matter when things don’t work. On this “feeling lucky” note, I thought I should write something about Multivariate Stats.
Knowingly or unknowingly, any analyst deals with multivariate stuff if his/her study analyses more than one variable OR a variable with more than one dimension. Average / quartiles / median / mode are all known to most of us one way or the other. But things become interesting as well as complicated when we move to the multivariate world.
This book is probably the most easily understandable text out there. Classic texts like Anderson’s are typically so laden with complicated math that a novice to this area would be overwhelmed. In contrast, this book can be termed a more MBAish book, with less emphasis on theorems / proofs / lemmas and more emphasis on applications. The latest edition is the sixth, which suggests that the book is a hit in some part of the reader community. Ok, let me get on with summarizing the chapters of the book.
To start with, the book is organized in such a way that the first 4 chapters give all the math that is needed to understand multivariate analysis. One thing about working in the multivariate area is that “a knowledge of matrix algebra” is vital to doing even the most basic analysis in the MV world.
Chapter 1 : Applications of Multivariate Techniques
Data reduction / data sorting / investigation of dependence among variables / prediction / hypothesis testing are some of the useful applications of MV techniques. One of the first things anyone can do without going through the math is a GRAPHIC display. It is often said that a good graphic display, in all its variations, is half the analysis done. The tools available to any analyst are the usual scatterplot, marginal dot plot, scatterplot plus boxplots on the same graphic, 3D scatter plots, star plots, distance plots and Chernoff faces. I have used all of these graphics at some point or the other, except the last one, Chernoff faces. When I first came across this type of graphic, I thought it was a pretty cool technique, though I am yet to use it in real life. The funda behind it is simple. Humans recognize faces easily; a little change in one feature of a face and we notice it instantly. This aspect of the human brain is used to draw multidimensional data as human faces so that patterns can be detected easily.
As a side note, this chapter uses Mahalanobis distance to show contours of equal density. This made me think about the concept of distance itself. Probably any high school kid who learns coordinate geometry knows the distance formula between 2 points. As he progresses, he learns more and more complicated formulae, theorems, etc… alas! uncertainty is never discussed. I don’t recollect any teacher, till date, posing a question like the following:
If there is uncertainty in the measurement of the x and y coordinates, and let’s say you know how much uncertainty there is in the measurements along the x axis and along the y axis, can you come up with an alternative to the usual distance formula?
If you think about it, this is what we find in reality. Take any real-world application, uncertainty is unavoidable. So, a question as simple as the one above is good enough to motivate a young mind to explore the problem and come up with alternative measures of distance. Well, one measure which has appeared in various applications comes from an Indian scientist, Prasanta Chandra Mahalanobis. Anyways, I have digressed from the intent of the post.
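To make the distance question concrete, here is a minimal Python sketch (my own illustration, not something taken from the book). It compares the usual Euclidean distance with the Mahalanobis distance in two dimensions. The points and variances are made-up numbers and the function names are hypothetical; everything is kept to 2-D so that only the standard library is needed.

```python
import math

def euclidean(p, q):
    """Ordinary straight-line distance between two 2-D points."""
    return math.hypot(p[0] - q[0], p[1] - q[1])

def mahalanobis_2d(p, q, var_x, var_y, cov_xy=0.0):
    """Mahalanobis distance between 2-D points for the covariance matrix
    [[var_x, cov_xy], [cov_xy, var_y]] (inverted by hand, since it is 2x2)."""
    dx, dy = p[0] - q[0], p[1] - q[1]
    det = var_x * var_y - cov_xy ** 2
    if det <= 0:
        raise ValueError("covariance matrix must be positive definite")
    # d^2 = [dx dy] * inverse(Sigma) * [dx dy]'
    d2 = (var_y * dx * dx - 2.0 * cov_xy * dx * dy + var_x * dy * dy) / det
    return math.sqrt(d2)

origin = (0.0, 0.0)
a = (2.0, 0.0)  # 2 units away along the noisy x axis
b = (0.0, 2.0)  # 2 units away along the precise y axis

# Assumed uncertainties: x has standard deviation 2, y has standard deviation 0.5
d_a = mahalanobis_2d(a, origin, var_x=4.0, var_y=0.25)
d_b = mahalanobis_2d(b, origin, var_x=4.0, var_y=0.25)

print("Euclidean:  ", euclidean(a, origin), euclidean(b, origin))
print("Mahalanobis:", d_a, d_b)
```

Both points are 2 units from the origin in the Euclidean sense, but the move along the noisy x axis counts as 1 unit while the same move along the precise y axis counts as 4. Dividing each squared deviation by its variance is exactly the kind of "alternate formula" the question is fishing for.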
Chapter 2 : Matrix Algebra & Random Vectors
Matrices are your best friends in dealing with multidimensional data. For any number cruncher, a thorough understanding of them is imperative. In that sense, this book merely scratches the surface, though it obviously gives all the important results that are needed to get your hands dirty doing MV stats. Personally, I found the proof of the Extended Cauchy-Schwarz inequality much more intuitive here than in other books. I have always been fascinated by math inequalities. Inequalities become very powerful when used in the right application. Touch any math-fin-stat area and you are bound to see innumerable inequalities applied to real-life problems like valuation / hedging / forecasting etc. I had my crush 🙂 on inequalities after reading the book “The Cauchy-Schwarz Master Class” by Prof. Steele. If you want to know the kind of applications where inequalities can be used, that book is a fascinating account. Will blog about Prof. Steele’s book some other day. Anyways, coming back, one of the applications of Extended Cauchy-Schwarz is the optimization of quadratic forms, where the inequality helps one connect an optimization problem to the eigenvalues of the matrix involved. Truly a beautiful linkage between optimization and matrix algebra through an inequality.
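The linkage can be checked numerically. For a symmetric matrix B, the maximum of x'Bx over unit-length x equals the largest eigenvalue of B, and the minimum equals the smallest. The sketch below is my own illustration: it assumes a made-up 2x2 matrix, uses hypothetical helper names, scans unit vectors by brute force and compares the extremes with the closed-form eigenvalues.

```python
import math

def eigenvalues_sym_2x2(a, b, c):
    """Eigenvalues (largest first) of the symmetric matrix [[a, b], [b, c]]."""
    mid = (a + c) / 2.0
    half_gap = math.hypot((a - c) / 2.0, b)
    return mid + half_gap, mid - half_gap

def quadratic_form(a, b, c, x):
    """x' B x for B = [[a, b], [b, c]] and a 2-vector x."""
    return a * x[0] ** 2 + 2.0 * b * x[0] * x[1] + c * x[1] ** 2

# Assumed example matrix B = [[4, 1], [1, 2]]
a, b, c = 4.0, 1.0, 2.0
lam_max, lam_min = eigenvalues_sym_2x2(a, b, c)

# Brute force: evaluate x'Bx on unit vectors x = (cos t, sin t), t in [0, pi)
steps = 100_000
values = []
for k in range(steps):
    t = math.pi * k / steps
    values.append(quadratic_form(a, b, c, (math.cos(t), math.sin(t))))

best, worst = max(values), min(values)
print("largest eigenvalue ", lam_max, " vs brute-force max", best)
print("smallest eigenvalue", lam_min, " vs brute-force min", worst)
```

The brute-force scan is only there to make the point visible; in practice one would simply compute the eigenvalues.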
Chapter 3 : Sample Geometry and Random Sampling
Basic properties of a p-dimensional data matrix, such as the sample mean and sample covariance, are given a geometric interpretation. Basic stuff like the mean arising from the projection of each column of the data onto the vector of ones, the connection between the determinant of the covariance matrix and the generalized sample variance, and the significance of the same, are provided. Given a dataset, if you already know how to compute the mean and covariance of the original dataset OR of a linear combination of its columns, you can safely ignore this chapter.
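For anyone who wants to see these facts on actual numbers, here is a small sketch with a made-up 4 x 2 data matrix (the data and variable names are assumptions for illustration). It computes the mean as a projection onto the vector of ones, the generalized variance as the determinant of the sample covariance matrix, and checks that a linear combination c'x has mean c'xbar and variance c'Sc.

```python
# Hypothetical data matrix: n = 4 observations (rows) on p = 2 variables (columns)
X = [[2.0, 10.0],
     [4.0, 14.0],
     [6.0, 12.0],
     [8.0, 20.0]]
n = len(X)
col1 = [row[0] for row in X]
col2 = [row[1] for row in X]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

ones = [1.0] * n

# Projecting a column y onto the vector of ones gives (y.1 / 1.1) * 1 = ybar * 1,
# so the projection coefficient is the sample mean.
mean1 = dot(col1, ones) / dot(ones, ones)
mean2 = dot(col2, ones) / dot(ones, ones)

# Deviation vectors: what is left of each column after removing the projection
d1 = [x - mean1 for x in col1]
d2 = [x - mean2 for x in col2]

# Sample covariance matrix S = [[s11, s12], [s12, s22]] with divisor n - 1
s11 = dot(d1, d1) / (n - 1)
s22 = dot(d2, d2) / (n - 1)
s12 = dot(d1, d2) / (n - 1)

# Generalized sample variance = det(S)
generalized_variance = s11 * s22 - s12 ** 2

# Linear combination c'x computed directly from the data ...
c = (1.0, -0.5)
combo = [c[0] * u + c[1] * v for u, v in zip(col1, col2)]
combo_mean = sum(combo) / n
combo_var = sum((v - combo_mean) ** 2 for v in combo) / (n - 1)

# ... and from the summary statistics: mean c'xbar and variance c'Sc
combo_mean_formula = c[0] * mean1 + c[1] * mean2
combo_var_formula = c[0] ** 2 * s11 + 2 * c[0] * c[1] * s12 + c[1] ** 2 * s22

print(mean1, mean2, generalized_variance)
print(combo_mean, combo_mean_formula, combo_var, combo_var_formula)
```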
Chapter 4 : The Multivariate Normal Distribution
Well, basically this chapter is about data generated from a multivariate normal distribution. The framework of this chapter is again intuitive and nothing fancy. I took a brief pause before going through the chapter and asked myself, “What would I teach somebody about the multivariate normal distribution, if I were asked to?” Well, the following would be the basic stuff that I would cover in relation to X (a p-dim normal random variable):
Basic density form of X
Properties which help in checking whether subsets of X follow the same (normal) distribution
How to identify independent components of X?
Sample mean and sample covariance of the p-dim random variable (RV)
Relevance of Mahalanobis distance and constant probability contours for a p-dim RV
How to connect the chi-square distribution with the ellipsoids arising out of a p-dim RV?
How do you simulate a p-dim RV? Can you simulate given any customized estimator of mean and covariance?
What are the ways of estimating the covariance from the sample? What are the robust estimators? Which one to choose and why?
Sampling distribution of the sample mean and the sample covariance matrix. The former is again a p-dim normal RV while the latter is a Wishart random variable.
Where is the Wishart distribution used? How do you simulate an RV from a Wishart?
What are the characteristics of the Wishart distribution?
Law of Large Numbers and CLT in the context of X and the sample mean
How can you test whether the data actually comes from a p-dim normal RV?
How can you test whether the data has no tail dependency?
How do you transform the data so that both the marginals and the joint distribution are normal?
Out of this laundry list, the book covers most of the aspects… Again, the treatment is MBAish, so you might get an intuitive feel of things, but crunching data is the only way to really understand the above stuff.
Now, why the hell should the real-life data that we see be a realization of a multivariate normal distribution? In 99% of the cases, especially with financial data, it will not be… So, what’s the point in going through the above stuff? All I can say for now is that it will make you skeptical and enthuse you to figure out something in the non-parametric world. Subsequently you can marry stuff from the parametric and non-parametric worlds. Also, it will make you extremely skeptical of the off-the-shelf solutions that sell-side vendors provide in the name of quant models.
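In the spirit of crunching data, here is a minimal sketch for two items on the list: simulating from a normal distribution with a chosen mean and covariance, and connecting the chi-square distribution to the probability ellipses. It is my own illustration, restricted to p = 2 so that only the standard library is needed; the mean, covariance, seed and function names are all assumptions. For p = 2 the chi-square CDF is 1 - exp(-c/2), so the contour that should hold a fraction q of the points is c = -2 ln(1 - q).

```python
import math
import random

def cholesky_2x2(s11, s12, s22):
    """Lower-triangular L with L L' = [[s11, s12], [s12, s22]]."""
    l11 = math.sqrt(s11)
    l21 = s12 / l11
    l22 = math.sqrt(s22 - l21 ** 2)
    return l11, l21, l22

def simulate_bivariate_normal(mu, sigma, n, rng):
    """Draw n points from N(mu, sigma) as x = mu + L z, with z standard normal."""
    l11, l21, l22 = cholesky_2x2(sigma[0][0], sigma[0][1], sigma[1][1])
    points = []
    for _ in range(n):
        z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        points.append((mu[0] + l11 * z1, mu[1] + l21 * z1 + l22 * z2))
    return points

def squared_mahalanobis(x, mu, sigma):
    """(x - mu)' inverse(sigma) (x - mu) for a 2x2 covariance matrix."""
    dx, dy = x[0] - mu[0], x[1] - mu[1]
    s11, s12, s22 = sigma[0][0], sigma[0][1], sigma[1][1]
    det = s11 * s22 - s12 ** 2
    return (s22 * dx * dx - 2.0 * s12 * dx * dy + s11 * dy * dy) / det

# Assumed parameters for the illustration
mu = (1.0, -2.0)
sigma = [[4.0, 1.5], [1.5, 1.0]]
rng = random.Random(42)
sample = simulate_bivariate_normal(mu, sigma, 20_000, rng)

# Chi-square(2) quantiles in closed form: c = -2 ln(1 - q)
c50 = -2.0 * math.log(0.50)
c95 = -2.0 * math.log(0.05)

d2 = [squared_mahalanobis(x, mu, sigma) for x in sample]
inside50 = sum(1 for v in d2 if v <= c50) / len(d2)
inside95 = sum(1 for v in d2 if v <= c95) / len(d2)

print("fraction inside the 50% ellipse:", inside50)
print("fraction inside the 95% ellipse:", inside95)
```

The same squared distances, sorted and plotted against chi-square quantiles, give a rough visual check of whether real data looks multivariate normal.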
Chapter 5 : Inferences about the Mean Vector
The t test is a classic test, covered in any stats101 class, for testing sample means. By squaring the t statistic, one gets an equivalent F statistic. In the multivariate case this t^2 statistic becomes Hotelling’s T square, named in honor of Harold Hotelling, a pioneer in multivariate analysis. Thankfully there is a way to compare a scaled Hotelling T square with the F distribution, and hence it becomes easy to check the null and create confidence intervals for the component means. The importance of this chapter lies in the formulation of control charts for multidimensional data. Having a separate univariate control chart for each variable at a specific sigma level ignores the correlation between the variables, so a chart based on Hotelling’s T square is used instead.
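A minimal sketch of the one-sample test, under stated assumptions: the data are made-up, the function name is hypothetical, and the code is restricted to p = 2 variables. The statistic is T^2 = n (xbar - mu0)' S^-1 (xbar - mu0), and under the null (n - p) / (p (n - 1)) * T^2 follows an F distribution with (p, n - p) degrees of freedom. For p = 2 the upper tail of F(2, m) has the closed form (1 + 2f/m)^(-m/2), which is what lets this run without a stats library; for general p you would need a proper F distribution routine.

```python
def hotelling_t2_2d(data, mu0):
    """One-sample Hotelling T^2 test for bivariate data.
    Returns (T^2, scaled F statistic, p-value). Valid for p = 2 only."""
    n, p = len(data), 2
    m1 = sum(r[0] for r in data) / n
    m2 = sum(r[1] for r in data) / n
    s11 = sum((r[0] - m1) ** 2 for r in data) / (n - 1)
    s22 = sum((r[1] - m2) ** 2 for r in data) / (n - 1)
    s12 = sum((r[0] - m1) * (r[1] - m2) for r in data) / (n - 1)
    det = s11 * s22 - s12 ** 2
    dx, dy = m1 - mu0[0], m2 - mu0[1]
    # T^2 = n * (xbar - mu0)' inverse(S) (xbar - mu0), with the 2x2 inverse written out
    t2 = n * (s22 * dx * dx - 2.0 * s12 * dx * dy + s11 * dy * dy) / det
    f_stat = (n - p) / (p * (n - 1)) * t2
    m = n - p
    # Upper tail of F(2, m) in closed form
    p_value = (1.0 + 2.0 * f_stat / m) ** (-m / 2.0)
    return t2, f_stat, p_value

# Hypothetical measurements on two correlated characteristics
data = [(4.9, 10.2), (5.1, 9.8), (5.3, 10.5), (4.7, 9.9),
        (5.0, 10.1), (5.2, 10.4), (4.8, 9.7), (5.0, 10.0)]

t2_near, f_near, p_near = hotelling_t2_2d(data, mu0=(5.0, 10.075))
t2_far, f_far, p_far = hotelling_t2_2d(data, mu0=(5.5, 10.0))

print("mu0 at the sample mean: T2 =", t2_near, " p-value =", p_near)
print("mu0 far from the data:  T2 =", t2_far, " p-value =", p_far)
```

A T square control chart is the same computation applied observation by observation against an upper control limit, rather than to the sample mean.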
Chapter 6 : Comparison of Several Multivariate means
This chapter is basically the extension of Chap 5 to more than one mean. Well, the statistic remains the same, Hotelling T square, except that it is valid under a specific assumption relating to the covariance matrices (that the populations share a common covariance matrix). This chapter is pretty useful as it covers testing covariance matrices across populations, and this is something that matters a lot in finance. Imagine you have n assets and you have a sample covariance matrix for the time period t1 to t2. One of the basic questions to ask is, “Has the covariance matrix changed?” Well, this chapter clearly shows you the way to test the invariance of the covariance structure. There is also a mention of Path Analysis, MANOVA and the stats behind them. I am going to refer to this chapter very often, for it has a lot of relevant stuff relating to finance.
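One common way to ask "has the covariance matrix changed?" is Box's M test. Treating it as the test to use here is my assumption, as are the simulated "returns", the seed and the function names below. The sketch is limited to p = 2 assets; with g = 2 periods the corrected statistic is compared with a chi-square distribution with p(p + 1)(g - 1)/2 = 3 degrees of freedom, whose tail probability can be written with `math.erf`. Box's test is known to be sensitive to non-normality, so treat the p-values as a rough guide.

```python
import math
import random

def cov_2d(data):
    """Sample covariance (s11, s12, s22) of a list of (x, y) pairs."""
    n = len(data)
    m1 = sum(r[0] for r in data) / n
    m2 = sum(r[1] for r in data) / n
    s11 = sum((r[0] - m1) ** 2 for r in data) / (n - 1)
    s22 = sum((r[1] - m2) ** 2 for r in data) / (n - 1)
    s12 = sum((r[0] - m1) * (r[1] - m2) for r in data) / (n - 1)
    return s11, s12, s22

def det_2d(s):
    s11, s12, s22 = s
    return s11 * s22 - s12 ** 2

def box_m_2d(groups):
    """Box's M statistic with its small-sample correction, for p = 2 variables.
    Returns (C, degrees of freedom); C is approximately chi-square under H0."""
    p, g = 2, len(groups)
    dfs = [len(grp) - 1 for grp in groups]
    covs = [cov_2d(grp) for grp in groups]
    total_df = sum(dfs)
    # Pooled covariance: degrees-of-freedom weighted average of the group covariances
    pooled = tuple(sum(df * s[i] for df, s in zip(dfs, covs)) / total_df
                   for i in range(3))
    m_stat = total_df * math.log(det_2d(pooled)) - sum(
        df * math.log(det_2d(s)) for df, s in zip(dfs, covs))
    u = (sum(1.0 / df for df in dfs) - 1.0 / total_df) * (
        (2 * p * p + 3 * p - 1) / (6.0 * (p + 1) * (g - 1)))
    return (1.0 - u) * m_stat, p * (p + 1) * (g - 1) // 2

def chi2_sf_3df(x):
    """Upper-tail probability of a chi-square variable with 3 degrees of freedom."""
    cdf = math.erf(math.sqrt(x / 2.0)) - math.sqrt(2.0 * x / math.pi) * math.exp(-x / 2.0)
    return 1.0 - cdf

# Simulated two-asset "returns": two calm periods and one with tripled volatility
rng = random.Random(7)
calm = [(rng.gauss(0, 1.0), rng.gauss(0, 1.0)) for _ in range(100)]
calm_again = [(rng.gauss(0, 1.0), rng.gauss(0, 1.0)) for _ in range(100)]
stressed = [(rng.gauss(0, 3.0), rng.gauss(0, 3.0)) for _ in range(100)]

c_same, dof = box_m_2d([calm, calm_again])
c_changed, _ = box_m_2d([calm, stressed])
p_same = chi2_sf_3df(c_same)
p_changed = chi2_sf_3df(c_changed)

print("calm vs calm:     C =", c_same, " p-value =", p_same)
print("calm vs stressed: C =", c_changed, " p-value =", p_changed)
```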
Chapter 7 : Multivariate Linear Regression Models
Multivariate linear regression models are the most basic models, which any econometrics text would cover extensively. Starting from the data matrix and the formulation of a linear regression equation, the entire regression structure is built from the ground up. Thankfully, matrix notation is used extensively, as it makes the transition from single-predictor to multiple-predictor analysis easier. MLE estimates, their distributions, inferences about the regression model and likelihood ratio tests for the parameters are all discussed thoroughly. If you are well versed with the regression model, this chapter would serve as a quick recap of all the concepts, including outlier detection, residual analysis etc.
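As a quick refresher on the matrix formulation, here is a sketch of least squares for the model Y = ZB + E with several responses at once, where B = (Z'Z)^-1 Z'Y. To stay within the standard library I assume a single predictor plus an intercept, so Z'Z is 2x2 and can be inverted by hand; the toy data and the function name are made up for illustration.

```python
def fit_multivariate_regression(z, Y):
    """Least squares for Y = Z B + E with design matrix Z = [1, z].
    z: list of n predictor values; Y: list of n rows, one value per response.
    Returns (intercepts, slopes), each a list with one entry per response."""
    n = len(z)
    m = len(Y[0])
    sz = sum(z)
    szz = sum(v * v for v in z)
    det = n * szz - sz * sz  # determinant of Z'Z = [[n, sz], [sz, szz]]
    # Z'Y, one column per response
    zy0 = [sum(row[j] for row in Y) for j in range(m)]
    zy1 = [sum(zi * row[j] for zi, row in zip(z, Y)) for j in range(m)]
    # inverse(Z'Z) = (1/det) * [[szz, -sz], [-sz, n]]
    intercepts = [(szz * zy0[j] - sz * zy1[j]) / det for j in range(m)]
    slopes = [(-sz * zy0[j] + n * zy1[j]) / det for j in range(m)]
    return intercepts, slopes

# Toy data: one predictor, two responses
z = [0.0, 1.0, 2.0, 3.0, 4.0]
Y = [[1.0, -1.0],
     [4.0, -1.0],
     [3.0, 2.0],
     [8.0, 3.0],
     [9.0, 2.0]]

intercepts, slopes = fit_multivariate_regression(z, Y)

# Residuals for each response; they should be orthogonal to both columns of Z
residuals = [[row[j] - (intercepts[j] + slopes[j] * zi) for j in range(2)]
             for zi, row in zip(z, Y)]
resid_sum = [sum(r[j] for r in residuals) for j in range(2)]
resid_dot_z = [sum(zi * r[j] for zi, r in zip(z, residuals)) for j in range(2)]

print("intercepts:", intercepts)
print("slopes:    ", slopes)
```

Each column of B is just the ordinary least squares fit for that response taken on its own; the multivariate part only shows up when you look at the covariance of the residuals across responses.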
Chapter 8 : Principal Components
PCA is basically used for data reduction and interpretation. Unlike its counterpart, factor analysis, which is extremely popular in finance, PCA works on the covariance matrix. There is no need for the underlying data to be a realization of a multivariate normal distribution. PCA basically lines up linear combinations of the p variables in such a way that the principal components are ordered by the variation they capture. So, finding the principal components is nothing but finding an appropriate basis for the data matrix, so that the maximum possible variation is captured along each successive basis vector. Spectral decomposition is used to calculate the principal components, and the eigenvalues associated with the components play a very important role in the analysis. Very low eigenvalues typically mean that there is a (near) linear dependency in the data and that a subset of variables needs to be removed for better interpretation. A few very high eigenvalues could mean that there are a few major modes which give rise to the variation in the data, meaning most of the variation seen is common cause variation. There are some graphical tools mentioned in the chapter, like the scree plot, T square control chart and constant elliptical density charts, that can be used in the context of multivariate data. Which graphic to use obviously depends on the context and the nature of the data used for the analysis.
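Here is a bare-bones sketch of the spectral decomposition step, assuming a made-up 2x2 covariance matrix so that the eigenvalues and eigenvectors can be written in closed form with the standard library alone (the matrix and the function name are mine, not the book's). For real data with more variables one would use a numerical eigen-solver instead.

```python
import math

def pca_2x2(s11, s12, s22):
    """Eigenvalues (largest first) and unit eigenvectors of the covariance
    matrix [[s11, s12], [s12, s22]]."""
    mid = (s11 + s22) / 2.0
    half_gap = math.hypot((s11 - s22) / 2.0, s12)
    lam1, lam2 = mid + half_gap, mid - half_gap
    if s12 != 0.0:
        # (s12, lam1 - s11) solves (S - lam1 * I) v = 0
        v = (s12, lam1 - s11)
    else:
        # Already diagonal: the axes themselves are the principal directions
        v = (1.0, 0.0) if s11 >= s22 else (0.0, 1.0)
    norm = math.hypot(v[0], v[1])
    e1 = (v[0] / norm, v[1] / norm)
    e2 = (-e1[1], e1[0])  # orthogonal to e1
    return (lam1, lam2), (e1, e2)

# Assumed covariance matrix S = [[5, 2], [2, 2]]
(lam1, lam2), (e1, e2) = pca_2x2(5.0, 2.0, 2.0)

total_variance = lam1 + lam2          # equals the trace of S
explained = [lam1 / total_variance, lam2 / total_variance]

print("eigenvalues:", lam1, lam2)
print("first principal direction:", e1)
print("proportion of variance explained:", explained)
```

Here the eigenvalues are 6 and 1, so the first component alone captures 6/7 of the total variance. A scree plot is simply these eigenvalues plotted in decreasing order.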
Chapter 9 : Factor Analysis
Factor analysis is often used synonymously with PCA, but there is a fundamental difference between the two. PCA works on the covariance matrix and has no model assumptions, while factor analysis by definition hypothesizes a model and works on correlation matrices. So, when one hypothesizes a model, one obviously needs to estimate the parameters and test the assumptions. This is where factor analysis gets tricky. There are a ton of assumptions and at least 3-4 estimation methods. Also, the solutions are not unique, which has spawned a ton of literature on factor rotations etc. I read somewhere that if your initial hypothesized model does not work properly out of sample, don’t rotate factors and all that crap. Just ditch your model and start from scratch.
Will summarize the remaining three chapters of the book at a later date. After a typing marathon of 2 posts today, my fingers and my mind are crying for a break 🙂