A few weeks ago, I realized that my GLM knowledge was pretty rusty as I was struggling to get meaningful estimates from a Poisson regression.
So I decided to immerse myself over the weekend in what I gather to be one of the best books on GLM, written by Annette J. Dobson. Well, there is always the classic monograph by McCullagh & Nelder, “Generalized Linear Models”, that one can refer to, but to me it was a little too dry to go through. So there is a little guilty feeling in me that I have never read the classic monograph co-authored by the father of GLM (J. A. Nelder). I have found this to be the case with quite a few things that I have learnt over the years. For example, I have never read, to date, Bjarne Stroustrup’s C++ book, the supposedly classic book on C++. However, I have coded in C++ like crazy and have used C++ to do whatever was needed. So I always go on a guilt trip whenever I think of C++. Well, maybe someday I will go through the classic book. In a similar vein, I will go through Nelder’s classic stuff someday, hopefully 🙂
OK, coming back to this book, I could only get the second edition. Basically I will have to wait for a kind soul to upload the third edition on the net 🙂. Till then the second edition will do 🙂. The third edition has some chapters on Bayesian stats appended towards the end of the book.
The brilliant thing about this book is its organization of ideas and thoughts, with just enough math to get you going on coding your own GLM estimation and diagnostic functions.
There are about 11 chapters in the book, neatly organized based on the type of data that one is dealing with, namely Continuous, Binary, Nominal with > 2 categories, Ordinal, Counts, Failure times, Correlated Responses. Basically one can visualize this book as learning the GLM toolbox for various combinations of dependent and independent variables. The GLM toolbox broadly comprises all the elements mentioned under the methods column in the following table.
Chapter 1 : Introduction
This lays out the basic notation that is followed in the book. Some basic results relating to the normal distribution, the Chi-Square and non-central Chi-Square distributions, the F distribution, quadratic forms and Cochran’s theorem are mentioned. The basic MLE method is also described. If I had read this book a few years ago, I would have wondered why one should talk about distributions in the introduction. Pick up any book and you will usually find some fancy talk about the book, meaning some general wisdom about why the topic is important, why the author is writing the book, blah blah… But this book cuts all the crap and gets to THE MOST IMPORTANT thing that is needed to specify the model and estimate its parameters: DISTRIBUTIONS. Typically these are stuffed in the appendix. The fact that they are mentioned in Chapter 1 says a lot about the way the author presents this book as compared to the run-of-the-mill kind of books.
Chapter 2 : Model Fitting
Two basic examples of evaluating competing models are used to kick off the discussion: a Poisson model with one common parameter vs. two group-specific parameters, and a linear regression with the same betas vs. different betas for specific categories. Ways to choose between the competing models are then discussed. As clearly and repeatedly mentioned throughout the book, a GLM broadly comprises two major aspects: the first is the choice of the distribution of the dependent variable from the exponential family of distributions, and the second is the link function between the expectation of the dependent variable and the covariates. Both examples more than reinforce this thinking.
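To make the first of these comparisons concrete, here is a minimal sketch of the one-parameter vs. two-parameter Poisson comparison. The counts are made up for illustration (they are not the book's data), and the function names are my own. It relies only on the standard facts that the MLE of a Poisson mean is the sample mean and that twice the gain in log-likelihood is compared against a chi-square with 1 degree of freedom.

```python
import math

def poisson_loglik(ys, theta):
    """Log-likelihood of independent Poisson counts sharing the mean theta."""
    return sum(y * math.log(theta) - theta - math.lgamma(y + 1) for y in ys)

def mean(ys):
    return sum(ys) / len(ys)

# Made-up counts for two groups (illustrative only, not the book's data).
group_a = [0, 1, 1, 0, 2, 3, 0, 1, 1, 1]
group_b = [2, 0, 3, 1, 4, 1, 2, 3, 2, 2]

# Model 1: one common theta. Its MLE is the pooled sample mean.
pooled = group_a + group_b
loglik_one = poisson_loglik(pooled, mean(pooled))

# Model 2: a separate theta per group. Each MLE is that group's own mean.
loglik_two = (poisson_loglik(group_a, mean(group_a))
              + poisson_loglik(group_b, mean(group_b)))

# Twice the gain in log-likelihood from the extra parameter.
lr_stat = 2 * (loglik_two - loglik_one)
print("LR statistic:", lr_stat)
print("5% critical value of chi-square with 1 df is about 3.84")
```

If the statistic is well above 3.84, the extra parameter is earning its keep; otherwise the simpler model is hard to reject.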
Chapter 3 : Exponential Distributions
This chapter talks about the various forms in which Y, the dependent variable, can appear in a GLM. Normal, Poisson and Binomial are the widely used distributions that belong to the exponential family. The density of each of these variables is written in the exponential form and its properties are discussed. The chapter ends by giving 3 examples where the link function is identified so that one can apply the GLM framework.
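As a quick sanity check of what "written in the exponential form" means, here is a small sketch for the Poisson case. I am assuming the form exp(a(y)·b(θ) + c(θ) + d(y)), with a(y) = y, b(θ) = log θ, c(θ) = −θ and d(y) = −log y!. The function names are my own.

```python
import math

def poisson_pmf(y, theta):
    """Textbook Poisson probability: theta^y * exp(-theta) / y!"""
    return theta ** y * math.exp(-theta) / math.factorial(y)

def poisson_expfam(y, theta):
    """The same probability written as exp(a(y)*b(theta) + c(theta) + d(y))."""
    a = y                      # a(y) = y
    b = math.log(theta)        # natural parameter b(theta) = log(theta)
    c = -theta                 # c(theta) = -theta
    d = -math.lgamma(y + 1)    # d(y) = -log(y!)
    return math.exp(a * b + c + d)

# The two columns should agree up to floating point error.
for y in range(6):
    print(y, poisson_pmf(y, 2.5), poisson_expfam(y, 2.5))
```

The natural parameter log θ is what makes the log link the obvious choice for Poisson models.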
Chapter 4 : Estimation
This chapter is one of the key chapters of the book, as it talks about the estimation framework that needs to be followed in developing a GLM. One can formulate any fancy model, but one needs to keep the estimation method in mind. In the case of regression, the estimation procedure is known to everyone: least squares is almost always the chosen method to get the estimates. In the case of GLM, there is a twist in the tale. The twist is that the relation between the expected value of the dependent variable and the independent variables is not a linear one but depends on the link function. So here you are with a set of values of a dependent variable that supposedly follows one of the distributions from the exponential family. You also have a set of covariates and a link function. How one uses MLE to get estimates is the central question answered in this chapter. The procedure behind the math can be summarized as follows:
- Form the log likelihood function for the Yi’s.
- Compute the score statistic, i.e. the derivative of the log likelihood with respect to each of the parameters (the betas).
- The score statistic, unlike in the usual MLE setup, will contain a term which captures the link function.
- At this juncture, the information matrix needs to be computed, as it is needed for the Newton-Raphson formulation of the equation for estimating the betas (strictly, the method of scoring, where the expected information stands in for the matrix of second derivatives).
- At this step, it becomes obvious that the Newton-Raphson equation is equivalent to an iteratively reweighted least squares procedure.
Thus estimating even a simple GLM connects a whole lot of concepts like MLE, the score statistic, the link function, the Newton-Raphson method and iteratively reweighted least squares. When one uses off-the-shelf stuff to get estimates, one doesn’t get to see what happens behind the scenes, and a novice can fail to appreciate the beauty behind the procedure.
The chapter ends with a Poisson regression example which can be worked out manually, thus allowing one to appreciate the connections between the various aspects of GLM estimation. The beauty of this chapter is that it allows you to get into the intricate details of estimating the parameters of a GLM using nothing but matrix algebra and the Newton-Raphson method. Once you work out an example from scratch, you will obviously be amazed at the power of stat software like SAS, R and Matlab, which give the output in a jiffy.
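The estimation steps above can be sketched in a few lines for a Poisson regression with one covariate. This is my own minimal version, not the book's worked example: it assumes a log link (for which Newton-Raphson and the method of scoring coincide), uses made-up data, and solves the 2x2 weighted least squares system by hand so that no matrix library is needed.

```python
import math

def irls_poisson(x, y, max_iter=50, tol=1e-10):
    """Fit log(mu_i) = b0 + b1 * x_i by iteratively reweighted least squares."""
    b0, b1 = math.log(sum(y) / len(y)), 0.0   # start from the constant-mean fit
    for _ in range(max_iter):
        eta = [b0 + b1 * xi for xi in x]       # linear predictor
        mu = [math.exp(e) for e in eta]        # inverse of the log link
        w = mu                                 # weights: (dmu/deta)^2 / Var(Y) = mu
        # Working response: eta + (y - mu) * deta/dmu
        z = [e + (yi - m) / m for e, yi, m in zip(eta, y, mu)]
        # Weighted normal equations (X'WX) b = X'Wz for X = [1, x]
        sw = sum(w)
        swx = sum(wi * xi for wi, xi in zip(w, x))
        swxx = sum(wi * xi * xi for wi, xi in zip(w, x))
        swz = sum(wi * zi for wi, zi in zip(w, z))
        swxz = sum(wi * xi * zi for wi, xi, zi in zip(w, x, z))
        det = sw * swxx - swx * swx
        new_b0 = (swxx * swz - swx * swxz) / det
        new_b1 = (sw * swxz - swx * swz) / det
        converged = max(abs(new_b0 - b0), abs(new_b1 - b1)) < tol
        b0, b1 = new_b0, new_b1
        if converged:
            break
    return b0, b1

# Made-up counts, for illustration only.
x = [0, 1, 2, 3, 4, 5]
y = [1, 3, 2, 6, 9, 14]
b0, b1 = irls_poisson(x, y)

# At the MLE both score equations should be (numerically) zero.
mu_hat = [math.exp(b0 + b1 * xi) for xi in x]
score_b0 = sum(yi - m for yi, m in zip(y, mu_hat))
score_b1 = sum(xi * (yi - m) for xi, yi, m in zip(x, y, mu_hat))
print("b0, b1:", b0, b1)
print("scores:", score_b0, score_b1)
```

Checking that the scores vanish is a handy way to confirm a hand-rolled fit before comparing it against SAS or R output.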
Chapter 5 : Inference
This chapter provides the link between the score statistic and the distribution of the MLE estimates by explaining the procedure to find the sampling distributions of both. It also talks about the deviance for some common models. The deviance, as the name implies, measures how far the model of interest deviates from the saturated (maximal) model: it is twice the difference between their maximized log-likelihoods. Asymptotics are invoked on the deviance to choose between competing models. A few examples of hypothesis testing are shown using the Wald statistic and the deviance statistic.
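For a Poisson model the deviance has a simple closed form, D = 2 Σ [y log(y/μ) − (y − μ)], which is easy to code. The sketch below uses made-up counts and a constant-mean model as the model of interest, and the function name is my own.

```python
import math

def poisson_deviance(y, mu):
    """D = 2 * sum(y*log(y/mu) - (y - mu)), with y*log(y/mu) taken as 0 when y = 0."""
    total = 0.0
    for yi, mi in zip(y, mu):
        term = yi * math.log(yi / mi) if yi > 0 else 0.0
        total += term - (yi - mi)
    return 2.0 * total

# Made-up counts, for illustration only.
y = [1, 3, 2, 6, 9, 14]

# Model of interest: a single common mean, estimated by the sample mean.
ybar = sum(y) / len(y)
dev_constant = poisson_deviance(y, [ybar] * len(y))

# Saturated model: fitted values equal the observations, so the deviance is 0.
dev_saturated = poisson_deviance(y, y)

print("constant-mean deviance:", dev_constant)
print("saturated deviance:", dev_saturated)
```

The difference in deviance between two nested models is what gets compared against a chi-square distribution.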
Chapter 6 : Normal Linear Models
This chapter deals with Normal models. Basic multiple linear regression concepts are discussed, along with outlier detection methods based on metrics like influence measures and Cook’s distance. The discussion of ANOVA and ANCOVA is presented with great clarity. Until I read this chapter, I had never known that the deviance of a one-factor ANOVA is the same irrespective of the linear model specification. In most of the texts that I have read to date, there was no special emphasis on the fact that the deviance is the key to this entire ANOVA, ANCOVA business. Two-factor ANOVA and ANCOVA models are also discussed with complete derivations of the estimates, thus making the reader appreciate the F stats that are available at the call of a simple command. One thing which is common across all these jargon models is that “the basic matrix algebra is similar to simple linear regression”. However, I have not seen to date any powerful application of ideas such as ANOVA, ANCOVA etc. in stat arb. I guess it falls under “nice to know” stuff. I do vaguely remember an example where ANOVA was used to get some information about the seasonality of spreads. However, I still haven’t come across a WOW application of this ANCOVA stuff in finance.
Chapter 7 : Binary Variables and Logistic Regression
Logits and probits are discussed in this chapter. Various stats for model selection, such as the Pearson Chi-Square statistic, Hosmer-Lemeshow statistic, deviance statistic, likelihood ratio chi-squared statistic and pseudo R-squared statistic, are derived. Even though knowledge of these models and estimations looks nice on paper, I sometimes wonder about the usefulness of models based on such simplistic equations. Shouldn’t the logits be a stochastic process? What about the covariates? Unlike social science applications, in finance the covariate is almost always going to be an ARMA process or a non-stationary series. The information in this chapter will help one understand the math of a basic logit model, but if you have to apply this to the real world, you quickly realize that most of the assumptions of the plain vanilla logit/probit model are useless, at least for models applicable to the trading world. You have to bring stochasticity into the variables. I guess for a successful application of any of these models in the real world, one requires solid intuition + deep contextual knowledge of the variable being studied + skills to model randomness that is not of the Gaussian type + LUCK 🙂 to get it right. …The LUCK factor, sometimes, might be the only variable that seems to be a sane explanation of the success of your model :) .
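For reference, the two links this chapter starts from can be sketched in a few lines. This is only the plain vanilla version, with function names of my own choosing: the logit link is the log odds, and the probit link uses the standard normal CDF, which I compute here from `math.erf`.

```python
import math

def logit(p):
    """Logit link: log odds of a probability p in (0, 1)."""
    return math.log(p / (1.0 - p))

def inv_logit(eta):
    """Inverse logit: maps a linear predictor back to a probability."""
    return 1.0 / (1.0 + math.exp(-eta))

def inv_probit(eta):
    """Inverse probit: the standard normal CDF evaluated at eta."""
    return 0.5 * (1.0 + math.erf(eta / math.sqrt(2.0)))

# Compare the two response curves on a small grid of linear predictor values.
for eta in [-2.0, -1.0, 0.0, 1.0, 2.0]:
    print(eta, inv_logit(eta), inv_probit(eta))
```

Both curves pass through 0.5 at zero; the probit approaches 0 and 1 faster in the tails.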
Chapter 8 : Nominal and Ordinal Logistic Models
At this stage of the book, the reader can actually guess the contents and structure of a chapter. As the title of the chapter goes, the models considered are nominal logistic and ordinal logistic. There are two approaches to modeling the response variable if it is a categorical variable with multiple levels. The first is to use nominal and ordinal logistic models, and the second is to frame it as a Poisson regression. This chapter dwells on the first method. One of the first observations made is that the multinomial distribution is not a member of the exponential family in the form used in the book, and hence will not directly fit the framework described there. However, a multinomial distribution can be regarded as the joint distribution of Poisson random variables, conditional upon their sum. This result provides a justification for the use of generalized linear modelling.
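The Poisson-to-multinomial result is easy to verify numerically. In the sketch below (function names and numbers are my own), independent Poisson counts with means λ_i are conditioned on their total, and the result is compared against a multinomial with probabilities λ_i / Σλ.

```python
import math

def poisson_pmf(y, lam):
    return math.exp(-lam) * lam ** y / math.factorial(y)

def multinomial_pmf(counts, probs):
    """n! / (y_1! ... y_k!) * p_1^y_1 * ... * p_k^y_k"""
    coef = math.factorial(sum(counts))
    for c in counts:
        coef //= math.factorial(c)   # stays an exact integer at every step
    p = float(coef)
    for c, pr in zip(counts, probs):
        p *= pr ** c
    return p

def poisson_given_total(counts, lams):
    """P(Y_1 = y_1, ..., Y_k = y_k | sum of Y = n) for independent Poissons."""
    joint = math.prod(poisson_pmf(c, l) for c, l in zip(counts, lams))
    # The sum of independent Poissons is Poisson with the summed mean.
    total = poisson_pmf(sum(counts), sum(lams))
    return joint / total

# Arbitrary illustrative values.
counts = [2, 1, 3]
lams = [1.5, 0.5, 2.0]
probs = [l / sum(lams) for l in lams]

print(poisson_given_total(counts, lams))
print(multinomial_pmf(counts, probs))
```

The two printed numbers should agree up to floating point error, whatever positive means are chosen.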
Nominal logistic models can be used when there is no natural order in the categories of the response variable. Any category can be picked as the reference category, and logit models are built with respect to it. This clearly means that if there are any correlations between the levels of the response variable, then plain vanilla logistic is NO GOOD.
The usual summary statistics are derived, such as Pearson chi-square, deviance, likelihood ratio chi-squared and pseudo R-squared. A good thing about this chapter is the use of a single example to illustrate both nominal and ordinal logistic models. In the former case, the variable is assumed to be nominal and modeled as such. In the latter case, the natural ordering of the variable is taken into account and an ordinal logit model is built. Both examples illustrate the importance of not relying on just a single goodness-of-fit statistic. The more goodness-of-fit measures you compute, the better your idea of the strength of the model. The biggest learning for me is to keep in mind the likelihood ratio chi-squared statistic and the pseudo R-squared statistic, besides the usual deviance statistic which is mentioned everywhere.
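A minimal sketch of what "logits with respect to the reference category" looks like in code, assuming one covariate and made-up coefficients (the function name and the numbers are purely illustrative). Each non-reference category j gets its own pair (b0_j, b1_j), with log(π_j / π_ref) = b0_j + b1_j·x.

```python
import math

def nominal_probs(x, betas):
    """Category probabilities from baseline-category logits.

    betas holds one (intercept, slope) pair per non-reference category.
    The reference category's probability is returned first.
    """
    etas = [b0 + b1 * x for b0, b1 in betas]
    denom = 1.0 + sum(math.exp(e) for e in etas)
    ref = 1.0 / denom
    return [ref] + [math.exp(e) / denom for e in etas]

# Hypothetical coefficients for a response with three categories.
betas = [(0.5, -0.2), (-1.0, 0.8)]
probs = nominal_probs(1.5, betas)

print(probs, sum(probs))
# The log odds against the reference category recover the linear predictors.
print(math.log(probs[1] / probs[0]), 0.5 - 0.2 * 1.5)
```

Picking a different reference category changes the coefficients but not the fitted probabilities.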
Chapter 9 : Count Data, Poisson Regression and Log Linear Models
As the title of the chapter suggests, this deals with the case when the response variable is a count variable. The first method discussed is plain simple Poisson regression, where the response variable for each covariate pattern is modeled as a Poisson variable and the log function is used to link the expected value of the response variable to the covariates. In the case of contingency tables, the appropriate model is formulated based on the constraints on the data. For example, for a contingency table where the sum of all elements is fixed, one can use a multinomial model. If there are more fixed marginal totals than just the overall total n, then appropriate products of multinomial distributions can be used to model the data. One good takeaway from this chapter is the case where the data in the contingency tables are not independent: the suggestion is to use the negative binomial distribution to take care of the resulting overdispersion.
The last two chapters of the book apply the GLM framework to Survival analysis and Longitudinal Data.
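A quick way to spot overdispersion before reaching for a negative binomial is to compare the Pearson chi-square with its degrees of freedom. The sketch below does this for the simplest possible case, a constant-mean Poisson model on made-up counts. It is my own illustration rather than anything from the book.

```python
def dispersion_ratio(y):
    """Pearson chi-square divided by its degrees of freedom (n - 1)
    for a constant-mean Poisson model; values near 1 are consistent
    with Poisson variation, values well above 1 suggest overdispersion."""
    n = len(y)
    mean = sum(y) / n
    pearson = sum((yi - mean) ** 2 / mean for yi in y)
    return pearson / (n - 1)

# Made-up counts: the first set is tame, the second is clumpy.
tame = [4, 5, 6, 5, 4, 6]
clumpy = [0, 0, 1, 2, 12, 15]

print("tame:", dispersion_ratio(tame))
print("clumpy:", dispersion_ratio(clumpy))
```

For a Poisson variable the variance equals the mean, so a ratio far above 1 says the data vary more than the model allows.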