August 2012


The author Anne Fadiman considers herself a common reader. Who is a common reader? In her words,

The common reader differs from the critic and the scholar. She is worse educated, and nature has not gifted her so generously. She reads for her own pleasure rather than to impart knowledge or correct the opinion of others. Above all, she is guided by an instinct to create for herself, out of whatever odds and ends she can come by, some kind of whole.

Anne Fadiman is a faculty member at Yale, and her love of books probably led her to write these 18 essays over a period of four years. So this is a book about books. The title, Ex Libris, is a Latin phrase meaning "from the books". It is often used to indicate ownership of a book, as in "from the books of …" or "from the library of …". Anne Fadiman hails from a literary family, her parents being authors and popular literary personalities, so it comes as no surprise that she was surrounded by books right from childhood. In that sense she is not a common reader by any standard!

Each of the 18 essays explores a quirky aspect of either the author or her family members.

In one of the essays she says that despite being married for five years, her books and her husband's books always sat on separate shelves, each collection sporting an independent identity. Only when she finally mixes her books with her husband's, categorizes them, and arranges the two collections together does she consider her marriage truly consummated!

In another essay she gives a funny account of her entire family being obsessed with words: new words, old words, words used in different contexts. When the author mentions her family discussing the etymology of words over dinner, a reader can imagine the kind of environment she grew up in. The family is so obsessed with language that the first thing they do when they pick up a menu at any restaurant is correct its grammatical errors. On the subject of correcting sentence structures, there are far odder anecdotes about her father that will amuse any reader.

She devotes one essay to her bookshelf, one to inscriptions in used books, one to ink, one to her dad's bookshelf, and so on. None of the essays is boring, because there is a ton of wit and sarcasm all throughout. One of the essays talks about two different kinds of book lovers: courtly lovers and carnal lovers. Courtly lovers never put a book face down, remove their bookmarks after reading, never write in the margins, and leave the book as it was before the read, keeping it looking virginal. Carnal book lovers write incessantly in the margins, inscribe the flyleaves, scribble their feelings all over the book, and will even tear a book apart for easier reading. The author considers herself a carnal book lover, being of the opinion that hard use is not a sign of disrespect but a sign of intimacy! Surely you would have come across people in either category. Some people welcome comments in second-hand, borrowed, or lent books; some fume when they see their book devirginized by others. It is funny, though, that the author uses such strong words to emphasize the difference!

The author mentions in one of her essays that she even reads telephone directories and goes over every item in mail-order catalogues! Her husband regularly trashes some of her mail-order catalogues so that she can meet her writing deadlines! This is a truly crazy reading obsession. Be that as it may, the book is charmingly witty and deserves a mighty compliment, and her accomplishments are praiseworthy too!


I was recovering from a brief illness and tried reading this book just to shake off my drowsy and sullen mood.

I found the first part of this book interesting. Given the information overload we all face, it helps to understand how our brains function. "How do we use our brains for understanding, deciding, recalling, memorizing and inhibiting information?" is an important question we all need to answer in order to function efficiently in our lives. I loved the initial part of the book because it uses the metaphor of a stage and audience to explain the scientific aspects of the brain, more specifically the prefrontal cortex. The book is also organized as a play with various scenes (like a theater play). Each scene has two takes: in the first, the actor has no clue how the brain works and messes things up; in the second, the actor performs in full cognizance of the workings of the brain.

First, about the stage-audience metaphor for the prefrontal cortex.


The prefrontal cortex, which comprises 5 to 10% of the brain, is responsible for some of the most important tasks we do in our daily lives. The best things do indeed come in small packages! The author uses the metaphor of a stage to explain its various functions:

  • Understanding: bringing new actors onto the stage and seeing how they connect to the audience
  • Deciding: putting actors (external or from the audience) on the stage and comparing them to one another
  • Recalling: bringing members of the audience onto the stage; the audience in the front seats represents short-term memory, and the back seats long-term memory
  • Memorizing: moving actors off the stage and seating them in the audience, be it the front row or the back
  • Inhibiting: not allowing certain actors onto the stage

Also, the lights on the stage keep dimming as time passes. The only way to bring back the brightness is to re-energize yourself with regular breaks, exercise, and a variety of activities, i.e., mixing it up. Any prioritizing activity takes up a lot of energy, so you should do such tasks at the very beginning of the day, when the lights on the stage are bright, i.e., your energy levels are high.

Since the stage is very small, one must organize it so that the act is pleasant for the audience to watch. Bringing too many actors on the stage is a no-no. Simplifying the stage, limiting the number of actors, and chunking the scene into specific sequences are some of the actions one can take to reconcile with the limited stage space. It also happens that the audience in the front row always want to come onto the stage, yet they need not be the most useful actors for the specific act (for example, for critical decisions, the things in immediate memory are not always the most important; sometimes actors sitting way back in the audience matter far more).

Also, just as in a theater act, only one actor is allowed to speak at a time. You can put as many actors as you want on the stage (obviously, the fewer the better), but when the scene starts, only one actor can act. This is the basic limitation of the prefrontal cortex: not only is the stage limited (the number of items you can hold on it is small), but what the actors can do is also limited. This essentially means that single-tasking is usually the best way to go. Whenever more than one task is done at once, accuracy drops. If you reply to emails, talk to someone, sit on a conference call, and decide the venue for dinner all at once, then every task will suffer to some extent or the other.

The book describes an interesting experiment showing that developing a language for an activity enables us to catch ourselves before doing that activity. This means that if we have the language to describe the feeling of having too much on stage at once, we will be more likely to notice it. So, by giving explicit language, metaphors, analogies, and terms for various functions of the brain that many of us know only implicitly, this book aims to help us stage the entire play (our lives) in a better way.

Talking of distractions and how they kill our ability to work efficiently, the book says that recognizing a distraction and vetoing it is very important. This requires cutting down distractions at the very source: it is better to work with the mobile phone switched off than with a visible missed call, and with the email program closed rather than an inbox filling up with unread mail. Simple actions can bring quite a bit of improvement in how we manage distractions. More importantly, having a language to talk about and take cognizance of these distractions helps us veto them.

Part I of the book ends with the author talking about ways to get past an impasse. He cites experiments from the scientific literature showing that breakthroughs and insights often come from shutting off the stage completely; i.e., instead of living in the limited stage space of the prefrontal cortex amongst the audience and external actors, it is better to shut off the stage and explore completely different paths.

The book then introduces the "Director" of the play, i.e., mindfulness. If the stage is a metaphor for the narrative circuit in our brains, the director is a metaphor for the experiencing circuit. The director can observe the play, the actors, the scenes, etc., and has the power to control them. Most of us often operate only with our default network, i.e., a stage where actors seem to drop by without any control. We never directly experience anything completely. Even when we are reading a good book, watching a movie, seeing a play, or sitting on a beach, our thoughts are far away from what we are actually experiencing. This is mainly because our director is missing and the stage is out of control.

Part II of the book is about the emotional functions of the brain. Called the limbic system, this is the seat of the emotions that help us take millions of little decisions in our daily lives; in fact, that is what makes us human. The downside is that when there is over-arousal, we tend to underperform. This causes scenes on the stage to go haywire: unnecessary actors get onto the stage, the director goes missing, wrong dialogues are uttered by the actors, and so on. This part of the book says that you can get out of this over-arousal by either labeling the emotion or reappraising it. Both are easier said than done, but with constant practice you can see to it that the director and the stage stay intact whenever there is an amygdala hijack. Another way to escape an emotional hijack is to alter your perception of the event.

The last two parts of the book talk about things that crop up in social interactions and change management. There is more MBA-style content in these parts, and hence they were, needless to say, damn boring.

Takeaway:

By presenting the "stage" as a metaphor for the brain's prefrontal cortex and the "director" as a metaphor for mindfulness, the book is pretty engaging for the first 100 pages. The rest is crap!


This book is written by Annette J. Dobson, a biostatistics professor at the University of Queensland (Brisbane). I had come across this book way back in May 2010 and had worked through it. Here is what I wrote about it back then: while trying to understand local likelihood modeling, I realized that I had forgotten some basic principles relating to diagnostics and model evaluation for GLMs. Sometimes I wonder what makes things stick. Maybe there is no magic bullet at all; one has to keep revisiting concepts to understand and remember them.

With that mindset, I reread this book after more than two years. My understanding of GLM models is so much better this time. I will attempt to list my totally random thoughts from this second read.

  • Understanding the connections between the chi-squared distribution and quadratic forms.
  • It is useful to know the formula for the general exponential family of distributions, as you can then compute the mean and variance of any distribution in this family quite easily.
  • The exponential families include many of the most common distributions, including the normal, exponential, gamma, chi-squared, beta, Dirichlet, Bernoulli, categorical, Poisson, Wishart, inverse Wishart, etc.
  • A number of common distributions are exponential families only when certain parameters are considered fixed and known, e.g. the binomial (with a fixed number of trials), the multinomial (with a fixed number of trials), and the negative binomial (with a fixed number of failures). Examples of common distributions that are not exponential families are Student's t, most mixture distributions, and even the family of uniform distributions with unknown bounds.
  • Can you do GLM modeling on any response variable from the exponential family of distributions? No. It can be done only on the canonical form (a specific form of the exponential family of distributions).
  • The connection between the link function and the natural parameter of an exponential family of distributions.
  • If the link function is estimated numerically, such models are called Generalized Additive Models (GAMs). I have heard that they are very useful in market microstructure studies.
  • If the likelihood of success goes up as a predictor value goes up, use a logistic model, i.e., log-odds modeling.
  • You can't build a GLM with the assumption of a Pareto distribution for the response variable. Why? The Pareto is not in canonical form amongst the exponential family of distributions.
  • A common model for time-to-failure modeling is the Weibull.
  • The qqplot() function in R can be used to compare observed quantiles with theoretical quantiles; the plot is of sample vs. theoretical quantiles.
  • Estimating the parameter of a known distribution given the Yi entails calculating the log-likelihood and taking its derivative to obtain the score function. The score function is equated to 0 to get the MLE; one can use Newton-Raphson to find this estimate. This is called the "method of scoring".
  • If one looks at the likelihood for varying thetas, the score function should be sharp near the MLE for the estimate to have low variance. If the score function is flat near the MLE, there could be any number of near-MLE estimates; in other words, the MLE has high variance. So the variance of the MLE is inversely related to the derivative of the score function. This kind of intuition helps in understanding the relationship between the MLE and the variance of the score function.
  • Just as Newton-Raphson's iterative equation can be linked to the MLE, GLM estimation can be seen as a variant of Newton-Raphson. The procedure used to obtain estimates is called the iteratively weighted least squares method.
  • Using the Newton-Raphson algorithm, generalized linear model estimates can be found. It is instructive to work out the model estimates on a toy dataset.
  • I learnt the procedure to compute GLM estimates using simple matrix operations and iterative procedures. This kind of working from scratch is useful, as one then knows exactly what is happening behind off-the-shelf packages.
  • While reviewing some old code, I found that I could not reconcile the standard errors from my manual calculation with those from the glm function. I found the bug and fixed it; the standard errors from pen-and-paper calculations now match the ones from glm.
  • The code I wrote two years back for some estimation problems was completely wrong. I was so dumb to code that way. Now when I see the code, I am happy that I am able to fix it with ease. What has this code review taught me, besides writing effective R code? Well, it has taught me something about getting used to math symbols and equations: in the last two years, more than anything else, I have grown more comfortable with handling the math behind the stats equations.
  • The book has an exercise that asks one to fit an exponential distribution to a response variable with a log link function. The big learning from the R forums is that you fit it with a gamma distribution and estimate the model, then use summary(fit, dispersion=1) to get the standard errors. Nuisance parameters don't matter for estimating the coefficients in a GLM framework; they matter in computing the standard errors of the coefficients.
  • The basic equation for GLM estimation is X'WXb = X'Wz, where z is the adjusted (working) response.
    • Let's say you are fitting an exponential distribution to the response variable with a log link function; the R function glm has a family argument that takes the Gamma distribution. So how does one get the estimates for the exponential distribution, which is basically a Gamma with shape parameter 1? I came across a post in an R forum that finally answered my question. For estimating the coefficients of a GLM, the shape parameter doesn't matter, because the shape parameter that appears in the W matrix cancels from both sides of the equation above. However, for estimating the standard errors, the information matrix X'WX is used; hence summary(fit, dispersion=1) needs to be used.
  • glm(y~1, family = whatever) gives the MLE of the scale parameter of y, if y belongs to an exponential family distribution in canonical form.
  • You can get the least squares estimates by using glm(y~x, family = gaussian). The estimates match those of the model Y = Xb + error (iid with known variance).
  • The score function is a damn important function, as its variance is related to the standard errors of the MLE.
  • The Newton-Raphson iterative framework is immensely useful in remembering how one iterates the score function in a GLM to obtain the estimates.
  • The sampling distribution of the deviance statistic is used for model selection.
  • The score statistic is useful to compute for likelihood estimation in both the single-parameter and multiple-parameter cases.
  • One can use the GLM framework to derive least squares estimates. It's kind of obvious, though, as the GLM is a superset of the simple linear model.
  • The deviance statistic is useful in hypothesis testing. Typically, for a response that does not have nuisance parameters, the deviance statistic calculated from the sample follows a chi-squared distribution.
  • In cases where there are nuisance parameters, the deviance statistic in its original form cannot be computed. Hence an alternative statistic, a function of the deviance, is calculated so that hypothesis testing can be carried out. The classic case is a normal-distribution-based model where sigma is the nuisance parameter: the deviance cannot be calculated directly, so the ratio of two chi-squared variables is taken and an F statistic is used in hypothesis testing.
  • Connection between standardized residuals and variance estimate in a linear regression framework
  • The deviance can typically be approximated by a non-central chi-squared distribution.
  • Sampling distribution of MLE estimators
  • Sampling distribution of Score statistics
  • For a binary response variable, the link function cannot be chosen arbitrarily. Since the mean of the random variable lies between 0 and 1, one can use only a restricted set of link functions, such as
    • a cumulative distribution function (e.g. the logistic, giving the logit link)
    • the inverse of the standard normal CDF (probit)
    • the extreme value distribution (complementary log-log function)
  • An alternative to the deviance is a weighted sum of squares, also called the Pearson chi-squared statistic.
  • The Hosmer-Lemeshow statistic asymptotically has a chi-squared distribution with degrees of freedom equal to the number of groups minus 2. It provides a convenient hypothesis test when the predictor variable is continuous and the response variable takes values 0 or 1.
  • 2 × (log-likelihood of the fitted model − log-likelihood of the minimal model) is called the likelihood-ratio chi-squared statistic.
  • Residuals for a binary response variable can be
    • Pearson Chi-Squared residual
    • Standardized Pearson or Chi-Squared residual
    • Deviance Residual
    • Standardized Deviance Residual
  • An important aspect of "Assessing the adequacy of models" is overdispersion. Overdispersion is one of the things to test in case the current model is a bad fit.
  • In a linear regression model, you can see the random component right in the equation. For a GLM it is kind of embedded, as one usually sees only the equation for the expected value of Yi. But keep in mind that the assumption of iid Y can be violated, and that leads to overdispersion.
  • sum(residuals(fit)^2) gives the deviance, whereas sum(residuals(fit, type="pearson")^2) gives the Pearson chi-squared statistic.
  • The deviance and Pearson chi-squared goodness-of-fit statistics can both be used for testing model fit. However, the Pearson chi-squared test is better, as it performs better than the deviance in the presence of small frequencies. Why? Small frequencies give rise to higher variance, and the Pearson chi-squared weighs the observations according to the inverse of their variance; thus all small frequencies are given less weight in the overall estimation procedure.
  • D (deviance), X^2 (chi-squared goodness of fit), C (likelihood-ratio chi-squared statistic), and pseudo-R^2 are the four goodness-of-fit statistics reported as output by many statistical packages.
  • The joint probability distribution of the Yj, conditional on their sum n, is a multinomial distribution. This is a key fact used in analyzing contingency tables.
  • Hierarchical models have a different meaning in the frequentist world: it means that if there is a higher-order term in the model, then all the related lower-order terms are also included.
  • What comes to mind when you hear contingency table data? Well, like any data you come across, can it be made to fit a distribution so as to get a handle on it? Using the fact that the joint distribution of Poissons conditional on their sum is multinomial, you then have to estimate the parameters of a multinomial.
  • A log-linear model is one where the expectation of Yi can be written as an offset plus a linear model term.
  • Looking at contingency data, besides the usual chi-squared statistic, what other models can be fitted? The log-linear additive model, the log-linear saturated model, and the log-linear minimal model.
  • For a 2 by 2 contingency table, the model forms for the following are stated clearly in the chapter:
    • Log-linear Additive Model
    • Log-linear Saturated Model
    • Log-linear Minimal Model
  • One can account for overdispersion by taking an alternative model to the Poisson, i.e., the negative binomial distribution.
  • mu in a Poisson model is a rate specified in terms of exposure; the log-linear model has an offset term to account for this.
  • The ANOVA framework can be applied to the additive, saturated, and minimal models.
  • Let's say you see a cross-tab of data. Whatever you want to analyze, you must always think in terms of the null and alternative hypotheses. After framing the analysis in those terms, one must think about the joint distribution of the response variables: does it come from an exponential family? Does one have to take care of overdispersion? What link function should be used?

I have also made a reference sheet containing the various formulae mentioned in the various chapters of the book.
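
The iteratively weighted least squares procedure from the notes above can be worked out on a toy dataset from scratch. Here is a minimal sketch for a Poisson GLM with a log link; the simulated data and all variable names are my own illustration, not the book's:

```python
# From-scratch iteratively weighted least squares (IRLS) for a Poisson GLM
# with a log link -- the procedure the notes call a variant of Newton-Raphson.
import numpy as np

rng = np.random.default_rng(42)

# Toy dataset: log(mu) = 1.0 + 0.5 * x
n = 500
x = rng.uniform(-1.0, 1.0, n)
X = np.column_stack([np.ones(n), x])     # design matrix with an intercept
beta_true = np.array([1.0, 0.5])
y = rng.poisson(np.exp(X @ beta_true))

def irls_poisson(X, y, tol=1e-10, max_iter=50):
    """Iteratively solve X'WX b = X'Wz, with W = diag(mu) and
    working response z = eta + (y - mu)/mu for the log link."""
    beta = np.zeros(X.shape[1])
    for _ in range(max_iter):
        eta = X @ beta
        mu = np.exp(eta)
        z = eta + (y - mu) / mu          # adjusted ("working") response
        w = mu                           # IRLS weights for Poisson / log link
        XtWX = X.T @ (w[:, None] * X)
        XtWz = X.T @ (w * z)
        beta_new = np.linalg.solve(XtWX, XtWz)
        converged = np.max(np.abs(beta_new - beta)) < tol
        beta = beta_new
        if converged:
            break
    # standard errors from the inverse information matrix (X'WX)^(-1)
    mu = np.exp(X @ beta)
    info = X.T @ (mu[:, None] * X)
    se = np.sqrt(np.diag(np.linalg.inv(info)))
    return beta, se

beta_hat, se_hat = irls_poisson(X, y)
print("estimates:", beta_hat)
print("std errors:", se_hat)
```

The same skeleton handles other canonical-link GLMs once the weights W and the working response z are adapted to the family, which is exactly why the shape/dispersion parameter drops out of the coefficient estimates.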

Obviously, the logical next step after dwelling in the GLM world is the Generalized Additive Modeling (GAM) world. I hope to explore it soon.
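
Before leaving the GLM world: the claim in the notes above that sum(residuals(fit)^2) equals the deviance (with the Pearson analogue) is easy to check numerically. A small sketch for the Poisson case, using the standard textbook formulas for deviance and Pearson residuals; the simulated data is my own illustration:

```python
# Numerical check of two identities from the notes: the sum of squared
# deviance residuals equals the deviance, and the sum of squared Pearson
# residuals equals the Pearson chi-squared statistic. Poisson case, with
# fitted values from an intercept-only model (whose MLE is the sample mean).
import numpy as np

rng = np.random.default_rng(0)
y = rng.poisson(3.0, size=200)
mu = np.full(y.shape, y.mean())          # intercept-only Poisson fit

# The term y*log(y/mu) is taken to be 0 when y == 0.
with np.errstate(divide="ignore", invalid="ignore"):
    ylogy = np.where(y > 0, y * np.log(y / mu), 0.0)

deviance = 2.0 * np.sum(ylogy - (y - mu))
dev_resid = np.sign(y - mu) * np.sqrt(2.0 * np.maximum(ylogy - (y - mu), 0.0))
pearson_resid = (y - mu) / np.sqrt(mu)
pearson_chi2 = np.sum((y - mu) ** 2 / mu)

print("deviance:", deviance, "sum sq. deviance residuals:", np.sum(dev_resid ** 2))
print("Pearson X^2:", pearson_chi2, "sum sq. Pearson residuals:", np.sum(pearson_resid ** 2))
```

This mirrors what R's residuals(fit) and residuals(fit, type="pearson") return for a Poisson glm fit.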


Reading this book is like rising above the normal linear model world and seeing a far broader picture of various models.