February 2010


If you look around, you cannot miss the fact that people are living in a road-runner culture, meaning people want to do everything fast.

  • Speed publishing – Blogging, Twittering
  • Speed Talking – IMs
  • Speed Dialing
  • Speed Dating
  • Speed Sex
  • Speed Walking
  • Speed Yoga
  • Yogaerobics – I saw this ad in midtown NY and was rather amused at the way yoga, an essentially slowing-down activity, is being marketed with an aerobics tag

Nowadays it looks like people feel that even instant gratification takes too long. They want instant gratification of instant gratification 🙂

However, there are some people who realize they are on this fast track and, ironically, want to slow down quickly 🙂

Well, most of us at some point or other in our lifetime would have read some self-help book, be it on public speaking, seven habits, 10 rules, a dozen things to do 🙂 … It doesn’t take long for most people to realize that all those books serve no one but the publishing industry.

Given my view of how speed has engulfed all of us, I picked up this book mainly to laugh at the content. A self-help book offering solutions to all our problems in less than a minute.. Wow!! I never thought speed would enter the self-help domain too 🙂

Audience for this book: people who want to quickly change themselves 🙂 and people like me who want to laugh at the rather serious tone of the book.


The author talks about 10 aspects of human life: happiness, persuasion, motivation, creativity, attraction, stress, relationships, decision making, parenting and personality. He provides one-minute solutions to each of these complex aspects of our lives. By the way, the author backs it all up with scientific research 🙂 People can go to any extent to sound rational.

If you want to laugh heartily, you can read this book; after all, “Laughter is the best medicine”.

So, the author is unknowingly doing some good to readers 🙂 Or maybe the author is smart enough to cash in on the speed culture 🙂



I stumbled on to “Better” at a Crossword store in Mumbai. Picked it up immediately after spending a minute browsing through it… The book turned out to be a great read.

So, with that experience behind me, I ordered the other two books written by the author, Dr. Atul Gawande. His first book, which is extremely popular, is “Complications”. “Better” is his second book, and recently he came out with a book titled “The Checklist Manifesto”. With very high expectations, I began reading “Complications”, and by the end of it, the book had not disappointed me.

This book talks about various incidents in the medical field under three themes: Fallibility, Mystery and Uncertainty.


Fallibility

For most of us, the image of a bad doctor is one who does evil things to his patients. However, the author recounts a different experience. There are a lot of good doctors who become bad doctors, and what he means by bad is that they unknowingly harm the patient, mostly because of their personal problems. Divorce, alcohol, burnout, or plain neglect of the patient's needs drives some doctors to leave the patient worse off than the status quo. Some doctors, it seems, forget the nature of human fallibility, and dealing with the consequences of such doctors becomes rather tricky for the hospital and the supporting staff.

For people like me, who have no clue what goes on in the medical world, it was interesting to read about the way doctors learn from their mistakes, individually and collectively. The author mentions the M&M, the Morbidity and Mortality conference, where the docs talk about their mistakes. The purpose of such a meeting is to discuss the faults and failures in the various surgeries undertaken at the hospital.

Another interesting aspect mentioned about the field of anesthesiology is the way the docs are prepared and trained. A life-size, computer-driven mannequin is used to train anesthesiologists. It reminds me of the book “The Talent Code”, with its umpteen examples of learning being accelerated when there was an environment to make errors and quickly learn from them. The computer-driven mannequin serves the same purpose: without harming any real human being, the docs learn about various errors and hence can deliver superior performance in the real world.

I can relate one thing I have been doing only for the past few years, though I should have started much, much earlier. Whenever I learn something, I try to simulate data, examine the results, and rework the formula / estimate being talked about. Simulating data and then working on it has given me an idea of the things that could go wrong in an analysis. I can test for all sorts of bias and see whether my analysis holds good. I have increasingly started living, examining and testing in this artificial simulated world, for the real world offers very few data points, and mistakes, if they happen, carry a heavy price.
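As a concrete example of this habit (my own illustration, not from the book), here is the kind of quick simulation I mean: generate data from a regression model whose true slope I know, then add measurement error to the predictor and watch the estimate get biased towards zero.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 5_000
true_intercept, true_slope = 1.0, 2.0

x = rng.normal(size=n)
y = true_intercept + true_slope * x + rng.normal(scale=0.5, size=n)

# fit on the clean predictor: the slope estimate is close to 2
clean_slope = np.polyfit(x, y, 1)[0]

# now pretend x was measured with error: the slope gets attenuated towards 0
x_noisy = x + rng.normal(scale=1.0, size=n)
noisy_slope = np.polyfit(x_noisy, y, 1)[0]

print(f"slope with clean x : {clean_slope:.2f}")   # ~2.0
print(f"slope with noisy x : {noisy_slope:.2f}")   # ~1.0 (attenuation bias)
```

Twenty lines of simulation like this often teach more about what can go wrong in an analysis than a chapter of theory.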

The book talks about the Shouldice hospital, a case in point of specialization and focus. In most hospitals, 10-15% of hernia operations fail. But not at Shouldice. Why? What's so special about this hospital? Well, it turns out that Shouldice employs about a dozen surgeons who do nothing but hernia operations. Each does about 680 hernia operations a year, more than most surgeons do in a lifetime. This repetition of the same type of operation, day in and day out, changes the way the docs at Shouldice think about their problems. In Gladwell's words, maybe the sheer number of hours logged performing hernia operations makes them outliers. In Gawande's words, repetition changes the way the docs think. Most of the activities are done automatically, with the doctor's mind focused on those little nuances of the patient's health which make or break a hernia operation. This is a beautiful case which shatters the myth that doctors should dish out individually customized treatment. It sounds good in theory, but as far as the stats are concerned, systems, error-induced learning, and the number of hours logged doing that activity play a far more important role in a hospital's overall success rate for a specific surgery.

“The key then to perfection is routinization and repetition”

So, come to think of it, if a job at a company is routine and repetitive, then there is probably a chance for you to master something. If you are working on multiple projects in a year, yes, you might know a lot of things, but it is unlikely that you will master something!! When you hear a friend / coworker saying “My job sucks because it is routine..”, well, I guess you can sympathize with them, not because of the situation they are facing, but because they are not using the fact that something is routine and repetitive to master it.

The author mentions another interesting point about high-performance surgeons versus the rest. He quotes a few research studies which bring out the point that highly successful surgeons are those who are willing to engage in sustained training. Skill and confidence in surgery are learnt through experience, haltingly and humiliatingly. So, one's willingness to practice is all that matters. High performers are clearly those who put in a greater amount of deliberate practice. This means that

“The most important talent may be the talent for practice itself”

Mystery

This part of the book deals with various mysteries surrounding the field of medicine, a field in which we outsiders think the docs have all the solutions in the world. The author takes the reader through a few essays on issues that doctors themselves consider a mystery: the mystery of pain in the human body, of nausea, of blushing, of overeating. With specific instances for each of these issues, the author drives home the point that science has no clear-cut answers for identifying symptoms and offering remedies to patients.

Uncertainty

In the world of stats, there is a term called “aggregation bias”. This basically means that if you observe an effect at a global level, you cannot generalize it to the local level. For example, suppose you have data from all the major cities in India on two variables, Y being income levels and X being the % of educated population. Let's say you build a model out of these two variables only. You might see some effect of education on income levels. Can you generalize it to every person in every city? No. The effect is valid at the city level. By applying it to an individual, you are assuming that the city and the % of educated population are independent. Are they? They need not be, and in most cases they are not. So, this aggregation bias is a killer for most of the social-science papers that have been published: you see some effect at the macro level, but you can't say anything at the local level.
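A minimal simulation of the idea (my own sketch, not from the book): cities where education and income move together, even though, by construction, an individual's education has no effect on his or her income. The city-level regression sees a strong effect that simply does not exist at the individual level.

```python
import numpy as np

rng = np.random.default_rng(0)
n_cities, n_people = 50, 200

city_edu_rate = rng.uniform(0.2, 0.9, n_cities)   # fraction educated in each city
city_base_income = 20 + 80 * city_edu_rate        # richer cities also happen to be more educated

edu_list, inc_list, city_means = [], [], []
for k in range(n_cities):
    educated = (rng.random(n_people) < city_edu_rate[k]).astype(float)
    # within a city, an individual's education has NO effect on income
    income = city_base_income[k] + rng.normal(0, 5, n_people)
    edu_list.append(educated - educated.mean())    # demeaned within the city
    inc_list.append(income - income.mean())
    city_means.append(income.mean())

# city-level (aggregate) regression: a big positive slope
slope_city = np.polyfit(city_edu_rate, np.array(city_means), 1)[0]

# individual-level (within-city) regression: slope close to zero
slope_within = np.polyfit(np.concatenate(edu_list), np.concatenate(inc_list), 1)[0]

print(f"city-level slope  : {slope_city:6.1f}")    # ~80
print(f"within-city slope : {slope_within:6.1f}")  # ~0
```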

Ok, why is the above stuff relevant to the book? Well, doctors are trained on general rules, general symptoms, etc. Basically, you can think of it as education imparted at a macro level. However, when they work with patients, do those rules, heuristics and education help? To some extent, says the author; more likely they will not. The author gives an analogy of ice cubes: if you study the properties of a set of ice cubes, you can be reasonably sure about the properties of an individual ice cube. Alas!! Humans are a complex species. So doctors always face uncertainty in dealing with their patients, and this part explores a few of the uncertainties that are part of the everyday life of docs.

Takeaway:

The book is a real page-turner which gives one an insider's look into the medical field. Your understanding of fallibility, mystery and uncertainty will definitely be richer after reading this book.


Had a dose of stats at work this week, and I am still struggling to find a way out of the model that I am working on. To take a break from that mode of thinking, and also thanks to this three-day weekend, I am writing a rather elaborate summary of what I consider to be one of the best-written books on linear models in statistics. The point of this summary is to motivate beginner- to intermediate-level stats-oriented students to have a look at this book. I am certain there will be something to take away from this wonderfully written book. Ok, now coming to the summary..

This book gives a nice recap of the basic linear models with just the right amount of math.


Chapter 1 starts off with one of the most important aspects of linear models: confounding. Ideally, a randomized controlled experiment is best, as one can minimize confounding. However, most data, at least in the financial world, is observational: stocks, options, vol data, pretty much everything is observational. So one might get a handle on the association between variables, but one needs to be very careful in making a causal inference. This chapter has three case studies, the question in the first being, "Does mammography speed up detection by enough to matter?". The second case relates to the outbreak of cholera and the various causal inferences made about it. The third is Yule's famous study relating poverty to policy choices. This chapter sets the tone for the topics to follow.

 
Chapter 2 is about simple one-variable regression. An often-neglected point in most stats books is the difference between parameters and estimates. There is often confusion between the two, and sometimes people use them interchangeably. In fact, this is the first book on stats that I have come across where this is emphasized very, very clearly: estimates aren't parameters, and residuals aren't random errors. When I hypothesize a linear model between, let's say, Y and X, I am assuming Y = aX + b + epsilon as the model. a, b and the distribution of epsilon are parameters, and whatever statistical technique is used to compute them, the resulting values are estimates. Karl Pearson thought that all data comes from a distribution, and that by collecting enough data you can know that distribution. Fisher, however, believed that the data you see are realizations of an abstract distribution; all one can do is estimate the parameters of that abstract distribution from the given data. Parameters belong to the abstract distribution; estimates are functions of the data. When you assume a data-generating process like a simple regression, you are assuming such an abstract distribution, and, as per Fisher, one can only estimate a and b from its realizations. This point is made repeatedly, so that the reader never again confuses estimates with parameters.
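A small sketch of the distinction (mine, not the book's): the parameters a and b below are fixed by the data-generating process; every fresh sample produces different estimates of them.

```python
import numpy as np

rng = np.random.default_rng(0)
a, b, sigma = 2.0, 1.0, 0.8          # parameters of the (abstract) data-generating process

for trial in range(3):
    x = rng.normal(size=200)
    y = a * x + b + rng.normal(scale=sigma, size=200)   # one realization of the process
    a_hat, b_hat = np.polyfit(x, y, 1)                  # estimates computed from the data
    print(f"trial {trial}: a_hat = {a_hat:.3f}, b_hat = {b_hat:.3f}")

# the estimates hover around (2.0, 1.0) but are never exactly the parameters
```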


Chapter 3 can be skipped if you are aware of basic matrix algebra. When I look back at the lectures I have attended at various places, somehow the profs never emphasized the importance of matrix algebra. A sound knowledge of matrix algebra is a prerequisite for understanding stats. So maybe I was at the wrong places 🙂 or maybe I was not concentrating well… Later, when I actually started crunching numbers, I realized that without matrix algebra you just can't do anything in stats. Pretty much everything in stats requires a basic to intermediate working knowledge of matrices. Some of the concepts I think are very useful from a practical point of view are the four fundamental subspaces, projection onto subspaces, positive definite matrices, eigenvalue decomposition, row and column rank, row and column spaces, idempotent matrices, projection matrices, various ways to decompose matrices (Cholesky, QR, SVD), matrix differentiation, orthogonal bases and linear transformations. Even a simple regression of Y on X1 and X2 needs a matrix decomposition algorithm. Why? Because the estimates involve the inverse of some combination of matrices, and inverting a matrix means using some kind of decomposition for numerical stability. A reader who is well versed with these concepts can safely skip this chapter.
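A small sketch of why a decomposition shows up even in a plain regression (my own illustration): the textbook formula (X'X)^{-1}X'y works on toy data but is numerically fragile when X'X is ill-conditioned, which is why software typically goes through something like a QR decomposition.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 500
X = np.column_stack([np.ones(n), rng.normal(size=n), rng.normal(size=n)])  # intercept, X1, X2
beta_true = np.array([1.0, 2.0, -0.5])
y = X @ beta_true + rng.normal(scale=0.3, size=n)

# textbook route: beta_hat = (X'X)^{-1} X'y
beta_normal = np.linalg.solve(X.T @ X, X.T @ y)

# QR route: X = QR, then solve R beta = Q'y (numerically more stable)
Q, R = np.linalg.qr(X)
beta_qr = np.linalg.solve(R, Q.T @ y)

print(beta_normal)   # both close to [1.0, 2.0, -0.5]
print(beta_qr)
```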


Chapter 4 introduces multiple regression. Ideally, the content in this chapter is pretty straightforward: assumptions relating to the linear model and, based on those assumptions, computing the estimates of the parameters. However, the beauty of this chapter is the set of well-laid-out questions at the end of it, which make you think about a lot of aspects of a model as simple as multiple regression. Until a few years ago, I thought that this was all there was to modeling. How naive of me!! Once I got introduced to PDEs and stochastic models, I came to realize that I knew absolutely nothing about modeling. Whatever modeling I had done was fairly basic. I still remember the days at an analytics firm in Bangalore where I was building logit models for mortgage prepayments. We used to build a large number of logit models and were pretty content with it. In hindsight, what I was doing was like a drop in the ocean of statistical modeling. In that sense, this book too gives you only the basics of statistical modeling. But the basics are extremely well written. I wish this book had been published when I was starting off on stats! Ok, coming back to the book: the questions you will think about after reading this chapter include

  • Why is the sum of residuals not equal to 0 if your linear model has no intercept term?
  • What exactly is the problem with collinear independent variables in a multiple regression? Do they affect the estimate or the standard error of the estimate?
  • Why is the hat matrix important?
  • What's the problem with an omitted-variable model?
  • If a model is represented as Y ~ X1 + X2, what is the effect of the error term being correlated with one of the independent variables? What is the effect of the residuals being correlated?
  • What happens when you exclude a variable which is orthogonal to all the variables in the model?
  • If a model is represented as Y ~ X1 + X2 + X3, how do we test the hypothesis that beta_2 + beta_3 = some constant, beta_2 and beta_3 being the coefficients of X2 and X3?

Actually, merely by reflecting on the model, there are tons and tons of questions you can think about and probably answer. For me, the biggest takeaway from this chapter is the importance of crossprod(X,X), meaning X^T times X. I had never realized before reading this chapter that X^T X governs so much about the estimates.
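A quick illustration (mine) of how X^T X drives both the estimates and their standard errors: under the usual assumptions, beta_hat = (X^T X)^{-1} X^T y and Var(beta_hat) = sigma^2 (X^T X)^{-1}, so when X1 and X2 are nearly collinear the diagonal of (X^T X)^{-1} blows up and so do the standard errors.

```python
import numpy as np

rng = np.random.default_rng(3)
n, sigma = 400, 1.0

def std_errors(x1, x2):
    X = np.column_stack([np.ones(n), x1, x2])
    y = 1 + 2 * x1 - 1 * x2 + rng.normal(scale=sigma, size=n)
    XtX_inv = np.linalg.inv(X.T @ X)
    beta_hat = XtX_inv @ X.T @ y                 # (X'X)^{-1} X'y
    resid = y - X @ beta_hat
    s2 = resid @ resid / (n - 3)                 # estimate of sigma^2
    return np.sqrt(s2 * np.diag(XtX_inv))        # standard errors of beta_hat

x1 = rng.normal(size=n)
print("independent X1, X2 :", std_errors(x1, rng.normal(size=n)))
print("collinear   X1, X2 :", std_errors(x1, x1 + 0.01 * rng.normal(size=n)))
# the second line shows hugely inflated standard errors on X1 and X2,
# even though the point estimates remain roughly unbiased
```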


Chapter 5 introduces GLMs, a topic I am extremely interested in because I still don't know how to deal with them. In theory I do, but I have never implemented a GLM model to date in finance. Well, as far as financial applications are concerned, you will never have residuals which are IID; that was a lesson drilled into my head by my guide during my masters. How do you model the estimates if you know that the errors are correlated, that they form a stationary process, that they are Poisson, or that they come from a prior distribution? You will get a basic flavour of it from this chapter, but for doing anything in the real world, you would probably have to refer to some other book; this chapter on GLMs alone won't suffice.
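For a flavour of what fitting a GLM actually looks like, here is a minimal Poisson-regression sketch on simulated data; the choice of statsmodels is mine, not something from the book.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(11)
n = 1_000
x = rng.normal(size=n)
X = sm.add_constant(x)                 # intercept + one covariate
mu = np.exp(0.5 + 0.8 * x)             # log link: E[y] = exp(b0 + b1 * x)
y = rng.poisson(mu)

model = sm.GLM(y, X, family=sm.families.Poisson())
result = model.fit()
print(result.params)                   # close to [0.5, 0.8]
```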


Chapter 6 talks about path models. This chapter is suited for bedtime reading. For first-timers, a path model is a graphical way to represent a set of structural equations. The chapter starts off with standardized regression and then gives a superb discussion of the way in which a physics model differs from a statistical model. I loved this part of the chapter, as the comparison between Hooke's law and a possible regression equation was too good. If you cannot verbalize the difference between a statistical model and a physics-based model to a sophomore, then in all likelihood you might want to read this part of the book. Some of the terms which will be very clear to you after reading this chapter are

  • Causal Mechanism
  • Selection Vs Intervention
  • Response Schedule
  • Dummy Variables – An interesting thing I feel like writing here, relating to dummy variables, is my experience with a person who had done a masters in statistics from a reputed institution. I remember asking that person a question on dummy variables many years ago: if a variable takes p categorical values, why should there be only p-1 dummy variables in the regression equation? She mumbled something like “Well, you can obviously get the other effect from the p-1 effects..” Intuitively I get it, there is a redundancy. I did get that point, but my question was more from a math point of view: what happens in a regression equation if I include a dummy variable for each level of the categorical variable? All I got was some angrezi, and I wanted an answer in math. Anyway, later I figured out that by including all p categories as variables along with an intercept, you get a design matrix X which does not have full column rank. If X does not have full column rank, X^T X is not invertible, and if X^T X is not invertible, forget regression! (See the little sketch after this list.) Sometimes people think intuition is the killer. I don't deny the importance of intuition at all, but sometimes arguments made in math are easier to understand: everything falls into place and you just get it. There is no touchy-feely thing here, it's plain simple math… I am digressing from the intent of the post. Coming back to the things you learn from this chapter on path models,
  • Association
  • Linkage
    I love diagrams, and this one sums up a basic structural model in a nice way. The boxes at the top represent distributions, and the arrows represent realizations from those distributions.
    [path diagram from the book]
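Here is the math point about dummy variables from the digression above, in a small numerical sketch (mine): with an intercept plus a dummy for every one of the p levels, the dummy columns sum to the intercept column, the design matrix loses full column rank, and X^T X becomes singular. Drop one dummy and the problem goes away.

```python
import numpy as np

levels = np.array([0, 1, 2, 0, 1, 2, 0, 1, 2, 0, 1, 2])   # a categorical variable with p = 3 levels

all_dummies = np.eye(3)[levels]                            # one dummy column per level
X_full = np.column_stack([np.ones(12), all_dummies])          # intercept + 3 dummies
X_drop = np.column_stack([np.ones(12), all_dummies[:, 1:]])   # intercept + 2 dummies (one dropped)

print(np.linalg.matrix_rank(X_full), "of", X_full.shape[1])   # 3 of 4 -> X'X is singular
print(np.linalg.matrix_rank(X_drop), "of", X_drop.shape[1])   # 3 of 3 -> invertible, regression works
```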

Chapter 7 talks about maximum likelihood estimation (MLE). The basic idea behind MLE is that you specify the distribution of the variable under study and then find the parameters of that distribution under which the available data has the maximum likelihood. With any estimate, there needs to be inference; that's where Fisher information comes in, which helps one get an idea of the variance of the estimates. Once you step out of the simple-regression world and enter the world of logits, probits, bivariate logits, etc., MLE is inevitable. There are also many optimization methods for finding the MLE, from basic grid search to sophisticated simulated-annealing algorithms. Somehow this chapter fails to mention the classic trinity of tests: the likelihood ratio (LR) test, the Wald test and the Lagrange multiplier (LM) test; one inevitably uses one of these in MLE-based work. The chapter gives a basic intro to MLE, just enough to go on to far more math-oriented books. Personally, I find MLE fascinating. For all the years I had been thinking that stats is either Bayesian or frequentist, I have realized there is a third direction, the likelihood approach, which makes statistics beautiful.
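A bare-bones MLE sketch (my own, not an example from the book): write down the negative log-likelihood of an exponential sample and minimize it numerically; for this distribution the answer has a closed form (one over the sample mean), so we can check the optimizer.

```python
import numpy as np
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(7)
data = rng.exponential(scale=1 / 2.5, size=1000)   # true rate lambda = 2.5

def neg_log_lik(lam):
    # negative log-likelihood of an exponential(lambda) sample
    if lam <= 0:
        return np.inf
    return -(len(data) * np.log(lam) - lam * data.sum())

res = minimize_scalar(neg_log_lik, bounds=(1e-6, 50), method="bounded")
print("numerical MLE :", res.x)            # ~2.5
print("closed form   :", 1 / data.mean())  # same thing: 1 / sample mean
```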

 

Chapter 8 is about bootstrapping, my love of the past 1.5 years. As the name suggests, the data pulls itself up by its own bootstraps to give you the estimate you are looking for.

 


The basic idea is resampling from the empirical distribution function. Typically, in a Monte Carlo exercise you simulate from a distribution that you have assumed; in bootstrapping you trust the data, resample from it, and compute the statistic you are interested in. This chapter takes you through a few examples of bootstrapping techniques: resample from the data and calculate the sample statistic you care about. Do this N times, and the spread of the resampled statistic gives you an approximation of its sampling distribution and hence its standard error. Will it always work? No. If the sample is not representative of the original population, no amount of bootstrapping will help you. Let's say you generate 100 standard normal numbers, each of which happens to be greater than 3; no amount of bootstrapping is going to help you compute the correct population mean. The crucial thing to remember is that you are sampling from the empirical distribution of the sample. Bootstrapping can be extended to figure out parameter standard errors in a regression equation, and also to autoregressive equations, GLM models, etc.
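A bare-bones sketch (mine, not the book's example) of bootstrapping the standard error of the sample mean; for the mean we can sanity-check the answer against the classical formula s/sqrt(n).

```python
import numpy as np

rng = np.random.default_rng(8)
sample = rng.lognormal(mean=0.0, sigma=1.0, size=200)   # one observed sample, skewed on purpose

B = 5_000
boot_means = np.empty(B)
for b in range(B):
    resample = rng.choice(sample, size=sample.size, replace=True)  # draw from the empirical distribution
    boot_means[b] = resample.mean()

print("bootstrap SE of the mean :", boot_means.std(ddof=1))
print("classical  SE  s/sqrt(n) :", sample.std(ddof=1) / np.sqrt(sample.size))
# the two agree closely; the payoff of the bootstrap is for statistics with no such formula
```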

Chapter 9 is about simultaneous equations. As the name suggests, this relates to modeling a set of equations together. Path diagrams are heavily used to convey the connections.

Chapter 10 concludes with some issues in statistical modeling. A few pages of this chapter alone will make you realize that most of the models described in chapters 1-9 are good for publishing research papers, but in reality these models break down. Residuals are not IID in the real world; they are DDD, dependent and differently distributed. Models are rarely LCC, linear with constant coefficients; they are NLNC, non-linear with non-constant coefficients. The author sums up saying

“The goal of empirical research is – or should be – to increase our understanding of the phenomena, rather than displaying our mastery of a technique”

The principles and concepts in chapters 1-10 take up about 200-odd pages, but the next 200 pages of the book comprise some amazing case studies which will make you think, make you question, and stretch your mind on a ton of aspects relating to linear models.

Takeaway:

If you use linear models in your work, you cannot miss this book.
It's priceless!


For any e-curious person, some of the basic questions that one might have are

  • Who came up with this constant?
  • Where did it first appear?
  • Why is this number so important?
  • Why should one make a function out of this constant (exp(x))?
  • What is its relation to complex numbers?
  • What is its connection with the hyperbola, as it appears in cosh x, sinh x, etc.?

This book is a collection of stories about various developments around e. However, there is a common thread running across all the stories: each has at its essence a development which led to the world we are living in, where we take e for granted in most of the applications we deal with.

It all began with Napier's 20-year effort. John Napier worked on a single idea for 20 years. What was his idea? Multiplication and division of large numbers are more difficult operations than addition and subtraction. If numbers could be represented as powers of some base, then multiplication would turn into addition and division into subtraction. Napier chose a base and created elaborate tables whereby you could multiply and divide by looking up the relevant numbers in the tables and performing addition and subtraction instead. Henry Briggs, a geometry professor from Gresham College, suggested a few improvements, like using a base of 10. Thus the concept of the logarithm was born.
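In modern notation, the whole trick rests on two identities; the tables do the rest:

$$\log_b(xy) = \log_b x + \log_b y, \qquad \log_b\!\left(\frac{x}{y}\right) = \log_b x - \log_b y.$$

So, to multiply 2 by 3 with Briggs's base-10 tables, one looks up $\log_{10} 2 \approx 0.3010$ and $\log_{10} 3 \approx 0.4771$, adds them to get $0.7781$, and reads off the antilogarithm, which is 6.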

What's the logarithm got to do with e?

Finding the area under the curve y = 1/x was a problem which vexed quite a few mathematicians. Fermat had developed a way to find the area under the curves y = x^n, but for n = -1 he said it did not work. The crucial breakthrough took place when Anton de Sarasa, a Jesuit mathematician, noted that the area under the curve y = 1/x behaved like a logarithm: for lengths which grow geometrically along the x-axis, the area grows arithmetically. Thus he was able to connect Napier's logarithms with the behavior of the area under 1/x, and thus was born the logarithmic function.
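De Sarasa's observation, restated in modern notation (the integral sign itself came later), is that the area function under $y = 1/x$ obeys the logarithmic addition law:

$$A(t) = \int_1^t \frac{dx}{x}, \qquad A(ab) = \int_1^{a} \frac{dx}{x} + \int_a^{ab} \frac{dx}{x} = A(a) + A(b),$$

where the second integral reduces to $A(b)$ by the substitution $x = au$. Geometric growth in $t$ therefore produces arithmetic growth in the area.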

Subsequent to this, there were developments in calculus that brought e closer to discovery. The fundamental theorem of calculus states that the rate of change of the area function with respect to t is equal, at every point x = t, to the value of the original function at that point.

Simply put, for the y = 1/x situation, dA/dt = 1/t. It was already established that A behaved like log(t), though the base was not yet fixed; the base of the logarithm could have been anything. If we let the base be some number b, the equation turns into the problem of finding the base for which the derivative of the exponential function is the function itself. Thus the base e was born, and ln(x) and exp(x) became known as inverse functions of each other.
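A compact way to see where $e$ enters, again in modern notation:

$$\frac{d}{dt}\log_b t = \frac{1}{t\ln b}, \quad\text{while the fundamental theorem demands}\quad \frac{dA}{dt} = \frac{1}{t}, \quad\text{so } \ln b = 1, \text{ i.e. } b = e.$$

Equivalently, $\frac{d}{dx} b^x = b^x \ln b$, which equals the function itself exactly when $b = e$.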

So, as one can see, exp(x) is closely related to the hyperbola (y = 1/x), and this spawned a new set of trigonometric entities called sinh x, cosh x, tanh x, etc. These are the analogs of sin x, cos x, tan x for the circle. Thus e became to the hyperbola what pi is to the circle.
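For reference, the hyperbolic analogs are built directly out of $e^x$:

$$\sinh x = \frac{e^x - e^{-x}}{2}, \qquad \cosh x = \frac{e^x + e^{-x}}{2}, \qquad \cosh^2 x - \sinh^2 x = 1,$$

so the point $(\cosh t, \sinh t)$ traces the hyperbola $x^2 - y^2 = 1$, just as $(\cos t, \sin t)$ traces the circle $x^2 + y^2 = 1$.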

The author subsequently takes the reader through multiple facets of e, like the use of logarithms/e in representing the musical scale, its presence in Euler's formula, its presence in nature (the sunflower spiral, the nautilus shell), its presence in ancient and modern architecture, etc.

e was first noticed in the continuous compounding of money. The expression (1+1/n)^n was known to converge to a number 2.7182.. as n tends to infinity. But it took a century for the math to develop around that number, turning it into a tractable exponential function, e^x, and finally stamping the property of transcendence on e.
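The compounding story in a few lines (my own illustration, values rounded): one rupee at 100% annual interest, compounded more and more often, approaches e rupees after a year.

```python
# Re 1 at 100% annual interest, compounded n times a year: value after one year is (1 + 1/n)^n
for n in (1, 12, 365, 10_000, 1_000_000):
    print(n, (1 + 1 / n) ** n)
# 1 -> 2.0, 12 -> ~2.613, 365 -> ~2.7146, ... approaching e = 2.71828...
```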

Today, one cannot think of doing math/fin without the exp(x) function, for it makes a whole set of problems analytically tractable. This book makes one pause and appreciate the massive efforts of the various mathematicians and scientists who contributed to our understanding of e.