June 2010


Systematic Investment Plans (SIPs), Value Investment Plans (VIPs), Value Averaging Transfer Plans (VTPs), the dollar averaging concept, the value averaging concept: ALL of these are usually associated with answering one question, "How does one invest money over a long period of time to get adequate returns, given that it is impossible to time the market exactly?"

It has been quite some time since I had done any analysis at a portfolio level. I was occupied with something on the relative value arb side, so my mind was VERY RUSTY and not in a PLUG and PLAY mode for portfolio analysis. The moment I started looking at the plans available in the market, the first thing that hit me was: "Where is the mention of Risk in these products?" Yes, risk is an abstract quantity which can be defined in more than one way, but shouldn't it somehow figure in an investment plan? SIPs, VIPs and VTPs are all focused on and designed around returns, and they make no mention of risk! At a 10,000 ft view, it looks like they can be summarized as a mix of contra-trend following + formula based investment strategy.

As these thoughts were bouncing in my head, I stumbled on to the classic book on Value Averaging by Michael Edleson. As a concept, value averaging is very simple to understand: you expect your portfolio to grow by a certain amount every month/quarter/year, and you buy and sell the portfolio accordingly. Why should this method work? Does it always work? Is it a better way of investing for the LONG term? I tried going over the book to get some pointers on these questions. The fact that risk was not associated with any of these plans was a great motivator to read this book. Does the author address, in some corner of the book, the risk associated with the investments? What's the book about?

Timing the market is difficult. Some are successful but most get burnt. This book talks about two investment techniques, dollar cost averaging and value averaging, both formula based investment methods where emotions are taken out of the investing process. The book starts off with something very obvious: the variation in returns goes down as your holding period increases. Empirical histograms are plotted for various holding periods to support this argument. Basically, all this means is that volatility goes down over a longer time frame and returns are much smoother.

The book then gives an overview of dollar averaging. Dollar averaging is explained as a technique where a fixed amount is invested at regular intervals, resulting in a "buy more when asset values are low, buy less when asset values are high" type of investment process. Thus, on average, the buy price is averaged out and the investor is not exposed to fluctuations in the market price. In a sense it can be termed "Buy low, buy less higher". It says nothing about the SELL aspect. There is also a drawback in this concept: if asset values keep going up, you end up buying fewer units of the asset, and hence on average you are less invested in the market over a longer period of time. This is one of the reasons for its underperformance relative to a constant share portfolio. Hence a growth-equalized variation of dollar averaging would be better than plain simple dollar averaging.
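The "buy more when low" mechanics can be captured in a few lines of Python. This is a toy sketch with made-up prices; the only real point is that a fixed outlay makes your average cost per unit the harmonic mean of the prices, which never exceeds their simple average.

```python
def dollar_cost_average(prices, amount=2000.0):
    """Invest a fixed `amount` at each price; return (total units, average cost per unit)."""
    units = sum(amount / p for p in prices)   # cheap months buy more units
    total_spent = amount * len(prices)
    return units, total_spent / units

# hypothetical monthly prices of some fund
prices = [100.0, 80.0, 125.0, 90.0, 110.0]
units, avg_cost = dollar_cost_average(prices)
simple_avg = sum(prices) / len(prices)
```

Here the average cost works out to about 98.6 against a simple average price of 101: "buy low, buy less higher" in action.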

The book then gives an overview of value averaging. Value averaging takes a somewhat different tack than dollar averaging. It basically means that the investor should target a constant growth of value each period. Its biggest advantage over dollar averaging is that it gives an opportunity to sell: when the portfolio posts gains and is higher than the planned portfolio value, the investor can sell a specific amount of the portfolio so as to stick to the old plan. However, plain vanilla value averaging still has a problem: the constant value increment that is to be followed becomes insignificant as the accumulation period increases. After a few years it might so happen that the entire value comes from the original investment made at a discount, and the investor ends up selling and goes into a de-accumulation state. So, obviously, be it dollar averaging or value averaging, it is extremely important to incorporate the long term trend in the formula based investment. One way could be to incorporate inflation adjusted growth, or some rate above the inflation adjusted growth rate. Here is something important:
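Again, a toy Python sketch (hypothetical prices, a flat Rs 2,000-per-period value path), just to show the mechanics, including the SELL that dollar averaging lacks:

```python
def value_average(prices, step=2000.0):
    """Plain value averaging: each period, buy or sell just enough so that
    the portfolio value lands on the next point of a linear value path."""
    units, flows = 0.0, []
    for t, price in enumerate(prices, start=1):
        target = step * t              # value path: 2000, 4000, 6000, ...
        flow = target - units * price  # positive = buy, negative = sell
        units += flow / price
        flows.append(flow)
    return units, flows

prices = [100.0, 80.0, 125.0, 90.0, 110.0]
units, flows = value_average(prices)
```

In the third period the price has run up, the holdings are worth more than the Rs 6,000 target, and the plan actually sells a little.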

A bad year is a bad year even with value averaging. The investment vehicle you choose is far more important to your results than the mechanical rules you follow to invest in it. To that end, it is best to use value averaging with very diversified investments, such as a broad-based mutual fund or, preferably, an index fund.

Be it dollar averaging or value averaging, incorporating the long term trend is the KEY to effective returns. The right set of diversified instruments is extremely important. Even though the author advises using an index fund, how does value averaging work with a set of index funds representing various asset classes? That's an interesting thing to work on.

The book then goes into detail on dollar averaging and ways to tweak the concept so that the portfolio keeps pace with inflation, short term volatility, the long term overall growth of the market, etc. About 70 pages into the book, I stumbled on to the thing I was looking for. Here is what the author has to say on risk, where he suggests a shift to lower risk instruments:

The reason for the shift should be clear. Investing in the stock market is great for long term goals, but as you approach your goal-spending requirement (for example, as tuition comes due), you most likely do not want your entire college fund sitting in a risky mutual fund. A bad market result could cause you to suddenly come up very short of funds at the last minute. Over time, it makes sense to gradually shift more funds from risky to less risky investments, realizing that your expected return will go down as you do this.

It is wise to gradually down-shift your risk level as the time of your investment goal approaches. Start with conservative estimates of how well your investments will do, and take opportunities to shift to lower-return, lower-risk investments later in the plan if you are doing well and if it suits your purpose. You will then be at less risk of missing your final investment target.

Basically the author is hinting at a glide path based investment policy. This is something the target date funds use in their investment methodology. So, obviously, a SINGLE instrument + SIP option is not the BEST option for investors. You have to move across asset classes with different risk structures as you approach the target maturity of the investment. The author clearly hints that there is a need for a glide path + formula based investing plan for the investor.

At any point between the start and the target maturity, there are multiple paths to reach an investment objective. In the case of a SIP, one can tweak the constant amount invested, or tweak the growth rate of the amount invested, to correct for the return estimates made at the beginning of the investment plan turning out to be wrong.
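A small sketch of this mid-course correction using standard annuity math. The numbers are hypothetical; the function answers "given where my portfolio stands today, what fixed monthly contribution gets me to the target, assuming some monthly return?"

```python
def required_contribution(current_value, target, months_left, monthly_return):
    """Fixed month-end contribution that grows `current_value` to `target`
    in `months_left` months at an assumed `monthly_return`."""
    g = (1 + monthly_return) ** months_left
    return (target - current_value * g) * monthly_return / (g - 1)

# hypothetical mid-plan check: Rs 1,00,000 in hand, Rs 5,00,000 target,
# 36 months to go, assuming 0.8% per month
c = required_contribution(100000.0, 500000.0, 36, 0.008)
```

If realized returns run below the original estimate, this number creeps up; recomputing it periodically is exactly the kind of tweak mentioned above.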

Finally there is a chapter on the details of value averaging incorporating growth rates. In this chapter, I found some mention of the area I was looking for.

The big problems occur with value averaging (or most other strategies) when you have a really bad market performance after you’ve already built up a sizable portfolio toward your goal. Investors approaching their final goal in December 1987 were certainly shocked and disappointed by the crash in October of 1987 and certainly would have missed their end-of-year December goal, which had been almost achieved.

In some sense, the risk of bad performance hurts more as you get closer to your investment goal, because there’s really no time to recoup losses. To this end, it may make sense to be a bit conservative in your initial expectations.

Shouldn’t one be holding less of risky assets and investing more in a defensive kind of portfolio as time progresses ?

The book then talks about simulating various value paths and getting the distribution of terminal portfolio values. The returns are assumed to be lognormal and the terminal values are simulated. There is an entire chapter on comparing strategies based on the simulation results.

At this stage of reading the book, it is best to simulate stuff and get a feel for the results rather than merely look at the results given in the book. After all, we are just talking about a single asset simulation. You can take a hypothetical asset and simulate the performance of an investment under various asset value realizations.

For example, let's say an investment of Rs 2,000 is made monthly into a VIP scheme structured in such a way that the portfolio value grows by 1% every month. For a simulated asset path, the following shows the value path and the monthly cash flows for a 60 month period. The first graph below is the value path, which is what the investor's portfolio will mirror. The second graph shows the cash flows that result from a simulated asset value realization.
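Here is roughly how one can set up such a run in Python. The value path formula below is my assumption (each month the target absorbs the Rs 2,000 contribution and then the whole path grows by 1%); Edleson's book gives more refined value paths, so treat this as a sketch.

```python
import math
import random

def simulate_vip(months=60, contribution=2000.0, growth=0.01,
                 mu=0.005, sigma=0.05, seed=1):
    """One simulated VIP run: each month the target value path absorbs the
    Rs 2,000 contribution and then grows by 1%; the investor buys or sells
    whatever it takes to keep the portfolio on that path."""
    rng = random.Random(seed)
    price, units, target = 100.0, 0.0, 0.0
    flows = []
    for _ in range(months):
        price *= math.exp(rng.gauss(mu, sigma))   # lognormal asset move
        target = (target + contribution) * (1 + growth)
        flow = target - units * price             # cash in (or out if negative)
        units += flow / price
        flows.append(flow)
    return flows, units * price                   # monthly flows, final value

flows, final_value = simulate_vip()
total_invested = sum(flows)
```

By construction the final value always lands exactly on the value path; what varies from run to run is the cash the investor had to pump in to keep it there.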



Even though the investor thinks his value is going up by 1% every month, here is the interesting part:

One can use fancy metrics like IRR, MIRR, etc., but let me stick to the very basic return on total investment = Final Portfolio Value / Sum(Cash Flows).

If you sum up all the cash flows from the investor in the above run, it is Rs. 1,70,042, and the portfolio's final value is Rs. 1,66,972. They are almost the same. So, what has the VIP done in the above hypothetical run? Even though the portfolio value is growing by 1% each month, what's the final return on all the cash flows? Zilch!

Now, obviously, an inference from one simulation is naive. But the point to note is that a VIP, like any investment, can result in flat or negative returns.

To make a better inference, let me simulate 100,000 runs and see what the returns on the total cash flows look like.
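A self-contained sketch of that Monte Carlo (with a smaller run count and made-up lognormal parameters, so the numbers will not match the histogram below):

```python
import math
import random

def vip_total_return(months=60, contribution=2000.0, growth=0.01,
                     mu=0.0, sigma=0.04, seed=0):
    """Final portfolio value / sum of cash flows - 1, for one simulated run."""
    rng = random.Random(seed)
    price, units, target, invested = 100.0, 0.0, 0.0, 0.0
    for _ in range(months):
        price *= math.exp(rng.gauss(mu, sigma))
        target = (target + contribution) * (1 + growth)
        flow = target - units * price
        units += flow / price
        invested += flow
    return units * price / invested - 1.0

# 5,000 runs here (not 100,000) to keep it quick; drop the rare degenerate
# paths where net invested cash is close to zero and the ratio blows up
returns = [r for r in (vip_total_return(seed=i) for i in range(5000))
           if -1.0 < r < 1.0]
mean_ret = sum(returns) / len(returns)
```

Plotting `returns` as a histogram gives the kind of picture discussed below: a wide spread of total returns around a smallish mean, despite the "1% a month" value path.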



What can one infer from the above graphs ?

The minimum and maximum total returns range from −25% to 40%. The mean and std dev of returns are about 1% and 7% respectively. So, after 5 years your total return could be any of the numbers in the above histogram. The mean and sd of the returns paint a rather sad picture for this hypothetical security. Obviously, a VIP plan needs additional elements to give better risk adjusted returns!

It can be empirically checked that a VIP scheme is much better than the SIP schemes floating in the market. However, if you look at the above monthly cash flow graph, there is every possibility that, if the market tanks towards the end of the investor's target maturity, the cash flows are going to be erratic and the returns could be flat or even negative. If there is a limit on the maximum investment, then obviously the value path projected at the beginning of the plan will no longer be achieved if the market tanks towards the end of the maturity period.

What’s the basic problem with the  VIPs for a longer term investment horizon ? Well, if an investor looks at VIP and wants to invest with a specific Target Maturity and Target Value in mind, there is a chance that the volatility towards the end of the investment horizon might be bad for the entire portfolio. What’s the alternative ? Shifting of money in to different asset classes is a MUST as one moves towards to the Target Maturity Period.

For shorter term investment horizons, VIPs are a classic way to invest. To obviate idiosyncratic risk, the instrument chosen for the VIP could as well be an ETF or a broad market index. BUT for a longer term investment horizon, there needs to be a blend of a VIP and an asset allocation scheme, with the asset allocation gradually changing as one moves towards the target maturity date.

How do you design such a scheme? Well, I have no clue. But there must be a way to do it.

The book finally ends with a suggestion that value averaging is risky if it is followed using a single stock. A well diversified index fund / ETF is the best kind of instrument to use value averaging on. It is as close as one can get to "Buy Low, Sell High" without a crystal ball.

Takeaway

Well, value averaging and the way it works is obviously any reader's takeaway.

But for me, the takeaway is that it is very much possible for a plain vanilla VIP to give flat or negative returns in the long run. Hence there is a need for a product where Value Investment Plans are combined with asset allocation strategies that change as the investor moves towards the target maturity date.


In the US, it is widely believed that math education at the school level needs a BIG reform. The comparison is with respect to Asian kids who supposedly do well in math. My strong belief is that math education is broken in some of the Asian countries too, including India. One might mistake the immense problem solving drills that most Indian kids go through to get into engineering colleges for a strong foundation in mathematical thinking. Nope. Having gone through the system myself, I think that most Indian students are good at pattern recognition in the problems they are posed, and that too a specific kind of problem that figures in the entrance examinations.

The other day I was reading about students putting in 15 hrs of practice for engineering exams. The kids might be brilliant, but I would guess that their enormous practice at solving 1000s of problems in algebra, trigonometry, coordinate geometry, calculus, etc. makes them Excellent Pattern Recognition Engines who work brilliantly till they crack an engineering exam like the JEE or a state level engineering entrance, and then they STOP FOREVER, at least most of them. Most of them, as they reach adulthood, fondly recall math as their favourite subject in childhood, but struggle to formulate problems using math or to think mathematically about problems.

Yes, this enormous, focused pattern recognition exercise does sometimes help some students sail through their entire lives without really exercising their brains as far as real math is concerned. If you crack open a recently graduated engineering student, I bet that the ability to pose problems and solve them creatively would be minimal, while in the name of math there would be tons of methods, techniques, formulae, notations and symbols floating around in his mind. This cannot be called math education.

The same is the case, sadly, with statistics education at the PG level in India. Methods and techniques gain the upper hand over the narrative. Many students can rattle off the formulae for the ordinary least squares estimate but spend very little time getting to the story behind it, the epic battle between Legendre and Gauss to stamp their supremacy on OLS. Stories always give an opportunity to be creative. But sadly, statistics, which is the result of enormous trial and error, is presented in the curriculum in an extremely dry, placid and boring manner. Ask any PG student who has taken statistics in his academic course to talk about statistics without equations and techniques for just 5 minutes, and you will know how badly the education system sucks in India. Anyways, coming back to this book:

This book is an essay by Paul Lockhart which goes over the lack of MATHEMATICS in school. It offers no solutions per se, but points to the specific problems of math education at the school level.

Mathematics is Art, says the author. Math education has become a set of techniques, often irregularly spaced and disorganized, with no effort put into providing context. Math, like painting, is the result of a hard creative process, and a result discovered through that process is far more joyous and wonderful for a student than one handed over ready-made. I can relate to this somewhat, as I spent 1.5 years teaching Pre Calculus and Calculus to undergrad students at CUNY. One incident I particularly remember is a student asking me the relevance of knowing the Difference Quotient. I told him that it would be used in Limits, and conveniently moved on so as to save class time. But the question has stuck in my mind since then. Why should I teach the definition of the Difference Quotient, quiz the kids on the definition and related exercises, and leave it for them to come back in some other class to appreciate its usage in limits? In almost all the sections I taught, I just could not deviate from the syllabus, which comprised giving definitions, framing questions around those definitions, asking students to work on some exercises, etc. Useful for cracking the exams but utterly useless in the long run.

The author calls the mathematics curriculum a confused heap of destructive disinformation. He shatters something called the ladder myth, where the curriculum is designed as a set of ladders: in the initial classes, students are taught definitions, notations, proofs and statements, thus supposedly preparing them for higher class math. This is similar to teaching painting to kids with a tremendous focus on paint theory and colour theory, rather than letting them take a brush, start painting, and discover the techniques in the process.

With a few insightful examples from the book, the author makes the point that techniques are useless without thinking about the context and the process of mathematical education. BTW, this process is not to be confused with the process term used in the industrial world. Here the process is highly customized and individual specific, with the education loosely coupled to an underlying theme. Problems and context matter, not specific techniques or a Russian doll kind of mathematical education where you proceed from definitions to notations to functions to dy/dx to integration with no real context.

Yes, school has become a training ground where children perform so that they can be sorted. Math is not a collection of facts but is about reason and understanding. We want to know WHY, and not for any practical purpose. In that sense, this book is a wake up call to all the teachers and curriculum designers of math education to relook at the entire math education.

This book also serves as a reminder that "It's not notations, but notions that help in our progress". Unfortunately, it is the former that school focuses on, leaving no time for the kids to indulge in the latter.

Takeaway :

Math education does not need reform, as that would equate to rearranging chairs on a sinking Titanic.

We need to build a new ship .


A few weeks ago, I realized that my GLM knowledge was pretty rusty, as I was struggling to get meaningful estimates from a Poisson regression.

I decided to immerse myself over the weekend in what I gather to be one of the best books on GLMs, written by Annette J. Dobson. Well, there is always the classic monograph by McCullagh & Nelder, "Generalized Linear Models", that one can refer to, which to me was a little too dry to go over. So, there is that little guilt feeling in me that I have never read the classic monograph from the father of GLMs (J. A. Nelder). I have found this to be the case with quite a few things I have learnt over the years. For example, I have never read, till date, Bjarne Stroustrup's C++ book, the supposedly classic book on C++. However, I have coded C++ like crazy and have used C++ to do whatever was needed. So, I always have this guilt trip whenever I think of C++. Well, maybe someday I will go through the classic book. In a similar vein, I will go through Nelder's classic stuff someday, hopefully 🙂

Ok, coming back to this book: I could only get the second edition. I will basically have to wait for a kind soul to upload the third edition on the net 🙂 Till then, the second edition will do 🙂 The third edition has some chapters on Bayesian stats appended towards the end of the book.

The brilliant thing about this book is the organization of ideas and thoughts, with just enough math to get you going on coding your own GLM estimation and diagnostic functions.

There are about 11 chapters in the book, neatly organized based on the type of data one is dealing with, namely: continuous, binary, nominal with more than 2 categories, ordinal, counts, failure times, and correlated responses. Basically, one can visualize this book as teaching the GLM tool box for various combinations of dependent and independent variables. The GLM toolbox broadly comprises all the elements mentioned under the methods column in the following table.



Chapter 1 : Introduction
This lays out the basic notation followed in the book. Some basic results relating to the normal distribution, Chi-Square, non-central Chi-Square distribution, F distribution, quadratic forms and Cochran's theorem are mentioned. The basic MLE method is also described. If I had read this book a few years ago, I would have wondered why one should talk about distributions in the introduction. Pick up any book and you will usually find some angrezi about the book, meaning some general gyan about why the topic is important, why the author is writing the book, blah blah. But this book cuts all the crap and gets to THE MOST IMPORTANT thing needed to specify and estimate the model parameters: DISTRIBUTIONS. Typically these are stuffed in the appendix. The fact that they are mentioned in Chapter 1 says a lot about the way the author presents this book as compared to run-of-the-mill books.
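A tiny illustration of the MLE material the chapter opens with: for iid Poisson counts the MLE of the rate is just the sample mean, and one can check numerically that the log likelihood peaks there.

```python
import math

def poisson_loglik(lam, data):
    """Log likelihood of iid Poisson counts, dropping the constant log(y!) term."""
    return sum(y * math.log(lam) - lam for y in data)

data = [2, 4, 3, 5, 1, 3, 4, 2]
mle = sum(data) / len(data)   # closed-form Poisson MLE: the sample mean
```

Evaluating `poisson_loglik` at rates above and below `mle` confirms the maximum sits at the sample mean.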


Chapter 2 : Model Fitting
Two basic examples of evaluating competing models kick off the discussion: a Poisson model with one common parameter vs two parameters, and linear regression with the same beta vs different betas for specific categories, along with ways to choose between the competing models. As clearly and repeatedly mentioned throughout the book, a GLM broadly comprises two major aspects: first, the choice of the distribution of the dependent variable from the exponential family of distributions, and second, the link function between the expectation of the dependent variable and the covariates. Both examples more than reinforce this thinking.



Chapter 3 : Exponential Distributions
This chapter talks about the various forms in which Y, the dependent variable, can appear in a GLM. Normal, Poisson and Binomial are the widely used distributions that fall under the exponential family. The density of each of these distributions is written in the exponential form and its properties are discussed. The chapter ends by giving 3 examples where the link function is identified so that one can apply the GLM framework.



Chapter 4 : Estimation 
This chapter is one of the key chapters of the book, as it talks about the estimation framework that needs to be followed in developing a GLM model. Well, one can formulate any fancy model, but one needs to keep the estimation methods in mind. In the case of regression, the estimation procedure is known to everyone: least squares is almost always the chosen method to get the estimates. In the case of a GLM, there is a twist in the tale: the relation between the dependent variable and the independent variables is not linear but depends on the link function. So, here you are with a set of values, supposedly a dependent variable following some distribution from the exponential family. You also have a set of covariates and a link function. How one uses MLE to get estimates is the central question answered in this chapter. The procedure behind the math can be summarized as follows:

  • Form the log likelihood function for the Yi's.
  • Compute the score statistic with respect to each of the coefficients.
  • The score statistic, unlike in the usual MLE setup, will contain a term which captures the link function.
  • At this juncture, the information matrix needs to be computed, as it is needed for the Newton-Raphson formulation of the equation for estimating the betas.
  • At this step, it becomes very obvious that the Newton-Raphson equation is equivalent to an iteratively reweighted least squares procedure.

Thus the estimation of even a simple GLM model connects a whole lot of concepts: MLE, the score statistic, the link function, the Newton-Raphson method, and iteratively reweighted least squares. When one uses off the shelf stuff to get estimates, one doesn't get to see what happens behind the scenes, and a novice can fail to appreciate the beauty behind the procedure.

The chapter ends with a Poisson regression example which can be worked out manually, thus allowing one to appreciate the connections between the various aspects of GLM estimation. The beauty of this chapter is that it lets you get into the intricate details of estimating the parameters of a GLM using nothing but matrix algebra and the Newton-Raphson method. Once you work out an example from scratch, you will be amazed at the power of stat software like SAS, R and Matlab, which give the output in a jiffy.
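The estimation steps can indeed be worked out from scratch for a small Poisson regression. This is my own bare-bones sketch of the IRLS loop (toy data, two coefficients, Cramer's rule instead of a proper linear solver), not the book's worked example:

```python
import math

def solve2(A, b):
    """Solve a 2x2 linear system by Cramer's rule (enough for this toy example)."""
    det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
    return [(b[0] * A[1][1] - b[1] * A[0][1]) / det,
            (b[1] * A[0][0] - b[0] * A[1][0]) / det]

def irls_poisson(X, y, iters=25):
    """Poisson GLM with log link, fitted by iteratively reweighted least squares."""
    n, p = len(y), len(X[0])
    mu = [yi + 0.5 for yi in y]          # standard starting values
    eta = [math.log(m) for m in mu]      # eta = log(mu) is the link
    beta = [0.0] * p
    for _ in range(iters):
        # working response z; the weights equal mu for the Poisson/log-link case
        z = [e + (yi - m) / m for e, yi, m in zip(eta, y, mu)]
        XtWX = [[sum(X[i][a] * mu[i] * X[i][b] for i in range(n)) for b in range(p)]
                for a in range(p)]
        XtWz = [sum(X[i][a] * mu[i] * z[i] for i in range(n)) for a in range(p)]
        beta = solve2(XtWX, XtWz)        # the weighted least squares step
        eta = [sum(bj * xj for bj, xj in zip(beta, row)) for row in X]
        mu = [math.exp(e) for e in eta]
    return beta

# toy data: counts whose log-mean rises roughly linearly with x
X = [[1.0, float(x)] for x in range(10)]
y = [1, 1, 2, 3, 4, 6, 10, 14, 22, 33]
beta = irls_poisson(X, y)
```

At the MLE, the score equations force the fitted means to reproduce the observed total (that is, the sum of the fitted mu's equals the sum of the y's), which is a handy sanity check on any hand-rolled implementation.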



Chapter 5 : Inference 
This chapter provides the link between the score statistic and the distribution of the MLE estimates by explaining the procedure for finding the sampling distributions of both. It also talks about the deviance function for some common models. The deviance, as the name implies, measures the discrepancy between the fitted model and the saturated model, and differences in deviance are used to compare nested models. Asymptotics are invoked on this deviance estimate to choose between competing models. A few examples of hypothesis testing are shown using the Wald statistic and the deviance statistic.
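The Poisson deviance, for instance, takes only a couple of lines; the fitted means are compared against the saturated model, so a perfect fit gives a deviance of zero. A toy sketch:

```python
import math

def poisson_deviance(y, mu):
    """Poisson deviance D = 2 * sum[ y*log(y/mu) - (y - mu) ];
    a zero count contributes just 2*mu, since its log term vanishes."""
    total = 0.0
    for yi, mi in zip(y, mu):
        total += mi if yi == 0 else yi * math.log(yi / mi) - (yi - mi)
    return 2.0 * total

counts = [2, 4, 3, 5, 1, 3, 4, 2]
null_mu = [sum(counts) / len(counts)] * len(counts)   # one common mean
d_null = poisson_deviance(counts, null_mu)            # lack of fit of the null model
d_saturated = poisson_deviance(counts, [float(c) for c in counts])
```

`d_saturated` is exactly zero, while `d_null` is the quantity that gets compared against a chi-square cutoff when testing the null model.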



Chapter 6 : Normal Linear Models 
This chapter deals with normal linear models. Basic multiple linear regression concepts are discussed, along with outlier detection methods based on metrics like influence measures and Cook's distance. The discussion of ANOVA and ANCOVA is presented with great clarity. Until I read this chapter, I had never realized that the deviance of a one factor ANOVA is the same irrespective of the linear model specification. In most of the texts I have read till date, there was no special emphasis on the fact that the deviance function is the key to this entire ANOVA/ANCOVA business. Two factor ANOVA and ANCOVA models are also discussed, with complete derivation of the estimates, making the reader appreciate the F stats that are available at the call of a simple command. One thing common across all these models is that the basic matrix algebra is similar to simple linear regression. However, I have not seen till date any powerful application of ideas such as ANOVA or ANCOVA in stat arb. I guess they fall under "nice to know" stuff. I do vaguely remember an example where ANOVA was used to get some information about the seasonality of spreads. However, I still haven't come across a WOW application of this ANCOVA stuff in finance.



Chapter 7 : Binary Variables and Logistic Regression 
Logits and probits are discussed in this chapter. Various stats for model selection, such as the Pearson Chi-Square statistic, Hosmer-Lemeshow statistic, deviance statistic, likelihood ratio chi-squared statistic and pseudo R-squared statistic, are derived. Even though knowledge about these models and estimations looks nice on paper, I sometimes wonder about the usefulness of models based on such simplistic equations. Shouldn't the logits be a stochastic process? What about the covariates? Unlike in social science applications, in finance the covariate is almost always going to be an ARMA process or a non stationary series. The information in this chapter will help one understand basic logit model math, but if you have to apply this to the real world, you quickly realize that most of the assumptions of the plain vanilla logit/probit model are useless, at least for models applicable to the trading world. You have to bring stochasticity into the variables. I guess a successful application of any of these models in the real world requires solid intuition + deep contextual knowledge of the variable being studied + skills to model systematic, non Gaussian randomness + LUCK 🙂 to get it right. The LUCK factor, sometimes, might be the only sane explanation of the success of your model 🙂



Chapter 8 : Nominal and Ordinal Logistic Models 
At this stage of the book, the reader can actually guess the contents and structure of a chapter. As the title goes, the models considered are nominal logistic and ordinal logistic. There are two approaches to modeling a response variable that is categorical with multiple levels: first, using nominal and ordinal logistic models, and second, framing it as a Poisson regression. This chapter dwells on the first method. One of the first observations made is that the multinomial distribution is not in the form of the generalized exponential family and hence does not directly fit the framework described in the book. However, a multinomial distribution can be regarded as the joint distribution of Poisson random variables, conditional upon their sum. This result provides the justification for using generalized linear modelling.

Nominal logistic models can be used when there is no natural order among the categories of the response variable. Any category can be picked as the reference category, and logit models are built with respect to it. This clearly means that if there are correlations between the levels of the response variable, then plain vanilla logistic is NO GOOD.

The usual summary statistics are derived: Pearson chi-square, deviance, likelihood ratio chi-squared, pseudo R-squared. A good thing about this chapter is the use of a single example to illustrate both nominal and ordinal logistic models. In the former case, the variable is assumed to be nominal and modeled as such. In the latter case, the variable is enhanced with its natural ordering and an ordinal logit model is built. Both examples illustrate the importance of not relying on just a single goodness of fit statistic. The more goodness of fit measures you compute, the better the idea you get of the strength of the model. The biggest learning for me is to keep in mind the likelihood ratio chi-square statistic and the pseudo R-squared statistic, besides the usual deviance statistic which is mentioned everywhere.



Chapter 9 : Count Data, Poisson Regression and Log Linear Models  

As the title of the chapter suggests, this deals with the case where the response variable is a count variable. The first method discussed is plain simple Poisson regression, where the response for each covariate pattern is modeled as a Poisson variable and the log function is used to link the expected value of the response to the covariates. In the case of contingency tables, appropriate models are formulated based on the constraints on the data. For example, for a contingency table where the sum of all elements is fixed, one can use a multinomial model. If there are more fixed marginal totals than just the overall total n, then appropriate products of multinomial distributions can be used to model the data. One good takeaway from this chapter is the case where counts in the contingency table are not independent, with the suggestion to use the negative binomial distribution to take care of the resulting overdispersion problem.
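The overdispersion check itself is trivial: for a Poisson variable the variance equals the mean, so a variance-to-mean ratio far above 1 is the signal to reach for the negative binomial. A toy sketch with made-up counts:

```python
def dispersion_ratio(counts):
    """Sample variance over sample mean: roughly 1 for Poisson data,
    well above 1 signals overdispersion (consider negative binomial)."""
    n = len(counts)
    mean = sum(counts) / n
    var = sum((c - mean) ** 2 for c in counts) / (n - 1)
    return var / mean

# hypothetical data: the first set is well-behaved, the second is clumpy
poisson_like = [3, 2, 4, 3, 1, 5, 3, 2, 4, 3]
overdispersed = [0, 1, 0, 12, 2, 0, 15, 1, 0, 9]
```

Running `dispersion_ratio` on the second set gives a ratio several times larger than on the first, which is exactly the pattern that breaks the plain Poisson model.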

The last two chapters of the book apply the GLM framework to survival analysis and longitudinal data.


Takeaway : A great book with just enough math to explain the workings of the GLM model estimation procedure thoroughly.