June 2011


Books on R are tricky to read especially when the sheer amount of things that R can do is mind-boggling. So, there are books that range from very specialized to very generic and there is no choice but to refer this gigantic range of collection based on one’s needs. The flip side to this vast amount of stuff is, “it is likely that a first timer would fail to see the forest for the trees”.

Paul Teetor’s book is different from the existing books on R. Firstly, this cannot be your first book on R. So, where does this book fit in the gamut of documents/vignettes/books on R? Well, if you have coded R for sometime / used R at work, then there is every chance that you would have carried out tasks mentioned in the book in some way or the other. May be you have written your own functions/ your own code/ your own packages / or forked a package etc. Whatever be the case, the tasks mentioned in the book would have figured out in your work. Now here is where this book is useful. You can look at a task, form a mental image of the code that you are going to write, compare with Paul Teetor’s solution and reflect on the discussion provided. In most of the cases, your solutions might match. However the discussion provided at the end of each solution is illuminating. There are specific things that you would have missed, certain options in the function prototype that you would never bothered to use, certain ways of thinking about the function that would have never crossed your mind. If you approach the book with the mindset that – “you are going to think about the gaps between your mental image of the code and the code given in the book”, then it is likely that you are going to enjoy this book. I think one should just go over the entire book on a rainy day, instead of waiting for a task to crop in your work and then referring this book.

I approached the book with the mindset -”Can this book teach me a better way to do stuff as compared to what I am doing currently ?” Subsequently, I religiously went over the discussion provided for each of the tasks to note all the gaps that were present in my code. Here is a list of points that this book taught/reiterated/explained in a better way. Chapter 1 and Chapter 2 deal with some very basic stuff that one needs to work using R. May be the first few days you start coding in R, you will probably need stuff from these chapters. So, I will skip the beginning two chapters and list with my takeaways from Chapter 3 onwards.

Chapter 3: Navigating the Software

  • ·Startup sequence for R
    • R executes RProfile.site script
    • R executes .Rprofile script
    • R loads the workspace from .RData
    • R executes .First function
    • R executes .First.sys function

Chapter 4: Input and Output

  • Read.csv assumes that header exists. I had always written header=T unnecessarily. Classic case of not reading the function prototype properly
  • Put rownames=F so that a file written in the output csv has no unnecessary row numbers
  • Use dput and dump to save the data in ASCII format
  • Use format along with cat to control the number of digits printed

Chapter 5: Data Structures

  • Usually one creates a matrix which is purely numeric / purely character data. Here was a surprise to me from this chapter. You can have mixed data types in a matrix. Just use a list which has numeric and character data and then give a dim attribute to the list. Voila, you see a matrix with mixed attributes. A pathological case arises!
  • append function – Have never felt the need to use this function till date
  • Use stack to combine a list in to two column data frame
  • lst[[n]] is an element not a list, lst[n] is a list , not an element
  • To remove an element from the list, assign NULL. One can create a list with NULL elements but if one assigns an element to NULL, then the element is removed
  • Initializing a data matrix- Try typing the data itself in a rectangular shape that reveals the matrix structure
  • drop argument is used to retain the matrix structure when any sub selection is made
  • Initializing a Data from from Row Data – Store each row in a one-row data frame, Store the one-row data frame in a list. Use rbind and do.call to bind the rows in to one large data frame
  • It is better not to use matrix notation to access data frame columns. Why ? The result could be a vector or a data frame based on the number of columns you access It’s better to use list based syntax as it leads to clearer code. Or use drop = false so that the return type is always a dataframe
  • Elegant way to select rows and columns from a datamatrix is by using select and subset

Chapter 6: Data Transformations

  • s in sapply stands for simplify
  • mapply is used when the function that needs to be applied is not vectorized( example gcd)
  • if you want to apply function for specific block of rows, use “by”

Chapter 7 Strings and Dates

  • Had used date objects but never carefully understood the reasoning behind naming convention
  • POSIXct – stores the number of seconds since 1970. Why ct ?. It stands for compact and the now it makes sense. The date is stored in the most compact form
  • POSIXlt – stores the date in the form of a list with 9 elements. Pretty elaborate structure and hence the appearance of lt(list) in POSIXlt
  • Mention of lubridate package – never explored this package till date. Since it is from Hadley Wickham and Co, it must very useful. Will check this out someday.
  • cat will not reveal special characters but print does
  • Not to forget adding 1900 to the year in POSIXlt

Chapter 8: Probability

  • Remember that the distribution parameters that R asks are sometimes not the natural parameters of the distribution. For example for exponential, you don’t use beta , but use 1/beta as input
  • MASS library has MLE and fitdistr functions to fit distributions for all the usual flavors that one comes across
  • If you specify lower.tail = F in pnorm/pt/pgamma, you get the survival probability
  • Set density parameter in polygon to 10 if you want 45% shading lines in the selected area, else set density parameter as – 1and specify the color

Chapter 9: General Statistics

Nice recap of conventional hypothesis testing. Almost for all the tasks, I would have followed the same code given in the book. So, nothing new here for me.

Chapter 10: Graphics

  • Nice recap of base graphics tasks, though I must admit that after working with ggplot2, I don’t think I would ever go back to base graphics. Even for some quick and dirty plots, I think ggplot2 is better. There is some learning curve for ggplot2 but I think the effort is worth it as you will start producing publication ready graphics. I missed a lecture recently on lattice and have plans of exploring lattice soon. I have been told that whatever can be done in lattice can be done in ggplot2. So, I am basically waiting so that I have enough motivation to go over lattice.
  • Using xpd option in barplot2 is a nice way to restrict bar from going outside the graphical region
  • Quantile plots for other distributions. qqnorm and qqline are a part of basic vocab for any data analyst. However there is a task which generates quantile plots for other distributions. I mean you can do the same using a couple of statements , but the author does it in a nifty way
  • Plotting a variable in multiple colors – Till date , I had always drawn multiple variables with a separate color for each variable. But never came across a situation where a single variable needs to be shown in multiple colors. So, in that sense the task solution and description is a learning for me
  • Instead of default R window, use win.graph( width = 7, height = 5.5) where the graph dimensions are based on Golden Ratio
  • Plot(y~x) is far more intuitive than plot(x,y)
  • Changing global parameter has a global effect meaning , if you call packages that create their own graphics, the global changes will affect them too. So, need to be careful while coding par(globalparam=value)

Chapter 11: Linear Regression and ANOVA

  • Poly(x,n) can be used for polynomial regression and adding raw=T computes using simple polynomials instead of orthogonal polynomials
  • boxcox() function gives you the best transformation( MLE is the intuition)
  • outlier.test() captures outliers
  • influence.measures() – A function which is most important as the output needs to interpreted carefully. Cook’s distance (talks about each observation’s effect purely from looking at the independent variables structure, dependent variable is not considered here) and hat values ( values that talk about the influence of each observation on the dependent variable) are needed to do healthy diagnostics.This function does everything for you.
  • oneway.test(x~f) is better than anova(x~f) as the former does not assume equal variance amongst the factor data
  • krusal.test(x~f) is a safer test(as it is non parametric ) than oneway.test , anova which assume a gaussian realization
  • One of the best uses of anova is to compare competing models

Chapter 12: Useful Tricks

  • You can define your own binary operator %XYZ% by assigning it to a function. Never did this till date in my code.
  • match can be used to find a specific element in a vector
  • do.call(cbind, complex.list) is useful when the list structure is complex. unlist might throw junk in such cases and hence the importance of this recipe

Chapter 13: Beyond Basic Numerics and Statistics

  • Need to be careful while using optimize as it reports only one minimum if there are multiple minimums in the given search range
  • Orthogonal regression might be a better choice than simple regression. The visual explaining the basic problem in simple regression is excellent
  • Three lines of code can help you do a cluster analysis 101 ; Use dist, hclust and cuttree
  • I always use to write my own code for bootstrapping. Came to know that “boot” package has quite some functions that can be used, especially the various confidence intervals for the estimates. Got to explore it someday in a detailed manner.
  • princomp and factanal is a good start to a full-fledged factor analysis.

Chapter 14: Time Series Analysis

  • Padding or filling a time series with empty data is something that I found interesting. I think it is an immensely useful hack
  • na.pad = True is an option which I should be using from now on as it makes coding much more clean. This is a classic case of me not reading the function arguments properly.
  • There are a few tasks on fitting arima models, checking for unit root and smoothening time series. As the book clearly mentions, the solutions merely scratch the surface. Each of these topics would themselves need a book length of content to do justice. In any case, the solutions presented in this chapter will at least give you a start.

clip_image002  Takeaway:

Whatever you program in R, the tasks mentioned in this book are going to be a part of your code thus forming the building blocks of your code. The book talks about 246 tasks that generally arise in the daily life of an R programmer. Even if only 10% of those tasks teach you something new, book is still worth your while.


It has started raining in Mumbai and the pleasant climate after three months of scorching heat, enlivens the spirit. Will attempt to write a few words about this book.

Since I have been staying alone for the past few years in Mumbai, I have gone back to “Sitar” which I could not practice in NY for a couple of reasons: Firstly, I missed the space needed for practicing any instrument. Staying with two other guys in a flat was not particularly conducive to playing an instrument without distractions. You have to actively seek out “No Distraction time” so that you don’t cause any distraction to others -:) . Well, work + academics + programming left me with little energy to pursue “Sitar”. Secondly, I could not afford a teacher. Self-study / Self-training in any field requires you to be skillful up to a certain level/ be an apprentice for some time, after which you can be in a cruise-control mode in exploring stuff. I wasn’t anywhere close to that stage and a Sitar teacher was imperative to my practice. Some quick calls to a craigslisters revealed that they were too costly for me to even think of regular classes. Thanks to my decision to head back to India, I found two things in my life needed to play any instrument, a certain level of solitude & a teacher. The former, I deliberately opted for (don’t know how long I can be in this state), the latter,”finding a teacher”, happened out of a chance conversation with someone.

I don’t know why I am writing about “Sitar” when I had meant to give a quick summary of this book. May be, I associate solitude with two activities, “Playing Sitar” and “Doing Math/Stats”. I have found that these two activities demand alonetime, atleast for me. I really appreciate people who can do these activities despite the distractions in their lives. In the process I have actually started enjoying solitude. In fact it has become addictive too!(don’t know whether it is harmful or harmless).

Ok coming back to the book; The book has been on my mind since quite some time. It recounts the experiences of Doris Grumbach, an American Novelist , who takes up solitude for 50 days. Well, actually it is a “forced “solitude. The “forced” word here refers to the situation when the author’s companion Sybil , goes away to procure books for her business to another town for an extended period of time.

Instead of filling it with regular distractions, Doris Grumbach decides to embrace solitude and see what that leads to. When asked in a Charlie Rose interview, why she decided to do so, She said, “ I wanted to spend time alone as I had never spent time alone for many decades. Parents, spouse, kids, neighbors, etc put innumerable covers on me and I never had a chance to look at what kind of person I am, once those covers fell off. How many layers that are not me that I have covered up? How much it is in me that can sustain me, without any human contact? . I was seeking answers to some of these questions”.

Cutting off human contact is unthinkable for many. Reminds me of the words of Anne Lindbergh who says in her book “Gift from the sea”:

We seem so frightened today of being alone that we never let it happen. Even if family, friends and movies should fail, there is still the radio or television to fill up the void. Women, who used to complain of loneliness, need never be alone anymore. We can do work with soap-opera heroes at our side. Even day dreaming was more creative than thisl it demanded something of oneself and it fed the inner life

For 50 days, Doris Grumbach cuts off human contact, TV, newspaper to a large extent and explores the experience such a state provides. She restricts herself to books, music, writing and an occasional NPR news diet.

I found something ironic in a way she mentions at the very beginning of the book that knowing that this solitary period was only a temporary phase helped her embrace it . So, does one have to think that “this solitude has a termination date” to make it easy to enjoy it? In fact I think it is contrary to that kind of thinking. A frame of mind that you will be alone for a week(for example), and all the noise and clutter will be back in to your life in a week’s time Vis-à-vis A frame of mind where you are prepared to be face solitude for as long as you please give vastly different experiences. So, may be the author is cheating her mind to get in to this mode where she can make allies with silence and solitude. In the Charlie-Rose interview, she mentions her time alone as “seductive way of living”, the reasoning being that you tend to appreciate art / beauty/ little things much better in solitude.

This content of the book is pretty random and is reminiscent of a person facing solitude for the first time in life( whose thoughts & actions are very often desultory). It takes one some time to settle in to solitude. In the same way, this book made me feel that the author is trying to grapple with the alonetime and trying to find sanity in this situation of “forced solitude”. The book is interspersed with a good collection of reflections that makes this book a worthwhile read. Some of them that I found interesting are:

  • Order, Sequence is the secret of being alone
  • The true purpose of art is not to create beautiful objects. It is a method of understanding, a way of penetrating the world and finding one’s place in it, and whatever aesthetic qualities an individual canvas might have are almost an incidental by-product of the effort to engage oneself in this struggle, to enter in to the thick of the things
  • The reason that extended solitude seemed so hard to endure was not that we missed others but that we began to wonder if we ourselves were present, because for so long our existence depended upon assurances from them.
  • Loneliness is the poverty of Self, Solitude is the richness of Self
  • In the context of painting : It is only when an object exists in our lives for no other purpose than to be seen that we really look at it
  • In what odd places, we must go to find solitude!. A physician, I know told me that he did not mind one whit having an examination known as MRI, a diagnostic procedure most people hate because the patient is encased for so long, lying still in a machine that is almost tomblike. He said it was the only quiet time he had had in a long time, and he was able to think very well in the stillness and silence the examination provided
  • I realized how much more I was aware of my vices(envy, gloating, egotism) when I was alone. In the presence of others, it was possible to ignore them , or even deny their existence. In Solitude, they are there, omnipresent and bountiful , unable to be dispelled by the unknowing flattery of kindly others
  • I learned that there is softness about being alone in the country, even the frozen, snow-filled country. The city is a multitude of rigid right angles forced down upon each other. But the country, even in the dead of winter , is composed of the circles and arcs and ovals of blessedly unpopulated, almost empty space
  • People who cannot bear to be alone are the worst company
  • The less one talks, the more one thinks? Is thought internalized speech rising from and then directed back, soundlessly, in to oneself? Talk uses up ideas, although others have told me that they find it profitable to talk out their ideas and plans for a story because it clarifies their intentions. Not I. Once I have spoken them aloud, they are lost to me , dissipated in to the noisy air like smoke, Only if I bury them, like bulbs, in the rich soil of silence do they grow.
  • “We live, as we dream – Alone “ – Joseph Conrad

The book ends with the author, a novelist well past her age, finding her creative muse again, and getting back to writing non-fiction.

image image

A classic is a book that has never finished saying what it has to say –  Italo Calvino.

These words definitely apply to Feller’s books. Both the volumes by Feller on probability can be considered as classics. The first book deal with the discrete variables while second is far advanced as it deals with measure theory and continuous variables. Markov processes are one of the main sources for Martingales and I had to go back to Feller to work on Markov Processes and managed to go over the entire book instead of only looking up stuff on Markov Processes. Let me attempt to various probability topics that are covered in Volume I. This post is going to serve mainly as my reference to various ideas and thoughts that the book brings out. This is definitely one of the longest posts that I have written till date. It is a ~5500 word summary and hence this post will definitely have a list of some of the powerful ideas in probability.

The author starts off with mentioning three aspects of any theory, a) formal logical content, b) intuitive background, c) the applications. From what I could make out, I think he urges the reader to work on all the three areas for a good understanding of the theory. Formal logical content comprises the axioms, lemmas and theorems based on which the entire probability theory is built. The intuitive background, I guess refers to a constant re-evaluation of our beliefs. Applications, the third and I think the most important part for a guy like me whose sole purpose in slogging through this stuff is to build useful models.

So, in one sense this book needs to be read in such a way that one must check the progress in all the three directions at the end of every chapter by asking these questions to oneself

  1. What axiomatic theory has been developed in the chapter?
  2. What intuitive notions that the theory supports or refutes?
  3. Where can I apply this theory?

Unless you keep asking these questions at regular intervals, it is likely that you will have a limited understanding of the theory. A trivia from the introduction – “Sample space“ – term was coined by Von Mises.

Chapter 1: Sample Spaces

The chapter starts off with a basic description of sample spaces. As the book is all about discrete sample spaces, the number of events in the probability space is considered to be denumerable. The chapter provides umpteen number of examples of sample spaces and provides the necessary understanding to distinguish outcomes to events in a probability space. A fantastic point mentioned in this chapter is about “how probability and experience go hand in hand in selecting the models?” If there are r balls to be places in n slots in such a way that there are r1, r2, r3,….., rn balls in each of the slots then the probability of occurrence of this event is n! /( Π ri! * rn) . This is what is taught in elementary courses in probability. However who says this model is correct? Well there are models for which one cannot assign the above probability. In the case of Bose-Einstein Statistics experience leads to an assignment of 1/ n+r-1cr probability whereas Fermi-Dirac computes the probability as 1/n+r-1cr. It is impossible to select or justify the probability model by a priori reasoning. Thus there is an intricate interplay of theory and experience while assigning probabilities. There is always this argument you get to hear the finance models are so different from physics models and one must be cautious about it. Well, here is a argument which says physics and finance models do have something in common. Experience and Theory should go hand in hand while assigning probabilities in finance, much like assignment of probabilities in Bose-Einstein/Fermi-Dirac/Maxwell-Boltzman worlds.

Chapter 2: Elements of Combinatorial Analysis

This chapter deals with all the necessary tools to think about combinatorial problems. Tons of examples are given to make the reader understand sampling without replacement, sampling with replacement. The highlight of this chapter is it shows the way to crack occupancy problems. Based on your notions about the probability spaces, a partition of r balls in n slots can take various forms. Hyper geometric distribution is introduced and a nifty linkage is provided between the usage of the distribution to Maximum Likelihood estimate. A glimpse in to “waiting times” is provided using simple balls/urns example. This intuition / simple examples are needed to understand sophisticated concepts such as hitting times in martingales and ultimately develop a model for American option. Yes everything is connected in one way. If you want to evaluate an American option, hitting times become important. To understand hitting times via Martingales, you might have to understand hitting times in simple coin toss examples, which in turn means that you need to understand stuff using simple ball/urn models. It is often seen that students are interested in fancy models and no one is interested in spending time in understanding a few basics. Fancy models look good only in “talks” and not in “action”. Anyways coming back to the chapter, it concludes with a bunch of equalities involving binomial coefficients , stirlings approximation etc. This chapter is a reference chapter of sorts and one can come back to the content at frequent times during the book.

Chapter 3: Fluctuation in Coin Tossing and Random Walks

This chapter is a brilliant and an eye opening one; It can be read independently of the entire book. Many myths are shattered by looking at the world of coin tosses. It explores the concepts of random walks using simple coin tossing experiments. Sophisticated concepts can be extrapolated once random walks are seen as realization of infinite coin tosses. The chapter starts off with “Refection Principle” which finds its applications almost everywhere throughout the chapter. A few examples are given at the beginning of the chapter to provide motivation to any reader. Ballot Problem, Galton’s rank order test, Kolmogorov-Smirnov tests become analytically tractable by viewing it from coin toss perspective. The following questions are explored in this chapter:

  • What is the probability that no return to the origin occurs up to and including epoch 2n ?
  • What is the probability that the first return to the origin happens at epoch 2n ?
  • What is the probability that a random walk always stays to one side until epoch 2n ?
  • Probability that the last visit to the origin occurs at epoch 2k considering that the random walk is at epoch 2n now?
  • How does the above probability relate to arc sine distribution?
  • What is the probability that the random walk spends k % of the time of the total time ?
  • Surprising connection between Sojourn times and Last visit times
  • What is the probability that the first passage time through r occurs at epoch n?
  • What is the probability that the rth return to the origin occurs at epoch n ?
  • Connection between first passage time and return time
  • What is the probability that the maximum value of the random walk = r in its journey till epoch = 2n

Each of the above questions is solved in a detailed way so that user understands the logic + intuition + practical application of the theory behind these questions.

This chapter is a little overwhelming and needs a re-read atleast a few times to appreciate the results. The applications of concepts from this chapter are wide and far reaching. For example, the sojourn times and last visit to origin’s distribution – ArcSine law can be used in deciding the holding period in a pair trading model. Assuming the spread reverts to origin x number of times in a year, one can fit an arc sine distribution to decide the holding period of the pair.

Chapter 4: Combination of Events

If you consider a set of events A1, A2, …AN, How does one calculate the probability of the event that at least one of the events happen? How does one calculate the probability of the event that exactly k out of N events happen? Once the number of events grows beyond 2, one needs a systematic way to solve these problems. This chapter introduces an approach to calculate the probability that at least one of event happens in a set of events. This approach is then applied to “Matching Problem”, “Occupancy Problem”. Occupancy problem is described in detail and a connection is made to the fact that probabilities reach Poisson distribution asymptotically. This chapter is essential to compute the probability of events such as

  • Probability that atleast one of events happens from the event set (A1, A2, …AN)?
  • Probability that exactly k events happen from the event set (A1, A2, …AN)?
  • Probability that atleast k events happen from the event set (A1, A2, …AN)?

Even though closed form solutions are given for these kind of problems, I wonder whether one really applies these closed forms. For example, if you had 1000 slots and 5000 balls and you distribute these balls randomly in to 1000 slots, Let’s say you wanted to find the probability that exactly 3 boxes are empty ; Just put this command in a loop that goes over N times 

  • ( 1000 – length( unique( c ( sample( 1:1000, 5000, T ) ) ) ) == 2 )
  • Count the number of TRUE values for the condition in the loop, (count / N) gives the probability.

You don’t need to know any of the stuff mentioned in this chapter. However the reason you probably NEED to know the closed form probability is that it builds the intuition behind such stuff. Simulation gives you an estimate but it will not help you build a probabilistic mindset I guess.

Chapter 5: Conditional Probability and Stochastic Independence

Conditional probability is introduced for discrete variables. The real power of Conditional probability and Conditional expectation is usually seen in the continuous case. Dealing with discrete is easy and appeals to our intuitive notions. The highlight of the chapter is the description of urn models and its usage in the formulating conditional probability models. Polya’s urn scheme is discussed so as to model contagions. Independence is also explored where conditions are given in order for a group of variables to be independent. Product spaces are merely introduced as a thorough description would need measure theory. There are several starred sections in this chapter which can be skipped during the first read.

Chapter 6: The Binomial and the Poisson Distributions

The most popular distributions in the probability theory are introduced in this chapter, namely binomial and Poisson. As is well known that binomial distribution B(r;n,p) increases and then decreases as r increases. Nifty tail approximations are provided in the book that are useful for quick back of envelope calculations. Weak law of large numbers is introduced where convergence in probability is discussed. An easy derivation of weak law of large numbers is shown applying tail approximations of binomial distribution. Subsequently Poisson as a limiting distribution for binomial is introduced. A dozen examples or so are provided to see the relevance of Poisson process. The chapter ends with a brief description on multinomial distribution

Chapter 7: The Normal approximation to the Binomial distribution

For very many cases, we usually deal with large values of n and hence it makes sense to approximate the distribution with a normal or a Poisson distribution. Both versions of binomial distributions, symmetric & asymmetric are shown asymptotically to converge to normal distribution. However the approximation sucks as we move away from the mean of the binomial distribution, a subtle point often not paid attention to. Demoivre-Laplace theorem is then stated and proved which gives a convenient way to formulate probabilities for events that restrict the random variable realizations. One thing to be noticed is that binomial approximation to Poisson is better when lambda is small whereas it does not matter whether you approximate with Poisson or Normal if lambda is large and n is large.

Chapter 8: Unlimited Sequences of Bernoulli Trials

This chapter deals with Borel Cantelli Lemmas that are tremendously useful in formulating probabilities on an infinite probability space. One of the lemmas is subsequently used to prove strong law of large numbers, i.e the cases where the law does not hold good, form a negligible set. Law of iterated logarithm is also introduced in this chapter which I have skipped and would revisit at a later date. Author clearly suggests to skip certain sections of the book so that the reader does lose sight of the forest for too many trees.

Chapter 9: Random Variables; Expectation

Basic definition of random variable, density function, distribution function, expectation of a random variable is given. The chapter can be quickly covered as most of the concepts any reader would have come across in some elementary course. The author makes a few interesting points like

The reduction of probability theory to random variables is a short-cut to the use of analysis and simplifies the theory in many ways. However, it also has the drawback of obscuring the probability background. The notion of random variable easily remains vague as "something that takes on different values with different probabilities." But random variables are ordinary functions, and this notion is by no means peculiar to probability theory.

BASICALLY measure theory is a MUST to get clarity in your understanding.

Negative binomial distribution and its connection to exponential distribution is shown. Examples on waiting times that figure out in Negative binomial distributions can be very crucial in understanding stuff about Martingales. I had never come across Kolmogorov inequality till date and was realized that it is a scaled version of Chebyshev’s inequality.

One thing I consciously keep in mind while reading is that – “Is this content/chapter/theorem/lemma changing the way I think about stuff? Sometimes I forget this principle and I end up in a state of “Awareness”, instead of “Understanding”. Let me give an elementary example. If you were asked to compute the mean and variance of a hypergeometric distribution, then you can as well write its probability mass function and then religiously compute the mean and variance. However that is not the elegant way to do things once you see the covariance formula. Covariance formula gives you the power to break up a big problem in to series of small problems. Applying to this hyper geometric example, you can actually look at the distribution as sum of iids taking 1 or 0 based on the selection of defective ball. Once you zero on to the individual block, you can calculate mean and variance at an individual level, covariance between these individual elements and then just sum up the expectation and variance to get the metrics at a aggregate level. So, this is an example where the existence covariance formula changes the way you think /approach / solve problems. In this book, the author delights at so many places that I bet even a well trained probabilist will have aha moments from this book

Chapter 10: Law of Large numbers

Law of Large numbers and Central Limit theorem are stated at the very beginning of the chapter. Subsequently, the book makes an important point that there is a need to formulate an analogous form for variables which do not have finite expectation. It is a mistake to dismiss variables which do not have finite expectation from our study as they frequently occur in many stochastic processes. The Proof of Law of Large numbers is given by using “method of truncation”, a technique which is commonly used to prove limit theorems.

The chapter talks about “Petersburg game”, the puzzle which stumps any beginner who needs to deal with a situation where there is a need to formulate long term probabilities but the random variables have no finite expectation. Petersburg game is one such example where the random variable has infinite expectation but one can still one form some sort of bounds in order to compute the cost of entering in to the game.

Some nice examples are given to show the wide applicability of Central limit theorem. So, one should not have the notion that central limit theorem is a stricter condition and applies to a subset of the sequences for which weak law holds good. Sufficient conditions for Weak law and Central limit theorem are mentioned clearly and derived. In this context, Lindeberg theorem is mentioned as the sufficient condition for CLT to hold good. For Law of Large numbers to hold good the variables need to be uniformly bounded. The chapter shows that Strong law of large numbers deals with almost sure convergence whereas Weak law of large numbers deals with convergence in measure. It is put in words brilliantly by the author as follows:

“Weak law does not imply the average remains small as you increase the sample size.The value continues to fluctuate between finite and infinite limits. However one can conclude that large values occur at infrequent intervals.”

Strong law on the other hand talks about the entire tail of the sequence of random variables and hence is a stricter condition, meaning Convergence almost sure implies Convergence in measure

Chapter 11: Integral-Valued Variables. Generating Functions

Generating function is a very useful tool for solving a multitude of problems in the discrete variable space. Events for which probability computations appear very complex become tractable with the usage of generating function. What is a generating function in the first place? If you are given a set of real numbers, in this context a set of discrete probabilities, you can create a summary function for all the numbers in the form of a generating function. The name is apt as the function appears like generating all the probability values of a discrete variable. One of the powerful applications of generating function is to compute convolutions. If you have a convolution, i.e sum of certain number of random variables, the probability that the sum of random variables behave in a specific way is analytically intensive from first principles but becomes solvable using generating function. For example, if you toss n coins, what is the probability that no 3 heads appear together in all the trials? Attempt this problem from first principles and then attempt the same using generating function, you will clearly see the power of generating functions. In chapter III of the book, various interesting events such as first passage times, equilibrium times, return times to origin are derived using nifty tricks and observations that involves reformulating the problem so that the problem changes to a known form. However the same problems are solved using generating functions in this chapter and the advantage that one gets from using this tool is that you get to see not only the probabilities but also the various moments.

For example, the probability that in a simple random walk, the random walk always stays equal to or below 0 and at epoch n , reaches 1 can be solved using generating function. The resulting generating function gives much more information of the entire process and helps one get to a ton of results at once. Bivariate generating functions are also touched upon where a brief introduction is given to compound Poisson distribution. The applications of compound Poisson distribution is immense in field of quant finance. Think about it, the bid and ask processes are generally modelled using Poisson distribution and there is a need to understand the generating function of the combined bid and ask process. I have never built a model till date on bid and ask processes but certainly hope to do so in my working life. The chapter ends with a discussion on “Continuity theorem”, which says generating functions convergence is necessary and sufficient condition for a distribution to converge to the limiting distributions. For example when binomial generating function converges to Poisson generating function, then it means that in the limiting case, binomial converges to Poisson. One way to visualize generating function is that it is the DNA of discrete variable. You can study all the properties of the creature using this basic DNA. You can extend the analogy to characteristic functions for more generic random variables.

Chapter 12: Compound Distributions, Branching Processes

One often deals with a sum of random variables in real life applied math problems. The previous chapter dealt with the power of using generating functions to simplify things; a generating function of a convolution can easily be written as product of generating functions of the involved variables. Most often we come across convolutions where the number of independent variables involved in the convolution is a Poisson variable and hence the resulting distribution is called a compound Poisson distribution. A few examples are given relating to compound Poisson distribution so that a reader has a clear idea of the place where such distributions arise. The chapter then moves on to discussing branching processes.

I was little skeptical to go over this stuff mainly because my naive brain said that these processes would not be used as such in math finance. However a quick perusal of math fin literature showed that there was a serious paper on pricing barrier options using branching process. Instead of using standard GBM for the price path evolution, the authors have used a branching process to evaluate a barrier option. Motivated by seeing such an application, I began understanding various aspects of branching process. Frankly till this date, I had always though people use GBM for prices primarily and never thought there could be something else to model the price evolution. I hope someday in the future I will check the validity of using a branching process for price evolution. In theory, the problem that branching process deals is “Given an ancestor which spawns children with a certain probability distribution, what is the probability of certain event after n generations”? The math is little tricky here as , one does not compute a generating function per se but one needs to come up with a recurrence relation which is then used to find out the probability and expectation of certain events. The chapter ends with a discussion of a detailed example of total progeny in a branching process. This chapter makes one realize the immense power of generating functions in probability models.

Chapter 13: Recurrent Events, Renewal Theory

This chapter serves as a superb introduction to Renewal theory which is crucial to OR Models. From a Quant finance perspective, I came across an order book model where Renewal processes where used. There is also a usage of Renewal processes in high frequency trading. So, this chapter has important implications for those looking to develop quant fin models on high frequency data.

The chapter introduces about 9 examples of recurrent events.

  1. Return to equilibrium in a sequence of Bernoulli trials and event is “Accumulated number of successes and failures are equal”
  2. Return to equilibrium through negative values
  3. Accumulated number of successes equals K times the accumulated number of failures
  4. Partial sums exceed all the preceding sums
  5. Success runs in Bernoulli trials
  6. Continuation – Repeated patterns
  7. Geiger counters
  8. Simple Queuing process where the event that “server is free “constitutes a recurrent pattern.
  9. Servicing of machines

In all the above examples, the inter arrival times are iid. They need not be exponential as in the case of Poisson distribution. This is what differentiates Renewal theory from the usual Poisson processes. If the inter-arrival is geometric distribution then the recurrent events/ renewal process is Bernoulli.

Some basic terminology of event types is introduced such as persistent event, transient event, and periodicity of the event. The crucial link between inter-arrival distribution and the waiting time is summarized by generating function, which is then used to arrive at waiting distribution under some of the above recurrent events mentioned. Elementary Renewal theorem is derived; Central limit version for Renewal process is derived. This chapter took me the longest to understand and I have left about 3 sections to revisit at a later date. It is too overwhelming for me on the first read.

Chapter 14: Random Walks and Ruin Problems

This chapter gives a firsthand introduction to stochastic processes. Coin tossing examples are extremely useful to get an intuitive sense of stochastic processes. The chapter begins with the classic ruin problem where difference equations are used to answer the following questions

  • What is the probability that a gambler gets ruined?
  • What is the probability of a ruin if the casino has infinite amount of money?
  • What is the expected duration of gambler’s ruin?
  • What is the expected duration of gambler’s ruin if casino has infinite capital?

The chapter is fairly advanced for the first time reader. Generating function approach is used to compute duration and first passage times. The highlight of this chapter is where the author makes the connection between coin tosses and diffusion processes. By imposing some conditions on the size of the simple random walk and time duration of the random walk, one gets a good intuition about general stochastic processes. By providing two approaches, one by explicit probability density computations and second by differential equation approach, the author shows the 2 basic approaches that are used to formulate various stochastic processes.

Chapter 15: Markov Chains

Markov chains are extremely important in developing analytical techniques for solving classical probability problems. Before encountering Markov chains, I had always thought that the only way to approach problems like the following, was to set up recurrence relations and solving them.

  • In a simple random walk, what is the probability that the random walk is between a and b?
  • What if the absorbing barrier at a is made in to a reflecting barrier?
  • What if the absorbing barrier at b is made in to a reflecting barrier?
  • What is the mean first passage time of a particular coordinate in the simple random walk?
  • What is the mean recurrence time of a particular coordinate in a simple random walk?

This was slightly posing a resistance in my mind as I wanted to computationally solve such problems. I was oblivious to Markov chains which were developed almost 100 years ago and that were used to solve precisely the problems such as above, using matrices. As I spend more time doing math, I am realizing that any problem especially which involves lot of repeated calculations, one should use matrices in some way or the other.

Coming back to the chapter, it starts off by giving idea of a Markov chain using examples like random walk, urn models. It makes it clear that Markov chain by definition generalizes the outcomes to be dependent on the previous outcome, thus it is defined in terms of conditional probabilities. Since one is dealing with a number of possible states, one needs a matrix to represent these conditional probability movements. This kind of matrix is called transition probability matrix. A dozen examples are given with their transition matrices so that a reader gets an idea of the whole stuff. At this stage I was left wondering the reason for not thinking/using Markov chains in my work. I had vaguely heard of Monte Carlo using Markov chain but had never ventured in to understanding it. Last night I was hearing a introductory lecture on Markov Chain and one of the Stanford professors says that “Where do I use MCMC methods is like asking an undergrad, where do you use quadratic equation?” I felt ashamed of the fact that I had never worked on MCMC till date. Have made a resolution that I will learn this technique and see where all its usage makes a tremendous sense in math finance.

The chapter introduces Chapman-Kolmogorov equations which talk about the transition probabilities from state i to state j after n trials. Subsequently concepts such as Closures and Closed Sets are introduced. The terms Closed Sets and Closures might seem abstract but they are extremely useful in breaking up the transition matrix in to stochastic independent matrices which can then be analyzed. This logically leads to the definition of irreducible chain where every state can be reached from every other state. One must be careful to note that an irreducible chain need not be a regular chain , but every regular chain is an irreducible chain.

After the first 4 sections, the chapter then moves in to classifying states. The terms that one must be clear while classification are

  • Persistent – means that the unconditional probability that the path moves from state j to state j is 1.
  • Transient – means that the unconditional probability that the path moves from state j to state j is <1.
  • Periodic – States are visited periodically
  • Aperiodic – There is no sense of periodicity in the way states are visited.
  • Null – a persistent state where mean recurrence time is infinite.
  • Not Null – a persistent state where mean recurrence time is finite.

The above definitions are useful in classifying states. So, one can classify the Markov chain in to the nature of states. Typically one is interested in “aperiodic-persistent -Not Null “states. These are called Ergodic states.

The chapter subsequently talks about invariant distributions, a key concept that will be used in Markov Chain Monte Carlo. In an ergodic markov chain , if one looks at the probability of reaching a specific state from any of the initial starting states, it converges to a fixed number. This means that after n iterations, you will end up with a transition matrix that is a fixed row matrix. This means that it does not matter what initial distribution you start from. You end up with a constant probability in a specific state. Now one can flip the question – What is the condition that the fixed weight vector that you end up is the same as the initial distribution? The answer is not easy, given any transition matrix. However there are certain types of transition matrices for which one can work out. A number of examples that you come across have transition matrices that are band matrices with values only on the upper diagonal and lower diagonal. For such type of matrices, the book gives a condition for the presence of an invariant distribution. This condition is then applied to a variety of situations, post which, one can formulate an opinion about the stability of the transition matrix. The examples where the invariant distribution is checked are

  • Reflecting barriers
  • Ehrenfest model for diffusion
  • Bernoulli- Laplace model for diffusion
  • Recurrent events

Once you read this section, you wonder how easy computations become once you formulate problems using Markov chain. Choosing the states of the Markov chain is not always obvious. Look at this card shuffle example where choosing states in a smart way reduces the computational complexity of the problem.

After reading this chapter, you will understand the difference between Markov chain and stochastic process. In a Markov Chain, all that is needed to understand the conditional probability for a state n is the previous state. In a stochastic process, it is not just the previous state but the entire history of the process will play a role in the prediction of future state. Markov property is related to conditional probability whereas Martingale is related to Conditional expectation. So, there is a lot of difference between the two mathematical objects. A Markov process need not be a martingale, though Markov processes are a good source for martingales. Similarly not every martingale is a Markov process.

Chapter 16: Algebraic Treatment of Markov Chains

I skipped this chapter and instead preferred Matrix approach presented in “Finite Markov Chains”.

Chapter 17: Simplest Time Dependent Stochastic Process

This chapter uses the concepts of conditional probability to set recurrence relations which in turn could be looked as a PDE. You get to see the construction of simple time dependent stochastic processes like Poisson, Birth process, Birth-Death Process etc. The last chapter gives a flavour of the stochastic process from a discrete world view.

image Takeaway:

The author’s enthusiasm for the subject is infectious, as evident from the way the book is written. The book has outstanding breadth and depth. Any seasoned probabalist would still find something new in this book, even on an nth reading. Every year there are professors/ practitioners/ instructors/ modelers referring to / debating on /agreeing with /pondering over various aspects mentioned in this book. In that sense, the content of the book is timeless.


The story is about an old professor and a special bond that develops between the professor and his housekeeper. The professor has a peculiar problem that he cannot remember anything beyond 80 minutes. His memory loss caused by a certain accident in his middle age leads to a strange life that he leads. Since he cannot remember anything beyond 80 minutes, he is forced to write short notes and pin it up on his suit so that he can glance at them everyday and remember stuff. His sister-in-law generously takes care of him and provides him with basic amenities like food , shelter and a study room. The professor despite his old age continues doing math. To take care of the cooking,cleaning and other household activities, his sister-in-law appoints a housekeeper.

Since the professor has no memory beyond the last 80 minutes, the housekeeper has to introduce herself everyday to the professor to gain entry in to the house. The professor has a peculiar way of letting her in to the house. He would ask a few questions whose answers are numbers like her shoe-size, her date of birth and allows her to enter the house. The housekeeper brings along her son to the work, at Professor’s request . The Professor soon develops a liking toward the kid and slowly starts taking time to teach him some basic math. He fondly calls him “root” , because of his rather flat head that resembles a square root sign.

Math becomes the focus of most of the conversations in the house. The housekeeper is sucked in to the infectious enthusiasm that the professor displays. Even though she is supposed to merely take care of household chores, she tries to strike conversation with the professor whenever possible and in the process learns about a whole lot of math stuff like perfect numbers, amicable numbers, prime numbers, fascinating square root sign, 0 (queen of numbers), triangular numbers, Euler’s equation, the way all the primes can be divided exactly in to two categories etc. Professor’s liking towards the kid makes him spend more time with him teaching math in such a way that it brings an element of delight and surprise even in some mundane math concepts.

The book ends with Professor’s death and the Housekeeper’s son becoming a math faculty at a local college. Well, as the story has a bit of melodrama in it, it is no wonder that this book was made in to Japanese film titled “The Professor and His Beloved Equation”.


Via Wikipedia :

The Professor’s Beloved Equation is a Japanese film released January 21, 2006 and directed by Takashi Koizumi. It is based on the novel The Housekeeper and the Professor.

In contrast to the original work, which is told from the perspective of the narrator, the film is shown from the perspective of a 29 year old root as he recounts his memories of the professor to a group of new pupils. Though there are a few differences between the film and the original work (for example, the movie touches on the relationship between the professor and the widow while the book does not give much detail), the film is generally faithful to the original.