April 2011

I found this book very challenging when I first went over it a few years ago. The subject was one that I found very difficult to understand, mainly because I had never been exposed to a real analysis course back then. Needless to say, my understanding was shallow. That limited understanding has come back to plague me now that my task is to customize a model that is heavily based on general measures. The model’s creator has given a few guidelines and that’s about it. The guidelines point to heavy usage of martingale theory. I realized that my understanding of general measures and martingales was pathetic, and I decided to go over this book again, well aware that it would be very challenging.

Reading this book again after 3.5 years, I realized that it is amazingly useful for a persistent reader. Unlike the last time, I was prepared with the prerequisites needed to go over it. The book has a tall order of prerequisites, something the title fails to convey. A reader must have a good knowledge of analysis on the real line, finite dimensional vector spaces and metric spaces, and at least a 10,000 ft view of functional analysis. Summarizing a math book is very tricky and usually no summary can do justice to the contents. My intent in writing this summary is to highlight an important aspect of the book, i.e. the way probability theory and measure theory concepts are explained in parallel. The intent behind the theorems is made crystal clear by providing the relevant applications in probability theory and math finance. As you go over Lebesgue theory, you will be able to connect it to the various probability concepts that depend on it. In that way, you will not find the book dry at all. Let me try to summarize the main chapters of the book from this point of view.

Chapter 1 : Motivation and Preliminaries

As the title of the chapter suggests, it contains the real analysis concepts necessary to follow the content of the book. It starts off by asking a simple question: “What is the probability of choosing a random number between (0,1)?” For a reader who is used to classical probability with finitely many outcomes, answering such a question poses a challenge. How does one compute a probability when there are infinitely many outcomes? Such questions can only be answered with the help of a mathematical framework. The chapter introduces some set theory notation and topological concepts. It then moves on to arguments for an alternative to the Riemann integral. The classic Dirichlet example is used to show the reader that there are functions which are not Riemann integrable. In fact there are tons of issues with Riemann integration. A reader would be well advised to read other books before reading the concise Riemann bashing in this book. The authors state that the focus of the book is to show the essential role of Lebesgue measure in the description of probabilistic phenomena based on infinite sample spaces.
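To make the Dirichlet example concrete, here is a short worked computation. The notation (the indicator 1_Q, a partition P, and upper and lower sums U and L) is my own choice and not necessarily the book's.

```latex
f(x) = \mathbf{1}_{\mathbb{Q}}(x) =
\begin{cases}
1, & x \in \mathbb{Q} \cap [0,1] \\
0, & x \in [0,1] \setminus \mathbb{Q}
\end{cases}

% For any partition P: 0 = x_0 < x_1 < \dots < x_n = 1, every subinterval
% contains both rationals and irrationals, so
U(f,P) = \sum_{i=1}^{n} \Big( \sup_{[x_{i-1},x_i]} f \Big)(x_i - x_{i-1}) = 1,
\qquad
L(f,P) = \sum_{i=1}^{n} \Big( \inf_{[x_{i-1},x_i]} f \Big)(x_i - x_{i-1}) = 0.

% The upper and lower sums never meet, so f is not Riemann integrable, whereas
\int_{[0,1]} f \, dm = 1 \cdot m(\mathbb{Q} \cap [0,1]) + 0 \cdot m([0,1] \setminus \mathbb{Q}) = 0.
```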

Chapter 2 : Measure

Probability is basically a type of measure, defined on a probability space. In order to understand probability spaces, one needs a thorough knowledge of sigma fields and Lebesgue measure. A probability measure is a countably additive measure on the sigma algebra of the outcome space. This chapter does a superb job of making the reader understand where all the pieces fit: outer measure, Lebesgue measure, the Lebesgue measurable sets, the Borel sigma algebra etc. The chapter starts off by defining null sets, as they are key to understanding the measure of sets like the rationals, finite sets and the Cantor set. All these sets have Lebesgue measure 0. Outer measure is defined and its properties are discussed. The typical problem with outer measure is explained, i.e. on the full power set of R it is only countably subadditive, not countably additive. Subsequently, Lebesgue measure is introduced precisely to solve this issue. By restricting attention to a sub-family of the power set of R (the Lebesgue measurable sets), countable additivity is achieved. Topological properties of Lebesgue measurable sets are discussed so that one can use appropriate definitions and concepts from topology to prove and derive new theorems. Borel sets and the Borel sigma algebra are then introduced so that one can conveniently talk about a family of Lebesgue measurable sets which has the necessary properties. It is also shown that the Lebesgue sigma algebra has more elements than the Borel sigma algebra, though for probability applications one can conveniently ignore the difference and work with the Borel sigma algebra.
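The claim that the rationals form a null set rests on a covering argument: put an interval of length eps/2^k around the k-th rational, and the total length stays below eps. A minimal Python sketch of that idea follows. The function names and the particular enumeration of the rationals are my own assumptions for illustration, not taken from the book.

```python
from fractions import Fraction

def rationals_in_unit_interval(count):
    """Enumerate the first `count` distinct rationals p/q in [0, 1], by increasing q."""
    seen, out, q = set(), [], 1
    while len(out) < count:
        for p in range(q + 1):
            r = Fraction(p, q)
            if r not in seen:
                seen.add(r)
                out.append(r)
                if len(out) == count:
                    break
        q += 1
    return out

def cover_length(count, eps):
    """Total length of intervals covering the first `count` rationals.

    The k-th rational (k = 1, 2, ...) gets an interval of length eps / 2**k
    centred on it, so the total is eps * (1 - 2**-count), which is below eps.
    """
    points = rationals_in_unit_interval(count)
    return sum(eps / 2 ** k for k, _ in enumerate(points, start=1))

eps = Fraction(1, 100)
total = cover_length(50, eps)
print(float(total))   # stays strictly below eps, however many rationals are covered
```

Since eps can be taken as small as one likes, the outer measure of the rationals in [0, 1] must be 0.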

Chapter 3 : Measurable Functions

“Random variable”, a 101 term in probability theory, is actually a misnomer. It is not a variable at all but a function on the sample space. To understand this function and its properties, the chapter talks about measurable functions. The beauty of these functions is that limiting operations on a sequence of measurable functions preserve measurability. One often comes across the phrase “sigma field generated by X” in theorems. This chapter makes the term precise by giving a definition for it. Intuitively, the sigma field generated by X is coarser than the sigma field of the measurable space. Once you understand that a probability distribution is nothing but a set function on a coarser sigma algebra, you can organize your thoughts the following way:

  • You have Omega, which is the set of all outcomes
  • You define a measure on the sigma algebra generated by intervals (the Borel sigma algebra)
  • The measure is defined in such a way that it is countably additive on the Borel sigma algebra
  • When each interval is assigned its length, the measure defined in this manner is Lebesgue measure
  • Each element of the sigma algebra is an event
  • You define a random variable on the sample space
  • The sigma algebra generated by this random variable is coarser than (or at best equal to) the original one
  • The probability measure on this coarser sigma algebra can be summarized with the help of a distribution function / discrete measures / absolutely continuous measures / a mix of discrete and absolutely continuous measures.
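The idea of a “coarser” sigma field is easy to see on a finite sample space. The sketch below is a toy illustration of my own (a die roll and its parity), not an example from the book. It builds the sigma field generated by X out of the preimages of X and compares its size with the full power set.

```python
from itertools import combinations

def generated_sigma_field(omega, X):
    """Sigma field generated by X on a finite sample space.

    Its atoms are the preimages X^{-1}({v}); every event is a union of atoms.
    """
    atoms = {}
    for w in omega:
        atoms.setdefault(X(w), set()).add(w)
    atom_list = [frozenset(a) for a in atoms.values()]
    events = set()
    for r in range(len(atom_list) + 1):
        for combo in combinations(atom_list, r):
            events.add(frozenset().union(*combo))
    return events

omega = [1, 2, 3, 4, 5, 6]          # outcomes of one die roll
parity = lambda w: w % 2            # X = 1 for odd outcomes, 0 for even ones
sigma_X = generated_sigma_field(omega, parity)
power_set_size = 2 ** len(omega)

print(sorted(sorted(e) for e in sigma_X))
print(len(sigma_X), "events versus", power_set_size, "in the full power set")
```

Knowing X only tells you whether the roll was odd or even, which is why its sigma field contains so few events.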

Chapter 4 :  Lebesgue Integral

“What has a probability distribution got to do with integration?” is a beginner’s question that opens up the whole of Lebesgue integral theory. An integral is basically the area under the curve of a function, at least that’s the intuitive notion. Now one can find probability measures that are given by integrating a non-negative measurable function f, where the integral of f over the whole space is 1. Such measures are called absolutely continuous measures. From a computational angle, one always writes an integral for a continuous variable when calculating moments, probabilities etc. But behind the integral lies the entire theory of Lebesgue measurable functions. A non-negative Lebesgue measurable function serves as the density function of an absolutely continuous variable. For a very long time I thought there were only discrete variables, continuous variables and mixes of the two. However, after going over the Cantor set and its mysterious properties, I realized that it is very much possible to construct measures which do not belong to {discrete, absolutely continuous, mix of discrete and absolutely continuous measures}. They go by the name singular continuous measures. Anyway, this chapter does not discuss such measures. The relevant background for understanding concepts like the density function, distribution function, expectation and characteristic function of a random variable is provided in this chapter. The topics are:

  • Definition of Lebesgue integral via Simple functions
  • Properties of Lebesgue integral
  • Convergence Theorems like Monotone Convergence theorem and Dominated Convergence theorem
  • Relation between the Riemann integral and the Lebesgue integral
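The “definition via simple functions” amounts to slicing the range of the function rather than its domain. Here is a rough Python sketch for f(x) = x² on [0, 1]. The function, the slicing scheme and the name simple_function_integral are my own assumptions for illustration. The measure of each level set is computed in closed form, which only works because this particular f is monotone.

```python
import math

def simple_function_integral(n):
    """Integral of a simple function approximating f(x) = x**2 on [0, 1] from below.

    The range [0, 1] is cut into n slices [k/n, (k+1)/n). On the set where f
    falls in slice k, the simple function takes the value k/n. For this f the
    Lebesgue measure of that set is sqrt((k+1)/n) - sqrt(k/n).
    """
    total = 0.0
    for k in range(n):
        level = k / n
        measure = math.sqrt((k + 1) / n) - math.sqrt(k / n)
        total += level * measure
    return total

for n in (10, 100, 1000, 10000):
    print(n, simple_function_integral(n))   # increases towards 1/3 as n grows
```

The simple function never exceeds f and differs from it by at most 1/n, so the values approach the true integral 1/3 from below.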

Chapter 5 : Spaces of Integrable Functions

Moments of random variables are usually introduced in a computational way. Whether the variable is discrete or continuous, there are formulae that are introduced in any undergraduate course. However, to marry these with the concept of “X is a measurable function”, one needs to know some functional analysis. Any concept relating to moment calculations or characteristic function computations is deeply related to functional analysis. This chapter introduces the concepts of metric spaces, vector spaces and inner product spaces. The concept of a Lebesgue measurable function is expanded to spaces where each individual function is a point. Vector spaces of Lebesgue measurable functions are introduced, each with its norm. Subsequently Hilbert spaces are introduced, where each point is a Lebesgue measurable function with finite L2 norm. The Hilbert space is important because it carries an inner product, so one can make use of orthogonality between its elements. The norm induced by the inner product gives more flexibility in working with orthogonal variables and Fourier functions. Even though Fourier series are not talked about in the book, Hilbert spaces are extremely useful in relation to Fourier series. The other interesting aspect of Lp spaces is that they are complete, meaning every Cauchy sequence of functions converges to a point in the space.

One of the highlights of this chapter is the construction of conditional expectation, where Hilbert spaces are used to give a preliminary introduction. Conditional expectation is a very interesting concept and opens up a completely different way to look at random variables. It deals with understanding a random variable relative to a sub sigma-algebra. For a discrete variable, calculating E(X|Y) is easy. For a continuous variable, it is easy if the conditional density is given. This is the way it is presented in undergrad books, where all discussion of sub sigma-algebras is avoided. However, if Y is a general random variable, then the expectation of X given Y has no formula. It has to be defined implicitly. The existence of such a variable follows from a deep theorem, the Radon-Nikodym theorem. This theorem is covered in a separate chapter of the book, a chapter which I found to be the most difficult to follow.
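On a finite sample space, the sub sigma-algebra view of conditional expectation can be checked by hand. The sketch below is a two-dice toy example of my own, not taken from the book. It computes E[X|Y] as an average of X over each atom {Y = y}, then verifies the tower property and the orthogonality that underlies the Hilbert space construction.

```python
from fractions import Fraction
from itertools import product

# Sample space: two fair dice, every outcome has probability 1/36.
omega = list(product(range(1, 7), repeat=2))
prob = {w: Fraction(1, 36) for w in omega}

X = lambda w: w[0] + w[1]       # total of the two dice
Y = lambda w: w[0]              # value of the first die only

def expectation(Z):
    return sum(Z(w) * prob[w] for w in omega)

def conditional_expectation(X, Y):
    """Return E[X | Y] as a random variable: the P-weighted average of X
    over each atom {Y = y} of the sigma-algebra generated by Y."""
    num, den = {}, {}
    for w in omega:
        num[Y(w)] = num.get(Y(w), 0) + X(w) * prob[w]
        den[Y(w)] = den.get(Y(w), 0) + prob[w]
    return lambda w: num[Y(w)] / den[Y(w)]

E_X_given_Y = conditional_expectation(X, Y)

print(E_X_given_Y((2, 5)))                          # depends on the first die only
print(expectation(E_X_given_Y) == expectation(X))   # tower property

# Orthogonality (the Hilbert space view): X - E[X|Y] is orthogonal to any g(Y).
g = lambda w: Y(w) ** 2
residual_inner_product = expectation(lambda w: (X(w) - E_X_given_Y(w)) * g(w))
print(residual_inner_product)
```

In the general case there are no atoms to average over, which is where the Radon-Nikodym theorem has to take over.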

Chapter 6 : Product Measure

The natural transition from a one dimensional probability space to a multidimensional probability space is shown in this chapter. For simplicity’s sake, take two sigma fields: one must work on the product sigma field to make sense of random variables defined on both spaces. Let’s say X is defined on a space with sigma algebra F1 and Y on a space with sigma algebra F2; then an object like the joint distribution is defined on the product sigma algebra. For this new space, obviously, one needs a measure. The notion of length in one dimension is extended to rectangles (products of intervals) in the plane to define the measure. The chapter gives a very good construction of the product measure by introducing projections. It culminates in Fubini’s theorem, which gives the condition for swapping the order of integration in a multiple integral. All these ideas are extremely useful in describing joint distributions, convolutions, conditional densities and characteristic functions. There is a long proof to show that characteristic functions determine the distributions of random variables.

Chapter 7 : Radon Nikodym Theorem

I think this is the toughest chapter of all, as far too many concepts are introduced. At times I felt overwhelmed. It starts off by introducing a property of measures called “absolute continuity”. A measure m1 is said to be absolutely continuous with respect to a measure m2 if every set that is null for m2 is also null for m1. So why is it relevant to explore this property? The answer is the Radon-Nikodym theorem. The relation between m1 and m2, where m1 is absolutely continuous with respect to m2, is given by the Radon-Nikodym derivative. If I were to describe it naively, it would be like this: let’s say you have a space Omega1 equipped with m as measure. If one looks at another measure m1 which is absolutely continuous with respect to m, then every computation with respect to m1 can be rewritten as an equivalent computation with respect to m. (Option pricing math thrives on the existence of the Radon-Nikodym derivative.) The section ends with the Lebesgue decomposition theorem, which states that a sigma finite measure m2 can be expressed as a sum of two measures, one absolutely continuous with respect to m1 and the other singular (“perpendicular”) to m1. This smells like the case of a vector in a finite dimensional space, which can be decomposed into its projection onto a subspace and a component perpendicular to that subspace. Here, instead of vectors, we are dealing with measures.
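On a finite space the Radon-Nikodym derivative is nothing more than a ratio of weights, which makes the “rewrite the computation under another measure” idea easy to check. The three-state market below, with its probabilities and payoffs, is entirely made up for illustration.

```python
from fractions import Fraction

# Hypothetical three-state example; the numbers are made up for illustration.
states = ["down", "flat", "up"]
P = {"down": Fraction(1, 4), "flat": Fraction(1, 2), "up": Fraction(1, 4)}
Q = {"down": Fraction(1, 2), "flat": Fraction(1, 2), "up": Fraction(0)}

# Q << P holds here: every P-null set is Q-null (P has no null states at all).
assert all(Q[s] == 0 for s in states if P[s] == 0)

# On a finite space the Radon-Nikodym derivative dQ/dP is a ratio of weights.
dQ_dP = {s: Q[s] / P[s] for s in states}

payoff = {"down": Fraction(0), "flat": Fraction(10), "up": Fraction(30)}

expectation_under_Q = sum(payoff[s] * Q[s] for s in states)
same_via_P = sum(payoff[s] * dQ_dP[s] * P[s] for s in states)

print(dQ_dP)
print(expectation_under_Q, same_via_P)   # the two numbers agree
```

Note the asymmetry: P is not absolutely continuous with respect to Q in this example, because Q gives the “up” state zero weight while P does not.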

The chapter then moves on to the Lebesgue-Stieltjes measure. Frankly, I did not understand this section at first and had to go over a separate book on the Lebesgue-Stieltjes integral to make sense of it. That effort proved useful. It took me some time and I think I got the crux of it, though I intend to read it a few more times later. The crux of the Lebesgue-Stieltjes measure is this: if you want to weight the intervals over which a function is integrated in a different way, let’s say according to a function g(x), then one needs the Lebesgue-Stieltjes integral, and the sigma finite measure behind this integral is induced by a distribution function. Now given such a distribution function F, the key questions to answer are:

  • Does this induce a measure ?
  • Does this induce a density function ?

Both are very important questions. It is shown methodically that F does induce a measure, mF.

However, there are additional restrictions on F in order for it to induce a density. The condition is that F should be absolutely continuous. Only when F is absolutely continuous can one talk about the density of such a distribution function. What’s the connection between this density and Lebesgue measure? It’s the Radon-Nikodym derivative which again comes to the rescue and supplies the link. Other measure decompositions are discussed as well, like the Hahn-Jordan decomposition. An elegant description of these decompositions can be found at A Mind for Madness. Connecting these concepts to probability applications, the chapter explores the properties of conditional expectation. It then discusses martingales, which are the most important mathematical objects as far as math-fin is concerned. Typically a stats student spends his time looking at random variables, distributions and limit theorems like the Weak Law, the Strong Law, the Central Limit Theorem etc. However, this exposure is not enough for understanding random processes. Just as one tries to understand iid sequences, one must spend time understanding martingale processes. One slowly begins to realize that martingales have their own set of rules, limit theorems, inequalities etc. So even though this book covers martingales in a few pages, one must not mistake that for all there is to it. This chapter requires multiple readings as it has too many concepts to be digested in one go. I am certain to revisit this chapter over the course of my working life.
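For a feel of the defining property of a martingale, E[S_{n+1} | history up to n] = S_n, here is a small brute-force check on coin-flip random walks. The helper is_martingale and the two step distributions are my own illustrative assumptions, not something from the book.

```python
from fractions import Fraction
from itertools import product

def is_martingale(steps, n_max):
    """Check E[S_{n+1} | first n steps] = S_n for every path prefix, where
    S_n is the sum of i.i.d. steps given as {value: probability}."""
    for n in range(n_max):
        for prefix in product(steps, repeat=n):
            s_n = sum(prefix)
            # Average S_{n+1} over the next step, holding the prefix fixed.
            cond_exp = sum((s_n + v) * p for v, p in steps.items())
            if cond_exp != s_n:
                return False
    return True

fair = {+1: Fraction(1, 2), -1: Fraction(1, 2)}
biased = {+1: Fraction(3, 5), -1: Fraction(2, 5)}

print(is_martingale(fair, 6))     # fair coin-flip walk
print(is_martingale(biased, 6))   # walk with an upward drift
```

The fair walk passes and the drifting one fails, since the latter gains 1/5 on average with every step.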

Chapter 8 : Limit Laws

This is the most useful chapter of the book from a stats perspective. To begin with, the chapter discusses the various modes of convergence for random variables: uniform, pointwise, almost everywhere, convergence in Lp norm and convergence in probability. One needs to have a clear picture of the relationships between these modes of convergence. The chapter then introduces the weak law of large numbers and the strong law of large numbers, where the former talks about convergence in probability and the latter about almost sure convergence.

Convergence in distribution is a very different animal compared to the other modes of convergence. That’s the reason it is called weak convergence. Irrespective of the individual distributions of the random variables, their cumulative distribution functions can converge to that of a specific probability distribution, hence the name convergence in distribution. The Central Limit Theorem is one of the theorems introduced in various elementary statistics courses, business statistics courses in MBA programs etc. One usually comes across the statement and wonders why it is true. Why should, let’s say, a specific form of a sum of random variables converge to the standard normal distribution? The question might seem simple, but understanding the rationale behind it requires a good understanding of limit laws. In that sense this chapter is very unique. However, the proofs themselves are very long-winded. I have gone through them but I will certainly go over them again on some long rainy day.
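Convergence in distribution can be watched numerically without any simulation. The sketch below compares the exact CDF of a standardized Binomial(n, p) sum with the standard normal CDF at a few points. The choice p = 0.3, the grid of z values and the function names are my own assumptions for illustration.

```python
import math

def normal_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def standardized_binomial_cdf(n, p, z):
    """P((S_n - n p) / sqrt(n p (1 - p)) <= z) for S_n ~ Binomial(n, p)."""
    cutoff = n * p + z * math.sqrt(n * p * (1 - p))
    return sum(math.comb(n, k) * p**k * (1 - p)**(n - k)
               for k in range(n + 1) if k <= cutoff)

def max_gap(n, p, grid=(-2.0, -1.0, -0.5, 0.5, 1.0, 2.0)):
    """Largest CDF discrepancy from the standard normal over a few z values."""
    return max(abs(standardized_binomial_cdf(n, p, z) - normal_cdf(z)) for z in grid)

for n in (10, 100, 1000):
    print(n, max_gap(n, p=0.3))   # the gap shrinks as n grows
```

Each summand here is a plain Bernoulli variable, nothing like a normal, yet the standardized sum’s distribution function closes in on the normal one.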

Takeaway:

The authors build up measure theory from the ground up and motivate the reader to understand the concepts by showing applications in math finance. In every chapter, whatever concepts are introduced, an immediate application in the math-fin area is provided, thus making the seemingly dry measure theory an interesting read.


The book is a memoir by Namita Devidayal. Belonging to a cosmopolitan family in Mumbai, Namita narrates a parallel universe that she lives in, i.e. the world of Hindustani classical music, which is vastly different from her home environment. This parallel universe is given a metaphorical name, “The Music Room”. Namita is dragged into the world of music at a young age by her mother to improve her marriage prospects. She reluctantly starts taking music lessons from Dhondutai Kulkarni, a musician from the Jaipur gharana. The book’s central character is Dhondutai. (Yes, the name sounds a little odd. In fact the book mentions that Dhondutai was the only surviving child of a family from Kolhapur, and the elders in the family decided to name the child Dhondu, literally meaning “stone”, in order to ward off evil spirits.)

Namita slowly realizes that her teacher Dhondutai is also living in a parallel universe, very different from her neighbourhood, Kennedy Bridge, which is a shabby locality in Bombay. Dhondutai’s life revolves around music and gods. She remains unmarried throughout her life and is totally devoted to music. Somewhere in the book she also gives her reason for deciding not to marry: “You cannot have two masters; it has to be music or man; both demand too much from one.” She sees a very close connection between music and spirituality and weaves her music around the gods. Her attitude towards music also makes Namita extremely curious about the world of music. Dhondutai gradually teaches Namita all aspects of music, be it philosophy, history, notation or style.

Dhondutai believed that “Once, you forget yourself and the world around you, once you dismiss all the rewards and recognition you could be getting for your art, and sing only as a form of meditation, your music will break free. You also begin to know things that other people don’t know. Truths reveal themselves to you. It comes from living in solitude and meditating only on music.” Her belief made a lot of impact on young Namita.

Dhondutai’s technique of teaching swara and raga was very different from that of the teachers Namita had learnt from earlier. According to Dhondutai, “To understand and perform the raga in its true sense requires a life-long meditation on the notes - and on yourself. Merely mastering notes is not enough. You have to reflect on the human condition, on life itself. Every time I sing a raga, it unfolds and expands, revealing new insights and pathways. That’s why they say that a musician really becomes a musician at the end of his life. It is only once you use the notes to tell a greater story that you are floating in a bottomless ocean.” Dhondutai started by teaching Namita two integral aspects of the Jaipur gharana: how to throw the voice, and breath control.

Through a series of anecdotes, the book captures the guru-shishya tradition that is essential to Indian classical music. One can’t learn music from books. There has to be a teacher who guides you based on how you sing or play an instrument. I can relate this to the sitar: you can play the note dha in, let’s say, Rag Yaman either directly on the fret, or with a meend, i.e. you pull the string at pa to produce dha, or you can put in extra effort and pull the string from ma to produce dha. So there are tons of ways to make the instrument say dha. But only in the presence of a guru will you know what to play in what context. Sometimes a flat note sounds better, and sometimes, mainly in aalaap, meend is the preferred technique. While playing a taan, either technique could be used. So one needs a guru to actually make you perceive the difference. I think the book says it better:

Teaching is an important process as performing, for this is what takes this music in to posterity. The books cannot tell you which raga to start with and how to keep time, why a particular taan is not sounding quite right. There are secrets only a guru can give you selectively, gradually and when the student is ready to receive them

Interwoven in the conversations between Dhondutai and Namita, the book brings out historical aspects of the music. It traces three important personalities of Hindustani music from whom Dhondutai learnt: Ustad Alladiya Khan, Bhurji Khan and Kesarbai Kerkar.

[Images: Dhondutai Kulkarni; Ustad Alladiya Khan; Bhurji Khan; Kesarbai Kerkar]

Besides tracing the lives of Alladiya Khan and Kesarbai Kerkar, Dhondutai narrates her own learning process, like the instance when Alladiya Khan asks her to sing without the harmonium.

Human nature is such that it always seeks easy options. The Harmonium makes singing easy because you don’t have to work on making independent connections with the notes. By playing it when you sing you will never hit the right note with exactitude because you will have this backup.

The story is set in Mumbai and snapshots of various facets of Mumbai life are captured throughout the book. Life in a local train, the bonds that form between passengers and strangers during train rides, the travails of getting into a local train, the strong Hindutva culture one sees around, the cultural clubs etc. are peppered throughout the book and make it an interesting read.

There is a paragraph in the book that I particularly liked where Namita describes her feeling of going to her music class at Borivali.

Whenever I went back to Dhondutai, I felt I was entering a space that was timeless. Nothing changed. The same black-and-white television set layered with dust; the plastic milk bottle with a spray of fake flowers that I had gotten her years ago; the pictures on the wall of Kesarbai, her parents, Ganesha and Khansahibs; and her other everlasting companions, the tanpuras. In deference to modern life, she had acquired a refrigerator at some point but even that remained mostly bare.

It was utterly reassuring, like going back to your childhood room many years later, and finding your teddy bear perched exactly where you left it, with its left eye still hanging loose. For me, Dhondutai was like that stuffed toy, unconditionally affectionate and always around. I went back to her and was at peace. We would tune the tanpura and pick up where we had left off.

It makes me feel that maybe we all need our own version of “The Music Room” in order to bring some semblance of serenity into our ever uncertain lives.


Firstly, something on the pronunciation :). Lebesgue-Stieltjes is pronounced roughly as luh-BEG STEEL-tyes. The former was a French mathematician and the latter a Dutch mathematician, and the integral is named after them for their outstanding contributions to analysis. I came across the Stieltjes integral for the first time in Marek Capinski’s book on probability, where it was introduced in relation to measure decomposition theorems. The treatment there is very concise and hence I did not really understand the significance of the Lebesgue-Stieltjes integral. I had also come across a reference to this integral while studying Ito’s integral a few years back. Back then I had no clue about the Lebesgue-Stieltjes integral and never bothered to check its limitations in describing Brownian motion. All I cared about was that it was somehow useless for describing Brownian motion. My limited understanding became a handicap in understanding measure decompositions. So finally I had to get out of limbo and slog through to understand the integral. I picked up this book, which looked like an accessible introduction to the integral and in hindsight appears to have been the perfect choice. Let me try to summarize the contents of the book.

The first three chapters of the book cover the prerequisites needed to understand the Lebesgue-Stieltjes integral. Basic terms like supremum, infimum, cardinality and the topology of R are covered in Chapter 1. Chapter 2 talks about monotone sequences and monotone functions and explores their properties. It subsequently introduces the concepts of bounded variation and absolute continuity. Absolutely continuous functions are a subset of the continuous functions. A few theorems are stated relating to functions of bounded variation. Chapter 3 gives a basic introduction to the Riemann integral and cites the Dirichlet function as an example where the Riemann integral breaks down. Not only is the function not Riemann integrable, it reveals a bigger problem with the Riemann integral: if we consider a sequence of Riemann integrable functions, we cannot be certain that the limiting function is Riemann integrable.

Chapter 4 is the meat of the book. Instead of taking the measure theory approach, the book takes a step function approach. As in most books which avoid measure theory, the step function approach is an easier way to understand the Lebesgue-Stieltjes and Lebesgue integrals. The following terms are introduced:


Subsequently the Lebesgue-Stieltjes integral is defined. Let me try to put it in simple words (which is always more of a challenge than writing an equation). One starts off with a sequence of step functions, applies certain criteria like finite measure, and from these builds functions called alpha summable. The key idea is this:

If you have a function and you want to integrate it over an interval I with respect to the alpha measure, you create the set of all possible alpha summable functions that are greater than or equal to the function you are interested in integrating. So, instead of integrating the complicated function, you integrate all these alpha summable functions over the same interval. Obviously the latter is easier to do, as they are all built out of step functions. Once you have the integrals of all these alpha summable functions, you take the minimum of those values or, to use the right word, the infimum of those values. What you get is the Lebesgue-Stieltjes integral with respect to the alpha measure. (That was a mouthful!) To see it in math symbols (which is so elegant), it is as follows:
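In symbols, and with the caveat that the notation here is my own assumption rather than the book’s (θ stands for an alpha summable function on I, and f is taken to be non-negative), the verbal description above would read roughly as:

```latex
\int_I f \, d\alpha \;=\;
\inf \left\{ \int_I \theta \, d\alpha \;:\;
\theta \ \text{is } \alpha\text{-summable on } I, \
\theta(x) \ge f(x) \ \text{for all } x \in I \right\}
```

The book’s precise statement, in particular how it treats functions that change sign, may differ in detail from this sketch.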


The chapter ends with a discussion of the differences between the Riemann and Lebesgue integrals. Basically, the thing to be understood is that the Lebesgue integral of a function is computed using a set of approximating functions which are much more generic than the step functions used in the Riemann sense. This, coupled with the fact that the measure itself is generic, makes the Lebesgue integral applicable to a mind boggling set of functions. However, as the author rightly points out, the Lebesgue integral’s true power is visible in the convergence theorems, which help in computing integrals of very complicated functions over complicated sets (for example, the Dirichlet function). There is also a nice example of a function which is Riemann integrable in the improper sense but not Lebesgue integrable. One key requirement for any function to be Lebesgue integrable is that the integral must be absolutely convergent.
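I am not reproducing the book’s own example here. A standard illustration of the same phenomenon, assuming the improper Riemann integral is meant, is sin(x)/x on the half line:

```latex
% The improper Riemann integral converges (conditionally):
\int_0^{\infty} \frac{\sin x}{x} \, dx
  = \lim_{R \to \infty} \int_0^{R} \frac{\sin x}{x} \, dx
  = \frac{\pi}{2}

% but the integral of the absolute value diverges, since on [(k-1)\pi, k\pi]
% we have 1/x \ge 1/(k\pi) and \int |\sin x| \, dx = 2 over each such interval:
\int_0^{\infty} \left| \frac{\sin x}{x} \right| dx
  \;\ge\; \sum_{k=1}^{\infty} \frac{1}{k\pi} \int_{(k-1)\pi}^{k\pi} |\sin x| \, dx
  \;=\; \sum_{k=1}^{\infty} \frac{2}{k\pi}
  \;=\; \infty
```

So the function fails the absolute convergence requirement and is not Lebesgue integrable on the half line.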

Chapter 5 deals with the properties of the Lebesgue-Stieltjes integral. The integrability of f+, f- and |f| is explored. The section on null sets and null functions is very tedious. For someone who has been exposed to measure theory, where sets of measure 0 and null sets are explored at the very beginning, the treatment in this book looks very circuitous. All the proofs in this section of the chapter can be massively simplified by using set theory concepts and measure theory. So even though step functions can be used to interpret Lebesgue integrals easily, there is a flip side to it: things like null sets become very challenging to understand. The chapter then states the important convergence theorems: the Monotone Convergence Theorem, Fatou’s Lemma, the Dominated Convergence Theorem and Beppo Levi’s theorem. All of these are merely stated and no proof is given. The reader should appreciate that these theorems are the main reason that Lebesgue theory became massively popular. The Dominated Convergence Theorem is a very broad sufficient condition (note: it is not a necessary condition). These theorems can be used to approximate complicated integrals by a sequence of functions, after which integral and limit can be swapped to compute the value of the integral. I found the last section of this chapter very interesting. It talks about extending the theory in two directions: 1) allowing integration over sets rather than intervals, and 2) allowing the alpha function to be a function of bounded variation. The second direction is about splitting an alpha function of bounded variation into two monotonic functions. This discussion is closely related to measure decomposition. A related topic is the Jordan decomposition. I came across a nice way of describing measure decomposition in the blog “A Mind for Madness”. It’s worth a read.
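To see why a dominating function matters when swapping limit and integral, here is a pair of standard textbook examples on [0, 1]. They are my own choice of illustration, not necessarily the ones in the book.

```latex
% Swap is valid: f_n(x) = x^n is dominated by the integrable function g(x) = 1.
\lim_{n \to \infty} \int_0^1 x^n \, dx
  = \lim_{n \to \infty} \frac{1}{n+1}
  = 0
  = \int_0^1 \lim_{n \to \infty} x^n \, dx

% Swap fails: f_n = n \, \mathbf{1}_{(0, 1/n)} converges to 0 pointwise, yet
\lim_{n \to \infty} \int_0^1 n \, \mathbf{1}_{(0,1/n)}(x) \, dx = 1
  \;\ne\; 0 = \int_0^1 \lim_{n \to \infty} n \, \mathbf{1}_{(0,1/n)}(x) \, dx

% Here \sup_n f_n(x) \ge 1/x - 1 on (0,1), which is not integrable,
% so no integrable dominating function exists.
```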

Chapter 6 is useful from a computational point of view. A laundry list of theorems is stated that are useful for evaluating Lebesgue-Stieltjes integrals. The chapter also explores change of variable and integration by parts in the L-S world. A few examples are cited where, to evaluate an integral, it is better to differentiate the entire equation and then apply integration to the differential. Most of them are computational tricks rather than conceptual principles. Chapter 7 extends the Lebesgue-Stieltjes integral to the multidimensional case. When dealing with a double integral, the order of integration can matter, meaning it is possible that integrating first with respect to x and then with respect to y gives a totally different result from integrating first with respect to y and then x. The order of integration does not matter when the function under the double integral sign is absolutely integrable with respect to x and y. Fubini’s theorem is just this: it specifies conditions under which the order of integration does not matter. From the usual measure theory perspective, Fubini’s theorem is nothing but a condition on the integrability of the projection functions with respect to the various measures involved. If these projections are integrable, then Fubini’s theorem says that the order doesn’t matter.
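A classic example where the order of integration does matter (a standard one, which I am not claiming is the book’s) is the following function on the unit square:

```latex
% Note that \frac{\partial}{\partial y}\left(\frac{y}{x^2+y^2}\right) = \frac{x^2-y^2}{(x^2+y^2)^2}, so
\int_0^1 \left( \int_0^1 \frac{x^2 - y^2}{(x^2 + y^2)^2} \, dy \right) dx
  = \int_0^1 \frac{1}{1 + x^2} \, dx
  = \frac{\pi}{4}

% while, by the antisymmetry of the integrand in x and y,
\int_0^1 \left( \int_0^1 \frac{x^2 - y^2}{(x^2 + y^2)^2} \, dx \right) dy
  = -\frac{\pi}{4}

% Fubini's theorem does not apply because the integrand is not absolutely integrable:
\int_0^1 \int_0^1 \left| \frac{x^2 - y^2}{(x^2 + y^2)^2} \right| dx \, dy = \infty
```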

Chapter 8 offers a crash course in functional analysis. Beyond a certain point, one would like some structure on the collection of Lebesgue integrable functions. The chapter gives basic definitions of vector spaces, normed vector spaces, norms, linear dependence, function spaces, Cauchy sequences etc. It then shows that equipping the space of continuous functions with a norm based on the Riemann integral creates a lot of problems. The Dirichlet function is a case in point: it is the limit of a sequence of Riemann integrable functions, but the limit itself escapes from the space on which the Riemann-based norm is defined. Hence there is a case for a better norm, and that norm turns out to be based on the Lebesgue integral. There is a small wrinkle with using the Lebesgue integral, as one of the axioms of a norm is violated. If you take a function which is 0 almost everywhere, the norm turns out to be 0 but the function as such need not be 0. Spaces such as these have a specific name, “semi normed vector spaces”. It can be shown that a semi normed vector space can be partitioned into equivalence classes based on the almost everywhere property, and the resulting space is a nice mathematical space, meaning it is a complete space. Every Cauchy sequence of functions in the L1 norm converges to a function in the same space. Similarly, Lp spaces are also complete normed vector spaces. Any space which is a complete normed vector space is a Banach space, and hence all Lp spaces are Banach spaces.

Amongst all Lp spaces, the space relating to p = 2 offers a unique opportunity to define an inner product, thus enabling one to marry the concepts of orthogonality, projection etc. with Lebesgue measurable functions. This space is referred to as a Hilbert Space, in recognition of the remarkable contribution of David Hilbert, the German mathematician who developed a broad range of fundamental ideas in many areas, including invariant theory and the axiomatization of geometry. Hilbert Spaces are pivotal to functional analysis and are discussed in Chapter 9. It talks about the application of properties of Hilbert Spaces to Fourier Series and PDEs. Obviously you get only a 1000 ft view of this stuff and you have to refer to other texts if the applied math interests you in a specific area. From a math fin perspective though, viewing Fourier series from the Hilbert space angle is vital, as many option pricing problems are solved using Fourier Series applications.
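To see what "viewing Fourier series from Hilbert space" means in practice, here is a minimal sketch (my own, not from the book) that treats Fourier coefficients as projections in L2 on the interval from -pi to pi. It assumes a simple midpoint rule for the inner product and uses f(x) = x, whose sine coefficients are known to be 2(-1)^(n+1)/n. The helper names `inner` and `sine_coefficient` are hypothetical.

```python
import math


def inner(f, g, steps=20000):
    """Midpoint-rule approximation of the L2 inner product on [-pi, pi]."""
    h = 2 * math.pi / steps
    total = 0.0
    for k in range(steps):
        x = -math.pi + (k + 0.5) * h
        total += f(x) * g(x)
    return total * h


def f(x):
    return x


def sine_coefficient(n):
    # Projection of f onto sin(n x): <f, sin(n.)> / <sin(n.), sin(n.)>
    def basis(x):
        return math.sin(n * x)
    return inner(f, basis) / inner(basis, basis)


coeffs = [sine_coefficient(n) for n in range(1, 6)]
exact = [2 * (-1) ** (n + 1) / n for n in range(1, 6)]

# Different basis functions are orthogonal: their inner product is ~0.
cross = inner(lambda x: math.sin(x), lambda x: math.sin(2 * x))

for n, (c, e) in enumerate(zip(coeffs, exact), start=1):
    print(n, round(c, 6), round(e, 6))
```

The point of the design is that nothing Fourier-specific appears in `sine_coefficient`: it is the same projection formula one would use for vectors in a finite dimensional inner product space.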

Chapter 10 is a cautionary note from the authors that all is not rosy with the Lebesgue-Stieltjes integral, as it is valid only for absolutely convergent integrals. There is a whole host of conditionally convergent integrals which Lebesgue cannot handle. As a cue to an interested reader, the authors show post-Lebesgue developments like the Denjoy-Perron integral and the Henstock-Kurzweil integral, ending with an opinion that the perfect integration method is still an elusive dream.

I personally think that to integrate, you must be a fox rather than a hedgehog. Depending on the context, one must use a specific method and get done with the problem. An all encompassing integration method is like a hedgehog, and to me it appears that such a hedgehog has become an extinct species, with only a remote chance of appearing again.

Takeaway:

This book provides easy access to the Lebesgue-Stieltjes integral via the step function approach. One does not need to know measure theory to understand the book; however, exposure to measure theory makes it an easy read. The highlight of the book is that it has a lot of examples that show the motivation behind the Lebesgue-Stieltjes integral.


This book has a nice collection of problems related to Modern Probability. Unlike the classical problems, which are related to discrete variables, the problems in the book are related to variables whose measure is a combination of discrete, absolutely continuous and singular measures.

My motivation in going through this book was Chapter 10, which is on Conditional Expectation. Martingales are very important mathematical objects as far as math-fin stuff is concerned. They appear everywhere, be it option pricing, hedging, stat arb etc. They also play an important part in stochastic portfolio theory. Martingales are defined in terms of conditional expectation.

Conditional expectation is a very tricky concept to understand, mainly because there is no ready formula to compute it. You can only guess the form based on a few constraints. E(X/Y) is easy to compute if Y is an event or if Y is a discrete variable. However, it becomes a non trivial problem if Y is a general random variable. Existence of such a variable is guaranteed by the Radon-Nikodym theorem. Computation, though, is by indirect means.

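If you look at the definition of Conditional Expectation, it goes something like this:

Let X and Y be any random variables. The conditional expectation E[ X / Y ] of X given Y is a random variable such that

  1. E[ X / Y ] is measurable with respect to the sigma algebra generated by Y
  2. for every set A in the sigma algebra generated by Y, the integral of E[ X / Y ] over A equals the integral of X over A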


Each of the above statements needs to be understood carefully. The first statement says that the sigma algebra generated by this variable is always a subset of the sigma algebra generated by Y. The second is a condition on the projection operator (the projection of X on the sigma algebra generated by Y).

Now one way to understand these issues is to actually pick a few random variables, calculate the sigma algebras and check that the above two conditions are indeed satisfied. Another way is to look up books like this one, where the exercises are tailor made so that one can understand these aspects. In that respect, the problems in Chapter 10 are priceless, as they give clues for computing implicitly the conditional expectation of general random variables.
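As a hedged illustration of the "pick a few random variables and check" route, here is a small sketch of my own (not an exercise from the book). It assumes a single fair die, takes X to be the face value and Y the parity of the face, builds the sigma algebra generated by Y explicitly, and checks the two defining conditions with exact fractions.

```python
from fractions import Fraction
from itertools import combinations

omega = [1, 2, 3, 4, 5, 6]
prob = {w: Fraction(1, 6) for w in omega}


def X(w):
    return w          # face value


def Y(w):
    return w % 2      # parity: 1 for odd faces, 0 for even faces


# Atoms of sigma(Y): the sets on which Y is constant.
atoms = {}
for w in omega:
    atoms.setdefault(Y(w), []).append(w)
atom_list = list(atoms.values())


def cond_exp(w):
    """Candidate for E(X/Y): the probability-weighted average of X on the atom containing w."""
    atom = atoms[Y(w)]
    return sum(X(v) * prob[v] for v in atom) / sum(prob[v] for v in atom)


# sigma(Y) consists of all unions of atoms (including the empty union).
sigma_Y = []
for r in range(len(atom_list) + 1):
    for combo in combinations(atom_list, r):
        sigma_Y.append(sorted(w for atom in combo for w in atom))


def integral(g, A):
    return sum(g(w) * prob[w] for w in A)


# Condition 1: the candidate is constant on each atom, hence sigma(Y)-measurable.
measurable = all(len({cond_exp(w) for w in atom}) == 1 for atom in atom_list)
# Condition 2: integrals over every set in sigma(Y) agree with those of X.
matches = all(integral(cond_exp, A) == integral(X, A) for A in sigma_Y)

print(sigma_Y)
print(cond_exp(1), cond_exp(2), measurable, matches)
```

Here the conditional expectation takes only two values, 3 on the odd faces and 4 on the even faces, which is the sense in which it "knows" only what Y knows.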

Takeaway:

Conditional Expectation and the relevance of the Radon-Nikodym derivative are shown via a superb set of problems. The fundamentals of conditioning a variable on a sub sigma algebra become crystal clear after working through these problems.


I was going through this book after a gap of 3 years, the reason being that I had conveniently forgotten some important stuff relating to martingale theory. Now that my work demanded a thorough application of this theory, I had to go over it again. In order to refresh my memory, I thought I should go over this book by Jacod & Protter, where I had read about Martingales for the first time.

However, after some thought, I went over the book from scratch. Why? A book is like a good old friend you meet, who always tells you something that you can connect with. So, instead of merely reading up on martingale theory, I re-read the entire book. After 2 years of being away from this book, the re-read was worth the effort. This book cannot be the first book for someone looking to understand probability. It is too precise. However, for someone who is already exposed to modern probability concepts, this book would appear awesome, because such a person would appreciate precision more than redundant ranting about stuff.

Ok, the purpose of this post is to summarize the chapters in plain English. Writing about math without equations sometimes skirts the danger of appearing ugly. Anyways let me give it a try.

The book starts off by defining the triple (State space, Events and Probability). The state space is basically the set of all outcomes of an experiment. An event is a property that can be observed after the experiment is done (a subset of the state space), and Probability is a mapping from the family of all events to a number between 0 and 1. The three main ingredients of probability theory are clearly defined. Subsequently, the book introduces the random variable and cautions the reader not to confuse it with a variable in the analytical sense. The random variable is in fact a function of the outcome of the experiment, and hence the probabilities associated with X are termed the law of the variable X, to distinguish it from the original probability measure on the entire state space.

Two axioms are given in Chapter 2 from which almost the entire theory of probability flows. The first axiom is that the probability of the state space is 1, and the second relates to the concept of countable additivity, which is different from finite additivity. These axioms are the foundations of the entire probability theory. The basic difference between the two types of additivity is illustrated with the help of a few examples. In the subsequent chapter (Chapter 3), a basic definition of conditional probability is given, and the linkage between conditional probability and independence of events is explained using the Partition equation and Bayes' theorem.
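The Partition equation and Bayes' theorem are easiest to absorb with numbers. The sketch below is my own toy example, not one from the book: the events E1, E2, E3 are assumed to partition the state space, and the priors and likelihoods are made-up values chosen only so the arithmetic is easy to follow.

```python
from fractions import Fraction

# Hypothetical partition of the state space with prior probabilities P(Ei).
prior = {"E1": Fraction(1, 2), "E2": Fraction(3, 10), "E3": Fraction(1, 5)}
# Hypothetical conditional probabilities P(A / Ei) of some event A.
likelihood = {"E1": Fraction(1, 100), "E2": Fraction(2, 100), "E3": Fraction(3, 100)}

# Partition equation: P(A) = sum over i of P(A / Ei) P(Ei).
p_a = sum(prior[e] * likelihood[e] for e in prior)

# Bayes' theorem: P(Ei / A) = P(A / Ei) P(Ei) / P(A).
posterior = {e: prior[e] * likelihood[e] / p_a for e in prior}

print(p_a)         # 17/1000
print(posterior)   # E1: 5/17, E2: 6/17, E3: 6/17
```

Note that A is not independent of E1 here: P(A / E1) = 1/100 differs from P(A) = 17/1000, which is the conditional-probability way of stating dependence.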

Chapter 4 gives the initial flavour of the method to construct probabilities. By focusing on state spaces which are countable, the authors tie together the frequentist intuition that we all have and the definition of a probability measure on a countable space. Well, if it is a finite space with equally likely outcomes, we all know that the probability of an event A is nothing but the proportion of outcomes in the state space for which A occurs. The same intuition is recovered using the two axioms of probability stated earlier in the book. For a finite or countable space, one usually constructs a probability measure by defining the probability of each atom of the space. Once you define probabilities for the atoms of the sample space, you can easily compute the probability of any event in the sigma algebra by adding up the probabilities of its atoms. So, the finite case is a trivial case where all the intuitive knowledge about classical probability comes true.
Chapter 5 of the book is about formalizing the definition of a random variable on a countable space.

Chapter 6 moves on to constructing a probability measure on a measurable space where there is no longer the restriction that the state space is countable. Now this throws up a very large state space, and hence a convenient smaller collection of sets (a sigma-algebra) is used for defining the measure. I did not understand this aspect for quite some time after I first read this book years ago. However, after getting used to these terms and seeing them in various theorems, I understand this stuff better. I will attempt to verbalize my understanding. If you want to measure all subsets of R, you have to sacrifice countable additivity, as it breaks down when you consider all subsets of R. However, if you restrict your universe to almost all subsets of R, meaning the measurable sets (the Lebesgue measurable space), then countable additivity holds good for such sets and you can gladly compute the probabilities of all the events in the sigma algebra of the measurable sets. That's the key trade off, which is not usually mentioned properly in most of the books, or maybe it is mentioned and I never understood it in the first attempt. Now how does one go about constructing a measure on an uncountable space? The Extension theorem is the tool. I think one must spend whatever time it takes to understand the Extension theorem properly, as it serves as a foundation for probability theory.

One usually ends up defining a measure on a semi-algebra and then extends it to the sigma algebra after imposing some conditions on the sets. In this book, though, the authors start off with an algebra and then extend the measure to a sigma algebra. This chapter is very hard to understand, as the authors leave out the derivation of the key idea, the existence of a measure, and ask the reader to refer to other books. So, what's the point in knowing that this measure somehow exists when you end up reading only about its uniqueness? Ideally one should read this construction from a better book rather than trying to understand the terse statements in this chapter. Takeaway from this chapter: skip it and read about existence and uniqueness from a better book :)

Chapter 7 is a special case of Chapter 6 where the state space is R. One comes across the distribution function induced by the probability P on the space (R, B). The distribution function F characterizes the probability. Basic properties of the distribution function are given and a laundry list of the most popular continuous distributions is provided.

In a general case where the random variable maps the events on to a space, the task is to construct a random variable in such a way that for every set in the range of the function, there is a pre-image defined in the sigma algebra of the domain space. Hence the concept of measurable function becomes important.

Chapter 8 talks about the generic case where measurable spaces (Ω, F, P) and (R, B) are defined and the mapping between measurable spaces now becomes a measurable function. Thus a "random variable", technically speaking, is a measurable function between two measurable spaces. Why should one take the effort to understand this? Well, because measures on Ω are difficult to construct mathematically, whereas measures on (R, B) can be constructed and worked on. This is the key idea in thinking about the distribution of X.
The concepts related to measurable functions are gradually built in this chapter. After giving a basic definition of a measurable function, it starts off with an important theorem that can be used to check whether a function is measurable or not. The basic condition for a measurable function is that the inverse image of every Borel set in the Borel sigma algebra on R should be present in F, the sigma algebra of the domain space. However, checking every Borel set in R is painful, and hence the theorem states that it is good enough to check any class of sets which generates the Borel sigma algebra. Thus if you can check a class of intervals like (-inf, a], then it is good enough for the entire Borel sigma algebra. The good thing about measurable functions is that a lot of operations retain measurability: sup, inf, lim sup, lim inf, addition, multiplication, min and max. Moreover, if a sequence of random variables converges pointwise, the function it converges to is also measurable. So, in a sense, the gamut of measurable functions is very large and it is really tough to produce a non measurable function. In fact it took a while after Lebesgue introduced these functions in 1901 for someone to come up with non-measurable functions. The key idea of this chapter, though, is that one can define a law for the random variable X and thus create a valid probability triple (R, B, Px), so that one can forget about the original domain space and happily work with this space, as it is analytically more attractive.
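The idea of "forgetting the original space and working with the law of X" can be sketched on a toy example. The code below is my own illustration, assuming three fair coin tosses as the state space and X = number of heads; the law Px is built by pulling sets back through X, and the expectation is then computed both on the original space and on the image space.

```python
from collections import defaultdict
from fractions import Fraction
from itertools import product

# Original probability space: all sequences of three fair coin tosses.
omega = list(product("HT", repeat=3))
P = {w: Fraction(1, 8) for w in omega}


def X(w):
    return w.count("H")   # number of heads


# Law of X: Px({x}) = P(inverse image of {x}).
law = defaultdict(Fraction)
for w in omega:
    law[X(w)] += P[w]

# Expectation computed on the original space and on the image space.
expectation_on_omega = sum(X(w) * P[w] for w in omega)
expectation_on_law = sum(x * p for x, p in law.items())

# Probability of the set [2, inf), via its inverse image and via the law.
p_preimage = sum(P[w] for w in omega if X(w) >= 2)
p_law = sum(p for x, p in law.items() if x >= 2)

print(dict(law))
print(expectation_on_omega, expectation_on_law, p_preimage, p_law)
```

Once `law` is known, every question about X alone can be answered without referring to `omega` again, which is the practical content of working on (R, B, Px).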

Chapter 9 talks about the role of integration with respect to the probability measure. Why does integration figure in probability? Well, if the state space is countable, then the expectation of a random variable is a summation of its values weighted by their respective probabilities. But if the state space is uncountable, then the summation is replaced with an integral sign.

Another nifty way to explain the connection between integration and a probability measure is through the definition of expectation: if you take the collection of simple random variables that are less than or equal to the (non negative) random variable being studied, take the expectation of each member of this collection and find the supremum, you get the expectation of the random variable. Without writing the equation and instead explaining in words, the above description sucks. In any case, the point to be understood is this: the Lebesgue integral is obtained as a supremum over a collection of simple functions, and for bounded functions that are Riemann integrable on a bounded interval, the Riemann integral and the Lebesgue integral coincide. That's the reason for the connection between Integral and Expectation.
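Since the verbal description is admittedly clumsy, a small numerical sketch may help. This is my own example, not the book's: it assumes X is uniform on (0, 1), studies the random variable X squared (whose expectation is 1/3), and approximates it from below by the simple random variables s_n = floor(2^n X^2) / 2^n. The probability of each level set is available in closed form because X is uniform.

```python
import math


def expectation_of_simple_approx(n):
    """E[s_n] where s_n = floor(2^n * X^2) / 2^n and X is uniform on (0, 1)."""
    levels = 2 ** n
    total = 0.0
    for k in range(levels):
        lo, hi = k / levels, (k + 1) / levels
        # s_n takes the value lo on the event {lo <= X^2 < hi},
        # and P(lo <= X^2 < hi) = sqrt(hi) - sqrt(lo) for X uniform on (0, 1).
        total += lo * (math.sqrt(hi) - math.sqrt(lo))
    return total


approximations = [expectation_of_simple_approx(n) for n in range(1, 13)]
for n, value in enumerate(approximations, start=1):
    print(n, value)
```

The printed values increase with n and approach 1/3 from below, which is the supremum in the definition made visible.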

This chapter starts off by defining simple random variables and describes their various properties. Using simple random variables, the expectation of general random variables is defined. Key theorems such as the Monotone Convergence theorem and the Dominated Convergence theorem are described. Towards the end of the chapter, inequalities like Cauchy-Schwarz and Chebyshev's are introduced. I did not really understand the reason for introducing them arbitrarily in this chapter. So, in that sense this part of the chapter is really tangential.

Chapter 10 is about independence of two variables. Well, the idea is pretty straightforward: if the joint distribution function splits into the product of the marginals, then the two variables are independent. However, this chapter is something that I have always skipped or speed read. It was a little different this time, as I could understand the various arguments made. The chapter starts off with the notion of independence of random variables and ties it to the fact that the sub sigma algebras generated by those random variables are independent. This is the correct view to hold, rather than what is usually taught in analytically focused courses on probability. At least I remember the definition in this way: P(A ∩ B) = P(A)P(B). This is true but fails to give the right picture. Only when you think in terms of sub sigma algebras do things become clear, and you can extend this definition into equivalent forms such as: if X and Y are independent, f(X) and g(Y) are independent too for every pair (f, g) of measurable functions. Similarly, independence also means E[ f(X) g(Y) ] = E[ f(X) ] E[ g(Y) ]. Product sigma algebras are explored, as they become crucial for defining joint distributions and marginal distributions. The chapter ends with the Borel-Cantelli theorem. I found the proof of this theorem simpler in Rosenthal's book. It is nifty, simple and clear. Protter's proof is somewhat roundabout in nature. The Borel-Cantelli theorem is striking, as it says that if the events in the sequence {An} are independent, then P(lim sup An) is either 0 or 1. It is not ½ or ¼ etc. For a simple application, here is an example: in an infinite coin toss event space, let Hn be the event that the nth coin toss is Heads. The theorem says that P(Hn infinitely often) = 1, meaning there is a probability 1 that an infinite sequence of coin tosses will contain infinitely many heads.
Without the knowledge of these theorems and terms, if one were asked the same question, one could only answer based on intuition, and sometimes that can lead to wrong answers!
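The product form E[ f(X) g(Y) ] = E[ f(X) ] E[ g(Y) ] is easy to verify by brute force on a finite space. The following sketch is my own, assuming two fair dice; X and Y are the two throws (independent), while X and the sum S = X + Y serve as a dependent pair for contrast. The test functions f and g are arbitrary choices.

```python
from fractions import Fraction
from itertools import product

outcomes = list(product(range(1, 7), repeat=2))   # (first throw, second throw)
p = Fraction(1, 36)


def expect(h):
    """Expectation of h(first, second) under two fair, independent dice."""
    return sum(h(a, b) * p for a, b in outcomes)


def f(x):
    return x * x


def g(y):
    return 1 if y % 2 == 0 else 0


# Independent pair: X = first throw, Y = second throw.
lhs_indep = expect(lambda a, b: f(a) * g(b))
rhs_indep = expect(lambda a, b: f(a)) * expect(lambda a, b: g(b))

# Dependent pair: X = first throw, S = sum of the two throws.
lhs_dep = expect(lambda a, b: f(a) * (a + b))
rhs_dep = expect(lambda a, b: f(a)) * expect(lambda a, b: a + b)

print(lhs_indep, rhs_indep)   # equal
print(lhs_dep, rhs_dep)       # not equal
```

A single pair (f, g) that breaks the product rule is enough to show dependence; independence requires the rule for every pair of measurable functions, so the first check is only a spot check.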

After a very abstract and conceptually difficult chapter, sanity is restored for a reader like me in Chapter 11 🙂, where things can be understood and related to real life applications! For certain measures, we can find a density function so that probability and area under a curve can be connected. Such a probability measure determines its density only up to a set of Lebesgue measure zero. One must note the subtle difference between almost everywhere and almost sure. The former is used for statements about functions with respect to a general measure such as Lebesgue measure, while the latter is used in the probability context. The chapter then talks about the "law of the unconscious statistician", where the expectation of a function of a random variable is computed. Most of the practical applications of probability involve an appropriate function of random variables. For example, the payoff of a plain vanilla option is a function of the random variable S denoting the price of the underlying security. One needs a method to construct the density of a transformed random variable. For a specific transformation of a variable, one can investigate the monotonicity and differentiability of the function to split its domain into intervals on which it is bijective, and then apply a few theorems mentioned in this chapter to arrive at the density. One example which is always quoted in this context is that of the chi square distribution (i.e. transforming a standard normal into a chi square variable). Chapter 12 is the extension of the previous chapter to n dimensional space. The key idea in this chapter is the computation of the density of a transformed multivariate random variable. There are at least three different methods illustrated using examples, but the easiest one is Jacobi's transformation formula.
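For the chi square example, the splitting-into-monotone-pieces recipe can be written out in a few lines. The derivation below is the standard one, in my own notation (φ and Φ for the standard normal density and distribution function); it is not copied from the book. With X standard normal and Y = X², the map x ↦ x² is bijective on each of the two half-lines, so for y > 0:

```latex
\begin{aligned}
P(Y \le y) &= P(-\sqrt{y} \le X \le \sqrt{y}) = \Phi(\sqrt{y}) - \Phi(-\sqrt{y}), \\
f_Y(y) &= \frac{d}{dy} P(Y \le y)
        = \frac{\varphi(\sqrt{y}) + \varphi(-\sqrt{y})}{2\sqrt{y}}
        = \frac{1}{\sqrt{2\pi y}} \, e^{-y/2}, \qquad y > 0,
\end{aligned}
```

and f_Y(y) = 0 for y ≤ 0. This is the chi square density with one degree of freedom; each of the two terms in the numerator is the contribution of one monotone branch of the transformation.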

One of the uses of transformations like Fourier and Laplace is to formulate the solution of a problem in the transformed space and map it back to the original space. In the context of a probability measure, the Fourier transform of the measure has a name, "characteristic function". Characteristic functions are dealt with in Chapter 13 and Chapter 14. Typically these are extremely useful in computing higher order moments, testing the independence of random variables etc. A laundry list of characteristic functions for common random variables is stated. Also, the uniqueness of the Fourier transform of a probability measure is proved. This means that if two measures have the same characteristic function, then the two measures are identical. This is a useful thing to keep in mind.
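One property that makes characteristic functions so handy is that the characteristic function of a sum of independent variables is the product of the individual ones. The sketch below is my own check of this on a fair die, assuming the distribution of the sum of two dice is obtained by direct convolution of the two probability mass functions; `char_fn` and `convolve` are hypothetical helper names.

```python
import cmath

die = {k: 1 / 6 for k in range(1, 7)}


def char_fn(pmf, t):
    """Characteristic function E[exp(i t X)] of a discrete distribution."""
    return sum(p * cmath.exp(1j * t * x) for x, p in pmf.items())


def convolve(pmf_a, pmf_b):
    """Distribution of the sum of two independent discrete random variables."""
    out = {}
    for x, p in pmf_a.items():
        for y, q in pmf_b.items():
            out[x + y] = out.get(x + y, 0.0) + p * q
    return out


two_dice = convolve(die, die)

# The characteristic function of the sum equals the product of the factors.
gaps = []
for t in (0.0, 0.5, 1.0, 2.0):
    lhs = char_fn(two_dice, t)
    rhs = char_fn(die, t) ** 2
    gaps.append(abs(lhs - rhs))
    print(t, lhs, rhs)
```

Multiplying two functions is much cheaper than convolving two distributions, which is why the transformed space is often the easier place to work.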

In stats, most of what is done involves linear transformations of random variables. Thus the sum of independent random variables needs to be studied as an idea, as there are tons of applications in real life. Take something as simple as a sample average. It involves the sum of random variables, and one needs tools to compute the probabilities and densities of such sums. Chapter 15 talks about the convolution product of the probability measures of the individual random variables and provides a methodology to compute the distribution of the sum of random variables. Characteristic functions are used in all the examples to easily compute the distribution of a sum of iids. Chapter 16 is a very important one, as it deals with Gaussian variables in multi dimensional space. It is necessary to first identify analytically whether a set of variables is indeed multivariate normal. It is the form of the characteristic function that plays an important role. For a set of variables to be called multivariate normal, a simple litmus test is that every linear combination of the variables involved should be normally distributed. As an example, a linear regression model is taken and the distribution of its estimates is computed. The authors avoid using matrix algebra for deriving the distributions and thus make the computations very ink-intensive :). The chapter ends by mentioning 6 important properties which make the multivariate normal distribution analytically attractive. One casual remark we often hear is "Normal distribution is everywhere in nature". If one thinks about it, normal distributions do not really exist in nature. The normal distribution arises via a limiting procedure (the Central Limit Theorem) and thus is an approximation of reality, and often it is an excellent approximation.
The irony is that normal distribution is a great approximation to the True distribution of most of the natural phenomena (which itself is not precisely known!!!).

Whenever we talk about limiting procedures and approximations, we need tools to compute and think with. Most of classical stats is developed using asymptotics, where the limiting behaviour of random variables is invoked to justify hypothesis tests and inferences about the parameters of a model. Hence the study of convergence of random variables becomes important, which is dealt with in Chapter 17. One usually comes across pointwise convergence in calculus courses, but such pointwise convergence is too harsh / precise to be applicable to the probabilistic world. The chapter discusses 3 other types of convergence: almost sure convergence, convergence in pth mean and convergence in probability. The relationship between these modes can be summarized as follows: almost sure convergence implies convergence in probability, and so does convergence in pth mean, while neither of these two implies the other in general.


Chapter 18 introduces the most important type of convergence in the context of Statistics: weak convergence, or convergence in distribution. Most of the stuff you come across in stats uses convergence in distribution to make statements. With this type of convergence you can make statements relating the distribution of Xn to that of X without worrying about whether there is any relation between Xn and X themselves. They can be defined on different probability spaces, can have different laws etc. It doesn't matter, and the weak form can be applied away to glory. This is the strong point 🙂 of the weak form of convergence. Firstly, how does one check whether Xn converges in distribution to X? lim E[ f(Xn) ] should be equal to E[ f(X) ]. This condition must be checked for all continuous and bounded functions f. There is also a mention of a theorem which reduces the test cases. Instead of testing all the bounded continuous functions, one can test only the bounded Lipschitz continuous functions. Slutsky's theorem, a very useful theorem in statistics, is derived. It says that if Xn converges in distribution to X and the distance between Xn and Yn goes to 0 in probability, then Yn also converges in distribution to X.

Chapter 19 establishes the relationship between weak convergence and characteristic functions. This relationship forms the key to limit theorems. When we say that, irrespective of the underlying distribution, the centered mean of the variables divided by its standard deviation converges to a standard normal, the proof depends on this critical relationship between weak convergence and characteristic functions. Thus the three chapters 17, 18 and 19 prepare the ground for developing limit theorems.

Chapter 20 talks about the strong law of large numbers: if there are n independent and identically distributed variables, then the average of the variables converges for large n (almost surely and in L2) to the population mean. The weak law is the same as above, but the convergence is in the probability sense. The proof of these laws is elegantly shown using the various modes of convergence discussed in the previous chapters. Finally, an example of the strong law is shown in the Monte Carlo world, where the integral of a complex function can be computed using a simulation of uniforms.
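The Monte Carlo use of the strong law fits in a few lines. The sketch below is my own and does not reproduce the book's example: it assumes the integrand 4 / (1 + x^2) on (0, 1), whose integral is pi, and averages it over simulated uniforms. The seed is fixed only to make the run repeatable.

```python
import math
import random


def monte_carlo_integral(f, n, seed=0):
    """Estimate the integral of f over (0, 1) as the average of f(U_i), U_i uniform."""
    rng = random.Random(seed)
    return sum(f(rng.random()) for _ in range(n)) / n


def integrand(x):
    return 4 / (1 + x * x)   # integrates to pi over (0, 1)


for n in (100, 10_000, 100_000):
    estimate = monte_carlo_integral(integrand, n)
    print(n, estimate, abs(estimate - math.pi))
```

The strong law guarantees that the average converges to the integral almost surely; it says nothing about how fast, which is where the Central Limit Theorem comes in.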

Chapter 21 is all about the most widely used theorem in stats, the Central Limit Theorem, which essentially says that, irrespective of the underlying distribution of the random variables (as long as the variance is finite), a particular transformation of the "sum of random variables" converges in distribution to a standard normal. The proof of the theorem uses the relationship between the weak form of convergence and characteristic functions. The chapter also provides the CLT in the multidimensional case. There is also some stuff where you get to know the rate of convergence of the strong law vis-a-vis the CLT.
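A quick simulation shows the "particular transformation" at work. This is a rough sketch of my own, assuming sums of 30 uniforms (mean 1/2, variance 1/12 each), centered and scaled, and then checking how often the result falls within 1.96 of zero; for a standard normal that frequency is about 0.95. The sample sizes and the seed are arbitrary choices.

```python
import math
import random


def standardized_sum(n, rng):
    """(S_n - n * mean) / sqrt(n * variance) for S_n a sum of n uniforms on (0, 1)."""
    s = sum(rng.random() for _ in range(n))
    return (s - n * 0.5) / math.sqrt(n / 12)


rng = random.Random(1)
samples = [standardized_sum(30, rng) for _ in range(5000)]

# Fraction of standardized sums inside the central 95% band of a standard normal.
inside = sum(abs(z) <= 1.96 for z in samples) / len(samples)
mean = sum(samples) / len(samples)

print(inside, mean)
```

The uniform distribution looks nothing like a bell curve, yet the standardized sum of just 30 of them already behaves very much like a standard normal in this check.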

Chapter 22 might sound rather abstract. What's the point in understanding Hilbert Spaces? What's the connection anyway between Hilbert Spaces and Probability? Such questions are only answered in Chapter 23. So, a reader needs to understand these concepts and have a vague notion that they will be used somewhere in the book. Frankly, when I read this chapter for the first time, I was swamped by the sheer terminology: complete spaces, inner product spaces, normed vector spaces, metric spaces, orthogonal operators etc. For any reader who is in a similar situation, my suggestion would be to put this book aside for some time and read up on metric spaces, vector spaces and inner product spaces thoroughly. Understand their relevance and historical significance to general mathematics. You should be at least familiar with some basic stuff about functional analysis, in the sense that you must be able to cogently answer all the following questions:

  • What are metric spaces?
  • How is a metric defined?
  • Can the same metric space have two different metrics?
  • What do you mean by metric space being complete?
  • What is vector space? What is normed vector space?
  • Can a metric always induce a norm?
  • Can a norm always induce a metric?
  • What is inner product space?
  • Is inner product space subset of vector space? Is it a subset of metric space?
  • Can inner product induce a norm on the vector space?
  • Can inner product induce a metric?
  • What is complete normed vector space?
  • What is complete inner product space?
  • How to check whether a space is complete, be it metric/ normed vector / inner product space?

Unless you convince yourself that you know the answers to the above questions, it is better to keep this book aside and work through metric spaces. Once you are comfortable with the above questions, this chapter can be read easily.

Chapter 23 is one of THE MOST important chapters of the book from a math fin perspective. In almost all cases, be it option pricing, hedging or econometrics based forecasting, one always deals with a conditional expectation model. Regression, which is considered the workhorse of statisticians, is a conditional expectation model E(Y/X). Undergrad intro courses in probability usually introduce conditional probability and leave it at that. Or maybe they scratch the surface of conditional expectation models by spelling out a formula for E(X/B) where B is some event, or E(X/Y) where Y is a discrete random variable. In all such cases, the formula based approach hides the complexities behind computing conditional expectation. For a case where both X and Y are general random variables, how does one go about computing E(X/Y)? You need exposure to Hilbert Spaces to understand Conditional Expectation. This is where all the slog one goes through in understanding sigma algebras, Borel functions etc. pays off. There are two things that one realizes when computing E(X/Y). Well, three things actually. First, E(X/Y) is itself a random variable. Secondly, the sigma algebra of the inverse images of this random variable is a subset of the sigma algebra of the inverse images of Y. One must not take this statement at face value. Just cook up some example and check it out for yourself. A simple example of a dice thrown twice, computing E(X/Y) where Y is the first throw and X is the sum of the two throws, will convince you that the sigma algebra of E(X/Y) is indeed a subset of the sigma algebra of Y. The third aspect to be kept in mind is that on any set belonging to the sigma algebra of Y, the expectation of E(X/Y) over that set matches the expectation of X over the same set. Again, this is abstract and makes sense once you work with an example and see for yourself that this is indeed the case. So, basically there are two properties which are damn important while thinking about conditional expectation:

(a) E(X/Y) is measurable with respect to the sigma algebra generated by Y
(b) for every set A in the sigma algebra generated by Y, the integral of E(X/Y) over A equals the integral of X over A


Conditions (a) and (b) impose conflicting restrictions on E(X/Y). On the one hand, E(X/Y) needs to have a rich enough structure to satisfy (b). On the other hand, it cannot be too rich, or the sigma-field it generates would be too large to satisfy (a). Meeting the two conditions simultaneously calls for a compromise.

So, the conditional expectation is calculated implicitly in such a way that the above two conditions are satisfied. I did not understand this aspect of conditional expectation for a very long time. However, once you understand that E(X/Y) can be computed implicitly, you begin to appreciate the Radon-Nikodym theorem. So, the takeaway from this chapter is that you will start to appreciate that E(X/Y) should be looked at as a conditional expectation given the sigma algebra generated by Y. Thus you no longer care about Y in the sense of what values it takes; all you are interested in is the sigma algebra of Y. This is the key to understanding Conditional Expectation.

Ok, I did not mention here the relevance of Hilbert Spaces. Here is the connection: if one looks at complete inner product spaces, one can use the orthogonality concept, and E(X/Y) becomes the best estimate of X given the information in Y. This is essentially projecting X onto the subspace of random variables that are measurable with respect to the sigma algebra of Y. For projection to make sense, the concept of inner product is used, and the first construction of Conditional Expectation is done on Hilbert Spaces. However, L2 is only a subset of L1, which is the space most of us would be interested in. The extension of Conditional Expectation from L2 to L1 is done using the standard procedure of showing that 1) it works for indicator functions, 2) it works for simple functions, 3) it works for non negative random variables and 4) it works for general random variables. Another key aspect to understand about conditional expectation is that it is only unique in the "almost sure" sense, meaning there could be more than one conditional expectation variable that meets the criteria, where the variables differ only on sets of measure 0.
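The projection view can be checked on the two-dice example mentioned earlier (Y the first throw, X the sum of the two throws). The sketch below is my own: it computes E(X/Y) by averaging over the second throw, then compares its mean squared error against a few other functions of Y that I picked arbitrarily, and finally checks that the residual X - E(X/Y) is orthogonal to several functions of Y.

```python
from fractions import Fraction
from itertools import product

outcomes = list(product(range(1, 7), repeat=2))   # (first throw, second throw)
p = Fraction(1, 36)


def expect(h):
    return sum(h(a, b) * p for a, b in outcomes)


def X(a, b):
    return a + b      # sum of the two throws


def Y(a, b):
    return a          # first throw


def cond_exp_given_first(y):
    """Average of X over the outcomes whose first throw equals y."""
    values = [a + b for a, b in outcomes if a == y]
    return Fraction(sum(values), len(values))


table = {y: cond_exp_given_first(y) for y in range(1, 7)}   # y -> y + 7/2


def mse(h):
    """Mean squared error of predicting X by h(Y)."""
    return expect(lambda a, b: (X(a, b) - h(Y(a, b))) ** 2)


best = mse(lambda y: table[y])
others = [mse(lambda y: y + 3), mse(lambda y: Fraction(7)), mse(lambda y: 2 * y)]

# Orthogonality: the residual has zero inner product with functions of Y.
residual_products = [
    expect(lambda a, b, g=g: (X(a, b) - table[Y(a, b)]) * g(Y(a, b)))
    for g in (lambda y: 1, lambda y: y, lambda y: y * y)
]

print(table)
print(best, others, residual_products)
```

The minimal error 35/12 is just the variance of the second throw, the part of X that knowing Y cannot explain.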

Chapter 24 deals with the properties of a general sequence of random variables (Xn), instead of a sequence of iids, which is usually the norm. A specific type of sequence of random variables that is relevant to the math fin area is the "Martingale". A Martingale is a sequence of random variables with the following properties:

  • Each element of the sequence is in L1
  • Xn is Fn measurable
  • The most important being E[ Xn / Fm ] = Xm almost surely, for every m ≤ n

Several properties of Martingales are very appealing from a fin modeling perspective. Martingales have constant expectation, and hence attacking a problem like option valuation from a martingale perspective always makes one hopeful of ending up with a martingale.
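The defining property, that the conditional expectation of the next value given the history equals the current value, can be verified by enumeration for a simple random walk. This is a minimal sketch of my own, assuming a walk with steps of +1 or -1; with a fair coin it is a martingale, and with a biased coin (probability 3/5 of an up step, an arbitrary choice) it is not.

```python
from fractions import Fraction
from itertools import product


def cond_mean_next(history, p_up):
    """E[S_{n+1} / F_n] on the event that the first n steps equal `history`."""
    s_n = sum(history)
    return p_up * (s_n + 1) + (1 - p_up) * (s_n - 1)


def is_martingale(n_steps, p_up):
    """Check E[S_{n+1} / F_n] = S_n on every history of length less than n_steps."""
    return all(
        cond_mean_next(history, p_up) == sum(history)
        for n in range(n_steps)
        for history in product((-1, 1), repeat=n)
    )


def expected_value(n, p_up):
    """E[S_n], computed by summing over all paths of length n."""
    total = Fraction(0)
    for path in product((-1, 1), repeat=n):
        ups = path.count(1)
        total += sum(path) * p_up ** ups * (1 - p_up) ** (n - ups)
    return total


fair, biased = Fraction(1, 2), Fraction(3, 5)
print(is_martingale(8, fair), is_martingale(8, biased))
print([expected_value(n, fair) for n in range(6)])     # constant expectation
print([expected_value(n, biased) for n in range(6)])   # drifting expectation
```

The constant-expectation property follows from the martingale property but is strictly weaker, which is why the check is done history by history rather than only on the means.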

The other class of variables discussed in the chapter are Stopping times. A martingale stopped at a stopping time is again a martingale, and for a bounded stopping time the expected value at the stopping time equals the initial expected value. These are extremely useful in American option pricing. Doob's Optional Sampling theorem is also discussed in this chapter.
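Why the "stopping time" requirement matters can be seen by exhaustive enumeration. The sketch below is my own, assuming a fair +1/-1 random walk over 8 steps. The first rule stops when the walk first reaches level 2 (or at the horizon), which uses only the past; the second rule stops at the time of the path's maximum, which needs knowledge of the future and is therefore not a stopping time.

```python
from fractions import Fraction
from itertools import accumulate, product

N = 8
paths = list(product((-1, 1), repeat=N))
weight = Fraction(1, 2 ** N)          # every path of a fair walk is equally likely


def walk(path):
    return [0] + list(accumulate(path))   # S_0, S_1, ..., S_N


def stop_at_level(path, level=2):
    """First time the walk equals `level`, capped at N: a bounded stopping time."""
    for n, value in enumerate(walk(path)):
        if value == level:
            return n
    return N


def time_of_maximum(path):
    """First time the walk attains its overall maximum: needs the whole path."""
    s = walk(path)
    return s.index(max(s))


def expected_stopped_value(rule):
    return sum(weight * walk(path)[rule(path)] for path in paths)


print(expected_stopped_value(stop_at_level))     # equals E[S_0] = 0
print(expected_stopped_value(time_of_maximum))   # strictly positive
```

No rule that decides using only the information available so far can beat a fair game on average; only peeking into the future can, which is exactly what the measurability condition in the definition of a stopping time forbids.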

Chapter 25 explains super martingales and sub martingales, which form a class of useful mathematical objects in financial modeling. Most importantly, it talks about decomposing a super martingale or a sub martingale into a martingale and an increasing/decreasing process. Chapter 26 and Chapter 27 explore Martingale Inequalities and the Martingale Convergence Theorem. Chapter 28 is about the Radon-Nikodym theorem. This theorem is used in a ton of places in the math-fin area. However, this chapter uses Martingales to prove the theorem. Ideally it would have been better if the theorem had been proved using measure theory concepts. So, in that sense the organization of the last section of the book was a little challenging for me. The book introduces martingales and then introduces measure change. As stated earlier, there is no neat closed formula for E(X/Y) where X and Y are both random variables. The Radon-Nikodym derivative provides a neat way to show the existence of such a variable. One can always show that such a variable exists for Hilbert Spaces, but for L1 spaces, one has to prove it using indicator functions, simple functions, non negative random variables and general random variables.

Takeaway:

I think this book is too concise for someone looking to understand concepts. Existence theorems are conveniently ignored for some important mathematical objects. However, this book is an awesome reference for most of the theorems of modern probability.


The book is divided into three parts. The first part gives a basic primer on the quant world, which includes a description of the benefits of quant trading and an intro to the structure of a basic quantitative trading system. The second part deals with the key elements of the black box, and the third part is relevant to investors / managers who want to evaluate various quant trading strategies.

Part I – The Quant Universe

The first part of the book introduces the reason for considering quantitative trading as a strategy, by describing the basic difference between quantitative trading and discretionary trading. The author starts off with the fact that algorithmic / computer driven trading is a reality, whether one likes it or not. 60% of the trades in the US from the buy side are computer driven, and about 45% of the trading volume in Europe is system driven. Given that quant trading is a market reality, what is it that one can learn from a quant’s approach to markets? The book cites three main reasons for understanding quant trading:

1. Deep thought : Since quant trading strategies are executed using a system, it becomes very important to pen down the exact strategy, signal generation, signal checking conditions, stop-loss rules etc. The rules have to be precise, at least in the probabilistic sense. Thus, unlike a discretionary trader, who more often than not cannot verbalize or pin down his trading strategy, a quant trader can describe the strategy exactly as a series of steps (almost). One of the other skills a quant brings to the table is “Data Visualization”. It is often said that 80% of our brain is allocated to visual processing. So, by providing visuals of the various trends in parameters / P&L / simulated scenarios, one can look at an investment strategy from multiple angles and reduce Type I error (trading when there is no signal) and Type II error (not trading when there is a signal). This reduction of Type I & Type II errors can itself add value to an investment strategy.

2. Measurement and Mismeasurement of Risk : The use of quant methods to measure risk is very debatable. Taleb has railed against anyone who uses math to define risk. However, there are a lot of situations where it is possible to tame risk. I strongly believe that some map is better than no map. At least it gives some sense of direction to the portfolio manager, who can then use a mix of Bayesian and Fisherian stats to get an idea of risk. Well, whatever the math used, Black Swan events are anyway beyond the scope of a quant’s work, as they come under the category of unknown unknowns. So any risk metric should always be taken with a pound of salt!

3. Disciplined Implementation : Quant trading tries to cut emotion, fear, greed and manual mistakes out of the execution process, and thus brings discipline into a trading process / investment strategy.

The above three reasons should make any investor, trader or portfolio manager interested in the quant world. The book gives an interesting definition of a quant strategy. It goes like this:

There is a full spectrum between fully discretionary strategies to fully automated strategies. The key determination that puts quants on one side of spectrum and everyone else on the other side is : Whether daily decisions about the selection and sizing of portfolio positions are made systematically or discretionarily. If both the questions of “what positions to own?” and “how much of each to own?” are answered systematically, that’s a quant strategy. If either of the two questions are answered by a human, that’s not a quant strategy

It then goes on to describe a schema for understanding the quant trading black box:


The above flow chart is merely a schema. Obviously there are many quant strategies where only a few of these elements are chosen, or where elements are merged (ex: transaction costs folded into the alpha model itself). There are strategies where the flow could be recursive. However, the above schema helps one to have a discrete map of the various components of a quant system. The alpha model is designed to predict the future of those instruments that the quant wants to consider trading, in order to generate returns. Risk models are designed to help limit the amount of exposure the quant has to those factors that are unlikely to generate returns but could drive losses. The transaction cost model is used to help determine the cost of whatever trades are needed to migrate from the current portfolio to whatever new portfolio is desirable to the portfolio construction model. The alpha, risk and transaction cost models then feed into a portfolio construction model, which balances the tradeoffs presented by the pursuit of profits, the limiting of risk, and the costs associated with both, thereby determining the best portfolio to hold. Based on the portfolio construction model, the new trades are then sent through the execution model. The whole system’s lifeline is Data + Research. Without these two elements the black box is a dead box!
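To fix the flow in my head, I find it useful to write the schema as a toy pipeline. This is only a sketch under my own assumptions: every function name, rule and number below is hypothetical and is not taken from the book. The alpha, risk and transaction cost models feed the portfolio construction model, whose target portfolio is then handed to the execution model.

```python
def alpha_model(prices):
    """Toy forecast (assumed rule): the last one-period return of each instrument."""
    return {s: p[-1] / p[-2] - 1.0 for s, p in prices.items()}


def risk_model(forecasts, max_weight=0.5):
    """Toy risk limit (assumed): the same cap on absolute weight for every instrument."""
    return {s: max_weight for s in forecasts}


def transaction_cost_model(current, cost_per_unit=0.001):
    """Toy cost estimate (assumed): a flat cost per unit of weight traded."""
    return {s: cost_per_unit for s in current}


def portfolio_construction(forecasts, limits, costs, current):
    """Balance the three inputs: trade only if the forecast beats the cost,
    and never hold more than the risk limit allows."""
    target = {}
    for s, f in forecasts.items():
        if abs(f) <= costs[s]:
            target[s] = current[s]  # forecast too small to pay for the trade
        else:
            target[s] = limits[s] if f > 0 else -limits[s]
    return target


def execution_model(current, target):
    """Trades needed to migrate from the current portfolio to the target."""
    return {s: target[s] - current[s] for s in target if target[s] != current[s]}


prices = {"A": [100.0, 102.0], "B": [50.0, 49.0], "C": [20.0, 20.01]}
current = {"A": 0.0, "B": 0.0, "C": 0.1}

forecasts = alpha_model(prices)
target = portfolio_construction(
    forecasts, risk_model(forecasts), transaction_cost_model(current), current
)
trades = execution_model(current, target)
print(trades)
```

In this toy run, instrument C is left alone because its forecast does not cover the estimated trading cost, which is exactly the kind of tradeoff the portfolio construction block is supposed to arbitrate.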

Part II – Inside the Black Box

This part of the book explains each of the blocks of the above schema. A chapter is dedicated to each of the components of the schema.

Alpha Models
The output from an alpha model is either a return forecast or a direction forecast. The book starts off by explaining the core difference between the vast number of alpha models: they are either theory driven or empirical. In the former, a quant starts off with a theoretical relationship and uses the data to calibrate the parameters of the model. The advantage is that the model can be easily communicated to people. However, just because you can communicate the theory behind a model does not mean that the model will make money. The other school of thought is the empirical type, where you allow the data to speak for itself and you build / trade accordingly. Personally, the latter type is far more appealing to me, as you don’t have to carry theoretical baggage while building the model. Yes, there is a risk of data mining. So the approach depends on the time horizon of the strategy. If you are dealing with high frequency data, I guess the best thing is to work on empirical model building. Most of the people whom I have seen building models at high frequency don’t give a damn about theories. They crunch the data and trade on the pattern. Their belief, I guess, is “At such a frequency, who has valid theories?” In a way their opinion is right. Market microstructure research is still at a nascent stage, and researchers / academicians have not made a great amount of literature public. More often than not, if a professor cracks an algo at a high frequency level, he is likely to trade it in a bank / hedge fund. A classic case that I know of is Dr. Robert Almgren from NYU, who has written fantastic stuff on execution algos at the high frequency scale and has started his own firm, Quantitative Brokers, to capitalize on the algos. So, in a way, all these theory laden quant models are suited maybe to medium term and long term investment strategies. At a high frequency scale, I think empirical models are the flavour at the moment.

Anyways coming back to this book, the author expands on the Alpha block in the following manner:


The theory driven alpha models are explored from two perspectives: price based models and fundamental analysis based models. From whatever I have seen or read till date, quants primarily use price / volume data etc. to churn out models. Fundamental analysis based quant models, I think, are actually fraud models. My professor used to say that you must build a model that makes money, and once all the alpha is gone, you can bring in some fundamental attributes and give a spin to the model so that it is publication ready for some journal. Strange things happen when you mix academia and hedge funds! I have started to believe that this fundamental attributes based quant modeling is bogus. For example, I have seen a few factor models which are based part on statistics and part on fundamental analysis. They produce fantastic back testing results and sadly perform pathetically in production. Most of them are flawed, as there is not enough data to test their statistical significance. As for quants who actually take the pains to test such factor models using random matrix theory or an equivalent theory, I think they trade away the signal asap rather than making it into a sales pitch to the buy side.

The chapter then delves into the implementation aspects of any such alpha model. These details include the time horizon, bet structure, instrument type and run frequency of the trading model. One inevitable conclusion that a reader draws from this book is that there are only a few alpha seeking strategies, but the implementation attributes are so diverse that they can potentially give rise to a ton of trading strategies.

Risk Models
The book states at the very beginning that the risk management function, from a quant perspective, is not about reducing risk but primarily about selecting and sizing the risk exposure. After that, the quant can decide whether it makes sense to go ahead with the strategy. For sizing the risk exposure, one can simulate and statistically formulate risk exposure bands, or apply heuristics to size the exposure. The other aspect to be dealt with is leverage. The author makes a passing remark about the Kelly criterion and its use for optimal sizing of exposure.
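Since the book only mentions the Kelly criterion in passing, here is a minimal sketch of the textbook binary-bet version, f* = p - (1 - p) / b, where p is the probability of winning and b is the amount won per unit staked. This is the standard formula and not something spelled out in the book; the function name and the example numbers are my own assumptions.

```python
def kelly_fraction(p_win, payoff_ratio):
    """Kelly fraction for a binary bet: f* = p - (1 - p) / b.

    p_win        -- probability of winning the bet, between 0 and 1
    payoff_ratio -- b, the amount won per unit staked (must be positive)
    """
    if not 0.0 <= p_win <= 1.0:
        raise ValueError("p_win must be a probability between 0 and 1")
    if payoff_ratio <= 0.0:
        raise ValueError("payoff_ratio must be positive")
    return p_win - (1.0 - p_win) / payoff_ratio


# A 60/40 bet at even odds suggests staking about a fifth of capital.
example_fraction = kelly_fraction(0.6, 1.0)

# A fair coin at even odds has no edge, so the suggested stake is zero.
no_edge_fraction = kelly_fraction(0.5, 1.0)

print(example_fraction, no_edge_fraction)
```

A negative result means the bet has a negative edge and should not be taken at all. In practice the inputs are themselves estimates, so the output is best read as an upper bound on exposure rather than a target.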

The author makes the same type of distinction in risk models, i.e. theory driven risk models and empirical risk models. The stat arb world seems to like empirical risk models, as they bring out the risk factors from the data. However, the downside is that there could be risk factors with characteristics that cannot be hedged away in the real world. Maybe there are no instruments in the market you are working in that can be used to hedge the risk factors. If you are working in emerging markets where things are still developing, you might as well take cognizance of the fact that your position is unhedged, and hence you don’t have a choice but to reduce risk by tweaking the size of your exposure. That’s the best you can do. If you do not have control over the type of risk exposure, at least you can control its size. So, a quant has to strike a balance between the theory laden risk modelling approach and the empirical approach. In any case, he/she can embed the risk model in the alpha model itself or use it as an external component. It all depends on the strategy, I guess.

Transaction Cost Models
The three key elements of transaction cost models are Commissions & Fees, Slippage costs and Market impact costs, in increasing order of modeling difficulty. Commissions & Fees are the easiest to incorporate, as they are mostly fixed hurdle costs for a trade. Slippage is difficult to model, as it depends on the strategy being adopted. Slippage cuts into trend following strategies, whereas it aids mean reverting strategies. The most difficult one to model is market impact, and the book mentions the four common models used for it, i.e. flat, linear, piecewise linear and quadratic. The takeaway from this chapter is that the transaction cost model’s purpose is simply to advise the portfolio construction model how much it might cost to transact. Its job is not to minimize the cost of trading. A nice analogy given for the three components described so far is as follows: the alpha model plays the role of the starry eyed optimist, the risk model plays the role of the nervous worrier, and transaction cost models act as frugal accountants.
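The four market impact shapes are easier to compare when written down as functions of trade size. The sketch below is my own illustration: the coefficients, the breakpoint and the function names are all made-up assumptions, and I am reading “flat” as a cost that does not depend on trade size.

```python
def flat_cost(size, cost=5.0):
    """Flat: the same cost for any non-zero trade, whatever its size."""
    return cost if size > 0 else 0.0


def linear_cost(size, slope=0.01):
    """Linear: cost grows in proportion to trade size."""
    return slope * size


def piecewise_linear_cost(size, breakpoint=1000.0, slope_low=0.01, slope_high=0.03):
    """Piecewise linear: a steeper slope applies to the part above the breakpoint."""
    if size <= breakpoint:
        return slope_low * size
    return slope_low * breakpoint + slope_high * (size - breakpoint)


def quadratic_cost(size, linear_coef=0.01, quad_coef=1e-5):
    """Quadratic: cost accelerates as the trade gets larger."""
    return linear_coef * size + quad_coef * size ** 2


models = {
    "flat": flat_cost,
    "linear": linear_cost,
    "piecewise": piecewise_linear_cost,
    "quadratic": quadratic_cost,
}

# Estimated cost of each model for a small and a large trade.
for size in (500.0, 2000.0):
    estimates = {name: round(model(size), 2) for name, model in models.items()}
    print(size, estimates)
```

The point of the comparison is that the models agree reasonably well on small trades and diverge on large ones, which is where the choice of model starts to matter for the portfolio construction step.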


Portfolio Construction Model


The portfolio construction model acts as an arbitrator between the Alpha, Risk and Transaction Cost models. Given that the alpha model is an eternal optimist, the risk model is an eternal pessimist and the cost model is the guy who reminds everyone of the trading costs, the portfolio construction model balances the output from these models and tries to construct a portfolio based on some objective function. There are basically two types of portfolio construction models. The first are rule based construction models like equal weights, equal vols, decision tree methods etc. These are heuristic by definition and hence appear more ad hoc than quantish. However, it is still debated in the finance literature whether the equal weight model beats all the fancy optimizers out there. The second type of portfolio construction models are optimization based. There is an objective function, there are constraints on the size and type of holdings, and you basically optimize the function. Here the book mentions a laundry list of techniques like plain vanilla unconstrained optimization, constrained optimization, the Black-Litterman model, Grinold and Kahn’s factor portfolio approach, Michaud’s resampling approach (which is far more appealing to me from a stats perspective) and the data mining approach. Given the arbitrator’s role played by the construction model, this is the one area which comes close to being a “Black Box”, as there are tons of variations that a quant can choose from based on statistical tests and market conditions. The author makes the interesting observation that quants who build rule based construction models typically take an intrinsic alpha approach (meaning they rely on individual security forecasts), while quants who build optimization based portfolios take a relative value alpha approach.
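The two simplest rule based schemes can be sketched in a few lines. I am assuming here that “equal vols” means weighting each instrument by the inverse of its volatility, so that every position carries the same stand-alone volatility. That reading, the function names and the sample volatilities are my own assumptions, not the book’s.

```python
def equal_weights(symbols):
    """Equal weighting: every instrument gets the same share of capital."""
    n = len(symbols)
    return {s: 1.0 / n for s in symbols}


def equal_vol_weights(vols):
    """Inverse-volatility weighting: weight is proportional to 1 / volatility,
    so weight * volatility is the same for every instrument."""
    inverse = {s: 1.0 / v for s, v in vols.items()}
    total = sum(inverse.values())
    return {s: x / total for s, x in inverse.items()}


# Hypothetical annualised volatilities for three instruments.
vols = {"A": 0.10, "B": 0.20, "C": 0.40}

ew = equal_weights(list(vols))
ev = equal_vol_weights(vols)

# Stand-alone volatility carried by each position under the second scheme.
contributions = {s: ev[s] * vols[s] for s in vols}
print(ew, ev, contributions)
```

Note that this ignores correlations between instruments entirely, which is part of why such schemes look ad hoc next to an optimizer.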

Execution Model
This is one of the toughest and most exciting areas for any quant. Probably that is one of the reasons for the sparse academic literature and trading strategy documentation. Recently I came across a flyer about a high frequency trading workshop that is going to be conducted in Singapore / Mumbai in April 2011 and costs a couple of thousand dollars!! Are the strategies so secretive that a two day workshop costs ~ $3000? I guess this execution world is more like a social video game. The guy who has the better strategy in a video game wins. This chapter talks about various technology options for executing a trade, such as DMA and high frequency trading platforms, which are very much a reality in developed markets. These are useful in markets like the US, where getting alpha from medium term to long term strategies is difficult. So, I guess the Execution Model is the new Alpha Model in all such markets.

Data & Research

The concluding sections talk about the data and research components of the black box. Data is the starting point for a quant’s work, and the book merely scratches the surface by giving an overview of data procurement, data treatment and data storage issues. The section on research talks about sources of idea generation for a quant strategy and research issues (more from a back testing perspective).

Part III – A practical guide for investors in Quantitative Strategies

The book talks about the inherent risks in any quant strategy that an investor should be aware of, namely Model Risk, Model Mis-specification Risk, Regime Change Risk, Exogenous Shock Risk, and Contagion or Common Investor Risk. Most of these terms are pretty self explanatory. The author narrates the Aug 07 quant crisis to highlight the crowding phenomenon in quant strategies.

The author gives his opinion on some of the criticisms of quant trading that one gets to hear, such as:

  • Trading is an art and not a science. Quant trading is of no use.
    • “No”, says the author, though the argument is a little weak as he focuses only on the successful quant firms. Taleb’s “silent evidence” weakens the argument.
  • Quants cause more market volatility by underestimating risk
    • “No”, says the author, with a few numbers-based arguments.
  • Quants cannot handle unusual events
    • “Valid criticism”, says the author.
  • Quants are all the same
    • “No”, says the author, giving two arguments. First, the components of the black box can be combined in so many ways that a quant has a very large degree of freedom. The second reason is numbers-based. He cites his own firm, where he gives an estimate that 30% of quant trades are in opposite directions. He also shows that the correlation between quant long-short funds is very close to 0, vis-à-vis a relatively high correlation among hedge fund returns.
  • Only a few large Quants can thrive in the long run
    • “No”, says the author, citing about half a dozen reasons. Some of them are compelling and make the case that a boutique shop is as appealing as a large quant fund for an investor.
  • Quants are guilty of data mining
    • “Not a fair claim”, says the author, as the term is often used interchangeably with “curve fitting”, which obviously is useless.

The last section of the book talks about evaluating quants and quant strategies. It talks about the ways to interview quants, what sort of questions to ask them, how to understand what they are doing and, finally, how to incorporate quant traders into your overall portfolio strategy. I found the last chapter of the book particularly interesting, as it mainly flows from the author’s experience in hiring and managing quants.


Takeaway:

The schema described in the book is very appealing and can serve as a good framework for all the quant strategies that one comes across. So, in that sense, the book does indeed demystify black box trading by showing the various components of the black box and their interdependencies.