April 2013


This book gives a non-rigorous treatment of Brownian motion and its applications to finance. Let me summarize a few points from various chapters.

Chapter 1 : Brownian motion

This chapter starts off by specifying Brownian motion by the properties of its increments such as independence, first and second moments, transition density etc. A discrete approximation of BM is shown via a binomial tree. Covariance of BM process is derived. A way to manufacture correlated BM is shown. Illustrations are provided to show that BM is nowhere differentiable. The most important property of BM, the quadratic variation, is shown via a few simulation runs.
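To make two of the ideas above concrete, here is a minimal sketch (my own, not code from the book) that manufactures correlated Brownian increments by mixing two independent normal draws, and then checks the quadratic variation of one path. The function names, the correlation of 0.6 and the step count are all assumptions chosen for illustration.

```python
import math
import random


def correlated_bm_increments(rho, n_steps, dt, seed=0):
    """Return two lists of Brownian increments with correlation rho."""
    rng = random.Random(seed)
    sd = math.sqrt(dt)
    dw1, dw2 = [], []
    for _ in range(n_steps):
        z1 = rng.gauss(0.0, 1.0)
        z2 = rng.gauss(0.0, 1.0)
        dw1.append(sd * z1)
        # Mix the independent draw z2 with z1 so that corr(dw1, dw2) = rho.
        dw2.append(sd * (rho * z1 + math.sqrt(1.0 - rho * rho) * z2))
    return dw1, dw2


def sample_correlation(xs, ys):
    """Plain sample correlation of two equally long lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)


def quadratic_variation(increments):
    """Sum of squared increments; for BM on [0, T] this is close to T."""
    return sum(d * d for d in increments)


# Illustrative run on [0, 1] with 20,000 steps (assumed values).
dw1, dw2 = correlated_bm_increments(rho=0.6, n_steps=20_000, dt=1.0 / 20_000)
print(sample_correlation(dw1, dw2), quadratic_variation(dw1))
```

The sample correlation should land near the chosen rho, and the sum of squared increments near the horizon T = 1, which is the quadratic variation property in action.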

Chapter 2 : Martingales

The concept of conditional expectation is dealt with in this chapter. The explanation is at a 10,000 ft. view and various properties of conditional expectation are listed. In order to rigorously prove the properties of conditional expectation, measure theory is the only route. However one can always get the intuition behind the properties by working through a binomial asset pricing model. I think the key property that one needs to understand is that of “partial averaging”. In order to understand “partial averaging”, it is always better to write down the partial averaging condition, think about it for some time, and then try to verbalize the condition in simple words. By forcing oneself to write the condition in symbols and then translate the same into words, one might get a proper understanding of conditional expectation. Martingales are mathematical objects defined in terms of conditional expectations. In option pricing it is important to manufacture a martingale from the terminal value of a derivative claim. The concept of filtration helps one manufacture a martingale from the claim. This process of creating a martingale by conditioning is described in the chapter. This chapter also gives a sample list of martingales to give the reader an intuitive sense of these objects.
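For reference, this is the partial averaging condition in symbols, in the standard textbook form. The notation is my assumption, not necessarily the book's: X is an integrable random variable and \mathcal{G} is the sigma algebra carrying the available information.

```latex
\int_A \mathbb{E}[X \mid \mathcal{G}](\omega)\, d\mathbb{P}(\omega)
  = \int_A X(\omega)\, d\mathbb{P}(\omega)
  \qquad \text{for every } A \in \mathcal{G}.
```

In words: on every set that the information in \mathcal{G} can resolve, the conditional expectation has the same probability-weighted average as X itself.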

Chapter 3 : Ito Stochastic Integral

Ito stochastic integrals have peculiar features and cannot be evaluated in the Riemann sense. The integrand is in general a random process and the integrator is a stochastic process such as Brownian motion. For integrating with respect to Brownian motion, one cannot use Riemann-Stieltjes integration because Brownian motion has unbounded variation. Hence an alternate route is taken to compute these integrals. In fact an Ito integral is defined only as a limit in the mean square sense.

One needs to start with a sequence of non anticipating integrands and then extend the results of the stochastic integral for these non anticipating integrands to a general integrand. The whole magic happens in the Hilbert space where one can approximate the Ito integral of a general integrand with a limiting value of a sequence of Ito integrals of non anticipating integrands. Obviously this method is pretty cumbersome. For every integral if one needs to find approximating functions, evaluate the limiting value, it is like doing Riemann integration by partitioning for every single function. Like there are standard rules for Riemann calculus, there is one savior for Ito calculus, the “Ito’s lemma”. Thanks to Ito’s lemma, the evaluation of stochastic integral becomes relatively easy.

The exercises in the chapter are laid out in such a way that they reinforce the mean square convergence aspect of Ito integrals. The recipe for computing an Ito integral is: a) formulate a sequence of non-anticipating functions that converges in mean square to the general integrand, b) write down the discrete stochastic integral for the sequence, and c) evaluate the limit of the discrete stochastic integrals. This converged value is the Ito integral value. Basically whenever you see that an Ito integral equals something, the equality sign should be interpreted in the mean square sense.
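The recipe can be tried on the classic case of integrating Brownian motion against itself. This is a minimal sketch under my own assumptions (function name, step count, horizon T = 1): the integrand is frozen at the left endpoint of each interval, which is what makes it non-anticipating, and the discrete sum is compared with the known limit (W(T)^2 - T)/2.

```python
import math
import random


def ito_sum_w_dw(n_steps, T=1.0, seed=0):
    """Left-endpoint sum for the integral of W dW on [0, T].

    Returns the discrete sum and the closed-form limit (W_T^2 - T) / 2
    evaluated on the same simulated path.
    """
    rng = random.Random(seed)
    dt = T / n_steps
    w = 0.0
    ito_sum = 0.0
    for _ in range(n_steps):
        dw = rng.gauss(0.0, math.sqrt(dt))
        ito_sum += w * dw  # integrand taken at the left endpoint
        w += dw
    closed_form = 0.5 * (w * w - T)
    return ito_sum, closed_form


approx, limit = ito_sum_w_dw(n_steps=10_000)
print(approx, limit)
```

The gap between the two numbers is exactly half the difference between T and the realized sum of squared increments, so it shrinks as the partition is refined.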

Chapter 4 : Ito Calculus

For computing Ito integrals, one of the most important tools is Ito’s lemma. This chapter covers Ito lemma in various shades and colors. Levy characterization of Brownian motion is stated so as to easily identify a Brownian motion. Basic recipe for simulating a Multivariate Brownian motion is given.
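As a reminder of what the lemma looks like in its simplest form, here is the standard statement for a smooth function f(t, x) of time and Brownian motion W, followed by the example f(x) = x^2. The notation is assumed, not copied from the book.

```latex
df\bigl(t, W(t)\bigr)
  = f_t\,dt + f_x\,dW(t) + \tfrac{1}{2} f_{xx}\,dt,
\qquad
d\bigl(W(t)^2\bigr) = 2\,W(t)\,dW(t) + dt
\;\Longrightarrow\;
\int_0^T W(t)\,dW(t) = \tfrac{1}{2}W(T)^2 - \tfrac{1}{2}T .
```

The extra dt term is the quadratic variation of Brownian motion showing up; it is what separates Ito calculus from ordinary calculus.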

Chapter 5 : Stochastic Differential Equations

SDEs for the following processes are described and solved:

  • Arithmetic Brownian Motion
  • Geometric Brownian Motion
  • Ornstein-Uhlenbeck SDE
  • Mean Reversion SDE
  • Mean Reversion with square root diffusion SDE
  • Coupled SDE

Solving an SDE, as they say is part art and part science. A good guess is all that is required sometimes. However the chapter tries to give a generic framework for SDEs that are a combination of Arithmetic BM and Geometric BM. The chapter ends with Martingale representation theorem that basically says that, “If Brownian motion is the only source of randomness, then a continuous martingale can be expressed as a driftless SDE driven by Brownian motion”. This is the heart of option pricing framework. MRT guarantees that the claim process can be replicated. However the thing to keep in mind is that it only guarantees replication, it does not tell you the exact hedge.
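To see what "solving" buys you, here is a small sketch (my own, with assumed parameter values) that compares the closed-form solution of the Geometric Brownian Motion SDE, S(T) = S(0) exp((mu - sigma^2/2) T + sigma W(T)), with a naive Euler step-by-step discretization driven by the same Brownian increments.

```python
import math
import random


def gbm_euler_vs_exact(s0, mu, sigma, T, n_steps, seed=0):
    """Return (Euler approximation, exact solution) of GBM at time T.

    Both values use the same simulated Brownian path, so the difference
    is pure discretization error.
    """
    rng = random.Random(seed)
    dt = T / n_steps
    s_euler = s0
    w = 0.0
    for _ in range(n_steps):
        dw = rng.gauss(0.0, math.sqrt(dt))
        # Euler step for dS = mu * S dt + sigma * S dW
        s_euler += mu * s_euler * dt + sigma * s_euler * dw
        w += dw
    s_exact = s0 * math.exp((mu - 0.5 * sigma ** 2) * T + sigma * w)
    return s_euler, s_exact


# Assumed illustrative parameters: 5% drift, 20% volatility, one year.
print(gbm_euler_vs_exact(100.0, 0.05, 0.2, 1.0, 1_000))
```

The two values agree closely for a fine grid, but the closed form needs only the terminal value of the Brownian motion, while the Euler scheme needs the whole path.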

Chapter 6 : Option Valuation

This chapter tries to do too many things: 1) the PDE approach to option valuation, 2) the risk neutral approach to option valuation, and 3) connecting the PDE with martingale pricing using Feynman-Kac. It is like squeezing 120 pages of crystal clear treatment by Shreve (in his book on stochastic calculus) into 20 pages. I think this chapter needs to be rewritten so that it can at least give a good direction to the option pricing framework.

Chapter 7 : Change of Probability

This chapter is written very well. It starts off with a change of measure for discrete random variable. For a random variable taking countable values, one can adjust the probabilities in such a way that you will be able to shift the first moment of the random variable. Subsequently, a change of measure is done for a standard normal variable to shift its mean. These two examples are followed by changing the measure for a Brownian motion using Girsanov transformation. Unlike the simple cases, one needs to rigorously prove that the Brownian motion after a measure change results in another Brownian motion with change in drift. Enough examples are given so that the reader gets a good idea about, “how to move from one measure to another?” The application of Radon Nikodym derivative is shown via Importance sampling, a technique to produce robust simulation results. The concept of equivalent measures is also touched upon towards the end.
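Here is a minimal sketch of importance sampling for a standard normal tail probability P(Z > c), which plain simulation handles badly because the event is rare. Samples are drawn from a normal shifted to mean c, and each hit is reweighted by the likelihood ratio exp(-c y + c^2/2), which is the Radon Nikodym derivative between the two normal laws. Function names and the choice c = 4 are my assumptions.

```python
import math
import random


def tail_prob_importance_sampling(c, n_samples, seed=0):
    """Estimate P(Z > c) for standard normal Z by sampling from N(c, 1)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_samples):
        y = rng.gauss(c, 1.0)  # draw under the shifted measure
        if y > c:
            # Likelihood ratio of N(0, 1) to N(c, 1) evaluated at y.
            total += math.exp(-c * y + 0.5 * c * c)
    return total / n_samples


def tail_prob_exact(c):
    """Exact P(Z > c) via the complementary error function."""
    return 0.5 * math.erfc(c / math.sqrt(2.0))


print(tail_prob_importance_sampling(4.0, 50_000), tail_prob_exact(4.0))
```

Under the original measure almost no draws would exceed 4, so the crude estimate would be mostly zeros; under the shifted measure about half the draws count, and the weights correct for the shift.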

Chapter 8 : Numeraire Pricing

The last chapter, like the chapter on option valuation, covers too much ground and hence falls flat. An enormous number of topics are touched upon in 20 pages and the chapter ends up not doing justice to any of them. Maybe the author meant it that way so that a curious reader can explore things in other books.


The title of the book appears daunting but the contents of the book are accessible to a pretty large audience. The math covered in this book does not require many prerequisites from the reader other than basic calculus and probability concepts. The background to all the discussion about Brownian motion is the option pricing application and hence can be read by most of the finance professionals who are looking to get a little deeper understanding of the math behind option pricing.


Steven Shreve’s books on Stochastic calculus (Volume I + Volume II) are amazing in terms of breadth. Basic intuition is built in Volume I using a discrete-time binomial asset pricing model. In Volume II, the author introduces all the concepts needed to build a financial model in continuous-time. In this post, I will try to summarize a few points from Volume II.

Chapter 1: Introduction

The most important mind shift that one needs to make when moving from the discrete-time case to the continuous-time case is that of an “uncountable outcome space”. This means that an intuitive understanding of probability is not enough. One needs to have a decent understanding of measure theory. The first and second chapters of the book serve as a crash course in measure theory.

Chapter 1 starts off by discussing two examples where the outcome space is uncountable. These examples show how one can create the event space from the outcome space. Once the event space is created, a measure is clipped on to it so that one moves from a measurable space to a measure space. Probabilities are assigned to sets rather than atoms in the case of an uncountable outcome space. Hence one needs to work with sets. There are some examples given that highlight the need to have a firm grasp of set theory. There are many complicated sets that have a probability but cannot be described explicitly. Hence set theory helps formulate the event of interest as a combination of simpler sets for which the probability is known. The key probability related concepts covered in Chapter 1 are random variables, distribution measure, CDF, the condition for equivalence of Riemann and Lebesgue integrals, the Monotone convergence theorem, the Dominated convergence theorem, the Law of the Unconscious Statistician (LOTUS), equivalent measures and change of measure. Obviously from a math-fin perspective, the most valuable section is the change of measure.

Enough explanation is given so that the reader understands that it is necessary to "separate the process from the measure". A process with an outcome space will have different distributions based on what measure is being applied. This concept is made specific with some examples like converting a normal random variable into a standard normal random variable using a measure change. A key tool for measure change is the Radon Nikodym derivative. To get an intuition behind this tool, I think it's better to review Volume I and then follow the content from this chapter.

Chapter 2: Information and Conditioning

It is very important to get the intuition right about the concepts such as sigma algebra generated by a variable, filtration, adapted stochastic process and conditional expectation. It is hard to appreciate these objects in an uncountable outcome space without seeing how they behave in a countable or in a simplified outcome space.

The chapter starts off with a three coin toss outcome world and gives the reader a good intuition about the sets of a sigma algebra. By using phrases such as "sets are resolved by the information", the reader gets a good idea about the meaning of filtration. In a typical undergrad setting, one does not need concepts such as sigma algebra as the outcome space is pretty much well known and you are trying to estimate something about a random variable that is defined on the entire outcome space. However things become murkier in the real world where you have random variables defined on partial information. The random variables themselves generate sigma algebras and you need to be comfortable working with them. Filtrations are key math objects that appear in defining an adapted stochastic process. The beauty of this chapter, as mentioned earlier, is that these concepts are explained using a three coin toss outcome space. You can clearly see the connections between various concepts.

The chapter explains the principles of independence, conditional expectation, the Markov property and martingales. I liked the way conditional expectation is explained in the chapter. Personally I have always found conditional expectation to be the toughest concept in probability theory, maybe because one needs to guess the variable and there is no well defined way to go about guessing. All you have is that conditional expectation should follow two properties. One of the properties is “partial averaging”. One must guess the random variable so that it satisfies the “partial averaging” property. This chapter lists all the necessary properties of conditional expectation.
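On a three coin toss space the guessing can be replaced by brute force, which helps build intuition. The sketch below is my own; the stock parameters (start at 4, double on heads, halve on tails, fair coin) are assumed for illustration. It computes the conditional expectation of the time-3 stock price given the first toss and then checks partial averaging on the set "first toss is heads".

```python
from itertools import product

# Assumed toy parameters for a three-period binomial stock.
S0, UP, DOWN = 4.0, 2.0, 0.5
P_HEAD = 0.5

OMEGA = ["".join(t) for t in product("HT", repeat=3)]  # 8 outcomes


def prob(omega):
    """Probability of one outcome such as 'HTH'."""
    heads = omega.count("H")
    return P_HEAD ** heads * (1.0 - P_HEAD) ** (len(omega) - heads)


def stock(omega):
    """Stock price after all tosses in omega."""
    s = S0
    for toss in omega:
        s *= UP if toss == "H" else DOWN
    return s


def cond_exp_given_first_toss(x):
    """E[x | first toss], returned as a dict keyed by 'H' and 'T'."""
    result = {}
    for first in "HT":
        atom = [w for w in OMEGA if w[0] == first]
        mass = sum(prob(w) for w in atom)
        result[first] = sum(x(w) * prob(w) for w in atom) / mass
    return result


ce = cond_exp_given_first_toss(stock)

# Partial averaging on A = {first toss is H}: both sums must agree.
lhs = sum(ce[w[0]] * prob(w) for w in OMEGA if w[0] == "H")
rhs = sum(stock(w) * prob(w) for w in OMEGA if w[0] == "H")
print(ce, lhs, rhs)
```

The conditional expectation is constant on each set resolved by the first toss, and its weighted average over such a set matches that of the original variable.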

Chapter 3: Brownian motion

The chapter starts off with a section on symmetric random walk and lists the properties of the object such as independence of increments, it being a martingale etc. It then uses a scaled version of symmetric random walk to illustrate the concept of quadratic variation. The intention behind introducing scaled random walk is that it converges in distribution to Brownian motion and thus is a nice way to look at a discrete process that can generate Brownian motion.
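A scaled symmetric random walk is easy to build, and its quadratic variation can be checked directly. In this sketch (function names and parameters are my assumptions) each of the n*t steps has size 1/sqrt(n), so the sum of squared steps up to time t equals t on every path, no matter how the coins fall.

```python
import math
import random


def scaled_random_walk(n, t, seed=0):
    """Path of the scaled symmetric random walk with n steps per unit time."""
    rng = random.Random(seed)
    n_steps = int(n * t)
    path = [0.0]
    for _ in range(n_steps):
        step = 1.0 if rng.random() < 0.5 else -1.0
        path.append(path[-1] + step / math.sqrt(n))
    return path


def quadratic_variation(path):
    """Sum of squared increments along the path."""
    return sum((b - a) ** 2 for a, b in zip(path, path[1:]))


walk = scaled_random_walk(n=100, t=2.0)
print(len(walk), quadratic_variation(walk))
```

This path-by-path property survives in the limit, which is why Brownian motion accumulates quadratic variation at rate one per unit time.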

To keep continuity with Volume I of the book where binomial asset pricing model is dealt, the chapter uses the limit of a binomial process to illustrate log normal distribution, the most common assumption for the distribution of stock prices. Given this prelude, Brownian motion is formally defined and properties of univariate and multivariate Brownian motion are given. The fact that Brownian motion has a Gaussian distribution at its core gives the flexibility of defining a Brownian motion in terms of moment generating functions, or mean and covariance matrix or in terms of independent increments and their distributions. The section on Quadratic variation explores the peculiar behavior of Brownian motion as compared to other random processes. Since the process is nowhere differentiable, one sees that it has a quadratic variation property. This property is the reason why one must learn about Ito’s calculus. There is a little section that shows that by assuming GBM, one can compute the volatility of the asset based on a sample path. Brownian motion is a flexible object because it is a Martingale as well as a Markov process. The Markov property is especially useful to compute the expectation for a variety of functions that are dependent on the Brownian motion.

One learns the importance of first passage time in a discrete Markov chain setting where it can be used to classify states as null recurrent or positive recurrent state. In the case of Brownian motion, one can guess that it is a null recurrent process as it is nothing but a scaled symmetric random walk, where the latter is a null recurrent chain. This guess is made rigorous by introducing exponential martingale that contains a Brownian motion. One can use optional sampling theorem on this martingale and come up with the result that exponential martingale stopped at first passage time is still a martingale. This fact can be used in a beautiful way to compute the hitting time probability of Brownian motion and the transition density of the Brownian motion. These properties of the first passage time are rederived using reflection principle.
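For reference, these are the standard results this argument leads to, written in notation I am assuming: \tau_m is the first time a Brownian motion W hits the level m > 0, and \sigma, \alpha > 0 are constants.

```latex
Z(t) = \exp\!\Bigl(\sigma W(t) - \tfrac{1}{2}\sigma^2 t\Bigr)
\ \text{is a martingale},
\qquad
\mathbb{E}\bigl[e^{-\alpha \tau_m}\bigr] = e^{-m\sqrt{2\alpha}},
\qquad
\mathbb{P}(\tau_m \le t) = 2\,\mathbb{P}\bigl(W(t) \ge m\bigr)
  = \frac{2}{\sqrt{2\pi}} \int_{m/\sqrt{t}}^{\infty} e^{-y^2/2}\,dy .
```

The middle identity comes from stopping the exponential martingale at \tau_m; the last one is the reflection principle. Letting \alpha go to zero shows the level is hit with probability one, while differentiating in \alpha shows the expected hitting time is infinite.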

An introduction to any stochastic process must not overwhelm the reader, and this chapter gets that balance right. It gives the right amount of math to start working with the process. Thankfully construction from first principles is left out. If you have worked on continuous-time Markov chains with a discrete state space, you can appreciate the transition density concepts in a better way. In a CTMC, you assume that the holding times are exponentially distributed with a parameter that is dependent on the state and hence one can talk about transition probabilities. Since Brownian motion is a continuous state, continuous time process, you talk about transition density instead of transition probabilities. The other thing that one needs to appreciate is the Markov property of Brownian motion. It is extremely useful for simulating various types of sample paths and computing various aspects of the sample path. For example, for an option that depends on the maximum of Brownian motion, one can use the Markov property to get the conditional density of the maximum of Brownian motion given the value of the Brownian motion at a specific time.

Chapter 4: Stochastic Calculus

This chapter introduces many concepts of stochastic calculus such as the Ito integral, Ito processes and Ito’s lemma. The Ito integral is defined for simple integrands and its main properties such as mean, variance, quadratic variation and the martingale property are explored. Subsequently the Ito integral for general integrands is introduced and the following properties of the general Ito integral are explored:

  • Continuity
  • Adaptivity
  • Linearity
  • Martingale
  • Ito’s Isometry
  • Quadratic variation

The chapter contains a thorough introduction to Ito’s lemma for various stochastic processes. Numerous examples are given so that a reader is comfortable in applying Ito’s lemma to Brownian motion, functions of Brownian motion, Ito processes etc. The Black Scholes PDE is derived assuming that a replicating portfolio exists. This might be a little odd for someone who has not come across the replication argument. Why should there be a hedge? This is dealt with in the chapter on risk neutral pricing. In any case once you assume a self replicating portfolio, you can equate the stochastic component and time component of the SDEs to get the Black Scholes PDE. Levy’s characterizations for univariate and multivariate Brownian motions are given. The chapter ends with a section on the Brownian bridge that is useful for Monte Carlo simulation.

The takeaway from this chapter is that Ito integrals are evaluated in three steps. The first step involves finding a sequence of non-anticipating functions that converges to the integrand in the mean square sense. The second step involves formulating the discrete stochastic integral of each non-anticipating function. The final step involves taking the limiting value of the discrete stochastic integrals to arrive at the Ito integral.

Chapter 5 – Risk Neutral Pricing

The section starts with the most important process, the Radon Nikodym derivative variable that is relevant to risk neutral pricing. This is denoted by Z and it plays a key role in changing the measure of a random variable. If you have a normal random variable with a constant mean, using Z, one can change the measure so that the variable is a standard normal under the new measure. From a computational perspective, Radon Nikodym derivative is used to swap between the real world measure and risk neutral measure for calculating the expectations. As far as changing the measure on a stochastic process, you need much more than a simple variable, you need a process to do that job. This is precisely done by manufacturing a Radon Nikodym process by conditioning on the filtration.

The chapter introduces the one dimensional Girsanov theorem that is very useful to change the measure for a Brownian motion. If you take a Brownian motion, any measure change can only change the drift component. The volatility of the original process remains the same as volatility determines the possible price paths and any measure change does not interfere with the price paths. It changes the likelihood of the price paths. BTW, these measures, old and new, go under the name “equivalent measures”. However if the original process becomes a martingale by changing the measure, then the new measure is called an “equivalent martingale measure”. Expressions for GBM and discounted GBM are given under the risk neutral measure. The chapter then talks about the Martingale representation theorem that basically says that if you have two martingales with respect to the same measure, you can manufacture one from the other.
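In symbols, the one dimensional statement looks like this. The notation is assumed: \Theta is an adapted process, W is a Brownian motion under \mathbb{P}, and the stock has drift \alpha, volatility \sigma and the money market rate is r.

```latex
Z(t) = \exp\!\Bigl(-\int_0^t \Theta(u)\,dW(u) - \tfrac{1}{2}\int_0^t \Theta^2(u)\,du\Bigr),
\qquad
\widetilde{W}(t) = W(t) + \int_0^t \Theta(u)\,du,
\qquad
\frac{d\widetilde{\mathbb{P}}}{d\mathbb{P}} = Z(T).

\text{With } \Theta = \frac{\alpha - r}{\sigma}: \quad
dS(t) = \alpha S(t)\,dt + \sigma S(t)\,dW(t)
      = r S(t)\,dt + \sigma S(t)\,d\widetilde{W}(t).
```

Under \widetilde{\mathbb{P}} the process \widetilde{W} is a Brownian motion (subject to the usual integrability condition on \Theta). Only the drift moves from \alpha to r; the \sigma term is untouched, matching the remark that a measure change cannot alter volatility.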

For some reason, I think this chapter should have had a clear description right at the beginning of the chapter, about the need for understanding measure change, martingale etc. I love the presentation in the book by Baxter and Rennie who give the three step procedure to find the option value, right at the very beginning :

  • Find a measure so that discounted stock price is a martingale. Here is where one can use Girsanov
  • Form a martingale process involving the claim of the relevant derivative
  • Use Martingale representation theorem to guarantee a self replicating portfolio

A portfolio with long stock and long money market account is a martingale under risk neutral measure. Since the discounted stock price is a martingale, you can manufacture a replicating portfolio using Martingale representation theorem. One way to understand Martingale Representation theorem is

  • In the big bad world, there is P measure
  • You can use Girsanov’s theorem to change to any measure. One can use the theorem to see to it that discounted stock price is a Martingale
  • Once you are in this world where discounted stock price is a martingale, it means that the discounted wealth equation of long stock and long money markets is also a Martingale
  • You can manufacture a process from the claim by conditioning on the sigma algebra.
  • You have now two martingales, one from the claim process, and one from the wealth equation.
  • You can use Martingale representation theorem to manufacture both the above processes from discounted stock price equation. Why? Both processes are martingales and hence you can merely scale and shift the discounted price martingale to manufacture other martingales
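The end product of these steps is the risk neutral pricing formula: the option value is the discounted expectation of the payoff under the measure that makes the discounted stock a martingale. Here is a small sketch of that for a European call, under assumptions of my own (constant rate and volatility, the parameter values, the function names). It compares a Monte Carlo average taken under the risk neutral drift r with the Black Scholes closed form.

```python
import math
import random


def norm_cdf(x):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def bs_call(s0, k, r, sigma, T):
    """Black Scholes price of a European call."""
    d1 = (math.log(s0 / k) + (r + 0.5 * sigma ** 2) * T) / (sigma * math.sqrt(T))
    d2 = d1 - sigma * math.sqrt(T)
    return s0 * norm_cdf(d1) - k * math.exp(-r * T) * norm_cdf(d2)


def mc_call(s0, k, r, sigma, T, n_paths, seed=0):
    """Discounted average payoff with the stock drifting at r, not at alpha."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_paths):
        z = rng.gauss(0.0, 1.0)
        s_T = s0 * math.exp((r - 0.5 * sigma ** 2) * T + sigma * math.sqrt(T) * z)
        total += max(s_T - k, 0.0)
    return math.exp(-r * T) * total / n_paths


# Assumed illustrative contract: at-the-money one-year call.
print(bs_call(100.0, 100.0, 0.05, 0.2, 1.0),
      mc_call(100.0, 100.0, 0.05, 0.2, 1.0, 100_000))
```

Note that the real world drift of the stock never appears; Girsanov has already swapped it for r.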

The chapter then uses the Multidimensional Girsanov theorem and the Multidimensional Martingale representation theorem to state two fundamental theorems of asset pricing. The first states that if a market model has a risk-neutral probability measure, then it does not admit arbitrage. The second theorem is about the uniqueness of the risk neutral measure. The chapter concludes by using the risk neutral framework for valuing options on stocks that pay continuous dividends, stocks that pay discrete dividends, forwards and futures.

Chapter 6 – Connections with Partial Differential Equations

This chapter gives the four-step procedure for finding the pricing differential equation and for constructing a hedge for a derivative security. They are

  • Determine the variables on which the derivative security price depends. In addition to time t, these are the underlying asset price S(t) and possibly other stochastic processes. We call these stochastic processes the state processes. One must be able to represent the derivative security payoff in terms of the state processes
  • Write down a system of SDEs for the state processes. Be sure that, except for the driving Brownian motions, the only random processes appearing on the right hand side of these equations are the processes themselves. This ensures that the vector of state processes is Markov
  • The Markov property guarantees that the derivative security price at each time is a function of time and the state processes at that time. The discounted option price is a martingale under the risk neutral measure. Compute the differential of the discounted option price, set the dt term equal to 0, and obtain thereby a PDE.
  • The terms multiplying the Brownian motion differentials in the discounted derivative security price differential must be matched by the terms multiplying the Brownian motion differentials in the evolution of the hedging portfolio.
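As a worked instance of the four steps, take a European claim on a single stock following GBM, with price c(t, S(t)). The notation is assumed: r is the interest rate, \sigma the volatility, \widetilde{W} the risk neutral Brownian motion, and X(t) the value of a hedging portfolio holding \Delta(t) shares.

```latex
d\bigl(e^{-rt} c(t, S(t))\bigr)
  = e^{-rt}\Bigl[-r c + c_t + r S(t)\, c_x + \tfrac{1}{2}\sigma^2 S(t)^2 c_{xx}\Bigr] dt
  + e^{-rt}\,\sigma S(t)\, c_x \, d\widetilde{W}(t)

\text{dt term} = 0: \qquad
c_t(t,x) + r x\, c_x(t,x) + \tfrac{1}{2}\sigma^2 x^2 c_{xx}(t,x) = r\, c(t,x)

d\bigl(e^{-rt} X(t)\bigr) = \Delta(t)\,\sigma\, e^{-rt} S(t)\, d\widetilde{W}(t)
\;\Longrightarrow\; \Delta(t) = c_x\bigl(t, S(t)\bigr).
```

Setting the dt term to zero gives the Black Scholes PDE, and matching the Brownian terms gives the delta hedge.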

Chapter 7 – Exotic Options

This chapter contains the pricing for three kinds of exotic options:

  • Barrier options: PDE approach and risk neutral approach are described. In the risk neutral expectation evaluation, the joint density of the maximum of Brownian motion and Brownian motion is used to derive a closed form solution. To make the computations easy, another change of measure is done from the risk neutral measure so that the constant drift term is also removed. So, in all there are two changes of measure that take place in the evaluation of barrier options: the first is from the real world to the risk neutral world, the second is from the risk neutral world to a world that makes computations even more convenient. PDE pricing is done using stopping times and the optional sampling theorem. Out of the two approaches, the PDE approach is more appealing to me.
  • Lookback options: Floating strike case is analyzed where the lookback option is priced using PDE approach as well as risk neutral approach. In using the PDE approach, there is an extra dY term that is not like dW or dt. This differential gives rise to a new boundary condition. PDE approach looks elegant as compared to several pages of ink wasted on deriving a closed form solution
  • Asian options: The PDE approach has a twist here. One has to introduce a new state variable so that the pair of processes involving the stock price and the new process constitute a two dimensional Markov process. The PDE looks similar but the only change is a new boundary condition. This option has no closed form solution and hence the risk neutral expectation approach is not explored. Instead a numeraire based approach is given. Numeraire based option valuation is a powerful way of thinking about option valuation. The advantage of learning and understanding this approach is that it can be applied to a larger universe of derivative securities.

Chapter 9 – Change of Numeraire

This chapter deals with the numeraire, the unit of account in which other assets are denominated. What’s the advantage of valuing assets in terms of a numeraire? Well, firstly there could be financial considerations where the claim forces one to value it in terms of a different currency, i.e. a different numeraire. More often, the numeraire approach is taken for ease of modeling. A model can be complicated or simple depending on the choice of numeraire. One must keep in mind that the risk neutral measure changes as soon as the assets are accounted for in a different numeraire. Hence one might have to change the measure again so that risk neutrality is preserved. One massive advantage is that this change of measure arising from the numeraire has a very appealing Radon Nikodym derivative process. It turns out that the discounted numeraire, normalized by its initial value, is the Radon Nikodym derivative process.

This simplifies many computations. The chapter shows applications of three numeraires

  • Domestic money market account
  • Foreign money market account
  • A zero-coupon bond maturing at time T (the associated measure is called the T-forward measure)

In the case of the first two numeraires, appropriate measures are computed whereby the relevant quantities become martingales. For example a stock and a foreign money market account that are valued according to the domestic money market numeraire have a domestic risk neutral measure under which the two assets are martingales. It is important to find out the martingale measure so that one can use the Martingale representation theorem to create a self-replicating portfolio. In the same manner, the domestic money market account and stock valued according to the foreign money market numeraire have a foreign risk neutral measure.
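The Radon Nikodym derivative process attached to a numeraire can be written down in one line. The notation is assumed: N(t) is the price of the numeraire asset, D(t) the discount process, \widetilde{\mathbb{P}} the domestic risk neutral measure and S any traded asset.

```latex
Z(t) = \frac{D(t)\,N(t)}{N(0)},
\qquad
\mathbb{P}^{(N)}(A) = \frac{1}{N(0)} \int_A D(T)\,N(T)\, d\widetilde{\mathbb{P}},
\qquad
\frac{S(t)}{N(t)} \ \text{is a martingale under } \mathbb{P}^{(N)} .
```

Choosing N to be the foreign money market account converted to domestic currency gives the foreign risk neutral measure, and choosing the zero-coupon bond maturing at T gives the T-forward measure.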

Siegel’s exchange rate paradox is explained very well using the domestic risk neutral measure and the foreign risk neutral measure. The chapter ends with valuing an option under a random interest rate environment using the forward measure. There is an exercise problem on the quanto option that illustrates the power of the numeraire in valuing options. A quanto option pays off in one currency the price in another currency of an underlying asset without taking currency conversion into account. For example a quanto call on a British asset struck at $25 would pay $5 if the price of the asset upon expiration of the option is £30. To address this problem, you take the asset with GBM and divide by the exchange rate to get the price in the foreign currency. You then show that the price process is a GBM too and then value quanto options.

Chapter 11 – Jump Processes

If one has to introduce jumps into the price process, one of the elementary ways that is analytically tractable is the Poisson process. Some preliminary background is provided for the reader so that Poisson and compound Poisson processes can be incorporated into derivative modeling. The Poisson process is appealing for many reasons, one of them being that it is memoryless. Poisson processes are characterized by exponentially distributed interarrival times, or by gamma distributed arrival times, or as a counting process. A homogeneous Poisson process has stationary and independent increments. Typically one comes across all these properties of Poisson processes in any elementary text on probability. A variant of the basic Poisson process that is relevant from a derivative pricing perspective is the compensated Poisson process. This process is a martingale and, like everywhere else in math fin, martingales are cherished objects for risk neutral valuation.

The Poisson process is too simplistic for financial markets. The most basic variant of the Poisson process is the compound Poisson process, which allows for random jump sizes. These random jumps are IID and are independent of the Poisson process. A compensated compound Poisson process is defined and is shown to be a martingale. One of the classic ways to look at a compound Poisson process with finitely many possible jump sizes is by using the superposition principle. If you consider a time interval, one can define several Poisson processes, each with a fixed jump size, whose intensities are proportional to the intensity of the original process. This decomposition of the original compound Poisson process into multiple Poisson processes of fixed jump size is an analytical convenience. Many problems can be solved by this property of splitting and merging Poisson processes.
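Here is a minimal simulation sketch of a compound Poisson process and its compensator. The intensity, horizon, jump distribution and function names are all my assumptions. Arrival times are generated from exponential interarrival times, and the compensated value Q(T) - beta * lambda * T, where beta is the mean jump size, should average out to roughly zero across paths.

```python
import random


def compound_poisson(lam, T, jump_sampler, rng):
    """Value at time T of a compound Poisson process with intensity lam."""
    t = rng.expovariate(lam)  # first arrival time
    total = 0.0
    while t <= T:
        total += jump_sampler(rng)  # add one IID jump per arrival
        t += rng.expovariate(lam)   # exponential interarrival time
    return total


def mean_compensated(lam, T, beta, jump_sampler, n_paths, seed=0):
    """Average of Q(T) - beta * lam * T over simulated paths."""
    rng = random.Random(seed)
    acc = 0.0
    for _ in range(n_paths):
        acc += compound_poisson(lam, T, jump_sampler, rng) - beta * lam * T
    return acc / n_paths


def normal_jump(rng):
    """Assumed jump law: normal with mean 0.5 and standard deviation 1."""
    return rng.gauss(0.5, 1.0)


print(mean_compensated(lam=3.0, T=2.0, beta=0.5,
                       jump_sampler=normal_jump, n_paths=20_000))
```

Passing a sampler that always returns 1 recovers the plain Poisson process and its compensator lambda * T.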

In a pure Black Scholes world the delta hedge position needs to be integrated with respect to dW, the Brownian motion process. In the case of a jump diffusion process, the integrator is a process that has a pure jump process component and a Brownian motion component. The chapter deals with processes with finitely many jumps in any time interval. Obviously with this new component in the price process, one needs tools to work with. One of the first techniques that need to be learnt is the application of Ito’s lemma for a process with jumps. Since the jump process and Brownian motion are independent, Ito’s lemma for a jump process looks very similar to Ito’s lemma defined for a Brownian motion functional. The only extra term is the one that captures the jump behavior over an interval. Stochastic integral of a function with respect to jump process is defined and it has most of the structure of a general Ito process. Obviously there are a few restrictions on the integrand so that stochastic integral makes sense. Quadratic variation of the process also changes as there is a jump component. Again since the jump component is independent, the quadratic variation term arising from Brownian integrator tags along with the quadratic variation of the jump process. Ito’s lemma for multiple jump processes is also mentioned. Perhaps the most challenging section of the chapter is the one on risk neutral measure. A change of measure for a simple Poisson process affects the intensity of the process. A change of measure for a compound Poisson process affects the intensity and the distribution of the jump sizes. In each case, an equivalent of Girsanov theorem is stated for changing the measure. Detailed explanation and derivations are given for change of measure for a homogeneous Poisson, compound Poisson, compound Poisson + Brownian motion. The associated Radon Nikodym derivatives are also provided. The chapter ends with pricing a call option under jump diffusion process.
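For reference, the Radon Nikodym derivative processes for the two pure jump cases take the following standard form. The notation is assumed: N(t) is a Poisson process with intensity \lambda under \mathbb{P}, the jumps Y_i have density f, and the tilded quantities are the targets under the new measure.

```latex
\text{Poisson:}\qquad
Z(t) = e^{(\lambda - \tilde{\lambda})t}
       \Bigl(\frac{\tilde{\lambda}}{\lambda}\Bigr)^{N(t)}

\text{Compound Poisson:}\qquad
Z(t) = e^{(\lambda - \tilde{\lambda})t}
       \prod_{i=1}^{N(t)} \frac{\tilde{\lambda}\,\tilde{f}(Y_i)}{\lambda\, f(Y_i)}
```

Under the measure defined by Z(T), the intensity becomes \tilde{\lambda} and, in the compound case, the jump density becomes \tilde{f}. This is the jump analogue of Girsanov: the measure change moves the intensity and the jump distribution rather than the drift.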



This book provides a clear exposition of all the concepts relating to the stochastic calculus that are needed for understanding advanced continuous-time models.


“Big Data” has entered everyone’s vocabulary, thanks to the wild success of a few companies that have used data to provide valuable information and services. This book gives a bird’s eye view of the emerging field.

The book starts off with an interesting example of the way Google predicted the spread of flu in real time after analyzing two datasets, first one containing 50 million most common terms that Americans type and second one containing the data on the spread of seasonal flu from public health agency. Google did not start with a hypothesis, test a handful models and pick one amongst them. Instead Google tested a mammoth 450 million different mathematical models in order to test the search terms, comparing their predictions against the actual flu cases. They used this model when H1N1 crisis struck in 2009 and it gave more meaningful and valuable real time information than any public health official system.

There is no formal definition of the term “Big Data”. Informally it means the ability of society to harness information in novel ways to produce useful insights or goods and services of significant value. It is estimated that the total amount of stored information in the world is close to 1200 exabytes (1 exabyte = 1 billion GB). Only about 2% of it is analog. So, after the basic digitization efforts, the next game changer, the book predicts, is going to be “Big Data”. At its core, big data is about predictions. Though it is often described as part of the branch of computer science called “machine learning”, this characterization is misleading. Big Data is not about trying to “teach” a computer to “think” like humans. Instead, it’s about applying math to huge quantities of data in order to infer probabilities.

The book talks about the three shifts that are happening in the way we analyze information :

  1. “N = all” : Gone are the days when you had constraints on getting the entire dataset. In the “small data” world, one could start with a few hypotheses, employ stats to get the right sample from the population, employ estimation theory to find the right statistic to summarize data, and then draw conclusions. This procedure is becoming irrelevant, at least a predominant part of it. Once you are dealing with the entire dataset, there is a richer structure of data available. The variations amongst subcategories can be studied to one’s heart’s content.
  2. Messy: Gone are the days when you had to crave exact datasets. Data nowadays is stored across multiple servers and platforms and is inherently messy. Instead of trying to structure it to fit a relational database, database technologies are evolving so that “no structure” is the core philosophy. NoSQL, Hadoop, MapReduce, etc. are a testimony to the fact that database technology has undergone a radical change. With messy data comes the advantage of breadth. Simple models with lots of data are performing better than elaborate models with less data. One of the examples mentioned in this context is the grammar checker in MS Word. Instead of spending effort on developing more efficient algos than those already available, the guys at MSFT decided to focus on building a large corpus. This shift in thinking dramatically increased the effectiveness of the algos. Simple algos were performing much better than complicated ones, with a large corpus of words as ammunition. Google has taken grammar check to a completely different level by harnessing big data.
  3. Correlations : “What” is important and matters more than “Why”. One can bid goodbye to causality once one has huge datasets. Correlations that are notoriously unstable in small datasets can provide excellent patterns when analyzing big data. Nonlinear relationships can be culled out. Typically nonlinear relationships have more parameters to estimate and hence the data needed to make sense of these parameters becomes huge. Also, the parameters have high standard errors in the “small data” world. In the “Big Data” world, “N=all” means that the parameters will tend to show stability. Does this mean the end of theory? Not really. Big data itself is founded on theory. For instance, it employs statistical theories and mathematical ones, and at times uses computer science theory, too. Yes, these are not theories about the causal dynamics of a particular phenomenon like gravity, but they are theories nonetheless. Models based on them hold very useful predictive power. In fact, big data may offer a fresh look and new insights precisely because it is unencumbered by the conventional thinking and inherent biases implicit in the theories of a specific field.

The book goes on to describe various players in the Big Data Value chain

  • Data Holders : They may not have done the original collection, but they control access to information and use it themselves or license it to others who extract its value.
  • Data Specialists : companies with the expertise or technologies to carry out complex analysis
  • Companies and individuals with big data mindset : Their strength is that they see opportunities before others do—even if they lack the data or the skills to act upon those opportunities. Indeed, perhaps it is precisely because, as outsiders, they lack these things that their minds are free of imaginary prison bars: they see what is possible rather than being limited by a sense of what is feasible.

Who holds the most value in the big-data value chain? According to the authors,

Today the answer would appear to be those who have the mindset, the innovative ideas. As we saw from the dotcom era, those with a first-mover advantage can really prosper. But this advantage may not hold for very long. As the era of big data moves forward, others will adopt the mindset and the advantage of the early pioneers will diminish, relatively speaking.

Perhaps, then, the crux of the value is really in the skills? After all, a gold mine isn’t worth anything if you can’t extract the gold. Yet the history of computing suggests otherwise. Today expertise in database management, data science, analytics, machine-learning algorithms, and the like are in hot demand. But over time, as big data becomes more a part of everyday life, as the tools get better and easier to use, and as more people acquire the expertise, the value of the skills will also diminish in relative terms. Similarly, computer programming ability became more common between the 1960s and 1980s. Today, offshore outsourcing firms have reduced the value of programming even more; what was once the paragon of technical acumen is now an engine of development for the world’s poor. This isn’t to say that big-data expertise is unimportant. But it isn’t the most crucial source of value, since one can bring it in from the outside.

Today, in big data’s early stages, the ideas and the skills seem to hold the greatest worth. But eventually most value will be in the data itself. This is because we’ll be able to do more with the information, and also because data holders will better appreciate the potential value of the asset they possess. As a result, they’ll probably hold it more tightly than ever, and charge outsiders a high price for access. To continue with the metaphor of the gold mine: the gold itself will matter most.

What skills are needed to work in this Big Data world?

Mathematics and statistics, perhaps with a sprinkle of programming and network science, will be as foundational to the modern workplace as numeracy was a century ago and literacy before that. In the past, to be an excellent biologist one needed to know lots of other biologists. That hasn’t changed entirely. Yet today big-data breadth matters too, not just subject-expertise depth. Solving a puzzling biological problem may be as likely to happen through an association with an astrophysicist or a data-visualization designer.

The book ends with chapters on risks and control, where the authors cover a variety of issues that will have to be dealt with in the “Big Data” world. In trying to explain the field, the book gives a ton of examples. Here are some that I found interesting :

  • Google Flu trends – Fitting half a billion models to cull out 45 variables that detect the spread of flu.
  • Entire dataset of sumo wrestling results analyzed by the Freakonomics authors to cull out interesting patterns
  • Farecast, a site that helps predict the direction of air fares over different routes
  • Hadoop : Open source alternative to Google’s MapReduce, a system to handle gigantic datasets
  • Recaptcha : Instead of typing in random letters, people type two words from text-scanning projects that a computer’s optical character-recognition program couldn’t understand. One word is meant to confirm what other users have typed and thus is a signal that the person is a human; the other is a new word in need of disambiguation. To ensure accuracy, the system presents the same fuzzy word to an average of five different people to type in correctly before it trusts it’s right. The data had a primary use—to prove the user was human—but it also had a secondary purpose: to decipher unclear words in digitized texts.


  • 23andMe – DNA sequencing using a Big Data mindset
  • Billion Prices project : A project that scours web for price information and gives an indication of CPI real time. This kind of information is crucial for policy makers.
  • ZestFinance – Its technology helps lenders decide whether or not to offer relatively small, short-term loans to people who seem to have poor credit. Yet where traditional credit scoring is based on just a handful of strong signals like previous late payments, ZestFinance analyzes a huge number of “weaker” variables. In 2012 it boasted a loan default rate that was a third less than the industry average. But the only way to make the system work is to embrace messiness.
  • Endgame cracked : Chess endgames with six or fewer pieces on the board have been completely solved. In these positions there is no way a human can outsmart a computer.
  • NY city manholes problem solved using Big Data thinking.
  • Nuance made a blunder while licensing technology to Google for the service GOOG-411, used for local search listings. Google retained the voice translation records and reused the data in a whole gamut of services.
  • Flyontime : Visitors to the site can interactively find out (among many other correlations) how likely it is that inclement weather will delay flights at a particular airport. The website combines flight and weather information from official data sources that are freely available and accessible through the Internet.
  • Decide : Price-prediction engine for zillions of consumer products.
  • Prismatic : Prismatic aggregates and ranks content from across the Web on the basis of text analysis, user preferences, social-network-related popularity, and big-data analytics. Importantly, the system does not make a big distinction between a teenager’s blog post, a corporate website, and an article in the Washington Post: if the content is deemed relevant and popular (by how widely it is viewed and how much it is shared), it appears at the top of the screen.
  • Zynga : “We are an analytics company masquerading as a gaming company. Everything is run by the numbers,” says a Zynga executive.


Big data is a resource and a tool. It is meant to inform, rather than explain; it points us toward understanding, but it can still lead to misunderstanding, depending on how well or poorly it is wielded. This book gives a good overview of the major drivers that are taking us towards the Big Data world.


The author of this book, Anatoly B. Schmidt, has written a book on Market Microstructure and an introductory book on Quant Fin for Physicists 

Schmidt is a Physics PhD from Russia who moves to NY and joins the quant-bandwagon on Wall Street. From a person like that one might expect that his autobiography would contain war stories about building-tweaking-testing-implementing models. Unfortunately this book mentions none whatsoever. Instead the author writes about his affairs with various women and his failed marriage. Utterly useless book.


The book begins with a brief note that highlights the difference between “expectation pricing” and “arbitrage pricing”. It gives the example of a bookmaker, someone who takes bets on horses. The bookmaker can always stay in business by setting odds based on the money at stake, rather than on the actual probabilities. If the bookmaker does a statistical analysis of horse performances, track conditions, historical data, etc. and then sets the odds, there is always a possibility of a huge loss and of getting wiped out. If the odds are quoted based on the amounts bet on the various horses, then whatever the outcome, he can always stay in business. This one little page in the preface illustrates the powerful technique used to price financial instruments, i.e. “arbitrage pricing”.
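The bookmaker argument is easy to verify with a few lines of arithmetic. The following is a minimal sketch with made-up stakes, probabilities and a hypothetical 5% margin; the function names are mine and the numbers are not claimed to be the book's.

```python
def stake_based_payouts(stakes, margin=0.05):
    """Payout per unit staked (stake included), set from the money at stake.

    Whichever horse wins, the bookmaker pays out (1 - margin) * total pool.
    """
    total = sum(stakes.values())
    return {horse: (1.0 - margin) * total / stake
            for horse, stake in stakes.items()}


def probability_based_payouts(probabilities):
    """'Fair' payout per unit staked, set from estimated win probabilities."""
    return {horse: 1.0 / p for horse, p in probabilities.items()}


def bookmaker_profit(stakes, payouts, winner):
    """Money collected minus money paid to backers of the winning horse."""
    return sum(stakes.values()) - stakes[winner] * payouts[winner]


if __name__ == "__main__":
    stakes = {"A": 5000.0, "B": 10000.0}   # illustrative amounts bet
    probs = {"A": 0.25, "B": 0.75}         # illustrative "true" probabilities
    for winner in stakes:
        by_stakes = bookmaker_profit(stakes, stake_based_payouts(stakes), winner)
        by_probs = bookmaker_profit(stakes, probability_based_payouts(probs), winner)
        print(winner, by_stakes, by_probs)
```

With stake-based odds the profit is the same whichever horse wins. With probability-based odds the bookmaker gains in one outcome and loses heavily in the other, which is exactly the risk of ruin the preface warns about.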

Chapter 1

The first chapter is a shocker for someone coming from a pure statistics world, where the expectation of a random variable is like a biblical term in the context of an “unbiased estimate of the random variable”. The expectation of a random variable is good, but there is a problem if you apply it directly to financial instruments. The author takes the simple example of a forward contract that is priced at K via the expectation of the stock price with respect to the real world measure, i.e. by evaluating the expectation under a log normal distribution of stock prices. This sounds reasonable, as one might think the expected value of the stock price should be equal to the forward price. But this kind of logic for pricing forwards is useless. There is another, more powerful force that is a feature of markets: “arbitrage”. A forward contract payoff can be replicated by going long the stock and borrowing cash from the money market. The forward price should be based on this replicating portfolio. With any other price quote, there will be an arbitrage. Hence, SLLN (Strong Law of Large Numbers) or expectation based pricing is not wrong, but it is not enforceable. This is the case with any instrument in finance that can be replicated. You can’t use the strong law and expectation to price them. The takeaway from this chapter is: if there is an arbitrage price, any other price is too dangerous to quote.
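Here is a small sketch of the replication argument for the forward. It assumes a constant, continuously compounded interest rate `r` and a stock that pays no dividends; the function names are my own. The point is that the profit from a mispriced forward does not depend on where the stock ends up.

```python
import math


def arbitrage_free_forward(s0, r, T):
    """Forward price implied by replication: buy the stock with borrowed cash."""
    return s0 * math.exp(r * T)


def cash_and_carry_profit(s0, r, T, quoted_k, s_T):
    """Profit at time T from selling a forward at quoted_k and hedging it.

    Time 0: borrow s0, buy one share, sell one forward struck at quoted_k.
    Time T: hold the share (worth s_T), settle the forward, repay the loan.
    """
    stock_leg = s_T                      # value of the share held
    forward_leg = quoted_k - s_T         # payoff of the short forward
    loan_leg = -s0 * math.exp(r * T)     # repayment of the borrowed cash
    return stock_leg + forward_leg + loan_leg


if __name__ == "__main__":
    fair_k = arbitrage_free_forward(100.0, 0.05, 1.0)
    for s_T in (50.0, 100.0, 150.0):
        # Same profit in every scenario: quoted_k - fair_k
        print(s_T, cash_and_carry_profit(100.0, 0.05, 1.0, 110.0, s_T), 110.0 - fair_k)
```

If the quoted price is below the fair one, the mirror image trade (short the stock, lend the cash, buy the forward) locks in the difference instead.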

Chapter 2

The chapter starts off with a discrete price process where the stock behaves like a coin toss. Using a one period binomial model, the author shows that the price of the option ought to be the time 0 value of a replicating portfolio of stocks and bonds. With any other price quote, there will be an arbitrage. The key ideas one needs to understand are:

  • The discounted price process is a martingale under the risk neutral probabilities (in general it is not one under the actual probabilities)
  • The claim process is a martingale on the tree
  • The fact that there are two martingales under the same measure means that one can produce one random process from the other (this is the Binomial representation theorem). This representation guarantees a replicating strategy.
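A one period example makes these ideas concrete. The sketch below is my own illustration (hypothetical function names, continuously compounded rate `r` over a period of length `dt`, bond worth 1 at time 0): it solves for the stock and bond holdings that replicate a claim and checks that the cost of the portfolio equals the discounted expectation under the risk neutral probability.

```python
import math


def one_period_replication(s0, s_up, s_down, r, dt, f_up, f_down):
    """Holdings (phi in stock, psi in bond) that replicate the claim, and their cost."""
    bond_growth = math.exp(r * dt)
    phi = (f_up - f_down) / (s_up - s_down)   # stock units
    psi = (f_up - phi * s_up) / bond_growth   # bond units (bond is worth 1 today)
    price = phi * s0 + psi
    return phi, psi, price


def risk_neutral_price(s0, s_up, s_down, r, dt, f_up, f_down):
    """Discounted expectation of the claim under the risk neutral probability q."""
    bond_growth = math.exp(r * dt)
    q = (s0 * bond_growth - s_down) / (s_up - s_down)
    return (q * f_up + (1.0 - q) * f_down) / bond_growth


if __name__ == "__main__":
    # Call struck at 100 on a stock that moves from 100 to either 120 or 80
    args = dict(s0=100.0, s_up=120.0, s_down=80.0, r=0.05, dt=1.0,
                f_up=20.0, f_down=0.0)
    print(one_period_replication(**args))
    print(risk_neutral_price(**args))
```

Note that the real world probability of an up move never enters either calculation.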

All the terms that are relevant to Binomial representation theorem are explained using examples and visuals. The terms defined are

  • Process
  • Measure -  For one not familiar to measure change on the same outcome space, this is quite a new thing. In fact the whole derivative pricing hinges on change of measure.
  • Filtration -  It is important to think about these terms rather than merely reading up the definitions and having a vague notion of filtration. Some questions that can aid a reader’s thinking process are :
    • What’s the need for introducing filtration?
    • What exactly does it mean when you come across the phrase “Xi is Fi measurable”?
    • How does one turn the phrase “more information as time moves forward” into more precise language using sigma algebra notation ?
    • Can you think of simple process and write down the filtration for it ?
  • Claim
  • Conditional Expectation operator – It has two parameters, the measure and the history. Ideally whenever you use the term expectation, you should always append the phrase “with respect to xyz measure”. Outside quant fin one rarely hears this phrase, as the word expectation typically means that you are talking with respect to the real world measure.
  • Previsible process – This is a process whose value does not depend on the future. At any given time, the value the process takes over the next step is already known from the history so far.
  • Martingale  – Basic definitions are given and a connection between measure and martingale is explained. A measure needs to be attached to every martingale, much like a measure needs to be attached to the expectation operator.

With the above terms explained, the chapter then goes on to explain “ Binomial representation theorem”, a theorem that is key to understanding derivative pricing. The continuous-time form of the theorem is stated and proved in the next chapter. Seeing the theorem in the discrete form gives a reader enough intuition to understand the details of the continuous time form.

The crux of the binomial representation theorem is that if there are two martingales under the same measure, you can manufacture one from the other. In the case of a discrete binomial tree model, the discounted price process and the discounted claim process are both martingales under the risk neutral measure. Hence you can hedge one with the other. The theorem states that such a hedge is possible. Basically, scaling and shifting one random process can create the other. Subsequently, the condition for a replicating portfolio to be self-financing is also derived. The logical conclusion of this chapter is in the form of two slogans.

  1. There is a self-financing strategy in the binomial tree that duplicates any claim.
  2. The price of any derivative within the binomial tree model is the expectation of the discounted claim under the risk-neutral measure that makes the discounted stock a martingale.
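The second slogan translates directly into backward induction on a tree. This is a minimal sketch of my own, assuming a recombining tree with constant up and down factors `u` and `d` and a per-step bond growth factor `growth`; none of these names come from the book.

```python
def binomial_price(s0, u, d, growth, n_steps, payoff):
    """Price a claim as the discounted risk neutral expectation on a recombining tree.

    q is the probability that makes the discounted stock a martingale:
    q * u + (1 - q) * d = growth.
    """
    q = (growth - d) / (u - d)
    if not 0.0 < q < 1.0:
        raise ValueError("tree admits arbitrage: need d < growth < u")
    # Claim values at expiry, indexed by the number of up moves j
    values = [payoff(s0 * u ** j * d ** (n_steps - j)) for j in range(n_steps + 1)]
    # Step back through the tree, discounting the one-step expectation
    for _ in range(n_steps):
        values = [(q * values[j + 1] + (1.0 - q) * values[j]) / growth
                  for j in range(len(values) - 1)]
    return values[0]


if __name__ == "__main__":
    call = binomial_price(100.0, 1.1, 0.9, 1.01, 5, lambda s: max(s - 100.0, 0.0))
    put = binomial_price(100.0, 1.1, 0.9, 1.01, 5, lambda s: max(100.0 - s, 0.0))
    # Put-call parity on the tree: call - put = s0 - strike / growth**n_steps
    print(call - put, 100.0 - 100.0 / 1.01 ** 5)
```

The parity check works because a forward payoff is linear in the stock, and the discounted stock is a martingale under `q` by construction.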


Chapter 3

This chapter builds upon the principles covered in the previous chapter and moves to the continuous-time domain. Given a discrete time model, one might think of extending the model to continuous time by applying limit conditions on tick times. However, the limiting arguments are too dangerous to be used rigorously. What are continuous-time processes? A continuous process can change at any point in time, can take values that can be expressed in arbitrarily fine fractions, and has no jumps. If you look at the literature on option pricing, the process that stands out as the king of all processes is the “Brownian motion” process.

It is sophisticated enough to produce interesting models and simple enough to be tractable. The first question that strikes a reader is, “What’s the connection between Brownian motion and stock prices?” There is no relation between a stock price and Brownian motion at a global level, but when you get down to local behavior, a Brownian motion path looks similar to stock price movements. The point the authors make with the help of arguments and visuals is that “Brownian motion can’t be the whole story but locally Brownian motion looks realistic to a stock price movement”.

An intuitive way of understanding Poisson processes begins with scaling a discrete binomial process. Similarly, one can look to the binomial process to get a good intuition for Brownian motion. An easy way to understand Brownian motion is to take a symmetric random walk and then change the time scale. One can construct a random walk process in such a way that its marginal and conditional distributions match those of a Brownian motion in the limit. Some peculiarities of Brownian motion (BM) are highlighted, such as

  • BM is a nowhere differentiable function
  • BM will hit any value with probability 1
  • Once BM hits a value, it immediately hits it again infinitely often
  • BM is a fractal
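The random walk construction can be tried out in a few lines. This is a rough sketch of my own (hypothetical names, `n` steps per unit of time, each of size 1/sqrt(n)); it checks two Brownian-like features of the scaled walk: the variance at time 1 is close to 1, and the quadratic variation over [0, T] is T.

```python
import math
import random


def scaled_random_walk(n, T=1.0, rng=None):
    """Symmetric random walk with n steps per unit time and step size 1/sqrt(n)."""
    rng = rng or random.Random()
    step = 1.0 / math.sqrt(n)
    path = [0.0]
    for _ in range(int(n * T)):
        path.append(path[-1] + (step if rng.random() < 0.5 else -step))
    return path


def quadratic_variation(path):
    """Sum of squared increments along the path."""
    return sum((b - a) ** 2 for a, b in zip(path, path[1:]))


if __name__ == "__main__":
    rng = random.Random(0)
    print(quadratic_variation(scaled_random_walk(400, 1.0, rng)))
    terminals = [scaled_random_walk(400, 1.0, rng)[-1] for _ in range(2000)]
    mean = sum(terminals) / len(terminals)
    variance = sum((x - mean) ** 2 for x in terminals) / (len(terminals) - 1)
    print(mean, variance)
```

The quadratic variation is the same on every path, which is a discrete preview of the property that drives Ito calculus.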

The chapter subsequently talks about Geometric Brownian motion (GBM), the standard price process assumed for a stock in most of finance. The chapter then gives definitions for a stochastic process and introduces the symbolic notation of the stochastic differential equation (SDE). Ito’s calculus is then introduced. Ito’s lemma is mainly used to formulate an SDE from a process, or to cull out the process from the SDE. The latter essentially means that it is a tool for solving SDEs. The univariate and bivariate versions of Ito’s lemma are then introduced and some basic examples are given.
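As a worked example of using Ito’s lemma to solve an SDE, here is the standard GBM calculation, written in my own notation with constant drift \mu and volatility \sigma (assumed for illustration). Applying Ito’s lemma to f(S) = \log S gives an SDE with constant coefficients, which can be integrated directly.

```latex
\begin{aligned}
dS_t &= \mu S_t\,dt + \sigma S_t\,dW_t \\
d(\log S_t) &= \frac{1}{S_t}\,dS_t - \frac{1}{2}\,\frac{1}{S_t^{2}}\,(dS_t)^{2}
             = \left(\mu - \tfrac{1}{2}\sigma^{2}\right)dt + \sigma\,dW_t \\
S_t &= S_0 \exp\!\left(\left(\mu - \tfrac{1}{2}\sigma^{2}\right)t + \sigma W_t\right)
\end{aligned}
```

The second line uses (dW_t)^2 = dt, so that (dS_t)^2 = \sigma^2 S_t^2 dt; the extra -\sigma^2/2 term is exactly what ordinary calculus would miss.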

The highlight of this chapter is Cameron-Martin-Girsanov theorem. In order to understand this theorem, one needs to understand

  • Change of measure
  • Radon Nikodym derivative and Radon Nikodym process
  • Equivalent measures

Instead of jumping into the continuous-time domain, the author takes time to go over the Radon Nikodym process in a discrete setting using good visuals. These visuals are a really good way to illustrate the procedure of changing measure. The reason the process is important is that it gives a way to move a price process from one measure to another. It serves as the bridge between the expectations of a random claim under two different measures. In the context of derivative pricing, one typically deals with the market measure and the risk neutral measure. A few examples are given to illustrate the outcome of a change of measure. It is made clear that the process outcomes are not changed; only the likelihood of those outcomes is changed. For example, a Brownian motion with drift becomes a Brownian motion without drift.

After the explanation of all the relevant math tools, the chapter states the Cameron-Martin-Girsanov theorem. The crux of the theorem is that a drifting Brownian motion under one measure can be transformed into a Brownian motion without drift under a different, equivalent measure. The link between the two is the Radon Nikodym process.
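A quick Monte Carlo experiment shows the theorem at work. The sketch below is my own and assumes a constant drift `gamma`; it samples a Brownian motion W under P, forms the drifting process X_T = W_T + gamma * T, and reweights each sample by the Radon Nikodym derivative exp(-gamma * W_T - gamma^2 * T / 2). The outcomes are untouched, only their weights change.

```python
import math
import random


def girsanov_check(gamma=0.5, T=1.0, n_paths=100_000, seed=1):
    """Return (E_P[X_T], E_P[Z], E_P[Z * X_T]) estimated by simulation.

    X_T = W_T + gamma * T is a drifting Brownian motion under P.
    Z = dQ/dP = exp(-gamma * W_T - 0.5 * gamma^2 * T).
    E_P[Z * X_T] equals E_Q[X_T], which should be zero if X is driftless under Q.
    """
    rng = random.Random(seed)
    sum_x = sum_z = sum_zx = 0.0
    for _ in range(n_paths):
        w_T = math.sqrt(T) * rng.gauss(0.0, 1.0)
        x_T = w_T + gamma * T
        z = math.exp(-gamma * w_T - 0.5 * gamma ** 2 * T)
        sum_x += x_T
        sum_z += z
        sum_zx += z * x_T
    return sum_x / n_paths, sum_z / n_paths, sum_zx / n_paths


if __name__ == "__main__":
    mean_p, mean_z, mean_q = girsanov_check()
    print(mean_p)   # close to gamma * T: X drifts under P
    print(mean_z)   # close to 1: Z is a valid density
    print(mean_q)   # close to 0: X has no drift under Q
```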

Finally the Martingale representation theorem is stated, a continuous-time version of the Binomial representation theorem. The essence of the theorem is: given two martingale processes under the risk neutral measure, one can use one to manufacture the other. The discounted price process is a martingale under the risk neutral measure, the discounted claim process is also a martingale under the risk neutral measure, and hence there is a replicating strategy that links these two processes.

The existence of a replication strategy is guaranteed by the Martingale representation theorem. As in the discrete case, the theorem does not give you the exact strategy; it merely states that there is one. With all these concepts explained, the author discusses the famous Black Scholes pricing formula and the Black Scholes PDE. In doing so, the author clearly states the three steps to replication. Instead of using symbols, I will write down the steps in words

  1. Find a risk neutral measure that makes the discounted stock price a martingale
  2. Form the discounted claim process under this risk neutral measure
  3. Find a previsible process such that a self-financing replicating portfolio exists

To make the transition smooth, the chapter discusses the replication strategy in a world with no interest rates. It then moves into the world with interest rates to show the basic replication strategy for a derivative security. The beauty of risk neutral valuation is that the formula remains more or less the same for a variety of derivative securities.
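To see that the same recipe covers different claims, here is a sketch of my own that prices by simulating the stock under the risk neutral measure (drift equal to the interest rate) and discounting the average payoff. It assumes constant `r` and `sigma`; the closed form Black Scholes call formula is included only as a cross-check. All names are hypothetical.

```python
import math
import random


def norm_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def black_scholes_call(s0, k, r, sigma, T):
    """Closed form price of a European call under Black Scholes assumptions."""
    d1 = (math.log(s0 / k) + (r + 0.5 * sigma ** 2) * T) / (sigma * math.sqrt(T))
    d2 = d1 - sigma * math.sqrt(T)
    return s0 * norm_cdf(d1) - k * math.exp(-r * T) * norm_cdf(d2)


def risk_neutral_mc_price(s0, r, sigma, T, payoff, n_paths=100_000, seed=2):
    """Discounted average payoff with the stock simulated under the risk neutral measure."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_paths):
        z = rng.gauss(0.0, 1.0)
        s_T = s0 * math.exp((r - 0.5 * sigma ** 2) * T + sigma * math.sqrt(T) * z)
        total += payoff(s_T)
    return math.exp(-r * T) * total / n_paths


if __name__ == "__main__":
    print(black_scholes_call(100.0, 100.0, 0.05, 0.2, 1.0))
    print(risk_neutral_mc_price(100.0, 0.05, 0.2, 1.0, lambda s: max(s - 100.0, 0.0)))
    # A different claim, same formula: a digital paying 1 if the stock ends above 100
    print(risk_neutral_mc_price(100.0, 0.05, 0.2, 1.0, lambda s: 1.0 if s > 100.0 else 0.0))
```

Only the `payoff` function changes from one derivative to the next; the measure and the discounting stay the same.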

Chapter 4

The most important lesson that this chapter tries to impart is that “Martingales are tradables” and “Non Martingales are non tradables”. The chapter considers various cases where the derivative is written on something that is not directly tradable. In the case of the foreign exchange process, one needs to convert from a non-tradable cash process to a tradable discount bond process. For dividend paying equities, the model process needs to be changed so that dividends are reinvested. For bonds, the coupons need to be reinvested in the numeraire process. Underlying all this is a tradable/non tradable distinction. Unless you create a tradable whose discounted value is a martingale, there is no way to use the CMG theorem.

This distinction between tradable and non tradables is made concrete by connecting it with martingales. Through nice and easy no arbitrage arguments, the chapter proves that

  • Martingales are tradables
  • Non tradables are Non Martingales

There is an interesting connection between the CMG theorem and the market price of risk. If there are x tradable assets, the discounted price processes of all of them should be martingales under the same measure Q. This means all tradables in a market should have the same market price of risk. The market price of risk is actually the drift change of the underlying Brownian motion given by the CMG theorem. If we write the SDEs in terms of the Q (risk neutral measure) Brownian motion, then the asset is tradable if and only if its market price of risk is zero.
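The argument can be written out in a few lines. The notation below is my own and assumes two tradables driven by the same Brownian motion W, constant coefficients, and a cash bond B_t growing at a constant rate r.

```latex
\begin{aligned}
dS^{i}_t &= \mu_i S^{i}_t\,dt + \sigma_i S^{i}_t\,dW_t, \qquad i = 1, 2 \\
Z^{i}_t &= S^{i}_t / B_t
  \;\Rightarrow\;
  dZ^{i}_t = \sigma_i Z^{i}_t\left(dW_t + \gamma_i\,dt\right),
  \qquad \gamma_i = \frac{\mu_i - r}{\sigma_i} \\
\tilde{W}_t &= W_t + \gamma t \ \text{ is a } Q\text{-Brownian motion (CMG)}
  \;\Rightarrow\;
  dZ^{i}_t = \sigma_i Z^{i}_t\,d\tilde{W}_t
  \ \text{ only if } \gamma_1 = \gamma_2 = \gamma
\end{aligned}
```

A single change of measure can remove only one drift \gamma from W, so both discounted prices are Q-martingales only when the two assets share the same market price of risk.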

The above statements become important in a one factor model when one is trying to create a replicating portfolio of a claim. There are times when the claim is a function of a non tradable, for example in the case of a stock paying continuous dividends. If a call option is written on this stock, you cannot hedge the position with the original price process. So, you try to find a function of that non tradable that is tradable and then use that function to work with the CMG framework. In the one factor model world where there are two independent tradables, the cash bond and the stock, all the other tradables are nothing but a linear combination of the two assets. Again, the assumption here is that there is a single source of randomness.

The key idea that one needs to get from this chapter is that there are claims that are written on non tradables (the classic example is that of a zero coupon bond, which is a claim on the interest rate, a non tradable), and one needs to formulate an appropriate tradable and use the CMG framework to form a replicating portfolio. The chapter ends with a discussion of quantos. Quantos are interesting derivative contracts whose payoffs are paid in a different currency. As with any exercise in valuing derivatives, one must identify the tradables and the number of Brownian motions that are driving the processes. Once the tradables are identified, the procedure is similar to the one that is used throughout the book, i.e. cut the drift of the Brownian motions using the Girsanov theorem, form the discounted claim process and use the Martingale representation theorem. The takeaway from the section on quantos is that since there is a measure under which dollar tradables are martingales, one can price quanto options.

Chapter 6: Bigger Models

The chapter starts off by considering a generalized GBM where the drift and volatility parameters are dependent on the previsible processes. Despite these relaxations, there is no change to the procedure to value a derivative on the stock. The same three steps to replication can be used to value an option. The flip side of making fewer assumptions is that you don’t end up with closed form solutions for derivative prices. 

Further generalization of the GBM model is done by allowing an n-dimensional Brownian motion. To deal with n-dimensional volatility processes, there is a crash course given on n-factor Ito’s lemma and n factor Martingale representation theorem. The price one needs to pay for allowing n dimensional Brownian motions is that there are restrictions on the existence of Martingale measure. Once these restrictions, termed as “market price of risk equations” are satisfied, the three step replication framework works like a charm. The last section of the book is actually the section that fits all the pieces of jigsaw puzzle together. It states the arbitrage-free and completeness theorem that is the basis on which “three step procedure” works. 


The risk neutral pricing technique comprises three main steps, i.e. 1) finding a measure under which discounted tradables are martingales, 2) constructing a claim process under the measure found in the previous step, 3) using the Martingale representation theorem to form a self-financing replicating portfolio. This framework is used to value many types of derivative contracts, in each case tweaking some aspect but retaining the overall philosophy. The highlight of the book is that the authors emphasize the three step framework over and over again, at so many places that it becomes your natural way of thinking about the pricing of any derivative instrument.


In recent years there have been a lot of books written about quants on Wall Street. Some of the books are journalistic accounts of the people, events and technologies that have revolutionized Wall Street. Others are more technical and look like applied math/stats books. This book is a welcome addition to the existing literature. The book has been written by a Physics PhD who traces the rise of physicists on Wall Street. Emanuel Derman wrote his story (My Life as a Quant) way back in 2004 and, after a long gap, this is another book from a physicist.

In the recent crisis quants were blamed for their models. Taleb says you can’t model anything other than plain vanilla options, and even that is difficult. So, one sees a lot of people saying financial modeling is just a fancy exercise with no meaningful contribution to the financial world. If that is the case, then what are physicists, mathematicians and statisticians doing in finance? Shouldn’t they be disillusioned after they see that modeling in finance is a farce? Is the money involved making them believe that they are doing something meaningful? There have been a lot of articles blaming physicists and quants for running failing funds and racking up losses. But the story that is lost in the whole of this literature is that “Someone with some business sense had been convinced that the quants were on to something”. The author says that it is this part that got him intrigued and made him research the subject. In that sense the book is a historical account of ideas from physics that made sense in the financial markets and on Wall Street. In this post, I will try to summarize the physicists, mathematicians and scientists mentioned in the book

Primordial Seeds

Louis Bachelier

The book starts off with a narrative that aims to capture the efforts of various people who brought scientific thinking in to markets. Some of the events described are:

  • 1526: Cardano, a physician and a gambler, writes a book that gives a systematic treatment to probability.
  • 1654: Pascal and Fermat work on the foundations of the modern theory of probability
  • 1705: Jacob Bernoulli works on relationship between probabilities and frequency of events
  • 1900: Louis Bachelier writes his PhD thesis, “The theory of speculation”. It was one of the first attempts to introduce mathematics into stock markets. He postulated that, if a stock price undergoes a random walk, the probability of it taking a given value after a certain period of time follows a Gaussian distribution. He is credited with being the first person to look at markets as random walks. He developed the first option pricing model, but it never became popular as he did not offer any clear insight into incorporating it into a trading strategy.
  • 1905: Einstein explains Brownian motion
  • 1930s: The term “model” made its way into economics, courtesy of the physicist turned economist Jan Tinbergen. The term is used in physics to refer to something just shy of a full physical theory. A theory is an attempt to completely and accurately describe some feature of the world. A model, meanwhile, is a kind of simplified picture of how a physical process or system works.
  • 1930s: DuPont’s famous Nylon project helped demolish the wall between pure physics and applied physics


Swimming Upstream

Maury Osborne

Narrative : 1916-1959

In 1959 Maury Osborne publishes a paper titled “Brownian Motion in the Stock Market”. He argues that it is the rate of return that is normally distributed and not the prices. By establishing that prices are lognormal and rates of return are normal, he makes a key change to the price process. Instead of the earlier arithmetic Brownian motion, he argues that stocks follow geometric Brownian motion. He had two arguments up his sleeve. First, under GBM the stock prices can never be negative. If the return is a huge negative number, the stock will be very close to 0 but never below 0. The second argument is that investors care about returns and not about the absolute value of stock prices. The success of this model clearly shows that the best mathematical models are the ones that take psychology into account. Osborne’s story illustrates an important message. Models are built incrementally. He made a surprising connection between his research (the migratory efficiency of salmon) and the stock market. Once he realized that the probabilities of an up move and a down move were different, he changed the assumptions of the random walk model to make it better. This sort of approach, “looking at data, building a model with assumptions, carefully looking at all the assumptions that break down frequently, modifying the model”, is one of the reasons for physicists making a breakthrough in finance.
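Osborne’s first argument is easy to see in a simulation. The sketch below is my own illustration with deliberately exaggerated volatility (hypothetical parameter values and names): it draws terminal prices under an arithmetic Brownian motion, where the price itself is normal, and under a geometric Brownian motion, where the log return is normal.

```python
import math
import random


def terminal_prices(s0, mu, sigma, T, n_paths, seed=3):
    """Terminal prices under arithmetic and geometric Brownian motion.

    The same Brownian draw w_T feeds both models so they can be compared path by path.
    """
    rng = random.Random(seed)
    arithmetic, geometric = [], []
    for _ in range(n_paths):
        w_T = math.sqrt(T) * rng.gauss(0.0, 1.0)
        # Bachelier style: price changes are normal, so the price can go below zero
        arithmetic.append(s0 + mu * s0 * T + sigma * s0 * w_T)
        # Osborne style: log returns are normal, so the price stays positive
        geometric.append(s0 * math.exp((mu - 0.5 * sigma ** 2) * T + sigma * w_T))
    return arithmetic, geometric


if __name__ == "__main__":
    arith, geo = terminal_prices(s0=10.0, mu=0.0, sigma=1.0, T=1.0, n_paths=10_000)
    print(sum(1 for p in arith if p < 0) / len(arith))  # a sizeable fraction is negative
    print(min(geo))                                     # small, but always above zero
```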


From Coast Lines to Cotton Prices

Benoit B. Mandelbrot

Narrative : 1950 – 2008

Mandelbrot’s PhD thesis on Zipf’s law and his subsequent explorations of fractal geometry turned out to be a bitter pill for most economists. If Mandelbrot’s central ideas are correct, everything traditional economists believe about markets is fundamentally flawed. The assumptions that underlie most of modern financial theory fall into the category where random events are treated as outcomes of coin tosses or casino games. God tosses a coin and the stock moves up or down based on the outcome. Osborne improved on this when he found that stocks change by a fixed percentage rather than a fixed amount depending on the outcome of God’s coin toss. This modification led to the observations that rates of return are normally distributed and prices should be lognormally distributed. Mandelbrot was at the other extreme. He believed that random events are not mild, as described by normal distributions. They are wild, like the Cauchy distribution. I came across a nice analogy for the Cauchy distribution:

Imagine a drunken firing squad. Each member stands, rifle in hand, facing a wall. (For argument’s sake, assume the wall is infinitely long.) Just like the drunk walking, the drunks on the firing squad are equally liable to stumble one way as another. When each one steadies himself to shoot the rifle, he could be pointing in any direction at all. The bullet might hit the wall directly in front of him, or it might hit the wall 100 feet to his right (or it might go off in the entirely opposite direction, missing the wall completely). Suppose the group engages in target practice, firing a few thousand shots. If you make a note of where each bullet hits the wall (counting only the ones that hit), you can use this information to come up with a distribution that corresponds to the probability that any given bullet will hit any given part of the wall. When you compare this distribution to the plain old normal distribution, you’ll notice that it’s quite different. The drunken firing squad’s bullets hit the middle part of the wall most of the time (more often, in fact, than the normal distribution would have predicted). But the bullets also hit very distant parts of the wall surprisingly often (much, much more often than the normal distribution would have predicted). This probability distribution is called a Cauchy distribution. Because the left and right sides of the distribution don’t go to zero as quickly as in a normal distribution (because bullets hit distant parts of the wall quite often), a Cauchy distribution is said to have “fat tails.” One of the most striking features of the Cauchy distribution is that it doesn’t obey the law of large numbers: the average location of the firing squad’s bullets never converges to any fixed number. If your firing squad has fired a thousand times, you can take all of the places their bullets hit and come up with an average value, just as you can average your winnings if you’re playing the coin-flip game.
But this average value is highly unstable. It’s possible for one of the squad members to get so turned around that when he fires next, the bullet goes almost parallel with the wall. It could travel a hundred miles (these are very powerful guns), far enough, in fact, that when you add this newest result to the others, the average is totally different from what it was before. Because of the distribution’s fat tails, even the long-term average location of a drunken firing squad’s bullets is unpredictable.
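The firing squad is easy to simulate, and doing so shows the unstable average directly. The sketch below is my own illustration and makes some assumptions that are not in the book: the shooter stands one unit from the wall, the firing angle is uniform over the half-circle facing the wall (which is the same as keeping only the shots that hit), and the sample size and seed are arbitrary. Under those assumptions the hit position is the tangent of the angle, which follows a standard Cauchy distribution.

```python
import math
import random
import statistics

def firing_squad_hits(n_shots, rng):
    """Positions along the wall where shots land, for a shooter one unit away.

    The angle is uniform over the half-circle facing the wall, so the
    position tan(angle) follows a standard Cauchy distribution.
    """
    return [math.tan(rng.uniform(-math.pi / 2, math.pi / 2)) for _ in range(n_shots)]

def running_means(xs):
    """Average of the first k observations, for every k."""
    total, out = 0.0, []
    for k, x in enumerate(xs, start=1):
        total += x
        out.append(total / k)
    return out

def tail_fraction(xs, cutoff=5.0):
    """Share of observations farther than `cutoff` from the centre."""
    return sum(abs(x) > cutoff for x in xs) / len(xs)

rng = random.Random(1)   # fixed seed, purely for repeatability
n = 100_000
cauchy = firing_squad_hits(n, rng)
normal = [rng.gauss(0.0, 1.0) for _ in range(n)]

cauchy_means = running_means(cauchy)
normal_means = running_means(normal)

# The normal average settles down; the Cauchy average keeps jumping around.
for k in (100, 1_000, 10_000, 100_000):
    print(k, round(normal_means[k - 1], 4), round(cauchy_means[k - 1], 4))

print("share of shots beyond 5 units:", tail_fraction(normal), tail_fraction(cauchy))
```

The median of the Cauchy sample is still well behaved and sits near zero; it is the mean that never settles, because roughly one shot in eight lands more than five units away and the occasional shot lands enormously far out.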

Fractals exhibit wild randomness. If you try to find the length of a coastline, every time you add observations the estimate changes so dramatically that an average does not make sense for such a thing. There is no expected value for the average size of a feature on a coastline. From one point of view, fractals are beautifully ordered and regular; from another, wildly random. And if fractals are everywhere, as Mandelbrot believed, the world is a place dominated by extremes, where our intuitive ideas about averages and normalcy can only lead us astray.

Mandelbrot had a freakish gift for visualizing abstract algebraic problems. Working at IBM, he found fractals in income distributions. By analyzing the incomes of the richest 20% of people, he found that there was another 80-20 rule that separated the rich from the ultrarich. After a chance encounter with another professor, he concluded that cotton prices had fractals too. He saw fractals everywhere, and thus the range of his research contributions is truly outstanding.

In 1964, Paul Cootner translates Bachelier’s work and makes it accessible to a large audience. He also includes Mandelbrot’s work. So around 1965, financial theorists had a choice: follow Osborne or follow Mandelbrot. Cootner made the argument this way at a meeting of the Econometric Society, in response to Mandelbrot’s work on cotton prices:

Mandelbrot, like Prime Minister Churchill before him, promises us not utopia but blood, sweat, toil, and tears. If he is right, almost all of our statistical tools are obsolete. . . . Almost without exception, past econometric work is meaningless. Surely, before consigning centuries of work to the ash pile, we should like to have some assurance that all our work is truly useless.

According to some, finance took a wrong turn around 1965 by assuming mild randomness. This mild randomness assumption became even more popular, courtesy of Eugene Fama. In 1965, Fama’s research on the efficient market hypothesis set the path for many economists at the University of Chicago.

Beating the Dealer

Ed Thorp

Narrative : 1932 – 1987

Ed Thorp showed that physics and mathematics can be used to profit from financial markets. Information theory proved to be the missing link between the statistics of market prices and a winning strategy on Wall Street. Card counting is a process by which you gain information about the deck of cards: you learn how the composition of the deck has changed with each hand. Ed Thorp was the first one to connect information flows and money management principles. Connecting Shannon’s information theory and Kelly’s criterion, he applied their principles to stock markets. Ed Thorp took the side of Osborne, as he believed in the random walk of returns. His interest in warrant pricing made him the first person to have cracked option pricing with a replication argument. He never published this work, and thus option pricing remained elusive until Black and Scholes worked it out. Also, his approach made it a trading strategy rather than a replication strategy; the latter was very appealing to banks selling options. Ed Thorp created an arbitrage strategy between warrants and stocks. By creating a delta hedge on the warrants and using his knowledge from blackjack, he ran a successful quant fund (Princeton-Newport, 20% returns every year for 40 years) that launched a slew of quant funds in the decade that followed.

Physics Hits the Street

Fischer Black – Myron Scholes – Robert Merton

Narrative : 1964 – 1987

These years were characterized by physicists moving over to Wall Street in big numbers, thanks to the Black Scholes and Merton model and the exponential rise of derivative trading and market making. Thorp took Osborne’s idea, assumed that the option should be priced as a fair bet, worked out the option price and then created a delta hedging strategy. The Black Scholes approach was the reverse. They created a dynamic replicating portfolio and argued using CAPM that this should yield a risk free rate of return. In this risk neutral world, one can price the option. This approach became more popular and useful than Thorp’s, as the banks could now manufacture an option. Thus the role of market making for options exploded using the Black Scholes replicating strategy. This blind faith in Black Scholes ended in a rude shock, thanks to the 1987 stock market crash. Many quants then took up the task of reworking some of the assumptions of Black Scholes.


Emanuel Derman came up with pricing that accounts for the volatility smile that appeared in the markets after the 1987 crash. Several other tweaks were made to Black Scholes pricing to relax the strict assumptions of the original model.


The Prediction Company

James Doyne Farmer & Norman Packard

Narrative : 1975 – 1999

The story is about two physicists who go on to start a company, “The Prediction Company”, with the purpose of applying chaos theory and nonlinear forecasting techniques to cull out patterns at a micro level. This was one of the first companies on Wall Street that used “Black Box” models for creating trading strategies. The wild success of “The Prediction Company” can be attributed to luck, but who knows, maybe they were on to something that enabled them to read the micro level inefficiencies in the market and capitalize on them. One thing is certain from reading their story: their years of toil on their first venture, “eudaemonia”, a group that developed strategies to win at roulette, gave them a ton of learning, which they finally managed to translate into winning strategies on Wall Street.

Tyranny of the Dragon King

Didier Sornette

Narrative : 1983 – 2012

I have learnt a new term from the book, “Dragon King”, a word coined by Didier Sornette describing extreme events that come with a warning. These events are different from “Black Swan” kind of events that are completely unpredictable.

“Dragon Kings” : The word “king” because, if you try to match plots like Pareto’s law (the fat-tailed distribution governing income disparity that Mandelbrot studied at IBM) to countries that have a monarchy, you find that kings don’t fit with the 80–20 rule. Kings control far more wealth than they ought to, even by the standards of fat tails. They are true outliers. And they, not the extremely wealthy just below them, are the ones who really exert control. The word dragon, meanwhile, is supposed to capture the fact that these kinds of events don’t have a natural place in the normal bestiary. They’re unlike anything else. Many large earthquakes are little ones that, for whatever reason, didn’t stop. These are not predictable using Sornette’s methods. But dragon-king earthquakes, the critical events, seem to require more. Like ruptures, they happen only if all sorts of things fall into place in just the right way.

Sornette’s story is presented in the book as an example of the way scientific principles can be applied to finance at a macro level. A physicist by profession, Sornette has contributed to a dozen fields ranging from material science to geophysics, to decision theory, even to neuroscience. In all these fields, Sornette’s work involves identifying patterns endemic to the structures of complex systems and using these patterns to predict critical phenomena. His work rests on the assumption that there are always telltale patterns of self-organization and coordination before an extreme event. His theory is not so much about predicting the next tulip mania or the next disaster. Instead, his methodology helps in identifying situations where herding effects have already taken place. He then uses the system conditions to predict the critical point when the entire thing explodes. Sornette has successfully applied his methodology to a host of crisis situations in the last 20 years, including the subprime mess. The story is very interesting because it tells of a physicist who has used Mandelbrot’s methods and has successfully identified many systems that had become unstable. So, in one sense, while the whole army of economists and finance researchers was moving along the Osborne-led path, there were a few renegades who struck gold, and Sornette is one amongst them.

A New Manhattan Project

Eric Weinstein & Pia Malaney

Narrative : 1988 – 2012

This story is about the couple, Eric Weinstein (mathematician) and Pia Malaney (economist), who come together to solve the index number problem. The CPI is a metric that captures the cost of the ordinary things a person buys, a metric that is notoriously difficult to calculate as it needs to compare values at different times, for people living with different lifestyles. Economists the world over realize that the CPI needs to be fixed, but still nothing much has been done. Eric and Pia have brought fresh thinking into economics by introducing “gauge theory” principles in various contexts. They firmly believe that economists need to broaden their theoretical framework to account for a wider variety of phenomena, i.e., the need of the hour is a new generation of theories and models, suited to the complexity of the modern world. It calls for a large-scale collaboration between economists and researchers from physics and other fields: an economic Manhattan Project.



The book is about 200 pages long, of which 160-odd pages contain stories that are already mentioned in other books. Having said that, I found the book hard to put down as it is fast paced and well written. The good thing about the book is that it is not a model-bashing book but acknowledges that good models are a combination of math, stats, investor psychology and a ton of common sense. “Only through repeated iterations can models be truly useful” is the takeaway from this book.


The purpose of the book is to illustrate various asset pricing concepts in a coin tossing world. Imagine a world where there is a money market instrument, an underlying security and various derivatives written on the underlying security. This world has a peculiar feature, i.e., the stock price can move up or down based on a coin toss. So, if one is looking at an option that expires 3 units from the current time, there are 8 stock price paths corresponding to the 8 coin toss realizations (HHH, HTH, HHT, HTT, THH, TTH, THT, TTT). Also, the probabilities of heads and tails are known upfront. One can’t think of a simpler world for understanding option pricing. So, given this setting, the book explores the valuation of the following securities:

  • European Derivative Securities whose payoff is not path dependent
  • European Derivative Securities whose payoff is path dependent, like an Asian option or a lookback option
  • American Derivative Securities whose payoff is not path dependent
  • General American Derivative Securities
  • Perpetual put – an abstract instrument to illustrate various principles of American option pricing
  • Derivatives in the Fixed Income world.

The book starts off with a single period model where the payoff of a derivative security is replicated using a specific portfolio that holds some units of stock and a position in the money market. To replicate the payoff, convenient variables are introduced which go by the name “risk neutral probabilities”. These are not the actual probabilities of the world but fictitious probabilities that help in solving a set of equations (that are used to obtain the hedge ratio and initial wealth needed to replicate the option payoff). Once the intuition is established using a one period model, the replicating portfolio is constructed dynamically for three periods. One sees that the risk neutral probabilities remain the same, and the framework that is applicable for one period can be extended to a multi-period model. The replication in the multi-period binomial model for a derivative security has one issue that is highlighted in the section titled “computational considerations”. If one needs to evaluate an option that expires 100 units into the future, one ends up with 2^100 different coin toss realizations. This means that the final value of the option needs to be computed for 2^100 coin toss realizations. Fortunately there is an easy way out. A flavor of the Markov property is given in the first chapter itself, even though it is dealt with at length in the subsequent chapters. The Markov property of the price process allows a computationally efficient way of pricing a derivative in a multi-period setup. There are two examples that illustrate the usefulness of the Markov property, and they give an indication of its real power. In the case of a path dependent option, you can enlarge the state space to include variables that make the new process possess the Markov property. I guess one can always come back to these two examples after one is comfortable with the Markov property principles that are covered in subsequent chapters.
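To make the replication argument and the computational shortcut concrete, here is a small sketch. The numbers (initial price 4, up factor 2, down factor 1/2, interest rate 1/4, a call struck at 5) and all function names are my own illustrative assumptions. It works out the one period hedge ratio and initial wealth, and then prices a three period call two ways: by brute force over every coin toss sequence, and by a backward recursion over the number of heads, which is what the Markov property buys you.

```python
from itertools import product

def risk_neutral_probs(u, d, r):
    """Fictitious probabilities that make the stock earn the rate r on average."""
    p = (1 + r - d) / (u - d)
    return p, 1 - p

def one_period_replication(s0, u, d, r, payoff):
    """Initial wealth x0 and stock holding delta0 that replicate the payoff."""
    p, q = risk_neutral_probs(u, d, r)
    v_h, v_t = payoff(s0 * u), payoff(s0 * d)
    delta0 = (v_h - v_t) / (s0 * u - s0 * d)
    x0 = (p * v_h + q * v_t) / (1 + r)
    return x0, delta0

def price_by_paths(s0, u, d, r, n, payoff):
    """Brute force: weight the payoff over all 2**n coin toss sequences."""
    p, q = risk_neutral_probs(u, d, r)
    total = 0.0
    for tosses in product("HT", repeat=n):
        heads = tosses.count("H")
        s_n = s0 * u**heads * d**(n - heads)
        total += p**heads * q**(n - heads) * payoff(s_n)
    return total / (1 + r)**n

def price_by_recursion(s0, u, d, r, n, payoff):
    """Markov shortcut: only the number of heads so far matters (n + 1 nodes)."""
    p, q = risk_neutral_probs(u, d, r)
    values = [payoff(s0 * u**h * d**(n - h)) for h in range(n + 1)]
    for step in range(n, 0, -1):
        # Node with h heads at time step - 1 averages its two successors.
        values = [(p * values[h + 1] + q * values[h]) / (1 + r) for h in range(step)]
    return values[0]

def call(s):
    return max(s - 5.0, 0.0)

x0, delta0 = one_period_replication(4.0, 2.0, 0.5, 0.25, call)
v_paths = price_by_paths(4.0, 2.0, 0.5, 0.25, 3, call)
v_rec = price_by_recursion(4.0, 2.0, 0.5, 0.25, 3, call)
v_100 = price_by_recursion(4.0, 2.0, 0.5, 0.25, 100, call)  # infeasible by brute force
print(x0, delta0, v_paths, v_rec, v_100)
```

With these numbers the one period hedge is half a share financed by an initial wealth of 1.20, and both three period methods agree. The recursion handles 100 periods instantly, while the brute force loop would need 2^100 iterations.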

Chapter 2 gives enough math tools to work with the asset pricing model introduced in the first chapter. In the context of a coin toss model, basic concepts of probability are introduced, such as the probability model, expectation, Jensen’s inequality and conditional expectation. Enough care is taken so that terms such as “Sigma Algebra” and “Lebesgue measure” are avoided. These terms usually demotivate a reader who is looking to gain intuition into derivative pricing models. The key mathematical object discussed in this chapter is the “Martingale”. Basic definitions and properties of martingales are illustrated using the price process in the binomial outcome world. The core idea is that the discounted wealth process is a Martingale under the risk neutral probabilities. This, coupled with the fact that the wealth process of the replicating portfolio coincides with the payoff of the derivative security in every outcome, implies that the discounted derivative price process is also a Martingale.
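The martingale statement can be checked node by node on a small tree. The sketch below is my own illustration with assumed numbers (initial price 4, up factor 2, down factor 1/2, rate 1/4, and an assumed actual probability of heads of 2/3). It measures how far the discounted stock price is from satisfying the one-step martingale condition, under the risk neutral probability and under the actual one.

```python
s0, u, d, r, n = 4.0, 2.0, 0.5, 0.25, 3   # illustrative values (assumption)
p_tilde = (1 + r - d) / (u - d)           # risk neutral probability of heads
p_actual = 2 / 3                          # assumed actual probability of heads

def one_step_expectation(s_k, p_up):
    """Conditional expectation of the next price given the current price s_k."""
    return p_up * s_k * u + (1 - p_up) * s_k * d

def max_martingale_gap(p_up):
    """Largest violation of E_k[S_(k+1) / (1+r)^(k+1)] = S_k / (1+r)^k on the tree."""
    gap = 0.0
    for k in range(n):
        for heads in range(k + 1):
            s_k = s0 * u**heads * d**(k - heads)
            lhs = one_step_expectation(s_k, p_up) / (1 + r)**(k + 1)
            rhs = s_k / (1 + r)**k
            gap = max(gap, abs(lhs - rhs))
    return gap

print("gap under risk neutral probabilities:", max_martingale_gap(p_tilde))
print("gap under actual probabilities:      ", max_martingale_gap(p_actual))
```

The gap is zero (up to rounding) only under the risk neutral probability. Under the assumed actual probability the discounted price drifts upward, so it is not a martingale there.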

The other aspect that is discussed in this chapter is the Markov property. Usually one comes across definitions of Markov chains that involve terms such as transition probabilities, initial distribution, etc. Nothing of that sort is mentioned here. Maybe the author already assumes some sort of background from the reader. Or maybe he does not want to unnecessarily introduce things that are not used anywhere in the book. The chapter defines the Markov property using conditional expectation. The Markov property becomes very important in reducing the computational complexity of the derivative pricing algorithm dealt with in the first chapter. Using the multi-step-ahead Markov property, a recursive algorithm for a path dependent option is developed. If you start seeing things in the discrete-time world, I think it provides good insight into the continuous-time world. For example, the recursive equations for the derivative in the discrete setting are the analogue of partial differential equations in the continuous-time world. The Feynman-Kac theorem that comes up in Volume II of this book helps one move from the continuous-time risk-neutral pricing formula to PDEs. One might be comfortable with understanding the Feynman-Kac theorem without ever coming across the discrete version. But I think its discrete-time analogue in the binomial model is going to be very useful for a lot of readers in gaining a better understanding of the Feynman-Kac theorem.
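Here is a sketch of the state space enlargement idea for a path dependent payoff. It is my own illustration under assumed numbers (initial price 4, up factor 2, down factor 1/2, rate 1/4, three periods) for a lookback option paying the running maximum minus the final price. The price alone is not enough to value this option, but the pair (current price, maximum so far) is Markov, so a recursion over that pair works. A brute force sum over paths is included as a cross-check.

```python
from functools import lru_cache
from itertools import product

S0, U, D, R, N = 4.0, 2.0, 0.5, 0.25, 3   # illustrative values (assumption)
P = (1 + R - D) / (U - D)                 # risk neutral probability of heads
Q = 1 - P

@lru_cache(maxsize=None)
def lookback_value(step, s, m):
    """Option value at `step`, given the current price s and running maximum m.

    Caching on (step, s, m) is safe here because the prices are exact powers
    of two; with general factors one would key on toss counts instead.
    """
    if step == N:
        return m - s
    s_up, s_down = s * U, s * D
    v_up = lookback_value(step + 1, s_up, max(m, s_up))
    v_down = lookback_value(step + 1, s_down, m)   # a down move cannot raise the maximum
    return (P * v_up + Q * v_down) / (1 + R)

def lookback_by_paths():
    """Cross-check: weight the payoff over all 2**N coin toss sequences."""
    total = 0.0
    for factors in product((U, D), repeat=N):
        s = m = S0
        prob = 1.0
        for f in factors:
            s *= f
            m = max(m, s)
            prob *= P if f == U else Q
        total += prob * (m - s)
    return total / (1 + R)**N

v_recursive = lookback_value(0, S0, S0)
v_bruteforce = lookback_by_paths()
print(v_recursive, v_bruteforce)
```

Both routes give 1.376 for these numbers. The recursion visits each (price, maximum) pair once, so its cost grows polynomially with the number of periods instead of doubling with every extra toss.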

Chapter 3 is all about the Radon Nikodym derivative, denoted by Z in the book. This variable is the bridge between evaluating expectations in the risk neutral world and in the real world. Besides the price process and the derivative security price process, there is a process associated with Z. The way to manufacture this process is to form a ratio between the risk neutral probabilities and the real world probabilities. Various properties of Z are explored, such as the Martingale property, and these properties in turn help in evaluating the expectation under the real world measure. There is also a section on CAPM, which could have easily been pushed to an appendix. Somehow I failed to appreciate the relevance of the CAPM framework in this chapter.
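A small numerical sketch helps to see what Z does. The actual probability of heads (2/3), the tree parameters and the call payoff below are my own assumptions for illustration. For each coin toss sequence, Z is the ratio of its risk neutral probability to its actual probability. The code checks three things: Z averages to 1 under the actual probabilities, a risk neutral expectation equals the actual expectation of Z times the payoff, and the Z process built from partial toss sequences is a martingale under the actual probabilities.

```python
from itertools import product

p, q = 2 / 3, 1 / 3        # assumed actual probabilities of heads and tails
p_t, q_t = 1 / 2, 1 / 2    # risk neutral probabilities for u=2, d=1/2, r=1/4
n = 3
paths = list(product("HT", repeat=n))

def prob(tosses, ph, pt):
    """Probability of a (possibly partial) toss sequence."""
    return ph**tosses.count("H") * pt**tosses.count("T")

def z_process(prefix):
    """Z after the tosses in `prefix`: ratio of risk neutral to actual probability."""
    return prob(prefix, p_t, q_t) / prob(prefix, p, q)

def stock(tosses, s0=4.0, u=2.0, d=0.5):
    return s0 * u**tosses.count("H") * d**tosses.count("T")

Z = {w: z_process(w) for w in paths}
payoff = {w: max(stock(w) - 5.0, 0.0) for w in paths}   # call struck at 5

mean_Z = sum(prob(w, p, q) * Z[w] for w in paths)
rn_expectation = sum(prob(w, p_t, q_t) * payoff[w] for w in paths)
via_Z = sum(prob(w, p, q) * Z[w] * payoff[w] for w in paths)

# One-step martingale check for the Z process after a single head.
z_now = z_process(("H",))
z_next = p * z_process(("H", "H")) + q * z_process(("H", "T"))
print(mean_Z, rn_expectation, via_Z, z_now, z_next)
```

The same dictionary Z can be used in the other direction as well: dividing by Z converts an expectation under the risk neutral probabilities back into one under the actual probabilities.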

Chapter 4 extends the concepts to American derivatives. This complex topic is dealt with in a beautiful way. The multi-period binomial model introduced in the first chapter is extended to situations where the derivative can be exercised at any point in time before expiry. The replication of a path dependent security is laid out in the chapter with enough numerical examples to get a sense of the algorithm. Stopping times are introduced as they are essential for the valuation of American derivative securities. The stopping time principle is useful in a lot of places and helps in reducing the computational steps in many problems. For example, one can prove that a symmetric random walk is null recurrent using stopping times in just a few steps. The fact that a Martingale stopped at a stopping time is still a Martingale, and that a submartingale (supermartingale) stopped at a stopping time retains its original property, becomes critical in evaluating general American derivative securities. What’s the connection between stopping times and American options? One can think of the time at which the option is exercised as a stopping time. The discounted derivative security price process of an American option is a supermartingale. It has a tendency to go down at exactly those moments when the option should be exercised. Hence one way to get a grip on the price process is to think of all the stopping times till expiry and evaluate the risk neutral expectation of the discounted intrinsic payoff value at the stopping rule that makes this expectation attain its maximum. The chapter ends with the application of stopping rule principles to an American call option on a non-dividend paying stock and shows that it is not worth exercising the option at any point before expiry.
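For payoffs that depend only on the current stock price, the American pricing rule amounts to one extra line in the backward recursion: at each node take the larger of the intrinsic value and the discounted risk neutral average of the next two values. The sketch below is my own illustration with assumed numbers (initial price 4, up factor 2, down factor 1/2, rate 1/4, strike 5), and the function name is hypothetical.

```python
def binomial_price(s0, u, d, r, n, payoff, american=False):
    """Backward induction on a recombining tree for a price-dependent payoff."""
    p = (1 + r - d) / (u - d)
    q = 1 - p
    # Values at expiry, indexed by the number of heads.
    values = [payoff(s0 * u**h * d**(n - h)) for h in range(n + 1)]
    for step in range(n - 1, -1, -1):
        next_values = values
        values = []
        for h in range(step + 1):
            value = (p * next_values[h + 1] + q * next_values[h]) / (1 + r)
            if american:
                # Early exercise: the holder can take the intrinsic value now.
                s = s0 * u**h * d**(step - h)
                value = max(value, payoff(s))
            values.append(value)
    return values[0]

def put(s):
    return max(5.0 - s, 0.0)

def call(s):
    return max(s - 5.0, 0.0)

european_put = binomial_price(4.0, 2.0, 0.5, 0.25, 2, put)
american_put = binomial_price(4.0, 2.0, 0.5, 0.25, 2, put, american=True)
european_call = binomial_price(4.0, 2.0, 0.5, 0.25, 3, call)
american_call = binomial_price(4.0, 2.0, 0.5, 0.25, 3, call, american=True)
print(european_put, american_put, european_call, american_call)
```

With these numbers the two period American put is worth 1.36 against 0.96 for the European put, so the right to exercise early has value. The American and European calls come out equal, which matches the chapter’s closing result for a stock that pays no dividends.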

Chapter 5 talks about the symmetric random walk and its properties. Well, the first time I came across the symmetric random walk was in the context of Markov chains. While learning the concepts of classification of states, one usually comes across the symmetric random walk as an example of a null recurrent chain. There are many ways to prove that a symmetric random walk is a null recurrent chain. One can try to establish the probability that the chain hits a specific level in n steps, and then use a limiting argument to show that the symmetric random walk will almost surely hit any level. Another way is to use the stationary distribution of a countable Markov chain and prove that the expected time to hit a specific level is infinite, and hence it is a null recurrent chain. The treatment in this book is different and in fact elegant, as it uses Martingales. Manufacturing a Martingale that has a symmetric random walk as a term is the key to efficient computation. The chapter provides a Martingale for a symmetric random walk and one can see that most of the calculations become pleasant. One can show that a state in a symmetric random walk is null recurrent after wrangling with the relevant Martingale. The chapter subsequently introduces the reflection principle, which is basically a time saving tool for someone who is working with random walks and Brownian motion. The chapter ends with an elaborate discussion of a hypothetical instrument, the “Perpetual Put” option. The purpose of dealing with the “Perpetual Put” option is to illustrate the concepts of Martingales and stopping times. The last chapter deals with derivative securities in the fixed income world.
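The reflection principle can be verified by plain counting on short walks. The sketch below is my own check, with an arbitrary walk length of 10 steps. It enumerates every path of a symmetric random walk and confirms that, for a level m > 0 and an end point b at or below m, the number of paths that reach m and finish at b equals the number of paths that finish at 2m - b.

```python
from collections import Counter
from itertools import accumulate, product

def reflection_counts(n):
    """Count n-step +/-1 paths by end point, and by (level reached, end point)."""
    final = Counter()
    hit_and_final = Counter()
    for steps in product((1, -1), repeat=n):
        walk = list(accumulate(steps))
        top = max(0, max(walk))          # highest level reached, starting from 0
        final[walk[-1]] += 1
        for m in range(1, top + 1):
            hit_and_final[(m, walk[-1])] += 1   # path reached level m, ended at walk[-1]
    return final, hit_and_final

n = 10   # arbitrary small length so that all 2**n paths can be listed
final, hit_and_final = reflection_counts(n)

# Reflection principle: #(reach m, end at b) == #(end at 2m - b) for b <= m.
holds = all(
    hit_and_final[(m, b)] == final[2 * m - b]
    for m in range(1, n + 1)
    for b in range(-n, m + 1)
)

# Paths that have reached level 1 at some point within the first n steps.
reached_one = sum(c for (m, _), c in hit_and_final.items() if m == 1)
print(holds, reached_one, 2**n)
```

The count of paths that have already touched level 1 is the reflection principle at work in a first passage calculation: it comes out as all paths minus those that end exactly at 0.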

Takeaway :

This is a book that deals with a world where the price of a security moves based on a coin toss. Limiting the number of periods to three, the author shows various techniques to value European and American derivative securities. This book serves as a good foundation for someone entering the continuous-time world of derivative pricing.