This is a book that I have been planning to read for quite some time, maybe for a year. I finally found some time to spend on it.
However, the book turned out to be just about OK and not engaging, considering the amount of hype around it. Having said that, this book should be read by anyone who is into backtesting trading rules / strategies and has only a vague idea of stats!
Anyway, here are my brief takeaways:
Chapter 1 – Objective Rules and Their Evaluation
- One needs to formulate a basic idea of a rule. A rule is an objective statement that can be implemented by a computer program and that generates unambiguous long/short/neutral signals.
- Position bias needs to be tested in a trading rule. A randomized rule with an equivalent position bias would serve as a good control group.
- Detrending is very important in backtesting rules.
- Avoiding look-ahead bias and accounting for trading costs are very important!
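To make the detrending and look-ahead points concrete, here is a minimal Python sketch. It is my own illustration, not code from the book. It assumes that detrending means subtracting the average daily log return from every daily return, and that a signal computed on day t is only applied to the return of day t+1. The function names and the sample numbers are made up.

```python
def detrend(log_returns):
    """Subtract the average return so the detrended series has zero mean."""
    mean = sum(log_returns) / len(log_returns)
    return [r - mean for r in log_returns]


def rule_mean_return(signals, log_returns):
    """Average return of a rule whose signals are +1 (long), -1 (short) or 0.

    signals[t] is decided with data up to day t and earns log_returns[t + 1],
    which avoids look-ahead bias.
    """
    pairs = list(zip(signals[:-1], log_returns[1:]))
    return sum(s * r for s, r in pairs) / len(pairs)


# Hypothetical daily log returns of a market that drifts upwards.
market = [0.010, -0.005, 0.020, 0.000, 0.015]
always_long = [1] * len(market)

raw = rule_mean_return(always_long, market)
detrended = rule_mean_return(always_long, detrend(market))
print(f"always-long on raw data:       {raw:.4f}")
print(f"always-long on detrended data: {detrended:.4f}")
```

The always-long rule looks profitable on the raw series only because of its position bias; on the detrended series most of that apparent edge disappears.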
Chapter 2 – The Illusory Validity of Subjective Technical Analysis
- It is important to understand the various biases that can creep into validating a claim
- Overconfidence, optimism bias
- Hindsight bias
- Confirmation bias
- Illusory correlation
- Sample size neglect
- Representative bias
- Biased second hand knowledge
- Attribution bias
- Knowledge Illusion
- It is important to recognize that it is easy to fit a pattern to a time series. However, the pattern could be only wishful thinking unless it is tested with scientific rigor.
- Good stories well told can make people misweight or ignore facts.
- People are not naturally rigorous logicians and statisticians. A need to simplify complexity and cope with uncertainty makes us prone to seeing and accepting unsound correlations. We tend to overweight vivid examples, recent data and inferences from small samples.
- The scientific method is a reliable path to validity, mitigating the misleading effects of our cognitive biases.
Chapter 3 – The Scientific Method and Technical Analysis
This chapter summarizes the development of the scientific method: Aristotle’s syllogisms, Bacon’s contradictory thinking, Popper’s falsification method, and the development of null and alternative hypotheses. It leaves the reader with the powerful impression that advancement in any field needs the 5 stages of work.
Unless these become a part of TA, practitioners of TA are no better than astrologers, alchemists and folk healers.
Chapter 4 – Statistical Analysis
Nothing much… For a person who doesn’t know about stats, this chapter might be useful. It is definitely skimmable for someone who uses stats for a living.
Chapter 5 – Confidence Intervals
Nothing much… except that one can use the resampling method or the Monte Carlo method for testing hypotheses and establishing confidence intervals.
Most of the content in this chapter is Stats 101. One takeaway for me is that one has to zero-center the sample before applying the resampling method. This is done to align with the universal null hypothesis of a trading rule, H0: the expected return of the trading rule is 0.
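The zero-centering idea can be sketched in a few lines of Python. This is a minimal illustration under my own assumptions, not the book's exact procedure: the rule's daily returns are shifted to have mean zero (so they obey H0), resampled with replacement, and the p-value is the share of resampled means that reach the observed mean. The function name and the two return series are hypothetical.

```python
import random


def bootstrap_p_value(returns, n_resamples=2000, seed=0):
    """One-sided bootstrap p-value for H0: expected return is zero."""
    rng = random.Random(seed)
    n = len(returns)
    observed_mean = sum(returns) / n
    # Zero-center the sample so that the resampling world satisfies H0.
    centered = [r - observed_mean for r in returns]
    hits = 0
    for _ in range(n_resamples):
        resampled_mean = sum(rng.choice(centered) for _ in range(n)) / n
        if resampled_mean >= observed_mean:
            hits += 1
    return hits / n_resamples


# Hypothetical rule with a steady positive return and small noise.
strong_rule = [0.01 + 0.001 * ((i % 5) - 2) for i in range(100)]
# Hypothetical rule that alternates between -1% and +1% (mean zero).
no_edge_rule = [0.01 * ((i % 2) * 2 - 1) for i in range(100)]

p_strong = bootstrap_p_value(strong_rule)
p_no_edge = bootstrap_p_value(no_edge_rule)
print("p-value, strong rule: ", p_strong)
print("p-value, no-edge rule:", p_no_edge)
```

Without the centering step, the resampled means would cluster around the observed mean instead of around zero, and the test would say nothing about H0.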
Chapter 6 – Data mining Bias – Fool’s gold of Objective TA
For me, this section is the meat of the book. I took quite some time to read slowly what this section talks about and here are my takeaways.
After-the-fact probability tends to be much higher than before-the-fact probability. The chance that a given monkey produces a masterpiece is very low, but given a long enough history, the probability that some masterpiece has been produced is very high.
Be aware of data mining bias. Observed performance = random component + genuine predictive power of the rule, and the random component dominates on the data mining route. You are just lucky with the rule in the history, and out of sample it is bound to underperform.
In data mining, the best-performing rule is always chosen… Well, the rule was just lucky in that period.
Five factors one needs to think about: number of rules being tested, data size, correlation between rules, variation in the expected returns amongst the rules, and presence of outliers.
Three ways to cut data mining bias: out-of-sample testing (walk-forward testing), the Markowitz scaling-down method, and resampling and Monte Carlo methods.
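A small simulation makes the bias visible. This is a minimal sketch under my own assumptions, not the book's experiment: every "rule" here is pure noise with zero true edge (daily returns drawn from a normal distribution with a 1% standard deviation), the best rule is picked on the in-sample period, and its return is then checked on a fresh out-of-sample period. All names and parameters are hypothetical.

```python
import random


def best_rule_in_and_out_of_sample(n_rules=100, n_days=100, seed=1):
    """Pick the best of n_rules worthless rules and report both periods.

    Returns (in-sample mean return, out-of-sample mean return) of the rule
    that looked best in sample.
    """
    rng = random.Random(seed)
    best_in, best_out = float("-inf"), 0.0
    for _ in range(n_rules):
        in_sample = sum(rng.gauss(0.0, 0.01) for _ in range(n_days)) / n_days
        out_sample = sum(rng.gauss(0.0, 0.01) for _ in range(n_days)) / n_days
        if in_sample > best_in:
            best_in, best_out = in_sample, out_sample
    return best_in, best_out


best_in, best_out = best_rule_in_and_out_of_sample()
print(f"best rule, in sample:     {best_in:.4%} per day")
print(f"same rule, out of sample: {best_out:.4%} per day")
```

The in-sample figure of the winner is pure luck, so it should be treated as an upward-biased estimate; the out-of-sample figure is an unbiased estimate of the true edge, which is zero by construction here.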
Chapter 7 – Theories of Non Random Price motion
This section looks at the shaky foundations of the Efficient Market Hypothesis. All forms of EMH are analyzed. Arguments based on logic as well as empirical evidence are put forth in order to show that EMH is crap and that there is enough scope for arbitrage opportunities in a market.
S&P Case study:
The last part of the book deals with testing about 6,400 binary signal rules on S&P 500 data. If you can read through the relatively dry chapters 1-7, you are bound to enjoy this part of the book. Once you start thinking about the ways in which trading rules can be built, this section is priceless, as it allows you to zoom in to various aspects of backtesting.
Personally, I think that the S&P case study at the end is far more valuable than the first 400 pages of dry reading. However, if you haven’t used stats for a while, you might like the first 400 pages too.
My takeaway from the book is: there are a ton of insights in the S&P case study towards the end of the book; the rest is jazz.
This post is going to serve as a good visual narrative of the development of the frequentist world. If you love stats and have worked on identifying patterns in data, you will obviously have met a host of tests, with different names from different fields. Statistics is one field where contributions have been made from all kinds of fields, ranging from agriculture and clinical psychology to math and finance.
WHY do you think so many people contributed to this field? Just pause for a few seconds and think about it.
I had never paused and thought about it… and the opening remarks of this book provided that insight, though it is very obvious in hindsight. A significant part of statistics deals with the way experiments need to be conducted, and experiment/test/learn is the way of life for any scientist, irrespective of the domain he/she is working in. So the contributions to the field of stats are going to come from all possible fields.
OK, let me attempt a visual summary of this book, for the simple reason that it is a fantastic narrative of the history of statistics. The people who contributed, a peek into their idiosyncrasies, their likes and their dislikes: all of it is a delight for the reader. I will try to cover most of the personalities mentioned in the book and their contributions.
Pearson was the first to look at the world through the eyes of a “distribution”. What we see are nothing but realizations of a distribution. If we collect more data, then we know more about the distribution. Under this assumption, he went on to create families of distributions based on mean, standard deviation, skewness and kurtosis. Experimental results are thus a distribution of numbers, and distribution equations tell the probability of occurrence of these numbers. The measurements themselves have a probability distribution, rather than just the errors in the measurements, which was the prevalent thought (Galton, Abraham de Moivre, Carl Gauss).
Pearson also needed a tool to fit the measurements to distributions, and he came up with a powerful tool called “goodness of fit”, which is used to this day. (An example: if you have to select amongst a host of ARIMA models for a given realization, one uses a fit criterion like AICc and chooses the model.)
Gosset was a classical empiricist. He felt Pearson’s theory was good but difficult to practice in reality, as one needs to deal with small sample sizes. He worked on this problem after his work hours (a classic success of the “sex and cash” theory). He said two estimates, mean and standard deviation, are enough to say something about the distribution. He published under the pseudonym “Student”, hence “Student’s t”. This was very useful… think about it. In Pearson’s case, you estimate 4 parameters, then you estimate the estimates of those 4 parameters… it is an infinite loop. But with Gosset’s insight, you stop at the first computation… wow!! It is a marvelous achievement.
Ronald Aylmer Fisher
Fisher, quite a personality, has had tremendous influence on the development and usage of statistics. His experiments, written up as “Studies in Crop Variation”, produced gems of results: ANOVA, MANOVA, ANCOVA, etc. Separating the main effects of the experiment was the underlying philosophy behind randomization. “Statistical Methods for Research Workers”, a book stripped of complex math equations and made available for easy usage, was instrumental in the adoption of Fisherian thought everywhere. Fisher was the person who first introduced the words “degrees of freedom”, maybe because he was always inclined to think geometrically.
The fundamental difference between Pearson’s philosophy and the Fisherian view is this: Pearson believed that the data represented the distribution. Fisher believed that the distribution is an abstract concept and that all one can do is find a statistic describing the abstract concept. This statistic can be anything: mean, median, IQR, etc. All these statistics will be random, and hence one needs to study these estimates as such. He was also instrumental in coming up with “maximum likelihood estimates”, a way to iteratively figure out the best values for the estimates given the data and a distribution. Today, with the advent of computers, MLE is just a command away, whereas this powerful procedure used to be very mathematical, laborious and time-consuming to do by hand.
Fisher also introduced p values, which went on to become the basis for hypothesis testing.
Tippett Gumbel, Emil Julius
100-year flood prediction is difficult using Fisherian concepts. Hence Tippett and Gumbel studied this aspect and made a significant contribution. Today we know the work by the name Gumbel distribution.
Neyman was the first person to ponder the question of connecting p values to hypothesis statements. He figured out that one needs to have a null hypothesis and an alternative hypothesis for an experiment to use p values. One can only reject the null hypothesis or fail to reject it. No causality is being talked about here. This was his biggest contribution. He also developed, and gave the interpretation of, “confidence intervals”.
Bayes, a minister by profession, made an outstanding contribution to the world of stats. He dealt with the world of inverse probability. Look at the data and adjust the prior hypothesized distribution: that was the thought, and it was very, very radical. For some reason, Bayesian stats is still not taught properly in MBA courses and finance courses across the world, in spite of the fact that Google made tons of money using Bayes’ fundas. There are umpteen disciplines that are crying out for the application of Bayesian principles. Risk management, for certain!
LEBESGUE, Henri Léon – Lebesgue measure
Lebesgue measure — Key to the development of Real Analysis
Kolmogorov, Andrey Nikolayevich – MOZART of Mathematics
Kolmogorov’s contribution to probability and statistics is pivotal. He was the first mathematician to use measure theory and lift probability from step-sister treatment in math to the grand status it has today. Without Kolmogorov, it is very unlikely that the development would have happened at this rate. He pondered on 2 questions and spent his entire life on them.
Florence Nightingale David
A lot of people confuse her with Florence Nightingale, the founder of the nursing profession, after whom she was named. But for stats guys, her contribution to the theory of statistics is path-breaking. Pick any paper on stats and, within 2-3 handshakes, you will find a reference to F.N. David’s work.
Wilcoxon, Mann of Mann-Whitney test
Why not do away with parameters? This thought, revolutionary in hindsight, was first envisaged by Wilcoxon, Prof. Mann and his student Donald Whitney. Their efforts gave rise to an entire branch of non-parametric statistics.
Prasanta Chandra Mahalanobis
Mahalanobis is credited with coming up with “randomized sampling” instead of “opportunity” or “judgment” sampling. Under the then Indian PM, Nehru, he went on to create economic indices which became crucial for tracking the performance of various five-year plans.
The first example of inverting a 24×24 matrix and helping a Nobel Prize winning economist are some of the contributions of Jerome Cornfield. In a way, one can say that he helped in bringing out the popular input-output economic analysis.
Many of Professor Cornfield’s numerous contributions to both biostatistics and public health grew out of his research on the health effects of smoking. He became interested in the use of case-control studies after reading seminal papers by Doll and Hill (1950) and Wynder and Graham (1950), which used this methodology in early discoveries of the association between smoking and lung cancer. Professor Cornfield then demonstrated that case-control studies can be used to estimate the risk of disease as a function of smoking status so long as the rate of disease in the population is known. He also showed that the odds ratio, an approximation to the relative risk for rare diseases, can be estimated either prospectively or retrospectively. These results form the basis for much of the modern era’s epidemiologic research.
Gertrude Cox was the first woman to be elected into the International Statistical Institute. In 1950 Cox and William G. Cochran wrote the book Experimental Designs, which became a classic in the design and analysis of replicated experiments.
Another towering woman in the statistical world
Samuel S Wilks
Samuel Stanley Wilks was an American mathematician and academic who played an important role in the development of mathematical statistics, especially in regard to practical applications. Wilks worked with the Educational Testing Service in developing standardized tests like the SAT that have had a profound effect on American education. He also worked with Walter Shewhart on statistical applications in quality control in manufacturing.
J Tukey – The Picasso of Statistics
Fast Fourier Transforms, Exploratory data analysis, box plots, stem-and-leaf plots, rootgram instead of histogram are some of the stellar contributions of this versatile scientist.
George Box, David Cox
Famously remembered for the Box-Cox transformation, a technique used to stabilize the variance of data, make the data more normal-distribution-like, improve the correlation between variables, and for other data stabilization procedures.
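For reference, the one-parameter Box-Cox transform of a positive value x is (x^λ − 1)/λ for λ ≠ 0 and ln(x) for λ = 0. Here is a minimal Python sketch of that formula; the function name is my own, and choosing λ (usually done by maximum likelihood) is left out.

```python
import math


def box_cox(x, lam):
    """One-parameter Box-Cox transform of a strictly positive value x."""
    if x <= 0:
        raise ValueError("Box-Cox requires strictly positive data")
    if lam == 0:
        return math.log(x)  # limiting case as lambda goes to zero
    return (x ** lam - 1.0) / lam


# lambda = 1 only shifts the data, lambda = 0.5 is a square-root-like
# transform, and lambda = 0 is the log transform.
skewed = [1.0, 2.0, 4.0, 8.0, 16.0]
print([round(box_cox(v, 0.0), 3) for v in skewed])
print([round(box_cox(v, 0.5), 3) for v in skewed])
```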
Deming is credited with having brought about the quality movement. His contribution to quality was from a statistical standpoint, where he looked at variation in output as a combination of common-cause variation and special-cause variation. He championed the reduction of common-cause variation, and that changed the face of Japan.
LÉVY, Paul Pierre
Lévy was dissatisfied with counting methods in probability. He developed martingales, which were later applied to clinical trial studies, and today martingales are used in so many domains: finance, modeling, you name it… Martingale has entered the common vocabulary, thanks to Lévy.
David Dickey ( Dickey-Fuller unit root test – Stationarity test)
One of the methods to check the stationarity of residuals is the Dickey-Fuller test.
BAHADUR, Raghu Raj
India-born mathematical statistician, considered by peers to be “one of the architects of the modern theory of mathematical statistics”. He is popularly known in the context of the Anderson-Bahadur algorithm.
The inverse Gaussian distribution, also called the Wald distribution, came from Abraham Wald. He is also popularly known for the Wald chi-square test, a statistical test typically used to test whether an effect exists or not. In other words, it tests whether an independent variable has a statistically significant relationship with a dependent variable.
Credited with discovering the “resampling method”, one of the greatest breakthroughs in the field of statistics.
Finally, the author who has put together a fascinating account of the above personalities and many more, all in one book: Dr. David Salsburg.
I came across so many references to this book that I finally decided to take a peek into it. At its heart, the book is about Kelly’s criterion. A simple example motivated me to read this book.
If you know that you gain 50% with prob = 0.6 and lose 50% with prob = 0.4… AND… if you have X dollars, how much do you bet? A naive strategy of using expectation over one step is dangerous. Expected gain over 1 step = 0.6*0.5 - 0.4*0.5 = 0.1, so expected profit > 0. Should you bet your entire $X? Common sense tells us that it cannot be the answer, as there is a high probability that you would be out of the game before the law of large numbers kicks in. Kelly was interested in knowing what fraction of one’s wealth one should bet, given the odds of gains. Well, the answer to the riddle is f = 2p - 1, where f is the fraction of wealth to put at risk and p is the probability of a gain; for an even-money bet, where the whole stake is won or lost, f is simply the stake. This is all that matters if you want a 10,000 ft view. However, if you want to know the STORY behind the criterion, this book is a wonderful way to spend time and understand the story, which William Poundstone has thoroughly documented. This book is pretty readable. It just tells the story behind the criterion, and it tells it in a fantastic way.
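The riddle can be checked numerically. The sketch below is my own working, not something from the book: it assumes you stake a fraction f of your wealth each round, so wealth is multiplied by (1 + gain·f) on a win and (1 − loss·f) on a loss, and it maximizes the expected log growth per bet. Under that assumption the optimum is f = p/loss − (1 − p)/gain, which reduces to 2p − 1 for an even-money bet (gain = loss = 1). The function names are hypothetical.

```python
import math


def growth_rate(f, p, gain, loss):
    """Expected log growth per bet when staking a fraction f of wealth."""
    return p * math.log(1 + gain * f) + (1 - p) * math.log(1 - loss * f)


def kelly_fraction(p, gain, loss):
    """Stake that maximizes growth_rate (closed form from d/df = 0)."""
    return p / loss - (1 - p) / gain


p = 0.6
even_money = kelly_fraction(p, gain=1.0, loss=1.0)   # equals 2p - 1
half_moves = kelly_fraction(p, gain=0.5, loss=0.5)   # the +/-50% riddle

# Brute-force check of the +/-50% case on a grid of stakes 0.00 .. 1.00.
grid = [i / 100 for i in range(101)]
best_on_grid = max(grid, key=lambda f: growth_rate(f, p, 0.5, 0.5))

print("even-money Kelly stake:", even_money)
print("+/-50% Kelly stake:    ", half_moves, "grid search:", best_on_grid)
print("growth if you bet it all:", growth_rate(1.0, p, 0.5, 0.5))
```

For the ±50% riddle the stake comes out at about 40% of wealth, which puts 2p − 1 = 20% of wealth at risk. Betting everything has a negative expected log growth, even though the one-step expectation is positive.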
PART 1 – ENTROPY :
The book starts like a crime thriller. It tells the story of John Payne, who starts a wire service to communicate horse track results to the bookies. In a strange twist of events, AT&T’s rise and Payne’s original network converge in the establishment of Bell Labs, a place credited with having been home to innumerable famous personalities. The first part of the book introduces some of them.
Firstly, about Claude Shannon, one of the few brilliant scientists who was single-handedly responsible for bringing “information theory” to the world. His contribution was immediately applied to a wide range of fields. Edward Thorp, a brilliant empiricist, sets Shannon in a direction where they team up to build a tool to estimate probabilities at a roulette wheel. In their adventure, they realize that it is very important to have a betting strategy in place. The simple martingale strategy of doubling the bet after each loss until you win is a risky proposition, as the gambler might be bankrupt before he manages to win.
Shannon’s key idea was that the essence of a message is its unpredictability. The key insight of John Kelly, another brilliant scientist, into this idea was:
A greedy though prudent bettor faces a situation similar to that of the receiver of a noisy message in Shannon’s case. Kelly extended Shannon’s ideas to horse betting. The key equation spelled out by Kelly is Gmax = R, where Gmax is the maximum growth rate of the gambler’s money and R is the information transmission rate (Shannon’s theory). It is also popularly known as the edge/odds Kelly criterion, which is often quoted in the media. Ed Thorp used Kelly’s ideas in blackjack and made a killing in the Vegas casinos.
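Gmax = R can be verified for the simplest case. The sketch below is my own illustration and assumes a tip that is right with probability p (a binary symmetric channel) and an even-money bet. The Kelly bettor stakes f = 2p − 1, and the resulting growth rate in bits per bet equals the channel capacity 1 − H(p), where H is the binary entropy. The function names are hypothetical.

```python
import math


def binary_entropy_bits(p):
    """Entropy H(p) in bits of a coin that lands heads with probability p."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)


def channel_capacity_bits(p):
    """R: capacity of a binary symmetric channel that is right with prob p."""
    return 1.0 - binary_entropy_bits(p)


def kelly_growth_bits(p):
    """Gmax: log2 growth per even-money bet at the Kelly stake (0.5 <= p < 1)."""
    f = 2 * p - 1
    return p * math.log2(1 + f) + (1 - p) * math.log2(1 - f)


for p in (0.55, 0.6, 0.75, 0.9):
    print(p, kelly_growth_bits(p), channel_capacity_bits(p))
```

The two columns agree because 1 + f = 2p and 1 − f = 2(1 − p), so the growth rate expands to 1 + p·log2(p) + (1 − p)·log2(1 − p).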
Henry Latané’s idea: if in each period the investor chooses the alternative with the largest geometric mean across possible states, the geometric mean strategy is going to dominate all the other strategies. I am facing a paucity of time, but I would love to read more on this. STREETWISE, a book with a collection of papers relating to portfolio management, has one of Henry Latané’s papers. Maybe I will read it some day.
The first part of the book, titled Entropy, covers the people behind the Kelly criterion and gives a non-mathematical introduction to it. After reading this part, it is difficult not to go beyond this introduction and explore the ideas of risk management. Let’s say I have a stat-arb strategy to pick stocks. How to manage the risk of the portfolio becomes extremely important. I should somehow find time to read about these issues and work on the implementation.
PART 2 – BLACK JACK :
This part is a journalistic account of Thorp using the Kelly criterion in Las Vegas to make money. The story highlights one key idea that recurs in various forms throughout the book.
The law of large numbers is largely misunderstood by gamblers. The probabilities are realized only over the long run, and you need a criterion for sizing bets to avoid gambler’s ruin. Kelly’s criterion is one such criterion: edge/odds tells you how you should bet on successive outcomes. This part of the book compares 4 strategies: bet it all, martingale, fixed wager, and Kelly’s system. At first glance, the martingale system looks good, but there is a chance of ruin before the law of large numbers kicks in. This is where the Kelly criterion, which is a geometric criterion, does exceedingly well. Even though the wealth path is jittery, the Kelly criterion beats all the other strategies.
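The four strategies are easy to simulate. The following is a minimal sketch under my own assumptions, not the book's setup: an even-money bet won with probability 0.6, a starting bankroll of 100, a base wager of 1 for the fixed and martingale systems, and 2000 rounds. Every strategy sees the same sequence of wins and losses. All names and parameters are hypothetical.

```python
import random


def simulate(strategy, p_win=0.6, start=100.0, rounds=2000, seed=0):
    """Final wealth after repeatedly betting on an even-money game."""
    rng = random.Random(seed)
    wealth, last_bet, last_won = start, 0.0, True
    for _ in range(rounds):
        if wealth <= 0:
            return 0.0  # gambler's ruin
        # A gambler can never stake more than the current bankroll.
        bet = min(strategy(wealth, last_bet, last_won), wealth)
        last_won = rng.random() < p_win
        wealth += bet if last_won else -bet
        last_bet = bet
    return wealth


def bet_it_all(wealth, last_bet, last_won):
    return wealth


def fixed_wager(wealth, last_bet, last_won):
    return 1.0


def martingale(wealth, last_bet, last_won):
    # Start again at 1 after a win, double the stake after a loss.
    return 1.0 if last_won else 2.0 * last_bet


def kelly(wealth, last_bet, last_won):
    return 0.2 * wealth  # f = 2p - 1 with p = 0.6


final_wealth = {
    s.__name__: simulate(s)
    for s in (bet_it_all, fixed_wager, martingale, kelly)
}
for name, wealth in final_wealth.items():
    print(f"{name:12s} {wealth:.6g}")
```

Bet-it-all is wiped out by the first loss, the fixed wager grows only linearly, and the martingale outcome depends on whether a long losing streak shows up. Kelly compounds, which is why it ends up far ahead in a long run like this one.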
This way of relying on the law of large numbers is exactly what Shannon used in information theory. Sometimes reading about all these beautiful ways of looking at things makes me feel that I should never lose this habit of reading regularly. However, I have seen that once I start working in a company / startup, I find it difficult to devote time to reading in silence. I hope to change that pattern this time around!
One thing that will make any math-inclined reader ponder is, “How to use the Kelly criterion in stock trading strategies?” I am certain that there is a ton of literature out there. I should find some time to go over it! Also, a few of the books mentioned in this part of the book are
“Beat the Dealer” and “Beat the Market”. I guess the idea of delta hedging was put to use by Thorp at Princeton-Newport, a successful company, much before Black and Scholes used it in the famous Black-Scholes formula.
Part 3 – Arbitrage
This is the part of the book I love the most, because it is a topic close to my heart. It talks about Thorp’s attention to convertible bond arbitrage and the way he devised a risk-neutral strategy for warrants. The delta hedging technique which Black and Scholes popularized years later was being practically implemented years earlier by Thorp. This book talks about the rise of the random walk in finance literature and the popularization of the efficient market hypothesis. Samuelson, Fama and Sharpe all touted EMH. Shannon, on the other hand, believed that the only way to make money was through arbitrage. However, he himself did not use the technique in the earlier years to make money. His success came from picking a few stocks.
However, Thorp, who had already used the Kelly criterion and used warrants to make money, wanted to expand and get more money to manage. One way he did this was to popularize his concepts in “Beat the Market”. The book’s popularity brought him investors who were eager to put in money. Thorp’s Princeton-Newport Partners did tremendous business, which is still a sort of benchmark for all money managers. A dollar invested in 1968 would have grown to $14.78 in 1988. Over 19 years this was a return of 15.1% (the S&P averaged 8.8%). The most interesting thing was the standard deviation, which was 4%, meaning a Sharpe ratio of roughly 3.7 (15.1/4, ignoring the risk-free rate), a bloody good Sharpe ratio by any standards. If you ever think of managing money, this ratio should always be in your mind to remind you that there was a guy who beat the market hands down with a Sharpe ratio of 3.7. WOW! To this day, money managers talk about this performance. If you read this part of the book, you cannot help but start thinking about the Kelly criterion, delta hedging and information ratios. The meat of the book lies in this and the next part.
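The arithmetic behind those figures is simple enough to sketch. The snippet below is my own back-of-the-envelope check using only the numbers quoted above. It assumes annual compounding and, like the 3.7 figure, ignores the risk-free rate unless one is passed in; the function names are hypothetical.

```python
def annualized_return(growth_multiple, years):
    """Compound annual growth rate that turns 1 into growth_multiple."""
    return growth_multiple ** (1.0 / years) - 1.0


def return_to_risk(annual_return, annual_std, risk_free=0.0):
    """Sharpe-style ratio; with risk_free=0 it is simply return / volatility."""
    return (annual_return - risk_free) / annual_std


cagr = annualized_return(14.78, 19)
ratio = return_to_risk(0.151, 0.04)
print(f"annualized return from $1 -> $14.78 over 19 years: {cagr:.1%}")
print(f"return / standard deviation: {ratio:.2f}")
```

Subtracting a realistic risk-free rate for the period would lower the ratio, so the 3.7 is best read as a return-to-volatility figure.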
On a side note, there is a new term that I came across in this part of the book:
Paul Samuelson coined the term Performance Quotient. Like IQ, this measures a portfolio manager’s ability to generate alpha. A PQ of 100 is average. Samuelson theorized that if people with a high PQ existed, they would all be invisible: “You would not find them working for I-banks. They have too high an IQ for that. They would operate by stealth, investing their own money or their friends’ money. They would keep the system to themselves.”
Part 4 – St. Petersburg Wager
This is the famous wager problem where the expected value of the wager is infinite. The puzzle is that no one would pay an infinite amount to play the game, so what is the fair value of the game? This paradox created a lot of interest and led to the popularization of the utility function, Bernoulli’s masterpiece, where he mentions that risky ventures should be evaluated based on the geometric mean of the outcomes. This ultimately led to the involvement of Henry Latané, Markowitz and Kelly, and finally it came to be accepted that mean-variance analysis does not talk about compounding of investments. But Kelly’s world is about compounding of returns; the winnings are being reinvested continuously. Even though Samuelson and Merton were strict opponents of the geometric criterion, other people like Henry Latané and Thorp believed in the Kelly criterion.
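Here is a minimal sketch of the paradox in numbers. It assumes one common version of the game, which may differ in detail from the one in the book: a fair coin is tossed until the first head, and a head on toss k pays 2^k. It also simplifies Bernoulli's argument by taking the log of the payoff alone rather than of total wealth. The function names are hypothetical.

```python
import math


def truncated_expected_value(max_tosses):
    """Expected payoff if the game is capped at max_tosses tosses."""
    return sum((0.5 ** k) * (2 ** k) for k in range(1, max_tosses + 1))


def expected_log_payoff(max_tosses):
    """Expected natural log of the payoff, capped at max_tosses tosses."""
    return sum((0.5 ** k) * math.log(2 ** k) for k in range(1, max_tosses + 1))


# Each extra allowed toss adds exactly 1 to the expected value,
# so the uncapped expectation grows without bound.
for n in (10, 20, 40):
    print(n, truncated_expected_value(n))

# The geometric mean of the payoff converges to a small, finite number.
geometric_mean = math.exp(expected_log_payoff(200))
print("geometric mean payoff:", geometric_mean)
```

The arithmetic mean grows without limit while the geometric mean settles at 4, which is the gap between the expected-value view and the geometric (Kelly) view of a risky venture.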
The world to this day is divided, I guess, between the Kelly and mean-variance approaches. However, Thorp, the man with one of the highest PQs, believed in Kelly (half-Kelly, to be conservative). Kelly has no log-normal assumption or distribution assumption. It has no utility function behind it. It is a plain, simple formula to avoid ruin. Will it work in specific markets? Can someone use Kelly in India for stock selection? I do not know whether people are using it. But it will be fun to play with the money and see whether it works.
Part 5 – RICO
This part gives a historical account of RICO, the anti-racketeering law that was used against Princeton-Newport in what was essentially a tax case. Most of the partners got indicted. Thorp was the one person who was never charged. The whole issue with Princeton-Newport was that they were parking stocks at other firms and buying them back at a price chosen so as to offset short-term gains. Tax evasion!! It was good in the short run until RICO hit them. However, this did not happen during the period of the fund’s stellar performance, when it had a Sharpe ratio of 3.7. So, for all the aspiring fund managers, that metric can still be considered a benchmark for your performance!
Part 6 – Blowing up
This details the blow-up that took place at LTCM. Ironically, the fund failed to be long term! The author’s point in illustrating this case is to bring out the relevance of the Kelly criterion and how the criterion would have saved LTCM from a blow-up. Thorp’s brilliant point about convergence trades made me think about the relevance of some trading strategies that I know about.
Part 7 – Signal Vs Noise
Well, the last part of the book is a short collection of diverse views on Kelly. Shannon, it turns out, was more of a stock picker than an arbitrage exploiter. Thorp believed in relative value and made a killing in various markets. The book ends with the author not taking any side in the debate on Kelly. It is left to the imagination and interpretation of the reader. I feel that someday soon, the heavyweights (PhDs) from Ivy/other schools will get behind the Kelly bandwagon and make it popular. Ideas about mean-variance will die soon, and maybe then people will start giving importance to the Kelly criterion.
This book has made me inquisitive about the actual implementation of the Kelly formula and how to use it in a long-short portfolio.
Let’s say I manage to identify some long-short trades. Is my current implementation of risk management sound? Maybe I should simulate P&L with the Kelly criterion and see how it behaves! The following are some of the wonderful references which I hope to read some day:
- Beat the Market – Thorp
- Streetwise – Peter Bernstein
- Portfolio Choice and Kelly Criterion
- Henry Latane’s paper on geometric mean
- Shannon’s basic funda relating to Information theory
- Implementation of Kelly Function
- How to implement Gmax = R
Wow!! It’s a long way to go for me before I understand all this stuff clearly.