Books such as these give the visual images that are necessary to make learning stick. To be fair, I remember hardly anything about cell biology or DNA. Way back in high school I had crammed some of it and held on to it for a few years in order to write exams. Some bits may have percolated into my long-term memory, but since I have never retrieved them, they lie in some inaccessible part of my brain.

In the past two months, I have been exposed to a lot of terminology specific to cell biology and genetics. My dad was diagnosed with advanced-stage colon cancer, and I consulted three of the best oncologists in the city. Each meeting lasted about 30-45 minutes, and some of them were overwhelming. One of the doctors, who is known to be the best in the city, threw a lot of jargon at me and explained various scenarios for cancer treatment. Needless to say, I was clueless. Here I was, lucky enough to get a slot with a leading oncologist, and I was completely lost. The only thing I could think of doing was to rapidly jot down the words and phrases he was using. Afterwards I went home, read up on each term and worked through the various treatment options. Despite the time spent understanding the terms, my knowledge of the treatment options was cursory at best. In any case, there were people around me who were far more intelligent and knowledgeable than me, so choosing the right doctor and treatment schedule became an easy decision.

Amid the hectic schedule of taking my dad through various chemo cycles, I have read a few books on cancer. As a primer for those books, I read a few genetics/biology 101 books, and this book is among the first set I read in the past month. To begin with, it has given me a basic collection of visuals that I can use as anchors while reading the general literature. Why do we need visuals? Can’t one just read the material and understand it? Well, maybe yes. But most likely, at least for me, it is a NO. My mind needs visuals to understand things better. For example, the steps involved in creating a protein (a chain of amino acids) from DNA go something like this:

  1. Enzymes in the nucleus (RNA polymerase) make short mRNA copies of stretches of DNA
  2. A ribosome, built from rRNA and proteins, attaches itself to the mRNA
  3. An appropriate tRNA binds at the ribosome, its anticodon matching the next codon on the mRNA
  4. Each tRNA carries one specific amino acid
  5. Each amino acid so delivered attaches to the previous amino acid in the chain.
  6. At the end of every protein-coding sequence, there is a specific stop codon that makes the ribosome release the chain and detach from the mRNA.
  7. The sequence of amino acids assembled in the previous steps is one of the many proteins in the cell.
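To make the steps above concrete, here is a toy Python sketch of transcription and translation. It is only an illustration under simplifying assumptions: the codon table holds just a handful of the 64 real entries, the DNA template strand is written in the order RNA polymerase reads it (so its complement comes out in reading order), and the function names `transcribe` and `translate` are my own, not from the book.

```python
# Toy sketch of the protein-making steps above.
# Assumption: only a few entries of the standard genetic code are included;
# a codon missing from this table raises KeyError.
CODON_TABLE = {
    "AUG": "Met",  # start codon, also codes for methionine
    "UUU": "Phe", "UUC": "Phe",
    "GGC": "Gly", "GGA": "Gly",
    "AAA": "Lys",
    "UGG": "Trp",
    "UAA": None, "UAG": None, "UGA": None,  # stop codons: no amino acid
}

# Base pairing used when RNA polymerase copies the DNA template strand.
DNA_TO_MRNA = {"A": "U", "T": "A", "G": "C", "C": "G"}


def transcribe(template_strand: str) -> str:
    """Step 1: build the mRNA complementary to a DNA template strand."""
    return "".join(DNA_TO_MRNA[base] for base in template_strand)


def translate(mrna: str) -> list[str]:
    """Steps 2-7: read codons three bases at a time from AUG until a stop codon."""
    start = mrna.find("AUG")
    if start == -1:
        return []  # no start codon, no protein
    protein = []
    for i in range(start, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid is None:  # stop codon: release the chain
            break
        protein.append(amino_acid)  # add to the growing chain
    return protein


print(translate(transcribe("TACAAACCGACT")))
```

For example, the template `TACAAACCGACT` is copied into the mRNA AUG UUU GGC UGA, which is read as Met, Phe, Gly before the UGA stop codon releases the chain.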

Merely reading the above sequence of steps might not be sufficient to understand what’s going on. Some sort of pictures would help, and the book fills exactly that void. The authors do a fantastic job of illustrating these steps, so the visuals form a very sticky cue for further learning.

Here is a list of terms/concepts/principles covered in the book:

  • Selective breeding
  • The Bible story of Jacob’s flock illustrates accurate genetic observation coupled with a total lack of understanding. Science and magic went together
  • The most coherent Greek theory of heredity (by Hippocrates): there were fluids inside the bodies of men and women. These fluids battled against each other, and the outcome decided whether a particular part of the body resembled the mother’s or the father’s
  • Greek civilization and the Middle Ages had all sorts of crazy ideas about heredity
    • All inheritance came from father
    • Spontaneous generation – Living organisms could arise from non-living matter. This was challenged by Francesco Redi
  • Antonie van Leeuwenhoek used the microscope to make two important discoveries: bacteria and sperm cells
  • William Harvey believed that all animals come from the egg
  • Mammals produce very few eggs. A human female typically releases just one a month
  • Oscar Hertwig’s observation – Fertilization as the union of sperm and egg
  • Plants – the male parts are called anthers (they contain pollen) and the female part is called the stigma
  • No general laws of inheritance were discovered for a very long time
  • Gregor Mendel – the Austrian monk who discovered the laws of inheritance
  • Mendel’s results
    • Hereditary traits are governed by genes which retain their identity in hybrids
    • One form of a gene can be dominant over another form. But recessive forms can reappear in later generations
    • Each adult organism has two copies of each gene, one from each parent. When pollen, sperm or eggs are produced, each gets one copy
    • Different alleles are sorted into sperm and eggs randomly and independently. All combinations of alleles are equally likely
  • All living beings are made of cells – This fact wasn’t appreciated until late 19th century
  • Mitosis and Meiosis – Types of cell division
  • Mitosis – An extremely accurate process of creating two cells, each with the same number of chromosomes as the original
  • Sperm cell and egg cell contain only half a set of chromosomes.
  • A typical human cell has 46 chromosomes – 23 pairs
  • Chromosome contains the genetic material
  • Nucleotides – the building blocks of nucleic acids. An individual nucleotide has three components: a sugar, a phosphate and a base
  • RNA – Nucleotides with ribose
  • DNA – Nucleotides with deoxyribose
  • Proteins – Chains of amino acids
  • Hemoglobin – One of the most complicated macromolecules. Max Perutz spent 25 years understanding this protein.
  • Enzymes – These are proteins that take apart or put together other molecules
  • Connection between gene and enzyme – The metabolic role of the genes is to make enzymes, and each gene is responsible for one specific enzyme.
  • RNA – RNAs are single-stranded and much shorter than DNA (50 to 1000 nucleotides)
  • RNA polymerase – Teases apart a region of DNA and makes an RNA copy of it. This process is called transcription
  • mRNA – messenger RNA
  • tRNA – transfer RNA
  • rRNA – ribosomal RNA
  • Codon – triplets of bases
  • Amino acid – Each three-base codon stands for an amino acid
  • The 64 possible codons represent 20 amino acids, plus stop signals
  • Every protein-coding sequence starts with the same codon – AUG
  • Stop codons do not encode any amino acid; they signal the ribosome to release the finished protein
  • Anticodon – A loop of the tRNA that has three unpaired bases, which pair with a codon on the mRNA
  • Amino acid site – At the tail end of the tRNA is a site for attaching a single amino acid
  • DNA contains sequences encoding every tRNA, mRNA and rRNA
  • Eucaryotes – Cell with nucleus
  • Procaryotes – Cell with no nucleus
  • Spliceosome – A complex of proteins and RNA that grabs the mRNA, shears off the loop, discards it and splices the remaining pieces together
  • Eucaryotic genes contain junk DNA
  • Introns – In the middle of perfectly good genes, there may be several meaningless sequences, each hundreds of nucleotides long
  • Protein spools – To help organize all the storage, eucaryotes wrap their DNA around protein spools. Each spool consists of several proteins that are bound together
  • Principle of complementarity – Each base can pair with only one complementary base: A with T (or U in RNA) and G with C
  • Knowledge about DNA replication in cell division is still sketchy
  • Repetitive DNA – Eucaryotic cells harbor lots of so-called repetitive DNA
  • A virus contains only two parts: a bit of nucleic acid wrapped up in a protein coat. A virus can’t reproduce on its own because it lacks ribosomes and the rest of the living cell’s protein-making equipment
  • Retrovirus – An RNA virus encoding an enzyme that makes a DNA copy of its RNA and splices it into the host chromosome
  • Why are some viral infections incurable? Once the virus genes are in your own chromosomes, they can’t be gotten rid of
  • Hypothesis for junk DNA – It’s possible that some of the repetitive and junk DNA in our chromosomes may have come from ancient retroviruses
  • Repressive tolerance – Shut the junk DNA down and ignore it
  • Mutation – A mutation in a gene is just a change in the DNA’s sequence of nucleotides. Even a mistake at just one position can have profound effects
  • Defense against mutation – One amino acid can be encoded by several codons
  • Blood cells illustrate another common fact of life – One kind of cell can turn into another kind of cell
  • Alleles – A gene in a plant can come in one of two distinct forms, called alleles
  • Principle of Independent Assortment – The Alleles of one gene sort out independently of the alleles of another
  • Homologous chromosomes – The two copies of each chromosome in a cell, which resemble each other and have the same shape
  • Phenotype – What an organism looks like
  • Genotype – Which alleles an organism has
  • Homozygous – An organism is homozygous with respect to a given gene if its alleles are the same
  • Heterozygous– An organism is heterozygous with respect to a given gene if its alleles are different
  • Haploid – A cell with a single set of chromosomes
  • Diploid – A cell with two sets of chromosomes
  • Operon – Cluster of genes, encoding related enzymes and regulated together is called an operon
  • Promoter region – At the start of an operon, there is a site where RNA polymerase binds to the DNA to begin transcribing the message
  • Attenuation – Shortage of certain types of molecules turns on the gene
  • Jumping Genes – A method of gene regulation
  • Transposons – Movable section of genes
  • Crossover – During Meiosis, chromosomes can exchange genes
  • Gene splicing – Splice two pieces of DNA together
  • Recombinant DNA – The result of splicing two DNAs together
  • Restriction enzyme – Gene splicing depends on this enzyme. It cuts DNA at specific sequences, leaving pieces with matching tails that can stick together
  • Proteins can be produced via Recombinant DNA
  • Gene therapy – Fixing specific defects
  • Genetic engineering
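Mendel’s results in the list above (dominance, two copies per gene, random and independent sorting into sperm and eggs) can be checked by brute-force enumeration. The sketch below assumes the usual textbook notation: one letter per gene, uppercase for the dominant allele, so "RrYy" is a plant heterozygous for two genes (the classic round/wrinkled and yellow/green pea traits). Treating every gamete pairing as equally likely is exactly the independent-assortment assumption, so the sketch ignores genes that sit close together on a chromosome. The function names are my own.

```python
from collections import Counter
from itertools import product


def split_genes(genotype: str) -> list[str]:
    """Split e.g. "RrYy" into per-gene allele pairs ["Rr", "Yy"]."""
    return [genotype[i:i + 2] for i in range(0, len(genotype), 2)]


def gametes(genotype: str) -> list[str]:
    """Each gamete gets one allele of each gene; every combination is produced."""
    return ["".join(alleles) for alleles in product(*split_genes(genotype))]


def phenotype(genotype: str) -> str:
    """An uppercase (dominant) allele masks its lowercase (recessive) partner."""
    return "".join(
        pair[0].upper() if any(a.isupper() for a in pair) else pair[0].lower()
        for pair in split_genes(genotype)
    )


def cross(parent1: str, parent2: str) -> Counter:
    """Count offspring phenotypes over all equally likely gamete pairings."""
    counts = Counter()
    for egg, pollen in product(gametes(parent1), gametes(parent2)):
        # The offspring gets one allele per gene from each parent.
        child = "".join(a + b for a, b in zip(egg, pollen))
        counts[phenotype(child)] += 1
    return counts


print(cross("RrYy", "RrYy"))
```

Crossing two `RrYy` plants gives 9 `RY`, 3 `Ry`, 3 `rY` and 1 `ry` out of 16 pairings, the 9:3:3:1 ratio Mendel observed in his peas.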

There is a visual for each of the above concepts/mechanisms. If you are curious about the basic ideas of genetics, this book can be a useful starting point. If nothing else, it will give you visual cues for reading and understanding the general literature on genetics.



In today’s world, where access to information is being democratized like never before, “learning how to learn” is a skill that will play a key role in one’s academic and professional accomplishments. This book collates ideas from some of the recent books on learning, such as “Make it Stick”, “A Mind for Numbers”, “The Five Elements of Effective Thinking” and “Mindset”. The author has added his own take on the various research findings and come up with a 250-page book. Even if you have really absorbed the concepts in those earlier books, you DO still want to read this one. Any exercise that puts you in retrieval mode for certain concepts alters your memory of those concepts. Hence, even though this book serves as an aggregator of the earlier books, reading it through the eyes of a new author changes the way we store and retrieve the main principles behind effective learning.

Broaden the Margins

The book starts with the author narrating his own college experience, in which standard learning advice like “find a quiet place to study”, “practice something repeatedly to attain mastery” and “take up a project and do not rest until it is finished” proved extremely ineffective. He chucks this advice and adopts an alternative mode of learning. Only later, in his career as a science journalist, does he realize that some of the techniques he had adopted in college were actually rooted in solid empirical research. Over the past few decades, researchers have uncovered techniques that remain largely unknown outside scientific circles. The interesting aspect of these techniques is that they run counter to the learning advice we have all taken at some point in our lives. Many authors have written books and blog posts to popularize them. The author carefully puts the main learning techniques in an easy-to-read format, stripping away the associated academic jargon. The introductory chapter gives a roadmap to the four parts of the book and primes the reader to look out for the various signposts in the “learning to learn” journey.

The Story maker

A brief look at the main players in our brain:


Labeled in the figure are three areas. The entorhinal cortex acts as a filter for incoming information, the hippocampus is where memory formation begins and the neocortex is where conscious memories are stored. It was H.M., the legendary case study, that gave doctors and the medical research community a first glimpse into the workings of memory. Surgeons removed H.M.’s hippocampus, essentially removing his ability to form new long-term memories. Experiments on H.M. revealed many amazing aspects of the brain. One of them was that motor skills, like playing music or driving a car, do not depend on the hippocampus. This meant that memory was not uniformly distributed and that the brain had specific areas handling different types of memory. H.M. still had some memories of his past after the surgery, which meant that long-term memories resided in some other part of the brain. The researchers then figured out that the only candidate left for storing them was the neocortex. The neocortex is the seat of human consciousness, an intricate quilt of tissue in which each patch has a specialized purpose.


To the extent that it’s possible to locate a memory in the brain, that’s where it resides: in neighborhoods along the neocortex primarily, not at any single address. That covers storage. How is retrieval done? Here, studies on epilepsy patients whose brain hemispheres had been surgically separated revealed that the left brain weaves a story from the sensory information it receives. The left hemisphere takes whatever information it gets and tells a tale to the conscious mind. Which part of the left brain tells this story? There is no conclusive evidence. All that is known is that this interpreter module sits somewhere in the left hemisphere and is vital to forming a memory in the first place. The science clearly establishes one thing: the brain does not store facts, ideas and experiences the way a computer does, as a file that is clicked open, always displaying the identical image. It embeds them in a network of perceptions, facts and thoughts, slightly different combinations of which bubble up each time. No memory is completely lost, but any retrieval of a memory fundamentally alters it.

The Power of Forgetting

This chapter talks about Hermann Ebbinghaus and Philip Boswood Ballard, who were the first to conduct experiments on memory storage and retrieval. Ebbinghaus memorized lists drawn from some 2,300 nonsense syllables and measured how long it took to forget them.


The curve above is probably how most of us think of memory: our retention of anything falls as time goes on. Philip Boswood Ballard, on the other hand, was curious to see what could be done to improve learning. He tested the students in his class at frequent intervals and found that testing improved their memory and made them better learners. These two experiments were followed by several others, and finally Robert and Elizabeth Bjork of UCLA shepherded the theory in a concrete direction. They summed up their theory as “Forget to Learn”. Any memory has two kinds of strength: storage strength and retrieval strength. Storage strength builds up steadily and grows with use over time. Retrieval strength, on the other hand, is a measure of how quickly a nugget of information comes to mind. It increases with studying and use. Without reinforcement, retrieval strength drops off quickly, and its capacity is relatively small. The surprising thing about retrieval strength is this: the harder we work at retrieving something, the greater the subsequent spike in retrieval and storage strength. The Bjorks call this “desirable difficulty”. This leads to the key message of the chapter: “Forgetting is essential for learning”.
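The forgetting curve is often drawn with a simple exponential model, R(t) = e^(-t/S), where R is the fraction still recallable after t days and S is a “stability” that sets how slowly the memory fades. This is a common simplification, not Ebbinghaus’s own fitted formula. The `review` rule below is a hypothetical toy of my own that merely mimics the idea of “desirable difficulty” (a harder but successful retrieval, i.e. lower retention at review time, earns a bigger boost); it is not the Bjorks’ model, and the `gain` parameter is arbitrary.

```python
import math


def retention(days: float, stability: float) -> float:
    """Fraction still recallable after `days` under R = exp(-t / S)."""
    return math.exp(-days / stability)


def review(stability: float, days_since_last: float, gain: float = 2.0) -> float:
    """HYPOTHETICAL toy rule (not the Bjorks' model): a successful review
    multiplies stability, more so when retrieval was harder (retention lower)."""
    difficulty = 1.0 - retention(days_since_last, stability)
    return stability * (1.0 + gain * difficulty)


for day in (1, 7, 30):
    print(f"day {day:2d}: {retention(day, stability=5.0):.0%} retained")

# An easy review after 1 day versus an effortful one after 10 days.
print(review(5.0, 1.0), review(5.0, 10.0))
```

Under this toy rule, reviewing once the memory has partly faded (assuming the recall still succeeds) yields a larger stability gain than reviewing while it is fresh, which loosely echoes the chapter’s message that forgetting helps learning.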

Breaking Good Habits

This chapter says that massed practice does not work as well as randomized practice. Finding a particular place to work, working on just one thing until you master it and then proceeding to the next is the advice we often hear for effective learning. This chapter says that randomly changing the study environment and randomly picking various topics to study leads to better retrieval than the old school of thought.

Spacing out

This chapter says that spacing out study is better than massed practice. If you are learning anything new, it is always better to space it out than to cram everything in one go. This is the standard advice: do not study all at once; study a bit daily. But how do we space out our studies? What is the optimal time to revisit something you have already read? Wait too long, and the rereading feels like completely new material. Wait too little, and your brain gets bored because of the familiarity. This chapter narrates the story of Piotr Wozniak, who tackled this problem of “how to space your studies?” and eventually created SuperMemo, digital flashcard software that many people use to learn foreign languages. Anki, an open-source flashcard program whose scheduling is derived from SuperMemo’s, is another very popular way to build spaced repetition into your learning schedule. The essence of this chapter is to distribute your study time over a longer interval in order to retrieve efficiently and ultimately learn better.
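To give a flavour of how such software decides when to show a card again, here is a minimal Python sketch of SM-2, the early SuperMemo algorithm that Anki’s classic scheduler adapted. It follows the published SM-2 description as I understand it: answers are graded 0 to 5, the first two intervals are 1 and 6 days, later intervals are multiplied by the card’s easiness factor (never allowed below 1.3), and a grade below 3 restarts the sequence without changing that factor. Current SuperMemo versions use newer algorithms, the `Card` class and `sm2_review` function are illustrative names of mine, and SM-2’s rule of re-drilling low-graded cards on the same day is left out.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Card:
    repetitions: int = 0    # successful reviews in a row
    interval: int = 0       # days until the next review
    easiness: float = 2.5   # SM-2 "E-Factor": starts at 2.5, floor of 1.3


def sm2_review(card: Card, quality: int) -> Card:
    """Return the card's new state after a review graded 0 (blackout) to 5 (perfect)."""
    if not 0 <= quality <= 5:
        raise ValueError("quality must be an integer from 0 to 5")

    if quality < 3:
        # Failed recall: restart the repetition sequence, keep the E-Factor.
        return Card(repetitions=0, interval=1, easiness=card.easiness)

    repetitions = card.repetitions + 1
    if repetitions == 1:
        interval = 1
    elif repetitions == 2:
        interval = 6
    else:
        # I(n) = I(n-1) * EF, rounded up to whole days.
        interval = math.ceil(card.interval * card.easiness)

    # EF' = EF + (0.1 - (5 - q) * (0.08 + (5 - q) * 0.02)), floored at 1.3.
    penalty = 5 - quality
    easiness = max(1.3, card.easiness + 0.1 - penalty * (0.08 + penalty * 0.02))
    return Card(repetitions=repetitions, interval=interval, easiness=easiness)


card = Card()
for grade in (5, 4, 3, 2, 5):
    card = sm2_review(card, grade)
    print(f"grade {grade}: next review in {card.interval} day(s), EF={card.easiness:.2f}")
```

The intervals grow roughly geometrically while recall keeps succeeding, and a single failure drops the card back to a one-day interval, which is one concrete answer to the chapter’s question of when to revisit material.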

The Hidden Value of Ignorance

The chapter talks about the “fluency illusion”, the number one reason why many students flunk exams. You study formulae, concepts, theories etc. and are under the illusion that you know everything, until the day you see the examination paper. One way out of this illusion is to test oneself often. The word “test” connotes different things to different people. For some teachers, it is a way to measure a student’s learning. For some students, it is something they need to crack to get through a course. The literature on “testing” has a completely different perspective: “testing” is a way of learning. When you take a test, you retrieve concepts from your memory, and the very act of retrieving fundamentally alters the way you store those concepts. Testing oneself / taking a test IS learning. The chapter cites a study on students that showed the following results:


The above results show that testing does not = studying. In fact, testing > studying, and by a country mile on delayed tests. Researchers have come up with a new term to ward off some of the negative connotations of the word “test”; they call it “retrieval practice”. This is actually a more appropriate term, as testing oneself (answering a quiz / reciting from memory / writing from memory) is essentially a form of retrieval that shapes learning. When we successfully retrieve something from memory, we re-store it differently than we did before. Not only has the storage level spiked; the memory itself has new and different connections. It’s now linked to other related things that we also retrieved. Using our memory changes our memory in ways we don’t anticipate. One of the ideas the chapter delves into is administering a practice version of the final exam right at the beginning of the semester. The student will of course flunk it. But the very fact that he gets to see a set of questions and their pattern before anything is taught makes him a better learner by the end of the semester.

Quitting before you are ahead

The author talks about “percolation”, the process of quitting an activity after we have begun it and then revisiting it at frequent intervals. Many writers explicitly describe this process, and you can read their autobiographies for the details. Most of them say something to this effect: “I start on a novel, then take a break and wander around a familiar/unfamiliar environment, for when I do so, the characters tend to appear in the real/imaginary worlds and give clues to continue the story”. This seems too domain-specific. Maybe it applies only to writing, where, after all, writing about something is discovering what you think about it, and that takes consciously quitting and revisiting your work.

The author cites enough stories to show that this kind of “percolation” effect can benefit many other tasks. There are three elements of percolation. The first is interruption. Whenever you begin a project, there will be times when your mind says, “Moron, quit it now, I can’t take it anymore”. We have been told that plodding through that phase leads to success. However, this chapter suggests another strategy: “quit with the intention of coming back to it”. There is always a fear that we will never get back to it. But if it is something you truly care about, you will get back to it at some point. An interesting thing happens when you quit and want to get back to the activity after a break: the second element of percolation kicks in, i.e. your mind is tuned to see/observe things related to your work everywhere. Eventually the third element comes into play: listening to all the incoming bits and pieces of information from the environment and revisiting the unfinished project. In essence, having this mindset while working on a project means quitting frequently with the intention of returning, which tunes your mind to see things you have never paid attention to. I have seen this kind of “percolation” effect in my own learning process so many times that I don’t need to read a raft of research to believe that it works.

Being Mixed up

The author starts off by mentioning the famous beanbag-tossing experiment of 1978 that showed the benefits of interleaved practice. Academics buried this study, as it went against the conventional wisdom of “practice till you master it”. Most psychologists who study learning fall into two categories: the first focuses on motor/movement skills and the second on language/abstract skills. Studies have also shown that we have separate ways of memorizing motor skills and language skills: motor memories can be formed without the hippocampus, unlike declarative memories. Only in the 1990s did researchers start to conduct experiments that tested both motor and declarative memories. After several such studies, they found that interleaving has a great effect on any kind of learning. The most surprising thing about interleaving is that participants in the experiments felt that massed practice was somehow better, even though the test scores showed interleaving to be the better alternative. One can easily relate to this feeling. If you spent, let’s say, a day on something and were able to understand a chapter of a book, you might be tempted to read the next chapter and the next, until the difficulty reaches a point where you need far more effort to get through the concepts. Many of us might not be willing to take a break and revisit it, let’s say, a week or a month later. Why? Here are the reasons, based on my experience:

  • I have put so much effort into understanding the material (let’s say the first 100 pages of a book). This new principle/theorem on page 101 is tough. If I take a break and come back after a week or so, I might have to review all 100 pages again, which could be a waste of time. Why not keep going and put a lot of effort into understanding page 101 while all the previous 100 pages are in my working memory?
  • I might never get the time to revisit this paper/book again and my understanding will be shallow
  • Why give up when I seem to be cruising through the material in the book? This might be a temporary showstopper that I can slog through.
  • By taking a break from the book, am I giving in to my lazy brain which does not want to work through the difficult part?
  • What is the point in reading something for a couple of hours, then reading something else for a couple of hours? I don’t have a good feeling that I have learnt something
  • I have put in so many hours in reading this paper/book. Why not put in some extra hours and read through the entire book?

Research says that the above thoughts are precisely the ones that hamper effective learning. Interleaving is unsettling, but it is very effective.
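As a small illustration of the difference, the sketch below turns the same set of practice problems into a blocked (massed) schedule and an interleaved one. The topic and problem names are hypothetical, and simple round-robin rotation is just one way to interleave; the studies the author describes may have mixed items differently, for example by shuffling.

```python
from itertools import chain, zip_longest


def blocked(problems: dict[str, list[str]]) -> list[str]:
    """Massed practice: finish every problem of one topic before moving on."""
    return [p for topic in problems.values() for p in topic]


def interleaved(problems: dict[str, list[str]]) -> list[str]:
    """Interleaved practice: rotate through the topics one problem at a time."""
    rounds = zip_longest(*problems.values())  # pads shorter topics with None
    return [p for p in chain.from_iterable(rounds) if p is not None]


# Hypothetical example: three topics from a math course.
practice = {
    "fractions": ["frac-1", "frac-2", "frac-3"],
    "ratios": ["ratio-1", "ratio-2"],
    "areas": ["area-1", "area-2", "area-3"],
}
print(blocked(practice))
print(interleaved(practice))
```

Both schedules contain exactly the same problems; only the order changes.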

Importance of Sleep

We intuitively know that a good sleep or a quick nap restores our energy levels. But why do humans sleep? One might think that since we have been doing it for millennia, neuroscientists / psychologists / researchers would have figured out the answer by now. No. There is no single agreed-upon scientific explanation. Two main theories have been put forth. The first is that sleep is essentially a time-management adaptation. Humans could not hunt or track in the dark. There was not much to do, and the internal body clock evolved to sleep during those times. The brown bat sleeps 20 hours a day and is awake for 4 hours around dusk, when it can hunt mosquitoes and moths. Many such examples give credence to the theory that we are awake when there’s hay to be made and asleep when there is none. The other theory is that sleep’s primary purpose is memory consolidation. If we take for granted that, for whatever reason, evolution has made us crave sleep, what happens to the stuff we learn? Does it get consolidated during sleep? The author gives a crash course on the five stages of sleep.


The five stages of sleep are illustrated in the above figure. There are bursts of REM (rapid eye movement) sleep in a typical 8-hour sleep period: typically four to five REM bursts during the night, each lasting about 20 minutes on average. With its bursts of REM and intricate, alternating layers of wave patterns, the brain must be up to something during sleep. But what? For the last two decades there has been massive evidence that sleep improves retention and comprehension. Evidence has also linked Stage 2 sleep to motor-skill consolidation and other stages, such as deep sleep and REM, to other kinds of learning. Stage 2 sleep is concentrated in the later part of the night, so if you are a musician/artist preparing for tomorrow’s performance, it is better to practice late into the night and get up a little late so that you get your full share of it. If you are trying to learn something academic, it makes sense to sleep early, as deep sleep is concentrated in the early part of an 8-hour sleep period and helps you consolidate facts. Similar research on napping has found it useful for learning consolidation. The brain is basically separating signal from noise.

The Foraging brain

If effective learning is such a basic prerequisite for survival in today’s world, why haven’t people figured out a way to do it efficiently? There is no simple answer. The author’s response is that our ideas about learning are at odds with the way our brain has been shaped over the millennia. Humans were foragers; hunting and tracking dominated human life for over a million years. The brain adapted to absorb, at maximum efficiency, the most valuable cues and survival lessons. The human brain too became a forager: for information, for strategies, for clever ways to foil other species’ defenses. However, our language, customs and schedules have come to define how we think the brain should work: be organized, develop consistent routines, concentrate on work, focus on one skill. All this sounds fine until we start applying it in our daily lives. Do these strategies make us effective learners?

We know intuitively that concentrating on something beyond a certain time is counterproductive, that massed practice does not lead to longer retention and that it is difficult to be organized when there are so many distractions. Instead of adapting our learning schedules to the foraging brain, we have been trying to adapt our foraging brains (something that evolved over millennia) to our customs/schedules/notions about learning (something that took shape over the last few thousand years). The author says that this is the crux of the problem; it has kept us from becoming effective learners. The foraging brain that once brought us back to our campsite is the same one we use to make sense of the academic and motor domains. Most often, when we do not understand something, our first instinct is to give up. However, this feeling of being “lost” is essential for the foraging brain to look for patterns and create new pathways to make sense of the material. This reinforces many of the aspects touched upon in this book:

  • If you do not forget and you are not lost, you do not learn.
  • If you do not space out learning, you do not get lost from time to time and hence you do not learn.
  • If you do not use different contexts/physical environments to learn, your brain has fewer cues to help you make sense of learning.
  • If you do not repeatedly test yourself, the brain doesn’t get feedback and the internal GPS becomes rusty

It is high time to adapt our notions of learning to that of our foraging brain; else we will be forever trying to do something that our brains will resist.


This book mentions some counterintuitive learning strategies: changing the physical environment of your study, spaced repetition, testing as a learning strategy, interleaving, quitting and revisiting projects frequently, welcoming distractions in your study sessions, etc. Most of these differ from the standard suggestions on “how to learn”. However, the book collates the evidence from the research literature and argues that these strategies are far more effective for learning than what we knew before.


This book is mainly targeted at high school / college kids who feel their learning efforts are not paying off, teachers who are on the lookout for effective instruction techniques, and parents who are concerned about their child’s academic results and want to do something about it.

The author of the book, Dr. Barbara Oakley, has an interesting background. She served in the US Army as a language translator before transitioning to academia. She is now a professor of engineering at Oakland University in Rochester, Michigan. In her book, she admits that she had to completely retool her mind: a person who was basically into artsy work had to study hard sciences to get a PhD and do research. Needless to say, the transition was frustrating. One of her research areas is neuroscience, where she explores effective human learning techniques. The author says her book is essentially meant to demystify some of the common notions we all have about learning.

This book is written in a “personal journal” format, i.e. with images, anecdotes, stories etc. It is basically a collection of findings scattered across academic papers, blogs and pop science books. So the book does the job of an “aggregator”, much like a Google search, except that the results are supplemented with comments and visuals.

Some of the collated findings mentioned in the book are:

1) Focused vs. diffuse modes of thinking: Tons of books have already been written on this subject. The book provides a visual to remind the reader of the basic idea behind it.


In the game “pinball,” a ball, which represents a thought, shoots up from the spring-loaded plunger to bounce randomly against rows of rubber bumpers. These two pinball machines represent focused (left) and diffuse (right) ways of thinking. The focused approach relates to intense concentration on a specific problem or concept. But while in focused mode, you sometimes inadvertently find yourself focusing intently and trying to solve a problem using erroneous thoughts that are in a different place in the brain from the “solution” thoughts you actually need to solve the problem. As an example of this, note the upper “thought” that your pinball first bounces around in on the left-hand image. It is very far away and completely unconnected from the lower pattern of thought in the same brain. You can see how part of the upper thought seems to have an underlying broad path. This is because you’ve thought something similar to that thought before. The lower thought is a new thought: it doesn’t have that underlying broad pattern. The diffuse approach on the right often involves a big-picture perspective. This thinking mode is useful when you are learning something new. As you can see, the diffuse mode doesn’t allow you to focus tightly and intently to solve a specific problem, but it can allow you to get closer to where that solution lies because you’re able to travel much farther before running into another bumper.

2) Spaced repetition: This idea has led to a massive research area in cognitive psychology. The book nails it with the following visual:


Learning well means allowing time to pass between focused learning sessions, so the neural patterns have time to solidify properly. It’s like allowing time for the mortar to dry when you are building a brick wall, as shown on the left. Trying to learn everything in a few cram sessions doesn’t allow time for neural structures to become consolidated in your long-term memory; the result is a jumbled pile of bricks like those on the right.
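The book conveys spaced repetition with a picture rather than a procedure. One common way to put it into practice is a Leitner-box scheduler, sketched below in Python. This is my own illustration, not something from the book: the `Card` class, the `review` and `due_cards` functions, and the doubling intervals in `BOX_INTERVALS` are all assumptions chosen for simplicity.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical review intervals in days for each Leitner box (an assumption, not from the book).
BOX_INTERVALS = [1, 2, 4, 8, 16]


@dataclass
class Card:
    prompt: str
    box: int = 0      # index into BOX_INTERVALS
    due_day: int = 0  # day number on which the card should next be reviewed


def review(card: Card, today: int, recalled: bool) -> None:
    """Move the card up one box after a successful recall, back to box 0 after a miss,
    then schedule its next review using the interval of its new box."""
    if recalled:
        card.box = min(card.box + 1, len(BOX_INTERVALS) - 1)
    else:
        card.box = 0
    card.due_day = today + BOX_INTERVALS[card.box]


def due_cards(cards: List[Card], today: int) -> List[Card]:
    """Return the cards whose scheduled review day has arrived."""
    return [c for c in cards if c.due_day <= today]


if __name__ == "__main__":
    card = Card("Why space out study sessions?")
    review(card, today=0, recalled=True)   # box 1, next review on day 2
    review(card, today=2, recalled=True)   # box 2, next review on day 6
    review(card, today=6, recalled=False)  # missed: back to box 0, next review on day 7
    print(card.box, card.due_day)          # prints: 0 7
```

Each successful recall pushes the next review further out, which is the "let the mortar dry" idea in scheduling form; a miss sends the card back to daily review.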

3) Limited short-term memory:
Experiments have shown that you can hold at most about four items in your working memory. This means the key to making sense of things lies in storing and retrieving concepts and ideas effectively from your long-term memory, rather than trying to cram everything into working memory (which will vanish quickly anyway).

4) Chunking: From K. A. Ericsson (the academic behind the notion of “deliberate practice”) to Daniel Coyle (a pop-science author), all have emphasized this aspect. Again, a visual summarizes the key idea:


When you are first chunking a concept, its pre-chunked parts take up all your working memory, as shown on the left. As you begin to chunk the concept, you will feel it connecting more easily and smoothly in your mind, as shown in the center. Once the concept is chunked, as shown at the right, it takes up only one slot in working memory. It simultaneously becomes one smooth strand that is easy to follow and use to make new connections. The rest of your working memory is left clear. That dangling strand of chunked material has, in some sense, increased the amount of information available to your working memory, as if the slot in working memory is a hyperlink that has been connected to a big webpage.

5) Pomodoro to prevent procrastination: Knowledge scattered around various blogs and talks is put in one place. The idea is that you work in slots of 25 minutes of work followed by a 5-minute break.
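Since a pomodoro session is just a fixed rhythm of work and break slots, it is easy to lay one out ahead of time. The sketch below is a hypothetical helper of my own (`pomodoro_schedule` is not from the book or any library), using the 25/5 split described above and ignoring the longer breaks some people take after several rounds.

```python
from datetime import datetime, timedelta
from typing import List, Tuple


def pomodoro_schedule(start: datetime, rounds: int, work_min: int = 25,
                      break_min: int = 5) -> List[Tuple[str, datetime, datetime]]:
    """Return (label, start, end) slots that alternate work and break periods."""
    slots = []
    t = start
    for i in range(1, rounds + 1):
        work_end = t + timedelta(minutes=work_min)
        slots.append((f"work {i}", t, work_end))
        break_end = work_end + timedelta(minutes=break_min)
        slots.append((f"break {i}", work_end, break_end))
        t = break_end  # the next round starts when the break ends
    return slots


if __name__ == "__main__":
    for label, s, e in pomodoro_schedule(datetime(2024, 1, 1, 9, 0), rounds=3):
        print(f"{label:<8} {s:%H:%M}-{e:%H:%M}")
```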

6) (Recall + Test > Reread), (Interleaving + Spaced repetition > Massed practice)
– These ideas run throughout the book “Make It Stick”. This book, though, summarizes them and supplements them with visuals such as:


Solving problems in math and science is like playing a piece on the piano. The more you practice, the firmer, darker, and stronger your mental patterns become.


If you don’t make a point of repeating what you want to remember, your “metabolic vampires” can suck away the neural pattern related to that memory before it can strengthen and solidify.

7) Memory enhancement hacks :
Most of the ideas from “Moonwalking with Einstein” and other such memory-hack books are summarized for easy reading.

8) Reading / engaging with diverse material pays off: This has been a common trait among many people who do brilliant work. Pick any person who has accomplished something significant and you will often find they have varied interests.


Here you can see that the chunk (the rippling neural ribbon) on the left is very similar to the chunk on the right. This symbolizes the idea that once you grasp a chunk in one subject, it is much easier for you to grasp or create a similar chunk in another subject. The same underlying mathematics, for example, echoes throughout physics, chemistry, and engineering, and can sometimes also be seen in economics, business, and models of human behavior. This is why it can be easier for a physics or engineering major to earn a master’s in business administration than someone with a background in English or history. Metaphors and physical analogies also form chunks that can allow ideas even from very different areas to influence one another. This is why people who love math, science, and technology often also find surprising help from their activities or knowledge of sports, music, language, art, or literature.

9) Adequate sleep is essential for better learning: This is like turning off the lights on the theatre stage so that the artists can take a break, relax and come back for their next act. Not switching off the mind and overworking can only lead to an illusion of learning, when in fact all we are doing is showcasing listless actors on the stage (working memory).


Toxins in your brain get washed away by having an adequate amount of sleep everyday.

The book can easily be read in an hour or two, as it is filled with images, metaphors, anecdotes and recurring themes. The content of this book is also offered as a four-week course on Coursera.

Lady Luck favors the one who tries

– Barbara Oakley


I had been intending to read this book for many months but somehow never had a chance to go through it. Unfortunately, I fell sick this week and lacked the strength to do my regular work. Fortunately, I stumbled onto this book again, so I picked it up and read it cover to cover while still getting over my illness.

A one-phrase summary of the book is “Develop Bayesian thinking”. The book is a call to arms for acknowledging our failures in prediction and doing something about them. To paraphrase the author,

We have a prediction problem. We love to predict things and we aren’t good at it

This is the age of “Big Data”, and there seems to be a line of thought that you don’t need models anymore since you have the entire population with you: data will tell you everything. Well, if one looks at the classical theory of statistics, where the only form of error one deals with is “sampling error”, the argument might make sense. But the author warns against this kind of thinking, saying that “the more the data, the more the false positives”. Indeed, most of the statistical procedures that one comes across at the undergrad level are heavily frequentist in nature. They were relevant to an era when sparse data needed heavily assumption-laden models. But with huge data sets, who needs models or estimates? The flip side is that many models fit the data you have, so the noise level explodes and it becomes difficult to cull the signal from the noise. The evolutionary software installed in the human brain is such that we all love prediction, yet there are a ton of fields where prediction has failed completely. The author analyzes some domains where predictions have failed and some where they have worked, and thus gives a nice compare-and-contrast view of the reasons behind predictive success. If you are a reader who has never been exposed to Bayesian thinking, my guess is that by the end of the book you will walk away convinced that Bayes is the way to go, or at least that Bayesian thinking is a valuable addition to your thinking toolkit.
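One way to see why “the more the data, the more the false positives” is to test many candidate variables that, by construction, have nothing to do with the target. The Python sketch below is my own illustration, not from the book: it correlates a pure-noise target with 1,000 pure-noise predictors and counts how many clear a conventional 5% significance bar. The threshold uses the large-sample approximation |r| > 1.96/√n rather than an exact t-test, and the function names and parameters are hypothetical.

```python
import math
import random


def pearson_r(x, y):
    """Sample Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def count_false_positives(n_obs=100, n_predictors=1000, seed=42):
    """Correlate a pure-noise target with many pure-noise predictors and count how many
    look 'significant'. Every hit is a false positive by construction."""
    rng = random.Random(seed)
    target = [rng.gauss(0, 1) for _ in range(n_obs)]
    threshold = 1.96 / math.sqrt(n_obs)  # large-sample approximation of the 5% critical |r|
    hits = 0
    for _ in range(n_predictors):
        predictor = [rng.gauss(0, 1) for _ in range(n_obs)]
        if abs(pearson_r(predictor, target)) > threshold:
            hits += 1
    return hits


if __name__ == "__main__":
    print(count_false_positives(), "spurious 'discoveries' out of 1000 candidates")
```

Expect roughly 5% of the candidates, around 50, to pass. None of them carries any signal, and screening more candidate variables only produces more of them.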

The book is organized into 13 chapters. The first seven chapters diagnose the prediction problem and the last six chapters explore and apply Bayes’s solution. The author urges the reader to think about the following issues while reading through the various chapters:

  • How can we apply our judgment to the data without succumbing to our biases?
  • When does market competition make forecasts better, and how can it make them worse?
  • How do we reconcile the need to use the past as a guide with our recognition that the future may be different?

A Catastrophic Failure of Prediction (Recession Prediction)

The financial crisis has led to a boom in one field: “books on the financial crisis”. Since the magnitude of the impact was so large, everybody had something to say. In fact, during the first few months after 2008, I read at least half a dozen books and then gave up when every author came up with almost the same reasons why it happened. There was nothing to read but books on the crisis; some authors even started writing them as if they were crime thrillers. In this chapter, the author comes up with almost the same reasons for the crisis that one had already been bombarded with:

  • Homeowners thought their house prices would go up year after year.
  • Rating agencies had faulty models with faulty risk assumptions.
  • Wall Street took massively leveraged bets on the housing sector, and the housing crisis turned into a financial crisis.
  • Post crisis, there was a failure to predict the nature and extent of various economic problems.

However, the author makes a crucial point: in all of these cases, the predictions were made “out of sample”, i.e., about conditions that the forecasters’ past data did not cover. This is where he starts making sense.

  • IF the homeowners had a prior that house prices might fall, they would have behaved differently.
  • IF the models had some prior on correlated default behavior, they would have brought some sanity into valuations.
  • IF Wall Street had used Bayesian risk pricing, the crisis would have been less harsh.
  • IF the post-crisis scenarios had sensible priors for forecasting employment rates etc., policy makers would have been more prudent.

As you can see, there is a big “IF”, and it is usually a casualty when emotions run wild, when personal and professional incentives are misaligned, and when there is a gap between what we know and what we think we know. All these conditions can be moderated by an attitudinal shift towards Bayesian thinking. The author probably starts the book with this recent episode to show that our prediction problems can have disastrous consequences.

Are You Smarter Than a Television Pundit? (Election Result Prediction)

How does Nate Silver crack the forecasting problem? This chapter gives a brief introduction to Philip Tetlock’s study, which found that hedgehogs fared worse than foxes. Future Babble, a book that takes a detailed look at Tetlock’s study, makes for quite an interesting read. Nate Silver gives three reasons why he has succeeded with his predictions:

  • Think Probabilistically
  • Update your Probabilities
  • Look for Consensus

If you read it from a stats perspective, the above three reasons are nothing but: form a prior, update the prior as new data comes in, and create a forecast based on the updated beliefs and other qualitative factors. The author makes a very important distinction between “objective” and “quantitative”. Often one wants to be the former but ends up being the latter. Quantitative analysis gives us many options depending on how the numbers are made to look: a statement on one time scale can be completely different on another time scale. “Objective” means seeing beyond our personal biases and prejudices and seeing the truth, or at least attempting to. Hedgehogs by their very nature stick to one grand theory of the universe and selectively pick things that conform to their theory. In the long run they lose out to foxes, who are adaptive, update their probabilities, and are not afraid to say that they don’t know something or that they can only make a statement with wide variability.
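The “form a prior, update it” loop has a standard textbook form when the unknown is a proportion, such as a candidate’s share of the vote: a Beta prior updated with binomial poll data (the conjugate Beta-binomial update). The Python sketch below is my own illustration rather than Silver’s model. The class name, the Beta(50, 50) prior and the poll numbers are hypothetical, and treating each poll as a clean random sample ignores house effects and other real-world complications.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BetaBelief:
    """Belief about an unknown proportion, represented as a Beta(alpha, beta) distribution."""
    alpha: float
    beta: float

    def update(self, successes: int, trials: int) -> "BetaBelief":
        # Conjugate update: observed successes add to alpha, observed failures add to beta.
        return BetaBelief(self.alpha + successes, self.beta + (trials - successes))

    @property
    def mean(self) -> float:
        return self.alpha / (self.alpha + self.beta)


if __name__ == "__main__":
    belief = BetaBelief(50, 50)  # hypothetical prior centred on 50% support
    for supporters, sample_size in [(540, 1000), (520, 1000)]:  # hypothetical polls
        belief = belief.update(supporters, sample_size)
        print(f"posterior mean support: {belief.mean:.3f}")
```

With these made-up numbers the estimate moves from 0.500 to 0.536 after the first poll and then to 0.529 after the second: each new poll nudges the belief rather than replacing it, which is the fox-like habit the chapter describes.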

I have seen this hedgehog vs. fox analogy in many contexts. Riccardo Rebonato has written an entire book about it, arguing that volatility forecasting should be done like a fox rather than a hedgehog. In fact, one of the professors at NYU said the same thing to me years ago: “You don’t need a PhD to do well in quant finance. You need to be like a fox and comfortable with alternative hypotheses for a problem. Nobody cares whether you have a grand theory for success in trading or not. The only thing that matters is whether you are able to adapt quickly.”

One thing this chapter made me think about was the horde of equity research analysts on Wall Street, Dalal Street and everywhere else. How many of them have a Bayesian model of the securities they cover? How many of them truly update their probabilities based on the new information that flows into the market? Do they simulate various scenarios? Do they actively discuss priors and the probabilities they assign? I don’t know. However, my guess is that only a few do, as most research reports contain stories spun around various news items: terrific after-the-fact analysis but terrible before-the-fact statements.

All I Care About Is W’s and L’s (Baseball Player Performance Prediction)

If you are not a baseball fan but have read “Moneyball” or watched the movie of the same name starring Brad Pitt, you know that baseball as a sport has been revolutionized by stat geeks. In the Moneyball era, insiders might have hypothesized that stats would completely displace scouts. But that never happened; in fact, Billy Beane expanded the Oakland A’s scouting team. It is easy to get sucked into a tool that promises to be the perfect oracle. The author narrates his experience of building one such tool, PECOTA. PECOTA computed similarity scores between baseball players using a nearest-neighbor algorithm, one of the first algorithms you learn in any machine learning course. Despite its success, he is quick to caution that it is not prudent to limit oneself to quantitative information. It is always better to figure out processes for weighing new information. In a way, this chapter says that one must not be blinded by a tool or a statistical technique. One must always weigh every incoming piece of information in context and update the relevant probabilities.
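PECOTA’s actual similarity scores are more elaborate than anything I can reconstruct here, but the generic nearest-neighbor idea is simple: put each player’s numbers on a common scale and find the players closest to the one you care about. In the hypothetical Python sketch below, the players, their stats and the choice of plain Euclidean distance on standardized columns are all invented for illustration.

```python
import math

# Hypothetical season stats: (batting average, on-base %, slugging %, age).
# These players and numbers are invented for illustration.
PLAYERS = {
    "Player A": (0.280, 0.350, 0.450, 27),
    "Player B": (0.250, 0.320, 0.400, 31),
    "Player C": (0.300, 0.380, 0.520, 26),
    "Player D": (0.265, 0.330, 0.420, 29),
}


def standardize(rows):
    """Rescale each column to mean 0 and standard deviation 1 so no single stat dominates."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    stds = [math.sqrt(sum((v - m) ** 2 for v in c) / len(c)) or 1.0
            for c, m in zip(cols, means)]
    return [tuple((v - m) / s for v, m, s in zip(r, means, stds)) for r in rows]


def nearest_neighbors(target, k=2):
    """Return the k players closest to `target` in standardized stat space."""
    names = list(PLAYERS)
    z = dict(zip(names, standardize([PLAYERS[n] for n in names])))
    dists = [(math.dist(z[target], z[n]), n) for n in names if n != target]
    return [n for _, n in sorted(dists)[:k]]


if __name__ == "__main__":
    print(nearest_neighbors("Player A", k=2))
```

Here Player D comes out as Player A’s closest comparable, followed by Player C. A comparable-based forecast would then look at how such players went on to perform, which is where the judgment the chapter stresses comes back in.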

The key is to develop tools and habits so that you are more often looking for ideas and information in the right places – and in honing the skills required to harness them into wins and losses once you have found them. It’s hard work. (Who said forecasting isn’t?)

For Years You’ve Been Telling Us That Rain Is Green (Weather Prediction)

This chapter talks about one of the success stories of the prediction business: weather forecasting. The National Hurricane Center predicted Katrina five days before the levees were breached, a kind of prediction that was unthinkable 20-30 years earlier. The chapter says that weather predictions have become 350% more accurate in the past 25 years alone.

The first attempt at numerical weather forecasting was made by Lewis Fry Richardson in 1916. He divided the land into a grid of squares and then used the local temperature, pressure and wind speeds to forecast the weather across this 2D grid. Note that this method was not probabilistic in nature; it was based on first principles and took advantage of a theoretical understanding of how the system works. Despite its seemingly commonsensical approach, Richardson’s method failed, for a couple of reasons. First, it required an awful lot of work. Second, even once computers took over the arithmetic (by 1950, John von Neumann had made the first computer forecast using the grid approach), the forecasts were not good, because weather is multidimensional in nature and an analysis in a 2D world was bound to fail. Once you increase the dimensions of the analysis, the calculations explode.

So one might think that, with the exponential rise in computing power, weather forecasting would be a solved problem in the current era. However, there is one thorn in the flesh: the initial conditions. Courtesy of chaos theory, a mild change in the initial conditions gives rise to a completely different forecast for a given region. This is where probability comes in. Meteorologists run many simulations and report the findings probabilistically. When someone says there is a 30% chance of rain, it basically means that 30% of their simulations showed rain. Despite this problem of initial conditions, weather and hurricane forecasting have vastly improved in the last two decades or so. Why? The author gives a tour of the World Weather office in Maryland and explains the role of human eyes in detecting patterns in weather.
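The ensemble idea can be illustrated with a toy chaotic system. The Python sketch below uses the logistic map, a textbook example of chaos standing in for a real weather model (which it is not), starts many runs from slightly perturbed initial conditions, and reports the fraction of runs that end above an arbitrary “rain” threshold. The function names, the 0.7 threshold and the perturbation size are assumptions for illustration only.

```python
import random


def toy_model(x0: float, steps: int, r: float = 3.9) -> float:
    """Iterate the logistic map x -> r*x*(1 - x), a textbook chaotic system.
    It is only a stand-in for a weather model, not a real one."""
    x = x0
    for _ in range(steps):
        x = r * x * (1 - x)
    return x


def ensemble_probability(x0: float, n_members: int = 500, noise: float = 1e-4,
                         steps: int = 50, threshold: float = 0.7, seed: int = 0) -> float:
    """Run the toy model from many slightly perturbed starting points and return the
    fraction of ensemble members whose final state exceeds the 'rain' threshold."""
    rng = random.Random(seed)
    rainy = 0
    for _ in range(n_members):
        start = min(max(x0 + rng.uniform(-noise, noise), 0.0), 1.0)  # keep x in [0, 1]
        if toy_model(start, steps) > threshold:
            rainy += 1
    return rainy / n_members


if __name__ == "__main__":
    for horizon in (1, 5, 50):
        print(horizon, ensemble_probability(0.4, steps=horizon))
```

Over a one-step horizon every member agrees, so the answer is a flat 0 or 1. Over longer horizons, tiny differences in the starting point get amplified and the members scatter, which is why the forecast comes out as a probability.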

In any basic course on stats, a healthy sense of skepticism towards human eyes is drilled into students. Typically one comes across the statement that human eyes are not all that good at figuring out statistically important patterns, i.e., at picking signal from noise. However, in the case of weather forecasting, there seems to be tremendous value in human eyes. The best forecasters need to think visually and abstractly while at the same time being able to sort through the abundance of information that the computer provides.

Desperately Seeking Signal (Earthquake Prediction)

The author takes the reader into the world of earthquake prediction. An earthquake occurs when stress that has built up along one of the multitude of fault lines is suddenly released. The only well-recognized relationship is the Gutenberg-Richter law, under which the frequency of earthquakes and their intensity form an inverse linear relationship on a log-log scale. Although this empirical relationship holds well across various datasets, the problem lies in timing. It is one thing to say that an earthquake is likely sometime in the coming 100 years and a completely different thing to say that it will hit between year X and year Y. Many scientists have tried working on this timing problem, but a lot of them have called it quits. Why? Earthquakes appear to be governed by the same chaos-theory-type dependence on initial conditions. However, unlike weather prediction, where the science is well developed, the science of earthquakes is surprisingly lacking. In the absence of science, one turns to probability and statistics for some indication of what to forecast. The author takes the reader through a series of earthquake predictions that went wrong. Given the paucity of data and the problem of overfitting, many predictions have failed. Scientists who predicted that gigantic earthquakes would occur at a place were wrong. Similarly, predictions that everything would be normal fell flat on their face when earthquakes wreaked massive destruction. Basically, there has been a long history of false alarms.
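For reference, the Gutenberg-Richter law is usually written as below, where N is the number of earthquakes of magnitude at least M in a given region and period, and a and b are constants fitted to that region’s data (b is typically close to 1). Because magnitude is itself a logarithmic measure of a quake’s size, a straight line in log N versus M is what the review describes as a log-log relationship.

```latex
\log_{10} N(\geq M) = a - b\,M
\qquad\Longrightarrow\qquad
\frac{N(\geq M+1)}{N(\geq M)} = 10^{-b} \approx \frac{1}{10} \quad (b \approx 1)
```

So, roughly, each one-unit step up in magnitude means about ten times fewer earthquakes. The law says how often quakes of a given size occur over the long run, not when the next one will strike, which is exactly the timing problem described above.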

How to Drown in Three Feet of Water (Economic Variable Prediction)

The chapter gives a brief history of US GDP prediction and makes abundantly clear that it has been a big failure. Why do forecasts of economic variables go bad?

  1. Hard to determine cause and effect
  2. Economy is forever changing
  3. Data is noisy

Besides the above reasons, policy decisions affect economic variables at any point in time, so an economist has the twin job of forecasting the economic variable as well as the policy. Also, the sheer number of economic indicators that come out every year is huge, so some of them are bound to be correlated with the variable being predicted purely by chance. It might also turn out that an indicator is a lagging indicator in one period and a leading indicator in another. All this makes it difficult to cull out the signal, and more often than not the economist picks up some noise and reports it.

In one way, an economist is dealing with a system whose characteristics are similar to those of the system a meteorologist deals with. Both weather and the economy are highly dynamic systems, and both are extremely sensitive to initial conditions. However, meteorologists have had some success, mainly because there is rock-solid theory that helps in making predictions. Economics, on the other hand, is a soft science. Given this situation, it seems that predictions for economic variables are not going to improve at all. The author suggests two alternatives:

  1. Create a market for accurate forecasts – Prediction Markets
  2. Reduce demand for inaccurate and overconfident forecasts – Make margin-of-error reporting compulsory for any forecast and make sure there is a system that records forecast performance. To date, I have never seen a headline saying, “This year’s GDP growth will be between X% and Y%”. Most headlines are point estimates, and they all have an aura of absolutism. Maybe there is tremendous demand for experts but not that much demand for accurate forecasts.

Role Models (Epidemiological Predictions)

This chapter gives a list of examples where flu predictions turned out to be false alarms. Complicated models are usually the target of people criticizing a forecasting failure; in the case of flu prediction, though, it is the simple models that take a beating. The author explains that most of the models used in flu prediction are very simple and that they fail miserably. Some examples of scientists trying to get a grip on flu prediction are given; their models are basically agent-based simulations. However, by the end of the chapter the reader gets the feeling that flu prediction is not going to be easy at all. I had read about Google using search terms to predict flu trends, around 2008, I think. Lately I came across an article saying that Google’s flu trend prediction was not doing that well! Of all the areas mentioned in the book, I guess flu prediction is the toughest, as it involves a multitude of factors, extremely sparse data and no clear understanding of how the disease spreads.

Less and Less and Less Wrong

The main character of this chapter is Bob Voulgaris, a basketball bettor. His story is a case in point of a Bayesian who makes money by placing bets in a calculated manner. There is no one BIG secret behind his success; instead, Bob has a thousand little secrets, and this repertoire keeps growing day after day, year after year. There are tons of patterns everywhere in this information-rich world, but whether a pattern is signal or noise is becoming increasingly difficult to say. In the era of Big Data, we are deluged with false positives. There is a nice visual I came across that excellently summarizes the false positives of a statistical test; in one glance, it cautions us to be wary of them.


The chapter gives a basic introduction to Bayesian thinking using some extreme examples: What’s the probability that your partner is cheating on you? If a mammogram gives a positive result, what’s the probability that one has cancer? What’s the probability of a terrorist attack on the twin towers after the first attack? These examples merely reflect the wide range of areas where Bayes can be used. Even though Bayes’s theorem was brought to attention in 1763, major developments in the field did not take place for a very long time. One of the reasons was Fisher, who developed the frequentist approach to statistics, which caught on. Fisher’s focus was on sampling error. In his framework, there can be no error other than sampling error, and that shrinks as the sample size approaches the population size. I have read in some book that the main reason for the popularity of Fisher’s framework was that it contained the exact steps a scientist needs to follow to get a statistically valid result. In one sense, he democratized statistical testing: Fisher created various hypothesis testing frameworks that many scientists could use directly. In the realm of limited samples and limited computing power, these methods thrived and probably did their job. But soon the frequentist framework started becoming a substitute for solid thinking about the context in which a hypothesis ought to be framed. That’s when people noticed that frequentist stats was becoming irrelevant. In fact, in the last decade or so, with massive computing power, everyone seems to be advocating Bayesian stats for analysis. There is also a strong opinion in favor of completely replacing the frequentist methodology with the Bayesian paradigm in the school curriculum.
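The mammogram question is the classic worked example of Bayes’s theorem. The Python sketch below computes it; the three inputs (a 1.4% base rate, a 75% chance of a positive result if cancer is present, and a 10% false-positive rate) are illustrative assumptions on my part, not figures quoted from the book, and the `posterior` helper is my own.

```python
def posterior(prior: float, p_pos_given_true: float, p_pos_given_false: float) -> float:
    """Bayes's theorem for a yes/no hypothesis after observing a positive test result."""
    true_pos = prior * p_pos_given_true            # P(condition and positive test)
    false_pos = (1 - prior) * p_pos_given_false    # P(no condition and positive test)
    return true_pos / (true_pos + false_pos)


if __name__ == "__main__":
    # Illustrative inputs (assumed): 1.4% base rate, 75% sensitivity, 10% false-positive rate.
    print(round(posterior(0.014, 0.75, 0.10), 3))  # prints: 0.096
```

Even with a positive test, the posterior comes out at roughly 10% with these inputs, because false positives from the large healthy population outnumber true positives from the small affected one.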

Rage against the Machines

This chapter deals with chess, a game where the initial conditions are known, the rules are known and the pieces move according to deterministic constraints. Why does such a deterministic game appear in a book about forecasting? The reason is that, despite chess being deterministic, a game can proceed in something like 10^(10^50) ways, i.e., the number of possible branches to analyze is far greater than the number of atoms in the universe. Chess comprises three phases: the opening, the middle game and the endgame. Computers are extremely good in the endgame, as there are few pieces on the board and all the search paths can be analyzed quickly; in fact, all endgames with six or fewer pieces have been solved. Computers also have an advantage in the middle game, where the complexity increases and the computer can search an enormously long sequence of possible moves. It is in the opening that computers are considered relatively weak. The opening is a little abstract: there might be multiple motives behind a move, such as a sacrifice to capture the center or a weak move to make an attack stronger. Can a computer beat a human? This chapter gives a brief account of the way Deep Blue was programmed to beat Kasparov. It is fascinating to learn that Deep Blue was developed much the way a human gets better at a game: through the banal process of trial and error. The thought process behind coding Deep Blue was based on questions like:

  • Does allotting the program more time in the endgame and less in the midgame improve performance on balance?
  • Is there a better way to evaluate the value of a knight vis-à-vis a bishop in the early going?
  • How quickly should the program prune dead-looking branches on its search tree even if it knows there is some residual chance that a checkmate or a trap might be lurking there?

By tweaking these parameters and seeing how the program played with the changes, the team behind Deep Blue slowly improved it and eventually beat Kasparov. I guess the author is basically trying to say that even in such deterministic settings, trial-and-error, fox-like thinking is what made the machine powerful.
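The pruning question above can be made concrete with alpha-beta search, the standard textbook refinement of minimax. The Python sketch below is only a generic illustration on a tiny hand-made tree, not Deep Blue’s search, which ran on custom hardware with a hand-tuned evaluation. Note one difference: alpha-beta skips only branches that provably cannot change the result, whereas the question in the list is about riskier heuristic pruning of branches that merely look dead.

```python
import math
from typing import List, Optional, Union

# A leaf is a static evaluation score; an inner node is a list of child positions.
Tree = Union[float, List["Tree"]]


def alphabeta(node: Tree, maximizing: bool, alpha: float = -math.inf,
              beta: float = math.inf, stats: Optional[dict] = None) -> float:
    """Minimax value of `node` with alpha-beta pruning.
    If `stats` is given, stats["leaves"] counts the leaves actually evaluated."""
    if not isinstance(node, list):
        if stats is not None:
            stats["leaves"] += 1
        return node
    if maximizing:
        value = -math.inf
        for child in node:
            value = max(value, alphabeta(child, False, alpha, beta, stats))
            alpha = max(alpha, value)
            if alpha >= beta:
                break  # the minimizing player already has a better option elsewhere
        return value
    value = math.inf
    for child in node:
        value = min(value, alphabeta(child, True, alpha, beta, stats))
        beta = min(beta, value)
        if alpha >= beta:
            break  # the maximizing player already has a better option elsewhere
    return value


if __name__ == "__main__":
    tree = [[3, 5], [2, 9], [1, 7]]  # the root is the maximizing player's move
    stats = {"leaves": 0}
    print(alphabeta(tree, True, stats=stats), stats["leaves"])  # prints: 3 4
```

On this tree the search evaluates 4 of the 6 leaves and still returns the same value plain minimax would (3); on real game trees the savings grow enormously.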

The Poker Bubble

This is an interesting chapter in which the author recounts his experiences playing poker, not as a small-time bystander but as someone making serious, six-figure money in 2004 and 2005. So here is a person who is not giving a journalistic account of the game: he actually played it, made money, and talks about why he succeeded. The author introduces what he calls the prediction learning curve: if you get 20% of things right, you get your forecasts right 80% of the time. Doing this and making money in a game means there must be people who don’t get those 20% of things right. In a game like poker, you can make money if there are enough suckers. Once the game becomes competitive and the suckers are out, the difference in winnings between an average player and an above-average player is not much. In the initial years of the poker bubble, everyone wanted to play poker and get rich quickly, which meant there were plenty of suckers in the market. The author says he was able to make money precisely because of the bubble. Once the fish were out of the game, it became difficult for him to make money, and ultimately he had to give up and move on. The author’s message is:

It is much harder to be very good in fields where everyone else is getting the basics right—and you may be fooling yourself if you think you have much of an edge.

Think about the stock market. As the market matures, the same set of mutual fund managers try to win the long-only game and the same options traders try to make money off the market. Will they succeed? Yes, if there are enough fish in the market; no, if the game is played among near-equals. With equally qualified grads on the trading desks and the same co-located server infrastructure, can HFTs thrive? Maybe for a few years but not beyond that, is the message from this chapter.

The author credits his success to picking his battles well. He went into creating software for measuring and forecasting baseball players’ performance in the pre-Moneyball era. He played poker during a boom, when getting 20% of things right could earn him good money. He went into election outcome forecasting when most election experts were not doing any quantitative analysis. In a way, this chapter is very instructive for people trying to decide where to put their prediction skills to use: having skills alone is not enough; it is important to pick the right fields in which to apply them.

If You Can’t Beat ’Em (Stock Market Forecasting)

The author discusses prediction markets such as Intrade, including the views of Wharton professor Justin Wolfers. These markets are the closest thing to Bayes land: if you believe in certain odds and see that someone else assigns different odds to the same event, you enter into a bet and resolve the discrepancy. One might think that stock markets do something similar, with investors holding different odds for the same event settling their scores through financial transactions. However, the price in the market is not always right. The chapter gives a whirlwind tour of Fama’s efficient market hypothesis, Robert Shiller’s work, Henry Blodget’s fraud case, etc., to suggest that the market might be efficient in the long run but that the short run is characterized by noise. Only a few players benefit in the short run, and the composition of that pool changes from year to year. Can we apply Bayesian thinking to markets? Prediction markets come close to Bayes land, but stock markets are very different: they have capital constraints, horizon constraints, etc. So even if your view is correct, the market can stay irrational for a long time, and applying Bayesian thinking to markets is a little tricky. The author argues that the market is a two-track system: one track is driven by fundamentals and pans out correctly in the long run; the other is a fast lane populated by HFT traders, algo traders, noise traders, bluffers, etc. According to the author, life in the fast lane is a high-risk game that not many can play and sustain over a period of time.
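The Bayes-land logic of a prediction market fits in a few lines. Assume a hypothetical binary contract that pays 1 if an event happens and 0 otherwise: its price can be read as the market’s probability, and if your own probability differs, the difference is your expected profit per contract. The Python sketch below is my own illustration; the function names and the `min_edge` buffer (a rough stand-in for fees and the capital and horizon constraints the chapter mentions) are assumptions.

```python
def expected_profit(my_probability: float, price: float, side: str) -> float:
    """Expected profit per contract that pays 1 if the event happens and 0 otherwise,
    evaluated under your own probability. Buying costs `price`; selling receives it."""
    if side == "buy":
        return my_probability - price
    if side == "sell":
        return price - my_probability
    raise ValueError("side must be 'buy' or 'sell'")


def decide(my_probability: float, price: float, min_edge: float = 0.02) -> str:
    """Trade only when your belief differs from the market price by more than `min_edge`."""
    if expected_profit(my_probability, price, "buy") > min_edge:
        return "buy"
    if expected_profit(my_probability, price, "sell") > min_edge:
        return "sell"
    return "pass"


if __name__ == "__main__":
    print(decide(0.60, 0.50))  # buy: you think the event is likelier than the market does
    print(decide(0.40, 0.50))  # sell
    print(decide(0.51, 0.50))  # pass: the edge is too small to cover costs
```

The catch the chapter points to is that being right in expectation does not help if you run out of capital, or time, before the price converges.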

A Climate of Healthy Skepticism (Climate Prediction)

This chapter talks about the climate models and the various uncertainties/issues pertinent to building such long range forecasting models.

What You Don’t Know Can Hurt You (Terrorism Forecasting)

This chapter talks about terrorist attacks, military attacks, etc. and the value of a Bayesian approach. After September 11, the commission report identified a “failure of imagination” as one of the biggest failures: the national security establishment simply did not imagine that such a thing could happen and was completely blind to devastation on that scale. Yes, there were a lot of signals, but they all seemed to make sense only after the fact. The chapter mentions Aaron Clauset, a professor at the University of Colorado, who compares terrorist attack prediction to earthquake prediction. One known tool in earthquake prediction is the log-log plot of frequency against intensity. In the case of terrorist attacks, one can draw such a plot to at least acknowledge that an attack killing a million Americans is a possibility. Once that is acknowledged, such an attack falls into the “known unknown” category, and at least a few steps can be taken by national security and other agencies to ward off the threat. There is also a mention of the Israeli approach to terrorism, where the government makes sure that people get back to their normal lives soon after a bomb attack, thus reducing the “fear” element that is one of the motives behind a terrorist attack.


The book is awesome in terms of its sheer breadth of coverage. It gives more than a bird’s eye view of forecast / prediction performance in the following areas:

  • Weather forecasts
  • Earthquake forecasts
  • Chess strategy forecasts
  • Baseball player performance forecasts
  • Stock market forecasts
  • Economic variable forecasts
  • Political outcome forecasts
  • Financial crisis forecasts
  • Epidemiological predictions
  • Baseball outcome predictions
  • Poker strategy prediction
  • Climate prediction
  • Terrorist attack prediction

The message from the author is abundantly clear to any reader at the end of the 500 pages. There is a difference between what we know and what we think we know, and the way to close that gap is Bayesian thinking. We live in an incomprehensibly large universe. The virtue in thinking probabilistically is that you will force yourself to stop and smell the data: slow down, and consider the imperfections in your thinking. Over time, you should find that this makes your decision making better. "Have a prior, collect data, observe the world, update your prior and become a better fox as your work progresses" is the takeaway from the book.
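The "have a prior, update your prior" loop is just Bayes' theorem applied repeatedly. A minimal Python sketch with made-up probabilities: a 1% prior, and evidence that shows up 90% of the time if the hypothesis is true and 5% of the time if it is false.

```python
def bayes_update(prior: float, p_evidence_if_true: float, p_evidence_if_false: float) -> float:
    """Posterior P(H | E) from the prior P(H) and the two likelihoods."""
    numerator = prior * p_evidence_if_true
    return numerator / (numerator + (1 - prior) * p_evidence_if_false)


belief = 0.01  # hypothetical prior
for observation in range(1, 3):
    belief = bayes_update(belief, 0.90, 0.05)  # yesterday's posterior is today's prior
    print(f"after observation {observation}: {belief:.3f}")
# after observation 1: 0.154
# after observation 2: 0.766
```

One piece of evidence moves a 1% belief only to about 15%; a second, independent piece moves it past 75%. That is the slow, cumulative updating the author recommends.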


This book takes a rather difficult topic, "algorithmic complexity", and explains it in a way that any reader with a bit of curiosity about the algorithmic world can understand most of its contents. This is not a book in the traditional sense. ETH Zurich offered a public lecture series called "The Open Class – Seven Wonders of Informatics" in the fall of 2005, and this book is based on that lecture series.

To begin with, the title needs a bit of explanation. The phrase "From Knowledge to Magic" is meant to represent the transition from a deterministic algorithmic world to a randomized one, from a world of bits and bytes to a world of molecules and DNA, from what we know is currently possible to something that appears magical. What are randomized algorithms and deterministic algorithms? In fact, what are algorithms? How does one compute using molecules? What is an NP-hard problem? These and many more such questions are answered in the book. Did you know that an instance of the Hamiltonian path problem, an NP-hard relative of the Traveling Salesman problem, has been solved by a DNA computer? I didn't know this fact until I read this book. The book starts off with the basic definition of "algorithm" and takes the reader all the way to visualizing a few scenarios that could represent the world of computing 20-30 years from now, maybe even earlier.

A Short Story about the Development of Computer Science, or Why Computer Science Is Not a Computer Driving Licence

The chapter starts off with a question: "Why does public opinion equate the facility to use specific software packages to computer science, while most of us clearly distinguish basic research in physics vs. the technical application in electrical engineering?" In answer to "What is Computer Science?", the author writes

The main problem with answering this question is that computer science itself does not provide a clear picture to people outside the discipline. One cannot classify computer science unambiguously as a metascience such as mathematics, or as a natural science or engineering discipline. The situation would be similar to merging physics and electrical and mechanical engineering into one scientific discipline. From the point of view of software development, computer science is a form of engineering, with all features of the technical sciences, covering the design and development of computer systems and products. In contrast, the fundamentals of computer science are closely related to mathematics and the natural sciences. Computer science fundamentals play a similar role in software engineering as theoretical physics plays in electrical and mechanical engineering. It is exactly this misunderstanding of computer science fundamentals among the general public that is responsible for the bad image of computer science.

Any scientific discipline is built on certain notions. These notions are treated as self-evident truths and are called axioms. Take probability theory: it took more than two centuries of dabbling before it was described in a consistent mathematical language. Is probability theory watertight? Yes, in a way, because everything flows from the basic axioms; no, because we take the axioms for granted. For example, the world of quantum physics is vastly different, and maybe there is a need to relook at the entire theory. It might seem that giant scientific disciplines stand on these axioms and could topple at any time. In fact, scientific researchers are always looking to take a crack at the axioms and other notions of a discipline, because doing so might result in a better understanding of the entire subject.

It was David Hilbert who pushed for building mathematics on rigorous cause-and-effect (implication) reasoning and strove for the ideal of a watertight, perfect mathematical framework. In 1931, Kurt Gödel definitively destroyed all dreams of building such a perfect mathematics. Basically, Gödel's work says that building mathematics as a formal language of science is an infinite process. The author writes,

The results of Gödel were responsible for the founding of computer science. Before Gödel nobody saw any reason to try and give an exact definition of the notion of a method. Such a definition was not needed, because people only presented methods for solving particular problems. The intuitive understanding of a method as an easily comprehensible description of a way of solving a problem was sufficient for this purpose. But when one wanted to prove the nonexistence of an algorithm (of a method) for solving a given problem, then one needed to know exactly (in the sense of a rigorous mathematical definition) what an algorithm is and what it is not. Proving the nonexistence of an object is impossible if the object has not been exactly specified.

Hence there was a need for formal definitions in computer science. The first formal definition of an algorithm was given by Alan Turing in 1936, and further definitions followed later. This definition drew a demarcation between the problems that can and cannot be solved using algorithms/computers. Thanks to the exact definition of what an algorithm is, one was able to investigate the borderline between the automatically solvable and the unsolvable.



What Have Programming and Baking in Common?

This chapter explains the meaning of an algorithm via a simple example, "baking a cake". Since the book is based on public lectures, it is no surprise that the content goes into great detail to explain, in layman's terms, what exactly happens within the internals of a computer when a program is executed.

Infinity Is Not Equal to Infinity, or Why Infinity Is Infinitely Important in Computer Science

The chapter gives a background to the wonderful concept of infinity and a good explanation for the following questions:

  • What does it mean for a set to be infinite?
  • How does one compare two infinite sets?
  • Are there different sizes of infinity?
  • How does one prove that the set of real numbers is a bigger set (has higher cardinality) than the set of rational numbers?

The point of this chapter is to equip the reader with the “not so intuitive” nature of infinity so that he/she can grasp the concepts around algorithmically solvable problems and unsolvable problems.
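Both halves of the infinity story can be played with in Python. The sketch below is my own illustration, not code from the book. It enumerates the positive rationals along Cantor's diagonals, which matches them one-to-one with N, and then applies the diagonal trick: given any numbered list of infinite 0/1 sequences, it builds a sequence that differs from the i-th one at position i, so no such list can contain the binary expansion of every real number. The listing used in the demo (`binary_listing`) is an arbitrary hypothetical example.

```python
from fractions import Fraction
from itertools import count, islice


def positive_rationals():
    """Yield every positive rational exactly once, walking the diagonals p + q = s."""
    seen = set()
    for s in count(2):
        for p in range(1, s):
            r = Fraction(p, s - p)
            if r not in seen:  # skip duplicates such as 2/2 == 1/1
                seen.add(r)
                yield r


def diagonal(listing):
    """Given listing(i, j) = j-th digit of the i-th 0/1 sequence,
    return a sequence that differs from sequence i at position i."""
    return lambda j: 1 - listing(j, j)


def binary_listing(i, j):
    """Hypothetical listing: digit j of sequence i is bit j of the number i."""
    return (i >> j) & 1


print([str(r) for r in islice(positive_rationals(), 6)])  # ['1', '1/2', '2', '1/3', '3', '1/4']

d = diagonal(binary_listing)
assert all(d(i) != binary_listing(i, i) for i in range(1000))
```

The diagonal construction works for any listing you plug in, which is exactly why the reals cannot be counted.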

Limits of Computability, or Why Do There Exist Tasks That Cannot Be Solved Automatically by Computers

How many programs can ever be written? The author shows that this question can be answered by matching every program that can ever be written to a natural number. Hence the first basic principle to grasp is that the set of all programs has the same cardinality as N. Algorithms are programs that always finish in finite time, so they form a proper subset of the set of programs.
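One standard way to number all programs, sketched here as an assumption rather than the book's exact encoding: every program is a finite string over a finite alphabet, so list the strings by length and, within a length, alphabetically (shortlex order). Every string gets a position, i.e. a natural number, and `shortlex_rank` computes that number directly. A two-letter alphabet is used only to keep the output readable.

```python
from itertools import count, islice, product


def shortlex(alphabet):
    """Yield all finite strings over `alphabet`: shorter strings first, then alphabetical."""
    for length in count(0):
        for letters in product(alphabet, repeat=length):
            yield "".join(letters)


def shortlex_rank(word, alphabet):
    """The natural number assigned to `word` (its 0-based position in shortlex order)."""
    k = len(alphabet)
    digit = {c: i for i, c in enumerate(alphabet)}
    shorter = sum(k ** length for length in range(len(word)))  # all shorter strings come first
    offset = 0
    for c in word:  # read the word as a base-k number
        offset = offset * k + digit[c]
    return shorter + offset


print(list(islice(shortlex("ab"), 7)))  # ['', 'a', 'b', 'aa', 'ab', 'ba', 'bb']
print(shortlex_rank("ab", "ab"))        # 4
```

For real programs the alphabet would be, say, all keyboard characters; the numbering still works because the alphabet is finite.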

The chapter introduces the first problem that no algorithm can solve. The task, called Problem(c), takes a natural number n and outputs the first n digits of a real number c after the decimal point. Since the cardinality of R is greater than that of N, and there are only |N| programs, there exists a c for which no algorithm can be written. So Cantor's diagonalization is used once again, this time to show that there is a problem for which no algorithm exists.

Admittedly, hardly anyone is interested in a task such as Problem(c). So the author takes on a practical problem, a decision problem, that cannot be solved by any algorithm. Simply stated, a decision problem takes a natural number n as input and outputs YES if n belongs to a certain set M and NO if it does not. On the face of it, this seems a straightforward task that a simple algorithm can do. For example, if M is the set of even numbers, given a number n it is easy to decide whether it falls into that set. Similarly, if M is the set of primes, a few lines of code are enough to check whether n is prime. However, for some sets M this simple-looking problem is devious, and no algorithm can solve it. The proof of this statement lies in a nifty trick on the diagonalization argument. You have got to read through the proof to understand its magic. Thus the author shows the first practical problem, a decision problem that cannot be algorithmically solved. This is denoted by (N, M(DIAG)). The reduction method is then used to propagate this hardness to a number of other problems that cannot be solved algorithmically. By the end of the chapter, the reader gets a fair idea that all nontrivial semantic questions like the following are not algorithmically solvable:

  • What does a given program compute? Which problem does the program solve?
  • Does the program developed solve the given problem?
  • Does a given program halt on a given input, or does a program always halt?
  • Does a given program accept a given input?
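For the two easy sets mentioned above, the decision procedures really are a few lines each. A Python sketch (the helper names are mine): each function answers YES or NO for any natural number n and always halts. The chapter's point is that for M(DIAG), and for the semantic questions listed above, no function of this shape can exist.

```python
def is_even(n: int) -> bool:
    return n % 2 == 0


def is_prime(n: int) -> bool:
    """Trial division: slow for huge n, but it always halts with the right answer."""
    if n < 2:
        return False
    d = 2
    while d * d <= n:
        if n % d == 0:
            return False
        d += 1
    return True


def decide(n: int, member) -> str:
    """Answer the decision problem (N, M) for the set M described by `member`."""
    return "YES" if member(n) else "NO"


print(decide(10, is_even), decide(7, is_even))     # YES NO
print(decide(97, is_prime), decide(91, is_prime))  # YES NO
```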

Complexity Theory, or What to Do When the Energy of the Universe Doesn't Suffice for Performing a Computation?

As research on which problems are algorithmically solvable reached maturity, the next issue that research scientists dealt with was, "How does one classify a problem as practically solvable or unsolvable?" First one must get some clarity on what "practically solvable" means. The chapter defines the time complexity and space complexity of an algorithm and uses these concepts to explain the type of algorithms that are not practical. An extreme case is to think in terms of the age of the universe, which is less than 10^18 seconds. Let's say you had been running an algorithm since the start of the universe on your laptop, a 1 GHz machine executing about 10^9 instructions per second. You would have let the computer do 10^18 × 10^9 = 10^27 instructions. If the algorithm has time complexity 2^n, it can handle inputs of size only up to 89; with n! the limit is 26. So any algorithm whose time complexity is exponential or factorial can be safely regarded as practically useless, and a problem that only has such algorithms as practically unsolvable. So, what kinds of problems can be deemed practically solvable? There is no formal proof for this, but scientists have agreed that if the time complexity of an algorithm solving a problem is of polynomial order, then the problem can be practically solved.
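The arithmetic above is easy to check. This Python sketch takes the paragraph's budget of 10^18 seconds times 10^9 instructions per second and finds, for a given running-time function, the largest input size n that fits, using doubling plus binary search and assuming the cost never decreases as n grows.

```python
import math

BUDGET = 10**18 * 10**9  # seconds since the Big Bang (upper bound) x instructions per second


def largest_feasible_n(cost):
    """Largest n with cost(n) <= BUDGET, assuming cost is non-decreasing in n."""
    hi = 1
    while cost(hi) <= BUDGET:  # double until we overshoot the budget
        hi *= 2
    lo = hi // 2  # invariant: cost(lo) <= BUDGET < cost(hi)
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if cost(mid) <= BUDGET:
            lo = mid
        else:
            hi = mid
    return lo


print(largest_feasible_n(lambda n: 2**n))   # 89
print(largest_feasible_n(math.factorial))   # 26
print(largest_feasible_n(lambda n: n**3))   # 1000000000
```

A cubic algorithm still copes with inputs of a billion under the same budget, which is the intuition behind drawing the line at polynomial time.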

There are two reasons for accepting this definition. The first is practical: experience shows that whenever a polynomial-time algorithm was devised for a problem, people usually figured out a better algorithm with a lower-degree polynomial. The second reason is theoretical: you cannot fix a specific degree for the polynomial and declare that to be the boundary line. What if the same algorithm runs a bit faster in a different environment, in a different programming language, etc.?

The main task of complexity theory is to classify concrete computing problems with respect to their degree of hardness, measured by computational complexity. So how does one go about classifying a problem as hard or not? Since nobody has been able to prove strong lower bounds on the complexity of concrete problems, an indirect method is used to classify whether a problem is NP-hard. There are about 4000 problems that are considered hard, and no efficient algorithm is known for any of them.

Let's say you are given a problem U and want to know whether it is NP-hard. The way to do it is to check whether a polynomial-time algorithm for U would imply a polynomial-time algorithm for any of the 4000 known NP-hard problems. If that is the case, U is at least as hard as that problem and is therefore NP-hard itself, and since nobody believes those problems have polynomial-time algorithms, you should not expect one for U either. It's a convoluted argument but a plausible one, and it has stood the test of time.
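To see what "a polynomial algorithm for one problem would give one for another" looks like, here is a Python sketch of a classic reduction (a textbook example, not necessarily the one the book uses). A graph on n vertices has a Hamiltonian cycle exactly when the complete graph with weight 1 on its edges and weight 2 on its non-edges has a traveling-salesman tour of cost n. Building those weights takes polynomial time, so a fast TSP solver would immediately give a fast Hamiltonian-cycle test. The brute-force `tsp_min_tour` is only a stand-in for that hypothetical fast solver.

```python
from itertools import permutations


def hamiltonian_to_tsp(n, edges):
    """Polynomial-time reduction: distance 1 for graph edges, 2 for non-edges."""
    edge_set = {frozenset(e) for e in edges}
    return [[0 if i == j else (1 if frozenset((i, j)) in edge_set else 2)
             for j in range(n)] for i in range(n)]


def tsp_min_tour(dist):
    """Stand-in TSP solver: brute force over all tours, exponential time."""
    n = len(dist)
    best = None
    for order in permutations(range(1, n)):
        tour = (0, *order, 0)
        cost = sum(dist[a][b] for a, b in zip(tour, tour[1:]))
        if best is None or cost < best:
            best = cost
    return best


def has_hamiltonian_cycle(n, edges):
    """Decide Hamiltonian cycle (n >= 3) using only the TSP solver."""
    # A tour of cost n uses only weight-1 edges, i.e. it is a Hamiltonian cycle of the graph.
    return tsp_min_tour(hamiltonian_to_tsp(n, edges)) == n


square = [(0, 1), (1, 2), (2, 3), (3, 0)]
star = [(0, 1), (0, 2), (0, 3)]
print(has_hamiltonian_cycle(4, square), has_hamiltonian_cycle(4, star))  # True False
```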

Now, it is possible that some of the 4000 problems currently classified as hard will one day become practically solvable. For example, the Traveling Salesman problem is NP-hard, but it is conceivable that DNA computing will one day make all its variants solvable in a practical amount of time. So the set of problems you think are practically unsolvable might shrink. But until then, the way to check whether a problem is NP-hard is via the indirect method. This concept is not easy to grasp for a first-timer, and hence the author provides enough examples and visuals to drive the point home.

At the end of this chapter, a reader will have a fair idea of the kinds of tasks that are deemed practically unsolvable. For example, I can now make sense of a statement I came across a few days ago in graph theory: "Finding a Hamiltonian cycle in a graph is an NP-hard problem." This means that no polynomial-time algorithm for it has been found to date, and one exists only if P = NP.


Randomness in Nature and as a Source of Efficiency in Algorithmics

After reading this chapter, the reader will feel that the word "magic" in the title is apt. Randomized algorithms are introduced here. What are they? There are many tasks for which the best known deterministic algorithms would take longer than the age of the universe and are deemed practically impossible. In many such cases, uncertainty comes to the rescue. By giving up absolute reliability of the output and allowing a small probability of error, algorithms can be devised that include randomization as one of their steps. This randomization could range from merely selecting a number at random from a set to simulating something at random and using it in the processing. It allows the task to be done in far less time than any known deterministic algorithm. The chapter uses a communication protocol as an example to explain the power of randomized algorithms. The example makes it abundantly clear to the reader that practically any desired reliability can be obtained by repeating several randomized computations on the same problem instance. On the internet, randomized algorithms are the solution to many problems that would take an insane amount of time or space with a deterministic algorithm.
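The book's own communication example is not reproduced here, but a classic protocol in the same spirit, random fingerprinting, can be sketched in Python. Two parties holding n-bit numbers x and y want to know whether they are equal without sending all n bits: one side picks a random prime p no larger than n^2 and sends p and x mod p, which takes only O(log n) bits. If the residues differ, x and y are certainly different; if they agree, the numbers are equal with high probability, and repeating with fresh primes drives the error down further. The parameter choices (primes up to n^2, 10 rounds) are my assumptions for the sketch.

```python
import random


def is_prime(m: int) -> bool:
    """Trial division; adequate for the small primes (at most n**2) used here."""
    if m < 2:
        return False
    if m % 2 == 0:
        return m == 2
    d = 3
    while d * d <= m:
        if m % d == 0:
            return False
        d += 2
    return True


def random_prime(upper: int, rng: random.Random) -> int:
    """Draw random integers in [2, upper] until one is prime."""
    while True:
        candidate = rng.randint(2, upper)
        if is_prime(candidate):
            return candidate


def probably_equal(x: int, y: int, n_bits: int, rounds: int = 10, rng=None) -> bool:
    """Randomized equality test: never wrong when it says False,
    and wrong only with small probability when it says True."""
    rng = rng or random.Random()
    upper = max(n_bits * n_bits, 4)
    for _ in range(rounds):
        p = random_prime(upper, rng)
        # In the protocol, one party sends (p, x % p): O(log n) bits instead of n.
        if x % p != y % p:
            return False
    return True


rng = random.Random(2024)
x = rng.getrandbits(1000)
print(probably_equal(x, x, 1000, rng=rng))      # True
print(probably_equal(x, x + 1, 1000, rng=rng))  # False
```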



How to transform drawbacks into advantages?

Cryptography is another topic that lives up to the subtitle of the book, "from knowledge to magic". Secure communication between multiple parties is hard to imagine without randomized algorithms. The chapter gives the logic behind symmetric cryptosystems and then explains their limitations. It then goes on to describe the math behind public key-private key cryptography, the dominant mode of secure communication on the internet. At the heart of its success lie functions that are believed to be one-way: easy to compute, but very hard to invert without a secret (the private key). No wonder the best number theorists in the world have contributed so much to the development of cryptography.
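Since the public-key idea is easiest to see with numbers, here is a toy Python sketch of RSA, the best-known public-key scheme built on number theory (I am not claiming this is the exact presentation in the book). The primes 61 and 53 are deliberately tiny so the arithmetic is visible; real keys use primes hundreds of digits long plus padding schemes that this sketch omits entirely, so it is not secure.

```python
def make_keys(p: int, q: int, e: int):
    """Toy RSA key generation from two primes p, q and a public exponent e."""
    n = p * q
    phi = (p - 1) * (q - 1)
    d = pow(e, -1, phi)  # modular inverse of e (Python 3.8+); the secret exponent
    return (n, e), (n, d)


def apply_key(message: int, key) -> int:
    """Encryption and decryption are the same operation: message**k mod n."""
    n, k = key
    return pow(message, k, n)


public, private = make_keys(61, 53, 17)
ciphertext = apply_key(65, public)
print(public, private)                 # (3233, 17) (3233, 2753)
print(ciphertext)                      # 2790
print(apply_key(ciphertext, private))  # 65
```

Anyone can encrypt with the public pair (n, e); recovering d from it is believed to require factoring n, which is the hard-to-invert step.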

The last few chapters are like reading a science fiction book, but with a difference: the stories mentioned here are happening in various labs around the world. DNA computing, for instance, is the use of molecules for computation.



Massively parallel computing can be done via DNA computing. This might make the 4000 or so NP-hard problems known to date crumble into doable tasks. In fact, Leonard Adleman solved a small instance of the Hamiltonian path problem, a close relative of the Traveling Salesman problem, with DNA in his lab. Though many aspects of DNA computing are yet to be worked out, it will not be surprising (given how quickly, relatively speaking, the PC and internet revolutions happened) to see a DNA computing device in every scientist's lab in the near future.
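Adleman's experiment followed a generate-and-filter recipe: let DNA strands encoding all possible paths form at once, then chemically discard the ones that cannot be Hamiltonian. Here is a sequential Python simulation of those filtering steps on a tiny hypothetical directed graph. The test tube applies each step to all strands in parallel, which is exactly where the hoped-for speed comes from; this loop, of course, does not.

```python
from itertools import product


def adleman_filter(n, edges, start, end):
    """Simulate generate-and-filter for a directed Hamiltonian path from start to end."""
    edge_set = set(edges)
    # Step 1: "mix the strands": every walk of n vertices that follows edges.
    walks = [w for w in product(range(n), repeat=n)
             if all((a, b) in edge_set for a, b in zip(w, w[1:]))]
    # Step 2: keep walks that begin at `start` and end at `end`.
    walks = [w for w in walks if w[0] == start and w[-1] == end]
    # Step 3 (exactly n vertices) holds by construction; step 4: keep walks visiting every vertex.
    walks = [w for w in walks if set(w) == set(range(n))]
    return [list(w) for w in walks]


graph = [(0, 1), (1, 2), (2, 3), (0, 2), (1, 3)]
print(adleman_filter(4, graph, start=0, end=3))  # [[0, 1, 2, 3]]
```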


Takeaway:

The book shows the beauty, depth and usefulness of the key ideas in computer science and makes a reader understand and appreciate the “science” part of “computer science”.


I was recovering from a brief illness and tried reading this book to shake off my drowsy, sullen mood.

I found the first part of this book interesting. Given the information overload we face, it helps to understand how our brain functions. "How do we use our brains for understanding, deciding, recalling, memorizing and inhibiting information?" is an important question that we all need to answer to function efficiently in our lives. I loved the initial part of the book because it uses the metaphor of a stage and an audience to explain the science of our brain, more specifically the prefrontal cortex. The book is also organized as a play with various scenes (like a theater play). Each scene has two takes: in the first, the actor has no clue how the brain works and messes it up; in the second, the actor performs in full cognizance of the workings of the brain.

First, about the stage-audience metaphor for the prefrontal cortex


The prefrontal cortex, which makes up 5 to 10% of the brain, is responsible for some of the most important tasks that we do in our daily lives. Best things do indeed come in small packages! The author uses the metaphor of the stage to explain the various functions of the prefrontal cortex:

  • Understanding: Bringing new actors onto the stage and seeing how they connect to the audience
  • Deciding: Putting actors (external, or from the audience) on the stage and comparing them to one another
  • Recall: Bringing members of the audience onto the stage, with the front seats representing short-term memory and the back seats long-term memory
  • Memorize: Moving actors off the stage and seating them in the audience, be it in the front row or the back row
  • Inhibition: Not allowing some actors to get onto the stage

Also, the lights on the stage keep dimming as time passes. The only way to bring back the brightness is to energize yourself with regular breaks, exercise and various activities, i.e. mixing it up. Any prioritizing activity takes up a lot of energy, and hence you need to do such tasks at the very beginning of the day, when the lights on the stage are bright, i.e. when your energy levels are high.

Since the stage is very small, one must organize it carefully so that the act is pleasant for the audience to watch. Bringing too many actors onto the stage is a NO-NO. Simplifying the stage, reducing the number of actors and chunking the scene into specific sequences are some of the actions one can take to cope with the limited stage space. Also, the audience in the front row always wants to come onto the stage, and they need not be the most useful actors for the specific act (for example, for critical decisions, the things in immediate memory are not always the important ones; sometimes actors sitting way back in the audience might be extremely important).

Also, just as in a theater act, only one actor is allowed to speak at a time. You can put as many actors as you want on the stage (obviously the fewer the better), but when the scene starts, only one actor can act. This is the basic limitation of the prefrontal cortex. Not only is the stage limited in the number of items it can hold, but what the actors can do is also limited. This essentially means that single-tasking is usually the best way to go. Whenever more than one task is done at once, accuracy drops. If you reply to emails, talk to someone, sit in on a conference call and decide the venue for dinner all at once, all the tasks will suffer to some extent.

The book describes an interesting experiment showing that developing a language for an activity enables us to catch ourselves before doing that specific activity. This means that if we have the language to describe the feeling of having too much on stage at once, we will be more likely to notice it. So, by giving explicit language, metaphors, analogies and terms for various functions of the brain that many know only implicitly, this book aims to help us stage the entire play (our lives) in a better way.

Talking of distractions and how they kill our ability to work efficiently, the book says that recognizing a distraction and vetoing it is very important. This requires cutting down distractions at the very source: it is better to work with the mobile phone switched off than with a missed call visible, better to work with the email program closed than with the inbox open and unread mail piling up, etc. Simple actions can bring quite a bit of improvement in the way we manage distractions. More importantly, having a language to talk about and take cognizance of these distractions helps us veto them.

Part I of the book ends with the author talking about ways to get past an impasse. He quotes experiments from the scientific literature showing that breakthroughs and insights often come from shutting off the stage completely, i.e., instead of working in the limited stage space of the prefrontal cortex amongst the audience and external actors, it is better to shut off the stage and explore completely different paths.

The book then introduces the "Director" of the play, i.e. 'Mindfulness'. If the stage is a metaphor for the narrative circuit in our brains, the director is a metaphor for the experiencing circuit. The director can observe the play, the actors, the scenes etc. and has the power to control them. Most of us often operate only with our default network, i.e. a stage where actors seem to drop by without any control. We never directly experience anything completely. Even when we are reading a good book, watching a movie or a play, or sitting on a beach, our thoughts are far away from what we are experiencing. This is mainly because our director is missing and the stage is out of control.

Part II of the book is about the emotional functions of the brain. The limbic system is the seat of the emotions that help us make millions of little decisions in our daily lives. In fact, that is what makes us human. The downside is that when there is over-arousal, we tend to underperform. This causes the scenes on the stage to go haywire: unnecessary actors get onto the stage, the director goes missing, the actors utter the wrong dialogue etc. This part of the book says that you can get out of this over-arousal by either labeling the emotion or reappraising it. Both are easier said than done, but with constant practice you can see to it that the director and the stage stay intact whenever there is an amygdala hijack. Another way to avoid an emotional hijack is to alter your perception of the event.

The last two parts of the book talk about things that crop up in social interactions and change management. They have more MBA-style content and were, needless to say, damn boring.

Takeaway:

By presenting “stage” as metaphor for brain’s “prefrontal cortex” and “director” as a metaphor for “mindfulness”, the book is pretty engaging for the first 100 pages. The rest is crap!


Any book that promises a journey spanning 300 years is bound to focus on the events and people that had the maximum impact on the development of the option pricing formula. If one were to pause and think about option pricing, one would come up with questions like

  • A pretty naïve question to start with: why is there a need for options in the first place? How were they traded historically? Were they precursors to some other instruments?
  • What factors drive option prices? Can one mathematically pin down a few factors, assuming the others are constant?
  • An option by definition depends on the movement of the underlying security. So how does one describe the underlying security mathematically?
    • Should one form an equation for the price?
    • Should one form an equation for the price increments?
    • Should one assume a discrete process or a continuous process, given that a traded price series is a discrete time series?
  • Given a specific process for the underlying security, how does one go about figuring out the price of the option?

This book traces all the developments leading to the Black-Scholes equation, such as Brownian motion, Itô's calculus, the Kolmogorov forward and backward equations, etc., leading up to the most important idea in option pricing, "replication". The ideas are described chapter by chapter. Let me briefly summarize the chapters in this book.
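"Replication" is easiest to see in a one-period binomial model, the discrete cousin of the Black-Scholes setting (this is a standard textbook illustration, not the book's derivation). Assume a stock at 100 that will be worth either 120 or 80 at the end of the period, and a call struck at 100. Holding `delta` shares and lending or borrowing `bond` dollars reproduces the call's payoff in both states, so the call must cost what that portfolio costs. All numbers are hypothetical.

```python
def replicate_call(s0, s_up, s_down, strike, rate):
    """One-period binomial replication of a European call.

    Returns (delta, bond, price): hold `delta` shares and invest `bond`
    at the risk-free rate (negative means borrow) to match the call's payoff.
    """
    c_up = max(s_up - strike, 0.0)
    c_down = max(s_down - strike, 0.0)
    delta = (c_up - c_down) / (s_up - s_down)  # shares needed
    bond = (c_up - delta * s_up) / (1 + rate)  # today's value of the bond position
    return delta, bond, delta * s0 + bond


delta, bond, price = replicate_call(100, 120, 80, strike=100, rate=0.0)
print(delta, bond, price)  # 0.5 -40.0 10.0
```

The real-world probability of the stock going up never enters the calculation; that is the insight Black and Scholes carried over to continuous time.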

Flowers and Spices


The book starts off by describing the Tulip mania of the 1630s, and the reason it talks about Tulip mania is this: it was the first instance when a government, in order to come out of a crisis, converted forward contracts into options contracts. The chapter then talks about the Dutch East India Company, which dealt in spices. The history of this company is closely linked to the emergence of the first formal market for options. The Dutch East India Company (VOC) became a powerful company in the Netherlands just a few years after its inception. Its shares became valuable and a futures market emerged. Subsequently, to cater to various kinds of demand, an options market emerged for VOC shares. The VOC eventually got bogged down in corporate politics and corruption and went bankrupt. The 1680s were also a time when there was a need to explain to the general public how to trade and understand options. To clarify the various terms and mechanics of options, Joseph de la Vega wrote extensively in his book "Confusion of Confusions". De la Vega was the first to describe in writing the workings of the stock market, in particular the trading of options. His work is held in such high esteem that, since the year 2000, the Federation of European Securities Exchanges has sponsored an annual prize named after him for an "outstanding research paper related to the securities markets in Europe".

In the Beginning


This chapter covers some important developments in the financial markets between 1694 and 1885. It starts off in 1694 with John Law, who later advised the French crown on restoring the country's financial stability. Law started a bank in France that printed paper money. It was one of the first attempts in Europe to replace metal coins with paper money as legal tender. The bank, a forerunner of France's central bank, guaranteed that the notes would always be worth as much as the metal coins on which they were based. Law also convinced the crown to grant him powers for a natural-resource trading company so that he could bring more revenue into the country. Law's company became very popular, and there was a mad scramble amongst people to buy its shares. Like any bubble, the scheme collapsed by 1720, and France once again faced imminent financial disaster. However, the taste for trading and speculation that Law had given French citizens was still in full force, and unofficial trading in various other instruments increased. So finally, in 1724, the government decided to bring some order to the situation, and an official Bourse was created. The whole system ran well until the French Revolution in 1789, after which chaos ensued. A few more developments led to the reopening of the Paris Bourse, and this time everybody was allowed to trade. Forwards were again outside the regulation, but that did not stop volumes in these instruments from increasing, and the Bourse soon became a hub for speculators. Also, with the collapse of a big investment bank, France went into a recession. Out of all these developments came one positive one: the legalization of the forward market in 1885.

From Rags to Riches


This chapter talks about the life of Jules Regnault. It is a classic rags-to-riches story. Why is the story relevant to the book or to option pricing? Well, Jules Regnault was, at least as per the book, the first person to deduce the square-root-of-time law. He not only tried to prove it mathematically, but also used the law in the stock market to retire rich. He started his working life at the age of 23 as a broker's assistant. His living conditions were miserable, but he managed to improve them by working hard in the broker's office. Regnault was the first person to try to understand the workings of the stock exchange in mathematical terms, and his explanations had all the trappings of a scientific theory. He managed this with hardly any formal education. After a full day's work at the Bourse, he would sit in a small room under the attic and do quantitative work. A truly inspiring life. Remember, this was the 1860s, and he was investigating concepts such as random walks, the role of information in stock markets, the useful and harmful sides of speculation, insider information, the factors that drive the valuation of an instrument, ways to calculate the fair value of an instrument etc. An admirable quality of Jules Regnault's life is that he never shied away from applying his ideas in the stock market. He used all his principles at the Bourse and retired at the age of 47 after making a ton of money. In one sense, Jules Regnault can be called the first quant who made a fortune in the market.
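The square-root-of-time law is easy to check by simulation. This Python sketch (my illustration, with an arbitrary seed and walk counts) runs many fair ±1 random walks and measures the root-mean-square displacement after t steps, which should come out close to sqrt(t): quadrupling the time only doubles the typical move.

```python
import math
import random


def rms_displacement(steps: int, walks: int, rng: random.Random) -> float:
    """Root-mean-square position after `steps` fair +/-1 steps, over many walks."""
    total = 0
    for _ in range(walks):
        position = sum(rng.choice((-1, 1)) for _ in range(steps))
        total += position * position
    return math.sqrt(total / walks)


rng = random.Random(7)
for t in (25, 100):
    print(t, round(rms_displacement(t, 4000, rng), 2), math.sqrt(t))
```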

The Banker’s Secretary


In any introductory book on options theory, you are likely to see payoff diagrams for various options and option strategies. This chapter talks about the first person to use these diagrams to communicate option positions, Henri Lefevre. Lefevre was personal secretary to the business tycoon James de Rothschild. He did not participate in speculation or trading but was a keen observer and educator of markets. He published various books and articles, helped by the fact that, as Rothschild's secretary, he could influence publishers. He made two main contributions to options theory. The first was his mental model comparing the economy and the flows of capital and goods with the human body and its organs. In Lefevre's model of the economy, the stock exchange is like the heart that keeps blood moving through the veins, the government is like the brain that thinks, combines and regulates the flow, and speculation is like the nervous system that provides the momentum that keeps commodities and capital in continuous motion. His second contribution came in 1873 and 1874 through his books, in which he introduced a method of analysis that we are all familiar with, the payoff diagram. The payoff diagram for an individual option is very simple, and using such a diagram to explain things could seem like a stretch. The real power of payoff diagrams comes into play when you are analyzing a set of option positions. The final payoff diagram for a set of option positions at once shows the price ranges where the entire position makes or loses money. Lefevre's method and his extensive publications in the form of books, papers, articles etc. helped the common masses understand options and options-based investing better.
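Lefevre's insight, adding up the payoffs of several positions to see where the whole book makes or loses money, translates directly into code. A minimal Python sketch with hypothetical strikes; it shows payoffs at expiry only and ignores the premiums paid or received, which would shift each line up or down.

```python
def long_call(strike):
    return lambda s: max(s - strike, 0.0)


def long_put(strike):
    return lambda s: max(strike - s, 0.0)


def short(position):
    return lambda s: -position(s)


def payoff(positions, s):
    """Total payoff at expiry of a set of positions when the underlying is at s."""
    return sum(p(s) for p in positions)


straddle = [long_call(100), long_put(100)]             # profits from a big move either way
bull_spread = [long_call(90), short(long_call(110))]   # gain capped at 20 above 110

for s in range(60, 141, 20):
    print(s, payoff(straddle, s), payoff(bull_spread, s))
```

Printing (or plotting) the combined column is exactly the "final payoff diagram" of the paragraph above.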

The Spurned Professor


Louis Bachelier

The study of financial markets began in earnest at the turn of the twentieth century. The first underpinnings of a mathematical theory were developed in the doctoral thesis of a thirty-year-old French student, Louis Bachelier. In 1900 Bachelier defended his PhD thesis, and the committee, composed of Poincare, Appell and Boussinesq, graded it as "somewhat better than okay". This evaluation haunted Bachelier throughout his life, as he could not secure a faculty position without a "real distinction" on his PhD. One of the reasons Bachelier's theory did not get attention was a flaw in it: he modeled price movements as a specific random walk (an arithmetic Brownian motion), which allows stock prices to take negative values. So the thesis that was probably the first step towards a mathematical treatment of financial markets lay dormant for a long time.
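The flaw is easy to demonstrate. In arithmetic Brownian motion each step adds a normally distributed amount, so nothing stops the price from crossing zero; in geometric Brownian motion (the model later assumed by Black and Scholes) each step multiplies the price by a positive factor. This Python sketch drives both with the same sequence of shocks; the parameters and the deliberately unlucky all-negative shock sequence are hypothetical.

```python
import math
import random


def abm_path(s0, sigma_abs, dt, shocks):
    """Arithmetic Brownian motion: add sigma_abs * sqrt(dt) * z at each step."""
    path = [s0]
    for z in shocks:
        path.append(path[-1] + sigma_abs * math.sqrt(dt) * z)
    return path


def gbm_path(s0, mu, sigma, dt, shocks):
    """Geometric Brownian motion: multiply by exp((mu - sigma^2/2) dt + sigma sqrt(dt) z)."""
    path = [s0]
    for z in shocks:
        path.append(path[-1] * math.exp((mu - 0.5 * sigma**2) * dt + sigma * math.sqrt(dt) * z))
    return path


# A long run of bad weeks: the same shocks applied to both models.
bad_weeks = [-1.0] * 300
print(min(abm_path(100, 20, 1 / 52, bad_weeks)))        # negative
print(min(gbm_path(100, 0.0, 0.2, 1 / 52, bad_weeks)))  # small but positive

# Random shocks plug in the same way.
rng = random.Random(0)
shocks = [rng.gauss(0.0, 1.0) for _ in range(52)]
weekly_prices = gbm_path(100, 0.05, 0.2, 1 / 52, shocks)
```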

Botany, Physics and Chemistry


The first observation of the jiggling movement of particles was made by the Dutch scientist Jan Ingenhousz. However, the phenomenon is named after Robert Brown. I find Robert Brown’s life very inspiring. He did all his observations and research in his free time, after work hours. He never socialized or dined out; basically, he was a guy who kept to himself and did outstanding work. He observed that Brownian motion never ceases, though he never knew its cause. Remember, this was 1827, and the molecular/atomic theory was not yet established. Then came Einstein, whose work on Brownian motion helped establish the atomic theory. He explained Brownian motion theoretically and predicted its behavior. He also formulated the square root law, clarifying that one must analyze not a single drunkard’s walk but an ensemble of drunkards’ walks. In 1906, Marian Smoluchowski made important contributions to understanding Brownian motion when he postulated that the observed jittery motions are the net displacements resulting from unobservable zigzag movements caused by a huge number of impacts. He reached this conclusion after analyzing the speed of the particles at various resolutions: the apparent velocity of the particles increased at higher resolutions. This led him to conclude that what one sees under the microscope is only the net displacement. This book provides a fantastic analogy for Brownian motion, one you will never forget after reading it once. The book puts it beautifully:
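
The ensemble view behind the square root law can be checked with a small simulation. This sketch makes simple assumptions: each walker takes unit steps of +1 or -1 with equal probability, and the walker count and seed are arbitrary. Under these assumptions, the root-mean-square displacement of the ensemble after n steps should come out close to √n, even though any single walk wanders unpredictably.

```python
import math
import random


def rms_displacement(n_steps: int, n_walkers: int, rng: random.Random) -> float:
    """Root-mean-square end displacement over an ensemble of +/-1 random walks."""
    total_sq = 0.0
    for _ in range(n_walkers):
        position = sum(rng.choice((-1, 1)) for _ in range(n_steps))
        total_sq += position * position
    return math.sqrt(total_sq / n_walkers)


rng = random.Random(42)  # fixed seed so runs are reproducible
for n in (25, 100, 400):
    rms = rms_displacement(n, 1000, rng)
    print(f"steps={n:>4}  rms={rms:6.2f}  sqrt(steps)={math.sqrt(n):6.2f}")
```

Quadrupling the number of steps roughly doubles the RMS displacement, which is the square root law in action.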

Imagine the dance floor of a studio. Flashing strobe lights highlight the seemingly jerky movements of the dancers. Of course, even the coolest dancers do not disappear into thin air between the flashes. They move contiguously through time and space, but their movements are hidden during the dark periods between flashes. In fact, there could be different dance steps that give rise to identical positions during the strobe flashes. This is the key point. Particles pushed around by the molecules in a liquid describe a continuous path, and the jerky motion we see under the microscope is just a sample of the entire movement.

In the next chapter, the book gives an account of the life of Norbert Wiener, who proved that even for the jerkiest observations, it is practically certain that continuous paths exist which give rise to them. These historical developments give so much meaning to what one reads in a purely mathematical text on Brownian motion. “Brownian motion is continuous.” “Oh! OK!” was my first reaction when I studied stochastic calculus. However, books such as this one give so much context to mathematical properties that learning the math becomes that much more interesting.

Disco Dancers and Strobe Lights


In 1880, the physicist John William Strutt (Lord Rayleigh) encountered the random walk in a completely different context: the superposition of waves of equal amplitudes and equal periods but random phases. He concluded that the resulting amplitude is proportional to the square root of the number of vibrations (a variant of the square root rule). Thorvald Nicolai Thiele also worked on the random walk while developing computational techniques. He used Gauss’s method of least squares to conclude that an ensemble of particles following a random walk would have an average displacement proportional to the square root of the number of steps. Some 80 years later, this work was picked up by Rudolf Kalman, and today the Kalman filter has a vast range of applications. Karl Pearson, the famous statistician, added to this with his article in Nature. Despite all these efforts, Brownian motion still lacked a sound mathematical foundation. It was Norbert Wiener who formally defined the process and proved its existence, in a 40-page paper in 1923. In 1948, Paul Levy, considered one of the founding fathers of modern probability theory, defined and constructed another class of random walks called Levy processes. Over the last 60 years, more and more scientists have studied various classes of random walks, such as reflecting random walks, loop-erased random walks, biased random walks and random walks with drift. In 2006, the prestigious Fields Medal, often called the Nobel Prize of mathematics, was awarded to the French mathematician Wendelin Werner for his work on random walks.
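
Strutt’s version of the square root rule can be sketched the same way. Add n waves of unit amplitude whose phases are drawn uniformly at random, then look at the size of the resultant. Treating each wave as a unit complex number is an assumption made here for convenience. The mean squared amplitude should be close to n, so the typical amplitude grows like √n rather than n.

```python
import cmath
import math
import random


def mean_square_amplitude(n_waves: int, trials: int, rng: random.Random) -> float:
    """Average |sum of n unit-amplitude waves with random phases|^2 over many trials."""
    total = 0.0
    for _ in range(trials):
        # Each wave is represented by the unit phasor exp(i * phase).
        resultant = sum(
            cmath.exp(1j * rng.uniform(0.0, 2.0 * math.pi)) for _ in range(n_waves)
        )
        total += abs(resultant) ** 2
    return total / trials


rng = random.Random(2024)  # arbitrary fixed seed
for n in (16, 64, 256):
    rms = math.sqrt(mean_square_amplitude(n, 1000, rng))
    print(f"waves={n:>3}  rms amplitude={rms:6.2f}  sqrt(n)={math.sqrt(n):6.2f}")
```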

The Overlooked Thesis


Regnault had brought statistics, Lefevre geometry and Bachelier calculus to the understanding of options. This chapter highlights some of the important elements of Bachelier’s PhD thesis that show how far removed it was from the traditional way of analyzing finance. The chapter mentions the following points from the thesis:

  • It anticipated the efficient market hypothesis and formulated its mathematical equivalent.
  • It departed from the tradition of describing economic phenomena verbally and used mathematical equations instead.
  • It showed that fluctuations of the stock price around the true price follow a Gaussian distribution.
  • It gave two justifications for the square root of time law.
  • It used Fourier’s methodology to derive the heat equation in the context of price probabilities.
  • It showed that an option’s value depends on the security’s volatility and on time.
  • It used the reflection principle to reduce the complexity of Brownian motion calculations.

One of the offshoots of Bachelier’s thesis was the Chapman-Kolmogorov equation, which ties the probability distribution of a variable at the end of a discrete Markov chain to its intermediate probabilities. Books such as this one should be required reading for students before they are exposed to the math of Brownian motion. If you learn the square root of time law purely through math, with no prior exposure to the rich history behind its development, you might learn it but not appreciate the beauty behind it. Similarly, you might prove that a Brownian path is nowhere differentiable, but you will feel the result in a completely different way after reading about Theodor Svedberg’s experiments on calculating the velocity of particles. All said and done, I strongly believe that the historical development of concepts and formulas is very important, sometimes more important than merely knowing a few ways to derive the Black-Scholes equation.
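
The discrete Chapman-Kolmogorov equation can be checked numerically. The (m+n)-step transition probability from state i to state j equals a sum over every intermediate state k: the m-step probability from i to k times the n-step probability from k to j. The three-state transition matrix below is hypothetical. Plain nested lists keep the sketch dependency-free.

```python
# Hypothetical 3-state Markov chain, e.g. price moves "down", "flat", "up".
# Each row holds the one-step transition probabilities out of a state.
P = [
    [0.6, 0.3, 0.1],
    [0.2, 0.6, 0.2],
    [0.1, 0.3, 0.6],
]


def mat_mul(a, b):
    """Multiply two square matrices stored as nested lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)] for i in range(n)]


def mat_pow(p, steps):
    """p raised to a non-negative integer power by repeated multiplication."""
    n = len(p)
    result = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for _ in range(steps):
        result = mat_mul(result, p)
    return result


def chapman_kolmogorov(p, i, j, m, n):
    """P(i -> j in m+n steps), summed over every intermediate state k."""
    pm, pn = mat_pow(p, m), mat_pow(p, n)
    return sum(pm[i][k] * pn[k][j] for k in range(len(p)))


direct = mat_pow(P, 5)[0][2]                 # straight 5-step probability
via_routes = chapman_kolmogorov(P, 0, 2, 2, 3)  # 2 steps, then 3 steps
print(f"direct: {direct:.6f}  summed over intermediate states: {via_routes:.6f}")
```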

Another Pioneer

Developments in science tend to be nonlinear and laden with controversy, and accusations of plagiarism are rampant. In such a context, it is no surprise that the history of the option pricing formula also includes people who, we can only speculate, knew a way to value options well before anyone else. One such individual mentioned in this chapter is Vincenz Bronzin. His work was accidentally discovered by the Swiss historian Wolfgang Hafner, who later sent it to Prof. Zimmermann. Both concluded that Bronzin’s work, published in 1908, had all the math and concepts necessary for pricing an option, and that it in fact derives the same formula as Bachelier’s. The fact that his work was never widely recognized shows how hard it is to attribute the historical development of any concept to individuals. You never know whether someone who never became known might have understood the concept all along. Bronzin’s work on option pricing was so advanced that some of its concepts resemble the Black-Scholes formula that appeared decades later.

Measuring the Immeasurable


I loved this section, in which various historical figures in the development of probability and stochastic processes are mentioned. First, what does a stochastic process have to do with pricing options? What is wrong with trying to use a deterministic process? The basic difference is that with a deterministic process you can pinpoint the result, whereas with a stochastic process you can only get a probability distribution for the result. Diffusion processes, arithmetic random walks and geometric random walks are all ways to summarize particle movement, and computations on them result in probability distributions. Back to the chapter: it talks about Kolmogorov, the father of modern probability. Taking up one of Hilbert’s 23 problems announced in 1900, Kolmogorov developed a full-fledged axiomatic theory of probability in 1933. In doing so, he relied heavily on the works of Henri Lebesgue, Georg Cantor and Bachelier. Lebesgue is credited with a revolutionary method of integration, Lebesgue integration, which is applicable to a wide range of functions and sets of points. Lebesgue benefited from Cantor’s work on the real line and introduced the concept of measure. The development of Lebesgue integration is itself a fantastic story, and “The Calculus Gallery” covers it splendidly. Kolmogorov also derived the Fokker-Planck equation in his monograph. He was unaware that this PDE had been developed in 1913 and 1914 by two physicists, Adriaan Fokker and Max Planck, to describe the time evolution of a variable’s probability distribution. The variable could be a particle’s position in a fluid or the price of a stock. I liked this section because of a nice analogy that explains the difference between the Chapman-Kolmogorov and Fokker-Planck equations.

I love analogies, as they come to mind far more readily than raw equations. I paraphrase the author’s words here:

The Chapman-Kolmogorov equation gives the probability of jumping from A to Z by investigating the probabilities for two consecutive jumps, say from A to P and then from P to Z, or from A to Q followed by a jump from Q to Z, and so on. It is like driving from New York in any direction and ending up in San Francisco. What are the chances of that happening? Well, you could drive via Dallas, Texas, via Wichita, Kansas, via Rapid City, South Dakota, or via any other route. The probability of ending up in San Francisco is then computed by adding the probabilities of all possible routes. This is what the Chapman-Kolmogorov equation does. The Fokker-Planck equation goes one step further. It develops the whistle-stop tours into a distribution of probabilities of ending up anywhere on the West Coast, be it San Francisco, Santa Barbara, Los Angeles, or Tijuana, Mexico.
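
In symbols, the continuous Chapman-Kolmogorov equation integrates over every intermediate “city” y. The Fokker-Planck equation describes how the whole probability density p(x, t) evolves for a process with drift μ(x, t) and volatility σ(x, t). This is a standard textbook formulation, not the book’s own notation:

```latex
p(z, t_3 \mid x, t_1) = \int_{-\infty}^{\infty} p(z, t_3 \mid y, t_2)\, p(y, t_2 \mid x, t_1)\, dy, \qquad t_1 < t_2 < t_3

\frac{\partial p(x,t)}{\partial t} = -\frac{\partial}{\partial x}\Big[\mu(x,t)\, p(x,t)\Big] + \frac{1}{2}\,\frac{\partial^2}{\partial x^2}\Big[\sigma^2(x,t)\, p(x,t)\Big]
```

For standard Brownian motion (μ = 0, σ = 1), the second equation reduces to the heat equation ∂p/∂t = ½ ∂²p/∂x². Its solution starting from a single point is a Gaussian with standard deviation √t. This is where the heat equation in Bachelier’s thesis and the square root of time law meet.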

After formulating the PDE, Kolmogorov found that its solution satisfied the Chapman-Kolmogorov equation. He then turned the situation around: starting with a parabolic PDE, he asked whether it had a solution. If the answer was yes, then the solution could be none other than one satisfying the Chapman-Kolmogorov equation. Thus he proved that such solutions indeed exist and are not merely a figment of imagination. Further development was needed in this area, because strict conditions had to be imposed on Kolmogorov’s PDE describing Brownian motion for it to have a solution.

Accounting for Randomness


Kiyoshi Ito

This section talks about the contribution of Kiyoshi Ito. Since Brownian motion is jagged and jittery at any resolution, one cannot calculate the velocity of the particles, as Marian Smoluchowski concluded in 1906. So how does one go about analyzing a Brownian path if the usual Riemann and Lebesgue calculus cannot be applied? Karl Weierstrass was the first mathematician to publish such a nowhere-differentiable function. How does one work with such functions? There is no way to slice the paths and handle them in the usual way: whatever slice you look at, there will be jaggedness. What’s the way out? This is where Ito comes in. In 1942 Ito published a paper that contained the mathematical machinery to handle such functions or paths. He used a Taylor series expansion and found that only the first three terms of the expansion matter for a stochastic process. So he ignored the rest and arrived at a forecast of probability distributions. Today Ito’s calculus is synonymous with “the framework for handling stochastic differential equations”. Ito lived for 93 years and contributed immensely to the field of stochastics. In his Kyoto Prize address, he said that he devised stochastic differential equations after painstaking, solitary endeavors. For those who think that solitude drives people crazy and must be avoided, Ito is a classic example of the tremendous possibilities of solitude in life.
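
The “first three terms” idea is what is now called Itô’s lemma. Here is a standard textbook statement, sketched in modern notation rather than Ito’s 1942 notation. Let X follow dX = μ(t, X) dt + σ(t, X) dW, and let f(t, x) be twice differentiable. Expand f in a Taylor series and apply the rules (dW)² = dt, dt·dW = 0 and (dt)² = 0. Only three terms survive:

```latex
df = \frac{\partial f}{\partial t}\,dt + \frac{\partial f}{\partial x}\,dX + \frac{1}{2}\,\frac{\partial^2 f}{\partial x^2}\,(dX)^2
   = \left(\frac{\partial f}{\partial t} + \mu\,\frac{\partial f}{\partial x} + \frac{1}{2}\,\sigma^2\,\frac{\partial^2 f}{\partial x^2}\right)dt + \sigma\,\frac{\partial f}{\partial x}\,dW
```

Unlike in ordinary calculus, the second-order term survives, because (dW)² is of order dt rather than negligible. For example, take geometric Brownian motion dS = μS dt + σS dW and f = ln S. The lemma gives d(ln S) = (μ - σ²/2) dt + σ dW, which is why log prices come out normally distributed.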

The Sealed Envelope


Wolfgang Doblin

This section talks about Wolfgang Doblin, a young mathematician who committed suicide while serving in the army, to avoid capture by the Germans. Before his death, he sent a sealed envelope to the French Academy of Sciences in Paris. There is a long history behind such sealed envelopes, which one can read about in this section. The envelope was supposed to be opened in 2040 but, thankfully, it was opened in 2000. The math that Doblin sent to the Academy contained a framework for dealing with Kolmogorov’s equation in a way that relaxed the restrictions imposed on it. Mathematicians wonder whether option pricing would have developed much earlier had Doblin not served in the army and met his fatal end. In any case, the reader gets a glimpse into this young genius, thanks to this book. Doblin’s life revolved around math, and it served as a way out of his gloomy and depressing environment. He rarely had even an hour to focus entirely on math, usually during night shifts hidden away in the telephone station he was guarding, which provided some heat. Even so, his preoccupation with mathematical problems alleviated the dreariness and kept him from falling into depression. In one of his letters to his professor, he wrote:

Fortunately, my mathematical work helps me fight despair. As I am not interested in alcohol, I do not have the luxury of getting drunk like others.

The Utility of Logarithms


Paul Samuelson

This chapter talks about Paul Samuelson, who revisited Bachelier’s thesis and improved on it. Instead of plain vanilla Brownian motion, Samuelson insisted on geometric Brownian motion (GBM) for several reasons:

  • Share values are always positive under GBM.
  • Price changes, in percent, are identically distributed.
  • The model is in accordance with human nature, as described by Weber and Fechner’s psychology.
  • Data observed on stock exchanges fit GBM better than arithmetic Brownian motion.
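
A quick way to see the first bullet is to simulate GBM. The sketch below assumes the standard exact log-space update for GBM, S(t+dt) = S(t)·exp((μ - σ²/2)dt + σ√dt·Z), where Z is a standard normal draw. Every step multiplies the price by a positive number, so no path can cross zero, however volatile. The drift, volatility, horizon and seed are hypothetical.

```python
import math
import random


def simulate_gbm(s0, mu, sigma, t, n_steps, rng):
    """One GBM path using the exact log-space update
    S_{k+1} = S_k * exp((mu - sigma^2/2) * dt + sigma * sqrt(dt) * Z)."""
    dt = t / n_steps
    path = [s0]
    for _ in range(n_steps):
        z = rng.gauss(0.0, 1.0)
        growth = math.exp((mu - 0.5 * sigma**2) * dt + sigma * math.sqrt(dt) * z)
        path.append(path[-1] * growth)  # a positive multiplier keeps the price positive
    return path


rng = random.Random(7)
# Hypothetical, deliberately volatile stock: 5% drift, 80% annual volatility, 10 years.
paths = [simulate_gbm(100.0, 0.05, 0.8, 10.0, 250, rng) for _ in range(200)]
lowest = min(min(p) for p in paths)
print(f"lowest price on any of {len(paths)} paths: {lowest:.4f}")
```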

The chapter also mentions M.F.M. Osborne, a physicist, who claimed in 1958 that the changes in the logarithms of stock prices, rather than the price changes themselves, are normally distributed. He was motivated to analyze this behavior after reading the works of the psychologists Weber and Fechner. This observation echoes the utility argument Daniel Bernoulli made in 1738 to resolve the St. Petersburg Paradox.
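
Bernoulli’s argument can be sketched numerically. In the St. Petersburg game, a fair coin is tossed until it first lands heads; if that happens on toss k, the payoff is 2^k. Each round contributes 1 to the expected payoff, so the expectation grows without bound. The expected logarithmic utility, by contrast, converges to 2 ln 2, a certainty equivalent of 4. This sketch makes two simplifying assumptions: the game is truncated at a finite number of rounds, and the player’s initial wealth is ignored.

```python
import math


def st_petersburg(max_rounds: int):
    """Truncated St. Petersburg game: payoff 2**k with probability 2**-k, k = 1..max_rounds.

    Returns (expected payoff, expected log utility), ignoring initial wealth.
    """
    ev = sum((0.5 ** k) * (2 ** k) for k in range(1, max_rounds + 1))
    eu = sum((0.5 ** k) * math.log(2 ** k) for k in range(1, max_rounds + 1))
    return ev, eu


for rounds in (10, 20, 40):
    ev, eu = st_petersburg(rounds)
    print(f"rounds={rounds:>2}  expected payoff={ev:5.1f}  "
          f"certainty equivalent under log utility={math.exp(eu):.4f}")
```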

The Nobelists - The Three Musketeers


Fischer Black – Myron Scholes – Robert Merton

The next two chapters talk about Fischer Black, Myron Scholes and Robert Merton. MIT was a common denominator for all three. Fischer Black and Myron Scholes worked together on consulting projects for financial firms and thus knew the markets and the places where quant techniques could be applied. They developed a good working rapport, as Black’s forte was theory and Scholes was good at implementation. Robert Merton had worked on pricing warrants with his MIT professor Samuelson; warrants are similar to options except for some technical differences. Before the trio cracked the formula, quite a number of scholars came close. In 1961, Case Sprenkle, a graduate student in economics, made some headway. He assumed GBM for prices and allowed for drift. But his approach was faulty, as he posited that investors’ utility functions would be revealed in prices, and he also ignored the time value of money. In 1962, James Boness, a student from Chicago, improved on Sprenkle’s model by incorporating the time value of money. However, his theory had problems too: he assumed that all stocks on which options are traded belong to the same risk class and that all investors are indifferent to risk. So the attempts of Thorp, Sprenkle, Boness, Merton and Samuelson all had shortcomings, but they all carried seeds of the eventual solution.

Around 1968, Fischer Black tried formulating the option price as a PDE but had a tough time solving it. So he filed it away in his cabinet and went on with his work. However, in 1969, while conversing with Myron Scholes, the sole associate of his company, Black got a chance to restart the work. The key idea used in cracking the formula was “portfolio hedging and replication”. If the option is deep in the money, then the delta, or hedge ratio, is close to 1. This means you can hedge a long call option almost perfectly by shorting one share of the underlying. If the option is near the money (ATM) instead, the hedge ratio is clearly less than 1. Denote the hedge ratio by x. Then a long call combined with a short position of x shares is, over a short interval, a risk-free portfolio, which must therefore grow at the risk-free rate. With the unknown hedge ratio as a variable, Black and Scholes wrote an equation describing the option price.
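
The hedging argument is easiest to see in a one-period, two-outcome setting. This binomial sketch is a discrete simplification, not Black and Scholes’s continuous-time derivation, and the prices below are hypothetical. The hedge ratio is chosen so that a long call plus x shares short is worth the same in both outcomes. A riskless portfolio must earn the risk-free rate, and that pins down the call price.

```python
def one_step_hedge(s0, s_up, s_down, strike, r):
    """One-period binomial replication of a European call.

    A discrete simplification, not the Black-Scholes derivation itself.
    Returns (hedge ratio, call price).
    """
    c_up = max(s_up - strike, 0.0)
    c_down = max(s_down - strike, 0.0)
    # Shares to short per long call so that both outcomes are worth the same.
    delta = (c_up - c_down) / (s_up - s_down)
    riskless_value = c_up - delta * s_up  # equals c_down - delta * s_down
    # The hedged portfolio is riskless, so it must earn the risk-free rate:
    # c0 - delta * s0 = riskless_value / (1 + r)
    c0 = delta * s0 + riskless_value / (1.0 + r)
    return delta, c0


for strike in (100.0, 50.0):
    delta, price = one_step_hedge(100.0, 120.0, 80.0, strike, 0.0)
    print(f"strike={strike:>5}: hedge ratio={delta:.2f}, call price={price:.2f}")
```

With an up move to 120 and a down move to 80, the at-the-money call needs a hedge ratio of 0.5. A call struck at 50, which pays off in both outcomes, needs a hedge ratio of exactly 1. Notice that the probability of an up move never enters the calculation.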

With the PDE in hand, Black and Scholes made an educated guess that turned out to be the solution: the option pricing formula. Surprisingly, the solution to the Black-Scholes PDE did not contain the stock’s return at all. This was totally unexpected, given the prevailing notion that the option price should somehow include the return of the underlying instrument as one of its components. But Black and Scholes proved otherwise. Robert Merton arrived at the same solution using Ito’s calculus. Initially, Black had a lot of difficulty publishing the paper in academic journals. It was only in 1973 that Black and Scholes managed to publish it, under the title “The Pricing of Options and Corporate Liabilities”, in the Journal of Political Economy. According to a study undertaken in 2006, the paper had garnered 2589 citations, making it the sixth most cited paper in all of economics. In the paper they acknowledged that Merton had worked out a different approach and arrived at the same solution, which in a sense validated their final solution to option pricing.
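
The absence of the stock’s expected return is easy to see in the closed-form Black-Scholes price of a European call on a non-dividend-paying stock. The sketch below implements the standard formula using only Python’s standard library, with the normal CDF built from `math.erf`. The inputs are hypothetical. The function takes the spot price, strike, time to expiry, risk-free rate and volatility, and nothing else.

```python
import math


def norm_cdf(x: float) -> float:
    """Standard normal CDF built from math.erf."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def black_scholes_call(s, k, t, r, sigma):
    """European call price and delta under Black-Scholes (no dividends).

    There is no expected-return (drift) argument: only r and sigma appear.
    """
    d1 = (math.log(s / k) + (r + 0.5 * sigma**2) * t) / (sigma * math.sqrt(t))
    d2 = d1 - sigma * math.sqrt(t)
    price = s * norm_cdf(d1) - k * math.exp(-r * t) * norm_cdf(d2)
    delta = norm_cdf(d1)  # the hedge ratio
    return price, delta


price, delta = black_scholes_call(100.0, 100.0, 1.0, 0.05, 0.20)
print(f"ATM call: price={price:.4f}, delta={delta:.4f}")
_, deep_delta = black_scholes_call(100.0, 50.0, 1.0, 0.05, 0.20)
print(f"deep in-the-money call: delta={deep_delta:.4f}")
```

The delta returned alongside the price is the hedge ratio discussed above. It is about 0.64 for this at-the-money call and very close to 1 for the deep in-the-money one.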

The Higher They Climb …

This section highlights the lives of the three musketeers in the years before the Nobel Prize recognized their work. Some people believe that Texas Instruments was solely responsible for popularizing the Black-Scholes formula by building it into a module for its calculator. Traders did not need to know anything about the formula except the inputs, so they happily traded options on the CBOE, which is today one of the biggest derivatives exchanges in the world. Black joins Goldman Sachs at the age of 46, becomes a partner within two years, and dies of cancer in 1995 at the age of 57. Scholes and Merton end up at LTCM, a fund that used quantitative techniques to manage money. Scholes and Merton receive the Nobel Prize in 1997; Black, who had died two years earlier, is not eligible.

…The Harder They Fall

This section talks about the fall of LTCM. The LTCM bust is basically a lesson in leverage for all money managers. If you want to read about the demise of LTCM in detail, Roger Lowenstein’s book is spot on. This section does not fit the flow of the book.

The Long Tail

The last chapter talks about the problems with the Black-Scholes formula, such as assuming constant volatility and assuming GBM, which has the normal distribution at its core. The book ends by saying:

So, there are pitfalls, and they can and do lead to spectacular failures like the LTCM crash, the crisis of 2007, the demise of Lehman Brothers. But to fault Black, Scholes and Merton and their equations for such aberrations would be utterly wrong. This would be like accusing Isaac Newton and the laws of motion for fatal traffic accidents.



“What risk premium should be used for option pricing?” was a stumbling block in developing the option pricing formula. The answer given by Black, Scholes and Merton surprised everyone: no risk premium at all. This book traces the historical developments leading to the option pricing formula. In doing so, it weaves an entertaining and gripping narrative of all the mathematical developments that were precursors to the Black-Scholes formula.


“Moonwalking with Einstein” is a book recounting the experiences of Joshua Foer, US Memory Championship winner and, by day, a journalist for the NY Times. The book is easy on the eyes and can be read in a few hours. One might shy away from books that deal with memory, assuming they are “self-help” books, but that’s not the case with this one. It is the result of a curious journalist who covers the national and international memory championships a couple of times and starts wondering about the participants. He starts pondering a few questions:

  • Passive questions (passive in the sense that any spectator at a memory championship would have them), like:
    • How do people memorize a long list of random numbers?
    • How do people memorize a deck of cards?
    • How does one memorize a poem?
  • Curiosity-driven questions, like:
    • Are the participants’ brains wired differently?
    • Are they gifted people, so that an average Joe can never perform such feats?
    • How does the brain remember things? Memory competitions clearly test working memory. So is long-term memory involved at all in improving short-term memory?
    • Is memorizing an arbitrary collection of random things useful at all?
    • How do the mathletes practice for these memory events? Is there something we can learn from these people and apply to anything we want to become good at?

In fact, all the people the author interviews say that their feats are achievable by anybody; all it takes is learning the right technique and mastering it. This makes the author very curious, and so begins his quest to learn about memory, memory techniques and deliberate practice, and to explore the common myths surrounding memory. The first question any reader has after a few pages of the book is this. There are so many devices, gadgets and services that help us remember things; our culture is basically an edifice built on external memories. So why develop internal memories? Is there any point in developing them?

Here are a few points from various sections of this book that I found interesting:

  • There is something about mastering a specific field that breeds a better memory for the details of the field.
  • Experts process the enormous information flowing through their senses in more sophisticated ways.
  • Chunking is key to remembering and learning better, and it is a key step in memorization techniques.
  • At the root of a chess master’s skill is a richer vocabulary of chunks to recognize.
  • Expertise is vast amounts of knowledge, pattern-based retrieval, and planning mechanisms acquired over many years of experience in the associated domain.
  • You need a stockpile of images/places to form visual images of words. You sequence these images to remember better.
  • We are exceptional at remembering visuals and pathetic at remembering words, numbers, etc., and this is the key insight driving all memory techniques. The author describes his learning process with his mentor Ed. He learns about a technique called the memory palace, a method where you represent the content you want to remember using images and then place those images in a familiar location. Humans are amazingly good at absorbing spatial information. So, instead of relying on our memory, which is nonlinear in nature, you make an effort to linearize it: put a pattern and a sequence to these images. Laying down elaborate, engaging, vivid images in your mind more or less guarantees that your brain ends up storing a robust, dependable memory.
  • We read, read and read, then we forget, forget and forget. Frankly, most of us read a lot of stuff and forget it almost instantly. “So why read at all?” is a valid question, as so much information is readily externalized. If you want any fact about anything, you get it on the net; if you want to keep track of things, there are to-do list reminders. The rate of externalizing memories has accelerated. For anything you might want to commit to memory, there are devices, gadgets and tools that make memorization unnecessary. So, is there going to be a need for memorization at all? I guess each of us has to answer this question individually. As for me, I think there is value in memorization, as it instantly helps you connect the new material you are reading with the stuff that is already in your brain.

In the entire book, I found two sections particularly interesting.

The first is where deliberate practice is tied to memorization techniques. The author hits a plateau in his preparation for the US Memory Championship, and this leads him to explore ‘deliberate practice’, a concept popularized by Dr. K. Anders Ericsson.

What’s deliberate practice? We all go through various stages of learning. In the cognitive stage, we are intellectualizing the task. In the associative stage, we concentrate less and make fewer errors. In the autonomous stage, we operate on autopilot. It’s fine to be on autopilot for mundane stuff, but not when we want to gain expertise in something.

The trick is to get out of autopilot mode as much as possible by practicing hard. Practicing hard need not mean practicing for long hours. It means designing the work so that it stretches you and there is a possibility of learning from the failures that ensue. So, according to Ericsson, the way to achieve expertise in any field is to practice failing, so that there is constant feedback on your work. In short, we need to be attentive and choose hard tasks to become damn good at something. The author finds that mathletes apply these same scientific principles to memorization too. They develop hypotheses about their limitations, conduct experiments and track the results. This reminds me of Atul Gawande’s book “Better”, where he makes a strong case for tracking something, whatever your interest. Tracking gives you instant internal feedback.

The other section of the book I liked was about a particular school in NY. The author visits a school in the South Bronx, more specifically a class whose teacher trains all 43 kids in it to develop their memory. Whenever we hear about memory development, only negative images come to mind, like rote learning and crushed creativity. However, this teacher firmly believes that memory needs to be taught at school like any other skill. The techniques teach kids ways to visualize things: every fact is turned into an image. Do these techniques help? The class is a case in point: all 43 kids do extremely well in school and across the entire South Bronx.

The author also manages to meet Tony Buzan. Though the author is extremely sarcastic about Buzan’s goals, he does seem to agree with one of Buzan’s core principles: that memory and intelligence go hand in hand.

There is a nice little fact mentioned in this book: “inventio” is the root of the words “inventory” and “invention”. So, in one sense, unless one has some inventory of thoughts, invention seems unlikely, as more often than not, invention is nothing but a clever combination of seemingly unrelated ideas and facts.

Takeaway:

Memory is like a spider web that catches new info: the more it catches, the bigger it grows. If you are curious about ‘memory’ and its role in everyday learning and awareness, you will find this book very interesting.


Books such as “Brain Rules” are basically meant to be a bridge to understanding some very basic things. A layperson like me does not know much about the brain. I have completely forgotten whatever I learnt in high school about the anatomy of the brain. So why read this book? In fact, why should anyone read it? Most of our survival in the modern world depends on our brains, so a little knowledge about them might help us perform better and probably live better too. The author states in his preface that the rules he talks about in the book are a collection of ideas he wants others to explore. He has in mind the fields of education, business, entertainment and marketing and, in general, every aspect of our lives.

My intention in reading this book was mainly to understand some very basic aspects, like:

  • After an early morning workout, why am I more alert while reading something? Why am I able to grasp things better?
  • How do I remember stuff that I read? There is so much happening around me; how do I read, understand and recall it better?
  • I am pathetic at remembering stuff. Is there such a thing as short-term memory and long-term memory? If so, how does one go about improving long-term memory?
  • Why is picture-based learning emphasized so heavily over word-based learning everywhere? Yes, we remember pictures better, but why?
  • Once we hear a song, even a 5-second snippet of some part of it, we instantly recollect the lyrics and probably the different memories associated with it. Why doesn’t the same happen with passwords? I can’t remember my passwords for websites. If by mistake I delete the browser form history, cache and cookies, then in all probability I have to use the ‘forgot my password’ link to retrieve all of them. Is there a different kind of memory associated with remembering passwords, as compared to sounds and visuals?
  • Why do I code better after playing an instrument?

Obviously I have neither the time nor the inclination to become an expert in the above aspects. However, I felt I needed a little understanding. I had read a reference to this book a few years ago; I think it was in some behavioral economics article about “Blind Spots”. Recently I came across quite a few references to this book, so I guessed it might have something to say. I picked it up with a firm resolution that I wouldn’t spend more than an hour or so on it. However, I ended up spending more than that, thanks to the easy flow of words and ideas in the book. The book is cleverly written; I say cleverly because books such as these can be terribly boring. But Dr. John Medina does a wonderful job of using analogies and simple language to show the workings of the brain. I got answers of a sort to my questions, and the book explores a lot more besides.

I guess this book will be useful to anyone who wants to understand a little about how the brain functions and what one can do in daily life to learn and remember stuff better.