I stumbled on this book on my way to Yangon and devoured it within a few hours. It took me more time to write this summary than to read the book, which runs to about 300 pages; full credit goes to the author for making it so interesting. In this post, I will attempt to summarize its contents.

This book is a fascinating adventure into all the aspects of the brain that are messy and chaotic. These aspects ultimately result in an illogical display of human tendencies – in the way we think, the way we feel and the way we act. The preface is quite interesting too: the author apologizes in advance for a possible scenario in which the reader spends a few hours on the book and realizes it was a waste of time. The author is also pretty certain that some of the claims made in the book will become obsolete very quickly, given the rapid developments in brain science. The main purpose of the book is to show that the human brain is fallible and imperfect despite the innumerable accomplishments it has bestowed upon us. The book takes some of our common everyday experiences and ties them back to the structure of the brain and the way it functions.

Chapter 1: Mind Controls

How the brain regulates the body, and usually makes a mess of things

The human brain has evolved over millions of years. The brain's original function was to ensure the survival of the body, and most of its functions were akin to those of a reptile, hence the term reptilian brain. The reptilian brain comprises the cerebellum and the brain stem. However, the nature and size of the human brain have undergone rapid changes in the recent evolutionary past. As food consumption changed from raw to cooked food, the size of the brain increased, which in turn made humans seek out more sophisticated types of food. With survival-related problems taken care of, the human brain developed several new areas compared to its ancestors, the most important of which is the neocortex. On an evolutionary timescale, the reptilian brain came first and the neocortex later.

[Image: brain, skull and meninges]

Humans have a sophisticated array of senses and neurological mechanisms. These give rise to proprioception, the ability to sense how our body is currently arranged and which parts are going where. There is also, in our inner ear, the vestibular system, which detects balance and position. The brain uses the vestibular system and proprioception, along with the input coming from our eyes, to distinguish between an image that moves because we are moving and a moving image seen while we are stationary. This interaction explains why we throw up from motion sickness. While sitting in a plane or on a ship there is movement even though our body is stationary. The eyes and the vestibular system transmit these motion signals to the brain, but the proprioceptors send no such signal because our body parts are not moving. This mismatch confuses the brain, which falls back on its classic conclusion: we have been poisoned, and the poison has to be removed from the body.

The brain’s complex and confusing control of diet and eating

Imagine we have eaten a heavy meal and our stomach is more than full, when we spot the dessert. Why do most of us go ahead and gorge on it anyway? It is mainly because our brain has a disproportionate say over our appetite and interferes in almost every food-related decision. Whenever the brain sees a dessert, its pleasure pathways are activated and it makes an executive decision to eat the dessert, overriding any signals from the stomach. This is also how protein milkshakes work: as soon as they are consumed, the dense liquid fills and expands the stomach, sending an artificial signal to the brain that the body has been fed. Stomach expansion signals, however, are just one small part of diet and appetite control. They are the bottom rung of a long ladder that goes all the way up to the more complex elements of the brain. Appetite is also determined by a collection of hormones secreted by the body, some of which pull in opposite directions; the ladder therefore occasionally zigzags or even loops on the way up. This explains why many people report feeling hungry less than twenty minutes after drinking a milkshake!

Why does almost everyone eat between 12 and 2 pm, irrespective of the kind of work being done? One reason is that the brain gets used to the pattern of eating in that time slot and expects food once the routine is established. This works not only for pleasant things but also for unpleasant ones. Subject yourself to the pain of sitting down every day and doing some hard work – working through technical material, programming, practising difficult musical passages, etc. Once you do it regularly, the brain ignores the initial pain associated with the activity, making it easier to pursue.

The takeaway of this section is that the brain interferes in eating, and this interference can create problems in the way we consume food.

The brain and the complicated properties of sleep

Why does a person need sleep? Umpteen theories have been put forward. The author provides an interesting narrative of each but subscribes to none. Instead he focuses on what exactly happens in our body that makes us sleep. The pineal gland in the brain secretes the hormone melatonin, which is involved in regulating circadian rhythms. The amount secreted is inversely related to the amount of light passing through our eyes, so secretion rises as daylight fades, and the increased levels lead to feelings of relaxation and sleepiness.

This is the mechanism behind jet-lag. Travelling to another time zone means you are experiencing a completely different schedule of daylight, so you may be experiencing 11 a.m. levels of daylight when your brain thinks it’s 8 p.m. Our sleep cycles are very precisely attuned, and this throwing off of our melatonin levels disrupts them. And it’s harder to ‘catch up’ on sleep than you’d think; your brain and body are tied to the circadian rhythm, so it’s difficult to force sleep at a time when it’s not expected (although not impossible). A few days of the new light schedule and the rhythms are effectively reset.

Why do we have REM sleep? For adults, usually 20% of sleep is REM sleep and for children, 80% of sleep is REM sleep. One of the main functions of REM sleep is to reinforce, organize and maintain memories.

The brain and the fight or flight response

The thalamus can be considered the central hub of the brain. Information from the senses enters the thalamus, from where it is sent to the cortex and on to the reptilian brain. Sensory information is also sent to the amygdala, an area said to be the seat of EQ (emotional processing). If something is wrong in the environment, even before the cortex can analyze and respond to it, the amygdala invokes the hypothalamus, which in turn activates the nervous pathways that generate the fight-or-flight response: dilating our pupils, increasing our heart rate, shunting blood away from peripheral areas and directing it towards the muscles, and so on.


The author provides a good analogy for the thalamus and hypothalamus:

If the brain were a city, the thalamus would be like the main station where everything arrives before being sent to where it needs to be. If the thalamus is the station, the hypothalamus is the taxi rank outside it, taking important things into the city where they get stuff done.


Chapter 2: Gift of memory

One hears the word "memory" thrown around a lot, especially in the context of devices such as computers, laptops and mobile phones. Our human memory is sometimes wrongly assumed to be similar to a computer's memory or hard drive, where information is neatly stored and retrieved. Far from it: our memory is extremely convoluted. This chapter talks about various intriguing and baffling properties of our brain's memory system.

The divide between long-term and short-term memory

The way memories are formed, stored and accessed in the human brain is, like everything else in the brain, quite fascinating. Memory can be categorized into short-term (or working) memory and long-term memory. Short-term memory is anything we hold in our minds for no more than a few seconds. Tons of stuff passes through our minds during the day, stays briefly in short-term memory and then disappears. Short-term memories have no lasting physical basis; they exist as patterns of neuronal activity in areas of the prefrontal cortex. Since short-term memory is in constant use, that activity changes very quickly and nothing persists. But then how do we have memories at all? This is where long-term memory comes in, and in fact there are several varieties of it.


The different types of long-term memory (procedural, episodic, semantic and so on) are processed and stored in various parts of the brain. The basal ganglia, structures lying deep within the brain, are involved in a wide range of processes such as emotion, reward processing, habit formation, movement and learning. They are particularly involved in co-ordinating sequences of motor activity, as would be needed when playing a musical instrument, dancing or playing basketball. The cerebellum is another area that is important for procedural memories.

An area of the brain that plays a key role in encoding what we see, hear, say and feel into memories is the hippocampus. It encodes memories and slowly moves them to the cortex, where they are stored for retrieval; episodic memories, for example, are indexed in the hippocampus and stored in the cortex. If everything is stored in memory in some form or other, why are certain things easier to recall than others? It depends on the richness, repetition and intensity of the encoding; these factors make a memory easier or harder to retrieve.

The mechanisms of why we remember faces before names

I have had this problem right from my undergrad days. I can remember faces very well, but remembering names is a challenge. I have a tough time recalling the names of colleagues, the places I visit and the clients I meet, and it is rather painful when my mind simply blanks on someone's name. The chapter gives a possible explanation: the brain's two-tier memory system for retrieving memories.

And this gives rise to a common yet infuriating sensation: recognising someone, but not being able to remember how or why, or what their name is. This happens because the brain differentiates between familiarity and recall. To clarify, familiarity (or recognition) is when you encounter someone or something and you know you’ve done so before. But beyond that, you’ve got nothing; all you can say is this person/thing is already in your memories. Recall is when you can access the original memory of how and why you know this person; recognition is just flagging up the fact that the memory exists. The brain has several ways and means to trigger a memory, but you don’t need to ‘activate’ a memory to know it’s there. You know when you try to save a file onto your computer and it says, ‘This file already exists’? It’s a bit like that. All you know is that the information is there; you can’t get at it yet.
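To stretch the author's file-system analogy a little further, here is a tiny sketch of my own (the file path is made up purely for illustration): checking that a file exists is cheap and tells you nothing about its contents, just as familiarity flags a memory without actually retrieving it.

```python
import os

# Familiarity: the brain flags that a memory exists, like checking that a file exists.
# Recall: actually retrieving the memory, like opening the file and reading it.
path = "memories/person_from_the_party.txt"   # hypothetical memory "file"

if os.path.exists(path):
    print("I know I've met this person before...")   # familiarity only: cheap, content-free
    with open(path) as f:                             # recall: fetching the actual content
        print("Ah, yes:", f.read())
else:
    print("No sense of recognition at all.")
```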

Also, the visual information conveyed by a person's facial features is far richer and sends stronger signals to the brain than auditory information such as the person's name. Short-term memory is largely aural, whereas long-term memory relies on vision and semantic qualities.

How alcohol can actually help you remember things

Alcohol increases the release of dopamine and hence activates the brain's pleasure pathways. This creates a euphoric buzz, one of the states that drinkers seek. Alcohol also leads to memory loss, so when exactly does it help you remember things? Alcohol dims the overall activity level of the brain and thus loosens its control over various impulses – like dimming the red lights on all the roads of a city. With that reduced control, people tend to do many things they would not do while sober. Alcohol also disrupts the hippocampus, the main region for memory formation and encoding. However, regular drinkers get used to a certain level of consumption, and memory becomes partly state-dependent: any interesting gossip heard while inebriated is far more likely to be remembered when the body is in an inebriated state again. In such situations, alcohol might actually help you remember things. The same explanation applies to caffeine-fuelled all-nighters – take some caffeine just before the exam so that the body's internal state matches the one in which the information was crammed; sitting the exam in a caffeinated state makes you more likely to recall what you crammed in a caffeinated state.

The ego-bias of our memory systems

This section talks about the way the brain alters memories to suit a person's ego. It tweaks memories so that they flatter the ego, and over time these tweaks can become self-sustaining. This is actually quite dangerous: what you retrieve from your mind changes in order to flatter you, which means it is definitely not wise to trust your memories completely.

When and how the memory system can go wrong 

The author gives a basic list of possible medical conditions that can arise if the memory mechanism fails in the human brain. These are:

  • False memories – which can be implanted in our heads just by talking
  • Alzheimer's disease – associated with significant degeneration of the brain
  • Strokes – which can affect the hippocampus and lead to memory deficits
  • Temporal lobe removal – which can cause permanent loss of long-term memories
  • Anterograde amnesia – which can be caused by viral attacks on the hippocampus

Chapter 3: Fear

Fear is something that comes up in a variety of shades as one ages. During childhood it is usually the fear of not meeting the expectations of parents and teachers, or of not fitting in at school. In youth it is usually the fear of not getting the right job, and fear arising out of emotional and financial insecurities. As one trudges into middle age, fears revolve around meeting the financial needs of the family, avoiding being downsized, conformity, not staying healthy, and so on. At each stage we somehow manage to conquer the fear through experience, or learn to live with it. However, there are other kinds of fears that we carry with us throughout our lives. Even though our living conditions have massively improved, our brains have not had a chance to evolve at the same pace. Hence there are many situations where our brains are primed to see potential threats even when there aren't any. Look at people with strange phobias, people who believe in conspiracy theories or hold bizarre notions; all these are creations of our idiot brains.

The connection between superstition, conspiracy theories and other bizarre beliefs

Apophenia involves seeing connections where there aren't any – to me, a real-life manifestation of Type I errors (false positives). The author argues that it is our bias against accepting randomness that gives rise to superstitions, conspiracy theories and the like. Since humans fundamentally find it difficult to embrace randomness, our brains cook up associations where there are none and make us hold on to those assumptions to gain a sense of quasi-control in a random and chaotic world.
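To make the Type I error point concrete, here is a small simulation of my own (not from the book): generate pairs of completely unrelated random series and count how many look "significantly" correlated at the conventional 5% threshold. Around one in twenty will, purely by chance – connections where there aren't any.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
n_pairs, n_obs, alpha = 1000, 30, 0.05

false_alarms = 0
for _ in range(n_pairs):
    x = rng.normal(size=n_obs)      # pure noise
    y = rng.normal(size=n_obs)      # unrelated pure noise
    r, p = stats.pearsonr(x, y)
    if p < alpha:                   # looks like a "real" connection, but isn't
        false_alarms += 1

# Roughly 5% of purely random pairs pass the test: Type I errors by construction.
print(f"{false_alarms} of {n_pairs} random pairs look 'significantly' correlated")
```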

Chapter 4: The baffling science of intelligence

I found this chapter very interesting as it talks about an aspect humans are proud of: intelligence. There are many ways in which human intelligence manifests itself, but we have only a limited set of tools to measure it, and the tools we do have do not do a particularly good job. Take IQ, for example. Did you know that the average IQ of any country is 100? Yes, it is a relative scale: if a deadly virus killed everyone in a country with an IQ above 100, the country's average IQ would still be 100 once the scores were re-normed, because IQ is a relative score. The Guinness Book of Records has retired the category of "Highest IQ" because of the uncertainty and ambiguity of the tests.
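A quick numerical sketch of that re-norming (my own illustration, not from the book): IQ is just a raw test score re-standardised to a mean of 100 and a standard deviation of 15, so the average is 100 by construction, even after the top half of the population disappears and the survivors are re-normed.

```python
import numpy as np

def to_iq(raw_scores):
    """Re-standardise raw test scores to mean 100, SD 15 (the usual IQ convention)."""
    raw_scores = np.asarray(raw_scores, dtype=float)
    z = (raw_scores - raw_scores.mean()) / raw_scores.std()
    return 100 + 15 * z

rng = np.random.default_rng(0)
raw = rng.normal(loc=50, scale=10, size=100_000)   # raw test scores for a "country"

iq = to_iq(raw)
print(round(iq.mean()))                 # 100, by construction

# A "virus" removes everyone whose IQ is above 100; re-norm the survivors...
survivors = raw[iq <= 100]
print(round(to_iq(survivors).mean()))   # ...and the average is still 100
```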

Charles Spearman did a great service to intelligence research in the 1920s by developing factor analysis. Spearman used it to assess IQ tests and discovered that there was seemingly one underlying factor that underpinned test performance. This was labelled the single general factor, g, and if there's anything in science that represents what a layman would think of as intelligence, it's g. But even g does not equate to all of intelligence. Two types of intelligence are generally acknowledged by the research community: fluid intelligence and crystallised intelligence.
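For the curious, here is a toy single-factor analysis of my own (using scikit-learn, not Spearman's original technique): simulate scores on several tests that all draw on one latent ability, then check that a one-factor model recovers it.

```python
import numpy as np
from sklearn.decomposition import FactorAnalysis

rng = np.random.default_rng(1)
n_people, n_tests = 2000, 6

# One latent ability ("g") plus test-specific noise drives every test score.
g = rng.normal(size=n_people)
loadings = rng.uniform(0.5, 0.9, size=n_tests)
scores = np.outer(g, loadings) + 0.5 * rng.normal(size=(n_people, n_tests))

fa = FactorAnalysis(n_components=1).fit(scores)
g_hat = fa.transform(scores).ravel()

# The single extracted factor tracks the latent ability closely
# (up to sign, which factor models leave undetermined).
print(round(abs(np.corrcoef(g, g_hat)[0, 1]), 2))
```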

Fluid intelligence is the ability to use information, work with it, apply it, and so on. Solving a Rubik's cube requires fluid intelligence, as does working out why your partner isn't talking to you when you have no memory of doing anything wrong. In each case the information you have is new, and you have to work out what to do with it in order to arrive at an outcome that benefits you. Crystallised intelligence is the information you have stored in memory and can utilise to help you get the better of situations. It has been hypothesized that crystallised intelligence stays stable over time but fluid intelligence atrophies as we age.

Knowledge is knowing that a tomato is a fruit; wisdom is not putting it in a fruit salad. It requires crystallised intelligence to know how a tomato is classed, and fluid intelligence to apply this information when making a fruit salad.

Some scientists believe in "multiple intelligence" theories, in which intelligence comes in varying types: sportsmen have a different kind of intelligence from chess masters, who in turn have a different sort from musicians, and so on. Depending on what one chooses to spend time on, the brain develops a different kind of intelligence – the argument being that just two categories (fluid and crystallised) are too restrictive to capture them all. Even though the multiple intelligence theory appears plausible, the research community has not found solid evidence to back it.

Why Intelligent people often lose arguments ?

The author elaborates on impostor syndrome, a trait common in intelligent people. One does not need a raft of research to believe that the more you understand something, the more you realize how little you know about it. Intelligent people, by the very way they have built up their intelligence, are forever skeptical and uncertain about a lot of things; their arguments tend to be balanced rather than highly opinionated. With less intelligent people, on the other hand, the author points out that one can see the Dunning-Kruger effect in play.

Dunning and Kruger argued that those with poor intelligence not only lack the intellectual abilities, they also lack the ability to recognise that they are bad at something. The brain’s egocentric tendencies kick in again, suppressing things that might lead to a negative opinion of oneself. But also, recognising your own limitations and the superior abilities of others is something that itself requires intelligence. Hence you get people passionately arguing with others about subjects they have no direct experience of, even if the other person has studied the subject all their life. Our brain has only our own experiences to go from, and our baseline assumptions are that everyone is like us. So if we’re an idiot…

Crosswords don’t actually keep your brain sharp

The author cites fMRI-based research to say that intelligent brains use less brain power to think through or solve problems. In studies where subjects were given equally challenging tasks, the more intelligent subjects solved them without any increase in brain activity; activity rose only when the task complexity itself was increased. These prefrontal cortex scans suggest that it is efficiency, rather than raw power, that matters. There's a growing consensus that it's the extent and efficiency of the connections between the regions involved (prefrontal cortex, parietal lobe and so on) that has a big influence on someone's intelligence; the better these regions can communicate and interact, the quicker the processing and the lower the effort required to make decisions and calculations. This is backed up by studies showing that the integrity and density of white matter in a person's brain is a reliable indicator of intelligence. Having given this context, the author goes on to talk about the plasticity of the brain and how musicians develop a certain aspect of the motor cortex after years of practice.

While the brain remains relatively plastic throughout life, much of its arrangement and structure is effectively set. The long white-matter tracts and pathways will have been laid down earlier in life, when development was still under way. By the time we hit our mid-twenties, our brains are essentially fully developed, and it's fine-tuning from thereon in. This is the current consensus anyway. As such, the general view is that fluid intelligence is 'fixed' in adults, and depends largely on genetic and developmental factors during our upbringing (including our parents' attitudes, our social background and education). This is a pessimistic conclusion for most people, especially those who want a quick fix, an easy answer, a short-cut to enhanced mental abilities. The science of the brain doesn't allow for such things.

To circle back to the title of the section: solving crosswords will help you become good at solving crosswords, and working through brain games will make you good at that specific game alone. The brain is complex enough that engaging in one specific activity does not strengthen connections across the whole brain; hence solving crosswords might make you good in that specific area, but it does little for overall intelligence. Think back to friends who could crack crossword puzzles quickly; check where they are now and what their creative output has been, and judge for yourself.

The author ends the chapter by discussing a phenomenon that is often commented upon – that tall people, on average, are perceived as smarter than shorter people. He cites many theories and acknowledges that none is conclusive enough to explain the phenomenon.

There are many possible explanations as to why height and intelligence are linked. They all may be right, or none of them may be right. The truth, as ever, probably lies somewhere between these extremes. It’s essentially another example of the classic nature vs nurture argument.    

Chapter 5: Did you see this chapter coming ?

The information that reaches our brain via the senses is often more like a muddy trickle than a perfect representation of the outside world. The brain then does an incredible job of creating a detailed representation of the world based on this limited information. The process depends on many peculiarities of the individual brain, and so errors tend to creep in. This chapter talks about the way information is reconstructed by our brains and the errors that can creep in along the way.

Why smell is more powerful than taste ?

Smell is often underrated. It is estimated that humans can distinguish up to 1 trillion odours. Smell is in fact the first sense to develop in a foetus.

There are twelve cranial nerves linking the brain to the face and head, one of which is the olfactory nerve. The olfactory neurons that make up this nerve are unique in many ways – they are among the few types of neurons that can regenerate. They need to regenerate because they are in direct contact with the external world and hence wear out. The olfactory nerve sends electrical signals to the olfactory bulb, which relays the information to the olfactory nucleus and the piriform cortex.

In the brain, the olfactory system lies in close proximity to the limbic system, and hence certain smells are strongly associated with vivid and emotional memories. This is one of the reasons marketers carefully choose the odours in retail stores to nudge prospective customers towards purchases. One misconception about smell is that it can't be fooled, but research has shown that there are in fact olfactory illusions. Smell does not operate alone: smell and taste are classed as "chemical" senses, i.e. their receptors respond to specific chemicals, and there have been experiments in which subjects were unable to distinguish between two completely different foods once their olfactory sense was disabled. Think of the days when you had a bad cold and seemed to lose your sense of taste. The author also takes a dig at wine tasters, saying that their so-called abilities are a bit overrated.

How hearing and touch are related ?

Hearing and touch are linked at a fundamental level. They are both classed as mechanical senses, meaning they are activated by pressure or physical force. Hearing is based on sound, and sound is actually vibrations in the air that travel to the eardrum and cause it to vibrate.



The sound vibrations are transmitted to the cochlea, a spiral-shaped fluid-filled structure, and thus sound travels into our heads. The cochlea is quite ingenious, because it's basically a long, curled-up, fluid-filled tube. Sound travels along it, but the exact layout of the cochlea and the physics of sound waves mean the frequency of the sound (measured in hertz, Hz) dictates how far along the tube the vibrations travel. Lining this tube is the organ of Corti. It's more of a layer than a separate self-contained structure, and the organ itself is covered with hair cells, which aren't actually hairs, but receptors, because sometimes scientists don't think things are confusing enough on their own. These hair cells detect the vibrations in the cochlea and fire off signals in response. But only the hair cells in certain parts of the cochlea are activated, because specific frequencies travel only certain distances along the tube. This means that there is essentially a frequency 'map' of the cochlea, with the regions at the very start of the cochlea being stimulated by higher-frequency sound waves (meaning high-pitched noises, like an excited toddler inhaling helium) whereas the very 'end' of the cochlea is activated by the lowest-frequency sound waves. The cochlea is innervated by the eighth cranial nerve, named the vestibulocochlear nerve. This relays the signals from the hair cells to the auditory cortex, in the upper region of the temporal lobe, which is responsible for processing sound perception.

What about touch ?

Touch has several elements that contribute to the overall sensation. As well as physical pressure, there’s vibration and temperature, skin stretch and even pain in some circumstances, all of which have their own dedicated receptors in the skin, muscle, organ or bone. All of this is known as the somatosensory system (hence somatosensory cortex) and our whole body is innervated by the nerves that serve it.

Also, touch sensitivity isn't uniform throughout the body. Like hearing, the sense of touch can be fooled. The close connection between touch and hearing also means that a problem with one often comes with a problem in the other.

What you didn’t know about the visual system ?

The visual system is the most dominant of all the senses and also the most complicated. In the retina, only about 1% of the area (the fovea) picks up the fine details of a scene; the remaining 99% takes in hazy peripheral detail. It is amazing that the brain constructs what feels like a crystal-clear image from this mass of peripheral data. Many aspects of visual processing mentioned in this chapter make you wonder at this complex mechanism we use every day. When we move our eyes from left to right, for example, we perceive one smooth image even though the brain actually receives a series of jerky scans, from which it recreates a smooth picture.



 Visual information is mostly relayed to the visual cortex in the occipital lobe, at the back of the brain. The visual cortex itself is divided into several different layers, which are themselves often subdivided into further layers. The primary visual cortex, the first place the information from the eyes arrives in, is arranged in neat ‘columns’, like sliced bread. These columns are very sensitive to orientation, meaning they respond only to the sight of lines of a certain direction. In practical terms, this means we recognise edges. The secondary visual cortex is responsible for recognising colours, and is extra impressive because it can work out colour constancy. It goes on like this, the visual-processing areas spreading out further into the brain, and the further they spread from the primary visual cortex the more specific they get regarding what it is they process. It even crosses over into other lobes, such as the parietal lobe containing areas that process spatial awareness, to the inferior temporal lobe processing recognition of specific objects and (going back to the start) faces. We have parts of the brain that are dedicated to recognising faces, so we see them everywhere.

The author ends this section by explaining the simple mechanism by which our brain creates a 3D image from the 2D information on the retina – a mechanism 3D film-makers exploit to create movies for which we end up paying more than the usual ticket price.

Strengths and Weaknesses of Human Attention

There are two questions that are relevant to the study of attention.

  • What is the brain's capacity for attention ?
  • What determines where attention is directed ?

Two models have been put forth to answer the first question, and both have been studied across various research domains. First is the bottleneck model, which says that all the information entering our brains is channelled through the narrow space offered by attention – rather like a telescope, through which you see a specific region of the sky but cut out everything else. Obviously this is not the complete picture of how attention works. Imagine you are talking to someone at a party and somebody else mentions your name; your ears perk up, your attention darts to that person, and you want to know what they are saying about you.

To address the limitations of the bottleneck model, researchers have put forth a capacity model, which says that attention is a finite resource that can be spread across multiple streams of information so long as the resource is not exhausted. The limited capacity is thought to be tied to our limited working memory. So, can you multi-task without compromising the efficiency of the tasks? Not necessarily. If you have trained certain tasks into procedural memory, then you can probably do those tasks and simultaneously do something else that requires conscious attention. Think of preparing a dish you have cooked umpteen times while your mind is on something completely different. In other words, you can free up attention for a task only after committing parts of it to procedural memory. All this might sound very theoretical, but I see it at work in my music practice: unless I build muscle memory of a raag – say, the main piece and some of its standard phrases – there is no way I can progress to improvisations.

As for the second question, most of our attention is directed at what we see, which is obvious in a way: our eyes carry most of the signals to our brains. This is the "top-down" route to attention – we see something and we choose to attend to it. There is also a "bottom-up" route, where something detected as biologically significant grabs our attention without the conscious parts of the brain having any say. This makes sense, as our reptilian brain needed to react to certain stimuli before consciously processing them.

Now, where does the idiocy of the brain come in ? The chapter cites many examples, one being the "door man" experiment – a classic reminder that when we are tunnel-focused we can miss something very obvious going on in the environment. The way I relate to this experiment is this: one needs to be focused and free from distraction when doing a really challenging task, but at the same time it is important to scan the environment a bit and ask whether it still makes sense to approach the task the way you have been. In other words, distancing yourself from the task from time to time is key to doing it well. I came across a similar idea in Anne Lamott's book Bird by Bird: she mentions an incident when she almost gives up on a book, takes a break, comes back to it after a month and finishes it off in style. Attention to a task is absolutely important, but beyond a point it can prove counter-productive.

Chapter 6: Personality

Historically, people believed that the brain had nothing to do with a person's personality. This held until the Phineas Gage case surfaced in the 1850s: Gage suffered a severe brain injury and his mannerisms subsequently changed completely. Since then many experiments have shown that the brain does have a say in personality. The author also points out many problems in measuring its direct impact.

The questionable use of personality tests

Personality patterns across a diverse set of individuals are difficult to infer. However, there are certain aspects of personality where we see a surprising degree of commonality. According to the Big Five theory, everyone falls somewhere between the two extremes of each of five traits:

  • Openness
  • Conscientiousness
  • Extraversion
  • Agreeableness
  • Neuroticism

One can easily dismiss these traits as too reductionist, and there are many limitations to the Big Five theory. It is based on factor analysis, a tool that identifies sources of variation but says nothing about causation; whether the brain evolves to suit the personality, or the personality emerges from the brain's structure, remains a difficult question to answer. There are many other personality tests, such as Type A/Type B and the Myers-Briggs Type Indicator. Most of these were put together by amateur enthusiasts and somehow became popular. The author puts together a balanced account of the various theories and seems to conclude that most personality tests are crap.

How anger works for you and why it can be a good thing ?

The author cites various research studies showing that anger evokes signals in both hemispheres: in the right hemisphere it produces negative, avoidance or withdrawal reactions to unpleasant things, and in the left hemisphere, positive and active behaviour. Research shows that venting anger reduces cortisol levels and relaxes the brain, while suppressing anger for a long time might cause a person to overreact to harmless situations. So does the author advocate venting anger at every instance? Not really. This section simply points out research evidence that venting anger is sometimes OK.

How different people find and use motivation ?

How does one get motivated to do something? This is a very broad question that elicits a variety of answers, and the author gives a whirlwind tour of the many theories. The most basic is that humans do what brings pleasure and avoid what brings pain; it is so simplistic that it was naturally the first theory to be ripped apart. Then came Maslow's hierarchy of needs, which looks good on paper – the fancy pyramid every MBA student reads about seems to explain motivation, but you can think of examples from your own life where the motivation to do certain things did not fit the pyramid at all. Then there is the theory of extrinsic and intrinsic motivation: extrinsic motivations come from the outside world, while intrinsic motivations drive us to do things because of decisions or desires we arrive at ourselves. Some studies have shown that money can actually demotivate performance: subjects without a carrot performed well and seemed to enjoy the tasks more than subjects with one. Other theories point to ego gratification as the motivating factor. Of all the theories and quirks mentioned in the chapter, the only one that has really worked in my life is the Zeigarnik effect – the brain really doesn't like things being incomplete. This explains why TV shows use cliff-hangers so often; the unresolved storyline compels people to tune in to the conclusion, just to end the uncertainty. To give another example, I have often stopped working on something precisely when I badly wanted to keep going, and in hindsight this has always been the best option. Work on something to the point where you leave a little of it incomplete; that gives you the motivation to come back to it the next day.

Chapter 7: Group Hug

Do we really need to listen to other people to understand or gauge their motives? Do facial expressions give away intentions? These are some of the questions tackled in this chapter. It was believed for a long time that the speech-processing areas of the brain were Broca's area, named after Pierre Paul Broca, at the rear of the frontal lobe, and Wernicke's area, identified by Carl Wernicke, in the temporal lobe.


Damage to these areas produced profound disruptions in speech and understanding, and for many years they were considered the only areas responsible for speech processing. However, brain-scanning technology has since improved and many new findings have emerged. Broca's area, a frontal-lobe region, is still important for processing syntax and other crucial structural details, which makes sense; manipulating complex information in real time describes much of what the frontal lobe does. Wernicke's area, however, has been effectively demoted by data showing the involvement of much wider areas of the temporal lobe around it.


Although the field as a whole has made tremendous progress in the past few decades, due in part to significant advances in neuroimaging and neurostimulation methods, we believe abandoning the Classic Model and the terminology of Broca’s and Wernicke’s areas would provide a catalyst for additional theoretical advancement.

Damage to Broca's and Wernicke's areas disrupts the many connections between language-processing regions, hence the aphasias. But the fact that language-processing centres are so widely spread throughout the brain suggests that language is a fundamental function of the brain rather than something we simply pick up from our surroundings. Communication, though, also involves non-verbal cues. Experiments on aphasia patients show that intentions can easily be inferred from facial expressions and are therefore difficult to fake just by talking: your face gives away your true intentions. The basic theory is that there are voluntary facial expressions and involuntary ones. Poker players are excellent at controlling voluntary expressions and train themselves to suppress certain involuntary ones, but we never gain full control over involuntary expressions, so an acute observer can often spot our true intentions just by watching our face.

The author explores situations such as romantic break-ups, fan-club gatherings and settings in which we are predisposed to harm others. The message from all these narratives is that our brain is influenced by the people around us in ways we cannot completely fathom: the people around you shape the way you think and act. The aphorism – tell me who your friends are and I will tell you who you are – resonates through the examples in this chapter.

Chapter 8: When the Brain breaks down

All the previous chapters talk about how our brain is an idiot even when it is functioning normally. The last chapter looks at what happens when the brain stops functioning normally. The author explores various mental-health issues such as depression, drug addiction, hallucinations and nervous breakdowns, and the chapter does a great job of summarizing the competing theories that try to explain them.


For those whose work is mostly cerebral in nature, this book is a good reminder that we should not identify with our brains or trust our brains beyond a certain point. A healthy dose of skepticism towards whatever our brain makes us feel, think and do, is a better way to lead our lives. The book is an interesting read, with just enough good humor spread across, and with just enough scientific details. Read the book if you want to know to what extent our brains are idiosyncratic and downright stupid in their workings.



Most of the decisions we take and activities we do on a daily basis are not the result of deliberate thought; they are the result of habits we have built over time. We recognize some of them as habits, but we carry out many others on auto-pilot without noticing. This is good in that it frees our mind to do other things; the flip side is that we do not seem to be in control of these actions and hence feel powerless.

This book by Charles Duhigg goes into the details of habits: how do habits arise? What triggers them? Why is it so difficult to change some of them? What can be done to change them? In this post, I will try to briefly summarize the contents of the book.

The Habit Loop

The author explains the basic framework of habit formation via what he calls "the habit loop". To explain this framework, the reader is taken through a few specific cases that triggered active research in this area. Many recent pop-science books have mentioned H.M., a unique patient in medical case history: his hippocampus was surgically removed to relieve frequent convulsions, which created a testbed for experiments on memory, since H.M. forgot anything he learnt within a few seconds.

The author describes one such patient, Eugene Pauly, whose medical condition led to an explosion of habit-formation research. Like H.M., Eugene Pauly could not retain any new learning beyond a few seconds; his ability to store anything new was severely damaged. Yet there was something peculiar about him: he could perform certain activities effortlessly. The activities that were the outcome of past habits were all intact. Most surprisingly – and this is what eventually led to a ton of research – Eugene Pauly was able to learn new habits despite not being able to commit anything to memory. Dr. Larry Squire was the first to study Eugene Pauly and report the findings in a medical journal. Squire found that similar neurological processes underlie habit formation in all individuals, and that the part of the brain playing the major role is the basal ganglia.

A series of experiments on rats showed that habit formation follows a specific pattern. When one is learning a new activity, there is a spike in neurological activity in the brain; once the brain has learnt it, the basal ganglia take over and overall brain activity decreases. Even then, the entire activity is not on auto-pilot. The author explains the three-step habit loop as follows:


  • First, there is a cue, a trigger that tells your brain to go into automatic mode and which habit to use. Cues fall into one of five categories: time, location, emotional state, other people, and the immediately preceding action.
  • Second, there is a routine, which can be a physical, mental or emotional pattern.
  • Third, there is a reward, which helps your brain figure out whether this particular loop is worth remembering for the future.

Over time this loop – cue, routine, reward – becomes more and more automatic. The cue and reward become intertwined until a powerful sense of anticipation and craving emerges, and eventually a habit is born. Another finding that emerged from the experiments on Eugene Pauly is that a minor tweak to any component of the habit loop can completely wreck the habit. This means that small changes can wreck good habits that were painfully cultivated – and, equally, that bad habits can be changed by tweaking the loop.
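A toy way to picture the loop in code (entirely my own illustration – the names, numbers and threshold are made up, not Duhigg's): each pass through cue → routine → reward strengthens the craving, and after enough repetitions the routine fires more or less automatically.

```python
from dataclasses import dataclass

@dataclass
class HabitLoop:
    cue: str            # time, location, emotional state, other people, preceding action
    routine: str        # the physical, mental or emotional pattern
    reward: str         # what tells the brain the loop is worth remembering
    craving: float = 0.0

    def run(self):
        """One pass through cue -> routine -> reward; repetition strengthens the craving."""
        self.craving = min(1.0, self.craving + 0.1)
        return f"{self.cue} -> {self.routine} -> {self.reward}"

    @property
    def automatic(self):
        return self.craving >= 0.7   # arbitrary threshold for "a habit is born"

loop = HabitLoop(cue="3 pm slump", routine="walk to the cafe", reward="sugar and a chat")
for _ in range(8):
    loop.run()
print(loop.automatic)   # True: the loop now runs with little deliberate thought
```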

The takeaway from this chapter is that one needs to be aware of the habit loop in many of the activities we perform in daily life. It is estimated that about 40% of the decisions we take and activities we do are based on habit. It is very likely that some of these daily decisions are sub-optimal, and that some of the habits we carry need to change. By recognizing the cue, routine and reward components of our activities, we can develop a self-awareness that goes a long way towards empowering us to make a change.

The Craving Brain

The author then speaks about "craving", an important ancillary component that turns the loop into a habit. If one reflects on the cue-routine-reward loop for a few minutes, it becomes clear that there must be some other element that makes us go down the cue-routine-reward path again and again. Something must power the habit loop for a new habit to form or an existing one to change, and that something is a craving for the reward.


Wolfram Schultz, a Cambridge scientist, observed in a series of experiments with monkeys that there was a spike in the monkeys' brain activity well before the reward was given. This led to the hypothesis that some sort of craving is activated as soon as a cue is seen, and it is this craving that powers the habit loop. The author uses a few examples from the marketing world, such as Febreze and Pepsodent, to strengthen the "craving" hypothesis. In hindsight, if you think about it for a while, you can easily convince yourself by recalling a few situations in your own life where a cue created a craving that drove you through a specific routine to obtain the reward.

The Golden Rule of Habit Change

The author shares a motley collection of stories to illustrate two other aspects of the habit loop. Stories about the success of Alcoholics Anonymous, the performance of the Tampa Bay Buccaneers (NFL), and Mandy (a normal-looking girl with an obsessive tendency to bite her nails till they bled) illustrate a method for breaking bad habits. A common theme runs through all of them: awareness of the cue and the reward makes it possible to swap in a healthier routine that delivers the same reward for the same cue. All the stories seem to end happily once the cue and the actual reward are identified and the routine is tweaked. However, such habits break down under stress: AA alumni fall back into old habits when tested at their limits, and so it goes in the other stories. The author then talks about a critical component for sustaining the habit loop – belief, i.e. one's belief in the possibility of change.

One might read the first three chapters and cast the whole thing aside as common sense. Possibly. But reflect on the various activities in your daily life and ask whether each one is a carefully thought-out, deliberate action. My guess is that most of us will realize that the so-called actions we perform, be it at work or at home, are habits acquired over time – some good, some bad. We might feel powerless to change the bad ones, but the framework in these chapters offers a way to experiment with our lives and see whether we can make a meaningful change to our habits.

Keystone Habits

Is there a pecking order among the various habit loops in our life? Is there a specific type of habit that has a disproportionate effect on our lives? The author calls such habits keystone habits, and argues they can transform our lives. The idea behind keystone habits is that success doesn't depend on getting every single thing right; it relies on identifying a few key priorities and fashioning them into powerful levers. To support this thesis, the author gives the following examples:

  1. Paul O'Neill focused on one keystone habit, safety, and changed the entire culture of Alcoa.
  2. Bob Bowman targeted a few specific habits that had nothing to do with swimming and everything to do with creating the right mindset. He coached Michael Phelps, and we all know the rest of the story: a few small wins can create massive transformation.
  3. The American Library Association's Task Force on Gay Liberation decided to focus on one modest goal: convincing the Library of Congress to reclassify books about gay liberation. It succeeded – a small win that eventually created a cascade of bigger wins.

Keystone habits encourage change and can subsequently create a unique culture in an organization. The author gives various examples showing that a conscious effort to sustain that culture can result in massively positive outcomes.

One of the takeaways of this chapter is the importance of journaling. If you want to pay attention to what you eat and control your diet, food journaling has been found to be very useful. In a similar vein, journaling about anything you want to change or improve can help you identify the components of your habit loops. Self-awareness is half the battle won.

Starbucks and the Habit of Success

Starbucks is a unique place for many reasons. One is that it employs many youngsters just out of college who, in all likelihood, have never faced an angry customer before and find it difficult to fit into a professional setting. Credit goes to Starbucks for teaching life skills to thousands of recruits, and the way it accomplishes this is worth knowing; this chapter goes into some of the details. At the core of all the education Starbucks imparts is one all-important habit: willpower.

The company spent millions of dollars developing curriculums to train employees on self-discipline. Executives wrote workbooks that, in effect, serve as guides to how to make willpower a habit in workers’ lives. Those curriculums are, in part, why Starbucks has grown from a sleepy Seattle company into a behemoth with more than seventeen thousand stores and revenues of more than $10 billion a year.

It is not easy to instil self-discipline across a massive organization, and what Starbucks did is a great lesson for anyone who wants to master it. Starbucks realized that its employees' ability to discipline themselves depended on how they handled themselves at a few inflection points. So it became an organization-wide practice for employees to write about, talk about and rehearse their intended behaviour well before those inflection points arrived.

In a way this is like an entrepreneur writing in his or her journal about potential inflection points in the startup's future journey and then gearing up for an appropriate response. Institutionalizing this kind of response across its vast workforce is what has enabled Starbucks to sustain its profitability year after year.

The takeaway from this chapter is that willpower is a finite resource; spend it wisely over the course of the day so as to maintain a healthy dose of self-discipline in our lives.

What Target knows long before you do

The author uses examples from the marketing world to drive home the point that if you dress something new in old habits, it is easier for the public to accept it. The takeaway from this chapter is a well-known message: it pays to monitor your customers' habits closely.

The last few chapters of the book talk about habit formation in societies.

Here’s a visual from the web that captures the essence of the book



A habit is a choice that we make deliberately at some point, then stop thinking about but continue acting on, often every day. The book delves into the basic components of the habit loop. It might take a few hours to read, but it is well worth the time because it makes the reader conscious of his or her own habit loops. Once you recognize a habit loop, you seem to have much more control over your habits and hence your choices.


Neo4j's founder Emil Eifrem shares a bit of history about how Neo4j was started. Way back in 1999, his team realized that the database they were using internally had a lot of connections between discrete data elements. Like many successful companies that grow out of a founder's frustration with the status quo, Neo4j began its life in the founding team's frustration with a fundamental problem in the design of relational databases. The team started experimenting with various data models centred on graphs and, much to their dismay, found no readily available graph database in which to store their connected data. Thus began the team's journey into building a graph database from scratch; Project Neo was born. What's behind the name Neo4j ? The 4 in Neo4j does not stand for a version number – version numbers are appended after the word Neo4j. I found one folksy explanation on Stack Overflow that goes like this:

The Neo series of databases was developed in Sweden and attracted the ‘j’ suffix with the release of version 4 of the graph database. The ‘j’ is from the word ‘jätteträd’, literally "giant tree", and was used to indicate the huge data structures that could now be stored.

Incidentally the ‘neo’ portion of the name is a nod to a Swedish pop artist Linus Ingelsbo, who goes by the name NEO. In Neo1 the example graph was singers and bands and featured the favourite artists of the developers, including NEO. This was changed to the movie data set in the version 4 release.

Other people speculate that Neo refers to the character Neo in "The Matrix", fighting the "relational tables". It was recently announced that Neo4j would be called Neo5j as part of the latest branding exercise. In a recent blog post, the company said that the j in Neo4j stood for Java, as the original graph database was written as a Java library.


The introduction talks about the purpose of the book, i.e. to introduce graphs and graph databases to technology practitioners, including developers, database professionals, and technology decision makers. It also explains the main changes in content compared to the first edition, which are mainly in the areas of Cypher syntax and modeling guidelines.


A graph is nothing but a collection of vertices and edges. That said, there are many different ways in which a graph can be stored. One of the most popular graph models is the labeled property graph, which is characterized as follows (a small code sketch appears after the list):

  • It contains nodes and relationships
  • Nodes contain properties (key-value pairs)
  • Nodes can be labeled with one or more labels
  • Relationships are named and directed, and always have a start and an end node
  • Relationships can also contain properties
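The definition above maps almost one-to-one onto code. Here is a minimal in-memory sketch of the labeled property graph model (my own illustration, not Neo4j's actual API or storage format):

```python
from dataclasses import dataclass, field
from typing import Any

@dataclass
class Node:
    labels: set[str]                                            # one or more labels
    properties: dict[str, Any] = field(default_factory=dict)    # key-value pairs

@dataclass
class Relationship:
    name: str                                                   # relationships are named...
    start: Node                                                 # ...directed, with a start node...
    end: Node                                                   # ...and an end node
    properties: dict[str, Any] = field(default_factory=dict)    # relationships can carry properties too

alice = Node({"Person"}, {"name": "Alice"})
movie = Node({"Movie"}, {"title": "The Matrix"})
watched = Relationship("WATCHED", alice, movie, {"rating": 5})

print(watched.start.properties["name"], watched.name, watched.end.properties["title"])
```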

Rough Classification

Given the plethora of products in this space, it helps to have some sort of binning. The chapter bins the products into two groups:

  1. Technologies used primarily for transactional online graph persistence, typically accessed directly in real time from an application
  2. Technologies used primarily for offline graph analytics, typically performed as a series of batch steps

Broader Classification

The chapter also says that one of the other ways to slice the graph space is via graph data models, i.e. property graphs, RDF triples and hypergraphs. In the appendix of the book, there is a much broader classification given for the NOSQL family of products. I will summarize the contents of the appendix in this section as I found it very well written.

Rise of NOSQL:

The NOSQL movement has mainly arisen because of 3 V’s

  • Volume of data: There has been a deluge of semi-structured data in the recent decade, and storing it all in a structured relational format has been fraught with difficulties. Storing connections gives rise to complicated join queries for CRUD operations.
  • Velocity of data read and writes and schema changes
  • Variety of data

Relational databases are known to provide ACID transactions: Atomic, Consistent, Isolated, Durable. NOSQL databases instead have BASE properties: Basic availability, Soft-state, Eventual consistency. Basic availability means that the data store appears to work most of the time. Soft-state stores don’t have to be write-consistent, nor do replicas have to be mutually consistent all the time. Eventually consistent stores exhibit consistency at some later point.

Types of NOSQL Databases:

NOSQL databases can be divided into four types:

  • Document Stores: Document databases store and retrieve documents, just like an electronic filing cabinet. These stores act like a key-value pair with some sort of indexing in place for quicker retrieval. Since the data model is one of disconnected entities, the stores tend to scale horizontally. Writes are atomic at the document level; support for writes across multiple documents has not yet matured. MongoDB and CouchDB are two examples of document stores.
  • Key-Value Stores: Key-value stores are cousins of the document store family, but their lineage comes from Amazon’s Dynamo database. They act like large, distributed hashmap data structures that store and retrieve opaque values by key. A client stores a data element by hashing a domain-specific key. The hash function is crafted such that it provides a uniform distribution across the available buckets, thereby ensuring that no single machine becomes a hotspot. Given the hashed key, the client can use that address to store the value in a corresponding bucket. These stores are similar to document stores but offer a higher level of insight into the stored data; Riak, for instance, also offers visibility into certain types of structured data. In any case, these stores are optimized for high availability and scale. Riak and Redis are two examples of key-value stores.
  • Column Family: These stores are modeled on Google’s BigTable. The data model is based on a sparsely populated table whose rows can contain arbitrary columns, the keys of which provide natural indexing. The four building blocks of a column-family data store are column, super column, column family and super column family. HBase is an example of a column-oriented database.
  • Graph Databases: All three previous types of databases are still aggregate stores. Querying them for insight into data at scale requires processing by some external application infrastructure, and aggregate stores are not built to deal with highly connected data. This is where graph databases step in. A graph database is an online, operational database management system with CRUD methods that expose a graph model. Graph databases are generally built for use with transactional systems. There are two properties that are useful to understand while investigating graph technologies:
    • The underlying storage: Some graph databases use "native graph storage", which is optimized and designed for storing and managing graphs. Not all graph database technologies use native graph storage; some serialize the graph data into a relational database, an object-oriented database, etc.
    • Processing engine: Some graph databases are capable of index-free adjacency, i.e. nodes point to each other in the database. For graphs that use index-free adjacency, the authors use the term "native graph processing".

    Besides adopting a specific approach to storage and processing, graph databases also adopt a specific data model. There are various models such as property graphs, hypergraphs and triples. Hypergraphs are mainly useful for representing many-to-many relationships. Triple stores typically provide SPARQL capabilities to reason about and store RDF data. Triple stores generally do not support index-free adjacency and are not optimized for storing property graphs. To perform graph queries, triple stores must create connected structures from independent facts, which adds latency to each query. Hence the sweet spot for a triple store is analytics, where latency is a secondary consideration, rather than OLTP.

Power of Graph Databases:

The power of graph databases lies in performance, agility and flexibility. Performance comes from the fact that search can be localized to a portion of the graph. Flexibility comes from the fact that there is little impedance between the way the business communicates the requirements and the way the graph is modeled. Agility comes from the fact that graph databases are schema-less and hence new connections can easily be accommodated.

Options for Storing Connected Data

This chapter takes a simple example of modeling friends and friend-of-friends relations via an RDBMS. SQL code is given to answer a few basic questions a user might have while looking at a social graph. The queries, even for simple questions, become very complex. It quickly becomes obvious to anyone reading this chapter that modeling connections via an RDBMS is challenging, for the following reasons:

  • Join tables add accidental complexity
  • Foreign key constraints add additional development and maintenance overhead
  • Sparse tables with nullable columns require special checking in code
  • Several expensive joins are needed to discover nested relationships
  • Reverse queries are difficult to execute as the SQL code becomes extremely complicated to write and maintain

NOSQL databases also do not scale well for storing connected data. By the very nature of document stores, key-value stores and column-family stores, the lack of connections or relationships as first-class objects makes it difficult to achieve scale. Even though there is some sort of indexing available in most NOSQL databases, they do not have the index-free adjacency feature. This implies that there is latency in querying connected data.

The chapter ends by showcasing a graph database to store connections, and it is abundantly clear by this point in the chapter that, by giving relationships the status of first-class objects, graph databases make it extremely easy to represent and maintain connected data. Unlike RDBMS and NOSQL stores, in which connected data forces developers to write data processing logic in the application layer, graph databases offer a convenient and powerful alternative.
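
To make the contrast concrete, here is a hedged sketch of the friend-of-friend question in Cypher; the Person label and FRIEND relationship type are placeholders of mine, not the book's schema. The equivalent SQL needs an extra self-join on the friendship table for every additional hop.

// Friends of Alice's friends who are not already her direct friends
MATCH (me:Person {name: 'Alice'})-[:FRIEND]-(friend)-[:FRIEND]-(fof)
WHERE fof <> me AND NOT (me)-[:FRIEND]-(fof)
RETURN DISTINCT fof.name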

Data Modeling with Graphs

The path from a real-life representation of data to a data model in the graph database world is short. In fact, one can use almost the same lingo to talk about various aspects while referring to graph data. This is unlike the RDBMS world, where one needs to translate the real-life representation to a logical model via the normalization-denormalization route. In the latter world, there is a semantic dissonance between the conceptualization of a model and the database’s instantiation of that model.

The chapter begins by introducing Cypher, an expressive graph database query language. Cypher is designed to be easily read and understood by developers, database professionals, and business stakeholders. Its ease of use derives from the fact that it is in accord with the way we intuitively describe graphs using diagrams. Like most other query languages, Cypher is composed of clauses. The chapter introduces some of the main clauses such as MATCH, RETURN, WHERE, CREATE, MERGE, DELETE, SET, FOREACH, UNION, WITH and START.
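
To give a flavour of how these clauses read together, here is a small query on a hypothetical movie-style graph; the labels, relationship types and the 'Tom Hanks' value are illustrative, not the book's dataset:

// Who co-acted with Tom Hanks most often?
MATCH (tom:Person {name: 'Tom Hanks'})-[:ACTED_IN]->(m:Movie)<-[:ACTED_IN]-(co:Person)
WHERE co <> tom
WITH co, count(m) AS sharedMovies
RETURN co.name, sharedMovies
ORDER BY sharedMovies DESC
LIMIT 5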

There are two examples used in the chapter that serve to show the contrast between the steps involved in modeling data in the relational world and in the graph world. In the relational world, one takes the white-board model, creates an ER diagram, then a normalized representation of the model and finally a denormalized form. Creating a normalized representation involves incorporating foreign key relationships to avoid redundancy. Once the database model has been adapted to a specific domain, there is additional denormalization work that needs to be done to suit the database, not the user. This entire sequence of steps creates an impedance between the real-life representation and the database representation. One of the justifications given for this effort is that it is a one-time investment that will pay off as the data grows. Sadly, given that data in the current age has the 3 V’s, i.e. Volume, Velocity and Variety, this argument no longer holds. It is often seen that RDBMS data models undergo migrations as data structures change. What once seemed a solid, top-down, robust approach falls apart quickly. In the graph world, the effort put into translating a white-board model into a database model is minimal, i.e. what you sketch on the white-board is what you store in the database. The two examples illustrate that the conceptual-to-implementation dissonance is far less for a graph database than for an RDBMS.

The chapter also gives some guidelines for creating graphs so that they match up with the BASE properties of NOSQL databases. By the end of the chapter, a reader ends up with a good understanding of the Cypher query language. If one is a complete newbie to Neo4j, it might be better to pause here and experiment with Cypher before going ahead with the rest of the book.

Building a Graph Database application

The basic guidelines for data modeling are as follows (a short Cypher sketch follows the list):

  • Use nodes to represent entities – that is, the things in our domain that are interesting to us
  • Use relationships to represent the connections between entities and to establish a semantic context for each entity
  • Use relationship direction to further clarify relationship semantics. Many relationships are asymmetric, and hence it makes sense to always represent a relationship with a direction
  • Use node attributes to represent entity attributes, plus any entity meta-data
  • Use relationship attributes to represent the strength, weight, quality of the relationship etc.
  • While modeling an entity as a node, make sure it is not something that merely connects two other nodes – that role belongs to a relationship
  • It is often useful to represent both a fine-grained and a coarse-grained relationship between two nodes. This helps in quickly querying at the coarse-grained level.
  • The key idea of using a graph to model the data is that one can model iteratively, adding nodes and relationships as the understanding of the domain evolves
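
A minimal sketch of these guidelines, with entirely invented labels and relationship types: entities become nodes, the relationships are directed and carry meta-data, and a coarse-grained CONNECTED_WITH relationship sits alongside the fine-grained EMAILED one so that coarse queries stay cheap:

CREATE (a:Employee {name: 'Asha'}), (b:Employee {name: 'Bala'})
CREATE (a)-[:REPORTS_TO {since: 2019}]->(b)      // directed relationship with relationship attributes
CREATE (a)-[:EMAILED {on: '2020-01-15'}]->(b)    // fine-grained relationship recording one interaction
CREATE (a)-[:CONNECTED_WITH]->(b)                // coarse-grained relationship for cheap, broad queries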

Application Architecture

There are two ways in which Neo4j can be used in an application. One is the embedded mode, where Neo4j runs in the same process as the application; the other is the server mode, where the application talks to Neo4j over a REST API.


All writes to Neo4j can be directed either to the master or to a slave. Having this capability at an application level is an extremely powerful feature, and such features are not present in many other graph databases.

Load Balancing

There is no native load balancing available in Neo4j; it is up to the network infrastructure to provide it. One needs to introduce a network-level load balancer to separate out the read and write queries. In fact, Neo4j exposes an API call to indicate whether a server is the master or a slave.

Since cache partitioning is an NP-hard problem, one can use a technique called "cache sharding" to route each query to the specific server that has the highest probability of having the relevant part of the graph in its cache. The way to route and cache queries depends on the domain in which the graph database is being used.


To debug any function written in any language, one needs sample input. In the case of testing a graph, one needs to create a small graph of a few dozen nodes and relationships, so that this localized graph can be used to check for any anomalies in the graph model. It is always better to write a lot of simple tests checking various parts of the graph than to rely on one single universal test for the entire graph. As the graph evolves, an entire regression suite of tests can be built up.

Importing and Bulk Loading Data

Most deployments of any kind of database require some sort of content to be imported into the database. This section of the chapter talks about importing data from CSV files. The headers in the CSV file should be specified in such a way that they reflect whether the pertinent column is an ID, a LABEL, a TYPE or a property. Once the CSV files are in the relevant format, they can be imported via the neo4j-import command. One can also do a batch import via Cypher queries using LOAD CSV. In my line of work, I deal with a lot of RDF input files, and there is an open source stored procedure that one can use to import large RDF files into Neo4j.

Usually a bulk import of any file should be preceded by an index creation step. Indexes make lookups faster during and after the load process. Creating indexes on any type of node label is quite straightforward via Cypher commands. If the index is only helpful during the data loading process, one can delete it after the bulk load is completed. For large datasets, it is suggested that one uses the PERIODIC COMMIT option so that transactions are committed after a certain number of rows have been processed.
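
A hedged sketch of the LOAD CSV pattern described above, using the pre-4.x index syntax of the book's era; the file name, header names and the Person label are placeholders of mine:

// Create the index first so that lookups during the load stay fast
CREATE INDEX ON :Person(personId);

// Commit in batches so that a huge file does not overwhelm a single transaction
USING PERIODIC COMMIT 10000
LOAD CSV WITH HEADERS FROM 'file:///people.csv' AS row
MERGE (p:Person {personId: row.personId})
SET p.name = row.name;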

Graphs in the Real World

Why Organizations choose Graph Databases?

Some of the reasons that organizations choose graph databases are :

  • "Minutes to Millisecond" performance requirement – Using an index free adjacency, a graph database turns complex joins into fast graph traversals, thereby maintaining millisecond performance irrespective of the overall size of the dataset
  • Drastically accelerated development cycles : Graph model reduces the impedance mismatch between the technical and the business domains
  • Extreme business responsiveness- Schema-free nature of the graph database coupled with the ability to simultaneously relate data elements in lots of different ways allows a graph database solution evolve as the business needs evolve.
  • Enterprise ready -The technology must be robust, scalable and transactional. There are graph databases in the market that provide all the -ilities, i.e transactionability, high-availability, horizontal read scalability and storage of billions of triples

Common Use cases

The chapter mentions the following as some of the most common use cases in the graph database world:

  1. Social Networks Modeling
  2. Recommendations (Inductive and Suggestive)
  3. Geospatial data
  4. Master Data Management
  5. Network and Data Center Management
  6. Authorization and Access Control

Graphs in the real world

There are three use cases that are covered in detail. Reading through these use cases gives one a good introduction to various data modeling aspects, including writing good Cypher queries. Reading through the Cypher queries presupposes that the reader is somewhat comfortable with Cypher. To get the most learning out of these examples, it is better to write out the Cypher queries before looking at the constructed ones. One gets immediate feedback about the way to construct efficient Cypher queries.

Graph Database Internals

This chapter gives a detailed description of Neo4j graph database internals.

(Figure: Neo4j architecture)

Native Graph Processing

A graph database has native processing capabilities if it exhibits a property called index-free adjacency. A database engine that utilizes index-free adjacency is one in which each node maintains direct references to its adjacent nodes. This enables query times to be independent of the total graph size.

A non-native graph database uses global indexes to link nodes. These indexes add a layer of indirection to each traversal, thereby incurring computational cost. To traverse a network of m steps, the cost of the indexed approach is O(m log n), whereas the cost is O(m) for an implementation that uses index-free adjacency. To achieve index-free adjacency, it is important to create an architecture that supports this property. Neo4j has painstakingly built this over a decade, and one can see the results by querying a large graph. In my work, I have dealt with pure triple stores and with global indexing built on top of triple stores. The performance of Neo4j is a zillion times better than the databases that I have worked on. Hence I am a big fan of Neo4j.

Native Graph Storage

Neo4j stores data in a number of different store files. There are separate store files for nodes, relationships and properties. The node store is a fixed-size record store; each node record is 9 bytes in length. Node IDs are created in such a way that node lookup is O(1) instead of O(log n). The constituents of a node record are 1) the ID of the first relationship, 2) the ID of the first property and 3) the label store of the node. Relationships are also stored in fixed-length records whose constituents are 1) the start node ID, 2) the end node ID, 3) a pointer to the previous relationship and 4) a pointer to the next relationship.

With fixed-size records, traversals are implemented simply by chasing pointers around a data structure. The basic idea of searching in Neo4j boils down to locating the first record in the relationship chain and then chasing various pointers. Apart from this structure, Neo4j also has a caching feature that further increases its performance.

Programmatic APIs

Neo4j exposes three types of APIs for a developer:

  1. Kernel API
  2. Core API
  3. Traverser API

The Core API allows developers to fine-tune their queries so that they exhibit high affinity with the underlying graph. A well-written Core API query is often faster than any other approach. The downside is that such queries can be verbose, requiring considerable developer effort. Moreover, their high affinity with the underlying graph makes them tightly coupled to its structure; when the graph structure changes, they can often break. Cypher can be more tolerant of structural changes – things such as variable-length paths help mitigate variation and change. The Traversal Framework is both more loosely coupled than the Core API (because it allows the developer to declare informational goals) and less verbose, and as a result a query written using the Traversal Framework typically requires less developer effort than the equivalent written using the Core API. Because it is a general-purpose framework, however, the Traversal Framework tends to perform marginally less well than a well-written Core API query.

Nonfunctional characteristics


One of the ways to evaluate the performance of a database is via 1) the number of transactions that can be handled in an ACID way and 2) the number of read and write queries that can be processed.

Transactions are semantically identical to traditional database transactions. Writes occur within the same transaction context, with write locks being taken for consistency purposes on any nodes and relationships involved in the transaction. The following excerpt describes the way transactions are implemented:

The transaction implementation in Neo4j is conceptually straightforward. Each transaction is represented as an in-memory object whose state represents writes to the database. This object is supported by a lock manager, which applies write locks to nodes and relationships as they are created, updated, and deleted. On transaction rollback, the transaction object is discarded and the write locks released, whereas on successful completion the transaction is committed to disk.


For a platform like Neo4j, one cannot talk of scale only in terms of the number of transactions per second. One needs to think about scale along at least three different axes.

  • Capacity: The current release of Neo4j can support single graphs having tens of billions of nodes, relationships, and properties. The Neo4j team has publicly expressed the intention to support 100B+ nodes/relationships/properties in a single graph as part of its road map
  • Latency: The architecture of Neo4j makes performance almost constant irrespective of the size of the graph. Most queries follow a pattern whereby an index is used to find a starting node and the remainder is pointer chasing; performance therefore depends on the amount of data being queried, not on the size of the whole graph (see the sketch after this list)
  • Throughput: Neo4j has constant-time performance irrespective of the graph size. One might think that extensive reads and writes would drag down the performance of the entire database. However, typical read and write queries touch only a localized portion of the graph, and hence there is scope to optimize at an application level
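
A rough illustration of the "index to find the anchor, then pointer chasing" pattern; the User label, FRIEND relationship and $userId parameter are made up for this sketch (parameter syntax as in newer Neo4j versions):

// The index lookup finds the starting node; the traversal then touches only
// the neighbourhood being queried, not the whole graph
MATCH (u:User {userId: $userId})-[:FRIEND*1..2]->(other:User)
RETURN DISTINCT other.name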

Holy grail of scalability

The future goal of most graph databases is to be able to partition a graph across multiple machines without application-level intervention, so that read and write access to the graph can be scaled horizontally. In the general case this is known to be an NP Hard problem, and thus impractical to solve.

Predictive Analysis with Graph Theory

The chapter starts off with a basic description of depth-first and breadth-first graph traversal mechanisms. Most useful algorithms aren’t pure breadth-first but are informed to some extent. Breadth-first search underpins numerous classical graph algorithms, including Dijkstra’s algorithm. This last chapter also talks about several graph-theoretic concepts that can be used to analyze networks.


The book gives a good introduction to various aspects of any generic graph database, even though most of the content is Neo4j specific. The book is 200 pages long, but it packs in so much content that one needs to read slowly to understand thoroughly the various aspects of graph databases. If you have been exposed to native triple stores, then this book will be massively useful in getting an idea of Neo4j’s implementation of the property-graph model – an architecture that makes all the CRUD operations on a graph database very efficient.


Cypher is a query language for the Neo4j graph database. The basic model in Neo4j can be described as follows:

  • Each node can have a number of relationships with other nodes
  • Each relationship goes from one node either to another node or to the same node
  • Both nodes and relationships can have properties, and each property has a name and a value

Cypher was first introduced in Nov 2013 and since then the popularity of graph databases as a category has taken off. The following visual shows the pivotal moment:


Looking at the popularity of Cypher, it was made open source in October 2015 as the openCypher project. The Neo4j founders claim that the rationale behind the decision was that a common query syntax could then be followed across all graph databases. Cypher provides a declarative syntax, which is readable and powerful, and a rich set of graph patterns can be recognized in a graph.

Via Neo4j’s blog:

Cypher is the closest thing to drawing on a white board with a keyboard. Graph databases are whiteboard friendly; Cypher makes them keyboard friendly.

Given that Cypher has become open source and has the potential to become the de facto standard in the graph database segment, it becomes important for anyone working with graph data to have familiarity with the syntax. Since the syntax looks like SQL and has some pythonic elements in the way queries are formulated, it can be picked up easily by reading a few articles on it. Do you really need a book for it? Not necessarily. Having said that, this book reads like a long tutorial and is not dense. It might be worth one’s time to read this book to get a nice tour of the various aspects of Cypher.

Chapter 1 : Querying Neo4j effectively with Pattern Matching

Querying a graph database using an API is usually very tedious. I have had this experience first hand while working on a graph database that had ONLY an API interface to obtain graph data. SPARQL is a relief in such situations, but SPARQL has a learning curve. I would not call it steep, but the syntax is a little different, and writing effective SPARQL queries entails getting used to thinking in subject-predicate-object triples. Cypher, on the other hand, is a declarative query language, i.e. it focuses on the aspects of the result rather than on methods or ways to get the result. It is also human-readable and expressive.

The first part of the chapter starts with instructions to set up a new Neo4j instance. The Neo4j server can be run as a standalone machine with the client making API calls, or it can be run as an embedded component in an application. For learning purposes, working with a standalone server is the most convenient option, as you have a ready console to test out sample queries. The second part of the chapter introduces a few key elements of Cypher (a short example follows the list), such as

  • () for nodes
  • [] for relations
  • -> for directions
  • -- for matching relationships in either direction (no arrow)
  • Filtering matches via specifying node labels and properties
  • Filtering relationships via specifying relationship labels and properties
  • OPTIONAL MATCH to match optional paths
  • Assigning the entire paths to a variable
  • Passing parameters to Cypher queries
  • Using built in functions such as allShortestPaths
  • Matching paths that connect nodes via a variable number of hops
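
A small composite example touching several of these elements; the labels, relationship types and the $name parameter (newer Neo4j parameter syntax) are my own placeholders:

// Directed, variable-length match with label and property filters,
// an optional path, a bound path variable and a query parameter
MATCH p = (a:Person {name: $name})-[:KNOWS*1..3]->(b:Person)
OPTIONAL MATCH (b)-[:WORKS_AT]->(c:Company)
RETURN p, b.name, c.name;

// Built-in path function between two anchored nodes
MATCH (x:Person {name: 'Ann'}), (y:Person {name: 'Dan'})
MATCH sp = allShortestPaths((x)-[*..6]-(y))
RETURN sp;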

Chapter 2 : Filter, Aggregate and Combine Results

This chapter introduces several Cypher constructs that can be used to extract summary statistics of various nodes and relationships in a graph. The following are the Cypher keywords explained in this chapter (a small combined example follows the list):

  • WHERE for text and value comparisons
  • IN to filter based on certain values
  • "item identifier IN collection WHERE rule" pattern that can be used to work with collections. This pattern is similar to list comprehension in python
  • LIMIT and SKIP for pagination purposes. The pagination examples do not use ORDER BY, which is crucial for obtaining stable paginated results
  • ORDER BY for sorting results
  • COALESCE function to work around null values
  • COUNT(*) and COUNT(property value) – Subtle difference between the two is highlighted
  • aggregation functions like MIN, MAX, AVG
  • COLLECT to gather all the values of properties in a certain path pattern
  • CASE WHEN ELSE pattern for conditional expressions
  • WITH to separate query parts
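
A hedged sketch stringing several of these keywords together on an invented Person/City graph:

// Count adult residents per city, collect their names, bucket the result,
// and paginate the output
MATCH (p:Person)-[:LIVES_IN]->(c:City)
WHERE p.age >= 18 AND c.country IN ['IN', 'SG']
WITH c, count(p) AS residents, collect(p.name) AS names
RETURN coalesce(c.name, 'unknown') AS city,
       residents,
       names,
       CASE WHEN residents > 1000 THEN 'large' ELSE 'small' END AS bucket
ORDER BY residents DESC
SKIP 10 LIMIT 10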

Chapter 3 : Manipulating the Database

This chapter talks about Create, Update and Delete operations on various nodes and relationships. The Cypher keywords explained in the chapter are listed below (a short example follows the list):

  • CREATE used to create nodes, relationships and paths
  • SET for changing properties and labels
  • MERGE to check for an existing pattern and create the pattern if it does not exist in the database
  • ON CREATE SET and ON MATCH SET for setting properties during merge operations
  • REMOVE for removing properties and labels
  • FOREACH pattern to loop through nodes in a path
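
A short, made-up sequence exercising these write clauses (the email values and property names are illustrative only):

// Create the node if absent, setting properties differently on create vs. match
MERGE (p:Person {email: 'ada@example.com'})
  ON CREATE SET p.created = timestamp()
  ON MATCH SET p.lastSeen = timestamp()
SET p.name = 'Ada'
REMOVE p.tempFlag
WITH p
// Loop over the nodes of a path with FOREACH
MATCH path = (p)-[:KNOWS*1..2]->(:Person)
FOREACH (n IN nodes(path) | SET n.visited = true);

// DELETE removes data; with DETACH, a node's relationships go with it
MATCH (old:Person {email: 'gone@example.com'})
DETACH DELETE old;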

By the end of this chapter, any reader should be fairly comfortable in executing CRUD queries. The queries comprise three phases

  1. READ: This is the phase where you read data from the graph using the MATCH and OPTIONAL MATCH clauses
  2. WRITE : This is the phase where you modify the graph using CREATE, MERGE, SET and all other clauses
  3. RETURN : This is the phase where you choose what to return to the caller

Improving Performance

This chapter mentions the following guidelines for creating queries in Neo4j (a short sketch follows the list):

  • Use parametrized queries: Wherever possible, write queries with parameters, which allows the engine to reuse the execution plan of the query. This takes advantage of the fact that the Neo4j engine can cache query plans
  • Avoid unnecessary clauses such as DISTINCT based on the background information of the graph data
  • Use direction wherever possible in match clauses
  • Use a specific depth value while searching for varying length paths
  • Profile queries so that the server does not get inundated by inefficient query construction
  • Whenever there is a large number of nodes with a certain label, it is better to create an index. In fact, while importing a large RDF file it is always better to create indexes on certain types of nodes.
  • Use constraints if you are worried about property redundancy
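
A sketch of a query shaped along these guidelines; the Airport label, ROUTE relationship and parameters are invented, and the index syntax is the pre-4.x form of the book's era:

// Index the label/property used as the query anchor
CREATE INDEX ON :Airport(code);

// Parameterized, directed, depth-bounded traversal; PROFILE exposes the plan
PROFILE
MATCH (a:Airport {code: $origin})-[:ROUTE*1..3]->(b:Airport {code: $dest})
RETURN count(*) AS routes;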

Chapter 4 :  Migrating from SQL

The chapter talks about the various tasks involved in migrating data from an RDBMS to a graph database. There are three main tasks in migrating from SQL to a graph database:

  1. Migrating the schema from RDBMS to Neo4j
  2. Migrating the data from tables to Neo4j
  3. Migrating queries to let your application continue working

It is better to start with an ER diagram that is close to the white-board representation of the data. Since graph databases can represent a white-board model more closely than the table-structure mess (primary keys, foreign keys, cardinality), one can quickly figure out the nodes and relationships needed for the graph data. For migrating the actual data, one needs to export the data into relevant CSV files and load the CSVs into Neo4j. The structure of the various CSV files to be generated depends on the labels, nodes and relationships of the graph database schema. Migrating queries from the RDBMS world into the graph database world is far easier, as Cypher is declarative. It is far quicker to code the various business-requirement queries using Cypher syntax.
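
As a hedged illustration of the query-migration step, a typical customer-orders-products join from the relational schema collapses into a single Cypher pattern; the labels and relationship types here are my own naming:

// "Which products has customer 42 ordered, and how often?" without join tables
MATCH (c:Customer {customerId: 42})-[:PLACED]->(:Order)-[:CONTAINS]->(p:Product)
RETURN p.name, count(*) AS timesOrdered
ORDER BY timesOrdered DESC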

Chapter 5 : Operators and Functions

The last section of the book contains a laundry list of operators and functions that one can use in creating a Cypher query. It is more like a cheat sheet, but with elaborate explanations of the various Cypher keywords.


This book gives a quick introduction to all the relevant keywords needed to construct a Cypher query. In fact it is fair to describe the contents of the book as a long tutorial with sections and subsections that can quickly bring a Cypher novice up to speed.


Git and Github have revolutionized the way one creates, maintains and shares software code. Git is said to be Linus Torvalds’s second gift to the world, the first obviously being the Linux operating system. Nowadays it is common for job seekers to showcase their work in the form of several github repositories so that employers can evaluate them much better. Open source projects are thriving because of easy-to-use, git-based social coding platforms. The popularity of these platforms has grown to such an extent that many non-programmers are using git and github for maintaining version control of their work. Personally, I know two nonfiction writers/journalists who use git to maintain their various documents.

My guess is that the transition from a newbie to a moderately skilled user of github might take a week or two. It is easy to mug up the commands and just use them. But if you want to understand why git does certain things the way it does, it would be a good idea to spend some time understanding each of its main ideas. There are many tutorials, videos, screencasts and podcasts available on the internet, and they may be a good starting point for a newbie. There are also book-length treatments given to Git, Github etc. Out of the massive content that is available to understand Git, I think this book is one of the best introductions, the reasons being:

  • Each main idea is condensed into a one-hour lesson, meaning that someone else has thought through the process of "what’s the ideal chunking necessary to understand git?"
  • Each idea is illustrated via visuals, i.e. the ideas that you glean from the book stick with you longer
  • Easy-to-understand examples throughout the book
  • There is a lab at the end of every chapter. One cannot learn unless one practices, more so when one is completely new to the subject
  • Examples are reused all over the book so that there is some sort of reinforcement of the previous ideas

In this post, I will summarize the main points of each chapter.

Chapter 1 – Before you begin

Firstly, the rationale for the title: each chapter is designed to be read during your lunch hour, though not literally. It basically means that all you need to read any chapter in the book is an hour of your time. In that hour, you should be able to read the text and go through each of the "Try it now" exercises for that chapter. The exercises at the end of each chapter help reinforce its points.

The author suggests a learning path that a reader could follow, i.e. one chapter each day for 20 days. There are no prerequisites to reading this book; any novice can pick up the material covered in it by allocating about 20 hours. My guess is that even if you manage to spend only 14 hours on this book, it should make you conversant with the workings of git and turn you into a moderately skilled git user.

Chapter 2 – An Overview of Git and Version Control

If you write code, you typically have files that you modify and save. You might save a file multiple times, but a particular set of saves might warrant a comment. A reader who isn’t well versed in version control might try to incorporate such comments into the file name itself by suffixing or prefixing it. This way of managing files doesn’t scale well. Hence the need for a version control system.

Every version control system has three concepts

  • Versioning
  • Auditing
  • Branching

The power of Git comes from the following features:

  • Distributed repositories: Each developer has their own repository that they can commit to. There is no problem of taking backups, as every user has a full working copy of the repository. The basic difference between previous version control systems and git is that the latter is a DVCS (distributed version control system). This means that you don’t need to run a Git server to get all its benefits; you don’t even need a network to run Git’s commands. Every developer gets a full version-controlled copy of the repository.
  • Fast branching: One of the ways to keep development and production code distinct is to separate them into two folders, and one has to switch between folders to work on the right code. Git makes branching extremely fast; internally it manages branching with a set of pointers, and there is no need to copy files around. The speed with which you can create a new branch and begin working creates a new model for doing work: if you want to try a new idea in your code, create your own branch. Because you’re working in a local repository, no other developer is disturbed by this new code stream. Your work is safe and isolated.
  • Staging area: There are situations where you want specific code to be used while developing but a sanitized version to be used in production. For example, a username and password could be hard-coded in the development stage but not in production. Git has a concept of staging, where you stage a file but commit a sanitized version of it to the repository.

The author gives a tour of Git via a GUI interface as well as the CLI. Towards the end of the chapter, the author lists the various terms that one comes across while using git:

  • Branch
  • Check out
  • Clone
  • Commit
  • Distributed
  • Repository
  • Staging area
  • Timeline
  • Version Control


Chapter 3 – Getting Oriented with Git

The syntax for using any git command is

git [switches] <command> [<args>]

switches are optional arguments, command is the git command and args are the arguments to the git command.

For example in the following command,

git -p config --global user.name "RK"

In the above command, -p is the switch to paginate the output if needed, config is the git command, and --global, user.name and "RK" are the three arguments.

This chapter introduces basic command line functions that are used to create, remove and rename files and directories. The most important task in this chapter is to make the user set the global user.name and user.email settings. These two values, plus a bunch of other stuff, will be used by git to create the SHA1 for the commit objects.

The following are the commands mentioned in the chapter :

Command   Description
git config –global "Your Name" Add your name to the global configuration
git config –global "Your email"    Add your email to the global configuration
git config –list    Display all the git configurations    
git config Displays the configuration value
git config Displays the configuration value
git help help    Ask git to help about its help system
git help -a    Print all the git available commands
git –paginate help -a    Paginate the display of all the git available commands
git help -g     Print all the git available guides
git help glossary Display the git glossary

Chapter 4 – Making and Using a git repository

This chapter introduces the basics of creating and using a git repository. git init creates a repository on your local machine.

There are two things to keep in mind

  • No server was started
  • The repository was entirely local

The init command creates a special folder called .git, and it contains a host of folders for managing the commit objects, trees, references etc. There is a difference between the working directory and the repository. The working directory is the place where you do your work. The repository is a specialized storage area in which you can save versioned files. The repository lives inside the working directory.

For any file you create in the working directory, you need to make it git aware. This can be done via the git add command. Once git add is run on a file, Git knows about your file and tracks changes to it. But since the file is not committed, there is no time information recorded in git. A good way to imagine adding a file to git is as putting the file in a queue called the staging area. It appears in the repository only after one commits the file with a relevant comment.

The following are the commands mentioned in the chapter :

Command   Description
git init Initialize a git repository in the current directory
git status Display status of current directory, as it relates to Git
git add FILE start tracking FILE in Git; adds FILE to the staging area
git commit -m MSG Commit changes to the git repository, with a message in quotes
git commit -a -m MSG Adds the unstaged files and creates a new commit object
git log     Display the log history
git log --name-status Displays the log with the files that were modified
git ls-files List the files in the repository


Chapter 5 – Using Git with a GUI

This chapter uses a GUI to create/add/commit to the repository. I have used Cola Git Gui to explore the various lessons in this chapter. Towards the end of the chapter, the author touches upon Tcl/Tk. Tcl is a dynamic interpreted language invented in 1988 by John Ousterhout. Tk, a toolkit of GUI controls, was added to the language not long after. Both Git Gui and gitk are written in Tcl/Tk.

Chapter 6 – Tracking and Updating files in Git

The author introduces "staging area" in Git via the following analogy:

Pretend that your code is an actor in a theater production. The dressing room is the Git working directory. The actor gets a costume and makeup all prepared for the upcoming scene. The stage manager calls the actor and says that the scene is just about to start. The actor (the code) doesn’t jump in front of the audience right then. Instead, it waits in an area behind the curtain. This is our Git staging area. Here, the actor might have one last look at the costume or makeup. If the actor (the code) looks good, the stage manager opens the curtains, and the code commits itself to its performance.

Whenever you change anything in the working directory, that change has to be reflected in the staging area. This staging area can be committed to git. The author shows step by step procedure of adding a file to the staging area, committing the file, checking the log messages, figuring out the difference between staged file and the file in the working directory etc.

The following are the commands mentioned in the chapter :

Command   Description
git commit -m "Message" commit changes with the log message entered on the command line via -m switch
git diff Show any difference between tracked files in the current directory and the staging area
git diff --staged Show any difference between the files in the staging area and the repository
git commit -a -m "Message" Perform git add, and perform git commit with the given message
git add --dry-run Show what git add would do
git add . Add all new files to the git repository
git log --shortstat --oneline Show history using one line per commit, and listing each file changed per commit

Chapter 7 – Committing parts of changes

The way to delete a file from the repository is to remove the file from the staging area first and then commit to the repository. If you use the bash command rm, it will only remove the file from the working directory. To remove the file from the staging area, use git rm. This removes the file from the staging area as well as the current directory. The same logic applies to renaming files: use the git mv command to rename files in the staging area as well as the working directory. In a way it might seem like the staging area is pretty redundant. However, it is extremely useful for committing parts of your changes. You can choose the portions of a file that you want to stage by using git add -p filename. This will offer a list of hunks that you can choose to stage or ignore. It took me some time to get used to this functionality. The other aspect covered in the chapter is the commit itself. When to commit? It makes sense to commit to the repository under any of these conditions:

  • Adding or deleting a file
  • Renaming a file
  • Updating a file to a known good working state
  • When you anticipate being away from the work
  • When you introduce some questionable code

However, the author feels that since all the commits are local to the user's machine, it is better to commit as frequently as possible.

The following are the commands mentioned in the chapter :

Command   Description
git rm file Remove file from the staging area
git mv file1 file2 Rename file1 to file2 in the staging area
git add -p Pick parts of your changes to add to staging area
git reset file Reset your staging area, removing any changes you have done with git add
git checkout file Check out the latest committed version of the file in to your working directory

Chapter 8 – The time machine that is Git

Each commit has a unique SHA1 ID associated with it. This code is generated based on the author’s email, the time of the commit, the files in the staging area and the previous commit’s SHA1. The fact that it is based on the previous commit’s SHA1 means you can traverse the entire version tree via the latest commit’s SHA1. No two commit objects will ever share a common SHA1 ID. At the beginning of the project, HEAD and master point to the same version. As you keep committing, master points to the latest commit and so does HEAD. However, once you check out a particular version, HEAD moves back in time to that version. One of the easy ways to refer to SHA1s is by using tags. You can give a particular SHA1 a specific tag that you can use later for a quick checkout.

The following are the commands mentioned in the chapter :

Command   Description
git log --parents Show the history, displaying the parent commit’s SHA1 ID for each commit
git log --parents --abbrev-commit Same as the preceding command, but shorten the SHA1 ID
git log --oneline Display history concisely using one line per each commit
git log --patch Display the history, showing the file differences between each commit
git log --stat Display the history, showing a summary of the file changes between each commit
git log --patch-with-stat Display the history combining patch and stat output
git log --oneline file_one Display the history for file_one
git rev-parse    Translate a branch name or tag into a specific SHA1
git checkout your_sha1id change your working directory to match a specified sha1id       
git tag tag_name -m "message" sha1id create a tag named tag_name, pointing to your sha1id
git tag List all tags
git show tag_name Show information about the tag named tag_name

Chapter 9 – Taking a fork in the road

Branching is one of the most important concepts in git. Typically you start with a master code base and, as time goes on, you keep creating divergent code bases. Each divergent code base could represent a bug fix, an enhancement, a new feature, etc. Each branch has a reference (the branch name) that refers to the latest commit on that branch. There is also a reference named HEAD that refers to the currently checked-out commit. If the checked-out code and the latest commit represent the same set of files, then the branch reference and HEAD point to the same commit object. One often forgets that the SHA1 of every commit object includes the information of its parent object. Once you create branches, one obviously needs to know the commands to

  • switch to another branch
  • list down all the branches
  • create a DAG showing all the branches
  • Difference between the codebase between two branches
  • Creating and checking out a branch in single line of code

The author gradually introduces commands to do all the above. He concludes the chapter after introducing the git stash and git stash pop commands.

The commands mentioned in this chapter are :

Command   Description
git branch List all branches
git branch dev Create a new branch named dev
git checkout dev Change your working directory to the branch named dev
git branch -d master Delete the branch named master
git log --graph --decorate --pretty=oneline --abbrev-commit View history of the repository across all branches
git branch -v    List all branches with SHA1 information
git branch fixing_readme YOUR_SHA1ID Making a branch using YOUR_SHA1ID as the starting point
git checkout -b another_fix_branch fixing_readme Make a branch named another_fix_branch using branch fixing_readme as the starting point
git reflog Show a record of all times you changed branches     
git stash Set the current work in progress to stash, so you can perform a git checkout
git stash list List works in progress that you have stashed away
git stash pop Apply the most recently saved stash to the current working directory, remove it from stash

Chapter 10 – Merging Branches

"Branch often" is the mantra of a git user. In that sense, merging the created branch with the master or any other branch becomes very important. Branching diverges code base and Merging converges code base. Using the pneumonic "traffic merges in to us", the author reinforces the point that git merge command is used to merge other branches in to the branch we are on. A merge results in creating a commit object that has two parent commits. One of the most useful commands to explore the master branch commit structure is

git log --graph --oneline --decorate --all --parents --abbrev-commit

In any merge, there is a possibility of conflicts between the code bases. The conflicts can be resolved by opening the conflicting files, choosing the appropriate hunks, and creating a new merge commit. The author shows the steps to do a git merge via UI tools. The chapter ends with a discussion of the fast-forward merge. This type of merge arises when the target branch is a direct descendant of the branch that it will merge with. Git also has the ability to merge multiple branches at once; the jargon for such a merge is "octopus merge".

The following are the commands mentioned in the chapter :

Command   Description
git diff BRANCH1...BRANCH2 Indicate the difference between BRANCH1 and BRANCH2 relative to when they first became different
git diff --name-status BRANCH1...BRANCH2 Summarize the difference between BRANCH1 and BRANCH2
git merge BRANCH2 Merge BRANCH2 into the current branch that you’re on
git log -1    A shorthand for git log -n 1
git mergetool Open a tool to help perform a merge between two conflicted branches
git merge --abort Abandon a merge between two conflicted branches
git merge-base BRANCH1 BRANCH2    Show the base commit between BRANCH1 and BRANCH2

Chapter 11 – Cloning

When you want to share your code, you can either copy your working directory and send it across or, in the git world, host your repository for others to clone. In the first approach, all your version control is lost: the receiver has no way to track changes that you make in your code after it has been shared. In the second approach, all your version history is intact and anyone can clone your repository to get access to the entire history of commits. The crucial advantage of cloning is that the copy is linked to the original repository, and you can send and receive changes back to the original.

When you clone a repository, the only branch that appears in the clone is the active branch from the original repository, i.e. the branch pointed to by HEAD. When you look at the tracking branches in a repository cloned from another one, you see a naming convention such as remotes/origin/branch_name. For each branch on the remote repository, git creates a reference branch. The remote-tracking branches, like regular branches, point to the last commit of that line of development. Because every commit points to its parent, you can see how you have the entire history. If you want to develop code based on any reference branch, you check it out in the usual way, and git creates a local branch off the remote-tracking branch.

The author introduces bare directory, i.e. a standalone directory that contains only a git repository and nothing else. An important aspect of a bare directory is that it has no reference to the original repository. Unlike a clone, which has a reference to its originating repository, the bare directory is a completely standalone repository. Because of this, and the fact that it has no working directory, bare directories are often the official copy of a repository. The only way to update it is to push to it, and the only way to retrieve its contents is to clone, or pull, from it.

The following are the commands mentioned in the chapter :

Command   Description
git clone source destination_dir Clone the Git repository at source to the destination_dir
git log --oneline --all Display all commit log entries from all branches
git log --simplify-by-decoration --decorate --all --oneline Display the history in a simplified form
git branch --all Show remote-tracking branches in addition to local branches
git clone --bare source destination_dir Clone the bare directory of the source directory into the destination_dir
git ls-tree HEAD Display all the files for HEAD

Chapter 12 – Collaborating with Remotes

This chapter talks about creating references to one or many remote repositories. The remote could be a single repository or multiple repositories, residing anywhere on the network. Once you set up a remote and clone the repository, you are all set to send and receive changes from the remote. The usual name given to a remote repository is "origin", though you can change it to any word that sticks with your mental model.

The following are the commands mentioned in the chapter :

Command   Description
git checkout -f master checkout the master branch, throw away any changes in the current branch
git remote Displays the name of the remote directory
git remote -v show Displays the names of the remotes along with the corresponding URL
git remote add bob ../math.bob Add a remote named bob that points to the local repository ../math.bob
git ls-remote bob Display the references of a remote repository
GIT_TRACE_PACKET=1 git ls-remote REMOTE Display the underlying network interaction

Chapter 13 – Pushing your changes

git push is a command that affects another repository besides your own. Once you are done with the changes in your local repository, you might want to share your code with a remote repository. In the case where the remote repository has not changed, the code can be easily merged via a fast-forward merge. If you get a conflict while pushing code, you need to fix your local repository by pulling changes from the remote and then pushing your code. If you create a new branch in your local repository and then try to push your code, git will crib; you have to use the --set-upstream switch so that git creates a branch on the remote and then pushes the code to it. The author also explains the way to delete branches on the remote. It is a two-step process where you first delete the branch from the local repository and then use a specific syntax to push to the remote, after which the branch on the remote is also deleted. The last section of the chapter talks about pushing and deleting tags on the remote.

The following are the commands mentioned in the chapter :

Command   Description
git push origin master Push the master branch to the remote name origin
git push Push the current branch to the default remote-tracking branch set up by git checkout or git push --set-upstream
git push --set-upstream origin new_branch Create a remote-tracking branch for new_branch on the remote named origin
git config --get-regexp branch List all the git configuration settings that have the word branch in the variable name
git branch -d localbranch Remove the local branch named localbranch
git push origin :remotebranch Remove the branch named remotebranch from the remote named origin
git tag -a TAG_NAME -m TAG_MESSAGE SHA1 create a tag to the sha1 with the name tag_name and the message tag_message
git push origin TAGNAME Push the tag named TAGNAME to the remote named origin
git push --tags Push all the tags to the default remote
git push origin :TAGNAME Delete the tag named TAGNAME on the remote named origin
git tag -d TAGNAME Remove the tag named TAGNAME from the local repository

Chapter 14 – Keeping in sync

The rationale for syncing is simple – git will not allow you to push your code to the remote until your local repository is in sync with the remote. git pull is a two-part operation comprising git fetch and git merge. The first step fetches from the remote repository and makes your remote-tracking branches look like the remote's. This overlays all the commits from the remote repository onto the working repository. The crucial thing to note is the pointer named FETCH_HEAD, which points to the most recently fetched remote-tracking branch. When git merge is done on your working branch, you use the FETCH_HEAD pointer to merge in all the changes of the same branch on the remote.

The following are the commands mentioned in the chapter :

Command   Description
git pull Sync your repository with the repository that you cloned from. This comprises git fetch and git merge
git fetch The first part of git pull . This brings in new commits from the remote repository and updates the remote-tracking branch
git merge FETCH_HEAD Merge the new commits from FETCH_HEAD in to the current branch
git pull --ff-only The --ff-only option will allow a merge only if FETCH_HEAD is a descendant of the current branch

Chapter 15 – Software archaeology

This chapter gives an elaborate explanation of the various switches that go with the git log command. Detailed explanations are also given for understanding gitk view configurations.

The following are the commands mentioned in the chapter :

Command   Description
git log --merges List commits that are a result of merges
git log --oneline FILE List commits that affect FILE
git log --grep=STRING List commits that have STRING in the commit message
git log --since MM/DD/YYYY --until MM/DD/YYYY List commits between two dates
git shortlog Summarizes commits by various authors
git shortlog -e Summarizes commits by various authors, including email
git log --author=AUTHOR List commits by AUTHOR
git log --stat HEAD^..HEAD List the difference between the current checked-out branch and its immediate parent
git branch --column List the branches by column name
git name-rev SHA1 Given a SHA1, it gives the name of the branch
git grep STRING Find all the files with the given STRING
git blame FILE Display blame output for a FILE

Chapter 16 – Understanding git rebase

It is often the case that the checked-out branch that you are working on in the local repository goes out of sync with the remote master because a collaborator has committed to it. If you want to push your branch to the remote, git will crib. One of the ways to deal with this situation is to use git rebase. This command alters the history of your local repository by downloading the remote commits and then replaying your changes as descendants of the downloaded HEAD. The most important reason for using git rebase is to change the starting point of your local branches. In case there is an accidental rebase, one can always use git reflog and reset HEAD to point at the relevant SHA1 ID. The chapter concludes by introducing git cherry-pick, which can copy a specific commit to the current branch.

The following are the commands mentioned in the chapter :

Command   Description
git log --oneline master..new_feature Show the commits that are on new_feature but not on master
git rebase master Rebase your current branch on top of the latest commit of master
git reflog Display the reflog
git reset --hard HEAD@{4} Reset HEAD to point to the SHA1 ID represented by HEAD@{4}
git cherry-pick SHA1 Copy the commit identified by SHA1 onto the current branch you are on

Chapter 17 – Workflows and branching conventions

This chapter discusses the unwritten rules, policies and conventions relating to git usage.

  • Try to keep the Git commit subject under 50 characters
  • It might make sense to limit users who are given rights to push the code
  • Standardize the name of branches
  • Decide whether or not to use git rebase based on whether the history of every commit needs to be maintained
  • Standardize the name of the tags that can be used

The author explains two workflows that are popular amongst git users

  • git-flow : There are two main branches in a git-flow repository. Other branches, such as feature and release branches, are created temporarily and then deleted when finished. The master branch contains released, production-level code; this is what the public can see, perhaps on a deployed website or in some released software that they’ve downloaded from you. The develop branch contains code that is about to be released
  • GitHub flow: There is one master branch that is forever alive. Feature branches are brought into existence whenever required, and once a feature is developed, it is merged into the master branch. Unlike the git-flow workflow, the branches are not deleted in this type of workflow.

The following are the commands mentioned in the chapter :

Command   Description
git commit --allow-empty -m "Initial commit" Create a commit without adding any files
git merge --no-ff BRANCH Merge BRANCH into the current branch, creating a merge commit even if it would be a fast-forward merge
git flow A git command that becomes available after installing gitflow

Chapter 18 – Working with GitHub

GitHub is a service that hosts git repositories. These repositories are typically bare directories, and they contain all the version-control related files and folders. The way to go about creating a GitHub repository is via the UI on the GitHub website. Once the bare directory is created, it is ready to be used. One can add the URL of the bare repository using the git remote add command, and then the rest is the same as communicating with any remote repository. All the commands such as push, fetch, merge and pull remain the same. The power of GitHub lies in widespread collaboration on a single project. If you want to contribute to a GitHub repository XYZ, the first thing you need to do is fork it. A fork creates a replica of XYZ, and this can serve as your own private space to play with the entire repository. You can clone it on to your local machine, hack on it, develop the code, etc. There is one key element that needs to be kept in mind: all the changes that you push to GitHub will only be present in your fork. They will not be reflected in the original XYZ repository unless you send a request to the XYZ owner and the owner accepts your pull request. GitHub has a cool UI that enables any developer to send pull requests to an owner. It also has many features that enable an owner to keep track of various pull requests, maintain a wiki and much more.

The following are the commands mentioned in the chapter :

Command   Description
git remote add github https:/// Add a remote named github that points to your math repo on GitHub
git push -u github master Push your master branch to the remote identified by github
git clone http:/// Clone your GitHub repository named math

Chapter 19 – Third Party Tools and Git

I did not go through this chapter, as I do not foresee, at least in the near future, using the IDE plugins mentioned in the chapter, i.e. Atlassian’s SourceTree and Eclipse IDE integration.

Chapter 20 – Sharpening your Git

This chapter urges the reader to explore the configuration files. There are three levels at which config options can be set: first at the local or repository level, second at the global level, and third at the system level. The switches used to access the three levels are --local, --global and --system. Each configuration option is specified as a name=value pair. The author explains ways to configure various editors with git, such as Notepad++ and nano. The author concludes the chapter by giving some general directions for continually learning git.

The following are the commands mentioned in the chapter :

Command   Description
git config --local --list List the local git configuration
git config --global --list List the global git configuration
git config --system --list List the system git configuration
git -c log.date=relative log -n 2 Show the last two commits using the relative date format
git config --local log.date relative Save the relative date format in the local Git configuration
git config --local --edit Edit the local Git configuration
git config --global --edit Edit the global Git configuration
git config --system --edit Edit the system Git configuration
git -c core.editor=echo config --local --edit Print the path of the local git configuration file
git -c core.editor=nano config --local --edit Edit the local git configuration file using nano
git config core.excludesfile Print the value of the core.excludesfile git configuration setting



This is an excellent book for learning git if you are short on time. Each chapter takes about an hour and, depending on one’s requirements, one can select the relevant chapters, read them, practice the lab exercises and become a moderately skilled git user. Highly recommended for a git newbie.


Books such as these give the visual images that are necessary to make learning stick. It is fair to say that I do not remember much about cell biology or anything related to DNA. It was way back in high school that I crammed something and held it in my working memory for a few years in order to write exams. Some bits would have percolated into my long-term memory, but since I have never retrieved them, they lie somewhere in some inaccessible part of my brain.

In the past two months, I have been exposed to a lot of terminology that is specific to cell biology and genetics. My dad was diagnosed with advanced stage colon cancer and I consulted three of the best oncologists in the city. Each meeting with the doctor lasted about 30-45 minutes. Some of the meetings were overwhelming. One of the doctors, who is known to be the best in the city, threw a lot of jargon at me and explained various scenarios for cancer treatment. Needless to say, I was clueless. Here I was, lucky enough to get a time slot with a leading oncologist, and I was completely lost. The only thing I could think of doing was to rapidly jot down the list of words and phrases he was uttering in the conversation. Subsequently I came back home, read up on each term and understood the various treatment options. Despite spending time understanding the terms, my knowledge about the treatment options was cursory at best. In any case, there were people around me who were far more intelligent and knowledgeable than me, so choosing the right doctor and the treatment schedule became an easy decision.

Amidst the hectic schedule of taking my dad through various chemo cycles, I have read a few books on cancer. As a primer to understanding those books, however, I first read a few genetics/biology 101 books. This book is amongst that preliminary set of books I have read in the past month. To begin with, it has given me a basic collection of visuals that I can use as anchors while reading the general literature. Why do we need visuals? Can’t one just read the stuff and understand it? Well, maybe yes. But most likely, at least for me, it is a NO. My mind needs visuals to understand stuff better. For example, if one were to read the steps involved in creating a protein (a chain of amino acids) from DNA, it goes something like this:

  1. Enzymes in the nucleus create short sequences of mRNA based on DNA
  2. rRNA attaches itself to mRNA
  3. An appropriate tRNA attaches to rRNA based on mRNA
  4. Each tRNA gives rise to an amino acid
  5. Each amino acid so formed, attaches to the previously formed amino acid.
  6. At the end of every DNA encoding protein, there is a specific stop code that makes rRNA detach from the amino acid production line.
  7. A sequence of amino acids thus attached from the previous steps is nothing but one of the many proteins in the cell.

If one has to follow the above sequence of steps, merely reading them might not be sufficient to understand what’s going on. Some sort of pictures would be helpful and the book exactly fills in that void. The authors do a fantastic job of illustrating the above steps so that the visuals form a very sticky cue for further learning.
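
To make the above flow concrete, here is a tiny Python sketch of the translation step (my own illustration, not from the book): it walks an mRNA string codon by codon, looks each codon up in a small, partial codon table, and stops at a stop codon, mirroring steps 3 to 6.

    # Toy translation of an mRNA string into a chain of amino acids.
    # Only a handful of codons are listed here; the real genetic code maps
    # all 64 codons to 20 amino acids plus the stop signals.
    CODON_TABLE = {
        "AUG": "Met",  # start codon, also codes for methionine
        "UUU": "Phe", "UUC": "Phe",
        "GGU": "Gly", "GGC": "Gly",
        "GCU": "Ala", "GCA": "Ala",
        "UAA": "STOP", "UAG": "STOP", "UGA": "STOP",
    }

    def translate(mrna):
        protein = []
        # read the message three bases (one codon) at a time
        for i in range(0, len(mrna) - 2, 3):
            codon = mrna[i:i + 3]
            amino_acid = CODON_TABLE.get(codon, "???")
            if amino_acid == "STOP":      # stop codon: detach and finish
                break
            protein.append(amino_acid)
        return protein

    print(translate("AUGUUUGGUGCUUAA"))   # ['Met', 'Phe', 'Gly', 'Ala']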

Here is a list of terms/concepts/principles covered in the book :

  • Selective breeding
  • Bible story on Jacob’s flock illustrates accurate Genetic observation coupled with total lack of understanding. Science and magic went together
  • Most coherent Greek theory of Heredity (by Hippocrates) : There were fluids inside the bodies of men and women. These fluids battled against each other and the outcome decided whether a particular part of the body resembled the mother’s or the father’s
  • Greek Civilization and the Middle Ages had all sorts of crazy ideas about theories of heredity
    • All inheritance came from father
    • Spontaneous generation – Living organisms could arise from non living matter. This was challenged by Francesco Redi
  • Anton Van Leeuwenhoek used microscope and made two important discoveries. First one was to see bacteria and second one was the discovery of sperm cells
  • William Harvey believed that all animals come from the egg
  • Mammals produce very few eggs; a human female typically releases just one a month
  • Oscar Hertwig’s observation – Fertilization as the union of sperm and egg
  • Plants – male parts are called anthers (contains pollen) and female part is called the stigma
  • No general laws of inheritance were discovered for a very long time
  • Gregor Mendel – Austrian Monk was to discover the laws of inheritance
  • Mendel’s results
    • Hereditary traits are governed by genes which retain their identity in hybrids
    • One form of gene is dominant over another form of gene. But recessive genes will pop up later
    • Each adult organism has two copies of each gene – one from each parent. When pollen or sperm and eggs are produced, they each get one copy
    • Different alleles are sorted out to sperm and egg randomly and independently. All combinations of alleles are equally likely
  • All living beings are made of cells – This fact wasn’t appreciated until late 19th century
  • Mitosis and Meiosis – Types of cell replication
  • Mitosis – Extremely accurate process of creating two cells. The number of chromosomes is the same in both cells
  • Sperm cell and egg cell contain only half a set of chromosomes.
  • In a typical cell, there are 46 chromosomes – 23 pairs
  • Chromosome contains the genetic material
  • Nucleotides – the building blocks for nucleic acids. An individual nucleotide has three components, sugar, phosphate and a base
  • RNA – Nucleotides with ribose
  • DNA – Nucleotides with deoxyribose
  • Proteins – These are chain of amino acids
  • Hemoglobin – One of the most complicated macromolecules. Max Perutz spent 25 years in understanding this protein.
  • Enzymes – These are proteins that take apart or put together other molecules
  • Connection between gene and enzyme – The metabolic role of the genes is to make enzymes, and each gene is responsible for one specific enzyme.
  • RNA – RNAs are single-stranded and much shorter in length (50 to 1000 nucleotides)
  • RNA polymerase – teases apart a region of DNA and creates an RNA copy. This process is called transcription
  • mRNA – messenger RNA
  • tRNA – transfer RNA
  • rRNA – ribosomal RNA
  • Codon – triplets of bases
  • Amino acid – Each 3 base codon stands for an amino acid
  • 64 codons represent 20 amino acids
  • Each DNA encoding protein has a same codon at the beginning – AUG.
  • The stop codon does not encode any amino acid and they signal rRNA to detach the protein formed
  • anticodon – Loop of tRNA has three unpaired bases
  • amino acid site – At the tail end of tRNA is a site for attaching a single amino acid
  • DNA contains sequences encoding for every tRNA, mRNA, rRNA
  • Eucaryotes – Cell with nucleus
  • Procaryotes – Cell with no nucleus
  • Spliceosome – proteins and RNA grabs the mRNA and shears off the loop, discards it, splices the remaining pieces together. This complex is called spliceosome
  • Eucaryotic genes contain Junk DNA
  • Introns – In the middle of perfectly good genes, there may be several meaningless sequences, each hundreds of nucleotides long
  • Protein spools – To help organize all the storage, eucaryotes wrap their DNA around protein spools. Each spool consists of several proteins that are bound together
  • Principle of complementarity – Each base can pair with only one other complementary pair
  • Knowledge about DNA replication in a cell division, is still sketchy
  • Repetitive DNA – Eucaryotic cells harbor lots of so-called repetitive DNA
  • A virus contains only two parts, i.e. a bit of nucleic acid wrapped up in a protein coat. A virus can’t reproduce on its own because it lacks ribosomes and the rest of the living cell’s protein-making equipment
  • Retrovirus – an RNA virus encoding an enzyme that makes a DNA copy of its RNA and splices it into the host chromosome
  • Why are some viral infections incurable? Because the virus genes, once in your own chromosomes, can’t be gotten rid of
  • Hypothesis for Junk DNA – It’s possible that some of the repetitive and junk DNA in our chromosomes came from such ancient viruses
  • Repressive Tolerance – Shut the junk DNA down and ignore it
  • Mutation – A mutation in a gene is just a change in the DNA’s sequence of nucleotides. Even a mistake at just one position can have profound effects
  • Defense against mutation – One amino acid can be encoded by several codons
  • Blood cells illustrate another common fact of life – One kind of a cell can turn in to another kind of cell
  • Alleles – Genes in a plant can be one of two distinct types or Alleles
  • Principle of Independent Assortment – The Alleles of one gene sort out independently of the alleles of another
  • Homologous – Two copies of each cell that resemble each other, having the same shape
  • Phenotype – how an organism looks
  • Genotype – what alleles an organism has
  • Homozygous – An organism is homozygous with respect to a given gene if its alleles are the same
  • Heterozygous– An organism is heterozygous with respect to a given gene if its alleles are different
  • Haploid – A cell with a single set of chromosomes
  • Diploid – A cell with two sets of chromosomes
  • Operon – Cluster of genes, encoding related enzymes and regulated together is called an operon
  • Promoter region – At the start of Operon, there is a site where RNA polymerase binds to the DNA to begin transcribing the message
  • Attenuation – Shortage of certain types of molecules turns on the gene
  • Jumping Genes – A method of gene regulation
  • Transposons – Movable section of genes
  • Crossover – During Meiosis, chromosomes can exchange genes
  • Gene splicing – Splice two pieces of DNA together
  • Recombinant DNA – The result of splicing two DNA’s together
  • Restriction Enzyme – Gene splicing depends on this enzyme. It creates two pieces of DNA with identical tails
  • Proteins can be produced via Recombinant DNA
  • Gene therapy – Fixing specific defects
  • Genetic engineering

There is a visual for each of the above concepts/mechanisms. If you are curious about the basic ideas of genetics, this book can be a useful starting point. If nothing else, it will give you visual cues for reading and understanding the general literature on genetics.


Ani Pema Chodron, the author of the book, gave a commencement address to the 2014 graduating class of Naropa University in Boulder, Colorado. She did so to keep a promise to her granddaughter, who was among the graduating class. The speech went viral on the net and this book is an offshoot of it. It contains the full text of the speech and a Q&A session. The title of the speech, and hence the book, is inspired by a quote from Samuel Beckett.

The author says that she had received one of the best pieces of advice, while learning how to teach.

I was being taught how to teach, as many of us were. And the instructions I received were to prepare well, know your subject, and then go in there with no note cards. Honestly, that is the best advice for life: no note cards. Just prepare well and know what you want to do. Give it your best, but you really don’t have a clue what’s going to happen. And note cards have limited usefulness.

The crux of the speech is the author’s take on how to fail. Most of the strategies that we adopt when faced with failure fall into two categories: we either blame it on something external or become excessively critical of ourselves. How does one handle failure?

The author says,

We move away from the rawness, of holding the rawness of vulnerability in our heart, by blaming it on the other. Getting curious about outer circumstances and how they are impacting you, noticing what words come out and what your internal discussion is, this is the key.

So sometimes you can take rawness and vulnerability and turn it into creative poetry, writing, dance, music, song. Artists have done this from the beginning of time. Turn it into something that communicates to other people, and out of this raw and vulnerable space, communication really happens.

It’s in that space-when we aren’t masking ourselves or trying to make circumstances go away-that our best qualities begin to shine.


This book gives a macro picture of machine learning. In this post, I will briefly summarize the main points of the book. One can think of this post as a meta-summary, as the book itself is a summary of all the main areas of machine learning.



Machine learning is all around us, embedded in the technologies and devices that we use in our daily lives. It is so integrated with our lives that we often do not even pause to appreciate its power. Well, whether we appreciate it or not, there are companies harnessing the power of ML and profiting from it. So the question arises: do we need to care about ML at all? When a technology becomes this pervasive, you need to understand it a bit. You can’t control what you don’t understand. Hence, at least from that perspective, having a general overview of the technologies involved matters.

Machine learning is something new under the sun: a technology that builds itself. The artifact in ML is referred to as a learning algorithm. Humans have always been designing artifacts, whether they are hand built or mass produced. But learning algorithms are artifacts that design other artifacts. A learning algorithm is like a master craftsman: every one of its productions is different and exquisitely tailored to the customer’s needs. But instead of turning stone into masonry or gold into jewelry, learners turn data into algorithms. And the more data they have, the more intricate the algorithms can be.

At its core, Machine learning is about prediction: predicting what we want, the results of our actions, how to achieve our goals, how the world will change.

The author says that he has two goals in writing this book:

  • Provide a conceptual model of the field, i.e. rough knowledge so that you can use it effectively. There are many learning algorithms out there and many are being invented every year. The book provides an overview of learning algorithms by categorizing the people who use them. The author calls each category a tribe. Each tribe has its own master algorithm for prediction
    • Symbolists: They view learning as the inverse of deduction and take ideas from philosophy, psychology, and logic.
      • Master algorithm for this tribe is inverse deduction
    • Connectionists: They reverse engineer the brain and are inspired by neuroscience and physics.
      • Master algorithm for this tribe is backpropagation
    • Evolutionaries: They simulate evolution on the computer and draw on genetics and evolutionary biology.
      • Master algorithm for this tribe is genetic programming
    • Bayesians: They believe that learning is a form of probabilistic inference and have their roots in statistics.
      • Master algorithm for this tribe is Bayesian inference
    • Analogizers: They learn by extrapolating from similarity judgments and are influenced by psychology and mathematical optimization.
      • Master algorithm for this tribe is Support vector machines.

In practice, each tribe’s master algorithm is good for some types of problems but not for others. What we really want is a single algorithm combining the key features of all of them: The Master Algorithm.

  • Enable the reader to invent the master algorithm. A layman, approaching the forest from a distance, is in some ways better placed than the specialist, already deeply immersed in the study of particular trees. The author suggests that the reader pay attention to each tribe and get a general overview of what it does and what tools it uses. By viewing each tribe as a piece of a puzzle, it becomes easier to get conceptual clarity on the field as a whole.

The Machine Learning Revolution

An algorithm is a sequence of instructions telling a computer what to do. The simplest algorithm is: flip a switch. The second simplest algorithm is: combine two bits. The idea connecting transistors and reasoning was understood by Shannon, and his master’s thesis led to a whole scientific discipline – information theory. Computers are all about logic: flipping sets of transistors is all that any algorithm does. However, behind this benign activity, some of the most powerful algorithms go about doing their work by using preexisting algorithms as building blocks. So, if computing power increases, do all the algorithms automatically become efficient and all-powerful? Not necessarily. The serpent in the garden goes by the name "complexity monster", and there are many heads to this monster.

  • Space complexity : the number of bits of info that an algo needs to store in the computer’s memory
  • Time complexity : how long the algo takes to run
  • Human complexity : when algos become too complex, humans cannot control them and any errors in the algo execution causes panic.

Every algorithm has an input and an output – the data goes into the computer, the algorithm does its job and gives out the result. Machine learning turns this around: in goes the data and the desired result, and out comes the algorithm that turns one into the other. Learning algorithms – learners – are algorithms that make other algorithms. With machine learning, computers write their own programs. The author uses a nice low-tech analogy to explain the power of machine learning:

Humans have known a way to get some of the things they need by letting nature make them. In farming, we plant the seeds, make sure they have enough water and nutrients, and reap the grown crops. The promise of machine learning is that the technology can mirror farming. Learning algorithms are the seeds, data is the soil, and the learned programs are the grown plants. The machine learning expert is like a farmer, sowing the seeds, irrigating and fertilizing the soil, and keeping an eye on the health of the crop but otherwise staying out of the way.

This analogy makes two things immediate. First, the more data we have, the more we can learn. Second, ML is a sword that can slay the complexity monster. If one looks at any learning algorithm, it does broadly one of two things: it either learns knowledge, in terms of statistical models, or learns the procedures that underlie a skill. The author offers another analogy from the information ecosystem. He likens databases, crawlers, indexers and so on to herbivores, patiently munging on endless fields of data. Statistical algorithms and online analytical processing are the predators, and learning algorithms are the super-predators. Predators turn data into information, and super-predators turn information into knowledge.
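
As a toy illustration of "in goes the data and the desired result, and out comes the algorithm", here is a sketch using scikit-learn (my choice of library, with made-up data; nothing here is prescribed by the book): we hand the learner a few labelled examples and get back a fitted program that classifies new inputs.

    # A learner takes data plus desired outputs and returns a program (a fitted model).
    # scikit-learn is assumed to be installed; the features and labels are invented.
    from sklearn.neighbors import KNeighborsClassifier

    # each row: [number of exclamation marks, number of suspicious words] in an email
    X = [[0, 0], [1, 0], [0, 1], [5, 4], [7, 3], [6, 6]]
    y = [0, 0, 0, 1, 1, 1]   # 0 = not spam, 1 = spam (toy labels)

    learner = KNeighborsClassifier(n_neighbors=3)
    model = learner.fit(X, y)                 # data + desired results go in ...

    print(model.predict([[6, 5], [0, 1]]))    # ... and out comes a program that labels new emails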

Should one spend years in the computer science discipline to become an ML expert? Not necessarily. CS makes you think deterministically, whereas ML needs probabilistic thinking. This difference in thinking is a large part of why Microsoft has had a lot more trouble catching up with Google than it did with Netscape. A browser is just a standard piece of software, but a search engine requires a different mind-set.

If you look at all the fantastic stuff behind e-commerce applications, it is all about matchmaking: producers of information are being connected to consumers of information. In this context, learning algorithms are the matchmakers. The arsenal of learning algorithms will serve as a key differentiator between companies. Data is indeed the new oil.

The chapter ends with a discussion of how political campaigning is being revolutionized by ML. With truckloads of data at the individual voter level, politicians are turning to ML experts to help them with campaign analysis and directed targeting. The author predicts that in the future, machine learning will cause more elections to be close; what he means is that learning algorithms will be the ultimate retail politicians.


The Master Algorithm

Algorithms such as nearest neighbor, decision trees and naive Bayes are all domain agnostic. This warrants a question: "Can there be one learner that does everything, across all domains?" If you estimate a parameter in, let’s say, a frequentist way and subsequently check it with Bayesian inference, both approaches give almost the same answer when there is a LOT of data. Under a deluge of data, you will most certainly get a convergence of parameter estimates, irrespective of the model used. However, the reality is that data is scarce, which necessitates making assumptions. Depending on the type of problem and the type of assumption, certain classes of learning models are better than the rest. If one speculates on the existence of a master algorithm, then the assumptions also need to go in as input. This chapter explores the idea of a master algorithm that works from input data, output data and the assumptions. The author grandly states the central hypothesis of the book as

All knowledge-past, present, and future-can be derived from data by a single, universal learning algorithm.

The author speculates on the existence of a master algorithm and says that arguments from many fields hint at the presence of one.

The argument from Neuroscience : The evidence from the brain suggests that it uses the same learning algorithm, throughout, with the areas dedicated to the different senses distinguished only by the different inputs they are connected to. In turn, the associative areas acquire their function by being connected to multiple sensory regions, and the "executive" areas acquire theirs by connecting the associative areas and motor output.

If you look at any part of the cortex in a human brain, you find that the wiring pattern is similar. The cortex is organized into six layers, with feedback loops, short-range inhibitory connections and long-range excitatory connections. This pattern is common across the brain; there is some variation, but the structure is pretty much the same. The analogy is that it is the same algorithm with different parameters and settings. Low-level motor skills are controlled by the cerebellum, which has a clearly different and regular architecture; however, experiments have shown that the cortex can take over the cerebellum’s functions. So, this again suggests that there is some "master algorithm" controlling the entire brain.

If you look at a set of learning algos in the ML field, you can infer that at some level they are trying to reverse engineer the brain’s function. One of the five ML tribes, Connectionists, believe in this way of modeling the world.

The argument from Evolution : If one looks at the evolution of species since the beginning of the earth, one can think of natural selection, or whatever nature does, as an algorithm. This algorithm has all the species as inputs and the species at any given point in time as the output. It does the work of eliminating certain species, allowing certain species to mutate, and so on. It is a dynamic process where the outputs are again fed back as inputs. This line of thought makes one speculate on the presence of a master algorithm. In fact, one of the five tribes in the ML world, the Evolutionaries, strongly believes in this way of modeling.

The argument from Physics : Most of physics is driven by simple equations that prune away all the noise in the data and focus on the underlying beauty. Laws of physics discovered in one domain are seamlessly applied to other domains. If everything we see in nature can be explained by a few simple laws, then it makes sense that a single algorithm can induce all that can be induced. All the Master Algorithm has to do is provide a shortcut to the laws’ consequences, replacing impossibly long mathematical derivations with much shorter ones based on actual observations. Another way to look at scientific disciplines is to think of their various laws and states as outcomes of a dynamic optimization problem. Physics, however, is unique in its simplicity; outside physics, mathematics is only reasonably effective.

Machine learning is what you get when the unreasonable effectiveness of mathematics meets the unreasonable effectiveness of data.

The argument from Statistics : Bayesians look at the world with a probabilistic and learning mindset. Bayes’ rule is a recipe for turning data into knowledge. In yesteryears, Bayesian methods were confined to simple applications. However, with the rise of computing power, they are now being applied to a wide range of complex modeling situations. Is Bayes the master algorithm? Well, there seem to be many critics of the Bayesian approach to modeling. All said and done, it definitely appears that Bayesian inference will be a part of the "Master Algorithm" in some way.

The argument from computer science : The author mentions the famous unsolved problem in computer science, P vs. NP. The NP-complete problems are a set of problems that are equivalent in their computational hardness: if you solve one of them, you solve the rest. For decades, mathematicians, researchers and scientists have been finding clever tricks to almost solve NP-complete problems. But the fundamental problem still eludes us – is the class of problems that we can efficiently solve the same as the class of problems for which we can efficiently check whether a given solution is correct? If you read up on this problem, you will realize that all NP-complete problems can be reduced to the satisfiability problem. If we invent a learner that can learn to solve satisfiability, it has a good claim to being the Master Algorithm.

NP-completeness aside, the sheer fact that a computer can do a gazillion tasks should make one confident about speculating on the presence of a master algorithm that does the job across several problems. The author uses the example of the Turing machine and says that, back then, it was unthinkable to actually see a Turing machine in action. A Turing machine can solve every conceivable problem that can be solved by logical deduction. The fact that we see these machines everywhere means that, despite the odds, we might see a Master Algorithm sometime in the future.

The Master Algorithm is for induction, the process of learning, what the Turing machine is for deduction. It can learn to simulate any other algorithm by reading examples of its input-output behavior. Just as there are many models of computation equivalent to a Turing machine, there are probably many different equivalent formulations of a universal learner. The point, however, is to find the first such formulation, just as Turing found the first formulation of the general-purpose computer.

Interesting analogy : The author is of the opinion that "human intuition" can’t replace data. There have been many instances where human intuition has gone terribly wrong and a person with lots of data has done better. The author uses Brahe’s, Kepler’s and Newton’s work to draw a parallel to machine learning.

Science goes through three phases, which we can call the Brahe, Kepler, and Newton phases. In the Brahe phase, we gather lots of data, like Tycho Brahe patiently recording the positions of the planets night after night, year after year. In the Kepler phase, we fit empirical laws to the data, like Kepler did to the planets’ motions. In the Newton phase, we discover the deeper truths. Most science consists of Brahe-and-Kepler-like work; Newton moments are rare. Today, big data does the work of billions of Brahes, and machine learning the work of millions of Keplers. If-let’s hope so-there are more Newton moments to be had, they are as likely to come from tomorrow’s learning algorithms as from tomorrow’s even more overwhelmed scientists, or at least from a combination of the two.

Critics of Master Algo : Well, for a concept as ambitious as Master Algo, there are bound to be critics and there are many. The author mentions a few of them as examples,

  • Knowledge engineers
  • Marvin Minsky
  • Noam Chomsky
  • Jerry Fodor

Hedgehog or Fox : Another question that comes up when we think of the "Master Algorithm" is whether it is a fox or a hedgehog. Many studies have shown that being a fox is far better than being a hedgehog. The hedgehog is synonymous with an "expert". In the context of this book, though, a learning algorithm can be considered a "hedgehog" if variations of it can solve all learning problems. The author hopes that the "Master Algorithm" turns out to be a hedgehog.

Five tribes of ML : In the quest for the master algorithm, we do not have to start from scratch. There are already many decades of ML research behind us, and each research community is akin to a tribe. The author describes five tribes, i.e. symbolists, connectionists, Bayesians, evolutionaries and analogizers. Each of the tribes uses its own master algorithm. Here is an illustration from the author’s presentation



But the real master algo that the author is hinting at is an algo that combines all the features of the tribes.

The most important feature of each tribe is that its members firmly believe that theirs is the only way to model and predict. Unfortunately, this thinking hinders their ability to model a broad set of problems. For example, a Bayesian would find it extremely difficult to leave the probabilistic inference method and look at the problem from an evolutionary point of view; his thinking is forged from priors, posteriors and likelihood functions. If a Bayesian were to look at an evolutionary algorithm like a genetic algorithm, he might not critically analyze it and adapt it to the problem at hand. This limitation is prevalent across all tribes. Analogizers love support vector machines, but these are limited because they look for similarities of inputs across various dimensions, i.e. they are bound to be hit by the curse of dimensionality. The same serpent, the "curse of dimensionality" that the author talks about in the previous chapters, comes and bites each tribe, depending on the type of problem being solved.

The obvious question that arises in a reader’s mind is: can a combination of tribes come together to solve a specific set of problems? Indeed, the tribe categorization is not a hard categorization of the algorithms. It is just meant as a starting point so that you can place the gamut of algorithms in separate buckets.


Hume’s problem of Induction

The chapter starts with a discussion of "Rationalism vs. Empiricism". The rationalist likes to plan everything in advance before making the first move; the empiricist prefers to try things and see how they turn out. There are philosophers who strongly believe in one and not the other. From a practical standpoint, there have been productive contributions to our world from both camps. David Hume is considered to be one of the greatest empiricists of all time. In the context of machine learning, one of his questions has hung like a sword of Damocles over all of knowledge, which is:

How can we ever be justified in generalizing from what we’ve seen to what we haven’t?

The author uses a simple example where you have to decide whether or not to ask someone out for a date. The dataset used in the example illustrates Hume’s problem of induction, i.e. there is no reason to pick one generalization over another. So a safe way out of the problem, at least to begin with, is to assume that the future will be like the past. Is this enough? Not really. In the ML context, the real problem is: how do we generalize to cases that we haven’t seen before? One might think that by amassing huge datasets you can solve this problem. However, once you do the math, you realize that you will never have enough data to cover all the cases needed to carry the inductive argument safely. Each new data point is most likely unique, and you have no choice but to generalize. According to Hume, there is no way to do it.

If this all sounds a bit abstract, suppose you’re a major e-mail provider, and you need to label each incoming e-mail as spam or not spam. You may have a database of a trillion past e-mails, each already labeled as spam or not, but that won’t save you, since the chances that every new e-mail will be an exact copy of a previous one are just about zero. You have no choice but to try to figure out at a more general level what distinguishes spam from non-spam. And, according to Hume, there’s no way to do that.

The "no free lunch" theorem : If you have been reading some general articles in the media on ML and big data, it is likely that you would have come across a view on the following lines:

With enough data, ML can churn out the best learning algo. You don’t have to have strong priors, the fact that you have large data is going to give you all the power to understand and model the world.

The author introduces David Wolpert’s "no free lunch" theorem, which puts a limit on how good a learning algorithm can be. The theorem says that, averaged over all possible worlds, no learner can be better than random guessing. Are you surprised by this theorem? Here is how one can reconcile oneself to it:

Pick your favorite learner. For every world where it does better than random guessing, I, the devil’s advocate, will deviously construct one where it does worse by the same amount. All I have to do is flip the labels of all unseen instances. Since the labels of the observed ones agree, there’s no way your learner can distinguish between the world and the antiworld. On average over the two, it’s as good as random guessing. And therefore, on average over all possible worlds, pairing each world with its antiworld, your learner is equivalent to flipping coins.

How does one escape the random-guessing limit? Just care about the world we live in and don’t care about alternate worlds. If we know something about the world and incorporate it into our learner, it now has an advantage over random guessing. What are the implications of the "no free lunch" theorem for our modeling world?

There’s no such thing as learning without knowledge. Data alone is not enough. Starting from scratch will only get you to scratch. Machine learning is a kind of knowledge pump: we can use it to extract a lot of knowledge from data, but first we have to prime the pump.

Unwritten rule of Machine learning : The author states that the principle laid out by Newton in his work, "Principia", serves as the first unwritten rule of ML:

Whatever is true of everything we’ve seen is true of everything in the universe.

Newton’s principle is only the first step, however. We still need to figure out what is true of everything we’ve seen – how to extract the regularities from the raw data. The standard solution is to assume we know the form of the truth, and the learner’s job is to flesh it out. One of the ways to think about creating a form is via "conjunctive concepts", i.e. a series of statements with AND as the bridge. The problem with conjunctive concepts is that they are practically useless; the real world is driven by "disjunctive concepts", i.e. concepts defined by a set of rules. One of the pioneers in this approach of discovering rules was Ryszard Michalski, a Polish computer scientist. After immigrating to the United States in 1970, he went on to found the symbolist school of machine learning, along with Tom Mitchell and Jaime Carbonell.
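
A minimal sketch of the distinction, with an invented spam example of my own: a conjunctive concept is a single AND of conditions, whereas a disjunctive concept is a set of rules, any one of which may fire.

    # A conjunctive concept: every condition must hold (a single AND).
    def conjunctive_spam(email):
        return ("free" in email) and ("money" in email) and ("!!!" in email)

    # A disjunctive concept: a set of rules; matching any rule is enough (an OR of ANDs).
    spam_rules = [
        lambda e: "free" in e and "money" in e,
        lambda e: "lottery" in e,
        lambda e: "click here" in e and "!!!" in e,
    ]

    def disjunctive_spam(email):
        return any(rule(email) for rule in spam_rules)

    print(conjunctive_spam("free money"))   # False: misses the '!!!' condition
    print(disjunctive_spam("free money"))   # True: the first rule fires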

Overfitting and Underfitting : The author uses the words "blindness" and "hallucination" to describe underfitting and overfitting models. By using a ton of hypotheses, you can almost certainly overfit the data. On the other hand, by being sparse in your hypothesis set, you can fail to see the true patterns in the data. This classic problem is kept in check by out-of-sample testing. Is it good enough? Well, that’s the best that is available without going into the muddy philosophical debates or alternative pessimistic approaches like that of Leslie Valiant (author of Probably Approximately Correct).
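
Here is a small numpy sketch of blindness versus hallucination, using made-up data of my own: a degree-1 polynomial underfits a curved signal, a high-degree polynomial fits the training points very closely, and it is the held-out points that expose the difference.

    import numpy as np

    rng = np.random.default_rng(0)
    x = np.linspace(0, 1, 30)
    y = np.sin(2 * np.pi * x) + 0.2 * rng.standard_normal(x.size)   # noisy curved signal (made up)

    test = np.arange(x.size) % 3 == 0        # hold out every third point
    train = ~test

    def errors(degree):
        coeffs = np.polyfit(x[train], y[train], degree)   # fit on the training points only
        pred = np.polyval(coeffs, x)
        return (np.mean((pred[train] - y[train]) ** 2),
                np.mean((pred[test] - y[test]) ** 2))

    for degree in (1, 3, 9):
        tr, te = errors(degree)
        print(f"degree {degree}: train error {tr:.3f}, test error {te:.3f}")
    # Degree 1 is 'blind' (both errors stay high); degree 9 tends to 'hallucinate'
    # (very low train error, noticeably higher test error); the held-out points expose it.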

Induction as inverse of deduction : Symbolists work via the induction route and formulate an elaborate set of rules. Since this route is computationally intensive for large datasets, the symbolists prefer something like decision trees. Decision trees can be viewed as an answer to the question of what to do if the rules of more than one concept match an instance: how do we decide which concept the instance belongs to?

Decision trees are used in many different fields. In machine learning, they grew out of work in psychology. Earl Hunt and colleagues used them in the 1960s to model how humans acquire new concepts, and one of Hunt’s graduate students, J. Ross Quinlan, later tried using them for chess. His original goal was to predict the outcome of king-rook versus king-knight endgames from the board positions. From those humble beginnings, decision trees have grown to be, according to surveys, the most widely used machine-learning algorithm. It’s not hard to see why: they’re easy to understand, fast to learn, and usually quite accurate without too much tweaking. Quinlan is the most prominent researcher in the symbolist school. An unflappable, down-to-earth Australian, he made decision trees the gold standard in classification by dint of relentlessly improving them year after year, and writing beautifully clear papers about them. Whatever you want to predict, there’s a good chance someone has used a decision tree for it.
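
To get a feel for why decision trees are considered easy to understand, here is a scikit-learn sketch on a toy dataset of my own (the features and labels are invented): the learned tree can be printed as a set of human-readable rules.

    # Fit a small decision tree on toy data and print the rules it learned.
    # scikit-learn is assumed; the 'temperature'/'is_windy' features are invented.
    from sklearn.tree import DecisionTreeClassifier, export_text

    # features: [temperature in C, is_windy (0/1)], label: 1 = play outside, 0 = stay in
    X = [[30, 0], [28, 1], [12, 1], [10, 0], [25, 0], [8, 1]]
    y = [1, 1, 0, 0, 1, 0]

    tree = DecisionTreeClassifier(max_depth=2).fit(X, y)
    print(export_text(tree, feature_names=["temperature", "is_windy"]))
    print(tree.predict([[27, 1]]))   # a warm, windy day -> predicted 1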

The Symbolists : The symbolists’ core belief is that all intelligence can be reduced to manipulating symbols. A mathematician solves equations by moving symbols around and replacing symbols by other symbols according to predefined rules. The same is true of a logician carrying out deductions. According to this hypothesis, intelligence is independent of the substrate.

Symbolist machine learning is an offshoot of the knowledge engineering school of AI. The use of computers to automatically learn the rules made the work of pioneers like Ryszard Michalski, Tom Mitchell, and Ross Quinlan extremely popular, and since then the field has exploded.

What are the shortcomings of inverse deduction?

  • The number of possible inductions is vast, and unless we stay close to our initial knowledge, it’s easy to get lost in space
  • Inverse deduction is easily confused by noise
  • Real concepts can seldom be concisely defined by a set of rules. They’re not black and white: there’s a large gray area between, say, spam and nonspam. They require weighing and accumulating weak evidence until a clear picture emerges. Diagnosing an illness involves giving more weight to some symptoms than others, and being OK with incomplete evidence. No one has ever succeeded in learning a set of rules that will recognize a cat by looking at the pixels in an image, and probably no one ever will.

An interesting example of a success from the Symbolists is Eve, the robot scientist that discovered a potential malaria drug. There was a flurry of excitement a year ago when an article titled "Robot Scientist Discovers Potential Malaria Drug" was published in Scientific American. This is the kind of learning that Symbolists are gung-ho about.


How does your brain learn ?

This chapter covers the second of the five tribes mentioned in the book, the "Connectionists". Connectionists are highly critical of the way Symbolists work, as they think that describing something via a set of rules is just the tip of the iceberg. There is a lot more going on under the surface that formal reasoning can’t see. Let’s say you come across the word "love": Symbolists would associate a rule with such a concept, whereas Connectionists would associate various parts of the brain with it. In a sense, there is no one-to-one correspondence between a concept and a symbol; instead, the correspondence is many-to-many. Each concept is represented by many neurons, and each neuron participates in representing many different concepts. Hebb’s rule is the cornerstone of the connectionists. In a non-math way, it says that "neurons that fire together wire together". The other big difference between Symbolists and Connectionists is that the former tribe believes in sequential processing whereas the latter believes in parallel processing.

To get a basic understanding of the key algorithms used by connectionists, it helps to have a bit of understanding of the way a neuron is structured in our brain. Here is a visual that I picked up from the author’s presentation :


The branches of the neuron connect to others via synapses, and basic learning takes place through these synaptic connections. The first formal model of a neuron was proposed by Warren McCulloch and Walter Pitts in 1943. It looked a lot like the logic gates computers are made of; the problem was that it did not learn. It was Frank Rosenblatt who came up with the first model of learning, the perceptron, by giving variable weights to the connections between neurons. The following is a good schematic diagram of the perceptron:


This model generated a lot of excitement, and ML received a lot of funding for various research projects. However, this excitement was short-lived. Marvin Minsky and a few others published many examples where the perceptron failed to learn. One of the simplest, and most damaging, examples that the perceptron could not learn was the XOR operator. The critique, published in Minsky and Papert’s book Perceptrons, was mathematically unimpeachable, searing in its clarity, and disastrous in its effects. Machine learning at the time was associated mainly with neural networks, and most researchers (not to mention funders) concluded that the only way to build an intelligent system was to explicitly program it. For the next fifteen years, knowledge engineering would hold center stage, and machine learning seemed to have been consigned to the ash heap of history.
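
The XOR limitation is easy to reproduce. Below is a from-scratch sketch of Rosenblatt's learning rule (the data, learning rate and epoch count are my choices): the perceptron finds weights for the linearly separable AND function, but it can never get XOR completely right, since no single straight line separates XOR's classes.

    # Rosenblatt-style perceptron: nudge the weights whenever a prediction is wrong.
    def train_perceptron(samples, epochs=50, lr=0.1):
        w = [0.0, 0.0]
        b = 0.0
        for _ in range(epochs):
            for (x1, x2), target in samples:
                pred = 1 if w[0] * x1 + w[1] * x2 + b > 0 else 0
                error = target - pred
                w[0] += lr * error * x1
                w[1] += lr * error * x2
                b += lr * error
        return w, b

    def accuracy(samples, w, b):
        hits = sum((1 if w[0] * x1 + w[1] * x2 + b > 0 else 0) == t
                   for (x1, x2), t in samples)
        return hits / len(samples)

    AND = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
    XOR = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]

    for name, data in [("AND", AND), ("XOR", XOR)]:
        w, b = train_perceptron(data)
        print(name, "accuracy:", accuracy(data, w, b))
    # AND is learned perfectly; XOR can never reach 100% with a single linear unit.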

Fast forward to John Hopfield’s work on spin glasses: there was a reincarnation of the perceptron.

Hopfield noticed an interesting similarity between spin glasses and neural networks: an electron’s spin responds to the behavior of its neighbors much like a neuron does. In the electron’s case, it flips up if the weighted sum of the neighbors exceeds a threshold and flips (or stays) down otherwise. Inspired by this, he defined a type of neural network that evolves over time in the same way that a spin glass does and postulated that the network’s minimum energy states are its memories. Each such state has a "basin of attraction" of initial states that converge to it, and in this way the network can do pattern recognition: for example, if one of the memories is the pattern of black-and-white pixels formed by the digit nine and the network sees a distorted nine, it will converge to the "ideal" one and thereby recognize it. Suddenly, a vast body of physical theory was applicable to machine learning, and a flood of statistical physicists poured into the field, helping it break out of the local minimum it had been stuck in.

The author goes on to describe the "Sigmoid" function and its ubiquitous nature. If you think about the curve for some time, you will find it everywhere. I think the first time I came across this function was in Charles Handy’s book, "The Age of Paradox". Sigmoid functions in that book are used to describe various types of phenomena that show a slow rate of increase in the beginning, then a sudden explosive rate of increase, and finally a levelling off. Basically, if you take the first derivative of the Sigmoid function, you get the classic bell curve. I think the book "The Age of Paradox" had a chapter with some heavy management gyan that went something like – "you need to create another Sigmoid curve in your life before the older Sigmoid curve starts a downfall" or something to that effect. I don’t quite recollect the exact idea from Charles Handy’s book, but there is a blog post by Bret Simmons, titled The Road to Davy’s Bar, that goes into related details.

Well, in the context of ML, the application of the Sigmoid curve is more practical. It can be used to replace the step function, and suddenly things become more tractable. A single neuron can only learn a straight line, but a set of neurons, i.e. a multi-layer perceptron, can learn more convoluted curves. Agreed, there is a curse of dimensionality here, but if you think about it, the hyperspace explosion is a double-edged sword: on the one hand, the objective function is far more wiggly, but on the other hand, there is less chance that you will get stuck at a local minimum via gradient search methods. With this Sigmoid-and-multi-layer tweak, the perceptron came back with a vengeance. There was a ton of excitement, just like the time when the perceptron was introduced. The algorithm by which the learning takes place is called "backpropagation", a term that is analogous to how human brains work. The algorithm was popularized by David Rumelhart and his colleagues in 1986, and it is a variant of the gradient descent method. There is no mathematical proof that backpropagation will find the global minimum/maximum, though. Backprop solves what the author calls the "credit assignment" problem: in a multi-layered perceptron, the error between the target value and the current output needs to be propagated backward across all the layers, and this error assignment to each of the layers is exactly what backprop does.

Whenever the learner’s "retina" sees a new image, that signal propagates forward through the network until it produces an output. Comparing this output with the desired one yields an error signal, which then propagates back through the layers until it reaches the retina. Based on this returning signal and on the inputs it had received during the forward pass, each neuron adjusts its weights. As the network sees more and more images of your grandmother and other people, the weights gradually converge to values that let it discriminate between the two.
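
To see the forward pass, the returning error signal and the weight adjustments in code, here is a minimal numpy sketch of backpropagation on a 2-4-1 sigmoid network (my own toy setup, not the author's): with these particular settings it typically learns XOR, the very function a single perceptron cannot.

    import numpy as np

    rng = np.random.default_rng(0)
    X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
    y = np.array([[0], [1], [1], [0]], dtype=float)        # XOR targets

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    # 2 inputs -> 4 hidden sigmoid units -> 1 sigmoid output
    W1 = rng.normal(0, 1, (2, 4)); b1 = np.zeros((1, 4))
    W2 = rng.normal(0, 1, (4, 1)); b2 = np.zeros((1, 1))
    lr = 0.5

    for _ in range(20000):
        # forward pass: the signal flows from the inputs to the output
        h = sigmoid(X @ W1 + b1)
        out = sigmoid(h @ W2 + b2)

        # backward pass: the error signal is propagated back, layer by layer
        d_out = (out - y) * out * (1 - out)      # credit assigned to the output unit
        d_h = (d_out @ W2.T) * h * (1 - h)       # credit assigned to each hidden unit

        # gradient-descent style weight adjustments
        W2 -= lr * h.T @ d_out;  b2 -= lr * d_out.sum(axis=0, keepdims=True)
        W1 -= lr * X.T @ d_h;    b1 -= lr * d_h.sum(axis=0, keepdims=True)

    print(np.round(out.ravel(), 2))   # with these settings, usually close to [0, 1, 1, 0]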

Sadly, the excitement petered out, as learning in networks with many hidden layers was computationally difficult. In recent years, though, backpropagation has made a comeback thanks to huge computing power and big data. It now goes by the name "Deep Learning". The key idea of deep learning is based on autoencoders, which the author explains very well. However, there are many things that need to be worked out for deep learning to be anywhere close to the Master Algorithm. All said and done, there are a few limitations to exclusively following the connectionist tribe. Firstly, the learned model is difficult to comprehend; it comprises convoluted connections between various neurons. The other limitation is that the approach is not compositional, meaning it is divorced from the way a big part of human cognition works.


Evolution : Nature’s Learning Algorithm

The chapter starts with the story of John Holland, the first person to have earned a PhD in computer science, in 1959. Holland is known for his immense contribution to genetic algorithms. His key insight lay in coming up with a fitness function that assigns a score to every candidate program considered. What is the role of the fitness function? Starting with a population of not-very-fit individuals – possibly completely random ones – the genetic algorithm has to come up with variations that can then be selected according to fitness. How does nature do that? This is where the genetic part of the algorithm comes in. In the same way that DNA encodes an organism as a sequence of base pairs, we can encode a program as a string of bits, and variations are produced by crossover and mutation. The next breakthrough in the field came from Holland’s student John Koza, who came up with the idea of evolving full-blown computer programs, i.e. genetic programming.

Genetic programming’s first success, in 1995, was in designing electronic circuits. Starting with a pile of electronic components such as transistors, resistors, and capacitors, Koza’s system reinvented a previously patented design for a low-pass filter, a circuit that can be used for things like enhancing the bass on a dance-music track. Since then he’s made a sport of reinventing patented devices, turning them out by the dozen. The next milestone came in 2005, when the US Patent and Trademark Office awarded a patent to a genetically designed factory optimization system. If the Turing test had been to fool a patent examiner instead of a conversationalist, then January 25, 2005, would have been a date for the history books. Koza’s confidence stands out even in a field not known for its shrinking violets. He sees genetic programming as an invention machine, a silicon Edison for the twenty-first century.

A great mystery in genetic programming that is yet to be solved conclusively is the role of crossover. None of Holland’s theoretical results show that crossover actually helps; mutation suffices to exponentially increase the frequency of the fittest schemas in the population over time. There were other problems with genetic programming as well, which finally made the ML community at large divorce itself from this tribe.
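
Here is a bare-bones genetic algorithm sketch on a toy problem of my own choosing (maximize the number of 1 bits in a bit string): fitness scores the individuals, the fitter half are selected as parents, and crossover plus mutation produce the variations for the next generation.

    import random

    random.seed(42)
    GENOME_LEN, POP_SIZE, GENERATIONS = 20, 30, 40

    def fitness(genome):                      # toy fitness: count the 1 bits
        return sum(genome)

    def crossover(a, b):                      # single-point crossover
        point = random.randint(1, GENOME_LEN - 1)
        return a[:point] + b[point:]

    def mutate(genome, rate=0.02):            # flip each bit with a small probability
        return [1 - g if random.random() < rate else g for g in genome]

    population = [[random.randint(0, 1) for _ in range(GENOME_LEN)] for _ in range(POP_SIZE)]

    for gen in range(GENERATIONS):
        # selection: keep the fitter half as parents
        population.sort(key=fitness, reverse=True)
        parents = population[:POP_SIZE // 2]
        # variation: children produced by crossover plus mutation
        children = [mutate(crossover(random.choice(parents), random.choice(parents)))
                    for _ in range(POP_SIZE - len(parents))]
        population = parents + children

    print("best fitness:", fitness(max(population, key=fitness)), "out of", GENOME_LEN)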

Evolutionaries and connectionists have something important in common: they both design learning algorithms inspired by nature. But then they part ways. Evolutionaries focus on learning structure; to them, fine-tuning an evolved structure by optimizing parameters is of secondary importance. In contrast, connectionists prefer to take a simple, hand-coded structure with lots of connections and let weight learning do all the work. This is machine learning’s version of the nature versus nurture controversy. As in the nature versus nurture debate, neither side has the whole answer; the key is figuring out how to combine the two. The Master Algorithm is neither genetic programming nor backprop, but it has to include the key elements of both: structure learning and weight learning. So, is this it ? Have we stumbled on to the right path for "Master Algorithm" ? Not quite. There are tons of problems with evolutionary algos. Symbolists and Bayesians do not believe in emulating nature. Rather, they want to figure out from first principles what learners should do. If we want to learn to diagnose cancer, for example, it’s not enough to say "this is how nature learns; let’s do the same." There’s too much at stake. Errors cost lives. Symbolists dominated the first few decades of cognitive psychology. In the 1980s and 1990s, connectionists held sway, but now Bayesians are on the rise.


In the Church of the Reverend Bayes

Persi Diaconis, in his paper titled "The Markov Chain Monte Carlo Revolution", says that MCMC, a technique that came from the Bayesian tribe, has revolutionized applied mathematics. Indeed, thanks to high-performance computing, Bayes is now a standard tool in any number cruncher’s toolkit. This chapter talks about various types of Bayesian techniques. The basic idea behind Bayes is that it is a systematic and quantified way of updating degrees of belief in the light of new data. You can cast pretty much any problem, irrespective of the size of the data available, into a Bayesian inference problem. Bayes’ theorem often goes by the name "inverse probability" because in real life we know Pr(effect|cause) and we are looking to compute Pr(cause|effect). Bayes’ theorem as a foundation for statistics and machine learning is bedeviled not just by computational difficulty but also by extreme controversy. The main point of conflict between Bayesians and non-Bayesians is the reliance on subjective priors to "turn the Bayesian crank". Using subjective estimates as probabilities is considered a sin by frequentists, for whom everything should be learned from the data.
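
As a worked example of inverse probability, here is a small calculation with made-up numbers: suppose 20 percent of email is spam, the word "free" appears in 60 percent of spam and in 5 percent of legitimate mail, and we want Pr(spam | "free").

    # Bayes' rule with made-up numbers: Pr(spam | "free")
    p_spam = 0.20                  # prior probability of spam
    p_free_given_spam = 0.60       # Pr(effect | cause)
    p_free_given_ham = 0.05

    # total probability of seeing the word "free"
    p_free = p_free_given_spam * p_spam + p_free_given_ham * (1 - p_spam)

    # inverse probability: Pr(cause | effect)
    p_spam_given_free = p_free_given_spam * p_spam / p_free
    print(round(p_spam_given_free, 3))   # 0.75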

One of the most common variants of Bayesian models is the "Naive Bayes" model, where all the effects are assumed to be independent of one another given the cause. Even though this assumption sounds extremely crazy, there are a ton of areas where Naive Bayes beats sophisticated models. No one is sure who invented the Naive Bayes algorithm. It was mentioned without attribution in a 1973 pattern recognition textbook, but it only took off in the 1990s, when researchers noticed that, surprisingly, it was often more accurate than much more sophisticated learners. Also, if you reflect a bit, you will realize that Naive Bayes is closely related to the Perceptron algorithm.
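A minimal Naive Bayes sketch, assuming word-count features and a tiny invented "spam/ham" dataset; it is only meant to show the independence assumption and the add-one smoothing trick, not a production classifier.

```python
import math
from collections import defaultdict

# A minimal Naive Bayes text classifier: the words (observed effects) are
# assumed independent of one another given the class (the cause). The tiny
# "spam/ham" dataset is invented purely for illustration.

train = [("buy cheap pills now", "spam"),
         ("cheap flights buy now", "spam"),
         ("meeting schedule for monday", "ham"),
         ("lunch on monday ?", "ham")]

class_counts = defaultdict(int)
word_counts = defaultdict(lambda: defaultdict(int))
vocab = set()
for text, label in train:
    class_counts[label] += 1
    for word in text.split():
        word_counts[label][word] += 1
        vocab.add(word)

def predict(text):
    best_label, best_score = None, float("-inf")
    for label in class_counts:
        score = math.log(class_counts[label] / len(train))      # log prior
        total = sum(word_counts[label].values())
        for word in text.split():
            # Add-one (Laplace) smoothing so unseen words don't zero out the product.
            score += math.log((word_counts[label][word] + 1) / (total + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label

print(predict("cheap pills"))      # -> spam
print(predict("monday meeting"))   # -> ham
```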

The author mentions Markov models as the next step in the evolution of Bayes models. Markov models apply to sequences of random variables in which each variable depends only on the current state and is conditionally independent of the rest of its history. Markov chains turn up everywhere and are one of the most intensively studied topics in mathematics, but they’re still a very limited kind of probabilistic model. A more complicated model is the hidden Markov model, where we don’t get to see the actual states but have to infer them from the observations. A continuous version of the HMM goes under the name "Kalman filter", which has been used in many applications across domains.
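A minimal sketch of a first-order Markov chain, with invented weather transition probabilities; the closing comment hints at what changes once the states become hidden.

```python
import random

# A toy first-order Markov chain: tomorrow's weather depends only on today's,
# not on the rest of the history. The transition probabilities are invented
# for illustration.

transitions = {
    "sunny": {"sunny": 0.8, "rainy": 0.2},
    "rainy": {"sunny": 0.4, "rainy": 0.6},
}

def sample_chain(start, n_days):
    state, path = start, [start]
    for _ in range(n_days - 1):
        state = random.choices(list(transitions[state]),
                               weights=transitions[state].values())[0]
        path.append(state)
    return path

print(sample_chain("sunny", 10))
# In a hidden Markov model we would not observe these states directly, only
# noisy emissions (say, "umbrella seen / not seen"), and would have to infer
# the hidden weather sequence from them.
```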

Naive Bayes, Markov models, and hidden Markov models are all good, but they are a far cry from what the Symbolists can express. The next breakthrough came from Judea Pearl, who invented Bayesian networks. These allow one to specify complex dependencies among random variables. By defining the conditional independence of a variable given a set of neighboring nodes, Bayesian networks tame the combinatorial explosion and make inference tractable. Basically, a Bayesian network can be thought of as a "generative model", a recipe for probabilistically generating a state of the world. Despite the complex nature of a Bayesian net, the author mentions that techniques have been developed to successfully infer various aspects of the network. In this context, the author mentions MCMC and gives an intuitive explanation of the technique. A common misconception is that MCMC is a simulation technique. Far from it: the procedure does not simulate any real process; rather, it is an efficient way to generate samples from a Bayesian network. Inference in Bayesian networks is not limited to computing probabilities; it also includes finding the most probable explanation for the evidence. The author uses the "poster child" example of inferring the probability of heads from coin tosses to illustrate the Bayesian technique and compare it with the Frequentist world of inference.
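To make the coin-toss "poster child" concrete, here is a minimal Metropolis sampler (the simplest flavour of MCMC) for the posterior over the coin's bias, assuming a uniform prior and a small invented set of tosses. It is a sketch of the general idea, not the specific inference algorithms discussed in the book.

```python
import math
import random

# Infer a coin's probability of heads from a few tosses. A frequentist reports
# the raw frequency; a Bayesian puts a prior on the bias and samples the
# posterior. The data and proposal width are invented for illustration.

tosses = [1, 1, 1, 0, 1]          # 4 heads, 1 tail
heads, n = sum(tosses), len(tosses)

def log_posterior(p):
    # Uniform prior on (0, 1), Bernoulli likelihood.
    if not 0 < p < 1:
        return float("-inf")
    return heads * math.log(p) + (n - heads) * math.log(1 - p)

samples, p = [], 0.5
for _ in range(20000):
    proposal = p + random.gauss(0, 0.1)                       # random-walk proposal
    accept_prob = math.exp(min(0.0, log_posterior(proposal) - log_posterior(p)))
    if random.random() < accept_prob:
        p = proposal                                          # accept the move
    samples.append(p)

print("frequentist estimate:", heads / n)                               # 0.8
print("posterior mean:", sum(samples[1000:]) / len(samples[1000:]))     # ~0.71, pulled toward the prior
```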

The next set of models that came to dominate the Bayesian tribe is Markov Networks. A Markov network is a set of features and corresponding weights, which together define a probability distribution. Like Bayesian networks, Markov networks can be represented by graphs, but they have undirected arcs instead of arrows. Markov networks are a staple in many areas, such as computer vision. There are many who feel that Markov networks are far better than Naive Bayes, HMMs etc., as they can capture the influence from surroundings.
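A minimal sketch of the "features plus weights define a distribution" idea, over just two binary variables; the features and weights are invented, and the normalising constant is computed by brute force since there are only four states.

```python
import math
from itertools import product

# A tiny Markov network: features and weights give each state an unnormalised
# score, and a state's probability is its exponentiated score divided by the
# sum over all states. Features and weights are invented for illustration.

def features(x1, x2):
    # One feature rewarding agreement between neighbours, one rewarding x1 = 1.
    return [1.0 if x1 == x2 else 0.0, float(x1)]

weights = [1.5, 0.5]

def score(state):
    return sum(w * f for w, f in zip(weights, features(*state)))

states = list(product([0, 1], repeat=2))
Z = sum(math.exp(score(s)) for s in states)      # normalising constant
for s in states:
    print(s, round(math.exp(score(s)) / Z, 3))
# States where the two variables agree get higher probability -- the
# "influence from surroundings" that Markov networks capture.
```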

Bayesians and symbolists agree that prior assumptions are inevitable, but they differ in the kinds of prior knowledge they allow. For Bayesians, knowledge goes in the prior distribution over the structure and parameters of the model. In principle, the parameter prior could be anything we please, but ironically, Bayesians tend to choose uninformative priors (like assigning the same probability to all hypotheses) because they’re easier to compute with. For structure, Bayesian networks provide an intuitive way to incorporate knowledge: draw an arrow from A to B if you think that A directly causes B. But symbolists are much more flexible: you can provide as prior knowledge to your learner anything you can encode in logic, and practically anything can be encoded in logic, provided it’s black and white.

Clearly, we need both logic and probability. Curing cancer is a good example. A Bayesian network can model a single aspect of how cells function, like gene regulation or protein folding, but only logic can put all the pieces together into a coherent picture. On the other hand, logic can’t deal with incomplete or noisy information, which is pervasive in experimental biology, but Bayesian networks can handle it with aplomb. Combining connectionism and evolutionism was fairly easy: just evolve the network structure and learn the parameters by backpropagation. But unifying logic and probability is a much harder problem.


You are what you resemble

The author introduces the techniques of the "Analogizers" tribe. This tribe uses similarities among data points to categorize them into distinct classes. In some sense, we all learn by analogy. Every example that illustrates an abstract concept is like an analogy. We learn by relating the similarity between two concepts and then figuring out what else one can infer based on the fact that the two concepts are similar.

The chapter begins by explaining the most popular algorithm of the tribe, the "nearest neighbor algorithm". This was invented way back in 1951 by Evelyn Fix and Joe Hodges. The inventors faced massive difficulty in publishing their algorithm. However, the fact that the algo remained unpublished did not deter other researchers, who went on to develop variants of it such as the "k-nearest neighbor" method, the "weighted k-nearest neighbor" method, etc. It was in 1967 that Tom Cover and Peter Hart proved that, given enough data, nearest-neighbor is at worst only twice as error-prone as the best imaginable classifier. This was a momentous revelation. Up until then, all known classifiers assumed that the frontier had a very specific form, typically a straight line. This was a double-edged sword: on the one hand, it made proofs of correctness possible, as in the case of the perceptron, but it also meant that the classifier was strictly limited in what it could learn. Nearest-neighbor was the first algorithm in history that could take advantage of unlimited amounts of data to learn arbitrarily complex concepts. No human being could hope to trace the frontiers it forms in hyperspace from millions of examples, but because of Cover and Hart’s proof, we know that they’re probably not far off the mark.

Is the nearest neighbor algo the master algorithm ? It isn’t, because of the curse of dimensionality. As the number of dimensions of the covariates goes up, the efficiency of the NN algo goes down. In fact, the curse of dimensionality is the second most important stumbling block in machine learning, over-fitting being the first one. There are certain techniques to handle the dimension explosion, but most of them are hacks and there is no guarantee that they will work.
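A minimal k-nearest-neighbour sketch on an invented 2-D dataset, with a closing comment on why the vote stops working as the dimensions pile up.

```python
import math
from collections import Counter

# A minimal k-nearest-neighbour classifier: to label a new point, look at the
# k closest training points and take a majority vote. The tiny 2-D dataset is
# invented for illustration.

train = [((1.0, 1.0), "A"), ((1.2, 0.8), "A"), ((0.9, 1.1), "A"),
         ((4.0, 4.2), "B"), ((4.1, 3.9), "B"), ((3.8, 4.0), "B")]

def knn_predict(x, k=3):
    nearest = sorted(train, key=lambda item: math.dist(x, item[0]))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

print(knn_predict((1.1, 0.9)))  # -> "A"
print(knn_predict((4.0, 4.0)))  # -> "B"
# The curse of dimensionality: as the number of coordinates grows, distances
# between random points become nearly indistinguishable, and the vote above
# stops being informative.
```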

Subsequently, the author introduces Support Vector Machines (SVM), which have become the most popular technique used by Analogizers. I loved the way the author describes this technique in plain, simple English. He asks the reader to visualize a fat serpent that moves between two countries that are at war. The story of the serpent incorporates pretty much all the math that is needed to understand support vectors (a rough sketch of how these pieces fit together appears a little further below), i.e.

  • kernel for SVM
  • support vectors
  • weight of the support vectors
  • constrained optimization
  • maximizing the margin of the classifier

My guess is that one would understand the math far more easily after reading through this section on SVMs. SVMs have many advantages and the author highlights most of them. Books such as these also help us verbalize mathematical ideas in simple words. For example, if you were to explain the difference between constrained optimization and unconstrained optimization to a taxi driver, how would you do it? Read this book to check whether your explanation is better than what the author provides.
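Here is the promised sketch of how the SVM pieces fit together: a kernel, a handful of support vectors, their weights, and the resulting decision function. The support vectors, coefficients and bias below are simply made up for illustration; in a real SVM they come out of the margin-maximising constrained optimisation.

```python
import math

# The moving parts of a trained SVM, wired together by hand. In a real SVM
# the alphas and bias come out of the constrained optimisation that maximises
# the margin; here they are invented to show how the decision function works.

def rbf_kernel(x, z, gamma=0.5):
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, z)))

support_vectors = [(1.0, 1.0), (1.5, 0.5), (4.0, 4.0)]   # the points nearest the frontier
alphas          = [0.7, 0.3, -1.0]                        # weight times class label (+1 / -1)
bias            = 0.1

def decision(x):
    # Sum of kernel similarities to each support vector, weighted by alpha.
    return sum(a * rbf_kernel(sv, x) for sv, a in zip(support_vectors, alphas)) + bias

for point in [(1.2, 0.9), (3.8, 4.1)]:
    print(point, "+1" if decision(point) > 0 else "-1")
```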

Towards the end of the chapter, the author talks about case-based reasoning and says that in the years to come, analogical reasoning will become so powerful that it will sweep through all the fields where case-based reasoning is still employed.


Learning without a teacher

Unlike the previous chapters, which focused on labeled data, this chapter is about learning from unlabeled data, i.e. unsupervised learning. Cognitive scientists describe theories of child learning using algos, and machine learning researchers have developed techniques based on them. The author explains the k-means algorithm, a popular clustering technique. It is actually a special case of the Expectation Maximization (EM) algorithm, which was invented by three Harvard statisticians. EM is used in a ton of places. To learn hidden Markov models, we alternate between inferring the hidden states and estimating the transition and observation probabilities based on them. Whenever we want to learn a statistical model but are missing some crucial information (e.g., the classes of the examples), we can use EM. Once you have clusters at the macro level, nothing stops you from running the same algo within each cluster to come up with sub-clusters, and so on.
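A minimal k-means sketch on an invented 2-D dataset, written so that the EM-style alternation (assign points, then re-estimate centroids) is visible.

```python
import random

# Minimal k-means: alternately assign each point to its nearest centroid and
# move each centroid to the mean of its assigned points -- a special case of
# the EM alternation described above. Data and k are invented for illustration.

data = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (4.0, 4.2), (4.1, 3.9), (3.8, 4.0)]
k = 2

centroids = random.sample(data, k)
for _ in range(10):
    # "E-step": assign every point to its closest centroid.
    clusters = [[] for _ in range(k)]
    for p in data:
        idx = min(range(k),
                  key=lambda i: (p[0] - centroids[i][0]) ** 2 + (p[1] - centroids[i][1]) ** 2)
        clusters[idx].append(p)
    # "M-step": recompute each centroid as the mean of its cluster.
    centroids = [(sum(x for x, _ in c) / len(c), sum(y for _, y in c) / len(c)) if c else centroids[i]
                 for i, c in enumerate(clusters)]

print(centroids)  # roughly one centroid near (1, 1) and one near (4, 4)
```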

Subsequently, the author introduces another popular technique for unsupervised learning, PCA, which is used for dimensionality reduction. PCA tries to come up with linear combinations of the various dimensions in the hyperspace so that the variance of the data captured along the chosen directions is maximized. A step up from this algo is "Isomap", a nonlinear dimensionality reduction technique. It connects each data point in a high-dimensional space (a face, say) to all nearby points (very similar faces), computes the shortest distances between all pairs of points along the resulting network, and finds the reduced coordinates that best approximate these distances.
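A minimal PCA sketch via the eigenvectors of the covariance matrix, run on synthetic correlated data; the "~0.99 variance explained" figure is specific to this invented example.

```python
import numpy as np

# Minimal PCA: project centred data onto the directions of largest variance,
# found as eigenvectors of the covariance matrix. The 2-D data is synthetic;
# the same recipe works in any number of dimensions.

rng = np.random.default_rng(0)
x = rng.normal(size=200)
data = np.column_stack([x, 2 * x + rng.normal(scale=0.3, size=200)])  # strongly correlated

centred = data - data.mean(axis=0)
cov = np.cov(centred, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(cov)           # eigh: the covariance matrix is symmetric
order = np.argsort(eigvals)[::-1]                # largest variance first
components = eigvecs[:, order]

projected = centred @ components[:, :1]          # keep only the first principal component
print("variance explained:", eigvals[order][0] / eigvals.sum())  # ~0.99 for this data
```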

After introducing clustering and dimensionality reduction techniques, the author talks about "reinforcement learning", a technique in which the learner relies on the environment's feedback to its actions. Research on reinforcement learning started in earnest in the early 1980s, with the work of Rich Sutton and Andy Barto at the University of Massachusetts. They felt that learning depends crucially on interacting with the environment, but supervised algorithms didn’t capture this, and they found inspiration instead in the psychology of animal learning. Sutton went on to become the leading proponent of reinforcement learning. Another key step happened in 1989, when Chris Watkins at Cambridge, initially motivated by his experimental observations of children’s learning, arrived at the modern formulation of reinforcement learning as optimal control in an unknown environment. A recent example of a successful startup that combines neural networks and reinforcement learning is DeepMind, a company that was acquired by Google for half a billion dollars.
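A minimal Q-learning sketch on an invented five-cell corridor; the update rule is the textbook one, but the environment, reward and constants are toy choices, not anything from the book.

```python
import random

# Minimal Q-learning on a toy corridor: five cells, start at cell 0, reward of
# +1 for reaching cell 4. Environment and constants are invented; the update
# rule is the standard one.

n_states, actions = 5, [-1, +1]            # move left or move right
alpha, gamma, epsilon = 0.5, 0.9, 0.2      # learning rate, discount, exploration
Q = {(s, a): 0.0 for s in range(n_states) for a in actions}

for _ in range(300):
    s = 0
    while s != n_states - 1:
        # Epsilon-greedy: mostly exploit current Q values, occasionally explore.
        if random.random() < epsilon:
            a = random.choice(actions)
        else:
            a = max(actions, key=lambda act: Q[(s, act)])
        s_next = min(max(s + a, 0), n_states - 1)
        reward = 1.0 if s_next == n_states - 1 else 0.0
        # Q-learning update: nudge Q(s, a) toward reward + discounted best future value.
        best_next = max(Q[(s_next, act)] for act in actions)
        Q[(s, a)] += alpha * (reward + gamma * best_next - Q[(s, a)])
        s = s_next

# Values grow as we get closer to the goal: the delayed reward has been
# propagated backward through the updates.
print([round(max(Q[(s, act)] for act in actions), 2) for s in range(n_states - 1)])
```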

Another algorithm that has the potential to be a part of the "Master Algorithm" is chunking. Chunking remains a preeminent example of a learning algorithm inspired by psychology. The author gives a basic outline of this concept. Chunking and reinforcement learning are not as widely used in business as supervised learning, clustering, or dimensionality reduction, but a simpler type of learning by interacting with the environment is: A/B testing. The chapter ends with the author explaining another potentially killer algo, "relational learning".


The Pieces of the Puzzle Fall into Place

Progress in science comes from unifying theories; two or more seemingly disparate observations turn out to be driven by the same logic or law. If one looks at ML, the Master Algorithm appears akin to a unifying theory in science. It would unify the master algorithms of all the tribes, and all their techniques, into one cohesive way to learn from data.

In fact, there is already a class of techniques called "meta learning" that some of the tribes use within their methods. For example, bagging, random forests and boosting are some of the famous meta learning techniques used by Symbolists. Bayesians have something called "model averaging", which considers each model as a hypothesis and combines their votes, weighting each model by its score. Meta learning in its current avatar is remarkably successful, but it’s not a very deep way to combine models. It’s also expensive, requiring as it does many runs of learning, and the combined models can be quite opaque.
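A minimal bagging sketch: the same weak learner (a one-threshold "stump") is trained on bootstrap resamples of an invented 1-D dataset and the resulting models vote. It shows the flavour of meta learning in general, not any particular method from the book.

```python
import random
from collections import Counter

# Minimal bagging: train a weak "decision stump" on several bootstrap
# resamples of the data, then let the stumps vote. The 1-D dataset is
# invented for illustration (class 0 below 1.0, class 1 above).

data = [(x / 10, 0) for x in range(10)] + [(x / 10, 1) for x in range(10, 20)]

def train_stump(sample):
    # Pick the threshold that makes the fewest mistakes on this resample.
    best_t, best_err = None, float("inf")
    for t, _ in sample:
        err = sum((x >= t) != bool(y) for x, y in sample)
        if err < best_err:
            best_t, best_err = t, err
    return best_t

stumps = [train_stump(random.choices(data, k=len(data))) for _ in range(25)]

def ensemble_predict(x):
    votes = Counter(int(x >= t) for t in stumps)
    return votes.most_common(1)[0][0]

print(ensemble_predict(0.3), ensemble_predict(1.6))  # -> 0 1
```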

The author used the following schematic diagram for each of the tribes, while explaining the rationale of a possible “Master Algorithm”


He then takes the reader through a tour of each tribe’s philosophy and master algorithm, and comes up with a unifier, called "Alchemy", which he calls the "Master Algorithm". In the process of creating this master algorithm, he introduces Markov Logic Networks and says that they serve to represent the problem. Alchemy uses posterior probability as the evaluation function, and genetic search coupled with gradient descent as the optimizer. The author is wary about Alchemy’s immediate applicability and says that a ton of research is yet to be done before it can become a true Master Algorithm, i.e., one that has the capability to solve hard problems.

This chapter is a little more involved, as it tries to connect all the ideas from the previous eight chapters and introduces a way to combine the various pieces of the puzzle to create a "Master Algorithm". The chapter will also be very interesting for an aspiring ML researcher who is trying to pick a focus area.


This is the World on Machine Learning

The last chapter of the book discusses a world in which the "Master Algorithm" is all pervasive. The author speculates on answers to the following questions :

  • Will humans be replaced by machines ?
  • What do you want the learners of the world to know about you ?
  • How good a model of you can a learner have ?
  • How will ecommerce shopping experience be ?
  • Will there be a rise of "humanities" disciplines after the automation of most non-human-related tasks ?
  • What will the current online dating sites morph into ?
  • How will the Amazons, Netflixes, Googles of the world change ?
  • What will be the privacy issues in a society where most of the transactions and activities involve, one algo talking to another algo?
  • Will future wars be fought by robots ?
  • Will robot-warfare be viable ?
  • Will AI and Master Algorithm take over the world ?

The author ends the book by saying,

Natural learning itself has gone through three phases: evolution, the brain, and culture. Each is a product of the previous one, and each learns faster. Machine learning is the logical next stage of this progression. Computer programs are the fastest replicators on Earth: copying them takes only a fraction of a second. But creating them is slow, if it has to be done by humans. Machine learning removes that bottleneck, leaving a final one: the speed at which humans can absorb change.

Takeaway :

This book is a pop-science book for Machine Learning. ML has reached a point where it is not just for geeks anymore. Everyone needs to know about it; everyone needs to at least have a conceptual model of the field, as it has become all pervasive. Having said that, it would take time to plod through the book if you are a complete newbie to ML. This book is massively appealing to someone who has a cursory knowledge of a few ML techniques and wants to have a 10,000 ft. view of the entire field’s past, present and future. The future, as the title of the book states, would be the invention of a master algorithm that unifies methods across all the five tribes.



I stumbled onto this book a few weeks ago and, after a quick browse through its sections, promptly placed it in my books-to-read list. I love anything related to information theory, mainly because of its inter-disciplinary applications. The principles of information theory are applicable in a wide range of fields. In fact, it would be hard to pinpoint a specific area where concepts from information theory have not been applied. In this post, I will summarize the main points of the book.


Prologue : The Eternal War

The chapter is titled so because there is an eternal conflict between entropy and information. Entropy is the incessant march towards disorder. One of the ways I can relate to it is through my music practice. If I don’t practice my music for long, I find it difficult to retrain my fingers and get back my muscle memory. "That which you don’t use atrophies." Entropy is something similar. In the absence of any mechanisms to create information, the disorder of the system increases. This obviously raises a question about the mechanisms that allow information to battle randomness and grow. The book is mainly about describing the mechanisms by which information grows and the physical order of our world increases – what makes our planet unique, rich and uneven, from atoms to economies. The author focuses on planet earth as this is a special place where information lives, grows and hides in an otherwise mostly barren universe.

In the prologue, the author says that the book would answer the following questions:

  • What is Information ?
  • Where does it come from ?
  • Why is information concentrated on our planet?
  • Why does it grow on our planet ?
  • What are the natural, social and economic mechanisms that allow it to grow ?
  • How do the various mechanisms contribute to social and economic unevenness of the global economy ?
  • How does the social accumulation of information improve our capacity to accumulate even more information?

Introduction : From Atoms to People to Economies

The chapter starts with the story of Ludwig Boltzmann, the famous scientist who committed suicide. Though the exact reason is not known, the author speculates that it could have been the apparent conflict between his theory and the order prevalent in the world. His theory was that there is always a march towards disorder, which stumped him because there were so many fascinating things in nature that were orderly and systematic, almost giving the impression that there was a creator up there designing our world. The biggest sin that Ludwig committed, given the scientific temper of his time, was that he had worked across spatial scales. His theory made connections between atoms and gases, which belong to different spatial scales. At that point in time, any connection between different spatial scales was considered a sin.

At the turn of the twentieth century, Ludwig was vindicated. There was immense cross-fertilization of ideas amongst many fields. Yet not all of the cross-fertilization took place near known scientific boundaries. Amid these multidisciplinary tangos, there was one concept that was promiscuous enough to play the field: the idea of information. In the twentieth century, the study of information was inspired by war, as there was an urgent need to encode and decode messages effectively. The field took off after the revolutionary work of Claude Shannon and Warren Weaver. Information as a concept found its followers in almost every field for the simple reason that it could be applied to microscopic as well as macroscopic worlds. It was the first truly scale-independent concept. Even though the idea of information grew in prominence, many began to forget one crucial aspect of information:

We forget about the physicality of information that had troubled Boltzmann. The word information became a synonym for the ethereal, the unphysical, the digital, the weightless, the immaterial. But information is physical. It is as physical as Boltzmann’s atoms or the energy they carry in their motion. Information is not tangible; it is not a solid or a fluid. It does not have its own particle either, but it is as physical as movement and temperature, which also do not have particles of their own. Information is incorporeal, but it is always physically embodied. Information is not a thing; rather, it is the arrangement of physical things. It is physical order, like what distinguishes different shuffles of a deck of cards.

One of the highlights of the work of Shannon and Weaver is that they divorced the idea of information from that of the message. Colloquially we use both terms interchangeably. However, the two had to be separated so that further developments in the field could happen. Whatever gets transmitted between two devices, or two people, is information. It is humans who interpret that information as meaning, given the various contextual factors. This clear demarcation mattered because, technically, one could now focus on transmitting any kind of message, whether the message meant anything or not. Shannon also came up with a formula for encoding an arbitrary message with maximum efficiency. This formula looked identical to Boltzmann’s formula.

The beauty of information being scale independent is that one can use the principles of information theory to describe everything from atoms to economies. In all previous attempts, the natural sciences described the connection from atoms to humans, and the social sciences described the connection between humans and economies. Using the concept of information, one can analyze across all scales. The content of the book is laid out in such a way that it describes the history of the universe, centered not on the arrow of time but on the arrow of complexity.

It is the accumulation of information and of our ability to process information that defines the arrow of growth encompassing the physical, the biological, the social, and the economic, and which extends from the origin of the universe to our modern economy. It is the growth of information that unifies the emergence of life with the growth of economies, and the emergence of complexity with the origins of wealth.

The Secret to Time Travel

This book made me look at child birth from a completely different perspective. The author compares child birth to time travel: the baby is transferred from an environment (the mother’s womb) that has remained essentially the same over the last 1,000 years into a 21st-century world that is largely alien to the species. There are a ton of machines, gadgets, technologies and objects that are realizations of human knowledge and human knowhow. All the objects that we see around us embody information and imagination. The author uses two central actors, amongst many, to describe the way information grows, i.e.

  1. Physical objects: physical embodiment of information
  2. People: fundamental embodiment of knowledge and knowhow

The fundamental perspective of the author is,

Economy is the system by which people accumulate knowledge and knowhow to create packets of physical order, or products, that augment our capacity to accumulate more knowledge and knowhow and, in turn, accumulate more information.

How are humans different from other species on the planet ?

The fact that objects embody information and imagination may seem obvious. Information is a fundamental aspect of nature, one that is older than life itself. It is also an aspect of nature that accelerated with life. Consider the replication of information-rich molecules, such as DNA and RNA. The replication of DNA and RNA is not the replication of matter but the replication of the information that is embodied in matter. Living organisms are highly organized structures that process and produce information. Yet, our focus here will not be on the information-generating capacity that is embodied in the intimacy of our cells but that which emerged with humans and society. Humans are special animals when it comes to information, because unlike other species, we have developed an enormous ability to encode large volumes of information outside our bodies.

Humans are able to create physical instantiations of the objects we imagine, while other species are stuck with nature’s inventory.

The Body of the Meaningless

This chapter clarifies the differences amongst various terms used in information theory. Terms such as entropy and information are often used interchangeably. Indeed they can be in some situations, but not always. Shannon’s definition of information relates to the number of bits required to encode a message with maximum efficiency. In this sense, a highly regular, correlation-rich structure carries less information and a randomized message carries more. He termed this measure "entropy" (von Neumann told Shannon that calling his measure entropy would guarantee Shannon’s victory in every argument, since nobody really knew what entropy was). Consider my laptop: it contains many documents, pictures, videos, etc. In Shannon’s language, if I randomly flip the bits on my computer, the information increases. But this doesn’t sit well with our intuitive notion of information. Intuitively, the more regular and ordered the data is, the more information it contains. So, there is a need to expand Shannon’s definition of entropy so that one can use these concepts to talk about information in the sense we can relate to.
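A small sketch of Shannon's measure, computed per symbol for two illustrative strings: a highly regular string needs far fewer bits per symbol than a maximally varied one, which is exactly the sense in which "randomly flipping the bits" increases Shannon information.

```python
import math
from collections import Counter

# Shannon entropy: the average number of bits needed per symbol to encode a
# message with maximum efficiency. The strings are illustrative only.

def entropy_bits_per_symbol(message):
    counts = Counter(message)
    n = len(message)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

print(entropy_bits_per_symbol("aaaaaaaaab"))   # ~0.47 bits: highly regular
print(entropy_bits_per_symbol("abcdefghij"))   # ~3.32 bits: every symbol different
```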

The author gives a nice analogy of a half-filled stadium to show the difference between entropy as defined in statistical physics and entropy as defined by Shannon. In statistical physics, entropy depends on the "multiplicity of states". A highly disordered system tends to have a higher multiplicity of states and hence higher entropy. However, a higher-entropy system is not necessarily more disordered: disorder usually implies higher entropy, but not always. In the physical sciences, information has always referred to something that has order. So, in physical systems, information is the opposite of entropy. The ordered states, commonly referred to as information-rich states, are highly correlated structures. These information-rich structures are also uncommon and peculiar structures in nature.

The author uses the example of the Rubik’s cube to illustrate the rarity of ordered states in nature. A Rubik’s cube has about 4.3 × 10^19 possible states, and from any of them the solved state can be reached in 20 moves or fewer. However, getting to this ordered state requires such a specific sequence of moves that one is called a genius if one can reach it in under 30 moves. This example can be extrapolated to nature: the growth of entropy is like a Rubik’s cube in the hands of a child. In nature, information is rare not only because information-rich states are uncommon but also because they are inaccessible given the way in which nature explores the possible states. The author provides a few nice examples that show the connection between the multiplicity of states and the ability to process information, i.e. to compute.
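For the curious, the ~4.3 × 10^19 figure follows from the standard counting argument for reachable cube configurations, which a couple of lines can verify:

```python
import math

# Reachable Rubik's cube configurations: corner permutations and orientations,
# edge permutations and orientations, divided by 2 for the parity constraint.
corners = math.factorial(8) * 3**7
edges = math.factorial(12) * 2**11
states = corners * edges // 2
print(states)   # 43,252,003,274,489,856,000 -- about 4.3 x 10^19
```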

The main idea of this chapter is to look at the word "information" as defined by Shannon, and then reconcile the concept with the colloquial meaning of the word information and the work of Boltzmann.

The Eternal Anomaly

If the natural tendency of a system is to move towards disorder, towards higher entropy, how does one explain the information explosion on our planet ? If we look around the planet, it is amazing to see so many beautiful creations of nature. Why didn’t our planet disintegrate into chaos ? Why does information grow on our planet ? To explain this phenomenon, the author introduces the theory put forth by Ilya Prigogine. The main idea of the theory is

Information emerges naturally in the steady states of physical systems that are out-of-equilibrium.

The author unpacks the above statement using many examples, such as a marble in a bowl, a box filled with gas, and a whirlpool in a sink. Prigogine realized that although Boltzmann’s theory was correct, it did not apply to what we observe on Earth, because our planet is an out-of-equilibrium pocket inside a larger system-the universe-that is moving toward equilibrium. In fact, our planet has never been close to any form of equilibrium. Prigogine did the math and showed that out-of-equilibrium systems give rise to information-rich steady states. So, that explains "Where does information come from ?". In an out-of-equilibrium system, such as Earth, the emergence of information is expected; it is no longer an anomaly. The bad news, however, is that entropy is always lurking on the borders of information-rich anomalies, waiting to devour these anomalies as soon as it gets the chance. Yet information has found ways to fight back. As a result, we live on a planet where information is "sticky" enough to be recombined and created. This stickiness, which is essential for the emergence of life and economies, also hinges on additional fundamental physical properties.

The author explains three mechanisms that make information sticky. The first flows from Prigogine’s math, which states that out-of-equilibrium systems self-organize into steady states in which order emerges spontaneously, minimizing the destruction of information. The second comes from Schrodinger’s observation that solids are essential to explain the information-rich nature of life. The third mechanism by which information grows is matter’s ability to process information, or the ability of matter to compute. The author explains wonderfully all three aspects that make information "sticky".

The main idea of this chapter is to view our planet as an out-of-equilibrium system. The other idea communicated by the author is that of the "entropy barrier". I love this concept as it is philosophically aligned with what I believe, "Life is a Martingale".

Time is irreversible in a statistical system because the chaotic nature of systems of many particles implies that an infinite amount of information would be needed to reverse the evolution of the system. This also means that statistical systems cannot go backward because there are an infinite number of paths that are compatible with any present. As statistical systems move forward, they quickly forget how to go back. This infiniteness is what Prigogine calls the entropy barrier, and it is what provides a perspective of time that is not spatialized like the theories of time advanced by Newton and Einstein. For Prigogine, the past is not just unreachable; it simply does not exist. There is no past, although there was a past. In our universe, there is no past, and no future, but only a present that is being calculated at every instant. This instantaneous nature of reality is deep because it helps us connect statistical physics with computation. The instantaneous universe of Prigogine implies that the past is unreachable because it is incomputable at the micro level. Prigogine’s entropy barrier forbids the present to evolve into the past, except in idealized systems

Crystallized Imagination

The author starts off by giving his perspective on life

Life is all about : moving around and processing information, helping information grow while interacting in a social context.

If you reflect on the above statement a bit, I guess you will at least concur with some part of it, if not the entire statement. Our society’s ability to accumulate information requires flows of energy, the physical storage of information in solid objects, and of course our collective ability to compute. The flow of energy that keeps our planet’s information growing is clearly that coming from the sun. Plants capture that energy and transform it into sugar, and over long periods of time they degrade into the mineral fuel we know as oil. But as a species, we have also developed an amazing capacity to make information last. We have learned to accumulate information in objects, starting from the time we built our first stone axes to the invention of the latest computer.

The easiest way to get a grasp on "accumulating information in an object" is to compare an "apple", the product of a tree, with an "Apple" product from Silicon Valley. The former is a product available in nature that we internalize in our minds, while the latter is an instantiation of the knowledge in our heads. Both products are packets of information, but only the latter is a crystal of imagination. The author cites two examples of MIT lab scientists who are working on robotic arms and optogenetics. They are trying to create objects that crystallize imagination, and by doing so, they are endowing our species with new capacities. The author gives several contexts where thinking about products in a different way changes several preexisting metrics and notions that we carry around in our heads. For example, Chile is a major exporter of copper, and one might argue that other countries are exploiting Chile. However, by looking at the value generated in the finished products that use copper, the value of copper itself goes up. So, who is exploiting whom? Isn’t Chile free-riding on the crystallized imagination of other people?

Thinking about products as crystals of imagination helps us understand the importance of the source of the information that is embodied in a product. Complex products are not just arrangements of atoms that perform functions; rather, they are ordered arrangements of atoms that originated as imagination.


The chapter is titled so to emphasize the amplifying nature of objects. Each object can be thought of as a crystallization of knowledge and knowhow, and these objects become important to all of us because they enhance our capacity to do other things with them. Take a laptop, for instance. It is a product of someone else’s imagination, and we get to use it to produce other objects. There is no need to know what’s under the hood of every object that we use. In the words of the author,

Products are magical because they augment our capacities

Objects are much more than merely a form of communication.

Our ability to crystallize imagination into products, although expressive, is different from our ability to verbally articulate ideas. An important difference is that products can augment our capacities in ways that narrative descriptions cannot. Talking about toothpaste does not help you clean your teeth, just as talking about the chemistry of gasoline will not fill up your car with gas. It is the toothpaste’s embodiment of the practical uses of knowledge, knowhow, and imagination, not a narrative description of them, that endows other people with those practical uses. Without this physical embodiment the practical uses of knowledge and knowhow cannot be transmitted. Crystallizing imagination is therefore essential for sharing the practical uses of the knowledge that we accumulate in our mind. Without our ability to crystallize imagination, the practical uses of knowledge would not exist, because that practicality does not reside solely in the idea but hinges on the tangibility of the implementation. Once again, the physicality of products-whether tangible or digital-augments us.

The main idea of this chapter is to describe products as physical embodiments of information, carrying the practical uses of knowledge, knowhow, and imagination. Our capacity to create products that augment us also helps define the overall complexity of our society.

This time, It’s personal

If we look at various products, the knowledge and knowhow for creating them are geographically biased, though the bias is coming down a bit, at least on the software front. The reason for this geographical bias is that crystallizing any product requires a great amount of knowledge and knowhow, and the learning is, in almost all cases, experiential and social. Bookish knowledge alone is not enough. You need a certain kind of environment where you can interact, share ideas, experiment, and learn from trial and error. Each geographical region has its own idiosyncrasies and hence gives rise to different codifications of knowledge and knowhow. So there is certainly going to be a geographical bias in the products we see, and this naturally limits the growth of information. The author introduces a term, person-byte, meaning the maximum knowledge- and knowhow-carrying capacity of a human. Is there a limit to human knowledge? Well, let’s talk about the knowledge one can accumulate over a working life. If I take my own example, there is a limit to how much math I can do, what kind of stats I can work on, what kind of models I can build, and the amount of code I can write. All these ultimately limit the growth of information. In that sense, the person-byte is a nifty idea: for information to grow, there needs to be a network of people whose collective person-bytes exceed any individual person-byte.

The person-byte limit implies that the accumulation of large amounts of knowledge and knowhow are limited by both the individual constraints of social and experiential learning and the collective constraints brought by our need to chop up large volumes of knowledge and knowhow and distribute them in networks of individuals.

Links are not free

If one harks back to the time of Henry Ford’s Model-T factory, it was considered the poster child of the industrial economy. It stood for the efficiency gained through scale. The output of the factory, the car, was a complex product, and the rationale was that it was better to chunk out this complex task into 7,882 individual tasks. It is another matter of debate whether there was a need for 7,882 individual tasks or not. One takeaway could be that complex products need giant factories. Based on that takeaway, we should be seeing innumerable giant factories, given the complexity of the products in today’s world. This is where the author introduces a second level of quantization of knowledge and knowhow: the firm-byte. This is a conceptual term that gives an upper limit on the amount of knowledge and knowhow a firm can possess. So, if a product requires more than one firm-byte, there is a need for a network of firms. The factors that limit the size of the firm have been studied extensively under "transaction cost theory". The author gives an overview of the theory, which says

There are fundamental forces that limit the size of the networks we know as firms, and hence that there is a limit to the knowledge and knowhow these networks can accumulate. Moreover, it also tells us that there is a fundamental relationship between the cost of the links and the size of these networks: the cheaper the link, the larger the network.

It all comes down to links. If you take a typical Barbie doll, the various steps in its production, from raw materials to the finished product, happen in twenty different countries. What has made this splintering of the manufacturing process possible? It is not that the product is complicated; it is that the cost of creating links between firms has come down. This could be attributed to falling transportation costs, the revolution in communication technologies, the standardization of parts, etc. In all the cases where market links have become cheaper, we have seen vast networks of firms participating together. There are innumerable examples that fall into this category (iPad, iPhone, laptops, cell phones, …).

Does it mean that making market links cheaper will automatically give rise to an increase in information via the crystallization of many more products? Not necessarily. Some links remain inherently expensive, depending on the frequency and specificity of the transaction.

In Links We Trust

This chapter explores the role of "trust" in the formation of networks. Social networks and social institutions help determine the size, adaptability, and composition of the networks humans need to accumulate knowledge and knowhow. When it comes to size, the ability of societies to grow large networks is connected to the level of trust in the underlying society. When it comes to the composition of networks, social institutions and preexisting social networks affect the composition of the professional networks we form in two important ways. On the one hand, a society’s level of trust determines whether networks are more likely to piggyback on family relations. On the other hand, people find work through personal contacts, and firms tend to hire individuals reached through the social networks of their existing employees.

The takeaway from this chapter is that social networks and institutions are also known to affect the adaptability of firms and networks of firms.

The Evolution of Economic Complexity

If one’s intention were to study the geographical distribution of knowledge and knowhow, one inevitably runs into an issue: knowledge and knowhow are intangibles. How does one tease them out for various geographies ? The author’s first attempt is to look at the locations of the industries producing everything from complex objects to simple ones. In this context, he uses the concept of "nestedness" from ecology and does the number crunching to show that

There is a clear trend showing that the most complex products tend to be produced in a few diverse countries, while simpler products tend to be produced in most countries, including those that produce only a handful of products. This is consistent with the idea that industries present everywhere are those that require less knowledge and knowhow.

The author ties his person-byte theory back to the observations from the data. In a sense, the inference is commonsensical: the knowledge and knowhow behind specialized products is sticky and biased towards specific geographical areas, whereas for ubiquitous products, the knowledge and knowhow is spread across a wide range of geographies.

The Sixth Substance

If one looks at the models describing economic growth, the five common factors used in the literature are

  1. Land
  2. Labor
  3. Physical Capital
  4. Human Capital
  5. Social Capital

The author connects these five factors to the principles explained in the previous chapters. For example, physical capital is the physical embodiment of information; it carries the practical uses of the knowledge and knowhow used in its creation. Physical capital is made of embodied information and is equivalent to the crystals of imagination described in the previous chapters. The author introduces a metric, "economic complexity", that takes into consideration the diversity of the exporting country, the diversity of the countries it exports to, and the ubiquity of the products exported. The author tests his model for predictive accuracy and shows that it performs well.
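As a small illustration of the raw ingredients, here is an invented country-by-product export matrix and the diversity and ubiquity counts computed from it; the economic complexity measure described above refines these two quantities further.

```python
import numpy as np

# Diversity and ubiquity from a toy country-by-product export matrix
# (1 = the country exports that product). The matrix is invented purely
# for illustration.

M = np.array([
    # p1 p2 p3 p4
    [1, 1, 1, 1],   # country A: diverse, exports even the rare products
    [1, 1, 0, 0],   # country B
    [1, 0, 0, 0],   # country C: exports only the most ubiquitous product
])

diversity = M.sum(axis=1)   # products per country  -> [4, 2, 1]
ubiquity = M.sum(axis=0)    # countries per product -> [3, 2, 1, 1]
print(diversity, ubiquity)
# The rare products p3 and p4 are made only by the most diverse country --
# the "nestedness" pattern described earlier.
```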


The last section of the book highlights its main points. In a sense, it makes my summary redundant, as the author provides a far more concise one. So, if you are short on time, you might just want to go over the last 10 pages of the book.


There is no denying the importance of silence and solitude in one’s life. For me, they have always provided an appropriate environment to learn and understand a few things deeply. Drawing from that experience, I strongly feel one should actively seek some amount of "silent time" in one’s life. Is it difficult for a person leading a married life to carve out spaces of silence? Not necessarily. I remember reading a book by Anne D. LeClaire in which the author writes about her experience of remaining completely silent on the first and third Mondays of every month. Anne explains that this simple practice brought a tremendous amount of calmness to her family life. The family members unknowingly start giving importance to "pauses", the "pauses" that actually make a sentence meaningful, the "pauses" that make music enjoyable, the "pauses" that make our lives meaningful. Indeed, many have written about the transformative experience of silence. But how many of us consciously seek silence and, more importantly, incorporate it into our daily lives ? In the hubbub of our lives and in our over-enthusiasm for acquiring/reaching/grabbing something that is primarily externally gratifying, we often turn our back on "silence" and consequently deny, or at least partly deny, ourselves those experiences that are internally gratifying.

I picked up this book almost 2 months ago. For various reasons, it remained in my inventory for quite some time without me having a peaceful go at it. In mid-December 2015, after living in Mumbai for 6.5 years, I decided to leave the city for personal reasons. I spent the first few weeks of December shipping most of my stuff and vacating the rented flat. Those weeks were undeniably exhausting, as I had a ton of books that had to be sorted, categorized and shipped to different places. Once I had shipped everything, the house was literally empty. Except for a few clothes of mine and my Sitar, the house was totally empty and silent. For some reason, that empty and silent house felt totally liberating. In that context, I set out to read this book. In this post, I will briefly summarize the main points from the book.


The author starts off by talking about the excessive importance we give to material comforts and affective concerns

In our daily lives many of us spend most of our time looking for comforts-material comforts and affective comforts-in order to merely survive. That takes all our time. These are what we might call the daily concerns. We are preoccupied with our daily concerns: how to have enough money, food, shelter, and other material things. We also have affective concerns: whether or not some particular person loves us, whether or not our job is secure. We worry all day because of those kinds of questions. We may be trying to find a relationship that is good enough to endure, one that is not too difficult. We’re looking for something to rely on.

We may be spending 99.9 percent of our time worrying about these daily concerns-material comforts and affective concerns-and that is understandable, because we need to have our basic needs met to feel safe. But many of us worry far, far beyond having our needs met. We are physically safe, our hunger is satisfied, we have a roof over our heads, and we have a loving family; and still we can worry constantly.

The deepest concern in you, as in many of us, is one you may not have perceived, one you may not have heard. Every one of us has an ultimate concern that has nothing to do with material or affective concerns. What do we want to do with our life? That is the question. We are here, but why are we here? Who are we, each of us individually? What do we want to do with our life? These are questions that we don’t typically have (or make) the time to answer.

These are not just philosophical questions. If we’re not able to answer them, then we don’t have peace-and we don’t have joy, because no joy is possible without some peace. Many of us feel we can never answer these questions. But with mindfulness, you can hear their response yourself, when you have some silence within.

What we all need is "silence" to tune in to ourselves.

A Steady diet of noise

This chapter is mainly about recognizing the kind of noise that pervades our minds. Cows, goats and buffalo chew their food, swallow it, then regurgitate and rechew it multiple times. We may not be cows or buffalo, but we ruminate just the same on our thoughts – unfortunately, primarily negative ones. We eat them, and then we bring them up to chew again and again, like a cow chewing its cud. The author calls this incessant noise the NST (Non-Stop Thinking) radio station. Unconsciously, many of us are constantly listening to NST and do not take time out to truly listen to what our heart needs. To understand the kind of thoughts we constantly consume via NST, the author classifies the foods we take in as follows :

  1. Edible food: What we eat affects how we feel. Imagine for a second that you overeat something you like. It is similar to a seizure where you cannot control yourself and give in to it. The immediate feeling after this overdose is usually laziness, boredom and a dull brain. You try concentrating on a thing and you realize it becomes difficult. So, something as elementary as edible food that is necessary for our survival becomes nourishing or toxic, depending on what we consume, how much we consume, and how aware we are of our consumption.
  2. Sensory food: Sensory food is what we take in with our senses and our mind – everything we see, smell, touch, taste and hear. This type of food has a far greater influence on how we feel. Some of us are forever open to this external world; all the windows and doors are eternally open to a world that throws a barrage of sensory stimuli at us. Most of the sensory food we consume is useless at best, harmful at worst. How often do we keep watching a pathetic TV program and still lack the power to shut it off ? We often become paralyzed by sensory foods and become slaves to them. Kids are often introduced to video games by parents, and then the sensory food that kids derive is so addictive that they start behaving like the characters in the video game. Imagine a kid who plays a game involving violence; do you think he will generate calmness and a sense of balance in his mind ? No way. The same is the case with conversations. Suppose you talk to a person who is full of bitterness, envy or craving. During the conversation, you take in the person’s energy of despair. Even though you had no ill feelings in your mind to begin with, your mind will be infected with such feelings as you converse with such people. One easy way to avoid such a situation is to leave such company and go elsewhere. If you are forced to be in such company for whatever reason, the next best thing is to be aware of the kind of thoughts the other person is emanating. Awareness makes you immune to the toxic sensory food that you come across in your daily life.
  3. Volition: Our primary intention and motivation is another kind of food. It feeds us and gives us purpose. Like the previous kinds of food, it can be extremely nourishing or extremely toxic, depending on the intent and motivation behind it. So much of the noise around us, whether advertisements, movies, games, music, or conversation, gives us messages about what we should be doing, what we should look like, what success looks like, and who we should be. Because of all this noise, it’s rare that we pay attention to our true desire. We act, but we don’t have the space or quiet to act with intention. If what you are doing is what your heart truly desires, then the associated work becomes bliss. You don’t have to bother about how other people look at your behavior, actions and work. As long as you are clear that it is what you truly enjoy and desire, there will be little chance for toxic thoughts to arise. What you truly want to do is not that easy to figure out if you are continuously tuned out of yourself. It requires some time out from the rat race, a period of solitude that gives you space to understand yourself.
  4. Individual and Collective consciousness: Even if we go on a sensory fast, we still feed our thoughts from our consciousness. The best way to describe this is to think of it as a two-storeyed building. We are forever planting seeds in the lower storey of the house. The seeds could be pleasant or unpleasant, and they are being watered constantly by us. Which of these seeds we water depends on our individual and collective consciousness. If we are in a toxic environment, then automatically, without our noticing, we water the crappy seeds and they show their colors with a vengeance, making us feel unpleasant. As they say, even if you take a person to the Himalayas, you cannot take the person away from himself. Consciously choosing what and whom you surround yourself with is among the keys to finding more space for joy.

The takeaway from this chapter is that we need to be aware of the kinds of foods we are taking in. By being aware, it is less likely that toxic foods enter us; by being aware, it is more likely that we take in healthy foods, turning us into peaceful and wholesome persons.

Radio Non Stop Thinking

The antidote to NST is mindfulness. By having mindfulness in all the activities you perform, you will be able to stop the NST radio in your head. Shifting our attention away from our thoughts to what’s really happening in the present moment is a basic practice of mindfulness. We can do it anytime, anywhere, and find more pleasure in life. Whether we’re cooking, working, brushing our teeth, washing our clothes, or eating, we can enjoy this refreshing silencing of our thoughts and our speech. Mindfulness entails finding the inner quietness.


Thundering Silence

Many times we consume the different kinds of foods mentioned in the previous chapters in response to a compulsive urge to avoid ourselves. Whenever we try to confront the unpleasant part of us, we know that we should be letting it go. But we hold on to it. "Knowing that we should let it go" and actually letting it go are two vastly different things. The latter requires us to remain silent, go to the very source of it, acknowledge it, appreciate it for whatever it has taught us in our life, and then let it go. Without this phase of "examining it in silence", we will forever be trying to let it go but never actually letting it go.

What is the essence of stillness ?

When we release our ideas, thoughts, and concepts, we make space for our true mind. Our true mind is silent of all words and all notions, and is so much vaster than limited mental constructs. Only when the ocean is calm and quiet can we see the moon reflected in it. Silence is ultimately something that comes from the heart, not from any set of conditions outside us. Living from a place of silence doesn’t mean never talking, never engaging or doing things; it simply means that we are not disturbed inside; there isn’t constant internal chatter. If we’re truly silent, then no matter what situation we find ourselves in, we can enjoy the sweet spaciousness of silence.

The chapter is titled so because "thundering silence" is a kind of silence that is the opposite of oppressive silence.

Suppose you sit outside and pay attention to the sunshine, the beautiful trees, the grass, and the little flowers that are springing up everywhere. If you relax on the grass and breathe quietly, you can hear the sound of the birds, the music of the wind playing in the trees. Even if you are in a city, you can hear the songs of the birds and the wind. If you know how to quiet your churning thoughts, you don’t have to turn to mindless consumption in a futile attempt to escape from uncomfortable feelings. You can just hear a sound, and listen deeply, and enjoy that sound. There is peace and joy in your listening, and your silence is an empowered silence. That kind of silence is dynamic and constructive. It’s not the kind of silence that represses you. In Buddhism we call this kind of silence thundering silence. It’s very eloquent, and full of energy.

The author ends the chapter with a few simple exercises that one can perform anywhere to renew and energize oneself.

The Power of Stillness

The author says we rarely notice our breathing patterns, and rarely do we enjoy our breathing. Some people carry around the notion that one has to add an additional item, "Meditation", to their agenda. However, it is far simpler than that. All one has to do to practice mindfulness is to reorient oneself and remember one’s true intention. Quiet, mindful breathing is something you can do at any time. Wherever you are can be a sacred place, if you are there in a relaxed and serene way, following your breathing and keeping your concentration on whatever you’re doing. The simple practice of sitting quietly on a regular basis can be profoundly healing. The author offers a simple exercise for beginners: dedicate five minutes every day to walking quietly and mindfully. I guess the proof of the pudding is in the eating. You can try it out and see if it calms your mind and makes you more aware of the thoughts, feelings and NST radio in your head.

Paying Attention

One of the often-asked questions involves the relevance of mindfulness during tasks that are inherently banal. The author says that by practicing mindfulness at all times, it becomes easier to access our "island of self" during the times we actually need it. A related idea is developing the capacity to be alone, to be in solitude. There are two dimensions to solitude. The first is to be alone physically. The second is to be able to be yourself and stay centered even in the midst of a group. The former appears easy and the latter difficult, though it might appear the other way around to a different person. Paying attention to anything requires that one be comfortable with solitude, for great technologies, ideas and inventions are the result of paying deep attention to something and then actualizing the imagination in the real world.

Cultivating Connection

One of the ways to cultivate connection with others is to listen deeply. What does it mean to listen deeply? It basically entails stopping the NST radio, being silent, and truly listening to others without forming any judgment. Sometimes, if you are lucky, you will befriend a person whose company you can enjoy even without talking. The mere presence of that person, who is silent, can make you joyful. The author says that two people being together in silence is a very beautiful way to live. Solitude is not found only by being alone in a hut deep in the forest; it is not about cutting ourselves off from civilization. We do not lose ourselves; we do not lose our mindfulness. Taking refuge in our mindful breathing, coming back to the present moment, is to take refuge in the beautiful, serene island that each of us has within. If we carve out little moments of spaciousness in the various activities of our lives for this kind of quiet, we open ourselves up to the ultimate freedom. Whoever the person you are trying to connect with may be, friend or sibling or parent or relative or colleague, spending time in silence together is one of the best ways to forge long-lasting relationships.


There are many other little hacks that the author dishes out for practicing mindfulness. Some of them are

  1. Digital Nirvana for a day
  2. Try to remain silent during a specific time period of the day or the week. I remember reading a wonderful book titled "Listening Below the Noise", in which the author follows a "silent Monday" ritual with her family. The book goes on to show the innumerable benefits of this simple ritual for all her family members
  3. Some tool/gadget that reminds you to come back to the present. In this context, I find the Pomodoro technique to be a very effective mechanism for creating a sequence of focused and relaxed stretches of time.

Takeaway :

Irrespective of whether you take "pauses" in your life or not, I think it might be a good thing to take a "pause" and read this book. If not anything, you could find some useful hacks to lead a peaceful life.