Information Theory


This book provides a non-mathy entry point into the world of decision trees and random forests.

Chapter 1 introduces the three kinds of learning algorithms:

  • Supervised Learning Algorithm: The algo is fitted on labeled data, and the fitted model is then used to predict labels for unlabeled data
  • Unsupervised Learning Algorithm: The algo feeds on unlabeled data to figure out structure and patterns on its own
  • Semi-Supervised Learning: The algo feeds on both labeled and unlabeled data. It still has to figure out the structure and patterns of the data, but it gets some help from the labeled data.

The algorithms covered in the book fall under the supervised learning category. Before understanding decision trees, one needs to be familiar with the basic terminology. Chapter 2 of the book goes into the meaning of various terms such as:

  • Decision Node
  • Chance Node
  • End Node
  • Root Node
  • Child Node
  • Splitting
  • Pruning
  • Branch

Chapter 3 goes into creating a decision tree for a trivial example using pen and paper. In doing so, it touches upon various aspects of the process, e.g. splitting, determining the purity of a node, and determining the class label of a node. All of these are subtly introduced in plain English.

Chapter 4 talks about the three main strengths and weaknesses of hand-drawn decision trees.

Three main weaknesses:

  • Decision trees can change very quickly
  • Decision trees can become very complex
  • Decision trees can cause "paralysis by analysis"

Three main strengths:

  • Decision trees force you to consider outcomes and options
  • Decision trees help you visualize a problem
  • Decision trees help you prioritize

Decision Trees

Chapter 5 of the book gives a list of popular decision tree algos. A decision tree can be used to perform two tasks: classification and regression. The former involves classifying cases based on certain features, whereas the latter involves predicting a continuous value of a target variable based on input features. The five most common decision tree algos are:

  1. ID3
  2. C4.5
  3. C5.0
  4. CHAID
  5. CART

Chapter 6 of the book showcases a simple dataset of movies watched by X, each described by certain attributes. The objective of the algo is to predict whether X will like a movie not present in the training sample, based on those attributes. The first step in creating a decision tree involves selecting the attribute on which the root node is split. The concept of "impurity" of a node is illustrated via a nice set of visuals.

Chapter 7 goes into the math behind splitting a node, i.e. the principles of entropy and information gain. Once a node is split, one needs a metric to measure the purity of the resulting nodes. This is done via entropy. For each candidate split on an attribute, one can compute the entropy of each resulting subset. To aggregate the purity measures of the subsets, one needs the concept of information gain. In the context of node splitting, the information gain is the difference between the entropy of the parent and the weighted average entropy of the children. Again, a rich set of visuals is used to explain every component of the entropy formula and information gain (Kullback-Leibler divergence).
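
The entropy and information gain calculations described above can be sketched in a few lines of Python. This is not the book's code; the toy labels and the split are made up purely for illustration, and the function names are my own.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def information_gain(parent, children):
    """Parent entropy minus the size-weighted average entropy of the children."""
    total = len(parent)
    weighted = sum(len(child) / total * entropy(child) for child in children)
    return entropy(parent) - weighted


# Hypothetical toy data: did X like each of eight movies?
parent = ["yes"] * 4 + ["no"] * 4
# A split on a made-up attribute that separates the classes only partly.
children = [["yes", "yes", "yes", "no"], ["yes", "no", "no", "no"]]

gain = information_gain(parent, children)
print(round(entropy(parent), 3), round(gain, 3))
```

A perfectly mixed parent has an entropy of 1 bit, and the better a split separates the classes, the closer the gain gets to that full bit.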

Chapter 8 addresses common questions around Decision trees

  • How/When does the algo stop splitting?
    • Allow the tree to split until every subset is pure
    • Stop splitting once every leaf subset is pure
    • Adopt a stopping method
      • Stop when a tree reaches a max number of levels
      • Stop when a minimum information-gain level is reached
      • Stop when a subset contains fewer than a defined number of data points
  • Are there other methods to measure impurity?
    • Gini impurity (also known as the Gini index)
  • What is a greedy algo?
    • A greedy algo builds a tree by making the choice that seems best at the moment and never looking back
  • What if the dataset has two identical examples?
    • These are usually noisy observations. Either more data can be collected or the observations can be relabeled correctly
  • What if there are more than 2 classes?
    • The logic remains the same. Only the entropy calculation changes: the sum has one term per class
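
As a rough illustration of the alternative impurity measure mentioned in the list above, here is a minimal Gini impurity sketch. The function name and the toy labels are assumptions of mine, not taken from the book. The same function works unchanged for more than two classes.

```python
from collections import Counter


def gini_impurity(labels):
    """Gini impurity: chance of mislabeling a random item drawn from the node."""
    total = len(labels)
    return 1.0 - sum((n / total) ** 2 for n in Counter(labels).values())


pure = gini_impurity(["yes", "yes", "yes", "yes"])          # one class only
mixed = gini_impurity(["yes", "yes", "no", "no"])           # two classes, 50/50
three_way = gini_impurity(["a", "a", "b", "b", "c", "c"])   # three equal classes
print(pure, mixed, round(three_way, 3))
```

Like entropy, the measure is zero for a pure node and largest when the classes are evenly mixed.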

Chapter 9 talks about the potential problems with Decision trees and the ways to address them

  • Overfitting
    • Statistical significance tests can be used to check whether the information gain is significant
    • Pruning a tree so that nodes with sparse data or incorrect fits are removed
  • Information gain bias
    • Information Gain Ratio reduces the bias that information gain has towards attributes that have a large number of subsets. It accomplishes this by taking into consideration the size and number of branches for each attribute
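
To make the information gain bias concrete, here is a small sketch of the gain ratio as it is commonly defined for C4.5: information gain divided by the "split information" of the branch sizes. The toy data is hypothetical, and the book may present the calculation differently.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def gain_ratio(parent, children):
    """Information gain divided by split information (C4.5-style)."""
    total = len(parent)
    weights = [len(child) / total for child in children]
    gain = entropy(parent) - sum(w * entropy(c) for w, c in zip(weights, children))
    split_info = -sum(w * log2(w) for w in weights if w > 0)
    return gain / split_info if split_info > 0 else 0.0


parent = ["yes", "yes", "no", "no"]
# An ID-like attribute that puts every example in its own branch.
many_branches = [["yes"], ["yes"], ["no"], ["no"]]
# A two-way attribute that separates the classes just as cleanly.
two_branches = [["yes", "yes"], ["no", "no"]]

ratio_many = gain_ratio(parent, many_branches)
ratio_two = gain_ratio(parent, two_branches)
print(ratio_many, ratio_two)
```

Both splits have the same information gain (1 bit), but the gain ratio scores the four-branch split lower because it fragments the data more.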

Chapter 10 gives an overview of Decision Tree Algorithms. Algorithms differ in the way the following aspects are handled:

  • How does the algorithm determine what to split?
  • How does the algorithm measure purity?
  • How does the algorithm know when to stop splitting?
  • How does the algorithm prune?

Here is a list of popular Decision tree algos with their pros and cons:

  • ID3 (Iterative Dichotomiser 3) is the "grandfather" of decision tree algorithms and was developed in 1986 by Ross Quinlan, a machine learning researcher.
    • Pros:
      • Easy to implement.
      • Very straightforward.
      • There is plenty of documentation available to help with implementation and working through issues.
    • Cons:
      • Susceptible to overfitting
      • Does not naturally handle numerical data.
      • Is not able to work with missing values.
  • C4.5 is the successor to the ID3 algorithm and was invented in 1993 by Ross Quinlan. It makes use of many of the same elements as ID3 but also has a number of improvements and benefits.
    • Pros:
      • Can work with either a continuous or discrete dataset. This means it can be used for classification or regression and work with categorical or numerical data
      • Can work with incomplete data.
      • Solves "overfitting" by pruning and its use of the Gain Ratio.
    • Cons:
      • Constructs empty branches with zero values.
      • Tends to construct very large trees with many subsets.
      • Susceptible to overfitting.
  • CART was first developed in 1984. A unique characteristic is that it can only construct binary trees, whereas ID3, C4.5 and CHAID are all able to construct multi-way splits.
    • Pros:
      • Can work with either a continuous or discrete dataset. This means it can be used for classification or regression.
      • Can work with incomplete data.
    • Cons:
      • It can only split on a single variable.
      • Susceptible to instability.
      • Splitting method is biased towards nodes with more distinct values.
      • Overall, the algorithm can be biased towards nodes with more missing values.

Chapter 11 gives sample Python code to build a decision tree via CART.
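
The book's own code is not reproduced here. As a from-scratch sketch of the core CART step, the snippet below searches for the best binary split on a single numeric attribute using Gini impurity. The data and the function names are hypothetical, and a full CART implementation would also recurse on the two halves and prune.

```python
def gini(labels):
    """Gini impurity of a list of class labels."""
    total = len(labels)
    return 1.0 - sum((labels.count(c) / total) ** 2 for c in set(labels))


def best_binary_split(values, labels):
    """Return (threshold, weighted_gini) of the best `value <= threshold` split."""
    best = (None, float("inf"))
    distinct = sorted(set(values))
    # Candidate thresholds are midpoints between neighbouring distinct values.
    for low, high in zip(distinct, distinct[1:]):
        threshold = (low + high) / 2
        left = [lab for v, lab in zip(values, labels) if v <= threshold]
        right = [lab for v, lab in zip(values, labels) if v > threshold]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
        if score < best[1]:
            best = (threshold, score)
    return best


# Hypothetical data: movie length in minutes and whether X liked the movie.
lengths = [90, 95, 100, 130, 140, 150]
liked = ["yes", "yes", "yes", "no", "no", "no"]

threshold, score = best_binary_split(lengths, liked)
print(threshold, score)
```

Because every split is binary, an attribute with many distinct values is handled by trying one threshold at a time rather than creating one branch per value.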

Random Forests

A random forest is a machine learning algorithm that makes use of multiple decision trees to predict a result; this collection of trees is often called an ensemble. The basic idea of a random forest is that each decision tree is built on a random set of observations from the available dataset.

Chapter 12 gives pros and cons of random forest algo

  • Pros:
    • More accurate than a single decision tree
    • More stable than a single decision tree
    • Less susceptible to the negative impact of greedy search
  • Cons:
    • More difficult to comprehend and interpret
    • Computationally expensive

Pros and cons of a decision tree

  • Pros:
    • easier to understand and interpret
    • less computational power
  • Cons:
    • tend to overfit
    • affected by small changes
    • affected by greedy search

Chapter 13 describes the basic algo behind random forest in three steps. The first step involves selecting a bootstrapped sample of the data. This is followed by selecting a random set of attributes for the bootstrapped sample. Based on the selected attributes, the best split is made, and the process is repeated until a stopping criterion is reached.
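
The first two steps (a bootstrapped sample of rows, then a random subset of attributes) can be sketched as follows. The attribute names, the sample sizes and the helper names are all made up for illustration. The third step, finding the best split, is left out.

```python
import random


def bootstrap_sample(rows, rng):
    """Draw len(rows) rows with replacement (a bootstrapped sample)."""
    return [rng.choice(rows) for _ in rows]


def random_attributes(attributes, k, rng):
    """Pick k distinct attributes to consider for a split."""
    return rng.sample(attributes, k)


rng = random.Random(0)  # fixed seed so the sketch is repeatable
rows = list(range(10))  # stand-ins for ten training examples
attributes = ["genre", "length", "director", "year"]  # hypothetical names

sample = bootstrap_sample(rows, rng)
candidates = random_attributes(attributes, 2, rng)
print(sample, candidates)
```

Sampling with replacement means some rows appear more than once in a tree's training set and others not at all, which is what makes each tree different.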

Chapter 14 describes the way in which a random forest predicts the response for test data. Two methods are described in this chapter, i.e. predicting with a majority vote (for classification) and predicting with the mean (for regression).
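
A minimal sketch of the two ways of combining predictions, assuming each tree has already produced its own prediction (the lists below are invented):

```python
from collections import Counter
from statistics import mean


def predict_by_vote(tree_predictions):
    """Classification: return the label predicted by the most trees."""
    return Counter(tree_predictions).most_common(1)[0][0]


def predict_by_mean(tree_predictions):
    """Regression: return the average of the trees' numeric predictions."""
    return mean(tree_predictions)


vote = predict_by_vote(["like", "dislike", "like", "like", "dislike"])
average = predict_by_mean([3.0, 4.0, 5.0])
print(vote, average)
```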

Chapter 15 explains how to test a random forest for accuracy. The method entails computing the O.O.B. estimate (out-of-bag error estimate). The key idea is to create a map between each data point and all the trees in which that data point does not act as a training sample. Once the map is created, for every randomized decision tree, you can find the set of data points that were not used to train it and hence can be used to test it.
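
The out-of-bag map can be sketched as below. No real trees are trained here: the bootstrapped row indices are generated at random, and the "predictor" is a stand-in that always answers "yes", so only the bookkeeping is meaningful. All names are my own.

```python
import random
from collections import Counter

rng = random.Random(1)
n_points, n_trees = 8, 5

# For each tree, the indices of the bootstrapped rows it was trained on.
in_bag = [{rng.randrange(n_points) for _ in range(n_points)} for _ in range(n_trees)]

# Map each data point to the trees that never saw it during training.
oob_trees = {
    i: [t for t in range(n_trees) if i not in in_bag[t]] for i in range(n_points)
}


def oob_error(labels, predict, oob_trees):
    """Share of points misclassified by the vote of their out-of-bag trees."""
    wrong = scored = 0
    for i, trees in oob_trees.items():
        if not trees:  # point was in every bag, so it cannot be scored
            continue
        votes = Counter(predict(t, i) for t in trees)
        scored += 1
        wrong += votes.most_common(1)[0][0] != labels[i]
    return wrong / scored if scored else 0.0


# Stand-in predictor (an assumption): every tree always answers "yes".
labels = ["yes", "no"] * 4
error = oob_error(labels, lambda tree, point: "yes", oob_trees)
print(oob_trees, error)
```

The appeal of this estimate is that it needs no separate test set: every point is scored only by trees that did not train on it.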

Chapter 16 goes into the details of computing attribute importance. The output of such a computation is a set of relative scores for all the attributes in the dataset. These scores can be used to pre-process the data: remove the unwanted attributes and rerun the random forest.

Chapter 17 answers some of the common questions around random forests

  • How many trees go into a random forest?
    • A safe number to start is between 64 and 128 trees
  • When do I use a random forest vs. a decision tree?
    • If you are concerned with accuracy, go for random forest
    • If you are concerned with interpretability, go for decision tree

Chapter 18 gives sample Python code to build a random forest via the scikit-learn library.


I think the visuals are the key takeaways from the book. You can read about the concepts mentioned in the book in a ton of places. However, you might not find adequate visuals in a book that explains the math. This book is a quick read and might be worth your time, as visuals serve as a powerful aid for learning and remembering concepts.




I stumbled onto this book a few weeks ago and immediately picked it up after a quick browse through its sections. I promptly placed it on my books-to-read list. I love anything related to information theory, mainly because of its inter-disciplinary applications. The principles of information theory are applicable in a wide range of fields. In fact, it is hard to pinpoint a specific area where concepts from information theory have not been applied. In this post, I will summarize the main points of the book.


Prologue : The Eternal War

The chapter is titled so because there is a conflict between entropy and information. Entropy is the incessant march towards disorder. One way I can relate to this is through my music practice. If I don’t practice my music for long, I find it difficult to retrain my fingers and get back my muscle memory. "That which you don’t use atrophies". Entropy is something similar. In the absence of any mechanism to create information, the disorder of the system increases. This obviously raises a question about the mechanisms that allow information to battle randomness and grow. The book is mainly about describing the mechanisms by which information grows and the physical order of our world increases, which is what makes our planet unique, rich and uneven, from atoms to economies. The author focuses on planet Earth as it is a special place where information lives, grows and hides in an otherwise mostly barren universe.

In the prologue, the author says that the book would answer the following questions:

  • What is Information ?
  • Where does it come from ?
  • Why is information concentrated on our planet?
  • Why does it grow on our planet ?
  • What are the natural, social and economic mechanisms that allow it to grow ?
  • How do the various mechanisms contribute to social and economic unevenness of the global economy ?
  • How does the social accumulation of information improve our capacity to accumulate even more information?

Introduction : From Atoms to People to Economies

The chapter starts with the story of Ludwig Boltzmann, the famous scientist who committed suicide. Though the exact reason is not known, the author speculates that it could have been the apparent conflict between his theory and the order prevalent in the world. His theory was that there is always a march towards disorder, which stumped him because there were so many fascinating things in nature that were orderly and systematic, almost giving the impression that there was a creator up there designing our world. The biggest sin that Boltzmann committed, given the scientific temper of his time, was that he had worked across spatial scales. His theory made connections between atoms and gases, which belong to different spatial scales. At that point in time, making any connection between spatial scales was considered a sin.

At the turn of the twentieth century, Boltzmann was vindicated. There was immense cross-fertilization of ideas amongst many fields. Yet not all of the cross-fertilization took place near known scientific boundaries. Amid these multidisciplinary tangos, there was one concept that was promiscuous enough to play the field. This was the idea of information. In the twentieth century, the study of information was inspired by war, as there was an urgent need to encode and decode messages effectively. The field took off after the revolutionary paper by Claude Shannon and Warren Weaver. Information as a concept found followers in almost every field for the simple reason that it could be applied to microscopic as well as macroscopic worlds. It was the first truly scale-independent concept. Even though the idea of information grew in prominence, many began to forget one crucial aspect of information:

We forget about the physicality of information that had troubled Boltzmann. The word information became a synonym for the ethereal, the unphysical, the digital, the weightless, the immaterial. But information is physical. It is as physical as Boltzmann’s atoms or the energy they carry in their motion. Information is not tangible; it is not a solid or a fluid. It does not have its own particle either, but it is as physical as movement and temperature, which also do not have particles of their own. Information is incorporeal, but it is always physically embodied. Information is not a thing; rather, it is the arrangement of physical things. It is physical order, like what distinguishes different shuffles of a deck of cards.

One of the highlights of the work of Shannon and Weaver is that they divorced the idea of information from that of meaning. Colloquially we use the two terms interchangeably. However, the two had to be separated so that further developments in the field could happen. Whatever gets transmitted between two devices or two people is information. It is humans who automatically interpret the information as meaning, given the various contextual factors. This clear demarcation mattered because, technically, one could now focus on sending any kind of message, whether the message meant anything or not. Shannon also came up with a formula for encoding an arbitrary message with maximum efficiency. This formula looked identical to Boltzmann’s formula.

The beauty of information being scale independent is that one can use the principles of information theory to describe everything from atoms to economies. In all previous attempts, the natural sciences described the connection from atoms to humans, and the social sciences described the connection between humans and economies. Using the concept of information, one can analyze across all scales. The content of the book is laid out in such a way that it describes the history of the universe, centered not on the arrow of time but on the arrow of complexity.

It is the accumulation of information and of our ability to process information that defines the arrow of growth encompassing the physical, the biological, the social, and the economic, and which extends from the origin of the universe to our modern economy. It is the growth of information that unifies the emergence of life with the growth of economies, and the emergence of complexity with the origins of wealth.

The Secret to Time Travel

This book made me look at childbirth from a completely different perspective. The author describes childbirth as an example of time travel; the baby is transferred from an environment (the mother’s womb) that has remained essentially the same for the last 1000 years into a 21st century world that is largely alien to the species. There are a ton of machines, gadgets, technologies and objects that are realizations of human knowledge and human knowhow. All the objects that we see around us embody information and imagination. The author uses two central actors, amongst many, to describe the way information grows, i.e.

  1. Physical objects: physical embodiment of information
  2. People: fundamental embodiment of knowledge and knowhow

The fundamental perspective of the author is,

Economy is the system by which people accumulate knowledge and knowhow to create packets of physical order, or products, that augment our capacity to accumulate more knowledge and knowhow and, in turn, accumulate more information.

How are humans different from other species on the planet ?

The fact that objects embody information and imagination may seem obvious. Information is a fundamental aspect of nature, one that is older than life itself. It is also an aspect of nature that accelerated with life. Consider the replication of information-rich molecules, such as DNA and RNA. The replication of DNA and RNA is not the replication of matter but the replication of the information that is embodied in matter. Living organisms are highly organized structures that process and produce information. Yet, our focus here will not be on the information-generating capacity that is embodied in the intimacy of our cells but that which emerged with humans and society. Humans are special animals when it comes to information, because unlike other species, we have developed an enormous ability to encode large volumes of information outside our bodies.

Humans are able to create physical instantiations of the objects we imagine, while other species are stuck with nature’s inventory.

The Body of the Meaningless

This chapter clarifies the differences amongst various terms used in information theory. Terms such as entropy and information are often used interchangeably. Indeed they can be in some situations, but not always. Shannon’s definition of information relates to the number of bits required to encode a message with maximum efficiency. In this sense, a highly regular, correlation-rich structure carries less information, and a randomized set of instructions in a message carries more. He termed this measure "entropy" (von Neumann told Shannon that calling his measure entropy would guarantee Shannon’s victory in every argument, since nobody really knew what entropy was). If I consider my laptop, it contains many documents, pictures, videos etc. In Shannon’s language, if I randomly flip the bits in my computer, the information increases. But this doesn’t agree with our intuitive definition of information, under which the more regular and ordered the data is, the more information it holds. So, there is a need to expand the definition of entropy as given by Shannon so that one can use those concepts to talk about information in a sense we can relate to.
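
The point that a scrambled message carries more Shannon information than a regular one can be checked with a small sketch. It uses the empirical symbol frequencies only, which is a simplification of mine: it ignores correlations between neighbouring symbols, and the two messages are invented.

```python
import random
from collections import Counter
from math import log2


def bits_per_symbol(message):
    """Empirical Shannon entropy of a sequence, in bits per symbol."""
    total = len(message)
    return -sum((n / total) * log2(n / total) for n in Counter(message).values())


rng = random.Random(42)
ordered = "a" * 1000                                          # perfectly regular
scrambled = "".join(rng.choice("abcd") for _ in range(1000))  # random symbols

ordered_bits = bits_per_symbol(ordered)      # zero: nothing needs to be encoded
scrambled_bits = bits_per_symbol(scrambled)  # close to 2 bits (log2 of 4 symbols)
print(ordered_bits, round(scrambled_bits, 3))
```

The regular message needs essentially no bits per symbol, while the random one needs nearly the maximum, which is exactly the sense in which Shannon's measure runs against the colloquial meaning of "information".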

The author gives a nice analogy of a half-filled stadium to show the difference between entropy as defined in statistical physics and entropy as defined by Shannon. In statistical physics, entropy depends on the "multiplicity of states". A highly disordered system tends to have a higher multiplicity of states and hence higher entropy. However, a higher-entropy system is not necessarily more disordered. In other words, disorder usually goes with higher entropy, but not always. In the physical sciences, information has always referred to something that has order. So, for physical states, information is the opposite of entropy. The ordered states, commonly referred to as information-rich states, are highly correlated structures. These information-rich structures are also uncommon and peculiar in nature.

The author uses the example of a Rubik’s cube to illustrate the rarity of ordered states in nature. A Rubik’s cube has about 4.3 × 10^19 possible states, and the solved state can be reached from any of them in 20 moves or fewer. However, getting to this ordered state requires such a specific sequence of moves that anyone who can reach it in fewer than 30 moves is called a genius. This example can be extrapolated to nature. The growth of entropy is like a Rubik’s cube in the hands of a child. In nature, information is rare not only because information-rich states are uncommon but also because they are inaccessible given the way in which nature explores the possible states. The author provides a few nice examples that show the connection between the multiplicity of states and the ability to process information, i.e. to compute.
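
The number of states of a standard 3×3×3 Rubik’s cube, roughly 4.3 × 10^19, can be reproduced with the usual counting argument. This is a quick check of my own rather than something taken from the book.

```python
from math import factorial

corner_arrangements = factorial(8) * 3**7   # 8 corners; the last twist is forced
edge_arrangements = factorial(12) * 2**11   # 12 edges; the last flip is forced
# Only half of the combined corner/edge permutations are reachable (parity).
states = corner_arrangements * edge_arrangements // 2

print(f"{states:,}")    # 43,252,003,274,489,856,000
print(f"{states:.1e}")  # 4.3e+19
```

Exactly one of those states is the solved cube, which is what makes the ordered state so rare.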

The main idea of this chapter is to look at the word "information" as defined by Shannon, and then reconcile the concept with the colloquial meaning of the word information and the work of Boltzmann.

The Eternal Anomaly

If the natural tendency of a system is to move towards disorder, towards higher entropy, how does one explain the information explosion on our planet? If we look around the planet, it is amazing to see so many beautiful creations of nature. Why didn’t our planet disintegrate into chaos? Why does information grow on our planet? To explain this phenomenon, the author introduces the theory put forth by Ilya Prigogine. The main idea of the theory is:

Information emerges naturally in the steady states of physical systems that are out-of-equilibrium.

The author unpacks the above statement using many examples such as a marble in a bowl, a box filled with gas, a whirlpool in a sink etc. Prigogine realized that although Boltzmann’s theory was correct, it did not apply to what we observe on Earth because our planet is an out-of-equilibrium pocket inside a larger system (the universe) that is moving toward equilibrium. In fact, our planet has never been close to any form of equilibrium. Prigogine did the math and showed that out-of-equilibrium systems give rise to information-rich steady states. So, that explains where information comes from. In an out-of-equilibrium system, such as Earth, the emergence of information is expected. It is no longer an anomaly. The bad news, however, is that entropy is always lurking on the borders of information-rich anomalies, waiting to devour these anomalies as soon as it gets the chance. Yet information has found ways to fight back. As a result, we live on a planet where information is "sticky" enough to be recombined and created. This stickiness, which is essential for the emergence of life and economies, also hinges on additional fundamental physical properties.

The author explains three mechanisms that make information sticky. The first mechanism flows from Prigogine’s math, which states that out-of-equilibrium systems self-organize into steady states in which order emerges spontaneously, minimizing the destruction of information. The second mechanism comes from Schrödinger’s theory, which says that solids are essential to explain the information-rich nature of life. The third mechanism by which information grows is matter’s ability to process information, i.e. the ability of matter to compute. The author explains wonderfully all three aspects that make information "sticky".

The main idea of this chapter is to view our planet as an out-of-equilibrium system. The other idea communicated by the author is that of the "entropy barrier". I love this concept as it is philosophically aligned with what I believe: "Life is a Martingale".

Time is irreversible in a statistical system because the chaotic nature of systems of many particles implies that an infinite amount of information would be needed to reverse the evolution of the system. This also means that statistical systems cannot go backward because there are an infinite number of paths that are compatible with any present. As statistical systems move forward, they quickly forget how to go back. This infiniteness is what Prigogine calls the entropy barrier, and it is what provides a perspective of time that is not spatialized like the theories of time advanced by Newton and Einstein. For Prigogine, the past is not just unreachable; it simply does not exist. There is no past, although there was a past. In our universe, there is no past, and no future, but only a present that is being calculated at every instant. This instantaneous nature of reality is deep because it helps us connect statistical physics with computation. The instantaneous universe of Prigogine implies that the past is unreachable because it is incomputable at the micro level. Prigogine’s entropy barrier forbids the present to evolve into the past, except in idealized systems

Crystallized Imagination

The author starts off by giving his perspective on life

Life is all about moving around and processing information, helping information grow while interacting in a social context.

If you reflect on the above statement a bit, I guess you will at least concur with some part of it, if not the entire statement. Our society’s ability to accumulate information requires flows of energy, the physical storage of information in solid objects, and of course our collective ability to compute. The flow of energy that keeps our planet’s information growing is clearly that coming from the sun. Plants capture that energy and transform it into sugar, and over long periods of time they degrade into the mineral fuel we know as oil. But as a species, we have also developed an amazing capacity to make information last. We have learned to accumulate information in objects, starting from the time we built our first stone axes to the invention of the latest computer.

The easiest way to get a grasp of "accumulating information in an object" is to compare an "apple", the product of a tree, with an "Apple" product from Silicon Valley. The former is a product available in nature that we internalize in our minds, while the latter is an instantiation of the knowledge in our heads. Both products are packets of information, but only the latter is a crystal of imagination. The author cites two examples of MIT lab scientists who are working on robotic arms and optogenetics. They are trying to create objects that crystallize imagination, and by doing so, they are endowing our species with new capacities. The author gives several contexts where thinking about products in a different way changes several preexisting metrics and notions that we carry in our heads. For example, Chile is a major exporter of copper, and one might argue that other countries are exploiting Chile. However, when one looks at the value generated by the finished products that use copper, the value of copper itself goes up. So, who is exploiting whom? Isn’t Chile free-riding on the crystallized imagination of other people?

Thinking about products as crystals of imagination helps us understand the importance of the source of the information that is embodied in a product. Complex products are not just arrangements of atoms that perform functions; rather, they are ordered arrangements of atoms that originated as imagination.


Amplifiers

The chapter is titled so to emphasize the amplifying nature of objects. Each object can be thought of as a crystallization of knowledge and knowhow, and these objects become important to all of us because they enhance our capacity to do other things with them. Take a laptop, for instance. It is a product of someone else’s imagination, and we get to use it to produce other objects. There is no need to know what’s under the hood of every object that we use. In the words of the author,

Products are magical because they augment our capacities

Objects are much more than merely a form of communication.

Our ability to crystallize imagination into products, although expressive, is different from our ability to verbally articulate ideas. An important difference is that products can augment our capacities in ways that narrative descriptions cannot. Talking about toothpaste does not help you clean your teeth, just as talking about the chemistry of gasoline will not fill up your car with gas. It is the toothpaste’s embodiment of the practical uses of knowledge, knowhow, and imagination, not a narrative description of them, that endows other people with those practical uses. Without this physical embodiment the practical uses of knowledge and knowhow cannot be transmitted. Crystallizing imagination is therefore essential for sharing the practical uses of the knowledge that we accumulate in our mind. Without our ability to crystallize imagination, the practical uses of knowledge would not exist, because that practicality does not reside solely in the idea but hinges on the tangibility of the implementation. Once again, the physicality of products-whether tangible or digital-augments us.

The main idea of this chapter is to describe products as physical embodiments of information, carrying the practical uses of knowledge, knowhow, and imagination. Our capacity to create products that augment us also helps define the overall complexity of our society.

This Time, It’s Personal

If we look at various products, the knowledge and knowhow for creating them are geographically biased, though the bias is coming down a bit, at least on the software front. The reason for this geographical bias is that the crystallization of any product requires a great amount of knowledge and knowhow. The learning in almost all cases is experiential and social. Bookish knowledge alone is not enough. You need an environment where you can interact, share ideas, experiment, and learn from trial and error. Each geographical region has its own idiosyncrasies and hence gives rise to different codifications of knowledge and knowhow. This means that there is certainly going to be a geographical bias in the products we see, which naturally limits the growth of information. The author introduces a term, person-byte, meaning the maximum knowledge and knowhow carrying capacity of a human. Is there a limit to human knowledge? Well, let’s talk about the knowledge that one can accumulate over one’s working life. If I take my own example, there is a limit to how much math I can do, what kind of stats I can work on, what kind of models I can build, and the amount of code I can write. All these ultimately limit the growth of information. In that sense, the person-byte is a nifty idea that says that for information to grow, there needs to be a network of people whose collective person-bytes exceed the individual person-byte.

The person-byte limit implies that the accumulation of large amounts of knowledge and knowhow are limited by both the individual constraints of social and experiential learning and the collective constraints brought by our need to chop up large volumes of knowledge and knowhow and distribute them in networks of individuals.

Links Are Not Free

If one harks back to the time of Henry Ford’s Model T factory, it was considered the poster child of the industrial economy. It stood for the efficiency gained through scale. The output of the factory, the car, was a complex product, and the rationale was that it was better to break this complex job into 7,882 tasks. Whether there was a need for 7,882 individual tasks is a matter for another debate. One takeaway could be that complex products need giant factories. Based on that takeaway, we should have innumerable giant factories, given the complexity of the products we see in today’s world. This is where the author introduces a second level of quantization of knowledge and knowhow: the firm-byte. This is a conceptual term that puts an upper limit on the amount of knowledge and knowhow a firm can possess. So, if a product requires more than one firm-byte, there is a need for a network of firms. The factors that limit the size of a firm have been studied extensively under "transaction cost theory". The author gives an overview of the theory, which says:

There are fundamental forces that limit the size of the networks we know as firms, and hence that there is a limit to the knowledge and knowhow these networks can accumulate. Moreover, it also tells us that there is a fundamental relationship between the cost of the links and the size of these networks: the cheaper the link, the larger the network.

It all comes down to links. If you take a typical Barbie doll, the various steps in the start-to-finish process happen in twenty different countries. What has made this splintering of the manufacturing process possible? It is not that the product is complicated. It is that the cost of creating links between firms has fallen. This could be attributed to lower transportation costs, the revolution in communication technologies, the standardization of parts, etc. In all the cases where market links have become cheaper, we have seen vast networks of firms participating together. There are innumerable examples that fall into this category (iPad, iPhone, laptops, cell phones, …).

Does it mean that making market links cheaper will automatically increase information through the crystallization of many other products? Not necessarily. Some links are inherently expensive, depending on the frequency and specificity of the transaction.

In Links We Trust

This chapter explores the role of "trust" in the formation of networks. Social networks and social institutions help determine the size, adaptability, and composition of the networks humans need to accumulate knowledge and knowhow. When it comes to size, the ability of societies to grow large networks is connected to the level of trust in the underlying society. When it comes to composition, social institutions and preexisting social networks affect the professional networks we form in two important ways. On the one hand, a society’s level of trust determines whether networks are more likely to piggyback on family relations. On the other hand, people find work through personal contacts, and firms tend to hire through the social networks of their existing employees.

The takeaway from this chapter is that social networks and institutions are also known to affect the adaptability of firms and networks of firms.

The Evolution of Economic Complexity

If one’s intention is to study the geographical distribution of knowledge and knowhow, one inevitably runs into an issue: knowledge and knowhow are intangibles. How does one measure them across geographies? The author’s first attempt is to look at the location of industries that produce objects ranging from the simple to the complex. In this context, he uses the concept of "nestedness" from ecology and does some number crunching to show that

There is a clear trend showing that the most complex products tend to be produced in a few diverse countries, while simpler products tend to be produced in most countries, including those that produce only a handful of products. This is consistent with the idea that industries present everywhere are those that require less knowledge and knowhow.

The author ties his person-byte theory back to the observations from the data. In a sense, the inference is commonsensical. The knowledge and knowhow behind specialized products are sticky and biased towards specific geographical areas, whereas for ubiquitous products the knowledge and knowhow are spread across a wide range of geographies.

The Sixth Substance

If one looks at the models describing economic growth, the five common factors used in the literature are

  1. Land
  2. Labor
  3. Physical Capital
  4. Human Capital
  5. Social Capital

The author connects these five factors to the principles explained in the previous chapters. For example, physical capital is the physical embodiment of information, and it carries the practical uses of the knowledge and knowhow that went into its creation. Physical capital is made of embodied information and is equivalent to the crystals of imagination described in the previous chapters. The author introduces a metric, "economic complexity", that takes into consideration the diversity of the exporting country, the diversity of the country to which the export is being made, and the ubiquity of the product exported. The author tests his model for predictive accuracy and shows that it performs well.
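To get a feel for two of the ingredients mentioned above, here is a minimal sketch in Python. It is not the author's economic complexity metric; it only counts diversity (how many products a country exports) and ubiquity (how many countries export a product) on a made-up toy dataset. The country names, product names, and function names are all hypothetical.

```python
# Hypothetical toy data: each country maps to the set of products it exports.
exports = {
    "A": {"shirts", "cars", "chips"},
    "B": {"shirts", "cars"},
    "C": {"shirts"},
}

def diversity(exports):
    """Number of distinct products each country exports."""
    return {country: len(products) for country, products in exports.items()}

def ubiquity(exports):
    """Number of countries that export each product."""
    counts = {}
    for products in exports.values():
        for product in products:
            counts[product] = counts.get(product, 0) + 1
    return counts

def average_ubiquity(exports):
    """Mean ubiquity of the products in each country's export basket."""
    ubiq = ubiquity(exports)
    return {
        country: sum(ubiq[p] for p in products) / len(products)
        for country, products in exports.items()
    }

div = diversity(exports)
ubiq = ubiquity(exports)
avg_ubiq = average_ubiquity(exports)
```

In this toy data the most diverse country (A) also has the least ubiquitous export basket on average, which is the kind of pattern the nestedness observation describes: the rare product is made only by the country that makes everything else too.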


The last section of the book highlights its main points. In a sense, it makes my summary redundant, as the author provides a far more concise one. So, if you are short on time, you might just want to go over the last 10 pages of the book.



We see/hear/talk about “Information” in many contexts. In the last two decades or so, one could even make a career in the field of “Information” technology. But what is “Information”? If someone talks about a certain subject for 10 minutes in English and 10 minutes in French, is the “Information” the same in both instances? Can we quantify the two instances in some way? This book explains Claude Shannon’s remarkable achievement of measuring “Information” in terms of probabilities. In 1948, Shannon laid out a mathematical framework, and it became an open challenge for engineers to develop devices and technologies that could achieve what Shannon had proved to be a “mathematical certainty”. This book distils the main ideas that go into quantifying information with very little math and hence makes them accessible to a wider audience. A must read if you are curious to know a bit about “Information”, which has become a part of everyday vocabulary.
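To make "measuring information in terms of probabilities" a little more concrete, here is a minimal sketch of Shannon entropy in Python. The function name and the coin examples are my own illustration, not taken from the book; the formula itself, H = -Σ p·log2(p), is the standard definition measured in bits.

```python
import math

def entropy_bits(probs):
    """Shannon entropy in bits: H = -sum(p * log2(p)).

    Outcomes with zero probability are skipped, since they contribute nothing.
    """
    return -sum(p * math.log2(p) for p in probs if p > 0)

# A fair coin is maximally unpredictable for two outcomes.
fair_coin = entropy_bits([0.5, 0.5])

# A heavily biased coin is more predictable, so each toss carries less information.
biased_coin = entropy_bits([0.9, 0.1])
```

A fair coin toss carries exactly one bit, while the biased coin carries less than one bit per toss. The more predictable the source, the less information each outcome conveys.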