If one rips apart a computer and looks at its innards, one sees a coalescence of beautiful ideas. The modern computer is built out of electronics, but the ideas behind its design have nothing to do with electronics: a basic design can equally be built from valves, water pipes, and the like. The principles are the essence of what makes computers compute.
This book introduces some of the main ideas of computer science such as Boolean logic, finite-state machines, programming languages, compilers and interpreters, Turing universality, information theory, algorithms and algorithmic complexity, heuristics, uncomputable functions, parallel computing, quantum computing, neural networks, machine learning, and self-organizing systems.
Who might be the target audience of the book? I guess the book might appeal to someone who has a passing familiarity with the main ideas of computer science and wants to know a bit more, without being overwhelmed. If you have ever wanted to explain the key ideas of computing to a numerate high-school kid in a way that is illuminating and at the same time makes him/her curious, and you are struggling to explain the ideas in simple words, then you might enjoy this book.
Nuts and Bolts
Recall any algebraic equation you came across in high school. Now replace the variables with logic statements that are either true or false, replace the arithmetic operations with the relevant Boolean operators, and you get a Boolean algebraic equation. This sort of algebra was invented by George Boole. It was Claude Shannon who showed that one could build electronic circuits that mirror any Boolean expression. The implication of this construction is that any function capable of being described as a precise logical statement can be implemented by an analogous system of switches.
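As a quick illustration (my own sketch in Python, not from the book), Boolean algebra can be manipulated much like the algebra of numbers, and because there are only two truth values, an identity such as De Morgan's law can be proved simply by checking every combination of inputs:

```python
from itertools import product

# De Morgan's law: NOT (a AND b) is equivalent to (NOT a) OR (NOT b).
# With only two truth values per variable, exhaustive checking is a proof.
for a, b in product([False, True], repeat=2):
    assert (not (a and b)) == ((not a) or (not b))
print("De Morgan's law holds for all four input combinations")
```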
Two principles underlie any type of computer: 1) reducing a task to a set of logical functions and 2) implementing logical functions as a circuit of connected switches. To illustrate these two principles, the author narrates his experience of building a tic-tac-toe game out of a set of switches and bulbs. If you were to manually enumerate all the possible combinations of a two-player game, one of the best representations of all the moves is a decision tree. This decision tree helps one choose the right response to whatever the opponent's move may be. Traversing the decision tree amounts to evaluating a set of Boolean expressions. Using switches in series or parallel, it is possible to create an automated response to a player's moves. The author briefly describes the circuit he built using 150-odd switches. So, all one needs to construct an automated tic-tac-toe game is switches, wires, bulbs, and a way to construct logic gates. Two crucial elements are missing from this design. First, the circuit has no concept of events happening over time; therefore the entire sequence of the game must be determined in advance. Second, the circuit can perform only one function: there is no software in it.
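The series/parallel trick can be sketched in a few lines of code (my own illustration, not the author's 150-switch circuit): two switches wired in series implement AND, and two wired in parallel implement OR.

```python
def series(s1, s2):
    # Current flows through two switches in series only if both are closed: AND
    return s1 and s2

def parallel(s1, s2):
    # Current flows through two switches in parallel if either is closed: OR
    return s1 or s2

# The bulb lights (True) exactly when the Boolean expression is satisfied.
print(series(True, False), parallel(True, False))  # False True
```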
Early computers were made with mechanical components. One can represent AND, OR, NOR, etc. using a set of mechanical contraptions that can then be used to perform calculations. Even in the 1960s, most arithmetic calculators were mechanical.
Building a computer out of any technology requires two basic components:
The author uses hydraulic valves, Tinker Toys, mechanical contraptions, and electrical circuits to show the various ways in which Boolean logic can be implemented. The basic takeaway from this chapter is the principle of functional abstraction. Functional abstraction is fundamental to computer design: not the only way to design complicated systems, but the most common way. Computers are built up as a hierarchy of such functional abstractions, each one embodied in a building block. Once someone implements 0/1 logic, you can build on top of it. The blocks that perform functions are hooked together to implement more complex functions, and these collections of blocks in turn become the new building blocks for the next level.
Universal Building blocks
The author explains the basic mechanism behind translating any logical function into a circuit. One needs to break the logical function into parts and figure out the types of gates required to translate the various inputs into the appropriate outputs. To make this somewhat abstract principle concrete, the author works out a circuit design for a "majority wins" logic block. He also explains how one could design a circuit for the "Rock-Paper-Scissors" game. The lessons from these simple circuits are then extrapolated to the general computer. Consider an operation like addition or multiplication: you can always represent it as a Boolean logic block. Most computers have logic blocks called arithmetic units that perform this job.
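A possible gate-level rendering of the "majority wins" block (my sketch; the book presents it as a circuit diagram): the output is true when at least two of the three inputs are true, which can be expressed entirely with AND and OR gates.

```python
from itertools import product

def majority(a, b, c):
    # (a AND b) OR (a AND c) OR (b AND c): true iff at least two inputs are true
    return (a and b) or (a and c) or (b and c)

# Exhaustively check the gate expression against the plain definition
for a, b, c in product([False, True], repeat=3):
    assert majority(a, b, c) == (sum([a, b, c]) >= 2)
print("gate expression matches 'at least two of three' on all 8 inputs")
```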
Hence the core idea behind making a computer perform subtraction, addition, multiplication, or any other computation is to write out the Boolean logic on a piece of paper and then implement it using logic gates.
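Binary addition, for instance, reduces to a small Boolean block called a full adder, chained once per bit. This is the standard textbook construction, sketched here in Python rather than gates:

```python
def full_adder(a, b, carry_in):
    # The sum bit is the XOR of the three input bits; the carry-out is the
    # majority function of the three. Both are pure Boolean logic.
    s = a ^ b ^ carry_in
    carry_out = (a & b) | (a & carry_in) | (b & carry_in)
    return s, carry_out

def add(x, y, n_bits=8):
    # Ripple-carry adder: wire n full adders in a chain, feeding each
    # stage's carry-out into the next stage's carry-in.
    carry, result = 0, 0
    for i in range(n_bits):
        s, carry = full_adder((x >> i) & 1, (y >> i) & 1, carry)
        result |= s << i
    return result

print(add(23, 42))  # 65
```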
The most important class of functions is time-varying functions, i.e. those whose output depends on the previous history of inputs. These are handled via a finite-state machine. The author conveys the basic idea of a finite-state machine via examples such as a ballpoint pen, a combination lock, a tally counter, and an odometer. To store the state of a finite-state machine, one needs a device called a register. An n-bit register has n inputs and n outputs, plus an additional timing input that tells the register when to change state. Storing new information is called "writing" the state of the register. When the timing signal tells the register to write a new state, the register changes its state to match the inputs. The outputs of the register always indicate its current state. Registers can be implemented in many ways, one of which is to use a Boolean logic block to steer the state information around in a circle. This type of register is often used in electronic computers, which is why they lose track of what they're doing if their power is interrupted.
A finite-state machine consists of a Boolean logic block connected to a register. The finite-state machine advances its state by writing the output of the Boolean logic block into the register; the logic block then computes the next state, based on the input and the current state. This next state is then written into the register on the next cycle. The process repeats in every cycle. The machine on which I am writing this blog post runs at 1.7 GHz, i.e. it can change its state 1.7 billion times per second. To explain the details of a finite-state machine and its circuit implementation, the author uses familiar examples like traffic lights and combination locks. These examples are superbly explained, and that's the beauty of this book: simple examples are chosen to illustrate profound ideas.
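The traffic-light machine can be captured in a few lines (a minimal sketch of my own; the book builds it from a register and a logic block). The dictionary plays the role of the Boolean logic block, and the stored state plays the role of the register:

```python
# Next-state table: the combinational logic of the finite-state machine.
NEXT_STATE = {"green": "yellow", "yellow": "red", "red": "green"}

class TrafficLight:
    def __init__(self, initial="green"):
        self.state = initial  # the register holding the current state

    def tick(self):
        # On each timing signal, write the logic block's output
        # back into the register.
        self.state = NEXT_STATE[self.state]
        return self.state

light = TrafficLight()
print([light.tick() for _ in range(4)])  # ['yellow', 'red', 'green', 'yellow']
```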
Finite-state machines are powerful but limited. They cannot recognize many patterns that are common in our world. For instance, it is impossible to build a finite-state machine that will unlock a lock whenever you enter any palindrome. So, we need something else besides logic gates and finite-state machines.
Boolean logic and finite-state machines are the building blocks of computer hardware. Programming languages are the building blocks of computer software. There are many programming languages, and if you learn one or two, you can quickly pick up the others. That said, writing code that merely does something is one thing; writing effective code is a completely different thing. The latter takes many hours of deliberate effort. This is aptly put by the author:
Every computer language has its Shakespeares, and it is a joy to read their code. A well-written computer program possesses style, finesse, even humor-and a clarity that rivals the best prose.
There are many programming languages, and each has its own specific syntax. These languages are far more convenient to write in than machine-level instructions. Once you have written a program, how does the machine know what to do? There are three main steps:
a finite-state machine can be extended, by adding a storage device called a memory, which will allow the machine to store the definitions of what it’s asked to do
extended machine can follow instructions written in machine language, a simple language that specifies the machine’s operation
machine language can instruct the machine to interpret the programming language
A computer is just a special type of finite-state machine connected to a memory. The computer's memory, in effect an array of cubbyholes for storing data, is built of registers, like the registers that hold the states of finite-state machines. Each register holds a pattern of bits called a word, which can be read (or written) by the finite-state machine. The number of bits in a word varies from computer to computer. Each register in the memory has a different address, so registers are referred to as locations in memory. The memory contains Boolean logic blocks, which decode the address and select the location for reading or writing. If data is to be written at a memory location, these logic blocks store the new data into the addressed register. If the register is to be read, the logic blocks steer the data from the addressed register to the memory's output, which is connected to the input of the finite-state machine. Memory can contain data, processing instructions, and control instructions, all stored in machine language. Here is the basic hierarchy of functional dependence across the various components:
Whatever we need the computer to perform, we write it in a programming language
This is converted into machine language by a compiler, via a predetermined set of subroutines called the operating system
The instructions are stored in memory and categorized into control and processing instructions
Finite-state machines fetch and execute the instructions
The instructions, as well as the data, are represented by bits and stored in the memory
Finite-state machines and memory are built from storage registers and Boolean blocks
Boolean blocks are implemented via switches in series or parallel
Switches control something physical that sends 0 or 1
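The fetch-execute loop at the heart of this hierarchy can be sketched as a toy simulator. The instruction set below is entirely made up for illustration; real machine languages differ, but the cycle of fetching an instruction and executing it is the same:

```python
def run(memory):
    # The finite-state machine: an accumulator register plus a program
    # counter, repeatedly fetching and executing instructions from memory.
    acc, pc = 0, 0
    while True:
        op, arg = memory[pc]   # fetch the instruction at the program counter
        pc += 1
        if op == "LOAD":       # execute: copy a memory word into the accumulator
            acc = memory[arg]
        elif op == "ADD":      # add a memory word to the accumulator
            acc += memory[arg]
        elif op == "STORE":    # write the accumulator back to memory
            memory[arg] = acc
        elif op == "HALT":
            return memory

# Program and data share the same memory: compute memory[10] + memory[11]
memory = {0: ("LOAD", 10), 1: ("ADD", 11), 2: ("STORE", 12), 3: ("HALT", 0),
          10: 2, 11: 3, 12: 0}
print(run(memory)[12])  # 5
```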
If you look at these steps, each idea is built upon a level of abstraction. In one sense, that is why anyone can write simple software programs without understanding many details of the computer architecture. However, the closer one gets to the machine language, the more one needs to understand the details of the various abstractions that have been implemented.
How universal are Turing machines?
The author discusses the idea of the universal computer, first described in 1937 by Alan Turing. What is a Turing machine?
Imagine a mathematician performing calculations on a scroll of paper. Imagine further that the scroll is infinitely long, so that we don’t need to worry about running out of places to write things down. The mathematician will be able to solve any solvable computational problem no matter how many operations are involved, although it may take him an inordinate amount of time.
Turing showed that any calculation that can be performed by a smart mathematician can also be performed by a stupid but meticulous clerk who follows a simple set of rules for reading and writing the information on the scroll. In fact, he showed that the human clerk can be replaced by a finite-state machine. The finite-state machine looks at only one symbol on the scroll at a time,
so the scroll is best thought of as a narrow paper tape, with a single symbol on each line. Today, we call the combination of a finite-state machine with an infinitely long tape a Turing machine.
The author also gives a few examples of noncomputable problems, such as the halting problem. He also touches briefly upon quantum computing and explains the core idea using the water molecule:
When two hydrogen atoms bind to an oxygen atom to form a water molecule, these atoms somehow "compute" that the angle between the two bonds should be 107 degrees. It is possible to approximately calculate this angle from quantum mechanical principles using a digital computer, but it takes a long time, and the more accurate the calculation the longer it takes. Yet every molecule in a glass of water is able to perform this calculation almost instantly. How can a single molecule be so much faster than a digital computer?
The reason it takes the computer so long to calculate this quantum mechanical problem is that the computer would have to take into account an infinite number of possible configurations of the water molecule to produce an exact answer. The calculation must allow for the fact that the atoms comprising the molecule can be in all configurations at once. This is why the computer can only approximate the answer in a finite amount of time.
One way of explaining how the water molecule can make the same calculation is to imagine it trying out every possible configuration simultaneously-in other words, using parallel processing. Could we harness this simultaneous computing capability of quantum mechanical objects to produce a more powerful computer? Nobody knows for sure.
Algorithms and Heuristics
The author uses simple examples to discuss various aspects of algorithms, such as designing an algorithm and computing its running time. For the many problems where precise algorithms are not available, the next best option is to use heuristics. Designing heuristics is akin to an art. Most real-life problems require a healthy mix of algorithmic and heuristic solutions. IBM's Deep Blue is an amazing example showing that the intermixing of algorithms and heuristics can beat one of the best human minds. Indeed, there will be a few cognitive tasks that remain out of the computer's reach, and humans will have to focus on skills that are inherently human. A book-length treatment of this topic is given by Geoff Colvin in his book Humans Are Underrated.
Memory: Information and Secret Codes
Computers do not have infinite memory, so one needs a way to measure the information stored in memory. An n-bit memory can store n bits of data, but we need to know how many bits are required to store a given input. One can think of various ways of doing so. One could use the "representative" definition, i.e. think about how each character in the input is represented on a computer and then assess the total number of bits required to represent the input. Let's say this blog post has 25,000 characters and each character takes 8 bits on my computer; then the blog post takes up 0.2 million bits. But 8 bits per character could be a stretch: maybe all the characters in this post require only 6 bits each, in which case 0.15 million bits would suffice. The problem with this way of quantifying information is that it is representation dependent. The ideal measure would be the minimum number of bits needed to represent the information. Hence the key question here is: "How much can you compress a given text without losing information?"
Let's say one wants to store the text of War and Peace; multiplying 8 bits by the number of characters in the novel gives an upper bound on the information size. By considering various forms of compression, such as 6-bit encoding, taking advantage of regularities in the data, and exploiting the grammar of the language, one can reduce the information size of the novel. In the end, compression that uses the best available statistical methods would probably reach an average representation size of fewer than 2 bits per character, about 25 percent of the standard 8-bit character representation.
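The effect of compression on ordinary text is easy to see with a general-purpose compressor. This is a rough demonstration using Python's zlib on a deliberately repetitive sample; the book's best statistical models would do better on real prose:

```python
import zlib

# Any stretch of English prose will do; repetition exaggerates the effect.
text = "The quick brown fox jumps over the lazy dog. " * 400
raw_bits = len(text) * 8
compressed_bits = len(zlib.compress(text.encode("ascii"), level=9)) * 8

print(f"raw: {raw_bits} bits, compressed: {compressed_bits} bits")
print(f"about {compressed_bits / len(text):.2f} bits per character")
```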
If the minimum number of bits required to represent an image is taken as a measure of the amount of information in the image, then an image that is easy to compress will have less information. A picture of a face, for example, will have less information than a picture of a pile of pebbles on the beach, because the adjacent pixels in the facial image are more likely to be similar. The pebbles require more information to be communicated and stored, even though a human observer might find the picture of the face much more informative. By this measure, the picture containing the most information would be a picture of completely random pixels, like the static on a damaged television set. If the dots in the image have no correlation to their neighbors, there is no regularity to compress. So, pictures that are totally random require many bits and hence contain a lot of information. This goes against our intuitive notion of information: in common parlance, a picture of random dots should have less information than a picture with some specific pattern. Hence it is important for computers to store meaningful information, and indeed that is how many image and sound compression algorithms work: they discard meaningless information. A further generalization of this idea is to consider a program that can generate the data being stored. This leads us to another measure of information:
The amount of information in a pattern of bits is equal to the length of the smallest computer program capable of generating those bits.
This definition of information holds whether the pattern of bits ultimately represents a picture, a sound, a text, a number, or anything else.
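The contrast between regular and random data is easy to demonstrate, again with zlib as a crude stand-in for the "smallest program" (the true program-length measure, Kolmogorov complexity, is uncomputable):

```python
import os
import zlib

regular = bytes(10_000)       # ten thousand zero bytes: maximal regularity
static = os.urandom(10_000)   # random bytes: the "damaged television" picture

# The regular data collapses to a few dozen bytes; the random data
# has no regularity to exploit and stays at roughly its original size.
print(len(zlib.compress(regular)))
print(len(zlib.compress(static)))
```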
The second part of this chapter talks about public-key private key encryption and error correction mechanisms. The author strips off all the math and explains it in plain simple English.
Speed: Parallel Computers
"Parallel computing" is a term that gets tossed around in many places. What is the basic problem with a normal computer that parallel computing aims to solve? In the basic design of a computer, processing and memory have always been two separate components, and all the effort was directed towards increasing the processing speed. If you compare a modern silicon-chip computer to, say, a vintage room-filling computer, the basic two-part design has remained the same: a processor connected to a memory. This has come to be known as the sequential computer, and it has given rise to the need for parallel computing. To work any faster, today's computers need to do more than one operation at once. We can accomplish this by breaking up the computer memory into lots of little memories and giving each its own processor. Such a machine is called a parallel computer. Parallel computers are practical because of the low cost and small size of microprocessors. We can build a parallel computer by hooking together dozens, hundreds, or even thousands of these smaller processors. The fastest computers in the world are massively parallel computers, which use thousands or even tens of thousands of processors.
In the early days of parallel computing, many people were skeptical about it, the main reason being Amdahl's law: no matter how many parallel processors you use, some fraction of the task (say 10 percent) must be done sequentially, so the task completion time does not go down as rapidly as one adds more and more processors. Soon it was realized that most large tasks have a negligible sequential component, and by smart design one can parallelize many tasks.
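Amdahl's argument is a one-line formula. The sketch below plugs in the 10 percent sequential fraction mentioned above to show how the speedup saturates:

```python
def amdahl_speedup(n_processors, sequential_fraction):
    # The sequential part runs at full length no matter what, so the
    # overall speedup can never exceed 1 / sequential_fraction.
    parallel_fraction = 1 - sequential_fraction
    return 1 / (sequential_fraction + parallel_fraction / n_processors)

for n in (1, 10, 100, 1000):
    print(n, round(amdahl_speedup(n, 0.10), 2))
# With a 10% sequential part, the speedup stays below 10x
# no matter how many processors are added.
```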
Highly parallel computers are now fairly common. They are used mostly in very large numerical calculations (like weather simulation) or in large database calculations, such as extracting marketing data from credit card transactions. Since parallel computers are built of the same parts as personal computers, they are likely to become less expensive and more common with time. One of the most interesting parallel computers today is the one that is emerging almost by accident from the networking of sequential machines. The worldwide network of computers called the Internet is still used primarily as a communications system for people. The computers act mostly as a medium, storing and delivering information (like electronic mail) that is meaningful only to humans. Already, standards are beginning to emerge that allow these computers to exchange programs as well as data. The computers on the Internet, working together, have a potential computational capability that far surpasses any individual computer that has ever been constructed.
Computers that learn and adapt
The author gives some basic ideas around which adaptive learning has been explored via algorithms. Using examples such as neural networks and the perceptron, the author manages to explain the key ideas of machine learning.
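A perceptron is small enough to write out in full. The sketch below is my own minimal version, not code from the book; it learns the AND function with the classic error-correction rule:

```python
def train_perceptron(samples, epochs=20, lr=0.1):
    # samples: ((x1, x2), target) pairs with 0/1 targets.
    w, b = [0.0, 0.0], 0.0
    for _ in range(epochs):
        for (x1, x2), target in samples:
            out = 1 if w[0] * x1 + w[1] * x2 + b > 0 else 0
            err = target - out
            # Perceptron rule: nudge weights and bias in the direction
            # that reduces the error on this example.
            w[0] += lr * err * x1
            w[1] += lr * err * x2
            b += lr * err
    return w, b

data = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]  # the AND function
w, b = train_perceptron(data)
print([1 if w[0] * x1 + w[1] * x2 + b > 0 else 0 for (x1, x2), _ in data])
# [0, 0, 0, 1]
```

Because AND is linearly separable, the perceptron convergence theorem guarantees this loop settles on correct weights.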
The brain cannot be analyzed via the usual "divide and conquer" approach used to understand a sequential computer. As long as the function of each part is carefully specified and implemented, and as long as the interactions between the parts are controlled and predictable, divide and conquer works very well, but an evolved object like the brain does not necessarily have this kind of hierarchical structure. The brain is much more complicated than a computer, yet it is much less prone to catastrophic failure. The contrast in reliability between the brain and the computer illustrates the difference between the products of evolution and those of engineering. A single error in a computer's program can cause it to crash, but the brain is usually able to tolerate bad ideas and incorrect information and even malfunctioning components. Individual neurons in the brain are constantly dying, and are never replaced; unless the damage is severe, the brain manages to adapt and compensate for these failures. Humans rarely crash.
So, how does one go about designing something differently from conventional engineering? The author illustrates this via a "sorting numbers" example, in which one proceeds as follows:
Generate a "population" of random programs
Test the population to find which programs are the most successful.
Assign a fitness score to each program based on how successfully they sort the numbers
Create new populations descended from the high-scoring programs. One can think of many ways here: let only the fittest survive, or "breed" new programs by pairing survivors from the previous generation
When the new generation of programs is produced, it is again subjected to the same testing and selection procedure, so that once again the fittest programs survive and reproduce. A parallel computer will produce a new generation every few seconds, so the selection and variation processes can feasibly be repeated many thousands of times. With each generation, the average fitness of the population tends to increase-that is, the programs get better and better at sorting. After a few thousand generations, the programs will sort perfectly.
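The loop above can be sketched as a bare-bones genetic algorithm that evolves comparator networks for sorting. This is my own simplified version (mutation only, no "breeding" by pairing survivors), under the assumption that a program is just a fixed-length list of compare-and-swap instructions:

```python
import random

N = 6  # length of the lists to be sorted

def fitness(network, trials=30):
    # Fitness score: fraction of random test lists the network sorts correctly.
    ok = 0
    for _ in range(trials):
        data = [random.randint(0, 99) for _ in range(N)]
        out = data[:]
        for i, j in network:          # each "gene" is a compare-and-swap
            if out[i] > out[j]:
                out[i], out[j] = out[j], out[i]
        ok += out == sorted(data)
    return ok / trials

def random_network(length=25):
    return [tuple(sorted(random.sample(range(N), 2))) for _ in range(length)]

def mutate(network):
    child = list(network)             # replace one comparator at random
    child[random.randrange(len(child))] = tuple(sorted(random.sample(range(N), 2)))
    return child

random.seed(0)
population = [random_network() for _ in range(60)]
for generation in range(150):
    population.sort(key=fitness, reverse=True)
    survivors = population[:20]       # only the fittest survive
    population = survivors + [mutate(random.choice(survivors)) for _ in range(40)]

print("best fitness after evolution:", fitness(population[0], trials=200))
```

With each generation the average fitness of the population tends to climb toward 1.0, though a run this small may stop short of a perfect sorter.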
The author confesses that the programs produced by the above experiment work, but that it is difficult to "understand" why they work.
One of the interesting things about the sorting programs that evolved in my experiment is that I do not understand how they work. I have carefully examined their instruction sequences, but I do not understand them: I have no simpler explanation of how the programs work than the instruction sequences themselves. It may be that the programs are not understandable-that there is no way to break the operation of the program into a hierarchy of understandable parts. If this is true-if evolution can produce something as simple as a sorting program which is fundamentally incomprehensible-it does not bode well for our prospects of ever understanding the human brain.
The chapter ends with a discussion about building a thinking machine.
The book explains the main ideas of "computer science" in simple words. Subsequently, it discusses many interesting areas that are currently being explored by researchers and practitioners. Anyone who is curious about questions like "How do computers work?" and "What are the limitations of today's computer hardware and software?", and wants to get some idea about them without being overwhelmed by the answers, will find this book interesting.
The author tries to answer two questions via this book:
What are the core elements of fulfilling work?
How should we change our current path so that our work is in line with our being?
Give meaning to work
The author talks about five aspects relevant to work:
earning money
achieving status
making a difference
following our passions
using our talents
The author categorically states that earning money and achieving status might not give meaning to our work. More often than not, one gets onto a "hedonic" treadmill that is difficult to get off. Instead, pursuing the "making a difference" path is a better option. It might lead an individual to try out a variety of jobs, learn a variety of skills, and become a portfolio worker. It is one of the ways to explore the "many selves" that lie within us. The chapter also explores the various challenges one might face in following any of the above paths. Personally, I find the "using your talents" path quite appealing. I have taken the path of developing talents and then doing my bit in making those talents useful to others. If you forget the monetary and status aspects of any work and immerse yourself in developing talents, you might find that learning a specific skill and putting it to use can be a wonderful experience. There are obvious challenges one encounters on a day-to-day basis; however, I guess the "love" for one's work gives one the strength to face them. The author's message from this chapter is that there are three core elements of fulfilling work: meaning, flow, and freedom. Each of these elements is concisely discussed via examples and anecdotes. Light reading, but there is no fluff here.
If you are going to get good at something you need a tunnel vision.
– Wayne Davies
Act First, Reflect Later
One often hears stories of somebody abandoning a certain career and heading out to do something radical. These stories are inspiring, but such moves take a lot of courage, and not many people are in a state of mind to take such an extreme step. Instead, the author suggests some gentler alternatives:
branching projects: start doing things related to your prospective career on weekends or mini-breaks
conversational research: talk to people who work in the field you want to enter and get to know the details of the work
The above suggestions involve doing first and reflecting later. While trying out alternatives, one must seek out "flow" experiences, activities in which we lose the sense of time. However, one must remember that there may be activities where the struggle itself is what we find rewarding. Imagine learning a certain branch of math: getting to the "flow" state will take a LOT of time, but the day-to-day struggle and the incremental understanding of the subject can provide the kick needed to continue working. To get more clarity about "flow" experiences, it is better to keep a monthly log of them and reflect on it from time to time.
The Longing for Freedom
Based on many interviews with independent consultants, the author says that a "bespoke consultant" job is both wonderful and awful. However, once they experience the freedom of freelancing, most say they would never trade it for a 9-to-5 job. On the other hand, one can explore even within a routine job. There is a story about Wallace Stevens that goes like this:
By day he worked in an insurance company, eventually becoming vice-president of an established firm in Connecticut. But he was no workaholic: he returned home each evening to write verse, and was considered one of the great modernist poets of the early twentieth century. Stevens kept these two lives separate: he always felt something of an imposter in his day job; it was "like playing a part", he wrote. He regarded poetry as his "real work"- even if he wasn’t paid for it- and never wanted to commercialize his art by becoming a "professional" poet. After winning the Pulitzer Prize in 1955 he was offered a faculty position at Harvard that would have allowed him to write poetry for a living, but he turned it down to stay in his insurance job.
How to grow a vocation
The book ends by saying that there is no right vocation that one can "find"; one has to "grow" into it. Once you start cultivating your talents, you see that those talents give you the freedom to do certain things, and by taking those actions, you grow into a vocation. This opinion is similar to the one offered by Cal Newport in his book So Good They Can't Ignore You. The story of Marie Curie illustrates the point that there is no magical calling that will become apparent to you one fine day. A real vocation is one you grow into, rather than one you find.
The book is very compact and can be read in a few hours. What I like about the book is the various stories that the author manages to put in, so that the main takeaways of the book are implicit and do not sound preachy.
In this post, I will attempt to briefly summarize the main points of the book.
An optimistic skeptic
The chapter starts off by saying that there are indeed people in the world who become extremely popular, make tons of money, and get all the press coverage by providing perfect explanations after the fact. The author gives one such example of a public figure who rose to fame explaining events post-fact: Tom Friedman. At the same time, there are many lesser-known people who have an extraordinary skill at forecasting. The purpose of the book is to explain how these ordinary people come up with reliable forecasts and beat the experts hands down.
Philip Tetlock, one of the authors of the book, is responsible for a landmark study spanning 20 years (1984-2004) that compared experts' predictions with random predictions. The conclusion was that the average expert had done little better than guessing on many of the political and economic questions. Even though the right comparison would have been a coin toss, the popular financial media used the "dart-throwing chimpanzee pitted against experts" analogy. In a sense, the analogy was more "sticky" than the mundane word "random." The point was well taken by all: the experts are not all that good at predicting outcomes. However, the author feels disappointed that his study has been used to dish out extreme opinions about experts and forecasting, such as "all experts are useless." Tetlock believes that it is possible to see into the future, at least in some situations and to some extent, and that any intelligent, open-minded, and hardworking person can cultivate the requisite skills. Hence one needs an "optimistic" mindset about predictions; it is foolhardy to hold the notion that all predictions are useless.
The word "skeptic" in the chapter's title reflects the mindset one must possess in this increasingly nonlinear world. The chapter mentions the example of a Tunisian man's suicide that led to a massive revolution in the Arab world. Could anyone have predicted such catastrophic ripple effects of a seemingly common event? It is easy to look backward and sketch a narrative arc, but difficult to actually peer into the future and forecast. To make effective predictions, the mindset should be that of an "optimistic skeptic".
So is reality clock-like or cloud-like? Is the future predictable or not? These are false dichotomies, the first of many we will encounter. We live in a world of clocks and clouds and a vast jumble of other metaphors. Unpredictability and predictability coexist uneasily in the intricately interlocking systems that make up our bodies, our societies, and the cosmos. How predictable something is depends on what we are trying to predict, how far into the future, and under what circumstances.
In fields where forecasts have been reliable and good, one sees that the people who make these forecasts follow a "forecast, measure, revise, repeat" procedure. It's a never-ending process of incremental improvement that explains why weather forecasts are good and slowly getting better. Why is this process non-existent in many stock market and macroeconomic predictions? The author says that it is a demand-side problem: the consumers of forecasting don't demand evidence of accuracy, and hence there is no measurement.
Most readers and the general public might be aware of the research by Tetlock that produced the dart-throwing chimpanzee article. However, this chapter talks about another research study that Tetlock and his research partner and wife started in 2011, the Good Judgment Project (GJP). The couple invited volunteers to sign up and answer well-designed questions about the future. In total, 20,000 people volunteered to predict event outcomes. The author collated the predictions from this entire crew and entered a tournament conducted by IARPA, an intelligence research agency. The game comprised predicting events spanning a month to a year into the future. It was held between 5 teams, one of which was GJP. Each team would effectively be its own research project, free to improvise whatever methods it thought would work, but required to submit forecasts at 9 a.m. eastern standard time every day from September 2011 to June 2015. By requiring teams to forecast the same questions at the same time, the tournament created a level playing field-and a rich trove of data about what works, how well, and when. Over four years, IARPA posed nearly five hundred questions about world affairs, producing one million individual judgments about the future. Year after year, the motley crowd of forecasters of the Good Judgment Project beat the experts hands down. The author says that there are two major takeaways from the performance of the GJP team:
- Foresight is real: They aren't gurus or oracles with the power to peer decades into the future, but they do have a real, measurable skill at judging how high-stakes events are likely to unfold three months, six months, a year, or a year and a half in advance.
- Forecasting is not some mysterious gift: It is the product of particular ways of thinking, of gathering information, of updating beliefs. These habits of thought can be learned and cultivated by any intelligent, thoughtful, determined person.
The final section of the first chapter contains the author's forecast on the entire field of forecasting. With machines doing most of the cognitive work, there is a fear that forecasting done by humans will be no match for supercomputers. However, the author feels that humans are underrated (a book-length treatment of this idea has been given by Geoff Colvin). In the times to come, the best forecasts will come from human-machine teams rather than from humans alone or machines alone.
The book is mainly about a specific set of people, approximately 2% of the volunteer forecasters, who did phenomenally well. The author calls them "superforecasters". Have they done well because of luck or skill? If it is skill, what can one learn from them? These are some of the teasers posed in the first chapter of the book.
Illusions of Knowledge
Given the number of books and articles that have been churned out about our cognitive biases, there is nothing really new in this chapter. The author reiterates the System 1 and System 2 thinking from Daniel Kahneman's book. He also talks about the perils of being overconfident about our own abilities, and about various medical practices that were prevalent before the advent of clinical trials: many practitioners advocated medicine based on their "tip-of-your-nose" perspective without vetting their intuitions.
The tip-of-your-nose perspective can work wonders but it can also go terribly awry, so if you have the time to think before making a big decision, do so-and be prepared to accept that what seems obviously true now may turn out to be false later.
The takeaway from this chapter is obvious from its title: one needs to weigh in both System 1 and System 2 thinking in most of our decisions. An example of Magnus Carlsen illustrates this kind of mixed thinking. In an interview, the grandmaster disclosed that his intuition tells him the possible moves almost immediately (within 10 seconds), and he spends most of the remaining time double-checking his intuition; only then does he make the next move in a chess tournament. It is an excellent practice to mix System 1 and System 2 thinking, but it requires conscious effort.
The chapter starts with the infamous statement of Steve Ballmer, who predicted that the iPhone was not going to get any significant market share. To evaluate Ballmer's forecast in a scientific manner, the author looks at the entire content of Ballmer's speech and finds so many vague terms in the statement that it is difficult to give a verdict on the forecast. Another example is the "open letter" sent to Bernanke by many economists asking him to stop QE to restore stability. QE did not stop, and the US has not seen any of the dire consequences the economists had predicted. So, is the forecast wrong? Again, the forecast made by the economists is not worded precisely enough in numerical terms for one to evaluate it. The basic message that the author tries to put across is, "judging forecasts is difficult".
The author's basic motivation to conduct a study on forecasting came while sitting on a panel of experts who were asked to predict the future of Russia. Many of the forecasts were a complete disaster. However, that did not make the experts humble. No matter what had happened, the experts would have been just as adept at downplaying their predictive failures and sketching an arc of history that made it appear they saw it coming all along. In such a scenario, how does one go about testing forecasts? Some forecasts have no timelines. Some are worded in vague terms. Some are not worded in numbers. Even where there are numbers, the event cannot be repeated, so how does one decide whether it is luck or skill? We cannot rerun history, so we cannot judge one probabilistic forecast-but everything changes when we have many probabilistic forecasts. Having many forecasts helps one pin down two essential features of any forecast analysis: calibration and resolution. Calibration tests whether forecast probabilities and actual outcome frequencies are in sync. Resolution tests whether the forecasts are decisive probabilistic estimates rather than hedged values hovering around 40%-60%. The author takes all of the above into consideration and starts a 20-year project (1984-2004) that goes like this:
- assemble experts in various fields
- ask a large number of questions with precise time frames and unambiguous language
- require that forecasts be expressed using numerical probability scales
- measure the calibration of the forecasters
- measure the resolution of the forecasters
- use the Brier score to evaluate the distance between the forecasts and the actuals
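The Brier score mentioned above is just the mean squared distance between probabilistic forecasts and binary outcomes. Here is a minimal sketch using the common 0-to-1 formulation (Tetlock's own variant sums over both outcome categories and therefore ranges from 0 to 2); the numbers are invented for illustration:

```python
def brier_score(forecasts, outcomes):
    """Mean squared error between forecast probabilities and binary outcomes.

    0.0 is perfect, 0.25 is what a constant 50% ("maybe") forecast earns,
    and 1.0 is total confidence in the wrong answer every time.
    """
    if len(forecasts) != len(outcomes):
        raise ValueError("forecasts and outcomes must have the same length")
    return sum((f - o) ** 2 for f, o in zip(forecasts, outcomes)) / len(forecasts)

# A decisive, well-calibrated forecaster scores close to 0 ...
print(round(brier_score([0.9, 0.1, 0.8], [1, 0, 1]), 4))  # 0.02
# ... while hedging everything at 50% earns 0.25.
print(round(brier_score([0.5, 0.5, 0.5], [1, 0, 1]), 4))  # 0.25
```

This is why calibration alone is not enough: a forecaster who says 50% about everything can be perfectly calibrated yet earns a mediocre Brier score, because the score also rewards resolution.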
The author patiently conducts the study for 20 years to see the results of all the forecasts. The following are the findings/insights from the project:
To make the contrast vivid, the author says big-idea thinkers are akin to "hedgehogs" and many-idea thinkers are akin to "foxes".
Foxes were better forecasters than hedgehogs.
Foxes don't fare well with the media. The media prefers authoritative statements to probabilistic ones.
Aggregating across a diverse set of opinions beats hedgehogs. That's why averaging several polls gives a better result than a single poll. This doesn't mean the "wisdom of any sort of crowd" works; it means the "wisdom of a certain type of crowd" works.
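The aggregation effect is easy to demonstrate with a toy sketch. For squared error, the pooled (averaged) forecast is mathematically guaranteed to do no worse than the average individual forecast (a consequence of Jensen's inequality). All the numbers below are invented for illustration and are not data from the book:

```python
# Five hypothetical forecasters estimate the probability of an event
# that actually occurs (outcome = 1). All numbers are made up.
forecasts = [0.55, 0.70, 0.40, 0.80, 0.65]
outcome = 1

# Mean of the individual squared errors.
mean_individual_error = sum((f - outcome) ** 2 for f in forecasts) / len(forecasts)

# Squared error of the single pooled (averaged) forecast.
pooled = sum(forecasts) / len(forecasts)   # 0.62
pooled_error = (pooled - outcome) ** 2

print(round(mean_individual_error, 4))  # 0.163
print(round(pooled_error, 4))           # 0.1444 -- the pool beats the average individual
```

The gap between the two numbers grows with the diversity of the individual forecasts, which is why a "certain type of crowd", one whose members bring genuinely different perspectives, aggregates so well.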
The best metaphor for developing multiple perspectives is the dragonfly eye. Dragonflies have two eyes, but theirs are constructed very differently. Each eye is an enormous, bulging sphere, the surface of which is covered with tiny lenses. Depending on the species, there may be as many as thirty thousand of these lenses on a single eye, each one occupying a physical space slightly different from those of the adjacent lenses, giving it a unique perspective. Information from these thousands of unique perspectives flows into the dragonfly's brain where it is synthesized into vision so superb that the dragonfly can see in almost every direction simultaneously, with the clarity and precision it needs to pick off flying insects at high speed. A fox with the bulging eyes of a dragonfly is an ugly mixed metaphor, but it captures a key reason why the foresight of foxes was superior to that of hedgehogs with their green-tinted glasses: foxes aggregate perspectives.
Simple statistical models such as AR(1) and EWMA performed better than both hedgehogs and foxes.
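For reference, an EWMA forecaster of the kind alluded to above fits in a few lines. The series and the smoothing weight are invented for illustration; the book does not specify the models' parameters:

```python
def ewma_forecast(series, alpha=0.3):
    """One-step-ahead forecast via an exponentially weighted moving average.

    alpha weights the most recent observation; 0.3 is an arbitrary
    illustrative choice, not a value taken from the study.
    """
    level = series[0]
    for x in series[1:]:
        # New level blends the latest observation with the running level.
        level = alpha * x + (1 - alpha) * level
    return level

# Made-up series; the function returns the forecast for the next period.
print(round(ewma_forecast([10, 12, 11, 13, 12]), 4))  # 11.5828
```

The point of the finding is humbling: even this mechanical, memoryless extrapolation beat the human experts on many questions.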
The chapter starts off recounting the massive forecasting failure of the National Security Agency, the Defense Intelligence Agency, and the thirteen other agencies that constitute the intelligence community of the US government. These agencies held a consensus view that Iraq had weapons of mass destruction, a view that led everyone to support Bush's policy of waging the Iraq war. After the invasion in 2003, no WMDs were found. How come agencies that employ close to twenty thousand intelligence analysts were so wrong? Robert Jervis, who has critically analyzed the performance of these agencies over several decades, says that the judgment was a reasonable one, but wrong. This statement does require some explanation, and the author provides the necessary details. The takeaway from the story is that the agencies made errors which, had they been avoided, would have scaled back the probability attached to the consensus view. Who knows, it might have changed the course of Iraq's history.
After this failure, IARPA (Intelligence Advanced Research Projects Activity) was created in 2006. Its mission was to fund cutting-edge research with the potential to make the intelligence community smarter and more effective. It approached the author with a specific type of game in mind. IARPA's plan was to create tournament-style incentives for top researchers to generate accurate probability estimates for Goldilocks-zone questions. The research teams would compete against one another and against an independent control group. Teams had to beat the combined forecast-the "wisdom of the crowd"-of the control group. In the first year, IARPA wanted teams to beat that standard by 20%-and it wanted that margin of victory to grow to 50% by the fourth year. But that was only part of IARPA's plan. Within each team, researchers could run experiments against internal control groups to assess what really works. Tetlock's team beat the control group hands down. Was it luck? Was it that the team had a slower reversion to the mean? Read the chapter to judge for yourself. Out of the several volunteers involved in GJP, the author finds that certain forecasters were extremely good. The next five chapters are all about the way these superforecasters seem to go about forecasting. The author argues that there are two things to note from GJP's superior performance:
- We should not treat the superstars of any given year as infallible. Luck plays a role and it is only to be expected that the superstars will occasionally have a bad year and produce ordinary results
- Superforecasters were not just lucky. Mostly, their results reflected skill.
The set of people whom the author calls superforecasters does not represent a random sample, so the team's outcome is not the same thing as collating predictions from a large set of random people. These people are different, is what the author says. But IQ and education are not the boxes by which they can be readily classified. The author reports that in general the volunteers had higher IQs than the general public, but there was no marked distinction between forecasters and superforecasters. So it seems intelligence and knowledge help, but they add little beyond a certain threshold-superforecasting does not require a Harvard PhD and the ability to speak five languages.
The author finds that superforecasters follow a certain way of thinking that seems to mark out the better forecasters:
- Good back-of-the-envelope calculations
- Starting with the outside view, which reduces anchoring bias
- Subsequent to the outside view, getting a grip on the inside view
- Looking out for various perspectives on the problem
- Thinking twice, thrice, and more-thinking deeply to root out confirmation bias
- It’s not the raw crunching power you have that matters most. It’s what you do with it.
Most of the above findings are not groundbreaking. But they emphasize that good forecasting skills do not belong to some specific kind of people; they can be learnt and consciously cultivated.
For superforecasters, beliefs are hypotheses to be tested, not treasures to be guarded. It would be facile to reduce superforecasting to a bumper-sticker slogan, but if I had to, that would be it.
Almost all the superforecasters were numerate, but that is not what makes their forecasts better. The author gives a few examples that illustrate the mindset most of us carry: the mindset of Yes, No, and Maybe, where Yes means near certainty, No means near impossibility, and Maybe means a 50% chance. This kind of probabilistic thinking with only three dials does not help us become good forecasters. Based on the GJP analysis, the author says that the superforecasters have a more fine-grained sense of probability than the rest of the forecasters. These fine-grained probability estimates are not the result of some complex math model, but of careful thought and nuanced judgment.
The chapter starts with the author giving a broad description of the way a superforecaster works:
Unpack the question into components. Distinguish as sharply as you can between the known and unknown and leave no assumptions unscrutinized. Adopt the outside view and put the problem into a comparative perspective that downplays its uniqueness and treats it as a special case of a wider class of phenomena. Then adopt the inside view that plays up the uniqueness of the problem. Also explore the similarities and differences between your views and those of others-and pay special attention to prediction markets and other methods of extracting wisdom from crowds. Synthesize all these different views into a single vision as acute as that of a dragonfly. Finally, express your judgment as precisely as you can, using a finely grained scale of probability.
One of the things the author notices about superforecasters is their tendency to revise their forecasts frequently: as facts change around them, they update. This raises the question, "Does the initial forecast matter?" What if one starts with a vague prior and keeps updating it based on the changing world? The GJP analysis shows that superforecasters' initial estimates were 50% more accurate than the regular forecasters'. Still, the real takeaway is that updating matters; frequent updating is demanding work, and it is a huge mistake to belittle belief updating. Both under- and overreaction to events can diminish accuracy, and both can, in extreme cases, destroy a perfectly good forecast. Superforecasters have little ego invested in their initial or subsequent judgments, which makes them update far quicker than other forecasters. They update frequently and in small increments, thus treading the middle path between overreacting and underreacting. The author mentions one superforecaster who uses Bayes' theorem to revise his estimates. Does that mean Bayes is the answer to getting forecasts accurate? No, says the author. He found that even though all the superforecasters were numerate enough to apply Bayes, nobody actually crunched the numbers that explicitly. The message is that all the superforecasters appreciate the Bayesian spirit, though none explicitly used the formula to update their forecasts. And small updates do not always work either. The key idea the author wants to put across is that there is no "magic" way to go about forecasting; instead there are many broad principles, with lots of caveats.
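The Bayesian spirit of belief updating can be sketched with Bayes' rule directly. The prior and the likelihoods below are invented for illustration and are not taken from the book:

```python
def bayes_update(prior, p_evidence_if_true, p_evidence_if_false):
    """Posterior probability of a hypothesis after observing one piece of evidence."""
    numerator = prior * p_evidence_if_true
    return numerator / (numerator + (1 - prior) * p_evidence_if_false)

# A forecaster starts at 40% and sees news that is twice as likely if the
# event is on track (60% vs 30%). The belief nudges up; it does not jump.
p = bayes_update(0.40, p_evidence_if_true=0.60, p_evidence_if_false=0.30)
print(round(p, 3))  # 0.571
```

Note how the posterior moves from 40% to about 57% rather than to near-certainty: this is the "small increments" behavior the author observes, falling naturally out of the arithmetic when a single piece of evidence is only moderately diagnostic.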
The author starts off by talking about Carol Dweck's "growth mindset" principle and says that this is one of the important traits of a superforecaster.
We learn new skills by doing. We improve those skills by doing more. These fundamental facts are true of even the most demanding skills. Modern fighter jets are enormously complex flying computers but classroom instruction isn’t enough to produce a qualified pilot. Not even time in advanced flight simulators will do. Pilots need hours in the air, the more the better. The same is true of surgeons, bankers, and business executives.
It goes without saying that practice is key to becoming good. However, it is actually "informed practice" that is the key. Unless there is clear and timely feedback about how you are doing, the quantity of practice can be an erroneous indicator of your progress. This idea has been repeated in many books over the past few years. A police officer's ability to spot a liar is generally poor because the feedback on his judgment takes a long time to reach him. On the other hand, people like meteorologists and seasoned bridge players learn from failure very quickly and improve their estimates. I think this is the mindset of a day trader in financial markets: he makes a trade, gets quick feedback about it, and learns from his mistakes. If you compare a typical mutual fund manager with a day trader, the cumulative feedback the day trader receives is far greater than what the MF manager receives. Read any indexing book and you will find arguments debating whether Mr. XYZ was a good fund manager or not-you can fill in any name for XYZ. Some say luck, some say skill; it is hard to tease out which is which when the data points are coarse-grained. However, if you come across a day trader who consistently makes money for a decent number of years, it is hard to attribute his performance to luck, for the simple reason that he has made far more trades cumulatively than an MF manager. The basic takeaway, at least for a forecaster, is that he/she must know when a forecast fails. This is easier said than done: forecasts may be worded in ambiguous language, and the feedback may have a time lag of years, by which time our flawed memories can no longer recall the original forecast. The author gives a nice analogy for forecasters who do not get timely feedback: he compares them to basketball players shooting free throws in the dark.
They are like basketball players doing free throws in the dark. The only feedback they get are sounds-the clang of the ball hitting metal, the thunk of the ball hitting the backboard, the swish of the ball brushing against the net. A veteran who has taken thousands of free throws with the lights on can learn to connect sounds to baskets or misses. But not the novice. A "swish!" may mean a nothing-but-net basket or a badly underthrown ball. A loud "clunk!" means the throw hit the rim but did the ball roll in or out? They can’t be sure. Of course they may convince themselves they know how they are doing, but they don’t really, and if they throw balls for weeks they may become more confident-I’ve practiced so much I must be excellent!-but they won’t get better at taking free throws. Only if the lights are turned on can they get clear feedback. Only then can they learn and get better.
Towards the end of this chapter, the author manages to give a rough composite portrait of a superforecaster:
- Philosophic outlook
  - Nothing is certain
  - Reality is infinitely complex
  - What happens is not meant to be and does not have to happen
- Abilities and thinking styles
  - Beliefs are hypotheses to be tested, not treasures to be protected
  - Intelligent and knowledgeable, with a need for cognition
  - Intellectually curious; enjoy puzzles and mental challenges
  - Introspective and self-critical
  - Comfortable with numbers
- Methods of forecasting
  - Not wedded to any idea or agenda
  - Capable of stepping back from the tip-of-your-nose perspective and considering other views
  - Value diverse views and synthesize them into their own
  - Judge using many grades of maybe
  - When facts change, they change their minds
  - Aware of the value of checking thinking for cognitive and emotional biases
  - Believe it's possible to get better
  - Determined to keep at it however long it takes
The author says that the single strongest predictor of rising to the ranks of the superforecasters is being in a state of "perpetual beta".
The author uses GJP as fertile ground to ask many interesting questions, such as:
- When does "wisdom of the crowd" thinking help?
- Given a set of individuals, does weighting team forecasts work better than weighting individual forecasts?
- Given two groups, forecasters and superforecasters, does acknowledging the superforecasters after their year-one performance improve or degrade their subsequent-year performance?
- How do forecasters perform against prediction markets?
- How do superforecasters perform against prediction markets?
- How do you counter "groupthink" among a team of superforecasters?
- Does face-to-face interaction among superforecasters help or worsen forecast performance?
- If aggregating different perspectives gives better performance, should the aggregation be based on ability or on diversity?
These and many other related questions are taken up in this chapter. I found this chapter very interesting as the arguments made by the author are based on data rather than some vague statements and opinions.
I found it difficult to keep my attention while reading this chapter. It tries to address some of the issues that typical management books talk about. I have read enough management BS books that my mind has become extremely resistant to any sort of general management gyan. Maybe there is some valuable content in this chapter; maybe certain types of readers will find it appealing.
Are they really Super?
The chapter critically looks at the team of superforecasters and analyzes the viewpoints of various people who don't believe the superforecasters have done something significant. The first skeptic is Daniel Kahneman, who is of the opinion that there is a scope bias in forecasting. Like a true scientist, the author puts his superforecasting team through a controlled experiment that gives some empirical evidence that superforecasters are less prone to scope bias. The second skeptic the author tries to answer is Nassim Taleb. It is not so much an answer to Taleb as an acknowledgement that superforecasters are different. Taleb is dismissive of many forecasters because he believes that history jumps, and that these jumps are black swans (highly improbable events with a lot of impact). The author defends his position by saying:
If forecasters make hundreds of forecasts that look out only a few months, we will soon have enough data to judge how well calibrated they are. But by definition, "highly improbable" events almost never happen. If we take "highly improbable" to mean a 1% or 0.1% or 0.0001% chance of an event, it may take decades or centuries or millennia to pile up enough data. And if these events have to be not only highly improbable but also impactful, the difficulty multiplies. So the first-generation IARPA tournament tells us nothing about how good superforecasters are at spotting gray or black swans. They may be as clueless as anyone else-or astonishingly adept. We don’t know, and shouldn’t fool ourselves that we do.
Now if you believe that only black swans matter in the long run, the Good Judgment Project should only interest short-term thinkers. But history is not just about black swans. Look at the inch-worm advance in life expectancy. Or consider that an average of 1% annual global economic growth in the nineteenth century and 2% in the twentieth turned the squalor of the eighteenth century and all the centuries that preceded it into the unprecedented wealth of the twenty-first. History does sometimes jump. But it also crawls, and slow, incremental change can be profoundly important.
So, there are people who trade or invest based on black swan thinking. Vinod Khosla invests in many startups so that one of them can be the next Google. Taleb himself played with out-of-the-money options until one day he cracked it big time. However, this is not the only philosophy one can adopt. A very different way is to beat competitors by forecasting more accurately-for example, correctly deciding that there is a 68% chance of something happening when others foresee only a 60% chance. This is the approach of the best poker players. It pays off more often, but the returns are more modest, and fortunes are amassed slowly. It is neither superior nor inferior to black swan investing. It is different.
What Next?
The chapter starts off by giving a few results of the opinion polls conducted before Scotland's referendum on independence from the UK. The numbers showed no clear sign of which way the referendum would go; in the end, the vote was NO. It was hard to predict the outcome. There was one expert/pundit, Daniel Drezner, who came out in the open and admitted that it is extremely easy to give an explanation after the fact, but making a before-the-fact forecast is a different ball game. Drezner also noted that he himself had stuck to NO for some time before switching to YES, making an error while revising his prior opinion. As a lesson learnt, he says that in the future he would give a confidence interval for the forecast rather than a binary forecast. The author wishes that many more experts/forecasters adopt this confidence-interval mindset; the shift from point estimates to interval estimates might do a world of good. What will this 500-page book do for the general reader and society? The author says that there could be two scenarios.
- Scenario 1: Forecasting is mainly used to advance a tribe's interests. In such situations, the accuracy of the forecast is brushed aside; whoever makes the forecast that suits the popular tribe gets advertised, and, sadly, actions are taken based on these possibly inaccurate forecasts. This book will be just another book on forecasting that is good to read, but nothing actionable will come out of it.
- Scenario 2: Evidence-based forecasting takes off. Many people will demand accuracy and calibration results from experts.
Being an optimistic skeptic, the author feels that evidence-based forecasting will be adopted in the times to come. Some quantification is always better than no quantification (which is what we see currently). The method or system used in the forecasting tournament is a work-in-progress, admits the author. However, that doesn't mean it is not going to improve our forecasting performance.
Towards the end of the book, the author does acknowledge the importance of the Tom Friedmans of the world, though not because of their forecasting ability: their vague forecasts are actually superquestions for the forecasters. Whenever pundits give their forecasts in an imprecise manner, that serves as fodder for the forecasters to get to work. The assumption the author makes is that superforecasters are not superquestioners. Superquestioners are typically hedgehogs who have one big idea, think deeply, and see the world through that one big idea. Superforecasters, i.e. foxes, are not that good at churning out big questions, is what the author opines. In conclusion, he says an ideal person would be a combination of superforecaster and superquestioner.
This book is not a "ONE BIG IDEA" book; clearly the author is on the side of foxes and not hedgehogs. The book mainly analyzes the performance of a specific set of people from a forecasting team that participated in the IARPA-sponsored tournament. It looks at these superforecasters and spells out a number of small but powerful ideas and principles that can be cultivated by anyone who aspires to become a better forecaster.