The book titled, "The Golden ticket", gives a non-technical introduction to the most important unsolved problem in the computer science,"Whether P = NP?". If you are curious about knowing this problem and do not want to slog through the dense math behind it, then this book might be a perfect introduction to the problem. The author makes sure that the concepts are introduced using plain English and nothing more. By the time you are done with the book, I bet you will start appreciating a few if not all the NP-complete problems that are yet to be cracked. In this post, I will try to summarize briefly the main points of the book.

Preface

The author starts off by stating the P vs. NP challenge as one of seven recognized Millennium prize problems by the Clay Mathematical Institute. P refers to the class of problems that can be solved efficiently using computers. Efficient here means using a polynomial time algorithm. NP refers to the class of problems to which we would like to find the best solution. Another way to state NP class of problems is the set of problems for which a we can efficiently check its solution. So, the million dollar question is, whether the set of all problems that can be solved efficiently and the set of problems that for which we can efficiently check its solution are one and the same. For someone who is coming across this for the first time, it might be a bit puzzling. Why is "having a quick algo for verifying a solution?" for a problem important ? What’s the connection between an algo used for verifying a solution to a problem and an algo used for solving the problem? This question is answered elaborately in the chapters to follow.

The Golden Ticket

The author clarifies the intent behind using the phrase, "The Golden Ticket", in the title of the chapter. It acts as a cue to the story of a candy manufacturer who places a handful of golden tickets inside chocolate bars, hidden among tens of millions of bars produced each year. The finders of the golden tickets will get a free factory tour. If you want to desperately find the golden ticket, How do you find one ? No matter how you find the ticket, you will need a considerable amount of time, money and luck. This is a nice example to have in mind as it reminds the user that there are problems that we encounter in our lives for which the solutions take considerable amount of time, no matter how much computing power you have.

The chapter introduces the traveling salesman problem, a problem that involves selecting the best route to minimize the total distance travelled. The problem looks deceptively simple. In fact for 48 city tour, if you intend to check all the tour combinations on a laptop, it will take 10 trillion trillion times the current age of the universe. There are many problems such as TSP that fall under NP category, i.e the class of problems for which we want to find the solution. P refers to the class of problems for which one can find an efficient solution. P=NP means we can quickly find a solution to all the NP problems. P ≠ NP means we can’t. Do we come across NP problems in real world or Are they just a figment of imagination for a pure mathematician ? Well, NP-complete problems are everywhere. For example, when you plug in your start and end points in a GPS system, the possible combinations of routes are too many for one to do a exhaustive search. Instead the GPS finds the best route heuristically.

The chapter ends with a list of problems that the author promises to answer via this book, i.e.

1. What are P and NP?
2. What kinds of problems do they capture?
3. What are NP-complete problems, the hardest of all search problems?
4. How do these problems capture the P versus NP question?
5. How did we get to P and NP?
6. How do we approach a proof that P ≠ NP ?
7. Kurt Gödel showed that mathematics cannot solve every problem. Can similar techniques show that there are search problems we can’t solve quickly?
8. What good can come out of P ≠ NP?
9. Can future computers based on quantum mechanics make the P vs.NP problem irrelevant?
10. What about the future? The great challenges of computing still lie ahead of us. a) How do we deal with computers that have to work together to solve problems? b) How do we analyze the massive amount of data we generate every day? c) What will the world look like when we network everything?

The Beautiful World

The author gives a futuristic tour of various domains, if P=NP were to be proved conclusively. There are examples on the brighter side as well as the darker side. Cancers can be cured, Sports as we know it will get completely revolutionized, Politics and the whole activity of campaigning will turn in to a game of algos, Police would be crack cases in days what used to take months and years. However the dark side of it is that machines will become so damn powerful that there will be privacy issues, public key-private key encryption will be falter and ecommerce will fall back to a more primitive encryption, more people will lose jobs as machines become efficient in doing cognitive jobs too. These scenarios help a reader to visualize the future if ever P=NP were to be proved.

P and NP

The author uses an example of a fictional land,"Frenemy" to illustrate various types of problems that fall under class P. What does Frenemy look like?

Frenemy has about 20,000 inhabitants. Every individual seems normal, but put two in close vicinity and a strange thing happens. Either the two take an instant liking to each other, immediately becoming the best of friends, or they take one look at each other and immediately become the worst of enemies. Despite the Frenemy name, two inhabitants are never observed taking a middle ground; they are always either close friends or distant enemies.

If one were to check whether six degrees of freedom holds good for any two individuals in Frenemy by brute force, i.e. counting all the possible links of six links, this problem will take more than 100 years to compute. There are simple algos that make such tasks computationally tractable. To find the lengths of separations between all pairs of inhabitants of Frenemy, one can use Floyd-Warshall algorithm, which will compute all these distances in about eight trillion computation steps. Given the modern day computing power of about billion operations per second, this can be done pretty quickly. The author uses various types of examples using Frenemy citizens to illustrate various problems and algos.

1. Claude Berge Algorithm
2. Jack Edmonds Algorithm – In a paper, he was the first to use the term,"algebraic time" to capture the intuitive idea of efficiency of an algorithm. Later the term was replaced by "polynomial time". The class of problems that can be solved in "polynomial time" is referred to as P.
3. No mathematically rigorous proof that shows a map can be colored using four colors.
4. Algorithm for finding Eulerian path
5. Min-cut Problem in graph theory
6. No polynomial time algo for finding Cliques
7. No polynomial time for finding Hamiltonian Cycles
8. No polynomial time for Max-cut problem
9. No polynomial time for coloring maps

The last four of the above problems, as they stand today, have no polynomial time algo to find the solution. However given a potential solution, you can easily check whether it is indeed a solution or not. The class of problems such as these is called NP.

There are a ton of problems for which we do not know how to find the solutions quickly. The author goes on to give a list of NP problems in various fields

1. Genomics – Current DNA sequencing technologies can only sequence about 1000 base pairs at a time and shortest chromosome has 47 million base pairs. How best to put these pairs together and get a full sequence is an NP problem
2. How does a protein fold based on mRNA sequence is also a NP hard problem
3. Optimization problems can be NP hard problems. Finding the minimum energy of a physical system can be an NP hard problem
4. There are many NP problems in game theory

The Hardest Problems in NP

A logic expression is satisfiable if there is someway to find a solution that satisfies the logic expression. If there is a quick algorithm for solving the satisfiable problem, then by "reduction", you can use the same algorithm for solving the original problem. The author shows the clique problem can be reduced to satisfiability problem. It was Steve Cook, an University of Toronto professor who showed that "Satisfiability" is a good candidate for an interesting problem not efficiently computable. Even though Cook was the first to notice the importance of the problem, it was Richard Karp, a Berkley professor who converted the satisfiability problem to clique problem. Cook and Karp’s work meant that satisfiability for a clique is easy to compute if and only if clique was easy to compute. Thanks to the connection between satisfiability and solvability, there were now many different problems that are equivalent in terms of computational hardness. Solve any of the problem efficiently, and you have the solution to all the problems, i.e P=NP. These problems did not emanate from some pure mathematician’s head but arose from the practical world. It was Donald Knuth who ran a poll to figure out a name for the set of problems in NP that are hardest and the poll suggested the name, "NP complete".

The author gives the following examples from the real world that are NP-complete:

1. Scheduling operations in a Coke plant
2. How to go for a Disney tour so that you spend minimum amount of time ?
3. Figuring out the dominating set in a graph
4. Large Sudoko games
5. Minesweeper game
6. Tetris
7. Graph Isomorphism
8. Factoring a prime number
9. Integer Linear Programming problems

The Prehistory of P versus NP

Another way to verbalize P=NP is : Do you need a brute force search to solve a problem or do you have a smart algorithm for solving the problem ? This chapter gives the historical background to the P versus NP problem. There are two strands of developments mentioned, one in the Russian camp and other the Western nations camp. The following personalities figure on the Western nations camp :

• Warren McCulloch
• Walter Pitts
• Juris Hartmanis and Richard Stearns
• Jack Edmonds
• Alan Cobham
• Steve Cook
• Richard Karp

And on the Russian side,the following spearheaded the effort:

• Sergey Yablonsky, who first focused on perebor for finding the smallest circuits for computing functions but whose power and arrogance worked against the development of computational complexity in Russia.
• Andrey Kolmogorov, the great Russian mathematician, who measured complexity via algorithmic information.
• Kolmogorov’s student Leonid Levin, who independently developed the P versus NP problem and the theory of NP-completeness.

Dealing with Hardness

NP-complete problems are nasty creatures. Should one give up whenever one stumbles on to a variant of NP-complete problem. Not necessarily. This chapter gives a list of alternative approaches that can be used to solve moderate sized problems.

• Brute force : Thanks to Moore’s law, the computing power of today’s computers help one use a brute force approach to find the solution. For example TSP on 1000 cities can be solved using today’s technology
• Heuristics
• Search for a smaller version of the problem
• Approximate solutions

There are many other problems though, that do not yield to any of the above hacks and they are many out there that are NP-hard.

Proving P ≠ NP

This chapter provides a list of approaches that various people have taken to prove that P is not equal to NP

• Using Paradoxes : The author lists "I am liar" paradox to show that in mathematics, there is a certain appeal to using paradoxes. Godel upended the foundations of mathematics by showing that there are true statements that one cannot prove as true. Halting problem is another type of problem that can be understood via paradox. There simply can never be any program that can tell whether another program finishes. However the author shows that using this approach to prove P is not equal to NP isn’t going to work.
• Circuits : Every IC in a computer has transistors and each of these transistors implement logic gates. Every function, no matter how complicated, can be computed by a circuit of AND, OR, and NOT gates. What’s the connection with circuits ? Any problem that has computationally efficient solution can have a reasonably small. There have been many efforts by various people to use circuits to prove or disprove N=NP, but none have stood the test of rigor.

Where are we in the efforts to proving P vs. NP?

The only known serious approach to the P versus NP problem today is due to Ketan Mulmuley from the University of Chicago. He has shown how solving some difficult problems in a mathematical field called algebraic geometry (considerably more complex than high school algebra and geometry) may lead to a proof that P ≠ NP. But resolving these algebraic geometry problems will require mathematical techniques far beyond what we have available today. A few years ago Mulmuley felt this program would take 100 years to play out. Now he says that is a highly optimistic point of view. How long until we see an actual resolution of the P versus NP problem? Maybe the problem will have been resolved by the time you read this. Far more likely the problem will remain unsolved for a very long time, perhaps longer than the 357 years needed for Fermat’s Last Theorem, or perhaps it will forever remain one of the true great mysteries of mathematics and science.

Secrets

This chapter gives a brief introduction to cryptography and in doing so, shows how much of the modern cryptography relies on P ≠ NP. It starts off by giving a 10000 ft. view of public key and private key cryptography. One cannot help but notice that the fact that since factoring is assumed to be NP hard, today we have a convenient secured communication. Every time you see a "https" in your browser URL, it is the public key-private key in action. In the future, if someone were to prove P= NP, the modern cryptography would fall apart. So, does it signify "End of the World" ? Not really. There are techniques like one-time pads that can be used to create secure communication channels, the downside though is that the method of providing these channels become painful.

The author gives an example of Sudoko to illustrate the concept of "zero knowledge proof" Sudoku is NP-complete, so every NP problem can be reduced to Sudoku. So we automatically get a zero-knowledge system for every NP problem. In a common attack in cryptography, a person can cheat by pretending to be someone he or she isn’t. Zero-knowledge proofs give a method to guard against such identity spoofing. There is also a brief mention about the fully homomorphic encryption and Pseudorandom number generators for creating secure communication.

Quantum

This chapter gives a bird’s eye view on Quantum computing and its relevance to the P vs. NP problem. Unlike the traditional bit that is used in the computers, quantum computing uses something called "qubit". The chapter gives a range of potential applications that can make many NP problems tractable. However the author is cautious about the time it might take for us to see real applications of quantum computing.

Full-scale quantum computers will remain special-purpose machines, useful for factoring  numbers or simulating quantum systems. They may help us break codes and help us better understand the fundamental nature of the universe, but they likely won’t solve NP-complete problems or make spreadsheets run faster.

The Future

By the time a reader hits the last chapter, it is clear that the author has a bleak outlook for the P vs. NP problem. However he says that the problem gives a common framework to solve great computational tasks that lay ahead of us.

• Parallel computation: Computers used to double in speed every eighteen to twenty-four months, but now we are hitting physical limits that prevent much faster processors in the future. Instead, computers are multiplying, and we can have several processors working together on a chip or in a cloud. How can we adjust our algorithms to work in this growing parallel world? The class of the problems that can be solved by parallel computing is called NC. So, the mystery deepens. Is P=NC ? Is NC = NP?
• Big data: From the Internet to scientific experiments and simulations, we create huge amounts of data every day. How do we reason about, make sense of, learn from, and develop predictions from this massive amount of information?
• Networking of everything: Much of the world lives on some computer network, whether they use some social network like Facebook or just communicate via email. Soon nearly everything man-made will also be part of this network, from the clothes we wear to the light bulbs we read by. How do we make the best use of this ultraconnected world?

The author concludes the book by saying,

Computation is about process, and not just on computers. The P versus NP problem has to do with the limits on nature itself, on how biological and physical systems evolve, and even on the power of our own minds. As long as P versus NP remains a mystery we do not know what we cannot do, and that’s liberating.

Takeaway:

The book is more than just describing a particular unsolved problem in computer science. It takes a whirlwind tour of many problems across several domains and gives the reader a beautiful perspective about what could happen if ever the P vs. NP problem were to solved. Even if the problem is not solved for another century, the framework that can be obtained by thinking through the problem would make one’s mind richer.

If one rips apart a computer and looks at its innards, one sees a coalescence of beautiful ideas. The modern computer is built based on electronics but the ideas behind its design has nothing to do with electronics. A basic design can be built from valves/water pipes etc. The principles are the essence of what makes computers compute.

This book introduces some of the main ideas of computer science such as Boolean logic, finite-state machines, programming languages, compilers and interpreters, Turing universality, information theory, algorithms and algorithmic complexity, heuristics, uncomutable functions, parallel computing, quantum computing, neural networks, machine learning, and self-organizing systems.

Who might be the target audience of the book ? I guess the book might appeal to someone who has a passing familiarity of main ideas of computer science and wants to know a bit more, without being overwhelmed. If you ever wanted to explain the key ideas of computing to a numerate high school kid in such a way that it is illuminating and at the same time makes him/her curious, and you are struggling to explain the ideas in simple words, then you might enjoy this book.

Nuts and Bolts

Recollect any algebraic equation you would have come across in your high school. Now replace the variables with logic statements that are either true or false, replace the arithmetic operations with relevant Boolean operators, you get a Boolean algebraic equation. This sort of algebra was invented by George Boole. It was Claude Shannon who showed that one could build electronic circuits that could mirror any Boolean expression. The implication of this construction is that any function capable of being described as a precise logical statement can be implemented by an analogous system of switches.

Two principles of any type of computer involves 1) reducing a task to a set of logical functions and 2) implementing logical functions as a circuit of connected switches. In order to illustrate these two principles, the author narrates his experience of building tic-tac-toe game with a set of switches and bulbs. If you were to manually enumerate all the possible combinations of a 2 player game, one of the best representations of all the moves is via a decision tree. This decision tree will help one choose the right response for whatever be your opponent’s move. Traversing through the decision tree means evaluating a set of Boolean expressions. Using switches in series or parallel, it is possible to create a automated response to a player’s moves. The author describes briefly the circuit he built using 150 odd switches. So, all one needs to construct an automated tic-tac-toe game is switches, wires, bulbs and a way to construct logical gates. There are two crucial elements missing from the design. First is that the circuit has no concept of events happening over time; therefore the entire sequence of game must be determined in advance. The second aspect that is missing is that the circuit can perform only one function. There is no software in it.

Early computers were made with mechanical components. One can represent AND, OR, NOR, etc. using a set of mechanical contraptions that can then be used to perform calculations. Even in 1960’s most of the arithmetic calculators were mechanical.

Building a computer out of any technology requires two components, i.e.

• switches : this is the steering element which can combine multiple signals in to one signal
• connectors : carries signal between switches

The author uses hydraulic valves, tinker toys, mechanical contraptions, electrical circuits to show the various ways in which the Boolean logic can be implemented. The basic takeaway from this chapter is the principle of functional abstraction. This process of functional abstraction is a fundamental in computer design-not the only way to design complicated systems but the most common way. Computers are built up on a hierarchy of such functional abstractions, each one embodied in a building block. Once someone implements a 0/1 logic, you built stuff over it. The blocks that perform functions are hooked together to implement more complex functions, and these collections of blocks in turn become the new building blocks for the next level.

Universal Building blocks

The author explains the basic mechanism behind translating any logical function in to a circuit. One needs to break down the logical function in to parts, figure out the type of gates that are required to translate various types of input to the appropriate output of a logical function. To make this somewhat abstract principle more concrete, the author comes up with a circuit design for "Majority wins" logical block. He also explains how one could design a circuit for “Rocks-Paper-Scissors” game. The learnings from these simple circuits are then extrapolated to the general computer. Consider an operation like addition or multiplication; you can always represent it as a Boolean logic block. Most computers have logical blocks called arithmetic units that perform this job.

Hence the core idea behind making a computer perform subtraction, addition, multiplication, or whatever computation is to write out the Boolean logic on a piece of paper and then implement the same using logical gates.

The most important class of functions are time varying functions, i.e. the output depends on the previous history of inputs. These are handled via a finite-state machine. The author describes the basic idea of finite-state machine via examples such as as ball point pen, combination lock, tally counter, odometer etc. To store this finite-state machine, one needs a device called register. An n-bit register has n inputs and n outputs, plus an additional timing input that tells the register when to change state. Storing new information is called "writing" the state of the register. When the timing signal tells the register to write a new state, the register changes its state to match the inputs. The outputs of the register always indicate its current state. Registers can be implemented in many ways, one of which is to use a Boolean logic block to steer the state information around in a circle. This type of register is often used in electronic computers, which is why they lose track of what they’re doing if their power is interrupted.

A finite-state machine consists of a Boolean logic block connected to a register. The finite-state machine advances its state by writing the output of the Boolean logic block into the register; the logic block then computes the next state, based on the input and the current state. This next state is then written into the register on the next cycle. The process repeats in every cycle. The machine on which I am writing this blog post is 1.7GHz machine, i.e. the machine can change its state at a rate of 1.7billion times per second. To explain the details of a finite state machine and its circuit implementation, the author uses familiar examples like "traffic lights", "combination lock". These examples are superbly explained and that’s the beauty of this book. Simple examples are chosen to illustrate profound ideas.

Finite state machines are powerful but limited. They cannot be used for stating many patterns that are common in our world. For instance, it is impossible to build a finite-state machine that will unlock a lock whenever you enter any palindrome. So, we need something else too besides logic gates and finite state machines.

Programming

Boolean logic and finite-state machine are the building blocks of computer hardware. Programming language is the building block of computer software. There are many programming languages and if you learn one or two, you can quickly pick up the others. Having said, writing code to perform something is one thing and writing effective code is completely a different thing. The latter takes many hours of deliberate effort. This is aptly put by the author,

Every computer language has its Shakespeares, and it is a joy to read their code. A well-written computer program possesses style, finesse, even humor-and a clarity that rivals the best prose.

There are many programming languages and each has its own specific syntax.The syntax in these languages are convenient to write as compared to machine level language instructions. Once you have a written a program, how does the machine know what to do ? There are three main steps :

1. a finite-state machine can be extended, by adding a storage device called a memory, which will allow the machine to store the definitions of what it’s asked to do
2. extended machine can follow instructions written in machine language, a simple language that specifies the machine’s operation
3. machine language can instruct the machine to interpret the programming language

A computer is just a special type of finite-state machine connected to a memory. The computer’s memory-in effect, an array of cubbyholes for storing data-is built of registers, like the registers that hold the states of finite-state machines. Each register holds a pattern of bits called a word, which can be read (or written) by the finite-state machine. The number of bits in a word varies from computer to computer. Each register in the memory has a different address, so registers are referred to as locations in memory. The memory contains Boolean logic blocks, which decode the address and select the location for reading or writing. If data is to be written at this memory location, these logic blocks store the new data into the addressed register. If the register is to be read, the logic blocks steer the data from the addressed register to the memory’s output, which is connected to the input of the finite-state machine.Memory can contain data, processing instructions, control instructions. These instructions are stored in machine language. Here is the basic hierarchy of functional dependence over various components:

1. Whatever we need the computer to perform, we write it in a programming language
2. This is converted in to machine language by a compiler, via a predetermined set of subroutines called operating system
3. The instructions are stored in a memory. These are categorized in to control and processing instructions.
4. Finite state machines fetch the instructions and execute the instructions
5. The instructions as well as data are represented by bits, are stored in the memory
6. Finite state machines and memory are built based on storage registers and Boolean blocks
7. Boolean blocks are implemented via switches in series or parallel
8. Switches control something physical that sends 0 or 1

If you look at these steps, each idea is built up on a a level of abstraction. In one sense, that’s why any one can write simple software programs without understanding a lot of details about the computer architecture. However the more one goes gets closer to the machine language, the more he needs to understand the details of various abstractions that have been implemented.

How universal are Turing machines ?

The author discusses the idea of universal computer, that was first described in 1937 by Alan Turing. What’s a Turing machine ?

Imagine a mathematician performing calculations on a scroll of paper. Imagine further that the scroll is infinitely long, so that we don’t need to worry about running out of places to write things down. The mathematician will be able to solve any solvable computational problem no matter how many operations are involved, although it may take him an inordinate amount of time.

Turing showed that any calculation that can be performed by a smart mathematician can also be performed by a stupid but meticulous clerk who follows a simple set of rules for reading and writing the information on the scroll. In fact, he showed that the human clerk can be replaced by a finite-state machine. The finite-state machine looks at only one symbol on the scroll at a time,
so the scroll is best thought of as a narrow paper tape, with a single symbol on each line. Today, we call the combination of a finite-state machine with an infinitely long tape a Turing machine

The author also gives a few example of noncomputable problems such as "Halting problem". He also touches briefly upon quantum computing and explains the core idea using water molecule

When two hydrogen atoms bind to an oxygen atom to form a water molecule, these atoms somehow "compute" that the angle between the two bonds should be 107 degrees. It is possible to approximately calculate this angle from quantum mechanical principles using a digital computer, but it takes a long time, and the more accurate the calculation the longer it takes. Yet every molecule in a glass of water is able to perform this calculation almost instantly. How can a single molecule be so much faster than a digital computer?

The reason it takes the computer so long to calculate this quantum mechanical problem is that the computer would have to take into account an infinite number of possible configurations of the water molecule to produce an exact answer. The calculation must allow for the fact that the atoms comprising the molecule can be in all configurations at once. This is why the computer can only approximate the answer in a finite amount of time.

One way of explaining how the water molecule can make the same calculation is to imagine it trying out every possible configuration simultaneously-in other words, using parallel processing. Could we harness this simultaneous computing capability of quantum mechanical objects to produce a more powerful computer? Nobody knows for sure.

Algorithms and Heuristics

The author uses simple examples to discuss various aspects of any algorithm like designing an algorithm, computing the running time of an algorithm etc. For many of the problems where precise algorithms are not available, the next best option is to use heuristics. Designing heuristics is akin to art. Most of the real life problems requires one to come up with a healthy mix of algorithmic solutions and heuristics based solutions. IBM Deep Blue is an amazing example that shows the intermixing of algorithms and heuristics can beat one of the best human minds. Indeed there will be a few cognitive tasks that will be out of computer’s reach and humans have to focus on skills that are inherently human. A book length treatment is given to this topic by Geoff Colvin in his book (Humans are underrated)

Memory : Information and Secret codes

Computers do not have infinite memory. There needs a way to measure the information stored in the memory. An n bit memory can store n bits of data. But we need to know how many bits are required to store a certain form of input. One can think of various ways for doing so. One could use the "representative" definition, i.e think about how each character in the input is represented on a computer and then assess the total number of bits required to represent the input in a computer. Let’s say this blog post has 25000 characters and each character takes about 8 bits on my computer, then the blog post takes up 0.2 million bits. 8 bits for each character could be a stretch. May be all the characters in this post might require only 6 bits per character. Hence there would 0.15 million bits required to represent this post. The problem with this kind of quantifying information is that it is "representation" dependent. Ideal measure would be the minimum number of bits needed to represent this information. Hence the key question here is, "How much can you compress a given text without losing information?".

Let’s say one wants to store the text of "War and Peace", the information size obtained by multiplying 8 bits times number of characters in the novel would give a upper bound. By considering various forms of compression such as 6 bit encoding, taking advantage in regularities of data, take advantage of grammar associated with the language, one can reduce the information size of the novel. In the end, the compression that uses the best available statistical methods would probably reach an average representation size of fewer than 2 bits per character-about 25 percent of the standard 8-bit character representation.

If the minimum number of bits required to represent the image is taken as a measure of the amount of information in the image, then an image that is easy to compress will have less information. A picture of a face, for example, will have less information than a picture of a pile of pebbles on the beach, because the adjacent pixels in the facial image are more likely to be similar. The pebbles require more information to be communicated and stored, even though a human observer might find the picture of the face much more informative. By this measure, the picture containing the most information would be a picture of completely random pixels, like the static on a damaged television set. If the dots in the image have no correlation to their neighbors, there is no regularity to compress. So, pictures that are totally random require lot many bits and hence contain lot of information. This go against our notion of information. In common parlance, a picture with random dots should have less information that a picture with some specific dot pattern. Hence it is important for computers to store meaningful information. Indeed that’s how many image and sound compression algos work. They discard meaningless information. Another level of generalization of this idea is to consider a program that can generate the data that is being stored. This leads us to another measure of information:

The amount of information in a pattern of bits is equal to the length of the smallest computer program capable of generating those bits.

This definition of information holds whether the pattern of bits ultimately represents a picture, a sound, a text, a number, or anything else.

The second part of this chapter talks about public-key private key encryption and error correction mechanisms. The author strips off all the math and explains it in plain simple English.

Speed: Parallel Computers

"Parallel computing" is a word that is tossed around at many places. What is the basic problem with a normal computer that parallel computing aims to solve? If you look at the basic design of a computer, it has always been that processing and memory were considered two separate components, and all the effort was directed towards increasing the processing speed. If you compare the modern silicon chip based computer to let’s say the vintage room filled computer, the basic two-part design has remained the same, i.e. processor connected to memory design. This has come to be known as sequential computer and has given the need for parallel computing. To work any faster, today’s computers need to do more than one operation at once. We can accomplish this by breaking up the computer memory into lots of little memories and giving each its own processor. Such a machine is called a parallel computer. Parallel computers are practical because of the low cost and small size of microprocessors. We can build a parallel computer by hooking together dozens, hundreds, or even thousands of these smaller processors. The fastest computers in the world are massively parallel computers, which use thousands or even tens of thousands of processors.

In the early days of parallel computing, many people were skeptical about it for a couple of reasons, the main being Amdahl’s law. It stated that no matter how many parallel processor you use, there will be at least 10% of the task that needs to be done sequentially and hence the task completion time time does not do down as rapidly as one uses more and more processors. Soon it was realized that most of the tasks had negligible amount of sequential component. By smart design, one could parallelize many tasks.

Highly parallel computers are now fairly common. They are used mostly in very large numerical calculations (like the weather simulation) or in large database calculations, such as extracting marketing data from credit card transactions. Since parallel computers are built of the same parts as personal computers, they are likely to become less expensive and more common with time. One of the most interesting parallel computers today is the one that is emerging almost by accident from the networking of sequential machines. The worldwide network of computers called the Internet is still used primarily as a communications system for people. The computers act mostly as a medium-storing and delivering information (like electronic mail) that is meaningful only to humans.Already standards are beginning to emerge that allow these computers to exchange programs as well as data. The computers on the Internet, working together, have a potential computational capability that far surpasses any individual computer that has ever been constructed.

The author gives some basic ideas on which adaptive learning has been explored via algorithms. Using the example of neutral networks, perceptron, etc. the author manages to explain the key idea of machine learning.

Beyond Engineering

Brain cannot be analyzed via the usual "divide and conquer" mechanism that is used to understand "sequential computer". As long as the function of each part is carefully specified and implemented, and as long as the interactions between the parts are controlled and predictable, this system of "divide and conquer" works very well, but an evolved object like the brain does not necessarily have this kind of hierarchical structure. The brain is much more complicated than a computer, yet it is much less prone to catastrophic failure. The contrast in reliability between the brain and the computer illustrates the difference between the products of evolution and those of engineering. A single error in a computer’s program can cause it to crash, but the brain is usually able to tolerate bad ideas and incorrect information and even malfunctioning components. Individual neurons in the brain are constantly dying, and are never replaced; unless the damage is severe, the brain manages to adapt and compensate for these failures. Humans rarely crash.

So, how does on go about designing something that is different from "engineering"? The author illustrates this via "sorting numbers" example in which one can go about as follows

• Generate a "population" of random programs
• Test the population to find which programs are the most successful.
• Assign a fitness score to each program based on how successfully they sort the numbers
• Create new populations descended from the high-scoring programs. One can think of many ways here; only the fittest survive / "breed" new programs by pairing survivors in the previous generation
• When the new generation of programs is produced, it is again subjected to the same testing and selection procedure, so that once again the fittest programs survive and reproduce. A parallel computer will produce a new generation every few seconds, so the selection and variation processes can feasibly be repeated many thousands of times. With each generation, the average fitness of the population tends to increase-that is, the programs get better and better at sorting. After a few thousand generations, the programs will sort perfectly.

The author confesses that the output from the above experiments works, but it is difficult to "understand" why it works.

One of the interesting things about the sorting programs that evolved in my experiment is that I do not understand how they work. I have carefully examined their instruction sequences, but I do not understand them: I have no simpler explanation of how the programs work than the instruction sequences themselves. It may be that the programs are not understandable-that there is no way to break the operation of the program into a hierarchy of understandable parts. If
this is true-if evolution can produce something as simple as a sorting program which is fundamentally incomprehensible-it does not bode well for our prospects of ever understanding the human brain.

The chapter ends with a discussion about building a thinking machine.

Takeaway :

The book explains the main ideas of "computer science" in simple words. Subsequently, it discusses many interesting areas that are currently being explored by researchers and practitioners. Anyone who is curious about things like "How do computers work ?", "What are the limitations of the today’s computer hardware and software ?", etc. and wants to get some idea about them, without getting overwhelmed by the answers, will find this book interesting.

The author tries to answer two questions via this book :

1. What are the core elements of a fulfilling work ?
2. How should we make a change to our current path so that our work is in line with our being.

Give meaning to work

The author talks about five aspects relevant to work :

1. earning money
2. achieving status
3. making a difference
4. following our passions
5. using our talents

The author categorically states the earning money and achieving status might not give meaning to our work. Most often than not, one gets on to a "hedonistic" treadmill that is difficult to get off. Instead pursuing "making a difference" path is a better option. It might lead an individual to try out a variety of jobs, learn a variety of skills, and become a portfolio worker. It is one of the ways to explore "many selves" that lie within us. The chapter also explores the various challenges that one might have to face in following any of the above paths. Personally I find "using your talents" path to be quite appealing. I have taken a path of "developing talents and then doing your bit in making your talents useful to others". If you forget the monetary aspect and status aspect of any work and immerse yourself in developing talents, I guess one might find that learning a specific skill and putting it to use, could be a wonderful experience. There are obvious challenges that one must encounter on day to day basis. However I guess the "love" element towards your work will give all the strength to face the challenges. The author’s message from this chapter is that there are three core elements of a fulfilling work, i.e. meaning, flow and freedom. Each of these elements are concisely discussed via examples and anecdotes. Light reading but there is no fluff here.

If you are going to get good at something you need a tunnel vision.

– Wayne Davies

Act First, Reflect Later

One often hears stories of somebody abandoning a certain career and heading out to do something radical. These stories are inspiring but they need a lot of courage. Not many would be in a state of mind to take such extreme step. Instead the author suggests three alternate ways

• branching projects : start doing things in the context of your project on weekends or mini-breaks
• conversational research : talk to people who might be working in the field that you want to work, get to know some details of the work etc.

The above three suggestions involve doing first and then reflecting later. While working on alternatives, one must seek out "flow" experiences. These tend to be activities in which we lose the sense of time. However one must always remember that there might be some activities where the struggle itself could be something that we find rewarding. Imagine understanding a certain branch of math. To get to "flow" state will take a LOT of time. However the struggles that you take up on a day to day basis and the incremental understanding of the subject could provide the kick needed to be continue working. To get more clarity about "flow" experiences, it is better to keep a monthly log of such experiences and reflect on them from time to time.

Based on many interviews with independent consultants, the author says that "bespoke consultant job" is both wonderful and awful. However once experiencing the freedom of a freelancer, most of them say that they would never trade-in with a 9-to-5 job. On the contrary, one can explore even in a routine job. There is a story about Wallace Stevens which goes like this:

By day he worked in an insurance company, eventually becoming vice-president of an established firm in Connecticut. But he was no workaholic: he returned home each evening to write verse, and was considered one of the great modernist poets of the early twentieth century. Stevens kept these two lives separate: he always felt something of an imposter in his day job; it was "like playing a part", he wrote. He regarded poetry as his "real work"- even if he wasn’t paid for it- and never wanted to commercialize his art by becoming a "professional" poet. After winning the Pulitzer Prize in 1955 he was offered a faculty position at Harvard that would have allowed him to write poetry for a living, but he turned it down to stay in his insurance job.

How to grow a vocation

The book ends with a saying that there is no right vocation that one can "find". One has to "grow" in to it. Once you start cultivating your talents, you see that those talents will give you the freedom to do certain things, and by taking those actions, you grow in to a vocation. This opinion is similar to the one offered by Cal Newport in his book,"So good that they can’t ignore you". The story of Marie curie illustrate the point that there is no magical calling that will become apparent to you one fine day. Real vocation is one where you grow in to it, rather than finding one.

The book is very compact and can be read in a few hours. What I like about the book is the various stories that the author manages to put in, so that the main takeaways of the book are implicit and do not sound preachy.

In this post, I will attempt to briefly summarize the main points of the book

## An optimistic skeptic

The chapter starts off by saying that there are indeed people in the world who become extremely popular, make tons of money, get all the press coverage, by providing a perfect explanation, after the fact. The author gives one such example of a public figure who rose to fame explaining events post-fact, Tom Friedman. At the same time, there are many people who are lesser known in the public space but have an extraordinary skill at forecasting. The purpose of the book is to explain how these ordinary people come up with reliable forecasts and beat the experts hands down.

Philip Tetlock, one of the authors of the book is responsible for a landmark study spanning 20 years(1984-2004) that compares experts predictions and random predictions. The conclusion was, the average expert had done little better than guessing on many of the political and economic questions. Even though the right comparison should have been a coin toss, the popular financial media used "dart-throwing chimpanzee pitted against experts" analogy. In a sense, the analogy was more "sticky" than the mundane word, "random". The point was well taken by all; the experts are not all that good in predicting outcomes. However the author feels disappointed that his study has been used to dish out extreme opinions about experts and forecasting abilities such as, "all expert are useless". Tetlock believes that it is possible to see into the future, at least in some situations and to some extent, and that any intelligent, open-minded, and hardworking person can cultivate the requisite skills. Hence one needs to have "optimistic" mindset about predictions. It is foolhardy to have a notion that all predictions are useless.

The word "Skeptic" in the chapter’s title reflects the mindset on must possess in this increasingly nonlinear world. The chapter mentions an example of Tunisian man committing suicide that leads to a massive revolution in the Arab world. Could anyone have predicted such catastrophic ripple effects of a seemingly common event ? It is easy to look backward and sketch a narrative arc, but difficult to actually peer in to the future and forecast? To make effective predictions, the mindset should be that of an "optimistic skeptic".

So is reality clock-like or cloud-like? Is the future predictable or not? These are false dichotomies, the first of many we will encounter. We live in a world of clocks and clouds and a vast jumble of other metaphors. Unpredictability and predictability coexist uneasily in the intricately interlocking systems that make up our bodies, our societies, and the cosmos. How predictable something is depends on what we are trying to predict, how far into the future, and under what circumstances.

In fields where forecasts have been reliable and good, one sees that the people who make these forecasts follow, Forecast, measure, revise. Repeat. procedure. It’s a never-ending process of incremental improvement that explains why weather forecasts are good and slowly getting better. Why is this process non-existent in many stock market predictions, macro economic predictions? The author says that it is a demand-side problem. The consumers of forecasting don’t demand evidence of accuracy and hence there is no measurement.

Most of the readers and general public might be aware of the research done by Tetlock that produced the dart-throwing chimpanzee article. However this chapter talks about another research study that Tetlock and his research partner&wife started in 2011, Good Judgment Project. The couple invited volunteers to sign up and answer well designed questions about the the future. In total, there were 20,000 people who volunteered to predict event outcomes. The author collated the predictions from this entire crew of 20,000 people and played a tournament conducted by IARA, an intelligence research agency. The game comprised predicting events spanning a month to a year in to the future. It was held between 5 teams, one of which was GJP. Each team would effectively be its own research project, free to improvise whatever methods it thought would work, but required to submit forecasts at 9 a.m. eastern standard time every day from September 2011 to June 2015. By requiring teams to forecast the same questions at the same time, the tournament created a level playing field-and a rich trove of data about what works, how well, and when. Over four years, IARPA posed nearly five hundred questions about world affairs. In all, one million individual judgments about the future. In all the years, the motley crowd of forecasters of The Good Judgment project beat the experts hand down. The author says that there are two major takeaways from the performance of GJP team

1. Foresight is real : They aren’t gurus or oracles with the power to peer decades into the future, but they do have a real, measurable skill at judging how high-stakes events are likely to unfold three months, six months, a year, or a year and a half in advance.
2. Forecasting is not some mysterious gift : It is the product of particular ways of thinking, of gathering information, of updating beliefs. These habits of thought can be learned and cultivated by any intelligent, thoughtful, determined person.

The final section of the first chapter contains author’s forecast on the entire field of forecasting. With machines doing most of the cognitive work, there is a threat the forecasting done by humans will be no match to that supercomputers. However the author feels that humans are underrated(a book length treatment has been given by Geoff Colvin). In the times to come, the best forecasts would result from a combination of human-machine teams rather than humans only or machines only forecasts.

The book is mainly about specific type of people, approximately 2% of the volunteered forecasters who did phenomenally well. The author calls them "superforecasters". Have they done well because of luck or skill? If it is skill, what can one learn from them ? are some of the teasers posed in the first chapter of the book.

## Illusions of Knowledge

Given the number of books and articles that have been churned out, talking about our cognitive biases, there is nothing really new in the chapter. The author reiterates the System-1 and System-2 thinking from Daniel Kahneman’s book. He also talks about the perils of being over-confident of our own abilities. He talks about various medical practices that were prevalent before the advent of clinical trials. Many scientists advocated medicine based on their "tip-of-your-nose" perspective without vetting their intuitions.

The tip-of-your-nose perspective can work wonders but it can also go terribly awry, so if you have the time to think before making a big decision, do so-and be prepared to accept that what seems  obviously true now may turn out to be false later.

The takeaway from this chapter is obvious from the title of the chapter. One needs to weigh in both System-1 and System-2 thinking in most of our decisions. An example of Magnus Carlsen is given that illustrates this kind of mixed thinking. In an interview, the grandmaster disclosed that his intuition tells him what are the possible steps immediately(10 seconds), and he spends most of the time double checking his intuition. Only then does he make the next move in a chess tournament. Its an excellent practice to mix System-1 thinking and System-2 thinking, but one requires conscious effort to do that.

## Keeping Score

The chapter starts with the infamous statement of Steve Ballmer who predicted that iPhone was not going to have a significant market share. To evaluate Ballmer’s forecast in a scientific manner, the author looks at the entire content of Ballmer’s speech and says that there are many vague terms in the statement that it is difficult to give a verdict on the forecast. Another example is the "open letter" to Bernanke that was sent by many economists to stop QE to restore stability. QE did not stop and US has not seen any of the dire consequences that economists had predicted. So, is the forecast wrong ? Again the forecast made by economists is not worded precisely in numerical terms so that one can evaluate it. The basic message that the author tries to put across is, "judging forecasts is difficult".

The author’s basic motivation to conduct a study on forecasting came while sitting on a panel of experts who were asked to predict the future of Russia. Many of the forecasts were a complete disaster. However that did not make them humble. No matter what had happened the experts would have been just as adept at downplaying their predictive failures and sketching an arc of history that made it appear they saw it coming all along. In such scenario, how does one go about testing forecasts ? Some of the forecasts have no time lines. Some of the forecasts are worded in vague terms. Some of them are not worded in numbers. Even if there are numbers, the event happened cannot be repeated and hence how does one decide whether it is luck or skill ? We cannot rerun history so we cannot judge one probabilistic forecast- but everything changes when we have many probabilistic forecasts. Having many forecasts helps one pin down two essential features of any forecast analysis, i.e. calibration and resolution. Calibration involves testing whether the forecast and the actual are in sync. Resolution involves whether the forecast involved are decisive probabilistic estimate and not somewhere around 40%-60%. The author takes all the above thoughts in to consideration and starts a 20 year project from 1984-2004 that goes like this :

• assemble experts in various fields
• ask a large number of questions with precise time frames and unambiguous language
• require that forecast be expressed using numerical probability scales
• measure the calibration of the forecasters
• measure the resolution of the forecasters
• use brier score to evaluate the distance between the forecast and the actual

The author patiently conducts the study for 20 years to see the results of all the forecasts. The following are the findings/insights from the project :

• To make a good analogy, the author says big idea thinkers are akin to "Hedgehogs" and many idea thinkers are akin to "foxes"
• Foxes were better forecasters than Hedgehogs
• Foxes don’t fare well with the media. Media likes authoritative statements to probabilistic statements.
• Aggregating among a diverse set of opinions beats hedgehogs. That’s why averaging from several polls gives a better result than single poll. This doesn’t mean "wisdom of any sort of crowd" works. It means "wisdom of certain type of crowd" works.
• The best metaphor for developing various perspective is to have a dragonfly eye. Dragonflies have two eyes, but theirs are constructed very differently. Each eye is an enormous, bulging sphere, the surface of which is covered with tiny lenses. Depending on the species, there may be as many as thirty thousand of these lenses on a single eye, each one occupying a physical space slightly different from those of the adjacent lenses, giving it a unique perspective. Information from these thousands of unique perspectives flows into the dragonfly’s brain where it is synthesized into vision so superb that the dragonfly can see in almost every direction simultaneously, with the clarity and precision it needs to pick off flying insects at high speed. A fox with the bulging eyes of a dragonfly is an ugly mixed metaphor but it captures a key reason why the foresight of foxes was superior to that of hedgehogs with their green-tinted glasses. Foxes aggregate perspectives.
• Simple AR(1), EWMA kind of models performed better than hedgehogs and foxes

## Superforecasters

The chapter starts off recounting the massive forecasting failure from the National Security Agency, the Defense Intelligence Agency, and thirteen other agencies that constitute the intelligence community of US government. These agencies had a consensus view that IRAQ had weapons of mass destruction. This view made everyone support Bush’s policy of waging the Iraq war. After the invasion in 2003, no WMDs were found. How come the agencies that employ close to twenty thousand intelligence analysts were so wrong? Robert Jervis who has critically analyzed the performance of these agencies over several decades says that the judgment was a reasonable one but wrong. This statement does require some explanation and the author provides the necessary details. The takeaway from the story is that the agencies did some errors that would have scaled back the probability levels that were associated with the consensus view. Who knows it would have changed the course of Iraq’s history?

After this failure, IARPA(Intelligence Advanced Research Projects Activity) was created in 2006. Its mission was to fund cutting-edge research with the potential to make the intelligence community smarter and more effective. They approach the author with a specific type of game in mind. IARPA’s plan was to create tournament-style incentives for top researchers to generate accurate probability estimates for Goldilocks-zone questions. The research teams would compete against one another and an independent control group. Teams had to beat the combined forecast-the "wisdom of the crowd"-of the control group. In the first year, IARPA wanted teams to beat that standard by 20%-and it wanted that margin of victory to grow to 50% by the fourth year. But that was only part of IARPA’s plan. Within each team, researchers could run experiments to assess what really works against internal control groups. Tetlock’s team beat the control group hands down. Was it luck ? Was it the team had a slower reversion to mean ? Read the chapter to judge it for yourself. Out of several volunteers that were involved GJP, the author finds that there were certain forecasters who were very extremely good. The next five chapters are all about the way superforecasters seem go about forecasting. The author argues that there are two things to note from GJP’s superior performance :

1. We should not treat the superstars of any given year as infallible. Luck plays a role and it is only to be expected that the superstars will occasionally have a bad year and produce ordinary results
2. Superforecasters were not just lucky. Mostly, their results reflected skill.

## Supersmart?

The set of people whom the author calls superforecasters do not represent a random sample of people. So, the team’s outcome is not the same thing as collating predictions from a large set of random people. These people are different, is what the author says. But IQ or education are not the boxes based on which they can be readily classified. The author reports that in general the volunteers had higher IQ than others but there was no marked distinction between forecasters and superforecasters. So it seems intelligence and knowledge help but they add little beyond a certain threshold-so superforecasting does not require a Harvard PhD and the ability to speak five languages.

The author finds that superforecasters follow a certain way of thinking that seems to be marking better forecasters

• Good back of the envelope calculations
• Starting with outside view that reduces anchoring bias
• Subsequent to outside view, get a grip on the inside view
• Look out for various perspectives about the problem
• Think thrice/four times, think deeply to root out confirmation bias
• It’s not the raw crunching power you have that matters most. It’s what you do with it.

Most of the above findings are not groundbreaking. But what it emphasizes is that good forecasting skills do not belong to some specific kind of people. It can be learnt and consciously cultivated.

For superforecasters, beliefs are hypotheses to be tested, not treasures to be guarded. It would be facile to reduce superforecasting to a bumper-sticker slogan, but if I had to, that would be it.

## Superquants?

Almost all the superforecasters were numerate but that is not what makes their forecasts better. The author gives a few examples which illustrate the mindset that most of us carry. It is the mindset of Yes, No and Maybe, where Yes mean very almost certainty, No means almost impossible and Maybe means 50% chance. This kind of probabilistic thinking with only three dials does not help us become a good forecasters. Based on the GJP analysis, the author says that the superforecasters have a more fine grained sense of probability estimates than the rest of forecasters. This fine grained probability estimates are not a result of some complex math model, but are a result of careful thought and nuanced judgment.

## Supernewsjunkies?

The chapter starts with the author giving a broad description of the way a superforecaster works:

Unpack the question into components. Distinguish as sharply as you can between the known and unknown and leave no assumptions unscrutinized. Adopt the outside view and put the problem into a comparative perspective that downplays its uniqueness and treats it as a special case of a wider class of phenomena. Then adopt the inside view that plays up the uniqueness of the problem. Also explore the similarities and differences between your views and those of others-and pay special attention to prediction markets and other methods of extracting wisdom from crowds. Synthesize all these different views into a single vision as acute as that of a dragonfly. Finally, express your judgment as precisely as you can, using a finely grained scale of probability.

One of the things that the author notices about superforecasters is their tendency to make changes to the forecasts frequently. As things/facts change around them, they revise their forecasts. This begs the question, "Does the initial forecast matter ?". What if one starts with a vague prior and keep updating it based on the changing world. The GJP analysis shows that superforecasters initial estimates were 50% more accurate than the regular forecasters. The real takeaway is that "updating matters";frequent updating is as demanding as challenging and it is a huge mistake to belittle belief updating. Both under and overreaction to events happening can diminish accuracy. Both can also, in extreme cases, destroy a perfectly good forecast. Superforecasters have little ego invested in their initial judgments and the subsequent judgments. This makes them update their forecasts far quicker than other forecasters. Superforecasters update frequently and update in smaller increments. Thus they tread the middle path between over forecasting and underforecasting. The author mentions one superforecaster who uses Bayes theorem to revise his estimates. Does that mean Bayes is the answer to getting forecasts accurate? No, says the author. He found that even though all the superforecasters were numerate enough to apply Bayes, but nobody actually crunched numbers that explicitly. The message is that all the superforecasters appreciate the Bayesian spirit, though none had explicitly used a formula to update their forecasts. But not always "small updations" work. The key idea that the author wants to put across is that there is no  "magic" way to go about forecasting. Instead there are many broad principles with lots of caveats.

## Perpetual Beta

The author starts off by talking about Carol Dwecks’ "growth mindset" principle and says that this is one of the important traits of a superforecaster.

We learn new skills by doing. We improve those skills by doing more. These fundamental facts are true of even the most demanding skills. Modern fighter jets are enormously complex flying computers but classroom instruction isn’t enough to produce a qualified pilot. Not even time in advanced flight simulators will do. Pilots need hours in the air, the more the better. The same is true of  surgeons, bankers, and business executives.

It goes without saying that practice is key to becoming good. However it is actually "informed practice" that is the key to becoming good. Unless there is a clear and timely feedback about how you are doing, the quantity of practice might be an erroneous indicator of your progress. This idea has been repeated in many books in the past few years. An officer’s ability to spot a liar is generally poor because the feedback of his judgment takes long time to reach him. On the other hand, people like meteorologists, seasoned bridge players learn from failure very quickly and improve their estimates. I think this is the mindset of a day trader in financial markets. He makes a trade, he gets a quick feedback about it and learns from the mistakes. If you take a typical mutual fund manager and compare him/her with a day trader, the cumulative feedback that the day trader receives is far more than what an MF manager receives. Read any indexing book and you will always read arguments debating whether Mr.XYZ was a good fund manager or not. You can fill in any name for XYZ. Some say luck, Some say skill. It is hard to tease out which is which when the data points are coarse grained. However if you come across a day trader who consistently makes money for a decent number of years, it is hard to attribute luck to his performance, for the simple reason that he has made far more trades cumulatively than an MF manager. The basic takeaway at least for a forecaster is that he/she must know when the forecast fails. This is easier said/written than done. Forecasts could be worded in ambiguous language, the feedback might have a large time lag like years by which time our flawed memories can no longer remember the old forecast estimate. The author gives a nice analogy for forecasters who do not have timely feedback. He compares them with basketball players doing free throws in the dark.

They are like basketball players doing free throws in the dark. The only feedback they get are sounds-the clang of the ball hitting metal, the thunk of the ball hitting the backboard, the swish of the ball brushing against the net. A veteran who has taken thousands of free throws with the lights on can learn to connect sounds to baskets or misses. But not the novice. A "swish!" may mean a nothing-but-net basket or a badly underthrown ball. A loud "clunk!" means the throw hit the rim but did the ball roll in or out? They can’t be sure. Of course they may convince themselves they know how they are doing, but they don’t really, and if they throw balls for weeks they may become more confident-I’ve practiced so much I must be excellent!-but they won’t get better at taking free throws. Only if the lights are turned on can they get clear feedback. Only then can they learn and get better.

Towards the end of this chapter, the author manages to give a rough composite portrait of a superforecaster :

 Philosophical Outlook Cautious Nothing is certain Humble Reality is infinitely complex Non Deterministic What happens is not meant to be and does not have to happen Abilities and thinking styles Actively open-minded Beliefs are hypotheses to be tested, not treasures to be protected Intelligent, Knowledgeable with a need for cognition Intellectually curious, enjoy puzzles and mental challenges Reflective Introspective and self-critical Numerate Comfortable with numbers Methods of forecasting Pragmatic Not wedded to any idea or agenda Analytical Capable of stepping back from the tip-of-your-nose perspective and considering other views Dragon-fly eyed Value diverse views and synthesize them into their own Probabilistic Judge using many grades of maybe Thoughtful updaters When facts change, they change their minds Good-intuitive psychologists Aware of the value of checking thinking for cognitive and emotional biases Work ethic Growth mindset Believe it’s possible to get better Grit Determined to keep at it however long it takes

The author says that the single most predictor of rising to the ranks of superforecasters is to be in a state of "perpetual beta".

## Superteams

The author uses GJP as a fertile ground to ask many interesting questions such as :

• When does "wisdom of crowd" thinking help ?
• Given a set of individuals, does weighing team forecasts work better than weighting individual forecasts?
• Given that there are two groups, forecasters and superforecasters, does acknowledging superforecasters after the year 1 performance works in improving or degrading the subsequent year performance for superforecasters?
• How do forecasters perform against prediction markets ?
• How do superforecasters perform against prediction markets ?
• How do you counter "groupthink" amongst a team of superforecasters?
• Does face-to-face interaction amongst superforecasters help/worsen the forecast performance ?
• If aggregation of different perspectives gives better performance, should the aggregation be based on ability or diversity ?

These and many other related questions are taken up in this chapter. I found this chapter very interesting as the arguments made by the author are based on data rather than some vague statements and opinions.

I found it difficult to keep my attention while reading this chapter. It was trying to address some of the issues that typical management books talk about. I have read enough management BS books that my mind has become extremely repulsive to any sort of general management gyan. May be there is some valuable content in this chapter. May be there are certain type of readers who will find the content in the chapter appealing.

## Are they really Super?

The chapter critically looks at the team of superforecasters and tries to analyze viewpoints of various people who don’t believe that superforecasters have done something significant. The first skeptic is Daniel Kahneman who seems to be of the opinion that there is a scope bias in forecasting. Like a true scientist, the author puts his superforecasting team in a controlled experiment that gives some empirical evidence that supeforecasters are less prone to scope bias. The second skeptic that the author tries to answer is Nassim Taleb. It is not so much as an answer to Taleb, but an acknowledgement that superforecasters are different. Taleb is dismissive of many forecasters as he believes that history jumps and these jumps are blackswans (highly improbable events with a lot of impact). The author defends his position by saying

If forecasters make hundreds of forecasts that look out only a few months, we will soon have enough data to judge how well calibrated they are. But by definition, "highly improbable" events almost  ever happen. If we take "highly improbable" to mean a 1% or 0.1% or 0.0001% chance of an event, it may take decades or centuries or millennia to pile up enough data. And if these events have to be not only highly improbable but also impactful, the difficulty multiplies. So the first-generation IARPA tournament tells us nothing about how good superforecasters are at spotting gray or black swans. They may be as clueless as anyone else-or astonishingly adept. We don’t know, and shouldn’t fool ourselves that we do.

Now if you believe that only black swans matter in the long run, the Good Judgment Project should only interest short-term thinkers. But history is not just about black swans. Look at the inch-worm advance in life expectancy. Or consider that an average of 1% annual global economic growth in the nineteenth century and 2% in the twentieth turned the squalor of the eighteenth century and all the centuries that preceded it into the unprecedented wealth of the twenty-first. History does sometimes jump. But it also crawls, and slow, incremental change can be profoundly important.

So, there are people who trade or invest based on blackswan thinking. Vinod Khosla invests in many startups so that one of them can be the next google. Taleb himself played with OTM options till one day he cracked it big time. However this is the not the only kind of philosophy that one can adopt. A very different way is to beat competitors by forecasting more accurately-for example, correctly deciding that there is a 68% chance of something happening when others foresee only a 60% chance. This is the approach of the best poker players. It pays off more often, but the returns are more modest, and fortunes are amassed slowly. It is neither superior nor inferior to black swan investing. It is different.

## What Next ?

The chapter starts off by giving a few results of the opinion polls that were conducted before the Scotland’s referendum of joining UK. The numbers show that there was no clear sign of which way the referendum would go. In any case, the final referendum was voted NO. It was hard to predict the outcome. There was one expert/pundit, Daniel Drezner, who came out in the open and admitted that it is extremely easy to give an explanation after the fact but doing so, before the fact forecast, is a different ball game. Drezner also noted that he himself had stuck to NO for sometime before switching to YES. He made an error while correcting his prior opinion. As a learning, he says, in the future he would give a confidence interval for the forecast, rather than a binary forecast. The author wishes that in the future many more experts/forecasters adopt the confidence interval mindset. This shift from point estimate to interval estimate might do a world of good, says the author. What will this 500 page book do to the general reader/society ? The author says that there could be two scenarios.

• Scenario 1: forecasting is mainly used to advance a tribes interests. In all such situations, the accuracy of the forecast would be brushed aside and whoever makes the forecast that suits the popular tribe will be advertised and sadly actions will be taken based on these possibly inaccurate forecasts. This book will be just another book on forecasting that is good to read, but nothing actionable will come out of it.
• Scenario 2 : Evidence based forecasting takes off. Many people will demand accuracy, calibration results of experts

Being an optimistic skeptic, the author feels that evidence based forecasting will be adopted in the times to come. Some quantification is always better than no quantification(which is want we see currently). The method or system used in the forecasting tournament to come out ahead is a work-in-progress, admits the author. However that doesn’t mean it is not going to improve our forecasting performance.

Towards the end of the book, the author does seem to acknowledge the importance of Tom Friedmans of the world, not because of their forecasting ability. It is their vague forecasts that are actually superquestions for the forecasters. Whenever pundits give their forecasts in a imprecise manner, that serves as the fodder for all the forecasters to actually get to work. The assumption the author makes is that superforecasters are not superquestioners. Superquestioners are typically hedgehogs who have one big idea, think deeply and see the world based on that one big idea. Superforecasters, i.e. foxes are not that good at churning out big questions, is what the author opines. In conclusion, he says an ideal person would be a combination of superforecaster and superquestioner.

### Takeaway:

This book is not "ONE BIG IDEA" book. Clearly the author is on the side of foxes and not hedgehogs. The book is mainly about analyzing the performance a specific set of people from a forecasting team that participated in IARPA sponsored tournament. The book looks at these superforecasters and spells out a number of small but powerful ideas/principles that can be cultivated by anyone, who aspires to become a better forecaster.

The book starts by explaining an example project that one can download from the author’s github account. The project files serve as an introduction to reproducible research. I guess it might make sense to download this project, try to follow the instructions and create the relevant files. By compiling the example project, one gets a sense of what one can accomplished by reading through the book.

Introducing Reproducible Research
The highlight of an RR document is that data, analysis and results are all in one document. There is no separation between announcing the results and doing number crunching. The author gives a list of benefits that accrue to any researcher generating RR documents. They are

• better work habits
• better team work
• changes are easier
• high research impact

The author uses `knitr` / `rmarkdown` in the book to discuss Reproducibility. The primary difference between the two is that the former demands that document be written using the markup language associated with the desired output. The latter is more straightforward in the sense that one markup can be used to produce a variety of outputs.

Getting Started with Reproducible Research
The thing to keep in mind is that reproducibility is not an after thought – it is something you build into the project from the beginning. Some general aspects of RR are discussed by the author. If you do not believe in the benefits of RR, then you might have to carefully read this chapter to understand the benefits as it gives some RR tips to a newbie. This chapter also gives a road map to the reader as to what he/she can expect from the book. In any research project, there is data gathering stage, data analysis stage and presentation stage. The book contains a set of chapters addressing each stage of the project. More importantly, the book contains ways to tie each of the stages so as to produce a single compendium for your entire project.

Getting started with R, RStudio and knitr/rmarkdown
This chapter gives a basic introduction to `R` and subsequently dives in to `knitr` and `rmarkdown` commands. It shows how one can create a `.Rnw` or `.Rtex` document and convert in to a pdf either through RStudio or the command line. `rmarkdown` documents on the other hand are more convenient for reproducing simple projects where there are not many interdependencies between various tasks. Obviously the content in this chapter gives only a general idea. One has to dig through the documentation to make things work. One learning for me from this chapter is the option of creating `.Rtex` documents in which the syntax can be less baroque.

Getting started with File Management
This chapter gives the basic directory structure that one can follow for organizing the project files. One can use the structure as a guideline for one’s own projects. The example project uses gnu make file for data munging. It also gives a crash course of bash.

Storing, Collaborating, Accessing Files, and Versioning
The four activities mentioned in the chapter title can be done in many ways. The chapter focuses on Dropbox and Github. It is fairly easy to learn to use the limited functionality one gets from Dropbox. On the other hand, Github demands some learning from a newbie. One needs to get to know the basic terminology of git. The author does a commendable job of highlighting the main aspects of git version control and its tight integration with RStudio.

Gathering Data with R

This chapter talks about the way in which one can use GNU make utility to create a systematic way of gathering data. The use of make file makes it easy for other to reproduce the data preparation stage of a project. If you have written a make file in C++ or in some other context, it is pretty easy to follow the basic steps mentioned in the chapter. Else it might involve some learning curve. My guess is once you start writing make files for specific tasks, you will realize their tremendous value in any data analysis project. A nice starting point for learning make file is robjhyndman’s site.

Preparing Data for Analysis
This gives a whirlwind tour of data munging operations and data analysis in R.

Statistical Modeling and knitr
The chapter gives a brief description of chunk options that are frequently used in an RR document. Out of all the options, `cache.extra` and `dependson` are the options that I have never used in the past and is a learning for me. One of the reasons I like `knitr` is its ability to cache objects. In the `Sweave` era, I had to load separate packages, do all sorts of things to run a time intensive RR document. It was very painful to say the least. Thanks to `knitr` it is extremely easy now. Even though `cache` option is described at the end, I think it is one of the most useful features of the package. Another good thing is that you can combine various languages in RR document. Currently `knitr` supports the following language engines :

• Awk
• Bash shell
• CoffeeScript
• Gawk
• Highlight
• Python
• R (default)
• Ruby
• SAS
• Bourne shell

Showing results with tables
In whatever analysis you do using `R` , there are always situations where your output is in the form of a `data.frame` or `matrix` or some sort of `list` structure that is formatted to display on the console as a table. One can use `kable` to show `data.frame` and `matrix` structures. It is simple, effective but limited in scope. `xtable` package on the other hand is extremely powerful. One can use various statistical model fitted objects and pass it on to `xtable` function to obtain a `table` and `tabular` environment encoded for the results. The chapter also mentions `texreg` that is far more powerful than the previous mentioned packages. With `texreg` , you can show the output of more than one statistical model as a table in your RR document.There are times when the output classes are not supported by `xtable`. In such cases, one has to manually hunt down the relevant table, create a data frame or matrix of the relevant results and then use `xtable` function.

Showing results with figures
It is often better to know basic `LaTeX` syntax for embedding graphics before using `knitr`. One problem I have always faced with `knitr` embedded graphics is that all the chunk options should be mentioned in one single line. You cannot have two lines for chunk options. Learnt a nice hack from this chapter where some of the environment level code can be used as markup rather than as chunk options .This chapter touches upon the main chunk options relating to graphics and does it well, without overwhelming the reader.

Presentation with knitr/LaTeX
The author says that much of the LaTeX in the book has been written using Sublime Text editor. I think this is the case with most of the people who intend to create an RR. Even though RStudio has a good environment to create a LaTeX file, I usually go back to my old editor to write LaTeX markup. How to cite bibliography in your document? and How to cite R packages in your document? are questions that every researcher has to think about in producing RR documents. The author does a good job of highlighting the main aspects of this thought process. The chapter ends with a brief discussion on Beamer. It gives a 10,000 ft. of beamer. I stumbled on to a nice link in this chapter that gives the reason for using `fragile` in beamer.

Large knitr/LaTeX Documents: Theses, Books, and Batch Reports
This chapter is extremely useful for creating long RR documents. In fact if your RR document is not large, it makes sense to logically subdivide in to separate child documents. For `knitr`, there are chunk options to specify `parent` and `child` relationships. These options are useful in knitting child documents independently of the other documents embedded in the parent document. You do not have to specify the preamble code again in each of the child documents as it inherits the code from the parent document. The author’s also shows a way to use `Pandoc` to change rmarkdown document to `tex`, which can then be included in the RR document.

The penultimate chapter is on `rmarkdown`. The concluding chapter of the book discusses some general issues of reproducible research.

Takeaway:

This book gives a nice overview of the main tools that one can use in creating a RR document. Even though the title of the book has the term “RStudio” in it, the tools and the hacks mentioned are IDE agnostic. One can read a book length treatment for each of the tools mentioned in the book and might easily get lost in the details. Books such as these give a nice overview of all the tools and hence motivate the reader to dive into specifics as and when there is requirement.

This book can be used as a companion to a more pedagogical text on survival analysis. For someone looking for an appropriate `R` command to use, for fitting certain kind of survival model, this book is apt. This book neither gives the intuition nor the math behind the various models. It appears like an elaborate help manual for all the packages in `R`, related to event history analysis.

I guess one of the reasons for the author writing this book is to highlight his package `eha` on CRAN. The author’s package is basically a layer on `survival` package that has some advanced techniques which I guess only a serious researcher in this field can appreciate. The book takes the reader through the entire gamut of models using a pretty dry format, i.e. it gives the basic form of a model, the `R` commands to fit the model,and some commentary on how to interpret the output. The difficulty level is not a linear function from start to end. I found some very intricate level stuff interspersed among some very elementary estimators. An abrupt discussion of Poisson regression breaks the flow in understanding Cox model and its extensions. The chapter on cox regression contains detailed and unnecessary discussion about some elementary aspects of any regression framework. Keeping these cribs aside, the book is useful as a quick reference to functions from `survival`, `coxme`, `cmprsk` and `eha` packages.

As the title suggests, this book is truly a self-learning text. There is minimal math in the book, even though the subject essentially is about estimating functions(survival, hazard, cumulative hazard). I think the highlight of the book is its unique layout. Each page is divided in to two parts, the left hand side of the page runs like a pitch, whereas the right hand side of the page runs like a commentary to the pitch. Every aspect of estimation and inference is explained in plain simple English. Obviously one cannot expect to learn the math behind the subject. In any case, I guess the target audience for this book comprises those who would like to understand survival analysis, run the model using some software packages and interpret the output. So, in that sense, the book is spot on. The book is 700 pages long and so all said and done, this is not a book that can be read in one or two sittings. Even thought the content is easily understood, I think it takes a while to get used the various terms, assumptions for the whole gamut of models one comes across in survival analysis. Needless to say this is a beginner’s book. If one has to understand the actual math behind the estimation and inference of various functions, then this book will equip a curious reader with a 10,000 ft. view of the subject, which in turn can be very helpful in motivating oneself to slog through the math.

Here is a document that gives a brief summary of the main chapters of the book.

This book is vastly different from the books that try to warn us against incorrect statistical arguments present in media and other mundane places. Instead of targeting newspaper articles, politicians, journalists who make errors in their reasoning, the author investigates research papers, where one assumes that scientists and researchers make flawless arguments, at least from stats point of view. The author points a few statistical errors, even in the pop science book, “How to lie with statistics?”. This book takes the reader through the kind of statistics that one comes across in research papers and shows various types of flawed arguments. The flaws could arise because of several reasons such as eagerness to publish a new finding without thoroughly vetting the findings, not enough sample size, not enough statistical power in the test, inference from multiple comparisons etc. The tone of the author isn’t deprecatory. Instead he explains the errors in simple words. There is minimal math in the book and the writing makes the concepts abundantly clear even to a statistics novice. That in itself should serve as a good motivation for a wider audience to go over this 130 page book.

In the first chapter, the author introduces the basic concept of statistical significance. The basic idea of frequentist hypothesis testing is that it is dependent on p value that measure Probability(data|Hypothesis). In a way, p value measures the amount of surprise that you find in the data given that you have a specific null hypothesis in mind. If the p value turns out to be too less, then you start doubting your null and reject the null. The procedure at the outset looks perfectly logical. However one needs to keep in mind, the things that do not form a part of p value such as,

• It does not per se measure the size of the effect.
• Two experiments with identical data can give different p values. This is disturbing as one assumes that p value somehow knows the intention of the person doing the experiment.
• It does not say anything about the false positive rate.

By the end of the first chapter, the author convincingly rips apart p value and makes a case for using confidence intervals. He also says that many people do not report confidence intervals because they are often embarrassingly wide and might make their effort a fruitless exercise.

The second chapter talks about statistical power, a concept that many introductory stats courses do not delve in to, appropriately. The statistical power of a study is the probability that it will distinguish an effect of a certain size from pure luck. The power depends on three factors

• size of the bias you are looking for
• sample size
• measurement error

If an experiment is trying to test a subtle bias, then there needs to be far more data to even detect it. Usually the accepted power for an experiment is 80%. This means that the probability of bias detection is close to 80%. In many of the tests that have negative results, i.e the alternate is rejected, it is likely that the power of test is compromised. Why do researchers fail to take care of power in their calculations? The author guesses that it could be because the researcher’s intuitive feeling about samples is quite different from the results of power calculations. The author also ascribes to the not so straightforward math required to compute the power of study.

The problems with power also plague the other side of experimental results. Instead of detecting the true bias, the results often show inflation of true result, called M errors, where M stands for magnitude. One of the suggestions given by the author is : Instead of computing the power of a study for a certain bias detection and certain statistical significance, the researchers should instead look for power that gives narrower confidence intervals. Since there is no readily available term to describe this statistic, the author calls it assurance, which determines how often the confidence intervals must beat a specific target width. The takeaway from this chapter is that whenever you see a report of significant effect, your reaction should not be “Wow, they found something remarkable", but it needs to be, "Is the test underpowered ?". Also just because alternate was rejected doesn’t mean that alternate is absolute crap.

The third chapter talks about pseudo replication, a practice where the researcher uses the same set of patients/animals/ whatever to create repeated measurements. Instead of bigger sample sizes, the researcher creates a bigger sample size by repeated measurements. Naturally the data is not going to be independent as the original experiment might warrant. Knowing that there is a pseudo replication of the data, one must be careful while drawing inferences. The author gives some broad suggestions to address this issue

The fourth chapter is about the famous base rate fallacy where one ascribes the p value to the probability of alternate being true. Frequentist procedures that give p values merely talk about the surprise element. In no way do they actually talk about the probability of alternate treatment in a treatment control experiment. The best way to get a good estimate of probability that a result is false positive, is by considering prior estimates. The author also talks about Benjamini-Hochberg procedure, a simple yet effective procedure to control for false positive rate. I remember reading this procedure in an article by Brad Efron titled, “The future of indirect evidence”, in which Efron highlights some of the issues related to hypothesis testing in high dimensional data.

The fifth chapter talks about the often found procedure of testing two drugs with a placebo and using the results to compare the efficiency of two drugs. Various statistical errors can creep in. These are thoroughly discussed. The sixth chapter talks about double dipping, i.e. using the same data to do exploratory analysis and hypothesis testing. It is the classic case of using in-sample statistics to extrapolate out-of-sample statistics. The author talks about arbitrary stopping rules that many researchers employ for cutting short an elaborate experiment when they find statistically significant findings at the initial stage. Instead of having a mindset which says, "I might have been lucky in the initial stage", the researchers over enthusiastically stops the experiment and reports truth inflated result. The seventh chapter talks about the dangers of dichotomizing continuous data. In many research papers, there is a tendency to divide the data in to two groups and run significance tests or ANOVA based tests, thus reducing the information available from the dataset. The author gives a few examples where dichotomization can lead to grave statistical errors.

The eighth chapter talks about basic errors that one does in doing regression analysis. The errors highlighted are

• over reliance on stepwise regression methods like forward selection or backward elimination methods
• confusing correlation and causation
• confounding variables and Simpson’s paradox

The last few chapters gives general guidelines to improve research efforts, one of them being “reproducible research”.

Takeaway

Even though this book is a compilation of various statistical errors committed by researchers in various scientific fields, it can be read by anyone whose day job is data analysis and model building. In our age of data explosion, where there are far more people employed in analyzing data and who need not necessarily publish papers, this book would be useful to a wider audience. If one wants to go beyond the simple conceptual errors present in the book, one might have to seriously think about all the errors mentioned in the book and understand the math behind them.

The book serves a nice intro to Bayes theory for an absolute newbie. There is minimal math in the book. Whatever little math that’s mentioned, is accompanied by figures and text so that a newbie to this subject “gets” the basic philosophy of Bayesian inference. The book is a short one spanning 150 odd pages that can be read in a couple of hours.  The introductory chapter of the book comprises few examples that repeat the key idea of Bayes. The author says that he has deliberately chosen this approach so that a reader does not miss the core idea of the Bayesian inference which is,

Bayesian inference is not guaranteed to provide the correct answer. Instead, it provides the probability that each of a number of alternative answers is true, and these can then be used to find the answer that is most probably true. In other words, it provides an informed guess.

In all the examples cited in the first chapter, there are two competing models. The likelihood of observing the data given each model is almost identical. So, how does one chose one of the two models ? Well, even without applying Bayes, it is abundantly obvious which of the two competing models one should go with. Bayes helps in formalizing the intuition and thus creates a framework that can be applied to situations where human intuition is misleading or vague. If you are coming from a frequentist world where “likelihood based inference” is the mantra, then Bayes appears to be merely a tweak where weighted likelihoods instead of plain vanilla likelihoods are used for inference.

The second chapter of the book gives a geometric intuition to a discrete joint distribution table. Ideally a discrete joint distribution table between observed data and different models is the perfect place to begin understanding the importance of Bayes. So, in that sense, the author provides the reader with some pictorial introduction before going ahead with numbers.

The third chapter starts off with a joint distribution table of 200 patients tabulated according to # of symptoms and type of disease. This table is then used to introduce likelihood function, marginal probability distribution, prior probability distribution, posterior probability distribution, maximum apriori estimate . All these terms are explained using plain English and thus serves as a perfect intro to a beginner. The other aspect that this chapter makes it clear is that it is easy to obtain probability of data given a model. The inverse problem, i.e probability of model given data, is a difficult one and it is doing inference in that aspect that makes Bayesian inference powerful.

The fourth chapter moves on to  continuous distributions. The didactic method is similar to the previous chapter. A simple coin toss example is used to introduce concepts such as continuous likelihood function,  Maximum likelihood estimate, sequential inference, uniform priors, reference priors,  bootstrapping and various loss functions.

The fifth chapter illustrates inference in a Gaussian setting and establishes connection with the well known regression framework. The sixth chapter talks about joint distributions  in a continuous setting.  Somehow I felt this chapter could have been removed from the book but I guess keeping with the author’s belief that “spaced repetition is good”, the content can be justified. The last chapter talks about Frequentist vs. Bayesian wars, i.e. there are statisticians who believe in only one of them being THE right approach. Which side one takes depends on how one views “probability” as – Is probability a property of the physical world or is it a measure of how much information an observer has about that world ? Bayesians and increasingly many practitioners in a wide variety of fields have found the latter belief to be a useful guide in doing statistical inference. More so, with the availability of software and computing power to do Bayesian inference, statisticians are latching on to Bayes like never before.

The author deserves a praise for bringing out some of the main principles of Bayesian inference using just visuals and plain English. Certainly a nice intro book that can be read by any newbie to Bayes.

Takeaway

“Write your code as though you are releasing it as a package” – This kind of thinking forces one to standardize directory structure, abandon adhoc scripts and instead code well thought out functions, and finally leverage the devtools functionality to write efficient, extensible and shareable code.