The author urges the reader to develop a “probabilistic perspective” towards many aspects of life. Rational thought about randomness is far better than an irrational, emotional response. Whenever we see things being reported or coincidences being cited, it is better to start by asking “how unlikely is the event, really?”. To answer that, one needs to figure out the “Out of how many?” question. For example, the famous “six degrees of separation” that is often quoted sounds pretty reasonable once you do a back-of-the-envelope calculation.
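Here is one way to do that back-of-the-envelope calculation, as a minimal Python sketch. It assumes every person has `k` acquaintances (the values of `k` below are hypothetical) and ignores the fact that friend circles overlap, so it is an optimistic estimate. `degrees_needed` is an illustrative name, not something from the book.

```python
def degrees_needed(acquaintances: int, population: int) -> int:
    """Smallest d with acquaintances**d >= population.

    Ignores overlapping friend circles, so it underestimates the true distance.
    """
    reach, degrees = 1, 0
    while reach < population:
        reach *= acquaintances  # one more "degree" multiplies the people reachable
        degrees += 1
    return degrees

WORLD_POPULATION = 8_000_000_000  # rough present-day figure

for k in (30, 50, 100):  # hypothetical acquaintance counts
    print(k, degrees_needed(k, WORLD_POPULATION))
# 30 7
# 50 6
# 100 5
```

Even with a modest 50 acquaintances each, six steps cover about 15.6 billion people on paper; overlapping circles shrink that number in practice, which is why the real figure is only “about” six.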

Another example is the birthday problem, often cited as a good party trick. The reason it works as a trick is that most of us do not get the “Out of how many?” question right. If 23 people are selected at random at a party, there is just over a 50% probability that at least two of them share a birthday. If one thinks the answer should be 23/365 (6.3%), one must understand that 23/365 is (roughly) the answer to a different question: “If you choose 23 people, how likely is it that one of them has a __specific__ day as their birthday?”. In the birthday problem, one needs to ask “Out of how many?”. Since 253 pairs can be formed from a set of 23 people, a crude upper limit is 253/365 (69%); taking into account the double counting this involves, the exact probability works out to be 50.7%. Simple notions such as “Out of how many?” and “understanding the clumping of outcomes” are enough to develop a probabilistic perspective on many things that we come across.
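The numbers in the birthday problem are easy to check. A minimal Python sketch, assuming birthdays are independent and uniform over 365 days (leap years ignored): the exact answer comes from the probability that all 23 birthdays are distinct, while 253/365 is only an upper bound because the pair events overlap.

```python
from math import comb, prod

def shared_birthday_probability(n: int, days: int = 365) -> float:
    """Exact P(at least two of n people share a birthday), assuming birthdays
    are independent and uniform over `days` days (leap years ignored)."""
    p_all_distinct = prod((days - i) / days for i in range(n))
    return 1 - p_all_distinct

n = 23
pairs = comb(n, 2)
print(pairs)                                     # 253
print(round(pairs / 365, 3))                     # 0.693: crude bound that double counts
print(round(shared_birthday_probability(n), 3))  # 0.507: the exact answer
print(round(1 - (364 / 365) ** n, 3))            # 0.061: someone born on one specific day
```

With 22 people the exact probability is still just under one half; 23 is the smallest group where it crosses 50%.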

Out of the 16 essays on probability in the book, I will summarize a few that I found interesting:

__Randomness to the Rescue: When Uncertainty Is Your Friend__

This section talks about situations where randomness is used deliberately to create fair play. Among the various examples, the author mentions the famous one of a monkey sitting at a typewriter and eventually producing Shakespeare’s work, given infinite time. It is said that the monkey would take trillion, trillion, trillion, trillion years to type the phrase “*it was the best of times, it was the worst of times*”. One can take this at face value, since the probability that such a sentence comes out of random typing seems very low. But what if one were asked to calculate the average time, assuming we all lived forever? Even though the book does not go into the math behind it, pause for a minute and think about it. It is not a trivial question: how long, on average, does a monkey take to type the above phrase? It can be solved by formulating a discrete martingale process. One solution is as follows:

Consider a situation where there are infinitely many gamblers willing to bet on the monkey’s output. Let time be measured in seconds. At each second, the monkey types a character uniformly at random (assume for simplicity that the monkey can type only lowercase letters (26) and only 2 of the 14 punctuation marks in the English language, space and comma). The sequence of letters and punctuation marks forms a series of IID observations, so each second the monkey types one of 28 characters (26 letters + 2 punctuation symbols), each with probability 1/28.

Now here is the setup. Every second, a new gambler walks in and bets $1 that the monkey’s next character will be “i”, the first character of the phrase. If he wins, he bets that the character after that will be “t”; if he wins again, he bets on a space as the third character, and so on. In other words, each gambler is betting that the monkey will type “it was the best of times, it was the worst of times” in the 51 seconds after he arrives, one second per character. The bets are fair: if he gets the first character right he wins $28, and he stakes all of it on the next character. If he gets the second character right too, he has $28^2, which he stakes entirely on the third character (the space), and so on. A gambler drops out with nothing as soon as the monkey types a wrong character. The game stops when some gambler wins $28^51, about 6.4 × 10^73 dollars (roughly 64 trillion, trillion, trillion, trillion, trillion, trillion dollars). Since every bet is fair, the gamblers’ total net winnings form a discrete martingale, and by the optional stopping theorem their expected value when the game ends is zero. When the game stops at time T, the gamblers have paid in $T in total, and the only ones still holding money are those whose run so far matches the start of the phrase. For this phrase, no proper prefix is also a suffix, so only the winner holds anything, and the expected time until the game ends is exactly 28^51 seconds. So the average time for the monkey to type “it was the best of times, it was the worst of times” is about 2 million, trillion, trillion, trillion, trillion, trillion years. Human authors should not drop their pens just yet.
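The gamblers’ argument boils down to a formula: the expected waiting time is the sum of 28^k over every length k for which the first k characters of the phrase equal its last k characters. This is the standard “ABRACADABRA” martingale result; the book itself does not derive it. A minimal Python sketch, assuming the 28-character alphabet and uniform IID keystrokes described above:

```python
def expected_waiting_time(pattern: str, alphabet_size: int) -> int:
    """Expected number of uniform IID keystrokes until `pattern` first appears.

    Gambler's-martingale result: sum alphabet_size**k over every k for which
    the length-k prefix of the pattern equals its length-k suffix.
    """
    n = len(pattern)
    return sum(alphabet_size ** k for k in range(1, n + 1)
               if pattern[:k] == pattern[n - k:])

phrase = "it was the best of times, it was the worst of times"
alphabet = "abcdefghijklmnopqrstuvwxyz ,"  # 26 lowercase letters + space + comma
assert len(alphabet) == 28 and set(phrase) <= set(alphabet)

seconds = expected_waiting_time(phrase, len(alphabet))
print(len(phrase), seconds == 28 ** 51)  # 51 True: no proper prefix is also a suffix

SECONDS_PER_YEAR = 365.25 * 24 * 3600
print(f"{seconds / SECONDS_PER_YEAR:.2e} years")  # 2.02e+66 years
```

For self-overlapping patterns the extra terms matter: with fair coin flips, `HTH` takes 10 tosses on average while `HTT` takes only 8, because a failed `HTH` attempt can leave a head that starts the next attempt.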

The section mentions examples like the game Rock Paper Scissors, the transmission of packets on the internet, cryptography and other areas where randomness is widely used. There are two examples that I particularly liked. One is from sports. There was a big debate after the 1996 Olympics, where a runner was disqualified for anticipating the starter’s gun. The issue was finally resolved by a random number generator: when the runners are lined up and ready to go, the starter’s gun is fired at a random time drawn from an exponential distribution. Making the start time random prevents runners from anticipating the gun, while not penalizing them for extraordinarily fast reaction times.
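The property that makes the exponential distribution a natural choice here is memorylessness: having already waited s seconds does not change the chance of waiting at least t more, so no instant is a better guess than any other. A minimal Python sketch that checks this empirically, assuming a hypothetical mean delay of 1.5 seconds after “set” (the review does not give the actual parameters):

```python
import math
import random

def gun_delays(n: int, mean_delay: float = 1.5, seed: int = 1996) -> list[float]:
    """Draw n hypothetical starter-gun delays (seconds after "set") from an
    exponential distribution with the given mean."""
    rng = random.Random(seed)
    return [rng.expovariate(1.0 / mean_delay) for _ in range(n)]

delays = gun_delays(200_000)
s, t = 1.0, 0.5  # runner has already waited s seconds; chance of at least t more?

p_fresh = sum(d > t for d in delays) / len(delays)
still_waiting = [d for d in delays if d > s]
p_after_wait = sum(d > s + t for d in still_waiting) / len(still_waiting)

# Both estimate exp(-t / mean_delay): the wait so far tells the runner nothing.
print(round(p_fresh, 3), round(p_after_wait, 3), round(math.exp(-t / 1.5), 3))
```

The two empirical probabilities agree with each other and with the theoretical value up to sampling noise, which is exactly why a runner cannot time a “flying” start.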

The other example I liked is the way the author describes Markov chain Monte Carlo (MCMC):

Suppose you wanted to measure the average pollution level in a large wilderness lake system. You might proceed by setting out in a boat, and paddling this way and that way, from inlet to inlet and lake to lake, throughout the park, without any particular destination. Every five minutes you take a water sample and measure the pollutants. Each new sample would be taken just a short distance from the previous sample, thus continuing from where the previous sample left off. Still, if you average the pollution levels in many different samples over many days of paddling, eventually you get an accurate picture of the lake system.

This setup is not to be confused with sampling the pollution level at randomly chosen places across the whole park. Here, each move from one spot to the next is a random choice made from where you currently are, and nothing in the path so far should influence that decision. In more technical terms, you are specifying a random walk with the Markov property, where each step depends only on the current position and not on the steps that led there. The proposed next step has no dependence on where the walk has been before, and neither does the decision to accept or reject the proposed step.
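A minimal Python sketch of the paddling scheme, assuming a hypothetical lake (the grid cells inside a circle) and a made-up `pollution` function. It uses a Metropolis-style rule, which is one standard way to build such a walk and not necessarily what the book has in mind: propose a step to a random neighboring cell using only the current position, accept it if it lands on water, and otherwise stay put. Because the proposals are symmetric, the walk spends equal long-run time in every lake cell, so its running average converges to the lake-wide average.

```python
import random

# Hypothetical lake: grid cells inside a circle; everything else is land.
SIZE, CENTER, RADIUS_SQ = 10, 4.5, 20.0
LAKE = {(x, y) for x in range(SIZE) for y in range(SIZE)
        if (x - CENTER) ** 2 + (y - CENTER) ** 2 <= RADIUS_SQ}

def pollution(cell):
    """Hypothetical pollution level, rising towards the north-east."""
    x, y = cell
    return x + 2 * y

def paddle_average(steps: int, seed: int = 0) -> float:
    """Metropolis random walk whose stationary distribution is uniform over LAKE.

    The proposal depends only on the current cell, never on the path so far;
    it is accepted if it lands on water and rejected (stay put) on land.
    """
    rng = random.Random(seed)
    cell, total = (4, 4), 0.0
    for _ in range(steps):
        dx, dy = rng.choice([(1, 0), (-1, 0), (0, 1), (0, -1)])
        proposal = (cell[0] + dx, cell[1] + dy)
        if proposal in LAKE:
            cell = proposal
        total += pollution(cell)  # sample every step, including rejected moves
    return total / steps

exact = sum(pollution(c) for c in LAKE) / len(LAKE)
print(round(exact, 2), round(paddle_average(200_000), 2))  # the two should be close
```

By symmetry the exact average for this made-up lake is 13.5. Counting the current cell again after a rejected move matters: skipping those samples would bias the average towards cells near the shore being under-represented.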

__Evolution, Genes, and Viruses: Randomness in Biology__

This section talks about various examples of branching processes. Any undergraduate course on probability introduces branching processes, asking the student to compute the probability of the eventual extinction of a species given a certain reproductive behavior. Various examples are given to help the reader build an intuition for branching processes. One nice example is the chain-letter spam that we often get. The message always says something like “Send this message to 5/10/X friends”. This is a branching process in which the mail replicates itself according to how many people you forward the message to. The message is typically NEVER “Send this message to 1 or 2 friends”; it almost always asks you to send it to many people, typically more than 5. There is a good mathematical reason: as long as at least half of the recipients simply ignore the message, asking each person to forward it to just 1 or 2 friends makes the average number of new copies per recipient at most 1, and a branching process whose mean number of offspring is at most 1 dies out with probability 1.
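The extinction probability can be computed from the offspring generating function. A minimal Python sketch, assuming a simple hypothetical model that the review does not spell out: each recipient forwards the letter to exactly `k` friends with probability `p_forward` (0.3 below, a made-up figure) and otherwise ignores it. Iterating the generating function from 0 converges to its smallest fixed point in [0, 1], which is the extinction probability starting from one recipient.

```python
def extinction_probability(p_forward: float, k: int, iterations: int = 10_000) -> float:
    """Extinction probability of a chain letter where each recipient forwards it
    to exactly k friends with probability p_forward, and otherwise ignores it.

    Offspring generating function: G(s) = (1 - p_forward) + p_forward * s**k.
    Iterating s <- G(s) from s = 0 converges to the smallest fixed point in [0, 1].
    """
    s = 0.0
    for _ in range(iterations):
        s = (1 - p_forward) + p_forward * s ** k
    return s

for k in (1, 2, 5, 10):  # number of friends the letter asks you to forward to
    print(k, round(extinction_probability(0.3, k), 3))
```

With a 30% forwarding rate, the mean number of copies per recipient is 0.3 and 0.6 for `k = 1` and `k = 2`, so both print 1.0 (certain extinction); for `k = 5` and `k = 10` the mean exceeds 1 and the extinction probability drops to roughly 0.80 and 0.71, leaving a real chance that the chain lives on.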

__That Wily Monty Hall: Finding Probabilities from Clues__

This section talks about conditional probability, and about examples where we adjust probabilities in light of evidence either too much or too little. When we talk about the probability of an event given that we have seen some data, the sample space of the experiment (if we have that mental model) shrinks. This means we need to reassess the probabilities given the data: instead of updating our priors too much or too little, we need to adjust them by just the right amount. The section ends with a discussion of frequentists and Bayesians, the philosophical difference being that Bayesians view all probabilities as conditional probabilities, while frequentists are comfortable talking about probabilities in absolute terms. I think it is a nice way of verbalizing the need to understand conditional probability, i.e. “not updating the priors too much or too little”.
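Since the section takes its name from the Monty Hall problem, here is a minimal Python simulation of it, assuming the standard rules (the host knows where the car is, always opens an unpicked door hiding a goat, and always offers the switch); the review itself does not spell these out. Seeing which door the host opens is exactly the kind of evidence that should shift our probabilities, and by the right amount: staying wins about 1/3 of the time and switching about 2/3, not 50/50.

```python
import random

def monty_hall(trials: int, switch: bool, seed: int = 0) -> float:
    """Fraction of games won under the standard rules: the host knows where the
    car is, always opens a goat door the player did not pick, and always offers
    a switch."""
    rng = random.Random(seed)
    wins = 0
    for _ in range(trials):
        car, pick = rng.randrange(3), rng.randrange(3)
        # Host opens a door that is neither the player's pick nor the car.
        opened = rng.choice([d for d in range(3) if d != pick and d != car])
        if switch:
            pick = next(d for d in range(3) if d != pick and d != opened)
        wins += pick == car
    return wins / trials

print(monty_hall(100_000, switch=False))  # close to 1/3
print(monty_hall(100_000, switch=True))   # close to 2/3
```

The host’s choice carries information only because it is constrained: he can never open the car door, so when your first pick was wrong (probability 2/3), the remaining closed door must hide the car.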

__Spam, Spam, Probability, and Spam: Blocking Unwanted E-mail__

I learnt something unrelated to probability in this section: the origin of the word “Spam”.

Spam was originally a canned-meat product developed by the Hormel corporation in 1937. During the fresh-meat shortage of World War II, Spam was distributed widely and consumed by soldiers and civilians worldwide. The 1970s comedy group Monty Python mocked the widespread availability of Spam in their famous sketch about a restaurant that offers breakfast delicacies such as Spam sausage, Spam, Spam bacon, Spam tomato, etc. The sketch made the word “Spam” synonymous with anything that is overly abundant.

As the title suggests, this section talks about using probability theory to solve a classification problem, i.e. assigning an incoming message a score for how likely it is to be spam or ham (legitimate mail). Based on these scores, the system learns to keep your inbox spam-free. BTW, the Gmail approach, where you are also given the power to mark messages as spam, falls under the category of filters called “bogofilters”.
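The review does not say which model the book uses, so as an illustration only, here is a minimal multinomial naive Bayes scorer in Python, one common textbook way to turn word frequencies into a spam probability. The training messages are made up, and this is not Gmail’s or Bogofilter’s actual algorithm.

```python
import math
from collections import Counter

# Hypothetical training data: (message, is_spam).
TRAINING = [
    ("win cash now", True),
    ("cheap pills win big", True),
    ("claim your free cash prize", True),
    ("meeting notes for monday", False),
    ("lunch on monday", False),
    ("draft of the project notes", False),
]

def train(messages):
    """Count words per class and messages per class."""
    word_counts = {True: Counter(), False: Counter()}
    msg_counts = Counter()
    for text, is_spam in messages:
        word_counts[is_spam].update(text.lower().split())
        msg_counts[is_spam] += 1
    vocab = set(word_counts[True]) | set(word_counts[False])
    return word_counts, msg_counts, vocab

def spam_probability(text, model):
    """Multinomial naive Bayes posterior P(spam | words) with add-one smoothing.

    Assumes words are independent given the class; words never seen in
    training are skipped.
    """
    word_counts, msg_counts, vocab = model
    totals = {c: sum(word_counts[c].values()) for c in (True, False)}
    log_odds = math.log(msg_counts[True] / msg_counts[False])  # prior odds
    for word in text.lower().split():
        if word not in vocab:
            continue
        p_spam = (word_counts[True][word] + 1) / (totals[True] + len(vocab))
        p_ham = (word_counts[False][word] + 1) / (totals[False] + len(vocab))
        log_odds += math.log(p_spam / p_ham)  # each word updates the odds
    return 1 / (1 + math.exp(-log_odds))

model = train(TRAINING)
print(round(spam_probability("free cash now", model), 3))         # 0.923
print(round(spam_probability("monday meeting notes", model), 3))  # 0.053
```

Working in log-odds keeps the arithmetic stable when a message has many words; marking a message as spam simply adds it to the training data, which is how such a filter “learns”.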

__Ignorance, Chaos, and Quantum Mechanics: Causes of Randomness__

The last essay uses quantum mechanics to make the point that nature is random at its core. While the scientific community had to go through years of debate and research to establish this, we should be glad to be living in today’s world, where the notions of probability and randomness are well developed. More importantly, a probabilistic mindset is becoming a necessary skill for separating the signal from the overwhelming noise that we come across daily.