This book has been one of the MOST challenging books to work through. I had tried going through it many times in the past but could not get past the first 10 pages. The very first concept mentioned in the book is the extension of a measure from a semi-algebra to a sigma algebra. The proof was just beyond me, for the simple reason that my fundamentals were shaky. Not wanting to give up, I had to look for an alternative path to this book. My alternate path was to work through some of the basic fundamentals of real analysis, read around the subject, read about the historical developments behind the Lebesgue measure and integral, understand Lebesgue integration from a non-measure-theoretic perspective, etc. I realize that I have read about a dozen books in order to work through this book. Here is the list of 12 books that have helped me:

| # | Title | Summary |
|---|-------|---------|
| 1 | Understanding Analysis | Summary |
| 2 | Metric Spaces | Summary |
| 3 | The Calculus Gallery | Summary |
| 4 | Lebesgue Measure and Integration | Summary |
| 5 | Probability Theory and its applications – I | Summary |
| 6 | Lebesgue Stieltjes Integration | Summary |
| 7 | A Radical Approach to Real Analysis | Summary |
| 8 | A Radical approach to Lebesgue’s theory of integration | Summary |
| 9 | Probability through Problems | Summary |
| 10 | Probability Essentials | Summary |
| 11 | Measure Integral and Probability | Summary |
| 12 | Lebesgue Stieltjes Integration | Will write someday |

I think I was extremely dumb not to follow the book at the first or second go. There were umpteen mathematical concepts and ideas that I was unaware of. However, spending time on these dozen books gave me the confidence to go over Rosenthal’s book. I am kind of happy with myself that I have managed to get past the daunting chapters of this book and finally understand the underpinnings of axiomatic probability. Probability and Statistics go hand in hand. The better one understands axiomatic probability, the better placed one is to understand advanced statistics. I hope understanding measure theory helps me in some way in my future work.

It is kind of difficult to review or summarize this book without using mathematical symbols. I could give an overview of each chapter. However, I will do something different. I will try to list the questions (top of the mind) that this book answers. If some of the questions make you curious, then this book might be worth your time. These questions are in no particular order.

1. What do you intuitively mean by a semi-algebra of a collection of subsets of X ? Define in mathematical terms.
2. What do you intuitively mean by an algebra of a collection of subsets of X ? Define in mathematical terms.
3. What do you intuitively mean by sigma-algebra of a collection of subsets of X ? Define in mathematical terms.
4. What do you understand by a Monotone class ?
5. What is Lebesgue outer measure ?
6. What is measurable space ?
7. What is a measure space ?
8. Why can’t you define a probability measure on a semi-algebra of closed intervals in [0,1]? In other words, why is a measure defined on a semi-algebra of intervals not a valid probability triple ?
9. Why can’t you define a probability measure on an algebra of closed intervals in [0,1]? In other words, why is a measure defined on an algebra of intervals not a valid probability triple ?
10. What’s the difference between Lebesgue sigma algebra and Borel sigma algebra ?
11. Define simple measurable functions and state at least half a dozen of their properties?
12. Define non measurable functions and state at least half a dozen of their properties?
13. What’s the difference between measurable function on a measurable space and a measurable function on a measure space ?
14. What do you mean by a measurable function being integrable with respect to a specific measure ?
15. Give an example of a function that is Lebesgue integrable but not Riemann integrable
16. How do you characterize a non-negative measurable function in terms of a sequence of non-negative simple measurable functions ?
17. How do you intrinsically characterize a non-negative measurable function?
18. Give an example where Dominated Convergence theorem can be used.
19. Give an example where Monotone Convergence theorem can be used.
20. Give an example where Bounded Convergence theorem can be used.
21. What is a metric space ? Is the space of Lebesgue integrable functions a metric space ?
22. What are measurable sets ? How do you identify a measurable set ? What are the properties of measurable sets ?
23. Is Cantor set measurable ?
24. Give an example of a non-measurable function.
25. What is a set function ? When can a set function be called a measure ?
26. Is conditional probability a random variable ? If so, What guarantees its existence ?
27. What are finite measures ?
28. What are singular measures ? Give an example
29. If a measure is not discrete, does it necessarily have to be absolutely continuous ?
30. What’s the relevance of Fubini’s theorem in probability ?
31. What are the modes of convergence ?
32. What’s the relationship between pointwise, almost sure, uniform and almost uniform convergence ?
33. Can there be a countably additive set function on intervals other than the length function?
34. Why should the distribution function of a random variable be right continuous ? What’s the connection between right continuity and the countable additivity property ?
35. Finite additivity plus countable subadditivity of a measure is equivalent to countable additivity. Prove it
36. Starting from a semi-algebra of collection of intervals in [0,1], how do you construct a measure on a sigma algebra ?
37. What’s the key result that comes from applying Borel-Cantelli Lemma ?
38. Why does an integral crop up when talking about expected value of a random variable ?
39. State Markov’s, Chebyshev’s, Jensen’s and the Cauchy-Schwarz inequalities. How are they useful in proving the laws of large numbers ?
40. What’s the difference between strong law of large numbers and weak law of large numbers ?
41. How can one restate weak law of large numbers by removing the strict condition on the boundedness of second moment ?
42. How can one restate strong law of large numbers by removing the strict condition of finite fourth moment ?
43. What are Lebesgue measurable functions and Borel measurable functions ?
44. How do you prove the existence of Markov chain ?
45. Define transient state, recurrent state, null recurrent state, positive recurrent state, irreducible chain, aperiodic chain, ergodic chain
46. For a discrete Markov chain, will the time average always be equal to the ensemble average ? If not, under what conditions do they converge ?
47. Given a finite discrete Markov chain, how do you compute the stationary distribution ?
48. What’s the connection between renewal theory and stationary distribution of a finite discrete ergodic chain ?
49. Does convergence in distribution imply convergence in probability ? Does convergence in probability imply convergence in distribution ?
50. In a generic setting, Conditional expectation of a random variable has to be guessed. There is no clear cut formula for computing conditional expectation. So, how does one verify whether the guess is an appropriate one ?
51. Can you relate the formula of total variance to ANOVA in statistics ?
52. State a few important properties of Conditional Expectation
53. How do you handle probability on events with 0 measure ?
54. State Lebesgue decomposition theorem ?
55. What are Hilbert Spaces ? What’s the connection between Hilbert spaces and the class of Lebesgue integrable functions ?
56. What do you understand by product sigma algebra ? How is it relevant to multivariate distributions ?
57. What are Lp spaces? How does one define norm on such spaces ? How do these spaces connect with the concept of “moments” for random variables ?
58. What’s the connection between orthogonal property of Hilbert spaces and the correlation between two random variables ?
59. Intuitively explain the difference between almost sure convergence and convergence in probability ?
60. When you extend a measure from a semi-algebra to sigma-algebra, how do you check whether the extended measure is unique ?
61. Is a symmetric simple random walk a null recurrent Markov chain or a positive recurrent Markov chain ? Prove it
62. Define a null set
63. If a family of random variables are uniformly integrable, what does it mean ?
64. Conditional probability is not a number. It is a random variable. Explain the intuition behind the reason for thinking in terms of random variable
65. Define Martingale, sub Martingale and super Martingale ?
66. What do you mean by an event happening infinitely often ? What do you mean by an event happening almost always ? Can you make it precise using set notation ?
67. Expectation operator is order preserving . Prove it
68. Why should one not be satisfied with Riemann Integral ? What’s the intuition behind Lebesgue integral ?
69. When you condition a variable on a sub-sigma algebra , what does it mean ? How does it relate to conditioning a variable on another variable ?
70. What is law of total expectation ?
71. What’s the connection between moment generating function, characteristic function, probability generating function, Laplace transformation ?
72. Why not define a measure on the outcome space rather than the event space ? (I guess it’s a dumb question to ask, but I never thought about this aspect at all for many years!)
73. Why can’t a countably additive measure be defined for all subsets of [0,1] ?
74. What is the condition on the function to be Riemann integrable ?
75. What is the condition on the function to be Lebesgue integrable ?
76. State the fundamental theorem of calculus using Lebesgue integral and Lebesgue measure
77. Convergence in pth norm need not imply convergence almost everywhere. Is it true ? If so , give an example
78. What do you mean by “Measure A dominates Measure B” ?
79. What’s the connection between conditional expectation and projection ?
80. Can a set function be countably additive and not finitely additive  ?
81. What are some of the areas where Martingales are being used ?
82. Convergence in measure is often a weaker result than convergence in probability ? Explain
83. Weak convergence of distributions is equivalent to pointwise convergence of characteristic functions. Prove it
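
To give a flavour of the question about computing the stationary distribution of a finite discrete Markov chain, here is a minimal sketch in Python. It is not taken from the book. The function name `stationary_distribution`, the tolerance and the 2-state transition matrix `P` are all my own illustrative assumptions. The sketch repeatedly applies the transition matrix to a starting distribution, which should settle down only when the chain is ergodic (irreducible and aperiodic).

```python
def stationary_distribution(P, tol=1e-12, max_iter=100_000):
    """Approximate pi satisfying pi = pi P by power iteration.

    P is a row-stochastic matrix given as a list of lists.
    Assumes (hypothetically) that the chain is ergodic, so the iteration converges.
    """
    n = len(P)
    pi = [1.0 / n] * n  # start from the uniform distribution
    for _ in range(max_iter):
        # one step of the chain: new_pi[j] = sum_i pi[i] * P[i][j]
        nxt = [sum(pi[i] * P[i][j] for i in range(n)) for j in range(n)]
        if max(abs(a - b) for a, b in zip(nxt, pi)) < tol:
            return nxt
        pi = nxt
    raise RuntimeError("did not converge; the chain may not be ergodic")


# Illustrative 2-state chain (made-up numbers, not from the book)
P = [[0.9, 0.1],
     [0.5, 0.5]]
pi = stationary_distribution(P)
print(pi)
```

For this particular matrix the balance equation 0.1 * pi[0] = 0.5 * pi[1] gives pi = (5/6, 1/6) by hand, which is what the iteration should approach. For larger chains one would more likely solve the linear system pi = pi P together with the condition that the entries sum to 1.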

I think this is one of the best books on axiomatic probability. Stochastic processes described in the book are in the discrete setting only. However, seeing something clearly in a discrete world, and understanding it, is a stepping stone towards working in the continuous-parameter, continuous-state-space setting.