This book is more a historical account of two scientists (Alexander von Humboldt and Carl Friedrich Gauss) than a work of fiction. Daniel Kehlmann weaves interesting fiction around these two brilliant scientists, exploring their lives, their viewpoints and their personalities, which seem to be complete opposites. Alexander von Humboldt believed that knowledge comes from exploring the world, while Gauss developed pretty much everything sitting in an observatory.
I will try to summarize the contrasting lives that this book brings out.
Humboldt came from a wealthy family and used part of that wealth to explore the world. In modern-day parlance, he never needed any VC funding; he funded his explorations all by himself. Gauss, on the other hand, always flirted with poverty while living in the world of “Numbers”. He takes up a surveying job to earn his bread and butter. He remarries so that there is someone to take care of his children, as he cannot afford a nurse. The two scientists thus had contrasting financial backgrounds.
Humboldt’s vision was that of a unified science of earth, vegetation, animals and human beings, a science that integrated everything. He taught himself every branch of science, be it measurement types, instruments, instrument design or design of experiments. This is really unbelievable if you look at the range of contributions that Humboldt made. Humboldt believed that “Whenever things were frightening, it was a good idea to measure them”. I am reminded of Robert Almgren from NYU, who said a similar thing about intra-day volatility: if intra-day volatility is frightening, all the more reason to measure it.
Humboldt also knew that there is no easy way to knowledge. An incident described in the book makes this point strikingly. Humboldt orders his servant to place two hot cupping glasses on his back so that they produce blisters. He wanted to test a hypothesis about the flow of current. If a person can subject himself to such torture to understand something, his motivation must have been of a completely different order. Somewhere in the book he says, “A great deal of knowledge escapes man because he is afraid of the pain. The man who deliberately undergoes pain learns things that he didn’t. One wanted to know because one wanted to know.”
Gauss, on the other hand, had markedly different beliefs. For him learning was effortless. He could see things instantly and formed conjectures and theories very easily. A brief balloon ride made him realize that space was curved and that parallel lines could meet, depending on the assumptions one made. For Gauss, everything was obvious. He could add up the numbers from 1 to 100 in no time. He could count primes like one counts numbers. He disparaged people who could not match his intellect, and that included his son. He looked down upon people who could not think as fast as he could. He sometimes felt that “People wanted Peace. Most People wanted to eat and sleep and have other people be nice to them. What they didn’t want to do was think”. Needless to say, with this kind of personality he was a loner, and most of his work was the result of singular brilliance. In contrast, Humboldt was a classic collaborator. His contributions grew through the associations he formed with various scientists, philosophers, mathematicians and others.
Humboldt spent his time practicing before he could make his contribution. For about one year, he took daily measurements of the air pressure, mapped the magnetic field, and tested the air, the water, the earth and the color of the sky. He practiced dismantling and reassembling every instrument until he could do it blind, standing on one leg, in rain, surrounded by a herd of fly-tormented cows. He believed that “A hill whose height remained unknown was an insult to his intelligence” and hence spent an enormous amount of time measuring things. The measurements and the travel helped him become a universalist. Direct experience of the world through the senses, but measured experience, made Humboldt a true quantitative researcher.
As for Gauss, he spent a lot of time silently watching the planets and making measurements of various heavenly bodies. His deliberate practice involved sitting for hours, observing and noting measurements diligently. So this is one thing the two geniuses had in common. Malcolm Gladwell’s theory of “10k hours to become a genius” seems to hold water in many cases.
Humboldt begins by working at a mining academy. Unlike in today’s mining world, a lot of new theories were put forth by mining experts then. He would have learnt a lot of math and earth science there. Biology and geology were just being created around the time of Humboldt. This initial exposure to the mining world made him see the “Whole world as a lab”. Gauss, on the other hand, theorized effortlessly.
Family & Social Life
Gauss contemplated suicide in case his love was rejected. To his great surprise his love was accepted and he abandoned the suicide attempt. Humboldt never married and remained a solitary individual throughout his life. Gauss had a mistress whom he loved more than his second wife. After the death of his first wife, he even considers marrying his mistress but finally doesn’t, after being dissuaded by one of his friends. He remarries so that there is someone to take care of his children, as employing a nurse was too expensive for him. He hated his children, as none of them could match up to his expectations. Gauss spent an enormous amount of time looking through his telescope in the observatory to take measurements of the planets and other heavenly bodies. He practically never had to move to work. His work was entirely sedentary. This is in marked contrast with Humboldt, who formed instant connections with strangers, some of whom became part of his major explorations. Humboldt loved travel and hated home. He disliked his mother and never cried when she died. Gauss, on the other hand, loved his mother more than anybody else in the world. For him his mother came first and only then came others.
Work & Key Achievements
Gauss shot to prominence after he accurately predicted the precise date and time of the appearance of a planet. Subsequently he started studying and working in the field of astronomy.
Humboldt believed that data had to be presented well in order to yield insights. He was pivotal in making science more visual. During his explorations, he was actively charting things and creating new kinds of maps (isoline maps). He was the first person to analyze the world geographically and not historically (the “new world vs old world” kind of view). He was the first person to discover the natural canal that connects the Orinoco and the Amazon.
His work inspired Darwin who went on to create an outstanding piece of scientific literature – “Origin of Species”. Darwin described him as ‘the greatest scientific traveler who ever lived’. Goethe declared that one learned more from an hour in Humboldt’s company than eight days of studying books and even Napoleon was reputed to be envious of his celebrity.
Humboldt spent 5 years exploring the world and wrote profusely for the next 30 years. One wonders whether this prolific output is one of the main reasons he became a household name by the turn of the century. Most of his written text romanticizes his expedition and findings. The 30-odd volumes describing his measurements and adventures show that he was totally passionate about his measurements and his explorations.
The book is a clever play on contrasting narratives. Humboldt firmly believed that to become a scientist, you have to travel. Gauss’s philosophy, on the other hand, can be understood from the following passage in the book:
Gauss observed the movement of the magnetic needle by the light of an oil lamp for hour after hour. No sound penetrated to him. Just as the balloon flight showed him what space was, at some point he would understand the restlessness in the heart of NATURE. One didn’t need to clamber up mountains or torment oneself in the jungle. Whosoever observed the needle was looking into the interior of the world.
Despite these contrasting lifestyles and beliefs, both scientists made groundbreaking discoveries. This book delves into their personalities and provides a fantastic narrative, which makes it a worthwhile read.
To visualize data efficiently, one must make a transition from a state where you use point-and-click interfaces (a user’s view) to a state where you can code your own graphic (a developer’s view). At the outset, the “user’s view” is very appealing, as you can use a cutesy GUI to draw some graphs. However, if you have to discover something in the data, this “user’s view” is of little use and one needs to acquire developer skills to build graphics. The book is about the “grid” package, written by Prof. Paul Murrell. One might think that base graphics is good enough, so why go in for grid? The power of grid lies in the fact that it gives complete control over the various graphic elements, their positions and their characteristics. Let’s say you have a scatterplot produced by base R. Using the grid package, you can rebuild the entire graphic piece by piece. This in itself is not the purpose of the grid package, but it shows that any graphic that you have seen in base graphics or lattice can be created, updated and modified to your heart’s content. This book gives a very detailed description of how to use the grid package to create and explore various graphics.
Grid does not provide high-level functions to create plots, meaning there is no specific function to create a histogram, a boxplot and so on. The functions in the grid package instead provide the basis for creating high-level graphics and the ability to add sophisticated annotations to an existing graphic. So, the three big things that you can accomplish using grid are
- Create and control different graphical regions and coordinate systems.
- Control which graphical region and coordinate system graphical output goes into.
- Produce graphical output (lines, points, text, …) including controlling its appearance (colour, line type, line width, …).
If the reader is already aware of base / traditional graphics, he/she can jump to Chapter 5 to understand the grid package in depth. However the first chapters are very well written and deserve to be read.
Chapter 1: An Introduction to R graphics
R Graphics follows a “painter’s model”, where a graphic is built sequentially, one component at a time. Most software applications that churn out graphics offer a few commands to produce them. R gives a unique capability to add components bit by bit, thus creating extremely rich graphics. Before I started using R, the only ways I drew graphics were Excel and a few functions in SAS and Minitab. I learnt Ruby on Rails mainly to build GUIs quickly. I had painstakingly built a Ruby on Rails interface for a stat arb project during my research work, while one of the PhDs in the group built an awesome interface for the same project using MATLAB. MATLAB’s interface building is extremely quick, with the immense support of its graphics libraries. I veered towards R and have been amazed at its graphic capabilities. One can safely say in the current context that R beats MATLAB in terms of the sheer diversity of graphs that it can churn out. The other day I was listening to Hadley Wickham’s talk at Google, where he talked about the interactive nature of R, which is yet to be explored by the R community. The open source nature makes it all the more powerful. Thanks to R graphics, the range of graphics that I am able to explore is definitely wide. However there is still a ton of stuff that I need to explore. GGobi is high on my priority list and someday I will have to take time to understand it.
I had never wanted to write summaries about R books but I changed my mind after reading some good summaries of a few R books on a blog. That’s when I decided to write summaries of R books that I have read / will be reading, the rationale being that the summaries might serve as a catalogue of various packages and programming hacks that are available in various books.
The first chapter of the book starts off by giving a flavour of the graphs that are possible with the R base package and additional packages such as maps, mapproj, CircStats, vcd, grid and party. Subsequently the book goes on to give a skeletal view of the R graphics system.
The graphics engine consists of functions in the grDevices package and provides fundamental support for handling such things as colors and fonts, and graphics devices for producing output in different graphics formats. The graphics package mentioned under “Graphic Systems” refers to the base graphics package available in R. This is the default package that gets installed with a fresh R installation. There are additional packages that are add-ons for the graphics system or the grid system. Lattice and ggplot2 build on the grid system and they have a ton of functionality that gives tremendous flexibility for drawing graphics. Lattice also comes as a default package with the R installer. In fact the grid package started off with the intent of providing extensions to the lattice package (mentioned in one of Paul Murrell’s talks).
These packages do take some time to understand, get used to and apply in a project. Mastering the base graphics package takes time. I had never used a painter’s model for building a graphic, so the learning curve was very steep for me. As of R 2.13, there are about 87 objects in the graphics package.
Combined with these 87 objects are the various function arguments that go with them. So it is quite a feat to remember most of the functions and arguments from the package. Thanks to the excellent documentation for the various functions, the coder can refer to the help as and when the situation demands. As for the other packages, there are about 145 objects in lattice, 171 objects in grid and 554 objects in ggplot2. So, in total there are about 957 graphics-related objects in the 4 packages, i.e. graphics, lattice, grid and ggplot2. Actually it is the functions that matter, not the objects, but the number of objects gives an idea of the possibilities that one can explore. Working with the 87 objects in base graphics was a steep learning curve for a newbie like me. Even though producing a simple scatterplot is shown as a starting example of using the R graphics package, one has to work through and remember arguments and functions, and basically slog it out to draw a good visual. In the process one needs to remember the values that the various arguments of a function can take. I have referred to Chapter 2 and Chapter 3 umpteen times in my work.
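The object counts quoted above can be regenerated for whatever versions happen to be installed. The following is a minimal sketch, assuming that counting exported names with getNamespaceExports() is a fair stand-in for how those figures were obtained; the helper count_exports is a hypothetical name, and the numbers will almost certainly differ from the R 2.13 figures.

```r
# Count the exported objects of a package without attaching it.
# Returns NA when the package is not installed.
count_exports <- function(pkg) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    return(NA_integer_)
  }
  length(getNamespaceExports(pkg))
}

pkgs <- c("graphics", "lattice", "grid", "ggplot2")
counts <- vapply(pkgs, count_exports, integer(1))
print(counts)

# Total over the packages that are actually installed.
print(sum(counts, na.rm = TRUE))
```

Running ls("package:graphics") after the package is attached gives a similar listing by name, which is handy for browsing what is available.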
Which graphics package should be used? It obviously depends on the requirement. One needs to look at the broad functions of a package before deciding to learn and use it. Functions in the graphics systems and graphics packages can be broken down into three main types: high-level functions that produce complete plots; low-level functions that add further output to an existing plot; and functions for working interactively with graphical output. So, if one’s requirement is a simple visual, base graphics does a superb job. However, if you want to update the graphic, embellish it and work on it bit by bit, lattice and ggplot2 are the packages to use.
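The split between high-level and low-level functions is easiest to see in a small base graphics example. This is a sketch on simulated data (the variables x, y and fit are made up for illustration): one high-level call creates the plot and a few low-level calls paint on top of it, in the painter's-model style.

```r
set.seed(1)
x <- rnorm(50)
y <- 2 * x + rnorm(50)

# High-level call: starts a new plot with axes, labels and points.
plot(x, y, main = "High-level plot plus low-level additions",
     xlab = "x", ylab = "y", col = "grey40")

# Low-level calls: each one paints on top of the existing plot.
fit <- lm(y ~ x)
abline(fit, col = "red", lwd = 2)
points(mean(x), mean(y), pch = 19, col = "blue")
legend("topleft", legend = c("least-squares fit", "mean point"),
       col = c("red", "blue"), lty = c(1, NA), pch = c(NA, 19))
```

Calling a low-level function such as abline() before any high-level call fails, because there is no plot yet to add to.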
One of the powerful features of R graphics is the range of output formats that one can push a graphic to with the help of a few commands. The devices that an R graphic can be sent to include a Microsoft Windows window, Mac OS X Quartz window, Adobe PostScript file, Adobe PDF file, LaTeX PicTeX file, XFIG file, GhostScript conversion to file, PNG bitmap file, JPEG bitmap file, Windows Metafile file, Windows BMP file, GTK window (gtkDevice), Java Swing window (RJavaDevice) and SVG file (RSvgDevice). The other fascinating aspect of R is that the output from a Sweave file is a LaTeX file, which can easily be turned into a publication-ready document. Recently I stumbled onto beamer and I am fascinated by the range of output documents that it can produce. The beamer package has been used by statisticians for quite some time. I am kind of ashamed of the fact that I had never used beamer before. I need to work on it sometime soon. An additional and useful point made towards the end of the chapter is about the “onefile” argument, which controls whether output goes to a single file or to multiple files.
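Here is a small sketch of the onefile argument using the pdf() device. The file names and the temporary directory are arbitrary choices made for this example; everything is written under tempfile() so nothing in the working directory is touched.

```r
# Work in a fresh temporary directory so nothing is overwritten.
out_dir <- tempfile("rgraphics_")
dir.create(out_dir)

# onefile = TRUE: both plots become pages of a single PDF.
single <- file.path(out_dir, "all_pages.pdf")
pdf(single, onefile = TRUE, width = 6, height = 4)
plot(1:10, main = "Page 1")
hist(c(1, 2, 2, 3, 3, 3), main = "Page 2")
invisible(dev.off())

# onefile = FALSE: the %03d in the name is replaced by the page number,
# so each plot goes to its own file.
pdf(file.path(out_dir, "page%03d.pdf"), onefile = FALSE, width = 6, height = 4)
plot(1:10, main = "File 1")
hist(c(1, 2, 2, 3, 3, 3), main = "File 2")
invisible(dev.off())

print(list.files(out_dir))
```

Swapping pdf() for png() or another device function changes the output format without touching the plotting code.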
Chapter 2: Simple Usage of Traditional Graphics
The traditional graphics system provides a standard set of basic plot types: plot() produces scatterplots, barplot() produces bar plots, hist() produces histograms, boxplot() produces boxplots and pie() produces pie charts. matplot() is not a plot() method but works like plot() with x and y as matrices; stripchart() produces univariate scatterplots, curve() draws a mathematical function and stem() prints a stem-and-leaf plot. The basic arguments that are used very often are col, lty, font, xlab, ylab, main, xlim and ylim. For a graph involving more than 2 variables, one can use contour(), filled.contour(), symbols(), image(), pairs(), stars() and mosaicplot(). Beyond base graphics, there are packages like scatterplot3d, rgl and rggobi that can be explored for such data.
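A quick tour of four of these plot types with the commonly used arguments, as a sketch on simulated data (values and groups are made-up variables):

```r
set.seed(42)
values <- rnorm(100)
groups <- factor(sample(c("a", "b", "c"), 100, replace = TRUE))

# 2 x 2 grid of plots, filled row by row; keep the old settings to restore.
old_par <- par(mfrow = c(2, 2), mar = c(4, 4, 2, 1))

plot(values, col = "steelblue", main = "plot()",
     xlab = "Index", ylab = "Value")
h <- hist(values, col = "grey80", main = "hist()", xlab = "Value")
boxplot(values ~ groups, main = "boxplot()",
        xlab = "Group", ylab = "Value")
barplot(table(groups), main = "barplot()",
        xlab = "Group", ylab = "Count")

par(old_par)
```

hist() also returns its bin counts invisibly, which is why the result is kept in h here.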
Chapter 3: Customizing Traditional Graphics
I have referred to this chapter countless times for drawing visuals with base graphics. If one restricts oneself to base graphics, then this chapter gives all the nuts and bolts of using the functions and arguments in the base package. The chapter is divided into 5 sections.
The first section covers the traditional graphics model. In this context, the drawing region is divided into three regions: outer margins, figure region and plot region. The figure region is used to draw the axes and labels, and the plot region is used to draw data points, symbols, lines etc. When you use par(mfrow=..) or par(mfcol=..), the inner region (the region devoid of outer margins) is split into the specified rows and columns. par() has a long master list of settings (there are 113 of them by my count).
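The list of par() settings can be printed directly from a running session. A minimal sketch follows; the number of settings depends on the R version, so it may not match the count quoted above.

```r
# All graphical parameters of the current device, including read-only ones.
settings <- par(no.readonly = FALSE)
print(length(settings))
print(names(settings))

# Changing settings returns the previous values, which makes restoring easy.
old <- par(mar = c(4, 4, 2, 1), cex = 0.9)
par(old)
```

The help page ?par documents what each name controls.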
Each one of these settings corresponds to some aspect of the graph! The chapter then goes on to explain how to use colors, color sets, lines, text, fonts and data symbols, and how they can be set for various visuals. The chapter then moves on to describing layouts, one of the most important ways of drawing and organizing content on an R graph. Even though par(mfrow) and par(mfcol) can help you organize things on an R plot, layout() is extremely powerful as it expands the options that you have for arranging them.
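To see what layout() adds over par(mfrow), here is a sketch of an arrangement that mfrow cannot express: one wide figure on top and two smaller ones below. The data are simulated and the particular matrix is just an example.

```r
set.seed(7)
x <- rnorm(200)

old_par <- par(mar = c(4, 4, 2, 1))

# Figure 1 spans the whole top row; figures 2 and 3 share the bottom row.
cells <- matrix(c(1, 1,
                  2, 3), nrow = 2, byrow = TRUE)
n_figures <- layout(cells, heights = c(2, 1))   # top row twice as tall

plot(x, type = "l", main = "Figure 1")
hist(x, main = "Figure 2")
boxplot(x, horizontal = TRUE, main = "Figure 3")

layout(1)      # back to a single figure per page
par(old_par)
```

layout.show(n_figures), called right after layout(), draws the outline of each region and is a handy way to check the arrangement before plotting.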
Chapter 4: Trellis Graphics: the Lattice Package
The output from a lattice function is an object of class “trellis”. Since this is a graphical object, aspects of it such as the title and fonts can be updated directly. The author motivates the reader to use lattice by giving the following highlights of lattice
- The default appearance of lattice is much better than base graphics
- The lattice plot functions can be extended in several very powerful ways
- Powerful grid features are available for annotating, editing, and saving the graphics output
I think the biggest advantage of using lattice is the conditional plots. One of the things that one needs to notice while learning a new package is the terms it introduces. In the context of lattice, one of the most important terms is the shingle. A “shingle” is a data structure used in Trellis, and is a generalization of factors to ‘continuous’ variables. It consists of a numeric vector along with some possibly overlapping intervals. The intervals are the ‘levels’ of the shingle. One can explicitly generate the levels or use equal.count to cut a continuous variable into various levels. The chapter then introduces trellis.par.set, which is used to configure settings such as font, col, lty, lwd, cex and pch. Another important aspect of lattice plots is arranging the plots on a panel. One can use the layout argument, the aspect argument and the index.cond argument to customize the layout of the graphic. One needs to play with the arguments to see the various ways to arrange the plots on the panel. The chapter ends with some basic explanation of panel functions and strip functions.
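A shingle and a conditional plot can be sketched in a few lines. This example uses the quakes data set that ships with R and follows the usual lattice idiom; the number of intervals and the overlap are arbitrary choices.

```r
library(lattice)

# equal.count() turns a continuous variable into a shingle whose
# intervals hold roughly equal numbers of points and may overlap.
depth_shingle <- equal.count(quakes$depth, number = 4, overlap = 0.1)
print(levels(depth_shingle))

# One panel per interval of the shingle: a conditioning plot.
p <- xyplot(lat ~ long | depth_shingle, data = quakes,
            layout = c(2, 2), aspect = 1, pch = ".")
print(p)   # lattice objects are drawn when printed

# The trellis object can be modified after the fact and drawn again.
p <- update(p, main = "Quake locations by depth")
print(p)
```

Because the plot is an object, update() can change the title, layout or other arguments without rebuilding the call from scratch.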
Chapter 5: Grid Graphics Model
This chapter introduces the grid graphics model in a gentle way. Instead of overwhelming the reader with too many details, the author hand-holds the first-time user by explaining concepts gently. The grid graphics model has basically two types of functions. The first type draws basic output (lines, rectangles, text) and the second type specifies the location, colors and fonts. There is no predefined region for graphical output in grid. However there is a facility to define regions (using viewports), which in my opinion is one of the most powerful features of grid graphics.
I had learnt ggplot2 before moving on to this book and hence I can relate to the author’s terminology of “Painters model”, meaning the graph needs to be built up one layer over the other. This type of graph building is very powerful as you can create a very complex graph using simple functions and then building one layer at a time.
The chapter starts off with giving a simple example of producing a scatter plot using grid package and provides some motivation to go over grid package. To produce simple scatterplot, a gamut of functions are used such as:
pushViewport(), plotViewport(), dataViewport(), grid.points(), grid.rect(), grid.xaxis(), grid.yaxis(), grid.text(), grid.edit(), upViewport() and downViewport()
For a first-timer, learning all these functions might seem like a stretch merely to produce a scatterplot. Hence the author mentions two advantages of learning the grid package: first, complex plots can be produced by adding simple components one over the other; second, it gives you the ability to edit the output of other graphics packages like lattice and ggplot2. The two most popular graphics packages in the R community are ggplot2 and lattice, both of which are built on the grid graphics model. Hence an exposure to grid graphics will help you understand, edit and create functions from these packages. My personal experience is that there is a learning curve to ggplot2, and one can ride this learning curve quickly with a good exposure to the grid graphics model.
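Here is how those functions fit together in a grid scatterplot. This is a sketch in the spirit of the book's example rather than a verbatim copy; it uses the pressure data set that ships with R, and the viewport and grob names ("plotRegion", "dataSymbols") are my own choices.

```r
library(grid)
grid.newpage()

# A region with margins (in lines of text), then data scales inside it.
pushViewport(plotViewport(c(5, 4, 2, 2)))
pushViewport(dataViewport(pressure$temperature, pressure$pressure,
                          name = "plotRegion"))

# Draw the data, a border and the axes in the current viewport.
grid.points(pressure$temperature, pressure$pressure, name = "dataSymbols")
grid.rect()
grid.xaxis()
grid.yaxis()
grid.text("temperature", y = unit(-3, "lines"))
grid.text("pressure", x = unit(-3, "lines"), rot = 90)

# Edit what is already drawn: change the plotting symbol.
grid.edit("dataSymbols", pch = 2)

# Step out of both viewports and frame the whole page.
upViewport(2)
grid.rect(gp = gpar(lty = "dashed"))

# Return to the data region by name and annotate in data coordinates.
downViewport("plotRegion")
grid.text("Pressure (mm Hg)\nversus\nTemperature (Celsius)",
          x = unit(150, "native"), y = unit(600, "native"))
```

The point is less the scatterplot itself than the fact that every piece has a name and a region, so it can be revisited later.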
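Most of the base graphics options are set using par(). Similarly, grid uses the gpar() function to set options like fill, line type, etc. All primitives accept a gpar object, a name and a viewport as input (the gp, name and vp arguments). The options that can be used with gpar include col, fill, alpha, lty, lwd, cex, fontsize, fontfamily and fontface.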
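A short sketch of gpar() in use. get.gpar() reports the settings currently in effect, which is also a quick way to see which parameter names grid tracks; the colours and names below are arbitrary.

```r
library(grid)
grid.newpage()

# Names of the graphical parameters grid currently tracks.
gpar_names <- names(get.gpar())
print(gpar_names)

# A gpar object bundles settings; primitives take it through 'gp'.
style <- gpar(col = "navy", fill = "lightblue", lty = "dashed", lwd = 2)
grid.rect(width = 0.6, height = 0.4, gp = style, name = "box")
grid.text("styled with gpar()",
          gp = gpar(fontsize = 14, fontface = "bold"))
```

The same gpar object can be reused across several primitives to keep a consistent look.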
There are two types of graphical context here. The first is the explicit graphical context, where one sets the above parameters to specific values and the grid object is drawn using those explicit graphical parameters. The second is the implicit graphical context, which is the default context and is in effect whenever a new graph is created.
One key feature of cex and alpha is that their effects are cumulative. If you push a viewport with cex = 0.5 and then push another viewport with cex = 0.5, the cumulative effect is a cex of 0.25. Similarly, if you set alpha and then push a layer with a specific color, a cumulative effect is shown on the output. All the graphical primitive functions take vectorized input, and hence a whole lot of complex graphics can be drawn using grid.rect, grid.polygon etc.
The viewport is then introduced in the chapter as the thing that gives you a drawing context, comprising a geometric context plus a graphical context. A geometric context consists of a set of coordinate systems for locating and sizing output. A graphical context consists of explicit graphical parameter settings for controlling the appearance of the output.
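The cumulative behaviour can be checked by asking grid for the current settings after each push. A minimal sketch, assuming get.gpar() reports the accumulated value in the current viewport:

```r
library(grid)
grid.newpage()

# First push: cex and alpha are halved relative to the top level.
pushViewport(viewport(gp = gpar(cex = 0.5, alpha = 0.5)))
cex_once <- get.gpar("cex")$cex

# Second push: the settings multiply with those of the parent viewport.
pushViewport(viewport(gp = gpar(cex = 0.5, alpha = 0.5)))
cex_twice <- get.gpar("cex")$cex
alpha_twice <- get.gpar("alpha")$alpha
grid.text("quarter-size, mostly transparent text")
popViewport(2)

print(c(once = cex_once, twice = cex_twice, alpha = alpha_twice))

# Primitives are vectorized: one call draws four circles.
grid.circle(x = c(0.2, 0.4, 0.6, 0.8), y = 0.2, r = 0.05,
            gp = gpar(fill = c("red", "orange", "gold", "green")))
```

Most other parameters, such as col or lty, simply override the inherited value instead of multiplying with it.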
One needs to understand certain terms in the context of the grid package. Any graphic region is referred to as a “viewport”. A viewport is merely a description of a graphical region; it needs to be pushed before it has any effect on a graphical device. Basically you create a viewport, push it to create a geometric context and then draw grid objects in it. You can push any number of viewports and draw in each of them. The functions involved are pushViewport, popViewport, upViewport, downViewport and seekViewport. pushViewport and popViewport add and remove viewports. These must be distinguished from upViewport and downViewport, which only navigate and preserve the structure of viewports built up in the session: a popped viewport is gone, whereas a viewport you have moved up from stays in place and can be revisited. Grid maintains a tree of pushed viewports on each device. seekViewport might sound like a redundant option, but the more you code, the more you realize that it is invaluable. You can push a viewport tree with parent and children viewport descriptions in one go and then use seekViewport repeatedly to operate on a specific viewport.
To create a viewport you need to pass the standard arguments: the x coordinate, the y coordinate, the width and the height. Once the viewport is created you can push it onto the device.
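The create, push, draw and navigate cycle looks like this in practice. The viewport names and positions are arbitrary; this is only a sketch of the navigation functions mentioned above.

```r
library(grid)
grid.newpage()

# Describe a region, then push it to make it the current drawing context.
left <- viewport(x = 0.25, y = 0.5, width = 0.4, height = 0.8, name = "left")
pushViewport(left)
grid.rect(gp = gpar(lty = "dashed"))
grid.text("left")

# upViewport() leaves "left" in the tree; popViewport() would remove it.
upViewport()

right <- viewport(x = 0.75, y = 0.5, width = 0.4, height = 0.8, name = "right")
pushViewport(right)
grid.rect()
grid.text("right")
upViewport()

# Both viewports are still in the tree, so we can return by name.
seekViewport("left")
grid.text("back again", y = 0.2)
where <- current.viewport()$name

upViewport(0)   # 0 means go all the way back to the top level
```

current.vpTree() prints the whole tree, which helps when a seekViewport() call cannot find the name you expected.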
Viewports can be used smartly to confine output to a region, using the clip argument of the viewport() function. If you want to restrict the graphic to only the viewport you have pushed, set clip when creating that viewport. The argument takes the values "on", "off" and "inherit", and the choice alters the region in which output is visible.
An alternative and easier way to create viewports is through the layout option. One can push a viewport that carries a layout and then choose whichever cell of the layout one wants to work with. An interesting feature of using layouts to structure viewports is the “null” unit, which divides up the available space in relative terms.
Using the layout option, you can slice and dice the display panel. The range of options that the grid package provides has motivated me to explore my own version of the Billion-O-gram (from Visual Information). The exercise was very useful for exploring the various options of the grid and layout functions. The chapter ends with a discussion on customizing the output from the lattice package.
A key takeaway from this chapter is the coordinate system, which gives the user an amazing number of options. First is the null unit, and one needs to remember that its meaning depends on the context
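A small sketch of clipping. The outer viewport turns clipping on, so an oversized circle is cut off at its border; a nested viewport turns it off again so that a label can sit outside the box. The names and sizes are arbitrary.

```r
library(grid)
grid.newpage()

# Output drawn inside this viewport is clipped to its boundary.
pushViewport(viewport(width = 0.5, height = 0.5, clip = "on",
                      name = "clipped"))
grid.rect()
# The circle is larger than the viewport; only the part inside is visible.
grid.circle(r = 0.7, gp = gpar(fill = "grey80"))

# A child viewport with clip = "off" may draw beyond the parent again.
pushViewport(viewport(clip = "off", name = "unclipped"))
grid.text("label outside the box",
          y = unit(1, "npc") + unit(1, "lines"))
innermost <- current.viewport()$name

upViewport(2)
```

The default is clip = "inherit", which keeps whatever clipping region the parent viewport established.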
- unit(1, "null") inside a layout, specifies a relative width/height; outside a layout is zero.
- unit(1, "null") + unit(2, "null") inside a layout, specifies a relative width/height; outside a layout is zero.
The other units that are frequently used are "npc", "native", "inches", and "strwidth".
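The relative nature of "null" units inside a layout can be seen by measuring the resulting columns. In this sketch the first column is fixed at one inch and the remaining width is shared 1:2; the helper col_width is a hypothetical function written for this example.

```r
library(grid)
grid.newpage()

# Three columns: a fixed 1-inch column, then the remaining width split 1:2.
lay <- grid.layout(nrow = 1, ncol = 3,
                   widths = unit(c(1, 1, 2), c("inches", "null", "null")))
pushViewport(viewport(layout = lay, name = "page"))

# Visit one column, shade it and report its width in inches.
col_width <- function(col) {
  pushViewport(viewport(layout.pos.row = 1, layout.pos.col = col))
  grid.rect(gp = gpar(fill = c("grey90", "lightblue", "steelblue")[col]))
  w <- convertWidth(unit(1, "npc"), "inches", valueOnly = TRUE)
  upViewport()
  w
}

widths_in <- vapply(1:3, col_width, numeric(1))
upViewport()
print(round(widths_in, 2))
```

Whatever the device size, the first column stays at one inch and the third is twice as wide as the second.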
Chapter 6: The Grid Graphics Object Model
The chapter talks about grid objects, i.e. the graphical output from grid functions. These objects are called grobs (grid objects) and gTrees (grid trees). When you draw any grid object, the package creates a grob that you can access by name. Once you get a handle on a grob, you can update and change its content. getNames() gets you all the current grobs. The typical functions used to work on grobs are grid.get(), grid.edit(), grid.add(), grid.remove() and grid.set(). All these functions need a grob name to work with. The other type of storage mechanism is a gTree, where a set of grobs is stored in a tree format. In all the above functions, one can also give the path to a grob contained in a gTree. One can use gPath to specify an element of a tree and then use grid.edit to change the respective grob. A gTree can take a viewport as an input too, in which case the viewport is pushed before the gTree is drawn and popped afterwards.
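Working with grobs by name, sketched on two toy objects. The names "disc", "badge", "frame" and "label" are made up for this example.

```r
library(grid)
grid.newpage()

# Every drawing call leaves a named grob on grid's display list.
grid.circle(x = 0.2, r = 0.1, name = "disc", gp = gpar(fill = "grey"))
grid.edit("disc", gp = gpar(fill = "tomato"))   # redraws with the new fill
disc_fill <- grid.get("disc")$gp$fill

# A gTree groups child grobs; a gPath addresses one child inside it.
badge <- gTree(name = "badge",
               children = gList(
                 rectGrob(width = 0.3, height = 0.2, name = "frame"),
                 textGrob("draft", name = "label")))
grid.draw(badge)
grid.edit(gPath("badge", "label"), label = "final")
badge_label <- grid.get(gPath("badge", "label"))$label

# Names of the top-level grobs currently on the display list.
drawn <- getNames()
print(drawn)

# Remove one of them from the scene.
grid.remove("disc")
```

grid.ls() gives a fuller listing that also shows the children nested inside each gTree.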
For each grid function that produces graphical output, there is a counterpart that produces a graphical object and no graphical output. The functions available are rectGrob(), textGrob(), arrowsGrob() etc.
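The difference between the two families is that the *Grob() functions only build a description, which can be inspected or changed before anything reaches the device. A minimal sketch with arbitrary names and sizes:

```r
library(grid)

# These calls build descriptions only; nothing is drawn yet.
box <- rectGrob(width = unit(2, "inches"), height = unit(1, "inches"),
                gp = gpar(fill = "lightyellow"), name = "box")
caption <- textGrob("built first, drawn later", name = "caption")

# Off-screen objects are modified with editGrob() rather than grid.edit().
caption <- editGrob(caption, gp = gpar(col = "darkred"))

# Drawing is a separate, explicit step.
grid.newpage()
grid.draw(box)
grid.draw(caption)
```

This is what makes it possible to write functions that return a graphic as an object and leave the drawing to the caller.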
One of the advantages of the grid package is that you can capture the output of lattice or ggplot2 and modify it based on your needs. For example, for a boxplot generated in lattice using bwplot, you can grab the tree of grobs using the grid.grab() function. You can then use childNames to list all the elements of the tree, that is, the grobs that make up the bwplot.
You can access any grob and edit/delete the grob based on the requirement.
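Grabbing a lattice plot and listing its pieces can be sketched as follows, using the singer data set that ships with lattice. The exact grob names depend on the lattice version, so none are hard-coded here.

```r
library(lattice)
library(grid)

# Draw a lattice boxplot on the current device.
print(bwplot(voice.part ~ height, data = singer))

# Capture everything on the page as a single gTree.
grabbed <- grid.grab()

# Names of the grobs that make up the plot.
pieces <- childNames(grabbed)
print(pieces)
```

Any of the listed names can then be passed to grid.edit() or grid.remove() while the plot is on screen.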
The last chapter of the book was fairly advanced for me. It talks about using the grid package to produce new graphic functions. Will come back to this chapter at a later date.
The book gives in-depth knowledge of the grid package, which forms the basis for packages like lattice and ggplot2. Once you understand the principles behind grid, you can easily modify, edit and annotate output from other packages. So in that sense this book gives you the tools to tweak the regular output.