February 2012


I like books that explain things visually, and this book falls in that category. I am reading it after learning the basics of Python from “Think Python” and “Learn Python the Hard Way”. This book serves as a nice visual recap of Python 101. I have listed some of the points from various chapters, mainly to ruminate over the learnings from the previous two books.

Chapter 1- Getting Started

  • Python environment variables

    • PYTHONPATH : Has a role similar to PATH. This variable tells the Python interpreter where to locate the module files you import into a program. PYTHONPATH should include the Python source library directory and the directories containing your Python source code.
    • PYTHONSTARTUP : Contains the path of an initialization file containing Python source code that is executed every time you start the interpreter
    • PYTHONCASEOK : Used in Windows to instruct Python to find the first case-insensitive match in an import statement. Set this variable to any value to activate it.
    • PYTHONHOME : An alternative module search path. It’s usually embedded in the PYTHONSTARTUP or PYTHONPATH directories to make switching module libraries easy
  • IDLE (Integrated DeveLopment Environment) is similar to RGUI. You can use it to type in programs, code modules, and debug them
  • Use import sys and then sys.argv to access the arguments passed to a Python script
  • Like R, you can run programs in interactive mode or script mode
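
A minimal sketch of the sys.argv point, written for Python 3 (the book targets Python 2). The helper name describe_args is made up for illustration.

```python
import sys


def describe_args(argv):
    """Split an argv-style list into the script name and the remaining arguments."""
    script_name = argv[0]      # argv[0] is always the script itself
    extra_args = argv[1:]      # everything the user typed after the script name
    return script_name, extra_args


if __name__ == "__main__":
    name, args = describe_args(sys.argv)
    print("script:", name)
    print("arguments:", args)
```

Saved as a script and run in script mode with a few words after the file name, it prints the script name followed by those words as a list.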

Chapter 2 – Expressions and Statements

  • The dir() function prints a list of currently defined names
  • del deletes only a variable, not the object to which it points, because another variable may be pointing to the same object. Python keeps track of objects internally and deletes them automatically when they’re no longer in use.
  • Every object has a unique identity, which is the address of the object in memory, expressed as an integer or long integer. An object’s identity doesn’t change; it is constant during the object’s lifetime. Use the id() function to determine an object’s identity.
  • When you assign one variable to another, there is a crucial point to note. If x is a string and you assign x to y, both names point to the same string. Strings are immutable, so any “change” to y creates a new value and rebinds the name y to it; the value that x points to is untouched. If instead x is a list and you assign it to y, changes made to y in place are reflected in x. Why? A list is mutable, and both x and y point to the same object.
  • From highest to lowest priority, the operators are not, and, and or. The expression x and not y or z, for example, is equivalent to (x and (not y)) or z.
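  • Python doesn’t have an exclusive-OR (XOR) Boolean operator. The book also says there is no built-in Boolean type, which was true of early versions; a bool type with True and False was added in Python 2.3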
  • An is comparison matters only for mutable objects such as lists and dictionaries
  • You can chain comparisons to create expressions such as this: x <= y < z. Only adjacent operands are compared
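
Here is a small sketch of the rebinding, aliasing, precedence and chained-comparison points from this chapter. It is written for Python 3, and all the variable names are mine.

```python
# Immutable case: "changing" y really binds y to a brand new string.
x = "spam"
y = x
y = y + " eggs"
print(x)                 # spam
print(y)                 # spam eggs

# Mutable case: a and b are two names for one and the same list.
a = [1, 2, 3]
b = a
b.append(4)
print(a)                 # [1, 2, 3, 4]
print(id(a) == id(b))    # True, same identity

# A copy has equal contents but a different identity.
c = list(a)
print(c == a, c is a)    # True False

# Chained comparison: only adjacent operands are compared.
low, middle, high = 1, 2, 3
chained = low <= middle < high
print(chained)           # True

# Precedence: not binds tighter than and, which binds tighter than or.
p, q, r = True, True, False
same = (p and not q or r) == ((p and (not q)) or r)
print(same)              # True
```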

Chapter 3 – Working With Numbers

  • Python has four numeric types: integer, long, float and complex number. Python rounds floats to 17 significant digits.
  • print sys.maxint gives the largest plain integer allowed on your machine
  • You can use the coerce() function to see how Python promotes numbers
  • int(x) to convert x to a plain integer, long(x) to convert x to a long integer, float(x) to convert x to a floating-point number, str() function to convert a number to a string
  • Useful functions from random module – seed, randrange, choice, shuffle, random, uniform
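
A rough sketch of the conversion functions and the random module helpers listed above, assuming Python 3 (where long() and sys.maxint no longer exist, so they are left out). The seed value 42 is arbitrary.

```python
import random

# Conversions between numeric types and strings.
as_int = int("42")        # 42
truncated = int(3.9)      # 3, int() drops the fractional part
as_float = float("2.5")   # 2.5
as_text = str(2.5)        # '2.5'

# Seeding makes the pseudo-random sequence repeatable.
random.seed(42)
first = random.random()
random.seed(42)
second = random.random()  # same value as first

die = random.randrange(1, 7)                     # integer from 1 to 6
colour = random.choice(["red", "green", "blue"])
deck = list(range(5))
random.shuffle(deck)                             # shuffles the list in place
temperature = random.uniform(10.0, 20.0)         # float in the given range
```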

Chapter 4 – Working With Strings

  • Python implicitly constrains indexes that are too small or too large to sensible values. Reversed indexes return empty strings
  • s.isalpha() to test whether all the characters in s are letters. s.isdigit() to test whether all the characters in s are digits.
  • string.maketrans(from, to) creates a translation table, and s.translate(table) then applies the whole translation in one go
  • int(s), long(s), float(s), complex(s), list(s) and tuple(s) are all useful for converting a string into different data types
  • repr(expr) converts object to string
  • the string returned by repr() can be passed to eval() to (re)create an object with the same value
  • There are a ton of functions related to strings. Probably this is the reason why Python is used in a lot of bioinformatics research
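
A short sketch of a few of these string points. It assumes Python 3, where the translation table comes from str.maketrans rather than the string.maketrans function the book describes; the sample strings are made up.

```python
# Translation table: each character of the first string maps to the
# character at the same position in the second string.
table = str.maketrans("abc", "xyz")
translated = "aabbcc".translate(table)   # 'xxyyzz'

# Slice indexes that are too small or too large are quietly constrained.
word = "python"
clamped_high = word[2:100]     # 'thon'
clamped_low = word[-100:2]     # 'py'
reversed_range = word[4:2]     # '' (start is past the end)

# Character tests.
letters_only = "abc".isalpha()     # True
digits_only = "2012".isdigit()     # True
mixed = "abc123".isalpha()         # False

# repr() gives a string that eval() can turn back into an equal object.
original = [1, "two", 3.0]
text = repr(original)              # "[1, 'two', 3.0]"
rebuilt = eval(text)
```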

Chapter 5 – Working With Lists and Tuples

  • Why both Lists and Tuples ? Tuples are faster to access and consume less memory than lists. Python also requires tuples in certain situations, such as string formatting, dictionary keys, and multiple function arguments. Tuples also are handy when you want to be sure that your data can’t be modified by an alias
  • To convert list L to tuple T, use T = tuple(L). Note that (L,) builds a one-element tuple whose only item is the list
  • The reverse of tuple packing is sequence unpacking: x, y, z = a
  • s[i][j] used to index tuple or list
  • If a is a mutable object such as a list, however, b = a doesn’t create a copy of a but instead creates an alias. a and b will refer to the same object, so changes in one are reflected in the other
  • [0] * n is an efficient way to initialize a list with n zeroes
  • Repetition creates shallow copies.
  • in, count, index can be used to test membership
  • s[:0] = t to add at the beginning of list, s[len(s):] = t to add at the end of list, s[i:j] = [] to remove elements from the list, s[i:i] = t to insert elements
  • reverse() modifies the original list. Reverse a copy if you want to maintain the original. sort() modifies the original list. Sort a copy if you want to maintain the original.
  • The bisect module contains functions for manipulating sorted lists. The insort() function inserts an item into a list, and maintains the sort order
  • The value returned by bisect() can be used to partition a list
  • cmp(x, y) returns -1 if x < y, 0 if x == y, and 1 if x > y.
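
The slice-assignment idioms, the shallow-copy caveat of repetition and the bisect functions can be sketched as follows (Python 3; the list contents are arbitrary).

```python
import bisect

# Slice assignment: add at the start, add at the end, insert, remove.
s = [2, 3]
s[:0] = [0, 1]           # [0, 1, 2, 3]
s[len(s):] = [4, 5]      # [0, 1, 2, 3, 4, 5]
s[2:2] = [99]            # [0, 1, 99, 2, 3, 4, 5]
s[2:3] = []              # [0, 1, 2, 3, 4, 5]

# Repetition is fine for immutable items...
zeros = [0] * 4          # [0, 0, 0, 0]

# ...but it copies references, so the three rows below are one list.
rows = [[0] * 2] * 3
rows[0][0] = 1           # every "row" now shows the change

# bisect keeps a sorted list sorted and helps partition it.
scores = [10, 20, 40]
bisect.insort(scores, 30)            # [10, 20, 30, 40]
cut = bisect.bisect(scores, 20)      # index just after 20
low, high = scores[:cut], scores[cut:]

# sorted() returns a new list, so the original is preserved.
unsorted = [3, 1, 2]
in_order = sorted(unsorted)
```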

Chapter 6 – Working With Dictionaries

  • d.popitem() removes and returns an arbitrary (key, value) pair from the dictionary
  • You can embed a dictionary in a dictionary by using the key as id(dict_object)
  • d.keys() , d.values(), d.items() to retrieve stuff from a dictionary
  • clear() method to delete all key-value pairs from a dictionary
  • d1.update(d2) to update d1 with all the key-value pairs of d2. If d1 and d2 contain matching keys, the d2 items prevail.
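
A quick sketch of the dictionary methods above, in Python 3 (where keys(), values() and items() return views, hence the sorted() and list() calls). The keys and values are invented.

```python
d1 = {"a": 1, "b": 2}
d2 = {"b": 20, "c": 30}

# On matching keys the items from d2 prevail.
d1.update(d2)
merged = dict(d1)                 # snapshot: {'a': 1, 'b': 20, 'c': 30}

keys = sorted(d1.keys())          # ['a', 'b', 'c']
values = sorted(d1.values())      # [1, 20, 30]
pairs = sorted(d1.items())        # [('a', 1), ('b', 20), ('c', 30)]

# popitem() hands back one pair and removes it from the dictionary.
key, value = d1.popitem()
remaining = len(d1)               # 2

# clear() empties the dictionary in place.
d1.clear()
```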

Chapter 7 – Control Flow Statements

  • In an expression surrounded by (), [], or {}, press Enter, and continue the expression on the next line
  • You can use the -t and -tt command-line options to detect inconsistent use of tabs and spaces in your programs
  • Equivalent to seq function in R – range([start,] stop [,step]) to return a list of integers from start to, but not including, stop.
  • One common use of range() is to assign consecutive values to a group of variables to create an enumeration
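
A tiny sketch of range() and of continuing an expression inside brackets. This is Python 3, where range() is lazy, so list() is needed to see the values; the colour names are just an example.

```python
# Enumeration: assign consecutive values to a group of names.
RED, GREEN, BLUE = range(3)

# start, stop (excluded) and step, much like seq() in R.
odds = list(range(1, 10, 2))          # [1, 3, 5, 7, 9]
countdown = list(range(10, 0, -3))    # [10, 7, 4, 1]

# Inside parentheses an expression can run over several lines.
total = (RED +
         GREEN +
         BLUE)
```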

Chapter 8 – Functions

  • Difference between parameters and arguments
  • In R, you type the function name and you get to see the entire code. In Python, one can use print classname.__doc__ to see the documentation string of a class (the same attribute works for functions)
  • Parameters with default values must come after parameters without default values
  • When you call a function, Python passes the object references of the arguments from the caller to the function. For immutable objects (numbers, strings, and tuples), this behavior effectively creates a local, temporary copy of the argument inside the called function; the function cannot change the original object. The story is different for mutable objects (lists and dictionaries). Passing a reference creates an alias of the original object inside the called function. aliases are variables that share references to the same object. The original object (in the caller) will change if you modify the passed-in object in place (in the function). On the other hand, if you reassign the aliased parameter a value, the reference to the original object is lost, and changes in the function won’t be reflected in the caller.
  • To avoid modifying a shared mutable object in a function, pass in a copy of the object
  • It’s good practice to make local copies of mutable arguments in a function to ensure absolutely that there are no side effects. It should be the responsibility of the function, and not its caller, to prevent side effects.
  • Functional Programming tools
    • lambda : Creates small, anonymous functions
    • apply() : Indirectly calls a function and passes it positional and keyword arguments. The apply() function takes a function, a tuple, and a dictionary as arguments. It calls the function by using the tuple items as positional arguments and the dictionary items as keyword arguments, and returns the result of the function call.
    • map() : Does the same thing as the apply family in R. Applies a function to sequence items and returns a list containing the results. However, R has many more variations of apply. I hope that someone writes plyr and reshape equivalents in Python. In R, I use them so often that I have put them in the startup script.
    • zip() : Takes a variable number of sequences and returns a list of tuples, where the n-th tuple contains the n-th item of each sequence
    • filter() : Returns a list containing the items of a sequence that satisfy a given condition
    • reduce() : Applies a function to sequence items, starting with the first two items and successively using the result of a function call together with the next item, ultimately reducing the sequence to a single value
  • Differences between lambda and def
    • lambda is an expression, whereas def is a statement. You can use a lambda expression anywhere you’d use a normal expression: as an argument, as a list item, as a dictionary value, and so on.
    • The def block permits multiple statements, whereas the lambda body must be a single expression.
    • Nonexpression statements such as if, while, for, and print are forbidden in the body of a lambda expression, which restricts the number of operations you can cram into a lambda function.
    • You can use a lambda expression without assigning it a name.
  • A list comprehension is a concise and often clearer alternative to creating lists by using lambda, map(), and filter(): [expr for var in seq if cond]. This looks like a very interesting way to cut down unnecessary code, and it is something I have not seen much of in the previous two books on Python. So list comprehension is the one new learning for me after slogging through 250 pages of this book. I have yet to use a list comprehension in a code fragment.
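
To make the list comprehension and argument-passing points concrete, here is a sketch in Python 3 (where map() and filter() return iterators and reduce() lives in functools). The function names are hypothetical.

```python
from functools import reduce

numbers = [1, 2, 3, 4, 5, 6]

# The same result two ways: functional tools versus a list comprehension.
with_tools = list(map(lambda n: n * n, filter(lambda n: n % 2 == 0, numbers)))
with_comprehension = [n * n for n in numbers if n % 2 == 0]   # [4, 16, 36]

total = reduce(lambda acc, n: acc + n, numbers)               # 21
zipped = list(zip("abc", [1, 2, 3]))     # [('a', 1), ('b', 2), ('c', 3)]


def append_in_place(items):
    items.append("changed")      # mutates the caller's list


def rebind_only(items):
    items = ["new list"]         # rebinds the local name; caller unaffected
    return items


def append_safely(items):
    items = list(items)          # work on a local copy, no side effects
    items.append("changed")
    return items


shared = ["original"]
rebind_only(shared)
after_rebind = list(shared)      # ['original']
safe = append_safely(shared)     # shared is still ['original']
after_safe = list(shared)
append_in_place(shared)          # shared is now ['original', 'changed']
```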

Chapter 9 – Modules

  • The flow of content in a module is as follows : documentation string, import statements, global variable definitions, the module’s classes, the module’s functions, and finally an if statement that checks whether the module is being imported into another program or run on a standalone basis
  • Because Python doesn’t rerun module code after the first import, a variable created in a module won’t revert to its initial value on subsequent imports
  • dir(object) is similar to str function in R. It shows the list of important attributes of object
  • from module import doesn’t import names that begin with an underscore, which are termed private names
  • Python executes module code only once, during the first import. You can force Python to reload and rerun a module’s code by using the built-in reload() function. This function is useful when you want to see module changes without stopping Python.
  • You can append a directory in the sys path using the function sys.path.append(dir)
  • You can determine how a module has been loaded by inspecting its built-in __name__ attribute
  • sys.modules.keys() gives the list of all modules loaded
  • There are three types of namespaces
    • Local namespaces. Python creates a new local namespace whenever a function is called. The local namespace contains a key-value pair for each function parameter and each local variable defined in the function’s body. Variables in functions that are declared global are not in this namespace (they’re in a global namespace). Each function has its own local namespace, which Python deletes when the function returns a value or raises an exception.
    • Global namespaces. Python creates a new global namespace when a module is first loaded via import. The global namespace contains a key-value pair for each top-level (global) variable, function, class and imported module. Each module has its own global namespace, which Python deletes when the interpreter halts.
    • Built-in namespace. Python creates the built-in namespace when the interpreter starts. The built-in namespace contains a key-value pair for each of Python’s built-in functions (len, str, and so on) and exception (TypeError, SyntaxError, and so on) names. There’s only one built-in namespace, which Python deletes when the interpreter halts.
  • locals() to return a dictionary corresponding to the current local namespace
  • globals() to return a dictionary corresponding to the current global namespace.
  • vars(object) to return a dictionary corresponding to the local namespace of object.
  • compile() is used to compile a string into a code object. eval() returns the result of the evaluated expression. exec is used to run a string, an open file object or a code object
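
A small sketch of the namespace helpers and of compile(), eval() and exec, assuming Python 3 (where exec is a function, not a statement). The names area and Box are invented for the example.

```python
import builtins


def area(width, height):
    result = width * height
    return sorted(locals())      # names in this call's local namespace


local_names = area(2, 3)         # ['height', 'result', 'width']


class Box:
    def __init__(self):
        self.size = 5


box_attrs = vars(Box())          # {'size': 5}

# Built-in names such as len live in the built-in namespace.
has_len = hasattr(builtins, "len")

# compile() turns a string into a code object that eval() can run.
code = compile("6 * 7", "<sketch>", "eval")
answer = eval(code)              # 42

# exec() runs statements; here they write into an explicit namespace dict.
namespace = {}
exec("total = sum(range(4))", namespace)
total = namespace["total"]       # 6

# __name__ tells you how the module was loaded.
if __name__ == "__main__":
    print("run as a standalone script")
```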

Chapter 10 – Files

This chapter contains a list of functions that one would typically use to work with files stored on disk. The note on the pickle and cPickle modules covers something that I haven’t tried to date. I have to work on it soon.

Chapter 12 – Classes

The following gives a schematic diagram of Python in-built types

[Figure: schematic diagram of Python built-in types]

 

  • In the above diagram, slice objects represent extended slices, which are used in Numerical Python (NumPy). Extended slices don’t work with standard sequences (lists, strings, and tuples). Maybe once I go over NumPy, I will learn about slice objects in greater detail.
  • Python provides several predefined class attributes. The class’s __dict__ attribute holds the class’s attribute names and values in a dictionary
  • __class__ gives the class of an instance
  • A class variable exists independently of class instances. The variable is available no matter how many class instances are created, even if none is created.
  • If you change a class variable’s value at run time by qualifying it with the class object (class.attr = value), Python dynamically changes the value of attr to value in all existing (and future) class instances
  • You can access the attribute of a class before instantiating the class. This is so different from the other languages I have coded.
  • You can do functional overloading, Operator overloading by writing __XYZ__ functions where XYZ is the relevant operator or function that you are trying to overload
  • pt = Point(3, 9) is equivalent to Point.__init__(pt, 3, 9)
  • Set the truth value of an instance by writing your custom __nonzero__()
  • To make an instance callable, use the special method __call__(self [,args])
  • Names in a class’s namespace (__dict__ attribute) aren’t visible in the class’s methods, so you must
    fully qualify names that refer to class attributes outside a method
  • Private names begin with a double underscore (__) but don’t end with one
  • isinstance() ,issubclass() functions used to check class membership
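
Several of the class points above fit in one sketch. It assumes Python 3, where the truth-value hook is spelled __bool__ instead of the book's __nonzero__; the Point class itself is a made-up example.

```python
class Point:
    """A 2-D point (hypothetical example class)."""

    dimensions = 2                      # class variable, shared by all instances

    def __init__(self, x, y):
        self.x = x
        self.y = y

    def __add__(self, other):           # operator overloading for +
        return type(self)(self.x + other.x, self.y + other.y)

    def __bool__(self):                 # __nonzero__ in Python 2
        return (self.x, self.y) != (0, 0)

    def __call__(self, scale):          # makes instances callable
        return type(self)(self.x * scale, self.y * scale)


before_any_instance = Point.dimensions  # 2, no instance needed

p = Point(3, 9)
q = p + Point(1, 1)                     # Point with x=4, y=10
doubled = p(2)                          # Point with x=6, y=18
origin_is_true = bool(Point(0, 0))      # False

Point.dimensions = 3                    # change via the class object...
seen_by_instance = p.dimensions         # ...and existing instances see 3

instance_attrs = vars(p)                # {'x': 3, 'y': 9}
```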

 

Takeaway :

With abundant screenshots scattered throughout, this book indeed offers a quick recap of all the basic elements of Python programming.

 

Prof. Hadley Wickham, the creator of ggplot2 and other useful packages like plyr and reshape, has one strong piece of advice for R programmers – “Read other’s code”. This comes from a person who has developed 30 packages to date. We all have an immense urge to program, code up something, view the results, tweak our code to make it work, and so on. However, pausing to read somebody else’s source code requires a certain amount of hard work and a willingness to learn from others. In R particularly, where all the functions are documented really well, one hardly NEEDS to go into the code. But that’s exactly what Hadley Wickham recommends.

In that sense, this book by Zed A Shaw has a similar message for Python. You have to read code on a regular basis and it is typically hard work to read what other people have written. I guess that’s the reason why this book is titled, “Learn Python The Hard Way”. This book introduces Python step by step in 52 exercises where the author gives pointers to various modules, websites for the reader to figure out stuff. So, all the exercises have one common structure – “ introduce a topic and make the reader curious to check out things from other sources”.

As a newbie, I found this book interesting for a couple of reasons. Firstly, the author urges the reader to type out every single line of code in the book. No copy-pasting is allowed when you are learning something new. The other thing I liked is that the author gives the reader clear instructions to follow a directory structure for a Python project. For a long time I never followed any specific directory structure for projects in whatever languages I have coded. However, once I learnt Ruby on Rails, I understood the advantage of following a nice standardized directory structure for any task, project or library. Not all frameworks make strong recommendations like Ruby on Rails, so the programmer has to figure out something that works. That’s usually a trial and error process. Starting from a well thought out directory structure in Python is going to be helpful in the long run when you want to go back, review or commit the project to a version control system.

Let me list down the things that I learnt from this book.

  • %d , %s, %r  are used for substituting stuff in the string
  • %() is used to substitute the respective variable in the string
  • Learnt about close , read , write , readline , truncate functions
  • raw_input is used to get input from the user
  • Functions appear similar to functions defined in ruby.
  • Functions should start with def
  • The open parenthesis ( must come right after the function name
  • You can leave spaces after the parenthesis (
  • You can leave spaces after the closing parenthesis ) and colon
  • You go back to usual code environment from a functional environment by writing with no indent. Unlike ruby there is no need to put an end at the end of every def
  • At the end of def statement , there is a need to put colon
  • You have to indent all the lines of code in a function consistently; 4 spaces is the convention.
  • Duplicate argument names are not allowed
  • Variables in the script are not connected to variables in the function, and vice versa.
  • f.seek(0) takes you back to start of the file
  • f.readline() reads one line from the file, starting at the current position
  • return at the end of a function can be used to return something from the function.
  • Exercise 23 was very interesting, as it asked me to visit bitbucket.org, browse a random Python project, click on source and write about whatever I found interesting about the project. The exercise says: “When you do this exercise, think of yourself as an anthropologist, trucking through a new land with just barely enough of the local language to get around and survive.” Despite hardly knowing any aspects of Python, I looked up bitbucket.org, started randomly browsing source programs and stumbled on the bootstrap-py3k.py file from pyquery. This is what I could make out from the file.
    • You can import a ton of libraries by listing them down separated by comma
    • import X imports the module X, and creates a reference to that module in the current name space. Or in the other words, after you’ve run this statement, you can use X.name to refer to things defined in module X
    • from X import * imports the module X, and creates references in the current name space to all the public objects defined by that module. X in itself is not defined. So X.name will not work but name will work
    • from X import a, b, c imports the module X and creates references in the current namespace to the given objects
    • try/except. Similar to try/catch in Java and other languages
    • Unlike R, an if statement has no brackets around the condition and has a terminating colon
    • for x in list() – This is similar to what you find in R.
  • The program that I randomly browsed was overwhelming. Goes on to say the distance I need to walk before I can code properly in Python.
  • beans, jars, crates = secret_formula(start_point) – This is very different from the usual assignment that you get to see in other languages. In R something like this, c(x,y,z) <- test() does not work.
  • I learnt the function pop that can be used on words
  • Any block of code needs to be indented (by convention, 4 spaces) before the actual statement begins
  • function definition or an if definition should terminate with colon
  • if – elif – else , elif is equivalent to else if
  • Format strings –  %f for a floating-point decimal, %d for a signed integer decimal, %r for the repr() of a value (handy for debugging), %s for a string
  • Some of the functions associated with lists – append(x) adds x to the end of the list as a single item, extend(x) extends the list by adding all the items of x. Some of the other functions are insert, remove, pop, index, count, sort, reverse
  • You can use lists as stacks, use append and pop function to mirror LIFO principle
  • Python lists start from index 0
  • Wrote a program that incorporates all the 100 key words mentioned in the first 36 chapters. I think the author’s suggestion to write such a program is pretty useful in getting up to speed with the syntax
  • In Python, any number that begins with 0 is considered as Octal number
  • The built-in number objects in Python support integers, floating-point numbers and complex numbers.
  • All numbers in Python are immutable objects, meaning that when you perform any operation on a number object, you always produce a new number object
  • A plain integer has a fixed size, so its minimum and maximum values are dictated by the machine architecture; a long integer has no predefined size and is limited only by available memory
  • floating point in python is similar to double in C – 53 bits precision
  • Always use “ ” for strings so that you can use single quotes within the string
  • Use triple-quoted string for a bigger string. Line breaks in the literal are preserved as new line characters
  • Tuples are like lists, except they are immutable
  • Tuples may contain immutable objects
  • join is an interesting string method. If you say stringvar.join(list[3:5]), it puts stringvar between each of the list items
  • dir(li) returns a list of all the methods of a list.
  • dir(d) returns a list of the names of dictionary methods
  • Lists and dictionaries are the workhorses of Python. They are best utilized as iterators. Lists are ordered collections, whereas a dictionary is an unordered collection
  • Understood the importance of map operators.
  • Lists can have functions embedded in them.
  • Each of the functions in the class takes self
  • Indentation is very appealing. I don’t have to worry about the painful flower brackets
  • You can assign functions to any variable , you can put functions in a list. 
  • __init__ sets up all the initial variables of the class. It is the constructor function for the class
  • self is the name used for the instance inside a class’s methods. It is a naming convention rather than a reserved word, and it is similar to the `this’ pointer in C++
  • To instantiate a dictionary , either use X = dict() or x= {}
  • Use str function to convert numbers to strings
  • Another use of dictionaries is that you can assign some data to a key and be certain that there will be no duplicates. If you try to add a duplicate key to a dictionary, the data for the last entered item is taken as the relevant data for the key. Let’s say you are reading a lot of items and you want to find the last encountered data for each one; you can use a dict for that purpose
  • Names starting AND ending with double underscores work differently.
  • __doc__ , used as print x.__doc__ , prints the documentation string (docstring) of the function.
  • __dict__ contains attributes in a class instance
  • Learnt about the pass keyword. pass is a null operation – when it is executed, nothing happens. It is useful as a placeholder when a statement is required syntactically, but no code needs to be executed
  • Learnt the way to use setattr  and getattr 
  • The takeaways from PEP8 , the python coding style guide are,
    • Arguments on the first line forbidden if you are not using vertical alignments
    • Use spaces or tabs. Don’t mix both. If possible use only spaces
    • Maximum line length is 79 characters for code and 72 for comments and docstrings.
    • Separate top-level function and class definitions with two blank lines
    • Method definitions inside a class are separated by a single blank line
    • Use blank lines in functions ,sparingly, to indicate logical sections.
    • Imports should be on separate lines
    • Imports should be at the top of the file
    • Imports should be grouped in the following order : standard library imports, related third party imports, local application/library specific imports
    • Avoid extraneous whitespace
      • Immediately inside parentheses, brackets or braces
      • Immediately before comma, semicolon, or colon
      • Immediately before the open parenthesis that starts the argument list of a function call
      • Immediately before the open parenthesis that starts an indexing or slicing
      • More than one space around an assignment operator to align it with another
    • Always surround binary operators with a single space on either side
    • Use spaces around arithmetic operators
    • Don’t use spaces around the = sign when it is used to indicate a keyword argument or a default parameter value.
    • Multiple statements in a single line are generally discouraged
    • Module names should have short all-lowercase names
    • Class names use CapWords convention
    • function names should be lowercase
    • Always use self for the first argument to instance methods
    • Always use cls for the first argument to class method
    • Method names inside the class – use lowercase words separated by underscores
    • Leading underscore in a variable name for private variables
    • Two underscores to invoke Python’s name mangling rules
    • Constants are CAPS_LETTERS
  • Here are the steps that I have followed to install python packages
    • Download ez_setup.py in to Scripts folder
    • Set the System Path, Windows path variable to contain this directory
    • Double click on this ez_setup.py module
    • RPy needs R.dll and hence you need to append the relevant folder name to the path variable
    • easy_install executable and other scripts get populated in Scripts folder
    • Installed pip using  easy_install pip
    • Installed numpy .
    • Download win32all, windows extensions from Mark Hammond .
    • Downloaded RPy installable
    • Installed scipy using easy_install
    • Currently I am using R 2.13. Rpy is built for R 2.12. Hence had to download R2.12 , give its bin in windows path and then invoke the python script.
    • Some description of the packages that I have installed
      • pip – used for installing packages which are present in PyPI, the Python Package Index
      • distribute – Is a lower level tool for building, installing and managing Python packages
      • VirtualEnv – Is a tool to create isolated environments for Python
      • Nose – Extends unittest to make testing easier
      • NumPy/SciPy : This pair of libraries provides array and matrix structures, linear algebra routines, numerical optimization, random number generation, statistics routines, differential equation modeling, Fourier transforms etc. Basically you get the entire MATLAB toolkit for free
      • Need to check out these modules sometime – IPython, Cython, PyTables, PyQT, TreeDict, SQLAlchemy, Sage, Enthought, Sphinx. Loooooong way to go !
    • Understanding setup.py 
      • What is it ? This is used by Python distutils as a standard way for installing third party Python modules. Before distutils, module creators would have to create install files for all the different platforms. That activity is made redundant ,thanks to setup.py.
      • What does this file contain ? It contains all the calls Python makes to distutils
      • What happens when we run it ? There are two things that happen. First is the build step which puts all the files which need to be copied in to the distribution root’s build directory. Second step is the install phase where all the files are copied to the install directory for installation
  • The steps I carried out in order to install a custom build python module are :
    • Make a copy of the default skeleton package and rename it to the project name, XYZ
    • Rename the NAME module to XYZ
    • Rename NAME in all the relevant files
    • Create a rk module and put it in XYZ folder
    • Edit the setup.py to contain
    • Run python setup.py sdist – This will create a zip file in dist folder
    • Extract the dist folder wherever you want and run python setup.py install
    • The module is installed in python and you can import in the code and start using it
    • If you have to uninstall this egg, you have to run pip uninstall “name given in the setup.py”
  • Came across HitchHiker’s guide to Packaging on the net. Will go over it someday at leisure
  • Exercise 46 has a nice introduction to Python packaging. Somehow I find packaging in Python much easier than in R. I should not be comparing two different things, but as an amateur programmer I think it is easier to install packages in R: there is nothing additional to set up, and a command like install.packages() is all that is needed. For Python, it is not as straightforward as it seems. You need to install pip or an executable, and then use the easy_install command. For packaging, though, I found the steps in Python to be crystal clear and easy to do. In R, for some reason, I found it a bit difficult to learn this stuff. R CMD makes it easier, but if you want to put in documentation for functions, tests, etc., it takes some time to learn packaging in R. Maybe I am just rambling here. R enthusiasts will dismiss my statement and say that it is far easier to package stuff in R than in any other language. Maybe. But somehow, after going through this chapter, creating an installable in Python appears very intuitive and easy. The fact that Dropbox, a startup that has become a fantastic success in recent years, uses Python for everything says a lot about the versatility and usefulness of Python packaging.
  • Some gyan on unit testing
    • Write one test file for each module you make
    • Keep test cases short
    • Reference to doctests and nosetests. Have to read more about them someday
  • Learnt about the isdigit function, which can be invoked on a string to check whether it consists only of digits
  • I quickly browsed Chapters 50-51-52, as I don’t think I will be doing any webdev work in the times to come. If at all I need to do something, I will probably use Ruby on Rails and get it done.
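
A few of the points from this list, sketched in Python 3 (the book itself uses Python 2). The function name secret_formula comes from the exercise mentioned above, but its body here is my own stand-in, and the sample data is invented.

```python
# Format strings with %: positional values and named %(key)s lookups.
by_position = "%s is %d years old" % ("Zed", 35)
by_name = "%(name)s has %(count)d items" % {"name": "cart", "count": 3}
as_repr = "%r" % "hi"               # %r shows the repr(), quotes included
two_places = "%.2f" % 3.14159


def secret_formula(started):
    """Return three values at once (illustrative body, not the book's)."""
    jelly_beans = started * 500
    jars = jelly_beans // 1000
    crates = jars // 100
    return jelly_beans, jars, crates


# One call, three names: the returned tuple is unpacked.
beans, jars, crates = secret_formula(10000)

# append adds one item; extend adds every item of its argument.
appended = [1, 2]
appended.append([3, 4])             # [1, 2, [3, 4]]
extended = [1, 2]
extended.extend([3, 4])             # [1, 2, 3, 4]

# A list used as a stack: last in, first out.
stack = []
stack.append("a")
stack.append("b")
top = stack.pop()                   # 'b'

# join is called on the separator string.
joined = "-".join(["x", "y", "z"])  # 'x-y-z'

# A dictionary keeps only the last value seen for each key.
latest = {}
for key, value in [("a", 1), ("b", 2), ("a", 3)]:
    latest[key] = value
```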

The author concludes the book with a superb reminder to any programmer

Which programming language you learn and use doesn’t matter. Do not get sucked into the religion surrounding programming languages as that will only blind you to their true purpose of being your tool for doing interesting things.

If there is a lot of data parsing and cleaning that needs to be done before modeling, I tend to follow one of the three paths :

  • Path 1: Use Python to clean the data, export the data structure into a file/database. Leave the Python environment and move into R to do the modeling.
  • Path 2 :  Use Python to clean the data, stay in the Python environment and invoke R to do the modeling. Rpy is the go-to module in this context.
  • Path 3 :  Painfully do the data cleaning in R, despite R hogging memory, and then model stuff in R

Path 3 is something I take very often. However Paths 1 and 2 are also interesting as they give access to a ton of modules that one can use from Python. A few years back I had used some data types of Python, mainly the dictionary, and had worked on something I don’t even remember properly. It was more of an ad-hoc task, and since then I had never used Python in a big way except for some basic data cleaning tasks. Over the years I have slowly graduated to performing the entire data cleaning exercise in R itself and completely avoiding Python. Lately I have realized that I have followed a convenient path instead of the hard but worthwhile paths (1 & 2). So, I picked up this book to get a decent understanding of data types and modules in Python. In this post, I will list all the points that I found relevant in this book for a newbie like me :

  • The three common elements of natural languages, i.e Ambiguity, Redundancy and “Not literal in meaning” , do not apply to programming languages. They have exactly the opposite attributes. They are Non Ambiguous, Non Redundant and Literal.
  • Syntax rules come in two types, tokens and structures
  • Python is an interpreted language. The error with Python code could be Syntax error, Run time error or Semantic error.
  • >>> symbol is called chevron.
  • “invalid syntax” and “invalid token” are the usual SyntaxError messages that you come across.
  • Python variables are case sensitive.
  • PEMDAS – Useful mnemonic for remembering python order precedence.
  • ^ is not an exponent operator in Python. It is the bitwise XOR operator; use ** for exponentiation.
  • type() is a function that is useful to know the type of the object in Python. It is similar to class function in R.
  • Semantic errors are tough to catch.
  • There are at least 31 keywords in Python. Type import keyword; print keyword.kwlist to get the list.
  • import math will fetch all the math functions from the python standard library.
  • Function definition has to be executed before the first time it is called
  • Stack diagrams are used for understanding function environments
  • void functions – nice way to call functions that don’t return anything.
  • To see where Python searches for libraries, type the command import sys; sys.path. The output is a list and the first item in the list is a null string symbolizing the current directory.
  • Passwords are never stored as plain text , be it in a file or a database. They are converted to a hash code and stored. When a user enters the password it is checked with the hash code stored internally. The best thing about hash code is it involves a mathematical one way process between password and code. It is very unlikely that you will be able to crack the password given a hash code
  • There can be compiled python code also. These days there are some apps that are distributing compiled python code instead of the usual py files
  • Set the environment variable PYTHONPATH to a colon separated list of directories so that Python also searches those folders when importing modules
  • The trick if __name__ == “__main__” exists in Python so that Python files can act as either reusable modules, or as standalone programs.
  • How to draw a fractal using Python ?.
  • Functions that return something are given a  catchy name, Fruitful functions
  • If a function returns no value and you try to print, it will print None
  • Some examples of recursion mentioned are gcd function, palindrome function
  • Interesting use of the word bisection : Debugging by bisection. Well, basically apply the bisection method to finding out the bug.
  • eval function can be used to evaluate python commands in a string
  • Operations that can be used with strings are len, for loops, slicing, upper, lower, find
  • A slice s[a:b] of a string runs from index a up to, but not including, index b
  • The word “in” is a boolean operator that takes two strings and returns True if the first appears as a substring in the second
  • Python does not handle uppercase and lowercase letters the same way that people do. All the uppercase letters come before all the lowercase letters.
  • Strings are immutable
  • The slice operator takes a third argument: x[a:b:c] means from a to b in steps of c
  • The following code is to reverse a string :x[::-1]
  • Program testing can be used to show the presence of bugs, but never to show their absence!
  • There are some string related functions built in to Python that can be used for text processing stuff.
  • zfill() function can be used to pad zeroes for a number
  • int() is roughly equivalent to as.integer() in R (float() is closer to as.numeric())
  • Saw similarities between the map operator and the apply family of functions in R. Maybe this is the reason why people the world over love Python.
  • The list object stores pointers to objects, not the actual objects themselves. The size of a list in memory depends on the number of objects in the list
  • The time needed to get or set an individual item is constant, no matter what the size of the list is
  • The time needed to insert an item depends on the size of the list, or more exactly, on how many items are to the right of the inserted item (O(n)). In other words, inserting an item at the end is faster than inserting it at the beginning
  • The time needed to reverse a list is O(n)
  • The time needed to sort a list varies, worst case is O(n logn)
  • Useful functions that go with lists “in”, “extend”, “append”, “sort”, “sum”,+,*,[a:b]
  • A nice analogy between map, filter and reduce functions applicable to lists. Capitalize is like map where you apply a function to each element, filter is like selecting only some items from the list , reduce is like summing up all the elements or counting all the elements in a list
  • list(s) creates a list out of the elements of the string
  • split is a function that returns list
  • The most dangerous aspect of Python, unlike R, is that if you assign a list to a variable X and then say Y=X, any change you make to Y in place is reflected in X, because both names refer to the same list. Basically it behaves like pass by reference and not pass by value. In R, the external object is not changed, as pass by value happens. In Python, the external object is changed, as a reference is passed. This means most of the functions in R are pure functions.
  • in operator to check the presence of an element in a list. One can also use index function to check for the presence of the element.
  • To reverse a word, use the following code x[::-1]
  • One can use sorted function for list
  • append modifies the list and returns None
  • always use append instead of a = a +[x]
  • This is a fantastic thing in Python. “in” operator in dictionary takes the same time irrespective of the size of the dictionary.
  • While checking for a word in a list of words, the “in” operator is slower than the bisection method, which is slower than access through the hash table in a dictionary.
  • For dictionaries, python uses an algorithm called “hash table” that has this remarkable property that “in” operator takes the same amount of time irrespective of the size of the dictionary
  • Hash tables are apparently used to create a 2 dim array where you store keys as hash values and use these hash values to map to the actual values. Basically hash table is extremely useful when doing stuff with a large number of strings
  • You cannot use lists as keys in a dictionary as lists are mutable. Mutable built-in types cannot be used as keys because their hash would have to change whenever their contents change, and the dictionary could then no longer find them. Similarly a dictionary cannot be used as a key
  • A previously computed item that is meant for later use is called memo
  • Any variable defined outside the scope of a function is treated as global variable. You can happily use them in a function. However you cannot set them in a function as any setting operation introduces a new variable in the function whose scope is only limited till the function is running.
  • To set a global variable in a function, you have to define the variable as global.
  • You can add, remove or replace elements of a global list but if you want to reassign the variable, you have to declare it
  • Learnt about a way to check duplicates using “set”  in Python.
  • Tuples are a sequence of values. The values can be of any type, and they are indexed by integers. They are immutable. This is the key to understanding tuples. Unlike dict and lists, tuples are immutable.
  • You cannot modify a tuple but you can replace one tuple with another
  • Tuples are good for swapping operations and return values
  • There are certain functions like divmod where you can pass a tuple as input by scattering it with *, e.g. divmod(*t)
  • zip is another very useful built-in function; it pairs up the elements of sequences into tuples
  • items function on a dict returns tuples
  • You can compare tuples
  • Decorate, Sort, Undecorate pattern is useful in many situations like sorting, counting, etc
  • You can use tuples or dictionaries to pass parameters. To pass a tuple as parameters, prefix it with *. To pass a dictionary as parameters, prefix it with **.
  • A few days back I stumbled on to Zipf’s law in an NY Times article. I managed to use dicts, tuples and lists to empirically check Zipf’s law on Jane Austen’s novel, Emma
  • Also I have started exploring the Rpy module to invoke R from python. There is some initial pain in learning how to install Rpy. Once that is done, R can be used seamlessly in Python.
  • Computed a Markov Chain for Phrases in Jane Austen’s novel Emma
  • Tuples are very useful in sorting stuff or creating results similar to table in R
  • You can sort a list if the elements are arranged in the form of tuples
  • After reading the chapters on text parsing and string handling, I have this feeling that Python is the king for text parsing. No wonder it is used in Google and other places where they have to deal with a ton of text.
  • To randomly select items from a histogram, one can create a list where each element is repeated x number of times, where x is frequency of the word
  • Random is a useful module which has functions like random(), randint() , choice()
  • If you want to remove punctuation from strings , you can use string.punctuation to check the elements that need to be removed.
  • The os module has many useful functions like os.getcwd(), os.listdir(), os.path.abspath(), os.path.exists(), os.path.isdir(), os.path.isfile()
  • There is a mention of pipe in python that is useful in reading very big zip files.
  • repr(s) can be useful for debugging.
  • Learnt to screen scrape using Python
  • Reorganized my iTunes folder with Python. My iTunes folder had duplicate files and I had to remove those duplicates. Obviously manually going over them was a nightmare. Firstly I removed some obvious files like video files and other non-music files from the 1500 files in the folder. Then I used an MD5 function to find duplicates in my music folder and removed them programmatically. Now I have about 978 music files in total that I will categorize someday into various playlists.
  • The classes are pretty peculiar in Python. A class can be defined with no attributes, yet you can assign instance.x, instance.y to some value afterwards. This is weird for me as I have always thought of classes as having declared attributes and functions
  • I am coming from R to Python and I am alarmed at the fact that Python passes objects by reference. In R objects get copied. In Python a reference to the object is sent. This means that a function can totally change the object that you are passing
  • You can check whether two variables are aliases for the same object by using the “is” operator.
  • copy module has two types of copy functions, one is the shallow copy and one is the deep copy.
  • hasattr is a function that can be used to check whether an object has a given attribute
  • Learnt new terms like invariants, pure functions and modifiers.
  • Functional programming is a style of program design in which the majority of the functions are pure. Pure meaning that whatever input is received by the function, it is not modified.
  • Came to know about datetime module , probably the most important module that I will use.
  • There is a strange thing about invoking functions in Python. When I first came across functions, I was left wondering why there is a need to pass self in to each of the function. At least in the languages that I have coded I have never passed self object. This chapter made me realize that the major reason for passing self is that there are basically two types of invocations in Python. Let’s say I have a class X that has a method test. Either I can invoke it via X.test(obj) or via obj.test() . If the method takes an argument, then you code the function with the first argument as self and give the rest of the arguments in the usual way
  • use the __init__ method for the default constructor
  • use __str__ for giving a string representation of the object
  • use __add__ for operator overloading of +
  • Think of overwriting some operator, look at python docs to find the exact string to use , let’s say it is YYY , then write a method with the name __YYY__ and your class has the operator overloading set.
  • use __dict__ to get the attributes of the class
  • use getattr function to get attributes of the class
  • Learnt about the use of __radd__
  • pass statement has no effect in a class definition. It is only necessary because a compound statement must have something in its body.
  • The concept of deep copy and shallow copy is something that I came across again in the context of Python, after my initial encounter with it in C++.
  • Default values get evaluated ONCE, when the function is defined; they don’t get evaluated again when the function is called.
  • Even though two lists or tuples are distinct objects with different memory addresses, Python tests for deep equality. In some other instances ,shallow equality is tested.
  • There is a difference between instance attributes and class attributes.
  • At the time of class definition itself, the inheritance structure is defined in Python
  • With the help of cards, deck and hand, the chapter on inheritance gives a good introduction to the various concepts related to inheritance.
  • I have ignored chapter 19 as I am planning to use R for visualization. Only if I cannot do something in R, would I probably come back to this book and learn about GUI capabilities of Python.
  • Appendix talks about debugging. Typically there are three types of errors that one comes across in programming. First are the syntax errors. Second are the run time errors such as Name error, Type error, Key error, Attribute error, and Index error. The third type of errors is semantic error and is often difficult to crack as compared to the first two types of errors.
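
The points above about aliasing, the “is” operator, shallow versus deep copies and default values being evaluated once can be tried out with a short sketch like the one below (Python 3 syntax). The function names add_item and add_item_safe are hypothetical, chosen only for this illustration.

```python
import copy

# Assignment never copies: y is just a second name for the same list.
x = [1, [2, 3]]
y = x
y.append(4)
print(x)           # [1, [2, 3], 4]
print(y is x)      # True

# A shallow copy duplicates the outer list but shares the inner list;
# a deep copy duplicates everything.
shallow = copy.copy(x)
deep = copy.deepcopy(x)
x[1].append(99)
print(shallow[1])  # [2, 3, 99]
print(deep[1])     # [2, 3]

# == compares contents, while "is" compares identity.
a = [1, 2]
b = [1, 2]
print(a == b)      # True
print(a is b)      # False


def add_item(item, bucket=[]):
    """Hypothetical helper: the default list is created once, at definition."""
    bucket.append(item)
    return bucket


def add_item_safe(item, bucket=None):
    """Hypothetical helper: a fresh list is created on every call."""
    if bucket is None:
        bucket = []
    bucket.append(item)
    return bucket


first = add_item("a")
second = add_item("b")
print(second)            # ['a', 'b']
print(first is second)   # True
print(add_item_safe("a"), add_item_safe("b"))  # ['a'] ['b']
```

Using None as the default and creating the list inside the function is the usual way to avoid sharing one default object across calls.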
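
The word-counting exercises mentioned above (dictionaries as histograms, string.punctuation, and the Decorate, Sort, Undecorate pattern with tuples) can be sketched roughly as follows in Python 3 syntax. This is not the code used for the Emma experiment; the function names and the sample sentence are made up for illustration.

```python
import string


def word_frequencies(text):
    """Count words in text using a dict as a histogram (hypothetical helper)."""
    counts = {}
    for raw in text.split():
        # Strip leading/trailing punctuation and normalise case.
        word = raw.strip(string.punctuation).lower()
        if word:
            counts[word] = counts.get(word, 0) + 1
    return counts


def most_common(counts, n=5):
    """Return the n most frequent (word, count) pairs (hypothetical helper)."""
    # Decorate: build (count, word) tuples so the count is compared first.
    decorated = [(count, word) for word, count in counts.items()]
    # Sort: tuples compare element by element, so ties fall back to the word.
    decorated.sort(reverse=True)
    # Undecorate: put the word first again.
    return [(word, count) for count, word in decorated[:n]]


# Made-up sample text, not taken from the novel.
sample = "The cat and the dog. The dog and the bird!"
counts = word_frequencies(sample)

print("the" in counts)         # True (dictionary lookup by hash)
print(most_common(counts, 3))  # [('the', 4), ('dog', 2), ('and', 2)]
```

To look at Zipf’s law on a real text, one could feed the contents of a book to word_frequencies and compare each word's rank with its count; that step is left out here.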

image

Alan Jacobs, the author of this book, is an English Professor at Wheaton College, Illinois. Given his position as a professor, his students and other people often ask him, “What are the 10 best books on literature that every educated person must read?” or “Dear Prof, can you suggest some books to read this summer?”. This book is written to answer all such questions. So one might think this book is basically a recommendation type / instructional / didactic guide to reading. Far from it. This is a 150 page long essay on reading at Whim, with no fixed pattern, with only one objective in mind: “Pleasure”.

The book starts off with the author noticing that many people, including his son, are put off by books such as “How to Read a Book”, “How to Read Literature Like a Professor”, “The New Lifetime Reading Plan”, etc. The premise behind all these kinds of books is that reading needs to be systematically carried out and there are certain books that need to be read to appreciate and become good at understanding literature. Most of these books smell of Responsibility, Obligation and Virtue, the very attributes that make people run away from reading. So, he says, reading needs a model that works, i.e. “Read at Whim”. The people who look out for such “10 best books to read” recommendations don’t really want to read a book, but want to check things off from a mental bucket list. They want to say, “Yes, now I am done with this book”. Reading at Whim means reading something that gives you pleasure, i.e. there is nobody that we are signaling to, nobody that we are trying to impress. It is really out of pure enthusiasm that one reads. One usually sees this in children when you give them a book. They read it for the pure joy of it. There are tons of authors out there who feel that reading must not be frivolous, meaning Harry Potter is not a serious book in their opinion. In fact they have this assumed checklist of books that “ought to be read” by a serious reader. The author states that this model is broken, and says “Read at Whim” should be the new model.

Ok, fine. You should read at Whim, so pick up whatever you feel like reading and whatever you think will give you pleasure. Done deal. 20 pages into the book, the author makes this abundantly clear. So, is there any point in going over the remaining 130 odd pages in the book?

Well, the rest of the book is NOT reiterating this message over and over again. “Reading at Whim” is the foundation of the model that the author talks about in the book. If this were the only principle that we followed, we would soon be facing situations such as these

  • Let’s say you like Jane Austen novels and you get joy/pleasure reading her words. However you soon hit a limit. She wrote only six novels. So, once you read these books, you would want to go back and read these books again. But, as we all know, too many rereadings squeezed into too narrow a time frame will drain the books’ power and leave them forever inert on the shelves. So you face this Law of Diminishing Returns when all you want to read are a few select books.
  • Let’s say you like The Lord of the Rings and really loved reading it. Soon, you might start reading books that carry stories similar to The Lord of the Rings. Given the number of books published in a year, it is certain that there will be enough imitators of hits. You might get frustrated reading those books as they fail to match up to the original. Or you might accept them for what they are, imitators of the original, and keep reading. The second behavior can be dangerous as you might start accepting plots less clever, characters less vivid, prose less dynamic and thoughts less insightful.

So, you see, reading at Whim can take us only so far. In this context, the author talks about the second element of the model, i.e. self-knowledge and discernment. These are crucial to develop while reading. These will help you chuck a book midway if you think plodding through the text doesn’t give you pleasure. This also makes you aware of your tastes and preferences. “Self-knowledge and Discernment” are precisely the things that you will not develop if you tend to follow somebody else’s recommendations, maintain a list of books to be read, etc.

“How to Read a Book” and similar guides offload accountability for our reading: they say, implicitly, that self-knowledge and discernment aren’t needful because experts can take care of that for us. But if we reject that implicit claim, the next question that needs to be addressed is, “How to move from ‘blind propensity’ to ‘informed consent’ to ‘Whim’s sovereignty’?” One of the suggestions by the author is to “Read Upstream”, i.e. read books that your favorite authors have read, as they give a peek into your favorite books’ characters, plots and imagination. This kind of upstream reading is also useful in math. You might come across a good application of a technique, but if you read upstream you might get to read about all the trials and tribulations that went into the technique. For example, Baire’s failure in categorizing functions helped Lebesgue in defining measurable and non-measurable functions. If you read ONLY about Lebesgue and don’t look into the development made by Baire, you are likely to miss a lot of action. Reading upstream need not only be about the historical developments behind a technique. It might be about things that make you wonder and become curious about life in general. If you look at Cantor’s math and read about Cantor’s life, what shaped his ideas about infinite infinities, what drove him mad, what made him die alone in an asylum, what made his story tragic but his achievements a mathematical breakthrough, you will forever look upon Georg Cantor in a completely different light.

The author then makes a strong case for annotating the text, i.e. reading with a pencil. When we turn our passive reading style into an active one, the book tends to offer more than what it might seem to in the beginning. There is also a warning against highlighting: highlighters allow you very quickly and easily to mark a text, but only by covering it with a bright color; and the very quickness and easiness of the process are inimical to the kind of active reading that is needed. This point is similar to Dr. Medina’s finding mentioned in his book Brain Rules. By making the initial contact with an idea/phrase/character more elaborate, it is likely that one remembers it better. By reading fast we miss out on the opportunity of elaborate encoding. Obviously this does not apply to every book. One should not read Harry Potter with a pencil; such books are good when the reader goes with the momentum, and the fewer stoppages the better. This means that as a reader, the decision to annotate or go with the flow of the book is important.

Reading Slowly is the next aspect that the author focuses on. Most of us read fast because of the implicit thought that “Time is too short to read all the books”. Yes, time IS short, but one crucial aspect that gets neglected by making reading “a race” is that books become better when they are reread. Unless you annotate and read slowly, your reread would be equivalent to a new read. Reading fast is like having the content uploaded into your working memory: you feel good about it, check off that item from the list, and move on to the next book. Considering the short-term nature of working memory, it’s like all the content is in RAM. Once the application shuts off, RAM is erased. If you want the stuff to get stored in long-term memory, you have to read slowly, annotate and, MOST IMPORTANT, reread. Whenever you have the urge to read a set of blogs / books in quick succession, pause and ask yourself, “Do you want to read?” or “Do you want to have read?”. An honest answer will keep you off the speed track.

Via a Poem from W.H.Auden, the author makes a case for `eye-on-the-object’ look that is needed for getting pleasure from a book, i.e we must cultivate attention while reading. We need to be attentive of words , phrases, characters, etc. so that we can lose ourselves in the process of reading. The poem mentioned in this context is very beautiful and goes like this,

You need not see what someone is doing to know if it is his vocation.

You have only to watch his eyes; a cook mixing a sauce, a surgeon

making a primary incision, a clerk completing a bill of lading,

wear the same rapt expression, forgetting themselves in a function.

How beautiful it is, that eye-on-the-object look.

There is a section that talks about a 12th century Abbot, Hugh’s advice to his monks. Even though it belongs to advice centuries ago, it is equally relevant for people whose motives for reading are far from monastic. Hugh’s advice on humility is relevant to the book as it says the reader should keep in mind three aspects,

  • Hold no knowledge or writing whatsoever in contempt
  • Should not blush to learn from any man
  • When he has attained learning himself, he should not look down upon anyone else.

These lessons mean that one should not only be attentive to what one studies, but also positively disposed towards it: friendly,even affectionate.

Amidst all this discussion about reading, the author takes a radical viewpoint, i.e. schools can never teach students to deep-read. Irrespective of which class a student is in, there is always this feeling that “I will be graded” lurking in his mind. So, the kind of attentiveness that is proper to school is more “hyper attention” than “deep attention”. Look at any kid’s curriculum and you will be amazed at the QUANTITY that is covered as part of the syllabus. With grades and the competitive pressure, can a student deep-read? No, says the author, as reading textbooks and the like does not require extended unbroken focus. It requires discipline, not raptness. I don’t agree with this point. Yes, a student probably can’t deep-read all subjects, but I think focusing on a few subjects and understanding them really well might be better than knowing a bit about all the subjects. Yes, the student might fall behind on the average grade across subjects, but he will graduate from a school or a college with a better frame of mind. However, looking at the way the educational system works in India, I think the author might be right, as such a LOT is taught to and tested from young minds that there is no choice but to cram.

One of the most important points that I found relevant to my reading habits is : Reread. I tend to read math / stats books a lot and I find it imperative to reread them. One benefit of summarizing books and posting the summaries to a blog is that these summaries serve as a starting point when I reread a book. The author makes a strong case for rereading and I quote the author,

If most of us read too fast, most of us also read too many books and are unwisely reluctant to return to something we think we already know. I use "think" here advisedly, because , as my examples show, a first encounter with a worthwhile book is never a complete encounter and we are usually in error to make it a final one. But those who want to have read, who are checking books off their bucket list , will find the thought of rereading even more repulsive than the thought of reading slowly and ruminatively. And yet rereading a book can often be a more significant dramatic and new experience than encountering an unfamiliar work

This visual broadly gives the structure/model explained in the book

clip_image002

 

Takeaway :

We usually read for information or understanding or entertainment. Dismissing all the so-called expert recommendations that one receives on reading, the book has one central message, “Read at Whim”. It warns the reader against turning reading into a “have read” activity.

image

This book contains most of the productivity hacks that one comes across in various articles/blogs/books. In one sense, this book is a laundry list of hacks that one can try out to increase productivity. A big font size for the text and rich images scattered throughout the book make it a coffee table book.

Some of the hacks that I found interesting are,

  • On a daily basis, try to use pen and paper for at least a few minutes to work something out, be it a math problem or a back of the envelope calculation, or simply draw images that capture whatever you are working on. Writing makes one’s relation to work intensely personal, and more so when using a pen and paper.
  • Keep a Swipe file – Your swipe file should contain good ideas and examples from your field of work or interest and from other fields. This hack is spot on for a programmer. You have to keep a folder that lists the tasks and the most efficient code that you have figured out for each task.
  • Force an image or word association with a number and vice-versa. This little hack is very useful for remembering stuff.
  • Declare a MAD – Massive Action Day :  Pick a day and focus on only one task. Switch off TV, Mobile, RSS alerts, email alerts, Internet etc. and work on something for 8-10 hours at a stretch.
  • Keep an “IDEAS BOX” – Twyla Tharp mentions this as a key to her successful career.
  • Make a NOT-TO-DO list. We always seem to know what we need to do and we don’t care or think about what not to do, in a day
  • Use time pods – 45 min time periods where you focus on only one task. Along the lines of the Pomodoro technique, where you work in 25 min time slots.