Python Profiling

I did a presentation at our local Python User Group meeting tonight. It was well received, but shorter than I had expected. I should've added a lot more code examples. 

We talked about usage of cProfile, pstats, runsnakerun and timeit. 

Here are the slides from the presentations: 

The slides were done using latex-beamer, but I wrote the slides in reStructuredText and used rst2beamer to create the tex file which was then converted to pdf using pdflatex. 

The source code for the slides are available on github.

Programming - A Gateway Drug to Math

I decided to try my hand at the Stanford's AI Class. The pre-requisites mentioned Probability and Linear Algebra. So I started watching Probability videos on KhanAcademy

Sal Khan was teaching how to find the probability of 2 heads when you toss a coin 5 times.

A classic nCk problem: 

The probability of getting 2 heads while tossing a coin 5 times is:

But I wanted to find out the probability of getting at least 2 heads when I toss 5 coins.
Its really simple. All I had to do is P(2) + P(3) + P(4) + P(5). 
But then computingby hand (or a calculator) was painfully slow, let alone do it 4 times.
So I wrote two little functions in Python that will calculate factorial (yes I reinvented the wheel) and
Nothing teaches you math faster than trying to write a program to do the math for you. 
Writing a program is the same as teaching the computer how to do a certain task. The only way you can teach someone to do a task is to become a master at doing that task yourself.

Bonus: It also teaches you corner cases like 0! = 1 and  that you wouldn't think of otherwise.

Rant about C++ dependency hell

When was the last time I vented about C++? The answer for that is always:

"TOO LONG AGO". 

The initial friction to setup a substantial project using C++ is unfucking bearable.

When we started code revamp at work recently, I decided to be a good citizen and decided to incorporate cpptest, a unit testing framework in our project.

It made me realize how unreasonably complicated Makefiles can be. After 3 hours of peeling away at the complexity I managed to add cpptest to the build dependency of the project. 

Now time to write a few tests and check it out. I'm thinking "We are almost there". 

FALSE!

Compilation gives me a gazillion error messages that make absolutely no sense. After about 30mins of StackOverflowing and Googling, I find out that its not enough to include string.h and map.h in my  header files, but I also need to namespace it. Of course there is no indication (not even a hint) of that in the error messages. So I add 'using namespace std' and get past it.

Awesome my first test is compiling successfully. Time to run this baby and declare victory. 

Close! But no cigar.

The executable was unable to load the CppTest library during runtime. Argh!

I set my LD_LIBRARY_PATH env variable and now it's running. But I can't ask everyone in my team to do that, so I have to figure out how to statically link that library. 

It's already 6pm and I'm hungry. That'll have to wait for another day. 

TL;DR - C++ and Makefile can burn in a fire of thousand suns.

Rapid Prototyping in Python

I was recently assigned to a new project at work. Like any good software engineer I started writing the pseudocode for the modules. We use C++ at work to write our programs.

I quickly realized it's not easy to translate programming ideas to English statements without a syntactic structure. When I was whining about it to Vijay, he told me to try prototyping it in Python instead of writing pseudocode. Intrigued by this, I decided to write a prototype in Python to test how various modules will come together.

Surprisingly it took me a mere 2 hours to code up the prototype. I can't emphasize enough, how effortless it was in Python.

What makes Python an ideal choice for prototyping:

Dynamically typed language:

Python doesn't require you to declare the datatype of a variable. This lets you write a function that is generic enough to handle any kind of data. For eg:

def max_val(a,b):
    return a if a >b else b

This function can take integers, floats, strings, a combination of any of those, or lists, dictionaries, tuples, whatever.

A list in Python need not be homogenous. This is a perfectly good list:

[1, 'abc', [1,2,3]]

This lets you pack data in unique ways on the fly which can later be translated to a class or a struct in a statically typed language like C++.

class newDataType
{
    int i;
    String str;
    Vector vInts;
};

Rich Set to Data-Structures:

Built-in support for lists, dictionaries, sets, etc reduces the time involved in hunting for a library that provides you those basic data-structures.

Expressive and Succinct:

The algorithms that operate on the data-structures are intuitive and simple to use. The final code is more readable than a pseudocode.

For example: Lets check if a list has an element

>>> lst = [1,2,3]    # Create a list
>>> res = 2 in lst   # Check if 2 is in 'lst'
True

If we have to do it in C++.

list lst;
lst.push_back(3);
lst.push_back(1);
lst.push_back(7);
list::iterator result = find(lst.begin(), lst.end(), 7); 
bool res = (result != lst.end())

Python Interpreter and Help System:

This is a huge plus. The presence of interpreter not only aids you in testing snippets of code, but it acts as an help system. Lets say we want to look up the functions that operate on a List.

>>> dir([])
['__add__', '__class__', '__contains__', '__delattr__', '__delitem__',
'__delslice__', '__doc__', '__eq__', '__format__', '__ge__', 
'__getattribute__', '__getitem__', '__getslice__', '__gt__', '__hash__',
'__iadd__', '__imul__', '__init__', '__iter__', '__le__', '__len__',
'__lt__', '__mul__', '__ne__', '__new__', '__reduce__', '__reduce_ex__',
'__repr__', '__reversed__', '__rmul__', '__setattr__', '__setitem__',
'__setslice__', '__sizeof__', '__str__', '__subclasshook__', 'append',
'count', 'extend', 'index', 'insert', 'pop', 'remove', 'reverse', 'sort']

>>> help([].sort)
Help on built-in function sort:
     
sort(...)
    L.sort(cmp=None, key=None, reverse=False) -- stable sort *IN PLACE*;
    cmp(x, y) -> -1, 0, 1

Advantages of prototyping instead of pseudocode:

  • The type definition of the datastructures emerge as we code.
  • The edge cases start to emerge when you prototype.
  • A set of required supporting routines.
  • A better estimation of the time required to complete a task.

New Laptop

I finally ordered a new Macbook air for myself. One of my friends remarked at the fact that this is the first brand new laptop that I've ordered for myself. Since I'm a bit of a Linux fanatic, I tend to restore old computers and install a linux distro and make them useable. So I always get old laptops for cheap for myself. But this time I decided it's time to checkout Mac OS X. So I'll be replacing my Netboook (yep!) with the Macbook air. Anyone need a Lenovo S10 netbook :). I will even do a clean-install of Ubuntu or your choice of Linux distro. 

This new Macbook air will be my primary development machine. Let's hope it can take the abuse.

Falsetto dude and the Fat man

A typical conversation between my wife and I: 

Playing Ne Me Quitte Pas by Nina Simone

Yoshi : Ow! What is that abomination? 
Amjith: It's a french song sung by the great Nina Simone.
Yoshi : It's a woman? Sounded like a dude singing in falsetto.
Yoshi : I'm leaving the room if you don't change the song.
Amjith: FINE. You play something then.

Yoshi puts on Ave Maria by Pavarotti. 

I wait for 2 mins.

Amjith: Hey! Your fat man seems to be yelling at Maria. 

Yosh leaves the room and I sleep on the couch.

The End

Scripting Tmux Layouts

Tmux is an awesome replacement for Screen. I have a couple of standard terminal layouts for programming. One of them is show below.

  • Vim editor on the left.
  • Top right pane has the bpython interpreter. 
  • Bottom right pane has the bash prompt. 

I have a small tmux script in my ~/.tmux/pdev file that has the following lines

selectp -t 0              # Select pane 0
splitw -h -p 50 'bpython' # Split pane 0 vertically by 50%
selectp -t 1              # Select pane 1
splitw -v -p 25           # Split pane 1 horizontally by 25%
selectp -t 0              # Select pane 0

In my tmux.conf file I have bound <prefix>+P to sourcing this file. So now anytime I want to launch my python dev layout, I hit <prefix>+<shift>+p. 

bind P source-file ~/.tmux/pdev

Downloading Specific Filetypes using 'wget'

I decided to prepare myself for the Intro to AI, a free online course offered by Stanford. I found the course website: http://www.stanford.edu/class/cs221/schedule.html and wanted to download all the slides. 

It's time to pull all the ppt files from that page.

wget -r -A.ppt http://www.stanford.edu/class/cs221/notes/ 

This created a tree of empty directories with one of them that had all the ppt files. Time to clean up the empty folders:

find -depth -type d -empty -exec rmdir {} \;

Ta-da! Empty folders are gone. 

Remember if you are typing a command more than once in succession there is a way to automate it. 

How to Find Local Tech Jobs

Looking for tech jobs can be daunting. Networking is touted as the magic bullet for job seekers. But where do you start?

Here are some robust ways to build your network.

Users Group:

User groups are typically monthly meetings for geeks who get together to talk about their favorite programming language or operating system. Usually they are accompanied with a mailing list which is used to announce the meetings, ask questions and post job openings. So sign up to the mailing list and start attending the meetups. They are full of really nice people who are willing to help. 

  • Utah Python - Utah Python Users Group
  • URUG - Utah Ruby Users Group
  • SLLUG - Salt Lake Linux Users Group
    • SLLUG-JOBS - Mailing list to announce job postings
  • PLUG - Provo Linux Users Group

Local Conferences:

Most cities have some tech conferences that are a great source for networking. I found out about a lot of the user group by going to one of the following conference.

UTOSC - Utah Open Source Conference.

HackUTOS  - Utah Open Source Project Day - Geeks, snacks and open source.

LaunchUp - A local entreneurship clinic. A great way to learn about the local start-up scene. You can meet new CEOs and fresh companies looking to hire tech talent. A must for job-seekers.

I hope this helps someone.

Contributing to Open Source

Last week I successfully submitted my first patch to an open source project and it was accepted. 

I like the bpython interpreter for all my python needs. It is quite handy for a python newbie like me. A few weeks ago I was in the middle of building an elaborate datastructure to learn list comprehension in python, when bpython crashed and took all the history with it. I whined about it on twitter and one of the developers of the project prompted me to submit a bug report. I was quite impressed by the fact that a core developer of bpython replied to my bitching on twitter.

After I filed the bug report, I decided to get the source code and poke around. I finally implemented a feature that saved the history after each command instead of waiting till the end of a session. 

The following factors were the main impetus that led me to contribute to the project. 

Project Hosting: 

The project was hosted on bit bucket which is a Github equivalent for mercurial. This makes it so easy to fork a project and issue pull requests, compared to the traditional source forge model of submitting patches in a mailing list. The social coding sites like Github and BitBucket have reduced much of the initial friction in starting an open source project.

Project Size:

This one has a huge impact when I decide to dive into the code. Traditional C projects tend to have a ton of files that are too big which is daunting for a beginner. The bpython project was written in python and had a total of 13 .py files. This makes it dead simple to make a quick change and run the project without compiling it. Again the choice of language has a lot to do with this. 

IRC:

The welcoming nature of the community around a project does a lot to encourage a new comer. The IRC channels are a great way to interact with the developers compared to a passive form of communication such as emails. I jumped on #bpython irc channel and started asking questions when I ran into an issue with bpython source code. People on that channel are really helpful and prompt in answering questions.

Persistence:

My first pull request was scrutinized by the core developers and some suggestions for improvements were given. During that process I learned a lot about code review and how to check for corner cases. Finally after I made all those improvements the pull request was accepted and merged with the main repo. So having a beginners mind (no ego) is an absolute must when getting started on any project. Don't be discouraged if your first attempt is unsuccessful. 

Now I'm proud to say my name is listed in the AUTHORS file of bpython project.