New & revamped


Because of constant badgering from loyal users (Navdeep, Chinmay, Hari, Sandip, Vidyaraj, Kartik, Vivek, Arjun, Leo, Ravi and others), I finally had to dedicate a weekend to fixing up the site. However, instead of just fixing up the old code base, I rewrote it to use Compojure instead of the deprecated Noir library, and along the way, I redid some of the code design to make it more flexible for editing and debugging.

I can’t believe people still use the site 3 years after I wrote the first version and put it up, especially since so many comparison-shopping sites for India have launched in these 3 years that are more functional and cover more categories. But, hey, can’t argue with those users :)

Caveats: It’s still a work-in-progress – the JSON API, etc. are still not present in this version, and more ecommerce stores have to be added; I will work on those going forward. And I can use the help if anybody has time, it’s open-sourced at

We have “database APIs” such as abstraction layers over multiple SQL databases and ORMs. But why not take it to the next step and make it a REST API like any other network call that we can make?

Database as a REST API

Advantages would be:

Did we just sort-of reinvent Datomic?

Of course, this is not a new idea at all, take restSQL as an example – my question is why is this not talked about more often?

Do most frameworks support this? If not, why not? If so, why don’t most frameworks talk about such a use case in their documentation? If I use Django, I’ll start writing the models and use South to create migrations, and that’s that. If I have to reuse those models from, say, Java, then I’m on my own. The point is that, by default, Django (or Rails) doesn’t encourage you to do such a thing. If you go for a lighter framework such as Flask, then this becomes easier because the ORM is not part of the framework anyway.
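To make the idea concrete, here is a minimal, hypothetical sketch of “database as a REST API” using only Python’s stdlib sqlite3: HTTP-style verbs are mapped onto SQL so that any language could reuse the same schema over a plain network call. The table, columns and the `handle` dispatch function are made-up stand-ins, not any framework’s actual API.

```python
import sqlite3

# In-memory database standing in for a real one.
conn = sqlite3.connect(":memory:")
conn.row_factory = sqlite3.Row
conn.execute("CREATE TABLE books (id INTEGER PRIMARY KEY, title TEXT)")

def handle(method, table, body=None):
    """Dispatch a REST-style request to SQL: POST inserts, GET lists."""
    # NOTE: the table name is interpolated only for illustration; a real
    # service would whitelist table names to avoid SQL injection.
    if method == "POST":
        cur = conn.execute(
            "INSERT INTO %s (title) VALUES (?)" % table, (body["title"],))
        return {"id": cur.lastrowid}
    if method == "GET":
        return [dict(row) for row in conn.execute("SELECT * FROM %s" % table)]
    raise ValueError("unsupported method: " + method)

created = handle("POST", "books", body={"title": "SICP"})
listing = handle("GET", "books")
```

The point of the sketch is that the client never touches an ORM or a driver – it only knows verbs, resource names and JSON-shaped dicts, which is exactly what makes it reusable from any language.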

Is this concept needed only in a polyglot setup (multiple database systems, multiple programming languages)?

P.S. Also read Stevey’s Google Platforms rant.

Update on [2013-04-28 Sun]: Also see the very useful tech talk Designing a Beautiful REST+JSON API.

Recently the Obama re-election campaign employed similar population-wide behavioral analytics to micro-target voters to ensure his re-election. There is no reason why we in India must not look to technology to devise ingenious methods for near real time data collection and population-wide analytics of social performance. This will not only help micro-target and localise welfare Interventions by local Governments (as opposed to centralised schemes) but it will also shift the focus away from agenda-driven politicking based on lagging indicators and towards a debate on actionable interventions that can make a difference here and now.

I love this idea in this article in Niti Central on how we can leverage technology for social welfare.

The bit about Obama’s re-election campaign is best read about in this article called The nerds go marching in, although I still haven’t been able to find specifics about the tools that they built.

P.S. Also see Wikipedia article on Microtargeting.

An HBR article titled Smartphones, Silly Users perfectly describes why I have moved my personal information management system away from apps that sync across desktop and mobile:

  1. “We don’t remember anything anymore.”
    • “We’re increasingly outsourcing our personal memory banks to Google and other search engines, effectively wiping our own brains of easily accessible information.” a.k.a. the Google effect
  2. “We waste time preserving optionality.”
    • “We’re refusing to finalize our plans until critical moments. The ability to make reservations, check opening hours, look up driving directions, and review ratings on our mobile devices means that we’re increasingly iterating our schedules and keeping our options open until the very last moment before that meeting, lunch, or coffee catchup is set to begin.”
  3. “We get stuck in the infinite notification loop.”
    • “As we endlessly loop between Facebook, Twitter, LinkedIn, and other app notifications, our attention fragments, and it becomes difficult to focus on larger, more important tasks.”

Till this month, I was obsessed with syncing everything across my desktop and mobile. The problem was that I became unnecessarily obsessed with the mobile phone, and once you’re using the phone, point no. 3 kicks in – the infinite notification loop swallows a lot of time and attention.

Once I shifted my system to laptop-only, I no longer had all my tasks and calendar at hand, and I was forced to remember things (see point 1 above). Strangely, I’m now more likely to remember things to pick up from the grocery store than I ever was to remember to check my mobile phone app for things to buy when I was near one!

The most important thing is that notes and todos are in the same place. For example, if I’m on a call, I can take notes and then keep referring back to those notes while creating todos and working on tasks. The tasks come out of notes; they’re not separate! It really helps to have one system that handles and encourages this natural flow instead of forcing you to use separate notes and tasks apps.

Today, I’m all OrgMode. Again.


Recently, I finished reading the latest “early access” version of the Big Data Book by Nathan Marz.

What is Big Data

Let’s look up Wikipedia:

In information technology, big data is a collection of data sets so large and complex that it becomes difficult to process using on-hand database management tools or traditional data processing applications. The challenges include capture, curation, storage, search, sharing, analysis, and visualization.

So, Big Data is relevant for any technical or business person whose company deals with lots of information and wants to make use of it – think of the scale behind Gmail search, for example.

Why this book is awesome

The book has been a fascinating and engaging read for me for two reasons:

First, it takes a strong and simple “first principles” approach to the architecture and scalability problem, as opposed to the (to me) confusing, mushrooming complexity of the Big Data world and its treatment of Hadoop as a panacea.

Second, Nathan Marz was one of only 3 engineers who built the BackType search engine (the company was acq-hired by Twitter):

BackType captures online conversations, everything from tweets to blog comments to checkins and Facebook interactions. Its business is aimed at helping marketers and others understand those conversations by measuring them in a lot of ways, which means processing a massive amount of data.

To give you an idea of the scale of its task, it has about 25 terabytes of compressed binary data on its servers, holding over 100 billion individual records. Its API serves 400 requests per second on average, and it has 60 EC2 servers around at all times, scaling up to 150 for peak loads.

It has pulled this off with only seed funding and just three employees: Christopher Golda, Michael Montano and Nathan Marz. They’re all engineers, so there’s not even any sysadmins to take some of the load.

Note: BackType’s (now open sourced) real time data processing engine Storm powers Twitter’s analytics product and real-time trends among other things.


When my wife was editing my books, she used a live-preview app for the text so that she knew what the output was going to be like. The caveat was that it does plain Markdown and not Pandoc format, which meant the preview would get screwed up whenever there was a code block, etc. So, this morning, I hacked up an app called “Kalam” which does exactly that – live preview for Pandoc text.

The app is built on top of node-webkit (which I came across when I was wondering what Light Table is built upon), created by Roger Wang and others at the Intel China Open Source centre. They’ve basically integrated node.js into WebKit and disabled all the security restrictions, which makes it an almost-perfect cross-platform desktop toolkit – write HTML, CSS and JavaScript, and use any node.js module!

Update: There’s also AppJS, which is the same concept as node-webkit but looks more polished (via @aravindavk).

I’m astounded at how popular Whatsapp (the messaging app for phones) really is:

  • My wife’s friend who runs a boutique went to an old market to buy cloth material for her shop – the salesman asked her to send a picture of the specific color she wants via Whatsapp. Think of an old dusty market and think of this again.
  • My wife’s friend who is a recent mom talks to her paediatrician via whatsapp for advice and general questions, and the doctor replies back (regardless of location).
  • An uncle and aunt in US go for shopping in the big malls and send photos to each other of whether they should pick up that item or not.
  • My uncle and aunt were in town and we went shopping – again, we sent photos of the T-shirts to their son in another town to ask whether he likes the shirt enough to buy it – decision done, shirt bought, no risk of a T-shirt going unused.
  • Recently, there was an incident in Bangalore because of which SMSes were restricted – which is like a heart attack for teenagers, including my sister. They all downloaded Whatsapp and shifted to it in an instant: “SMS costs – be gone!” Luckily, Whatsapp was free in the iTunes app store for a day or so around the same time, and my sister, who is using my old iPhone (still working after 4 years) and does not have a credit card, grabbed it with eager fingers and is loving it.

I could go on and on; the point remains that its pervasiveness still surprises me. And it surprises many of my peers who grew up with email, Yahoo! Messenger, Google Talk, etc.

So I was imagining what could be the reasons that Whatsapp is so popular, and here are some wild guesses:

  • 2G on mobile is finally affordable? Now that 3G is more common and has been around for a couple of years, the slower predecessor has finally become cheap enough.
  • WiFi is more common now?
  • BlackBerry “BB-PIN” popularized the concept of instant messaging to a new phone-using generation, but people needed something cross-platform and Whatsapp was in the right place at the right time?
  • Whatsapp is available on most mobile operating systems including many older generation platforms such as Symbian, so people are not left out of the conversation.
  • Why didn’t people simply use GTalk? I’m guessing it’s because of the “create a Google account” barrier as well as GTalk not being as feature-full?
  • Talking about features – groups, photo sharing and video sharing are natural extensions that were just meant to happen, and Whatsapp makes them free (as opposed to SMS/MMS)
  • The details in Whatsapp are great – for example, every message has two ticks: the first says it has gone from your phone to the server, the second shows it has gone from the server to the other person’s phone – an in-built message delivery status, as opposed to guessing whether an SMS has reached the other person
  • Did I mention how useful the groups feature is? I’m keeping in touch with friends all over the world through it – in particular, one group has people in the USA, Singapore and India, all having a conversation at the same time.
  • So why didn’t email do the same? Because people have a work-email/personal-email distinction, whereas a phone is undoubtedly personal? Because people don’t like to differentiate between a subject line and a body (don’t laugh, I did too until I realized this is actually a barrier)? They just want to “chat”, because they are already familiar with SMS.
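The two-tick delivery status mentioned above can be pictured as a tiny state machine. This is purely my illustration of the idea, not Whatsapp’s actual protocol:

```python
# Toy model of two-tick delivery status: the first tick when the server
# receives the message, the second when the recipient's phone does.
class Message:
    STATUS = {0: "sending", 1: "reached server", 2: "reached recipient"}

    def __init__(self, text):
        self.text = text
        self.ticks = 0

    def server_received(self):
        # First tick: the message left your phone and hit the server.
        self.ticks = max(self.ticks, 1)

    def recipient_received(self):
        # Second tick: the server handed it to the other phone.
        self.ticks = 2

    def status(self):
        return self.STATUS[self.ticks]

msg = Message("Which color T-shirt?")
msg.server_received()
msg.recipient_received()
```

The contrast with SMS is that the sender’s app always knows which of these states a message is in, so there is no guessing.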

These are just my rough lines of thought about this whole thing; I wanted to write it down because many of my friends have asked the same question.

What are your thoughts?

Some people, when confronted with a problem, think “I know, I’ll use a wiki.” Now they have no idea how many problems there are.

Andrew Clay Shafer

The above quote summarizes my experience over the past years of having converted the A Byte of Python book into a wiki. I was hoping to make it much easier for readers to make corrections and contributions to the book. The only thing it enabled was tons of spam.

The second issue was that lots of readers kept emailing me to ask about ebook and kindle versions of the book which I could not do because it was stuck in a wiki format.

The solution was obvious – Pandoc – but it just seemed too daunting a task and hence I delayed it for years. It has been done now, thanks to my wife who did the conversion from the earlier MediaWiki syntax.

I also wrote a small Fabric file to update the websites in a single command invocation – now I can edit the book, run one command, and it will update the book’s chapter pages on the WordPress site as well as generate and upload the PDF and EPUB files stored in AWS S3.
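The one-command workflow could look roughly like this – a stdlib-only sketch rather than my actual fabfile, where every file name, bucket name and command is a hypothetical stand-in:

```python
import subprocess

# Each step of the release: regenerate PDF/EPUB with Pandoc, then copy
# the outputs to S3. (Hypothetical names throughout.)
STEPS = [
    ["pandoc", "book.md", "-o", "byte-of-python.pdf"],
    ["pandoc", "book.md", "-o", "byte-of-python.epub"],
    ["aws", "s3", "cp", "byte-of-python.pdf", "s3://example-bucket/"],
    ["aws", "s3", "cp", "byte-of-python.epub", "s3://example-bucket/"],
]

def update_book(dry_run=False):
    """Run each step in order; with dry_run=True, only report commands."""
    executed = []
    for cmd in STEPS:
        if not dry_run:
            subprocess.check_call(cmd)
        executed.append(" ".join(cmd))
    return executed
```

Fabric wraps essentially this – a list of shell steps behind one command – and adds remote execution on top.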

I have also made a few quick changes to the text:

  • Overhauled installation and first steps chapters for Python 3 and explaining how to find and open a terminal application in detail
  • Recommending newbies to start with ActiveState Komodo Edit editor and instructions on how to use it to create and run a source file
  • Removed unnecessary sections such as nonlocal, metaclasses, exec and eval, etc.
  • Moved escape sequences in strings to the ‘More’ chapter at the end; it was an unnecessary hurdle

I haven’t done an exhaustive review of the text yet, because frankly, that is draining, but I hope this is the start of me responding to readers’ suggestions on how to improve the book.

Enjoy the new EPUB and PDF formats. The sources are open; Kindle edition, etc. coming soon.

I once happened to attend a RubyConfIndia talk by C42’s Steven Deobald who said:

data > functions > macros > compilers

That stuck in my head even though I didn’t know what it meant at the time. I understood it only after learning Clojure and “The Clojure / Lisp way”. I realized it when I was writing Python code for work and suddenly noticed I was writing code differently – I had one of those good aha moments that is supposedly the start of a person’s Lisp journey.

I’m now amused at how often I break down my Python or Java code into lots of little functions instead of the 100-liner functions that I used to write, and I’m still surprised that I never realized I was writing those! The good thing about “lots of little functions” is the modularity and the ease with which I can write, read, understand and, importantly, test the code without having to build an object hierarchy first.

For example, my code has now suddenly started looking like the snippet below, where the data structure is written down explicitly and the processing code is separate from it – this makes the code really reusable. It is a contrast to my earlier programming style, where I would probably have had the data structure implicit in the parsing code (which makes it less maintainable) or, worse, had classes and objects to do the same, and it would certainly not have been so reusable! Think of a typical Java programming workflow where I would have had to create a class to represent the data input, pass that to a processor class instance, and so on.

import xlrd

DATA_SHEET_NUMBER = 0  # the sheet that holds the data
START_ROW = 3  # skip headings

# Explicit structure of the data
COLUMN_MAPPING = {
    'name': 0,
    'class': 1,
    'maths': 2,
    'geography': 3,
    'english': 4,
}

def row_to_dict(sheet, row_number):
    assert isinstance(sheet, xlrd.sheet.Sheet)
    assert isinstance(row_number, int) and row_number > 0 and row_number < sheet.nrows
    # Code that will work with changing structure
    return dict([(key, sheet.cell_value(rowx=row_number, colx=COLUMN_MAPPING[key])) for key in COLUMN_MAPPING.keys()])

def import_excel(content):
    book = xlrd.open_workbook(file_contents=content)
    sheet = book.sheet_by_index(DATA_SHEET_NUMBER)
    # Code that will work with different spreadsheet formats
    sheet_data = [row_to_dict(sheet, row_number) for row_number in range(START_ROW, sheet.nrows)]
    sheet_data = [data for data in sheet_data if len(data['name']) > 0] # Ignore empty rows
    return sheet_data

if __name__ == '__main__':
    from pprint import pprint
    pprint( import_excel(open('test.xls', 'rb').read()) )

To be clear, Python was a good first step, what changed was the mindset after attempting to learn a Lisp language. As Peter Norvig once said:

Basically, Python can be seen as a dialect of Lisp with “traditional” syntax (what Lisp people call “infix” or “m-lisp” syntax). One message on comp.lang.python said “I never understood why LISP was a good idea until I started playing with python.” Python supports all of Lisp’s essential features except macros, and you don’t miss macros all that much because it does have eval, and operator overloading, and regular expression parsing, so some–but not all–of the use cases for macros are covered.

A good friend of mine once said that Python is more popular because it is more approachable by traditional programmers and hence a more “social” programming language, whereas Lisp is a powerful language but not for everyone. That is explained in detail in the Lisp Curse essay.

So the first good thing about Clojure is that it is a Lisp. The second is that it runs on the JVM, which has solid performance – sometimes 20x better if you use it right. The third is solid Java interoperability. This was important to me because, as a consultant, Java is unavoidable, and I’ve written more Java code this year than I ever have. Using a good dynamic language on top of the JVM with good Java interoperability is a path to making my work go faster. At least, that was how I got started. After all, your code will end up reflecting your company.

The downside I felt when I was grokking Clojure is that the syntax is not simple, even though that is the claim of traditional Lisps. For example, #"" is a regex, #{} is a set, #_() elides the form (the compiler checks the code but acts as if it were commented out), #() is an anonymous function, #' derefs to vars, and so on.

Here is a quick idea about Clojure’s philosophies that I was pointed to:

[Image: Clojure’s three circles]

Another interesting point is that functional programming languages are growing in popularity, probably because the future is DSLs again.

If you’re still not convinced, you should watch The Curious Clojureist. And you should definitely watch all the Rich Hickey talks.

How to learn Clojure

The O’Reilly Clojure book is the best book that I’ve come across yet.

However, and equally importantly, my strong recommendation is to use Clojure only in combination with Emacs and ghoseb’s emacs setup. After learning Clojure in that environment, writing Python again makes me miss so many goodies. (To get some of the same productivity, I’m using PyCharm these days and am enjoying it.)

To make my learning solid, I rewrote the site for the third time, in Clojure. The source code is open – be prepared to read some amateurish Clojure code.

I got a lot done in ~280 lines of Clojure code, compared to 480+ lines in Ruby/Rails plus a ton more boilerplate. This difference in the number of lines of code repeats often.

One interesting point: because of the Clojure way of thinking, I ended up using a simple combination of future and core.cache to fetch prices from book stores in parallel, rather than bringing in a full-fledged background-jobs processor (delayed_job) to do that, which vastly simplified the system. You can read that code in stores.clj.
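For readers who don’t read Clojure, here is a rough Python analog of that future + core.cache approach, using concurrent.futures and a plain dict as the cache. The store names, prices and the fetch function are all made up for illustration:

```python
import concurrent.futures

# Memoized results, keyed by (store, isbn) - the core.cache analog.
_price_cache = {}

def fetch_price(store, isbn):
    # Stand-in for a real HTTP request to the store's site.
    return {"flipkart": 450, "amazon": 475}[store]

def prices_for(isbn, stores=("flipkart", "amazon")):
    """Fetch all store prices for an ISBN concurrently, with caching."""
    with concurrent.futures.ThreadPoolExecutor() as pool:
        # Fire off one lookup per store that isn't already cached - the
        # `future` analog.
        pending = {
            store: pool.submit(fetch_price, store, isbn)
            for store in stores if (store, isbn) not in _price_cache
        }
        for store, fut in pending.items():
            _price_cache[(store, isbn)] = fut.result()
    return {store: _price_cache[(store, isbn)] for store in stores}
```

The appeal is the same as in the Clojure version: no job queue, no worker processes – just in-process concurrency plus a cache.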

Ending Thoughts

I got started on this journey because of frustrations with Java, and at the same time I was trying not to be narrow-minded with experience in just the Python/Ruby/Perl languages (they are so similar). I kept reminding myself of what Douglas Crockford said:


That’s an easy one—lack of curiosity. They were so satisfied with the work that they were doing was good enough (without an understanding of what ‘good’ was) that they didn’t push themselves.

I’m much more impressed with people that are always learning. The brilliant programmers I’ve been around are always learning.

You see so many people get into one language and spend their entire career in that language, and as a result aren’t that great as programmers.

Programming languages becoming popular is almost never about the merits of the language itself; rather, it’s a virtuous cycle of availability of programmers or platform requirements – JavaScript and Objective-C are popular because you have no other choice, not only because of the merits of the language. Similarly, Clojure leverages the JVM and whatever native platform it runs on, and hence gets the initial lift needed to make the language appealing, since people don’t want to learn and start on yet another ecosystem.

This is best explained by Alan Kay himself:

Q: What should Java have had in it to be a first-quality language, not just a commercial success?

Alan Kay: Like I said, it’s a pop culture. A commercial hit record for teenagers doesn’t have to have any particular musical merits. I think a lot of the success of various programming languages is expeditious gap-filling. Perl is another example of filling a tiny, short-term need, and then being a real problem in the longer term. Basically, a lot of the problems that computing has had in the last 25 years comes from systems where the designers were trying to fix some short-term thing and didn’t think about whether the idea would scale if it were adopted. There should be a half-life on software so old software just melts away over 10 or 15 years.

It was a different culture in the ’60s and ’70s; the ARPA (Advanced Research Projects Agency) and PARC culture was basically a mathematical/scientific kind of culture and was interested in scaling, and of course, the Internet was an exercise in scaling. There are just two different worlds, and I don’t think it’s even that helpful for people from one world to complain about the other world—like people from a literary culture complaining about the majority of the world that doesn’t read for ideas. It’s futile.

Did you know that Lisp and Smalltalk fell out of vogue because they were killed by bad hardware!?

Alan Kay: Yes, actually both Lisp and Smalltalk were done in by the eight-bit microprocessor—it’s not because they’re eight-bit micros, it’s because the processor architectures were bad, and they just killed the dynamic languages. Today these languages run reasonably because even though the architectures are still bad, the level 2 caches are so large that some fraction of the things that need to work, work reasonably well inside the caches; so both Lisp and Smalltalk can do their things and are viable today. But both of them are quite obsolete, of course.

Lastly, I wanted to mention that my Clojure journey would not have been sustained if it wasn’t for Baishampayan Ghose (a.k.a. @ghoseb, a.k.a. BG), whose untiring answers to my dumb questions were instrumental in my finally gaining some understanding of Clojure and Lisp in general. Thanks BG!

P.S. Watch this 2011 talk by Alan Kay. As @ghoseb would say, Be prepared to blow your mind.