Carlos Bueno’s Mature Optimization Handbook was an excellent, concise read over the weekend.
The handbook is about the philosophical approach to measuring, optimizing, and monitoring the performance of our backend systems. The author speaks from experience: he was part of the Facebook Performance Engineering team.
The good part about the book is that it gives you the right mindset to start thinking about the problem. The bad part is that putting it into practice after reading it feels like diving off the deep end of a swimming pool. You may drown in the jargon and graphs, or you may learn to navigate the waters. I’m still at the edge of the pool myself, so we’ll see how it goes.
A couple of favorite excerpts from the book:
About our predictions of performance:
Our ability to create large & complex systems fools us into believing that we’re also entitled to understand them. I call it the Creator Bias, and it’s our number-one occupational disease. Very smart programmers try to optimize or debug or capacity-plan without good data, and promptly fall right out of the sky. How a program works and how it performs are very different things… Never forget that our human-scale understanding of what’s supposed to happen is only a very rough approximation of what actually does happen, in the real world on real hardware over real users and data.
About basics of computer performance:
Remember that a computer really does only two things: read data and write data. Performance comes down to how much data the computer must move around, and where it goes. Throughput and latency always have the last laugh. This includes CPU instructions, the bits and bytes of the program, which we normally don’t think about.
The kinds of computers in use today have four major levels of “where data goes”, each one hundreds to thousands of times slower than the last as you move farther from the CPU.
Registers & CPU cache: 1 nanosecond
RAM: 10^2 nanoseconds
Local drives: 10^5 to 10^7 nanoseconds
Network: 10^6 to 10^9 nanoseconds
Memory controllers try mightily to keep the first level populated with the data the CPU needs because every cache miss means your program spends 100+ cycles in the penalty box. Even with a 99% hit rate, most of your CPU time will be spent waiting on RAM. The same thing happens in the huge latency gap between RAM and local drives. The kernel’s virtual memory system tries to swap hot data into RAM to avoid the speed hit of talking to disk. Distributed systems try to access data locally instead of going over the network, and so on.
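To make the "99% hit rate" remark concrete, here is a back-of-the-envelope sketch in Python. It is my own simplified model, not something from the handbook: it assumes every access costs either 1 ns (a cache hit) or 100 ns (a trip to RAM), using the round figures from the table above, and it ignores everything else the CPU does.

```python
# Round latencies from the excerpt's table. The model itself is an assumption:
# each access is either a cache hit or a full trip to RAM, nothing in between.
CACHE_NS = 1.0    # registers & CPU cache
RAM_NS = 100.0    # RAM, 10^2 nanoseconds


def average_access_ns(hit_rate, cache_ns=CACHE_NS, ram_ns=RAM_NS):
    """Expected cost of one memory access for a given cache hit rate."""
    return hit_rate * cache_ns + (1.0 - hit_rate) * ram_ns


def ram_wait_fraction(hit_rate, cache_ns=CACHE_NS, ram_ns=RAM_NS):
    """Share of total memory-access time that is spent on cache misses."""
    miss_time = (1.0 - hit_rate) * ram_ns
    return miss_time / average_access_ns(hit_rate, cache_ns, ram_ns)


if __name__ == "__main__":
    for rate in (0.90, 0.99, 0.999):
        print(rate, average_access_ns(rate), ram_wait_fraction(rate))
```

Under these assumptions, a 99% hit rate still leaves about half of all memory-access time going to the 1% of accesses that miss, and a larger miss penalty pushes that share higher. The numbers are illustrative only; real hardware has several cache levels with different latencies.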
The handbook goes on to explain instrumentation with samples and metrics, data storage, visualization, monitoring and diagnosis, and feedback loops.
I do wish the book had ended with some recommendations for software to get started with, such as StatsD and Graphite, or at least some pointers in the right direction.
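For a taste of what getting started with StatsD might look like, here is a minimal Python sketch of its plain-text wire format, where each metric is a small UDP datagram of the form name:value|type. This is my own illustration, not something from the handbook. The metric names are hypothetical, and the host and port assume a StatsD daemon listening locally on its conventional default port, 8125.

```python
import socket


def format_metric(name, value, metric_type):
    """Build one StatsD line: <name>:<value>|<type>.

    Common types are "c" (counter), "ms" (timer) and "g" (gauge).
    """
    return f"{name}:{value}|{metric_type}"


def send_metric(line, host="127.0.0.1", port=8125):
    """Fire-and-forget UDP send. Host and port are assumptions about your setup."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(line.encode("ascii"), (host, port))


if __name__ == "__main__":
    # Hypothetical metric names, for illustration only.
    send_metric(format_metric("api.requests", 1, "c"))
    send_metric(format_metric("api.response_time", 320, "ms"))
```

Because the transport is UDP, nothing here confirms that a daemon received the metric. In practice you would more likely reach for an existing client library than hand-roll the socket code.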
Go download the free book and give it a read.