Please Optimize

Every now and then, I find quotes against optimization, like this one:
This is quite surprising, since in many cases the speed of a program is fundamental to its success. A UI (or a website) cannot be slow or become unresponsive in some situations. Managing a huge amount of data in a few seconds instead of minutes can make the difference between a top seller and an unwanted app.

[A similar reasoning may apply to RAM or disk space too, but in this post I'll focus on execution time.]

The only quote I totally agree with is
premature optimization is the root of all evil.
- Donald Knuth
The explanation comes just a few lines below, in the same paper.

(You can download the paper at this link.)

[A good programmer] will be wise to look carefully at the critical code; but only after that code has been identified.

Identify The Critical Code

It's not always easy to understand where the bottlenecks are. A developer with enough experience may guess which parts of the code need to be optimized, but:
  • he cannot be sure (scientifically speaking), and
  • he needs a measure of the improvements.
For this reason, you need to measure the duration of (almost) every operation, taking care to feed the application with a realistic set of data. Another good practice is to collect many samples for every dataset and calculate the average, in order to remove the noise produced by other processes running in the same environment.

After analyzing the results, you can start making changes to the parts that take the longest, possibly one at a time. And then, measure again. Was your modification faster? Good job! Move on to another part. Was it slower? Try another solution.

Now you may want to know which tool to use to take the measurements. There are many performance analyzers out there, but I prefer to collect timestamps from the right places (see the sketch after the list below).

There are three reasons behind this choice:
  1. it forces me to review the code, and this is important because I'll have its structure in mind when I start to make changes;

  2. some profilers are not very accurate (for example, they return an estimate of which functions take the most execution time, but cannot tell you whether that's because they have been called a million times);

  3. I have great control over the measured code, so once I've identified a slow function, I can add more timestamps.
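To make this concrete, here is a minimal sketch of the timestamp approach, assuming a POSIX system with clock_gettime(); work_under_test() and the sample count are illustrative placeholders for the code you actually want to measure.

#include <stdio.h>
#include <time.h>

/* Illustrative placeholder for the code under test. */
static void work_under_test(void)
{
        volatile unsigned long sum = 0;
        for (unsigned long i = 0; i < 10000000UL; i++)
                sum += i;
}

int main(void)
{
        const int samples = 10;  /* several runs, to average out noise */
        double total_ms = 0.0;

        for (int s = 0; s < samples; s++) {
                struct timespec start, end;

                clock_gettime(CLOCK_MONOTONIC, &start);
                work_under_test();
                clock_gettime(CLOCK_MONOTONIC, &end);

                total_ms += (end.tv_sec - start.tv_sec) * 1000.0
                          + (end.tv_nsec - start.tv_nsec) / 1e6;
        }

        printf("average: %.3f ms over %d samples\n",
               total_ms / samples, samples);
        return 0;
}

I use CLOCK_MONOTONIC here because, unlike wall-clock time, it isn't affected by system clock adjustments while the measurement is running.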

How Much Should I Optimize?

Even if it may seem a silly question, there are many different levels of optimization. But the most important thing to consider is that the compiler usually has its own strategies to compile and optimize our code. For this reason, what looks like a great improvement in the source may, once compiled, not make any difference. This is why it's a good idea to compile with optimizations turned on before measuring.
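For example, with GCC you can build the same source at different optimization levels and compare the measurements; the file names here are just placeholders:
gcc -O0 -o myprog_baseline myprog.c
gcc -O2 -o myprog_optimized myprog.c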

In addition, please consider code readability, otherwise there is the risk that in the future another developer will get rid of your efforts just because the code is too hard to understand. If this kind of optimization is really needed, use comments to explain why you wrote such obscure code.

Believe it or not, this happened to me too: there was a really complicated block of code (with no comments explaining it) that I replaced with a few lines, only to roll back the change once I saw the execution time.

Horror Code - Loop And Re-loop

Some time ago, a colleague of mine told me to look at a function. It was something similar to this:
void foo(struct bar array[], unsigned int count)
{
        /* some initialization code */

        for (unsigned int i = 0; i < count; i++) {
                /* 30 rows of code
                   doing something with array[i]*/
        }

        for (unsigned int i = 0; i < count; i++) {
                /* other 20 rows of code
                   doing something with array[i]*/
        }

        /* some cleanup code */
}
At first, I thought that the first loop calculated some data needed by the second. But after a closer look, I found that this was not the case. Furthermore, I saw that the first five or six lines of both loops were the same.

The explanation is that the second loop was added years after the first, by a different developer who didn't want to waste time understanding what the first loop did. You may not like it, but it works, unless you have performance issues. Personally, I think there are funnier ways to make two loops in a row.

[Image: the Corkscrew roller coaster at Cedar Point]

Don't Wait For Bad Things To Happen

Things only get as bad as you are willing to let them.
This has happened to me so many times that I'm starting to think bad luck is real. The situation is the following: a product has been on the market for several years and everything works fine. At some point, a customer reports a strange behavior. You start to look into the problem and find a horrible bug that has been there since the beginning. In the time it takes you to think of a solution, implement it and test it, at least two other customers report the same issue.

How is it possible? How is it possible that for years everything worked fine, and then in one week three different people find the same bug? The only answer I have is...

Murphy's Law

There are several versions of it, but I believe it can be summarized this way:
If anything can go wrong, it'll go wrong in the worst possible way.
This may seem pessimistic, but knowing that every bug can be potentially catastrophic can help us be more focused and more critical about our code. What I've seen frequently is cutting corners to meet deadlines (yes, Your Worship, I'm guilty too) with the promise (to whom?) of doing the right thing in the future. But usually that future comes when it's too late and a customer has already found the problem.

The only way I know to prevent this kind of issue is to plan periodic reviews of the code, which can lead to refactoring sessions. Another idea may be to have a checklist of things to verify before putting your program in production. For C programs it may be something like this (a short sketch follows the list):
  • no strcpy() allowed - use strncpy()
  • no sprintf() allowed - use snprintf()
  • check for NULL pointers
  • check for memory leaks
  • ...
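As a minimal sketch of the first three items, assuming nothing beyond the standard C library (store_name() and the buffer sizes are illustrative):

#include <stdio.h>
#include <string.h>

/* Bounded copy with an explicit NULL check; note that strncpy()
   does not always null-terminate, so we do it ourselves. */
void store_name(char *dst, size_t dst_size, const char *src)
{
        if (dst == NULL || src == NULL || dst_size == 0)
                return;                   /* check for NULL pointers */

        strncpy(dst, src, dst_size - 1);  /* no plain strcpy() */
        dst[dst_size - 1] = '\0';
}

int main(void)
{
        char name[16];
        char msg[64];

        store_name(name, sizeof(name), "a name that is far too long");
        snprintf(msg, sizeof(msg), "hello, %s", name);  /* no sprintf() */
        puts(msg);
        return 0;
}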
So now you are ready to review all of your team's code and improve it, right? No!

If It Ain't Broke, Don't Fix It!

This is an old adage that is difficult to deny. So, what's the right balance? I've seen performance optimizations made by removing vital checks. I've seen commit messages claiming "removed useless code", written by developers who didn't understand why that code was there.

Well, to me, it all depends on your experience and your knowledge of the code you are going to change. You are allowed... nay, you must improve the code, but you must also know what you are doing. And this is the most important thing!

By the way, if you are in doubt, ask someone more experienced than you.

Check For Memory Leaks!

Last week I lost at least three hours understanding and fixing a small open source library that was leaking memory. The incredible thing was the amount of allocated memory (half of which was never freed). Basically, the library is an overcomplicated implementation of a binary tree in C that, for less than 1 KB of data, leaks 8 KB of RAM.

My first intention was to throw away that piece of junk code, but unfortunately I didn't have the time to rewrite it, so I started hunting. Understanding the mass of functions and when they are called was taking too long, though, so I decided to call my old friend Valgrind.

Valgrind is an excellent tool for detecting memory leaks. The simplest way to use it is the following:
valgrind --leak-check=yes program_to_test [parameters]
This is enough to give you the total amount of allocated memory, along with a list of the blocks that have not been freed (if any). And, for every one of these, there is the full call stack, letting you quickly identify where the memory was allocated.
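For example, running the command above on this deliberately leaky little program (just an illustration) makes Valgrind report the lost block together with the malloc() call stack:

#include <stdlib.h>

int main(void)
{
        int *data = malloc(64 * sizeof(int));  /* allocated... */
        if (data == NULL)
                return 1;
        data[0] = 42;
        return 0;  /* ...but never freed: Valgrind reports 256 bytes
                      "definitely lost" (assuming 4-byte int) */
}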

Of course, Valgrind can do much more than this, but using it to find memory leaks is the minimum every developer should do before releasing software. And the fact that the code is open source is not an excuse: you must ensure the quality of your program, no matter how many people will read the source code.

3 Ways To Open A Lock - Part 3

This is the third and last part of a series (first post, second post).

Simply Asking For The Key

Although in real life it's unlikely that a thief will ask you for the key to your home, in the digital world this is the most common and successful type of attack. The technical name is phishing and, according to Wikipedia,
[It] is the attempt to acquire sensitive information such as usernames, passwords, and credit card details (and sometimes, indirectly, money) by masquerading as a trustworthy entity in an electronic communication.
I'm pretty sure you have received at least one email from a bank asking you to check your credit card transactions. In the email there was a link to a fake website very similar to the bank's real one. By logging in, you simply hand your password to a scammer.

The thing that makes this kind of attack possible is called social engineering, and it thrives in the internet era, even though it has probably been used since the dawn of time. It consists of a series of techniques that aim to make the victim perform actions he wouldn't normally do.

The fake email from the bank (or from Facebook, etc.) is similar to trawling. But there is also a technique called spear phishing, which targets a specific person. To do this, the attacker starts to collect as much information as he can about the victim, including his mother's maiden name and the name of his first pet ;-)

Unfortunately, sometimes attacks are a little more direct.


[Image: xkcd comic] Image from xkcd, licensed under a Creative Commons Attribution-NonCommercial 2.5 License.