Speed Up Your Searches

Usually, in Computer Science classes, sorting algorithms are studied in depth. The reason is quite simple: they are widely used and can be applied to several contexts. Moreover they represent a good way to introduce the concept of computational complexity.

In the real world, one of the reasons to sort items is for searching. On a large set of data, finding an item by scanning every element to see if it matches can be a very slow operation. If the array is sorted by a key, we can perform binary search in order to lower complexity on worst case from O(N) to O(log N). This is a good improvement on large set of data, isn't it? Now, what if I say that I can lower the computational complexity to O(1)?

No Magic, Just Some Math

A small phone book as a hash table (image by Jorge Stolfi)
The trick is to find a way (or an algorithm, if you prefer) to convert the key (that can be a number, a string or whatever) into an index that directly refers to the desired information. This "way" is called hash function and the data structure hash table.

There are hundreds of articles about how hash tables works under the hood and at least the same number of implementations. In many high level languages hash tables are available even as built-in types, often called associative arrays, dictionaries or maps.

Hash tables are widely used when the data set is huge and high response speed is needed. For example, in some RDBMS, indexes are implemented with hash tables. In addition, unlike arrays, you are not forced to have object of the same size stored into hash tables, so the used space (in memory or disk) is optimized and this also affects the average speed of a search.

All That Glitters Is Not Gold

Unfortunately there are some drawbacks with hash tables; the two biggest are both related to the hash function: speed and collisions. The speed issue is related to the execution time of the hash function itself and it's critical when the hash table doesn't contain so much data, so the cost of the linear search in the medium case may be comparable with execution of the hash function.

The collision is a more subtle problem: the hash function may produce the same output for different keys and exist various ways to store different data in this case but they all take some time, reducing the performances of the hash table.

The good news is that many implementations let you choose to use the built-in hash function or provide your own, so, if the speed is a critical requirement of your project, you can create a hash function that better fits your needs.

Conclusions

Hash tables are a very powerful and flexible data structure that can increase the speed of your searches on a huge set of data.

If you are wondering which hash table implementation I use in C, the answer is GLib Hash Table.


Ideas Are Not Enough

Image by Pictofigo
Today, +Seth Godin in his daily post spoke about something that is essential for me: the importance of going from ideas to implementation.

A great architect isn't one who draws good plans. A great architect gets great buildings built.

You don't know how many times I've heard people (including myself) complaining about a new product presented by a competitor saying "I've had the same idea years ago".

So are you trying to say you are smarter? Probably not so smart to get things done. Or maybe did you think that your idea would be translated into a project by your subordinates?

Stop fooling yourself! If you want your dreams come true, stop sleeping and start working!

I've already wrote about this concept in Avoid Perfection and briefly in Everyone Matters.

[Linux] How to Define a Path for Shared Objects

In Linux, the predefined paths for shared objects (.so) are /lib/ and /usr/lib/. During normal usage this is OK but sometimes it's necessary to specify other additional paths. There are several way to do this task.

The most common way is to define an environment variable in the shell:

$ export LD_LIBRARY_PATH=/path/to/shared/objects

This operation must be done in every new shell/terminal emulator before starting the process that needs the shared objects located there. To avoid this, you can insert the export in your .bashrc/.profile file (located in your home folder): in this way the variable is set every time you log in.

But sometimes you need a shared object to be called from daemons or processes run by various users. The solution is the file /etc/ld.so.conf. In this file you can add all the paths where shared object should be searched.

In Debian-based distros (like Ubuntu), this file only include the following line:

include /etc/ld.so.conf.d/*.conf

This means that you don't have to change this file but it's sufficient to add your paths to a file with extension .conf located in the directory /etc/ld.so.conf.d/

Pay attention! Adding directories not owned by root can be a security breach because it can lead to the execution of arbitrary code.

Cleaning Up the Path in 5 Easy Moves

Freddie Mercury - "I Want to Break Free" music video
Freddie cleaning his house
The idea for this post came during last weekend while I was cleaning my house. In fact, this activity is made by several parts: some are funny (like using the vacuum cleaner while singing "I Want to Break Free"), some other are boring, others are awful.

The development of a big project is very similar: there is the challenging part, the damn-long part and the stupid part. Here there are some advice to help you to accomplish your job in the best way.

1. Split the project in tasks and subtasks - this is obviously the first thing to do; start developing headlong is something to avoid.

2. Try to see if there are constraints - it's important to understand which tasks must be made before others and define a clear path between them.

3. Start with the task you consider the worst - it may be the longest or the most boring or the most annoying, the choice is up to you, but when it will be done, the rest of the project it's all downhill.

4. Work on a single task at a time - you have a road map (defined at point 2), why do you need to get rid of it?

5. Work always on tasks related to those already accomplished - in this way, you are always sure that new pieces fits properly in the existing structure and it's easier to test your progresses.

Well, that's all folks. Let me just add another general purpose suggestion: always remember the 80/20 rule, that can be declined as "details will cost you the majority of your effort".

[Linux] 2 Alternatives to Standard Terminal Emulators

Nothing to complain with gnome-terminal: I've used it for years and I can say that it does its dirty work very well. But lately, too often I need to have several different shells opened and not overlapped in order to check various things at the same time.

With standard terminal emulators I haven't found a way to automatically arrange the windows on the screen, so I've searched. And I've found Terminator.

Terminator

Terminator screenshot
Click to enlarge

As you can see from the above image, in a single window you can arrange several terminal emulators (4 in this example) by splitting the window both horizontally and vertically. Moreover the proportions may be changed and every function can be triggered by a customizable keyboard shortcut.

Other handy functions are:

  • the possibility to type the same text/command in more than one terminal at once;
  • the temporary maximization of one terminal window;
  • tabs management (if you need even more shells).

On the appearance side, you can customize font, colors (with built in themes), set the background (color, image or transparent) and the scrollbar aspect.

Guake

Example of Guake overlapped to other windows
Click to enlarge

Sometimes a terminal window is too much for me. I mean, maybe I just need to know if the Internet connection is working properly with a ping and then forget. Or know which IP address the DHCP assigned to my network interface. Or quickly take a look to a man page to answer a colleague.

Opening a normal terminal window for such simple operations and then closing it right after may be a solution but, I prefer to have a dropdown shell that disappear when it's no more needed, just to reappear few minutes later when I press a key.

This is exactly the behavior of Guake (which name recalls the FPS Quake because of the similitude with its chat), with the addition that it supports multiple tabs. One thing I like is the fact that it remains the same across multiple workspaces.

Guake is a little less customizable than Terminator especially in the graphical part but I hope you don't mind since it will be hidden for most of the time.

Conclusions

Even if desktop managers have greatly evolved in the last years, in Linux systems the fastest way to do several things is using terminal emulators. And with these two my productivity has boosted ;-)

Battles and Management

Lev Tolstoy in a rare color photo
Have you ever read War and Peace by Lev Tolstoy? It's a great book with many stories wrapped up with History. It is based in Russia during the Napoleon era and, of course, it also tells about battles.

There is one thing in the Tolstoy's point of view that caused me some thinking. More or less, this is the reasoning of the author: most of the times, battles are not decided by generals or strategists but by single episodes of bravery or dastardliness in the troops.

Of course there is a part of truth in this but, as usual, life is slightly more complicated. A good commander should be able to understand if its soldiers are motivated, which are their strengths and their weaknesses. And a good commander always has an ace in the hole.

Victory and Fortune

Many great commanders in the past were undoubtedly lucky and often luck is considered an essential attribute of good strategist and I'm sure you have already heard that fortune favors the bold. But a commander that points everything on its luck is doomed to lose.

This is why, really great commanders always have some troops reserved for difficult moments (for example Napoleon had the Old Guard) and they know when engage a battle and when retire their troops.

For a manager it works the same. If he plans the team at one hundred percent of the time, there will no room for contingencies. If he accepts every job proposed by the salesmen, the risk is to provide a poor job or to miss deadlines. He can put its subordinates under pressure for some weeks but not forever.

Sometimes managers have to take some risk and they can be lucky, but fortune doesn't lasts forever.

Elegance Doesn't Pay

I didn't know exactly how unicode strings are encoded in UTF-8 format until I've read this post by James Coglan. Basically it uses a variable number of bytes (from 1 to 4) to represent characters. If you are developing in C, correctly allocate and parse UTF-8 strings can be annoying without a dedicated library. It would have been better (and more elegant) if there were always 3 bytes for each char, don't you think?

But, just like in the real world, elegance has its tradeoffs. In this case mainly one: compatibility with ASCII. This encoding ensures that every ASCII string is also a valid UTF-8 string. A great achievement in my opinion, compared with the impalpability of the elegance.

When developing an application, searching for elegance in code, data structures or classes may make you miss the key point: you are creating applications for the users, not to make other developers think that you're smart.