Versions Madness

Last week, Linus Torvalds, the creator of Linux, published this post on Google+.

So, I made noises some time ago about how I don't want another 2.6.39 where the numbers are big enough that you can't really distinguish them.

We're slowly getting up there again, with 3.20 being imminent, and I'm once more close to running out of fingers and toes.

I was making noises about just moving to 4.0 some time ago. But let's see what people think.

So - continue with v3.20, because bigger numbers are sexy, or just move to v4.0 and reset the numbers to something smaller?
It seems that Linus treats the version number as just a name, unrelated to commercial considerations and even to product features. Often, however, choosing a version number is treated as a science.

I Like It Complicated

The major-dot-minor format is quite common, and the meaning of those numbers is fairly standard: the minor number changes when there are small improvements, while the major number increases on bigger changes. But after those numbers there may be a wide variety of things:
  • a build number, automatically increased at every successful compilation,
  • a distribution number or letter, changed every time a build is delivered to testers or customers,
  • a letter indicating the build type (alpha, beta, final, etc.),
  • abbreviations for special releases (pre, RC, QA, ...)
The funny thing is that some of the above can be combined, so you can find, for example, 1.7d RC or 2.1.B.174. By the way, for some years I used a four-number system to identify delivered versions of my software: after major and minor there was a counter that kept track of small functional changes or refactorings, while the last number was tied to bug fixes.
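To make the four-number scheme concrete, here is a minimal Python sketch that splits such a version string into integers and compares them numerically. The function name parse_version and the sample strings are hypothetical, and the sketch assumes purely numeric, dot-separated fields, so it would not handle suffixes such as "1.7d RC".

```python
def parse_version(text):
    """Turn a dot-separated numeric version such as '2.1.0.174' into a tuple of ints."""
    return tuple(int(field) for field in text.split("."))


# Tuples compare field by field, so 3.20 counts as newer than 3.9.
older = parse_version("3.9.0.12")
newer = parse_version("3.20.0.1")
print(newer > older)

# A plain string comparison looks at characters instead ('2' sorts before '9'),
# which is one reason version strings should not be compared as text.
print("3.20.0.1" > "3.9.0.12")
```

Comparing tuples rather than strings is the design choice that matters here: it keeps the ordering correct once any field grows past one digit.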

The Tech Side

Your software may expose an API or use functions provided by other programs. In this case, the version number has a fundamental purpose: it is through this string that your application relates to the others.

Understanding how other developers deal with version numbers helps you know which releases of third-party software your program is compatible with. It can also save you from some serious headaches when a customer claims that nothing is working.
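As a sketch of how a program might use a dependency's version number, the following Python function checks whether an installed release is acceptable. It assumes the third party follows the major/minor convention described earlier (a new major number may break the API, a new minor number only adds to it). The name is_compatible and the sample versions are hypothetical, and real projects may follow different rules.

```python
def is_compatible(installed, required):
    """Return True if `installed` can stand in for `required`.

    Assumption: the same major number means the existing API is unchanged,
    and a higher or equal minor number only adds features.
    """
    inst_major, inst_minor = (int(x) for x in installed.split(".")[:2])
    req_major, req_minor = (int(x) for x in required.split(".")[:2])
    return inst_major == req_major and inst_minor >= req_minor


print(is_compatible("2.4", "2.1"))   # same major, newer minor
print(is_compatible("3.0", "2.1"))   # different major
```

Only the first two fields are inspected, so any trailing build or bug-fix counters are ignored by this check.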

The Commercial Point Of View

Aside from these technical considerations, there is also the commercial side of version numbers. To a user or a customer, a change in the major number means that big changes and big improvements have been made. In some people this creates a craving for the new version, which salesmen can use to raise prices. The really important thing in this situation is to meet the customer's expectations.

Even if the software is free, expectations rise whenever a change involves the leftmost numbers, and in this case too you cannot disappoint the users. For the Linux kernel, however, the situation is quite different: it is not used directly by end users but only by other developers and system administrators. In this case, Linus' idea is not so bad, in my opinion.

Conclusions

The really important thing is to use a system for indicating the version of your software. It has to be meaningful to you and to your organization, and it must be clear enough for the end users. If you want my opinion, Semantic Versioning is pretty good.

3 Ways To Open A Lock - Part 1

If someone wants to open your lock, they have three ways to do it:
  • use lockpicking tools;
  • steal your key;
  • ask you for the key.
But, you may say, what does this have to do with computer science? I'm glad you asked. A password in the virtual world is exactly like a key in real life. And the analogy still holds when someone wants to get into your bank account (or inbox, or Google account, or...).

Lockpicking

The virtual equivalent of lockpicking tools is the classic way of guessing the password. There are two main ways to do this: brute force and dictionary-based.

The latter is quite simple. Hackers and crackers have big lists of common passwords and other words ready to use. If this attack is not successful, there is brute force.

The dumb version simply composes passwords by trying all combinations of uppercase and lowercase letters, digits, and special characters (hyphen, underscore, percent, etc.). This method is guaranteed to succeed, sooner or later. The issue for an attacker is the time needed to check all the combinations, which for long passwords can be on the order of billions of years.
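To get a feeling for those numbers, here is a rough Python estimate of the worst-case time for a dumb brute-force attack. The alphabet (ASCII letters, digits and punctuation) and the attacker's speed of one billion guesses per second are assumptions picked for illustration; real figures vary enormously with hardware and with how the passwords are stored.

```python
import string

# Assumed alphabet: ASCII letters, digits and punctuation.
ALPHABET_SIZE = len(string.ascii_letters + string.digits + string.punctuation)

# Assumed attacker speed (hypothetical round figure).
GUESSES_PER_SECOND = 1_000_000_000
SECONDS_PER_YEAR = 365 * 24 * 3600


def worst_case_years(length):
    """Years needed to try every password of exactly `length` characters."""
    combinations = ALPHABET_SIZE ** length
    return combinations / GUESSES_PER_SECOND / SECONDS_PER_YEAR


for length in (6, 8, 10, 12, 14):
    print(length, worst_case_years(length))
```

Each extra character multiplies the search space by the size of the alphabet, which is why length matters so much against this kind of attack.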

But there is a faster way. When people create a password, digits are usually grouped together at the end of a word. For example, a password like "8such3fun" is uncommon, while "suchfun83" is far more common.

Another common pattern is to capitalize the first letter of common words (e.g. "SuchFun83") or to substitute digits for some letters (e.g. "5uchFun83"). With a combined brute-force and dictionary-based attack, this kind of password is quite easy to guess.
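The patterns above can be turned into a simple self-check. The following Python sketch tells whether a password boils down to a dictionary word plus trailing digits once capitalization and digit-for-letter substitutions are undone. The word list, the substitution table and the function name follows_common_pattern are all hypothetical stand-ins; real attack rule sets are far larger.

```python
# Tiny stand-in for the large word lists attackers use (hypothetical sample).
COMMON_WORDS = {"such", "fun", "suchfun", "password", "dragon"}

# Assumed digit-for-letter substitutions.
LEET = str.maketrans({"5": "s", "0": "o", "1": "l", "3": "e", "4": "a", "7": "t"})


def follows_common_pattern(password):
    """True if the password is a dictionary word plus optional trailing digits,
    after undoing capitalization and digit-for-letter substitutions."""
    stem = password.rstrip("0123456789")   # drop the digits grouped at the end
    if not stem:
        return False
    normalized = stem.lower()
    return normalized in COMMON_WORDS or normalized.translate(LEET) in COMMON_WORDS


for candidate in ("suchfun83", "SuchFun83", "5uchFun83", "8such3fun"):
    print(candidate, follows_common_pattern(candidate))
```

A password that passes this check is not necessarily strong; the sketch only flags the specific habits described in this post.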

If you are now wondering how strong your password is, there are a couple of sites that can answer your question:
  • How Secure Is My Password tells you how long it would take an attacker to crack your password, considering a combined brute-force and dictionary attack
  • The Password Meter is a little less accurate, since it only considers length and character variety, but it suggests some rules for creating really strong passwords
While playing with the above sites (or any other password checker), do not enter your real passwords, because they can be stolen by the website itself or during the transmission (if sent in plain text).

How C Compilers Work Part 4: Linker

Now we are at the point where we have produced one or more object files and we want to create an executable. Under GNU/Linux systems, this job is done by ld, the GNU linker.

As seen in the previous part, the compiler always works on one file at a time, so every time there is a need to access a symbol (function or variable) defined somewhere else, a reference is used. The first job of the linker is to check that all these references can be resolved.

ELF layout
Once this operation has ended successfully, it's time to produce the executable. To do this, all the object files are split into their basic components (sections), which are reassembled according to the ELF format. For example, constant data such as string literals goes into a read-only data section, the symbols in use are recorded in the symbol table, and their names are stored in the string table.

The same happens to static libraries (which are just a set of object files packed in an archive). If the -static flag has been specified, the linker does not link against shared objects at all and uses static libraries in their place.

In the case of dynamic linking (which is by far the most common), the linker adds:
  • a section called the dynamic symbol table (.dynsym), containing the external symbols, and
  • a section simply called .dynamic that, among other things, contains the names of the shared objects needed at runtime.
When the program is executed, the dynamic linker maps into the process image the shared objects listed in the .dynamic section (but this is a different story).

Troubleshooting

"Undefined reference" is the most common error that you can get. It means that the definition of a function or variable that was only declared (for example as extern) has not been found. The most common cause is a typo or a missing library or object file.

When working with multiple toolchains or different versions of a shared object, the linker may report undefined references even if everything seems correct. This happens because the reference to the symbol (usually introduced by a .h file) does not match what is inside the shared object being linked. In other words, the shared object the linker is using is not consistent with the header file that has been included. A solution is to check the paths of the header and the library (flags -I and -L in gcc).

Another sneaky error may appear when a program is executed on a system different from the one where it was built. The message usually shown is the rather misleading "No such file or directory". Yet the message is perfectly correct: a file is missing (or it's in an unexpected location). The missing file is a dynamically linked shared object. To find out which one it is, you can use readelf.
$ readelf -d <process_name>
The first rows show the names of the needed shared objects. Now you only have to make sure they are present on your system. If you can find them but still get the same error, try adding their directory to the dynamic linker's search path, for example through the LD_LIBRARY_PATH environment variable.
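If you need to run this check on many programs, the relevant rows can be extracted automatically. The following Python sketch pulls the shared object names out of text in the format printed by readelf -d. The sample output, its library names and the function name needed_libraries are illustrative assumptions, not taken from a real program.

```python
import re

# Sample text in the format printed by `readelf -d` (illustrative values).
SAMPLE_OUTPUT = """
Dynamic section at offset 0x2df8 contains 26 entries:
  Tag        Type                         Name/Value
 0x0000000000000001 (NEEDED)             Shared library: [libm.so.6]
 0x0000000000000001 (NEEDED)             Shared library: [libc.so.6]
 0x000000000000000c (INIT)               0x1000
"""


def needed_libraries(readelf_output):
    """Return the shared object names listed in the NEEDED entries."""
    return re.findall(r"\(NEEDED\)\s+Shared library: \[([^\]]+)\]", readelf_output)


print(needed_libraries(SAMPLE_OUTPUT))
```

In practice the text would come from running readelf itself, for instance by capturing the command's standard output with Python's subprocess module, and each returned name could then be looked up on the target system.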


Happy 1st Birthday

Today this blog turns one. One year ago, when it started, I was not sure whether it would survive more than a few months. Instead, with 68 published posts and more than 10,000 pageviews (many of them in the last month), I'm pretty satisfied.

To these newly arrived visitors, I want to provide a list of the seven posts I've written that I consider the most important. This list doesn't match the list of most viewed articles, but I'll get over it ;-)
So now, what's next? More posts, of course. Maybe some guest posts (I've asked a couple of colleagues of mine, but apparently they are lazier than me).

Thank you all for reading these pages, and, if you have some ideas on how to make them better, please share your thoughts.

Sincerely,
Luca