DMail And The False Sense Of Security

Email has been around for several decades. Nevertheless, it is still one of the most used forms of communication, especially in business. Therefore, the security of an email message is really important.

Over the years, different methods have been implemented. Lately, the guys behind Delicious have also come up with their own solution, launching a new service called DMail. The claim is:

Self-Destructing Email
Finally, sent email has a delete button
At present, the service is in beta and it's free, so I tried it to understand how it works.

I have to admit it: I was pretty skeptical... and I was right. But first things first.

How It Works

Let's start by saying that it works only on top of GMail (not even Inbox) and only with Google Chrome/Chromium. In fact, clicking the "Try it now!" button redirects you to the extension page in the Chrome Web Store.

Once installed, you can see the DMail logo in the top-right part of the GMail screen. The same logo appears in the compose window, next to the enable slider and the combo box for choosing the expiration time.


You can write your email and send it as you normally do, but the message is not actually sent to the recipient: it is stored on a DMail server in an encrypted form (or so they say).

The addressee receives an email with a link to the message, which is decrypted on the fly just for him. When the time expires, or if the sender decides to destroy the message, the link shows the following text:


Message Unavailable
This message is no longer available for viewing.
It seems cool, right? Uhm, not so sure...

Security?

Let's see where the pitfalls are. First, the email with the link to the encrypted message is (obviously) unencrypted and contains the codes (KEY and CLIENT) that, I suppose, are used to decrypt the message. This means that if the email is transmitted over an insecure connection, the message should be considered compromised.

Moreover, the email can be forwarded, and the message can be accessed from the forwarded copy as well.

You could argue that once the message has expired, no one is able to see it. This is also what Snapchat promised, right? The countermeasures are quite simple: you can save the page with your web browser, or take a screenshot or a photo, and make the "destroyed" message live forever.

Besides, you must remember that your messages reside on someone else's computer, which can be compromised and your data stolen.

Conclusions

To me, this service is pretty useless. The idea is good but, at present, I don't think the right technology exists to provide what they promise in a really secure way.

Recursion And Stack Overflow

 
A few days ago, the Twitter account of Stack Overflow (the famous technical Q&A site) published the following tweet:
The intention was to recall the name of the site by presenting a situation that every developer has been warned about.

Of course I got the pun, but a light bulb turned on in my mind: tail calls. Some years ago I read something about them on Gustavo Duarte's blog: Tail Calls, Optimization, and ES6.

Do It Yourself

It's not that I don't trust Gustavo, but I tried to do my own test (on x86-64 with gcc 4.8):
#include <stdio.h>

int main()
{
        fprintf(stderr, "Go!\n");
        return main();
}
Compiled with optimizations enabled:
gcc -O2 -o so so.c
Launching the executable so (which stands for stack overflow), it prints a never-ending series of "Go!" without crashing. The explanation is exactly the one Gustavo presented in his old post: the compiler optimizes the call with a jump, basically transforming the recursion into a loop. This is clear when looking at the code disassembled with the following command:
objdump -d so
This is the result (leaving only the interesting part):
0000000000400480 <main>:
  400480:   48 83 ec 08            sub    $0x8,%rsp
  400484:   0f 1f 40 00            nopl   0x0(%rax)
  400488:   48 8b 0d b1 0b 20 00   mov    0x200bb1(%rip),%rcx   # 601040 <__TMC_END__>
  40048f:   ba 04 00 00 00         mov    $0x4,%edx
  400494:   be 01 00 00 00         mov    $0x1,%esi
  400499:   bf 14 06 40 00         mov    $0x400614,%edi
  40049e:   e8 cd ff ff ff         callq  400470 <fwrite@plt>
  4004a3:   eb e3                  jmp    400488 <main+0x8>
As you can see, fprintf is mapped to a callq to fwrite, while the recursion is implemented by the jmp instruction.
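In other words, with -O2 the binary behaves as if the source had been written as a plain loop. A minimal sketch of the equivalent C code (my own rewriting for illustration, not something gcc literally emits) could be:
#include <stdio.h>

int main(void)
{
        /* The tail-recursive call has been turned into a jump,
           which is equivalent to this infinite loop. */
        while (1)
                fprintf(stderr, "Go!\n");
        return 0;       /* never reached */
}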

Conclusions

Basically, there are three conclusions that can be drawn:
  1. Gustavo was right;
  2. compilers are smarter than you think;
  3. what I've written about goto and do-while loops also applies to recursion.

Image taken from Wikimedia Commons (public domain).

You Need A Bug Tracker

For some years I worked without the support of any bug tracking system (BT for short), but now it's hard to do without one. I think its standard usage is clear enough; otherwise, Wikipedia has a good explanatory page. In this post I want to cover some other things you may find it useful for.

Your Historical Memory

Sometimes bugs tend to reappear over time. In this case they are called regressions. A good starting place when you want to create regression tests is your bug tracker: all the bugs, with steps to reproduce and supporting files, are stored in a single place. The QA team just needs to take the information from there and put it into the automated test suite (because you use one, right?).

Besides, sometimes you choose not to fix some bugs, for a variety of reasons. If there is some sort of workaround to get rid of them, it must be recorded in your BT for future reference.

Not Only For Bugs

Nobody said that you cannot use it to keep track of the development of new features. Many BTs provide separate categories for bugs and new features, which makes it easier to filter them and produce statistics to show to your manager.

Decent bug trackers let you set a priority for each entry. Using this feature, you can easily manage the work of your team.

And why not use it for something unrelated to programming as well, such as documentation?

Make It Public

Warning! This suggestion should be carefully evaluated by the highest levels of your company because it can be potentially dangerous.

Generally speaking, letting your users report bugs and request new features may increase their loyalty. On the other hand, your company must be well organized in order to quickly answer every request in a proper manner. And sometimes the proper manner may be "we won't do that". Of course, too many "no"s may give potential customers a bad impression of your company.

Another issue is that a customer may feel you are taking more care in resolving issues reported by other customers.

However, if your company agrees on this point, it's better to have separate bug trackers for users and developers.

Conclusions

A bug tracking system is one of the most useful things that your company can adopt to manage defects and the project workflow.


Image by Messer Woland from Wikimedia Commons licensed under the Creative Commons Attribution-Share Alike 3.0 Unported license.

About Projects Driven By Users

I do believe that users are the most important part of a project. Really. They can provide useful feedback, great ideas and suggestions on what can make your product better. Unfortunately, the previous sentence only applies to a few users.

In my experience, users are only interested in their own small needs. They don't care about your product, just about solving the problem they have today. More than once, a customer asked me for a change in a piece of software to solve an urgent issue and, a few days later, wanted me to roll back the modification.

Having the development of your product driven only by users' requests is the fastest way to go crazy.

Feedback and hints are important but must be taken with a grain of salt. Don't abdicate the development of your product to users and customers.

A similar situation applies to tests. I've sometimes heard "The customer will do the tests." It may be OK if we are speaking about a small customization, but in all other cases the result will be terrible. A user will test only the three or four functions he uses the most and, if something is not working properly, he will probably find a quick workaround instead of spending half an hour on the phone with your customer service. And even if users spend some time reporting issues to you, it's very likely that most of the messages will be like those reported on this page.

Again, users are not interested in your product. They only want a tool to solve their own problems, and you should just thank them for having chosen your software. But don't ask them to do your job.

Code Review

During the past few weeks, I've been reviewing an old codebase. Some functions had been in place since 2008. You may think that those functions are bug-free: after seven years of usage, every issue should have emerged.

But the story is different. The number of errors I've found is impressive: memory leaks, files left open, checks on conditions that can never be true (see also the last Horror Code), and, worst of all, logical errors. What do I mean by logical errors? Let me give you an example.

A Logical Error

There is a file descriptor declared as a global variable (OK, this could be considered a logical error too, but please keep reading) and a function that does some processing and then writes the result to a file descriptor. Among the parameters of this function there is a file descriptor... unfortunately, it is never used. The writing is done on the global variable.
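A minimal sketch of the situation (the names are mine, not the original code) looks like this:
#include <unistd.h>
#include <string.h>

int global_fd;  /* the only descriptor actually used by the program */

/* The 'fd' parameter is never used (the compiler warns about it):
   the write always goes to the global descriptor. */
void write_result(int fd, const char *result)
{
        write(global_fd, result, strlen(result));       /* should have been 'fd' */
}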

Everything works fine just because the global fd is the only one used in that program. But what if, in the future, someone used that function passing a different fd? How long would it have taken to find the bug?

By the way, the compiler signaled with a warning that a parameter of the function was unused, but nobody cared. You should always pay attention to warnings!

Conclusions

A code review is always a good thing to do. You probably won't find big bugs, but the stability and quality of your software will surely improve. And maybe that strange, hard-to-reproduce situation will never be reported again.

Image by Randall Munroe licensed under a Creative Commons Attribution-NonCommercial 2.5 License.

The Day That Never Comes

The deadline is close. The customer is waiting for your fix. Your mate needs your patch before going home. No matter which of the above situations applies: the only way to accomplish your job is taking shortcuts and cutting corners.

You don't check some error conditions, you use a fixed string instead of a localized one, you don't properly free all allocated memory, etc. Your code compiles and seems to work fine, but you know that it must be improved as soon as possible. So you tell your boss and/or the product manager. The response I usually get is: "As soon as there is some time, we'll fix it."

Guess what? That time never comes. There is always something more important or urgent to do, until a customer (usually an important one) reports an issue with the corners you have cut. Now the priority is to fix the problem as soon as possible, not to review the code to make sure it cannot happen again.

There is a logic in this: the customer doesn't care about code quality (even if he should). He just wants his software to work without errors. But for your company it should be different. Why isn't it?

Well, the answer I found is that, for a customer, a quick solution is more important than bug-free software. It may seem pretty odd, but just think about yourself. You buy a new smartphone and it works as expected: you probably don't spam every social network to tell the world that your new iSomething is OK.

But I bet that if you find an issue and the customer service is really kind to you and the problem is solved in a couple of days, you'll tell everyone about your experience and recommend that brand to your friends.

This is called marketing and, in the past, there was a PC maker that used to take advantage of this mechanism. But that is another story. For now, the only thing I can suggest is to avoid shortcuts. At least unless you work in the marketing department.

Image by Nic McPhee licensed under Creative Commons Attribution-ShareAlike 2.0 Generic.

Horror Code - Copying & Pasting Errors

The following piece of code contains a trivial error.
        int v;
        char buf[11], *p;
        memset(buf, 0, sizeof(buf));

        /* Fill buf with a string */

        v = strtoul(buf, &p, 16);
        p = NULL;      // <--- This is wrong!!
        if (buf == p) {
                fprintf(stderr, "format error <%s>\n", buf);
                return -1;
        }

        /* Some other code */

        memset(buf, 0, sizeof(buf));

        /* Fill buf with a string */

        v = strtoul(buf, &p, 16);
        p = NULL;      // <--- This is wrong!!
        if (buf == p) {
                fprintf(stderr, "format error <%s>\n", buf);
                return -1;
        }
The content inside the ifs will never be executed: buf is an array, hence it can never compare equal to p, which has just been forced to NULL. Of course, the issue is the line "p = NULL;". p should hold the pointer to the first invalid character (i.e. not a hexadecimal digit) inside buf, so the condition should evaluate to true if the string does not contain any valid hex number.
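For reference, the intended check, once the stray assignment is removed, would simply be:
        v = strtoul(buf, &p, 16);
        if (buf == p) {
                /* no conversion was performed: p still points
                   to the beginning of the string */
                fprintf(stderr, "format error <%s>\n", buf);
                return -1;
        }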

This is a trivial error, but what makes it funny is that it has been repeated twice within a few lines, probably because of an unfortunate copy & paste operation.

I'm a big fan of the Ctrl+C Ctrl+V sequence, but I know that it can make you lose more time than it saves you by not retyping the same thing.

In this particular case, the error is quite subtle and can remain unnoticed for years (as it actually did). This is because strtoul() returns zero whenever the string doesn't start with a hexadecimal digit, and zero may be a valid value for the following operations.

My suggestion is to always check twice when you copy and paste some code: you can easily replicate unnoticed errors.

Cookies And The Law

On June 2nd, a new Italian law about cookies took effect. Basically, it requires:
  • notifying the user about the usage of so-called technical cookies, and
  • asking permission to use profiling cookies (and blocking them until this permission is granted).
For bloggers (like me) who do not own the platform where their content is published, this is quite troublesome, since I don't think I'm able to block anything that is delivered by Google (the kind provider of this virtual place).

By the way, I think that, if someone is concerned about privacy violations made through cookies, he can simply disable them in his browser; instructions for the most common browsers are easy to find.
In any case, cookies are just one of many ways to track users. Below is a small (and incomplete) list.

Local Storage

This feature, introduced with HTML5, allows a website to save some data on your PC to be recalled later. If you are now thinking that this is what cookies are for, you are right. There is only one small difference: cookies are meant to be used by the server, while local storage is managed client-side only.

But this is not much of a protection, since with JavaScript it is quite easy to transfer local storage information to the server.

Flash Cookies

Videos, apps and animations based on this old technology are gradually disappearing; however, Adobe Flash is still widely used and its cache can be used to store information. So even if regular cookies are deleted, they can be recreated from this cache.

You can protect yourself with this Firefox add-on that deletes Flash cookies when you close the browser (I don't know if there is something similar for other browsers).

Images

No, I'm not joking. Images generated on the fly by the server, together with the entity tag (ETag) used for cache invalidation, can act as an extremely persistent cookie.

The main defense you can adopt against this threat is to always use private browsing (or incognito mode) in your browser.

Evercookie

There is also a proof of concept that uses all the above methods, and many others, to create an immortal cookie. More information about evercookie is available at this link.

Your Browser

Every browser provides some information to the servers it connects to, like the underlying operating system, the screen resolution, the installed add-ons, the system fonts, etc. All this data can be used to create a fingerprint of your browser that is suitable for tracking.

You can learn more in this article on the EFF website.

Conclusions

Blocking regular cookies is just a measure that doesn't fix the problem of having our habits tracked. This is a more general problem that a law enforced by a single nation cannot solve.

Code Will Tear Us Apart

There's nothing worse than reading the code of someone you consider a good programmer and finding tons of anti-patterns. Of course, there are often good reasons behind some choices. Such as deadlines.

Jokes aside, I am conscious that my old code sucks too. This is because I continuously try to improve my knowledge and learn from my colleagues and from my mistakes. And also from my colleagues' mistakes.

Are We Writers?

Sometimes I read about parallels between novelists and programmers (yes, I'm guilty too). The comparison may work, but only on a superficial level, because the code we write is not judged on its style. This is also why there are more "good" programmers than good writers.

From the customer's point of view, the only thing that matters is that the software does what he wants. But we know that this is nearly impossible: bugs happen. In addition, new features are requested.

From a developer's point of view, it's important that the code is understandable and easily extensible. The problem is that sometimes it's more convenient to rewrite a part of the code than to understand and fix it.

Just as an example, once I found in a file a function called manage_parameters(). It was quite long, so I didn't analyze its code in depth, but it seemed correct. The next function in the file was manage_parameters_without_making_mess(). The developer who wrote the latter told me that he didn't have time to understand why the first function sometimes failed.

The Truth (?)

The truth is that we forgive (and often forget) our own mistakes, hiding behind poor excuses. But, at the same time, we are ready to point our fingers at other developers, especially if they are considered good programmers.

Bottom Line

If you think I've read your code and this post is about you, maybe you should spend some time reviewing what you have developed in the past.

Image by Miguel Angel licensed under a Creative Commons Attribution-NonCommercial-NoDerivs 2.0 Generic License.

4 Easy Tips To Work Better With Git

Maybe these tips are quite trivial and only come from common sense, but sometimes it's useful to repeat them. By the way, they are general-purpose suggestions that can also be used with other version control systems.

1. Use Tags To Label Versions

Every single release of your software that leaves your PC must have a commit tagged with the version number, like this:
git tag ver_1.02
Remember to push the tags to the remote repository with:
git push --tags
This will help you understand which release you have to use to reproduce issues that testers or users find. It's also useful when you need to create a changelog starting from a previous release. But this is also related to the next point.

2. Write Meaningful Commit Messages

Explaining why you made a commit and which parts are involved can help other developers (and especially you, in the future) quickly understand the change without having to dig through the code differences.

If you are using a bug tracking system, always mention the ID of the issue, which usually includes detailed information on how to reproduce a particular bug. This may help when that code is changed in the future.

Besides, well-written commit messages can help you create a complete changelog, even with automated tools.

For a quick reference about the style to adopt for commit messages, you can look here, while a more complete explanation can be found here.

3. Signal Particular Commits

When you work in a team, it's possible that someone does not respect the conventions about tabs/spaces, position of curly braces, etc. If you are a coding style nazi, you'll fix all of these as soon as you notice them.

The important thing is not to mix this kind of change with modifications to the behavior of the code. Another good practice is to make it clear in the commit message that you are only making cosmetic changes.

A good way is to start the message with a word like "[WHITESPACES]" or "[COSMETIC]". On some projects I also use prefixes for changes to test cases and documentation.

4. Each Commit Should Leave The Repository In A Good Status

This means that your project must always build without errors, no matter which issue you fix. The repository is not your daily backup.

If for some reason this is not possible, for example because your modification needs someone else's work to be complete, this must be noted in the commit message (see also the previous point).

Conclusions

Following these suggestions is not a waste of time. Believe me, when you are in trouble you'll be grateful to have easy access to all the information you need. The following infographic can help you remember these tips (click for a larger version).



Studio Vs Live In Software Development

Led Zeppelin By tony morelli (originally posted to Flickr as led zeppelin) [CC BY-SA 2.0 (http://creativecommons.org/licenses/by-sa/2.0)], via Wikimedia Commons

I like rock music. And I love live performances. Well, you know, studio albums provide great sound quality and care for small acoustic details. But songs recorded during a tour are on another planet.

Because a song evolves over time. It is never played the same way twice. In some cases, it becomes completely different, with another rhythm, different arrangements or a new guitar solo. There may be some small imperfections, but they are quickly forgiven thanks to the energy of the musicians. And then there is the interaction with the audience, singing and clapping.

The first release of a piece of software is just like the studio version of a song. It's perfect, right? It has a wonderful structure, meaningful variable names and good comments before each function. Then the testers find some bugs and your code slowly starts to change. Then the project manager asks you to modify the behavior of a particular dialog. Then the customers want new functions added. Eventually, the code becomes a mess and you do some refactoring.

If you try to execute git blame on your code at this point, you'll find that more than 50% of the functions have been changed since the initial release. But this is not a problem. It's tested, reviewed and appreciated by users. Now your software is absolutely better. Just like a good song played live.

Which Programming Language You Should Not Learn

Programming Languages
Lately, I've often seen young developers asking which programming languages are worth learning. Obviously, I've seen a lot of superficial answers, mainly because people tend to suggest the things they like. Programming has many levels of complexity, so, before providing an answer, it's better to understand why you want to learn a new language.

What Is Your Goal?

If you want to find the best-paid job, you should focus either on languages that are old but have a large base of existing legacy code (such as Ada, COBOL or Fortran) or, vice versa, on brand new ones (Go, Rust). Both choices have a couple of drawbacks: there are few such positions in the world (this is why they are well paid) and you cannot know whether your job will still be in demand in five years.

But maybe you just want to get a job quickly and not too far from home. In this case, JavaScript is your choice. You can use it client-side, server-side, and even in mobile app development. Pretty cool, isn't it? Unfortunately for you, the world is full of good JS developers, so you'd better become a damn good one if you want to stand out from the average and be noticed by some company.

Do you want to explore the object-oriented paradigm? You have multiple choices: C++ (you'll need just a few years to master it), Java (if security issues don't kill its virtual machine first) or Python (once you understand its concept of reference).

If you want to better understand how things work at a lower level, C is still a good choice, especially in embedded systems, but you have to be ready to deal with raw data management, dirty pointers, and memory leaks.

If you are an Apple fan, try Objective-C and Swift. If you love Microsoft, C# and Visual Basic are for you. But be aware that you are tied to a single company that tomorrow may decide to completely change the language and make you throw away your old code (who said VB6?).

Conclusions

I could go on with many others (Lua, Scala, PHP, Ruby, Perl...) but I think you get the point: the perfect language does not exist. In any case, having good experience with a couple of widely used programming languages and knowing the basics of some of the newcomers is probably the best choice you can make.

Eclipse: Good Editor, Bad Build System

For a large C/C++ project I'm involved in, Eclipse has been chosen as the default IDE.

I have to say that I really like the editor. It has some features that save me time, such as automatic variable highlighting, searching for references inside a single project or the whole workspace, the ability to work with gdb remotely (you know, I work on embedded systems), etc. I also think that some other things could be improved, for example the search, but all in all it's a good tool, in my opinion.

There's only one thing I have to complain about: the build system. Because:
  • it is really slow,
  • the management of dependencies doesn't work as expected, and
  • errors and warnings are presented in a misleading way.
Regarding the last point, I suppose this happens because, when running the compiler and the linker, stdout and stderr are parsed separately and then printed in the same window. The result is that error messages may appear after an unrelated build command.

Dependencies are another thing that, in my opinion, needs to be improved. For example, it would be great if the build stopped as soon as one of the related builds failed. I'm really surprised this doesn't happen.

Speaking of build speed and why it's so important, I totally agree with this post. Please consider spending five minutes of your time reading it.

The solution to these issues is to use custom makefiles. You may spend some time setting them up but, believe me, you'll save precious time when compiling and you won't lose your concentration.

Narcissus Vs Getting The Things Done

Narcissus by Caravaggio (1594-96)
Do you know who Narcissus is? He is a character from ancient Greek mythology, so attracted by his own beauty that he forgot to eat, staring only at his reflection in a river. And what about you, my dear developer? Do you spend hours creating wonderful structures and classes, fancy functions and stunning algorithms? Don't you feel just like Narcissus?

Assuming you are a developer, for what purpose do you write code?

If you answered anything other than "solving problems", I'm afraid you are similar to Narcissus. This is what we are paid for: to solve someone else's problems. Possibly in the best way, but without an excessive amount of work.

I've been there before, I know what I'm talking about. Once, when I was a (bad) C++ programmer, I designed a wonderful class hierarchy to solve a trivial problem. The worst thing is that the feature using those classes is probably needed by 1% of the customers.

The general rule is: the effort must be proportional to the importance.

The most important thing is to complete the project. Right after that comes code readability. Beauty is at the bottom. Not because beauty is not important, but because it is not the purpose of your job. And you must be willing to dirty your code when needed. But this is material for another post.

Please Optimize

Every now and then, I find quotes against optimization, just like this:
This is quite surprising, since in many cases the speed of a program is fundamental to its success. A UI (or a website) cannot be slow or become unresponsive in some situations. Processing a huge amount of data in a few seconds instead of minutes can make the difference between a top seller and an unwanted app.

[Similar reasoning may apply to RAM or disk space too, but in this post I'll focus on execution time.]

The only quote I totally agree with is:
premature optimization is the root of all evil.
- Donald Knuth
The explanation comes just a few lines below it.

(At this link you can download the paper)

[A good programmer] will be wise to look carefully at the critical code; but only after that code has been identified.

Identify The Critical Code

It's not always easy to understand where the bottlenecks are. A developer with enough experience may guess which part of the code needs to be optimized, but:
  • he cannot be sure (scientifically speaking), and
  • he needs a measure of the improvements.
For this reason, you need to measure the duration of (almost) every operation, taking care to feed the application with a realistic set of data. Another good practice is to collect many samples for every dataset and calculate the average, in order to remove the noise produced by other processes running in the same environment.

After having analyzed the results, you can start making changes to the parts that take the longest, possibly one at a time. And then measure again. Was your modification faster? Good job! Move on to another part. Was it slower? Try another solution.

Now you may want to know which tool to use to take the measurements. There are many performance analyzers out there, but I prefer to collect timestamps in the right places.

There are three reasons behind this choice:
  1. I have to review the code and this is important because I'll have the structure in mind when I start to make changes;

  2. some profilers are not very accurate (for example, they return an estimation about which functions take the most execution time, but cannot tell you if this is because they have been called a million times);

  3. I have great control over the measured code, so, once a slow function has been identified, I can add more timestamps.
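A minimal sketch of this timestamp-based approach (the workload function is a placeholder of mine, just for illustration) could be:
#include <stdio.h>
#include <time.h>

/* Dummy workload standing in for the code under measurement. */
static void process_data(void)
{
        volatile unsigned long sum = 0;
        unsigned long i;

        for (i = 0; i < 10000000UL; i++)
                sum += i;
}

/* Elapsed time in milliseconds between two timestamps. */
static double elapsed_ms(struct timespec start, struct timespec end)
{
        return (end.tv_sec - start.tv_sec) * 1000.0 +
               (end.tv_nsec - start.tv_nsec) / 1000000.0;
}

int main(void)
{
        struct timespec start, end;

        clock_gettime(CLOCK_MONOTONIC, &start);
        process_data();
        clock_gettime(CLOCK_MONOTONIC, &end);

        fprintf(stderr, "process_data took %.3f ms\n", elapsed_ms(start, end));
        return 0;
}
Repeating the measurement several times and averaging the results, as said above, removes most of the noise.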

How Much Should I Optimize?

Even if it seems a silly question, there are many different levels of optimization. The most important thing to consider is that the compiler usually has its own strategies to compile and optimize our code. For this reason, what seems like a great improvement may not make any difference once compiled. This is why it's a good idea to measure with compiler optimizations turned on.

In addition, please consider code readability, otherwise there is the risk that, in the future, another developer will get rid of your efforts just because the code is too hard to understand. If this kind of optimization is really needed, use comments to explain why you wrote such obscure code.

Believe it or not, this once happened to me too: there was a really complicated block of code (with no comments explaining it) that I replaced with a few lines of code, only to roll back the change once I saw the execution time.

Horror Code - Loop And Re-loop

Some time ago, a colleague of mine told me to look at a function. It was something similar to this:
void foo(struct bar array[], unsigned int count)
{
        /* some initialization code */

        for (int i = 0; i < count; i++) {
                /* 30 rows of code
                   doing something with array[i]*/
        }

        for (int i = 0; i < count; i++) {
                /* other 20 rows of code
                   doing something with array[i]*/
        }

        /* some cleanup code */
}
At first, I thought that the first loop calculated some data needed by the second one. But after a closer look, I found that this was not the case. Furthermore, I saw that the first five or six lines of both loops were the same.

The explanation is that the second loop was added years after the first, by a different developer who didn't want to waste time understanding what the first loop did. You may not like it, but it works, unless you have performance issues. Personally, I think there are funnier ways to make two loops in a row.
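Just to make the point concrete, assuming the two passes really are independent (as they turned out to be), a cleaner structure would iterate once and share the common part. A sketch, with my own placeholder comments:
void foo(struct bar array[], unsigned int count)
{
        /* some initialization code */

        for (unsigned int i = 0; i < count; i++) {
                /* the five or six rows common to both loops */

                /* what the first loop did with array[i] */

                /* what the second loop did with array[i] */
        }

        /* some cleanup code */
}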

Corkscrew (Cedar Point) 01

Don't Wait For Bad Things To Happen

Things only get as bad as you are willing to let them.
This has happened to me so many times that I'm starting to think bad luck is real. The situation is the following: a product has been on the market for several years and everything works fine. At some point, a customer reports a strange behavior. You start to look at the problem and find a horrible bug that has been there since the beginning. In the time it takes to think about a solution, implement it and test it, at least two other customers report the same issue.

How is it possible? How is it possible that for years everything worked fine and then, in one week, three different people find the same bug? The only answer I have is...

Murphy's Law

There are several versions of it, but I believe it can be summarized this way:
If anything can go wrong, it'll go in the worst possible way.
This may seem pessimistic, but knowing that every bug can be potentially catastrophic can help us be more focused and more critical about our code. What I've frequently seen is cutting corners to meet deadlines (yes, Your Honor, I'm guilty too) with the promise (to whom?) of doing the right thing in the future. But usually that future comes when it's too late and a customer has already found the problem.

The only way I know to prevent this kind of issue is to plan periodic code revisions that can lead to refactoring sessions. Another idea may be to have a checklist of things to verify before putting your program into production. For C programs it may be something like this:
  • no strcpy() allowed - use strncpy()
  • no sprintf() allowed - use snprintf()
  • check for NULL pointers
  • check for memory leaks
  • ...
So now you are ready to revise all the code of your team to improve it, right? No!

If It Ain't Broke, Don't Fix It!

This is an old adage that is difficult to deny. So, what's the right balance? I've seen performance optimizations made by removing vital checks. I've seen commit messages claiming "removed useless code" written by developers who didn't understand why that code was there.

Well, to me, it all depends on your experience and your knowledge of the code you are going to change. You are allowed to... nay, you must improve the code, but you must also know what you are doing. And this is the most important thing!

By the way, if you are in doubt, ask someone more experienced than you.

Check For Memory Leaks!

Last week I lost at least three hours understanding and fixing a small open source library that was leaking memory. The incredible thing was the amount of allocated memory (half of which was never freed). Basically, the library is an overcomplicated implementation of a binary tree in C that, for less than 1 KB of data, leaks 8 KB of RAM.

My first intention was to throw away that piece of junk code, but unfortunately I didn't have the time to rewrite it, so I started hunting. But understanding the mass of functions and when they were called was taking too long, so I decided to call my old friend Valgrind.

Valgrind is an excellent tool for detecting memory leaks. The simplest way to use it is the following:
valgrind --leak-check=yes program_to_test [parameters]
This is enough to give you the total amount of allocated memory, along with a list of blocks that have not been freed (if any). And, for each of these, there is the full call hierarchy that lets you quickly identify where it was allocated.
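As a minimal example (mine, not taken from that library), Valgrind immediately flags a leak like the following one, pointing at the malloc() call in the reported stack trace:
#include <stdlib.h>
#include <string.h>

int main(void)
{
        char *data = malloc(1024);      /* allocated...                    */

        memset(data, 0, 1024);          /* ...used...                      */
        return 0;                       /* ...but never freed: Valgrind    */
                                        /* reports it in the leak summary  */
}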

Of course, Valgrind can do much more than this, but using it to find memory leaks is the minimum every developer should do before releasing a piece of software. And the fact that the code is open source is not an excuse: you must ensure the quality of your program, no matter how many people will read the source code.

3 Ways To Open A Lock - Part 3

This is the third and last part of a series (first post, second post).

Simply Asking For The Key

Although in real life it's unlikely that a thief asks you for the key to your home, in the digital world this is the most common and successful type of attack. The technical name is phishing and, according to Wikipedia,
[It] is the attempt to acquire sensitive information such as usernames, passwords, and credit card details (and sometimes, indirectly, money) by masquerading as a trustworthy entity in an electronic communication.
I'm pretty sure that you have received at least one email from a bank asking you to check your credit card transactions. In the email there was a link to a fake website very similar to the bank's. By logging in, you simply hand your password to a scammer.

The thing that makes this kind of attack possible is called social engineering, and it thrives in the internet era, even though it has probably been used since the dawn of time. It consists of a series of techniques that aim to make the victim perform actions he normally wouldn't do.

The fake email from the bank (or from Facebook, etc.) is similar to trawling. But there is also a technique called spear phishing, which targets a specific person. To do this, the attacker starts collecting as much information as he can about the victim, including his mother's maiden name and the name of his first pet ;-)

Unfortunately, sometimes attacks are a little more direct.


Image from xkcd licensed under a Creative Commons Attribution-NonCommercial 2.5 License.

When Should I Create A Function (Or A Class)?

This is a damn good question. As I've suggested in this post, functions should be short and do just one thing. The Linux kernel coding style even sets a maximum length of fifty lines for a function.

But following these rules to the letter on big projects leads to thousands of micro-snippets that make the code almost impossible to understand. This situation is more evident in object-oriented languages, where you may be tempted to create classes even when a simple structure would be enough. One of the practices that makes this evident in C++ is implementing every trivial class in its own .hpp file.


The general rule is that, if a function or a class is used only in a single piece of code, maybe it doesn't need to be separated. It seems reasonable, but it may lead to awful functions that are difficult to understand. Another exception can be made if you have the feeling that, in the future, that function/class will be used somewhere else.

The truth is that it's all about your sensibility as a programmer. In this case, experience should guide you to the right choice, because
The rules are... there ain't no rules!
And, in any case, a good refactoring from time to time can improve your code.

3 Ways To Open A Lock - Part 2

In the previous post, I talked about guessing your password. Now I'm going to cover another case.

Stealing Your Key

Now you have a super secure password and you feel calm and safe, right? But thieves are lying in ambush. Whenever you use your password in an app or on a website, they can steal it: this is the man-in-the-middle attack.

Man In The Middle Attack

It is possible if your credentials are transmitted in plain text over an unencrypted connection (for example, the Wi-Fi of hotels or airports), but there are some situations where it succeeds even over a secure connection (SSL/TLS).

Of course, your password can also be stolen if the thief can access a website's database. You may argue that if someone has access to a server's content, he shouldn't care about passwords: all your data is already in his hands. This is not completely true: hackers know that many people use the same password across different services, so, for example, knowing your email password may also give them access to your bank account. This is why it's good practice to use a different password for every service.

Nowadays, I don't think any online service is crazy enough to save passwords in plain text. Simple hashing was quite common in past years, until someone discovered that it can be defeated by hash dictionaries and rainbow tables. For this reason, a bit of salt has been added to plain hashing. Salting, in this context, means adding a few random characters (called "salt") to passwords before hashing them. To increase security, salts are stored in a separate database.
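Just to illustrate the idea, here is a toy sketch: the hash below is djb2, not a cryptographic function, and real systems use algorithms like bcrypt or PBKDF2, but the principle of hashing salt plus password is the same.
#include <stdio.h>
#include <string.h>

/* Toy hash (djb2), used only to illustrate the concept. */
static unsigned long toy_hash(const char *s)
{
        unsigned long h = 5381;

        while (*s)
                h = h * 33 + (unsigned char)*s++;
        return h;
}

int main(void)
{
        const char *password = "SuchFun83";
        const char *salt = "x7Kq";              /* random, per-user value */
        char salted[64];

        snprintf(salted, sizeof(salted), "%s%s", salt, password);

        /* The stored record contains the salt and the hash of salt+password,
           never the password itself. */
        printf("salt=%s hash=%lu\n", salt, toy_hash(salted));
        return 0;
}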

In any case, a weak password can easily be discovered, no matter the server security, so the suggestion is always the same: use strong passwords!

The Worst Problems Of IoT

Internet of Things
Security and privacy. Well, this post could have ended here, but since I'm a bit talkative (just a bit), I'll try to elaborate.

First of all, the Internet of Things

[...] is the network of physical objects or "things" embedded with electronics, software, sensors and connectivity to enable it to achieve greater value and service by exchanging data with the manufacturer, operator and/or other connected devices.
This short description should be enough to make you understand why privacy is a serious issue. If you don't feel the risk, just think that smart TVs send everything you say to a remote data center - and some of them don't even encrypt it.

The security risk needs some more explanation. Since embedded devices are usually not very powerful (mainly a matter of cost), it's not always possible to apply all the best practices needed to ensure a top level of security.

In addition, as new exploits and security flaws come out, those devices should be updated, but the manufacturer may decide not to provide patches for devices considered too old. This is what happens today with many smartphone manufacturers that don't distribute new Android versions for relatively old devices.

Don't get me wrong, I'm sure that the IoT can improve our homes and benefit the environment. But this should not make us forget the other aspects.

Suggested read: An Internet of Things that do what they’re told

Coding Is Funny, Debugging Not So Much

Measurement of code quality
I have to admit it: I have some problems with dynamically typed languages. Even though I love Python, I'm always worried about using one of those languages in a production environment. It's not because I absolutely want to declare the type of every variable, but because I don't trust other developers. And by "other developers" I also mean my past self.

My colleague +Daniele Veneroni gave me a great example with his funny post a couple of weeks ago. I'm sure he is smart enough to never write such a function, but there are lots of bad programmers out there.

Let's take a close look at the function.
local function isStaticPlugin(pluginName)
  if staticPlugins[pluginName] then
    return true
  elseif dynamicPlugins[pluginName] then
    return false
  else
    return 'WTF'
  end
end
This violates at least three golden rules:
  1. every function must do a single thing (and do it well),
  2. the name does not match what the function does (in fact, it also checks whether the plugin is dynamically loaded or not present at all), and
  3. the name suggests that the possible returned values are just true and false.
You may object that it's just a matter of style. No, sir, it isn't. Now I'll show you why.

Bugs Hard To Find

Just to be clear, these issues are not specifically related to Lua or to dynamically typed languages: you can make the same mistakes in C by returning an int. Someone using this function may check only for the true value and then add an else branch to handle the case of dynamic plugins. This will obviously lead to unexpected situations, or to an error when the plugin does not exist.
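A C sketch of the same trap (my own example, mirroring the Lua function) could look like this:
#include <string.h>

/* 1 = static, 0 = dynamic, -1 = not found: three values hidden
   behind a name that suggests a simple yes/no answer. */
int is_static_plugin(const char *name)
{
        if (strcmp(name, "static_one") == 0)
                return 1;
        if (strcmp(name, "dynamic_one") == 0)
                return 0;
        return -1;      /* the 'WTF' case */
}

/* A caller writing "if (is_static_plugin(name)) ... else ..." treats
   an unknown plugin (-1) as static, because any non-zero value is true. */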

Dynamically typed languages aggravate the situation. If the value returned by the function is stored in a variable, every time that variable is used in the code there will be the risk of type mismatch errors.

In addition, in Lua ('WTF' == true) evaluates to false, but 'WTF' evaluates to true in an if condition. So, depending on how the check is made, you may have false positives on both static and dynamic plugins.

Bottom Line

Many dynamically typed languages are interpreted, not compiled, so I hope you are using a really good linter.

Image by smitty42 licensed under CC BY-ND 2.0

Versions Madness

Last week, Linus Torvalds, the creator of Linux, published this post on Google+.

So, I made noises some time ago about how I don't want another 2.6.39 where the numbers are big enough that you can't really distinguish them.

We're slowly getting up there again, with 3.20 being imminent, and I'm once more close to running out of fingers and toes.

I was making noises about just moving to 4.0 some time ago. But let's see what people think.

So - continue with v3.20, because bigger numbers are sexy, or just move to v4.0 and reset the numbers to something smaller?
It seems that Linus considers the version number just a name, unrelated to commercial considerations and even to product features. But often, the choice of the version number is treated as a science.

I Like It Complicated

The major-dot-minor format is quite common, and the meaning of those numbers is also quite standard: the minor changes when there are small improvements, while the major increases on bigger changes. But after those numbers there may be a wide variety of things:
  • a build number, automatically increased at every successful compilation,
  • a distribution number or letter, changed every time a build is delivered to testers or customers,
  • a letter indicating the build type (alpha, beta, final, etc.),
  • abbreviations for special releases (pre, RC, QA, ...)
The funny thing is that some of the above may be combined, so, for example, you can find 1.7d RC or 2.1.B.174. By the way, for some years I used a four-number scheme to identify delivered versions of my software: after major and minor there was a counter to keep track of small functional changes or refactorings, while the last number was related to bug fixes.

The Tech Side

Your software may expose APIs or use functions provided by other programs. In this case, the version number has a fundamental purpose: it is through this string that your application is related to the others.

Understanding how other developers deal with version numbers can help you know which releases of third-party software your program is compatible with. And it can save you from some serious headaches when a customer claims that nothing is working.

The Commercial Point Of View

Aside from these technical considerations, there is also the commercial side of version numbers. To a user or a customer, a change in the major number means that big changes and big improvements have been made. This generates, in some people, a craving for the new version that salesmen can use to raise prices. The really important thing in this situation is to meet the expectations of the customer.

Even if the software is free, expectations rise whenever the change involves the leftmost numbers. And in this case too, you cannot disappoint the users. However, for the Linux kernel the situation is quite different: it's not directly used by end users but only by other developers and system administrators. In this case, Linus' idea is not so bad, in my opinion.

Conclusions

The really important thing is to use a system for indicating the version of your software. It has to be meaningful to you and your organization, and it must be clear enough for the end users. If you want my opinion, Semantic Versioning is pretty good.

3 Ways To Open A Lock - Part 1

If someone wants to open your lock, he has three ways:
  • use lockpicking tools;
  • steal your key;
  • ask you for the key.
But, you may say, what does this have to do with computer science? I'm glad you asked. A password in the virtual world is exactly like a key in real life. And this similarity also holds when someone wants to get into your bank account (or inbox, or Google account, or...).

Lockpicking

Lockpicking Tools
The virtual equivalent of lockpicking tools is the classic way of guessing the password. There are two main ways to do this: brute-force and dictionary-based.

The latter is quite simple: hackers and crackers have big lists of common passwords and other words ready to use. If this attack is not successful, there is brute force.

The dumb version simply composes passwords by evaluating all combinations of uppercase and lowercase letters, numbers, and special characters (hyphen, underscore, percent, etc.). This method is guaranteed to succeed, sooner or later. The issue for the attacker is the time needed to check all the combinations, which can be in the order of (billions of) years.
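To get a feeling for the numbers, here is a quick back-of-the-envelope sketch; the alphabet size and the guessing rate are my own assumptions, just for illustration:
#include <stdio.h>
#include <math.h>

int main(void)
{
        const double alphabet = 95.0;           /* printable ASCII characters */
        const double guesses_per_sec = 1e10;    /* assumed attacker speed     */
        int len;

        for (len = 8; len <= 12; len++) {
                double combos = pow(alphabet, len);
                double years = combos / guesses_per_sec / (3600.0 * 24 * 365);

                printf("length %2d: %.2e combinations, ~%.2e years\n",
                       len, combos, years);
        }
        return 0;
}
Each extra character multiplies the search space by the size of the alphabet, which is why length matters more than anything else.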

But there is a faster way. When people create a password, digits are usually grouped together at the end of a word. For example, it's uncommon to have a password like "8such3fun", while "suchfun83" is more common.

Another frequently used pattern is to capitalize the first letter of common words (e.g. "SuchFun83") or to substitute some letters with numbers (e.g. "5uchFun83"). Using a combined brute-force and dictionary-based attack, it is quite easy to guess this kind of password.

If you are now wondering how strong your password is, there are a couple of sites that can answer your question:
  • How Secure Is My Password tells you how long it would take an attacker to crack your password, considering a combined brute-force & dictionary attack
  • The Password Meter is a little less accurate, since it only considers length and character variability, but it suggests some rules for creating really strong passwords
While playing with the above sites (or any other password checker), do not enter your real passwords, because they can be stolen by the website itself or during the transmission (if sent in plain text).

How C Compilers Work Part 4: Linker

Now we are at the point where we have produced one or more object files and we want to create an executable. On GNU/Linux systems, this job is done by ld, the GNU linker.

As seen in the previous part, the compiler always works on one file at a time, so every time a symbol (function or variable) defined somewhere else needs to be accessed, a reference is used. The first job of the linker is to check the correctness of all these references.

ELF layout
Once this operation has completed successfully, it's time to produce the executable. To do this, all the object files are split into their basic components, which are reassembled according to the ELF format. For example, all the fixed strings go into the string table, the names of the used symbols into the symbol table, etc.

This also happens to static libraries (which are just a set of object files packed into an archive) and to shared objects if the -static flag has been specified.

In the case of dynamic linking (which happens 99% of the time), the linker appends:
  • a section called dynamic symbol table (.dynsym) containing the names of the external symbols, and
  • a section simply called .dynamic that, among other things, also contains the names of the shared objects needed at runtime.
When the process is executed, the dynamic linker will add to the process image the images of the shared objects listed in the .dynamic section (but this is a different story).

Troubleshooting

"Undefined reference" is the most common error that you can get. It means that a function or a variable defined as extern has not been found. The most common case is a typo or a missing shared object.

When working with multiple toolchains or different versions of a shared object, it may happen that the linker reports undefined references even if everything seems correct. This happens because the reference to the symbol (usually introduced by a .h file) does not match what's inside the shared object being linked. In other words, the shared object the linker is using is not consistent with the header file that has been included. A solution is to check the paths of the header and the library (the -I and -L flags in gcc).

Another sneaky error may appear when a program is executed on a system different from the one where it was built. The message usually shown is the pretty misleading "No such file or directory". But the message is absolutely correct: a file is missing (or it's in an unexpected location). The missing file is a dynamically linked shared object. To check which one it is, you can use readelf.
$ readelf -d <process_name>
The first rows show the names of the needed shared objects. Now you only have to make sure they are present on your system. If you are able to find them but you still get the same error, try specifying additional search paths for them.

Other posts in this series


Happy 1st Birthday

Birthday cake, Downpatrick, April 2010 (02)
Today this blog turns one. One year ago, when it started, I was not sure whether it would survive more than a few months. Instead, with 68 published posts and more than 10,000 pageviews (many of them in the last month), I'm pretty satisfied.

For newly arrived visitors, I want to provide a list of the seven posts I've written that I consider the most important. This list doesn't match the one with the most viewed articles, but I'll get over it ;-)
So now, what's next? More posts, of course. Maybe some guest posts (I've asked a couple of colleagues of mine, but apparently they are lazier than me).

Thank you all for reading these pages, and, if you have some ideas on how to make them better, please share your thoughts.

Sincerely,
Luca

VeryBello Is The New Italia.it

In the beginning (2007) there was Italia.it, a promotional website commissioned by the Italian government. Its main goal was supposed to be attracting foreign tourists. The current version is not so bad, compared to the first one, but there are still some things that don't work.

I'm not referring to technical details (except the carousel, which changes image too fast in my browser - and carousels are known to be a usability issue even at an average speed). The biggest marketing issue I see is the fact that the site has been localized only into the major European languages. For example, Chinese, Russian and Arabic (three languages spoken by many tourists who visit Italy nowadays) are missing.

Back to the present. Last weekend, the Italian government presented a brand new site, named VeryBello!, with more or less the same purpose as the other. The bad news is that it's even worse.

Slow, not accessible, available in Italian only, not to mention the name. Besides, in the first version of the top picture, part of Italy was missing. I hope that no one in the world will judge Italian web designers and programmers based on the quality of those institutional websites.

But, above all, I hope that nobody judges the whole of Italy by the work of a few people and by the inability of its government to tell whether that work is well done.

Double Facepalm

When You Must Write Unreadable Code

Punch card from a typical Fortran program.
Code was quite unreadable in the old days
Well, if you know me or have been reading this blog for some time, you should know that I consider code readability even more important than correctness. This is because bug-free code does not exist, thus, sooner or later, someone will have to fix it in the shortest time possible.

Nevertheless, there are at least two situations where you need your code to be difficult to understand.

You Want To Be Indispensable

It may be because you are a contractor and you want it to be easier for the company to ask you for changes instead of trying to manage them internally. Or, if you are an employee, you may be afraid that someone else could take your position and the company could decide to fire you.

Whatever the reason, writing unreadable and poorly commented programs is a good way to make understanding your code really hard and time-consuming for everyone except you. With these premises, the company's best choice is not to give your code to anyone else.

You Hate Your Colleagues

This is a sort of revenge. Do your colleagues have a better salary than you and bombastic titles on their business cards, and does the boss always praise them? The only way you can punish them is by forcing them to understand your terrible code.

This can be done on a large scale by including a refactoring session each time you add a new feature or fix a bug in an understandable file. Of course, your main goal is to mess things up.

They have to feel the pain each time they are requested to change something you touched.

Drawbacks

There aren't many, just a couple. First: your colleagues may hate you and consider you a bad programmer. If you are writing unreadable code just to annoy them, this should not bother you too much.

Second: the code will be difficult to understand for you too. After a couple of months, it will be painful even for you to manage your own code. If you are an hourly contractor, this can be a good thing, since every change will take longer.

Conclusions

I personally don't see any other reason to write poorly readable code. And even the above two are quite questionable. Always remember that with a good VCS it is easy to detect who introduced the mess in the code and, at some point, someone may decide that it is better to restart a project from scratch rather than leave it with only one developer able to manage it.
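Just to show how easy that is (the file name and line range below are hypothetical), a couple of commands are usually enough:
$ git blame -L 40,60 mess.c       # who last touched lines 40-60, commit by commit
$ git log -p --follow -- mess.c   # full history of the file, patches included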

RTFMC

RTFM by xkcd
If you don't know, the acronym RTFM means "Read The Friendly Manual". And this is exactly what I did several months ago, when I used a third-party library. I made a simple test program and everything seemed to work just fine.

This week, I used that library for work and started to see some strange behavior in my app. The output of the old test program was still correct, but the same code in a bigger application showed wrong values and caused some crashes.

It took me hours to understand where the problem was. And, can you guess? I found it only after reading the friendly manual carefully. It was clearly written that some of the returned data were references to members of a structure.

What I was trying to do was to access them after the structure had been freed. The simple test program seemed to work fine just because it was too short, so the freed memory had not yet been overwritten by anything else.
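The library itself doesn't matter here, but the pattern is easy to reproduce. A minimal sketch, with a hypothetical get_name() that returns a pointer into the structure rather than a copy:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

struct item {
        char name[32];
};

/* Returns a reference to a member of the structure, not a copy. */
const char *get_name(const struct item *it)
{
        return it->name;
}

int main(void)
{
        struct item *it = malloc(sizeof(*it));
        if (it == NULL)
                return 1;
        strcpy(it->name, "hello");

        const char *name = get_name(it);
        free(it);               /* the structure is gone...            */

        printf("%s\n", name);   /* ...but the reference is still used:
                                   undefined behavior that may "work"
                                   in a short test program             */
        return 0;
}
In a short-lived process nothing reuses that memory, so the bug stays invisible; in a long-running application it eventually shows up as garbage values or crashes.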

Lesson learned: it's not enough to read the manual; you have to read it carefully. Or, if you prefer an acronym: RTFMC.


Image from xkcd licensed under a Creative Commons Attribution-NonCommercial 2.5 License.

How To Recover Deleted Git Commits

In many Git tutorials it's written "never use git reset --hard". And there is a good reason: this command deletes commits and, if you haven't pushed to a remote repository, your changes are lost (unless you know where to find them).

A Little Story

This happened to me some years ago, when I was a Git newbie. There was a bug in a piece of software, so I created a new local branch from master and started fixing it. In the meantime, a colleague of mine asked me for a quick workaround to continue his work. So I switched back to master and added a couple of temporary commits.

After a week, the state of the repository was this: the temporary commits on top of master, and the bug-fix commits on the fix branch.


The bug was fixed, the temporary commits could be removed and the branch merged into master. Easy to say, easy to do, easy to mess up.

My idea was to move to master, delete the temporary commits and then merge the fix branch. Unfortunately, when I ran...
git reset --hard HEAD^^
...I was on the wrong branch. The good commits were gone. Panic!

Where Have They Gone?

What I've learned from this experience is that deleted commits are still there, at least until you run git gc or git prune. The problem is finding a way to bring them back. What I did at the time was to use grep to search for the commit message under the .git directory of the repository.

In this way, I discovered that the logs under .git/logs/refs/ record, for each branch, the hash of every commit it has pointed to. With those hashes, it was easy to check out the second commit (going into a 'detached HEAD' state) and verify that nothing was missing.

At that point, I created a new branch (with git checkout -b new_fix) and carefully executed the original plan, this time without surprises.
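For the record, a possible recovery session (hashes and names are obviously hypothetical) looks like this; git reflog reads the same logs under .git/logs and is usually quicker than grep:
$ git reflog                                       # recent positions of HEAD, lost commits included
$ grep -r "part of the commit message" .git/logs   # alternative: search the logs directly
$ git checkout <commit-hash>                       # detached HEAD: check that nothing is missing
$ git checkout -b new_fix                          # give the recovered commit a branch name again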
I love it when a plan comes together!
- John "Hannibal" Smith

The Long Way To Dual Boot

During the Christmas holidays I bought a new HP laptop with Windows 8.1 preinstalled. I'm not a fan of Microsoft operating systems, but my wife is not addicted to Linux like me, so I had to set up a dual boot.

First Boot With Windows 8.1

After the initial configuration, where I disabled every automatic update procedure, I looked around to see what had changed compared to Windows 8. Well, there are three main differences I noticed:
The Start button of Windows 8.1
  1. at startup you get transported directly to the Desktop,
  2. there is a Windows button in the bottom bar (see image) that opens the infamous tile screen, and
  3. while hovering the mouse over the top of applications with the new interface, a title bar with the X button appears.
Well, I've seen enough, let's start doing something serious... but wait, who downloaded 20 MB from the Internet?

Partitioning

The hard disk is 1 TB, but there are some extra partitions (rescue, boot loader and an unknown one) that reduce the available space to about 900 GB. My idea was to have two 250 GB partitions (Windows and Linux) and the remaining space (about 400 GB) for data.

So I tried to shrink the 900 GB partition with the Disk Management tool provided by Microsoft, but the minimum allowed size was 800 GB. Remember, I said to myself, defrag is your friend.

The last time I ran a defragmentation tool was several years ago, on Windows XP, and I remember there was a graphical ribbon that showed used and free sectors. In that diagram, fixed (i.e. unmovable) files were shown in a different color. The latest version of the defrag tool doesn't have that visual hint anymore. This omission could seem irrelevant, but it would have saved me a lot of time. Let's see why.

When the defrag finished its job, I was able to resize the partition, but the smallest size was about 455 GB. Still too much. So I tried to remove the so-called crapware and run the defrag again. [Honestly, compared to my previous PCs, this time the unwanted/useless preinstalled software was quite limited.]

No results. The minimum size for that partition was still 455 GB. I tried to remove some other programs, reboot and run defrag again, but nothing changed. So I did a quick search on the Internet and found that other people around the world had the same issue (unmovable system files blocking the shrink). They solved it by using this free tool, and so did I. After a couple of hours, I was eventually able to resize the Windows partition.

Dealing With UEFI

In the meantime, I had prepared a bootable USB pen drive with Xubuntu 14.04 64-bit. Once it was plugged into the laptop, I expected the system to boot from it automatically. No way. The Windows boot started immediately and there was no indication of which key to press to enter the UEFI settings. I had to reboot at least three times and press as many F-keys as possible before being able to get in. So, the first thing I did was set the waiting time to five seconds.

Then I reshuffled the boot sequence, saved the changes and installed Xubuntu. After a few minutes the installation finished, I rebooted and... Windows started. Reboot again, press ESC, and a boot menu appeared. Now I was able to boot Linux, but I was not satisfied: I wanted Xubuntu as the default choice.

So, since Google is my best friend, I found the UEFI submenu where I had to select the Ubuntu bootloader instead of the Windows one.
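For the record, the boot order can also be changed from a running Linux system with efibootmgr (the entry numbers below are hypothetical and machine-specific):
$ sudo efibootmgr                 # list the boot entries and the current BootOrder
$ sudo efibootmgr -o 0003,0000    # put the ubuntu entry (here 0003) before the Windows one (0000)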

Finished? Not Yet

Well, let's see if I can access the Windows filesystem from Linux. The attempt to mount the partition failed with an error.


The key parts are the following:
Windows is hibernated, refused to mount.
and
Please resume and shutdown Windows fully (no hibernation or fast restarting)
The best way to resolve this issue is to disable fast startup. This feature does not let Windows shut down completely when you ask it to, but just hibernates it to make the next boot faster. This leaves the disk in a state that is not safe to modify; for this reason, Linux refuses to mount it read/write.
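A sketch of the two usual workarounds: from an administrator prompt in Windows you can turn off hibernation altogether (which also disables fast startup), or from Linux you can mount the partition read-only as a temporary measure (the device name below is hypothetical, and the mount point must already exist):
From Windows (administrator command prompt):
powercfg /h off
From Linux, as a read-only fallback:
$ sudo mount -t ntfs-3g -o ro /dev/sda4 /mnt/windows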

Conclusions

Before starting, I thought it would be more painful. It took more time than it should have, but fortunately I was able to make the dual boot work properly without too much hassle.


Cover image created with Picfont.