The 80-20 Rule: Pareto And The Devil

Probably only the number of webpages with pictures of cats exceeds the number of those discussing the Pareto principle. Nevertheless, I want to add my own, because I think the concept is not so clear to all my colleagues.

A rough definition of this law may be:
To accomplish the final 20% of a job, you'll need 80% of the total time.
If you are close to the deadline of a big project, this sentence sounds quite depressing, doesn't it?


But if you have some experience, you know that this principle is absolutely correct. How much time have you spent moving UI objects one pixel at a time until the Project Manager was satisfied? And what about colors? And that damn string that doesn't fit the printed page? And the trailing comma to be removed from JSON arrays? And that strange bug that only happens on a few PCs?

All these things are important - but not fundamental. They don't represent the core of the application, just some details. In fact, another way to express nearly the same concept is:
The Devil hides in the details.
In my opinion, it all lies in the difference between a proof of concept and a real application. A piece of software, in order to be delivered to a customer, must be:
  • efficacious - it must do its job
  • efficient - the job must be done in the best way possible (according to time constraints and remembering that perfection is impossible to achieve)
  • reliable - it must handle failures in a proper way and always preserve the user's data
  • usable - the user must find it natural to use (*)
People who don't have experience in coding "real" applications usually underestimate the benefits of usability, reliability and sometimes efficiency. Without these characteristics, your software will be nothing more than a proof of concept. And you aren't creating proofs of concept, are you?


(*) = I know this is not the most complete definition of usability, but I think it gives an idea of what the final intent of any user interface should be.

Bitwise Right Shift With Signed Int

It's never too late to learn something new. And sometimes the compiler is smarter than you think. In this case, I discovered this behavior a couple of weeks ago while working on a C function that manipulates the bits of an unsigned integer variable using the right-shift operator (>>).

The function worked pretty well, until someone wanted to use it with signed integers too. The issue is that, with unsigned operands, the vacated leftmost bits (including the position of the sign bit) are filled with zeros when right-shifting. The solution was absurdly easy: cast to signed int, so that (on most compilers) the shift becomes arithmetic and replicates the sign bit.

This is an example with 8 bits:
[Image: bitwise right shift with unsigned and signed int]
The right shift produces different results depending on the signedness of the variable

Too Good To Be True?

This is wonderful, but there is a little thing to consider when you plan to use it. According to the C standard (C99, 6.5.7):
If E1 has a signed type and a negative value, the resulting value is implementation-defined.
This means that every compiler can behave in its own way and, worst of all, the same compiler can act differently on different architectures. So, what's the solution?

There isn't just one, of course. The following function should work pretty well in all cases, but I'm sure you can find at least one other algorithm that does the same thing.
int right_shift_with_sign(int var, unsigned int amount)
{
        /* If the compiler's shift is logical, -1 >> amount clears the
           top 'amount' bits; if it is arithmetic, mask stays -1. */
        int mask = (var < 0) ? -1 >> amount : -1;
        int result = var >> amount;
        /* Force the top bits back to 1 for negative values. */
        return (result | ~mask);
}
[Code not fully tested - use it at your own risk]

If you need a way to detect how your compiler behaves, you can use this short snippet:
if ((-1 >> 2) == -1) {
        /* right shift preserves the sign */
} else {
        /* right shift DOES NOT preserve the sign */
}

How C Compilers Work Part 3: Compiler

The compiler proper is a really complicated piece of software. It performs many different tasks on single files: no matter if your project is made of thousands of source files, they will be compiled one by one.

Now let's see the common operations. First of all, it performs a lexical and syntax analysis of the code. Many of the errors about missing semicolons and misspelled variables will be caught here. For functions, there is a small caveat. According to the C standard (up to C89; C99 removed implicit declarations, but many compilers still accept them with a warning), if a function has not been declared, the compiler assumes that:
  • it exists in some other file of your project, a linked library or a common library (such as libc), and
  • it returns an integer value.
This will lead to warnings like this:
warning: implicit declaration of function ‘fprintf’ [-Wimplicit-function-declaration]
        fprintf(stderr, "Error! Expected %d param\n", ARGS);
or this:
warning: initialization makes pointer from integer without a cast
        char *token = strtok(my_string, "/");
To remove them, you just have to add the declarations of the functions you are using, by including the right header file.

Other things that the compiler checks:
  • types of variables in assignments;
  • type and number of parameters of called functions must match the corresponding prototypes (usually declared in header files included by the preprocessor);
  • const variables and parameters have to remain untouched.
Another important task performed here is optimization. It can improve your code in two main ways: increasing the speed or reducing the size of the final output. There are many things that can be done to achieve one of these goals. GCC, for example, has lots of options for fine tuning but, in the end, the most used are -O2 (to optimize for speed) and -Os (to optimize for size).

Eventually, the compiler produces a binary file (usually called an object file) full of references to unknown things. Making those things known is the job of the linker.

Other posts in this series

YouTube And The 2 Billion Views Video

Last week, Google stated that the video "Gangnam Style" by Psy has passed 2 billion views, or 2,147,483,647 (that is 2^31 − 1) to be precise. This is the highest number that can be represented with a 32-bit signed integer variable. Once that value is reached, if you add one, you get a negative number. As you can understand, this can be a problem for a counter. But now let's take a step back to see the whole picture.

First Thing First: Who Is Psy?

Why should a South Korean artist singing in his own native language be so popular worldwide? Well, the song is really catchy and the video is funny, you may say. These two reasons can explain the success after the video became quite popular. But the starting point had nothing to do with music.

It all began when someone at 4chan noticed that Psy looked very similar to Kim Jong-un, the North Korean dictator. In addition, Psy was the perfect nemesis of Justin Bieber, a Canadian singer quite hated by the 4chan crowd. (4chan is an anonymous community where all kinds of ideas [mostly not safe for work] spread daily. Just as an example, many of the internet memes that you see had an ancestor on 4chan. And Psy is just one of those memes.) You can find the whole story here (in Italian).

The Tech Side

According to TechTimes, Google engineers noticed that the counter was close to the limit some weeks ago, so they could fix it by simply doubling the size of the variable. This way, it will take some centuries to reach the new limit of 9,223,372,036,854,775,807 (the maximum value that can be represented with a 64-bit signed integer).

Now you may ask: why use a signed integer at all? An unsigned counter can represent twice as many values as a signed variable of the same size, and it would have been quicker to implement: since the size stays the same, the database structure would not need to change; only the front-end would be affected.

My assumption is that negative values are used internally for some other purpose, so changing that behavior would probably have had more impact than doubling the size of a field in the DB.

It's Marketing, After All

From a technical point of view, the thing could have passed without much clamor. But the number is something that can be considered unreachable for any other video sharing platform. In addition, it is also a milestone that classic television channels cannot easily reach.

So this claim around a number is also a demonstration of the power that Google projects worldwide. And its competitors aren't just other online platforms but TV companies, too.

Conclusions

Behind a simple counter there can be several stories. And when we speak about a great company such as Google, computer science and marketing always mix together.

Every Language Sucks

Languages Yin Yang
It would be a mistake to consider this page only as the amusement of some funny developers. You may or may not agree with the criticisms of your favorite programming language (sometimes I don't) but the truth is that the perfect language does not exist.

There are several reasons for this. The most important is that each language (or better, each language/compiler or language/interpreter pair) has been made for a specific purpose.

For example, JavaScript developers may consider the strong typing of C++ harmful, but it's needed to ensure reliability.

A garbage collector can ease developers' work, but it's much less efficient than freeing memory by hand when it's no longer needed.

The epoch when a language was created also influences its characteristics. The increasing power and memory of the average computer have driven the rise of interpreted languages and made pointers disappear.

Eventually, each new language has tried to keep the best part (or what its creator thought was the best part) of existing languages and reworked the rest.

The conclusion is that we are surrounded by languages that suck but we are lucky because we can choose the one we think sucks less.

Image created with Tagxedo web app.

Horror Code: Paid By The Number Of Rows

The first thing I thought was: "I'm missing something". But after a few seconds, the suspicion that the author of the following code is paid by the number of lines came to my mind. Judge for yourself:
struct data_node *node = calloc(1, sizeof(*node));
if (node) {
        node->data = NULL;
        node->info = NULL;
        node->next = NULL;
}
comm->data_list = node;
Since node isn't used anywhere else, those lines are exactly equivalent to the following:
comm->data_list = calloc(1, sizeof(*(comm->data_list)));
The function calloc returns a pointer to a memory area that is guaranteed to be zero-filled, so there is no need to assign NULL to the members of the structure.

Checking the value returned by calloc is a good thing: memory may be exhausted, and that error should be handled. But in this case the author didn't take any action in case of failure.

Using a temporary variable may be useful in some situations, but this is not one of them.

In conclusion, the code is formally correct: it doesn't have bugs or cause harm, and maybe the compiler can even optimize it away. But the point is that redundant and useless code makes reading and understanding harder, besides hiding subtle hard-to-find bugs.

Image by goopy mart licensed under CreativeCommons by-nc-sa 2.0

How C Compilers Work Part 2 - Preprocessor

As said in the previous post, in modern compilers preprocessing is not a separate phase: it's performed together with compilation. Nevertheless, understanding the role of the preprocessor is really helpful. The first thing to say is that it basically understands only lines that start with the hash character (#).

In a standard program, those lines specify header files to include and constants/macros to substitute in the rest of the file. Another frequently used feature is conditional compilation (#if, #ifdef, etc.) to enable some part of the code only if a condition is met at compile time. In this case, the -D flag of GCC can be really useful.

A strange beast is the #pragma directive, used to issue commands directly to the compiler, in some cases to enable vendor-specific options. Other commonly used directives are #warning and #error; they force the compiler to emit a warning or an error in special situations (usually depending on which other files are or are not included in the project).

An Example

Now let's see what a preprocessor does. Look at this simple program:
#include <stdio.h>
#include <string.h>

#define ARGS    1
#define TEST    "test_arg"

/* Main function */
int main (int argc, char **argv)
{
        if (argc != ARGS + 1) {
                fprintf(stderr, "Error! Expected %d param\n", ARGS);
                return 1;
        }

        if (strcmp(argv[1], TEST) != 0) {
                fprintf(stderr, "Error! Expected %s\n", TEST);
                return 2;
        }

        fputs("OK!\n", stdout);
        return 0;
}
Now if you compile it with:
gcc -Wall -E -o main_pp.c main.c
you'll get another C file named main_pp.c as a result (the flag -E tells GCC to stop after the preprocessor). If you don't have a compiler available, you can look at it here. Pretty big, isn't it?

What you should notice is that the #include and #define directives have been processed and the comment has been removed. This obviously helps the programmer, but almost all the work done by the preprocessor can be bypassed. In other words, the preprocessor is not indispensable. If you compile the following piece of code, you'll notice no difference in execution compared to the original program.
typedef struct foo FILE;
FILE *stdout;
FILE *stderr;

int fprintf(FILE*, char*, ...);
int fputs(char*, FILE*);
int strcmp(const char*, const char *);

int main (int argc, char **argv)
{
        if (argc != 1 + 1) {
                fprintf(stderr, "Error! Expected %d param\n", 1);
                return 1;
        }

        if (strcmp(argv[1], "test_arg") != 0) {
                fprintf(stderr, "Error! Expected %s\n", "test_arg");
                return 2;
        }

        fputs("OK!\n", stdout);
        return 0;
}
How is this possible? How can struct foo be a FILE? And what about the other functions? For the answers, you'll have to wait for the next two chapters of this series.

Troubleshooting

Usually preprocessor errors are easy to understand. For example:
failed.c:1:21: fatal error: missing.h
means that the header file missing.h does not exist or is not in the include path. Another comprehensible error is the following:
failed.c:3:0: error: unterminated #ifdef
which reminds us that an #endif is missing.

References

  • If you want to play with the above examples, source files are here.
  • A full explanation of the GCC preprocessor can be found at this page.
  • The idea for the second example has been taken from this blog post.

Reliability First - Spacecrafts

Artistic image of Rosetta, Philae and
comet 67P/Churyumov–Gerasimenko
I bet you've heard that last week, for the first time, a human artifact landed on a comet (named 67P/Churyumov–Gerasimenko). The lander Philae and its companion, the space probe Rosetta of the ESA (European Space Agency), have done a long and great job. At this page you can see a summary of their ten-year journey through the Solar System.

But it was not all a bed of roses. The mission had some troubles, starting from the delayed launch and ending with the not-so-perfect landing of Philae. There have been some technical issues, but Rosetta has been reliable enough to accomplish its duty.

There is a lesson a developer can learn from this story: create your software as if it had to survive ten years in space without maintenance. Check every possible failure case and make it work even when the situation is not perfect.

Of course, this is a reminder to me too, since too many times I assume the system will never run out of memory or disk space.

Image by European Space Agency licensed under CreativeCommons by-sa 2.0

The Day That Killed Groupon Reputation

This Thursday I was looking at the social networks in a moment of rest when the hashtag #defendGNOME caught my attention. For those who are not Linux addicts, GNOME is a famous desktop environment that has had several forks (MATE, Cinnamon, etc.) and has been the main choice for Ubuntu for many releases.

All the messages on Twitter and Google+ (and I suppose Facebook too) linked to a statement on the GNOME Foundation website. Below is the relevant part of the message, followed by the request to donate.
[...] Recently Groupon announced a product with the same product name as GNOME. Groupon’s product is a tablet based point of sale “operating system for merchants to run their entire operation." The GNOME community was shocked that Groupon would use our mark for a product so closely related to the GNOME desktop and technology. It was almost inconceivable to us that Groupon, with over $2.5 billion in annual revenue, a full legal team and a huge engineering staff would not have heard of the GNOME project, found our trademark registration using a casual search, or even found our website, but we nevertheless got in touch with them and asked them to pick another name. Not only did Groupon refuse, but it has now filed even more trademark applications [...]. To use the GNOME name for a proprietary software product that is antithetical to the fundamental ideas of the GNOME community, the free software community and the GNU project is outrageous. Please help us fight this huge company as they try to trade on our goodwill and hard earned reputation. [...]
In a few words: last spring Groupon decided to create a tablet with a proprietary OS and name it Gnome, despite that name having been used for several years by an open source project. So the Foundation started a fundraiser with an associated social media campaign.

The result was quick and massive. I just want to show you one tweet among many, very representative of the scale of Groupon's blunder:

Probably someone at Groupon understood what was going to happen and quickly tried to calm things down with the following statement (the emphasis is in the original).
Groupon is a strong and consistent supporter of the open source community, and our developers are active contributors to a number of open source projects. We’ve been communicating with the Foundation for months to try to come to a mutually satisfactory resolution, including alternative branding options, and we’re happy to continue those conversations. Our relationship with the open source community is more important to us than a product name. And if we can’t come up with a mutually acceptable solution, we’ll be glad to look for another name.

UPDATE: After additional conversations with the open source community and the Gnome Foundation, we have decided to abandon our pending trademark applications for “Gnome.” We will choose a new name for our product going forward.
So the happy ending has come. The name GNOME is safe, and it will continue to indicate only an open source project. And Groupon will remember for many years that the community is more important than lawyers and money.

Image by ilnanny licensed under CreativeCommons by-nc-nd 3.0

How C Compilers Work Part 1 - Introduction

Build button
I'm writing this series of posts because it seems to me that many young programmers lack a clear idea of what's behind the "Build" button of their IDE. In my opinion, this happens because nowadays interpreted languages (such as JavaScript, Python or Lua) are perceived as fancier and cooler to learn.

The truth is that many companies still use compiled languages like C and C++. The result is several developers (some of them absolutely remarkable) who keep wasting time trying to understand how to fix an "undefined symbol" error.

An Overview

C Compiler
Operating diagram of a C compiler
The main purpose of a C compiler is to generate a binary file starting from one or more C source files. The generated file may be an executable or a library but in every case it contains low level machine-code instructions that a processor can execute.

A C compiler (such as GCC) is made of three logical blocks: the preprocessor, the compiler proper and the linker. Each block performs a set of specific tasks and produces the output for the next block, but for performance reasons the first two are usually implemented together.

Just as a side note, the first C++ compilers were implemented as preprocessors for C compilers.

Other posts in this series

Reliability First - Embedded Systems

A well-designed watchdog
This post could end with a single word: watchdog. But designing a good watchdog is a challenging task.

A hardware chip that cuts the power supply to the main processor is indispensable to provide real reliability. This chip must be pinged at regular intervals, otherwise it performs a power cycle. If well calibrated, this system can be effective enough for single-threaded applications running on microcontrollers. But for microprocessors with an operating system and several running processes, a software watchdog is needed too.

The Software Side

Obviously, the watchdog process (WDP from now on) must be tied to its hardware counterpart. This way, if the WDP crashes, the system will reboot, ensuring that the other processes don't remain without a monitor. This is the easy part; how the WDP checks that everything is working fine is another kettle of fish.

One solution may be monitoring the status of every process (ensuring that it's running and not zombie) and abnormal usage of CPU and RAM. The hard part here is defining what "abnormal" means.

Besides this rough check, we can have each process feed the WDP at regular intervals. The drawback is that we have to complicate each process by inserting code unrelated to its core business. If that seems a small disadvantage, try to imagine the amount of code needed in a big process with tens of threads running concurrently. Unfortunately, this is the price to pay for a really reliable system.

Management Of The Failure

OK, now you have your WDP running on your system, with the other processes feeding it. The next step is to decide what to do in case of failure. If a program is going to consume all the system resources, an obvious thing to do is to kill it. And then?

The answer depends on the process and the system architecture. For some processes, the right solution may be to restart them; for others, a system reboot may be required. Additional rules may be set on the number of failures within a certain time. Probably, in an average system, all these strategies should be applied to different processes.

Reports

After the WDP has done its dirty work, it is likely that the failure will reappear. It can be caused by a bug, an unmanaged situation or a memory leak that slowly consumes the RAM. A good way to understand what happened is to have a memory dump of the "bad" process to perform a post-mortem debug. But unfortunately, often this is not enough.

In a complex system where processes interact, a log that shows information from the last minutes before the WDP intervention can be really useful. This means yet more extra code added to the processes.

Conclusions

Designing an effective and reliable watchdog for embedded systems is a complex task, and it often implies additional code in the other processes. But believe me, it's worth the hassle.

When Unit Tests Fail

This week, my colleague +Giancarlo B. showed me this short function.
char *unescape(char *in)
{
        char *tmp;
        int i, x;
        char b[5];
 
        tmp = calloc(strlen(in), sizeof(char));
        x = 0;
        for(i = 0; i < strlen(in); i++) {
                if(in[i] == '+')
                        tmp[x++] = 32;
                else if(in[i] == '%') {
                        memset(b, 0, 5);
                        strncpy(b, &in[i + 1], 2);
                        tmp[x++] = (char) strtol(b, NULL, 16);
                        i += 2;
                } else {
                        tmp[x++] = in[i];
                }
        }
        tmp[x] = 0;
        return tmp;
}
Its purpose is to convert a string such as "this+is+a%20space" into "this is a space". The function works pretty well, provided that the input string contains at least one "%xx" sequence. If not, the allocated output string is one character too short. To fix this, it's sufficient to replace
        tmp = calloc(strlen(in), sizeof(char));
with
        tmp = calloc(strlen(in) + 1, sizeof(char));
The thing that makes this bug special is that it can only be found by looking at the code. Let's see why.

The Speed Of calloc

Image by Martin Maciaszek https://www.flickr.com/photos/fastjack/
The first thing to understand is how calloc works. I learned this a couple of years ago while searching for the speed differences compared to malloc + memset. Basically, calloc returns a pointer to a memory area that belongs to an already zeroed page, so there is no need to clear it, saving time. At this link there is an extended explanation.

This means that the next byte after the memory returned by calloc is (almost) guaranteed to be zero. Or, in other words, that the string is null-terminated. But unfortunately, this is true only until another calloc is called.

This second call is likely to return a pointer to that first unallocated byte, which can later be changed into something nonzero, generating unexpected behaviors.

False Negative

Now you should understand why unit tests can fail here. Suppose you have this code:
int do_test()
{
        int err = 0;       /* 0 = no errors */
        char *s1 = "this+is+a%20space";
        char *s2 = "thisisnotaspace";
        char *t1 = "this is a space";
        char *t2 = "thisisnotaspace";

        char *r1 = unescape(s1);
        if (strcmp(t1, r1) != 0)
                err = 1;   /* error on the first test */

        char *r2 = unescape(s2);
        if (strcmp(t2, r2) != 0)
                err = 2;   /* error on the second test */

        free(r1);
        free(r2);
        return err;
}
I expect this function to always return 0, meaning no errors. But obviously this is wrong. And even if I know a couple of ways to modify the test code to catch this particular error, there's no guarantee that enabling or disabling some compiler flags preserves the same behavior.
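One way to catch this class of bug without reading the code is a runtime checker. For instance, building with AddressSanitizer (available in recent GCC and Clang) makes the one-byte heap overflow fatal instead of silent. The tiny program below is my own reduced reproduction of the same off-by-one allocation, not the original test suite:

```shell
cat > overflow.c <<'EOF'
#include <stdlib.h>
#include <string.h>
int main(void)
{
        char *s = calloc(strlen("abc"), 1);  /* one byte too short */
        strcpy(s, "abc");                    /* writes 4 bytes: overflow */
        free(s);
        return 0;
}
EOF

# AddressSanitizer reports the heap overflow at runtime, even though
# a plain build would probably appear to run fine.
gcc -Wall -g -fsanitize=address -o overflow overflow.c
./overflow || echo "overflow detected"
```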

Conclusions

I really believe that unit tests are useful for finding a wide range of bugs but, in some situations, a developer's experienced eyes are indispensable.

Should We Forget IT Security?

In the last months, a high number of security flaws have been reported: starting with bugs in the management of the SSL/TLS protocol (Apple and GNU), continuing with Heartbleed, Shellshock and BadUSB, up to POODLE just a few days ago.

So now you may ask: what the hell is happening here? Why so many threats in such a short time? Are today's developers less skilled than their predecessors?

Well, I simply think we are looking at the wrong side. Many of the above bugs have been there for years and nobody reported them (even if I suspect they have been used by someone for illegal purposes).

But lately there has been a great demand for security. The reason is simple: the amount of money that flows through the internet every day. This is why the attention to security has grown so dramatically, and with it the number of bugs being found.

So, what should we expect in the next months? In my opinion, even more security flaws will be disclosed. And this is a really good thing.

Insanity And 4 Other Bad Things

Dilbert by Scott Adams
The definition of insanity is doing the same thing over and over and expecting different results.
Some say this sentence was first pronounced by Benjamin Franklin; others attribute it to Mark Twain or Albert Einstein. They are all wrong. But the caliber of the people to whom this quote is ascribed should tell you something about its correctness.

There is also an ancient Latin maxim (attributed to Seneca) that states a similar concept:
Errare humanum est, perseverare autem diabolicum et tertia non datur.

To err is human; to persist [in committing such errors] is of the devil, and the third possibility is not given.

[Thanks to Wikipedia]
With these premises, I have to conclude that the Devil is causing a lot of insanity in the world nowadays. Take this as a general discourse, but it seems to me that many people keep doing the same things in the same old way, facing the same problems and delays every time, without understanding that things could go really better by changing just a few things in their way of acting.

Excluding supernatural interventions, in my experience, this kind of behavior is mainly due to four reasons.

1. (Bad) Laziness

Not the kind that makes you find the fastest solution to a problem. This laziness is absolutely harmful; it's the concept of the comfort zone amplified to the maximum. "I don't wanna change!" and "I don't wanna learn anything new!" are his/her mantras.

Every change in procedures is considered a total waste of time, and a new development environment is simply useless. If you have a couple of people of this kind in your team, you can be sure that every innovation will be hampered.

To overcome this behavior, you can try proposing a total revolution in order to obtain a small change.

2. Arrogance

"I'm sure I've made the right choice!", no matter if that decision was made years ago and the world has changed since. By the way, the initial choice may have been wrong from the beginning, but nothing can make him/her change his/her mind. Probably this has something to do with self-esteem.

It's quite impossible to work with this kind of developer, since they will never admit their faults and will try to put the blame on others.

Sometimes a good strategy may be to suggest things as if they had been proposed by the arrogant person himself.

3. Ignorance

There's nothing bad in not knowing something. The problem arises when he/she doesn't care about this nescience (see point 1), doesn't want to admit it (see point 2) or doesn't trust others' suggestions.

This last point may seem a little strange: if I don't know something, I have to trust someone more informed or skilled than me, right? Unfortunately, it doesn't work this way. If you need a demonstration, search for "chemtrails" on Google.

I don't have a suggestion on how to minimize the impact of these guys on your team. Maybe some training can be useful, but the risk is that they won't trust the teacher.

4. Indifference

This is the worst, especially in a manager. He/she doesn't care about the feelings of his/her subordinates. "There is no need for them to be happy doing their job" and "It's not a problem if they spend more time than needed on trivial tasks that could be automated" are his/her thoughts when someone complains.

I don't know if there is some sadism in this behavior, but it's quite frustrating. And it's very bad for the team and for the whole company.

Conclusions

During my life, I've had the "opportunity" to work with people belonging to one or more of the above categories, and I can assure you that the last is the worst. You simply cannot team up with someone who doesn't care about you.

Suggested complementary read: Is Better Possible? by Seth Godin.

Horror Code - Why?

while (x >= 0) {
        x--;
        y--;
}
I've only a question: why?

ShellShock: Impact On Average People

In the previous post, I wrote about the ShellShock vulnerability in a general way. Now I want to talk about how it can impact average internet users.

So the question is: what can you do to protect yourself when surfing the web? The same good old things.

Check Your Router

As said in the previous post, there is a remote possibility that your router (if you have one) is vulnerable. To understand if you are at risk, the best thing to do is to take a look at the manufacturer's website. If you are lucky enough, a patch is already available. In any case, you should try before you trust.

Offline tests:

Online tests (not recommended - it's not a good thing to let someone know that your router can be attacked):

Use An Updated Browser

Since the ShellShock vulnerability can be used to inject malicious code into trusted websites, this will probably result in several attempts to take advantage of old and new known browser breaches. If you keep your browser always up to date, you'll be less vulnerable. Avoiding Internet Explorer is a good idea too.

Something should also be said about two products that usually act as browser plugins: Java and Flash. There are plenty of exploits based on vulnerabilities in these two products, so it's better to disable them by default and allow their execution only when they are really needed.

Use An Updated OS

I know that you feel comfortable with Windows XP, but you should know that Microsoft is no longer providing security patches for it. This means that any newly discovered vulnerability will never be fixed.

[If you feel comfortable with Windows Vista, please contact a doctor <grin />]

Use An Updated Antivirus

Nowadays AVs are smart enough to detect a wide range of malicious web attacks, even unknown ones, thanks to their heuristic algorithms.

There are plenty of good free and non-free antiviruses out there: pick one and install it. An average AV is better than no AV.

This suggestion is mainly for Windows and Android users, but Mac addicts should worry too.

Conclusions

As you can see, all the above suggestions give you more or less the same hint: keep everything up to date. This is because security is a process. This means that nothing can be considered truly attack-proof unless it is turned off with the cable (or the battery) unplugged.

ShellShock, What I've Understood

Disclaimer: I'm not a security specialist; if you are running a webserver, please consider asking a qualified technician.

Unless you spent last week on Mars, you have probably heard about the new security issue named ShellShock (CVE-2014-6271), claimed to be even more dangerous than Heartbleed.

In a few words, it's a vulnerability in the Bourne Again Shell (bash) on all Unix-derived systems (Linux, BSD and MacOS) [and maybe Windows too] that can be triggered via a remote call to a web server that uses CGIs. The following is a simple method to find out if your system is vulnerable (taken from here):

env X="() { :;} ; echo busted" /bin/sh -c "echo completed"

env X="() { :;} ; echo busted" `which bash` -c "echo completed"

Run these two commands from a shell; if you read "busted", your system is vulnerable.

Obviously, the preferred targets for attacks based on this vulnerability are the webservers directly connected to the internet. And this doesn't mean only those that run large companies' websites, but also embedded devices, such as routers and IP cams.

Embedded Systems: Not So Bad

Due to the limited power of the hardware, embedded systems usually don't ship the standard bash binary but use the shell functions provided by BusyBox, which is not vulnerable.

This is true for many embedded devices but not for all; for this reason, if you own one of those, I suggest you verify with the manufacturer, or by yourself if you are skilled enough.

Webservers: Pretty Bad

I'll assume that your system is correctly configured: since this vulnerability does not involve privilege escalation, all the commands executed by the exploiter will run with the same privileges as the webserver application and its CGIs. Of course, if Apache is running with root permissions, you are in big trouble even if your system is not vulnerable, but this is a different story.

That said, this vulnerability can be exploited in five ways.
  1. Making the server slow or unavailable, for example running busy loops or filling the memory with an infinite series of forks.
  2. Stealing your data (restricted pages on the website and databases).
  3. Deleting all data accessible from webserver and CGIs (webpages and databases): you can mitigate this risk with frequent backups.
  4. Being used for a DoS attack against other sites.
  5. Injecting malicious scripts or redirecting the visitors to malicious sites. This is the worst scenario: with sed or awk it is quite simple to change every HTML page (but also JavaScript files or Python CGIs) to inoculate code that can take advantage of some browser vulnerability - or simply deface your homepage.
To add some more fear, some of the above can be combined to increase the damage.

Conclusions

ShellShock is a vulnerability that should not be underestimated, but it's probably not as bad as has been reported. Many embedded systems, which are the most likely to never receive an update, are not affected. And nowadays many technologies other than CGI are in use.

Now, if you are wondering whether there is something average people can do to mitigate the risk, you'll find the answer in this post.

0 Errors - 0 Warnings

Are you one of those who don't care about warnings when compiling? Well, if you write conditions like the following, you probably are:

if (2 == foo)

At first, it seems reasonable: if you forget one "=" by mistake, the compiler will stop the build with an error, making it obvious that you did something wrong. If the condition were written the other way around, the compiler would only show you a warning message. But this approach has a big pitfall: a false sense of security.

If you think that whenever your code compiles without errors it is OK, and that warnings are just annoying messages that can be safely ignored, you are probably missing the following problem:

if (bar = foo)

Every decent compiler will warn you that, if an assignment is what you really want to do, it's better to surround it with an extra pair of parentheses. But if you ignore warnings, you'll get an unexpected (for you) behavior.

In my opinion, compiler warnings are even more important than errors: an error is something illegal in the form of a program, while a warning tells you that something looks strange in the logic of your code.

Many warnings are similar to a question: "are you sure you want to do that?" Or better: "doing that, you'll get this result: is it what you want?" In my experience, many times the answer was "No!"

Bottom line: if you want to be sure that your code won't compile if there are warnings, the -Werror flag of gcc is what you need.

C++ And goto Don't Match Together

As you can see by reading this blog, I'm a fan of goto when used appropriately. Unfortunately this works for C only - not C++.

The issue is all in this short paragraph (6.7/3) of the C++ standard:
It is possible to transfer into a block, but not in a way that bypasses declarations with initialization. A program that jumps from a point where a local variable with automatic storage duration is not in scope to a point where it is in scope is ill-formed unless the variable has POD type (3.9) and is declared without an initializer (8.5).

This means that your C++ compiler will refuse to build many C sources where you have used gotos to bypass initializations.

The solution that maintains the same functionality and the same structure of your code is to use exceptions. I'm sure you are thinking "why can't I simply use some nested ifs?" My answer is "readability above all!"

An Example

Let's see a quick example, as I would write it in C.
int foo()
{
        int err = 0;
        X *x = NULL;
        Y *y = NULL;

        x = x_factory();
        if (x == NULL) {
                err = 1;
                goto clean_exit;
        }

        y = y_factory(x);
        if (y == NULL) {
                err = 2;
                goto clean_exit;
        }

        while ( /* condition */ ) {
                while ( /* other condition */ ) {
                        /* some code */

                        if ( /* critical error */ ) {
                                err = 3;
                                goto clean_exit;
                        }
                }
        }

        /* some other code */

clean_exit:
        if (x != NULL)
                x_free(x);
        if (y != NULL)
                y_free(y);

        return err;
}
The same code won't work in C++ if there is some initialization inside or after the loops. So, let's see how you might be tempted to rewrite it.
int foo()
{
        int err = 0;
        X *x = NULL;
        Y *y = NULL;

        x = x_factory();
        if (x != NULL) {
                y = y_factory(x);
                if (y != NULL){
                        while ( /* condition */ ) {
                                while ( /* other condition */ ) {
                                        /* some code */

                                        if ( /* critical error */ ) {
                                                err = 3;
                                                break;
                                        }
                                }

                                if (err != 0)
                                        break;
                        }

                        if (err == 0) {
                                /* some other code */
                        }
                } else {
                        err = 2;
                }
        } else {
                err = 1;
        }

        if (x != NULL)
                x_free(x);
        if (y != NULL)
                y_free(y);

        return err;
}
Too many elses and too much indentation, in my opinion. Below is my version, using one of the most powerful constructs of C++: exceptions.
int foo()
{
        int err = 0;
        X *x = NULL;
        Y *y = NULL;

        try {
                x = x_factory();
                if (x == NULL)
                        throw(1);

                y = y_factory(x);
                if (y == NULL)
                        throw(2);

                while ( /* condition */ ) {
                        while ( /* other condition */ ) {
                                /* some code */

                                if ( /* critical error */ )
                                        throw (3);
                        }
                }

                /* some other code */
        } catch (int exception_code) {
                err = exception_code;
        }

        if (x != NULL)
                x_free(x);
        if (y != NULL)
                y_free(y);

        return err;
}
This is my opinion; what's yours?

All You Need To Know About Software Development


While I was reading this (long) article, I felt all the different pieces of the puzzle in my head fall into place.

This is the most complete and correct description of software development best and worst practices I've ever read. Michael Dubakov covers every single aspect and analyzes each factor that can impact the speed of a software project.

I have only one small note to add about refactoring: I'm not sure it is a non-value-added activity. Generally speaking it may be so, but often, after a refactoring, I've found my code runs faster and/or has a smaller memory footprint.

That said, it's definitely a great article. Take your time to read and re-read it.


You Are Not A Programmer


So you write code every day, maybe in a nerdy language like C or even in assembly. And a company is paying you for this job. When someone asks you "what do you do?", it's normal for you to reply "I'm a programmer", isn't it?

Well, let's see if you are a liar. This is a simple yes/no questionnaire about what you have done in the last two years.

The Real Programmer Test

  1. Have you studied a new programming language?

  2. Have you used a new technology?

  3. Have you spent some time to optimize your code?

  4. Have you programmed for your pleasure out of the working hours?

  5. Have you eaten at least 50 pizzas?

  6. Have you drunk at least 3 coffees every day?

  7. At least once did you choose to not use your favorite programming language because you thought it was not the best choice for a project?

  8. Have there been more happy days than sad days when doing your job?

If you replied "yes" to more than half of the above questions, congratulations: you are a real programmer!

Explanation of the Test

If you are not a real programmer, maybe you cannot understand where the above questions come from, so here are some hints.

  • A programmer is curious by nature: he likes to learn new languages and technologies, even if they are not required by his job (questions 1 and 2).

  • A programmer knows that all code needs some refactoring at some point (question 3).

  • A programmer is happy when he can write code (questions 4 and 8).

  • A programmer is realistic: he knows that one-size-fits-all doesn't exist in computer science; in other words, for some purposes a language/technology can be better than another (question 7).

  • A programmer needs to have his brain constantly fed with carbohydrates (pizza) and sometimes powered by caffeine (questions 5 and 6).

Having said that, you may argue that many of these characteristics are innate. Well, you are right! Many people write code because they think it's just like any other job, but they are wrong. Programming needs passion, devotion and the right way of thinking. And above all (as I read in a pizzeria):

If it were an easy job, everyone would be able to do it

Image by icudeabeach

Authors In The Open Source World

Last week, Seth Godin wrote another great post. This time the topic is the difference between companies and authors. No company would endorse a competitor, while writers often recommend books written by others.

The implicit message is that culture is not a product.

Open source logo
Image by Andrew
For FOSS developers it works almost the same way. If someone creates a good piece of software, his project will not only be praised but also improved by other developers. And the good part is that they share their work for free.

For these reasons I think that, with the following sentence, Seth is describing a wider situation than he intended.

It's not a zero-sum game. It's an infinite game, one where we each seek to help ideas spread and lives change.

Back To The Past

In three weeks the new Windows 9 (or whatever its name will be) will be unveiled. There are some rumors about its features, but the most recurrent ones concern the reintroduction of the Start button.

A feature introduced by Microsoft almost 20 years ago (with Windows 95) is the most awaited in 2014. Pretty curious, isn't it?

You can say this is due to the laziness of users and the cognitive overhead that a brand new UI implies. That is part of the truth, in my opinion. But there's also something else.

The Desktop Is The Key

When Windows 8 was released, millions of users around the world asked "where is the desktop?" and, only after finding it, "where is the Start button?" Having more than one window open and quickly switching from one to another has been one of the most successful features of Windows since release 3.0.

But at some point, Microsoft decided to switch to a more modern interface (in fact its name is Modern UI) designed mainly for tablets and smartphones. As Jakob Nielsen pointed out, it is not so bad on a small touch screen. But on a desktop wide screen it simply sucks.

So the desktop has been kept (also because dropping support for millions of existing applications would have been commercial suicide) but relegated to a small tile on the new Start screen.

The intention was clear: a future without application windows and with a single interface for every device. The problem for MS is that, to date, Windows phones have a very small market share compared with Android and iOS, so the reference market for Windows is still the PC world.

This is probably the reason that pushed Microsoft to reintroduce the Start button. To the happiness of millions of guys (like me) who won't have to explain a completely new UI every time their parents/friends/relatives buy a new PC.

Reliability First - Applications

What does reliability mean in computer science? Speaking about an application, how can we say it is reliable? I don't know if there is a shared opinion, but mine matured after a scary situation.

Some years ago, at my previous workplace, we created a huge file with a very powerful and even more expensive third-party software. But a few seconds after pressing the save button, the software crashed. Panic. We searched for the saved file and we found it. Don't panic. So we restarted the powerful-and-expensive-third-party-software to reopen the file, but it failed. We tried several times, even on other PCs, without success. The (binary and proprietary) file seemed to be corrupted. Okay, panic!


Fortunately we also owned a license for a similar software, much less powerful and much cheaper (about twenty times cheaper). We had nothing to lose, so we tried to open the file with this cheap software and... it worked! All our work was there. So we saved the file under a different name with the cheap software and eventually we were able to open it with the expensive one.

After that incident I have a clear idea of what reliability means for applications. And you?

Image created with GIFYouTube. Scene taken from movie "Airplane II: The Sequel".

Horror Code: the Matryoshka Functions

Matryoshka Dolls
Image by Fanghong and Gnomz007
Some years ago, when I was a Windows developer, the maintenance of a big project was assigned to me. My job was to fix a couple of minor bugs and add some new functions. The problem was that the creator (and previous maintainer) had resigned years before, leaving absolutely no documentation.

The only thing I could do was look at the code and read the few comments. At some point, I found a call to the function GetName(). Changing the name returned by that function was one of the purposes of my job, so I looked for the implementation. And this is what I found:

CString GetName()
  {
    return GetValue("Name");
  }

Pretty useless, don't you think? OK, let's see the implementation of GetValue():

#define MAIN_APP   "Main"

CString GetValue(CString szField)
  {
    return GetField(MAIN_APP, szField);
  }

Is this a joke? Well, let's see where the story ended:

#define MAIN_INI   "main.ini"

CString GetField(CString szSection, CString szField)
  {
    char szResult[100];

    GetPrivateProfileString(
        (LPCTSTR) szSection,
        (LPCTSTR) szField,
        "App",
        szResult,
        100,
        MAIN_INI
    );

    return szResult;
  }

Surprised? Now I'm sure you are wondering why someone would do this. I'm still wondering too.

Write and Rewrite (and Make it Better)


I'm not comparing myself to Hemingway, but, when I write a new piece of software, for me it works the same. I usually write code in a quick-and-dirty way just to make things work. I have to follow my stream of consciousness and put down the basis of the algorithm.

Then, when something starts to work, I begin to make it look better. This means that I replace variables like goofy, pluto, etc. with more descriptive and meaningful names. I check whether I can move some piece of code into a function, or whether there is a better-performing algorithm to use. In the end, I take care of errors, corner cases and memory leaks.

The final result is pretty different from the original code, but it's surely (?) better written, more readable and faster. You may argue that I could directly write a better version and limit the changes to small cosmetic things, saving time.

My answer is that it's not so easy. For example, managing error codes and corner cases, or finding meaningful names for variables and functions, takes time and, if the stream stops, it will probably take longer to get the work done.

What about you? Do you ever follow your stream of consciousness? Or do you prefer to immediately take care of all the details?


Image created with Pinwords - Picture of Ernest Hemingway taken from here (public domain)

Speed Up Your Searches

Usually, in Computer Science classes, sorting algorithms are studied in depth. The reason is quite simple: they are widely used and can be applied to several contexts. Moreover they represent a good way to introduce the concept of computational complexity.

In the real world, one of the reasons to sort items is searching. On a large data set, finding an item by scanning every element to see if it matches can be a very slow operation. If the array is sorted by a key, we can perform a binary search, lowering the worst-case complexity from O(N) to O(log N). This is a good improvement on large data sets, isn't it? Now, what if I said I can lower the computational complexity to O(1)?

No Magic, Just Some Math

A small phone book as a hash table (image by Jorge Stolfi)
The trick is to find a way (or an algorithm, if you prefer) to convert the key (which can be a number, a string or whatever) into an index that directly refers to the desired information. This "way" is called a hash function, and the data structure a hash table.

There are hundreds of articles about how hash tables work under the hood and at least as many implementations. In many high-level languages hash tables are available even as built-in types, often called associative arrays, dictionaries or maps.

Hash tables are widely used when the data set is huge and a fast response is needed. For example, in some RDBMSs, indexes are implemented with hash tables. In addition, unlike arrays, hash tables don't force you to store objects of the same size, so the space used (in memory or on disk) is optimized, and this also affects the average speed of a search.

All That Glitters Is Not Gold

Unfortunately there are some drawbacks with hash tables; the two biggest are both related to the hash function: speed and collisions. The speed issue concerns the execution time of the hash function itself, and it's critical when the hash table doesn't contain much data, so that the cost of a linear search in the average case may be comparable with the execution of the hash function.

Collisions are a more subtle problem: the hash function may produce the same output for different keys. There are various ways to store different data in this case, but they all take some time, reducing the performance of the hash table.

The good news is that many implementations let you choose to use the built-in hash function or provide your own, so, if the speed is a critical requirement of your project, you can create a hash function that better fits your needs.

Conclusions

Hash tables are a very powerful and flexible data structure that can increase the speed of your searches on a huge set of data.

If you are wondering which hash table implementation I use in C, the answer is GLib Hash Table.


Ideas Are Not Enough

Image by Pictofigo
Today, +Seth Godin in his daily post spoke about something that is essential for me: the importance of going from ideas to implementation.

A great architect isn't one who draws good plans. A great architect gets great buildings built.

You don't know how many times I've heard people (including myself) complaining about a new product presented by a competitor saying "I've had the same idea years ago".

So are you trying to say you are smarter? Probably not smart enough to get things done. Or maybe did you think your idea would be turned into a project by your subordinates?

Stop fooling yourself! If you want your dreams to come true, stop sleeping and start working!

I've already written about this concept in Avoid Perfection and briefly in Everyone Matters.

[Linux] How to Define a Path for Shared Objects

In Linux, the predefined paths for shared objects (.so) are /lib/ and /usr/lib/. For normal usage this is OK, but sometimes it's necessary to specify additional paths. There are several ways to do this.

The most common way is to define an environment variable in the shell:

$ export LD_LIBRARY_PATH=/path/to/shared/objects

This operation must be done in every new shell/terminal emulator before starting the process that needs the shared objects located there. To avoid this, you can put the export in your .bashrc/.profile file (located in your home folder): this way the variable is set every time you log in.

But sometimes you need a shared object to be loaded by daemons or by processes run by various users. The solution is the file /etc/ld.so.conf. In this file you can add all the paths where shared objects should be searched.

In Debian-based distros (like Ubuntu), this file only includes the following line:

include /etc/ld.so.conf.d/*.conf

This means that you don't have to change this file: it's sufficient to add your paths to a file with the .conf extension located in the directory /etc/ld.so.conf.d/
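For example, assuming your libraries live in /opt/mylibs (a made-up path, adapt it to your case), the whole operation boils down to (as root):

```shell
# Hypothetical path: replace /opt/mylibs with your own directory
echo "/opt/mylibs" > /etc/ld.so.conf.d/mylibs.conf

# Rebuild the loader cache so the new path is picked up
ldconfig
```

Don't forget the final ldconfig: at run time the dynamic loader consults its cache (/etc/ld.so.cache), not the .conf files themselves.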

Pay attention! Adding directories not owned by root can be a security breach because it can lead to the execution of arbitrary code.

Cleaning Up the Path in 5 Easy Moves

Freddie Mercury - "I Want to Break Free" music video
Freddie cleaning his house
The idea for this post came last weekend while I was cleaning my house. In fact, this activity is made up of several parts: some are funny (like using the vacuum cleaner while singing "I Want to Break Free"), some are boring, others are awful.

The development of a big project is very similar: there is the challenging part, the damn-long part and the stupid part. Here is some advice to help you accomplish your job in the best way.

1. Split the project into tasks and subtasks - this is obviously the first thing to do; starting to develop headlong is something to avoid.

2. Look for constraints - it's important to understand which tasks must be done before others and define a clear path between them.

3. Start with the task you consider the worst - it may be the longest, the most boring or the most annoying, the choice is up to you, but once it is done, the rest of the project is all downhill.

4. Work on a single task at a time - you have a road map (defined at point 2), why would you stray from it?

5. Always work on tasks related to those already accomplished - this way, you are always sure that new pieces fit properly into the existing structure, and it's easier to test your progress.

Well, that's all folks. Let me just add one more general-purpose suggestion: always remember the 80/20 rule, which can be restated as "details will cost you the majority of your effort".

[Linux] 2 Alternatives to Standard Terminal Emulators

Nothing to complain about with gnome-terminal: I've used it for years and I can say it does its dirty work very well. But lately, too often I need to have several different shells open and not overlapping, in order to check various things at the same time.

With standard terminal emulators I haven't found a way to automatically arrange the windows on the screen, so I searched. And I found Terminator.

Terminator

Terminator screenshot
Click to enlarge

As you can see from the above image, in a single window you can arrange several terminal emulators (4 in this example) by splitting the window both horizontally and vertically. Moreover, the proportions can be changed and every function can be triggered by a customizable keyboard shortcut.

Other handy functions are:

  • the possibility to type the same text/command in more than one terminal at once;
  • the temporary maximization of one terminal window;
  • tabs management (if you need even more shells).

On the appearance side, you can customize the font, the colors (with built-in themes), the background (color, image or transparent) and the scrollbar aspect.

Guake

Example of Guake overlapped to other windows
Click to enlarge

Sometimes a terminal window is too much for me. I mean, maybe I just need to check that the internet connection is working with a ping, and then forget it. Or know which IP address the DHCP server assigned to my network interface. Or quickly take a look at a man page to answer a colleague.

Opening a normal terminal window for such simple operations and closing it right after may be a solution, but I prefer a dropdown shell that disappears when it's no longer needed, just to reappear a few minutes later when I press a key.

This is exactly the behavior of Guake (whose name recalls the FPS Quake because of the similarity with its console), with the addition that it supports multiple tabs. One thing I like is that it stays the same across multiple workspaces.

Guake is a little less customizable than Terminator, especially on the graphical side, but I hope you don't mind, since it will be hidden most of the time.

Conclusions

Even if desktop managers have greatly evolved in recent years, on Linux systems the fastest way to do several things is still the terminal emulator. And with these two, my productivity has been boosted ;-)

Battles and Management

Lev Tolstoy in a rare color photo
Have you ever read War and Peace by Lev Tolstoy? It's a great book with many stories wrapped up with History. It is set in Russia during the Napoleonic era and, of course, it also tells about battles.

There is one thing in Tolstoy's point of view that got me thinking. More or less, this is the author's reasoning: most of the time, battles are not decided by generals or strategists but by single episodes of bravery or cowardice in the troops.

Of course there is some truth in this but, as usual, life is slightly more complicated. A good commander should be able to understand whether his soldiers are motivated, and what their strengths and weaknesses are. And a good commander always has an ace in the hole.

Victory and Fortune

Many great commanders in the past were undoubtedly lucky, and luck is often considered an essential attribute of a good strategist; I'm sure you have already heard that fortune favors the bold. But a commander who bets everything on his luck is doomed to lose.

This is why really great commanders always keep some troops in reserve for difficult moments (for example, Napoleon had the Old Guard), and they know when to engage in battle and when to withdraw their troops.

For a manager it works the same way. If he plans the team's time at one hundred percent, there will be no room for contingencies. If he accepts every job proposed by the salesmen, the risk is to deliver a poor job or to miss deadlines. He can put his subordinates under pressure for some weeks, but not forever.

Sometimes managers have to take risks, and they can be lucky, but fortune doesn't last forever.