The 80-20 Rule: Pareto And The Devil

Probably only the number of web pages with pictures of cats is greater than the number of those discussing the Pareto principle. Nevertheless, I want to add my own, because I think the idea is not so clear to all my colleagues.

A rough definition of this law might be:
To accomplish the final 20% of a job, you'll need 80% of the total time.
If you are close to the deadline of a big project, this sentence sounds quite depressing, doesn't it?


But if you have some experience, you know that this principle holds remarkably well. How much time have you spent moving UI objects one pixel at a time until the Project Manager was satisfied? And what about colors? And that damn string that doesn't fit on the printed page? And the trailing comma to be removed from JSON arrays? And that strange bug that only happens on a few PCs?

All these things are important - but not fundamental. They don't represent the core of the application, just some details. In fact, another way to express nearly the same concept is:
The Devil hides in the details.
In my opinion, it all comes down to the difference between a proof of concept and a real application. To be delivered to a customer, a piece of software must be:
  • effective - it must do its job
  • efficient - the job must be done in the best way possible (according to time constraints and remembering that perfection is impossible to achieve)
  • reliable - it must handle failures properly and always preserve the user's data
  • usable - the user must find it natural to use (*)
People who don't have experience in coding "real" applications usually underestimate the benefits of usability, reliability and sometimes efficiency. Without these characteristics, your software will be nothing more than a proof of concept. And you are not creating a proof of concept, are you?


(*) = I know this is not the most complete definition of usability, but I think it gives an idea of what the final intent of any user interface should be.

Bitwise Right Shift With Signed Int

It's never too late to learn something new. And sometimes the compiler is smarter than you think. I discovered this behavior a couple of weeks ago while working on a C function that manipulates the bits of an unsigned integer variable using the right-shift operator (>>).

The function worked pretty well, until someone wanted to use it with signed integers, too. The issue is that, with an unsigned operand, the sign bit (the leftmost one) is replaced with zero when right-shifting, so a negative value turns into a positive one. The solution was absurdly easy: cast the operand to signed int, so that the compiler preserves the sign bit.

This is an example with 8 bits:
[Figure: bitwise right shift with unsigned and signed int. The right shift produces different results depending on the signedness of the variable.]
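
The same 8-bit example can be sketched in C. The helper names below are hypothetical, and the signed result mentioned afterwards assumes a compiler that preserves the sign bit when shifting, which is what the cast relies on.

```c
#include <stdint.h>

/* Hypothetical helpers, for illustration only. */

/* Unsigned operand: the vacated high bits are always filled with zeros. */
uint8_t shift_right_u8(uint8_t value, unsigned int amount)
{
    return (uint8_t)(value >> amount);
}

/* Signed operand: many compilers replicate the sign bit for negative
 * values, but the C standard leaves this choice to the implementation. */
int8_t shift_right_s8(int8_t value, unsigned int amount)
{
    return (int8_t)(value >> amount);
}
```

With the bit pattern 1111 0000, shift_right_u8(0xF0, 2) gives 0011 1100 (0x3C), while shift_right_s8(-16, 2) gives 1111 1100 (-4) on compilers that preserve the sign.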

Too Good To Be True?

This is wonderful, but there is one small thing to consider before you rely on it. According to the C standard:
1196 If E1 has a signed type and a negative value, the resulting value is implementation-defined.
This means that every compiler can behave in its own way and, worse still, the same compiler can act differently on different architectures. So, what's the solution?

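There isn't just one, of course. The following function should work fine in all cases, but I'm sure you can find at least one other algorithm that does the same thing.
int right_shift_with_sign(int var, unsigned int amount)
{
        /* For negative input, shifting -1 reveals the compiler's behavior:
         * mask stays -1 if the sign is preserved, otherwise its top bits are 0. */
        int mask = (var < 0) ? -1 >> amount : -1;
        int result = var >> amount;
        /* ~mask has exactly the cleared top bits set, so the OR restores the sign. */
        return (result | ~mask);
}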
[Code not fully tested - use it at your own risk]
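
As one possible alternative, here is a minimal sketch that never shifts a negative value at all, so it does not depend on the implementation-defined case. The function name is hypothetical, and the sketch assumes that amount is smaller than the number of bits in an int.

```c
/* Hypothetical alternative: sign-preserving right shift that only ever
 * shifts non-negative values. Assumes amount < width of int in bits. */
int right_shift_floor(int var, unsigned int amount)
{
    if (var >= 0) {
        return var >> amount;   /* well defined for non-negative values */
    }
    /* For negative var, (-1 - var) lies in [0, INT_MAX], so shifting it
     * is well defined; mapping back with -1 - x restores the sign. */
    return -1 - ((-1 - var) >> amount);
}
```

For example, right_shift_floor(-16, 2) returns -4 and right_shift_floor(-1, 5) returns -1, matching what a sign-preserving shift would produce.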

If you need a way to detect how your compiler behaves, you can use this short snippet:
if ((-1 >> 2) == -1) {
        /* right shift preserves the sign */
} else {
        /* right shift DOES NOT preserve the sign */
}

How C Compilers Work Part 3: Compiler

The compiler proper is a really complicated piece of software. It performs many different tasks on single files. This means that even if your project is made of thousands of source files, they will be compiled one by one.

Now let's see the common operations. First of all, it performs lexical and syntax analysis of the code. Many of the errors about missing semicolons and misspelled variables are caught here. For functions there is a small caveat. According to the original C standard (C89), if a function has not been declared, the compiler assumes that:
  • it exists in some other file of your project, linked library or common library (such as libc), and
  • it returns an integer value.
This will lead to warnings like this:
warning: implicit declaration of function ‘fprintf’ [-Wimplicit-function-declaration]
        fprintf(stderr, "Error! Expected %d param\n", ARGS);
or this:
warning: initialization makes pointer from integer without a cast
        char *token = strtok(my_string, "/");
To remove them, you just have to add the declarations of the functions you are using, by including the right header files.
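
As a minimal sketch of the fix: fprintf is declared in <stdio.h> and strtok in <string.h>, so including the right header makes the warning go away. The helper below is hypothetical and only illustrates the strtok case.

```c
#include <stddef.h>
#include <string.h>  /* declares strtok, so its char * return type is known */

/* Hypothetical helper: counts the tokens in text separated by any
 * character in delims. Note that strtok modifies text in place. */
size_t count_tokens(char *text, const char *delims)
{
    size_t count = 0;
    char *token = strtok(text, delims);

    while (token != NULL) {
        count++;
        token = strtok(NULL, delims);
    }
    return count;
}
```

Without the #include <string.h> line, the compiler would have to guess that strtok returns an int, which is what triggers the "makes pointer from integer" warning.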

Other things that the compiler checks:
  • types of variables in assignments;
  • the type and number of arguments in function calls, which must match the corresponding prototypes (usually declared in header files included by the preprocessor);
  • const variables and parameters have to remain untouched.
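
These checks can be illustrated with a small sketch. The function is hypothetical, and the offending lines are kept as comments so that the block still compiles.

```c
#include <stddef.h>

/* Hypothetical function: the const qualifier promises that the
 * characters pointed to by text are not modified here. */
size_t count_char(const char *text, char wanted)
{
    size_t count = 0;

    for (; *text != '\0'; text++) {
        if (*text == wanted) {
            count++;
        }
        /* *text = ' ';   rejected: assignment to a read-only location */
    }
    return count;
}

/* Calls that a compiler would diagnose, given the definition above:
 *   count_char("a,b,c");      too few arguments
 *   count_char(42, ',');      integer passed where a pointer is expected
 */
```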
Another important task is optimization. It can improve your code mainly in two ways: increasing the speed or reducing the size of the final output. There are many things that can be done to achieve one of these goals. GCC, for example, has lots of options for fine tuning but, in the end, the most used are -O2 (to optimize for speed) and -Os (to optimize for size).

Finally, the compiler produces a binary file (often called an object file) full of references to unknown things. Making those things known is the job of the linker.

YouTube And The 2 Billion Views Video

Last week, Google stated that the video "Gangnam Style" by Psy has passed 2 billion views, or 2,147,483,647 (that is, 2^31 - 1) to be precise. This is the highest number that can be represented with a 32-bit signed integer variable. Once that value is reached, adding one typically wraps around to a negative number. As you can imagine, this is a problem for a counter. But now let's take a step back to see the whole picture.
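
The wraparound can be sketched in C. Incrementing a signed int past its maximum with a plain counter + 1 is undefined behavior in C, so this hypothetical helper does the addition on the unsigned type and converts back; the negative result assumes a common two's-complement compiler.

```c
#include <stdint.h>

/* Hypothetical helper: increments a 32-bit signed counter the way a
 * wrapping hardware register would. The addition is done on the unsigned
 * type, where wraparound is well defined; converting the result back to
 * int32_t is implementation-defined, and common compilers keep the same
 * bit pattern. */
int32_t increment_wrapping(int32_t counter)
{
    return (int32_t)((uint32_t)counter + 1u);
}
```

On such compilers, increment_wrapping(INT32_MAX) returns INT32_MIN, that is, -2,147,483,648.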

First Things First: Who Is Psy?

Why should a South Korean artist who sings in his native language be so popular worldwide? Well, the song is really catchy and the video is funny, you may say. These two reasons can explain the success after the video became quite popular. But the starting point had nothing to do with music.

It all began when someone at 4Chan noticed that Psy looked a lot like Kim Jong Un, the North Korean dictator. In addition, Psy was the perfect nemesis of Justin Bieber, a Canadian singer quite hated by the 4Chan guys. (4Chan is an anonymous community where all kinds of ideas [mostly not safe for work] are spread daily. Just as an example, many of the internet memes that you see have an ancestor on 4Chan. And Psy is just one of those memes.) You can find the whole story here (in Italian).

The Tech Side

According to TechTimes, Google engineers noticed some weeks ago that the counter was close to the limit, so they could fix it by simply doubling the size of the variable. In this way, it will take some centuries to reach the new limit of 9,223,372,036,854,775,807 (please note that this is the maximum value that can be represented with a 64-bit signed integer).

Now you may ask: why use a signed integer at all? An unsigned counter can count roughly twice as high as a signed variable of the same size. Switching to unsigned would have been quicker to implement: since the size remains the same, the database structure would not need to change, and only the front end would be affected.

My assumption is that negative values are used internally for some other purpose, so changing this behavior would probably have had a bigger impact than doubling the size of a field in the DB.
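
To put the three options side by side, here is a short sketch using the limits from <stdint.h>. The constant and function names are hypothetical; the values are the standard limits of each type.

```c
#include <stdint.h>

/* Largest value a view counter could hold for each candidate type. */
const int32_t  max_signed_32   = INT32_MAX;   /* 2,147,483,647 */
const uint32_t max_unsigned_32 = UINT32_MAX;  /* 4,294,967,295 */
const int64_t  max_signed_64   = INT64_MAX;   /* 9,223,372,036,854,775,807 */

/* Hypothetical helper: how many more views fit in an unsigned
 * 32-bit counter before it reaches its maximum. */
uint32_t remaining_views_u32(uint32_t current)
{
    return UINT32_MAX - current;
}
```

An unsigned 32-bit counter sitting at 2,147,483,647 would still have room for 2,147,483,648 more views, which is the "double" mentioned above.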

It's Marketing, After All

From a technical point of view, the whole thing could have passed without much fuss. But the number is one that can be considered unreachable for any other video sharing platform. In addition, it is a goal that traditional television channels cannot easily reach.

So the fuss around a number is also a show of strength that Google broadcasts worldwide. And the competitors aren't just other online platforms but TV companies, too.

Conclusions

Behind a simple counter there can be several stories. And when we talk about a big company such as Google, computer science and marketing always mix together.

Every Language Sucks

[Image: Languages Yin Yang]
It would be a mistake to consider this page merely the pastime of a few funny developers. You may or may not agree with the criticisms of your favorite programming language (and sometimes I don't), but the truth is that the perfect language does not exist.

There are several reasons for this. The most important is that each language (or better, each language/compiler or language/interpreter pair) has been made for a specific purpose.

For example, JavaScript developers may consider the strong typing of C++ harmful, but it's needed to ensure reliability.

A garbage collector can ease developers' work, but it's generally less efficient than freeing memory by hand as soon as it's no longer needed.

The era in which a language was created also influences its characteristics. The increasing power and memory of the average computer have driven the rise of interpreted languages and made pointers disappear.

In the end, each new language has tried to keep the best parts (or what its creator thought were the best parts) of existing languages and reworked the rest.

The conclusion is that we are surrounded by languages that suck, but we are lucky because we can choose the one we think sucks the least.

Image created with Tagxedo web app.