Demystifying premature optimization

Premature optimization is the root of all evil

This is a phrase every programmer eventually hears… a lot. It was popularized by Donald Knuth. These days, most people go beyond its original meaning and usually say:

  • You should never optimize your code.
  • You should profile your code first, and only if it proves slow enough should you optimize.

These are awkward deformations of that popular phrase, which is itself, ironically, “optimized” (trimmed down). The full phrase goes like this:

We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil. Yet we should not pass up our opportunities in that critical 3%.

It was first written long ago, in 1974, and both the author and the paper were pro-optimization. In those times, you would usually be writing code in assembly.
There are a handful of tricky optimizations that can be done in assembly to make code faster, but assembly (there are several assembly languages, not just one) is not easy for a human to write and maintain; the easy thing is to mess it up.

So, in favor of more readable and maintainable code, in some places you’d better trade performance for readability. Take this classic trick:

mov eax, 0 (put 0 in the eax register)

was usually optimized this way:

xor eax, eax (if you xor a value with itself, the result is always 0)

Because xor eax, eax was shorter to write in assembly (i.e., the instruction occupied less memory), and the CPU already recognizes this instruction as “put 0 in eax”.

Granted, the performance benefit is there, but it is not much, to be honest.

Today we have many visual tools to help us with code development:
wonderful IDEs, profilers, and platforms that let us develop code faster.

Nowadays, it is easier to optimize code and, beforehand, to choose the right tool set and craft a good, optimized architecture & code.

Many programmers rely on that phrase to justify:

  • Their laziness.
  • Their lack of knowledge.

And it is true: optimization takes effort, sometimes more, sometimes less, and it requires a lot of knowledge or investigation. If you do know a technology, you’ll often see whether a piece of code is fast enough for a given task, because there is no magic: you just know what is happening under the hood, and you know whether that code will scale well or could cause problems in the future.

That’s why I did this benchmark in the lovely Node.js & PostgreSQL: to encourage programmers to think about their code, to look up those technologies on Google or their favorite search engine, to get to know them, to read the darn docs of the technologies they plan to use, weigh the pros and cons, check their standards, and figure out how they work.

The better you know a technology, the better you can use it. Yeah, granted, a lot of those things are implementation details and can change from time to time or across versions.

Without further ado, let’s begin.

Imagine a service that should look up the data of ALL the people who meet the following criterion: their age is equal to or greater than 18.
So, in fact, the service should return all people who are >= 18 years old.

Background

We’ll have several implementations of the same table and the same service, and we’ll compare their relative performance using the same dataset. The goal is to make the service more performant.

In this scenario, every table will contain the same data:

  • 1,500,000 entries
  • Approx. 2/3 of them are aged 18 or over.
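
The actual seeding code lives in the repository linked below; here is only a rough, hypothetical sketch of how such a dataset could be generated (assuming a pg-promise-style db handle and a uniform 0–53 age range, so that 36/54 = 2/3 of the rows end up >= 18; run it inside an async function):

// Hypothetical seeding sketch, NOT the repository’s real script.
await db.none( `
  INSERT INTO people_v1 ( first_name , last_name , age )
  SELECT 'First_' || i ,
         'Last_'  || i ,
         floor( random() * 54 )::smallint -- uniform 0..53, so 2/3 are >= 18
  FROM generate_series( 1 , 1500000 ) AS i
` );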

You can find the full code and instructions on how to run this benchmark here on GitHub.

Implementation #1

Anybody could come up with some naive code like this:

The SQL query looks like this:

SELECT * FROM people_v1;

And the JavaScript code:
export default async function people_by_age_v1()
{
  // Load people list from database.
  let people = await SQL.exe.any( Files.select_noob , { table : Tables.v1 });

  // Transform the list.
  people = people
    .sort( ( a , b ) => b.age - a.age ) // Sort by age.
    .map( p => // Map to full name.
    {
      return {
        ...p ,
        full_name : `${p.first_name} ${p.last_name}`
      };
    })
    .filter( p => p.age >= 18 ); // Filter by age.

  // Return the data.
  return people;
};

people_v1 is defined as follows:

CREATE TABLE people_v1
(
  id         BIGINT GENERATED BY DEFAULT AS IDENTITY PRIMARY KEY ,
  first_name TEXT     NOT NULL ,
  last_name  TEXT     NOT NULL ,
  age        SMALLINT NOT NULL
);

According to our benchmark iterations, that code took 4608ms to run.

Can we make it better?

Implementation #2

The SQL query still looks like this:

SELECT * FROM people_v1;

And the JavaScript code:
export default async function people_by_age_v2()
{
  // Load people list from database.
  let people = await SQL.exe.any( Files.select_noob , { table : Tables.v1 });

  // Transform the list.
  people = people
    .filter( p => p.age >= 18 ) // Filter by age.
    .sort( ( a , b ) => b.age - a.age ) // Sort by age.
    .map( p => // Map to full name.
    {
      return {
        ...p ,
        full_name : `${p.first_name} ${p.last_name}`
      };
    });

  // Return the data.
  return people;
};

The benchmark now says it took 3556ms (77% of the previous time) to run.

The improvement was 23% faster.

What did we change?

Enhancements

  1. Do the filter operation before sort & map.

Explanation

  1. If the set is smaller, the transform operations sort & map will take less time; the sketch below shows this in isolation.
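
To see that effect in isolation, here is a minimal, self-contained sketch with synthetic data (this is not the repository’s benchmark harness; the numbers you get will differ):

// Synthetic stand-in for the dataset: 1.5M rows, ages 0..53 (~2/3 are >= 18).
const people = Array.from( { length : 1_500_000 } , ( _ , i ) => ({
  first_name : 'First_' + i ,
  last_name  : 'Last_' + i ,
  age        : i % 54
}));

const to_full_name = p => ({ ...p , full_name : `${p.first_name} ${p.last_name}` });

console.time( 'v1 order: sort -> map -> filter' );
people.slice()                          // copy, so both runs see the same input
  .sort( ( a , b ) => b.age - a.age )   // sorts all 1.5M rows
  .map( to_full_name )                  // copies all 1.5M objects
  .filter( p => p.age >= 18 );          // ...only to discard 1/3 of them
console.timeEnd( 'v1 order: sort -> map -> filter' );

console.time( 'v2 order: filter -> sort -> map' );
people
  .filter( p => p.age >= 18 )           // shrink the set first (~1M rows)
  .sort( ( a , b ) => b.age - a.age )   // sort only the survivors
  .map( to_full_name );
console.timeEnd( 'v2 order: filter -> sort -> map' );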

But, can we make it faster?

Implementation #3

The SQL query now is the following:

SELECT *
  FROM people_v1
  WHERE age >= 18
  ORDER BY age DESC;

And the JavaScript code:
export default async function people_by_age_v3()
{
  // Load people list from database.
  let people = await SQL.exe.any( Files.select_amateur , { table : Tables.v1 });

  // Transform the list.
  people = people.map( p => // Map to full name.
  {
    return {
      ...p ,
      full_name : `${p.first_name} ${p.last_name}`
    };
  });

  // Return the data.
  return people;
};

The benchmark now says it took 2720ms (76.5% of the previous time) to run.

The improvement was 23.5% faster.

What did we change?

Enhancements

  1. filter & sort in the database. (Hint: without using indexes).

Explanation

  1. We got rid of filter & sort and we use the database to perform those operations instead. The database can execute query work in parallel, unlike NodeJS, which will execute those array operations synchronously (no matter how much async you type). If the set is too big, NodeJS will spend too much time on it and stall other requests, and that’s not good: NodeJS should answer a request as fast as possible. If you need a significant amount of data processing, you’d better delegate the operation to more efficient technologies and then bring the result back to NodeJS. If you really need to execute those operations IN the server/node for whatever reason, NodeJS is not the ideal technology for the job. (A sketch after this explanation shows the blocking in action.)

    This is borne out by the numbers: despite the database spending more time filtering and sorting because of the lack of an index, it is still faster than doing it in plain NodeJS. Remember that NodeJS needs to transform the data that comes from the database driver into a format you can manipulate (i.e., a native JavaScript type).
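
Here is a tiny sketch of that “no matter how much async you type” point, with made-up sizes; a pending timer (standing in for any other incoming request) cannot fire while a synchronous sort hogs the thread:

const t0 = Date.now();

// This timer wants to fire in ~10ms, like another request hitting the server.
setTimeout( () => console.log( `timer fired after ${Date.now() - t0}ms` ) , 10 );

// Meanwhile, a big synchronous sort blocks the event loop until it finishes.
const big = Array.from( { length : 5_000_000 } , () => Math.random() );
big.sort( ( a , b ) => a - b );

// The timer fires only AFTER the sort is done: every other request waited, too.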

Is that all we’ve got?

Implementation #4

Both the underlying SQL query & the JS are the same as in implementation #3.
But now we use another table, people_v2 (with the same data).

people_v2 is defined as follows:

CREATE TABLE people_v2
(
  id         BIGINT GENERATED BY DEFAULT AS IDENTITY PRIMARY KEY ,
  first_name TEXT     NOT NULL ,
  last_name  TEXT     NOT NULL ,
  age        SMALLINT NOT NULL
);

CREATE INDEX ON people_v2( age ); -- Notice the new index.

And the JavaScript code is essentially the same:

export default async function people_by_age_v4()
{
  // Load people list from database.
  let people = await SQL.exe.any( Files.select_amateur , { table : Tables.v2 /* Notice the table has changed */ });
  // ...same code as people_by_age_v3
};

The benchmark now says it took 2515ms (92.5% of the previous time) to run.

The improvement was 7.5% faster.

Enhancements

  1. Filter & sort in the database. (Hint: but NOW using indexes.)

Explanation

  1. Same as before, but now the database is more efficient thanks to the index.
    • We should take into account that INSERT/UPDATE/DELETE on that table will be a bit slower, because the database now needs to update the index as well, so we should only create indexes that we ACTUALLY use or know for sure we WILL use. (The sketch below shows how to check that the planner really uses it.)
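
If you want to verify that the planner actually uses the new index, you can ask PostgreSQL directly. A hypothetical sketch, assuming the SQL.exe wrapper also accepts raw query strings (as most pg-promise-style helpers do; run inside an async function):

// EXPLAIN ANALYZE returns its report as rows with a single "QUERY PLAN" column.
const plan = await SQL.exe.any( `
  EXPLAIN ANALYZE
  SELECT * FROM people_v2
  WHERE age >= 18
  ORDER BY age DESC
` );

console.log( plan.map( row => row[ 'QUERY PLAN' ] ).join( '\n' ) );
// Look for an "Index Scan ... on people_v2" node instead of a Seq Scan + Sort.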

Can we go further?

Implementation #5

We still use the same table as before, people_v2.

We changed the query to the following:

SELECT *,
       ( p.first_name || ' ' || p.last_name ) AS full_name
FROM people_v2 p
WHERE age >= 18
ORDER BY age DESC;

This is our brand-new JS code:

export default async function people_by_age_v5()
{
  // Load & return people list from database.
  return await SQL.exe.any( Files.select_pro , { table : Tables.v2 });
};

The benchmark now says it took 1723ms (68.5% of the previous time) to run.

The improvement was 31.5% faster.

Enhancements

  1. Compute the full_name field in the database.

Explanation

  1. Obviously, the database will spend more time producing the data if it needs to do the concatenation. But PostgreSQL is written in C, and in C there is no built-in garbage collector; object copying is naturally more expensive in JavaScript.
    Even if V8 can do some “copy elision” (see the link for more information).
    The V8 engine has to take care of many more things across the whole JavaScript implementation than PostgreSQL does.
    → An exception appears in implementation #4.1; see service_v4.js for further information.

Are there better solutions?

Implementation #6

Now, the table used is people_v3, defined as:

CREATE TABLE people_v3
(
  id         BIGINT GENERATED BY DEFAULT AS IDENTITY PRIMARY KEY ,
  first_name TEXT     NOT NULL ,
  last_name  TEXT     NOT NULL ,
  age        SMALLINT NOT NULL ,
  full_name  TEXT GENERATED ALWAYS AS ( first_name || ' ' || last_name ) STORED
);

CREATE INDEX ON people_v3( age );

The query used in this service is this:

SELECT p.id,
       p.first_name,
       p.last_name,
       p.age,
       p.full_name
FROM people_v3 p
WHERE age >= 18
ORDER BY age DESC;

The JS implementation is the same; we just target the new table (people_v3).

The benchmark now says it took 1715ms (99.5% of the previous time) to run.

The improvement was 0.5% faster. (at least in NodeJS)

Enhancements

  1. Now we use a STORED GENERATED COLUMN in PostgreSQL.

Explanation

  1. If full_name is a common access pattern, it can be worth spending the extra space to store the computation.

[Old query plan vs. new query plan]

Actually, we saved ~114ms in PostgreSQL by using a STORED column instead of computing the value each time. In percentage terms, inside PostgreSQL that would be:

  • Computing columns: 440ms (100%)
  • Stored columns: 326ms (74%)
  • Saved time: 114ms (26%)

Nowadays, it is common practice to trade storage space for speed.
STORED columns are a form of memoization.
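
For reference, this is memoization in its most familiar JavaScript form; the STORED column does the same thing, only at the storage layer (illustrative helper, not code from the repository):

// Compute once, keep the result, serve the stored copy on every later call.
function memoize( fn )
{
  const cache = new Map();

  return x =>
  {
    if( !cache.has( x ) )
      cache.set( x , fn( x ) ); // first call: pay for the computation
    return cache.get( x );      // later calls: pay only for the lookup
  };
}

const full_name = memoize( p => `${p.first_name} ${p.last_name}` );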

Bonus - Implementation #4.1

Let’s go back to our implementation #4 and make this modification (in JavaScript):

export default async function people_by_age_v4_1()
{
  // Load people list from database.
  let people = await SQL.exe.any( Files.select_amateur , { table : Tables.v2 });

  // Transform the list.
  for( const p of people )
    p.full_name = `${p.first_name} ${p.last_name}`;

  // Return the data.
  return people;
};

The SQL query and the table are the same as in implementation #4 (people_v2); only the JavaScript transformation changed.

The benchmark now says it took 1578ms (our best runtime).

Enhancements

  1. Get rid of .map( { ...p , <concatenation> } )

Explanation

  1. By the ECMAScript standard that JS must comply with, .map() must return a new array without modifying the old one. The V8 JIT compiler may not see that the old array won’t be used again, so instead of skipping the copy and just adding a .full_name property with the desired string, it actually creates a new object:
    (copying the previous object’s properties) + doing the concatenation.

    It is also really interesting that V8 spends more time parsing the computed properties coming from the PostgreSQL driver than computing the data itself. But these are implementation details, so YMMV (your mileage may vary).

    So in this enhanced version, we reuse the same objects and do a manual “copy elision”.
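
An isolated sketch of the difference (synthetic rows, not the repository’s harness; absolute numbers will vary by machine and V8 version):

// 1,000,000 synthetic rows standing in for what the driver returns.
const rows = Array.from( { length : 1_000_000 } , ( _ , i ) => ({
  first_name : 'First_' + i , last_name : 'Last_' + i , age : 30
}));

console.time( '.map + spread (new objects)' );
rows.map( p => ({ ...p , full_name : `${p.first_name} ${p.last_name}` }) );
console.timeEnd( '.map + spread (new objects)' );

console.time( 'for-of mutation (reuse objects)' );
for( const p of rows )
  p.full_name = `${p.first_name} ${p.last_name}`;
console.timeEnd( 'for-of mutation (reuse objects)' );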

So, Enmanuel, which implementation would you use?

Well, actually, it will depend on the requirements. Based on the sole requirement I’ve got, I’d stick with implementation #6. Reasons:

  • It’s modern.
  • It uses memoization, and caching is good because usually we have more storage space than CPU/GPU time (this may not be true for embedded devices).
  • It’s pretty fast.
  • It’s cleaner, both in NodeJS and PostgreSQL.
  • It’s more consistent to have that data transformation on the database than in NodeJS.
  • Finally, it’s easier to consume from other services and to export the data to other sites (i.e. data warehouses).

But you’ll always depend on your given requirements and the evolution of the business. Knowing more will give you more options: if you know the internals, the implementation details, how things work, you can find a flexible & performant solution for your software.

Doing a prior, in-depth investigation of your set of technologies and of how you’ll approach current and future requirements will help you design a better codebase. In fact, you’ll be optimizing earlier, optimizing from the very beginning. That will save you a lot of time, headaches, refactors, and money: you’ll be optimizing more than just…code.
Enmanuel Reynoso.
