Premature optimization is the root of all evil
This is a phrase every programmer eventually hears… a lot. It was popularized by Donald Knuth. Most people these days go beyond its original meaning and usually say:
Those are awkward deformations of that popular phrase, which is itself an “optimized” (i.e., shortened) one. The full phrase goes something like this:
We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil. Yet we should not pass up our opportunities in that critical 3%.
It was first written back in 1974, and the author and the paper it appeared in were, in fact, pro-optimization. In those days, you would usually be writing code in assembly.
There are a handful of tricky optimizations that can be done in assembly to make code faster, but assembly (there are several assembly languages, not just one) is not easy for a human to write and maintain; what is easy is to mess it up.
So, in favor of more readable and maintainable code, in some places you were better off trading performance for readability. For example, something like this:
mov eax, 0
(put 0 in the eax register) was usually optimized in this way:
xor eax, eax
(if you XOR a value with itself, the result is always 0)
This was because xor eax, eax was shorter to write in assembly (i.e., the instruction occupied less memory) and the CPU already knows this instruction as “put 0 in eax”.
Granted, the performance benefit is there, but it is not much, to be honest.
Today we have many visual tools to help us develop code: wonderful IDEs, profilers, and platforms that let us build faster. Nowadays, it is easier to optimize code and, beforehand, to choose the right tool set and craft a good, optimized architecture & code.
Many programmers rely on that phrase to justify skipping optimization altogether.
And it is true: optimization takes effort, sometimes more, sometimes less, and it requires a lot of knowledge or investigation. If you do know a technology, you'll often see whether a piece of code is fast enough for a given task, because there is no magic: you just know what is happening under the hood, and you know whether that code will scale well or could cause problems in the future.
That’s why I built this benchmark with the lovely Node.js & PostgreSQL: to encourage programmers to think about their code, to look up the technologies they plan to use on Google or their favorite search engine, to read the darn docs, to weigh the pros and cons, to learn the standards, and to figure out how it all works.
The better you know a technology, the better you can use it. Yeah, granted, a lot of these things are implementation details and can change from time to time or across versions.
Without further ado, let’s begin.
Imagine a service that should look up the data of ALL the people who meet the following criterion: their age is greater than or equal to 18.
So, in fact, the service should return all people who are >= 18 years old.
We'll have several implementations of the same table and the same service, and compare their relative performance on the same dataset. The goal is to make the service more performant.
In this scenario, every table will contain the same data.
You can find the full code and instructions on how to run this benchmark here on GitHub.
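To give a sense of what "benchmark iterations" means below, the runner can be as small as this sketch (a hypothetical harness: the real one, its file layout, and its iteration counts are in the repository):

// bench.js — a minimal sketch of a benchmark runner (hypothetical; the real
// harness lives in the repository). Assumes Node.js with ES modules.
import people_by_age_v1 from './services/service_v1.js'; // hypothetical path

async function bench( name , fn , iterations = 50 )
{
    await fn(); // Warm-up run, so the JIT & the connection pool settle first.
    const start = process.hrtime.bigint();
    for( let i = 0; i < iterations; i++ )
        await fn();
    const elapsed_ms = Number( process.hrtime.bigint() - start ) / 1e6;
    console.log( `${name}: ${elapsed_ms.toFixed( 0 )}ms over ${iterations} iterations` );
}

await bench( 'people_by_age_v1' , people_by_age_v1 );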
Anybody could come up with some naive code like this:
The SQL query looks like this:
SELECT * FROM people_v1;
export default async function people_by_age_v1()
{
// Load people list from database.
let people = await SQL.exe.any( Files.select_noob , { table : Tables.v1 });
// Transform the list.
people = people
.sort( ( a , b ) => b.age - a.age ) // Sort by age.
.map( p => // Map to full name.
{
return {
...p ,
full_name : `${p.first_name} ${p.last_name}`
};
})
.filter( p => p.age >= 18 ); // Filter by age.
// Return the data.
return people;
};
people_v1 is defined as follows:
CREATE TABLE people_v1
(
id BIGINT GENERATED BY DEFAULT AS IDENTITY PRIMARY KEY ,
first_name TEXT NOT NULL ,
last_name TEXT NOT NULL ,
age SMALLINT NOT NULL
);
According to our benchmark iterations, that code took 4608ms to run.
Can we make it better?
The SQL query still looks like this:
SELECT * FROM people_v1;
export default async function people_by_age_v2()
{
// Load people list from database.
let people = await SQL.exe.any( Files.select_noob , { table : Tables.v1 });
// Transform the list.
people = people
.filter( p => p.age >= 18 ) // Filter by age.
.sort( ( a , b ) => b.age - a.age ) // Sort by age.
.map( p => // Map to full name.
{
return {
...p ,
full_name : `${p.first_name} ${p.last_name}`
};
});
// Return the data.
return people;
};
The benchmark now says it took 3556ms (77% of the previous time) to run. The improvement was 23%.
What did we change?
Enhancements
Filter first, before sort & map.
Explanation
With fewer elements left after the filter, sort & map will take less time.
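Here is a standalone sketch of that effect, with made-up data (one million rows, ages 0–99; an illustration, not the repository's harness):

// filter_order.js — a sketch of why filtering first is cheaper.
// The dataset here is made up for illustration purposes.
const people = Array.from( { length : 1_000_000 } , ( _ , i ) => (
{
    first_name : `Name${i}` ,
    last_name  : `Surname${i}` ,
    age        : i % 100 // Ages 0..99, so most rows survive the filter.
}));

console.time( 'sort, map, then filter' );
people
    .slice() // Copy, so both runs see the same input.
    .sort( ( a , b ) => b.age - a.age )
    .map( p => ({ ...p , full_name : `${p.first_name} ${p.last_name}` }) )
    .filter( p => p.age >= 18 );
console.timeEnd( 'sort, map, then filter' );

console.time( 'filter, then sort and map' );
people
    .filter( p => p.age >= 18 ) // Drop rows first: less work downstream.
    .sort( ( a , b ) => b.age - a.age )
    .map( p => ({ ...p , full_name : `${p.first_name} ${p.last_name}` }) );
console.timeEnd( 'filter, then sort and map' );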
But, can we make it faster?
The SQL query now is the following:
SELECT *
FROM people_v1
WHERE age >= 18
ORDER BY age DESC;
export default async function people_by_age_v3()
{
// Load people list from database.
let people = await SQL.exe.any( Files.select_amateur , { table : Tables.v1 });
// Transform the list.
people = people.map( p => // Map to full name.
{
return {
...p ,
full_name : `${p.first_name} ${p.last_name}`
};
});
// Return the data.
return people;
};
The benchmark now says it took 2720ms (76.5% of the previous time) to run. The improvement was 23.5%.
What did we change?
Enhancements
Do the filter & sort in the database. (Hint: without using indexes.)
Explanation
We got rid of filter & sort and let the database perform those operations instead. The database can execute query work in parallel, unlike Node.js, which will execute those operations synchronously (no matter how much async you type). If the data set is too big, Node.js will spend too much time on it and halt other requests, and that's not good: Node.js should answer a request as fast as possible. If you need a significant amount of data processing, you are better off delegating the operation to more efficient technologies and then bringing the result back to Node.js. If you need to execute those operations IN the server/Node for whatever reason, Node.js is not the ideal technology for the job.
This is proved by the fact that, even though the database spends more time filtering and sorting because of the lack of an index, it is still faster than doing it in plain Node.js. Remember that Node.js also needs to transform the data that comes from the database driver into a format you can manipulate (i.e., native JavaScript types).
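A standalone sketch of that halting effect (made-up data, independent of the benchmark):

// event_loop_block.js — a sketch of how synchronous array work stalls Node.js.
const people = Array.from( { length : 5_000_000 } , () => (
{
    age : Math.floor( Math.random() * 100 )
}));

// This timer should fire after ~10ms...
const scheduled = Date.now();
setTimeout( () =>
{
    // ...but it can only run once the synchronous work below has finished.
    console.log( `Timer fired ${Date.now() - scheduled}ms after scheduling` );
} , 10 );

// A synchronous filter + sort: the event loop is blocked for the whole
// duration, so no other request (or timer) can be served in the meantime.
people
    .filter( p => p.age >= 18 )
    .sort( ( a , b ) => b.age - a.age );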
Is that all we’ve got?
Both the underlying SQL query & the JS are the same as in implementation #3, but now we use another table, people_v2 (with the same data).
people_v2 is defined as follows:
CREATE TABLE people_v2
(
id BIGINT GENERATED BY DEFAULT AS IDENTITY PRIMARY KEY ,
first_name TEXT NOT NULL ,
last_name TEXT NOT NULL ,
age SMALLINT NOT NULL
);
CREATE INDEX ON people_v2( age ); -- Notice the new index.
And the JavaScript code is also the same:
export default async function people_by_age_v4()
{
// Load people list from database.
let people = await SQL.exe.any( Files.select_amateur , { table : Tables.v2 /* Notice the table has changed */ });
// ...same code as <people_by_age_v3>
};
The benchmark now says it took 2515ms (92.5% of the previous time) to run. The improvement was 7.5%.
Enhancements
Do the filter & sort in the database. (Hint: but NOW using indexes.)
Explanation
INSERT/UPDATE/DELETE on that table will be a bit slower because the database now needs to update the index as well, so we should only create indexes that we ACTUALLY use (or know for sure we will).
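If you want to check that the planner really uses the new index, you can ask PostgreSQL for its plan. A sketch using the plain pg driver (the driver choice and connection string are assumptions for illustration; the benchmark uses its own SQL wrapper):

// explain_index.js — a sketch that asks PostgreSQL how it plans our query.
import pg from 'pg';

const client = new pg.Client( { connectionString : 'postgres://localhost/benchmark' } );
await client.connect();

const { rows } = await client.query(
    `EXPLAIN ANALYZE
     SELECT * FROM people_v2
     WHERE age >= 18
     ORDER BY age DESC;` );

// With the index in place, the plan may show an Index Scan Backward
// (e.g. "Index Scan Backward using people_v2_age_idx on people_v2"),
// which serves the ORDER BY for free, instead of a Seq Scan + Sort.
for( const row of rows )
    console.log( row[ 'QUERY PLAN' ] );

await client.end();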
Can we go further?
We still use the same table as before, people_v2. We changed the query to the following:
SELECT *,
( p.first_name || ' ' || p.last_name ) AS full_name
FROM people_v2 p
WHERE age >= 18
ORDER BY age DESC;
This is our brand-new JS code:
export default async function people_by_age_v5()
{
// Load & return people list from database.
return await SQL.exe.any( Files.select_pro , { table : Tables.v2 });
};
The benchmark now says it took 1723ms (68.5% of the previous time) to run. The improvement was 31.5%.
Enhancements
Compute the full_name field in the database.
Explanation
Check service_v4.js for further information.
Are there better solutions?
Now, the table used is people_v3, defined as:
CREATE TABLE people_v3
(
id BIGINT GENERATED BY DEFAULT AS IDENTITY PRIMARY KEY ,
first_name TEXT NOT NULL ,
last_name TEXT NOT NULL ,
age SMALLINT NOT NULL ,
full_name TEXT GENERATED ALWAYS AS ( first_name || ' ' || last_name ) STORED
);
CREATE INDEX ON people_v3( age );
The query used in this service is this:
SELECT p.id,
p.first_name,
p.last_name,
p.age,
p.full_name
FROM people_v3 p
WHERE age >= 18
ORDER BY age DESC;
The JS implementation is the same; we just target the new table (people_v3).
The benchmark now says it took 1715ms (99.5% of the previous time) to run. The improvement was 0.5% (at least in Node.js).
Enhancements
Store full_name in a generated (STORED) column.
Explanation
Since computing full_name is a common pattern, it could be worth spending the extra space to store the computation. Actually, we saved ~100ms inside PostgreSQL by using a STORED COLUMN instead of just computing it each time. In percentage terms, in PostgreSQL that would be:
Computing on each query: 440ms (100%)
Stored column: 326ms (74%)
Saved: 114ms (26%)
Nowadays, it is common practice to trade storage space for speed. STORED COLUMNS are a form of memoization.
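The same idea expressed in JavaScript terms; a tiny illustrative sketch of memoization (not part of the benchmark):

// memoize.js — a sketch of the idea behind STORED columns: pay the
// computation (and the storage) once, then reuse the result on every read.
function memoize( fn )
{
    const cache = new Map();
    return arg =>
    {
        if( !cache.has( arg ) )
            cache.set( arg , fn( arg ) ); // Compute & store once ("STORED").
        return cache.get( arg );          // Every later read is a lookup.
    };
}

const full_name = memoize( p => `${p.first_name} ${p.last_name}` );
const p = { first_name : 'John' , last_name : 'Doe' };
full_name( p ); // Computed & cached.
full_name( p ); // Served from the cache.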
Let’s go back to our implementation #4 and make this modification (in JavaScript).
export default async function people_by_age_v4_1()
{
// Load people list from database.
let people = await SQL.exe.any( Files.select_amateur , { table : Tables.v2 });
// Transform the list.
for( const p of people )
p.full_name = `${p.first_name} ${p.last_name}`;
// Return the data.
return people;
};
The benchmark now says it took 1578ms (our best runtime).
Enhancements
Replace .map( p => ({ ...p , <concatenation> }) ) with an in-place for…of loop.
Explanation
By the ECMAScript standard that JS must comply with, .map() has to return a new array without modifying the old one. The V8 JIT compiler may not see that the old array won't be used again, so instead of skipping the copy and just adding a .full_name property with the desired string, it actually creates a new object (copying the previous object's properties) and then does the concatenation.
Interestingly, V8 spends more time parsing the computed properties coming from the PostgreSQL driver than computing the data itself. But these are implementation details, so YMMV (your mileage may vary).
So, in this enhanced version, we reuse the same objects and perform a manual ‘copy elision’.
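A standalone sketch of the difference (made-up data; not the repository's harness):

// copy_elision.js — a sketch comparing .map() + spread vs in-place mutation.
const make_people = () =>
    Array.from( { length : 1_000_000 } , ( _ , i ) => (
    {
        first_name : `Name${i}` ,
        last_name  : `Surname${i}` ,
        age        : i % 100
    }));

let people = make_people();
console.time( '.map() + spread (new objects)' );
people = people.map( p => ({ ...p , full_name : `${p.first_name} ${p.last_name}` }) );
console.timeEnd( '.map() + spread (new objects)' );

people = make_people();
console.time( 'for…of + mutation (manual copy elision)' );
for( const p of people )
    p.full_name = `${p.first_name} ${p.last_name}`; // Reuse the same object.
console.timeEnd( 'for…of + mutation (manual copy elision)' );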
So, Enmanuel, which implementation would you use?
Well, actually, it depends on the requirements. Based on the sole requirement I was given, I'd stick with implementation #6. Reasons:
But you'll always depend on your given requirements and on the evolution of the business. Knowing more will give you more options: if you know the internals, the implementation details, and how things work, you can find a flexible & performant solution for your software.
Doing a prior, in-depth investigation of your set of technologies and of how you'll approach current and future requirements will help you design a better codebase. In fact, you'll be optimizing earlier, from the very beginning. That will save you a lot of time, headaches, refactors, and money: you'll be optimizing more than just…code.
— Enmanuel Reynoso.